CodePipeline: Python AWS Lambda Functions Without Timeouts

Hello!

If you’re new to CodePipeline lambda actions, check out this complete example first.

There’s a gotcha when writing CodePipeline lambda functions that’s easy to miss. If you miss it, your pipeline can get stuck in timeout loops that you can’t cancel. Here’s how to avoid that.

This article assumes you’re familiar with CodePipeline and lambda and that you’ve granted the right IAM permissions to both. You may also want to check out lambda function logging.

This is Python 3. Python 2 is out of support.

CodePipeline uses a callback pattern for running lambda functions: it invokes the function and then waits for that function to call back with either put_job_success_result or put_job_failure_result.
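For reference, here’s a trimmed sketch of the event a lambda action receives, based on my understanding of the event shape in the CodePipeline docs. The values are made up and most fields (artifacts, credentials) are omitted. The job ID is the piece both callbacks need; `get_job_id` is a hypothetical helper name.

```python
# Trimmed sketch of a CodePipeline lambda action event (made-up values;
# the real event also carries input/output artifacts and credentials).
sample_event = {
    'CodePipeline.job': {
        'id': '11111111-abcd-1111-abcd-111111abcdef',
        'accountId': '111111111111',
        'data': {
            'actionConfiguration': {
                'configuration': {
                    'FunctionName': 'my-pipeline-action',
                    'UserParameters': 'some-input',
                }
            }
        },
    }
}


def get_job_id(event):
    """Return the ID to pass as jobId to put_job_success_result/put_job_failure_result."""
    return event['CodePipeline.job']['id']


print(get_job_id(sample_event))  # 11111111-abcd-1111-abcd-111111abcdef
```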

Here’s an empty lambda action:

import json
import logging
import boto3

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)  # Use logging.DEBUG to also log the event and API responses.
    logger.debug(json.dumps(event))

    codepipeline = boto3.client('codepipeline')
    job_id = event['CodePipeline.job']['id']

    logger.info('Doing cool stuff!')
    response = codepipeline.put_job_success_result(jobId=job_id)
    logger.debug(response)

It’s a successful no-op:

[Screenshot: the no-op action succeeds]
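Because the handler creates its own client, it’s awkward to check the callback locally. One option, sketched here under the assumption that you’re willing to split the handler, is to pass the client in. `handle` is a hypothetical name, and `unittest.mock.Mock` stands in for `boto3.client('codepipeline')`, recording every call made on it.

```python
from unittest.mock import Mock


def handle(event, codepipeline):
    """The empty action's logic, with the client passed in instead of created."""
    job_id = event['CodePipeline.job']['id']
    codepipeline.put_job_success_result(jobId=job_id)


codepipeline = Mock()  # stands in for boto3.client('codepipeline')
handle({'CodePipeline.job': {'id': 'fake-job-id'}}, codepipeline)

# The success callback was sent once, with the job ID from the event.
codepipeline.put_job_success_result.assert_called_once_with(jobId='fake-job-id')
```

In the deployed function, `lambda_handler` would just call `handle(event, boto3.client('codepipeline'))`.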

Now let’s add an exception:

import json
import logging
import boto3

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    logger.debug(json.dumps(event))

    codepipeline = boto3.client('codepipeline')
    job_id = event['CodePipeline.job']['id']

    logger.info('Doing cool stuff!')
    raise ValueError('Fake error for testing!')  # The success callback below never runs.
    response = codepipeline.put_job_success_result(jobId=job_id)
    logger.debug(response)

The log shows the exception, like we’d expect:

[Screenshot: the lambda log shows the ValueError traceback]

But the pipeline action takes 20 minutes to time out. The CodePipeline limits doc says lambda actions take 1 hour to time out, and when I tested it in the past, that limit did apply to functions that didn’t send results. Sadly, I didn’t think to keep screenshots back then. In my latest tests, it consistently took 20 minutes:

[Screenshot: the action consistently times out after 20 minutes]

It doesn’t matter what the lambda function’s timeout is. Mine was set to 3 seconds. We’re hitting a timeout that’s internal to CodePipeline.

At least the action’s details link gives an error saying specifically that it didn’t receive a result:

[Screenshot: the action details show a no-result-returned error]
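You can reproduce the missing callback locally. This sketch assumes the failing handler is refactored so the client is passed in (`handle` is a hypothetical name) and uses a `unittest.mock.Mock` in place of the real client. The exception escapes before any callback is made, so the mock records no calls, which is exactly the situation that leaves CodePipeline waiting.

```python
from unittest.mock import Mock


def handle(event, codepipeline):
    """The failing handler from above, with the client passed in so it can be stubbed."""
    job_id = event['CodePipeline.job']['id']
    raise ValueError('Fake error for testing!')
    codepipeline.put_job_success_result(jobId=job_id)  # never reached


codepipeline = Mock()  # stands in for boto3.client('codepipeline')
try:
    handle({'CodePipeline.job': {'id': 'fake-job-id'}}, codepipeline)
except ValueError as error:
    print(f'Lambda saw: {error}')  # Lambda saw: Fake error for testing!

# Neither callback was made, so CodePipeline has no result and keeps waiting.
print(codepipeline.method_calls)  # []
```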

There’s a workaround. You should usually catch only specific errors that you know how to handle; using except Exception is an anti-pattern. But in this case, we need to guarantee that the callback always happens. In this one situation (not in general), we need to catch all exceptions:

import json
import logging
import boto3

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    logger.debug(json.dumps(event))

    codepipeline = boto3.client('codepipeline')
    job_id = event['CodePipeline.job']['id']

    try:
        # Demo only: remove this raise to let the action succeed.
        raise ValueError('This message will appear in the CodePipeline UI.')
        logger.info('Doing cool stuff!')
        response = codepipeline.put_job_success_result(jobId=job_id)
        logger.debug(response)
    except Exception as error:
        # Deliberate catch-all: every code path must end in a callback,
        # or CodePipeline waits until its own internal timeout.
        logger.exception(error)
        response = codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={
                'type': 'JobFailed',
                'message': f'{error.__class__.__name__}: {str(error)}'
            }
        )
        logger.debug(response)

(logger.exception(error) logs the exception and its stack trace. Even though we’re catching all errors, we shouldn’t let them pass silently.)

Now the failure will be visible to CodePipeline and the action won’t get stuck waiting.
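If you have several lambda actions, one way to avoid repeating this try/except is to move it into a decorator. This is a sketch of my own, not the article’s method: `always_reports` is a hypothetical name, the client is passed to the decorator (in a real function you’d create it once at module level with `boto3.client('codepipeline')`), and a `Mock` stands in for it in the demo. The decorated function does the work and must not call `put_job_*_result` itself.

```python
import functools
import logging
from unittest.mock import Mock

logger = logging.getLogger()


def always_reports(codepipeline):
    """Hypothetical decorator: always attempt a success or failure callback.

    Mirrors the try/except above: the success callback sits inside the try,
    so an error from it is also reported as a failure.
    """
    def decorator(action):
        @functools.wraps(action)
        def wrapper(event, context=None):
            job_id = event['CodePipeline.job']['id']
            try:
                action(event, context)
                response = codepipeline.put_job_success_result(jobId=job_id)
            except Exception as error:  # deliberate catch-all, as above
                logger.exception(error)
                response = codepipeline.put_job_failure_result(
                    jobId=job_id,
                    failureDetails={
                        'type': 'JobFailed',
                        'message': f'{error.__class__.__name__}: {error}',
                    },
                )
            logger.debug(response)
        return wrapper
    return decorator


codepipeline = Mock()  # stands in for boto3.client('codepipeline')


@always_reports(codepipeline)
def lambda_handler(event, context):
    raise ValueError('This message will appear in the CodePipeline UI.')


lambda_handler({'CodePipeline.job': {'id': 'fake-job-id'}}, None)
print(codepipeline.put_job_failure_result.call_args.kwargs['failureDetails'])
# {'type': 'JobFailed', 'message': 'ValueError: This message will appear in the CodePipeline UI.'}
```

The trade-off is that the callback logic lives in one place, at the cost of a little indirection when reading each action.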

The failureDetails message will appear in the CodePipeline UI. We send the exception message so it’s visible to operators:

[Screenshot: the exception message shown in the CodePipeline UI]
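One more way the failure callback itself could be rejected is an over-long message. My understanding is that the API reference caps `failureDetails['message']` at 5000 characters; treat that number as an assumption and confirm it against the current docs. A hypothetical helper, `failure_details`, that builds the details from an exception and truncates the message might look like this:

```python
# Assumption: PutJobFailureResult limits failureDetails['message'] to 5000
# characters. Check the current CodePipeline API reference.
MAX_MESSAGE_LENGTH = 5000


def failure_details(error, failure_type='JobFailed'):
    """Hypothetical helper: build failureDetails from an exception.

    Truncates long messages so the failure callback isn't rejected,
    which would leave the action waiting again.
    """
    message = f'{error.__class__.__name__}: {error}'
    if len(message) > MAX_MESSAGE_LENGTH:
        message = message[:MAX_MESSAGE_LENGTH - 3] + '...'
    return {'type': failure_type, 'message': message}


print(failure_details(ValueError('bad input')))
# {'type': 'JobFailed', 'message': 'ValueError: bad input'}
print(len(failure_details(ValueError('x' * 10_000))['message']))  # 5000
```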

Of course, you’ll want to remove that ValueError. It’s just to demonstrate the handling.

You should use this pattern in every lambda action: catch all exceptions and return a JobFailed result to the pipeline. You can still catch more specific exceptions, ones tied to the feature you’re implementing, inside the catch-all try/except, but you need the catch-all to make sure a result is returned when the unexpected happens.
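Here’s a sketch of a specific handler living inside the catch-all. It assumes, purely for illustration, that the action’s UserParameters holds a JSON string; `ConfigurationProblem`, `parse_user_parameters`, `handle`, and `make_event` are hypothetical names, and `Mock` stands in for the real client. A bad parameter is reported with the `ConfigurationError` failure type, while anything unexpected still falls through to `JobFailed`.

```python
import json
import logging
from unittest.mock import Mock

logger = logging.getLogger()


class ConfigurationProblem(Exception):
    """Hypothetical exception for mistakes in the action's configuration."""


def parse_user_parameters(job):
    """Assumes (hypothetically) that UserParameters holds a JSON string."""
    config = job['data']['actionConfiguration']['configuration']
    try:
        return json.loads(config.get('UserParameters', ''))
    except json.JSONDecodeError as error:
        # A specific, expected failure that we know how to describe.
        raise ConfigurationProblem(f'UserParameters is not valid JSON: {error}') from error


def handle(event, codepipeline):
    job = event['CodePipeline.job']
    job_id = job['id']
    try:
        params = parse_user_parameters(job)
        logger.info('Doing cool stuff with %s', params)
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as error:  # the catch-all still sends a result
        logger.exception(error)
        is_config = isinstance(error, ConfigurationProblem)
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={
                'type': 'ConfigurationError' if is_config else 'JobFailed',
                'message': f'{error.__class__.__name__}: {error}',
            },
        )


def make_event(user_parameters):
    """Build a minimal fake event for local testing."""
    configuration = {'UserParameters': user_parameters}
    return {'CodePipeline.job': {'id': 'fake-job-id', 'data': {
        'actionConfiguration': {'configuration': configuration}}}}


codepipeline = Mock()  # stands in for boto3.client('codepipeline')
handle(make_event('not json'), codepipeline)
print(codepipeline.put_job_failure_result.call_args.kwargs['failureDetails']['type'])
# ConfigurationError
```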

Happy automating!

Adam
