CF Custom Resources: Avoiding the Two Hour Exception Timeout

ManagementTools_GRAYSCALE_AWSCloudFormation

There’s a gotcha when writing CloudFormation Custom Resources that’s easy to miss and if you miss it your stack can get stuck, ignoring its timeout setting. It’ll fail on its own after an hour, but if it tries to roll back you have to wait a second hour. Here’s how to avoid that.

This post assumes you’re already working with Custom Resources and that yours are backed by lambda.

Here’s an empty custom resource:

import logging
import cfnresponse

def handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    if event['RequestType'] == 'Delete':
        logger.info('Deleted!')
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
        return

    logger.info('It worked!')
    cfnresponse.send(event, context, cfnresponse.SUCCESS, {})

It’s a successful no-op:

SuccessfulNoOp

Now let’s add an exception:

import logging
import cfnresponse

def handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    if event['RequestType'] == 'Delete':
        logger.info('Deleted!')
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
        return

    raise Exception
    logger.info('It worked!')
    cfnresponse.send(event, context, cfnresponse.SUCCESS, {})

We can see the exception in the logs:

ExceptionThreeRetries

But, then the stack gets stuck because the cfnresponse callback never happened and CF doesn’t know there was a problem:

FailureTimeouts

It took exactly an hour to fail, which suggests CF hit some internal, fallback timeout. My stack timeout was set to five minutes. We can see it retry the lambda function once a minute for three minutes, but then it never tries again in the remaining 57 minutes. I got the same delays in reverse when it tried to roll back (which is really just another update to the previous state). And, since the rollback failed, I had to manually edit the lambda function code and remove the exception to get it to finish rolling back.

Maybe this is a bug? Either way, there’s a workaround.

You should usually only catch specific errors that you know how to handle. It’s an anti-pattern to use except Exception. But, in this case we need to guarantee that the callback always happens. In this one situation (not in general) we need to catch all exceptions:

import logging
import cfnresponse

def handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    try:
        if event['RequestType'] == 'Delete':
            logger.info('Deleted!')
            cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
            return

        raise Exception
        logger.info('It worked!')
        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception:
        logger.exception('Signaling failure to CloudFormation.')
        cfnresponse.send(event, context, cfnresponse.FAILED, {})

Now, the failure is visible to CF and it doesn’t wait:

ExceptionHandled.png

You should use this pattern in every Custom Resource: catch all exceptions and return a FAILED result to CF. You can still catch more specific exceptions inside the catchall try/except, ones specific to the feature you’re implementing, but you need that catchall to ensure the result returns when the unexpected happens.

Happy automating!

Adam

Lamba: Filter boto3’s Logs into CloudWatch

Compute_GRAYSCALE_AWSLambda

Good morning!

For those custom cases that don’t fit into Terraform or CloudFormation, a little bit of Python and some boto3 in a lambda function can save you. Lambda captures the output of both print() and logging.Logger calls into CloudWatch so it’s easy to log information about what your code is doing. When things go wrong, though, I often find that just the output I wrote doesn’t give me enough to diagnose the problem. In those cases, it’s helpful to see the log output both for your code and boto3. Here’s how you do that.

Use the logging library. It’s a Python core library that provides standard features like timestamped prefixes and support for levels (e.g. INFO or DEBUG). For simple deployment helpers this is usually all you need:

logger = logging.getLogger(logging.INFO)
logger.info('Message at the INFO level.')
logger.debug('Message at the DEBUG level.')

This sets the root logger (which sees all log messages) to the INFO level. Normally you’d have to configure the root logger, but lambda does that automatically (which is actually annoying if you need to change your formatter, but that’s for another post). Now, logger.info() calls will show up in the logs and logger.debug() calls won’t. If you increase the level to DEBUG you’ll see both.

Because logging is the standard Python way to handle log output, maintainers of libraries like boto3 use it throughout their code to show what the library is doing (and they’re usually smart about choosing what to log at each level). By setting a level on the root logger, you’re choosing which of your output to capture and which of boto3’s output to capture. Powerful when you’re diagnosing a failure.

Here’s a demo function to show how the output looks. You might notice that it puts the logger setup calls inside the handler even though the AWS docs tell you to put them under the import. Function calls made directly in modules (e.g. not inside functions declared within the module) are import-side effects and import side-effects are an anti-pattern. I put the calls in the handler so they only run when the handler is called. This isn’t likely to matter much in a lambda function, but I like to stick to good patterns.

import logging

import boto3


def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    client = boto3.client('sts')
    account_id = client.get_caller_identity()['Account']

    logger.info('Getting account ID...')
    logger.debug('Account ID: {}'.format(account_id))
    return account_id

This is the output when run at the INFO level:

START RequestId: a61471fe-c3fd-11e8-9f43-bdb22e22a203 Version: $LATEST
[INFO]	2018-09-29T15:38:01.882Z	a61471fe-c3fd-11e8-9f43-bdb22e22a203	Found credentials in environment variables.
[INFO]	2018-09-29T15:38:02.83Z	a61471fe-c3fd-11e8-9f43-bdb22e22a203	Starting new HTTPS connection (1): sts.amazonaws.com
[INFO]	2018-09-29T15:38:02.531Z	a61471fe-c3fd-11e8-9f43-bdb22e22a203	Getting account ID...
END RequestId: a61471fe-c3fd-11e8-9f43-bdb22e22a203
REPORT RequestId: a61471fe-c3fd-11e8-9f43-bdb22e22a203	Duration: 734.96 ms	Billed Duration: 800 ms Memory Size: 128 MB	Max Memory Used: 29 MB

This is the output when run at the DEBUG level:

START RequestId: 9ea3bbef-c3fe-11e8-8eb1-730a799b5405 Version: $LATEST
[DEBUG]	2018-09-29T15:44:58.850Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.880Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable config_file from defaults.
[DEBUG]	2018-09-29T15:44:58.881Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable credentials_file from defaults.
[DEBUG]	2018-09-29T15:44:58.881Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable data_path from defaults.
[DEBUG]	2018-09-29T15:44:58.881Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable region from environment with value 'us-west-2'.
[DEBUG]	2018-09-29T15:44:58.900Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.900Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable ca_bundle from defaults.
[DEBUG]	2018-09-29T15:44:58.900Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.900Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable api_versions from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable credentials_file from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable config_file from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable metadata_service_timeout from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.901Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable metadata_service_num_attempts from defaults.
[DEBUG]	2018-09-29T15:44:58.942Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:58.960Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Looking for credentials via: env
[INFO]	2018-09-29T15:44:58.960Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Found credentials in environment variables.
[DEBUG]	2018-09-29T15:44:58.961Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading JSON file: /var/runtime/botocore/data/endpoints.json
[DEBUG]	2018-09-29T15:44:59.1Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading variable profile from defaults.
[DEBUG]	2018-09-29T15:44:59.20Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event choose-service-name: calling handler 
[DEBUG]	2018-09-29T15:44:59.60Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading JSON file: /var/runtime/botocore/data/sts/2011-06-15/service-2.json
[DEBUG]	2018-09-29T15:44:59.82Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event creating-client-class.sts: calling handler 
[DEBUG]	2018-09-29T15:44:59.100Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	The s3 config key is not a dictionary type, ignoring its value of: None
[DEBUG]	2018-09-29T15:44:59.103Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Setting sts timeout as (60, 60)
[DEBUG]	2018-09-29T15:44:59.141Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Loading JSON file: /var/runtime/botocore/data/_retry.json
[DEBUG]	2018-09-29T15:44:59.141Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Registering retry handlers for service: sts
[DEBUG]	2018-09-29T15:44:59.160Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event before-parameter-build.sts.GetCallerIdentity: calling handler 
[DEBUG]	2018-09-29T15:44:59.161Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Making request for OperationModel(name=GetCallerIdentity) (verify_ssl=True) with params: {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'Boto3/1.7.74 Python/3.6.1 Linux/4.14.62-65.117.amzn1.x86_64 exec-env/AWS_Lambda_python3.6 Botocore/1.10.74'}, 'body': {'Action': 'GetCallerIdentity', 'Version': '2011-06-15'}, 'url': 'https://sts.amazonaws.com/', 'context': {'client_region': 'us-west-2', 'client_config': , 'has_streaming_input': False, 'auth_type': None}}
[DEBUG]	2018-09-29T15:44:59.161Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event request-created.sts.GetCallerIdentity: calling handler 
[DEBUG]	2018-09-29T15:44:59.161Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event choose-signer.sts.GetCallerIdentity: calling handler 
[DEBUG]	2018-09-29T15:44:59.162Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Calculating signature using v4 auth.
[DEBUG]	2018-09-29T15:44:59.180Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	CanonicalRequest:
POST
/

content-type:application/x-www-form-urlencoded; charset=utf-8
host:sts.amazonaws.com
x-amz-date:20180929T154459Z
x-amz-security-token:FQoGZXIvYXdzEKn//////////wEaDOOlIItIhtRakeAyfCLrAWPZXQJFkNrDZNa4Bny102eGKJ5KWD0F+ixFqZaW+A9mgadICpLRxBG4JGUzMtPTDeqxPoLT1qnS6bI/jVmXXUxjVPPMRiXdIlP+li0eFyB/xOK+PN/DOiByee0eu6bjQmkjoC3P5MREvxeanPY7hpgXNO52jSBPo8LMIdAcjCJxyRF7GHZjtZGAMARQWng6DJa9RAiIbxOmXpSbNGpABBVg/TUt8XMUT+p9Lm2Txi10P0ueu1n5rcuxJdBV8Jr/PUF3nZY+/k7MzOPCnzZNqVgpDAQbwby+AVIQcvVwaKsXePqubCqBTHxoh/Mo0ay+3QU=

content-type;host;x-amz-date;x-amz-security-token
ab821ae955788b0e33ebd34c208442ccfc2d406e2edc5e7a39bd6458fbb4f843
[DEBUG]	2018-09-29T15:44:59.181Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	StringToSign:
AWS4-HMAC-SHA256
20180929T154459Z
20180929/us-east-1/sts/aws4_request
7cf0af0e8f55fb1b9c0009104aa8f141097f00fea428ddf1654321e7054a920d
[DEBUG]	2018-09-29T15:44:59.181Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Signature:
c00de0a12c9ee0fce348df452f2833749b854915db58f8d106e3166545a70c43
[DEBUG]	2018-09-29T15:44:59.183Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Sending http request: 
[INFO]	2018-09-29T15:44:59.201Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Starting new HTTPS connection (1): sts.amazonaws.com
[DEBUG]	2018-09-29T15:44:59.628Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	"POST / HTTP/1.1" 200 461
[DEBUG]	2018-09-29T15:44:59.628Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Response headers: {'x-amzn-requestid': '9f421e56-c3fe-11e8-b622-2d5da14a8dc9', 'content-type': 'text/xml', 'content-length': '461', 'date': 'Sat, 29 Sep 2018 15:44:58 GMT'}
[DEBUG]	2018-09-29T15:44:59.640Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Response body:
b'\n \n arn:aws:sts::268133297303:assumed-role/demo-boto3-logging/demo-boto3-logging\n AROAITTVSA67NGZPH2QZI:demo-boto3-logging\n 268133297303\n \n \n 9f421e56-c3fe-11e8-b622-2d5da14a8dc9\n \n\n'
[DEBUG]	2018-09-29T15:44:59.640Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Event needs-retry.sts.GetCallerIdentity: calling handler 
[DEBUG]	2018-09-29T15:44:59.641Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	No retry needed.
[INFO]	2018-09-29T15:44:59.641Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Getting account ID...
[DEBUG]	2018-09-29T15:44:59.641Z	9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Account ID: 268133297303
END RequestId: 9ea3bbef-c3fe-11e8-8eb1-730a799b5405
REPORT RequestId: 9ea3bbef-c3fe-11e8-8eb1-730a799b5405	Duration: 813.73 ms	Billed Duration: 900 ms Memory Size: 128 MB	Max Memory Used: 29 MB

boto3 can be very verbose in DEBUG so I recommend staying at INFO unless you’re actively troubleshooting.

Happy debugging!

Adam

The Better Alexa Quick Start

You’ve seen the Life Coach. That was my second Alexa project. The one I used to learn the platform.

I began with Amazon’s quick start tutorial, but I didn’t like the alexa-skills-kit-color-expert-python example code. It feels like a spaghetti-nest and is more complicated than it needs to be for a quick start (it’s 200 lines, the Life Coach is 40).

Hoping to make the intro process smoother for you, I’ve turned the Life Coach into my own quick start for Alexa. It gives you an example that’s simple but still has the tools you need to do your own development. Essential things like error log capturing and scripts to build and publish code updates that aren’t included in the Amazon guide.

Especially without Ops or AWS background, setting up AWS well enough that you don’t want to pull out your hair can be harder than writing your first skill. I’ve included automation to handle the infrastructure. You can skip that and just use the Life Coach code if you’d rather.

I’ve also included the better lambda packaging practices from my previous post.

Hopefully this makes your Alexa journey easier. Check it out!

If you want an intro to how Alexa works before you dive in to code, check out the slides from my lighting talk on this project at the May 2017 meeting of the San Diego Python group.

Talk to you soon,

Adam

Python on Lambda: Better Packaging Practices

Update 2018-10-22: This is out of date! Since I wrote this, lambda released support for Python 3 and in the new version I don’t have to do the handler import described below (although I don’t know if that’s because of a difference in Python 3 or because of a change in how lambda imports modules). In a future post I’ll cover Python 3 lambda functions in more detail.

Lambda is an AWS service that runs your code for you, without you managing servers. It’s my new favorite tool, but the official docs encourage a code structure that I think is an anti-pattern in Python. Fortunately, after some fiddling I found what I think is a better way. Originally I presented it to the San Diego Python meetup (if you’re in Southern California you should come to the next one!), this post is a recap.

The Lambda Getting Started guide starts you off with simple code, like this from the Hello World “blueprint” (you can find the blueprints in the AWS web console):

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    print("value1 = " + event['key1'])
    print("value2 = " + event['key2'])
    print("value3 = " + event['key3'])
    return event['key1']  # Echo back the first key value
    #raise Exception('Something went wrong')

This gets uploaded to Lambda as a flat file, then you set a “handler” (the function to run). They tell you the handler has this format:

python-file-name.handler-function

So for the Hello World blueprint we set the handler to this:

lambda_function.lambda_handler

Then Lambda knows to look for the lambda_handler function in the lambda_function file. For something as small as a hello world app this is fine, but it doesn’t scale.

For example, here’s a piece of the Alexa skill blueprint (I’ve taken out most of it to keep this short):

from __future__ import print_function

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    ...human interface stuff

def build_response(session_attributes, speechlet_response):
    ...more human interface stuff

def get_welcome_response():
    ...more human interface stuff

def handle_session_end_request():
    ...session stuff

def create_favorite_color_attributes(favorite_color):
    ...helper function

def set_color_in_session(intent, session):
    ...the real logic of the tool part 1

def get_color_from_session(intent, session):
    ...the real logic of the tool part 2

etc...

This doesn’t belong in one long module, it belongs in a Python package. The session stuff goes in its own module, the human interface stuff in a different module, and probably a bunch of modules or packages for the logic of the skill itself. Something like the pypa sample project:

sampleproject/
├── LICENSE.txt
├── MANIFEST.in
├── README.rst
├── data
│   └── data_file
├── sample
│   ├── __init__.py  <-- main() ('handler') here
│   ├── package_data.dat
├── setup.cfg
├── setup.py
├── tests
│   ├── __init__.py
│   └── test_simple.py
└── tox.ini

The project is super simple (all its main() function does is print "Call your main application code here"), but the code is organized, the setup.py tracks dependencies and excludes tests from packaging, there's a clear place to put docs and data, etc. All the awesome things you get from packages. This is how I write my projects, so I want to just pip install and then set the handler to sample.main and let Lambda find main() because it's part of the package namespace.

It turns out this is possible, and the Deployment Package doc sort of tells you how to do it when they talk about dependencies.

Their example is the requests package. They tell you to create a virtualenv, pip install requests, then zip the contents of the site-packages directory along with your module (that’s the directory inside the virtualenv where installed packages end up). Then you can import requests in your code. If you do the same thing with your package, Lambda can run the handler function and you don’t have to upload a module outside the package.

I’m going to use the pypa sample project as an example because it follows a common Python package structure, but we need a small tweak because Lambda calls the handler with args and the main() function of the sample project doesn’t take any.

Change this:

def main():

To this:

def main(*args):

Then do what you usually do with packages (here I’m going to create and install a wheel because I think that’s the current best practice, but there are other ways):

  1. Make a wheel.
    python setup.py -q bdist_wheel --universal
    
  2. Create and activate a virtualenv.
  3. Install the wheel.
    pip install sample-1.2.0-py2.py3-none-any.whl
    
  4. Zip up the contents of this directory:
    $VIRTUAL_ENV/lib/python2.7/site-packages
  5. Upload the zip.
  6. Set the handler to sample.main.

Then it works!

START RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96 Version: $LATEST
Call your main application code here
END RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96
REPORT RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96	Duration: 0.37 ms...

… except when it doesn’t.

The pypa sample package defines its main() in __init__.py, so the handler path is sample.main.

sampleproject/
├── sample
│   ├── __init__.py  <-- main() here

But what if your entry function is in some other module?

sampleproject/
├── sample
│ ├── __init__.py
│ ├── cool.py <– main() here

We can just set the handler to sample.cool.main, right? Nope! Doesn't work.

{
"errorMessage": "Unable to import module 'sample.cool'"
}

I'm working on figuring out why this happens, but the workaround is to import the function I need in __init__.py so my hanlder path only needs one dot. That's annoying but not too bad; lots of folks do that to simplify their package namespace anyway so most Python readers know to look for it. I met a chap at the SD Python group who has some great ideas on why this might be happening, if we figure it out I'll post the details.

To sum up:

  • You don’t have to cram all your code into one big file.
  • If you pip install your package, all you need to upload is the site-packages directory.
  • Remember to import your handler in the root __init__.py if you get import errors.

Thanks for reading!

Adam