CloudWatch Logs Structured as JSON with Python Lambda Functions

Hello!

If you’re setting up JSON logging in a script instead of a lambda function, check out this post instead. The pattern is a little simpler.

Recently, I’ve been switching to logs structured as JSON. That means output like this:

{
    "asctime": "2019-09-20 13:55:13,604",
    "levelname": "INFO",
    "name": "root",
    "message": "Huzzah!"
}

Instead of this:

2019-09-21 15:52:43,240 INFO root Huzzah!

This way, your log processor can reference keys in a JSON object instead of splitting strings and hoping the split works right. Modern log processors are pretty good with delimited strings, but I’ve still seen them break because of things like unexpected whitespace or messages that happen to contain delimiter characters.
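
For example, here’s a minimal sketch of the difference from a log processor’s point of view, using the two lines above:

import json

json_line = '{"asctime": "2019-09-20 13:55:13,604", "levelname": "INFO", "name": "root", "message": "Huzzah!"}'
text_line = '2019-09-21 15:52:43,240 INFO root Huzzah!'

# JSON: reference the key directly.
message = json.loads(json_line)['message']

# Delimited string: split and hope the pieces line up. Even here,
# maxsplit=4 is needed so a message containing spaces stays intact.
message = text_line.split(' ', 4)[4]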

CloudWatch Logs reads JSON out of the box. To send it JSON from a Python lambda function, we need to do two things:

  • Configure a JSON handler for the root logger. I use the python-json-logger library.
  • Remove lambda’s default log handler (to prevent duplicate messages).

First, the full example. Then the details. Here’s the code:

import logging
from pythonjsonlogger import jsonlogger

def setup_logging(log_level):
    logger = logging.getLogger()

    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1

    logger.setLevel(log_level)
    json_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s'
    )
    json_handler.setFormatter(formatter)
    logger.addHandler(json_handler)
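    # Remove lambda's default handler so messages don't log twice.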
    logger.removeHandler(logger.handlers[0])

def lambda_handler(event, context):
    setup_logging(logging.DEBUG)
    logger = logging.getLogger()
    logger.info('Huzzah!')

For more details on what that assert is and how it works, check out this post.
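
If you just want to see lambda’s default handler yourself, a quick check from inside a function does the trick (a sketch; the handler’s class name may vary by runtime):

import logging

def lambda_handler(event, context):
    # In my testing, this printed a list containing exactly one
    # handler: the default one lambda sets up.
    print(logging.getLogger().handlers)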

By default, CloudWatch renders this function’s JSON messages on one line:

log_output_collapsed.png

But you can expand them into pretty-printed objects:

log_output_expanded.png

The expanded form is extra nice when you have large log messages or lots of fields.

Like we saw in my previous post, lambda’s out-of-the-box setup logs delimited strings. I figured out from testing that it does this with a handler on the root logger. If we added a JSON handler and did nothing else, we’d get duplicate log messages: one from each handler. We have to call removeHandler() on the default one.

To see how this works, let’s add debug messages to our setup function (and switch to logging.DEBUG level, of course):

...
def setup_logging(log_level):
    logger = logging.getLogger()

    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1

    logger.setLevel(log_level)
    json_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s'
    )
    json_handler.setFormatter(formatter)
    logger.debug(f'Number of handlers: {len(logger.handlers)}')
    logger.debug('Adding JSON handler.')
    logger.addHandler(json_handler)
    logger.debug(f'Number of handlers: {len(logger.handlers)}')
    logger.debug('This message logs twice because there are two handlers.')
    logger.debug('Removing original handler.')
    logger.removeHandler(logger.handlers[0])
    logger.debug('This message logs once because we removed the original handler.')
...

Now our output shows what’s happening with the handlers:

debug_output_collapsed.png

There’s a downside to this method: the default handler logs an aws_request_id that our JSON handler doesn’t. I haven’t figured out how to preserve it yet. There’s an %(aws_request_id)s LogRecord attribute in the default handler’s format string, but it just returns null if I add it to the JSON handler. I’ll look deeper into this in a future post. We still get the request ID in the START message of each run, and for my use cases that’s good enough. If you’re tracing transactions through an ecosystem of serverless apps, it may not be.
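
For reference, this is the change I tried, reusing the formatter from the example above (the extra %(aws_request_id)s field in the format string):

# Adding the attribute from the default handler's format string...
formatter = jsonlogger.JsonFormatter(
    fmt='%(asctime)s %(levelname)s %(name)s %(aws_request_id)s %(message)s'
)
# ...emits "aws_request_id": null from our handler, presumably because
# nothing sets that attribute on records that skip the default handler.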

Like we saw in another previous post, one of the reasons to use Python’s logging library is that it automatically captures messages from third-party tools like boto. We still get that:

boto_output_collapsed.png
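
If you want to reproduce that output, here’s a minimal sketch using the setup_logging() function from above (it assumes boto3 is available, which it is in lambda’s Python runtime, and that the function’s role permits the call):

import logging
import boto3

def lambda_handler(event, context):
    setup_logging(logging.DEBUG)
    # boto3 logs through the standard logging library, so its client
    # setup and request messages come out of our JSON handler too.
    boto3.client('s3').list_buckets()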

I recommend switching to logs structured as JSON. It can reduce failures and complexity in log processing, and the output is cleaner overall.

Happy logging!

Adam

If this was helpful and you want to save time by getting “copy and paste” patterns for Cloud DevOps in your inbox, subscribe here. If you don’t want to wait for the next one, check out these: