How to Paginate in boto3: Use Collections Instead

Hello!

When working with boto3, you’ll often find yourself looping. For example, if you wanted to get the names of all the objects in an S3 bucket, you might do this:

import boto3

s3 = boto3.client('s3')

response = s3.list_objects_v2(Bucket='my-bucket')
# 'Contents' is left out of the response entirely when the bucket is empty
for object in response.get('Contents', []):
    print(object['Key'])

But methods like list_objects_v2 limit how many objects they’ll return in one call (up to 1,000 in this case). If you hit that limit, or know you eventually will, the solution used to be pagination. Like this:

import boto3

s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='my-bucket')

for page in pages:
    # An empty page has no 'Contents' key, so default to an empty list
    for object in page.get('Contents', []):
        print(object['Key'])

I always forget how to do this. I also feel like it clutters up my code with API implementation details that don’t have anything to do with the objects I’m trying to list.
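For comparison, here’s roughly the bookkeeping a paginator hides from you. This is a minimal sketch, assuming a boto3 S3 client and the documented list_objects_v2 response fields IsTruncated and NextContinuationToken; the function name list_keys_manually is hypothetical, just for illustration.

```python
def list_keys_manually(s3_client, bucket):
    """Return every key in `bucket`, following continuation tokens by hand.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    keys = []
    request = {'Bucket': bucket}
    while True:
        response = s3_client.list_objects_v2(**request)
        # 'Contents' is omitted when there are no objects to return.
        for obj in response.get('Contents', []):
            keys.append(obj['Key'])
        # IsTruncated is True while more results remain on the server.
        if not response.get('IsTruncated'):
            return keys
        # Ask for the next page, starting where this one left off.
        request['ContinuationToken'] = response['NextContinuationToken']


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for key in list_keys_manually(boto3.client('s3'), 'my-bucket'):
#       print(key)
```

Every one of those lines is API plumbing, which is exactly the clutter a paginator (and, even more so, a collection) lets you skip.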

There’s a better way! Boto3 has semi-new things called collections, and they are awesome:

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objects = bucket.objects.all()

for object in objects:
    print(object.key)

If they look familiar, it’s probably because they’re modeled after the QuerySets in Django’s ORM. They work like an object-oriented interface to a database. It’s convenient to think about AWS like that when you’re writing code: it’s a database of cloud resources. You query the resources you want to interact with and read their properties (e.g. object.key like we did above) or call their methods.
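To make the “read their properties” part concrete, here’s a small sketch that totals up a bucket. It assumes the items yielded by bucket.objects.all() are ObjectSummary resources, which expose a size attribute (in bytes) alongside key; summarize_bucket is a hypothetical helper name.

```python
def summarize_bucket(bucket):
    """Return (object_count, total_bytes) for a boto3 s3.Bucket resource.

    Items from bucket.objects.all() are assumed to be ObjectSummary
    resources, whose .size attribute holds the object's size in bytes.
    """
    count = 0
    total_bytes = 0
    # The collection handles paging through the bucket behind the scenes.
    for obj in bucket.objects.all():
        count += 1
        total_bytes += obj.size
    return count, total_bytes


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   count, size = summarize_bucket(boto3.resource('s3').Bucket('my-bucket'))
#   print(f'{count} objects, {size} bytes')
```

Notice there’s no dictionary digging here: you ask for the resources and read attributes off them, like rows from an ORM query.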

You can do more than list, too. For example, in S3 you can empty a bucket in one line (this works even if there are pages and pages of objects in the bucket):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
bucket.objects.all().delete()

Boom 💥. One line, no loop. Use wisely.
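If you’re curious what that one line saves you from, here’s a rough client-based equivalent. It’s a sketch, not boto3’s actual implementation: it assumes one DeleteObjects request per list page (a list_objects_v2 page holds at most 1,000 keys, which is also the most DeleteObjects accepts per request), and empty_bucket_with_client is a hypothetical name.

```python
def empty_bucket_with_client(s3_client, bucket):
    """Delete every current object in `bucket` using a paginator and DeleteObjects.

    Returns (keys_submitted, errors), where errors collects any per-key
    failures reported in the DeleteObjects responses.
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    submitted = 0
    errors = []
    for page in paginator.paginate(Bucket=bucket):
        contents = page.get('Contents', [])
        if not contents:
            continue  # Skip empty pages; there is nothing to delete.
        # A list page holds at most 1,000 keys, which is also the most
        # DeleteObjects accepts in a single request.
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={'Objects': [{'Key': obj['Key']} for obj in contents]},
        )
        submitted += len(contents)
        # DeleteObjects can partially fail without raising an exception.
        errors.extend(response.get('Errors', []))
    return submitted, errors


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(empty_bucket_with_client(boto3.client('s3'), 'my-bucket'))
```

Like bucket.objects.all().delete(), this only removes current objects. In a versioned bucket, older versions stick around; boto3 exposes bucket.object_versions for those.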

I recommend collections whenever you need to iterate. I’ve found the code easier to read, and collections easier to remember than paginators. Some notes:

  • This is just an introduction; collections can do a lot more. Check out filtering. It’s excellent.
  • Collections aren’t available for every resource (yet). Sometimes you have to fall back to a paginator.
  • There are cases where using a collection can result in more API calls than you expect. Most of the time this isn’t a problem, but if you’re seeing performance problems, you might want to dig into the nuances in the docs.
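Here’s a quick taste of filtering. This sketch assumes filter() passes Prefix through to the underlying S3 list request (so S3 does the filtering) and that limit() caps the total number of items the collection yields; keys_with_prefix is a hypothetical helper.

```python
def keys_with_prefix(bucket, prefix, max_items=None):
    """Return keys under `prefix` in a boto3 s3.Bucket resource.

    filter() forwards Prefix to the list request, so only matching keys
    come back; limit() stops iteration after max_items results.
    """
    objects = bucket.objects.filter(Prefix=prefix)
    if max_items is not None:
        objects = objects.limit(max_items)
    return [obj.key for obj in objects]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   bucket = boto3.resource('s3').Bucket('my-bucket')
#   print(keys_with_prefix(bucket, 'logs/', max_items=10))
```

Filtered collections support the same batch actions, so something like bucket.objects.filter(Prefix='tmp/').delete() empties just one “folder.” Collections also have page_size(), which controls how many items each underlying API request fetches, one of the knobs worth knowing about if you run into the API-call nuances above.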

Hopefully, this helps simplify your life with the AWS API.

Happy automating!

Adam
