EC2: Stateful Statelessness


Hello!

While studying for an AWS certification, I rediscovered a fiddly networking detail: although ICMP’s ping is stateless, EC2 security groups will pass return ping traffic even when only one direction is defined in their rules. I wanted to see this in action, so I built a lab.

If you just asked, “Wat❓”, keep reading. Skip to the next section if you just want the code.

Background

Network hosts using stateful protocols (like TCP) distinguish between packets that are part of an established connection and packets that are new. For example, when I SSH (which runs on TCP) from A to B:

  1. A asks B to start a new connection.
  2. B agrees.
  3. A and B exchange bunches of packets that are part of the connection they agreed to.
  4. A and B agree to close the connection.

There’s a difference between a new packet and a packet that’s part of an ongoing connection. That means the connection, and its packets, have state (e.g. new vs established). Stateful firewalls (unlike stateless ones) are aware of this:

  1. A asks B to start a new connection.
  2. Firewalls in between allow these packets if there is an explicit rule allowing traffic from A to B.
  3. A and B exchange bunches of packets.
  4. Firewalls in between allow the packets from A to B because of the explicit rule above. However, they allow the return traffic from B to A even if there is no explicit rule to allow it. Since B agreed to the connection, the firewall assumes that packets in that connection should be allowed.

In EC2, this is why you only need an outgoing rule on A’s Security Group (SG) and an incoming rule on B’s Security Group to SSH from A to B. EC2 SGs are stateful, and allow the return traffic implicitly.

Ok, here’s the gnarly bit. ICMP (the protocol behind ping) is stateless. Hosts don’t have a negotiation phase where they agree to establish a connection. They just send packets and hope. So, doesn’t that mean I need to write explicit firewall rules in the SGs to allow the return traffic? If the firewall can’t see the state of the connection, it won’t be able to implicitly figure out to allow that traffic, right?

Nope: security groups infer state from timeouts and packet types. ICMP pings are ECHO requests answered by ECHO replies. If the SG has seen a request within the timeout, it makes the educated guess that replies are part of an “established” connection and allows them. This is what I wanted to see in action.
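To make that concrete, here’s a toy model of how a stateful firewall can handle a stateless protocol. This is my own sketch of the idea, not AWS’s actual implementation, and the timeout value is invented:

import time

ECHO_REQUEST, ECHO_REPLY = 8, 0
TIMEOUT_SECONDS = 60  # Invented for illustration; AWS doesn't publish the real timeout.

# (src, dst) -> the last time we allowed an ECHO request for that flow.
recent_requests = {}

def allow_packet(src, dst, icmp_type, rules):
    """Return True if the firewall should pass this packet."""
    if (src, dst, icmp_type) in rules:
        # Explicitly allowed. Remember requests so we can match replies later.
        if icmp_type == ECHO_REQUEST:
            recent_requests[(src, dst)] = time.time()
        return True
    if icmp_type == ECHO_REPLY:
        # No explicit rule, but this might be return traffic: check for a
        # recent request in the opposite direction.
        requested_at = recent_requests.get((dst, src))
        return requested_at is not None and time.time() - requested_at < TIMEOUT_SECONDS
    return False

With a single rule allowing requests from A to B, replies from B to A pass only while the timer is fresh, which is exactly the behavior the lab below demonstrates.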

The Lab

I set up a VPC with two hosts, A (10.0.1.70) and B (10.0.2.35). They’re in different subnets, but the ACLs allow all traffic so they don’t influence the test. Here are the SG rules for A (10.0.0.0/16 covers the entire VPC):

[Screenshots: inbound and outbound Security Group rules for A]

And the rules for B:

[Screenshots: inbound and outbound Security Group rules for B]

A allows outgoing ICMP to B, and B allows incoming ICMP from A. The return traffic is not allowed by any rules.
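If you want to script these rules instead of creating them by hand, a boto3 sketch like this should produce the same setup. The security group IDs are placeholders I made up; everything else mirrors the rules described above:

import boto3

ec2 = boto3.client('ec2')

# A's SG (placeholder ID): allow outbound ICMP to the VPC.
ec2.authorize_security_group_egress(
    GroupId='sg-000000a1',
    IpPermissions=[{
        'IpProtocol': 'icmp',
        'FromPort': -1,  # For ICMP these fields are type/code; -1 means all.
        'ToPort': -1,
        'IpRanges': [{'CidrIp': '10.0.0.0/16'}],
    }],
)

# B's SG (placeholder ID): allow inbound ICMP from the VPC.
ec2.authorize_security_group_ingress(
    GroupId='sg-000000b2',
    IpPermissions=[{
        'IpProtocol': 'icmp',
        'FromPort': -1,
        'ToPort': -1,
        'IpRanges': [{'CidrIp': '10.0.0.0/16'}],
    }],
)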

The Test Script

I didn’t find a way to send just replies without requests in Linux, so I bodged together a Python script:

"""
This is a stripped-down version of ping that allows you to send a reply without responding to a request. This was
needed to test the details of how Amazon EC2 security groups handle state with ICMP traffic. You shouldn't use this
for normal pings.

The ping implementation was based on Samuel Stauffer's python-ping: https://github.com/samuel/python-ping (which only
works with Python 2).

This must be run as root.
You must tell the Linux kernel to ignore ICMP before you run this or it'll eat some of the traffic:
    echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all
"""

import argparse, socket, struct, time

def get_arguments():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--send-request', metavar='IP_ADDRESS', type=str, help='IP address to send ECHO request.')
    parser.add_argument('--receive', action='store_true', help='Wait for a reply.')
    parser.add_argument('--send-reply', metavar='IP_ADDRESS', type=str, help='IP address to send ECHO reply.')
    return parser.parse_args()

def receive(my_socket):
    while True:
        # Raw ICMP sockets hand us the whole IP packet; skip the 20-byte IP header.
        recPacket, addr = my_socket.recvfrom(1024)
        icmpHeader = recPacket[20:28]
        icmp_type, code, checksum, packetID, sequence = struct.unpack("bbHHh", icmpHeader)
        print('Received type {}.'.format(icmp_type))

def ping(my_socket, dest_addr, icmp_type):
    dest_addr = socket.gethostbyname(dest_addr)
    bytesInDouble = struct.calcsize("d")
    # Pad the payload to 192 bytes: a timestamp followed by filler, like classic ping.
    data = struct.pack("d", time.time()) + (192 - bytesInDouble) * b"Q"
    dummy_checksum = 1 & 0xffff  # Nothing in this lab validates the checksum.
    dummy_id = 1 & 0xFFFF
    # Header is type (8), code (8), checksum (16), id (16), sequence (16)
    header = struct.pack("bbHHh", icmp_type, 0, socket.htons(dummy_checksum), dummy_id, 1)
    packet = header + data
    my_socket.sendto(packet, (dest_addr, 1))  # The port number is ignored for raw ICMP.

if __name__ == '__main__':
    args = get_arguments()
    icmp = socket.getprotobyname("icmp")
    my_socket = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
    if args.send_request:
        ping(my_socket, args.send_request, icmp_type=8)  # Type 8 is ECHO request.
    if args.receive:
        receive(my_socket)
    if args.send_reply:
        ping(my_socket, args.send_reply, icmp_type=0)  # Type 0 is ECHO reply.

You can skip reading the code; the important thing is that we can individually choose to listen for packets, send ECHO requests, or send ECHO replies:

python ping.py --help
usage: ping.py [-h] [--send-request IP_ADDRESS] [--receive]
               [--send-reply IP_ADDRESS]

optional arguments:
  -h, --help            show this help message and exit
  --send-request IP_ADDRESS
                        IP address to send ECHO request. (default: None)
  --receive             Wait for a reply. (default: False)
  --send-reply IP_ADDRESS
                        IP address to send ECHO reply. (default: None)

The Experiment

SSH to each host and tell Linux to ignore ICMP traffic so I can use the script to capture it (see docstring in the script above):

sudo su -
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all
exit

Normal Ping

I send a request from A to B and expect the reply from B to A to be allowed. Here’s what happened:

Remember A is 10.0.1.70 and B is 10.0.2.35.

[Screenshots: normal ping test output on A and B]

Ok, nothing surprising. I sent a request from A to B and started listening on A; a little later I sent a reply from B to A and it was allowed. You can do this test with the normal Linux ping command (but not until you tell the kernel to stop ignoring ICMP traffic). This test just validates that my bodged Python actually works.
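If you're reproducing this, the invocations look roughly like the following (the script has to run as root). On A (10.0.1.70), send the request and wait for the reply:

sudo python ping.py --send-request 10.0.2.35 --receive

On B (10.0.2.35), watch the request arrive, then Ctrl-C out of the receive loop (it runs forever) and send the reply:

sudo python ping.py --receive
sudo python ping.py --send-reply 10.0.1.70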

Reply Only

First we wait a bit. The previous test sent a request from A to B, which started a timer in the SG. Until that timer expires, reply traffic will be allowed. We need to wait for that expiration before this next test is valid.

[Screenshots: reply-only test output on A and B]

Boom! I start listening on A without sending a request. On B I send a reply to A, but it never arrives. The Security Group didn’t allow it. This demonstrates that EC2 Security Groups infer the state of ICMP pings by reading their type.
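For reference, the only commands involved in this test are a bare listener on A:

sudo python ping.py --receive

and a bare reply on B:

sudo python ping.py --send-reply 10.0.1.70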

Other Tests

I also tried a couple other things that I’ll leave to you to reproduce in your own lab if you want to see them.

  • Start out like the normal test. Send a request from A to B and start listening on A. Then send several replies from B to A. They’re all allowed. This shows that the SG isn’t counting to ensure it only allows one reply for each request; if it has seen just one request within the timeout it allows replies even if there are multiple.
  • Edit the script above to set the hardcoded ID (dummy_id) so it’s different on A than on B. Then nothing works at all. I’m not actually sure what causes this. It could be that the SG is looking at more than just the type, but it could also be the kernel or the network drivers or somewhere else I haven’t thought of. If you figure it out, message me!

Conclusion

I had free time over the holidays this year! Realistically, understanding this demo isn’t a priority for doing good work on AWS. I just enjoy unwrapping black boxes to see how the parts move.

Be well!

Adam

AWS re:Invent 2017

I’ll be at AWS re:Invent 2017! I’m there Monday through Wednesday, and I’ll be running Tuesday morning to support Girls Who Code and the American Heart Association. Get in touch if you’ll be at re:Invent or in the run so we can find each other!

🏃 Update: I finished the race at an 8:14 min/mile pace, which beats last year’s pace! But, this year’s run was shorter. Still, there was a huge turnout and it was a ton of fun.

See you soon,

Adam

AWS Certification and Networking

Hello!

Recently I’ve been working on the AWS Certification exams, and I’ve found they require much deeper understanding of networking on the platform than I had. For example, ICMP is a stateless protocol, so to ping between two servers do you need ingress and egress rules on both Security Groups? I knew from past experience with iptables that the answer varies by setup, but I didn’t know how it worked in EC2.

For me, gnarly networking is easiest to learn hands-on. Docs get me part of the way but I really need to engineer it myself before I’ll remember it. To prep for certification I ended up building a sandbox environment in my AWS account where I could play around. It took some doing; many AWS patterns come pre-baked with Security Groups, ACLs, etc. that make everything work, but I wanted everything turned off so I could verify what was really needed for different traffic flows. If I delete the egress rule on one side of a connection, does traffic still flow? Hard to validate if there are broad, generic rules in place. Easy to validate if only exactly what’s needed is present.

Since it was tricky, I published the automation for the sandbox I’ve been using. If you want to do your own deep dive of networking in AWS, hopefully this will help you out.

github.com/operatingops/aws_study

[Diagram: sandbox network architecture]

Happy Operating!

Adam

Production-ready Scripts and Python

Production is hard. Even a simple script that looks up queue state and sends it to an API gets complex in prod. Without tests, the divide by zero case you missed will mask queue overloads. Someone won’t see that required argument you didn’t enforce and break everything when they accidentally publish a null value. You’ll forget to timestamp one of your output lines and then when the queue goes down you won’t be able to correlate queue status to network events.

Python can help! Out of the box it can give you essential but often-skipped features like these (there’s a quick sketch after the list):

  • Automated tests for multiple platforms.
  • A --simulate option.
  • Command line sanity like a --help option and enforcement of required arguments.
  • Informative log output.
  • An easy way to build and package.
  • An easy way to install a build without a git clone.
  • A command that you can just run like any other command. No weird shell setup or invocation required.
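Here’s a minimal sketch of what a few of those features look like. It’s my own illustration, not code from the project, and the queue-checking logic is stubbed out:

import argparse
import logging

def get_arguments():
    parser = argparse.ArgumentParser(description='Report queue state to an API.')
    parser.add_argument('--queue-name', required=True, help='Queue to report on.')
    parser.add_argument('--simulate', action='store_true',
                        help='Log what would be sent without calling the API.')
    return parser.parse_args()

def main():
    args = get_arguments()
    # Timestamped log lines let you correlate queue status with other events.
    logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
                        level=logging.INFO)
    logging.info('Checking queue %s.', args.queue_name)
    if args.simulate:
        logging.info('Simulating: nothing will be sent to the API.')
        return
    # ...look up queue state and send it to the API here...

if __name__ == '__main__':
    main()

Because --queue-name is required, argparse fails fast with a helpful message instead of letting someone accidentally publish a null value.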

It can be a little tricky, though, if you haven’t already done it. So I wrote a project that demonstrates it for you. It includes an example of a script that isn’t ready for prod.

Hopefully this will save you from some of the many totally avoidable, horrible problems that bad scripts have caused in my prods.

Thanks for reading!

Adam

Pear: A Better Way to Deploy to AWS

Update 29 September 2018: Since the release of AWS Step Functions, this pattern is out of date. Orchestrating deployment with a slim lambda function is still a solid pattern, but Step Functions could make the implementation simpler and could provide more features (like graphical display of deployment status). You can even see some of the ways this might work in AWS’s DevOps blog. Those implementation changes would drive some re-architecture, too. Because those are major changes, for now Pear is an interesting exploration of a pattern but isn’t ready for use in new deployments.

A while back I wanted to put a voice interface in front of my deployment automation. I think passwords on voice interfaces are annoying and insecure, so I wanted an unauthenticated system. I didn’t want to lie awake at night worrying about all the things people could break with it, so I set out to engineer a deployment infrastructure that I could put behind voice control and still feel was secure and reliable.

After a lot of learning (the journey towards this is where the Life Coach and the Better Alexa Quick Start came from) and several revisions, I had built my new infrastructure. For reasons I can’t remember, I called it Pear.

Pear is designed to make it easy to add slick features like voice interfaces while giving you enough control to stay secure and enough stability to operate in production. It’s meant for infrastructures too complex to fit in tools like Heroku or Elastic Beanstalk, for situations where you need to be able to turn all the knobs. I think it achieves those goals, so I decided to publish it. Check out the repo for an example implementation and more complete documentation of its features, but to give you a taste here’s a diagram of the basic architecture:

[Diagram: Pear’s basic architecture]

Cheers!

Adam

The Service Checklists

One day I get a text from the illimitable Kai Davis. He’s had a Bad Moment.

Adam. I have terrible OpSec.

A former user had deleted a bunch of files. Luckily, he was able to recover.

Teach me how to OpSec.

No worries buddy. I got you.

Kai is a power user, and in today’s Internet that means he subscribes to two dozen hosted services. How do you manage two dozen services and keep any kind of sanity? I do it with checklists (⬅️ read this book).

Before I show them to you, we need to cover one of the Big Important Things from Mr. Gawande’s book. Kai already knows how to manage his services. He just needs to make sure he hasn’t forgotten something important like disabling access for former users.

I wrote Kai two checklists: one to use monthly to make sure nothing gets missed, and one to use when setting up new services to reduce the monthly work. I assume he has a master spreadsheet listing all his services. Kai’s Bad Moment falls under OpSec, but I didn’t limit these lists to that category.

Hopefully, these help you as well.

The Monthly Checklist

  • Can I cancel this service?
  • Should I delete users?
  • Should I change shared passwords?
  • Should I un-share anything?
  • Should I force-disconnect any devices?
  • Is the domain name about to expire?
  • Is the credit card about to expire?
  • Am I paying for more than I use?
  • Should I cancel auto-renewal?
  • Are there any messages from the provider in my account? (new!)
  • Is the last backup bigger than the one before it?

The Setup Checklist

  • Add row to master spreadsheet.
  • Save URL, account ID, username, password, email address, and secret questions in 1Password.
  • Sign up for paperless everything.
  • Enter phone number and mailing address into account profile.
  • Review privacy settings.
  • Enable MFA.
  • Send hardcopy of MFA backup codes offsite.
  • Setup recurring billing.
  • Set alarm to manually check the first auto-bill.
  • Set alarm to revisit billing choices.
  • Set schedule for backups.
  • Check that backups contain the really important data.
  • Create a user for my assistant.
  • Confirm my assistant has logged in.

Some Notes

Monthly

  • Can I cancel this service? I always ask “can I”, not “should I”. There’s always a reason to keep it, but I want a reason to nuke it.
  • Am I paying for more than I use? I look at current usage, not predicted usage. The number is often not actionable, but it’s a good lens.

Setup

  • Save URL, account ID, username, password, email address, and secret questions in 1Password. The URL matters because 1Password uses it to warn you about known vulnerabilities that you can only remediate by changing your password. The email address and username may seem redundant, but having both has saved me a bunch of times. Same with secret questions.
  • Enter phone number and mailing address into account profile. These make recovery and support calls easier.
  • Review privacy settings. Remember, Kai already knows how to manage his services. He knows how to pick good privacy settings. But privacy settings are often hidden and it’s easy to forget them when signing up.
  • Enable MFA. I know it sucks, but the security landscape gets worse every day. Use this for anything expensive or private.
  • Send hardcopy of MFA backup codes offsite. I have watched people spend months on account recovery when their phones die and they lose their Google Auth.
  • Set alarm to manually check the first auto-bill. This saves me all the time. All. The. Time.
  • Set alarm to revisit billing choices. This has saved me thousands of dollars.
  • Set schedule for backups. Even if it’s an alarm to do a manual backup once a month.

Stay safe!

Adam

Python on Lambda: Better Packaging Practices

Update 2018-10-22: This is out of date! Since I wrote this, lambda released support for Python 3 and in the new version I don’t have to do the handler import described below (although I don’t know if that’s because of a difference in Python 3 or because of a change in how lambda imports modules). In a future post I’ll cover Python 3 lambda functions in more detail.

Lambda is an AWS service that runs your code for you, without you managing servers. It’s my new favorite tool, but the official docs encourage a code structure that I think is an anti-pattern in Python. Fortunately, after some fiddling I found what I think is a better way. Originally I presented it to the San Diego Python meetup (if you’re in Southern California you should come to the next one!); this post is a recap.

The Lambda Getting Started guide starts you off with simple code, like this from the Hello World “blueprint” (you can find the blueprints in the AWS web console):

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    print("value1 = " + event['key1'])
    print("value2 = " + event['key2'])
    print("value3 = " + event['key3'])
    return event['key1']  # Echo back the first key value
    #raise Exception('Something went wrong')

This gets uploaded to Lambda as a flat file, then you set a “handler” (the function to run). They tell you the handler has this format:

python-file-name.handler-function

So for the Hello World blueprint we set the handler to this:

lambda_function.lambda_handler

Then Lambda knows to look for the lambda_handler function in the lambda_function file. For something as small as a hello world app this is fine, but it doesn’t scale.

For example, here’s a piece of the Alexa skill blueprint (I’ve taken out most of it to keep this short):

from __future__ import print_function

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    ...human interface stuff

def build_response(session_attributes, speechlet_response):
    ...more human interface stuff

def get_welcome_response():
    ...more human interface stuff

def handle_session_end_request():
    ...session stuff

def create_favorite_color_attributes(favorite_color):
    ...helper function

def set_color_in_session(intent, session):
    ...the real logic of the tool part 1

def get_color_from_session(intent, session):
    ...the real logic of the tool part 2

etc...

This doesn’t belong in one long module, it belongs in a Python package. The session stuff goes in its own module, the human interface stuff in a different module, and probably a bunch of modules or packages for the logic of the skill itself. Something like the pypa sample project:

sampleproject/
├── LICENSE.txt
├── MANIFEST.in
├── README.rst
├── data
│   └── data_file
├── sample
│   ├── __init__.py  <-- main() ('handler') here
│   ├── package_data.dat
├── setup.cfg
├── setup.py
├── tests
│   ├── __init__.py
│   └── test_simple.py
└── tox.ini

The project is super simple (all its main() function does is print "Call your main application code here"), but the code is organized, the setup.py tracks dependencies and excludes tests from packaging, there's a clear place to put docs and data, etc. All the awesome things you get from packages. This is how I write my projects, so I want to just pip install and then set the handler to sample.main and let Lambda find main() because it's part of the package namespace.

It turns out this is possible, and the Deployment Package doc sort of tells you how to do it when they talk about dependencies.

Their example is the requests package. They tell you to create a virtualenv, pip install requests, then zip the contents of the site-packages directory along with your module (that’s the directory inside the virtualenv where installed packages end up). Then you can import requests in your code. If you do the same thing with your package, Lambda can run the handler function and you don’t have to upload a module outside the package.

I’m going to use the pypa sample project as an example because it follows a common Python package structure, but we need a small tweak because Lambda calls the handler with args and the main() function of the sample project doesn’t take any.

Change this:

def main():

To this:

def main(*args):

Then do what you usually do with packages (here I’m going to create and install a wheel because I think that’s the current best practice, but there are other ways):

  1. Make a wheel.
    python setup.py -q bdist_wheel --universal
    
  2. Create and activate a virtualenv.
  3. Install the wheel.
    pip install sample-1.2.0-py2.py3-none-any.whl
    
  4. Zip up the contents of this directory, so the packages sit at the root of the zip:
    cd $VIRTUAL_ENV/lib/python2.7/site-packages && zip -r ~/sample.zip .
  5. Upload the zip.
  6. Set the handler to sample.main.

Then it works!

START RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96 Version: $LATEST
Call your main application code here
END RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96
REPORT RequestId: bba5ce1b-9b16-11e6-9773-7b181414ea96	Duration: 0.37 ms...

… except when it doesn’t.

The pypa sample package defines its main() in __init__.py, so the handler path is sample.main.

sampleproject/
├── sample
│   ├── __init__.py  <-- main() here

But what if your entry function is in some other module?

sampleproject/
├── sample
│   ├── __init__.py
│   ├── cool.py  <-- main() here

We can just set the handler to sample.cool.main, right? Nope! Doesn't work.

{
    "errorMessage": "Unable to import module 'sample.cool'"
}

I'm working on figuring out why this happens, but the workaround is to import the function I need in __init__.py so my handler path only needs one dot. That's annoying but not too bad; lots of folks do that to simplify their package namespace anyway, so most Python readers know to look for it. I met a chap at the SD Python group who has some great ideas on why this might be happening; if we figure it out I'll post the details.
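Concretely, the workaround is a one-line import (using the cool.py layout above):

# sample/__init__.py
from sample.cool import main  # Now the handler can be sample.main again.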

To sum up:

  • You don’t have to cram all your code into one big file.
  • If you pip install your package, all you need to upload is the site-packages directory.
  • Remember to import your handler in the root __init__.py if you get import errors.

Thanks for reading!

Adam