Securing IAM Policies

Since the beginning, writing IAM policies with the minimum necessary permissions has been hard. Some services don’t have resource-level permissions (you have to grant to *), but then later they do. When a service has resource-level permissions, it may only be for some of its permissions (the rest still need *). Some services have their own Condition Operators (separate from the global ones) that may or may not help you tighten control. Et cetera. The details are documented differently for each service and it’s a lot of hunting and testing to try to put together a tight policy.

Amazon made it easier! There’s new magic in the IAM UI to help you create policies. It has some limitations, but it’s a big improvement. Here are some of the things it can do that I used to have to do myself:

  • Knows which S3 permissions require the resource list to include a bucket name and which require the bucket name and an object path.
  • Tries to group permissions and resources into statements when it results in equivalent access (but sometimes ends up granting extra access, see below).
  • Knows when a service doesn’t support resource-level permissions.
  • Knows about the Condition Operators specific to each service (not just the global ones).
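
To make the first point concrete, here is a minimal Python sketch of a read-only S3 policy split into a bucket-level statement and an object-level statement. The helper name `build_read_policy` and the bucket name `example-bucket` are made up for illustration, and this is not the visual editor's actual output.

```python
import json


def build_read_policy(bucket_name):
    """Build a read-only S3 policy split by resource type.

    s3:ListBucket applies to the bucket itself, while s3:GetObject
    applies to objects inside it, so each needs its own Resource ARN.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: the resource is the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level action: the resource is an object path.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name, not a real bucket.
    print(json.dumps(build_read_policy("example-bucket"), indent=2))
```

Putting both actions in one statement with only the bucket ARN would leave `s3:GetObject` matching nothing, which is the kind of mistake I used to have to catch by hand.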

There are some limitations:

  • Doesn’t deduplicate. If you add permissions it doesn’t go back and put them into existing statements, it just adds new statements that may duplicate parts of old ones.
  • Only generates JSON, so if you’re writing a YAML CloudFormation template you should translate.
  • Seems to have limited form validation on Condition Operators. You can put in strings that will never match because the API calls for that service can’t contain what you entered (making the statement a no-op).
  • Can end up grouping permissions in a way that makes some resource restrictions meaningless and grants more access than might be expected.
  • Sometimes it messes up the syntax. Seems to happen if you don’t put exactly what it expects into the forms.
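
The duplicate statements can be cleaned up after the fact. Below is a rough Python sketch of one way to do it: it merges statements that are identical apart from their Action list. The function `merge_statements` is hypothetical (it is not part of any AWS tool), and it assumes each statement uses only `Effect`, `Action`, `Resource`, and optionally `Condition`. Statements with `NotAction`, `NotResource`, `Principal`, or `Sid` would need extra handling.

```python
import json


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def merge_statements(statements):
    """Merge statements that share Effect, Resource, and Condition.

    Only statements that are identical apart from their Action list are
    combined, so the merged list grants the same access as the input.
    """
    merged = {}
    for stmt in statements:
        key = (
            stmt["Effect"],
            tuple(sorted(_as_list(stmt["Resource"]))),
            # Serialize the condition so it can be used as a dict key.
            json.dumps(stmt.get("Condition", {}), sort_keys=True),
        )
        merged.setdefault(key, set()).update(_as_list(stmt["Action"]))

    result = []
    for (effect, resources, condition), actions in merged.items():
        stmt = {
            "Effect": effect,
            "Action": sorted(actions),
            "Resource": list(resources),
        }
        if condition != "{}":
            stmt["Condition"] = json.loads(condition)
        result.append(stmt)
    return result
```

It deliberately never merges across different resources or conditions, since that is exactly the kind of grouping that can grant more access than expected.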

 

So there are a few problems, but this is still way better than it was before! My plan is to use the visual editor to write policies, then go through and touch it up afterward. Based on what I’ve seen so far, this cuts the time it takes me to develop policies by about 30%.

Happy securing,

Adam

The Fallacy of Rest

Hello!

A while back I made a bad scheduling mistake. I knew about the anti-pattern that caused it, but didn’t see myself using it. It forced me to push out dates, which cost me some money.

Later I looked back to see what went wrong. It was exactly what I have advised others not to do. It’s easy to miss! I’m writing this article to re-expose the anti-pattern I used.

The project was Move to a New City. I would be taking my job with me. This is the schedule I wrote:

  • Week 1
    • Pack
    • Work
  • Week 2
    • Weekdays
      • Pack
      • Work
      • Clean
    • Weekend
      • Clean
      • Say goodbye to friends
  • Week 3
    • Monday (Vacation Day)
      • Exercise and rest
      • Say goodbye to friends
    • Tuesday (Vacation Day)
      • Return keys
      • Drive to new city (5 hours on the road)
      • Check in to AirBnB
      • Hang out with friend who lives in new city
    • Wednesday through Friday
      • Work
      • Look at new housing

Seems fine! I even budgeted time to exercise.

Tuesday of week 3. 100% on schedule. It’s bedtime and I’m watching an episode of The Dick Van Dyke Show on my laptop and laughing myself to sleep with Mary Tyler Moore’s performance. I feel awesome. I sleep like I’ve just run a marathon.

Wednesday. Mild headache (whatever – I’m an engineer, we get headaches). I catch up on work, message about a couple rentals, and attend the morning meetings. As the meetings are wrapping up I get a reply on a rental with a proposed time to view it. I can just barely make it, so I head out.

See the mistake yet? I still hadn’t. Wednesday was a busy day and I felt rushed, but I’ve had lots of busy days. I just kept going. I didn’t make the mistake on Wednesday.

That afternoon I got one more email about a rental. It was a wafer-thin mint (see Monty Python’s The Meaning of Life ⬅️ this is how I am making the post about Python). Suddenly getting through the rest of my inbox felt like climbing a mountain. I was burnt out.

The mistake happened when I first wrote the schedule. Here’s the fallacy I used:

People are like horses. Rest them two hours a day and one full day every week or so and they’re fine. Feed and water three times a day.

People are not like horses. They can’t sustain themselves on periodic rest intervals.

Here’s how people work:

Productive workers have a budget of hours per week. When those hours are spent they spend themselves to keep going. Once too much of themselves is gone, they stop producing.

I wrote a schedule in the mindset of making sure I had rest intervals, but I should have figured out the hours needed and divided that by my sustainable weekly hours (a number I’ve learned during two decades of working). That would be the total weeks really needed to complete the move.

Going back over the hours I spent I found I had scheduled 200% of my sustainable capacity and had expected to sustain that for most of a month. (╯°□°)╯︵ ┻━┻

Another way to look at my mistake is that I didn’t count saying goodbye to friends as work (just like I sometimes forget to count attending meetings as work). In the context of human capacity, leaving behind your friends is absolutely work (just like sitting in a frustrating meeting is). It drains your budget of hours. If you do too much of it, you exhaust.

To write a schedule that workers can reliably complete, budget based on what workers can do per week and make sure you get that amount from their real history of work. Don’t make it up, look back at the past and compute it.
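
Here is a small Python sketch of that arithmetic. Every number in it is made up for illustration, the function names are hypothetical, and treating the average of past weeks as the sustainable figure is an assumption; use whatever measure your real history supports.

```python
import math
from statistics import mean


def sustainable_weekly_hours(past_weeks):
    """Average the hours actually worked in past weeks (real history)."""
    return mean(past_weeks)


def weeks_needed(total_hours, weekly_capacity):
    """Divide the work by sustainable capacity, rounding up to whole weeks."""
    return math.ceil(total_hours / weekly_capacity)


def load_percent(total_hours, scheduled_weeks, weekly_capacity):
    """How much of sustainable capacity a schedule asks for, as a percent."""
    return 100 * total_hours / (scheduled_weeks * weekly_capacity)


# Illustrative numbers only, not my real data.
history = [42, 38, 40, 40]  # hours worked in four past weeks
capacity = sustainable_weekly_hours(history)
move_hours = 240  # every task counted as work, goodbyes included

print(weeks_needed(move_hours, capacity))     # weeks the project really needs
print(load_percent(move_hours, 3, capacity))  # load if squeezed into 3 weeks
```

The important input is `move_hours`: it only works if everything that drains the budget is counted, not just the tasks that look like work.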

I’m going to bed. Happy scheduling!

Adam

A Book from 2017: Stretch Goals and Prescriptions

Happy New Year!

Today’s post is a little outside my usual DevOps geekery, but it’s influenced my work and my career choices this year so I wanted to share it.

For the record, I have zero connections to 3M.

In my teens, I noticed that whenever I bought something with the 3M logo it was noticeably better than the other brands. I didn’t know what 3M was, but this pattern kept repeating and I started to always choose them. Years later, deep inside a career in technology, I was still choosing 3M. I started to ask myself how they did it. Why were all their products better than everyone else’s?

I didn’t know anyone at 3M, so I found a book. The 3M Way to Innovation: Balancing People and Profit.

Balance? At work? And still better than everyone else? Bring it on.

The book approaches 3M through their innovations. They built hugely successful product lines in everything from sandpaper to projectors, and it turns out other companies have long looked to them as the top standard for the innovation that drives such diverse success. As I worked through the book, one thing really stuck with me: 3M’s definition of Stretch Goals.

I’ve seen a lot of managers ask their teams what can be accomplished in the next unit of time (sprint, quarter, etc.). Often, the team replies with a list that’s shorter than the manager would like. The manager then over-assigns the team by adding items as “stretch goals”. If the team works hard enough and accomplishes enough, they’ll have time to stretch themselves to meet these goals. The outcome I usually see is pressure for teams to work longer hours (with no extra pay) so they can deliver more product (at no extra cost to the company).

This book described 3M’s stretch goals very differently, which I’ll summarize in my own words because it’s characterized throughout the book and there’s no single quote that I think captures it. 3M sets these goals to stretch an aspect of the business that’s needed for it to remain a top competitor, and they’re deliberately ambitious. For example, one that 3M actually used: 30% of annual sales should come from products introduced in the last four years. Goals like these drive innovation because they’re too big to meet with the company’s current practices.

The key difference is that 3M isn’t trying to stretch the capacity of individuals. They’re not trying to increase Scrum points by pushing everyone to work late. They’re setting targets for the company that are impossible to meet unless the teams find new ways to work. They’re driving change by looking for things that can only be done with new approaches; things that can’t be done just by working longer hours. And after they set these goals, they send deeply committed managers out into the trenches to help their teams find and implement these changes. Most of the book is about what happens in those trenches. I highly recommend it.

There’s one other thing from the book I want to highlight: the process of innovation doesn’t simplify into management practices you can choose off a menu. There’s more magic to it than that. It takes skilled leaders and a delicate combination of freedom and pressure to build a company where the best engineers can do their best work, and trying to reduce that to a prescription doesn’t work. Here’s a quote from Dick Lidstad, one of the 3M leaders interviewed for the book, talking about staff from other companies who come to 3M looking to learn some of the innovation practices so they can implement them in their own teams:

They want to take away one or two things that will help them to innovate. … We say that maintaining a climate in which innovation flourishes may be the single biggest factor overall. As the conversation winds down, it becomes clear that what they want is something that is easily transferable. They want specific practices or policies, and get frustrated because they’d like to go away with a clear prescription.

I heard truth in that quote. Despite being a believer in the value of tools like Scrum, which are supposed to foster creativity and innovation, I’ve spent a lot of my career held back by the overhead of process that’s good in principle but applied with too little care to be effective. Ever spent an entire day in Scrum ceremonies? There’s more value in the experience of 3M’s teams overall than there is in any list of process.

This book was written in 2000, but 3M stock has continued to perform well, and I found many parallels between the stories this author tells and my own experience in the modern tech world. It’s heavy with references and first-hand interviews, and I think it’s a valuable read for anyone in tech today.

If you read it, let me know what you think!

Adam

Production-ready Scripts and Python

Production is hard. Even a simple script that looks up queue state and sends it to an API gets complex in prod. Without tests, the divide by zero case you missed will mask queue overloads. Someone won’t see that required argument you didn’t enforce and will break everything when they accidentally publish a null value. You’ll forget to timestamp one of your output lines and then when the queue goes down you won’t be able to correlate queue status to network events.

Python can help! Out of the box it can give you essential but often-skipped features, like these:

  • Automated tests for multiple platforms.
  • A --simulate option.
  • Command line sanity like a --help option and enforcement of required arguments.
  • Informative log output.
  • An easy way to build and package.
  • An easy way to install a build without a git clone.
  • A command that you can just run like any other command. No weird shell setup or invocation required.
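
As a taste of a few of those features, here is a minimal standard-library sketch with a `--help` option, an enforced required argument, a `--simulate` flag, timestamped log lines, and an explicit divide-by-zero check. The names `queue_report`, `--queue`, and `utilization`, plus the placeholder queue numbers, are invented for this example and are not taken from the demo project.

```python
import argparse
import logging

log = logging.getLogger("queue_report")


def parse_args(argv=None):
    """Parse the command line; argparse adds --help automatically."""
    parser = argparse.ArgumentParser(description="Report queue state to an API.")
    # required=True makes argparse exit with an error if --queue is missing.
    parser.add_argument("--queue", required=True, help="Name of the queue to check.")
    parser.add_argument(
        "--simulate",
        action="store_true",
        help="Log what would be sent without sending it.",
    )
    return parser.parse_args(argv)


def utilization(depth, capacity):
    """Return queue utilization, handling the zero-capacity case explicitly."""
    if capacity == 0:
        raise ValueError("queue capacity is zero; cannot compute utilization")
    return depth / capacity


def main(argv=None):
    args = parse_args(argv)
    # %(asctime)s puts a timestamp on every output line.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    # Placeholder values; a real script would look these up.
    depth, capacity = 5, 10
    value = utilization(depth, capacity)
    if args.simulate:
        log.info("SIMULATE: would publish %s utilization=%.2f", args.queue, value)
        return 0
    log.info("Publishing %s utilization=%.2f", args.queue, value)
    # The actual API call is omitted from this sketch.
    return 0
```

Calling `main(["--queue", "jobs", "--simulate"])` exercises the simulate path, and because `main` takes its arguments as a list it is easy to call from tests. Packaging, installation, and multi-platform test runs are outside what this sketch covers.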

It can be a little tricky, though, if you haven’t already done it. So I wrote a project that demonstrates it for you. It includes an example of a script that isn’t ready for prod.

Hopefully this will save you from some of the many totally avoidable, horrible problems that bad scripts have caused in my prods.

Thanks for reading!

Adam

Pear: A Better Way to Deploy to AWS

A while back I wanted to put a voice interface in front of my deployment automation. I think passwords on voice interfaces are annoying and aren’t secure, so I wanted an unauthenticated system. I didn’t want to lie awake at night worried about all the things people could break with it, so I set out to engineer a deployment infrastructure that I could put behind voice control and still feel was secure and reliable.

After a lot of learning (the journey towards this is where the Life Coach and the Better Alexa Quick Start came from) and several revisions, I had built my new infrastructure. For reasons I can’t remember, I called it Pear.

Pear is designed to make it easy to add slick features like voice interfaces while giving you enough control to stay secure and enough stability to operate in production. It’s meant for infrastructures too complex to fit in tools like Heroku or Elastic Beanstalk, for situations where you need to be able to turn all the knobs. I think it achieves those goals, so I decided to publish it. Check out the repo for an example implementation and more complete documentation of its features, but to give you a taste here’s a diagram of the basic architecture:

[Diagram: Pear architecture]

Cheers!

Adam

The Service Checklists

One day I get a text from the illimitable Kai Davis. He’s had a Bad Moment.

Adam. I have terrible OpSec.

A former user had deleted a bunch of files. Luckily, he was able to recover.

Teach me how to OpSec.

No worries buddy. I got you.

Kai is a power user, and in today’s Internet that means he subscribes to two dozen hosted services. How do you manage two dozen services and keep any kind of sanity? I do it with checklists (⬅️ read this book).

Before I show them to you, we need to cover one of the Big Important Things from Mr. Gawande’s book. Kai already knows how to manage his services. He just needs to make sure he hasn’t forgotten something important like disabling access for former users.

I wrote Kai two checklists. One to use monthly to make sure nothing gets missed and one to use when setting up new services to reduce the monthly work. I assume he has a master spreadsheet listing all his services. Kai’s Bad Moment categorizes as OpSec, but I didn’t limit these lists to that category.

Hopefully, these help you as well.

The Monthly Checklist

  • Can I cancel this service?
  • Should I delete users?
  • Should I change shared passwords?
  • Should I un-share anything?
  • Should I force-disconnect any devices?
  • Is the domain name about to expire?
  • Is the credit card about to expire?
  • Am I paying for more than I use?
  • Should I cancel auto-renewal?
  • Are there any messages from the provider in my account? (new!)
  • Is the last backup bigger than the one before it?

The Setup Checklist

  • Add row to master spreadsheet.
  • Save URL, account ID, username, password, email address, and secret questions in 1Password.
  • Sign up for paperless everything.
  • Enter phone number and mailing address into account profile.
  • Review privacy settings.
  • Enable MFA.
  • Send hardcopy of MFA backup codes offsite.
  • Set up recurring billing.
  • Set alarm to manually check the first auto-bill.
  • Set alarm to revisit billing choices.
  • Set schedule for backups.
  • Check that backups contain the really important data.
  • Create a user for my assistant.
  • Confirm my assistant has logged in.

Some Notes

Monthly

  • Can I cancel this service? I always ask “can I”, not “should I”. There’s always a reason to keep it, but I want a reason to nuke it.
  • Am I paying for more than I use? I look at current usage, not predicted usage. The number is often not actionable, but it’s a good lens.

Setup

  • Save URL, account ID, username, password, email address, and secret questions in 1Password. The URL matters because 1Password uses it to warn you about known vulnerabilities that require a password change. The email address and username may seem redundant, but having both has saved me a bunch of times. Same with secret questions.
  • Enter phone number and mailing address into account profile. These make recovery and support calls easier.
  • Review privacy settings. Remember, Kai already knows how to manage his services. He knows how to pick good privacy settings. But privacy settings are often hidden and it’s easy to forget them when signing up.
  • Enable MFA. I know it sucks, but the security landscape gets worse every day. Use this for anything expensive or private.
  • Send hardcopy of MFA backup codes offsite. I have watched people spend months on account recovery when their phones die and they lose their Google Auth.
  • Set alarm to manually check the first auto-bill. This saves me all the time. All. The. Time.
  • Set alarm to revisit billing choices. This has saved me thousands of dollars.
  • Set schedule for backups. Even if it’s an alarm to do a manual backup once a month.

Stay safe!

Adam