The New Project Manager’s Glossary: Cloud and DevOps

I often meet Project Managers who are new to the cloud or DevOps, or sometimes new to software altogether. There’s plenty of jargon in these spaces, and definitions are often hard to find. Quite a few folks have asked me to help define the jargon, so I decided to write it up.

This is an opinionated list. It’s also a simplification. It summarizes what I’ve personally learned in my years in these spaces. There are other definitions, but these should get you close enough to work within the context of conversations.

This list starts with the boring terms that you’re most likely to already know and builds them up into the more esoteric ones. Sort of. There’s a lot of interconnection. If you see a term you don’t know, try looking farther down in the list.

To make the examples easier to follow, imagine you work for The Golf-Stuff Company. You make a golfing website where golf enthusiasts can buy golf stuff. The product is the golfing website. The customers are the golfers. My definitions are written around this example case.

Code: A synonym of software and of program. You “write” or “develop” code/software/programs. Code is the informal term, software is the formal one. Program is an older word that nobody says anymore.

Development: The process of writing code. A synonym of coding and of programming. Coding is the informal term, developing is the more formal one. Programming is still in use, but it’s less common. “Coders developing software” means the same thing as “software engineers writing code”, which means the same thing as “coders coding”. Technically those all mean the same as “programming programs”, but nobody would say it that way.

Application Development: The same as development, but specifically the development of the golfing website. This distinction matters because DevOps engineers are also developers who write software, but their software never gets used by customers.

End User: The customers who actually use the final product. Golfers who buy golf stuff from your golfing website. They’re the people at the “end” of the whole system of technology that makes the website work.

Server: A computer that runs the golfing website. Similar to a laptop running Netflix. Fundamentally, servers are the same type of thing as laptops; they’re just used for different purposes.

Compute Resource: A server, but in the cloud. This is one of the biggest simplifications in this list, but it’s good enough to get the context of most conversations. Engineers mostly say “server” even when they technically mean “compute resource”. See “serverless” below.

Infrastructure: A bunch of servers all hooked together. Infrastructure includes all the connecting bits (like the networks that they use to communicate). Individual servers aren’t good for much without the infrastructure they live in. Modern golfing websites run on complex infrastructures, not on individual servers. Infrastructure comes in endless varieties.

Serverless: A type of compute resource that doesn’t require you to manage your own servers. Technically the better term is “serverless platform”, but a lot of people just say “serverless”. Not managing servers reduces the amount of deployment automation that DevOps engineers have to write. Today, not all products are compatible with serverless. Serverless platforms are services sold as part of clouds, and each one is different. If your application works on one serverless platform, it may not work on another. It’s common to say “going serverless” when you mean “assigning our application developers to make our product compatible with Amazon’s Lambda serverless platform (because we’re tired of managing servers)”.

Containers: Containers allow engineers to create mini-servers for their products that can be easily started and stopped on whatever infrastructure needs them. This simplifies deploying the same product to different infrastructures (e.g. you might sell it as a product that multiple customers would each want to run in their own infrastructure). It can also simplify adding and removing capacity because it’s easy to add and remove more copies of the same container.

I’m going to pause the list here and note that servers, compute resources, serverless platforms, and containers are all interconnected concepts that can combine and overlap in endless varieties. A lot of the work done by DevOps engineers today is around deciding which patterns of these to use.

Deployment: The golfing website runs on infrastructure. To run, it has to be deployed. Code has to be copied over, configuration entered, commands run. Similar to how you have to install the Netflix app on a laptop before you can stream video. Together, the outcome of these actions is the deployment.

Deployment Automation: Software that deploys other software to infrastructure. It’s cheaper and more reliable to build a tool to deploy your product than to let an error-prone human do it by hand. Today, most golfing websites have two major components: the actual product code and the deployment automation code that manages its infrastructure.
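
For the curious, here is a toy sketch of what deployment automation can look like. It is not how any particular team does it. The folder names, the `site.py` file, and the `deploy` function are all made up for illustration, and a temporary folder on one machine stands in for real infrastructure. It performs the three actions from the Deployment entry above: copy the code over, enter the configuration, run a command.

```python
import json
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def deploy(build_dir: Path, server_dir: Path, config: dict) -> str:
    """Toy deployment: copy code, write configuration, run a start command."""
    app_dir = server_dir / "app"
    # 1. Copy the code over.
    shutil.copytree(build_dir, app_dir)
    # 2. Enter the configuration.
    (app_dir / "config.json").write_text(json.dumps(config))
    # 3. Run a command (here: start the hypothetical site script).
    result = subprocess.run(
        [sys.executable, str(app_dir / "site.py")],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Everything below is a stand-in: a temp folder plays the role of the server.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()
    (build / "site.py").write_text('print("golf site is up")')

    server = Path(tmp) / "server"
    server.mkdir()

    output = deploy(build, server, {"port": 8080})
    deployed_files = sorted(p.name for p in (server / "app").iterdir())
```

The point is only that each manual step becomes a line of code that runs the same way every time, which is where the reliability comes from.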

Deployment Pipeline: Tooling built around deployment automation that delivers the golfing website to infrastructure. Like any software, deployment automation has to actually run somewhere (e.g. on compute resources). The deployment pipeline is that somewhere. You might ask, “what runs the deployment pipeline?” A fair question with no easy answer. This is a chicken-and-egg situation and the implementations vary a lot. Typically the pipeline and the deployment automation are part of the same code, but that’s not something that matters much outside of an engineer’s world.

Build Pipeline: This is beyond the scope of a cloud/DevOps list, but it’s worth distinguishing from deployment pipelines. Build pipelines are the tools that deliver the golfing website code to deployment automation. They’ll do things like run tests to see if there are bugs, do some formatting to make it easier to deploy, etc.

Build: A packaged version of the golfing website that’s ready to deploy. Typically this is the output of a build pipeline. It’s possible to deploy software that hasn’t been “built”, but that’s generally considered a bad practice. The details here vary a lot, but it’s usually good enough to know that a build is the outcome of application development and is also the thing that is deployed to infrastructure.

Release: A version of the golfing website. There is usually a “build” of a “release”. The distinction rarely matters in non-technical conversations. This can also be a verb: “we’re going to release the latest version of the golfing website on Thursday”.

The Cloud: A misnomer. There isn’t a cloud; there are many clouds. Clouds are products owned by corporations. Clouds provide infrastructure where you can run golfing websites. Each cloud is different, and if you build a product on one it won’t (easily) work on another. Typically clouds allow you to increase and decrease what you use (and pay for) day to day. Historically, you’d have to buy enough servers to handle your busiest day even if that meant a bunch of them sat idle on your least busy day. Clouds have grown far beyond just that one benefit and now provide all kinds of ancillary services, but at the core their value is on-demand pricing. You pay for what you’re using right now, not what you might need to use tomorrow.
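
A rough worked example of that pricing difference, with made-up numbers: the daily server counts and the price per server-day below are hypothetical. The sketch also assumes a server costs the same per day either way, which real pricing usually doesn't guarantee.

```python
def peak_provisioned_cost(daily_servers, price_per_server_day):
    """Old model: pay for enough servers to cover the busiest day, every day."""
    return max(daily_servers) * len(daily_servers) * price_per_server_day


def on_demand_cost(daily_servers, price_per_server_day):
    """Cloud model: pay only for the servers actually used each day."""
    return sum(daily_servers) * price_per_server_day


# Hypothetical week for the golfing website: quiet weekdays, a Saturday rush.
servers_needed = [2, 2, 3, 2, 4, 10, 5]
price = 10  # hypothetical cost of one server for one day

peak_cost = peak_provisioned_cost(servers_needed, price)  # 10 servers * 7 days * 10 = 700
demand_cost = on_demand_cost(servers_needed, price)       # 28 server-days * 10 = 280
```

In this made-up week the spikier the traffic, the bigger the gap: most of the 700 pays for servers that sit idle six days out of seven.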

AWS: Amazon Web Services. A cloud. Owned by Amazon. Distinct from amazon.com. Amazon.com is an e-commerce product that is deployed to AWS. If someone says they’re going to “the cloud”, they likely mean AWS. At time of writing, AWS had the largest market share of all the clouds.

Azure: A cloud. Owned by Microsoft.

Google Cloud: A cloud. Owned by Google. Distinct from the Google search engine.

Application Developer: An engineer who writes the golfing website code.

System Administrator: Also called a sysadmin. An engineer who manually deploys the golfing website to infrastructure. These roles have been mostly replaced by DevOps.

Operator: A technician who monitors running infrastructure and responds if there are problems (so if golfers report that they can’t get to the golfing site, an operator will be the first person to do something about it). In environments without automation, operators are also typically responsible for deploying code to infrastructure. Increasingly these roles are being replaced by automation developed by DevOps Engineers.

DevOps Engineer: An engineer who writes deployment automation. So if you want your golfing website deployed to the AWS cloud, you’d need a DevOps engineer to write automation to do that. DevOps roles often include other responsibilities, but this is the core.

SRE: Site Reliability Engineer. Usually this is the same role as DevOps engineer, just under a different name. ⬅️ This definition will start fights with a lot of people. I recommend never saying this. It’s enough to know that SREs typically have very similar jobs to DevOps engineers.

I hope this helped! Happy project managing,

Adam

The Fallacy of Rest

Hello!

A while back I made a bad scheduling mistake. I knew about the anti-pattern that caused it, but didn’t see myself using it. It forced me to push out dates, which cost me some money.

Later I looked back to see what went wrong. It was exactly what I have advised others not to do. It’s easy to miss! I’m writing this article to re-expose the anti-pattern I used.

The project was Move to a New City. I would be taking my job with me. This is the schedule I wrote:

  • Week 1
    • Pack
    • Work
  • Week 2
    • Weekdays
      • Pack
      • Work
      • Clean
    • Weekend
      • Clean
      • Say goodbye to friends
  • Week 3
    • Monday (Vacation Day)
      • Exercise and rest
      • Say goodbye to friends
    • Tuesday (Vacation Day)
      • Return keys
      • Drive to new city (5 hours on the road)
      • Check in to AirBnB
      • Hang out with friend who lives in new city
    • Wednesday through Friday
      • Work
      • Look at new housing

Seems fine! I even budgeted time to exercise.

Tuesday of week 3. 100% on schedule. It’s bedtime and I’m watching an episode of The Dick Van Dyke Show on my laptop and laughing myself to sleep with Mary Tyler Moore’s performance. I feel awesome. I sleep like I’ve just run a marathon.

Wednesday. Mild headache (whatever – I’m an engineer, we get headaches). I catch up on work, message about a couple rentals, and attend the morning meetings. As the meetings are wrapping up I get a reply on a rental with a proposed time to view it. I can just barely make it, so I head out.

See the mistake yet? I still hadn’t. Wednesday was a busy day and I felt rushed, but I’ve had lots of busy days. I just kept going. I didn’t make the mistake on Wednesday.

That afternoon I got one more email about a rental. It was a wafer-thin mint (see Monty Python’s The Meaning of Life ⬅️ this is how I am making the post about Python). Suddenly getting through the rest of my inbox felt like climbing a mountain. I was burnt out.

The mistake happened when I first wrote the schedule. Here’s the fallacy I used:

People are like horses. Rest them two hours a day and one full day every week or so and they’re fine. Feed and water three times a day.

People are not like horses. They can’t sustain themselves on periodic rest intervals.

Here’s how people work:

Productive workers have a budget of hours per week. When those hours are spent, they spend themselves to keep going. Once too much of themselves is gone, they stop producing.

I wrote a schedule in the mindset of making sure I had rest intervals, but I should have figured out the hours needed and divided that by my sustainable weekly hours (a number I’ve learned during two decades of working). That would be the total weeks really needed to complete the move.

Going back over the hours I spent, I found I had scheduled 200% of my sustainable capacity and had expected to sustain that for most of a month. (╯°□°)╯︵ ┻━┻

Another way to look at my mistake is that I didn’t count saying goodbye to friends as work (just like I sometimes forget to count attending meetings as work). In the context of human capacity, leaving behind your friends is absolutely work (just like sitting in a frustrating meeting is). It drains your budget of hours. If you do too much of it, you exhaust yourself.

To write a schedule that workers can reliably complete, budget based on what workers can do per week and make sure you get that amount from their real history of work. Don’t make it up; look back at the past and compute it.
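
Here’s a minimal sketch of that arithmetic. Every number in it is hypothetical (these are not the real hours from the move), and using the average of past weeks as the “sustainable” figure is an assumption; a more cautious planner might use something lower.

```python
import math
from statistics import mean


def sustainable_weekly_hours(past_weeks):
    """Average hours actually worked per week (assumed proxy for 'sustainable')."""
    return mean(past_weeks)


def weeks_needed(task_hours, weekly_budget):
    """Total hours of work divided by the weekly budget, rounded up to whole weeks."""
    return math.ceil(sum(task_hours.values()) / weekly_budget)


def load_percent(task_hours, weekly_budget, planned_weeks):
    """How much of sustainable capacity a plan of `planned_weeks` actually demands."""
    return 100 * sum(task_hours.values()) / (weekly_budget * planned_weeks)


# Hypothetical history and task estimates. Goodbyes count as work.
history = [44, 46, 45, 45]
tasks = {"work": 120, "pack": 60, "clean": 40, "goodbyes": 30, "drive and housing": 20}

budget = sustainable_weekly_hours(history)          # 45 hours per week
weeks = weeks_needed(tasks, budget)                 # 270 / 45 = 6 weeks
load = load_percent(tasks, budget, planned_weeks=3) # 270 / 135 = 200%
```

With these made-up numbers, a three-week plan demands 200% of capacity, and the honest answer is six weeks.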

I’m going to bed. Happy scheduling!

Adam

A Book from 2017: Stretch Goals and Prescriptions

Happy New Year!

Today’s post is a little outside my usual DevOps geekery, but it’s been an influence on my work and my career choices this year so I wanted to share it.

For the record, I have zero connections to 3M.

In my teens, I noticed that whenever I bought something with the 3M logo it was noticeably better than the other brands. I didn’t know what 3M was, but this pattern kept repeating and I started to always choose them. Years later, deep inside a career in technology, I was still choosing 3M. I started to ask myself how they did it. Why were all their products better than everyone else’s?

I didn’t know anyone at 3M, so I found a book. The 3M Way to Innovation: Balancing People and Profit.

Balance? At work? And still better than everyone else? Bring it on.

The book approaches 3M through their innovations. They built hugely successful product lines in everything from sandpaper to projectors, and it turns out other companies have long looked to them as the top standard for the innovation that drives such diverse success. As I worked through the book, one thing really stuck with me: 3M’s definition of Stretch Goals.

I’ve seen a lot of managers ask their teams what can be accomplished in the next unit of time (sprint, quarter, etc.). Often, the team replies with a list that’s shorter than the manager would like. The manager then over-assigns the team by adding items as “stretch goals”. If the team works hard enough and accomplishes enough, they’ll have time to stretch themselves to meet these goals. The outcome I usually see is pressure for teams to work longer hours (with no extra pay) so they can deliver more product (at no extra cost to the company).

This book described 3M’s stretch goals very differently, which I’ll summarize in my own words because it’s characterized throughout the book and there’s no single quote that I think captures it. 3M sets these goals to stretch an aspect of the business that’s needed for it to remain a top competitor, and they’re deliberately ambitious. For example, one that 3M actually used: 30% of annual sales should come from products introduced in the last four years. Goals like these drive innovation because they’re too big to meet with the company’s current practices.

The key difference is that 3M isn’t trying to stretch the capacity of individuals. They’re not trying to increase Scrum points by pushing everyone to work late. They’re setting targets for the company that are impossible to meet unless the teams find new ways to work. They’re driving change by looking for things that can only be done with new approaches; things that can’t be done just by working longer hours. And after they set these goals, they send deeply committed managers out into the trenches to help their teams find and implement these changes. Most of the book is about what happens in those trenches. I highly recommend it.

There’s one other thing from the book I want to highlight: the process of innovation doesn’t simplify into management practices you can choose off a menu. There’s more magic to it than that. It takes skilled leaders and a delicate combination of freedom and pressure to build a company where the best engineers can do their best work, and trying to reduce that to a prescription doesn’t work. Here’s a quote from Dick Lidstad, one of the 3M leaders interviewed for the book, talking about staff from other companies who come to 3M looking to learn some of the innovation practices so they can implement them in their own teams:

They want to take away one or two things that will help them to innovate. … We say that maintaining a climate in which innovation flourishes may be the single biggest factor overall. As the conversation winds down, it becomes clear that what they want is something that is easily transferable. They want specific practices or policies, and get frustrated because they’d like to go away with a clear prescription.

I heard truth in that quote. Despite being a believer in the value of tools like Scrum, which are supposed to foster creativity and innovation, I’ve spent a lot of my career held back by the overhead of process that’s good in principle but applied with too little care to be effective. Ever spent an entire day in Scrum ceremonies? There’s more value in the experience of 3M’s teams overall than there is in any list of processes.

This book was written in 2000, but it has aged well: 3M stock has continued to perform well, and I found many parallels between the stories the author tells and my own experience in the modern tech world. It’s heavy with references and first-hand interviews, and I think it’s a valuable read for anyone in tech today.

If you read it, let me know what you think!

Adam

Credit Card Debt

Technical debt is like a new credit card: it often comes with a 0% introductory interest rate. In the short term tech debt can look like a win: you get the new feature on time, you automate a manual process, you patch the bug. Maybe the implementation wasn’t perfect, but dealing with a bit of funky code or living with a few bugs is better than missing the deadline.

That loan comes due right away (you have to live with what you wrote), but the interest comes later. In a month (or three or six) something will happen that magnifies the impact of that funkiness or those bugs. You’ll need an unrelated feature, but because you monkey-patched in the config for the first feature you’ll be forced to rewrite the config system before you can start, adding days to your timeline. You’ll need to install a zero-day security patch, but your runtime hack will force you to shut down before you can patch, causing an outage.

Like a credit card, tech debt is manageable. If you pay back the new card on the right schedule it can get you the new TV without making you miss a rent payment. If you clean up your runtime hack in the next couple weeks it’s unlikely that a zero-day patch will be released before you’re done. If you don’t pay it back or you take out too many new cards, you can end up like the guy who makes six figures but rents a basement because his credit cards cost him $3,000 every month. You’ll fall behind on new feature development because you can’t build anything without fixing three old hacks first.

Unlike a credit card, the introductory rates of tech debt are hard to predict. You don’t know how many months of freedom from interest you get, and they may expire at the worst times. That zero-day patch might come out the week after you push your funky code to prod and you’ll be stuck with an outage. You might gamble if you know you’ll still be within the month’s SLA, but if you’ve gambled on twenty things like that you’ve got great odds that the bill on several debts will blow up at a bad time.
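
To put rough numbers on that gamble, here is a small sketch. It assumes each debt blows up independently with the same probability, which real debts won’t, and the 10% chance per debt is a made-up figure.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability that at least k of n independent gambles go bad,
    when each one goes bad with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


debts = 20      # twenty gambles, as in the paragraph above
p_blowup = 0.1  # hypothetical chance any single debt blows up at a bad time

at_least_one = prob_at_least(1, debts, p_blowup)    # roughly 0.88
at_least_three = prob_at_least(3, debts, p_blowup)  # roughly 0.32
```

Each gamble looks safe on its own at 10%, but under these assumptions twenty of them together make at least one blow-up close to a sure thing.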

Every win has to come with hard questions about the debt it sits on. How much of this implementation will we be forced to rewrite? Does this new feature really work or does it just mostly work but we haven’t looked deep enough to see the problems? Do the funky parts of this code overlap with upcoming work?

Loans can get you ahead, and are manageable if you’re careful, but if you win by taking out too many it won’t matter how far ahead they got you. You’ll fall behind when they get too heavy. You’ll be a six-figure team living in a basement.