Notes from Software Engineering at Google

Differences between programming and software engineering

Time

  • On a software engineering project, engineers need to be more concerned with the passage of time and the eventual need for change.
Should I upgrade?
  • When you are fundamentally incapable of reacting to a change in underlying technology or product direction, you’re placing a high-risk bet on the hope that such a change never becomes critical.
  • For any project that didn’t plan for upgrades from the start, that transition is likely very painful.
  • You’re performing a task that hasn’t yet been done for this project; more hidden assumptions have been baked-in.
  • The engineers trying to do the upgrade are less likely to have experience in this sort of task.
  • The size of the upgrade is often larger than usual, doing several years’ worth of upgrades at once instead of a more incremental upgrade. And thus, after actually going through such an upgrade once (or giving up part way through), it’s pretty reasonable to overestimate the cost of doing a subsequent upgrade and decide “Never again.” Companies that come to this conclusion end up committing to just throwing things out and rewriting their code, or deciding to never upgrade again.
  • Expect the first upgrade for a codebase to be significantly more expensive than later upgrades, even controlling for other factors.

Scale

Hyrum’s Law

  • With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.
  • We cannot assume perfect adherence to published contracts or best practices.
  • In practice, the complexity and difficulty of a given change also depends on how useful a user finds some observable behavior of your API. If users cannot depend on such things, your API will be easy to change. Given enough time and enough users, even the most innocuous change will break something;

How to upgrade

A new Widget has been developed. The decision is made that everyone should use the new one and stop using the old one.

Not scalable
  • To motivate this, project leads say “We’ll delete the old Widget on August 15th; make sure you’ve converted to the new Widget.”
  • Teams depend on an ever-increasing number of Widgets, and a single build break can affect a growing percentage of the company.
  • Forcing users to respond to churn means that every affected team does a worse job ramping up, solves their immediate problem, and then throws away that now-useless knowledge.
Scalable
  • Instead of pushing migration work to customers, teams can internalize it themselves, with all the economies of scale that provides.
  • Infrastructure teams must do the work to move their internal users to new versions themselves or do the update in place, in backward-compatible fashion. * Dependent projects are no longer spending progressively greater effort just to keep up.
  • Having a dedicated group of experts execute the change scales better than asking for more maintenance effort from every user: experts spend some time learning the whole problem in depth and then apply that expertise to every subproblem.
  • “If a product experiences outages or other problems as a result of infrastructure changes, but the issue wasn’t surfaced by tests in our Continuous Integration (CI) system, it is not the fault of the infrastructure change.”
    • “If you liked it, you should have put a CI test on it,”
    • Complicated, one-off bespoke tests that aren’t triggered by our common CI system do not count. Without this, an engineer on an infrastructure team could conceivably need to track down every team with any affected code and ask them how to run their tests.

Trade-offs

  • As software engineers, we are asked to make more complex decisions with higher-stakes outcomes, often based on imprecise estimates of time and growth.
  • We might sometimes defer maintenance changes, or even embrace policies that don’t scale well, with the knowledge that we’ll need to revisit those decisions. Those choices should be explicit and clear about the deferred costs.

Make decisions

  • It is important for there to be a decider for any topic and clear escalation paths when decisions seem to be wrong, but the goal is consensus, not unanimity. It’s fine and expected to see some instances of “I don’t agree with your metrics/valuation, but I see how you can come to that conclusion.”
  • Some of the quantities are subtle, or we don’t know how to measure them. Sometimes this manifests as “We don’t know how much engineer-time this will take.” Sometimes it is even more nebulous: how do you measure the engineering cost of a poorly designed API? Or the societal impact of a product choice?

How to Work Well on Teams

  • Ensuring that there is at least good documentation in addition to a primary and a secondary owner for each area of responsibility
  • Projects run into unpredictable design obstacles or political hazards, or we simply discover that things aren’t working as planned. Requirements morph unexpectedly. How do you get that feedback loop so that you know the instant your plans or designs need to change? Answer: by working in a team. Most engineers know the quote, “Many eyes make all bugs shallow,” but a better version might be, “Many eyes make sure your project stays relevant and on track.”
  • Group teams of four to eight people together in small rooms (or large offices) to make it easy (and non-embarrassing) for spontaneous conversation to happen.

Lose the ego

  • If you perform a root-cause analysis on almost any social conflict, you can ultimately trace it back to a lack of humility, respect, and/or trust.
  • Even if you know you’re the wisest person in the discussion, don’t wave it in people’s faces. For example, do you always feel like you need to have the first or last word on every subject? Do you feel the need to comment on every detail in a proposal or discussion?
  • “The appearance of conforming gets you a long way.” If you chose to assert your ego in any number of ways, “I am going to do it my way,” you pay a small steady price throughout the whole of your professional career. And this, over a whole lifetime, adds up to an enormous amount of needless trouble.

Learn to give and take criticism

  • For example, if you have an insecure collaborator, here’s what not to say: “Man, you totally got the control flow wrong on that method there. You should be using the standard xyzzy code pattern like everyone else.” This feedback is full of antipatterns: you’re telling someone they’re “wrong” (as if the world were black and white), demanding they change something, and accusing them of creating something that goes against what everyone else is doing (making them feel stupid). Your coworker will immediately be put on the offense, and their response is bound to be overly emotional.
  • A better way to say the same thing might be, “Hey, I’m confused by the control flow in this section here. I wonder if the xyzzy code pattern might make this clearer and easier to maintain?”

Knowledge sharing

  • Engineers have a tendency to reach for “this is bad!” far more quickly than is often warranted, especially for unfamiliar code, languages, or paradigms
  • The first time you learn something is the best time to see ways that the existing documentation and training materials can be improved

Challenges to Learning

A group of people that is split between people who know “everything” and novices, with little middle ground. This problem often reinforces itself if experts always do everything themselves and don’t take the time to develop new experts through mentoring or documentation. In this scenario, knowledge and responsibilities continue to accumulate on those who already have expertise, and new team members or novices are left to fend for themselves and ramp up more slowly.

Tribal and written knowledge complement each other. Even a perfectly expert team with perfect documentation needs to communicate with one another, coordinate with other teams, and adapt their strategies over time. No single knowledge-sharing approach is the correct solution for all types of learning,

Psychological Safety in Large Groups

  • No “well-actuallys”
    • Pedantic corrections that tend to be about grandstanding rather than precision.
  • No back-seat driving
    • Interrupting an existing discussion to offer opinions without committing to the conversation.