Developer Uncertainty

One of the most challenging codebases I've ever worked in was a legacy Java codebase. And before your mind wanders to the Enterprise Fizz Buzz codebase, let me say that it wasn't hard for the reasons you'd expect.
The developers that worked on it before me were incredibly talented. The company itself invested a ton in training new engineers. We read and discussed Effective Java, Working Effectively with Legacy Code, and Clean Code (which can be great if you don't take it at face value).
Like any codebase, it had a healthy mix of tech debt (from times when things needed to ship fast) and really well abstracted code (from refactors often from engineers angry at how "bad" the code was).
The biggest problem with this codebase was that it had a plugin system that allowed you to extend anything.
Plugins: Great for extensibility, awful if there's no API
The plugin system was one of the ways the company was able to ship so quickly. Product teams can often get bogged down in tiny requests for individual customers. The product team can continue working on higher impact problems if plugins could be built to satisfy those requests.
The problem was there was no formal API for these plugins - at least not for a long time. Plugins could depend on functionality that was undocumented. They could depend on fields that weren't meant to be used.
The other problem was plugins were very important. When you hire talented people and show them a customer's problem, those people will figure something out. Some of the plugins were incredibly creative and part of customer’s daily workflows. Some plugins were meta-plugins to help people build plugins faster.
But all this came at an unfortunate cost.
Every change is a breaking change
Notably, plugins weren't centralized / built alongside the product.
Imagine you get a bug report and it looks pretty trivial. You implement a fix and maybe clean up the code a bit along the way because you are a good citizen.
The tests pass, the build passes, someone reviews it, it merges, and gets released. All sounds pretty normal? Well, you removed a function that a plugin depended on and now your fix is destined to break some customers.
Every change becomes scary.
You definitely don't refactor / clean up code anymore, it's not worth the risk.
I do want to mention that the company actually did a good job of getting out of this situation. You can't really solve this overnight, you have to solve it in steps. Popular plugins got explicit testing before releases, a formal API was developed and plugins were moved over, and over time this product was slowly replaced with something with a better extensibility story.
Developer uncertainty comes in many sizes
The underlying problem here is it's difficult for an engineer to ever be confident in the code they wrote. This is obviously among the most extreme versions of this possible, but every repo has versions of it.
Committing to a python repo with no type hints and minimal tests? You are likely going to spend some time on each commit repeatedly checking “Find References” to see if you accidentally broke some existing code.
There are so many things that can give a developer uncertainty, like:
- A confusing abstraction they have to use
- “Magic”-heavy code/frameworks, that takes some deciphering to understand
- Not using their normal IDE
- Imposter syndrome (because it’s not just about the code sometimes)
And while every repo and person is different, developer uncertainty always has the same outcome: everything takes longer.
Figuring out how to approach the problem takes longer.
The time between when you are “done” and when you put up the PR takes longer.
Code reviews take longer.
So, how do we combat it?
Reducing developer uncertainty (a.k.a. ship faster)
It’s worth acknowledging the big, obvious cases that’ll reduce uncertainty like:
- Adding good tests to important areas of the codebase
- Choosing a type-safe language
- Clear, explicit boundaries between different services
- Custom linting rules for common footguns
- Reducing the number of languages/frameworks/infrastructure pieces that a developer needs to context switch between
These are all important, but they are large undertakings for existing repos - if they are possible at all.
They do all get at a similar idea, which is they are trying to:
Reduce the number of things a developer needs to worry about to get their job done
Let’s look at a more practical example of how to achieve this.
Over time, your code becomes a template for others
As a simple example, let’s contrast these two blocks of code (overly simplified for the sake of example):
// Option 1
app.post('/item', requireUser, (req, res) => {
const item: CreateItemRequest = parseBody(req.body)
// Throws an error when the item is invalid
validateUserCanCreateItem(req.user, item)
const createdItem = ItemController.createItem(req.user, item)
res.json(createdItem)
})
// Option 2
app.post('/item', (req, res) => {
const item: CreateItemRequest = parseBody(req.body)
// Note: a different type is returned from this function
const validatedItem: Validated<CreateItemRequest> =
validateUserCanCreateItem(req.user, item);
// createItem takes in the Validated version
const createdItem = ItemController.createItem(req.user, validatedItem)
res.json(createdItem)
})
In option 1, I can forget validateUserCanCreateItem
with no repercussions. In option 2, trying to pass item
to my createItem
function would error and remind me that I need to validate it.
At a small scale, these options are both fine, but as you have more and more cases that you may need to check, it can be cumbersome to keep that all in your head. Eventually, you may end up with custom validation logic, custom authorization logic, pricing plan checks, feature flag checks, and more.
The uncertainty with option 1 comes from the fear that you forgot an important step and there’s no real guardrails to force you to do things correctly. When I work in projects like this, I often find myself searching through the codebase for other, similar examples that I can copy from or some indication that I did things “correctly.”
We can actually take option 2 even further by adding a custom builder for generating the boilerplate logic within a route:
app.post('/item', Routes.new()
.requireLoggedInUser()
.noPaymentPlanRequired()
.parseJsonBody()
.checkUserPermission((user, body) => {...})
.handle(async ({req, res, user, body}) => {
// ...
})
)
We added a stepwise builder to force the developer to answer some questions about the routes' requirements.
After you type .requireLoggedInUser().
, your IDE will pop up with a few options like noPaymentPlanRequired
, disallowPaymentPlans
, and allowOnlyThesePaymentPlans
. You pick one and then are asked another question about how the route should work.
Note that there are a bunch of other ways to model this (e.g. turning this into re-usable middleware may be more appropriate), but I have a soft spot for stepwise builders.
The important thing here is I am actually forced to do the right thing. I can't get the user
unless I also answer the question of what payment plan is required to call this route.
When I make my first route, I am way more confident in it. And as an added bonus, it takes me less time to do, because I don't have to spend time reading a bunch of other routes, collecting possible options for my route.
Is this pattern overkill? Sometimes, absolutely.
I would never bother doing this for a side project or an MVP. But as more and more people work on the same codebase, figuring out the footguns and putting guardrails in place becomes very important.
What else do you worry about?
What common problems do you see at your job?
For us, we have a product that has external-facing APIs, and we care a lot about keeping them backwards compatible.
In our tests, we instrumented every API call with Insta Snapshots. We still have separate assertions about the responses, but the snapshots allow us to ensure the shape of the responses either didn’t change or it’s changes are documented.
Why? For one, it allows us to quickly answer the question of “Did this change affect any APIs?” but I’m very confident that both the person writing the code and the person reviewing it could answer that question themselves.
The more important thing we get is that we can remove it entirely from our mental checklist of things we check PRs for.
Homework for the class
A good exercise is to take the common problems / sources of confusion and see if you can’t make them impossible (short of someone being adversarial). Maybe there’s a fancy meta-test, maybe it’s a new abstraction that removes some commonly mis-used options, maybe it's a formal API for your plugin framework and enforcement that plugins can only use that API.
Whatever it is, the north star is a new developer can’t even accidentally make the mistake.
This is not always possible, but when it is, you don’t just get the benefit of no one makes that mistake again. You get the benefit of - no one needs to worry whether they are making that mistake again.
That’s where a lot of the magic happens. I can code, review, and ship faster because a class of problems has been taken off my plate.
Programming is easier when you have fewer things to juggle in your head, and everything moves a bit faster.
And whatever you do, please, please make formal APIs for your plugins.