10 Engineering Best Practices Startups Should Ignore

Nov 10, 2021

There’s a lot of dogma in the field of software engineering. There always has been (and there actually used to be a lot more!). Part of the development of any craft is figuring out which practices are outdated and developing new practices in response to new challenges. Usually, new standards are released and popularized faster than old standards are thrown out. So I’ve decided to throw my hat in the ring and try to even the scales.

Most (but not all) of these observations are thematically centered around exchanging some robustness in the name of developing software faster. Not all of these recommendations are going to make sense for large engineering organizations where the cost of mistakes is higher. Indeed, working as a software engineer at a large company is a completely different occupation than working as a software engineer on a small team.

Here are some “best practices” that I have gotten little to no value out of while working as a software engineer within organizations ranging from three to twenty people. Much of the credit goes to the places I’ve worked, which have been flexible and generally reasonable about these doctrines.

1. Set up 4 environments: local, dev, staging, and production.

It is common wisdom that teams should maintain four environments for running their applications: locally (on developers’ computers), remotely and auto-deployed from the development branch (dubbed “remote dev”), on production (in front of customers), and in a staging environment meant to closely track production, but not be in front of customers. The idea is that developers can develop their features locally, test them locally or on the remote development environment, run some QA processes in the staging environment, and then finally push changes to production.

In my experience, having a staging environment has always felt awkward. Its theoretical advantage over the remote dev environment lies in two things: sharing more environment variables with production (for example, application secrets for various integrations), and being able to pin a soon-to-be-deployed commit so that a QA team can thoroughly test a release without worrying about new commits being merged into the dev branch during testing.

For small companies, where code changes are pushed into the dev branch at a lower rate, maintaining this extra environment and forcing deployments through manual testing on staging does more harm than good. It adds friction for teams trying to deploy code, which often leads to more shoddy manual QA rather than better testing. Staging environments also generally hold less persistent test data than the remote dev environment does, making tests in staging less exhaustive than tests in remote dev would be.

Even as teams scale, the right way to evolve a deployment process is to invest in thorough and automated CI/CD processes that automatically deploy modular bundles of code from remote dev directly to production in many cases, skipping the need for a staging environment at all.

2. Engineers should be expected to work up and down the stack.

Every software engineer at a startup considers themselves a “full stack engineer”, meaning they can (in theory) write code that renders visual components, handles data caching and memoization, exchanges data between services, writes to a persistent database, and defines infrastructure and networking throughout the app. In time, engineers at any small enough company will get experience working with all of these aspects of the stack.

But it does not, in practice, make sense to allocate engineers (especially new hires) to projects that span across the stack. Individuals almost always gravitate to their specialty, and are less motivated and less experienced when attacking problems outside of their domain. It is better for the company and for individuals to build knowledge deeply and then broadly, instead of broadly and then deeply. Encouraging engineers to dive into full-stack tasks just for the sake of having one person own a feature end-to-end or sharing knowledge among the team leads to half-hearted code that often has to be picked up and fixed later since it doesn’t fit within the conventions or frameworks used by specialists in other areas of the codebase.

It is far better to have a rigorously standardized way of addressing problems in each area of the stack built by specialized engineers than to have a mix of good code and bad code living throughout the full stack.

In time, engineers will pick up on more than their area of expertise, and having this knowledge distributed will be invaluable. But full stack expertise can’t and shouldn’t be pushed onto engineers.

3. Teams should decide together how to solve hard problems.

The de facto way of making big architecture decisions is for the responsible team to gather (in person or virtually) to brainstorm solutions. Generally these meetings are led by an engineering manager or a tech lead and focus on finding consensus. At best, these meetings are inefficient and dominated by the one or two people most capable of architecting whatever particular software needs to be built. At worst, they turn into settings where small disagreements are hashed out ad absurdum, or where suboptimal decisions are committed to because of poor preparation or a desire to incorporate elements of everyone’s ideas into the final design spec.

It is almost always better to have a strong engineer tackle the problem solo for a few days before getting any critique from the team. Only once a specific proposal (or perhaps a few options) has been thoroughly considered should it be presented to other members of the team. Yes, there is still room for feedback within this model, but there is no space for open-ended pondering or senseless squabbling.

4. It is okay to have bad code quality if you’re just building an MVP

The point of an MVP, once launched, is rapid iteration. The biggest inhibitor of rapid iteration is bad code. It is really difficult to re-engineer code once it is in front of customers as an MVP, both for obvious reasons (the necessity of migrating code chief among them) and for subtler ones, such as the people tweaking the product often not being the ones who originally wrote its code.

It is the product function’s job at any startup to figure out what the various potential enhancements to an MVP might look like. Given this problem space, it is the engineering function’s job to design a system that can meet minimum requirements for what the first version will look like and feel like while also being extendable to meet whatever likely product evolution will inevitably manifest.

It is usually the product team that is pushing for a version of the feature to be done as quickly as possible, so that product feedback can be measured and digested pronto. Lazy product teams will assert that having a tangible version of a feature is a prerequisite to having any insight into how the feature should or might evolve over time. The reality is that product and engineering teams should align very early on possible product directions and the engineering work that would need to be done for each trajectory. This allows the engineering team to build an MVP with high code quality that can quickly extend and evolve to align with any of the product team’s plans.

5. It’s a good use of time to pull someone else’s branch and test it as part of your review process

It is generally a good instinct to want to write correct code and to enforce and protect standards that prevent one person’s changes from being deployed to production without oversight. This is not on trial.

What is on trial is using time effectively and making use of the remotely deployed development environment, which can tolerate some degree of imperfection and is a very suitable (if not more suitable) place to test other people’s code. Pulling other people’s code and testing it can be very time-consuming, and can interrupt whatever an engineer is currently working on. What often happens in cultures where extensive local testing is a normal part of the review process is that it takes hours for another engineer to get around to testing the PR in question, while the author is essentially blocked and demotivated.

There are important exceptions to this rule. It is very important to hold off on merging code into the remote development environment if it could cause bugs that would block other engineers, or if it is more suitable to test locally than in the remote development environment. An example of the latter is a feature built to handle edge cases that might exist in production but don’t exist and aren’t easily reproducible in the remote development environment (but can somehow be reproduced locally).

6. You should not merge a branch until the entire feature is done

From an operational point of view, it is perfectly logical to think of PRs as features or fixes that should be merged when the feature or fix is done and no sooner, for fear of causing other bugs or making problems harder to isolate down the line. But this view oversimplifies the many kinds of engineering hurdles that have to be overcome to write even seemingly simple functionality.

Meaningful features require migrations, backend changes, frontend changes, tests, minor refactoring, and the provisioning of new infrastructure or the setting of new environment variables. Some of these steps (migrations, infrastructure changes) are always safer to do in isolated PRs so that the (ideally automated) application of these changes can happen in sequential order. Migrations, for example, need to be applied before the backend code expecting them can run error-free.
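To make the ordering constraint concrete, here is a minimal sketch of an automated migration step that applies numbered migrations strictly in sequence, plus a startup check that refuses to run backend code before the schema it expects exists. It assumes SQLite via Python’s standard `sqlite3` module; the `MIGRATIONS` list, the `schema_version` bookkeeping table, and `REQUIRED_SCHEMA_VERSION` are all hypothetical names for illustration. Real projects would more likely lean on their framework’s migration tooling.

```python
import sqlite3

# Hypothetical ordered migrations. Each would ship in its own small PR,
# merged ahead of the backend code that depends on it.
MIGRATIONS: list[tuple[int, str]] = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN display_name TEXT"),
]

# Hypothetical: the schema version the current backend build expects.
REQUIRED_SCHEMA_VERSION = 2


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied migration number (0 if none)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in ascending order; return the new version."""
    applied = current_version(conn)
    for version, sql in sorted(MIGRATIONS):
        if version <= applied:
            continue  # already applied on an earlier deploy
        with conn:  # commits the version record once the migration succeeds
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
        applied = version
    return applied


def check_schema_before_start(conn: sqlite3.Connection) -> None:
    """Refuse to start backend code whose migrations have not run yet."""
    version = current_version(conn)
    if version < REQUIRED_SCHEMA_VERSION:
        raise RuntimeError(
            f"schema is at version {version}, "
            f"backend needs {REQUIRED_SCHEMA_VERSION}"
        )
```

Running `migrate` as a deploy step before the new backend boots, and calling `check_schema_before_start` at startup, turns “migrations first” from a convention into something the pipeline enforces, which is what makes shipping the migration in its own earlier PR safe.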

Another reason to break features into smaller parts is so that a team can parallelize work on them. Merging a feature’s backend changes in advance lets a frontend engineer start on their contribution without waiting for unrelated parts of a monolithic feature to come online. This also naturally speeds up bug discovery and spreads knowledge about the soon-to-be-launched feature.

7. You should have clean commit messages within your Pull Requests

Pull requests should be small and atomic (see #6 above). They should be so small that they can be squashed into a single commit when merged into the dev branch with no essential information loss about what the function of that commit onto dev was (in case it needs to be reverted or consulted later). Since there’s generally no use in having teammates develop or test your code at a sub-PR portion size (see #5), the commit history within a PR should be viewed as scratch paper.

8. It is good to let engineers customize their development environments

Uniformity is more important than individuality within a codebase, and the same goes for development environments. This feels like an undemocratic ideal to promote, but I’ll write it down anyway.

Engineers with the urge to customize their development environments (setting up alternate debugging configurations, going bananas with custom bash aliases, etc.) should be applauded; it’s hard work and usually worth the investment. It is most valuable for the team, though, if they can expend this energy on tools that will become integral to the entire team’s development. This means that engineers must see themselves not only as builders and optimizers but also as internal evangelists for whatever tips of the trade they discover or can port over from previous experience.

I would go as far as saying that everyone should be running the same operating system with the same code editor and the same repository of developer tools, pre-commit scripts, and password managers unless they (1) really know what they’re doing, (2) can operate within the standard system with ease, and (3) have tried and failed to convince the rest of the team to migrate to whatever development environment they prefer.

Enforcing this is not easy, especially when onboarding engineers that are highly opinionated and experienced. In the long run, though, engineers with strong beliefs on how to optimize a development environment will thrive in a culture where the impact of such code goes beyond their own personal productivity.

9. You should unit test everything that you write

Unit testing is different from integration testing, and writing tests is an important part of software development. But unit tests in particular (tests that operate on a single function or component) are a huge waste of time and are hard to maintain and debug. Popular unit testing frameworks like Jest and Mocha are a lot of work to configure and generally don’t catch pernicious bugs as well as you would think. These tests often have to be rewritten when code is refactored, and this refactoring frequently introduces new bugs into the tests themselves (should you write tests for your tests?).

Integration tests, on the other hand, spin up multiple services and test an entire flow. They have to be rewritten whenever a flow changes, but generally this happens much less frequently than simple code refactoring that requires maintenance of unit tests. They are harder to set up than unit tests, but can test an order of magnitude more quirks than unit tests can for the same number of lines of code.
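As an illustration of the difference, the sketch below tests a hypothetical signup flow end to end: it starts a real HTTP server on a free local port, backed by a real in-memory SQLite database, and drives it with HTTP requests instead of calling individual functions with mocks. Everything here (the `/users` endpoint, `make_app`, `run_signup_flow`, the schema) is invented for illustration and uses only Python’s standard library; a real setup would more likely start the actual services, for example in containers.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_app(db: sqlite3.Connection) -> type[BaseHTTPRequestHandler]:
    """Build a hypothetical /users service that persists to `db`."""
    lock = threading.Lock()  # the connection is shared across handler threads

    class UsersHandler(BaseHTTPRequestHandler):
        def _send(self, status: int, body: dict) -> None:
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def do_POST(self) -> None:
            length = int(self.headers["Content-Length"])
            data = json.loads(self.rfile.read(length))
            with lock, db:  # `with db` commits the insert
                cur = db.execute(
                    "INSERT INTO users (email) VALUES (?)", (data["email"],)
                )
            self._send(201, {"id": cur.lastrowid, "email": data["email"]})

        def do_GET(self) -> None:
            user_id = int(self.path.rsplit("/", 1)[-1])
            with lock:
                row = db.execute(
                    "SELECT id, email FROM users WHERE id = ?", (user_id,)
                ).fetchone()
            if row is None:
                self._send(404, {"error": "not found"})
            else:
                self._send(200, {"id": row[0], "email": row[1]})

        def log_message(self, format: str, *args) -> None:
            pass  # keep test output quiet

    return UsersHandler


def run_signup_flow() -> dict:
    """Integration test: create a user over HTTP, then read it back."""
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    # Port 0 asks the OS for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_app(db))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        req = urllib.request.Request(
            f"{base}/users",
            data=json.dumps({"email": "ada@example.com"}).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            assert resp.status == 201
            created = json.loads(resp.read())
        with urllib.request.urlopen(f"{base}/users/{created['id']}") as resp:
            fetched = json.loads(resp.read())
        assert fetched == created, (fetched, created)
        return fetched
    finally:
        server.shutdown()
        server.server_close()
        db.close()
```

One call to `run_signup_flow()` exercises routing, JSON parsing, the SQL insert, and the read path together; unit testing `do_POST` alone would have required mocking the request and the database. The cost is also visible: the test has to start a server and a database before it can check anything.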

The one downside of integration tests with respect to unit tests is speed — it is far quicker to run tests that don’t require spinning up the entire application. This is why having strong types within your codebase and linting / pre-commit rules that can catch a lot of what unit tests would otherwise catch is still really important.

10. Whoever wrote a piece of code should by default be responsible for fixing or enhancing it

This rule has irked me more than any of the previous nine throughout my (so far short) career as a software engineer. Its defiance is also possibly the most controversial, since disregarding it seems to enable a sort of willful neglect of personal responsibility while setting a standard by which some engineers feel like they are constantly fixing their coworkers’ code.

But the downsides of this rule are also very expensive. It penalizes productive engineers, who, by virtue of having written the most code, also have the most recurring work. It may be true that these engineers oftentimes have an absolute advantage in being able to fix or enhance whatever work they may have done in the past (though not by a lot). Overlooked, though, is the fact that when the team’s responsibilities are considered on the whole, top engineers may have a comparative advantage in doing higher-impact work elsewhere.
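A small worked example, with entirely made-up numbers, shows why absolute advantage is not the deciding factor. Suppose a senior engineer needs 2 days to fix their old feature and 5 days to build a new prototype, while a newer engineer needs 3 and 15 days respectively. The senior engineer is faster at both, but a day of bug-fixing costs them 2/5 of a prototype versus only 3/15 = 1/5 for the newer engineer, so the newer engineer has the comparative advantage in the fix. The sketch below (hypothetical names and durations throughout) brute-forces every assignment of the two tasks and compares the best overall plan with the best plan under an “author fixes it” rule.

```python
from itertools import product

# Hypothetical days each engineer needs for each task.
DAYS = {
    "senior": {"fix_old_feature": 2, "prototype": 5},
    "newer": {"fix_old_feature": 3, "prototype": 15},
}
TASKS = ["fix_old_feature", "prototype"]


def makespan(assignment: dict[str, str]) -> int:
    """Days until all tasks are done; each engineer works through theirs in sequence."""
    load = {engineer: 0 for engineer in DAYS}
    for task, engineer in assignment.items():
        load[engineer] += DAYS[engineer][task]
    return max(load.values())


def all_assignments() -> list[dict[str, str]]:
    """Every way of giving each task to one engineer."""
    return [dict(zip(TASKS, engs)) for engs in product(DAYS, repeat=len(TASKS))]


def best_assignment() -> tuple[dict[str, str], int]:
    """The plan that finishes both tasks soonest, with no ownership rule."""
    best = min(all_assignments(), key=makespan)
    return best, makespan(best)


def best_with_author_fixing(author: str = "senior") -> int:
    """Soonest finish if the original author must do the fix."""
    return min(
        makespan(a) for a in all_assignments() if a["fix_old_feature"] == author
    )
```

With these numbers, `best_assignment()` gives the fix to the newer engineer and the prototype to the senior one, finishing in 5 days, while `best_with_author_fixing()` returns 7, because the senior engineer ends up doing both tasks back to back.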

The hardest part of building anything complex is making the right design decisions and getting a working proof of concept running. High-leverage engineers are better used when directed to focus on these types of problems instead of extending or even debugging minor issues related to previous work.

