Feature Flags: A cure for production-phobia

Feature flags are an extremely important technique for deploying high-quality changes to production quickly. My group at Microsoft has embraced them heavily over the past few years with great success. They allow us to keep almost all of our changes integrated into the same branch, and they let us control feature rollout and provide a safety valve in case things go wrong. However, relying on feature flags as long-term fail-safes can lead to problems.

Code Integration

Before embracing feature flags, our teams created large, long-lived branches. All development took place in these branches, and only when everything was complete were they merged into the master/main branch. This approach worked fine for a single team, but once multiple teams were each working in their own long-lived branches, complicated merge conflicts became common.

I remember working on the now-defunct chat room feature. Even though our framework supported feature flags, culturally they weren’t yet embraced and used as deeply. So instead of relying on them, we started the work in a feature branch where we made some large changes. After weeks of designing, coding and testing, we tried to merge. However, that same week another team was merging a branch that had also been in development for weeks. Unfortunately (for me), they merged first, and I spent a week fixing conflicts, re-stabilizing and re-validating. What a waste!

When feature flags are embraced, code changes become incremental and are merged continuously into the master branch. The code does not need to be 100% finished to merge. Get some initial scenarios working, put the change behind a feature flag and merge (after code review, of course). This minimizes effort wasted on merge conflicts and helps validate the real state of the product early in the sprint.
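
To make "feature flag the change and merge" concrete, here is a minimal Python sketch. The flag name `chat.new-message-pipeline`, the in-memory `FEATURE_FLAGS` dictionary and the `send_message_*` functions are all hypothetical; a real service would typically load flag values from configuration or a flag service so they can change without a redeploy. The point is only that unfinished code can live on the main branch because it stays unreachable until the flag is turned on.

```python
# Hypothetical in-process flag store. A real service would read these values
# from configuration or a flag service so they can change without a redeploy.
FEATURE_FLAGS = {"chat.new-message-pipeline": False}


def is_enabled(flag_name: str) -> bool:
    # Unknown flags default to off, so unregistered work stays dark.
    return FEATURE_FLAGS.get(flag_name, False)


def send_message_legacy(text: str) -> str:
    # Existing, proven code path.
    return f"legacy:{text}"


def send_message_new(text: str) -> str:
    # Partially finished work can be merged to main because this path
    # is unreachable until the flag is turned on.
    return f"new:{text}"


def send_message(text: str) -> str:
    # The flag check is the only place the two code paths meet.
    if is_enabled("chat.new-message-pipeline"):
        return send_message_new(text)
    return send_message_legacy(text)
```

With the flag off, `send_message("hi")` takes the legacy path and returns `"legacy:hi"`; setting the flag to `True` routes the same call through the new path without touching any call sites.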

Production-phobia

“I am 100% sure this will work in production” — No one ever

“Anything that can go wrong, will go wrong” — Murphy’s Law

Regardless of the amount of local testing, once customers get their hands on a change, Murphy’s Law takes effect. It is hard to anticipate every situation that arises once you are running at scale on top of production data.

This is why a staged rollout of new functionality is critical. By utilizing feature flags, functionality can be turned on for subsets of users. This is a truly powerful mechanism for both quality control and feedback. Is my change breaking something for the user? How is this feature received? Getting this type of feedback early makes iterating easy, which in turn gives us greater confidence in taking changes to production early.

As Buck Hodges has described for VSTS’s implementation of feature flags, we divide our users into stages: earlier stages (0 and 1) get changes quickly but with a greater chance of issues or rough experiences. These users are self-selected early adopters, and they are critical to ensuring the rest of the service gets the highest quality. Almost every sprint, we catch bugs in stages 0 and 1, ranging from simple UI glitches to occasional critical issues.
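
The staged model can be sketched as a flag that is enabled up to some stage number, with each user assigned to a stage. This is an illustrative Python sketch under assumed names (`StagedFlag`, `USER_STAGE`, `LAST_STAGE`, `promote`), not VSTS’s actual implementation; only the idea of numbered stages with early adopters in stages 0 and 1 comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedFlag:
    name: str
    # Highest stage that sees the feature; -1 means off for every stage.
    enabled_through_stage: int = -1


# Hypothetical assignment of users to rollout stages. Stages 0 and 1 hold
# self-selected early adopters; unassigned users fall into the last stage.
USER_STAGE = {"alice": 0, "bob": 1, "carol": 2}
LAST_STAGE = 3


def stage_of(user: str) -> int:
    return USER_STAGE.get(user, LAST_STAGE)


def is_enabled_for(flag: StagedFlag, user: str) -> bool:
    # A user sees the feature once the rollout has reached their stage.
    return stage_of(user) <= flag.enabled_through_stage


def promote(flag: StagedFlag) -> StagedFlag:
    # Widen exposure by one stage, never past the last stage.
    return StagedFlag(flag.name, min(flag.enabled_through_stage + 1, LAST_STAGE))
```

Rolling out means calling `promote` once the current stages look healthy; the safety valve is replacing the flag with `StagedFlag(name)`, whose default of -1 turns the feature off for everyone at once.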

Knowing you can control exposure and get feedback quickly makes deploying to production exciting and mitigates fear. This helps shrink deployment cycles from monthly to weekly (or even daily).

What’s the catch?

Each feature flag should be short-lived. Once a flag is turned on for all customers, it is time to remove it, despite the temptation to leave it around as a long-term fail-safe.

“This feature works now but what if we hit issues tomorrow. I am afraid!” — Me

Serious issues often don’t reveal themselves immediately, so keeping the feature flag around for a short while provides protection. But over weeks and months, the feature flag begins to rot.

I hit this firsthand with a performance fix I made a year ago. Concerned about the potential unintended impact of the change, I put it behind a feature flag and left it flagged for months, even though things seemed to be working fine. About six months later, I received a customer report of a rare race condition caused by my change. While investigating, I decided the quickest mitigation for this customer was to turn the flag off. When the flag was turned off, the customer hit even worse issues: the old code was no longer valid, and my customer was now more confused and frustrated than before.

Untested code

My rotten feature flag taught me firsthand why it is important to routinely clean up feature flags. When a feature flag is on by default everywhere, no one is testing or validating the code path that runs when the flag is off. As time passes, that untested code gets older, grumpier and less in touch with the world around it. There is no guarantee that code path will still work.
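
One way to make cleanup routine rather than a matter of memory is to record when each flag reached 100% of customers and periodically list the ones that have stayed fully on past a grace period. The Python sketch below assumes a hypothetical `FlagRecord` registry and an arbitrary 30-day `GRACE_PERIOD`; the right window is a judgment call that balances catching late-surfacing issues against letting the off path rot.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class FlagRecord:
    name: str
    # Date the flag reached 100% of customers; None while still rolling out.
    fully_enabled_on: Optional[date]


# Hypothetical grace period; tune it to how long issues take to surface.
GRACE_PERIOD = timedelta(days=30)


def flags_to_remove(flags: list[FlagRecord], today: date) -> list[str]:
    """Return names of flags that have been fully on longer than GRACE_PERIOD."""
    return [
        f.name
        for f in flags
        # Flags still rolling out are not cleanup candidates yet.
        if f.fully_enabled_on is not None
        and today - f.fully_enabled_on > GRACE_PERIOD
    ]
```

Run as a scheduled job or CI check, a list like this turns "remember to clean them up" into a concrete backlog of flags whose off path nobody is exercising anymore.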

No more fear

Deploying changes quickly and with high quality is a paramount goal for any service, and feature flags are a big step toward achieving it. Just remember to clean them up, and don’t count on them lasting forever.