Continuous Delivery with Feature Flags (toggles) is More Difficult Than it Seems

Back in the distant past, in a simpler time, version releases were very different from today. Each product version release was a huge ceremony. First came a very long period of planning that ended with a specification document. Then, the development of that spec, another long period of manual testing, and when all bugs were finally fixed, the deployment. This process took anywhere from two months to a year to finish.

These days, the software world is very different. We deploy a new version every sprint, and a sprint lasts about one to three weeks. Some companies, like Facebook, deploy to production every few hours, without any manual testing. How is this magic possible? How did we move from yearly releases to weekly ones? This was achieved, in part, with continuous delivery, which acts as a quality gate for every code change. On each addition to the code, a build machine runs a suite of automated tests that make sure the application still works. If the code doesn’t compile, or if one of the tests fails, the code addition isn’t approved. This way, we can deploy with confidence, without worrying (too much) that some bug broke the application.

Facebook actually does a bit more than that. They first deploy to a small percentage of users, then to a bigger percentage, and when they’re sure everything’s just fine, they deploy to everybody else.
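To make the idea of a percentage rollout a bit more concrete, here is a minimal sketch of one common way to do it: hash each user into a stable bucket and enable the change for the first N buckets. This is not a description of how Facebook does it; the function names, the bucket count of 100, and the choice of SHA-256 are all my own assumptions for illustration.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given feature.

    Hypothetical helper: hashing the feature name together with the user id
    means different features get different (but repeatable) user subsets.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_in_rollout(user_id: str, feature: str, percentage: int) -> bool:
    """Return True if the user falls inside the first `percentage` buckets."""
    return rollout_bucket(user_id, feature) < percentage
```

Because the bucket is derived from the user id rather than chosen at random on every request, a user who got the new version at 5% still gets it when the percentage grows to 50%.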

What happens if you’re working on something that lasts much more than one sprint? Maybe 3 sprints or 10. Are you going to work on a separate branch, ending with a huge merge? Are you going to run automated tests on that branch? This matter is not that simple, as I recently experienced.

Developing Long Features

Developing long features adds a few challenges in a continuous delivery environment. Let’s say that you develop this feature in your own branch. This means that once in a while, you’ll need to back-merge from the master/trunk/develop branch to keep the two close. But if your feature involves system-wide design changes, this is going to prove difficult. As both you and other team members continue to add more code, the distance from your branch to master is going to keep growing, and you’ll spend more and more time fixing conflicts and restoring things to working condition.

Another way to go is to commit your feature’s code to the master branch before the feature is functional. That’s where feature flags enter the scene. A feature flag (or feature toggle) is just a configuration value that controls whether your feature is enabled at runtime. It’s going to be off on master and on in your own development environment. With a feature flag, you can merge your code to master while still keeping it turned off at runtime for the user. Sounds great, but this actually presents a whole new set of problems.
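Here’s a minimal sketch of what such a flag might look like. I’m assuming the flag lives in an environment variable; the `FEATURE_` prefix, the `new_checkout` flag, and the checkout functions are all made up for the example, and a real project might read flags from a config file or a flag service instead.

```python
import os


def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Read a flag from an environment variable such as FEATURE_NEW_CHECKOUT.

    The naming scheme is an assumption for this sketch. An unset variable
    falls back to `default`, so the feature stays off unless someone opts in.
    """
    raw = os.environ.get(f"FEATURE_{flag_name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "on", "yes"}


def legacy_checkout(cart_total: float) -> str:
    """The existing code path that users see today (hypothetical)."""
    return f"legacy:{cart_total:.2f}"


def new_checkout(cart_total: float) -> str:
    """The unfinished feature, merged to master but hidden (hypothetical)."""
    return f"new:{cart_total:.2f}"


def checkout(cart_total: float) -> str:
    # The flag is checked at call time, so the same build can run either path.
    if is_enabled("new_checkout"):
        return new_checkout(cart_total)
    return legacy_checkout(cart_total)
```

With the variable unset, `checkout` takes the legacy path. Setting `FEATURE_NEW_CHECKOUT=on` on your own machine switches the same build to the new code. The hard part is that a real feature rarely has a single clean branching point like this one.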

For one thing, creating a well-working feature flag isn’t always easy. If your feature is connected to a lot of pieces, that’s actually going to be pretty hard. But even if you’re able to do that, there’s the matter of tests.

Feature Flags and Tests

We’re able to deploy with confidence so frequently because of a good suite of tests. Those tests make sure that all of the existing features still work. While developing that new feature, you’ll probably want to run that suite of tests to make sure you didn’t break anything, not to mention a bunch of new tests added for the new feature. So here’s the big question: When running those tests on the build server, is the feature flag on or off?

Since the feature is disabled for the end-user, we have to run tests with the feature flag off to make sure the application works well in production. But if the tests run with the feature flag off, then other members of the team can easily break your new feature’s functionality without even realizing it. This might not seem like a big problem, but if you’re developing some kind of infrastructure change that’s going to affect the whole system, you’ll be fixing broken tests all day instead of moving forward with that feature.

One solution is to run the tests twice: once with the feature flag on and once with the feature flag off. That seems reasonable, but what happens when you have several of these long features in development? Well, you’ll have to run all of your tests with every combination of flags, and the number of combinations doubles with each flag you add. This can be both long-running and hard to maintain. These features are still in development, so they’re likely to break easily, and that’s going to halt progress for the entire team. After all, for each change in code, if even one combination has a single failing test, you have to fix it before moving forward.
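To see how quickly this grows, here’s a small sketch that lists every on/off combination a build server would have to cover. The flag names are hypothetical, and I’m assuming each flag is a plain boolean that is independent of the others.

```python
from itertools import product


def flag_combinations(flags: list[str]) -> list[dict[str, bool]]:
    """Return every on/off assignment for the given flags.

    Each entry is one configuration the full test suite would have to run
    against, so n independent boolean flags produce 2**n entries.
    """
    return [
        dict(zip(flags, values))
        for values in product([False, True], repeat=len(flags))
    ]


if __name__ == "__main__":
    # Hypothetical long-running features that are all in development at once.
    in_progress = ["new_checkout", "new_search", "new_storage"]
    for combo in flag_combinations(in_progress):
        print(combo)
```

Three flags already mean 8 full test runs per change, and ten flags would mean 1,024.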

So What to Do?

Unfortunately, there aren’t any easy solutions here. One way to go is to simply avoid long branches. Break your task into many small ones that can be integrated into the system without feature flags. But that’s probably a luxury you don’t have, or you wouldn’t need feature flags in the first place.

Another solution is to avoid running tests with the feature flag on. This means the developer owning that feature pays the price for the bugs that the rest of the team introduces. Not a terrible solution up to a certain point.

Maybe the solution to these kinds of situations is not technological at all. You might solve these matters with good old-fashioned coordination. Make sure to prevent big refactor tasks when there’s a feature flag, do design reviews, and just talk to your fellow team members.
