The anatomy of downtime
We’ve all been there. It’s starting to get late, and it’s one of those days where things just seem to take forever. Continuous delivery has pushed a new code version out, but error signals are slowly starting to accumulate. At first you dismiss them, but then you look closely and let out a shout heard by every engineer on your team: “F%$#ing s@#t - who renamed a database column without telling anyone?” Steve, one of the junior devs, who up until a second ago was so excited about his code going to production for the first time, answers: “It was me, but I changed all the references, so there shouldn’t be any problem.” You sigh and facepalm, yet silently thank Eric, the guy who finally convinced you to fix rollbacks last week. Just another story of averted downtime.
We’re engineers. Downtime is an admission of failure. “What, you couldn’t change data centers, migrate all the data, and deploy a re-architected version of your system with no downtime?” (For the record, we have done exactly that in the past.) And with web apps serving users across every time zone, maintenance windows are no longer a valid option.
Why am I telling you this? Because it’s time we had patterns and tools that let us avoid downtime altogether. We’re definitely moving in the right direction, for example…