Best DevOps Practices: Shutterstock
Hi all, we’ve got another awesome DevOps best practices interview, this time with the folks at Shutterstock. Lead Architect Nathan Milford and Cloud Solutions Architect Sebastian Weigand share their insight into the best practices they employ.
"DevOps" is a term loosely thrown around these days. What does it mean to you?
DevOps is a cultural movement, not a job title. Just like hip-hop is a cultural movement. You can’t be a DevOp, just like you can’t be a hip-hop (paraphrased from Adam Jacob). It’s an attempt to break down the traditional silos between different disciplines within operations and engineering teams, to foster better communication and collaboration. It’s not tool-focused, but uses tooling to facilitate the cultural goals espoused above. Using Chef, Jenkins, or writing code and racking servers does not make a DevOps culture.
What's the most important thing an engineering organization can do to cultivate “systems thinking” in its engineers?
It’s important that engineers work on projects up and down the stack. We are actually liquidating our WebOps, Platform, and Infrastructure Tools teams and making a new team called Infrastructure. We’re keeping the original backlogs, but engineers from the larger team will be all over the backlogs, working on racking and stacking, coding, networking, data work, etc. Every engineer has strengths and weaknesses across the stack. Engineers with a weakness will be paired up with others who have it as a strength on projects. Eventually, the initial slowdown of people coming up to speed will flatten out, and we’ll have engineers who may not have complete mastery of everything, but have touched a broader swath of our systems, and are better able to see the broader system to approach problem solving more holistically. Cross-disciplinary training promotes knowledge sharing in a way that makes an engineer more cognisant of the impact his or her work has on the rest of the infrastructure. For example: Neutron DOSing Keystone, and in turn placing heavy load upon LDAP. One might not think a simple set of scripts would have that much network and auth impact.
Do you primarily run your apps/services in or out of the cloud? Why?
We use the right tool for the job. We run some things in a cloud environment fronted by APIs (and are building more of this) where it makes sense, such as for ephemeral processes, and we use hardware for longer-running, IO-intensive applications like data stores.
Where have you introduced automation into your systems engineering process? What were the results?
We have automation up and down our stack and are adding more where appropriate. The results are primarily positive, but we have some older automation that actually plays out like this XKCD comic (http://www.xkcd.com/1319/). Over the next year, our automation efforts are being poured into making as much of our infrastructure self-service for engineers as possible, and we have high hopes that this focus will do more to enable other engineering teams, speed up product development, and increase developer freedom. Automation is awesome, so long as it’s maintained and organized beforehand.
How does your team/organization approach automated testing?
In my opinion, this is hit-and-miss, depending on what we’re testing. Our approach and tooling are basically sound, but our coverage could be better. In terms of Infrastructure-focused code, Puppet refactoring with added tests will be a big piece of work this year, and we’re hoping to add tools like serverspec to automatically QA/test server builds. Moving to a bakery/container will make this easier.
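To make the idea of automatically QA-ing a server build concrete, here is a minimal sketch in Python. It is not serverspec (which is a Ruby tool) and not what Shutterstock runs; the `EXPECTED` values and the `check_build` helper are hypothetical, and the observed state is passed in as plain data rather than gathered from a real host.

```python
from typing import Dict, List

# Hypothetical expectations for a freshly built web server.
EXPECTED = {
    "packages": {"nginx", "ntp"},
    "listening_ports": {22, 80},
}


def check_build(observed: Dict[str, set]) -> List[str]:
    """Return human-readable failures; an empty list means the build passes."""
    failures = []
    for key, wanted in EXPECTED.items():
        # Anything expected but not observed counts as a failure.
        missing = wanted - observed.get(key, set())
        for item in sorted(missing):
            failures.append(f"{key}: expected {item!r} but did not find it")
    return failures
```

In a bakery-style workflow, a check like this could run once against each image as it is built, rather than against every live server.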
What do you think automation is especially *not* suited for?
Automation is appropriate and applicable in much of our environment. I think the larger issues are the dangers of automation:
- Failing to accept that automation has a maintenance cost. See the XKCD comic above. There is also the danger that, once the creators of a piece of automation are no longer around, the reasons for it or for a particular design decision are no longer known; you may end up maintaining something that is no longer needed or creating a golden calf.
- Failing to build in operability to automation. You must think of the guy on call at 3am who doesn’t know how to fix your automation if it breaks.
- Failing to move at a rate the business/culture can handle. Sometimes new ideas and workflows require cultural paradigm shifts that are larger than an organization can handle in one cycle.
- Failing to articulate the business need. Are we building it because it will enable people to get more done or save the company money, or are we building it because it’s cool?
- Failing to address how much to automate. People do not always face the deontological issues while writing automation. Do things happen automatically without user intervention, or is there a 'go' button that a person presses, giving humans the ability to initiate a course of action?
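The 'go' button question in the last point can be sketched roughly as follows. This is an illustration under assumptions, not Shutterstock's tooling: the `Action` type, the `destructive` flag, and the `execute` and `ask_on_terminal` helpers are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the automation wants to take (hypothetical structure)."""
    description: str
    run: Callable[[], None]
    destructive: bool = False


def execute(actions: List[Action], approve: Callable[[Action], bool]) -> List[str]:
    """Run each action; destructive ones run only if `approve` says go.

    Returns a log of what happened, so whoever is on call can read it later.
    """
    log = []
    for action in actions:
        if action.destructive and not approve(action):
            log.append(f"SKIPPED (no approval): {action.description}")
            continue
        action.run()
        log.append(f"DONE: {action.description}")
    return log


def ask_on_terminal(action: Action) -> bool:
    """The 'go' button: a person must type 'go' for the action to proceed."""
    answer = input(f"About to: {action.description}. Type 'go' to proceed: ")
    return answer.strip().lower() == "go"
```

Passing `ask_on_terminal` as `approve` keeps a human in the loop for destructive steps, while routine steps still run unattended; swapping in a different `approve` function changes how much is automated without touching the actions themselves.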
Another good resource is here: http://www.kitchensoap.com/2012/09/21/a-mature-role-for-automation-part-i/
What are your perspectives on continuous integration tools? Are you using any?
We are using Jenkins. CI is very important. Faster feedback on failures and tests makes us quicker to learn what is broken and faster to get things into production, whether fixes or features, to give our contributors and customers what they need.
What profile of individual makes the best systems engineer (assuming automation is key)?
Excessive focus on automation and tooling is exactly the opposite of what we want. We want mature engineers; it’s their mindset, not their skillset.
I subscribe very much to the thinking here: http://www.kitchensoap.com/2012/10/25/on-being-a-senior-engineer/
- "Being able to write a Bloom Filter in Erlang, or write multi-threaded C in your sleep is insufficient. None of that matters if no one wants to work with you... the degree to which other people want to work with you is a direct indication on how successful you’ll be in your career as an engineer. Be the engineer that everyone wants to work with."
- "Mature engineers lift the skills and expertise of those around them."
- "In any project, the designers, product managers, operations engineers, developers, and business development folks all have goals and perspectives, and mature engineers realize that those goals and views may be different."
- "[Will ask that] [e]ven if it’s technically sound, is it understandable enough for the rest of the organization to operate, troubleshoot, and extend it?"
I think the real skill we want is not automation, but empathy (see: http://chadfowler.com/blog/2014/01/19/empathy/).
What tools/training do you suggest a more traditional systems administrator acquire in order to keep his/her skills relevant (assuming systems automation will increasingly become the norm)?
A starting skillset would be your normal sysadmin skills, plus some basic scripting and the ability to articulate how the web works (describe, step by step, what happens when you point your browser to shutterstock.com). Beyond that, the problems we face and solutions we need to employ change rapidly. An engineer's ability to rapidly assess needs, acquire knowledge, and make that knowledge applicable is the measure of how successful they will be. Moreover, the ability to pivot off of that approach or technology and try another paradigm is paramount. Sort of the Bruce Lee, “no way as way” approach. Be like water. I think those who wish to transition from more traditional sysadmin roles should focus on becoming a mature engineer, as stated above, and on their google-fu. Attitude and approach are the biggest things; skills and technologies can be learned.
Which engineering teams/individuals do you watch for cutting-edge tips, tools, and architectures when it comes to systems automation?
Who to follow on Twitter:
- Andrew Clay Shafer: @littleidea
- Jez Humble: @jezhumble
- Mark Imbriaco: @markimbriaco
- Theo Schlossnagle: @postwait