
Entries in Patterns (10)

Sunday, March 18, 2012

DevOps Lessons from Lean: Small Batches Improve Flow

DevOps problems are fundamentally flow problems. Work doesn't flow properly from one end of the lifecycle (Dev) to the other end of the lifecycle (Ops).

While spirited discussions on tools are a regular occurrence in DevOps circles, there are other simple, yet profound, techniques that have nothing to do with technology but have proven to have a huge impact on improving flow.

Top of that list? Work in small batches.

It seems so simple that it couldn't possibly make that big of a difference, but it does. And there is historical precedent for it as well. The principle of working in small batches has proved its merit in Agile software development and, on an even larger stage, during the manufacturing revolutions of the 1970s and 1980s.

The reasons why working in small batches has such a strong net positive impact on flow might seem a bit counterintuitive at first. Rather than rely on "because I told you so", below are the best explanations I could find as to why this works.

 

What is a "batch size"?
A batch is the unit of work that passes from one stage to the next stage in a process. The batch size is the scale of that work product.

 

What are the benefits of reducing batch sizes?

Reduces cycle time and gets you quicker feedback - With a small batch size, each batch makes it through the full lifecycle quicker. Since work on a feature isn't complete until it is successfully running in production and getting feedback from users, large batch sizes simply delay that feedback. This means the larger the batch the longer you wait to find out if you did it right. It's easier to make business and technical decisions and easier to recover from a mistake if you are working on shorter time horizons.
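To make that concrete, here is a minimal sketch (with hypothetical numbers) of how batch size stretches the wait for feedback, using Little's Law from the Lean flow literature (average cycle time = average work-in-progress ÷ average throughput):

```python
# A minimal sketch, assuming work items accumulate until a batch is
# full and then ship together; under that assumption the average
# work-in-progress is roughly half the batch size.

def average_cycle_time(batch_size, throughput_per_day):
    """Little's Law: cycle time = WIP / throughput."""
    avg_wip = batch_size / 2
    return avg_wip / throughput_per_day

for batch_size in (1, 10, 50):
    days = average_cycle_time(batch_size, throughput_per_day=1.0)
    print(f"batch of {batch_size:>2} -> ~{days:4.1f} days until feedback")
```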

Reduces risk of an error or outage - With a small batch size, you are reducing the amount of complexity that has to be dealt with at any one time by the people working on the batch. The reduction in complexity comes not only from the number and size of the moving parts that are touched while working on the batch, but also from the amount of person-to-person communication that needs to happen (due to smaller teams). This is just acknowledging the natural limitations of human beings: the more complexity people have to deal with, the more mistakes there will be. A smaller batch size also leads to quicker feedback, so if there is an error in the batch it will be caught sooner. A small batch size lends itself to quicker problem detection and resolution, since the field of focus in addressing the problem is contained to the footprint of that small batch and the work is still fresh in everyone's mind.

Reduces product risk - This builds on the idea of faster feedback. The sooner you can put an individual feature in front of your target audience, the sooner you will know if you've achieved the right product and market fit. The larger the batch size, the greater the product risk when you finally release that batch. Probability shows us that it's beneficial to decompose a large risk into a series of small risks. For example, bet all of your money on a single coin flip and you have a 50% chance of losing all of your money. Break that bet into 4 smaller bets and it would take 4 sequential losses to result in financial ruin (a 1 in 16, or 6.25%, chance of losing all of your money).
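The coin-flip arithmetic above is easy to verify; a quick sketch:

```python
# Ruin requires losing every one of n sequential even-odds bets,
# so the probability is (1/2)^n for independent flips.

def probability_of_ruin(n_bets, p_loss=0.5):
    """Chance of losing all n sequential bets."""
    return p_loss ** n_bets

print(probability_of_ruin(1))  # 0.5    -> 50% on a single all-in bet
print(probability_of_ruin(4))  # 0.0625 -> 1 in 16, or 6.25%
```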

Large batch sizes also often lead to compounding schedule delays and cost overruns. The larger the batch, the more likely it is that a mistake was made in estimating or during the work itself. The chance and potential impact of these mistakes compound as the batch size grows, increasing the delay in getting that all-important feedback from users and increasing your product risk.

Improves efficiency and lowers overhead - Conventional wisdom holds that large batches allow greater productivity (i.e. you get more done with large uninterrupted periods of work) and lower overhead (fewer batches = lower transaction costs). As has been proven in the manufacturing world (Lean) and now in software development (Agile), this simply isn't the case. The larger the scope of the batch, the more complexity the individual has to deal with. The complexity of a debug task grows as 2ⁿ when n things are changed in one batch. In knowledge work, the larger the uninterrupted period of work, the greater the change complexity, the greater the volume of debug work, and the greater the handoff complexity. That is all added overhead. But even assuming the individual was still being more efficient by working in a large batch, you would still be creating greater inefficiency for the end-to-end process.
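A minimal sketch of where that 2ⁿ comes from: when a batch contains n changes, any non-empty subset of them could be the interaction that causes a failure, so the space of candidate causes roughly doubles with every change added.

```python
# Enumerate every non-empty subset of the changes in a batch: each one
# is a candidate cause a debugger may have to consider (2^n - 1 total).

from itertools import chain, combinations

def candidate_causes(changes):
    return list(chain.from_iterable(
        combinations(changes, k) for k in range(1, len(changes) + 1)))

for n in (1, 5, 10):
    changes = [f"change-{i}" for i in range(n)]
    print(f"{n:>2} changes -> {len(candidate_causes(changes)):>4} candidate causes")
# 1 change   ->    1
# 5 changes  ->   31
# 10 changes -> 1023
```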

For a large batch of changes, especially those made to an even larger system, the handoff to the next step in the process is going to be highly inefficient for the receiving party to deal with (think: Development to Operations "toss it over the wall" handoff of a major release). And if something goes wrong, the time between when the error was introduced and when it is discovered is so long that it is no longer fresh in the mind of the person who introduced it. Small batches have also been proven to actually reduce transaction costs because of a curious fact of human nature: people get better at, and find ways to increasingly improve, the things they are forced to do more often.

Improves management visibility and control - Reducing batch sizes gives you a greater number of instrumentation points by which you can visualize and measure the flow of work through your organization. It's notoriously difficult to accurately determine progress of in-flight work. You are largely going to be limited to the subjective analysis of project managers and the biased opinion of the person doing the work. The only points where you can have certainty is either when the work has just started or when the work has just completed (and accepted by the next step in the process). With large batch sizes you have to wait long periods of time between those start and completion points, making it difficult to see how things are flowing, providing little guarantee that you will have adequate warning if things are going wrong, and allowing for few opportunities to make adjustments to optimize or triage. With small batch sizes you can see work move through the lifecycle with certainty, spot problems early, and make ongoing adjustments to optimize the flow of delivery.
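As an illustration of those instrumentation points, here is a minimal sketch (the event data is hypothetical) of the flow metrics that become measurable once work starts and completes frequently:

```python
# With small batches you get many start/complete events, so cycle time
# can be measured directly instead of estimated subjectively.

from datetime import datetime

completed_batches = [
    {"id": "feature-101", "started": datetime(2012, 3, 1), "done": datetime(2012, 3, 3)},
    {"id": "feature-102", "started": datetime(2012, 3, 2), "done": datetime(2012, 3, 6)},
    {"id": "feature-103", "started": datetime(2012, 3, 5), "done": datetime(2012, 3, 7)},
]

cycle_times = [(b["done"] - b["started"]).days for b in completed_batches]
print(f"average cycle time: {sum(cycle_times) / len(cycle_times):.1f} days")
print(f"worst cycle time:   {max(cycle_times)} days")
```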

Encourages decoupled architectures with fewer dependency issues - Smaller batch sizes can also have a positive impact on architecture. Most IT systems are built within the context of large projects. Large projects create them and then large projects are undertaken to change them. The result is a built-in tolerance for monolithic architectures with complex dependencies. As you move to small batch sizes you naturally limit the work in progress on a particular segment of your code/infrastructure. While initially this might seem like it will slow the organization down, the principles of flow show that it will actually give you greater throughput over time. But in order to speed things up even further, you will end up looking for ways to increasingly decouple and isolate (including making fault tolerant) your architecture to allow for greater parallelization of work.

 

What are the economic benefits of reducing batch size?
In manufacturing and in software development, reducing batch sizes has been shown to have a significant impact on the economics of the production process. The diagram below (scanned from Donald G. Reinertsen's "The Principles of Product Development Flow", pg 121) lays out the direct links between smaller batch sizes and improved economics. I think the logic speaks for itself.

 

 

What are your control points for reducing batch sizes?
Reducing batch sizes is a policy decision that needs to be implemented at multiple levels: 

Project Initiation and Funding - How projects are formed and funded tends to have a strong correlation to batch size. The definition of requirements and success criteria, in addition to the allocation of budget, is usually done in one large batch that corresponds to a business goal, or set of goals, defined on a quarterly or yearly scale. The inertia of this large batch is often carried throughout the rest of the lifecycle, becoming a pacemaker of sorts that encourages large batch sizes. Work done to break down these large initial batches into smaller batches can turn that inertia into a net positive effect for the company. Reducing the time horizon for a project's expected results is usually a good way to force the issue (e.g. try scoping and budgeting projects to a single-month size rather than quarter or multi-quarter size).

Project management - When creating projects, consider the smallest amount of change that can be undertaken in the shortest amount of time while still achieving a measurable result. This will naturally lead to smaller teams working on smaller batches of work that can flow independently through the lifecycle with faster feedback and lower risk to the overall system.

Testing - Demand that individual pieces of work are tested as soon as those pieces of work are completed (rather than waiting for the entire project/release to be code complete). Continuous integration, with its built-in unit/smoke tests, is a crude example of this principle. Carry it further: ensure that full deployment and testing efforts are ongoing during any project. This will automatically force engineers to think about their work in small units that can be completed and handed off for testing at regular intervals (naturally creating the urge to reduce batch sizes), as in the sketch below.
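A minimal sketch of that idea: a smoke test that runs the moment a unit of work is deployed, rather than at the end of the release (the service URL and health endpoint are hypothetical placeholders):

```python
# Run with pytest as soon as a unit of work lands in an integration
# environment; fail fast if the freshly deployed service isn't responding.

import urllib.request

def test_service_is_up():
    url = "http://staging.example.com/health"  # hypothetical endpoint
    with urllib.request.urlopen(url) as resp:
        assert resp.status == 200
```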

Release management - Break down large releases into small units of deployment that employ standardized packaging and configuration management mechanisms. These units of deployment should be aligned with the things that are changed (i.e. application services) rather than large project releases that change many things. In addition to reducing deployment and configuration woes, this also has the effect of standardizing batch sizes across the lifecycle by determining the appropriate unit of change for your infrastructure.
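As a sketch of what a standardized unit of deployment might look like, here is a small script that builds one versioned artifact per application service (the service name and paths are hypothetical):

```python
# Package a single service as a versioned tarball: a standardized,
# deployable artifact aligned with the thing that changed.

import tarfile

def package_service(name, version, build_dir):
    """Create a deployable, versioned archive for one service."""
    artifact = f"{name}-{version}.tar.gz"
    with tarfile.open(artifact, "w:gz") as tar:
        tar.add(build_dir, arcname=f"{name}-{version}")
    return artifact

print(package_service("billing-service", "1.4.2", "build/billing-service"))
```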

 

I'm standing on the shoulders of people a lot smarter than me in this post. If you are interested in these ideas, please check out:
http://www.amazon.com/Principles-Product-Development-Flow-Generation/dp/1935401009/ref=cm_cr_pr_product_top
http://www.startuplessonslearned.com/2009/02/work-in-small-batches.html
http://www.dbrmfg.co.nz/Production%20Batch%20Issues.htm
http://www.informit.com/articles/article.aspx?p=1833567&seqNum=3

Tuesday, May 18, 2010

Panel discussion from OpsCamp San Francisco (video)

My fellow dev2ops.org contributor, Lee Thompson, and I were asked to be a part of an impromptu panel discussion that kicked off OpsCamp San Francisco.

The result was an interesting and organic conversation that ranged from "DevOps", to "repository and dependency management", to "security is not compliance", to "managing multiple data centers", and on to a variety of other topics in the span of 24 minutes.

Luckily, we got it all on video.  (Special thanks to Erica Brescia from Bitnami for doing the camera work!)

OpsCamp San Francisco 2010 Panel Discussion from dev2ops.org on Vimeo.

 

Thursday, March 18, 2010

DevOps Toolchain project announced at O'Reilly's Velocity online conference 

If you are the type who gets distracted at work while trying to stay plugged into the industry, yesterday was a big big problem.  In Austin, you had SXSW going on; in San Francisco, you had OSBC; in San Jose you had Cloud Connect; and on the internet you had the O'Reilly Velocity Online Conference.  Wow!

The dev2ops guys were busy.  Damon and Alex were presenting at Cloud Connect while I was presenting at Velocity OLC.  I'm an Austin resident, but SXSW really isn't the DevOps hang-out, at least yet! (heh). 

At Velocity, it was my privilege to announce the next generation of the provisioning toolchain project. Some of the feedback we received on the original toolchain paper came from the front lines of DevOps: "yeah, that's pretty interesting, but there is a lot more to a datacenter than just provisioning". Good point.

So we scope creeped the hell out of the automated provisioning paper and started the devops-toolchain project, dedicated to defining best practices in DevOps and the open source tools available to accomplish those practices.

 

So this time, the devops-toolchain project is an open source, community-driven project, which by its nature will need to be revved frequently due to the constantly shifting nature of "best practices". We've kick-started some of the content at http://code.google.com/p/devops-toolchain/ and formed a Google Group for the discussion at http://groups.google.com/group/devops-toolchain. Come join the conversation!

Here are the slides from my presentation:

 

 

The Velocity team did a great job hosting the conference! An example of the great content presented came from Ward Spangenberg of Zynga, who updated us on the latest in security for cloud deployments. Getting security worked out gets more compute into the cloud:

 

I'm an OSBC alumnus. If you're into vintage conferences, or need a way to get over insomnia, check this out from 2007...

Thursday, February 18, 2010

Deployment management design patterns for DevOps

If you are an application developer you are probably accustomed to drawing from established design patterns. A system of design patterns can play the role of a playbook, offering solutions based on combining complementary approaches. Awareness of design anti-patterns can also be helpful in avoiding future problems arising from typical pitfalls. Ideally, design patterns can be composed together to form new solutions. Patterns can also provide an effective vocabulary for architects, developers, and administrators to discuss problems and weigh possible solutions.

It's a topic I have discussed before, but what happens once the application code is completed and must run in integrated operational environments? For companies that run their business over the web, the act of deploying, configuring, and operating applications is arguably as important as writing the software itself. If an organization cannot efficiently and reliably deploy and operate the software, it won't matter how good the application software is.

But where are the design patterns embodying best practices for managing software operations? Where is the catalog of design patterns for managing software deployments? What is needed is a set of design patterns for managing the operation of a software system in the large. Design patterns like these would be useful to those who automate any of these tasks and would help the tool developers who have adopted the "infrastructure as code" philosophy.

So what are typical design problems in the world of software operation?

The challenges faced by software operations groups include:

  • Application deployments are complex: they are based on technologies from different vendors, are spread out over numerous machines in multiple environments, use different architectures, and are arranged in different topologies.
  • Management interfaces are inconsistent: every application component and supporting piece of infrastructure has a different way of being managed. This includes both how components are controlled and how they are configured.
  • Administrative management is hard to scale: as the layers of software components increase, so does the difficulty of coordinating actions across them. This is especially difficult when the same application can be set up to run in a minimal footprint while another instance is designed to support massive load.
  • Infrastructure size differences: software deployments must run in different-sized environments. Infrastructure used for early integration testing is smaller than that supporting production. Infrastructure based on virtualization platforms also introduces the possibility of environments that can be re-scaled based on capacity needs.

Facing these challenges first hand, I have evolved a set of deployment management design patterns using a "divide and conquer" strategy. This strategy helps identify minimal domain-specific solutions (i.e., the patterns) and how to combine them in different contexts (i.e., using the patterns systematically). The set of design patterns also includes anti-patterns. I call the system of design patterns "PAGODA". The name is really not important, but as an acronym it can mean:

  • PAtterns GOod-for Deployment Administration
  • PAckaGe-Oriented Deployment Administration
  • Patterns for Application and General Operation for Deployment Administrators
  • Patterns for Applications, Operations, and Deployment Administration

Pagoda as an acronym might be a bit of a stretch, but the image of a pagoda strikes me as a picture of how the set of patterns can be combined to form a layered structure.

 

Here is a diagram of the set of design patterns arranged by how they interrelate.

 

The diagram style is inspired by a great reference book, Release It!. You can see the anti-patterns are colored red while the design patterns that mitigate them are in green.

Here is a brief description of each design pattern (a small code sketch of the Command Dispatcher pattern follows the descriptions):

Command Dispatcher
  Description: A mechanism used to look up and execute logically organized, named procedures within a data context, permitting environment abstraction within the implementations.
  Mitigates: Too Many Tools
  Alternative names: Command Framework

Lifecycle
  Description: A formalized series of operational stages through which the resources comprising application software systems must pass.
  Mitigates: Control Hairball

Orchestrator
  Description: Encapsulates a multi-step activity that spans a set of administrative steps and/or other process workflows.
  Mitigates: Control Hairball, Too Many Cooks
  Alternative names: Process Workflow, Control Mediator

Composable Service
  Description: A set of independent deployments that can be assembled together to support new patterns of integrated software systems.
  Mitigates: Monolithic Environment
  Alternative names: Composable Deployments

Adaptive Deployment
  Description: Practice of using an environment-independent abstraction along with template-based automation that customizes software and configuration at deployment time.
  Mitigates: Control Hairball, Configuration Bird Nest, Unmet Integration
  Alternative names: Environment Adaption

Code-Data Split
  Description: Practice of separating the executable files (the product) from the environment-specific deployment files, such as configuration and data files, which facilitates product upgrade and co-resident deployments.
  Mitigates: Service Monolith
  Alternative names: Software-Instance Split

Packaged Artifact
  Description: A structured archive of files used for distributing any software release during the deployment process.
  Mitigates: Adhoc Release
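To make the Command Dispatcher description above less abstract, here is a minimal sketch: named procedures are looked up in a registry and executed within a data context, so each implementation can abstract over the environment (the registered commands are hypothetical examples):

```python
# A registry of named procedures plus a dispatch function: one
# consistent entry point instead of Too Many Tools.

commands = {}

def command(name):
    """Register a named procedure in the dispatcher's lookup table."""
    def register(fn):
        commands[name] = fn
        return fn
    return register

@command("deploy")
def deploy(context):
    print(f"deploying {context['service']} to {context['environment']}")

@command("restart")
def restart(context):
    print(f"restarting {context['service']} in {context['environment']}")

def dispatch(name, context):
    """Look up a command by name and run it within the given data context."""
    return commands[name](context)

dispatch("deploy", {"service": "billing", "environment": "staging"})
```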

The anti-patterns might be more interesting since they represent practices that have definite disadvantages:

Too Many Tools
  Description: Each technology and process activity needs its own tool, resulting in a multitude of syntaxes and semantics that must each be understood by the operator, and making automation across them difficult to achieve.
  Mitigated by: Command Dispatcher
  Alternative names: Tool Mishmash, Heterogeneous Interfaces

Too Many Cooks
  Description: A common infrastructure must be maintained by various disciplines, but each uses its own tools to effect change, increasing the chances of conflicts and overall negative effects.
  Mitigated by: Control Mediator
  Alternative names: Unmediated Action

Control Hairball
  Description: A process that spans activities across various tools and locations in the network is implemented in a single piece of code for convenience, but turns out to be very inflexible, opaque, and hard to maintain and modify.
  Mitigated by: Control Mediator, Adaptive Deployment, Workflow

Configuration Bird Nest
  Description: A network of circuitous indirections used to manage configuration, intertwined like a labyrinth of straw in a bird nest. People often construct a bird nest in order to provide a consistent location for an external dependency.
  Mitigated by: Environment Adaptation

Service Monolith
  Description: Complex integrated software systems end up being maintained as a single opaque mass, with no one understanding entirely how it was put together, what elements it comprises, or how they interact.
  Mitigated by: Code-Data Split, Composable Service
  Alternative names: House of Cards, Monolithic Environment

Adhoc Release
  Description: The lack of standard practice and distribution mechanisms for releasing application changes.
  Mitigated by: Packaged Artifact

 

Of course, this isn't the definitive set of deployment management patterns. No doubt you have discovered and developed your own. It is useful to identify and catalog them so they can be shared with others who will face scenarios already examined and resolved. Perhaps the set offered here will spur a greater effort.

 

 

Friday, September 25, 2009

"Stability anti-patterns" highlight importance of tracking non-functional requirements

Michael Nygard, author of the influential Release It!, delivered a fantastic speech at QCon London where he dives into the various flavors of "stability anti-patterns" that can (and probably will) plague your web operations.

 


The explicit lessons alone make the presentation worth watching. However, there is also a more subtle lesson delivered over the course of the talk. Whether it was intentional or not, Michael illustrates the importance (and difficulty) of tracking non-functional requirements across the application lifecycle.

 

Some of the knowledge of where your next failure could come from lives in Development. Some of it lives in Operations. Sometimes it even lives in the well-intentioned plans of your marketing department.

What are you doing about sharing and tracking that knowledge? If you are like most organizations, you are probably relying on tribal knowledge to avoid these pitfalls. Development handles what they know about, Operations handles what they know about, and everyone crosses their fingers and hopes it all works out. Unlike the well-understood processes and tooling used to track business requirements, non-functional requirements all too often fly under the radar or are afterthoughts to be handled during production deployment.