
Understanding DORA metrics

How does your DevOps team stack up against competitors? Learn about DORA metrics, what they can reveal about your team’s performance, and how they can help you make informed business decisions.

By Cortex - January 26, 2022

There is almost nothing more valuable to an organization than data, especially data about itself. Without measuring performance over time, you can’t make informed decisions about how to improve or where to devote resources. A clear picture of how your development teams are operating will enable you to maximize your value stream and deliver the best possible product to end users.

After years of research, Google’s DevOps Research and Assessment (DORA) team identified four key metrics for evaluating your team’s performance:

  • Lead Time for Changes
  • Deployment Frequency 
  • Mean Time to Recovery 
  • Change Failure Rate 

DORA metrics have become the standard for gauging the efficacy of your software development teams, and can provide crucial insights into areas for growth. These metrics are essential for organizations looking to modernize, as well as those looking to gain an edge against competitors. Below, we’ll dive into each one, and discuss what these metrics can reveal about your development teams.

Lead Time for Changes

Lead Time for Changes (LTC) is the amount of time between a commit and production. LTC indicates how agile your team is—it not only tells you how long it takes to implement changes, but how responsive your team is to the ever-evolving needs of users. The DORA team identified these benchmarks for performance in their Accelerate State of DevOps 2021 report:

  • Elite Performers: <1 hour
  • High Performers: 1 day to 1 week
  • Medium Performers: 1 month to 6 months
  • Low Performers: 6+ months
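
To make the metric concrete, here is a minimal sketch (in Python) of computing LTC from deployment records. The data structure is hypothetical, standing in for whatever your CI/CD tooling exports:

    # A minimal sketch of computing Lead Time for Changes. Each entry
    # pairs a commit timestamp with the moment it reached production;
    # the records here are hypothetical.
    from datetime import datetime, timedelta
    from statistics import median

    deployments = [
        {"committed_at": datetime(2022, 1, 3, 9, 15), "deployed_at": datetime(2022, 1, 3, 16, 40)},
        {"committed_at": datetime(2022, 1, 4, 11, 0), "deployed_at": datetime(2022, 1, 6, 10, 30)},
    ]

    lead_times = [d["deployed_at"] - d["committed_at"] for d in deployments]
    typical = median(lead_times)  # the median resists outliers better than the mean
    print(f"Typical lead time: {typical}")  # under 1 hour would be elite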

LTC can reveal symptoms of poor DevOps practices: if it's taking your team weeks or months to release code into production, there are inefficiencies in your process. You can minimize your LTC through continuous integration and continuous delivery (CI/CD): ensure testers and developers work closely together, so everyone has a comprehensive understanding of the software. You should also consider building automated tests to save even more time and improve your CI/CD pipeline.

Because there are a number of phases between the initiation and deployment of a change, it’s smart to define each step of your process and track how long each takes. Examine your cycle time for a thorough picture of how your team is functioning and further insight into exactly where they can save time.
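
As a rough illustration of that advice, the sketch below breaks a single change's lead time into phases. The stage names and timestamps are invented; substitute the steps of your own pipeline:

    # A sketch of breaking one change's lead time into phases to see
    # where time is spent. All checkpoints here are illustrative.
    from datetime import datetime

    checkpoints = [
        ("committed",      datetime(2022, 1, 3, 9, 15)),
        ("build finished", datetime(2022, 1, 3, 9, 40)),
        ("tests passed",   datetime(2022, 1, 3, 11, 5)),
        ("in production",  datetime(2022, 1, 3, 16, 40)),
    ]

    for (stage, start), (next_stage, end) in zip(checkpoints, checkpoints[1:]):
        print(f"{stage} -> {next_stage}: {end - start}")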

Be careful not to let the quality of your software delivery suffer in a quest for quicker changes. While a low LTC may indicate that your team is efficient, if they can’t support the changes they’re implementing, or if they’re moving at an unsustainable pace, you risk sacrificing the user experience. Rather than compare your team’s Lead Time for Changes to other teams’ or organizations’ LTC, you should evaluate this metric over time, and consider it as an indication of growth (or stagnancy). 

Deployment Frequency

Deployment Frequency (DF) measures how often you ship changes and how consistent your software delivery is. This metric is particularly useful for determining whether your team is meeting goals for continuous delivery. According to the DORA team, these are the benchmarks for Deployment Frequency:

  • Elite Performers: Multiple times a day
  • High Performers: Once a week to once a month
  • Medium Performers: Once a month to once every 6 months
  • Low Performers: Less than once every 6 months
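
A simple way to estimate DF is to count production deploys over a window. The sketch below uses hypothetical timestamps:

    # A minimal sketch of estimating Deployment Frequency from
    # production deploy dates (hypothetical data).
    from datetime import date

    deploys = [date(2022, 1, 3), date(2022, 1, 3), date(2022, 1, 5), date(2022, 1, 10)]

    window_days = (max(deploys) - min(deploys)).days + 1
    per_week = len(deploys) / window_days * 7
    print(f"{len(deploys)} deploys in {window_days} days (~{per_week:.1f}/week)")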

If your Deployment Frequency is low, it might reveal bottlenecks in your development process, or could indicate that projects are too complex. The best way to improve your DF is to ship small batches of changes, which has a few upsides: shipping often means you are constantly refining your service, and if there is a problem with your code, it's easier to find and remedy the issue.

If your team is large, this may not be a feasible option. Instead, you may consider building release trains and shipping at regular intervals. This approach will allow you to deploy more often without overwhelming your team members.

Mean Time to Recovery

Mean Time to Recovery (MTTR) is the average amount of time it takes your team to restore service when there’s a service disruption, like an outage. This metric offers a look into the stability of your software, as well as the agility of your team in the face of a challenge. These are the benchmarks identified in the State of DevOps report:

  • Elite Performers: <1 hour
  • High Performers: <1 day
  • Medium Performers: 1 day to 1 week
  • Low Performers: Over 6 months
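
Computing MTTR is a matter of averaging incident durations. The sketch below uses hypothetical records standing in for what an incident-management tool would export:

    # A minimal sketch of computing Mean Time to Recovery from incident
    # records. The fields are hypothetical; in practice they would come
    # from a tool such as PagerDuty or Opsgenie.
    from datetime import datetime, timedelta

    incidents = [
        {"started": datetime(2022, 1, 4, 2, 10), "resolved": datetime(2022, 1, 4, 2, 55)},
        {"started": datetime(2022, 1, 9, 14, 0), "resolved": datetime(2022, 1, 9, 17, 30)},
    ]

    downtimes = [i["resolved"] - i["started"] for i in incidents]
    mttr = sum(downtimes, timedelta()) / len(downtimes)
    print(f"MTTR: {mttr}")  # under 1 hour would put a team in the elite band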

To minimize the impact of degraded service on your value stream, there should be as little downtime as possible. If it’s taking your team more than a day to restore services, you should consider utilizing feature flags so you can quickly disable a change without causing too much disruption. If you ship in small batches, it should also be easier to discover and resolve problems.
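
As a minimal illustration of the feature-flag idea, the sketch below gates a hypothetical change behind a flag so it can be switched off without a redeploy. The flag store here is a plain dict; real systems read flags from a config service at runtime:

    # A minimal feature-flag sketch: the flag names and checkout
    # functions are hypothetical.
    FLAGS = {"new_checkout_flow": True}

    def legacy_checkout(cart):
        return f"legacy checkout of {len(cart)} items"

    def new_checkout(cart):
        return f"new checkout of {len(cart)} items"

    def checkout(cart):
        if FLAGS.get("new_checkout_flow"):
            return new_checkout(cart)   # the change under evaluation
        return legacy_checkout(cart)    # known-good fallback

    print(checkout(["book", "lamp"]))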

Although Mean Time to Discover (MTTD) is different from Mean Time to Recovery, the amount of time it takes your team to detect an issue will impact your MTTR—the faster your team can spot an issue, the more quickly service can be restored.

Just like with Lead Time for Changes, you don’t want to implement hasty changes at the expense of a quality solution. Rather than deploy a quick fix, make sure that the change you’re shipping is durable and comprehensive. You should track MTTR over time to see how your team is improving, and aim for steady, stable growth. 

Change Failure Rate

Change Failure Rate (CFR) is the percentage of releases that result in downtime, degraded service, or rollbacks, which can tell you how effective your team is at implementing changes. As the benchmarks below show, there is little distinction between performance levels for CFR:

  • Elite Performers: 0-15%
  • High, Medium, and Low Performers: 16-30%
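
CFR itself is a simple ratio. The sketch below computes it from hypothetical deployment records flagged as failed or not:

    # A minimal sketch of Change Failure Rate: the share of deployments
    # that led to degraded service or a rollback. The failure flags are
    # hypothetical and would come from your deployment or incident records.
    deployments = [
        {"id": 1, "failed": False},
        {"id": 2, "failed": True},
        {"id": 3, "failed": False},
        {"id": 4, "failed": False},
    ]

    cfr = sum(d["failed"] for d in deployments) / len(deployments) * 100
    print(f"CFR: {cfr:.0f}%")  # 0-15% is the elite band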

Change Failure Rate is a particularly valuable metric because it can prevent your team from being misled by the total number of failures you encounter. Teams that aren't implementing many changes will see fewer failures, but that doesn't necessarily mean they're more successful with the changes they do deploy. Those following CI/CD practices may see a higher number of failures, but if CFR is low, then these teams will have an edge because of the speed of their deployments and their overall rate of success.

This rate can also have significant implications for your value stream: it can indicate how much time is spent remedying problems instead of developing new projects. Because high, medium, and low performers all fall within the same range, it’s best to set goals based on your team and your business, rather than compare to other organizations.

Putting it all together

As with any data, DORA metrics need context, and you should consider the story that all four of these metrics tell together. Lead Time for Changes and Deployment Frequency provide insight into the velocity of your team and how quickly they respond to the ever-changing needs of users. Mean Time to Recovery and Change Failure Rate, on the other hand, indicate the stability of your service and how responsive your team is to service outages or failures.

If you compare all four key metrics, you can evaluate how well your organization is balancing speed and stability. If your LTC is within a week, and you’re deploying at least once a week, but your Change Failure Rate is high, then teams may be rushing out changes before they’re ready, or they may not be able to support the changes they’re deploying. If you’re deploying once a month, on the other hand, and your MTTR and CFR are high, then your team may be spending more time correcting code than improving your product.
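
As a rough sketch of this combined reading, the function below classifies a team from all four numbers. The thresholds are illustrative, not DORA's:

    # A rough sketch of reading the four metrics together, following
    # the interpretation above; the cutoffs are illustrative only.
    from datetime import timedelta

    def speed_vs_stability(ltc, deploys_per_week, mttr, cfr_pct):
        fast = ltc <= timedelta(weeks=1) and deploys_per_week >= 1
        stable = mttr <= timedelta(days=1) and cfr_pct <= 15
        if fast and not stable:
            return "shipping quickly, but changes may be rushed or unsupported"
        if stable and not fast:
            return "stable, but delivery may be bottlenecked"
        return "balanced" if fast else "needs work on both speed and stability"

    print(speed_vs_stability(timedelta(days=2), 3, timedelta(hours=4), 8.0))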

Because DORA metrics provide a high-level view of your team’s performance, they can be particularly useful for organizations trying to modernize—DORA metrics can help you identify exactly where and how to improve. Over time, you can see how your teams have grown, and which areas have been more stubborn.

Those who fall into the elite category can leverage DORA metrics to continue improving services and to gain an edge over competitors. As the State of DevOps report reveals, the group of elite performers is growing rapidly (from 7% in 2018 to 26% in 2021), so even top teams have reason to keep a close eye on these metrics.

At the end of the day, data will only get you so far. To get the most out of DORA metrics, you have to know your organization and your teams, and use all of that knowledge to guide your goals and how resources are invested.

Start tracking DORA with Cortex today

Out of the box, Scorecards feature an array of integrations and customizations that allow you to see DORA metrics and observe them over time. You can track Lead Time for Changes by monitoring Jira tickets, and by integrating PagerDuty or Opsgenie, you can evaluate your Mean Time to Recovery. These are just a handful of options: with Scorecards, you can not only track DORA metrics, but also set goals to improve DevOps practices and hold your team accountable.
