Building a DORA metrics scorecard

There are a lot of ways to gauge the performance of your DevOps teams and the health of your software, but DORA metrics have emerged as the industry standard. If you aren’t familiar with DORA metrics, take a few minutes to read this comprehensive guide to understanding DORA metrics.

DORA metrics were designed to offer a high-level, long-term view of how your teams are performing. They aren't meant just to be tracked, though; they should actively encourage developers to improve their processes. It's only natural, then, that as a platform built to promote a culture of accountability, Cortex is uniquely equipped to help you make the most of DORA metrics.

Through the use of CQL expressions and integrations, you’ll be able to easily build a Scorecard that helps your teams reach the goals that make sense for them. In this article, we’ll be looking at a sample scorecard called DORA Metrics. Here’s a look at its details page:

Deployment frequency

Deployment frequency is exactly what it sounds like — this metric measures how frequently your teams are shipping changes. To track deployment frequency in Cortex, you’ll first need to integrate with your deployment pipeline. This way, every time a deploy happens, it’s tracked within Cortex. 

You can access all of a service’s deploys within its homepage in Cortex, so in addition to determining your deployment frequency, you get your whole event timeline in one place:

Once your deployment pipeline is hooked up to Cortex, you can begin querying against this data and creating scorecard rules. Keep in mind that DORA metrics are designed to be evaluated over time — it’s not about hitting a goal once, but consistently maintaining a certain level of service. When you’re crafting your rules, you want to set an attainable goal that will motivate team members to improve.

In our example, the deployment rule is Averaging at least one deploy a day in the last 7 days — this window of time is long enough to be meaningful, and recent enough that it accurately reflects current performance. Because this metric relies on custom data, we’ll use the following CQL expression to set this rule:

deploys(lookback=duration("P7D"), types=["DEPLOY"]).count >= 7

By creating a scorecard rule, you’re not just tracking this invaluable data — you’re setting standards for your organization and giving your developers a clear goal to work toward. With all of these metrics, it’s crucial that you consider your unique teams and product, and set a goal that makes sense for your organization.
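To make the rule's logic concrete, here is a minimal Python sketch of the same check performed outside Cortex. The list-of-timestamps event shape is a simplification for illustration, not Cortex's actual data model:

```python
from datetime import datetime, timedelta

def meets_deploy_frequency(deploy_times, lookback_days=7, min_per_day=1.0):
    """Return True if deploys in the lookback window average at least
    `min_per_day` per day (mirroring the "P7D" / count >= 7 rule)."""
    cutoff = datetime.utcnow() - timedelta(days=lookback_days)
    recent = [t for t in deploy_times if t >= cutoff]
    return len(recent) >= min_per_day * lookback_days

# Example: 8 deploys in the last week satisfies the rule.
now = datetime.utcnow()
deploys = [now - timedelta(days=d, hours=3) for d in range(7)]
deploys.append(now - timedelta(hours=1))
print(meets_deploy_frequency(deploys))  # True
```

Adjusting `min_per_day` or `lookback_days` is the code-level equivalent of tuning the rule's threshold to fit your organization.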

Mean time to recovery

Mean time to recovery (MTTR) is the average amount of time it takes your team to restore service when there’s an incident. To track MTTR in Cortex, you’ll want to integrate with OpsGenie or PagerDuty. PagerDuty comes with a lot of the necessary analytics out of the box, so there’s very little you need to do to set up this rule.

Using PagerDuty, you can set rules for mean time to acknowledge (MTTA) — how long it takes your team to acknowledge that an incident has arisen — and MTTR, without using CQL. Just select PagerDuty as your integration and Oncall mean seconds to resolve, and enter the appropriate information in the fields that follow:

In our sample, the MTTR rule is Incident was resolved within 1 hour, and is examining the average over the last week. We’d enter the following values into the above fields: P7D for duration, < for operation, and 1.0 for the number. 

Just like with deployment frequency, you want a lookback window that is long enough to be meaningful, recent enough to reflect current performance, and appropriate for your organization.
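As a back-of-the-envelope illustration (this is not the PagerDuty API, just a sketch of the arithmetic), MTTR is the average resolve duration over the window:

```python
from datetime import datetime, timedelta

def mttr_hours(incidents, lookback_days=7):
    """Mean time to recovery, in hours, over the lookback window.
    Each incident is a (started_at, resolved_at) datetime pair."""
    cutoff = datetime.utcnow() - timedelta(days=lookback_days)
    durations = [
        (resolved - started).total_seconds() / 3600
        for started, resolved in incidents
        if resolved >= cutoff
    ]
    return sum(durations) / len(durations) if durations else 0.0

# The sample rule passes when the weekly average is under 1 hour,
# i.e. mttr_hours(incidents) < 1.0.
```

In Cortex, PagerDuty supplies these numbers for you; the sketch only shows what the `P7D` / `<` / `1.0` fields are computing behind the scenes.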

Change failure rate

Change failure rate represents the percent of changes or releases that result in downtime, degraded service, or rollbacks. With your deployment data in Cortex, it’s easy to find the ratio of rollbacks to deploys, which is your change failure rate. 

In this example, our change failure rule is Ratio of rollbacks to deploys in the last 7 days and our goal is 0%. Just like with deployment frequency, this metric is using custom data, so we need a CQL expression to set this rule:

(deploys(lookback=duration("P7D"),types=["ROLLBACK"]).count/deploys(lookback=duration("P7D"),types=["DEPLOY"]).count) = 0

Because you also have your whole event timeline in Cortex, you can filter on specific types of rollbacks and deploys to get a more granular look. For example, you can tag events to distinguish between automated and failure-based rollbacks and filter your query accordingly. 
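The ratio itself is simple division over tagged events. Here is a minimal Python sketch, using a plain list of event-type strings as a stand-in for Cortex's deploy data:

```python
def change_failure_rate(events):
    """Rollbacks divided by deploys over a window of tagged events.
    `events` is a list of type strings such as "DEPLOY" or "ROLLBACK"."""
    deploy_count = events.count("DEPLOY")
    rollback_count = events.count("ROLLBACK")
    return rollback_count / deploy_count if deploy_count else 0.0

# Example: 1 rollback against 10 deploys is a 10% change failure rate.
events = ["DEPLOY"] * 10 + ["ROLLBACK"]
print(change_failure_rate(events))  # 0.1
```

Note the guard against dividing by zero when there are no deploys in the window, an edge case worth considering when you set the rule's passing condition.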

With this metric in particular, consider whether you want to set a specific goal or promote a particular outcome. Promoting a goal of 0% failure may lead developers to work more slowly or release fewer changes, so you may incentivize the wrong behavior in the process. Change failures are inevitable, so rather than set a goal of 0% like we did, you might evaluate how many of your rollbacks were automated and how many were manual. If the majority of your rollbacks are automated, that might indicate your teams are leveraging good tools that quickly detect outages, roll back changes, and improve your MTTR in the process.

Lead time for changes

Lead time for changes measures the amount of time between a commit and production. As with deployment frequency and change failure rate, you'll use deploy data to track this metric in Cortex.

To get lead time for changes, you can push this data into Cortex with your deploys by including metadata about when each pull request was first opened. You can then query for the deploys that took more than a week to reach production, for example. By tagging your deploys with that event data, you get more granular insight into your services.

Using ladders to set goals

The DORA team conducted a cluster analysis to identify four industry-level performance categories: elite, high, medium, and low. In the Accelerate State of DevOps Report from 2021, the DORA team outlined how teams at each level perform under each metric. For example, when it comes to MTTR, elite teams restore service in less than an hour on average, while medium performers take anywhere from a day to a week to restore service.

If the industry standards that the DORA team outlined make sense for your DevOps team, then you can easily use Cortex to encourage your team to advance from low- or medium-performers to high- or elite-performers. If these standards don’t quite apply to your team, you can use Cortex in the same way, but set goals that are appropriate for your team.

To promote movement toward a goal, you can iterate the same rule with different thresholds and use ladders to represent that progress. This not only makes it super clear to developers what the priorities are, but it gamifies the process of improvement.

Let's say, for example, that your team's current change failure rate is 15%. That's low enough that your DevOps team is considered an elite performer by DORA standards, but you know that your teams can do better. To target this metric, begin by writing a rule that sets a goal of 10%:

Then, create another rule that sets the threshold to 5% instead of 10%:

You can repeat this process until you’ve added a rule that represents each level of improvement. Once that’s done, it’s time to create a ladder. In this example, the first level has our 10% goal and the second has our 5% goal:
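The ladder logic amounts to checking a rate against a descending series of thresholds. This minimal Python sketch (an illustration of the concept, not how Cortex evaluates ladders internally) shows the idea:

```python
def ladder_level(change_failure_rate, thresholds=(0.10, 0.05)):
    """Return the highest ladder level whose threshold the rate meets.
    With the defaults, level 1 is the 10% goal and level 2 is the
    stricter 5% goal; 0 means neither goal is met yet."""
    level = 0
    for i, bar in enumerate(thresholds, start=1):
        if change_failure_rate <= bar:
            level = i
    return level

# A team at 8% has cleared the first rung but not the second.
print(ladder_level(0.08))  # 1
```

Each added rule is just another entry in `thresholds`, which is what makes the ladder easy to extend as teams improve.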

Rules and ladders give you even more power to incentivize the right behavior and encourage progress over time. Because these features are completely customizable, you don’t have to stick with the industry standard. With Cortex, you can track DORA metrics in a way that’s actually meaningful to your organization and sets actionable goals, so you’re not idly monitoring metrics without taking the right steps to improve performance.

Start using Cortex today

This is just the beginning of what Cortex can do to unlock DORA metrics for your organization.

With the power of reporting, you can gain even clearer insight into performance across your organization. By grouping the Bird’s Eye report by team, you can quickly see which teams are meeting standards and which ones aren’t. With your rules organized as columns, you can easily determine where teams are above or below the bar.

This kind of data aggregation makes it easy for engineering leaders to drive progress across all four DORA metrics, along with every other metric that's key for your organization. Out of the box, Cortex has the integrations and tools you need to track DORA metrics at your organization and make the most of this invaluable insight.

Start using Cortex to gain the visibility you need into your services so you can deliver the highest quality software your teams are capable of. Book a demo today so you can see the power of Cortex for yourself.
