Best Practice

How do you measure software health?

Improving the health of your services, APIs, data pipelines, and other software leads to fewer outages and frees developer time for feature work. In the third installment of our Scorecards deep-dive, we're sharing how Cortex makes tracking your software health easy with Scorecards and Initiatives.

By Jeff Schnitter, November 1, 2023

Just like personal health, software health is best managed proactively so you can prevent issues before they occur and avoid costly, stressful outages. Cortex helps you track and improve the health of your software with Scorecards and Initiatives. Scorecards quantify software health by aggregating data from multiple sources to give you a continuous view into the health of your system. Initiatives use Scorecards to drive organizational improvement.

In this article, you’ll learn how to define software health, the subject of one of the first three Scorecards we recommend organizations create. You’ll also discover how Cortex can help you measure and improve software health before an incident happens.

What does software health mean?

Software health is a measure of how robust and reliable your applications, services, and other software are. For example, a healthy service can handle fluctuations in traffic and is unlikely to crash at 3 AM, whereas an unhealthy service might experience frequent problems as a result of things like poor testing, lack of security monitoring, and incomplete code coverage. Measuring and improving software health frees up more time for your developers to work on producing new value rather than troubleshooting issues in production.

The health of your software is influenced by your development practices. Static analysis, proactive monitoring of logs, and good testing practices all contribute to maintaining the health of your software.

How do you measure software health?

When measuring software health, look at two things: outcomes from running your software in production (e.g., number of open bugs), and the quality of your development practices (e.g., code coverage). Measuring outcomes shows how your software is doing right now. Tracking development practices helps you build healthier software in the future and can create a culture of accountability on your team.

Keeping both these categories in mind, below are some example metrics that you can import from your existing tooling to build your first software health Scorecard:

  • Number of open issues in your Git hosting provider or Jira: A high number of open bugs could indicate that your team is prioritizing new features too heavily over reliability.
  • Latency metrics: Latency spikes can indicate that your application is struggling with load, or that it has an unmitigated bug on a rare code path. Latency metrics can often predict future outages.
  • Static analysis frequency: Static analysis tools detect bugs that tests can miss and pinpoint bad practices in your code. That’s why it’s best practice to track how recently static analysis has been run for a given software asset. 
  • Uptime/SLOs: Unhealthy software will experience more outages and crashes than healthy software. Measuring uptime and tracking it against your service level objectives (SLOs) gives a direct view of software health.
  • Frequency of on-call issues: Frequent on-call issues are characteristic of unhealthy software. Cortex integrates with tools like PagerDuty to help you measure on-call metrics.
  • Code coverage: Code coverage measures the percentage of your codebase that is covered by tests. Tests can only catch bugs in the code paths that they cover, making high coverage important to software health.
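To make the two categories concrete, here is a minimal sketch of how the metrics above could be gathered into one per-service record and split into outcomes and development practices. The `HealthSnapshot` class, its field names, and the 30-day window are all hypothetical choices for illustration; they are not part of Cortex or of any tool mentioned in this article.

```python
from dataclasses import dataclass

MINUTES_IN_30_DAYS = 30 * 24 * 60  # assumed reporting window


@dataclass
class HealthSnapshot:
    """Hypothetical per-service metrics; every field name is illustrative."""
    open_bugs: int                   # from your issue tracker
    p99_latency_ms: float            # from your monitoring tool
    downtime_minutes_30d: float      # from your uptime monitor
    pages_30d: int                   # on-call pages in the window
    days_since_static_analysis: int  # from your static analysis tool
    covered_lines: int               # from your coverage report
    total_lines: int

    @property
    def uptime_pct(self) -> float:
        # Share of the 30-day window in which the service was up.
        return 100.0 * (1 - self.downtime_minutes_30d / MINUTES_IN_30_DAYS)

    @property
    def coverage_pct(self) -> float:
        # Guard against an empty codebase to avoid dividing by zero.
        if self.total_lines == 0:
            return 0.0
        return 100.0 * self.covered_lines / self.total_lines

    def by_category(self) -> dict:
        """Split metrics into production outcomes and development practices."""
        return {
            "outcomes": {
                "open_bugs": self.open_bugs,
                "p99_latency_ms": self.p99_latency_ms,
                "uptime_pct": self.uptime_pct,
                "pages_30d": self.pages_30d,
            },
            "practices": {
                "days_since_static_analysis": self.days_since_static_analysis,
                "coverage_pct": self.coverage_pct,
            },
        }
```

Keeping outcomes and practices in separate groups makes it easier to see whether a service is unhealthy today, at risk of becoming unhealthy, or both.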

Sample software health Scorecard

Collecting all these metrics can become a full-time job, a cost that outweighs the benefits for many teams. That’s where Cortex comes in. Cortex can pull data from 50+ integrations and even custom sources with the plugins feature. Once your data is in Cortex, Scorecards quantify your software health automatically.

Scorecards are made of custom rules that you define. Below is an example software health Scorecard that integrates with PagerDuty, Snyk, SonarQube, and Jira. The Scorecard rates each software asset as Bronze, Silver, or Gold to give you a quick visual indication of how healthy your software is.

Software health means different things to different organizations. No two organizations have exactly the same metric sources, SLOs, or business domain. Cortex Scorecards are fully configurable, meaning you can create the rules and tiers that work best for your team.
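The idea of tiered rules can be sketched in a few lines of Python. This is not Cortex's rule syntax or API; the metric names, thresholds, and the `rate` function are assumptions made up for this example, with each metric loosely matched to one of the integrations named above (Jira, Snyk, SonarQube, PagerDuty). An asset earns a tier only if it passes every rule at that tier and at all tiers below it.

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]
Rule = Tuple[str, Callable[[Metrics], bool]]

# Hypothetical rules and thresholds, ordered from lowest tier to highest.
LEVELS: List[Tuple[str, List[Rule]]] = [
    ("Bronze", [
        ("fewer than 20 open bugs", lambda m: m["open_bugs"] < 20),
        ("no critical vulnerabilities", lambda m: m["critical_vulns"] == 0),
    ]),
    ("Silver", [
        ("coverage of at least 60%", lambda m: m["coverage_pct"] >= 60),
        ("at most 10 pages in 30 days", lambda m: m["pages_30d"] <= 10),
    ]),
    ("Gold", [
        ("coverage of at least 80%", lambda m: m["coverage_pct"] >= 80),
        ("at most 2 pages in 30 days", lambda m: m["pages_30d"] <= 2),
    ]),
]


def rate(metrics: Metrics) -> str:
    """Return the highest tier whose rules, and all lower tiers' rules, pass."""
    achieved = "No level"
    for level_name, rules in LEVELS:
        if all(check(metrics) for _, check in rules):
            achieved = level_name
        else:
            break  # a failed tier blocks every tier above it
    return achieved


def failing_rules(metrics: Metrics) -> List[str]:
    """List the descriptions of every rule the asset does not yet pass."""
    return [desc for _, rules in LEVELS for desc, check in rules
            if not check(metrics)]


example = {"open_bugs": 5, "critical_vulns": 0, "coverage_pct": 70, "pages_30d": 4}
print(rate(example))           # Silver
print(failing_rules(example))  # the two Gold rules
```

Changing the thresholds or moving a rule between tiers only requires editing `LEVELS`, which mirrors the point that each organization should pick the rules and tiers that fit its own context.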

Software health Initiatives

Initiatives are a Cortex feature that makes use of Scorecards to drive organizational change. How does this work? When you create an Initiative in Cortex, you select a set of Scorecard rules that software assets must pass by a specific deadline. For example, you could create an Initiative that requires all customer-facing software to implement static analysis in the next six months. Once you’ve created an Initiative, Cortex will send reminders about it to asset owners through Slack, Jira, and Cortex itself.
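Conceptually, an Initiative combines a set of rules, a filter for the assets it applies to, and a deadline. The sketch below models that idea in plain Python using the static analysis example above. The `Asset` and `Initiative` classes, the `pending_reminders` function, and the 30-day freshness threshold are all hypothetical; this is not how Cortex implements or exposes Initiatives.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass
class Asset:
    """Hypothetical software asset with an owner and a bag of metrics."""
    name: str
    owner: str
    customer_facing: bool
    metrics: Dict[str, float]


@dataclass
class Initiative:
    """Hypothetical model: rules that matching assets must pass by a deadline."""
    name: str
    deadline: date
    applies_to: Callable[[Asset], bool]
    rules: Dict[str, Callable[[Asset], bool]]


def pending_reminders(initiative: Initiative, assets: List[Asset],
                      today: date) -> List[dict]:
    """Return one reminder per in-scope asset that still fails a rule."""
    days_left = (initiative.deadline - today).days
    reminders = []
    for asset in assets:
        if not initiative.applies_to(asset):
            continue  # asset is outside the Initiative's scope
        failed = [label for label, check in initiative.rules.items()
                  if not check(asset)]
        if failed:
            reminders.append({"owner": asset.owner, "asset": asset.name,
                              "failed_rules": failed, "days_left": days_left})
    return reminders


static_analysis = Initiative(
    name="Static analysis for customer-facing software",
    deadline=date(2024, 5, 1),
    applies_to=lambda a: a.customer_facing,
    rules={
        # A missing metric counts as "never run" (assumed convention).
        "static analysis run in the last 30 days":
            lambda a: a.metrics.get("days_since_static_analysis",
                                    float("inf")) <= 30,
    },
)
```

A real notification channel such as Slack or Jira would consume the returned reminders; the sketch stops at producing them so that the scope and deadline logic stay easy to follow.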

With Initiatives, you can take measuring software health a step further and start driving change across your organization. This will make users happier and free your developers to spend more time building value. To learn more about Cortex Scorecards and Initiatives, take the tour, or request a demo with our team.
