Establish best practices and keep teams accountable

Scorecards allow your team to define standards like production readiness and development quality, and enforce them without building scripts and maintaining spreadsheets.
“Scorecards have made it incredibly easy to track the status of migrations across different services and teams.
We now have real data on which services are at risk and no longer need to manually check with teams, run scripts, or dig through several tools to find the right data. No one has to go in and update anything manually - it’s all automated and synced with Cortex.”
Rafael Garcia - Cofounder & CTO, Clever

Scorecard Examples

Operational Maturity

Are services meeting SLOs? Are on-call metrics looking healthy? Are post-mortem tickets closed promptly? Are there too many customer-facing incidents?
oncall.analysis.meanSecondsToResolve < 3600
Make sure that issues are resolved in a reasonable amount of time (here, a mean time to resolve under one hour). If they’re not, you can dig into the root cause.
10
oncall.analysis.offHourInterruptions < 3
Paging engineers outside working hours leads to alert fatigue and low morale. By catching services that cause a high number of off-hour interruptions, you can improve developer happiness.
30
Jira: post-mortem tickets opened in the last 6 months that are still open
Teams that keep creating action items for services without closing them are an organizational risk. Either the team is not prioritizing incident-related issues, or it is not equipped with the right resources.
10
jira.issues("labels=customer and created > startOfMonth(-3)") < 2
A reliable service should not be a source of frequent customer-facing incidents.
20
jira.issues("labels=compliance") < 3
Make sure there are few (fewer than three) outstanding compliance or legal issues affecting the service.
10
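The number listed under each rule is its weight. As a rough mental model only, a weighted scorecard can be read as "sum the weights of the rules a service passes, out of the total possible." The sketch below applies that assumed model to the Operational Maturity rules above, using hypothetical metric names and values for one service. It is not Cortex's scoring implementation; the post-mortem rule has no threshold in the list, so the sketch assumes zero open tickets is the passing condition.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical snapshot of the data a scorecard would read for one service.
# The keys are placeholders standing in for the CQL expressions above.
Metrics = Mapping[str, float]


@dataclass(frozen=True)
class Rule:
    description: str
    check: Callable[[Metrics], bool]  # True when the service passes the rule
    weight: int


OPERATIONAL_MATURITY = [
    Rule("Mean time to resolve under one hour",
         lambda m: m["mean_seconds_to_resolve"] < 3600, 10),
    Rule("Fewer than 3 off-hour interruptions",
         lambda m: m["off_hour_interruptions"] < 3, 30),
    Rule("No open post-mortem tickets from the last 6 months (assumed threshold)",
         lambda m: m["open_postmortem_tickets"] == 0, 10),
    Rule("Fewer than 2 recent customer-labeled Jira issues",
         lambda m: m["recent_customer_issues"] < 2, 20),
    Rule("Fewer than 3 compliance-labeled Jira issues",
         lambda m: m["compliance_issues"] < 3, 10),
]


def score(rules: list[Rule], metrics: Metrics) -> tuple[int, int, float]:
    """Return (earned points, possible points, percentage) under a simple
    sum-of-passing-weights model (an assumption, not Cortex's formula)."""
    earned = sum(r.weight for r in rules if r.check(metrics))
    possible = sum(r.weight for r in rules)
    return earned, possible, 100.0 * earned / possible if possible else 0.0


if __name__ == "__main__":
    service = {
        "mean_seconds_to_resolve": 2700,  # 45 minutes: passes
        "off_hour_interruptions": 5,      # fails the 30-point rule
        "open_postmortem_tickets": 0,
        "recent_customer_issues": 1,
        "compliance_issues": 0,
    }
    print(score(OPERATIONAL_MATURITY, service))  # (50, 80, 62.5)
```

Under this model a single heavily weighted rule (here, off-hour interruptions at 30 points) moves the score far more than the others, which is why the weights are worth choosing deliberately.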

Operational Readiness

Are services ready to be deployed to production? Are there runbooks, dashboards, logs, on-call escalation policies, monitoring/alerting, and accountable owners?
owners.count > 2
Incident response requires crystal-clear accountability, so make sure each service has owners defined (this rule requires more than two).
10
oncall.escalations.count > 1
Check that the escalation policy has at least 2 levels, so that if the first on-call responder does not acknowledge a page, there is a backup.
30
runbooks.count >= 1
Create a culture of preparation by requiring that there are runbooks in place for the service.
10
links("logs").count > 1
When there is an incident, responders should be able to find the right logs easily (usually load balancer logs and application logs).
20
dashboards.count >= 1
Responders should have standard dashboards quickly accessible for every service to speed up triage.
10
custom("pre-prod-enabled") = true
Use an asynchronous process to check whether there is a live pre-prod environment for the service, and send a true/false flag to Cortex using the custom metadata API.
10
sonarqube.metric("vulnerabilities") < 3
Ensure that production services are not deployed with a high number of security vulnerabilities.
10
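The `custom("pre-prod-enabled")` rule depends on a job outside Cortex that computes the flag and pushes it in. Below is a minimal sketch of such a job, assuming a hypothetical `probe` function that reports whether a service's pre-prod environment is live and a hypothetical `post` function that wraps whatever call your custom metadata integration uses. The payload shape (`{"key": ..., "value": ...}`) is also an assumption; check the Cortex API reference for the real endpoint and body.

```python
from typing import Callable, Iterable

FLAG_KEY = "pre-prod-enabled"  # the key read by custom("pre-prod-enabled")


def build_payload(key: str, value: bool) -> dict:
    # Hypothetical body for a custom-metadata update; verify against the API docs.
    return {"key": key, "value": value}


def sync_preprod_flags(
    services: Iterable[str],
    probe: Callable[[str], bool],       # hypothetical: is pre-prod live for this service?
    post: Callable[[str, dict], None],  # hypothetical: send the payload for this service
) -> dict[str, bool]:
    """Probe every service and push a true/false flag for each one.

    A probe that raises is treated as "not live", so one broken
    environment does not stop the whole batch.
    """
    results: dict[str, bool] = {}
    for service in services:
        try:
            live = bool(probe(service))
        except Exception:
            live = False
        post(service, build_payload(FLAG_KEY, live))
        results[service] = live
    return results


if __name__ == "__main__":
    sent = []

    def fake_probe(service: str) -> bool:
        if service == "search":
            raise TimeoutError("pre-prod did not respond")
        return service == "payments"

    flags = sync_preprod_flags(
        ["payments", "search", "billing"],
        fake_probe,
        lambda service, body: sent.append((service, body)),
    )
    print(flags)  # {'payments': True, 'search': False, 'billing': False}
```

Pushing an explicit `false` on failure (rather than skipping the service) keeps the scorecard from showing a stale `true` after a pre-prod environment goes away.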

Development Maturity

Is code coverage adequate? Are lock files and READMEs checked in? Are the right package versions being used? Is ownership properly defined?
owners.count > 2
Catch organizational risk by detecting orphaned services.
10
git.fileExists("package-lock.json")
Developers should be checking in lockfiles to ensure repeatable builds.
30
sonarqube.metric("coverage") > 80.0
Set a threshold that’s achievable, so teams have an incentive to actually try. This rule also serves as a secondary check that the service is hooked up to SonarQube and reporting regularly.
10
git.lastCommit.freshness < duration("P30D")
Counterintuitively, services that are committed to infrequently are at more risk. People who are familiar with the service may leave the team, undocumented tribal knowledge accumulates, and, from a technical standpoint, the service may be running old or outdated versions of your platform tooling.
20
git.fileExists("*Test.java")
Use a wildcard search to make sure the repository contains unit tests.
10
git.numRequiredApprovals >= 1
Ensure that a rigorous PR process is in place for the repository: PRs must be approved by at least one reviewer before merging.
10
git.fileContents(".circleci/config.yml").matches(".*npm test.*")
Enforce that a CI pipeline exists and that it defines a testing step.
10
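Before wiring these rules into a scorecard, it can help to sanity-check a repository locally. The sketch below approximates four of the checks (lockfile present, a `*Test.java` file somewhere in the tree, an `npm test` step in `.circleci/config.yml`, and a last commit within 30 days) with the Python standard library. It illustrates what the rules look for, not how Cortex evaluates them; in particular, whether `git.fileExists` matches wildcards at any depth is an assumption here.

```python
import re
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path


def has_lockfile(repo: Path) -> bool:
    # Approximates git.fileExists("package-lock.json") at the repository root.
    return (repo / "package-lock.json").is_file()


def has_java_tests(repo: Path) -> bool:
    # Approximates git.fileExists("*Test.java"); recursive matching is an assumption.
    return any(path.is_file() for path in repo.rglob("*Test.java"))


def ci_runs_npm_test(repo: Path) -> bool:
    # Approximates matching ".*npm test.*" against .circleci/config.yml.
    # re.DOTALL lets "." cross newlines; without it, a full match of this
    # pattern fails on any multi-line file.
    config = repo / ".circleci" / "config.yml"
    if not config.is_file():
        return False
    text = config.read_text(encoding="utf-8")
    return re.fullmatch(r".*npm test.*", text, flags=re.DOTALL) is not None


def is_fresh(last_commit: datetime, now: datetime,
             max_age: timedelta = timedelta(days=30)) -> bool:
    # Approximates git.lastCommit.freshness < duration("P30D").
    return now - last_commit < max_age


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "package-lock.json").write_text("{}")
        (repo / ".circleci").mkdir()
        (repo / ".circleci" / "config.yml").write_text(
            "jobs:\n  test:\n    steps:\n      - run: npm test\n")
        print(has_lockfile(repo), has_java_tests(repo), ci_runs_npm_test(repo))
        # True False True
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(is_fresh(datetime(2024, 5, 20, tzinfo=timezone.utc), now))  # True
```

The DOTALL detail is worth keeping in mind for any full-match regex rule applied to a whole file: a pattern that looks obviously correct can silently never match once the file spans more than one line.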

DORA Metrics

There are a lot of ways to gauge the performance of your DevOps teams and the health of your software, but DORA metrics have emerged as the industry standard. If you aren’t familiar with DORA metrics, take a few minutes to read this comprehensive guide to understanding DORA metrics.
Averaging at least one deploy a day over the last 7 days
deploys(lookback=duration("P7D"),types=["DEPLOY"]).count >= 7
2
Incidents in the last 7 days were resolved within an hour on average
oncall.analysis(lookback = duration("P7D")).meanSecondsToResolve < 3600
10
Incidents in the last 7 days were acknowledged within 5 minutes on average
oncall.analysis(lookback = duration("P7D")).meanSecondsToFirstAck <= 300
10
No incidents in the last 7 days
oncall.analysis(lookback = duration("P7D")).totalIncidentCount = 0
10
No more than 5 bugs created in the last 7 days
jira.issues("labels = Bug and created >= -7d") <= 5
8
No more than 5 incident tickets created in the last 7 days
jira.issues("labels = Incident and created >= -7d") <= 5
8
No rollbacks in the last 7 days
deploys(lookback=duration("P7D"),types=["ROLLBACK"]).count = 0
5
Ratio of incidents to deploys in the last 7 days is zero
(oncall.analysis(lookback = duration("P7D")).totalIncidentCount / deploys(lookback=duration("P7D"),types=["DEPLOY"]).count) = 0
5
Ratio of rollbacks to deploys in the last 7 days is zero
(deploys(lookback=duration("P7D"),types=["ROLLBACK"]).count / deploys(lookback=duration("P7D"),types=["DEPLOY"]).count) = 0
5
Last commit was within the last 24 hours
git.lastCommit.freshness <= duration("P1D")
2
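Two of the rules above divide one count by another over a 7-day window, and the deploy rule turns "one deploy a day" into a count of at least 7. The sketch below computes the same quantities from hypothetical lists of deploy and incident records, and guards the ratios against a week with zero deploys, where the division is undefined. The record shapes and field names are assumptions for illustration; how Cortex itself handles a zero denominator is not stated on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deploy:
    at: datetime
    kind: str  # "DEPLOY" or "ROLLBACK", mirroring the types filter in the rules


def in_window(ts: datetime, now: datetime, lookback: timedelta) -> bool:
    # True when ts falls inside the lookback window ending at now.
    return now - lookback <= ts <= now


def weekly_dora(deploys: list[Deploy], incidents: list[datetime], now: datetime,
                lookback: timedelta = timedelta(days=7)) -> dict:
    recent = [d for d in deploys if in_window(d.at, now, lookback)]
    deploy_count = sum(1 for d in recent if d.kind == "DEPLOY")
    rollback_count = sum(1 for d in recent if d.kind == "ROLLBACK")
    incident_count = sum(1 for t in incidents if in_window(t, now, lookback))

    def ratio(numerator: int) -> Optional[float]:
        # None means "undefined": no deploys in the window to divide by.
        return numerator / deploy_count if deploy_count else None

    return {
        "deploys": deploy_count,
        "rollbacks": rollback_count,
        "incidents": incident_count,
        "deploys_per_day": deploy_count / (lookback / timedelta(days=1)),
        "incident_to_deploy_ratio": ratio(incident_count),
        "rollback_to_deploy_ratio": ratio(rollback_count),
    }


if __name__ == "__main__":
    now = datetime(2024, 6, 8, 18, 0)
    history = [Deploy(datetime(2024, 6, day, 12, 0), "DEPLOY") for day in range(2, 9)]
    history.append(Deploy(datetime(2024, 6, 5, 15, 0), "ROLLBACK"))
    # The second incident falls outside the 7-day window and is ignored.
    pages = [datetime(2024, 6, 4, 3, 0), datetime(2024, 5, 20, 9, 0)]
    print(weekly_dora(history, pages, now))
```

Returning `None` instead of `0` for an empty week keeps "no deploys" distinguishable from "deploys with no failures", which a ratio rule of `= 0` would otherwise treat the same way.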

Migrations

Have teams moved to the right platform library version? Is the migration to the new Kubernetes cluster complete? How many teams have the right CI file checked in?
custom("ci-platform-version") > semver("1.1.3")
Having every CI pipeline send its current version to Cortex on each master build lets you catch services that are on outdated versions of tooling (such as CI and deploy scripts).
10
package("apache.commons.lang") > semver("1.2")
Cortex automatically parses dependency management files, so you can easily enforce library versions for platform migrations, security audits, and more.
10
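Version rules like `> semver("1.1.3")` only work if versions are compared numerically, component by component, rather than as strings: as strings, "1.10.0" sorts before "1.9.0". Below is a minimal sketch of that comparison for plain MAJOR.MINOR.PATCH strings, assuming missing components default to zero (so "1.2" is treated as "1.2.0"). Pre-release and build suffixes defined by the Semantic Versioning spec are deliberately rejected rather than handled, and how Cortex parses such versions is not stated on this page.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR[.MINOR[.PATCH]]' into a comparable tuple.

    Assumption: missing components default to 0, so '1.2' == '1.2.0'.
    Pre-release/build suffixes ('-rc.1', '+build') raise ValueError.
    """
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"unsupported version string: {text!r}")
    numbers = [int(p) for p in parts] + [0] * (3 - len(parts))
    return (numbers[0], numbers[1], numbers[2])


def newer_than(version: str, minimum: str) -> bool:
    # Mirrors a rule like custom("ci-platform-version") > semver("1.1.3"):
    # strictly greater, compared as integer tuples.
    return parse_version(version) > parse_version(minimum)


if __name__ == "__main__":
    print(newer_than("1.10.0", "1.9.0"))  # True
    print("1.10.0" > "1.9.0")             # False: plain string comparison gets it wrong
    print(newer_than("1.1.3", "1.1.3"))   # False: the rule is strictly greater
```

Note that `>` is strict, so a service sitting exactly on 1.1.3 fails the rule; use `>=` in the rule if the minimum version itself should pass.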

One-click integration with third-party tools

Scorecards fetch data automatically from your integrations without manual work, letting you easily enforce standards across all your tools.
Make sure each service has accountable owners, an on-call rotation, high test coverage, and much more.

The flexibility to meet your organization’s needs

Our robust APIs make it easy to use data from custom sources in your scorecards.

Cortex Query Language (CQL) enables you to create complex rules that can compare data across multiple sources or write expressive logical statements.

Enable leaders to make informed decisions

Historical data and organizational summaries give leadership deep visibility into progress, bottlenecks, and areas of risk.

Drive organizational progress with ease using Initiatives

Within any Scorecard, assign owners and due dates to drive best-practice adoption, platform migrations, and audits.