Cortex

How to Measure and Improve Platform Maturity and Operational Readiness

Working on software infrastructure for the sake of infrastructure is rarely a good use of time. That probably sounds obvious, but perhaps the next question is less straightforward: what should engineering platform teams do instead?

By Cortex - July 14, 2023

A well-run platform team sets business-relevant goals, measures how well the current platform is meeting those objectives, and uses that to iterate on the platform in a meaningful way. These teams recognize that they’re shipping a product to real users, even if those users are often other engineers. It helps to take a page from the product world and follow a user-driven approach.

In this article, we’ll articulate a concrete platform development framework to guide how you set your team’s direction. As an engineering leader, you’ll come away with strategies for measuring and improving both your team’s platform maturity and their operational readiness.

1. Assess the health of your platform

Before you can improve your platform and operations, you have to know where you’re starting. Then, as you make improvements, you can periodically reevaluate your platform’s health (we recommend quarterly or twice a year) to determine whether your changes have been effective and how to recalibrate your roadmap.

The assessment process has three steps:

  1. Define your key questions to zero in on top-of-mind issues, goals, and unknowns
  2. Pinpoint the gaps by answering your questions
  3. Prioritize the elements of your platform needing improvement

Your key questions will vary depending on your organization’s needs and domain area. That said, some questions to consider include:

  • How many incidents have we had in the last week/month/quarter?
  • Are teams adopting the latest deployment strategies?
  • Is basic unit testing pervasive?
  • Is everything adequately code reviewed? 

Once you formulate your questions, find a way to answer them, whether on your own, by consulting engineering team members, or by examining your available metrics and resources. Don’t worry about building pristine dashboards or doing publication-worthy research; focus on efficiently finding answers that reveal your platform’s areas for improvement.

While you likely won’t be able to solve every problem immediately, understanding the weaknesses in your software development cycle lets you stack-rank opportunities according to factors like feasibility, engineering cost, and expected return. Be careful not to set your plans from an engineering perspective alone, though. A business lens is key to adding the full context, as we’ll discuss next.
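The stack ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `Opportunity` fields, the example numbers, and the scoring heuristic (expected return, discounted by feasibility, per unit of cost) are all assumptions made for the example. Swap in whatever weighting fits your organization.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Opportunity:
    """One candidate platform improvement (hypothetical structure)."""
    name: str
    feasibility: float       # 0.0 (very hard) to 1.0 (straightforward); a judgment call
    engineering_cost: float  # rough engineer-weeks; must be positive
    expected_return: float   # relative value on any consistent scale


def priority_score(opp: Opportunity) -> float:
    """Assumed heuristic: expected return, discounted by feasibility, per unit of cost."""
    if opp.engineering_cost <= 0:
        raise ValueError("engineering_cost must be positive")
    return opp.expected_return * opp.feasibility / opp.engineering_cost


def stack_rank(opps: List[Opportunity]) -> List[Opportunity]:
    """Return opportunities ordered from highest to lowest priority score."""
    return sorted(opps, key=priority_score, reverse=True)


# Illustrative numbers only; they are not drawn from any real assessment.
opportunities = [
    Opportunity("Expand unit test coverage", 0.9, 8, 4),
    Opportunity("Reduce incident volume", 0.8, 6, 10),
    Opportunity("Adopt newer deployment strategy", 0.5, 12, 6),
]

ranked = stack_rank(opportunities)
for position, opp in enumerate(ranked, start=1):
    print(f"{position}. {opp.name} (score={priority_score(opp):.2f})")
```

A score like this only captures the engineering view. Treat it as a starting point for discussion rather than a final ordering.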

2. Align platform goals to the business

Transpose your platform to a different business and it’s almost certain that your goals would change, too. In other words, your business context heavily influences your software platform’s requirements.

For instance, a hedge fund’s core needs might be to trade equities in global markets and to deliver content to clients. Engineering processes ought to be designed with these primary goals in mind and in a way that reduces any operational risks in those areas. Everything else is secondary.

As a platform leader, visualize what the end state of your platform needs to be and work backwards. This way, you can figure out what engineering ecosystem gaps align most closely to business needs.

Continuing with the hedge fund example, imagine we have a vision for a frictionless trading experience. In a platform assessment (step 1), we might find a number of issues. Suppose we discovered that incidents in the trading service have led to downtime and revenue loss. We might also have identified a complete lack of unit testing for the fund’s analysis software. Because of our business priorities, our number one goal should be to resolve the trading service incidents, a decision that would be harder to make without the business context. Under different business circumstances (say we were running a large bank instead of a hedge fund), the calculus could change.

3. Choose your levers to improve the platform

So how exactly do you improve your platform? While your specific priorities are a function of your platform health assessment and your business context, tactics for platform improvements tend to fall into three main categories:

  1. Development maturity: How developers are building software
  2. Operational readiness: How developers are shipping changes
  3. Operational maturity: How developers are operating the platform

Every company has unique strengths and weaknesses. For example, if incidents or security are holding back your business, you probably want to focus more on the operational levers. Conversely, if velocity is lagging, try investing in development maturity and perhaps operational readiness, too.

Development Maturity

Within development maturity, there are several key questions to ask and, by extension, several ways to make your team more effective at building software:

  • Who are the service owners?
  • How do you use automation and tooling to make it easier for people to build?
  • Are people following code review standards?
  • Are there branch protections?
  • Does every pull request need to be reviewed?
  • Do you have confidence in your code through unit tests and code coverage?
  • Do you have paved roads/golden paths?

You don’t necessarily need to adopt all these practices at once. Rather, an awareness of techniques for increasing development maturity can help you decide how to address the biggest gaps in your platform.

Operational Readiness

Similarly, you can explore the following questions when you want to upgrade your operational readiness and have teams shipping more frequently and more smoothly:

  • Who are the service owners?
  • Are people shipping things in the right way?
  • Are services ready to be deployed to prod?
  • Does the service have runbooks? Dashboards? Logs?
  • If something happens, is there an easy way to roll back? Is there a runbook for that?
  • Does the service have monitors and alerts set up?
  • What is the on-call escalation policy?
  • Does the service have any existing vulnerabilities?
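One way to make questions like these repeatable is to encode them as pass/fail checks against a service record. The sketch below is a hypothetical, hand-rolled example. It is not the API of Cortex or any other product, and the `Service` fields and check names are assumptions chosen to mirror the questions above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Service:
    """Hypothetical service record; fields mirror the readiness questions."""
    name: str
    owner: Optional[str] = None
    has_runbook: bool = False
    has_dashboard: bool = False
    has_rollback_procedure: bool = False
    has_alerts: bool = False
    escalation_policy: Optional[str] = None
    open_vulnerabilities: int = 0


# Each check maps a readable label to a predicate over a Service.
READINESS_CHECKS: Dict[str, Callable[[Service], bool]] = {
    "has a named owner": lambda s: bool(s.owner),
    "has a runbook": lambda s: s.has_runbook,
    "has a dashboard": lambda s: s.has_dashboard,
    "has a rollback procedure": lambda s: s.has_rollback_procedure,
    "has monitors and alerts": lambda s: s.has_alerts,
    "has an escalation policy": lambda s: bool(s.escalation_policy),
    "has no open vulnerabilities": lambda s: s.open_vulnerabilities == 0,
}


def failing_checks(service: Service) -> List[str]:
    """Return the labels of every readiness check the service does not pass."""
    return [label for label, check in READINESS_CHECKS.items() if not check(service)]


def readiness_score(service: Service) -> float:
    """Fraction of checks passed, from 0.0 to 1.0."""
    passed = len(READINESS_CHECKS) - len(failing_checks(service))
    return passed / len(READINESS_CHECKS)


# Illustrative service; the name and values are made up.
payments = Service(
    name="payments-api",
    owner="team-payments",
    has_runbook=True,
    has_dashboard=True,
    has_alerts=True,
    open_vulnerabilities=2,
)

print(f"{payments.name}: {readiness_score(payments):.0%} ready")
for label in failing_checks(payments):
    print(f"  missing: {label}")
```

Weighting every check equally is a simplification. In practice you might treat some checks, such as ownership, as hard requirements before a service ships.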

Operational Maturity

For operational maturity, which focuses on maintaining a platform (as opposed to deploying changes), some key questions to ask include:

  • Are people operating the right way?
  • Is the service meeting SLOs?
  • Are on-call metrics looking healthy?
  • Are post-mortem tickets closed promptly?
  • Are you looking at DORA metrics such as deployment frequency and mean time to recovery?
  • Are you continuously monitoring and fixing vulnerabilities?
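As a rough illustration of the two DORA metrics named above, the following sketch computes deployment frequency and mean time to recovery from timestamps. The function names, input shapes, and sample data are assumptions for this example. Real pipelines usually pull these events from deployment and incident tooling, and teams differ on exactly when an incident counts as started or resolved.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def deployment_frequency(
    deploy_times: List[datetime], window_start: datetime, window_end: datetime
) -> float:
    """Average deployments per day for deploys inside [window_start, window_end)."""
    days = (window_end - window_start).total_seconds() / 86400
    if days <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = [t for t in deploy_times if window_start <= t < window_end]
    return len(in_window) / days


def mean_time_to_recovery(
    incidents: List[Tuple[datetime, datetime]]
) -> Optional[timedelta]:
    """Average of (resolved - started) across incidents; None if there were none."""
    if not incidents:
        return None
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data for a 30-day window.
window_start = datetime(2023, 1, 1)
window_end = datetime(2023, 1, 31)
deploys = [
    datetime(2023, 1, 2), datetime(2023, 1, 5), datetime(2023, 1, 9),
    datetime(2023, 1, 12), datetime(2023, 1, 20), datetime(2023, 1, 27),
    datetime(2023, 2, 2),  # falls outside the window and is ignored
]
incidents = [
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 3, 11, 0)),   # 1 hour
    (datetime(2023, 1, 17, 14, 0), datetime(2023, 1, 17, 17, 0)),  # 3 hours
]

frequency = deployment_frequency(deploys, window_start, window_end)
mttr = mean_time_to_recovery(incidents)
print(f"Deployments per day: {frequency:.2f}")
print(f"Mean time to recovery: {mttr}")
```

Tracking these values over successive assessment periods matters more than any single reading, since the trend shows whether changes to the platform are having an effect.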

An operationally mature team isn’t one that’s entirely free of incidents, which is virtually impossible. Instead, a mature team does its best to minimize risk and is prepared to respond to inevitable disruptions thoroughly and efficiently.

Cortex implements this framework so you can standardize your platform improvements

At Cortex, we build products that take the guesswork out of improving your software components. There’s no need to manually implement the framework above with custom spreadsheets and scripts that organize your information. Cortex Scorecards give you a streamlined way to define platform standards like production readiness and development quality. With Scorecards, Cortex automatically pulls data from your software tools so you can answer questions about your platform’s health.

Want to learn more about how Cortex can enhance your engineering team’s efficiency? Schedule a demo today.
