Do engineers on your team have to dig through logs to debug a service because there’s no documentation, get bottlenecked by pull requests that sit around for too long, or generally feel pain in their day-to-day work? These issues are quite common. Even though engineering leaders and SREs usually have a good idea of what makes for a high-performance engineering culture, their best intentions get pushed aside as deadlines pile up and teams are shuffled around.
Why is it so hard to make progress on cultural change? One theory is that as engineers, we tend to devote a lot of resources to solving technical problems, but not as many to solving cultural problems, which might seem insurmountable and abstract. A way around this is to approach cultural issues as we do anything else: with data. It’s actually not that hard to systematically quantify your engineering best practices.
In this article, we’ll show you how to stop the pain by defining the cultural change you want and tracking your progress toward it. As examples, we’ve picked five pillars of engineering culture that resonate with us, but you can apply the same strategies to any other cultural objectives your organization might have.
Measuring if your culture is proactive, not reactive
Being proactive rather than reactive means deciding what it means to be a high-functioning engineering organization, and then holding yourself accountable. As a starting point, you might decide that every service needs a runbook, documentation, and an on-call rotation with at least one active employee.
How do you get there? First, you need to catalog your microservices or components, which can be a massive task depending on the size of your engineering organization. But once that catalog exists, you can systematically check every service against your reliability must-haves, set OKRs around those goals, and see how your services are tracking.
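To make this concrete, here is a minimal Python sketch of checking a catalog against the must-haves above. The `Service` record, its fields, and the `compliance_rate` metric are hypothetical. A real service catalog will have its own schema, and the one-person on-call threshold simply mirrors the example rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Service:
    """One entry in a hypothetical service catalog."""
    name: str
    runbook_url: Optional[str] = None
    docs_url: Optional[str] = None
    # Active employees currently in this service's on-call rotation.
    on_call: list[str] = field(default_factory=list)


def missing_must_haves(service: Service) -> list[str]:
    """Return the must-haves this service does not yet satisfy."""
    gaps = []
    if not service.runbook_url:
        gaps.append("runbook")
    if not service.docs_url:
        gaps.append("documentation")
    if len(service.on_call) < 1:  # at least one active on-call employee
        gaps.append("on-call")
    return gaps


def compliance_rate(catalog: list[Service]) -> float:
    """Fraction of services meeting every must-have; one candidate OKR metric."""
    if not catalog:
        return 1.0
    passing = sum(1 for s in catalog if not missing_must_haves(s))
    return passing / len(catalog)


# Hypothetical catalog entries for illustration.
catalog = [
    Service("payments", runbook_url="https://wiki.example/payments",
            docs_url="https://docs.example/payments", on_call=["alice"]),
    Service("search", docs_url="https://docs.example/search"),
]
for svc in catalog:
    print(svc.name, missing_must_haves(svc))  # search lacks a runbook and on-call
print(f"compliance: {compliance_rate(catalog):.0%}")
```

Tracking `compliance_rate` over time gives you a single number to attach to an OKR. The per-service gap lists tell each team exactly what to fix.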
Measuring if you have a culture of ownership
A culture of ownership means your teams own, and are responsible for, the things they build. With a catalog of services, you can begin to track questions like: Are there any services that are unowned? Are any services understaffed, yet depended on by our most important services? These issues can be identified and rectified through automation: set up alerts, and transfer ownership as soon as teams change or someone leaves the company.
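As an illustration, the Python sketch below flags both kinds of issues from catalog data. It assumes a hypothetical catalog in which each service names an owning team. It also assumes a roster mapping teams to members and a set of currently active employees, for example synced from an HR system. It only checks direct dependencies of critical services, and the two-person staffing threshold is an arbitrary placeholder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Service:
    name: str
    owner_team: Optional[str] = None
    depends_on: list[str] = field(default_factory=list)
    critical: bool = False


def staffing(svc: Service, teams: dict[str, set[str]], active: set[str]) -> int:
    """Active employees on the service's owning team (0 if it has no owner)."""
    if svc.owner_team is None:
        return 0
    return len(teams.get(svc.owner_team, set()) & active)


def unowned(catalog: dict[str, Service], teams, active) -> list[str]:
    """Services with no owning team, or whose team has no active members left."""
    return sorted(n for n, s in catalog.items() if staffing(s, teams, active) == 0)


def understaffed_dependencies(catalog: dict[str, Service], teams, active,
                              min_members: int = 2) -> list[str]:
    """Direct dependencies of critical services whose owning team is thin."""
    flagged = set()
    for svc in catalog.values():
        if not svc.critical:
            continue
        for dep in svc.depends_on:
            if dep in catalog and staffing(catalog[dep], teams, active) < min_members:
                flagged.add(dep)
    return sorted(flagged)


# Hypothetical roster and catalog.
teams = {"payments": {"alice", "bob"}, "identity": {"carol"}, "ledger-team": {"dave"}}
active = {"alice", "bob", "carol"}  # dave has left the company
catalog = {
    "checkout": Service("checkout", "payments", ["auth", "ledger"], critical=True),
    "auth": Service("auth", "identity"),
    "ledger": Service("ledger", "ledger-team"),
    "blog": Service("blog"),
}
print(unowned(catalog, teams, active))                    # ['blog', 'ledger']
print(understaffed_dependencies(catalog, teams, active))  # ['auth', 'ledger']
```

Running a check like this on a schedule, or whenever the roster changes, is one way to trigger the alerts and ownership transfers mentioned here.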
Measuring your continual improvement
Reliability and durability are moving targets, so we always want to be leveling up our engineering organizations. Especially as your business grows and changes, there are always new ways to improve and new learnings that can be codified and tracked, so that you keep making incremental progress.
For instance, you might set a goal of reaching at least 70% test coverage on all production code within one or two years. As part of CI, you could push code coverage data to your service catalog and track your progress. Maybe it’ll take four quarters to reach 60% coverage. But by systematizing and tracking over time, you can celebrate wins and discover where you need to invest more resources.
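A minimal Python sketch of the CI side follows. It assumes your test runner emits a Cobertura-style XML report. coverage.py's `coverage xml` command produces one, with the overall line rate in the root element's `line-rate` attribute. How the resulting number reaches your catalog is left out, since that depends on the tool. The projection helper is a hypothetical, deliberately naive linear extrapolation from quarterly snapshots, and the sample history is invented for illustration.

```python
from __future__ import annotations

import math
import xml.etree.ElementTree as ET
from typing import Optional


def coverage_percent(report_xml: str) -> float:
    """Overall line coverage (%) from a Cobertura-style XML report."""
    root = ET.fromstring(report_xml)
    return round(float(root.get("line-rate", "0")) * 100, 1)


def quarters_to_goal(history: list[float], goal: float = 70.0) -> Optional[int]:
    """Estimate quarters left until `goal`, assuming the average quarterly gain
    so far continues. Returns 0 if the goal is met, None if there is no upward trend."""
    if not history:
        return None
    current = history[-1]
    if current >= goal:
        return 0
    if len(history) < 2:
        return None
    avg_gain = (history[-1] - history[0]) / (len(history) - 1)
    if avg_gain <= 0:
        return None
    return math.ceil((goal - current) / avg_gain)


# Hypothetical snapshots: a 40% baseline, then four quarters of progress to 60%.
history = [40.0, 45.0, 50.0, 55.0, 60.0]
print(coverage_percent('<coverage line-rate="0.6"></coverage>'))  # 60.0
print(quarters_to_goal(history))  # 2
```

A straight-line projection is crude, but it is enough to show whether you are on pace for the one- or two-year target. It also shows where progress has stalled.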
Measuring if your culture provides autonomy with consistency
Letting teams operate autonomously is a great benefit of working with microservices or a distributed monolith, but it’s easy to end up with interdependencies that slow everything down. Functional teams like infrastructure or data engineering may cut across other teams and become bottlenecks to releases. The result is an awkward situation in which some teams are prioritized over others and product launches become difficult to plan.
With a catalog of services, you can explicitly track and visualize dependencies, and help teams automate away their external requirements while ensuring everyone still follows best practices. One way to achieve this is to invest in template repositories that contain the bare minimum needed to launch a service into production on day one.
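One way to spot cross-team bottlenecks like those just described is to count, for each team, how many other teams depend on its services. The Python sketch below computes this from a hypothetical service-to-team mapping and dependency list. It also emits the graph in Graphviz DOT format so it can be rendered for review. None of these names come from a particular catalog product.

```python
from __future__ import annotations

from collections import defaultdict


def cross_team_fan_in(owner: dict[str, str], deps: dict[str, list[str]]) -> dict[str, int]:
    """For each team, count the *other* teams that depend on at least one of its services."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for svc, targets in deps.items():
        for target in targets:
            src, dst = owner.get(svc, "unowned"), owner.get(target, "unowned")
            if src != dst:  # dependencies inside a team don't block other teams
                dependents[dst].add(src)
    return {team: len(teams) for team, teams in dependents.items()}


def to_dot(deps: dict[str, list[str]]) -> str:
    """Render service dependencies as a Graphviz digraph."""
    lines = ["digraph services {"]
    for svc, targets in sorted(deps.items()):
        for target in targets:
            lines.append(f'  "{svc}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical ownership and dependency data.
owner = {"checkout": "payments", "search": "discovery", "recs": "discovery",
         "auth": "identity", "events": "data-eng"}
deps = {"checkout": ["auth", "events"], "search": ["events"],
        "recs": ["events", "auth"], "auth": ["events"]}
print(cross_team_fan_in(owner, deps))  # data-eng: 3 teams, identity: 2 teams
print(to_dot(deps))
```

A team with high fan-in, such as a shared data-engineering team, is a natural first candidate for a template or self-service tooling. That lets other teams stop waiting on it.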
Measuring your team’s enablement mindset
An enablement mindset means individuals and teams are always thinking about how to better support those around them. Between teams, this means making sure services are properly documented: not just with a simple wiki page, but with robust, continuously updated API docs. Within teams, it means unblocking your fellow team members and ensuring that processes are flowing smoothly.
Again, even this software engineering goal can be systematically defined and tracked over time. For API documentation, you can set alerts on questions like: Does this service have any API documentation, and has it been updated in the last two months? You can also track metrics like the average lifetime of pull requests, or whether certain team members are bottlenecks in the code review process. The goal is transparency, not blame: you might learn that important cultural changes need to be made, like splitting review responsibility more evenly across the team.
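These checks are straightforward to express in code. The Python sketch below is illustrative only. It treats two months as 60 days, and it assumes you can already fetch each service's last API-doc update time. It also assumes a list of merged pull requests with a single reviewer each, although real pull requests often have several. The `PullRequest` record and the sample timestamps are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=60)  # "two months", approximated as 60 days


def docs_alert(last_updated: Optional[datetime], now: datetime) -> Optional[str]:
    """Return an alert message if API docs are missing or stale, else None."""
    if last_updated is None:
        return "no API documentation"
    if now - last_updated > STALE_AFTER:
        return "API documentation not updated in the last two months"
    return None


@dataclass
class PullRequest:
    opened: datetime
    merged: datetime
    reviewer: str


def average_lifetime(prs: list[PullRequest]) -> timedelta:
    """Mean time from opening to merge (zero if there are no pull requests)."""
    if not prs:
        return timedelta()
    total = sum((pr.merged - pr.opened for pr in prs), timedelta())
    return total / len(prs)


def review_load(prs: list[PullRequest]) -> list[tuple[str, int]]:
    """Reviews per person, busiest first; a lopsided split hints at a bottleneck."""
    return Counter(pr.reviewer for pr in prs).most_common()


now = datetime(2021, 6, 1)
print(docs_alert(datetime(2021, 3, 1), now))  # stale: last update was 92 days ago
prs = [
    PullRequest(datetime(2021, 5, 3, 9), datetime(2021, 5, 3, 17), "alice"),
    PullRequest(datetime(2021, 5, 4, 9), datetime(2021, 5, 6, 9), "alice"),
    PullRequest(datetime(2021, 5, 5, 9), datetime(2021, 5, 5, 13), "bob"),
]
print(average_lifetime(prs))  # 20:00:00
print(review_load(prs))       # [('alice', 2), ('bob', 1)]
```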
Everything is quantifiable
We’ve described just a handful of engineering pillars to give you an idea of the breadth of cultural metrics that can be tracked and automated. Whatever problems you decide to tackle, this discipline of being data-driven about cultural issues will reduce downtime and increase product velocity in the long run. And crucially, it will also reduce burnout and improve the overall quality of life within your engineering organization.
Cortex is a software engineering platform built specifically to solve these kinds of cultural problems. Through automation and with over 30 third-party integrations, you can automatically track KPIs for your microservices and the engineering organization at large. If this sounds interesting to you, please reach out to us at email@example.com. We’d love to hear from you.
This article was based on a talk by Nikhil Unni, Co-Founder and Chief Architect of Cortex, at the IBM PREVAIL Conference in 2021.