Improve velocity and reliability through service standardization

Dealing with microservice sprawl is challenging. We wrangle countless microservices, each built in its own slightly unique way, with different frameworks, deploy pipelines, project structures and even monitoring. Each new service that's built adds even more complexity to your architecture.

Establishing consistency in the way microservices are built can mitigate some of these issues, helping you improve developer velocity, reduce operational overhead and minimize the complexity of your platform. While it requires a significant investment up front, it leads to better outcomes in the long run: it saves time, leads to higher-quality architecture, and lets the team focus on the needs of the customer.

Why should you care about consistency?

1. Velocity

Standing up a new service requires a multitude of manual steps by developers, including:

  1. Setting up a new git repo
  2. Handrolling scripts for CI/CD pipelines
  3. Defining basic project structure and framework scaffolding
  4. Boilerplate for database and message bus connections
  5. Establishing connections for logging, alerting, monitoring, etc.
  6. Adding any additional company-specific customizations to the microservice framework, like authentication, rate limiting, service mesh integration and deploy scripts

Building out this technical plumbing is tedious, yet it requires the same work for every single new microservice. And let’s be honest, this is grunt work that no one particularly loves doing. Systematizing and defining standards for each task expedites the process by making it easy for developers to understand exactly how to do all of this the right way.

2. Maintenance and support cost

The complexity of microservices makes diagnosing issues all the more difficult. Chances are few people on the team understand how all your services work, which adds the risk of a low bus factor. Most likely, only the team that built a service understands it, and the support burden falls entirely on them. There’s limited visibility into what the service is actually doing, so diagnosing it requires knowledge that only its original builders have. While we usually think of this risk from a domain knowledge perspective, lack of standardization means key operations like deploys and rollbacks, logging and monitoring also differ from team to team, making the bus factor much worse.

This makes it hard to have a reliable on-call schedule; the same people get called at 2am whether they are on call or not. Having consistent standards across services reduces the operational burden and enables the broader team to cut through problems quickly and independently. Not to mention, not getting woken up at 2am provides a much-needed morale boost.

3. Resource management

As with maintenance costs, lack of consistency across services isolates knowledge within the team that built each one. This makes it difficult for developers to move across teams. When priorities shift and more work is required on one service than another, developers face a steep ramp-up period before they can meaningfully contribute to other projects.

4. Governance

The way services are built generally reflects the preferences of the team that built them, so there are going to be some differences. Chances are that not all of your services meet the quality bar you’d like. Defining a standard allows you to do the design work once and then leverage it for all services, ensuring quality across the board.

5. Resiliency

The more services you have, the more ways in which your system can potentially fail. Streamlining implementation of the basic structure of your services allows you to more easily plan for issues and minimize impact.

The theme here is time. Consistency saves your team from reinventing the wheel, having to maintain ten slightly different wheels, training people on all your different wheels... you get the point. Being diligent about your designs and processes up front pays off down the line by saving your team time and, in turn, money.

How do you ensure consistency?

1. Define standards

Defining standards for shared plumbing is the first step in ensuring consistency across your services. Without them, you have no foundation on which to build.
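One way to make a standard concrete is to write it down as data that a script can check. The sketch below is only an illustration: the file list in `REQUIRED_FILES` and the `missing_files` helper are hypothetical names, not part of any particular tool, and a real standard would likely cover far more than file presence.

```python
from pathlib import Path

# Hypothetical standard: files every service repo is expected to contain.
REQUIRED_FILES = ("Dockerfile", "README.md", "ci/pipeline.yml", "src/health.py")


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]
```

Running a check like this in CI for each repo would give a simple, shared signal of which services have drifted from the baseline.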

2. Use microservices templates

What are microservices templates? They are a way to automate the standards you’ve defined: your Dockerfile, CI/CD pipeline, health checks, etc. A template is essentially an example service that has all of your standards baked into it, so when a developer needs to stand up a new service they use it as a baseline: clone it and get started immediately.
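To show the idea in miniature, here is a rough sketch of a template as a set of files with placeholders that get filled in per service. It uses only the Python standard library; the file contents, the `TEMPLATE_FILES` mapping and the `scaffold_service` function are all assumptions made up for this example, not a recommended baseline.

```python
from pathlib import Path
from string import Template

# Hypothetical baseline: relative path -> file contents with $placeholders.
TEMPLATE_FILES = {
    "Dockerfile": "FROM python:3.12-slim\nLABEL service=$service_name\n",
    "README.md": "# $service_name\n\nOwned by $owner.\n",
    "src/health.py": (
        "def health():\n"
        "    return {'service': '$service_name', 'status': 'ok'}\n"
    ),
}


def scaffold_service(dest, service_name, owner):
    """Write every template file under dest/service_name with placeholders filled in."""
    root = Path(dest) / service_name
    for rel_path, body in TEMPLATE_FILES.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        rendered = Template(body).substitute(service_name=service_name, owner=owner)
        target.write_text(rendered)
    return root
```

Calling `scaffold_service("services", "billing", "payments-team")` would create a `services/billing` directory containing the three files above with the service name and owner substituted in.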

3. Encourage adoption

Make it harder to do the wrong thing and easier to do the right thing. If developers have a way to avoid the grunt work, they will. Providing templates makes it easy for them to build services in line with standard practices, saving them time and contributing to better overall platform governance at the same time. 

Both standardization and automation are required in order to reap the benefits. Without defined standards, there’s no shared definition of quality to aim for and developers are left spinning their wheels. Without automation, the standards will never be followed – there’s too much friction in building to the standards.

How do you get started?

At Cortex, we've released a new product that lets you define your standards and automatically apply them when creating new services. You can define your standard templates using the industry-standard Cookiecutter engine, then generate new services with one click. This includes everything from creating your repo to generating the scaffolding and tracking the owners of the new service. We integrate with industry-standard developer tools such as GitHub, Jenkins, Prometheus, Splunk, PagerDuty, and others (see our full list of integrations here). Book a demo if you’re interested in learning more about our product.
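For readers who haven't used Cookiecutter, the sketch below shows roughly what a minimal template looks like and how the open-source `cookiecutter` Python package can render it without prompts. It does not show how the Cortex product works. The variable names (`service_name`, `owner`), the template contents and both helper functions are hypothetical, and `generate_service` assumes the package has been installed with `pip install cookiecutter`.

```python
import json
from pathlib import Path


def write_template(template_root):
    """Create a minimal Cookiecutter template directory (hypothetical contents)."""
    template_root = Path(template_root)
    template_root.mkdir(parents=True, exist_ok=True)
    # cookiecutter.json declares the template variables and their defaults.
    defaults = {"service_name": "my-service", "owner": "platform-team"}
    (template_root / "cookiecutter.json").write_text(json.dumps(defaults, indent=2))
    # The generated project directory name is itself a Jinja2 expression.
    project_dir = template_root / "{{cookiecutter.service_name}}"
    project_dir.mkdir(exist_ok=True)
    (project_dir / "README.md").write_text(
        "# {{cookiecutter.service_name}}\n\nOwned by {{cookiecutter.owner}}.\n"
    )
    return template_root


def generate_service(template_root, output_dir, service_name, owner):
    """Render the template non-interactively and return the generated project path."""
    # Imported lazily because cookiecutter is a third-party dependency.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        str(template_root),
        no_input=True,  # use defaults plus extra_context instead of prompting
        extra_context={"service_name": service_name, "owner": owner},
        output_dir=str(output_dir),
    )
```

In practice the template would live in its own git repository and carry the Dockerfile, pipeline definition and health checks your standards call for, rather than a single README.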

By Cortex - September 9, 2021