Best practices for logging microservices

Logging is an essential debugging and analytics tool for any software system. Here’s how to approach logging for microservices, which come with unique challenges given their distributed nature.

Cortex

April 13, 2022

Bugs, errors, and the unexpected are an inescapable reality of software development. Rarely do programmers write bug-free code in their first attempt at an implementation. Instead, programming often involves iteratively writing and testing code to eventually squash all the bugs. Logging is a core part of the debugging process to help programmers understand a machine’s state and whether functions were executed as intended. With robust logging, developers can pinpoint where code might be going wrong without having to spend as much time combing through an entire codebase.

In the world of microservices, where modular components independently handle a narrow set of functions, logging plays an even more critical role in managing the complexity of a distributed architecture. Compared to monolithic applications, where all the logic is housed in a single place, microservices offer advantages in flexibility and scalability. But when something goes wrong (an inevitability in any sufficiently large system), developers need to diagnose the failure across many services.

In this article, we’ll review best practices for logging and identify special considerations for microservices architectures.

Standard Logging Best Practices

Good logging has certain foundational elements that are common to microservices and monoliths. While there are entire books on the subject, here we’ll go over a few key points.

Balance Conciseness and Detail

Logging everything risks filling disk space with extremely large files, and verbose log data can take a long time to open, let alone search through. At the other extreme, too much brevity can make logs useless when it comes time to debug a specific issue. At the very least, log unique identifiers that make it clear what the program was processing when it encountered an error. A timestamp that includes its time zone or UTC offset is also essential for troubleshooting, so that developers can reason about sequences of events. Beyond the debugging use case, consider whether there’s information worth logging for metrics analysis, auditing, or other business needs.
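
As a minimal sketch of that baseline (in Python, with a hypothetical order_id field standing in for whatever identifier your program processes), a log record might carry the identifier plus an ISO 8601 timestamp that includes its UTC offset:

```python
import json
from datetime import datetime, timezone


def make_log_record(message: str, order_id: str) -> dict:
    """Build a log record with the minimum context needed for debugging.

    The field names are illustrative, not a standard.
    """
    return {
        # isoformat() on a timezone-aware datetime appends the UTC offset
        # (here "+00:00"), so events can be ordered across machines and zones.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # The unique identifier of the thing being processed.
        "order_id": order_id,
        "message": message,
    }


record = make_log_record("payment authorization failed", order_id="ord-1234")
print(json.dumps(record))
```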

Have Both Machine Parsability and Human Readability

Since log files can grow very large, keeping them in a standardized, machine-parsable format makes it easier to automate searching through them. On the flip side, developers commonly need to inspect individual logs, too, so logs should not be stored in binary or any other format that must be decoded first. Keep it simple: plain ASCII text in an organized format works for both humans and machines.
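
One common way to serve both audiences is JSON Lines, with one JSON object per line. The Python sketch below uses made-up log contents. Each line stays readable on its own, and a few lines of code can filter the whole file by field:

```python
import json

# One JSON object per line. json.dumps escapes non-ASCII characters by
# default (ensure_ascii=True), so each line stays plain ASCII text.
log_lines = [
    json.dumps({"level": "INFO", "message": "cache warmed"}),
    json.dumps({"level": "ERROR", "message": "db timeout", "retry": 3}),
    json.dumps({"level": "INFO", "message": "request served"}),
]


def find_by_level(lines, level):
    """Parse each line and return the records whose "level" matches."""
    records = (json.loads(line) for line in lines)
    return [r for r in records if r.get("level") == level]


errors = find_by_level(log_lines, "ERROR")
print(errors)  # [{'level': 'ERROR', 'message': 'db timeout', 'retry': 3}]
```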

Log Complex Logic

The moments when a program calls another function or reaches a branch, such as an if-then-else statement, are among the most important to log carefully. At these points the program could go in one of several directions, and logging helps developers confirm whether it took the expected path given its current state.
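
Here is a minimal Python sketch of logging at a branch, built around a hypothetical choose_carrier rule. The log line records both the inputs that drove the decision and the path taken, so a reader can check it against the expected behavior:

```python
import logging

logger = logging.getLogger("shipping")


def choose_carrier(weight_kg: float, express: bool) -> str:
    """Pick a carrier (hypothetical rule) and log which branch was taken."""
    if express:
        carrier = "air"
    elif weight_kg > 20:
        carrier = "freight"
    else:
        carrier = "ground"
    # Log the deciding inputs together with the outcome so the path taken
    # can be compared against what the program's state should have produced.
    logger.info("carrier selected: %s (weight_kg=%s, express=%s)",
                carrier, weight_kg, express)
    return carrier


logging.basicConfig(level=logging.INFO)
choose_carrier(25, express=False)
# message: carrier selected: freight (weight_kg=25, express=False)
```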

Respect Sensitive Data

Refrain from logging sensitive data, such as passwords or personally identifiable information (PII) like social security numbers, especially where laws or regulations prohibit storing it. Because developers routinely inspect logs while debugging, any such data in them would be seen by other people, presenting a privacy risk. If your organization needs to analyze user behavior, consider whether logs can capture certain kinds of sensitive data in aggregated form so that individuals stay anonymous.
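
Redaction can be enforced in one place instead of relying on every call site. The following sketch attaches a standard Python logging.Filter to a handler. The list of sensitive keys and the SSN pattern are assumptions you would adapt to your own data. Regex-based masking is a safety net, not a guarantee:

```python
import logging
import re

# Hypothetical field names treated as sensitive; adapt to your own data.
SENSITIVE_KEYS = ("password", "ssn")
# Matches "key=value" or "key: value" for any sensitive key.
KEY_VALUE_PATTERN = re.compile(
    r"(\b(?:%s)\b\s*[=:]\s*)\S+" % "|".join(SENSITIVE_KEYS), re.IGNORECASE
)
# U.S. social security numbers in the common NNN-NN-NNNN form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Mask sensitive values in a log message."""
    text = KEY_VALUE_PATTERN.sub(r"\1[REDACTED]", text)
    return SSN_PATTERN.sub("[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message with redact() before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() merges msg with its args, so values passed as
        # arguments are scrubbed too; args are then cleared.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the (now redacted) record


logger = logging.getLogger("accounts")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.warning("login failed for user=%s password=%s", "alice", "hunter2")
# emitted: login failed for user=alice password=[REDACTED]
```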

Microservice-specific Tips for Logging

In addition to all of the above best practices, good logging requires some extra techniques for microservices applications.

Standardize Log Formats

Earlier, we emphasized how standardization enables logs to be machine parsable. For microservices, standardization plays another important role in unifying logs across services. Choose and stick to a format, like JSON, using consistent naming conventions for key-value pairs across all the logs. In cases where microservices are dealing with the same information, key names should match exactly. This way, when looking up information across different services, development teams don’t need to have special knowledge per service.
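
One way to enforce a shared format is to publish a single formatter that every service imports. This Python sketch subclasses logging.Formatter. The key names (timestamp, level, service, message) and the extra attribute called fields for shared business keys are illustrative conventions, not a standard:

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Shared formatter: every service emits the same keys in the same shape."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Shared business fields (e.g. "user_id") keep identical key names in
        # every service; here they arrive via logging's `extra` mechanism
        # under a hypothetical attribute named "fields".
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def get_service_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines using the shared formatter."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


orders = get_service_logger("orders")
orders.info("order created", extra={"fields": {"user_id": "u-42", "order_id": "o-7"}})
```

Because every team imports the same class, a query for user_id works the same way against any service's logs.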

Support Distributed Request Tracing

Request tracing is a method of following a request through a series of microservices to understand the flow of information and locate where errors originate. Request tracing designed for monolithic applications does not account for microservices running across different servers and environments. Distributed tracing, on the other hand, fits the microservices model by using trace IDs that get passed between services along with the rest of a request.

In brief, distributed tracing helps developers navigate the tree structure that forms when a “parent” microservice calls on “children” services to fulfill a request. Every log line, in every service in the tree, needs to carry the correlation ID that uniquely identifies the request. Beyond that, each step in the request, known as a “span,” needs its own unique ID. Then, whenever a parent service invokes a child, the parent sends both the overall correlation ID and its own span ID. Together, this information gives developers enough context to follow the entire life of a request across all the services it touches.
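
The propagation scheme described above can be sketched in a few lines of Python. The header names X-Correlation-ID and X-Parent-Span-ID are hypothetical; many real systems use the W3C Trace Context traceparent header or a tracing library instead. Both services are simulated in one process:

```python
import uuid

# Hypothetical header names used only for this illustration.
CORRELATION_HEADER = "X-Correlation-ID"
PARENT_SPAN_HEADER = "X-Parent-Span-ID"


def new_id() -> str:
    return uuid.uuid4().hex[:16]


def start_span(incoming_headers: dict) -> dict:
    """Create this service's span context from the caller's headers.

    A request without a correlation ID is treated as the root of a new trace.
    """
    return {
        "correlation_id": incoming_headers.get(CORRELATION_HEADER) or new_id(),
        "parent_span_id": incoming_headers.get(PARENT_SPAN_HEADER),
        "span_id": new_id(),  # unique ID for this step of the request
    }


def outgoing_headers(span: dict) -> dict:
    """Headers a parent sends when invoking a child service."""
    return {
        CORRELATION_HEADER: span["correlation_id"],
        PARENT_SPAN_HEADER: span["span_id"],
    }


def log(service: str, span: dict, message: str) -> str:
    """Format a log line that carries the full trace context."""
    return (f"service={service} correlation_id={span['correlation_id']} "
            f"span_id={span['span_id']} parent_span_id={span['parent_span_id']} "
            f"msg={message}")


# Service A receives an external request, so it starts a new trace.
span_a = start_span({})
print(log("A", span_a, "calling B"))

# Service B receives A's headers and opens a child span in the same trace.
span_b = start_span(outgoing_headers(span_a))
print(log("B", span_b, "handling request"))
```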

Log Inbound/Outbound Requests

Related to the previous point, it is important to log interactions between microservices, a complexity that monolith applications do not have. Suppose Service A makes a request to Service B. The log messages should follow a pattern like this:

Service A: "Making request to B for URL /api/something/on/b"
Service B: "Received request for /api/something/on/b"
Service B: Logging about the actual business logic
Service B: "Returning HTTP Status 400"
Service A: "Received status 400 with error…"

The example above illustrates how Service A logs both the request it makes to Service B and the response it receives back, while Service B logs the inbound request and outbound response on its end. This way, if one of these four key lines is missing from the logs, a developer knows exactly where to focus troubleshooting.
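
The four key lines can be produced by logging at the edges of each service. In this Python sketch, a direct function call simulates the HTTP call, and the status rule in service_b_handle is hypothetical. The logger names mirror the example above:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
log_a = logging.getLogger("Service A")
log_b = logging.getLogger("Service B")


def service_b_handle(path: str) -> int:
    """Inbound side: log the request on arrival and the status on return."""
    log_b.info("Received request for %s", path)
    # ... business logic (and its own logging) would go here ...
    status = 200 if path.startswith("/api/") else 400  # hypothetical rule
    log_b.info("Returning HTTP Status %d", status)
    return status


def service_a_call_b(path: str) -> int:
    """Outbound side: log before sending and after the response arrives."""
    log_a.info("Making request to B for URL %s", path)
    status = service_b_handle(path)  # stands in for a real HTTP request
    log_a.info("Received status %d", status)
    return status


service_a_call_b("/api/something/on/b")
```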

Include Infrastructure Information

Because microservices can exist in numerous instances across servers and environments, logging should also capture infrastructure information that could affect how an application behaves. Tools for microservices logging often automatically record this information, but be sure that logs include identifiers for cluster, pod, and environment (e.g. production or staging).
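
If your tooling does not add these identifiers automatically, a small filter can. This Python sketch assumes the deployment exposes them through environment variables with the hypothetical names CLUSTER_NAME, POD_NAME, and APP_ENV. In Kubernetes, the Downward API can inject a pod's name into an environment variable of your choosing:

```python
import logging
import os


class InfraContextFilter(logging.Filter):
    """Attach cluster, pod, and environment identifiers to every record.

    The environment variable names are assumptions for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.cluster = os.environ.get("CLUSTER_NAME", "unknown")
        self.pod = os.environ.get("POD_NAME", "unknown")
        self.env = os.environ.get("APP_ENV", "unknown")

    def filter(self, record: logging.LogRecord) -> bool:
        record.cluster = self.cluster
        record.pod = self.pod
        record.env = self.env
        return True


handler = logging.StreamHandler()
handler.addFilter(InfraContextFilter())
handler.setFormatter(logging.Formatter(
    "%(asctime)s cluster=%(cluster)s pod=%(pod)s env=%(env)s "
    "%(levelname)s %(message)s"))
logger = logging.getLogger("inventory")
logger.addHandler(handler)
logger.warning("stock level below threshold")
```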

Practice Centralized Logging

Tools like Azure Monitor and Amazon CloudWatch can serve as a central location for logging across an array of microservices. Log aggregation substantially reduces the complexity of debugging since there is no need to open multiple files for data retrieval or analytics. Centralized logging can include more than just logs from microservices; logging the container framework (e.g. Docker) and orchestration system (e.g. Kubernetes) can help pinpoint problems beyond the business logic of individual microservices.
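
To illustrate why aggregation helps, here is a toy, in-memory Python stand-in for what a central log store does. It does not use the Azure Monitor or CloudWatch APIs. It merges JSON log lines from several services into one timeline and can filter by correlation ID. Service B's timestamp uses a different UTC offset yet still sorts correctly, because each timestamp records its offset:

```python
import json
from datetime import datetime


def aggregate(streams: dict, correlation_id=None) -> list:
    """Merge JSON-lines logs from several services into one timeline.

    `streams` maps a service name to its list of log lines.
    """
    merged = []
    for service, lines in streams.items():
        for line in lines:
            record = json.loads(line)
            record["service"] = service
            if correlation_id is None or record.get("correlation_id") == correlation_id:
                merged.append(record)
    # Offset-aware timestamps compare correctly as datetimes.
    merged.sort(key=lambda r: datetime.fromisoformat(r["timestamp"]))
    return merged


streams = {
    "A": ['{"timestamp": "2022-04-13T10:00:00+00:00", "correlation_id": "c1", "message": "calling B"}',
          '{"timestamp": "2022-04-13T10:00:02+00:00", "correlation_id": "c1", "message": "got 400"}'],
    "B": ['{"timestamp": "2022-04-13T06:00:01-04:00", "correlation_id": "c1", "message": "returning 400"}'],
}
for r in aggregate(streams, "c1"):
    print(r["timestamp"], r["service"], r["message"])
# Prints A "calling B", then B "returning 400", then A "got 400".
```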

With centralized logging, microservices applications keep their inherent benefits (scalability, fault isolation, and higher developer velocity) while still maintaining a single source of truth in the log data. This benefits developers, analysts, and other stakeholders who rely on logs to maintain and improve systems.

Cortex offers a suite of products to manage microservices architectures and power growing engineering teams. You can read more about microservices, like how to write good documentation for them, and find other tips for scaling systems on the Cortex blog.
