How ShareChat improved visibility into its services


ShareChat is one of the largest Indian social networks, with over 180 million monthly active users across 15 different languages. The platform lets users share videos, jokes, songs and other language-based social content.

What was the problem?

ShareChat has grown incredibly fast, and each new team member faced a steep learning curve to pick up the right context around services across the company. Finding out where the documentation for a service lived, or what its dependencies were, involved digging through code or relying on tribal knowledge from another engineer, which does not scale well. On top of that, dashboards and metrics for each service were scattered across different tools and docs, making it difficult to debug during incident response.

How was this solved before?

We had several tools that hosted our service documentation, but discoverability and visibility were not great. It was very hard to find documentation about specific services, especially since some teams used Google Docs, others used GitHub READMEs, and some used Confluence. With limited visibility in Confluence, it was also very hard to enforce that everyone had a good set of documentation about their services.

ShareChat’s problem of scattered documentation is one that many companies experience as they grow and change. Getting teams to use the same processes and patterns is incredibly difficult, which makes it hard to find specific service information when engineers need it. It’s especially critical to have this information readily available during incidents so that engineers can triage issues as fast as possible.

Relying on docs to track basic service data like ownership made it very hard to enforce that people kept their documentation up to date. All of these issues combined led ShareChat to search for a more scalable solution for its engineering team.

How has Cortex helped?

“Cortex has tremendously helped with the visibility and discoverability of service info. It’s very easy for a new engineer to learn more about the services they own and their downstream dependencies. We can understand what a service is calling, who its owners are, and where the documentation lives in one single dashboard.”

Cortex has become an integral part of the developer onboarding experience at ShareChat, helping new engineers understand critical context about all the services in the organization. Information that used to be tribal knowledge or live in outdated docs is automatically kept up to date in Cortex. ShareChat uses checks in its Jenkins pipeline to enforce that each service has a corresponding cortex.yaml and that each configuration includes documentation.
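As a rough illustration of what such a pipeline check could look like, here is a minimal Python sketch. It is not ShareChat's actual Jenkins step. The function names, the one-directory-per-service layout, and the assumption that documentation links live under `info` → `x-cortex-link` with `type` and `url` keys are all illustrative, so adjust them to match your own descriptors. File parsing assumes the third-party PyYAML package is installed.

```python
from pathlib import Path

DESCRIPTOR_NAME = "cortex.yaml"


def load_descriptor(path):
    """Parse a descriptor file into a dict (requires PyYAML)."""
    import yaml  # third-party; imported lazily so the checks below work without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def has_documentation_link(descriptor):
    """Return True if the descriptor lists at least one documentation link.

    Assumed layout (hypothetical, verify against your schema):
    info -> x-cortex-link -> [{type: documentation, url: ...}, ...]
    """
    info = descriptor.get("info") or {}
    links = info.get("x-cortex-link") or []
    return any(
        isinstance(link, dict)
        and str(link.get("type", "")).lower() == "documentation"
        and bool(link.get("url"))
        for link in links
    )


def check_services(service_dirs, loader=load_descriptor):
    """Return a list of human-readable problems, empty if every service passes."""
    problems = []
    for service_dir in map(Path, service_dirs):
        descriptor_path = service_dir / DESCRIPTOR_NAME
        if not descriptor_path.is_file():
            problems.append(f"{service_dir}: missing {DESCRIPTOR_NAME}")
        elif not has_documentation_link(loader(descriptor_path)):
            problems.append(f"{service_dir}: no documentation link")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_descriptors.py services/feed services/chat ...
    found = check_services(sys.argv[1:])
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)  # a non-zero exit code fails the pipeline stage
```

A CI job could run this script as one stage and fail the build on a non-zero exit code, which is what keeps missing descriptors and missing documentation from going unnoticed.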

“Having all of the monitoring and logging dashboards in one place has helped speed up incident response and makes it easy to triage bugs found with our services. First-time on-call responders are able to onboard much quicker thanks to Cortex.”

The engineering team has leveraged our many integrations to keep as much information as possible in the dashboards. This has proven particularly useful during incidents where the engineer on call can see the latest deploys, ECS or K8s information, and Grafana dashboards.

Any interesting insights that have resulted from using Cortex?

What’s your favorite feature of Cortex?

“The dependency graph is a really cool feature. In a microservices architecture, it can be very hard to understand how services interact, and the Cortex service graph makes it easy to understand. There have been multiple instances where we didn’t realize that two services depended on each other until we looked at the graph.”

Sanket - Sr. Software Engineer

The dependency graph has allowed ShareChat to visualize their service architecture easily, and they’ve even started to represent databases and infrastructure in it. This feature has also helped other customers find orphaned services.

(example graph with sample data)
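To make the two discoveries mentioned above concrete, here is a small Python sketch of how mutual dependencies and orphaned services can be read off a dependency graph. It is not how Cortex builds or queries its graph. The service names are made up, the graph is a plain adjacency mapping, and "orphaned" is taken to mean a service with no edges at all, which is only one possible reading of the term.

```python
def mutual_dependencies(graph):
    """Return sorted pairs of services that depend directly on each other."""
    pairs = set()
    for service, deps in graph.items():
        for dep in deps:
            if dep != service and service in graph.get(dep, ()):
                pairs.add(tuple(sorted((service, dep))))
    return sorted(pairs)


def orphaned_services(graph):
    """Return services that call nothing and are called by nothing."""
    called = {dep for deps in graph.values() for dep in deps}
    return sorted(
        service
        for service, deps in graph.items()
        if not deps and service not in called
    )


if __name__ == "__main__":
    # Hypothetical sample data: service -> services it calls.
    sample = {
        "feed": ["users", "media"],
        "users": ["feed"],
        "media": [],
        "legacy-export": [],
    }
    print(mutual_dependencies(sample))
    print(orphaned_services(sample))
```

With this sample data, `feed` and `users` are reported as a mutual dependency, and `legacy-export` is the only service with no edges.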

What’s next for Cortex?

“We’re looking at adding Scorecards to start tracking service quality across different teams. For example, we want to start tracking SonarQube test coverage and deploy health and understand which teams are not meeting best practices.”

Scorecards help teams enforce best practices and hold each other accountable, and they are a powerful way to align teams and drive organization-wide cultural shifts. Engineers can set reliability standards across teams and types of services by tracking the health of deploys, SLOs, on-call, vulnerabilities, package versions, and more. Learn more about Scorecards here.
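The general idea behind a scorecard can be sketched in a few lines of Python: a set of named rules is evaluated against per-service data, and the failures are grouped by team. This is only a conceptual illustration, not the Cortex Scorecards API. The rule names, the 80% coverage threshold, and the field names (`team`, `coverage`, `last_deploy_ok`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rules: each maps a rule name to a predicate over one service record.
RULES = {
    "test coverage at or above 80%": lambda svc: svc.get("coverage", 0) >= 80,
    "latest deploy succeeded": lambda svc: bool(svc.get("last_deploy_ok", False)),
}


def failing_rules_by_team(services, rules=RULES):
    """Group (service name, failed rule) pairs under the owning team."""
    failures = defaultdict(list)
    for svc in services:
        for rule_name, passes in rules.items():
            if not passes(svc):
                failures[svc["team"]].append((svc["name"], rule_name))
    return dict(failures)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"name": "feed", "team": "growth", "coverage": 85, "last_deploy_ok": True},
        {"name": "chat", "team": "messaging", "coverage": 60, "last_deploy_ok": True},
    ]
    for team, failed in failing_rules_by_team(sample).items():
        print(team, failed)
```

Grouping by team rather than by service is what surfaces which teams are falling short of the agreed standards, which is the use case described in the quote above.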

How has it been working with the Cortex team?

“It’s been great working with the team and the best part is the prompt replies and immediate turnaround for any kind of bugs or issues. Also, the Cortex team is open to any new features and has helped us with our requirements in a short amount of time!”

We’re excited to continue working with the ShareChat team. If you’d like to see a demo of Cortex and help your team with service complexity, sign up here!