Black Friday is 30 days away. Your engineering infrastructure might not be ready

Cristina Buenahora Bustamante | October 28, 2025

If you're anything like your peers, you probably blinked in April and found yourself a month away from Black Friday when you opened your eyes. Much like a shopper desperately scrambling to pull together gift lists for their loved ones, many engineering teams find themselves rushing to ensure their systems can handle the biggest shopping day of the year.

The good news: even if this feels like a frantic dash to the finish line, there's still time to implement the essential safeguards that will protect your business during peak season. The bad news: you can't afford to waste any of it.

Based on our experience helping engineering teams prepare for high-traffic events, here are the three critical steps to prioritize right now for a successful Black Friday season.

1. Establish clear service ownership

You might not wish payment system outages or broken product pages on your worst enemy during Black Friday (OK, you might), but these emergencies tend to pop up at the worst possible time. When they happen, you need to know exactly who to call, and you can't wait until tomorrow. You need to know right that second.

Without clear ownership, incidents turn into hours of detective work. When every minute of downtime costs thousands of dollars, you can't afford to spend time figuring out who's responsible for what.

Start by documenting everything

Create a spreadsheet listing every critical service and its owner. Focus on the services that are in your critical path to Black Friday success: payment processing, website infrastructure, inventory management, and customer support systems.

For each service, be sure to document the following:

  • Link to escalation rotation in your incident management tool

  • Team that is the primary owner

  • Contact information (e.g., Slack channels)

  • Secondary contact for escalation

  • Team lead or manager

  • Direct phone numbers (not just Slack handles)

You already know this manual approach doesn't scale, especially when the team on-call isn't always the team that owns the service. Whoever gets paged during an incident needs to know immediately who the actual owner is and how to reach them for remediation. For now, capture this ownership chain in your spreadsheet. When you have time after the dust settles from Black Friday this year, consider implementing a service catalog that maintains ownership automatically and stays current as your team grows.
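If the spreadsheet can be exported as CSV, a short script can flag rows where a required field was left blank. The following is a minimal sketch, not a definitive implementation: the column names (`service`, `owner_team`, `escalation_link`, and so on) and the sample rows are assumptions made for illustration, so rename them to match your own sheet.

```python
import csv
import io

# Hypothetical column names; rename these to match your own spreadsheet export.
REQUIRED_FIELDS = [
    "service",
    "owner_team",
    "escalation_link",
    "slack_channel",
    "secondary_contact",
    "team_lead",
    "phone",
]


def find_ownership_gaps(csv_text):
    """Return {service: [missing fields]} for every row with a blank required field."""
    gaps = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 of the file because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            name = (row.get("service") or "").strip() or f"row {row_number}"
            gaps[name] = missing
    return gaps


# Made-up sample data: the inventory row is missing three fields.
SAMPLE = """service,owner_team,escalation_link,slack_channel,secondary_contact,team_lead,phone
payments,Payments,https://example.com/rotations/payments,#payments-oncall,Sam,Alex,+1-555-0100
inventory,Inventory,,#inventory,,Jo,
"""

if __name__ == "__main__":
    for service, missing in find_ownership_gaps(SAMPLE).items():
        print(f"{service}: missing {', '.join(missing)}")
```

Running a check like this before each review meeting keeps the conversation focused on the services that still have gaps.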

2. Review and test your incident management process

Your incident management process is your lifeline when things go wrong. One month before Black Friday is the perfect time to stress-test this process and ensure it can handle the pressure.

During peak traffic, incidents happen fast and escalate quickly. The process that looks good on paper has to actually work under stress.

Verify your escalation paths now

Conduct a full review of your incident management process. For every service in your critical path, verify that you have:

  • Three levels of escalation defined

  • Contact information for everyone in the on-call rotation

  • Phone numbers (not just email addresses)

  • Confirmation that on-call team members aren't out of office during Black Friday week

  • Clear escalation timelines (how long to wait before escalating)

  • Critical alerts configured to override “Do not disturb” settings so pages reach on-call engineers

Many teams discover their incident management process looks great on paper but fails under pressure. People's phone numbers are outdated, escalation paths are unclear, and critical team members are unreachable.
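Several items on the checklist above can be checked mechanically once the rotation data is written down. The sketch below is one possible way to do it, under stated assumptions: the `Responder` and `EscalationLevel` structures, the names, and the date window are all hypothetical, and nothing here reflects the export format of any particular incident management tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Responder:
    name: str
    phone: str = ""
    # Hypothetical shape: a list of (start, end) date pairs.
    out_of_office: list = field(default_factory=list)


@dataclass
class EscalationLevel:
    responders: list
    # Minutes to wait before escalating to the next level.
    escalate_after_minutes: Optional[int] = None


def check_policy(service, levels, window_start, window_end, required_levels=3):
    """Return a list of human-readable problems found in one escalation policy."""
    problems = []
    if len(levels) < required_levels:
        problems.append(
            f"{service}: {len(levels)} escalation level(s) defined, expected {required_levels}"
        )
    for index, level in enumerate(levels, start=1):
        # The last level has nowhere to escalate to, so it needs no timeout.
        if level.escalate_after_minutes is None and index < len(levels):
            problems.append(f"{service}: level {index} has no escalation timeout")
        for person in level.responders:
            if not person.phone.strip():
                problems.append(f"{service}: {person.name} (level {index}) has no phone number")
            for start, end in person.out_of_office:
                # Two date ranges overlap when each starts before the other ends.
                if start <= window_end and end >= window_start:
                    problems.append(
                        f"{service}: {person.name} (level {index}) is out of office {start} to {end}"
                    )
    return problems


# Made-up example: two levels, one responder without a phone who is also away.
EXAMPLE_LEVELS = [
    EscalationLevel([Responder("Avery", "+1-555-0101")], escalate_after_minutes=10),
    EscalationLevel([Responder("Blake", "", [(date(2025, 11, 26), date(2025, 11, 30))])]),
]

if __name__ == "__main__":
    # Example window only; adjust to the dates your team considers peak.
    for problem in check_policy("payments", EXAMPLE_LEVELS, date(2025, 11, 24), date(2025, 11, 30)):
        print(problem)
```

A script like this only confirms that the data is complete. It does not replace actually paging someone to prove the number works.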

A manual review will get you through Black Friday, but you've probably already noticed how much time goes into keeping this information up to date. Automated systems can ensure your incident management data stays current and accessible without the overhead.

3. Create and test critical runbooks

When your database is struggling under load or your payment gateway is timing out, you need clear, tested procedures for immediate response. Generic troubleshooting and educated guesses won't cut it during peak traffic.

During an incident, clarity of action is everything. Well-tested runbooks mean faster resolution and less panic. Poor runbooks mean longer outages and more stress.

Document your critical failure scenarios

Create runbooks for your most likely emergencies:

  • Rollback procedures: How to quickly revert to the previous stable version

  • Service restart protocols: When and how to restart services without causing cascading failures

  • Infrastructure scaling: How to add capacity when systems are under stress

  • Circuit breaker activation: When and how to trip a circuit breaker and fail over to backup systems

Run through each procedure with your team to identify gaps, unclear steps, or missing permissions. You know how this goes: runbooks that work in theory often fail under pressure. Store these runbooks where they're easily accessible during an incident. Whether that's a shared drive, wiki, or service catalog, make sure the links are current and the content is up to date.
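Of the scenarios listed above, circuit breaker activation is the one most often backed by code rather than a purely manual step, so it helps if the runbook author knows roughly what the mechanism does. The sketch below is a simplified, single-threaded illustration and not a production implementation: the `CircuitBreaker` class, its thresholds, and the `primary`/`fallback` callables are all assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then retries after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let a trial call through to the primary.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary, fallback):
        """Call `primary`, or `fallback` while the breaker is open or when `primary` fails."""
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        # A success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a runbook, the matching entries would state the thresholds in use, how to tell that the breaker has tripped, and who decides when traffic goes back to the primary system.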

Sprinting towards minimum viable preparation

If you implement these three steps, you'll have the essential foundation for Black Friday readiness:

  1. Clear ownership of every critical service

  2. Tested incident management with verified contact information

  3. Accessible runbooks for common failure scenarios

We all know this is triage to survive Black Friday, not a transformative exercise that lays the foundation for future peak sales periods. But given the timeline, these steps will protect your business during peak season and give you what you need to respond quickly when problems arise.

Surviving this Black Friday while laying the foundation for future peak events

These three steps will get you through Black Friday, but they also highlight what you already know: manual processes don't scale. As your team grows and your systems become more complex, spreadsheets and manual tracking become increasingly difficult to maintain.

The teams that handle peak season consistently well have automated these processes. They use service catalogs that automatically maintain ownership information, scorecards that track incident management readiness, and integrated systems that keep runbooks current and accessible.

For now, execute on what you can control. Black Friday is coming, and your customers are counting on you to be ready.

Your next steps to ensure a smooth(er) Black Friday

If you're implementing these steps and realizing how much manual work is involved, you're not alone. Many engineering teams discover that preparing for high-traffic events exposes the limitations of their current processes.

Cortex can help automate and streamline these preparations, making Black Friday readiness and ongoing reliability much easier to maintain. We help engineering teams maintain service ownership, incident management, and runbook documentation automatically. Learn more or schedule a demo to see how we can help you prepare for peak season.
