
Staying Calm Under Pressure: Lessons from Scaling Systems


In technology, small issues often lurk unnoticed until scale exposes them. Recently, while working through a growth push, a seemingly minor quirk turned into a major blocker under higher volumes. As a team, we dug in and found the root cause, then aligned on a long-term fix. Moments like these remind me of when I was first a manager in eCommerce.

A senior leader once told me: "You're cool as a cucumber." At the time, I thought it was a strange comment. Staying calm under pressure felt natural, but he made it sound rare.

Over time, I realized what he meant: the ability to stay steady and use data made me a reliable fix-it leader. Diagnosing issues systematically became my strength. That perspective left an indelible mark on how I approach challenges. But there's more to it than temperament—there's a methodology underneath that calm exterior.

The Anatomy of Staying Cool

What looks like unflappable calm is actually a practiced system. When crisis hits, most people's first instinct is to act—to try something, fix something, do something. The pressure to resolve immediately is intense, especially when customers are affected or executives are watching.

But rushing to action without understanding creates its own chaos. You end up chasing symptoms instead of root causes, implementing fixes that address what you can see rather than what's actually broken. The counterintuitive move is to slow down first—not to delay resolution, but to ensure you're solving the right problem.

This is where my neuroscience background intersects with engineering leadership in unexpected ways. The brain under stress narrows its focus and prioritizes speed over accuracy. It latches onto the most available explanation rather than the most likely one. Knowing this, I've learned to recognize the signs in myself and my teams: when we start pattern-matching too quickly, when we stop asking questions, when consensus forms suspiciously fast.

The antidote is systematic diagnosis. Data grounds you. Hypotheses force clarity. Testing one variable at a time creates learning even when you're wrong.

The Loyalty Integration Puzzle

One of the most formative projects in my career involved integrating legacy loyalty systems into a modern eCommerce stack. These systems were built for point-of-sale transactions—not real-time, high-volume digital interactions. The architecture mismatch was obvious from day one: batch processing meeting streaming data, eventual consistency colliding with customer expectations of instant gratification.

To make them work, we had to build buffers and caches. We added queues to bridge the gap. We load-tested, we optimized, we thought we understood the system's behavior. Then we went live.

Customers' loyalty points were showing up inconsistently—sometimes accurate, sometimes not. The pattern made no sense. A customer would check their balance, see 1,000 points, make a purchase using 500 points, then refresh and see 1,000 points again. Or they'd see 500. Or 250. It was nondeterministic in a way that suggested race conditions, but none of our traces supported that theory.

We tore through network traces and latency distributions. We checked queue configurations and timing issues in the cache invalidation. We verified that the message queue was processing in order. We reviewed the loyalty API's documentation for edge cases we might have missed. Nothing fit the pattern.

The Question That Mattered

The team was starting to spiral—theories were getting more baroque, someone suggested rebuilding the entire caching layer, tension was rising. That's when I noticed we were all solving for the architecture we'd built rather than questioning our assumptions about the underlying systems.

Almost as an aside, I suggested: "What if one of the databases is receiving writes but not propagating them to the reads?"

The room went quiet. We'd been so focused on what we'd built that we hadn't questioned what we were integrating with. Our architect ran a quick check. Within minutes, he confirmed it: the legacy loyalty system had a primary-replica database configuration, and some of the replicas had replication lag that wasn't being monitored. Writes were going to the primary, but our reads were hitting replicas that were seconds to minutes behind.

We weren't dealing with a cache problem, a queue problem, or a network problem. We were dealing with a fundamental architectural assumption: that reads and writes hitting different databases would be consistent. The legacy system had been designed for batch reconciliation overnight, where eventual consistency over hours was acceptable. We'd grafted real-time expectations onto a system never built for them.
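The failure mode can be reproduced in a few lines. This is a toy model, not the production system: writes land on a primary, while reads are served by replicas that trail the primary's write log by varying amounts. The class and variable names are illustrative.

```python
# Toy model of replication lag: writes go to the primary, reads hit
# replicas that may be several replication events behind.

class Primary:
    def __init__(self, balance):
        self.balance = balance
        self.oplog = []          # ordered list of applied deltas

    def write(self, delta):
        self.balance += delta
        self.oplog.append(delta)

class Replica:
    def __init__(self, balance, lag):
        self.balance = balance
        self.applied = 0         # how far into the oplog we've replayed
        self.lag = lag           # number of writes we stay behind

    def read(self, primary):
        # Replay everything except the last `lag` writes.
        target = max(0, len(primary.oplog) - self.lag)
        while self.applied < target:
            self.balance += primary.oplog[self.applied]
            self.applied += 1
        return self.balance

primary = Primary(1000)
replicas = [Replica(1000, lag) for lag in (0, 1, 2)]

primary.write(-500)              # customer redeems 500 points
primary.write(-250)              # then another 250

# Depending on which replica serves the read, the customer sees
# 250, 500, or 1000 points.
balances = sorted(r.read(primary) for r in replicas)
print(balances)                  # [250, 500, 1000]
```

The nondeterminism we chased for days falls out immediately: the answer depends entirely on which replica happens to serve the read.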

The Fix and the Framework

The immediate fix was straightforward: route reads to the primary database during checkout flows, where consistency was critical. We accepted replication lag for less time-sensitive queries and added monitoring for replication delay so we'd know when the system was operating outside acceptable parameters.

But the deeper lesson was about diagnostic methodology. We found the answer when we stopped defending our hypotheses and started questioning our assumptions. The ability to stay calm wasn't about having a naturally unflappable personality—it was about having a framework that created psychological safety for the team to explore uncomfortable possibilities.

That framework looked like this:

  • Observe without judgment: Describe what we're seeing, not what we think it means
  • Generate multiple hypotheses: Resist premature convergence on the obvious explanation
  • Test assumptions systematically: What are we taking for granted that might be wrong?
  • Create space for lateral thinking: The person with the "stupid question" might be seeing something everyone else misses

Lessons That Last

Looking back, that project wasn't just about fixing a bug. It was about modernizing a system and breaking down silos. We proved that legacy infrastructure and modern digital platforms could work together. The bonds formed through those challenges still stand out in my career—there's something about solving hard problems together that creates lasting trust.

But it also shaped how I think about engineering leadership today. The technical skills matter—you need to understand distributed systems, database replication, and caching strategies. But the meta-skill is creating the conditions where teams can think clearly under pressure.

This is what I bring to decision science consulting now: the ability to diagnose technical problems and the framework for helping engineering leaders build teams that stay effective. When complexity increases and stakes rise, the best outcomes come from teams with psychological safety to question assumptions and the discipline to follow evidence.

The Pattern Repeats

That recent growth push issue I mentioned? Same pattern, different domain. The team was under pressure and theories were proliferating. Someone needed to create the space for systematic diagnosis. We slowed down enough to understand what was actually happening rather than what we assumed was happening. The fix, once we found it, was straightforward.

Staying cool as a cucumber isn't about lacking stress or never feeling urgency. It's about having internalized a methodology so deeply that it activates automatically under pressure. It's about recognizing when your brain is trying to take shortcuts and consciously choosing the longer path of systematic understanding.

And it's about building teams where everyone can do the same.