
Workflow Design

Designing for Exceptions (Not Just the Happy Path)

Most workflow automation is built around the ideal case: the complete, categorizable request that resolves without complications. Real service work doesn't run that way. Here's how to design systems that stay reliable when things get messy.

By DiamondSoft Technology | 7 min read

The Bottom Line, First

Most automation is designed for the easy case. The request that arrives complete, fits a known category, and gets handled without a hitch. That case exists, and on a good day, it describes most of what comes in. But the messy cases don’t disappear just because your system wasn’t built for them. They become fires. The teams that build reliable operations design for those moments upfront, not after the first time they cause a problem.

And now for the details.

The happy path is not a workflow

When a team sits down to design a workflow, they naturally start from the best case. A complete request arrives. It’s clear what type it is. It goes to the right person. That person handles it. Done.

That workflow is real. It probably describes a majority of what comes in.

The problem is that a system designed only for that case isn’t reliable. It’s reliable for the good days. When a request shows up with missing information, or sits in an ambiguous category, or arrives at a moment when the usual owner is unavailable, the system has no answer. The request doesn’t route itself. It waits. Or it gets handled inconsistently. Or it quietly gets dropped.

In a service business, the messy cases aren’t rare. They’re built into the volume. Designing only for the ideal scenario means accepting that everything else will be handled by improvisation.

What an exception actually is

“Exception” tends to sound like a rare edge case. The kind of thing you deal with manually when it comes up twice a year. That framing undersells the problem.

An exception is any situation where the standard workflow doesn’t apply cleanly. That covers a lot of territory:

Incomplete information. The request arrived, but something is missing. A client name, a policy number, a description of scope. The system can’t classify or route it without knowing what it actually is.

Ambiguous category. It could be one type of request or another. Or it combines elements of both. Your routing rules weren’t written for this one, and the system doesn’t know what to do with it.

Unusual urgency. Routine work that suddenly needs to happen in two hours. Same request type, but the normal handling timeline is wrong. Nothing in the standard workflow signals that.

No clear owner. The request arrived during a coverage gap, involves a service area you added recently, or falls between two roles. The standard assignment logic doesn’t produce a usable answer.

None of these are rare. If you handle any significant volume of service requests, you see all of them regularly. The question isn’t whether they’ll happen. It’s whether your system is ready when they do.
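The four exception types above can be made concrete as a detection pass that runs before normal routing. The sketch below is illustrative, not a prescribed implementation: the field names (`client_name`, `category_scores`, `deadline_hours`, `assigned_owner`) and the thresholds (`CONFIDENCE_GAP`, `URGENT_WINDOW`) are assumptions standing in for whatever your intake system actually captures.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional

class ExceptionKind(Enum):
    INCOMPLETE = auto()   # required fields missing
    AMBIGUOUS = auto()    # no single category wins decisively
    URGENT = auto()       # deadline inside the urgent window
    UNOWNED = auto()      # assignment logic produced no owner

@dataclass
class Request:
    client_name: Optional[str] = None
    category_scores: Dict[str, float] = field(default_factory=dict)
    deadline_hours: Optional[float] = None
    assigned_owner: Optional[str] = None

REQUIRED_FIELDS = ("client_name",)  # assumption: adjust per workflow
CONFIDENCE_GAP = 0.2   # how decisively the top category must beat the runner-up
URGENT_WINDOW = 4.0    # hours; below this, normal timelines are wrong

def find_exceptions(req: Request) -> List[ExceptionKind]:
    """Flag every way this request deviates from the standard workflow."""
    flags = []
    if any(getattr(req, f) is None for f in REQUIRED_FIELDS):
        flags.append(ExceptionKind.INCOMPLETE)
    ranked = sorted(req.category_scores.values(), reverse=True)
    if not ranked or (len(ranked) > 1 and ranked[0] - ranked[1] < CONFIDENCE_GAP):
        flags.append(ExceptionKind.AMBIGUOUS)
    if req.deadline_hours is not None and req.deadline_hours <= URGENT_WINDOW:
        flags.append(ExceptionKind.URGENT)
    if req.assigned_owner is None:
        flags.append(ExceptionKind.UNOWNED)
    return flags
```

The point of a pass like this is that a request can carry several flags at once; each flag maps to a designed response rather than an improvised one.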

What happens when exceptions aren’t designed for

When a request hits a gap in the workflow, one of a few things tends to happen.

It sits. Nobody owns it because the system didn’t assign it. It ages until someone notices, or until a client follows up to ask where things stand.

It gets handled inconsistently. Different people deal with the same unusual situation differently, with no record of what was decided or why. The next time it comes up, the team starts from scratch.

It becomes someone’s manual job. Usually the most experienced person on the team, or the owner. They become the unofficial handler of everything the system can’t manage, which means they’re absorbing exactly the kind of interruptions the workflow was supposed to eliminate.

Over time, these workarounds become invisible. Things get resolved, so the system appears to be working. But reliability is coming from people filling in the gaps, not from the system itself. That’s a fragile foundation, and it doesn’t scale.

This is one of the most common reasons automation projects fail to deliver on their promise: the workflow works on the happy path, breaks on the edges, and the edges are where the real operational pain lives.

Escalation is a feature, not a failure

The most important shift in thinking: escalation isn’t what happens when a system breaks down. It’s part of how a reliable system works.

A well-designed workflow includes explicit handling for the situations it can't resolve automatically. When a request arrives incomplete, the system prompts for the missing information and sets a timeline. If the information doesn't arrive, it escalates to a person. When a request doesn't match any known category, it routes to someone for classification rather than disappearing into a queue. When urgency signals are present (specific language, deadline dates, client flags), the workflow recognizes them and responds.

None of this requires complicated logic. It requires deciding in advance: what do we do when this situation comes up? That decision, made once during design, converts an ad hoc workaround into a repeatable procedure.

The humans who handle those escalations aren’t evidence that the system failed. They’re part of the system. Keeping people in control at the points where judgment matters is how automation earns trust over time. Assistants handle what’s predictable. People handle what isn’t. The job of the system is to make sure nothing falls between those two.
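The "decide in advance" logic above can be sketched as a handler in which every branch ends in a named, owned outcome, including the ones a person has to finish. This is a minimal sketch under stated assumptions: the request shape (a plain dict with `missing_fields`, `category`, `urgent` keys), the 24-hour clarification window, and the in-memory escalation queue are all placeholders for your own system's equivalents.

```python
from datetime import datetime, timedelta

ESCALATION_QUEUE = []  # stand-in for wherever flagged requests land for review
CLARIFICATION_WINDOW = timedelta(hours=24)  # assumption: tune per workflow

def handle(request: dict, now: datetime) -> str:
    """Route one request; no branch is allowed to end in 'it just sits'."""
    if request.get("missing_fields"):
        # Prompt once, start the clock; escalate when the window closes.
        asked_at = request.get("clarification_sent_at")
        if asked_at is None:
            request["clarification_sent_at"] = now
            return "awaiting-info"
        if now - asked_at > CLARIFICATION_WINDOW:
            ESCALATION_QUEUE.append(("incomplete", request))
            return "escalated"
        return "awaiting-info"
    if request.get("category") is None:
        # Unknown type: goes to a person for classification, not a dead queue.
        ESCALATION_QUEUE.append(("needs-classification", request))
        return "escalated"
    if request.get("urgent"):
        # Urgency signal recognized: surface it immediately.
        ESCALATION_QUEUE.append(("urgent", request))
        return "escalated"
    return "routed"  # the happy path is just one branch among several
```

Note that "escalated" here is a designed outcome, not an error state: the human review queue is part of the workflow's contract, which is exactly the shift the section argues for.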

Finding your exceptions before they find you

The most common objection to designing for exceptions is that you can’t anticipate everything. That’s true. But you can anticipate most things, and that’s enough to build a substantially more reliable system.

Start by asking your team: what happens when a request comes in that doesn’t fit the normal process? You’ll get immediate answers. The people who do this work know exactly which situations cause problems. They’ve been handling them manually for months or years. They can tell you where the gaps are. Write those down.

Then look at recent history. Which requests required extra back-and-forth before they could move forward? Which ones stalled waiting for clarification? Which ones got routed to the wrong person first? Those patterns are already visible. The exceptions have been happening. You’re just making them explicit.

You don’t need to solve every edge case before you start. This is part of why starting with a smaller, well-defined pilot workflow makes sense: limited scope means a controlled environment where you can discover which exceptions actually show up in practice. You learn from real volume without exposing your entire operation to an untested system.

What it looks like when exceptions are handled well

The signal that exception handling is working isn’t that exceptions stop happening. It’s that they stop being fires.

When unusual situations have designed paths, they follow them. Incomplete requests get resolved or escalated on a clear timeline. Ambiguous cases go to a person and come back with a decision. Urgent work surfaces early instead of arriving at the last minute. Your team still handles edge cases. They just handle them deliberately, not reactively.

That’s the practical difference. Not a system that eliminates human judgment, but a system that uses human judgment where it actually matters. Everything else runs consistently. The edges don’t bring the whole operation down.

At DST, designing for exceptions is part of the work from the first conversation. We map edge cases alongside the standard workflow because that’s where most of the real reliability problems live. Whether you work with us or not, the principle holds: a system that handles only the good days is a good-day system. Build for the messy days, and reliability follows.

Ready to make work reliable?

Let's talk. We'll start small and prove it works.

Talk to DST