The Test Pyramid Was a Lie (Or at Least an Oversimplification)
- Phil Hargreaves
For years, we’ve been told to follow the “Test Pyramid.”

Lots of unit tests. Fewer integration tests. Even fewer end-to-end tests.
It’s been presented as an almost unquestionable truth - a universal law of good engineering.
But what if the test pyramid was never a law?
What if it is a simplification that became dogma?
And, more importantly: which tests actually matter?
Where the Test Pyramid Came From
The concept, popularised by Mike Cohn in Succeeding with Agile, was meant to guide teams away from slow, brittle, UI-heavy test suites.
The idea was simple:
Unit tests are fast and cheap → have many.
Integration tests are slower → have fewer.
End-to-end tests are slow and fragile → have very few.
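Those three layers can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names, not a real suite: the point is only the cost asymmetry the pyramid encodes.

```python
# Sketch of the pyramid's economics (hypothetical names, no framework):
# many fast unit tests over pure logic, very few slow end-to-end tests.

def apply_discount(price: float, pct: float) -> float:
    """Tiny pure function - the kind the pyramid says to test heavily."""
    return round(price * (1 - pct / 100), 2)

# Unit tests: milliseconds each, so we can afford hundreds.
def test_discount_basic():
    assert apply_discount(100.0, 10) == 90.0

def test_discount_zero():
    assert apply_discount(100.0, 0) == 100.0

# End-to-end: seconds-to-minutes each and fragile, so the pyramid
# says keep only a handful for the most important journeys.
def test_checkout_end_to_end():
    ...  # would drive a real browser/API against a deployed environment

test_discount_basic()
test_discount_zero()
```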
As a principle of feedback speed and maintainability, this made sense.
But somewhere along the way, a heuristic became a rigid rule.
And that’s where the problems began.
The Pyramid Optimises for Cost - Not Risk
The pyramid is fundamentally an economic model.
It optimises:
Execution speed
Maintenance cost
Developer productivity
It does not optimise for:
Business risk
Customer impact
Revenue exposure
Security threats
System complexity
You can have thousands of unit tests and still ship a catastrophic production defect.
Why?
Because the pyramid optimises for test quantity by layer, not risk coverage.
And risk is what actually matters.
The Illusion of Safety
Many teams proudly report:
90%+ unit test coverage
Fast CI pipelines
A “healthy” pyramid shape
And yet:
Payment systems fail in production
Authentication breaks after refactors
Critical user journeys silently degrade
Integrations collapse under real-world conditions
The pyramid never promised safety. It promised speed and maintainability.
Did we just assume safety came with it?
Modern Systems Don’t Fit the Pyramid Model
When the pyramid was introduced, systems were:
More monolithic
Less distributed
Less dependent on third parties
Less API-driven
Less cloud-native
Today’s systems are:
Microservice-heavy
Event-driven
Dependent on external APIs
Continuously deployed
Highly integrated
Where do you put:
Contract testing?
Observability validation?
Data pipeline verification?
Infrastructure-as-code validation?
Machine Learning model validation?
And that’s before we even start thinking about how and where to leverage AI to make our approach smarter, faster, and more predictive.
They don’t neatly stack into a pyramid. The model feels increasingly artificial.
The Real Question: What Testing Reduces Risk?
Instead of asking:
“Do we have enough unit tests?”
We should be asking:
“What could hurt the business most, and how are we validating that it won’t?”
That shifts the conversation from structure to impact.
Testing that truly matters often includes:
1. Critical Path Validation
Does the primary revenue-generating/critical workflow work under realistic conditions?
2. Integration Confidence
Do your services behave correctly together - not just in isolation?
3. Contract and Schema Protection
Are changes breaking downstream consumers?
4. Resilience and Failure Testing
What happens when dependencies fail?
5. Security Testing
How could this system be exploited?
6. Production Monitoring as Testing
Are you detecting real-world failures quickly?
There are, of course, others, e.g., usability.
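Contract and schema protection, in particular, can be made concrete in a few lines. The sketch below is illustrative only - the payload, field names, and helper are assumptions, not a real contract-testing tool - but it shows the core idea: check a producer’s output against what a downstream consumer relies on, before deploying.

```python
# Minimal sketch of contract/schema protection (hypothetical payload):
# verify that a producer's response still satisfies what a downstream
# consumer relies on, before the change ships.

CONSUMER_CONTRACT = {   # fields the consumer reads, with expected types
    "order_id": str,
    "total": float,
    "currency": str,
}

def satisfies_contract(payload: dict, contract: dict) -> list[str]:
    """Return a list of contract violations (empty means compatible)."""
    violations = []
    for field, expected_type in contract.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations

# A producer change that renames `total` to `amount` is caught here,
# not by a downstream outage.
new_response = {"order_id": "A-1", "amount": 42.0, "currency": "GBP"}
print(satisfies_contract(new_response, CONSUMER_CONTRACT))
# -> ['missing field: total']
```

Dedicated tools (e.g. consumer-driven contract testing frameworks) do this with far more rigour, but the principle is the same.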
None of these are about the testing pyramid. They’re about risk management, and there is so much to consider.
The Pyramid Encouraged the Wrong Metric
The most damaging side effect of the test pyramid wasn’t its structure. It was what it taught teams to measure.
Teams started tracking:
Test count
Coverage percentage
Layer distribution
Instead of:
Risk exposure
Defect leakage
Incident severity
Customer impact
Coverage is easy to measure. Risk reduction is harder. So we optimised for the easiest metric.
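Outcome metrics are harder than coverage, but not impossible. One from the list above, defect leakage, has a common definition - the share of all found defects that escaped to production. A tiny sketch (the formula is a widely used one, not from this post):

```python
# Defect leakage: fraction of all defects that reached production.
# A risk-focused outcome metric, unlike test count or coverage %.

def defect_leakage(found_in_prod: int, found_before_release: int) -> float:
    """Return the share of defects that escaped to production."""
    total = found_in_prod + found_before_release
    return found_in_prod / total if total else 0.0

# 3 escaped, 27 caught before release -> 10% leakage.
print(defect_leakage(3, 27))  # -> 0.1
```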
In a post about OKRs, I wrote about how output measures often dominate reporting because they are simple, familiar, and readily available - yet it’s outcomes that matter more to any business. The same applies here.
What Actually Matters in Testing
If we strip away the pyramid metaphor, what remains?
Testing that matters is:
Risk-Focused
It protects what would hurt most if it failed.
Behaviour-Driven
It validates real user workflows, not just code paths.
Change-Aware
It increases scrutiny where code is changing or least predictable.
System-Level Conscious
It acknowledges distributed complexity.
Economically Rational
It balances the cost of testing against the cost of failure.
Notice what’s missing? There’s no one-size-fits-all view.
A Better Mental Model: The Risk Radar
Instead of a pyramid, imagine a radar.
Each release introduces potential exposure in different directions:
Revenue
Security
Performance
Reliability
Compliance
Reputation
Testing effort expands outward where risk signals are strongest.
Some releases may need deep unit validation. Others may need heavy integration scrutiny. Others may need chaos testing or load testing.
Your approach changes release to release. Because risk changes.
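The radar can be sketched as a simple selection rule: score each release’s exposure per dimension, then spend testing effort where the signal is strongest. Everything below is illustrative - the suite names, scores, and threshold are assumptions, not a standard taxonomy.

```python
# Hypothetical "risk radar": score a release's exposure per dimension,
# then let the strongest signals decide where testing effort goes.

RISK_DIMENSIONS = ["revenue", "security", "performance",
                   "reliability", "compliance", "reputation"]

# Illustrative mapping from risk dimension to the kind of validation
# that reduces it (names are assumptions, not a fixed taxonomy).
SUITES = {
    "revenue": "critical-path e2e tests",
    "security": "penetration / abuse-case tests",
    "performance": "load tests",
    "reliability": "chaos / failure-injection tests",
    "compliance": "audit and data-handling checks",
    "reputation": "usability and accessibility review",
}

def plan_testing(risk_scores: dict, threshold: float = 0.5) -> list[str]:
    """Pick validation activities where this release's risk is highest."""
    return [SUITES[dim] for dim in RISK_DIMENSIONS
            if risk_scores.get(dim, 0.0) >= threshold]

# A payments refactor and a copy change produce different plans:
print(plan_testing({"revenue": 0.9, "reliability": 0.7}))
# -> ['critical-path e2e tests', 'chaos / failure-injection tests']
```

The shape of the plan changes with the scores - which is exactly the point: no fixed layer ratio survives contact with a real release.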
So… Was the Pyramid a Myth?
Not entirely.
It was useful. It corrected an over-reliance on UI tests. It improved feedback loops. But it was never a universal blueprint.
The mistake wasn’t the pyramid. The mistake was treating it as a rule instead of guidance.
The Future of Testing Isn’t One Shape
As systems grow more complex and interconnected, rigid models become less helpful.
What matters is:
Understanding where failure hurts most
Aligning validation effort with impact
Measuring real-world outcomes
Adapting continuously
Testing isn’t about building the right shape.
It’s about reducing meaningful risk.
And risk doesn’t stack neatly into layers.
Lastly
If your team stopped using the phrase “test pyramid” tomorrow, what would change?
If the answer is “nothing,” then it was never your strategy. It was just a diagram.
Diagrams don’t protect production systems. Intentional, risk-aligned validation does.
