The frequency trap
- Moved from biweekly deployments to daily deployments over 6 months
- Celebrated the velocity improvement at the all-hands
- Saw production incidents increase from 1–2 per month to 5–8 per month
- Spent 40% more engineering time on incident response than in the prior quarter
- Had no single metric that showed the trade-off they were making
How to read change failure rate as a leading indicator
- CI pass rate trend. A declining CI pass rate over the last 5–10 deployments is a leading indicator that test quality is degrading. Teams that ignore a CI pass rate dropping from 94% to 88% over two weeks reliably see a change failure rate spike in the subsequent month.
- Flaky test rate. Flaky tests erode the reliability of the CI signal. When engineers learn that CI failures are "probably just flaky," they start merging on red. That behavior is invisible in deployment frequency metrics but shows up directly in change failure rate.
- PR review time compression. When deployment pace increases without a corresponding increase in review capacity, PR review time drops. PRs reviewed in under 30 minutes by a single reviewer have a higher post-deploy defect rate than PRs reviewed over 2+ hours by two reviewers. Speed pressure on the review queue is a quality predictor.
- Hotfix frequency. How often in the last 30 days did you deploy a hotfix — a commit that exists specifically to patch something broken in a previous deploy? One hotfix per month is normal. Five is a signal that your deployment quality gate is not filtering enough.
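Two of the indicators above, the CI pass rate trend and the 30-day hotfix count, can be computed from a plain deploy log. The following is a minimal sketch, not a definitive implementation: the `Deploy` record and its field names are hypothetical, and comparing the latest five deploys against the five before them is an assumed reading of "the last 5–10 deployments".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Deploy:
    # Hypothetical record shape; adapt to whatever your CI/CD system exports.
    day: date
    ci_runs: int      # CI runs associated with this deploy
    ci_passes: int    # how many of those runs passed
    is_hotfix: bool   # True if the deploy only patches an earlier deploy


def pass_rate(deploys):
    """Fraction of CI runs that passed across the given deploys."""
    runs = sum(d.ci_runs for d in deploys)
    return sum(d.ci_passes for d in deploys) / runs if runs else 0.0


def pass_rate_change(deploys, window=5):
    """Pass rate of the latest `window` deploys minus the `window` before them.

    A negative result means the pass rate is declining. Returns None when
    there are fewer than 2 * window deploys to compare.
    """
    if len(deploys) < 2 * window:
        return None
    ordered = sorted(deploys, key=lambda d: d.day)
    recent = ordered[-window:]
    earlier = ordered[-2 * window:-window]
    return pass_rate(recent) - pass_rate(earlier)


def hotfixes_in_last_30_days(deploys, today):
    """Count hotfix deploys made within the 30 days before `today`."""
    cutoff = today - timedelta(days=30)
    return sum(1 for d in deploys if d.is_hotfix and d.day > cutoff)
```

A drop from 94% to 88% shows up here as a change of about -0.06. What counts as a hotfix still depends on someone setting `is_hotfix` consistently.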
Four signals that predict a quality regression before it hits production
- 1. CI pass rate on main below 90%. If your main branch is failing CI more than 10% of the time, the team has normalized broken main. Deploys from a broken main are unpredictable. This is a hard threshold — not a trend to watch but a condition to act on immediately.
- 2. PR cycle time below 4 hours with no corresponding test coverage increase. Fast cycle times are healthy when test coverage scales proportionally. Fast cycle times with flat or declining coverage mean the team is going faster with less verification. That's a quality regression in slow motion.
- 3. Deployment frequency acceleration without pipeline coverage expansion. Doubling deployment frequency should roughly double the number of code paths exercised per CI run. If frequency doubled but pipeline configuration stayed constant, each deploy is less tested than it was before.
- 4. Week-over-week increase in P1 alert volume from production. Not all P1 alerts indicate a quality regression — but a trend over 3+ weeks is hard to dismiss as noise. P1 alert volume is the canary for change failure rate before the rate itself is calculable.
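The four signals can be expressed as one check that returns whichever conditions currently hold. This is a sketch under stated assumptions: the function names, parameter names, and signal labels are hypothetical, and "a trend over 3+ weeks" is read here as three consecutive week-over-week increases in P1 alert volume.

```python
def p1_rising_streak(weekly_p1_counts):
    """Count consecutive week-over-week increases ending at the latest week."""
    streak = 0
    previous = reversed(weekly_p1_counts[:-1])
    current = reversed(weekly_p1_counts[1:])
    for prev, curr in zip(previous, current):
        if curr > prev:
            streak += 1
        else:
            break
    return streak


def regression_signals(main_pass_rate, pr_cycle_hours, coverage_delta,
                       deploy_freq_ratio, pipeline_expanded, weekly_p1_counts):
    """Return the names of the signals that are currently firing.

    main_pass_rate     CI pass rate on main, as a fraction (0.0 to 1.0)
    pr_cycle_hours     typical PR cycle time in hours
    coverage_delta     change in test coverage over the same period
    deploy_freq_ratio  current deploy frequency divided by the earlier one
    pipeline_expanded  True if pipeline coverage grew over the same period
    weekly_p1_counts   P1 alert counts per week, oldest first
    """
    signals = []
    if main_pass_rate < 0.90:
        signals.append("ci_pass_rate_below_90")
    if pr_cycle_hours < 4 and coverage_delta <= 0:
        signals.append("fast_cycle_time_without_coverage_growth")
    if deploy_freq_ratio > 1 and not pipeline_expanded:
        signals.append("frequency_up_pipeline_unchanged")
    if p1_rising_streak(weekly_p1_counts) >= 3:
        signals.append("p1_alerts_rising")
    return signals
```

Only the first check is the hard threshold described above. The other three are trends, so the returned list works better as a prompt for discussion than as an automatic deploy block.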
Building a deployment quality gate without slowing your pipeline
- Track CI pass rate per commit author over 30 days. Not to blame individuals — to identify which areas of the codebase have degrading test quality before they become deployment failures.
- Set a flaky test count threshold. When active flaky test count exceeds N, auto-create a team-level ticket to address the backlog. Flaky tests are not individual problems — they're a team quality signal.
- Surface the frequency/failure rate ratio weekly. An engineering manager who sees "deployment frequency +12%, change failure rate +8%" in their weekly health report can ask the right question before the next production incident.
- Require hotfix PRs to be tagged. Every hotfix creates a data point. A tagging requirement (even just a PR label) makes hotfix frequency trackable without adding process friction to normal deployments.
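Two of these gate components are simple enough to sketch: the flaky test threshold and the weekly frequency/failure rate line. The function names are hypothetical, the threshold is left as a parameter because the text does not fix a value for N, and the percentages are assumed to be relative week-over-week changes rather than percentage points.

```python
def needs_flaky_ticket(active_flaky_count, threshold):
    """True when the active flaky test count exceeds the team's threshold N."""
    return active_flaky_count > threshold


def pct_change(previous, current):
    """Relative change from previous to current, in percent (None if undefined)."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


def _fmt(value):
    # Render a signed whole-number percentage, or "n/a" when there is no baseline.
    return "n/a" if value is None else f"{value:+.0f}%"


def weekly_health_line(prev_deploys, curr_deploys, prev_cfr, curr_cfr):
    """Build the one-line frequency/failure summary for a weekly report."""
    freq = pct_change(prev_deploys, curr_deploys)
    cfr = pct_change(prev_cfr, curr_cfr)
    return f"deployment frequency {_fmt(freq)}, change failure rate {_fmt(cfr)}"


print(weekly_health_line(25, 28, 0.10, 0.108))
```

Putting both numbers on one line is the point of the design: neither figure says much alone, but together they show whether failures are growing along with frequency.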
The right ratio: how elite teams balance frequency and stability
The benchmarks for a team operating in the elite band:
- Deployment frequency: Multiple deploys per day (or at minimum, daily)
- Change failure rate: <5% of deploys require rollback or hotfix
- Mean time to restore: <1 hour when a failure does occur
- CI pass rate on main: >95% rolling 30-day average
- Flaky test count: <5 known flaky tests in the active suite
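The benchmarks above translate directly into a checklist. The sketch below uses hypothetical function and key names, and it treats "at minimum, daily" as an average of at least one deploy per day.

```python
def elite_band_report(deploys_per_day, change_failure_rate, mttr_hours,
                      main_pass_rate_30d, flaky_test_count):
    """Map each benchmark to True or False for the given team metrics.

    Rates are fractions (0.05 means 5%); mttr_hours is mean time to restore.
    """
    return {
        "deployment_frequency": deploys_per_day >= 1,
        "change_failure_rate": change_failure_rate < 0.05,
        "mean_time_to_restore": mttr_hours < 1,
        "ci_pass_rate_on_main": main_pass_rate_30d > 0.95,
        "flaky_test_count": flaky_test_count < 5,
    }


def in_elite_band(**metrics):
    """True only when every benchmark in the report is met."""
    return all(elite_band_report(**metrics).values())
```

The per-benchmark report is usually the more useful output, because it shows which dimension is holding the team back.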
The four quadrants, by change failure rate (CFR) and deployment frequency:
- Slow and fragile (CFR over 15%, infrequent deploys). Rare deploys that still break often. Every release is high-stakes and under-tested.
- The frequency trap (CFR over 15%, frequent deploys). Shipping fast but breaking fast. Velocity celebrated while incidents quietly climb from 1–2 to 5–8 per month.
- Cautious (CFR under 5%, infrequent deploys). Stable but slow. Reliability is bought with infrequent, large-batch releases.
- Elite (CFR under 5%, frequent deploys). Multiple deploys per day with CFR under 5%. CI quality treated as infrastructure, not a project.
