
Why Sprint Estimates Are Always Wrong (And the Engineering Metrics That Fix Them)

March 24, 2026·7 min read·by Deviera Team

Sprint estimation is the ritual that every engineering team performs and almost none perfects. The failure mode is predictable: the team estimates based on planned work and ignores the unplanned overhead that will consume 20–40% of the sprint. CI failures, production incidents, review bottlenecks, stale PRs — none of these appear on the sprint board, but all of them eat into the sprint's effective capacity. Until you measure them, you can't account for them.

The real reason sprint estimates are wrong

The standard diagnosis for estimation failure is story points: teams use them inconsistently, velocity isn't stable, point inflation happens over time. Story points are genuinely problematic. But they're not the primary cause of sprint misses in most teams.

The primary cause is unplanned interrupt rate — the percentage of sprint capacity consumed by work that wasn't on the board when planning happened.

For a typical 10-person engineering team, the interrupt breakdown per sprint looks like this:

  • CI failures requiring investigation: 2–4 hours per sprint per engineer (varies by pipeline health)
  • Production incidents: 4–8 hours per sprint for the team when they occur (1–2 incidents/sprint is typical for teams without strong CI gates)
  • PR review requests outside planned work: 3–6 hours per engineer per sprint
  • Dependency unblocking: 2–4 hours per sprint per engineer (waiting on or unblocking other teams)
  • Context switches from stale PR escalations: 1–2 hours per sprint

Sum these up and a team can lose 30–40% of its sprint capacity to unplanned interrupts that appear nowhere in its sprint planning estimates. It plans 100% of capacity for planned work, then wonders why only 60–70% of it gets done.

The solution is not to estimate better. It's to measure your interrupt rate and plan capacity accordingly — then work to reduce the interrupt rate over time.

How to measure your team's true interrupt rate

You can't plan around an interrupt rate you haven't measured. The first step is establishing a baseline over 3–4 sprints. You don't need perfect precision; you need a directionally accurate picture of where the unplanned time is going.

The four interrupt categories to track:

  • CI failure response time. How many CI failures occurred in the sprint, and what was the total engineering time spent investigating them? A failed CI run that gets fixed in 15 minutes is different from one that requires 2 hours of root cause analysis. Track count and time separately.
  • Production incident time. Total hours the team spent on production incidents — from first alert to resolution. Include both the primary responder time and the secondary time of people who were pulled in to assist.
  • Review queue overflow. How many PRs per engineer per sprint arrived as review requests outside their planned sprint work? This is the "teammates asking for reviews" interrupt that's easy to overlook because it feels like normal work — but it's not in the sprint estimate.
  • Coordination overhead. Time spent unblocking dependencies, attending cross-team syncs that weren't in the sprint plan, or investigating shared infrastructure issues. This is the hardest to measure but often the largest category in multi-squad organizations.

After 3–4 sprints of tracking, calculate your interrupt rate: total interrupt hours ÷ total available sprint hours. A rate of 25–35% is typical for teams without strong CI automation. A rate above 40% indicates a structural problem: the team is spending nearly half its time on work that doesn't appear in planning.
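As a rough illustration of that calculation, here is a minimal Python sketch. The `SprintInterrupts` record, its field names, and the sample hours are all hypothetical (the 800 available hours assume 10 engineers at 80 hours each, which the article does not specify); only the formula and the 25–35% and 40% thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class SprintInterrupts:
    """Interrupt hours logged for one sprint (hypothetical record shape)."""
    ci_hours: float
    incident_hours: float
    review_hours: float
    coordination_hours: float
    available_hours: float  # total team hours available in the sprint

    def total_interrupt_hours(self) -> float:
        return (self.ci_hours + self.incident_hours
                + self.review_hours + self.coordination_hours)


def interrupt_rate(sprints: list[SprintInterrupts]) -> float:
    """Total interrupt hours divided by total available hours."""
    total_available = sum(s.available_hours for s in sprints)
    if total_available <= 0:
        raise ValueError("available hours must be positive")
    return sum(s.total_interrupt_hours() for s in sprints) / total_available


def classify(rate: float) -> str:
    """Label a rate using the thresholds quoted in the text."""
    if rate > 0.40:
        return "structural problem"
    if rate >= 0.25:
        return "typical without strong CI automation"
    return "below the typical range"


# Invented sample data for three tracked sprints.
sprints = [
    SprintInterrupts(ci_hours=60, incident_hours=40, review_hours=90,
                     coordination_hours=50, available_hours=800),
    SprintInterrupts(ci_hours=50, incident_hours=20, review_hours=100,
                     coordination_hours=70, available_hours=800),
    SprintInterrupts(ci_hours=70, incident_hours=60, review_hours=80,
                     coordination_hours=54, available_hours=800),
]
rate = interrupt_rate(sprints)
print(f"{rate:.0%}", classify(rate))  # 31% typical without strong CI automation
```

Pooling hours across sprints, rather than averaging per-sprint percentages, keeps a short sprint from counting as much as a full one.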

The four metrics that make sprint capacity predictable

Once you know your interrupt rate, you can plan around it. But the deeper goal is to reduce it — which requires tracking the metrics that predict interrupt rate changes before the sprint starts.

  • 1. CI pass rate (rolling 30-day average). Your CI pass rate is your single best predictor of CI-related interrupt time next sprint. A team at 95% CI pass rate has predictable, low interrupt time from CI. A team at 82% CI pass rate will spend significantly more sprint capacity on CI failure investigation than their estimates account for. Track the trend, not just the current number.
  • 2. Open production incidents at sprint start. Any production incident that's open when sprint planning happens will consume unplanned capacity during the sprint. "We'll finish the sprint work and get to the incident" is how incidents stay open for three sprints. Count open incidents at planning time and factor in at least 4 hours per open incident as reserved sprint capacity.
  • 3. PR review lag (90th percentile). If 90% of PRs get reviewed within 4 hours, review queue interrupts are a minor overhead. If the 90th-percentile review lag is 2+ days, engineers are being pulled out of planned work to review PRs that sat too long. High review lag compresses into interrupt bursts: an engineer suddenly owes four reviews in one day instead of a steady one per day.
  • 4. Stale PR count at sprint start. Every stale PR (>3 days without review activity) at sprint start represents pending context reload for the author and pending review time for a senior engineer. A sprint starting with 8 stale PRs will resolve most of them in the first 2 days — burning capacity that wasn't planned for.
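The four metrics above can be computed from fairly simple records. The sketch below is one possible way to do it in Python, assuming you can export CI runs as `(finished_at, passed)` pairs, review lags as hours, and a last-review-activity timestamp per open PR. Those input shapes and all function names are hypothetical; the 30-day window, the 3-day staleness cutoff, and the 4 hours per open incident come from the text.

```python
from datetime import datetime, timedelta


def ci_pass_rate(runs, now, window_days=30):
    """Share of passing CI runs in the rolling window.

    runs: iterable of (finished_at: datetime, passed: bool).
    Returns None when no run falls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [passed for finished_at, passed in runs if finished_at >= cutoff]
    if not recent:
        return None
    return sum(recent) / len(recent)


def p90(values):
    """Nearest-rank 90th percentile (one of several percentile definitions)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # integer ceil(0.9 * n)
    return ordered[rank - 1]


def stale_pr_count(last_review_activity, now, stale_after_days=3):
    """Count open PRs with more than `stale_after_days` since review activity."""
    cutoff = now - timedelta(days=stale_after_days)
    return sum(1 for ts in last_review_activity if ts < cutoff)


def reserved_incident_hours(open_incidents, hours_per_incident=4.0):
    """Capacity to reserve for incidents already open at planning time."""
    return open_incidents * hours_per_incident
```

Calling these once at sprint start and again at sprint close gives the trend the first metric asks for, not just a single reading.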

Building a feedback loop between sprint outcomes and future estimates

The most important practice in improving estimation accuracy is the sprint capacity retrospective — a dedicated 15-minute review at sprint close that answers one question: where did the time actually go?

Structure it around four buckets:

  • Planned work completed: story points or tickets finished
  • Planned work not completed: what was estimated but not delivered, and why
  • Unplanned work completed: CI failures, incidents, reviews, coordination — estimate hours
  • Process improvement impact: did last sprint's process changes reduce interrupt time?

After 3–4 sprints of this retrospective, two things become clear: your interrupt rate baseline (with enough confidence to factor it into planning), and whether the process changes you're making are actually reducing it.

Teams that do this consistently for 2 quarters typically improve sprint estimate accuracy by 35–45%. Not because they got better at estimating features — but because they stopped underestimating the overhead that was always there.
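One way to keep the retrospective data usable from sprint to sprint is to store each retro as a small record and derive the baseline from the last few. This is a sketch under assumptions: the `Retro` fields mirror the four buckets but are hypothetical names, the sample numbers are invented, and "improving" is defined here simply as the latest rate falling below the average of the earlier sprints in the window.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Retro:
    """One sprint capacity retrospective (hypothetical record shape)."""
    planned_done: int        # tickets or points finished
    planned_not_done: int    # tickets or points estimated but not delivered
    unplanned_hours: float   # CI failures, incidents, reviews, coordination
    available_hours: float   # total team hours in the sprint

    @property
    def interrupt_rate(self) -> float:
        return self.unplanned_hours / self.available_hours

    @property
    def completion_ratio(self) -> float:
        planned = self.planned_done + self.planned_not_done
        return self.planned_done / planned if planned else 0.0


def baseline(retros: list[Retro], window: int = 4) -> float:
    """Mean interrupt rate over the most recent `window` sprints."""
    if not retros:
        raise ValueError("need at least one retrospective")
    return mean(r.interrupt_rate for r in retros[-window:])


def is_improving(retros: list[Retro], window: int = 4) -> bool:
    """True when the latest sprint beat the average of the earlier ones."""
    recent = retros[-window:]
    if len(recent) < 2:
        return False
    earlier = mean(r.interrupt_rate for r in recent[:-1])
    return recent[-1].interrupt_rate < earlier


# Invented history: oldest sprint first.
history = [
    Retro(planned_done=20, planned_not_done=12, unplanned_hours=280, available_hours=800),
    Retro(planned_done=22, planned_not_done=10, unplanned_hours=256, available_hours=800),
    Retro(planned_done=24, planned_not_done=8, unplanned_hours=240, available_hours=800),
    Retro(planned_done=26, planned_not_done=6, unplanned_hours=216, available_hours=800),
]
print(f"baseline {baseline(history):.0%}, improving: {is_improving(history)}")
# baseline 31%, improving: True
```

The fourth bucket, process improvement impact, is the hardest to quantify; the `is_improving` check is only a coarse proxy for it.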

The sprint planning calibration checklist

Before sprint planning

  • Pull CI pass rate for the last 2 weeks and adjust the expected CI interrupt hours to match
  • Count open production incidents — reserve capacity for each
  • Count stale PRs — estimate review time to clear the backlog in sprint week 1
  • Check previous sprint interrupt rate — use as the baseline overhead deduction

During sprint planning

  • Explicitly name planned capacity as: total available hours × (1 - interrupt rate)
  • Reserve a "buffer pool" of hours (10–15%) for unplanned interrupts — make it visible, not hidden
  • If CI pass rate is below 90%, add explicit CI remediation work to the sprint board (not just buffer)
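The planning arithmetic in this checklist is small enough to sketch directly. The Python function below is illustrative only: `plan_capacity` and its parameter names are hypothetical, and it reports the incident reserve next to the planned hours instead of deciding which bucket it should be taken from. Whether the 10–15% buffer pool sits on top of the baseline deduction is a team decision and is not modeled here.

```python
def plan_capacity(available_hours, interrupt_rate, open_incidents=0,
                  ci_pass_rate=None, hours_per_incident=4.0, ci_gate=0.90):
    """Split sprint hours into planned work and visible reserves."""
    if not 0 <= interrupt_rate < 1:
        raise ValueError("interrupt_rate must be in [0, 1)")
    # Formula from the checklist: total available hours x (1 - interrupt rate).
    planned = available_hours * (1 - interrupt_rate)
    return {
        "planned_hours": planned,
        # The deducted hours, surfaced so the reserve is visible, not hidden.
        "interrupt_reserve_hours": available_hours - planned,
        # At least 4 hours per incident already open at planning time.
        "incident_reserve_hours": open_incidents * hours_per_incident,
        # Below the 90% gate, CI remediation belongs on the sprint board.
        "needs_ci_remediation": ci_pass_rate is not None and ci_pass_rate < ci_gate,
    }


plan = plan_capacity(available_hours=800, interrupt_rate=0.30,
                     open_incidents=2, ci_pass_rate=0.82)
for name, value in plan.items():
    print(name, value)
```

With these invented inputs, roughly 560 of the 800 hours are available for planned work, 8 hours are earmarked for the two open incidents, and the CI remediation flag is set.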

At sprint close

  • Record actual interrupt hours by category (CI, incidents, reviews, coordination)
  • Compare to the buffer pool — did actual interrupts exceed the reserve?
  • Update the interrupt rate baseline for next sprint planning
  • Identify the highest-interrupt category and add one process improvement action to the next sprint
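The close-out steps can be captured in a few lines as well. This is a sketch, assuming interrupt hours were logged per category during the sprint; `close_sprint`, the category keys, and the sample hours are hypothetical.

```python
def close_sprint(actual_hours_by_category, buffer_hours):
    """Compare logged interrupt hours with the reserve set at planning."""
    if not actual_hours_by_category:
        raise ValueError("need at least one interrupt category")
    total = sum(actual_hours_by_category.values())
    top = max(actual_hours_by_category, key=actual_hours_by_category.get)
    return {
        "total_interrupt_hours": total,
        "exceeded_buffer": total > buffer_hours,
        "overrun_hours": max(0.0, total - buffer_hours),
        # Candidate for the one process improvement action next sprint.
        "top_category": top,
    }


# Invented numbers for one sprint.
summary = close_sprint(
    {"ci": 70.0, "incidents": 60.0, "reviews": 80.0, "coordination": 54.0},
    buffer_hours=240.0,
)
print(summary)
```

Here the team logged 264 interrupt hours against a 240-hour reserve, so the reserve was exceeded by 24 hours and reviews were the largest category.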

The teams that plan most accurately aren't clairvoyant — they're instrumented. They know their interrupt rate because they measured it. They know their CI pass rate because they track it weekly. They know their stale PR count because it's on their dashboard, not hiding in a GitHub list they haven't checked this week.

Estimation accuracy is a data quality problem. Fix the data, and the estimates follow.

Before your next sprint planning, get a baseline on the two metrics that matter most:

  • CI Health Score Calculator — score your pipeline failure rate and recovery time, the primary source of unplanned sprint interrupts
  • PR Cycle Time Calculator — measure your review lag and merge gap, which drive the review queue overflow that inflates your interrupt rate
