
What is MTTR? Mean Time to Recovery for Engineering Teams

June 2, 2026 · 7 min read · by Ihab Hamdy
Mean time to recovery (MTTR) is one of the four DORA metrics. It measures how quickly you restore service after a production failure — the metric that answers "when something breaks, how fast are we back?"

What is MTTR?

MTTR stands for Mean Time To Recovery: the average time it takes to restore service after a failure in production. Together with change failure rate, it forms the stability half of DORA, while deployment frequency and lead time measure throughput.
The clock starts when a failure begins affecting users and stops when normal service is restored. The "mean" is taken across incidents over your measurement window.

MTTR benchmarks

From the 2024 State of DevOps Report:
  • Elite: Less than one hour
  • High: Less than one day
  • Medium: One day to one week
  • Low: More than one week
Figure: DORA mean time to recovery, by performance tier. The clock runs from a failure affecting users to normal service being restored. Source: 2024 State of DevOps Report (DORA).
Notice that throughput and stability are not in tension. Elite teams deploy more often and recover faster — because small, frequent changes are easier to diagnose and roll back than large, infrequent ones.
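
As a rough illustration, the tiers listed above can be expressed as a small lookup. This is a minimal sketch, not anything DORA publishes: the function name `dora_recovery_tier` is hypothetical, and the handling of exact boundaries (for instance, whether precisely one hour counts as Elite or High) is an assumption, since the tier descriptions don't spell it out.

```python
from datetime import timedelta


def dora_recovery_tier(mttr: timedelta) -> str:
    """Map a mean time to recovery onto the four tiers listed above.

    Boundary handling is an assumption: exactly one hour falls into High,
    exactly one day into Medium, and exactly one week stays in Medium.
    """
    if mttr < timedelta(0):
        raise ValueError("MTTR cannot be negative")
    if mttr < timedelta(hours=1):
        return "Elite"
    if mttr < timedelta(days=1):
        return "High"
    if mttr <= timedelta(weeks=1):
        return "Medium"
    return "Low"


print(dora_recovery_tier(timedelta(minutes=90)))  # High
```

A 90-minute MTTR lands in High because it misses the one-hour Elite cutoff but is well under a day.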

A note on the name

"MTTR" is ambiguous in the wider industry — it gets expanded as mean time to recovery, repair, respond, or resolve, which are genuinely different intervals. DORA uses recovery: time to restore service, not time to ship a permanent fix. Pick one definition and hold it constant, or your trend line is meaningless.

The MTTR formula

MTTR is a simple average. Over your measurement window, take the total time your service spent in a failed state and divide it by the number of incidents:

MTTR = total downtime ÷ number of incidents

If you had four incidents in a month totalling 6 hours of degraded service, your MTTR is 6 ÷ 4 = 1.5 hours. The clock for each incident starts when the failure begins affecting users and stops when normal service is restored, so the quality of your measurement depends on pinning down the start accurately. Teams that reconstruct incident start times from memory after the fact tend to understate MTTR, because the time they remember is usually when someone noticed the failure, not when users were first affected.
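
The same arithmetic can be sketched in code. The `Incident` class, the `mean_time_to_recovery` function, and the four timestamps below are made up for illustration; they are chosen so the downtimes add up to 6 hours across four incidents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    started: datetime   # failure begins affecting users
    restored: datetime  # normal service is restored

    @property
    def downtime(self) -> timedelta:
        return self.restored - self.started


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """MTTR = total downtime / number of incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined for a window with no incidents")
    total = sum((i.downtime for i in incidents), timedelta())
    return total / len(incidents)


# Hypothetical month: 0.5 h + 1 h + 1.5 h + 3 h = 6 h of downtime.
incidents = [
    Incident(datetime(2026, 5, 4, 9, 0), datetime(2026, 5, 4, 9, 30)),
    Incident(datetime(2026, 5, 11, 14, 0), datetime(2026, 5, 11, 15, 0)),
    Incident(datetime(2026, 5, 19, 22, 0), datetime(2026, 5, 19, 23, 30)),
    Incident(datetime(2026, 5, 27, 3, 0), datetime(2026, 5, 27, 6, 0)),
]

mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:30:00
```

Note that a window with no incidents raises an error instead of returning zero: an empty window has no recovery time to average, and reporting it as a perfect score would distort the trend.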

What shortens MTTR

  • Fast detection. You can't recover from what you haven't noticed. The largest chunk of MTTR is often the gap between failure and someone realizing it.
  • Clear ownership. An incident with no obvious owner sits while people figure out whose problem it is.
  • Easy rollback. If reverting is a one-click operation, recovery is fast. If it's a manual scramble, it isn't.
  • Context at hand. The commit, the deploy, and the failing check in one place beats hunting across four dashboards.
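
To see how much of your own MTTR is detection time, one option is to record a detection timestamp for each incident alongside the start and restore times. The sketch below assumes you have all three; the `IncidentTimeline` class, the `detection_share` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTimeline:
    started: datetime   # failure begins affecting users
    detected: datetime  # someone (or something) notices
    restored: datetime  # normal service is restored


def detection_share(incidents: list[IncidentTimeline]) -> float:
    """Fraction of total downtime that elapsed before the failure was noticed."""
    total = sum((i.restored - i.started for i in incidents), timedelta())
    undetected = sum((i.detected - i.started for i in incidents), timedelta())
    if total <= timedelta(0):
        raise ValueError("no downtime recorded")
    return undetected / total  # timedelta / timedelta yields a float


# Two made-up incidents, each one hour long.
timelines = [
    IncidentTimeline(
        started=datetime(2026, 5, 4, 9, 0),
        detected=datetime(2026, 5, 4, 9, 45),
        restored=datetime(2026, 5, 4, 10, 0),
    ),
    IncidentTimeline(
        started=datetime(2026, 5, 11, 14, 0),
        detected=datetime(2026, 5, 11, 14, 15),
        restored=datetime(2026, 5, 11, 15, 0),
    ),
]

share = detection_share(timelines)
print(f"{share:.0%} of downtime passed before detection")  # 50% of downtime passed before detection
```

If that share is high, faster alerting is likely to move MTTR more than a faster rollback would.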

How automated detection cuts MTTR

The detection gap is where an engineering intelligence platform helps most. When CI fails on main or a deployment fails, Deviera opens a structured incident immediately, tagged to the commit author and with the failing run attached, and posts it to Slack. There's no waiting for someone to notice. When the next build goes green, Deviera auto-resolves the incident and records the recovery time, so MTTR is measured automatically rather than reconstructed after the fact.
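
The open-on-red, resolve-on-green idea can be sketched in a few lines. This is not Deviera's implementation, only an illustration of the general pattern; the `BuildEvent` class, the `recovery_times` function, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class BuildEvent:
    finished_at: datetime
    passed: bool


def recovery_times(events: list[BuildEvent]) -> list[timedelta]:
    """Open an incident on the first red build, close it on the next green one."""
    recoveries: list[timedelta] = []
    opened_at: Optional[datetime] = None
    for event in sorted(events, key=lambda e: e.finished_at):
        if not event.passed and opened_at is None:
            # First failure opens the incident; later failures join it.
            opened_at = event.finished_at
        elif event.passed and opened_at is not None:
            # Next green build resolves the incident and records the recovery time.
            recoveries.append(event.finished_at - opened_at)
            opened_at = None
    return recoveries


# Made-up build history on main for one day.
events = [
    BuildEvent(datetime(2026, 5, 4, 9, 0), passed=True),
    BuildEvent(datetime(2026, 5, 4, 10, 0), passed=False),
    BuildEvent(datetime(2026, 5, 4, 10, 20), passed=False),
    BuildEvent(datetime(2026, 5, 4, 10, 50), passed=True),
    BuildEvent(datetime(2026, 5, 4, 13, 0), passed=False),
    BuildEvent(datetime(2026, 5, 4, 13, 30), passed=True),
]

for duration in recovery_times(events):
    print(duration)
# 0:50:00
# 0:30:00
```

In this sketch an incident that is still open at the end of the event list is not counted, so a real report would need to decide how to treat unresolved failures in the measurement window.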
For a step-by-step plan to move MTTR alongside the other three DORA keys, see how to reduce MTTR and our 4-week DORA roadmap.

Frequently asked questions

What does MTTR stand for?
MTTR most commonly stands for Mean Time To Recovery — the average time it takes to restore service after a production failure. The acronym is ambiguous across the industry and also gets expanded as mean time to repair, respond, or resolve, which are genuinely different intervals. DORA uses recovery: time to restore service, not time to ship a permanent fix.
How is MTTR calculated?
MTTR is calculated as total downtime divided by the number of incidents over a measurement window: MTTR = total time spent in failure ÷ number of incidents. For each incident the clock starts when the failure begins affecting users and stops when normal service is restored; you then average across all incidents in the period.
What is a good MTTR?
Per the DORA State of DevOps research, elite teams recover in under one hour, high performers in under one day, medium performers in one day to one week, and low performers take more than a week. The largest reducible slice of MTTR is usually detection time — the gap between a failure occurring and someone noticing it.