The 5-person team: what works and why it can't scale
The 5-to-20 transition: the four process gaps that open up
- 1. PR review becomes a bottleneck instead of a quality gate. At 5–8 engineers, the senior engineers can review every PR within a few hours. At 15–18, there are too many PRs for the same reviewers to handle without becoming the team's primary constraint. PRs start sitting for 24–48 hours. Engineers start working around the bottleneck — self-approving, rubber-stamp reviewing, merging without full confidence. Review lag goes from a minor inconvenience to a velocity killer.
- 2. On-call becomes unmanageable without rotation tooling. At 5 engineers, everyone is informally on call. At 15–18, informal on-call means the same 2–3 people absorb all production incidents while the rest of the team is insulated. Burnout follows. The rotation needs structure before the incident load forces the issue, not after the first resignation.
- 3. Deployment process becomes tribal knowledge. The person who set up the CI/CD pipeline 18 months ago is the only one who can reliably deploy to production. As the team grows, the deployment process needs documentation and automation — not because it's broken, but because the knowledge holder can't be the single point of failure at scale.
- 4. Engineering health becomes invisible to leadership. The CTO who was in every standup at 5 engineers is now running a 15-person team with three feature streams. They can no longer observe team health by presence. They need metrics — but most teams at this stage haven't yet built the tooling to produce them.
The 20-to-50 transition: when informal coordination becomes a full-time job
- Cross-squad dependency management. Squad A is waiting on Squad B to merge a shared API change. There's no formal mechanism to track this dependency. It surfaces in a standup on the day it's blocking, not the week before. Untracked cross-squad dependencies are one of the most common sources of sprint misses at this scale.
- Inconsistent engineering standards across squads. Squad A has strong test coverage and reviews. Squad B ships fast and patches later. Over time, the codebase develops islands of quality — some areas trusted, others avoided. The EM can't see this without metrics.
- Health signal fragmentation. Squad A's CI failures don't appear in Squad B's Slack channel. Squad B's deployment issues aren't visible to Squad A. The Engineering Director who needs a view across all squads is checking four separate dashboards or relying on standup reports that are filtered and delayed by the time they arrive.
- Leadership context loss. The VP of Engineering at a 40-person team is 2–3 reporting layers from the code. Without structured weekly metrics, they learn about quality regressions from incident reports — retrospectively, not proactively.
The metrics that matter at each scale (and when to introduce them)
At 5–10 engineers (baseline now; you'll need it later)
- Deployment frequency (even if it's just a Slack message when you deploy)
- CI pass rate on main (weekly average)
- PR count per engineer per week (basic velocity baseline)
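
To show how little tooling this baseline needs, here is a minimal sketch that computes a weekly CI pass rate on main and a weekly deploy count from plain lists of events. The function names and the input shapes (a list of `(date, passed)` tuples and a list of deploy dates) are assumptions made for the example; they do not come from any particular CI system.

```python
from collections import Counter, defaultdict
from datetime import date


def iso_week(d):
    """Return (ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def weekly_ci_pass_rate(runs):
    """runs: iterable of (run_date, passed) for CI runs on main.

    Returns {(iso_year, iso_week): fraction of runs that passed}.
    """
    counts = defaultdict(lambda: [0, 0])  # week -> [passed, total]
    for run_date, passed in runs:
        week = iso_week(run_date)
        counts[week][0] += int(passed)
        counts[week][1] += 1
    return {week: p / t for week, (p, t) in counts.items()}


def weekly_deploy_count(deploy_dates):
    """Return {(iso_year, iso_week): number of deploys}."""
    return dict(Counter(iso_week(d) for d in deploy_dates))
```

Even if the inputs are typed in by hand from Slack messages, a few weeks of these numbers give you a trend line to compare against once the team grows.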
At 10–20 engineers (process gaps are opening)
- PR review time — median and 90th percentile (detect bottlenecks before they compound)
- Stale PR count (PRs open >3 days) — weekly trend
- CI failure rate per deploy (track the trend, not just the absolute value)
- Incident count per month + time to resolution
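
The review-time and stale-PR metrics above can be sketched in a few lines. This is an illustrative example, not a reference implementation: it assumes you have already exported review durations in hours and PR open timestamps, and the function names are hypothetical. The 3-day threshold matches the stale PR definition in the list.

```python
from datetime import datetime, timedelta
from statistics import median, quantiles


def review_time_stats(hours):
    """Return (median, 90th percentile) of PR review times in hours.

    Requires at least two data points, as statistics.quantiles does.
    """
    # n=10 yields nine cut points; the last one is the 90th percentile.
    p90 = quantiles(hours, n=10, method="inclusive")[-1]
    return median(hours), p90


def stale_pr_count(opened_at, now, max_age=timedelta(days=3)):
    """Count PRs that have been open strictly longer than max_age."""
    return sum(1 for opened in opened_at if now - opened > max_age)
```

Tracking the 90th percentile alongside the median matters because a healthy median can hide a long tail of PRs that sit for days, which is where the bottleneck first becomes visible.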
At 20–50 engineers (multi-squad coordination)
- All of the above, segmented by squad
- Cross-squad dependency incidents (PRs blocked on another squad's work)
- Friction Score by squad — comparative health, not just absolute
- Weekly health report delivered to leadership without requiring anyone to compile it
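
Segmenting by squad is mostly a matter of grouping the same events by one more key. The sketch below does this for CI pass rate and renders a plain-text summary that a scheduled job could post for leadership. The `(squad, passed)` input shape, the function names, and the report layout are all assumptions for illustration; the Friction Score mentioned above is not modelled here.

```python
from collections import defaultdict


def ci_pass_rate_by_squad(runs):
    """runs: iterable of (squad, passed). Returns {squad: pass rate}."""
    counts = defaultdict(lambda: [0, 0])  # squad -> [passed, total]
    for squad, passed in runs:
        counts[squad][0] += int(passed)
        counts[squad][1] += 1
    return {squad: p / t for squad, (p, t) in counts.items()}


def format_squad_report(rates):
    """Render pass rates as text, lowest first so problems lead the report."""
    lines = ["Weekly CI pass rate by squad"]
    for squad in sorted(rates, key=rates.get):
        lines.append(f"{squad}: {rates[squad]:.0%}")
    return "\n".join(lines)
```

Sorting lowest first is a deliberate choice: the comparison between squads is the signal, so the squad that needs attention should be the first line a reader sees.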
The scaling readiness checklist
Before hiring engineer 10
- PR review ownership is distributed (not bottlenecked on 1–2 senior engineers)
- CI/CD deployment process is documented and executable by any engineer, not just the original author
- Baseline metrics established (deploy frequency, CI pass rate, PR cycle time)
- On-call rotation defined with explicit ownership, not informal coverage
Before hiring engineer 20
- Stale PR alerts wired to EM automatically — no manual queue monitoring
- CI failure routing established — failures go to the right person, not to a #dev-general channel no one watches
- Weekly health report exists and is delivered without manual compilation
- Engineering metrics visible to leadership in a single view, not aggregated from standup reports
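
One way to picture the CI failure routing item is a lookup from changed file paths to an owner, with a fallback so that nothing lands silently in a general channel. This is a sketch under stated assumptions: the `owners` mapping, the path prefixes, and the owner names are hypothetical, and a real setup would feed the result into whatever notification tool the team already uses.

```python
def route_failure(changed_paths, owners, fallback):
    """Return the set of owners to notify for a failed CI run.

    owners maps a path prefix to an owner; the longest matching
    prefix wins. If no path matches, the fallback owner is notified.
    """
    notified = set()
    for path in changed_paths:
        matches = [prefix for prefix in owners if path.startswith(prefix)]
        if matches:
            notified.add(owners[max(matches, key=len)])
    return notified or {fallback}
```

For example, with `owners = {"services/billing/": "squad-billing", "services/": "platform"}`, a failure touching `services/billing/api.py` goes to `squad-billing`, while a failure touching only unmapped paths goes to the fallback owner.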
Before hiring engineer 30
- Squad-level metrics segmented, with Squad A's CI health visible separately from Squad B's
- Cross-squad dependency tracking formalized (not just standup mentions)
- Engineering Director has a single weekly summary of all squad health without needing to attend all squad standups
- Automation templates for standard incident response — not every on-call event should require a human to start the paper trail
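
Formal cross-squad dependency tracking does not have to start as a large system. The sketch below assumes each dependency is recorded with a waiting squad, a blocking squad, and a needed-by date, then flags unmerged ones that fall due within a chosen horizon. The `Dependency` fields, the `at_risk` name, and the 7-day default are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Dependency:
    waiting_squad: str
    blocking_squad: str
    description: str
    needed_by: date
    merged: bool = False


def at_risk(deps, today, horizon_days=7):
    """Return unmerged dependencies due within the horizon, soonest first."""
    cutoff = today + timedelta(days=horizon_days)
    risky = [d for d in deps if not d.merged and d.needed_by <= cutoff]
    return sorted(risky, key=lambda d: d.needed_by)
```

Run weekly, a check like this surfaces a blocking dependency the week before it bites rather than in the standup on the day it blocks.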