CI/CD · Monorepo · Platform Engineering

Monorepo vs. Multi-Repo CI: The Engineering Trade-Offs at Scale

March 20, 2026 · 7 min read · by Deviera Team

The monorepo vs. multi-repo debate is one of the longest-running arguments in software engineering. Both sides have strong arguments, and both choices work at scale: Google runs a monorepo, Amazon runs multi-repo. What the debate usually misses is that the topology is not the primary variable in CI coordination quality. The observability layer is. And most teams lack one, regardless of which topology they chose.

The CI coordination problem: why topology matters less than observability

When teams argue about monorepo vs. multi-repo, they're usually arguing about the symptoms of an observability problem, not the root cause.

The monorepo team says: "CI takes forever, every change triggers 40 services, we can't tell which tests are relevant to which change."

The multi-repo team says: "When service A's API changes, service B breaks — and we don't find out until service B's team runs their CI three days later. We have no view across all repos."

Both complaints are observability problems. The monorepo team doesn't have good change-impact analysis. The multi-repo team doesn't have cross-repo failure correlation. Different symptoms, same underlying gap: there's no unified layer that makes the CI signal across the entire system legible.

Monorepo CI: the three failure patterns that don't appear in single-service pipelines

Monorepo CI introduces coordination challenges that are invisible in smaller codebases. The three patterns that show up consistently at scale:

  • 1. Blast radius ambiguity. A developer changes a shared utility function. The CI pipeline runs, and the 37 services that consume that function run their test suites. 3 of them fail. Was the change wrong, or were those 3 services already broken before this commit? In a monorepo without strong change-impact analysis, engineers spend significant time determining whether a failure is caused by their change or is a pre-existing condition they've exposed.
    The signal you need: per-service failure attribution that distinguishes "this service failed because of this commit's changes" from "this service was already failing before this commit."
  • 2. CI runtime growth that kills developer feedback loops. Monorepos with 50+ services report average CI run times of 23 minutes, compared to 6 minutes for equivalent single-service pipelines. At 23 minutes, developers stop waiting for CI before starting the next task. They context-switch, and by the time CI finishes they're midway through a different problem. The CI feedback loop breaks down not because it fails more often, but because the latency makes it impractical to stay engaged with it.
    The signal you need: affected-service detection that runs CI only for the services changed by, or dependent on, the committed files, not the full suite for every commit (see the dependency-graph sketch after this list).
  • 3. Shared CI configuration drift. In a monorepo, multiple teams contribute to the CI configuration. Over time, inconsistencies accumulate: some services have test coverage gates, others don't. Some services run integration tests, others only run unit tests. The configuration becomes a patchwork that individual teams understand only for their own service. A platform engineer responsible for CI health across all services has no unified view of whether the configuration is consistent or safe.
    The signal you need: a CI configuration coverage map showing which services have which test types, quality gates, and failure thresholds, auditable from a single view (a minimal coverage-map sketch also follows this list).
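
Both the blast-radius signal and the affected-service signal fall out of the same structure: a service dependency graph. The transitive set of reverse dependencies of a change is its blast radius, and that same set is exactly the list of services whose CI is worth running. A minimal sketch in Python, assuming a hand-written SERVICE_DEPS graph with hypothetical service names; in a real monorepo the graph would come from your build tool (Bazel, Nx, Turborepo) or from parsed manifests:

```python
from collections import deque

# Hypothetical dependency graph: each service maps to the services or
# shared packages it depends on. In practice this comes from your build
# tool or from parsing per-service manifests, not a hand-written dict.
SERVICE_DEPS = {
    "checkout": {"shared-utils", "payments"},
    "payments": {"shared-utils"},
    "search": {"shared-utils"},
    "admin": set(),
}

def reverse_deps(deps):
    """Invert the graph: for each node, which services depend on it?"""
    rev = {name: set() for name in deps}
    for svc, uses in deps.items():
        for dep in uses:
            rev.setdefault(dep, set()).add(svc)
    return rev

def affected_services(changed, deps):
    """Transitive closure over reverse edges: every service whose CI
    should run because something it (directly or transitively) depends
    on changed. This set is also the blast radius of the change."""
    rev = reverse_deps(deps)
    affected = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rev.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected

# A commit touching shared-utils runs CI for its dependents only:
print(affected_services({"shared-utils"}, SERVICE_DEPS))
# -> shared-utils, checkout, payments, search (admin is skipped)
```

The same closure answers both questions above: it tells the utility author which services their change can break before CI runs, and it tells the pipeline which test suites are worth running.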
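The coverage map from the third pattern is also cheap to bootstrap. A sketch under one assumed convention: each service keeps its CI config at services/<name>/ci.yml, and the check names below are placeholders to adapt to your actual layout:

```python
import glob
import yaml  # pip install pyyaml

# Assumed layout: each service keeps its CI config at services/<name>/ci.yml.
# The path pattern and the check names below are assumptions -- adapt them
# to your repo's actual convention.
CHECKS = ("unit_tests", "integration_tests", "coverage_gate")

def coverage_map(pattern="services/*/ci.yml"):
    rows = []
    for path in sorted(glob.glob(pattern)):
        service = path.split("/")[1]
        with open(path) as f:
            config = yaml.safe_load(f) or {}
        rows.append([service] + ["yes" if c in config else "NO" for c in CHECKS])
    return rows

# One auditable table instead of reading 50 configs by hand.
for row in [["service"] + list(CHECKS)] + coverage_map():
    print("".join(f"{cell:<20}" for cell in row))
```

Run it inside CI itself and fail the build when a service is missing a required gate, and the drift stops accumulating.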

Multi-repo CI: the cross-service coordination tax

Multi-repo CI has different problems. Each service is isolated, which is good for clarity and good for CI run time, but bad for cross-service correlation.

  • 1. Cross-service failure correlation is manual and delayed. Service A deploys a breaking API change. Service B starts failing CI. Service B's team doesn't connect the two events because they're in separate repositories, separate CI pipelines, and separate Slack channels. The average time from a cross-service breaking change to its identification as the root cause of a downstream failure: 2–3 days in multi-repo teams without shared CI visibility, vs. hours in teams with correlated CI signals.
  • 2. Shared library updates are invisible blast radii. A shared NPM package is updated. 12 services depend on it. The library maintainer can't know which downstream services will be affected without searching each repository's dependency files manually. Most teams skip this check and find out which services broke when those services' own CI runs next. (A sketch of automating that search follows this list.)
  • 3. The aggregate CI health signal doesn't exist. An engineering director overseeing a multi-repo architecture has no native view of aggregate CI health across all 15 repositories. They're looking at 15 separate CI dashboards or relying on engineers to report failures upward. The aggregate picture — what percentage of CI runs are passing across all repos, what the trend is, which repos are outliers — requires either manual compilation or a tool that aggregates across repositories.
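
The manual dependency search from the second pattern is automatable with nothing more than the Git host's API. A sketch against GitHub's REST API, checking each repository's root package.json for the shared package; the two endpoints used (list an org's repos, fetch a file's contents) are real, while ORG, TOKEN, and the package name are placeholders, and pagination beyond 100 repos is omitted for brevity:

```python
import base64
import json
import requests  # pip install requests

# Sketch: find every repo in an org whose root package.json depends on a
# given shared package. ORG, TOKEN, and PACKAGE are placeholders; the two
# GitHub endpoints used (list org repos, get file contents) are real.
ORG, TOKEN = "your-org", "ghp_your_token"
PACKAGE = "@your-org/shared-utils"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def dependents():
    repos = requests.get(
        f"https://api.github.com/orgs/{ORG}/repos?per_page=100",
        headers=HEADERS,
    ).json()  # pagination beyond 100 repos omitted for brevity
    hits = []
    for repo in repos:
        resp = requests.get(
            f"https://api.github.com/repos/{ORG}/{repo['name']}/contents/package.json",
            headers=HEADERS,
        )
        if resp.status_code != 200:
            continue  # repo has no package.json at its root
        manifest = json.loads(base64.b64decode(resp.json()["content"]))
        deps = {**manifest.get("dependencies", {}),
                **manifest.get("devDependencies", {})}
        if PACKAGE in deps:
            hits.append((repo["name"], deps[PACKAGE]))
    return hits

for name, version in dependents():
    print(f"{name} pins {PACKAGE} {version}")
```

Run before publishing the new version, and the blast radius stops being invisible: the maintainer gets the list of downstream services up front instead of waiting for their CI runs to fail.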

How to build a unified CI signal regardless of topology

The good news: the observability gap is solvable with the same approach regardless of whether you chose monorepo or multi-repo. The solution is an aggregation layer that normalizes CI events from all sources into a single queryable feed.

What the aggregation layer needs to do:

  • Collect CI events from all repositories (or all services in a monorepo). GitHub Actions, CircleCI, GitLab CI, and similar platforms all emit webhook events on check-run completion. A unified aggregation layer subscribes to all of these and normalizes the events into a consistent schema: service name, commit SHA, branch, pass/fail, failing test suite, timestamp (a normalizer sketch follows this list).
  • Surface aggregate health metrics, not per-repo status pages. The signal that matters for an engineering leader is: what percentage of CI runs are passing across all services in the last 24 hours? Which services are outliers (failing more than 10% of the time)? Is the aggregate trend improving or degrading? This requires a rollup view, not a per-repo dashboard (the rollup sketch after this list computes exactly this).
  • Enable cross-service correlation for failure investigation. When service B starts failing, the investigation should start with the question: "what changed in the last 24 hours in service A or any shared dependency that service B depends on?" That query requires the aggregation layer to know about events in both repositories, not just service B's own history.
  • Route failures to structured tickets with cross-service context. A CI failure ticket for service B is more useful if it includes: "service A deployed a breaking change 4 hours ago that service B depends on." That context assembly requires the aggregation layer to correlate events across services — which is only possible if the signal is unified.
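
To make the schema concrete, here is a sketch of the normalized event type plus a normalizer for GitHub Actions check_run webhooks. The payload field paths follow GitHub's documented webhook format; mapping the repository name directly to a service name is a simplifying assumption (a monorepo would derive the service from the check run's name instead):

```python
from dataclasses import dataclass

@dataclass
class CIEvent:
    """The consistent schema every provider's webhook is normalized into."""
    service: str
    commit_sha: str
    branch: str
    passed: bool
    suite: str
    completed_at: str  # ISO-8601 timestamp

def from_github_check_run(payload: dict) -> CIEvent | None:
    """Map a GitHub Actions check_run webhook into the schema.
    Only completed runs carry a conclusion, so everything else is dropped.
    Using the repository name as the service name is a simplifying
    assumption; a monorepo would derive it from the check run's name."""
    if payload.get("action") != "completed":
        return None
    run = payload["check_run"]
    return CIEvent(
        service=payload["repository"]["name"],
        commit_sha=run["head_sha"],
        branch=run["check_suite"]["head_branch"],
        passed=run["conclusion"] == "success",
        suite=run["name"],
        completed_at=run["completed_at"],
    )
```

Equivalent adapters for CircleCI or GitLab CI map their own payload shapes into the same CIEvent, which is the whole point: downstream consumers never need to know which provider produced the signal.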
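Once every provider's events land in that shape, the leadership rollup and the cross-service correlation query from the last two bullets are each a few lines. A sketch over an in-memory list of the CIEvent records above; a production version would run the same logic against a real event store, and the depends_on set is assumed to come from a dependency graph like the one sketched earlier:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def _ts(iso: str) -> datetime:
    """Parse the ISO-8601 timestamps GitHub emits (trailing 'Z')."""
    return datetime.fromisoformat(iso.replace("Z", "+00:00"))

def rollup(events, window_hours=24, outlier_fail_rate=0.10):
    """Aggregate pass rate across all services plus the outlier list:
    the leadership view, computed from the unified feed rather than
    from fifteen separate dashboards."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    runs = defaultdict(lambda: [0, 0])  # service -> [passed, total]
    for e in events:
        if _ts(e.completed_at) < cutoff:
            continue
        runs[e.service][0] += e.passed
        runs[e.service][1] += 1
    passed = sum(p for p, _ in runs.values())
    total = sum(t for _, t in runs.values())
    outliers = [s for s, (p, t) in runs.items()
                if t and (t - p) / t > outlier_fail_rate]
    return (passed / total if total else 1.0), outliers

def suspects(events, depends_on, window_hours=24):
    """Cross-service correlation: everything that changed recently in any
    upstream the failing service depends on. depends_on is assumed to
    come from a dependency graph like the one sketched earlier."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    return [e for e in events
            if e.service in depends_on and _ts(e.completed_at) >= cutoff]
```

The suspects list is also the cross-service context the final bullet asks for: attach it to the generated ticket, and the investigating engineer starts from "service A shipped a change 4 hours ago" instead of from a blank page.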

The repository architecture decision checklist for platform teams

If you're evaluating monorepo vs. multi-repo, the tooling questions matter as much as the topology questions:

Before choosing monorepo

  • Do you have or plan to invest in affected-service detection for CI? (Without it, CI run times will grow linearly with the size of the codebase.)
  • Is there a single team responsible for CI configuration consistency across all services?
  • Do you have tooling for blast radius analysis on shared utility changes?

Before choosing multi-repo

  • How will you surface aggregate CI health across all repositories to engineering leadership?
  • How will you detect and route cross-service breaking changes before they cause downstream CI failures?
  • Is there a unified view for on-call engineers that shows CI failures across all repositories in one place?

For any topology

  • CI failures should route to structured tickets automatically — not to #dev-alerts channels that may or may not be watched
  • Aggregate CI pass rate should be tracked weekly as a first-class team health metric
  • Engineering leadership should see a rollup CI health view, not per-service dashboards

The monorepo vs. multi-repo decision has real architectural implications — but the CI coordination problems that teams attribute to their topology choice are almost always observability problems in disguise. The topology defines where the signal is. The aggregation layer defines whether you can see it.

Teams that invest in the observability layer find that the topology choice matters less than they expected — because both patterns are manageable when you can see what's happening across the whole system.
