Almost every team that writes tests eventually measures code coverage — the share of their code that those tests actually exercise. It is one of the few quality signals that is cheap to collect, easy to plot on a dashboard, and immediately understood by everyone from a new graduate to a certification authority. That popularity is also its trap: the single percentage on the dashboard hides a stack of very different metrics, each one proving something stronger (or weaker) than the next, and "we're at 90%" can mean almost anything.
This guide untangles that stack. We will walk from the weakest metric to the strongest — function, statement, branch, decision, condition, MC/DC and multiple-condition coverage — explain in plain terms what each one actually demonstrates, show why a green "100%" can still sit on top of a real defect, and cover how coverage is collected, gated in CI, and mapped to assurance levels in functional-safety standards. By the end you should be able to read any coverage report and know exactly how much confidence the number behind it deserves.
Code coverage tells you what your tests touched, never whether the result was correct — so treat it as a gap-finder, not a quality score.
What is code coverage?
Code coverage is a measurement of how much of your source code is executed while your tests run. A coverage tool watches the program as the test suite runs, records which parts of the code were reached, and reports the result as a percentage of some total — lines, branches, conditions, and so on.
The crucial word is executed. Coverage is purely about reachability under your tests: did this line run, did this branch get taken, did this condition ever turn true and false? It says nothing about whether the program produced the right answer when it did. A test with no assertions at all can drive coverage to 100% while verifying absolutely nothing. Coverage measures the reach of your tests; your assertions measure their judgement. You need both.
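The gap between reach and judgement is easy to reproduce. The sketch below uses a hypothetical `clamp` function (not part of any real codebase) and two illustrative test functions: both drive identical coverage, but only the second one can ever fail.

```c
#include <assert.h>

/* Hypothetical function under test: clamps value into [lo, hi]. */
int clamp(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Executes every statement and both outcomes of each decision in clamp,
   yet checks nothing: any return value would pass. */
void test_clamp_without_assertions(void) {
    (void)clamp(-5, 0, 10);
    (void)clamp(50, 0, 10);
    (void)clamp(5, 0, 10);
}

/* Same inputs, same coverage, but a wrong result now fails the test. */
void test_clamp_with_assertions(void) {
    assert(clamp(-5, 0, 10) == 0);
    assert(clamp(50, 0, 10) == 10);
    assert(clamp(5, 0, 10) == 5);
}
```

A coverage tool would report the same figure for either test function; only the assertions distinguish a suite that verifies behaviour from one that merely runs code.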
What changes between metrics is the unit being counted and how hard it is to satisfy. Counting executed functions is a coarse, forgiving measure. Counting whether every individual Boolean condition independently affected an outcome is a fine, demanding one. The same test suite can post 100% on the first and barely 60% on the last — which is exactly why the headline number is meaningless until you know which metric produced it.
Why code coverage matters (and what it can't tell you)
Coverage earns its place because it answers a question that is otherwise invisible: what did my tests never even look at? Untested code is unmeasured risk, and a coverage report turns that risk into a concrete, line-by-line list. Used this way, coverage is a flashlight for the dark corners of a codebase — the error handler nobody exercises, the fallback path that only fires once a year, the branch added in a rush and never tested.
Three things make it genuinely valuable:
- It finds untested code. Every uncovered line is a place a bug can hide with zero chance of a test catching it.
- It is objective and repeatable. Two engineers measuring the same suite get the same number, which makes it a fair basis for review gates and trends.
- It maps onto standards. Safety standards prescribe specific coverage metrics at specific assurance levels, so the measurement becomes part of the evidence trail.
But coverage is just as important for what it cannot tell you, and confusing the two is where teams get burned:
- It is not correctness. Executing a line proves it ran, not that it behaved correctly. Without assertions, a high coverage figure verifies nothing.
- High coverage is not a quality guarantee. A suite at 100% statement coverage can still miss boundary values, error conditions, and problematic combinations of inputs.
- The number is gameable. Tests written to chase a percentage rather than a requirement inflate the metric while leaving real behaviour unverified.
Coverage tells you where you have not looked. It can never tell you that what you looked at is right.
The coverage metrics, from weakest to strongest
The metrics form a ladder. Each rung is harder to climb than the one below, and with one exception each subsumes the weaker ones: you cannot satisfy branch coverage without also satisfying statement coverage, and so on up. The exception is plain condition coverage, which on its own does not guarantee decision coverage, as the condition-level discussion below shows. Here is the same test suite measured against six metrics on a typical body of decision-heavy code; watch the number fall as the metric gets stricter.
One suite, six metrics. The stronger the criterion, the more honest — and the lower — the number.
Function & statement coverage
Function coverage is the coarsest metric of all: it asks only whether each function or method was called at least once. It is useful as a first sanity check — a function never invoked by any test is a glaring gap — but a single call into a 200-line function marks it fully covered while leaving most of its logic untouched.
Statement coverage (often loosely called "line coverage") tightens this to the level of individual executable statements: every statement must run at least once. It is the metric most dashboards default to, because it is intuitive and easy to render as colored lines in a diff. Its blind spot is decisions. Consider:
    int scale(int x) {
        if (x > 0)
            x = x * 2;   // one test with x=5 covers every statement…
        return x;        // …but the implicit "else" path is never run
    }
A single test with x = 5 executes every statement in scale, so statement coverage reports 100%. Yet the case where x is zero or negative — where the if is false — was never exercised. Statement coverage cannot see the branch it skipped.
Branch & decision coverage
Decision coverage (commonly called branch coverage) fixes exactly that gap. It requires every decision (every if, every while or for guard, every ternary) to evaluate to both true and false at least once across the suite, and every switch case to be taken at least once. In the scale example above, decision coverage forces a second test where x <= 0, so the false branch is finally exercised.
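As a minimal sketch, the two tests for scale might look like this. The function is repeated so the block compiles on its own, and the test names are illustrative rather than taken from any framework.

```c
#include <assert.h>

/* The scale function from the example above, repeated for completeness. */
int scale(int x) {
    if (x > 0)
        x = x * 2;
    return x;
}

/* Decision (x > 0) is true: enough for 100% statement coverage. */
void test_scale_positive(void) {
    assert(scale(5) == 10);
}

/* Decision (x > 0) is false: the extra test decision coverage demands. */
void test_scale_non_positive(void) {
    assert(scale(0) == 0);
    assert(scale(-3) == -3);
}
```

With only the first test, a statement-coverage report is fully green; the second test is what turns the decision-coverage report green as well.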
Branch coverage is a meaningful step up and a sensible default target for most general-purpose software. But it still treats a decision as a single switch. A compound condition like if (a && b) can be flipped true and false without ever showing what role a and b each played — toggling the whole expression is enough. That is the gap the condition-level metrics close.
Condition, condition/decision & MC/DC
Once a decision contains more than one Boolean operand, three finer metrics come into play. A condition here means a single Boolean term with no && or || inside it (such as a or x > 0); a decision is the whole expression they combine to form.
- Condition coverage requires every individual condition to be both true and false at some point but, perversely, does not require the overall decision to take both outcomes. For a && b, the pair (a=T, b=F) and (a=F, b=T) makes each condition both true and false, yet the decision is false in both cases.
- Condition/decision coverage closes that loophole by demanding both at once: every condition true and false, and the decision true and false.
- MC/DC (Modified Condition/Decision Coverage) goes one critical step further. It requires you to show that each condition independently affects the decision's outcome — that you can flip just that one condition, hold the others fixed, and watch the decision flip with it. This is the metric DO-178C mandates for the most critical avionics software, and it catches conditions that are present in the code but make no actual difference. We unpack it fully in MC/DC Explained for DO-178C.
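The condition-coverage loophole in the list above can be checked mechanically. The following sketch models test vectors for a && b and asks whether a set of them achieves condition coverage and decision coverage. The type and function names are hypothetical, and the model treats both operands as always evaluated, ignoring C's short-circuit rule, which real tools handle in differing ways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One test vector for the decision (a && b). */
typedef struct { bool a; bool b; } Vec2;

static bool decision(Vec2 v) { return v.a && v.b; }

/* True if every condition was seen both true and false. */
bool condition_coverage(const Vec2 *t, size_t n) {
    bool a_t = false, a_f = false, b_t = false, b_f = false;
    for (size_t i = 0; i < n; i++) {
        if (t[i].a) a_t = true; else a_f = true;
        if (t[i].b) b_t = true; else b_f = true;
    }
    return a_t && a_f && b_t && b_f;
}

/* True if the whole decision was seen both true and false. */
bool decision_coverage(const Vec2 *t, size_t n) {
    bool seen_t = false, seen_f = false;
    for (size_t i = 0; i < n; i++) {
        if (decision(t[i])) seen_t = true; else seen_f = true;
    }
    return seen_t && seen_f;
}

/* The pair from the text: full condition coverage, decision never true. */
const Vec2 LOOPHOLE[] = { {true, false}, {false, true} };

/* Adding (T, T) makes the decision true once as well. */
const Vec2 CONDITION_DECISION[] = { {true, false}, {false, true}, {true, true} };
```

The first set satisfies condition coverage but not decision coverage; the second satisfies both, which is what condition/decision coverage asks for.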
The beauty of MC/DC is that it delivers near-exhaustive logical confidence at linear cost: for a decision with N independent conditions, it needs only about N+1 tests, not the 2^N that exhaustive testing would demand.
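To make the independence requirement concrete, here is a small sketch of a checker for the unique-cause form of MC/DC, applied to a hypothetical three-condition decision a && (b || c). The decision, the helper names, and the chosen test set are all assumptions made for illustration; certification tools apply more nuanced rules (for example around short-circuit evaluation and coupled conditions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CONDITIONS 3

/* Hypothetical decision with three conditions: a && (b || c). */
static bool decision(const bool c[NUM_CONDITIONS]) {
    return c[0] && (c[1] || c[2]);
}

/* True if tests x and y differ in condition idx and in nothing else. */
static bool differ_only_in(const bool x[NUM_CONDITIONS],
                           const bool y[NUM_CONDITIONS], size_t idx) {
    for (size_t i = 0; i < NUM_CONDITIONS; i++) {
        bool differs = (x[i] != y[i]);
        if (differs != (i == idx)) return false;
    }
    return true;
}

/* Unique-cause MC/DC: every condition needs a pair of tests that differ
   only in that condition and produce different decision outcomes. */
bool achieves_mcdc(const bool tests[][NUM_CONDITIONS], size_t n) {
    for (size_t idx = 0; idx < NUM_CONDITIONS; idx++) {
        bool shown = false;
        for (size_t i = 0; i < n && !shown; i++) {
            for (size_t j = i + 1; j < n && !shown; j++) {
                if (differ_only_in(tests[i], tests[j], idx) &&
                    decision(tests[i]) != decision(tests[j])) {
                    shown = true;
                }
            }
        }
        if (!shown) return false;
    }
    return true;
}

/* Four tests (N + 1 for N = 3) instead of eight exhaustive combinations. */
const bool MCDC_TESTS[4][NUM_CONDITIONS] = {
    { true,  true,  false },  /* test 0: decision true                  */
    { false, true,  false },  /* flips a against test 0: decision false */
    { true,  false, false },  /* flips b against test 0: decision false */
    { true,  false, true  },  /* flips c against test 2: decision true  */
};
```

Dropping the last vector leaves condition c without an independence pair, so the checker rejects the three-test set while accepting all four.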
Multiple condition coverage
Multiple condition coverage is the strongest of the family and the most expensive. It requires every possible combination of the conditions in a decision to be tested. A decision with three conditions has 2^3 = 8 combinations; one with six has 2^6 = 64. The cost grows exponentially, which is precisely why MC/DC exists as the practical alternative: MC/DC achieves comparable logical confidence without the combinatorial blow-up, so multiple condition coverage is reserved for the small, critical expressions where exhaustive proof is worth the price.
This exponential growth is exactly what the meter card above illustrates: as you climb from function to multi-condition coverage, the same test suite leaves more and more combinations unexercised, and the honest number drops from 100% to 64%.
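Exhaustive enumeration of this kind is usually written as a bitmask loop. The sketch below does so for a hypothetical three-condition decision; the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-condition decision, used only for illustration. */
static bool decision(bool a, bool b, bool c) {
    return a && (b || c);
}

/* Number of combinations multiple condition coverage requires: 2^n. */
size_t combinations_required(unsigned n_conditions) {
    return (size_t)1 << n_conditions;
}

/* Walks all 2^3 = 8 combinations by treating each bit of mask as one
   condition, and counts how many make the decision true. */
size_t count_true_outcomes(void) {
    size_t true_count = 0;
    for (unsigned mask = 0; mask < 8u; mask++) {
        bool a = (mask & 4u) != 0;
        bool b = (mask & 2u) != 0;
        bool c = (mask & 1u) != 0;
        if (decision(a, b, c)) true_count++;
    }
    return true_count;
}
```

Each extra condition doubles the loop bound, which is the combinatorial cost described above.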
The ladder in three rules
- With the exception of plain condition coverage, each metric subsumes the weaker ones: branch coverage already implies statement coverage, and so on up to MC/DC.
- The stronger the metric, the lower (and more honest) the percentage the same test suite will report.
- Pick the rung that matches the risk: statement or branch for everyday code, condition-level and MC/DC for critical logic.
The "100% coverage" myth
The most expensive misunderstanding in testing is that 100% coverage means the code is correct. It does not — not even at 100% statement coverage, the most commonly quoted figure. Here is a defect that survives a perfectly green statement-coverage report:
    double average(const int* a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum / n;   // division by zero when n == 0
    }
One test, which passes the two-element array {2, 4} with n = 2, runs every single statement in this function and returns the correct 3.0. Statement coverage: 100%. But the empty-array case, where n == 0 divides by zero, was never tested. The bug is fully covered and entirely undetected. The table below shows why each metric does or does not catch it:
| Metric | Reports | Catches the n==0 bug? |
|---|---|---|
| Statement | 100% | No — every line ran with n=2 |
| Decision (branch) | 100% | No — the loop guard hit true and false already |
| Boundary / requirements-based test | — | Yes — a test for the empty input is what's missing |
The lesson is not that coverage is useless — it is that coverage measures the reach of your tests, while a missing requirement (handle the empty array) is what leaves the bug exposed. No structural metric invents the test case you forgot to write. That is why requirements-based testing comes first and coverage second, a point we return to below.
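For illustration, here is one way the missing test and a matching repair could look. The requirement that an empty array averages to 0.0 is an assumption made for this sketch, not something the original code states, and the name `average_checked` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One possible repair, ASSUMING the requirement is "an empty array
   averages to 0.0". Other policies (error code, NaN) are equally valid. */
double average_checked(const int *a, int n) {
    if (n <= 0)
        return 0.0;          /* guard added for the empty-input case */
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return (double)sum / n;  /* floating-point division keeps fractions */
}

/* The original happy-path test. */
void test_average_two_values(void) {
    const int values[] = {2, 4};
    assert(average_checked(values, 2) == 3.0);
}

/* The requirements-based test that was missing. */
void test_average_empty_input(void) {
    assert(average_checked(NULL, 0) == 0.0);
}
```

The empty-input test comes from asking what the function should do with no data, not from any coverage report. The cast to double is a separate choice in this sketch: it also avoids the integer truncation present in the original return statement.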
How coverage is collected
Whatever the metric, the mechanics are broadly the same three steps:
- Instrument. The tool inserts lightweight probes into the code so that each line, branch, or condition records when it executes. This can happen in the source, at compile time, or in the binary; source-based and compile-time instrumentation is what lets a tool measure condition-level metrics like MC/DC accurately.
- Run. The instrumented program is exercised by the test suite — unit tests, integration tests, even manual or system runs. As it runs, the probes accumulate execution data into a coverage database.
- Report. The tool reduces that raw data to per-metric percentages and an annotated, line-by-line view showing exactly what was hit, partially hit, or missed — typically color-coded so gaps are obvious at a glance.
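The three steps can be imitated by hand to see what a tool automates. This is a deliberately simplified sketch with hypothetical names: real tools insert probes automatically and track far more detail than one counter per point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe table: one hit counter per instrumented point. */
enum { PROBE_ENTRY, PROBE_IF_TRUE, PROBE_IF_FALSE, PROBE_COUNT };
static unsigned long probe_hits[PROBE_COUNT];

#define PROBE(id) (probe_hits[(id)]++)

/* Step 1, instrument: the scale function with probes inserted by hand. */
int scale_instrumented(int x) {
    PROBE(PROBE_ENTRY);
    if (x > 0) {
        PROBE(PROBE_IF_TRUE);
        x = x * 2;
    } else {
        PROBE(PROBE_IF_FALSE);
    }
    return x;
}

/* Step 3, report: percentage of probes hit at least once. */
double probes_covered_percent(void) {
    size_t hit = 0;
    for (size_t i = 0; i < (size_t)PROBE_COUNT; i++) {
        if (probe_hits[i] > 0) hit++;
    }
    return 100.0 * (double)hit / (double)PROBE_COUNT;
}
```

Step 2, run, is simply calling `scale_instrumented` from the tests: after a single call with a positive argument two of the three probes have fired, and a second call with a non-positive argument brings the report to 100%.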
The instrumentation step is where tools differ most. Approaches that require manual edits to the source, or that only see line-level data, cannot honestly report condition or MC/DC coverage. The strongest tools instrument during compilation, using the same compiler that builds the shipping software, so the coverage measured is the coverage of the code that actually runs.
Coverage in CI
Coverage delivers the most value when it runs automatically on every change rather than as an occasional manual check. In a continuous-integration pipeline, three patterns do the heavy lifting:
- Coverage gates. A build fails if coverage drops below a defined threshold — say, "decision coverage must stay at or above 85%." Gates stop coverage from quietly eroding over time.
- Delta (changed-code) coverage. Rather than gating the whole codebase, gate only the lines a pull request touched: "new and modified code must be 90% covered." This is far more actionable than a project-wide average, which a large legacy base can mask, and it keeps the bar high exactly where the risk is — in the new code.
- Trends. Plotting coverage over time turns the metric from a snapshot into a signal. A steady decline flags accumulating untested code long before it becomes a crisis, and a sudden jump can reveal a test that asserts nothing.
Publishing the per-change report back into the pull request — so reviewers see which new lines are uncovered right next to the diff — is what turns coverage from a number nobody reads into part of the review.
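The gate logic itself is small. The sketch below shows one way a pipeline step might evaluate a project-wide gate and a changed-code gate; the struct, field names, and the choice to treat an empty change as passing are assumptions for illustration, not the behaviour of any particular CI product.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one coverage run; field names are illustrative. */
typedef struct {
    unsigned covered;  /* items (lines, branches, ...) hit by the tests */
    unsigned total;    /* items that could have been hit                */
} CoverageCount;

/* Percentage covered. An empty set counts as fully covered so that a
   change touching no executable code does not fail the gate. */
double coverage_percent(CoverageCount c) {
    if (c.total == 0) return 100.0;
    return 100.0 * (double)c.covered / (double)c.total;
}

/* Gate: passes when coverage is at or above the threshold percentage. */
bool gate_passes(CoverageCount c, double threshold_percent) {
    return coverage_percent(c) >= threshold_percent;
}
```

With 8,600 of 10,000 items covered, a project-wide gate at 85% passes, while a pull request that covers only 12 of its 20 changed lines fails a 90% changed-code gate. That contrast is the masking effect a project-wide average can produce.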
Coverage in safety standards
In safety-critical domains, coverage stops being a nicety and becomes mandated evidence. The major functional-safety standards each map structural coverage metrics to their assurance levels, so the higher the risk a component carries, the stronger the metric you must demonstrate.
| Standard | Domain | Highest level | Coverage expected at the top |
|---|---|---|---|
| DO-178C | Avionics | DAL A | Statement + Decision + MC/DC |
| ISO 26262 | Automotive | ASIL D | Statement + Branch + MC/DC |
| IEC 61508 | Industrial | SIL 4 | Statement + Branch + MC/DC |
The pattern is consistent: lower assurance levels are satisfied by statement and branch coverage, while the most critical level in each standard calls for MC/DC. The tool you use to collect that evidence must itself be trusted — which is its own subject. See ISO 26262 Tool Confidence Levels, Demystified for how a coverage tool earns the confidence to be used in an ASIL-D workflow.
In a certification context, structural coverage must be collected from code built with the same compiler and options as the delivered software. Coverage measured on a convenient host build, with different optimization or instrumentation, is not the coverage that flies.
Setting sensible coverage targets
Because coverage is so easy to game, the way you set targets matters as much as the number itself. A team told "hit 100%" will write tests that touch lines without checking behaviour — and end up with a worse suite than one aiming for 80% with real assertions. A few principles keep targets honest:
- Write requirements-based tests first. Derive tests from what the code is supposed to do, then use coverage to reveal what those tests missed. Coverage is the gap-finder, never the test designer.
- Match the metric to the risk. Statement or branch coverage is a reasonable bar for ordinary application code; reserve condition-level and MC/DC targets for the critical, decision-dense logic where they pay off.
- Gate the change, not the average. A high bar on new and modified code beats chasing a fixed project-wide percentage that legacy code can drag in either direction.
- Treat the last few percent as analysis, not a sprint. The final uncovered lines are often defensive code or genuinely unreachable paths. Justify them deliberately rather than writing contortions to color them green.
- Never reward the number alone. A coverage figure with weak assertions behind it is worse than an honest lower figure, because it manufactures false confidence.
The bottom line
Code coverage is one of the most useful tools in testing and one of the most misread. Its power is that it makes the invisible visible — it points straight at the code your tests never reached. Its limit is that it can only ever talk about reach, never about correctness, and the single percentage on a dashboard collapses a whole ladder of metrics that prove wildly different things.
Read every coverage number with two questions: which metric is this? and are there real assertions behind it? Choose statement or branch coverage for everyday code, climb to condition-level and MC/DC where the logic is critical, write your tests from requirements first, and gate the code that's changing. Do that, and coverage becomes what it was always meant to be: not a score to defend, but a flashlight that shows you exactly where to look next.