Code Coverage on Bare-Metal Targets Without a File System

The first time most engineers try to collect code coverage on a bare-metal target, the build instruments cleanly, the tests run, and then nothing happens. No report appears, because the tooling was quietly waiting for a file that was never written. libgcov, the runtime behind GCC's gcov, ends a run by dropping .gcda files onto a file system through C-library file I/O (calls such as fopen() and fwrite()). On a desktop that file system is always there. On a Cortex-M with 32 KB of RAM and no disk, it is not.

This is the central friction of measuring coverage on deeply embedded software: the instrumentation works fine — counters increment in RAM exactly as they would on a host — but the delivery step assumes infrastructure the target does not have. The good news is that the counter data is just bytes, and embedded systems already have several ways to push bytes off a chip. This article walks through why coverage breaks on bare metal, how the data actually flows, the transports you can use to get it off the device, the overhead to watch for, and how the report gets rebuilt on the host.

The trap in one line

Instrumentation that works perfectly on the host can produce zero coverage output on the target — not because nothing ran, but because the runtime tried to write a file to a file system that isn't there.

Why coverage breaks on bare metal

Coverage instrumentation adds a counter for each statement, branch, or condition you want to track. Every time control passes through that point, the counter increments. That part is portable — it is just arithmetic on a chunk of RAM. The problem is what happens at the end of the run, when those counters need to be persisted.

With gcov, the persistence step is triggered when the program exits. A profiling hook flushes every translation unit's counters into a .gcda file next to the corresponding .gcno file produced at compile time. That single design decision assumes three things that bare-metal targets routinely lack:

A file system. No fopen(), no path, nowhere to write. Many MCUs have no block storage at all, and where flash exists it is often the program memory itself.
A clean program exit. Embedded firmware frequently never returns from main() — it runs an infinite super-loop or hands control to an RTOS scheduler that never terminates. The exit-time flush hook is simply never called.
Spare RAM and flash. Coverage counters and the C library plumbing for buffered file I/O can dwarf the memory budget of a small device, and the linked-in stdio can blow the flash budget too.

So the failure is not in measuring coverage. It lies in the unstated contract that the measurements can be written to disk, that the program will reach a tidy end, and that there is memory to spare for the file I/O plumbing. Break any one of those assumptions and the standard flow falls apart.

How coverage data normally flows

It helps to separate coverage into two distinct phases, because only one of them is the issue.

Phase one — accumulation. While the tests execute, instrumented code bumps counters held in a RAM region. This is fast, deterministic, and identical to what happens on a host. Nothing about it requires a file system; it is just memory being updated in place.
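To make the accumulation phase concrete, here is a minimal sketch of what instrumented code amounts to. The names `cov_counters`, `COV_HIT`, and `clamp_to_byte` are hypothetical, chosen for illustration; real tools generate their own counter layout and numbering.

```c
#include <stdint.h>

/* Hypothetical sketch: one counter per instrumented point, held in plain RAM. */
#define COV_POINTS 4u

static uint32_t cov_counters[COV_POINTS];

/* What an instrumentation pass might insert at each tracked point. */
#define COV_HIT(id) (cov_counters[(id)]++)

/* Example function under test, shown as it might look after instrumentation. */
int clamp_to_byte(int v)
{
    COV_HIT(0);                 /* function entry */
    if (v < 0) {
        COV_HIT(1);             /* branch: below range */
        return 0;
    }
    if (v > 255) {
        COV_HIT(2);             /* branch: above range */
        return 255;
    }
    COV_HIT(3);                 /* fall-through: value already in range */
    return v;
}
```

Nothing here touches a file, a peripheral, or the C library. After the tests run, `cov_counters` simply holds the hit counts, and the only open question is how to get that array to the host.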

Phase two — extraction. When the run is done, those counters have to leave the device so a host tool can map them back to source lines and decisions. On a desktop this is a file write. On bare metal it has to become a transport problem: the raw counter buffer must travel down some wire — a debug pin, a serial line, a JTAG read — to a host that captures it.

On bare metal, coverage stops being a file-system problem and becomes a transport problem: the counters are fine; getting them off the chip is the work.

Reframing extraction as transport is the key move. Once you accept that the counters simply need to be streamed out as bytes, the question becomes which of the device's existing communication paths to borrow — and embedded targets typically have more than one.
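One way to picture "extraction as transport" is a single send routine that takes a byte-output hook, so the same code can sit on top of any channel. The sketch below is an assumption for illustration, not a standard format: the `COV1` magic, the length field, and the additive checksum are invented here, as are the names `cov_send_frame`, `cov_put_byte_fn`, and the in-memory sink used to exercise it on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport hook: sends one byte over whatever channel the board has. */
typedef void (*cov_put_byte_fn)(uint8_t byte, void *ctx);

/* Frame layout (an assumption for this sketch, not a standard):
 *   'C' 'O' 'V' '1' | payload length, 32-bit little-endian | payload | 8-bit additive checksum
 * Returns the number of bytes handed to the transport.
 */
size_t cov_send_frame(const void *data, uint32_t len,
                      cov_put_byte_fn put, void *ctx)
{
    static const uint8_t magic[4] = { 'C', 'O', 'V', '1' };
    const uint8_t *p = (const uint8_t *)data;
    uint8_t sum = 0;

    for (size_t i = 0; i < sizeof magic; i++)
        put(magic[i], ctx);
    for (unsigned shift = 0; shift < 32u; shift += 8u)
        put((uint8_t)(len >> shift), ctx);
    for (uint32_t i = 0; i < len; i++) {
        put(p[i], ctx);
        sum = (uint8_t)(sum + p[i]);
    }
    put(sum, ctx);
    return sizeof magic + 4u + (size_t)len + 1u;
}

/* Host-side stand-in for a transport: collects the bytes in memory. */
struct cov_mem_sink {
    uint8_t *buf;
    size_t cap;
    size_t len;
};

void cov_mem_put(uint8_t byte, void *ctx)
{
    struct cov_mem_sink *s = (struct cov_mem_sink *)ctx;
    if (s->len < s->cap)
        s->buf[s->len++] = byte;
}
```

On a target, `put` would wrap the chosen channel and the payload would be the counter array. Note that raw counters are sent in the target's byte order, so the host has to know the target's endianness and counter width.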

Transports for getting data off the device

Every coverage-on-target strategy comes down to picking a channel that already exists on the board, hooking the counter buffer to it, and capturing the stream on the host. The five common options, covered under the four headings below, trade speed, intrusiveness, and hardware requirements against each other.

SWO / ITM trace

On Arm Cortex-M parts, the Instrumentation Trace Macrocell (ITM) lets firmware write bytes to a stimulus port that the core emits over the Serial Wire Output (SWO) pin. A debug probe captures the SWO stream on the host. It is a one-way, low-cost channel that piggybacks on the debug connection you already use, so no extra application UART or RAM mailbox is needed. Throughput is bounded by the SWO clock, so it suits modest counter volumes flushed at the end of a run rather than continuous high-rate streaming. Reach for it when you have a Cortex-M with SWO routed to the probe and want minimal application-side plumbing.
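As a rough sketch of the firmware side, writing one byte to an ITM stimulus port looks like the code below. Several things are assumptions: the `itm_stim_port` union and `swo_put_byte` are hypothetical names that mirror the shape of the CMSIS `ITM->PORT[n]` register; the address 0xE0000000 for stimulus port 0 should be checked against your device header; and the debugger or startup code is assumed to have already enabled the ITM and the port. Cortex-M0 and M0+ parts have no ITM, so this does not apply there. In a real project, the CMSIS `ITM_SendChar()` helper is the usual starting point.

```c
#include <stdint.h>

/* Shape of one ITM stimulus port register (mirrors CMSIS ITM->PORT[n]). */
union itm_stim_port {
    volatile uint8_t  u8;
    volatile uint16_t u16;
    volatile uint32_t u32;
};

/* Assumed address of stimulus port 0; verify against your device header. */
#define ITM_STIM0 ((union itm_stim_port *)0xE0000000u)

/* Write one byte, giving up after max_spins polls so a missing probe or a
 * stalled trace clock cannot hang the firmware. Returns 1 if the byte was
 * written, 0 if the port never became ready. */
int swo_put_byte(union itm_stim_port *port, uint8_t byte, uint32_t max_spins)
{
    while (port->u32 == 0u) {   /* reads non-zero once the FIFO can accept data */
        if (max_spins-- == 0u)
            return 0;
    }
    port->u8 = byte;            /* an 8-bit write emits a one-byte trace packet */
    return 1;
}
```

On target the call would be `swo_put_byte(ITM_STIM0, b, 1000)`, with the spin limit tuned to the SWO clock. Passing the port as a pointer keeps the routine testable on a host with an ordinary variable standing in for the register.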

SEGGER RTT

Real-Time Transfer (RTT) uses a small ring buffer in the target's RAM that the debug probe reads directly through the existing JTAG/SWD link while the core keeps running. Because the probe pulls the data rather than the CPU pushing it byte-by-byte, RTT is fast and only lightly intrusive — the firmware just copies counters into the buffer. It needs a compatible probe and a few hundred bytes to a few kilobytes of RAM for the buffer. Use it when SWO bandwidth is too tight or you want higher throughput without dedicating an application peripheral.
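The mechanism is easier to see in a stripped-down ring buffer. The sketch below is not SEGGER's implementation or API; `cov_ring`, `cov_ring_write`, and `cov_ring_read` are hypothetical names, and the read function stands in for what the probe does over the debug link. In a real project you would link the SEGGER RTT sources and call `SEGGER_RTT_Write()` instead.

```c
#include <stddef.h>
#include <stdint.h>

#define COV_RING_SIZE 64u   /* illustrative; must be a power of two in this sketch */

struct cov_ring {
    volatile uint32_t wr;   /* advanced only by the target */
    volatile uint32_t rd;   /* advanced only by the reader (the probe) */
    uint8_t data[COV_RING_SIZE];
};

/* Target side: queue as much as fits without blocking; returns bytes queued. */
size_t cov_ring_write(struct cov_ring *r, const void *src, size_t len)
{
    const uint8_t *p = (const uint8_t *)src;
    uint32_t wr = r->wr;
    uint32_t used = wr - r->rd;              /* free-running indexes, wrap-safe */
    size_t space = COV_RING_SIZE - used;
    size_t n = len < space ? len : space;

    for (size_t i = 0; i < n; i++)
        r->data[(wr + (uint32_t)i) & (COV_RING_SIZE - 1u)] = p[i];
    r->wr = wr + (uint32_t)n;                /* publish only after the data is in place */
    return n;
}

/* Reader side: stands in for the probe draining the buffer. */
size_t cov_ring_read(struct cov_ring *r, void *dst, size_t cap)
{
    uint8_t *p = (uint8_t *)dst;
    uint32_t rd = r->rd;
    uint32_t avail = r->wr - rd;
    size_t n = cap < avail ? cap : avail;

    for (size_t i = 0; i < n; i++)
        p[i] = r->data[(rd + (uint32_t)i) & (COV_RING_SIZE - 1u)];
    r->rd = rd + (uint32_t)n;
    return n;
}
```

The write never blocks: if the buffer is full it queues what fits and reports the count, leaving the caller to retry the remainder. That is the property that keeps this style of transport lightly intrusive. A production version may also need memory barriers, depending on the core.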

Semihosting & UART

Semihosting lets the target issue I/O requests that the attached debugger services on the host — effectively borrowing the host's file system through the debug link. It is convenient and needs no spare peripheral, but each request halts the core, which makes it slow and intrusive; it is best for one-shot end-of-run dumps, not anything timing-sensitive. A plain UART is the lowest-common-denominator alternative: if the board has a free serial port, stream the counter bytes out of it and capture them with a terminal on the host. UART needs an available port and adds modest code, but works on almost anything, including parts with no trace unit at all.
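For the UART case, a text encoding is often easier to capture than raw binary, because a terminal log may mix in other output or alter line endings. The following sketch formats one counter as a fixed-width hex line. The `COV <index> <value>` layout and the name `cov_format_line` are assumptions made up for this example, and the formatting is hand-rolled to avoid pulling printf into a small image.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `digits` uppercase hex digits of v, most significant first. */
static void put_hex(char *out, uint32_t v, unsigned digits)
{
    static const char hex[] = "0123456789ABCDEF";
    for (unsigned i = 0; i < digits; i++) {
        unsigned shift = 4u * (digits - 1u - i);
        out[i] = hex[(v >> shift) & 0xFu];
    }
}

/* Length of "COV IIII VVVVVVVV\n", excluding the terminating NUL. */
#define COV_LINE_LEN 18u

/* Format one counter as a text line. Returns the line length, or 0 if the
 * output buffer is too small. */
size_t cov_format_line(char *out, size_t cap, uint16_t index, uint32_t value)
{
    if (cap < COV_LINE_LEN + 1u)
        return 0;
    out[0] = 'C'; out[1] = 'O'; out[2] = 'V'; out[3] = ' ';
    put_hex(out + 4, index, 4);
    out[8] = ' ';
    put_hex(out + 9, value, 8);
    out[17] = '\n';
    out[18] = '\0';
    return COV_LINE_LEN;
}
```

The firmware would call this once per counter and pass each line to its existing UART write routine. The host then filters the capture for lines starting with `COV ` and ignores everything else. The cost is size: 18 characters per 4-byte counter, so this suits modest counter sets.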

RAM buffer dump over JTAG

The most transport-agnostic option keeps the counters in a known, linker-placed RAM region and does nothing else on the device. After the tests finish, the host halts the core over JTAG/SWD and reads that region's bytes straight out of memory — no on-target I/O code, no peripheral, essentially zero runtime overhead beyond the counters themselves. It needs a JTAG/SWD connection and enough RAM to hold the full counter set at once. This is the go-to when code space is razor-thin, when there is no usable trace or serial channel, or when you want the lightest possible footprint on the firmware.
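A sketch of the "known, linker-placed region" idea, under stated assumptions: the section name `.cov_data`, the `cov_region` layout, and the magic value are all hypothetical, and the linker script is assumed to keep that section at a fixed RAM address. The small header lets the host confirm it read the right memory before trusting the counters.

```c
#include <stddef.h>
#include <stdint.h>

#define COV_POINTS 128u
#define COV_MAGIC  0x31564F43u  /* reads as 'C','O','V','1' in little-endian memory */

/* Place the region in its own section on GCC-style ELF toolchains so the
 * linker script can pin it; elsewhere fall back to default placement. */
#if defined(__GNUC__) && defined(__ELF__)
#define COV_SECTION __attribute__((section(".cov_data"), used))
#else
#define COV_SECTION
#endif

struct cov_region {
    uint32_t magic;                 /* lets the host verify the dump */
    uint32_t count;                 /* number of counters that follow */
    uint32_t counters[COV_POINTS];
};

COV_SECTION struct cov_region cov_region;

/* Call once before the tests start. Startup code may not initialize a
 * custom section, so the header is written and counters cleared at run time. */
void cov_region_init(void)
{
    cov_region.magic = COV_MAGIC;
    cov_region.count = COV_POINTS;
    for (size_t i = 0; i < COV_POINTS; i++)
        cov_region.counters[i] = 0;
}
```

After the run, the host looks up the address of `cov_region` in the map file or ELF symbol table, halts the core, and reads `sizeof(struct cov_region)` bytes (520 in this sketch). With GDB, the `dump binary memory` command is one way to do that read.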

Transport	Speed	Intrusiveness	Needs	Reach for it when…
SWO / ITM	Moderate	Low	Cortex-M + SWO pin + probe	You want minimal app plumbing on Arm-M
SEGGER RTT	High	Low	Compatible probe + small RAM buffer	You need throughput without a spare peripheral
Semihosting	Low	High	Debugger attached	Quick one-shot dumps; timing not critical
UART	Moderate	Moderate	A free serial port	No trace unit; you need a universal fallback
RAM dump / JTAG	n/a (post-run)	Minimal	JTAG/SWD + RAM for counters	Code space is tight; lightest footprint wins

Memory and timing overhead

None of this is free, and on a constrained part the budget is the whole game. Two costs matter, and they are separate.

RAM for counters. Every instrumented point needs storage. Statement coverage is the cheapest; condition and MC/DC coverage track more points and so cost more RAM. On a part with a few kilobytes free, instrumenting the entire image at once may not fit — which is why the counter region's size has to be a planned number, not an afterthought.
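The arithmetic is worth doing explicitly. The sketch below uses made-up numbers (2,000 instrumented points, a 4 KB budget) and hypothetical helper names. It compares full counters against one-bit hit flags: at 8 bytes per counter, which is what gcov typically uses, 2,000 points need 16,000 bytes; at 4 bytes they need 8,000; as hit/not-hit bits they need 250.

```c
#include <stdint.h>

#define COV_POINTS     2000u   /* hypothetical: instrumented points in this build */
#define COV_RAM_BUDGET 4096u   /* hypothetical: bytes set aside for coverage data */

/* Bytes needed when every point keeps a full execution counter. */
static inline uint32_t cov_bytes_counters(uint32_t points, uint32_t width_bytes)
{
    return points * width_bytes;
}

/* Bytes needed when every point only records hit / not hit (one bit each). */
static inline uint32_t cov_bytes_flags(uint32_t points)
{
    return (points + 7u) / 8u;
}

/* Fail the build, not the test run, if the chosen scheme cannot fit. */
_Static_assert(((COV_POINTS + 7u) / 8u) <= COV_RAM_BUDGET,
               "coverage flags exceed the RAM budget");
```

With these numbers only the bit-flag scheme fits in 4 KB. The trade-off is that flags lose execution counts, which is acceptable for a covered/not-covered report but not for profiling.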

Cycles for streaming. Pushing bytes out over SWO, a UART, or semihosting consumes CPU time and can perturb timing-sensitive code. A RAM-dump-over-JTAG approach sidesteps most of this by doing the extraction after the run, when the host reads memory directly. If you must stream live, keep the per-flush work bounded and predictable.
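One possible way to keep per-flush work bounded is to drain the counter buffer in fixed-size slices, one slice per pass through the main loop. This is a sketch with hypothetical names (`cov_cursor`, `cov_next_chunk`); it only decides which bytes go next and leaves the actual sending to whatever transport is in use.

```c
#include <stddef.h>
#include <stdint.h>

struct cov_cursor {
    const uint8_t *base;   /* start of the counter buffer */
    size_t total;          /* buffer size in bytes */
    size_t offset;         /* bytes already handed to the transport */
};

/* Hand out the next slice, at most max_bytes long. Returns the slice length,
 * or 0 once the whole buffer has been drained. */
size_t cov_next_chunk(struct cov_cursor *c, const uint8_t **chunk, size_t max_bytes)
{
    size_t left = c->total - c->offset;
    size_t n = left < max_bytes ? left : max_bytes;

    *chunk = c->base + c->offset;
    c->offset += n;
    return n;
}
```

Each loop iteration then costs at most `max_bytes` worth of transport time, which can be sized against the tightest deadline in the system. One caveat: if instrumented code keeps running while the buffer drains, the dump is not a consistent snapshot, so it is safer to drain after the tests have finished.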

The most effective lever is selective instrumentation: instrument only the modules under test rather than the entire firmware image. That shrinks both the counter RAM and the volume of data to extract, and it keeps the timing impact on untouched code at zero. On the tightest targets, you instrument in passes — a subset of modules per test run — and merge the results on the host.
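Merging passes on the host can be as simple as adding counter arrays together. The sketch below assumes every pass uses the same point numbering, with modules that were not instrumented in a given pass reporting zeros; `cov_merge` is a hypothetical name, and the addition saturates so a long-running pass cannot wrap a counter back to zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the counters from one pass into the running total, saturating at
 * UINT32_MAX so an overflow can never make a hit point look unhit. */
void cov_merge(uint32_t *total, const uint32_t *pass, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t room = UINT32_MAX - total[i];
        total[i] = (pass[i] > room) ? UINT32_MAX : total[i] + pass[i];
    }
}
```

A point counts as covered if any pass hit it, so the merged array gives the same covered/not-covered answer as a single fully instrumented run would, provided the passes together exercise the same tests.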

Budget the counter region first

Size the RAM region that holds your coverage counters before you pick a transport. If the full instrumented image won't fit, instrument selectively or in passes and merge on the host — don't discover the shortfall after the build.

Reconstructing the report on the host

The bytes that arrive on the host are raw — just counter values with no notion of which source line or decision they belong to. The mapping lives in the compile-time metadata generated alongside the instrumented build (for the gcov model, the .gcno graph files). A host-side tool joins the two: it takes the raw counters captured from the transport and the structural map from build time, and reconstructs source-level coverage — which statements ran, which branches took both directions, which conditions were exercised.
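In outline, the join looks like the sketch below. It is deliberately simplified: the map here is a flat table of file and line per counter index, whereas real metadata such as .gcno files encodes a control-flow graph in a binary format. The names `cov_map_entry`, `cov_summarize`, and `cov_percent` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build-time metadata: where each counter index lives in the source. */
struct cov_map_entry {
    const char *file;
    unsigned line;
};

struct cov_summary {
    size_t points;   /* instrumented points in the file */
    size_t hit;      /* points whose counter is non-zero */
};

/* Join raw counters with the map and summarize one source file. */
struct cov_summary cov_summarize(const struct cov_map_entry *map,
                                 const uint32_t *counters,
                                 size_t count, const char *file)
{
    struct cov_summary s = { 0, 0 };

    for (size_t i = 0; i < count; i++) {
        if (strcmp(map[i].file, file) != 0)
            continue;
        s.points++;
        if (counters[i] != 0)
            s.hit++;
    }
    return s;
}

/* Whole-number percentage, rounded down; 0 when nothing was instrumented. */
unsigned cov_percent(struct cov_summary s)
{
    return s.points ? (unsigned)((s.hit * 100u) / s.points) : 0u;
}
```

The key point is that the counter array and the map must come from the same build. If the firmware is rebuilt between instrumenting and reporting, the indexes no longer line up and the report is silently wrong.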

This split is what makes the whole approach practical. The device does the absolute minimum — increment counters, optionally copy them to a buffer — while all the heavy lifting of demangling, mapping, merging multiple runs, and rendering a human-readable report happens on the host, where memory and CPU are abundant. The target never has to know what a percentage is.

How RKTracer handles file-system-less targets

RKTracer is built for exactly this case. It runs on embedded targets with or without a file system. On a target that has no file system, the coverage data either streams out over the existing debug transport (SWO, JTAG, or UART) or is held in a RAM buffer that the host reads back. There is no dependency on a disk, on a clean program exit, or on a heavyweight C library.

Because RKTracer makes no source changes and instruments by prefixing your existing build command, the firmware you measure is built with the same cross-compiler that builds the firmware you ship. RKTracer auto-detects that cross-compiler, so the coverage you collect reflects the real target image, not a convenient host build. It measures statement, decision, condition, MC/DC, and multi-condition coverage, and keeps the on-target footprint low, which matters when every kilobyte is spoken for. Once the counters reach the host, rkresults renders the report as HTML or XML for review and CI gating. For the broader picture of validating constrained devices, see our guide to embedded system testing.

terminal — coverage off a file-system-less target

# Prefix your normal cross build — no source edits
$ rktracer make firmware

  compiler: arm-none-eabi-gcc 12.2 (auto-detected)
  ✓ instrumented 38 modules — source unmodified

# Run tests on target; counters stream out over SWO/JTAG
$ rkresults --report html

  ✓ Statement 100%
  ✓ Decision  97%
  ✓ Condition 94%  (no file system — RAM buffer read back)

On a target with no disk, RKTracer streams counters out over the debug link or a RAM buffer and rebuilds the report on the host.

As an ISO 9001 vendor, RKValidate develops RKTracer under a documented quality process — the kind of pedigree that matters when the coverage evidence has to stand up in a safety-critical program.

What to remember

Coverage instrumentation works on bare metal — it's the file-write at the end that fails.
Reframe extraction as a transport problem: SWO/ITM, RTT, semihosting/UART, or a RAM dump over JTAG.
Budget counter RAM up front; instrument selectively to bound RAM and timing cost.
The host joins raw counters to build-time metadata to rebuild source-level coverage.
RKTracer streams coverage off file-system-less targets over the existing debug link or a RAM buffer.

The bottom line

Measuring coverage on a target without a file system feels impossible only until you separate the two halves of the problem. Accumulation is portable; it is just counters in RAM. Extraction is a transport choice, and your board almost certainly already has a usable channel — a trace pin, a debug probe, a serial port, or a JTAG link to read RAM directly. Pick the one that fits your speed, intrusiveness, and footprint budget, capture the bytes on the host, and let the host rebuild the report.

Do that, and the coverage you report is the coverage of the firmware that actually ships — collected on the real silicon, not approximated on a host that never sees the target's compiler, its timing, or its constraints. That is the difference between a number that looks good and a number you can defend.

Sanjay Iyer

Embedded Systems, RKValidate

Sanjay works with embedded and firmware teams bringing structural coverage to resource-constrained targets — from bare-metal MCUs to RTOS-based devices.
