Code coverage for GPU: host and device, CUDA and ROCm
RKTracer delivers code coverage for GPU code on the host launcher and the on-device kernel in one workflow, from statement and branch through condition, MC/DC and multi-condition. It covers CUDA on NVIDIA through the nvcc toolchain, ROCm on AMD GPUs and Bazel builds, without changing your source or build system.
No source changes · host + device in one report · HTML and XML reports
    # Just prefix your existing CUDA build command
    $ rktracer make all
    preprocessing host (.cpp) + device (.cu) with nvcc…
    instrumenting host + device translation units…
    ✓ build complete, no source modified

    # Run your tests, then generate the report
    $ ctest && rkresults --report html
    ✓ host · MC/DC 100% · decision 100%
    ✓ device · MC/DC 97.4% · decision 100%
Your tests run the launcher. The kernel goes uncounted.
GPU and accelerator code now powers perception, sensor fusion, signal processing and on-device inference, and that logic is increasingly safety-relevant. Yet most code coverage tools only see the host side. They watch the CPU launch a kernel and then stop at the device boundary, so the branches and conditions inside __global__ and __device__ functions run untested and uncounted. That is a silent gap in the coverage picture for exactly the code that does the heaviest lifting.
Parallelism makes the gap worse. A kernel runs across thousands of threads with warp divergence and boundary tiles, so a path you saw execute once may never have run for the data that matters. Real code coverage for GPU means measuring structural coverage inside the device kernels themselves, not inferring it from a host approximation. RKTracer closes that gap by instrumenting the host launcher and the on-device kernel with one engine, then reporting both together.
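To make the boundary-tile point concrete, here is a minimal host-side C++ sketch. It is not device code and not RKTracer output; it only simulates a grid of threads evaluating the usual bounds guard `idx < n`. The names `GuardOutcomes` and `simulate_bounds_guard` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Outcomes observed for the guard `idx < n` across all simulated threads.
struct GuardOutcomes {
    bool saw_true = false;   // at least one thread was inside the data
    bool saw_false = false;  // at least one thread fell past the end
};

// Simulate launching ceil(n / block_size) blocks of block_size threads,
// each evaluating the bounds guard a kernel would typically start with.
inline GuardOutcomes simulate_bounds_guard(std::size_t n, std::size_t block_size) {
    assert(block_size > 0 && "block size must be positive");
    GuardOutcomes seen;
    const std::size_t blocks = (n + block_size - 1) / block_size;
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            const std::size_t idx = block * block_size + thread;
            if (idx < n) {
                seen.saw_true = true;
            } else {
                seen.saw_false = true;
            }
        }
    }
    return seen;
}
```

With `n = 1024` and a block size of 256, every simulated thread is in range, so the false outcome of the guard never runs. With `n = 1000`, the last block contains threads past the end and both outcomes run. A test suite that only uses tile-aligned sizes would leave that branch uncovered, which is the kind of gap only device-side measurement can show.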
Measure coverage on host AND device in a single run
RKTracer does not bolt device numbers onto a host report. It treats the GPU as just another target, instruments host translation units and on-GPU kernels with the same model, and produces one unified coverage report. You see exactly which kernel branches and conditions your tests reached, beside the host launcher that drove them.
- Per-kernel statement, decision, condition and MC/DC breakdown
- Uncovered branches in device code surfaced line by line
- Multi-condition coverage on warp-divergent decision logic
- One coverage standard across host, GPU, embedded and simulator
CUDA, Bazel and ROCm in one coverage report
You only change the command you run. CUDA through nvcc and ROCm through CMake take the rktracer prefix; a Bazel build uses the dedicated single-binary rktracerbz prefix. Every path is instrumented on host and device, lands in the same unified MC/DC report, then feeds RKMCP and RKTracerGen to close the gaps.
CUDA and ROCm use the rktracer prefix · a Bazel build uses rktracerbz · RKTracerGen generates tests for C and C++ only.
CUDA host and device coverage through the nvcc toolchain
CUDA is where code coverage for GPU code most often falls apart, because a single source file becomes two very different programs once nvcc is done with it.
The nvcc compiler driver splits each .cu file into two parts. The host code is handed to your host compiler, whether that is g++, clang or cl, while the device code is compiled to run on the GPU itself. Those two halves are built by different compilers, run on different processors and, crucially for coverage, are usually measured by different tooling, if the device side is measured at all.
A typical GPU coverage tool only ever sees the host side. It watches the CPU prepare and launch a kernel, then stops at the device boundary, so the branches and conditions inside your kernels never get counted. RKTracer instruments both sides. During the nvcc build it preprocesses and instruments the host translation units and the device translation units together, inserting coverage probes directly into the kernels that run on the GPU.
As your tests run on the GPU, the RKTracer runtime captures coverage for the host launcher code and for the on-device kernel code in the same pass. You are not approximating device behaviour from the host: you are recording exactly which kernel statements, branches and conditions executed on the real hardware, beside the launcher that drove them.
The workflow stays the workflow you already have. You prefix the rktracer keyword to your existing CUDA build command, for example rktracer make all, or to your CMake or nvcc-driven build. RKTracer auto-detects nvcc and the host compiler, instruments host plus device, and links its runtime automatically. You then run your tests on the GPU and run rkresults --report html for one unified report.
That report shows host coverage and device coverage separately, each through MC/DC, so you can see exactly which kernels and which conditions executed on the device. There are no source changes and no separate device-coverage build to maintain.
- nvcc splits each .cu file into host code and device code
- RKTracer instruments BOTH host and device translation units
- Probes go into the kernels that actually run on the GPU
- Host and device coverage reported separately, each through MC/DC
- No source changes, no separate device-coverage build
For the full step-by-step setup, see the guide on CUDA coverage on Linux, or read the deep dive on code coverage for CUDA on NVIDIA.
nvcc splits each source file; RKTracer instruments both halves.
    # Prefix the rktracer keyword to your CUDA build
    $ rktracer make all
    auto-detect: nvcc + host compiler (g++)
    nvcc split: detect_kernel.cu
    · host code → host compiler
    · device code → GPU
    instrument host + device translation units
    ✓ runtime linked · source untouched

    # Run tests on the GPU, then one report
    $ ctest && rkresults --report html
    ✓ host · MC/DC 100% · decision 100%
    ✓ device · MC/DC 97.4% · decision 100%
Code coverage for GPU builds with Bazel
Many of the largest GPU and machine-learning codebases build their CUDA kernels with Bazel. For those projects RKTracer ships a dedicated tool, rktracerbz, so you get the same code coverage for CUDA, host and device, as any other build, just by prefixing one command.
rktracerbz is a single, self-contained static binary: the rktracer enabler for Bazel. There is no Bazel plugin, no aspect and no edits to your BUILD or WORKSPACE files.
The only thing you do is prefix it to your normal Bazel command, either rktracerbz bazel build //... or the direct form rktracerbz build //..., where it calls Bazel for you. Run rktracerbz --rk-info and it prints the resolved setup: the Bazel it found, the real compilers, the jobs ceiling and the config files it loaded.
It works by playing two roles. As the launcher, it starts the Bazel build. As the compiler that Bazel invokes for each compile action, it execs rktracer <real-compiler> <args>. Every C, C++ and CUDA translation unit Bazel compiles is therefore instrumented by the rktracer pipeline automatically, at full build parallelism. It auto-detects the real gcc, g++ and nvcc (or reads them from config), passes nvcc host steps straight through, and runs the C++ and CUDA compile actions with a local strategy so the instrumentation is actually applied.
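The compiler role amounts to rewriting one command line per compile action. The C++ sketch below shows only that rewrite, under the assumption that the wrapped form is exactly rktracer <real-compiler> <args> as stated in the text. The function name `wrap_compile_action` is hypothetical, and nothing here describes how rktracerbz is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the command line for one compile action in the form
//   rktracer <real-compiler> <args>
// `wrap_compile_action` is a hypothetical name used only for illustration.
inline std::vector<std::string> wrap_compile_action(
        const std::string& real_compiler,
        const std::vector<std::string>& args) {
    assert(!real_compiler.empty() && "a real compiler path is required");
    std::vector<std::string> command;
    command.reserve(args.size() + 2);
    command.push_back("rktracer");     // the instrumenting prefix
    command.push_back(real_compiler);  // for example /usr/bin/g++
    // The original compiler arguments follow unchanged and in order.
    command.insert(command.end(), args.begin(), args.end());
    return command;
}
```

Called with `/usr/bin/g++` and the arguments `-c kernel.cc -o kernel.o`, the sketch yields `rktracer /usr/bin/g++ -c kernel.cc -o kernel.o`. Because the build system still sees one compiler invocation per action, the build's own parallelism is left untouched.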
After the build you get the same unified host plus device coverage report as any other RKTracer build, through MC/DC. You then point RKMCP at the uncovered lines and decisions so your AI agent writes the missing tests, exactly as you would for a Makefile or nvcc-driven build.
- One static binary you prefix: rktracerbz bazel build //...
- No plugin, no aspect, no edits to BUILD or WORKSPACE files
- Instruments every C, C++ and CUDA compile action Bazel runs
One binary plays two roles, so every compile action is traced.
    # Prefix rktracerbz to your existing Bazel build. Nothing else changes.
    $ rktracerbz bazel build //...
    [rktracerbz] real C =/usr/bin/gcc
    [rktracerbz] real C++=/usr/bin/g++
    ... every C, C++ and CUDA compile action instrumented ...
    $ rkresults --report html   # one unified host + device report
The same coverage model on ROCm and AMD GPUs
CUDA is not the only accelerator stack that ships in safety-relevant systems. AMD ROCm and HIP code builds with CMake, not Bazel, so you use the standard rktracer prefix on the CMake build: configure with cmake, then run rktracer make, or prefix rktracer to cmake --build. RKTracer then instruments the HIP host and device code that runs on AMD GPUs.
HIP host + device, via CMake
ROCm and HIP build with CMake. You configure with cmake, then prefix rktracer to the build (rktracer make or rktracer cmake --build); RKTracer instruments the HIP host launcher and the on-GPU kernel and reports both in one unified model.
Full metrics on AMD kernels
ROCm and HIP device code is C and C++, so every structural metric, through MC/DC and multi-condition, applies to the kernels just as it does on the host.
Bazel vs CMake, one bar
Bazel-built GPU and CUDA projects use the dedicated rktracerbz binary; CUDA built with make or nvcc and ROCm built with CMake use the normal rktracer prefix. Either way you get the same reports, gates and CI hooks.
ROCm appears alongside CUDA on the RKTracer features and metrics page. Tell us your exact GPU and ROCm version and we will confirm support for your setup.
From GPU source to one unified coverage report
Preprocess and instrument host + device
RKTracer hooks your existing CUDA or ROCm build, preprocesses each source file with the compiler your build already uses, and adds low-overhead probes to both host translation units and on-GPU kernels. Your source is never rewritten.
Run your host + device tests
Execute the test suite you already have. Launchers run on the CPU, kernels run on the real GPU, exactly as they do in production. Coverage is captured live.
Collect per-kernel hits
Coverage data is collected from the device after each kernel launch and merged with host results into a single, consistent model, so host and device are never out of sync.
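Conceptually, merging per-launch results means accumulating hit counters so that a probe reached in any launch counts as covered. The C++ sketch below illustrates that idea only. The `ProbeHits` layout and the functions `merge_launch` and `covered_fraction` are assumptions made for this example, not RKTracer's data format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One counter per probe, indexed by probe id (hypothetical layout).
using ProbeHits = std::vector<std::uint64_t>;

// Add the hits recorded by one kernel launch into the running totals.
inline void merge_launch(ProbeHits& totals, const ProbeHits& launch) {
    assert(totals.size() == launch.size() && "both must describe the same probes");
    for (std::size_t i = 0; i < totals.size(); ++i) {
        totals[i] += launch[i];
    }
}

// Fraction of probes hit at least once, in the range [0, 1].
inline double covered_fraction(const ProbeHits& totals) {
    if (totals.empty()) {
        return 1.0;  // nothing to cover
    }
    std::size_t hit = 0;
    for (std::uint64_t count : totals) {
        if (count > 0) {
            ++hit;
        }
    }
    return static_cast<double>(hit) / static_cast<double>(totals.size());
}
```

Summing rather than overwriting is the design choice that matters here: a later launch that skips a probe cannot erase the evidence that an earlier launch reached it.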
Report & gate in CI
Export HTML and XML coverage, through MC/DC, color-coded green, yellow and red to the line, and fail the build below threshold so the gap cannot slip through unnoticed.
    # Drop GPU coverage into any CI pipeline
    steps:
      - build: rktracer make
      - test: ctest --output-on-failure
      - cover: rkresults --scope host,device
      - report: rkresults --report html,xml,sonar
      - gate: rkresults --device-mcdc 95 --decision 100
    ✓ pipeline passed · host + device evidence archived
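The gate step reduces to comparing measured percentages against minimums, with host and device judged separately. The C++ sketch below shows that logic under stated assumptions: the `Coverage` and `Thresholds` structs and both functions are hypothetical names, and this is not how rkresults is implemented.

```cpp
#include <cassert>

// Measured coverage for one side (host or device), in percent.
struct Coverage {
    double mcdc;
    double decision;
};

// Minimum acceptable percentages for one side.
struct Thresholds {
    double mcdc;
    double decision;
};

// True when every metric meets or exceeds its threshold.
inline bool meets(const Coverage& measured, const Thresholds& required) {
    assert(required.mcdc >= 0.0 && required.mcdc <= 100.0);
    assert(required.decision >= 0.0 && required.decision <= 100.0);
    return measured.mcdc >= required.mcdc &&
           measured.decision >= required.decision;
}

// The pipeline passes only if host and device each pass their own gate.
inline bool gate_passes(const Coverage& host, const Thresholds& host_min,
                        const Coverage& device, const Thresholds& device_min) {
    return meets(host, host_min) && meets(device, device_min);
}
```

Using the sample figures shown on this page, a device result of MC/DC 97.4% and decision 100% passes a device gate of 95 and 100. Dropping device MC/DC to 94% would fail the build even if the host side stayed at 100%.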
Full structural coverage on the C and C++ side
CUDA and ROCm device code is C and C++, so the complete set of structural metrics applies to the kernels themselves, not just the host.
Function, File & Line
Know exactly which kernels, files and lines your tests execute on host and device.
Statement & Branch/Decision
Confirm every statement runs and both outcomes of each device decision are tested.
Condition & MC/DC
The strongest structural metrics, applied to warp-divergent kernel logic. MC/DC is supported for C and C++.
Multi-Condition & Delta
Go beyond MC/DC, and focus coverage on exactly which kernels changed each build.
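For readers new to MC/DC, the C++ sketch below checks whether a set of test vectors shows each condition of a three-condition decision independently affecting the outcome (the unique-cause form). The decision `a && (b || c)` and every name in the block are illustrative assumptions; the sketch explains the metric and is not RKTracer's analysis.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One test vector for a three-condition decision: {a, b, c}.
using Vector3 = std::array<bool, 3>;

// Example decision of the kind found in a kernel guard: a && (b || c).
inline bool decision(const Vector3& v) {
    return v[0] && (v[1] || v[2]);
}

// True if the tests contain a pair that differs only in `condition`
// and flips the decision outcome (unique-cause independence).
inline bool shows_independence(const std::vector<Vector3>& tests,
                               std::size_t condition) {
    assert(condition < 3);
    for (const Vector3& x : tests) {
        for (const Vector3& y : tests) {
            bool only_this_differs = true;
            for (std::size_t i = 0; i < 3; ++i) {
                const bool differs = x[i] != y[i];
                if (differs != (i == condition)) {
                    only_this_differs = false;
                }
            }
            if (only_this_differs && decision(x) != decision(y)) {
                return true;
            }
        }
    }
    return false;
}

// MC/DC is met when every condition is shown to act independently.
inline bool achieves_mcdc(const std::vector<Vector3>& tests) {
    for (std::size_t c = 0; c < 3; ++c) {
        if (!shows_independence(tests, c)) {
            return false;
        }
    }
    return true;
}
```

For this decision, the four vectors TTF, FTF, TFF and TFT are enough for MC/DC, whereas multi-condition coverage requires all 2^3 = 8 combinations. Two vectors that merely make the decision true once and false once satisfy decision coverage but not MC/DC.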
| Coverage metric | Referenced by |
|---|---|
| Statement | DO-178C Level C · ISO 26262 ASIL A |
| Decision / Branch | DO-178C Level B · ASIL B/C |
| MC/DC | DO-178C Level A · ASIL D |
| Multi-Condition | Deep verification · IEC 61508 SIL 4 |
RKValidate is an ISO 9001 quality-certified vendor. The standards above are shown only as context. RKTracer measures the coverage metrics these standards reference; it does not itself carry a functional-safety certification.
Host and device evidence, in every pipeline
Reports
- HTML reports, source-level detail
- Green, yellow, red line coloring
- XML for dashboards and gates
- Merge reports across multiple runs
CI systems
- Jenkins, Azure DevOps, GitLab CI
- Publishes coverage to SonarQube
- Separate host and device thresholds
- Fail the build below your target
Scope control
- Folder include and exclude
- Function-level instrument and ignore
- File-type filtering
- Configured via rktracer.config
See the documentation for CI recipes, or read how RKTracer works under the hood.
Coverage finds the gaps. Two ways to close them.
RKTracer pinpoints the uncovered lines, decisions and MC/DC conditions in your kernels and launchers. Turning those gaps into tests is your choice, and both options are part of the RKTracer tool.
The Bazel flow joins here too: once rktracerbz has instrumented your build and you have host plus device coverage, RKMCP serves those uncovered lines and decisions to your AI agent over MCP, exactly as it does for a make or nvcc build. Nothing about the gap-to-test step is Bazel-specific.
RKTracer itself measures coverage; it does not generate tests. The two tools below do. Because CUDA and ROCm code is C and C++, the offline generator applies to it.
See the full code coverage tool
RKMCP · AI-assisted
An MCP server that streams the uncovered code to your AI agent as JSON-RPC. The agent writes unit and functional tests plus the build, then runs and re-checks until the gaps close.
RKTracerGen · offline, C and C++ only
A deterministic, fully offline unit-test generator for C and C++ only. Boundary-Value Analysis, a real-run oracle, managed stubs and a standalone Makefile. No AI, no tokens, no network.
Host & GPU coverage questions, answered
Prefix the rktracer keyword to your existing build command, for example rktracer make. On Windows you integrate with Visual Studio using rktracer -vs -integrate, then clean and rebuild. RKTracer preprocesses with your existing compiler, instruments at compile time and adds its runtime automatically. There are no source edits.
Close the device-code blind spot in your coverage
Download the free 30-day trial and measure host + device GPU code coverage on your own CUDA or ROCm workload today, or book a demo with an engineer. Questions first? Talk to us.