RKTracer · Host & GPU code coverage

Code coverage for GPU: host and device, CUDA and ROCm

RKTracer delivers code coverage for GPU code on the host launcher and the on-device kernel in one workflow, from statement and branch through condition, MC/DC and multi-condition. It covers CUDA on NVIDIA through the nvcc toolchain, ROCm on AMD GPUs, and Bazel-built GPU projects, without changing your source or build system.

No source changes · host + device in one report · HTML and XML reports

nvcc · host + device coverage
# Just prefix your existing CUDA build command
$ rktracer make all

  preprocessing host (.cpp) + device (.cu) with nvcc…
  instrumenting host + device translation units…
   build complete, no source modified

# Run your tests, then generate the report
$ ctest && rkresults --report html
   host   · MC/DC 100% · decision 100%
   device · MC/DC 97.4% · decision 100%
Why GPU code is a coverage blind spot

Your tests run the launcher. The kernel goes uncounted.

GPU and accelerator code now powers perception, sensor fusion, signal processing and on-device inference, and that logic is increasingly safety-relevant. Yet most code coverage tools only see the host side. They watch the CPU launch a kernel and then stop at the device boundary, so the branches and conditions inside __global__ and __device__ functions go unmeasured: they may execute, but nothing records which paths ran. That is a silent gap in the coverage picture for exactly the code that does the heaviest lifting.

Parallelism makes the gap worse. A kernel runs across thousands of threads with warp divergence and boundary tiles, so a path you saw execute once may never have run for the data that matters. Real code coverage for GPU means measuring structural coverage inside the device kernels themselves, not inferring it from a host approximation. RKTracer closes that gap by instrumenting the host launcher and the on-device kernel with one engine, then reporting both together.
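To make the boundary-tile point concrete, here is a minimal host-side C++ sketch. It is not RKTracer output and not real device code: it only emulates, on the CPU, the index arithmetic of a one-dimensional launch with the common `if (idx < n)` guard, and tallies both outcomes of that decision. The names `GuardOutcomes` and `emulate_launch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tally of the two outcomes of a kernel's bounds guard.
struct GuardOutcomes {
    std::size_t taken = 0;    // idx < n was true: the thread did work
    std::size_t skipped = 0;  // idx < n was false: the thread fell past the guard
};

// Emulates the thread indexing of a 1-D launch that uses just enough blocks
// to cover n elements. Illustration only; no GPU is involved.
inline GuardOutcomes emulate_launch(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    const std::size_t blocks = (n + threads_per_block - 1) / threads_per_block;
    GuardOutcomes out;
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            const std::size_t idx = block * threads_per_block + thread;
            if (idx < n) {
                ++out.taken;
            } else {
                ++out.skipped;
            }
        }
    }
    return out;
}
```

With n = 1024 and 256 threads per block, every emulated thread passes the guard, so the false outcome is never exercised even though the line runs 1024 times. With n = 1000, the last block leaves 24 threads on the false side. A test suite that only uses block-aligned sizes would show the guard as executed while half of its decision stays uncovered.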

Host and device, one workflow

Measure coverage on host AND device in a single run

RKTracer does not bolt device numbers onto a host report. It treats the GPU as just another target, instruments host translation units and on-GPU kernels with the same model, and produces one unified coverage report. You see exactly which kernel branches and conditions your tests reached, beside the host launcher that drove them.

  • Per-kernel statement, decision, condition and MC/DC breakdown
  • Uncovered branches in device code surfaced line by line
  • Multi-condition coverage on warp-divergent decision logic
  • One coverage standard across host, GPU, embedded and simulator
Host · perception_pipeline.cpp
  Statement     100%
  Decision      100%
  MC/DC         100%

Device · detect_kernel.cu (on-GPU)
  Statement     100%
  Decision      100%
  Condition     99%
  MC/DC         97%
  Multi-cond.   95%

2 scopes covered · CUDA device target · 0 build changes
Every GPU build path, one engine

CUDA, Bazel and ROCm, into one coverage report

You only change the command you run. CUDA through nvcc and ROCm through CMake take the rktracer prefix; a Bazel build uses the dedicated single-binary rktracerbz prefix. Every path is instrumented on host and device, lands in the same unified MC/DC report, then feeds RKMCP and RKTracerGen to close the gaps.

  • CUDA · nvcc toolchain · rktracer make
  • Bazel · //... targets, no BUILD edits · rktracerbz build //...
  • ROCm · HIP on AMD, via CMake · rktracer cmake --build
  • RKTracer instruments · host + device, no source changes, every C, C++ and CUDA action
  • Unified report · host + device, statement to MC/DC
  • Close the gaps · RKMCP (AI tests) and RKTracerGen (C/C++)

CUDA and ROCm use the rktracer prefix · a Bazel build uses rktracerbz · RKTracerGen generates tests for C and C++ only.

Code coverage for CUDA · NVIDIA

CUDA host and device coverage through the nvcc toolchain

CUDA is where code coverage for GPU code most often falls apart, because a single source file becomes two very different programs once nvcc is done with it.

The nvcc compiler driver splits each .cu file into two parts. The host code is handed to your host compiler, whether that is g++, clang or cl, while the device code is compiled to run on the GPU itself. Those two halves are built by different compilers, run on different processors and, crucially for coverage, are usually measured by different tooling, if the device side is measured at all.

A typical GPU coverage tool only ever sees the host side. It watches the CPU prepare and launch a kernel, then stops at the device boundary, so the branches and conditions inside your kernels never get counted. RKTracer instruments both sides. During the nvcc build it preprocesses and instruments the host translation units and the device translation units together, inserting coverage probes directly into the kernels that run on the GPU.

As your tests run on the GPU, the RKTracer runtime captures coverage for the host launcher code and for the on-device kernel code in the same pass. You are not approximating device behaviour from the host: you are recording exactly which kernel statements, branches and conditions executed on the real hardware, beside the launcher that drove them.

The workflow stays the one you already have. You prefix rktracer to your existing CUDA build command, for example rktracer make all, or to your CMake or nvcc-driven build. RKTracer auto-detects nvcc and the host compiler, instruments host plus device, and links its runtime automatically. You then run your tests on the GPU and run rkresults --report html for one unified report.

That report shows host coverage and device coverage separately, each through MC/DC, so you can see exactly which kernels and which conditions executed on the device. There are no source changes and no separate device-coverage build to maintain.

  • nvcc splits each .cu file into host code and device code
  • RKTracer instruments BOTH host and device translation units
  • Probes go into the kernels that actually run on the GPU
  • Host and device coverage reported separately, each through MC/DC
  • No source changes, no separate device-coverage build

For the full step-by-step setup, see the guide on CUDA coverage on Linux, or read the deep dive on code coverage for CUDA on NVIDIA.

One .cu file, two programs

nvcc splits each source file; RKTracer instruments both halves.

detect_kernel.cu → Host TU (g++ / clang / cl) + Device TU (GPU kernels) → one host + device report
nvcc · host + device coverage
# Prefix the rktracer keyword to your CUDA build
$ rktracer make all

  auto-detect: nvcc + host compiler (g++)
  nvcc split: detect_kernel.cu
    · host code   → host compiler
    · device code → GPU
  instrument host + device translation units
   runtime linked · source untouched

# Run tests on the GPU, then one report
$ ctest && rkresults --report html
   host   · MC/DC 100% · decision 100%
   device · MC/DC 97.4% · decision 100%
Code coverage for GPU · Bazel · rktracerbz

Code coverage for GPU builds with Bazel

Many of the largest GPU and machine-learning codebases build their CUDA kernels with Bazel. For those projects RKTracer ships a dedicated tool, rktracerbz, so you get the same code coverage for CUDA, host and device, as any other build, just by prefixing one command.

rktracerbz is a single, self-contained static binary: the rktracer enabler for Bazel. There is no Bazel plugin, no aspect and no edits to your BUILD or WORKSPACE files.

The only thing you do is prefix it to your normal Bazel command, either rktracerbz bazel build //... or the direct form rktracerbz build //..., where it calls Bazel for you. Run rktracerbz --rk-info and it prints the resolved setup: the Bazel it found, the real compilers, the jobs ceiling and the config files it loaded.

It works by playing two roles. As the launcher it starts the Bazel build; and as the compiler that Bazel invokes for each compile action it execs rktracer <real-compiler> <args>. So every C, C++ and CUDA translation unit Bazel compiles is instrumented by the rktracer pipeline automatically, at full build parallelism. It auto-detects the real gcc, g++ and nvcc (or reads them from config), passes nvcc host steps straight through, and runs the C++ and CUDA compile actions with a local strategy so the instrumentation is actually applied.
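The per-action role described above follows the familiar compiler-wrapper pattern: take the command line the build system produced and re-issue it behind a prefix. The sketch below shows only that argument rewriting, under stated assumptions. The helper `wrap_compile_action` is hypothetical and is not rktracerbz's implementation, which is not shown on this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the command line a per-action wrapper would hand to exec:
// the tracing prefix, then the real compiler, then the untouched arguments.
// Hypothetical helper that illustrates the pattern only.
inline std::vector<std::string> wrap_compile_action(
        const std::string& real_compiler,
        const std::vector<std::string>& args) {
    assert(!real_compiler.empty());
    std::vector<std::string> argv;
    argv.reserve(args.size() + 2);
    argv.push_back("rktracer");     // prefix named in the text above
    argv.push_back(real_compiler);  // for example /usr/bin/g++ or nvcc
    argv.insert(argv.end(), args.begin(), args.end());
    return argv;
}
```

On a POSIX system a wrapper of this kind would typically pass the resulting vector to a call such as execvp, so the build system sees a single compiler process per action. That last step is an assumption about how such wrappers are usually written, not a description of the product.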

After the build you get the same unified host plus device coverage report as any other RKTracer build, through MC/DC. You then point RKMCP at the uncovered lines and decisions so your AI agent writes the missing tests, exactly as you would for a Makefile or nvcc-driven build.

  • One static binary you prefix: rktracerbz bazel build //...
  • No plugin, no aspect, no edits to BUILD or WORKSPACE files
  • Instruments every C, C++ and CUDA compile action Bazel runs
How rktracerbz fits the build

One binary plays two roles, so every compile action is traced.

bazel CompileAction → rktracerbz (launcher + per-action compiler) → rktracer → g++ · nvcc · gcc (real compiler, traced) → instrumented C / C++ / CUDA
rktracerbz · Bazel host + device coverage
# Prefix rktracerbz to your existing Bazel build. Nothing else changes.
$ rktracerbz bazel build //...
  [rktracerbz] real C  =/usr/bin/gcc
  [rktracerbz] real C++=/usr/bin/g++
  ... every C, C++ and CUDA compile action instrumented ...
$ rkresults --report html      # one unified host + device report
ROCm code coverage on AMD · CMake

The same coverage model on ROCm and AMD GPUs

CUDA is not the only accelerator stack that ships in safety-relevant systems. AMD ROCm and HIP code builds with CMake, not Bazel, so you use the standard rktracer prefix on the CMake build: configure with cmake, then run rktracer make, or prefix rktracer to cmake --build. RKTracer then instruments the HIP host and device code that runs on AMD GPUs.

HIP host + device, via CMake

ROCm and HIP build with CMake. You configure with cmake, then prefix rktracer to the build (rktracer make or rktracer cmake --build); RKTracer instruments the HIP host launcher and the on-GPU kernel and reports both in one unified model.

Full metrics on AMD kernels

ROCm and HIP device code is C and C++, so every structural metric, through MC/DC and multi-condition, applies to the kernels just as it does on the host.

Bazel vs CMake, one bar

Bazel-built GPU and CUDA projects use the dedicated rktracerbz binary; CUDA built with make or nvcc and ROCm built with CMake use the normal rktracer prefix. Either way you get the same reports, gates and CI hooks.

ROCm appears alongside CUDA on the RKTracer features and metrics page. Tell us your exact GPU and ROCm version and we will confirm support for your setup.

How it works

From GPU source to one unified coverage report

Preprocess and instrument host + device

RKTracer hooks your existing CUDA or ROCm build, preprocesses each source file with the same compiler, and adds low-overhead probes to both host translation units and on-GPU kernels. Your source is never rewritten.

Run your host + device tests

Execute the test suite you already have. Launchers run on the CPU, kernels run on the real GPU, exactly as they do in production. Coverage is captured live.

Collect per-kernel hits

Coverage data is collected from the device after each kernel launch and merged with host results into a single, consistent model, so host and device are never out of sync.
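One way to picture the merge step is as an accumulation of per-probe hit counters, one batch per kernel launch. The sketch below assumes a flat counter array indexed by probe id; that layout and the names `merge_launch_hits` and `probes_reached` are illustrative assumptions, not RKTracer's data format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds one launch's per-probe hit counts into the running totals.
// Both vectors are indexed by probe id; this layout is an assumption.
inline void merge_launch_hits(std::vector<std::uint64_t>& totals,
                              const std::vector<std::uint64_t>& launch_hits) {
    assert(totals.size() == launch_hits.size());
    for (std::size_t i = 0; i < totals.size(); ++i) {
        totals[i] += launch_hits[i];
    }
}

// Counts probes reached at least once across all merged launches.
inline std::size_t probes_reached(const std::vector<std::uint64_t>& totals) {
    std::size_t reached = 0;
    for (std::uint64_t hits : totals) {
        if (hits > 0) {
            ++reached;
        }
    }
    return reached;
}
```

Merging after every launch means a probe missed by one launch but hit by a later one still counts as reached, which is why coverage from several test runs can be combined into one model.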

Report & gate in CI

Export HTML and XML coverage, through MC/DC, color-coded green, yellow and red to the line, and fail the build below threshold so the gap cannot slip through unnoticed.

pipeline.yml · host + device gate
# Drop GPU coverage into any CI pipeline
steps:
  - build:   rktracer make
  - test:    ctest --output-on-failure
  - cover:   rkresults --scope host,device
  - report:  rkresults --report html,xml,sonar
  - gate:    rkresults --device-mcdc 95 --decision 100
  ✓ pipeline passed · host + device evidence archived
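The gate step comes down to a threshold comparison that yields a process exit code. The following sketch shows that logic for the two thresholds used in the pipeline above (device MC/DC 95, decision 100). The types and function names are hypothetical, and the counts in the usage note are made up for illustration; how rkresults computes its percentages internally is not described here.

```cpp
#include <cassert>

// Covered and total counts for one metric in one scope (host or device).
struct MetricCount {
    unsigned long covered;
    unsigned long total;
};

// Percentage covered. An empty metric is treated as 100% in this sketch.
inline double percent(const MetricCount& m) {
    assert(m.covered <= m.total);
    if (m.total == 0) {
        return 100.0;
    }
    return 100.0 * static_cast<double>(m.covered) / static_cast<double>(m.total);
}

// True when the metric meets its threshold, given in percent.
inline bool meets(const MetricCount& m, double threshold_percent) {
    return percent(m) >= threshold_percent;
}

// Process-style exit code: 0 passes the gate, 1 fails the build.
inline int gate_exit_code(const MetricCount& device_mcdc, double mcdc_threshold,
                          const MetricCount& decision, double decision_threshold) {
    const bool ok = meets(device_mcdc, mcdc_threshold) &&
                    meets(decision, decision_threshold);
    return ok ? 0 : 1;
}
```

For example, 974 of 1000 device MC/DC conditions with 200 of 200 decisions passes a 95 / 100 gate, while 199 of 200 decisions fails it, because a 100% threshold leaves no slack for a single missed outcome.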
Jenkins · Azure DevOps · GitLab CI · SonarQube · HTML / XML · Bamboo
Every metric, through MC/DC

Full structural coverage on the C and C++ side

CUDA and ROCm device code is C and C++, so the complete set of structural metrics applies to the kernels themselves, not just the host.

Function, File & Line

Know exactly which kernels, files and lines your tests execute on host and device.

Statement & Branch/Decision

Confirm every statement runs and both outcomes of each device decision are tested.

Condition & MC/DC

The strongest structural metrics, applied to warp-divergent kernel logic. MC/DC is supported for C and C++.
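For readers new to MC/DC, the sketch below checks the textbook "unique cause" form of the criterion on one compound decision, `a && (b || c)`. For each condition it looks for an independence pair: two test vectors that differ only in that condition and produce different decision outcomes. This is a generic illustration written for this page, not RKTracer's analysis, and it ignores the masking variant of MC/DC that some standards guidance also accepts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Inputs = std::array<bool, 3>;  // values of the conditions a, b, c

// Example compound decision, of the kind that guards work inside a kernel.
inline bool decision(const Inputs& in) {
    return in[0] && (in[1] || in[2]);
}

// True when x and y differ in condition `which` and in no other condition.
inline bool differ_only_in(const Inputs& x, const Inputs& y, std::size_t which) {
    assert(which < x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if ((x[i] != y[i]) != (i == which)) {
            return false;
        }
    }
    return true;
}

// Counts the conditions for which the test set contains an independence pair:
// two vectors that differ only in that condition and flip the decision.
inline std::size_t conditions_shown_independent(const std::vector<Inputs>& tests) {
    std::size_t shown = 0;
    for (std::size_t c = 0; c < 3; ++c) {
        bool found = false;
        for (std::size_t i = 0; i < tests.size() && !found; ++i) {
            for (std::size_t j = i + 1; j < tests.size() && !found; ++j) {
                found = differ_only_in(tests[i], tests[j], c) &&
                        decision(tests[i]) != decision(tests[j]);
            }
        }
        if (found) {
            ++shown;
        }
    }
    return shown;
}
```

The two vectors (T,T,T) and (F,F,F) reach both outcomes, so they give 100% decision coverage, yet they demonstrate independence for none of the three conditions. The four vectors (T,T,F), (F,T,F), (T,F,F) and (T,F,T) demonstrate it for all three, which is the difference the MC/DC figure in a report is measuring.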

Multi-Condition & Delta

Go beyond MC/DC, and focus coverage on exactly which kernels changed each build.

Coverage metric      Referenced by
Statement            DO-178C Level C · ISO 26262 ASIL A
Decision / Branch    DO-178C Level B · ASIL B/C
MC/DC                DO-178C Level A · ASIL D
Multi-Condition      Deep verification · IEC 61508 SIL 4

RKValidate is an ISO 9001 quality-certified vendor. The standards above are shown only as context. RKTracer measures the coverage metrics these standards reference; it does not itself carry a functional-safety certification.

CI and reports

Host and device evidence, in every pipeline

Reports

  • HTML reports, source-level detail
  • Green, yellow, red line coloring
  • XML for dashboards and gates
  • Merge reports across multiple runs

CI systems

  • Jenkins, Azure DevOps, GitLab CI
  • Publishes coverage to SonarQube
  • Separate host and device thresholds
  • Fail the build below your target

Scope control

  • Folder include and exclude
  • Function-level instrument and ignore
  • File-type filtering
  • Configured via rktracer.config

See the documentation for CI recipes, or read how RKTracer works under the hood.

From coverage to tests

Coverage finds the gaps. Two ways to close them.

RKTracer pinpoints the uncovered lines, decisions and MC/DC conditions in your kernels and launchers. Turning those gaps into tests is your choice, and both options are part of the RKTracer tool.

The Bazel flow joins here too: once rktracerbz has instrumented your build and you have host plus device coverage, RKMCP serves those uncovered lines and decisions to your AI agent over MCP, exactly as it does for a make or nvcc build. Nothing about the gap-to-test step is Bazel-specific.

RKTracer itself measures coverage; it does not generate tests. The two tools below do. Because CUDA and ROCm code is C and C++, the offline generator applies to it.


RKMCP · AI-assisted

An MCP server that streams the uncovered code to your AI agent as JSON-RPC. The agent writes unit and functional tests plus the build, then runs and re-checks until the gaps close.

RKTracerGen · offline, C and C++ only

A deterministic, fully offline unit-test generator for C and C++ only. Boundary-Value Analysis, a real-run oracle, managed stubs and a standalone Makefile. No AI, no tokens, no network.
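Boundary-Value Analysis, named above, is a standard test-design technique: for an input with a valid range, pick values just outside, on, and just inside each edge. The sketch below shows the classic six candidates for an inclusive integer range. It is a generic textbook illustration with a hypothetical function name, not RKTracerGen's generator, and it assumes the range sits well away from the limits of `long` so the arithmetic cannot overflow.

```cpp
#include <cassert>
#include <vector>

// Classic boundary-value candidates for an inclusive integer range [lo, hi]:
// just outside, on, and just inside each edge.
// Assumes lo + 2 <= hi and that lo - 1 and hi + 1 do not overflow.
inline std::vector<long> boundary_values(long lo, long hi) {
    assert(lo + 2 <= hi);
    return {lo - 1, lo, lo + 1, hi - 1, hi, hi + 1};
}
```

For a kernel parameter valid in the range 0 to 10, this yields -1, 0, 1, 9, 10 and 11. The out-of-range values are the ones most likely to reach the false side of a guard that ordinary inputs never trigger.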

FAQ

Host & GPU coverage questions, answered

GPU code coverage measures how much of your accelerator code your tests actually execute, including the branches and conditions inside device kernels rather than just the host launcher. RKTracer measures GPU code coverage on host and device together, from statement and branch through condition, MC/DC and multi-condition, for CUDA on NVIDIA and ROCm on AMD GPUs.
Both. RKTracer instruments host translation units and on-GPU __global__ and __device__ code, then collects per-kernel hit data from the device. You get one unified report showing structural coverage, through MC/DC, for the launcher and the kernel side by side, instead of host coverage with a device-shaped blind spot.
Yes. RKTracer covers CUDA on NVIDIA GPUs through the nvcc toolchain and ROCm on AMD GPUs. Because it hooks your existing build rather than replacing your compiler, it fits the toolchain you already ship with on either stack. It targets discrete data-center accelerators and embedded GPUs such as Jetson-class modules.
Function, file, statement, branch and decision, condition, MC/DC, multi-condition and delta (changed-code) coverage. CUDA and ROCm device code is C and C++, so the full set of C/C++ structural metrics, including MC/DC, applies to the kernels themselves, not just the host launcher.
No. On Linux you prefix your build with the rktracer keyword, for example rktracer make. On Windows you integrate with Visual Studio using rktracer -vs -integrate, then clean and rebuild. RKTracer preprocesses with your existing compiler, instruments at compile time and adds its runtime automatically. There are no source edits.
Prefix rktracerbz to your Bazel build, for example rktracerbz bazel build //.... It instruments every C, C++ and CUDA compile action that Bazel runs, with no BUILD-file changes (no plugin and no aspect). You then use RKMCP on the uncovered lines and decisions so your AI agent writes the missing tests.
RKTracer is a code coverage tool and does not generate tests itself. It pinpoints the uncovered lines, decisions and MC/DC conditions in your kernels. To close those gaps you can use RKMCP, which serves the gaps to your AI agent over MCP, or RKTracerGen, a deterministic offline unit-test generator for C and C++ only. Because CUDA and ROCm code is C and C++, RKTracerGen applies to it.

Close the device-code blind spot in your coverage

Download the free 30-day trial and measure host + device GPU code coverage on your own CUDA or ROCm workload today, or book a demo with an engineer. Questions first? Talk to us.