Capsem Doctor

capsem-doctor is a pytest-based diagnostic suite that runs inside the guest VM. It verifies every security invariant, network isolation property, and runtime configuration that Capsem guarantees. Tests are baked into the rootfs via Dockerfile.rootfs and repacked into the initrd on every just run, so changes to test files take effect immediately without a full rootfs rebuild.

| Command | What it does |
|---|---|
| just run "capsem-doctor" | Repack initrd, build, sign, boot VM, run all tests, shut down (~10s) |
| capsem-doctor | Run all tests (inside a running VM) |
| capsem-doctor -k sandbox | Run only sandbox tests |
| capsem-doctor -k "network and not throughput" | Run network tests excluding throughput |
| capsem-doctor -x | Stop on first failure |

| File | Tests | What it verifies |
|---|---|---|
| test_sandbox.py | 36 | Clock sync, filesystem isolation (squashfs immutability, overlay config, ephemeral writes, writable mounts), guest binary security (read-only, executable), no setuid/setgid, kernel hardening (no modules, no /dev/mem, no /dev/port, no /proc/kcore, no debugfs, no IPv6, no kallsyms, seccomp available), kernel cmdline hardening (ro, init_on_alloc, slab_nomerge, page_alloc.shuffle), network isolation (dummy0, fake DNS, iptables redirect, net-proxy running, allowed/denied domains, no real NICs), process integrity (pty-agent, dnsmasq running, no systemd/sshd/cron), swap mode validation, loopback interface |
| test_network.py | 24 | Layered L1-L7 network verification: L1 guest plumbing (dummy0 IP, dnsmasq, multi-domain DNS, iptables redirect), L2 net-proxy (TCP 10443 listener, 443 redirect, vsock byte delivery), L3 TLS handshake (MITM proxy termination, Capsem CA cert verification), L4 HTTP over MITM (curl with skip-verify, verbose diagnostics), L5 CA trust chain (cert file exists, system bundle, certifi bundle, curl without -k, Python urllib TLS, CA env vars), L6 policy enforcement (denied domains, POST to random domains, AI provider blocking, HTTP port 80 blocked, non-standard ports, direct IP), L7 proxy download throughput |
| test_environment.py | 18 | Env vars (TERM, HOME, PATH, VIRTUAL_ENV), shell is bash, kernel version (Linux 6.x), aarch64 architecture, mount points (/proc, /sys, /dev, /dev/pts), filesystem layout (overlay root, writable /root, writable /tmp, VirtioFS kernel support), boot performance (under 1s total, XSS rejection in timing data) |
| test_runtimes.py | 11 | Dev runtime versions (python3, node, npm, pip3, uv, git), package installation (pip install, uv pip install, uv add, npm install -g, npm install local, apt-get install), tmux, Python/Node execution with file I/O, git init/commit workflow |
| test_utilities.py | 1 | Availability of 39 unix utilities via parametrization: system inspection (df, ps, free, lsof, find, grep, sed, awk, less, file, tar, strace, lsblk, mount, id, hostname, uname, uptime, dmesg, vim, du), core file ops (cat, cp, mv, rm, mkdir, chmod, touch, ln), text processing (sort, uniq, wc, cut, tr, diff, tee, xargs), network/shell (curl, ip, bash, env), benchmarks (capsem-bench) |
| test_workflows.py | 5 | File I/O patterns: text write/read, JSON roundtrip (Python + Node), shell pipes, large file (10MB) write and verify |
| test_ai_cli.py | 12 | AI CLI binaries installed (claude, gemini, codex), PATH configuration (/opt/ai-clis/bin in PATH, no stale .npm-global), npm prefix, login shell visibility, --help execution without runtime errors, Gemini configuration (API key handling, settings.json, projects.json, trustedFolders.json, installation_id), Google AI domain reachability |
| test_virtiofs.py | 9 | VirtioFS storage mode (skipped in block mode): VirtioFS root mount, ext4 loopback overlay upper, loop device active on rootfs.img, workspace write/read/large file/subdirectory, system overlay writable, pip install through overlay, file delete and recreate |
| test_mcp.py | 91 | MCP gateway: binary exists, JSON-RPC initialize handshake, tools/list (fetch_http, grep_http, http_headers with descriptions, input schemas, annotations), tool invocation (allowed/blocked domains, real content verification, subpath fetch, raw HTML mode, grep pattern matching, pagination, headers), error handling (unknown tool, missing URL, invalid URL), Claude/Gemini/Codex MCP server configuration, file tools (list_changed_files, revert_file, snapshots_create/delete), snapshots CLI (create, list, changes, revert), snapshot scenarios (multi-version history, revert to specific checkpoint, delete and restore, auto-pick latest, path prefix handling, multi-file snapshots), bug regression tests (changes vs previous, triple snap unchanged status, sequential history, delete-recreate), compact/merge operations |
| test_injection.py | 11 | Data-driven injection verification from host manifest: env vars present in login shell with correct values, no empty env vars, boot files exist with correct permissions and non-empty content, .git-credentials format and permissions, .gitconfig credential helper, git credential fill, GitHub CLI (GH_TOKEN env var, gh auth status) |

The shared test configuration in conftest.py provides:

  • Auto-skip outside the VM: pytest_ignore_collect checks os.geteuid() == 0 and os.access("/root", os.W_OK). Tests are silently skipped when run on the host or in CI.
  • run(cmd, timeout=10): Shell command helper returning CompletedProcess. All tests use this instead of calling subprocess directly.
  • output_dir fixture: Returns /root/tests (created automatically via autouse fixture). Tests that write temp files use this shared directory.
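
The three pieces above might fit together as in the following minimal sketch. It is not the actual conftest.py: it assumes pytest 7 or newer (where the hook receives collection_path), assumes run() wraps subprocess.run with shell=True, and the names in_capsem_vm and _create_output_dir are hypothetical.

```python
import os
import subprocess
from pathlib import Path

import pytest

OUTPUT_DIR = Path("/root/tests")


def in_capsem_vm():
    """Heuristic from the docs: running as root with a writable /root."""
    return os.geteuid() == 0 and os.access("/root", os.W_OK)


def pytest_ignore_collect(collection_path, config):
    # Returning True tells pytest to ignore the path entirely, so nothing
    # is collected on the host or in CI. None defers to pytest's defaults.
    if not in_capsem_vm():
        return True
    return None


def run(cmd, timeout=10):
    """Run a shell command and return the CompletedProcess (assumed shape)."""
    return subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=timeout
    )


@pytest.fixture(autouse=True)
def _create_output_dir():
    # Hypothetical autouse fixture: makes sure the shared directory exists.
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


@pytest.fixture
def output_dir():
    return OUTPUT_DIR
```

Because the guard runs at collection time, ignored tests never show up as skipped in the report, which matches the "silently skipped" behavior described above.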

test_network.py orders tests from L1 (guest plumbing) through L7 (throughput) so that a failure at a lower layer immediately pinpoints the root cause. If L2 (net-proxy TCP) fails, there is no point debugging L4 (HTTP over MITM) — the proxy is not listening. This structure eliminates cascading false failures.
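
The triage rule this ordering implies can be written down in a few lines. The helper below is hypothetical and not part of capsem-doctor; it only illustrates reading a set of per-layer results from the bottom up, using the layer names from the table above.

```python
from typing import Dict, Optional

# Layer labels as described for test_network.py, lowest layer first.
LAYERS = [
    ("L1", "guest plumbing"),
    ("L2", "net-proxy"),
    ("L3", "TLS handshake"),
    ("L4", "HTTP over MITM"),
    ("L5", "CA trust chain"),
    ("L6", "policy enforcement"),
    ("L7", "proxy download throughput"),
]


def first_failing_layer(results: Dict[str, bool]) -> Optional[str]:
    """Return the lowest layer that failed, or None if every layer passed.

    Layers with no recorded result are treated as passing in this sketch.
    """
    for layer, description in LAYERS:
        if not results.get(layer, True):
            return f"{layer} ({description})"
    return None


# L2 and L4 both failed, but only L2 needs investigating first.
print(first_failing_layer({"L1": True, "L2": False, "L3": True, "L4": False}))
```

In practice, running capsem-doctor -x gives a similar effect, since it stops at the first failure and the tests are already ordered by layer.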

Several tests use @pytest.mark.parametrize to cover lists of items with a single test function:

  • Domain lists: test_dns_all_resolve_to_local checks 5 domains, test_ai_provider_domain_blocked checks 2 AI providers
  • Env vars: test_ca_env_var_set checks 3 CA-related environment variables
  • Binaries: test_ai_cli_installed, test_ai_cli_in_login_shell, test_ai_cli_help each check 3 AI CLIs
  • Runtimes: test_runtime_version checks 6 dev tools
  • Utilities: test_utility_available checks 39 unix utilities
  • Writable paths: test_writable_mounts checks 5 paths
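
As an illustration of the pattern, a parametrized availability check might look like the sketch below. The utility list is a small subset of the 39 names, and the use of shutil.which is an assumption; the real test_utility_available may rely on the run() helper instead.

```python
import shutil

import pytest

# A subset of the utilities listed for test_utilities.py.
UTILITIES = ["df", "ps", "grep", "sed", "awk", "tar", "curl", "bash"]


def is_available(name):
    """True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


@pytest.mark.parametrize("utility", UTILITIES)
def test_utility_available(utility):
    # pytest generates one test case per list entry, so a missing tool
    # is reported by name rather than hidden behind the first failure.
    assert is_available(utility), f"{utility} not found on PATH"
```

This is why test_utilities.py counts as a single test function in the table while still covering every utility individually.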

The test_sandbox.py file also uses a fixture-based parametrization pattern for guest binary paths, yielding each existing binary path to test_guest_binary_not_writable and test_guest_binary_executable.
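
A rough sketch of that pattern follows, using a fixture with params that skips paths missing from the image. The binary paths are hypothetical placeholders, and checking permission bits with os.stat is an assumption about how "not writable" is verified.

```python
import os

import pytest

# Hypothetical locations; the real paths are defined in test_sandbox.py.
GUEST_BINARY_PATHS = ["/usr/local/bin/pty-agent", "/usr/local/bin/net-proxy"]


def has_write_bits(path):
    """True if any of the owner/group/other write bits is set."""
    return bool(os.stat(path).st_mode & 0o222)


def is_executable(path):
    return os.access(path, os.X_OK)


@pytest.fixture(params=GUEST_BINARY_PATHS)
def guest_binary(request):
    # Each param becomes its own test case; absent binaries are skipped.
    if not os.path.exists(request.param):
        pytest.skip(f"{request.param} not present")
    return request.param


def test_guest_binary_not_writable(guest_binary):
    assert not has_write_bits(guest_binary)


def test_guest_binary_executable(guest_binary):
    assert is_executable(guest_binary)
```

Putting the parametrization in the fixture lets both tests share one list of paths instead of repeating a parametrize decorator on each.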

To add a new test:

  1. Add test functions to the appropriate guest/artifacts/diagnostics/test_<category>.py file, or create a new test_<category>.py.
  2. Use from conftest import run for shell commands and the output_dir fixture for temp files.
  3. Tests auto-skip outside the capsem VM — conftest checks for root user with writable /root.
  4. Run just run "capsem-doctor" to test. Initrd repacking picks up modified diagnostics/ files automatically.
  5. For new rootfs-level changes (packages, configs), run just build-assets instead.