Capsem Doctor
`capsem-doctor` is a pytest-based diagnostic suite that runs inside the guest VM. It verifies every security invariant, network isolation property, and runtime configuration that Capsem guarantees. Tests are baked into the rootfs via `Dockerfile.rootfs` and repacked into the initrd on every `just run`, so changes to test files take effect immediately without a full rootfs rebuild.
Running Diagnostics
| Command | What it does |
|---|---|
just run "capsem-doctor" | Repack initrd, build, sign, boot VM, run all tests, shut down (~10s) |
capsem-doctor | Run all tests (inside a running VM) |
capsem-doctor -k sandbox | Run only sandbox tests |
capsem-doctor -k "network and not throughput" | Run network tests excluding throughput |
capsem-doctor -x | Stop on first failure |
Test Categories
| File | Tests | What it verifies |
|---|---|---|
test_sandbox.py | 36 | Clock sync, filesystem isolation (squashfs immutability, overlay config, ephemeral writes, writable mounts), guest binary security (read-only, executable), no setuid/setgid, kernel hardening (no modules, no /dev/mem, no /dev/port, no /proc/kcore, no debugfs, no IPv6, no kallsyms, seccomp available), kernel cmdline hardening (ro, init_on_alloc, slab_nomerge, page_alloc.shuffle), network isolation (dummy0, fake DNS, iptables redirect, net-proxy running, allowed/denied domains, no real NICs), process integrity (pty-agent, dnsmasq running, no systemd/sshd/cron), swap mode validation, loopback interface |
test_network.py | 24 | Layered L1-L7 network verification: L1 guest plumbing (dummy0 IP, dnsmasq, multi-domain DNS, iptables redirect), L2 net-proxy (TCP 10443 listener, 443 redirect, vsock byte delivery), L3 TLS handshake (MITM proxy termination, Capsem CA cert verification), L4 HTTP over MITM (curl with skip-verify, verbose diagnostics), L5 CA trust chain (cert file exists, system bundle, certifi bundle, curl without -k, Python urllib TLS, CA env vars), L6 policy enforcement (denied domains, POST to random domains, AI provider blocking, HTTP port 80 blocked, non-standard ports, direct IP), L7 proxy download throughput |
test_environment.py | 18 | Env vars (TERM, HOME, PATH, VIRTUAL_ENV), shell is bash, kernel version (Linux 6.x), aarch64 architecture, mount points (/proc, /sys, /dev, /dev/pts), filesystem layout (overlay root, writable /root, writable /tmp, VirtioFS kernel support), boot performance (under 1s total, XSS rejection in timing data) |
test_runtimes.py | 11 | Dev runtime versions (python3, node, npm, pip3, uv, git), package installation (pip install, uv pip install, uv add, npm install -g, npm install local, apt-get install), tmux, Python/Node execution with file I/O, git init/commit workflow |
test_utilities.py | 1 | Availability of 39 unix utilities via parametrization: system inspection (df, ps, free, lsof, find, grep, sed, awk, less, file, tar, strace, lsblk, mount, id, hostname, uname, uptime, dmesg, vim, du), core file ops (cat, cp, mv, rm, mkdir, chmod, touch, ln), text processing (sort, uniq, wc, cut, tr, diff, tee, xargs), network/shell (curl, ip, bash, env), benchmarks (capsem-bench) |
test_workflows.py | 5 | File I/O patterns: text write/read, JSON roundtrip (Python + Node), shell pipes, large file (10MB) write and verify |
test_ai_cli.py | 12 | AI CLI binaries installed (claude, gemini, codex), PATH configuration (/opt/ai-clis/bin in PATH, no stale .npm-global), npm prefix, login shell visibility, --help execution without runtime errors, Gemini configuration (API key handling, settings.json, projects.json, trustedFolders.json, installation_id), Google AI domain reachability |
test_virtiofs.py | 9 | VirtioFS storage mode (skipped in block mode): VirtioFS root mount, ext4 loopback overlay upper, loop device active on rootfs.img, workspace write/read/large file/subdirectory, system overlay writable, pip install through overlay, file delete and recreate |
test_mcp.py | 91 | MCP gateway: binary exists, JSON-RPC initialize handshake, tools/list (fetch_http, grep_http, http_headers with descriptions, input schemas, annotations), tool invocation (allowed/blocked domains, real content verification, subpath fetch, raw HTML mode, grep pattern matching, pagination, headers), error handling (unknown tool, missing URL, invalid URL), Claude/Gemini/Codex MCP server configuration, file tools (list_changed_files, revert_file, snapshots_create/delete), snapshots CLI (create, list, changes, revert), snapshot scenarios (multi-version history, revert to specific checkpoint, delete and restore, auto-pick latest, path prefix handling, multi-file snapshots), bug regression tests (changes vs previous, triple snap unchanged status, sequential history, delete-recreate), compact/merge operations |
test_injection.py | 11 | Data-driven injection verification from host manifest: env vars present in login shell with correct values, no empty env vars, boot files exist with correct permissions and non-empty content, .git-credentials format and permissions, .gitconfig credential helper, git credential fill, GitHub CLI (GH_TOKEN env var, gh auth status) |
Test Infrastructure
conftest.py
The shared test configuration in `conftest.py` provides:
- Auto-skip outside the VM: `pytest_ignore_collect` checks `os.geteuid() == 0` and `os.access("/root", os.W_OK)`. Tests are silently skipped when run on the host or in CI.
- `run(cmd, timeout=10)`: Shell command helper returning `CompletedProcess`. All tests use this instead of calling `subprocess` directly.
- `output_dir` fixture: Returns `/root/tests` (created automatically via an `autouse` fixture). Tests that write temp files use this shared directory.
Layered Testing
`test_network.py` orders tests from L1 (guest plumbing) through L7 (throughput) so that a failure at a lower layer immediately pinpoints the root cause. If L2 (net-proxy TCP) fails, there is no point debugging L4 (HTTP over MITM): the proxy is not listening. This structure eliminates cascading false failures.
Parametrization
Several tests use `@pytest.mark.parametrize` to cover lists of items with a single test function:
- Domain lists: `test_dns_all_resolve_to_local` checks 5 domains, `test_ai_provider_domain_blocked` checks 2 AI providers
- Env vars: `test_ca_env_var_set` checks 3 CA-related environment variables
- Binaries: `test_ai_cli_installed`, `test_ai_cli_in_login_shell`, `test_ai_cli_help` each check 3 AI CLIs
- Runtimes: `test_runtime_version` checks 6 dev tools
- Utilities: `test_utility_available` checks 39 unix utilities
- Writable paths: `test_writable_mounts` checks 5 paths
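A parametrized check in the style of `test_utility_available` might look like this. The utility list is an illustrative subset, and using `shutil.which` to decide availability is an assumption; the real test may check differently.

```python
import shutil

import pytest

# Illustrative subset only; the real list covers many more utilities.
UTILITIES = ["cat", "grep", "sed", "awk", "tar", "curl", "bash", "env"]


@pytest.mark.parametrize("name", UTILITIES)
def test_utility_available(name):
    # One generated test case per entry in UTILITIES.
    assert shutil.which(name) is not None, f"{name} not found on PATH"
```

pytest generates one case per entry (for example `test_utility_available[grep]`), so a single missing tool is reported on its own instead of failing the whole check.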
The `test_sandbox.py` file also uses a fixture-based parametrization pattern for guest binary paths, yielding each existing binary path to `test_guest_binary_not_writable` and `test_guest_binary_executable`.
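The fixture-based variant parametrizes a fixture instead of a test function, so every test that requests the fixture runs once per path. In this sketch the binary paths are hypothetical placeholders, skipping absent paths is one assumed way to yield only existing binaries, and treating "not writable" as "no write permission bits set" is an assumed interpretation.

```python
import os
import stat

import pytest

# Hypothetical guest binary paths; the real list lives in test_sandbox.py.
GUEST_BINARY_CANDIDATES = [
    "/usr/local/bin/capsem-pty-agent",
    "/usr/local/bin/capsem-net-proxy",
]

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


@pytest.fixture(params=GUEST_BINARY_CANDIDATES)
def guest_binary(request):
    path = request.param
    if not os.path.exists(path):
        pytest.skip(f"{path} not present in this image")
    yield path


def test_guest_binary_not_writable(guest_binary):
    # Checks mode bits rather than os.access(), because root passes
    # W_OK checks on any writable mount.
    mode = os.stat(guest_binary).st_mode
    assert mode & WRITE_BITS == 0, f"{guest_binary} has write bits: {oct(mode)}"


def test_guest_binary_executable(guest_binary):
    assert os.access(guest_binary, os.X_OK)
```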
Adding New Tests
- Add test functions to the appropriate `guest/artifacts/diagnostics/test_<category>.py` file, or create a new `test_<category>.py`.
- Use `from conftest import run` for shell commands and the `output_dir` fixture for temp files.
- Tests auto-skip outside the Capsem VM: conftest checks for the root user with a writable `/root`.
- Run `just run "capsem-doctor"` to test. Initrd repacking picks up modified `diagnostics/` files automatically.
- For new rootfs-level changes (packages, configs), run `just build-assets` instead.