Probe Testing

The GOVERN monitoring probe is a Docker container deployed alongside customer AI systems. It intercepts inference calls, captures telemetry, and emits monitoring events to the GOVERN API. Testing the probe requires a Docker environment and a local proxy setup.

What the Probe Does

The probe container runs as a sidecar or transparent proxy alongside the customer’s AI system:

  1. Intercept — All HTTP calls to AI provider APIs (OpenAI, Anthropic, Groq, etc.) pass through the probe’s proxy
  2. Capture — Request/response data is captured: model used, tokens consumed, latency, content flags
  3. Assess — Local policy rules evaluate the captured data against the customer’s governance framework
  4. Emit — Monitoring events are sent to the GOVERN API for accumulation and rollup
  5. Report — Inline response headers carry governance metadata back to the calling application

Docker Build Test

Before any probe release, verify the Docker image builds cleanly:

```sh
cd packages/govern-probe

# Build the production image
docker build -t govern-probe:test .

# Verify the image is under 500MB (our size budget)
docker images govern-probe:test --format "{{.Size}}"

# Run the container in test mode
docker run --rm \
  -e GOVERN_API_URL=http://host.docker.internal:8787 \
  -e GOVERN_API_KEY=test-key-local \
  -e PROBE_MODE=test \
  -p 8080:8080 \
  govern-probe:test
```

Expected output on startup:

```
[GOVERN Probe] v0.x.x starting
[GOVERN Probe] Proxy listening on :8080
[GOVERN Probe] API endpoint: http://host.docker.internal:8787
[GOVERN Probe] Mode: test
[GOVERN Probe] Ready
```
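In automation, the startup check can wait for the `Ready` line instead of sleeping for a fixed interval. A sketch using a generic log-polling helper (`wait_for_line` is illustrative, not part of the probe tooling):

```sh
# Poll a log file until a pattern appears, or time out.
# Usage: wait_for_line <file> <pattern> <timeout_seconds>
wait_for_line() {
  file="$1"; pattern="$2"; timeout="$3"
  i=0
  while [ "$i" -lt "$timeout" ]; do
    if grep -q "$pattern" "$file" 2>/dev/null; then
      echo "ready"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "timeout waiting for: $pattern"
  return 1
}

# Typical usage against a running container (requires Docker):
# docker logs <container_id> > probe.log 2>&1
# wait_for_line probe.log "Ready" 3
```

A 3-second timeout doubles as a rough check on the startup budget from the performance table below.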

Local Proxy Test

With the probe container running, verify it intercepts and forwards requests correctly:

Configure your test client

```sh
# Export proxy settings for your test session
export HTTPS_PROXY=http://localhost:8080
export HTTP_PROXY=http://localhost:8080

# If testing against Anthropic
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
```

Send a test inference call

```sh
curl -s -X POST "http://localhost:8080/anthropic/v1/messages" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-haiku-4-5-20251001",
    "max_tokens": 10,
    "messages": [{ "role": "user", "content": "Say hello" }]
  }' | jq .
```

Expected behavior:

  1. The probe intercepts the request
  2. It forwards to the real Anthropic API (or a stub in test mode)
  3. The response is returned to the caller with added governance headers:
    X-GOVERN-Assessment: pass
    X-GOVERN-EventId: evt_abc123
    X-GOVERN-PolicyFlags: []
  4. A monitoring event is emitted to the GOVERN API
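The header check in step 3 can be done mechanically against the output of `curl -s -D - -o /dev/null`. A sketch (the `assert_govern_headers` helper is hypothetical, not shipped with the probe):

```sh
# Check that a captured response-header blob carries all three governance
# headers. Header names are case-insensitive per HTTP, so normalize first.
assert_govern_headers() {
  headers="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  for name in x-govern-assessment x-govern-eventid x-govern-policyflags; do
    case "$headers" in
      *"$name:"*) ;;
      *) echo "missing header: $name"; return 1 ;;
    esac
  done
  echo "all governance headers present"
}

# Typical usage, capturing headers from the proxied test call:
# assert_govern_headers "$(curl -s -D - -o /dev/null "http://localhost:8080/anthropic/v1/messages")"
```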

Verify telemetry emission

Check that the probe emitted an event to the GOVERN API:

```sh
# Query the local API for the emitted event
curl -s "http://localhost:8787/api/monitoring/recent" \
  -H "Authorization: Bearer test-secret" | jq '.data.events[0]'
```

Expected event shape:

```json
{
  "id": "evt_abc123",
  "systemId": "probe-test",
  "eventType": "inference",
  "provider": "anthropic",
  "model": "claude-haiku-4-5-20251001",
  "inputTokens": 12,
  "outputTokens": 10,
  "latencyMs": 342,
  "policyResult": "pass",
  "timestamp": "2026-04-12T..."
}
```
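The event shape can also be validated with jq rather than by eye. A sketch, with the required-field list taken from the example above (the `validate_event` helper itself is illustrative, not part of the GOVERN tooling):

```sh
# Verify an event JSON object has every field the probe is expected to emit.
# Prints "valid" when all required keys are present, otherwise lists the
# missing ones. Uses jq array subtraction: required - keys = missing.
validate_event() {
  echo "$1" | jq -r '
    ["id","systemId","eventType","provider","model",
     "inputTokens","outputTokens","latencyMs","policyResult","timestamp"]
    - keys
    | if length == 0 then "valid" else "missing: " + join(", ") end'
}

# Typical usage against the local API:
# validate_event "$(curl -s "http://localhost:8787/api/monitoring/recent" \
#   -H "Authorization: Bearer test-secret" | jq '.data.events[0]')"
```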

Telemetry Verification Checklist

For each probe release, verify these telemetry properties are correctly captured:

Accuracy checks

  • Model name matches what was requested (no normalization drift)
  • Token counts match provider response (input + output separately)
  • Latency is measured end-to-end (probe receipt → provider response)
  • Timestamp is in UTC ISO-8601 format
  • System ID comes from the GOVERN_SYSTEM_ID env var, not defaulted
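The timestamp check above can be mechanized with a regex. A sketch (the `is_utc_iso8601` helper is hypothetical; it accepts only a trailing `Z`, so a local offset such as `+02:00` fails the UTC requirement):

```sh
# Check a timestamp is UTC ISO-8601, e.g. 2026-04-12T08:30:00.123Z.
is_utc_iso8601() {
  echo "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z$' \
    && echo "ok" || echo "bad timestamp: $1"
}
```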

Policy evaluation checks

  • policyResult is pass, flag, or block
  • policyFlags is an array (empty array [] when no flags, not null)
  • Block decisions result in the original request being rejected (HTTP 451)
  • Flag decisions pass through with the governance header set
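The pass/flag/block semantics in this checklist can be expressed as a small table-driven expectation for test assertions. A sketch (the `expected_status_for` helper is illustrative; the block-to-451 mapping comes from the checklist above):

```sh
# Expected proxy behavior for each policy result:
#   pass  -> forward, upstream status returned unchanged
#   flag  -> forward, governance header set
#   block -> reject at the probe with HTTP 451
expected_status_for() {
  case "$1" in
    pass|flag) echo "upstream" ;;   # response passes through
    block)     echo "451" ;;        # request rejected at the probe
    *)         echo "unknown policy result: $1"; return 1 ;;
  esac
}
```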

Emission reliability checks

  • Events are emitted within 100ms of the proxied response returning
  • Failed emissions are retried (check probe logs for retry behavior)
  • After 3 failed retries, events are written to a local buffer file
  • Probe does not block the proxy response waiting for emission acknowledgment
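The retry-then-buffer behavior above can be sketched in shell for test harnesses. The real probe implements this internally; `emit_with_retry` below is only an illustrative model of the expected semantics (3 attempts, then append to a buffer file rather than drop the event):

```sh
# Try an emission command up to 3 times; on total failure, append the
# event payload to a local buffer file instead of dropping it.
# Usage: emit_with_retry <payload> <buffer_file> <emit_command...>
emit_with_retry() {
  payload="$1"; buffer="$2"; shift 2
  attempt=1
  while [ "$attempt" -le 3 ]; do
    if "$@" "$payload" 2>/dev/null; then
      echo "emitted on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  printf '%s\n' "$payload" >> "$buffer"
  echo "buffered after 3 failed attempts"
}

# Typical usage (send_event is whatever emitter your harness provides):
# emit_with_retry "$event_json" probe.buffer send_event
```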

Multi-Provider Testing

The probe supports multiple AI providers. Test each before release:

```sh
# OpenAI test
curl -s -X POST "http://localhost:8080/openai/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}],"max_tokens":5}'

# Groq test
curl -s -X POST "http://localhost:8080/groq/openai/v1/chat/completions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3-8b-8192","messages":[{"role":"user","content":"hello"}],"max_tokens":5}'
```
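Per-provider tests like these can be looped instead of copy-pasted. A sketch, with the path mapping taken from the curl commands in this page (the `probe_path_for` helper is hypothetical):

```sh
# Map a provider name to its probe proxy path, mirroring the curl tests above.
probe_path_for() {
  case "$1" in
    anthropic) echo "/anthropic/v1/messages" ;;
    openai)    echo "/openai/v1/chat/completions" ;;
    groq)      echo "/groq/openai/v1/chat/completions" ;;
    *)         echo "unsupported provider: $1"; return 1 ;;
  esac
}

# Typical loop (requires the probe on :8080; add each provider's auth header
# and a minimal JSON body as in the examples above):
# for p in anthropic openai groq; do
#   curl -s -o /dev/null -w "$p: %{http_code}\n" \
#     -X POST "http://localhost:8080$(probe_path_for "$p")"
# done
```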

Container Size and Performance Budget

| Metric | Budget | How to measure |
| --- | --- | --- |
| Image size | < 500MB | `docker images govern-probe:test --format "{{.Size}}"` |
| Startup time | < 3 seconds | `time docker run --rm govern-probe:test --dry-run` |
| Proxy overhead | < 20ms P95 | Run 100 requests, measure latency added vs. direct |
| Memory at idle | < 128MB | `docker stats govern-probe:test --no-stream` |
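The proxy-overhead budget needs a percentile, not an average. A sketch using the nearest-rank method (the `p95` helper and the curl timing loop are illustrative, not part of the probe tooling):

```sh
# Compute the 95th percentile of newline-separated millisecond latencies
# on stdin, using the nearest-rank method: value at ceil(0.95 * n) of the
# sorted sample.
p95() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      idx = int(NR * 0.95)
      if (idx < NR * 0.95) idx++   # ceil
      print v[idx]
    }'
}

# Typical usage: time 100 proxied requests, then compare the result against
# the same loop pointed directly at the provider.
# for i in $(seq 1 100); do
#   curl -s -o /dev/null -w "%{time_total}\n" "http://localhost:8080/anthropic/v1/messages"
# done | awk '{ print $1 * 1000 }' | p95
```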

Probe Test Automation

```sh
# Full probe test suite (requires Docker)
cd packages/govern-probe
pnpm test:probe

# This runs:
# 1. Docker build verification
# 2. Container startup check
# 3. Proxy interception tests (10 providers)
# 4. Telemetry emission verification
# 5. Policy evaluation accuracy tests
# 6. Performance budget checks
```

Common Probe Failures

| Symptom | Cause | Fix |
| --- | --- | --- |
| Container exits immediately | Missing env vars | Check `GOVERN_API_URL` and `GOVERN_API_KEY` are set |
| Proxy returns 502 | Upstream provider unreachable | Check network, try stub mode |
| No events in GOVERN API | Emission failing silently | Check probe logs: `docker logs <container_id>` |
| Policy result always "pass" | Policy rules not loaded | Verify `GOVERN_POLICY_FILE` points to a valid rules file |
| Token counts wrong | Provider response parsing error | Check probe version matches provider API version |