
Operational Runbooks

This page contains operational runbooks for the GOVERN platform. Each runbook is a step-by-step procedure for a specific operational scenario.

Runbook: Deploy to Production

When to use: Releasing a new version of a GOVERN component to production.

Prerequisites: Gate II (V(Q) >= 0.85) and Gate IV (5-point polish) must both be open.

Step 1 — Verify gates are open

# Run the full QA score check
pnpm run qa:score
# Expected output: V(Q) >= 0.85 for all components being deployed

Do not proceed if any gate shows BLOCKED.
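The Gate II condition can also be enforced in a script rather than checked by eye. The sketch below assumes the V(Q) score has already been extracted as a plain decimal number; the output format of `pnpm run qa:score` is not specified here, and `vq_gate_open` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (exit 0) when a V(Q) score meets the Gate II threshold.
# Assumes the score is a plain decimal such as 0.87.
VQ_THRESHOLD=0.85

vq_gate_open() {
  # awk performs the floating-point comparison that POSIX sh cannot do natively.
  awk -v score="$1" -v min="$VQ_THRESHOLD" 'BEGIN { exit !(score + 0 >= min + 0) }'
}

# Example:
# if vq_gate_open "$score"; then echo "Gate II open"; else echo "Gate II BLOCKED" >&2; exit 1; fi
```

Returning an exit status instead of printing text lets the check sit directly in front of the deploy commands in a script.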

Step 2 — Tag the release

# Create a semver release tag
git tag v0.12.0 -m "Release v0.12.0 — [brief description]"
git push origin v0.12.0
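A mistyped tag is awkward to retract once pushed, so a format check before tagging may help. This is a minimal sketch that assumes release tags are always plain `vMAJOR.MINOR.PATCH` with no pre-release suffix, as in the `v0.12.0` example; `is_release_tag` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the argument looks like vMAJOR.MINOR.PATCH.
# Leading zeros in a numeric part (for example v01.2.3) are rejected, as semver requires.
is_release_tag() {
  printf '%s\n' "$1" |
    grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

# Example:
# is_release_tag "$TAG" && git tag "$TAG" -m "Release $TAG" && git push origin "$TAG"
```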

Step 3 — Deploy the API Gateway

cd packages/api-gateway
# Deploy to production Workers
npx wrangler deploy --env production
# Verify health immediately after deploy
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq .
# Expected: { "status": "ok" }
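A single health request right after a deploy can fail transiently, so the check can be wrapped in a bounded retry. The sketch below is one way to do that; `retry` and `health_ok` are hypothetical helpers, and the pattern used to match the response body is an assumption about how the JSON is formatted.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Succeeds only when the endpoint answers with a 2xx status and reports "ok".
# The whitespace-tolerant pattern is an assumption about the response formatting.
health_ok() {
  curl -fsS https://jarvis-api-gateway.ben-c1f.workers.dev/health |
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example: retry 5 10 health_ok || echo "API Gateway unhealthy after deploy" >&2
```

Passing the check as a command keeps `retry` reusable for the other verification steps on this page.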

Step 4 — Deploy frontend packages

# Build all frontend packages
pnpm build
# Deploy Customer Dashboard (Cloudflare Pages)
npx wrangler pages deploy packages/govern-app/dist \
--project-name=govern-app \
--branch=main
# Deploy Internal Dashboard
npx wrangler pages deploy packages/govern-dashboard/dist \
--project-name=govern-dashboard \
--branch=main

Step 5 — Post-deploy verification

# Wait 60 seconds for health checks to run
sleep 60
# Check deploy watchdog status
curl "$JARVIS_API_URL/api/monitoring/deploys/health" \
-H "Authorization: Bearer $AUTH_SECRET" | jq '.targets'
# All targets should show "healthy"

Step 6 — Emit deploy build event

curl -s -X POST "$JARVIS_API_URL/api/build-events" \
-H "Authorization: Bearer $AUTH_SECRET" \
-H "Content-Type: application/json" \
-d '{
"type": "deploy",
"archetypeIds": ["jarvis"],
"skillsExercised": ["deployment-orchestration"],
"description": "Production deploy: v0.12.0",
"quality": 1.0,
"metadata": { "version": "v0.12.0", "targets": ["api-gateway", "govern-app"] }
}'
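Because the payload above is single-quoted, the version string has to be edited by hand in two places. One possible alternative is to generate the JSON from a single argument. This sketch assumes the version contains no characters that need JSON escaping, which holds for tags such as `v0.12.0`; `build_deploy_event` is a hypothetical helper name, and the field values are copied from the payload above.

```shell
# Hypothetical helper: print the deploy build-event JSON for a given version.
# Assumes the version needs no JSON escaping (true for plain semver tags).
build_deploy_event() {
  printf '{"type":"deploy","archetypeIds":["jarvis"],"skillsExercised":["deployment-orchestration"],"description":"Production deploy: %s","quality":1.0,"metadata":{"version":"%s","targets":["api-gateway","govern-app"]}}\n' "$1" "$1"
}

# Example:
# curl -s -X POST "$JARVIS_API_URL/api/build-events" \
#   -H "Authorization: Bearer $AUTH_SECRET" \
#   -H "Content-Type: application/json" \
#   -d "$(build_deploy_event v0.12.0)"
```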

Runbook: Emergency Rollback

When to use: A deploy has degraded production, for example health checks are failing, the error rate is spiking, or V(Q) has dropped significantly.

Step 1 — Identify the last good deploy

curl "$JARVIS_API_URL/api/monitoring/deploys?status=success&limit=10" \
-H "Authorization: Bearer $AUTH_SECRET" | jq '[.[] | {id, version, completedAt, vqScoreAfter}]'

Step 2 — Roll back API Gateway

# Set this to the last good commit hash from the deploy record
GOOD_COMMIT="<hash from deploy record>"
# Check out the good commit (this leaves the repository in a detached HEAD state)
git checkout "$GOOD_COMMIT"
# Deploy immediately
cd packages/api-gateway
npx wrangler deploy --env production

Step 3 — Roll back Cloudflare Pages

For Pages deployments, use the Cloudflare dashboard:

  1. Go to dash.cloudflare.com → Pages → govern-app
  2. Click “Deployments”
  3. Find the last successful deployment before the problematic one
  4. Click “Roll back to this deployment”

Step 4 — Verify recovery

# Health check
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq .
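# Watch for V(Q) recovery (polls every 30 seconds)
# JARVIS_API_URL and AUTH_SECRET must be exported so the shell that watch spawns can expand them
watch -n 30 'curl -s "$JARVIS_API_URL/api/monitoring/deploys/health" \
-H "Authorization: Bearer $AUTH_SECRET" | jq ".targets"'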

Step 5 — Emit rollback build event

# The payload is single-quoted, so each $GOOD_COMMIT is spliced in by closing and reopening the quotes
curl -s -X POST "$JARVIS_API_URL/api/build-events" \
-H "Authorization: Bearer $AUTH_SECRET" \
-H "Content-Type: application/json" \
-d '{
"type": "error",
"archetypeIds": ["jarvis", "alvin"],
"skillsExercised": ["diagnostic-reasoning"],
"description": "Emergency rollback: degraded deploy rolled back to '"$GOOD_COMMIT"'",
"metadata": { "rolledBackVersion": "v0.12.0", "recoveredTo": "'"$GOOD_COMMIT"'" }
}'

Runbook: Incident Response

When to use: Production is degraded, customers are reporting issues, or alerts are firing.

Severity levels

Level | Criteria | Response time
P1 | Complete service outage | Immediate
P2 | Degraded service (> 10% error rate) | 15 minutes
P3 | Single feature broken | 2 hours
P4 | Cosmetic or minor issue | Next business day
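If alerting or paging scripts need these targets, the severity levels can be encoded as a small lookup. This is only a sketch; `response_time_for` is a hypothetical helper name, and the values are copied from the severity levels listed above.

```shell
# Hypothetical helper: print the response-time target for a severity level.
response_time_for() {
  case "$1" in
    P1) echo "Immediate" ;;
    P2) echo "15 minutes" ;;
    P3) echo "2 hours" ;;
    P4) echo "Next business day" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example: echo "P2 response target: $(response_time_for P2)"
```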

P1/P2 incident procedure

Step 1 — Acknowledge

Post in #ops-incidents: “Acknowledging P[1/2] incident: [brief description]. [Your name] is on it.”

Step 2 — Diagnose

# Check API health
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health
# Check recent errors in Cloudflare Analytics
# dash.cloudflare.com → Workers & Pages → jarvis-api-gateway → Analytics
# Check Supabase status
curl https://status.supabase.com/api/v2/summary.json | jq '.status'
# Check recent deploys (was a deploy the trigger?)
curl "$JARVIS_API_URL/api/monitoring/deploys?limit=5" \
-H "Authorization: Bearer $AUTH_SECRET"

Step 3 — Contain

If a recent deploy is suspected: execute the Emergency Rollback runbook.

If no recent deploy: identify the failing component and determine if it can be isolated.

Step 4 — Resolve

Fix the root cause, then deploy the fix following the Deploy to Production runbook. The gates still apply during an incident, but they can be expedited.

Step 5 — Post-incident report

Within 24 hours, write a post-incident report covering:

  • What happened
  • Root cause
  • Impact (customers affected, duration)
  • Timeline
  • Resolution
  • Prevention measures

Post the report in #post-incident and link it from the Internal Dashboard incident log.


Runbook: Database Migration

When to use: Deploying a new Supabase migration file.

Prerequisites: Migration file has been reviewed, tested locally, and Gate II is open.

# Apply migration to production Supabase
cd Chairman-Infrastructure
supabase db push
# Verify migration applied
supabase db diff --use-migra
# Expected: no diff between migration files and production schema
# Verify RLS policies are correct (see Database Operations runbook)

Runbook: Wrangler Secrets Rotation

When to use: Rotating API keys, auth tokens, or other secrets stored in Wrangler.

# List current secrets
npx wrangler secret list --env production
# Rotate a secret
echo "NEW_SECRET_VALUE" | npx wrangler secret put SECRET_NAME --env production
# Redeploy if the worker has not picked up the new secret
npx wrangler deploy --env production
# Test with the new secret
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health \
-H "Authorization: Bearer NEW_SECRET_VALUE"

Important: After rotating AUTH_SECRET, update all CI/CD pipelines and the Internal Dashboard’s stored credentials.
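The runbook does not say how the replacement value is produced. One option, shown here purely as an assumption, is to draw it from the system's random source; `generate_secret` is a hypothetical helper name, and the 32-byte length is an illustrative choice rather than a GOVERN requirement.

```shell
# Hypothetical helper: print a 64-character hex string built from 32 random bytes.
generate_secret() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Example (printf '%s' sends the value without the trailing newline that echo appends):
# printf '%s' "$(generate_secret)" | npx wrangler secret put SECRET_NAME --env production
```

Piping the value straight into `wrangler secret put` also keeps it out of shell history, which the literal `echo "NEW_SECRET_VALUE"` form does not.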