Deploy Watchdog
The Deploy Watchdog monitors every GOVERN deployment across all targets. It tracks deploy success/failure, correlates deploys with quality score changes, maintains rollback history, and alerts on failure patterns.
Deploy Targets
GOVERN deploys to multiple targets. The watchdog monitors all of them:
| Target | Deploy mechanism | Watchdog check |
|---|---|---|
| API Gateway | wrangler deploy to Cloudflare Workers | Health endpoint poll |
| Customer Dashboard | Cloudflare Pages | Pages deployment status API |
| Internal Dashboard | Cloudflare Pages | Pages deployment status API |
| GOVERN Docs | Cloudflare Pages | Pages deployment status API |
| Supabase Migrations | supabase db push | Migration table check |
Deploy Health Indicators
For each target, the watchdog tracks:
- Last deploy time — When was the most recent successful deploy?
- Deploy success rate — % of deploys that succeeded in the last 30 days
- Average deploy duration — How long deploys take (healthy: < 3 minutes)
- Post-deploy health — Did the health check pass after the deploy?
- Quality score delta — Did V(Q) go up or down after the deploy?
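As a rough illustration, the indicators above could be derived from stored deploy records along these lines. This is a sketch, not the watchdog's actual implementation: the `DeploySample` and `TargetIndicators` shapes and the `summarizeTarget` helper are hypothetical names, while the 30-day window and the 3-minute threshold come from the list above.

```typescript
// Hypothetical, trimmed-down record shape: only the fields the indicators need.
interface DeploySample {
  target: string;
  status: 'pending' | 'in-progress' | 'success' | 'failed' | 'rolled-back';
  startedAt: string; // ISO 8601 timestamp
  durationMs?: number;
  healthCheckPassed: boolean;
  vqScoreBefore?: number;
  vqScoreAfter?: number;
}

// Hypothetical result shape; null means "not enough data".
interface TargetIndicators {
  lastSuccessfulDeploy: string | null;
  successRate: number | null; // fraction (0..1) over the last 30 days
  avgDurationMs: number | null;
  durationHealthy: boolean | null; // average under 3 minutes
  lastHealthCheckPassed: boolean | null;
  lastVqDelta: number | null; // vqScoreAfter - vqScoreBefore of the newest deploy
}

const WINDOW_MS = 30 * 24 * 60 * 60 * 1000;
const HEALTHY_DURATION_MS = 3 * 60 * 1000;

function summarizeTarget(records: DeploySample[], target: string, now: Date): TargetIndicators {
  // All deploys for this target, newest first (filter returns a copy, so the input is not mutated).
  const forTarget = records
    .filter(r => r.target === target)
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
  const cutoff = now.getTime() - WINDOW_MS;
  const recent = forTarget.filter(r => Date.parse(r.startedAt) > cutoff);

  const lastSuccess = forTarget.find(r => r.status === 'success');
  // Every deploy in the window counts toward the denominator, whatever its status.
  const successCount = recent.filter(r => r.status === 'success').length;
  const durations = recent
    .map(r => r.durationMs)
    .filter((d): d is number => d !== undefined);
  const avgDurationMs = durations.length > 0
    ? durations.reduce((sum, d) => sum + d, 0) / durations.length
    : null;

  const latest = forTarget[0];
  let lastVqDelta: number | null = null;
  if (latest && latest.vqScoreBefore !== undefined && latest.vqScoreAfter !== undefined) {
    lastVqDelta = latest.vqScoreAfter - latest.vqScoreBefore;
  }

  return {
    lastSuccessfulDeploy: lastSuccess ? lastSuccess.startedAt : null,
    successRate: recent.length > 0 ? successCount / recent.length : null,
    avgDurationMs,
    durationHealthy: avgDurationMs === null ? null : avgDurationMs < HEALTHY_DURATION_MS,
    lastHealthCheckPassed: latest ? latest.healthCheckPassed : null,
    lastVqDelta,
  };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the calculation deterministic and easy to test.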
Deploy Record Schema
```ts
interface DeployRecord {
  id: string;
  target: 'api-gateway' | 'customer-dashboard' | 'internal-dashboard' | 'docs';
  version: string;            // Git commit hash or semver tag
  deployedBy: string;         // User ID or 'ci-automated'
  status: 'pending' | 'in-progress' | 'success' | 'failed' | 'rolled-back';
  startedAt: string;
  completedAt?: string;
  durationMs?: number;
  healthCheckPassed: boolean;
  vqScoreBefore?: number;     // V(Q) before this deploy
  vqScoreAfter?: number;      // V(Q) after this deploy (measured 5 min post-deploy)
  rollbackDeployId?: string;  // If this is a rollback, which deploy is it rolling back to?
  failureReason?: string;
  metadata: {
    commitHash: string;
    branch: string;
    changedPackages: string[];
    buildDurationMs?: number;
  };
}
```

Deploy Watchdog API
```bash
# Recent deploys
curl "$JARVIS_API_URL/api/monitoring/deploys?limit=10" \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

# Deploy health summary
curl "$JARVIS_API_URL/api/monitoring/deploys/health" \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

# Response:
# {
#   "targets": {
#     "api-gateway": { "status": "healthy", "lastDeploy": "...", "successRate": 0.97 },
#     "customer-dashboard": { "status": "healthy", "lastDeploy": "...", "successRate": 1.00 },
#     "internal-dashboard": { "status": "degraded", "lastDeploy": "...", "successRate": 0.88 }
#   },
#   "alerts": [
#     { "target": "internal-dashboard", "type": "low_success_rate", "message": "..." }
#   ]
# }
```

Post-Deploy Health Check
Every deploy triggers an automatic health check 60 seconds after completion:
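```ts
// Post-deploy health check (run in waitUntil)
async function postDeployHealthCheck(deployId: string, target: DeployTarget) {
  // Give the new deployment 60 seconds to settle before probing it.
  await new Promise(resolve => setTimeout(resolve, 60_000));

  const health = await checkTargetHealth(target);

  // Assuming `supabase` is a supabase-js client: it reports failures through
  // the returned `error` field rather than throwing, so check it explicitly.
  const { error } = await supabase
    .from('deploy_records')
    .update({
      health_check_passed: health.passed,
      health_check_response_ms: health.latencyMs,
    })
    .eq('id', deployId);
  if (error) {
    console.error(`Could not record health check for deploy ${deployId}: ${error.message}`);
  }

  if (!health.passed) {
    await triggerDeployAlert({
      severity: 'critical',
      target,
      deployId,
      message: `Post-deploy health check failed: ${health.error}`,
    });
  }
}
```

Rollback Procedure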
When a deploy fails or degrades quality, follow this rollback procedure.
Automatic rollback triggers
The watchdog auto-initiates rollback when:
- Post-deploy health check fails (health endpoint returns non-200)
- V(Q) score drops more than 0.10 within 5 minutes of deploy
- Error rate in Cloudflare Analytics exceeds 5% of requests
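The three triggers above can be expressed as a single pure check. The following is a minimal sketch under stated assumptions: `PostDeploySignals` and `rollbackReasons` are hypothetical names, and the sketch assumes the health status code, the two V(Q) readings, and the error rate have already been collected elsewhere.

```typescript
// Hypothetical input shape; the field names are assumptions for this sketch.
interface PostDeploySignals {
  healthStatusCode: number; // HTTP status returned by the health endpoint
  vqScoreBefore: number;    // V(Q) before the deploy
  vqScoreAfter: number;     // V(Q) measured within 5 minutes of the deploy
  errorRate: number;        // fraction of failed requests (0..1)
}

const MAX_VQ_DROP = 0.10;
const MAX_ERROR_RATE = 0.05;
const EPSILON = 1e-9; // absorbs floating-point noise at the 0.10 boundary

// Returns every trigger that fired; an empty array means no automatic rollback.
function rollbackReasons(s: PostDeploySignals): string[] {
  const reasons: string[] = [];

  if (s.healthStatusCode !== 200) {
    reasons.push(`health endpoint returned ${s.healthStatusCode}`);
  }

  const drop = s.vqScoreBefore - s.vqScoreAfter;
  if (drop > MAX_VQ_DROP + EPSILON) {
    reasons.push(`V(Q) dropped by ${drop.toFixed(2)}`);
  }

  if (s.errorRate > MAX_ERROR_RATE) {
    reasons.push(`error rate ${(s.errorRate * 100).toFixed(1)}% exceeds 5%`);
  }

  return reasons;
}
```

Returning the list of reasons instead of a bare boolean lets the caller put the cause of the rollback into the deploy record or the alert text.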
Manual rollback
```bash
# Identify the last good deploy
curl "$JARVIS_API_URL/api/monitoring/deploys?target=api-gateway&status=success&limit=5" \
  -H "Authorization: Bearer $AUTH_SECRET" | jq '.[0]'

# Roll back to a specific commit
git checkout <last-good-commit-hash>

# For Cloudflare Workers
cd packages/api-gateway
wrangler deploy --env production

# For Cloudflare Pages (roll back via dashboard)
# Navigate to: dash.cloudflare.com → Pages → <project> → Deployments → Roll back
```

Rollback record
Every rollback touches two records: the failed deploy's status is set to 'rolled-back', and the new (rollback) deploy is stored with rollbackDeployId set. The Deploy Watchdog follows these links to show the full rollback chain.
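To illustrate how such a chain might be reconstructed, the sketch below follows `rollbackDeployId` links from one record to the next. `ChainRecord` and `rollbackChain` are hypothetical names, and the sketch makes no assumption about which deploy the field references; it simply walks whatever links exist and stops on a missing record or a cycle.

```typescript
// Hypothetical minimal shape: just the fields needed to follow the links.
interface ChainRecord {
  id: string;
  status: string;
  rollbackDeployId?: string;
}

// Walks rollbackDeployId references starting at startId.
// Returns the visited records in order; stops at a dangling link or a cycle.
function rollbackChain(records: ChainRecord[], startId: string): ChainRecord[] {
  const byId = new Map<string, ChainRecord>();
  for (const r of records) {
    byId.set(r.id, r);
  }

  const chain: ChainRecord[] = [];
  const seen = new Set<string>();
  let current = byId.get(startId);

  while (current !== undefined && !seen.has(current.id)) {
    chain.push(current);
    seen.add(current.id);
    current = current.rollbackDeployId !== undefined
      ? byId.get(current.rollbackDeployId)
      : undefined;
  }
  return chain;
}
```

The `seen` set is a guard against malformed data: two records that reference each other would otherwise loop forever.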
Quality Score Correlation
The Deploy Watchdog's most useful capability is correlating each deploy with the V(Q) score change that follows it.
Deploy timeline with quality overlay:
```
          v0.10.0      v0.11.0      v0.11.1      v0.12.0
             │            │            │            │
V(Q) 0.97    │            │            │            ├────────
V(Q) 0.94    │            ├────────────┤            │
V(Q) 0.91    ├────────────┘            └────────────┘
```

A deploy that drops V(Q) is flagged immediately. If V(Q) drops below 0.85 after a deploy, the watchdog raises a CRITICAL alert and suggests rollback.
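The flagging rule can be sketched as a small classifier. `VqVerdict` and `classifyVqChange` are hypothetical names, and the sketch reads "drops below 0.85" as "the score fell and now sits under 0.85", which is one interpretation of the rule rather than a confirmed detail of the watchdog.

```typescript
type VqVerdict = 'ok' | 'flagged' | 'critical';

const CRITICAL_VQ_FLOOR = 0.85;

// Compares V(Q) before and after a deploy.
// - 'critical': the score fell and is now below the 0.85 floor
// - 'flagged':  the score fell but is still at or above the floor
// - 'ok':       the score held steady or improved
function classifyVqChange(before: number, after: number): VqVerdict {
  const dropped = after < before;
  if (dropped && after < CRITICAL_VQ_FLOOR) return 'critical';
  if (dropped) return 'flagged';
  return 'ok';
}
```

Applied to the timeline above, v0.11.0 (0.91 to 0.94) and v0.12.0 (0.91 to 0.97) come out as `'ok'`, while v0.11.1 (0.94 to 0.91) is `'flagged'`.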
Deploy History Queries
```sql
-- Deploy success rate by target (last 30 days)
SELECT
  target,
  COUNT(*) AS total_deploys,
  COUNT(*) FILTER (WHERE status = 'success') AS successful,
  ROUND(
    COUNT(*) FILTER (WHERE status = 'success')::numeric / COUNT(*) * 100, 1
  ) AS success_rate_pct,
  AVG(duration_ms) / 1000 AS avg_duration_sec
FROM deploy_records
WHERE started_at > NOW() - INTERVAL '30 days'
GROUP BY target;

-- Deploys that triggered rollback
SELECT
  d.target, d.version, d.deployed_by, d.started_at,
  d.vq_score_before, d.vq_score_after, d.failure_reason
FROM deploy_records d
WHERE d.status = 'rolled-back'
  AND d.started_at > NOW() - INTERVAL '90 days'
ORDER BY d.started_at DESC;

-- Mean time to recover (MTTR) from failed deploys
-- f = failed deploy, r = the rollback deploy whose rollback_deploy_id references it.
-- target must be qualified: it exists on both sides of the self-join.
SELECT
  f.target,
  AVG(
    EXTRACT(EPOCH FROM (r.started_at - f.started_at)) / 60
  ) AS avg_recovery_minutes
FROM deploy_records f
JOIN deploy_records r ON r.rollback_deploy_id = f.id
GROUP BY f.target;
```

Deploy Alerts
The watchdog sends alerts via Slack when:
| Condition | Severity | Channel |
|---|---|---|
| Deploy failed | ERROR | #ops-alerts |
| Health check failed post-deploy | CRITICAL | #ops-alerts + #on-call |
| V(Q) dropped > 0.10 after deploy | WARNING | #ops-alerts |
| Rollback initiated | CRITICAL | #ops-alerts + #on-call |
| No deploy in > 7 days (staleness check) | INFO | #ops-digest |
Alerts include the target, version, deploy ID, V(Q) delta, failure reason (if any), and a link to the Internal Dashboard deploy detail view.
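One way to keep the routing table and the alert contents in code is a lookup keyed by condition. This is a sketch only: the condition identifiers, `ALERT_ROUTES`, `AlertContext`, and `buildAlert` are hypothetical, and the example stops at building the message rather than calling Slack. The severities and channels mirror the table above.

```typescript
// Hypothetical condition identifiers; only severities and channels come from the table.
type AlertCondition =
  | 'deploy_failed'
  | 'health_check_failed'
  | 'vq_drop'
  | 'rollback_initiated'
  | 'stale_target';

type AlertSeverity = 'INFO' | 'WARNING' | 'ERROR' | 'CRITICAL';

interface AlertRoute {
  severity: AlertSeverity;
  channels: string[];
}

const ALERT_ROUTES: Record<AlertCondition, AlertRoute> = {
  deploy_failed: { severity: 'ERROR', channels: ['#ops-alerts'] },
  health_check_failed: { severity: 'CRITICAL', channels: ['#ops-alerts', '#on-call'] },
  vq_drop: { severity: 'WARNING', channels: ['#ops-alerts'] },
  rollback_initiated: { severity: 'CRITICAL', channels: ['#ops-alerts', '#on-call'] },
  stale_target: { severity: 'INFO', channels: ['#ops-digest'] },
};

// Hypothetical payload carrying the fields every alert includes.
interface AlertContext {
  target: string;
  version: string;
  deployId: string;
  vqDelta?: number;
  failureReason?: string;
  detailUrl: string; // link to the Internal Dashboard deploy detail view
}

// Builds the channel list and a one-line message; sending it is left to the caller.
function buildAlert(
  condition: AlertCondition,
  ctx: AlertContext,
): { channels: string[]; text: string } {
  const route = ALERT_ROUTES[condition];
  const parts = [
    `[${route.severity}] ${condition}`,
    `target: ${ctx.target}`,
    `version: ${ctx.version}`,
    `deploy: ${ctx.deployId}`,
  ];
  if (ctx.vqDelta !== undefined) {
    const sign = ctx.vqDelta >= 0 ? '+' : '';
    parts.push(`V(Q) delta: ${sign}${ctx.vqDelta.toFixed(2)}`);
  }
  if (ctx.failureReason) {
    parts.push(`reason: ${ctx.failureReason}`);
  }
  parts.push(ctx.detailUrl);
  return { channels: route.channels, text: parts.join(' | ') };
}
```

Typing the table as `Record<AlertCondition, AlertRoute>` means the compiler rejects a new condition that has no route, which keeps the code and the table from drifting apart.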