Infrastructure Monitoring
This page covers operational monitoring of the GOVERN infrastructure — the Cloudflare Workers, Supabase database, Upstash Redis cache, and R2 storage that the platform runs on.
Infrastructure Components
| Component | Provider | Monitor via |
|---|---|---|
| API Gateway | Cloudflare Workers | Cloudflare Analytics + health endpoint |
| Durable Objects (Coordinator, AutonomyKernel) | Cloudflare DOs | Cloudflare Analytics |
| Database | Supabase (Postgres) | Supabase Dashboard + custom queries |
| Cache | Upstash Redis | Upstash Console |
| Object Storage | Cloudflare R2 (jarvis-artifacts) | Cloudflare Dashboard |
| Email (if used) | Resend | Resend Dashboard |
Cloudflare Worker Health
Health endpoint
```bash
# API Gateway health check
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq .
```
Expected response:

```json
{
  "status": "ok",
  "timestamp": "2026-04-12T...",
  "version": "0.12.0",
  "environment": "production",
  "checks": {
    "supabase": "ok",
    "redis": "ok",
    "r2": "ok"
  }
}
```

Cloudflare Analytics
Navigate to: dash.cloudflare.com → Workers & Pages → jarvis-api-gateway → Analytics
Key metrics to watch:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Error rate | < 1% | 1–5% | > 5% |
| P95 response time | < 200ms | 200–500ms | > 500ms |
| Requests/minute | Normal range | Sudden spike → investigate | — |
| CPU time | < 50ms avg | 50–100ms | > 100ms |
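The thresholds in the table above can be encoded so that a script or alert rule classifies a reading the same way a person reading the table would. A minimal sketch in bash, using awk for the decimal comparisons bash cannot do natively. The helper names are hypothetical, the boundary handling (a value exactly at either end of the Warning range counts as warning) is read from the table's ranges, and the Requests/minute row is left out because it has no fixed numeric threshold.

```bash
#!/usr/bin/env bash
# classify VALUE WARN_AT CRIT_AT
# Prints healthy (< WARN_AT), warning (WARN_AT..CRIT_AT inclusive)
# or critical (> CRIT_AT). awk does the floating-point comparison.
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v + 0 > c + 0)       print "critical"
    else if (v + 0 >= w + 0) print "warning"
    else                     print "healthy"
  }'
}

# Thresholds from the Cloudflare Analytics table (hypothetical helper names).
error_rate_status() { classify "$1" 1 5; }     # percent of requests
p95_status()        { classify "$1" 200 500; } # milliseconds
cpu_time_status()   { classify "$1" 50 100; }  # average milliseconds
```

For example, `error_rate_status 2.3` prints `warning`.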
Durable Object monitoring
```bash
# Check Coordinator DO health
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/coordinator/health \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

# Check AutonomyKernel DO status
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/autonomy/status \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .
```

Worker logs (real-time)
```bash
# Stream live logs from the production worker
npx wrangler tail --env production

# Filter for errors only
npx wrangler tail --env production --format pretty | grep -i error
```

Supabase Database Monitoring
Connection pool health
```bash
# Check connection pool utilization
curl "https://supabase-your-project.supabase.co/rest/v1/rpc/get_pool_stats" \
  -H "apikey: $SUPABASE_SERVICE_ROLE_KEY" \
  -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" | jq .
```

Table sizes (storage growth check)
```sql
-- Run in Supabase SQL editor
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
  pg_size_pretty(pg_relation_size(schemaname || '.' || tablename))       AS table_size,
  pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename))        AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
```

Slow queries
```sql
-- Find slow queries (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time, rows
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- queries averaging > 100ms
ORDER BY mean_exec_time DESC
LIMIT 20;
```

Database health check
```sql
-- Quick health check: recent activity
SELECT 'monitoring_events last hour' AS check_name, COUNT(*) AS value
FROM monitoring_events
WHERE created_at > NOW() - INTERVAL '1 hour'

UNION ALL

SELECT 'build_events last hour', COUNT(*)
FROM build_events
WHERE created_at > NOW() - INTERVAL '1 hour'

UNION ALL

SELECT 'active connections', COUNT(*)
FROM pg_stat_activity
WHERE state = 'active';
```

Supabase status page
Always check https://status.supabase.com/ before investigating database issues. If Supabase is reporting an incident, wait for their fix before investigating GOVERN-side issues.
Upstash Redis Monitoring
Redis is used for:
- Rate limit counters (sliding window per org)
- Session state caching
- Build event deduplication
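Sliding-window rate limits are commonly approximated from two fixed-window counters: the previous window's count, weighted by how much of it still overlaps the sliding window, plus the current window's count. A minimal sketch of that arithmetic, useful when reading raw counters during an incident. Whether GOVERN's limiter uses exactly this approximation is an assumption, the 60-second default mirrors the rate limit key expiry described under Redis key patterns, and `sliding_window_estimate` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# sliding_window_estimate PREV_COUNT CURR_COUNT ELAPSED_SECONDS [WINDOW_SECONDS]
# Approximates requests in the last WINDOW_SECONDS (default 60) as
#   PREV_COUNT * (1 - ELAPSED / WINDOW) + CURR_COUNT
# where ELAPSED is how far we are into the current fixed window.
sliding_window_estimate() {
  awk -v prev="$1" -v curr="$2" -v elapsed="$3" -v win="${4:-60}" 'BEGIN {
    weight = 1 - elapsed / win
    if (weight < 0) weight = 0   # previous window no longer overlaps
    printf "%.1f\n", prev * weight + curr
  }'
}
```

For example, an org with 100 requests in the previous minute and 20 so far, 15 seconds into the current minute, is estimated at `sliding_window_estimate 100 20 15`, which prints `95.0`.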
Redis health check
```bash
# Via the API gateway health endpoint (checks Redis internally)
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq '.checks.redis'
```

Redis metrics (Upstash Console)
Navigate to: console.upstash.com → your Redis database
Key metrics:
- Commands/second — Healthy: < 1000/s in normal operation
- Memory usage — Alert at 80% of plan limit
- Latency P99 — Alert if > 50ms
- Connection errors — Any errors → investigate
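The thresholds above can be checked in one pass against numbers read off the Upstash Console. A minimal sketch, assuming the values are copied in by hand (no Upstash API call is shown); `redis_alerts` is a hypothetical helper, and flagging 1000 commands/s or more is a reading of the "Healthy: < 1000/s" guidance rather than a documented alert rule.

```bash
#!/usr/bin/env bash
# redis_alerts USED LIMIT P99_MS CONN_ERRORS CMDS_PER_SEC
# USED and LIMIT must share a unit (e.g. MB). Prints one line per breached
# threshold and exits 1 if anything was printed, 0 otherwise.
redis_alerts() {
  awk -v used="$1" -v limit="$2" -v p99="$3" -v errs="$4" -v cps="$5" 'BEGIN {
    n = 0
    if (100 * used >= 80 * limit) { printf "memory: %s of %s (>= 80%% of plan)\n", used, limit; n++ }
    if (p99 + 0 > 50)             { printf "latency: P99 %sms (> 50ms)\n", p99; n++ }
    if (errs + 0 > 0)             { printf "connections: %s error(s)\n", errs; n++ }
    if (cps + 0 >= 1000)          { printf "throughput: %s commands/s (normal < 1000/s)\n", cps; n++ }
    exit (n > 0)
  }'
}
```

An empty output and exit status 0 means every metric is within its healthy range.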
Redis key patterns
```text
# Rate limit keys
GOVERN:ratelimit:{orgId}:{endpoint}:{window}

# Session cache keys
GOVERN:session:{sessionId}

# Build event dedup keys
GOVERN:dedup:{eventHash}
```

Keys expire automatically. Rate limit keys expire at the end of their window (60s). Session keys expire after 24 hours.
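When inspecting Redis during an incident it helps to rebuild a specific key from known IDs. A minimal sketch of helpers for the three patterns. The encodings are assumptions, not confirmed from the GOVERN source: `{window}` is taken as epoch seconds divided by 60 (matching the 60s expiry), and `{eventHash}` as the hex SHA-256 of the raw event payload. `GOVERN_PREFIX`, `ratelimit_key`, `session_key` and `dedup_key` are hypothetical names.

```bash
#!/usr/bin/env bash
# Rebuild the documented key patterns for ad hoc inspection.
# Assumed encodings: {window} = epoch seconds / 60 (integer division),
# {eventHash} = hex SHA-256 of the raw event payload.
GOVERN_PREFIX="GOVERN"

# ratelimit_key ORG_ID ENDPOINT [EPOCH_SECONDS]  (defaults to now)
ratelimit_key() {
  local now="${3:-$(date +%s)}"
  local window=$(( now / 60 ))
  printf '%s:ratelimit:%s:%s:%s\n' "$GOVERN_PREFIX" "$1" "$2" "$window"
}

# session_key SESSION_ID
session_key() {
  printf '%s:session:%s\n' "$GOVERN_PREFIX" "$1"
}

# dedup_key EVENT_PAYLOAD  (sha256sum is GNU coreutils; macOS has `shasum -a 256`)
dedup_key() {
  local hash
  hash=$(printf '%s' "$1" | sha256sum | cut -d ' ' -f 1)
  printf '%s:dedup:%s\n' "$GOVERN_PREFIX" "$hash"
}
```

The same prefixes work as SCAN patterns, for example `redis-cli --scan --pattern 'GOVERN:ratelimit:org_123:*'` (with `org_123` as a placeholder org ID), which lists one org's counters without the blocking `KEYS` command.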
R2 Storage Monitoring
R2 (jarvis-artifacts bucket) stores:
- Assessment report PDFs
- Probe container build artifacts
- Exported data files
Storage usage check
Via the Cloudflare dashboard: dash.cloudflare.com → R2 → jarvis-artifacts → Settings. This page shows object count, storage used, and requests over the last 30 days.

Storage budget
| Resource | Budget | Alert at |
|---|---|---|
| Storage | 10 GB | 8 GB (80%) |
| Class A ops (writes) | 1M/month | 800K |
| Class B ops (reads) | 10M/month | 8M |
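Each row in the table above alerts at 80% of its budget. A minimal sketch of that check, assuming usage figures are read by hand from the dashboard; `r2_budget_check` is a hypothetical helper. Pass usage and budget in the same unit (GB for storage, operations per month for Class A and Class B), with a budget greater than zero.

```bash
#!/usr/bin/env bash
# r2_budget_check NAME USED BUDGET
# Prints the share of budget used and exits 1 once usage reaches 80%.
# The comparison is scaled to whole percents to avoid 0.8 rounding issues.
r2_budget_check() {
  awk -v name="$1" -v used="$2" -v budget="$3" 'BEGIN {
    pct = 100 * used / budget
    status = (100 * used >= 80 * budget) ? "ALERT" : "ok"
    printf "%s: %.1f%% of budget (%s)\n", name, pct, status
    exit (status == "ALERT")
  }'
}
```

For example, `r2_budget_check storage_gb 8 10` prints `storage_gb: 80.0% of budget (ALERT)` and exits 1.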
Orphaned file cleanup
Periodically check for orphaned artifacts (files no longer referenced by any database record):
```sql
-- Find artifact records no longer referenced by any other database record
-- (run monthly as maintenance)
SELECT r2_key, created_at, size_bytes
FROM artifacts
WHERE referenced_by IS NULL
  AND created_at < NOW() - INTERVAL '30 days';
```

Monitoring Checklist (Weekly)
Run this checklist every Monday morning:
- API Gateway health endpoint returns 200 with all checks “ok”
- Error rate in Cloudflare Analytics < 1% for the past 7 days
- Supabase: no incidents on status page
- Database: no tables > 1 GB
- Database: no queries averaging > 100ms
- Redis: memory usage < 80% of plan
- R2: storage < 8 GB
- Cost Governor: last 7 days total < weekly budget
- Deploy Watchdog: all targets “healthy”
- No unacknowledged alerts in #ops-alerts
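The first item on this checklist is easy to script. A minimal sketch, assuming the /health response keeps the flat shape shown under Health endpoint (a top-level `status` plus a `checks` object whose values are strings) and that jq may not be installed where the check runs. `GATEWAY_URL`, `health_ok` and `gateway_health_check` are hypothetical names, not existing GOVERN tooling; the grep-based parsing is deliberately narrow and would need jq for anything deeper than that flat shape.

```bash
#!/usr/bin/env bash
# Sketch of the first weekly checklist item: HTTP 200 plus every check "ok".
GATEWAY_URL="${GATEWAY_URL:-https://jarvis-api-gateway.ben-c1f.workers.dev}"

# health_ok BODY: succeeds only if status is "ok" and every value in "checks" is "ok".
health_ok() {
  local body checks
  body=$(tr -d '\n' <<<"$1")   # tolerate pretty-printed JSON
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"' <<<"$body" || return 1
  checks=$(grep -Eo '"checks"[[:space:]]*:[[:space:]]*\{[^}]*\}' <<<"$body") || return 1
  # Any "name": "value" pair whose value is not "ok" fails the check.
  if grep -Eo '"[a-z0-9_]+"[[:space:]]*:[[:space:]]*"[^"]*"' <<<"$checks" \
      | grep -Evq ':[[:space:]]*"ok"$'; then
    return 1
  fi
  return 0
}

# gateway_health_check: fetch /health and print healthy, HTTP <code>, or degraded.
gateway_health_check() {
  local response code body
  # -w appends the status code on its own line after the response body.
  response=$(curl -s --max-time 10 -w '\n%{http_code}' "$GATEWAY_URL/health") || return 1
  code=${response##*$'\n'}
  body=${response%$'\n'*}
  if [ "$code" != "200" ]; then echo "HTTP $code"; return 1; fi
  if ! health_ok "$body"; then echo "degraded: $body"; return 1; fi
  echo "healthy"
}
```

`gateway_health_check` prints a one-line result for the checklist and exits non-zero on failure, so the same function can run from cron or CI.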
External Status Pages
Bookmark these for incident response:
| Service | Status page |
|---|---|
| Cloudflare | https://www.cloudflarestatus.com/ |
| Supabase | https://status.supabase.com/ |
| Upstash | https://status.upstash.com/ |
| Anthropic | https://status.anthropic.com/ |
| OpenAI | https://status.openai.com/ |
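Some of these pages are hosted on Atlassian Statuspage, which serves a machine-readable summary at `/api/v2/status.json` with a `status.indicator` field (values such as `none`, `minor`, `major`, `critical`). A minimal sketch for a quick scripted look, assuming a given page exposes that endpoint; confirm this per provider, since not every page in the table is necessarily Statuspage-hosted. `statuspage_indicator` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# statuspage_indicator JSON
# Prints status.indicator from a Statuspage /api/v2/status.json body,
# or nothing if the field is absent.
statuspage_indicator() {
  grep -Eo '"indicator"[[:space:]]*:[[:space:]]*"[a-z]+"' <<<"$1" \
    | head -n 1 \
    | sed -E 's/.*"([a-z]+)"$/\1/'
}

# Usage (needs network access; the host list is an assumption):
#   for host in www.cloudflarestatus.com status.supabase.com; do
#     body=$(curl -s --max-time 10 "https://$host/api/v2/status.json")
#     echo "$host: $(statuspage_indicator "$body")"
#   done
```

Anything other than `none` is a signal to read that provider's incident page before investigating GOVERN-side causes.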