Infrastructure Monitoring

This page covers operational monitoring of the GOVERN infrastructure — the Cloudflare Workers, Supabase database, Upstash Redis cache, and R2 storage that the platform runs on.

Infrastructure Components

| Component | Provider | Monitor via |
| --- | --- | --- |
| API Gateway | Cloudflare Workers | Cloudflare Analytics + health endpoint |
| Durable Objects (Coordinator, AutonomyKernel) | Cloudflare DOs | Cloudflare Analytics |
| Database | Supabase (Postgres) | Supabase Dashboard + custom queries |
| Cache | Upstash Redis | Upstash Console |
| Object Storage | Cloudflare R2 (jarvis-artifacts) | Cloudflare Dashboard |
| Email (if used) | Resend | Resend Dashboard |

Cloudflare Worker Health

Health endpoint

# API Gateway health check
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq .
# Expected response
{
  "status": "ok",
  "timestamp": "2026-04-12T...",
  "version": "0.12.0",
  "environment": "production",
  "checks": {
    "supabase": "ok",
    "redis": "ok",
    "r2": "ok"
  }
}
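
The weekly checklist treats the gateway as healthy only when every entry under checks reports "ok". A minimal sketch of that test follows, assuming jq is installed (the commands on this page already pipe through it). The names health_ok and failing_checks are hypothetical helpers, not part of the platform.

```shell
# Hypothetical helpers (not part of GOVERN). Both read a /health JSON body
# on stdin and require jq.

# Exit 0 only when .status and every value under .checks equal "ok".
# A missing .checks object is treated as "no dependency checks reported".
# An empty or non-JSON body yields a non-zero exit status.
health_ok() {
  jq -e '.status == "ok" and ((.checks // {}) | all(. == "ok"))' >/dev/null
}

# Print the name of each failing dependency check, one per line.
failing_checks() {
  jq -r '(.checks // {}) | to_entries[] | select(.value != "ok") | .key'
}

# Usage sketch:
#   body="$(curl -fsS https://jarvis-api-gateway.ben-c1f.workers.dev/health)"
#   echo "$body" | health_ok || echo "$body" | failing_checks
```

Using jq -e turns the boolean result into an exit status, so the helper can gate a cron job or CI step without parsing text.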

Cloudflare Analytics

Navigate to: dash.cloudflare.com → Workers & Pages → jarvis-api-gateway → Analytics

Key metrics to watch:

| Metric | Healthy | Warning | Critical |
| --- | --- | --- | --- |
| Error rate | < 1% | 1–5% | > 5% |
| P95 response time | < 200ms | 200–500ms | > 500ms |
| Requests/minute | Normal range | Sudden spike → investigate | |
| CPU time | < 50ms avg | 50–100ms | > 100ms |
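
The thresholds above can be applied mechanically once the numbers have been read off the dashboard. The following is a small sketch, assuming awk is available for the decimal comparison; classify and its wrappers are hypothetical names, and values that land exactly on a boundary (1%, 5%, 200ms, and so on) are treated as warnings.

```shell
# classify VALUE WARN CRIT
# Prints "healthy" below WARN, "warning" from WARN through CRIT inclusive,
# and "critical" above CRIT. awk handles decimal values such as 0.4.
classify() {
  awk -v v="$1" -v warn="$2" -v crit="$3" 'BEGIN {
    if (v + 0 > crit + 0)       print "critical"
    else if (v + 0 >= warn + 0) print "warning"
    else                        print "healthy"
  }'
}

# Wrappers using the thresholds from the metrics table.
classify_error_rate() { classify "$1" 1 5; }      # percent
classify_p95_ms()     { classify "$1" 200 500; }  # milliseconds
classify_cpu_ms()     { classify "$1" 50 100; }   # milliseconds (average)
```

For example, `classify_error_rate 0.4` prints healthy and `classify_p95_ms 650` prints critical.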

Durable Object monitoring

# Check Coordinator DO health
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/coordinator/health \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

# Check AutonomyKernel DO status
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/autonomy/status \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

Worker logs (real-time)

# Stream live logs from production worker
npx wrangler tail --env production
# Filter for errors only
npx wrangler tail --env production --format pretty | grep -i error
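
Grepping the pretty output only catches lines that happen to contain the word "error". As an alternative sketch, the JSON format can be filtered on the invocation outcome instead. This assumes each event carries outcome and exceptions fields and that jq is installed; failed_invocations is a hypothetical helper name.

```shell
# Hypothetical filter: reads `wrangler tail --format json` events on stdin
# and keeps only invocations whose outcome is not "ok" or that recorded
# at least one exception. Emits one compact JSON object per match.
failed_invocations() {
  jq -c 'select(.outcome != "ok" or ((.exceptions // []) | length > 0))
         | {outcome, exceptions: (.exceptions // [])}'
}

# Usage sketch:
#   npx wrangler tail --env production --format json | failed_invocations
```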

Supabase Database Monitoring

Connection pool health

# Check connection pool utilization
curl "https://supabase-your-project.supabase.co/rest/v1/rpc/get_pool_stats" \
  -H "apikey: $SUPABASE_SERVICE_ROLE_KEY" \
  -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" | jq .

Table sizes (storage growth check)

-- Run in Supabase SQL editor
-- quote_ident keeps mixed-case or otherwise unusual table names resolvable
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS total_size,
  pg_size_pretty(pg_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS table_size,
  pg_size_pretty(pg_indexes_size(quote_ident(schemaname)||'.'||quote_ident(tablename))) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(quote_ident(schemaname)||'.'||quote_ident(tablename)) DESC
LIMIT 20;

Slow queries

-- Find slow queries (requires pg_stat_statements extension)
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time,
  rows
FROM pg_stat_statements
WHERE mean_exec_time > 100 -- Queries averaging > 100ms
ORDER BY mean_exec_time DESC
LIMIT 20;

Database health check

-- Quick health check: recent activity
SELECT
  'monitoring_events last hour' AS check_name,
  COUNT(*) AS value
FROM monitoring_events
WHERE created_at > NOW() - INTERVAL '1 hour'
UNION ALL
SELECT
  'build_events last hour',
  COUNT(*)
FROM build_events
WHERE created_at > NOW() - INTERVAL '1 hour'
UNION ALL
SELECT
  'active connections',
  COUNT(*)
FROM pg_stat_activity
WHERE state = 'active';

Supabase status page

Always check https://status.supabase.com/ before investigating database issues. If Supabase is reporting an incident, wait for their fix before investigating GOVERN-side issues.

Upstash Redis Monitoring

Redis is used for:

  • Rate limit counters (sliding window per org)
  • Session state caching
  • Build event deduplication

Redis health check

# Via API gateway health endpoint (checks Redis internally)
curl https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq '.checks.redis'

Redis metrics (Upstash Console)

Navigate to: console.upstash.com → your Redis database

Key metrics:

  • Commands/second — Healthy: < 1000/s in normal operation
  • Memory usage — Alert at 80% of plan limit
  • Latency P99 — Alert if > 50ms
  • Connection errors — Any errors → investigate

Redis key patterns

# Rate limit keys
GOVERN:ratelimit:{orgId}:{endpoint}:{window}
# Session cache keys
GOVERN:session:{sessionId}
# Build event dedup keys
GOVERN:dedup:{eventHash}

Keys expire automatically. Rate limit keys expire at window end (60s). Session keys expire after 24 hours.
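
To make the key patterns and expiry rules concrete, here is a sketch of how such keys and TTLs could be derived. The page does not say how {window} is encoded, so this assumes it is the Unix time divided by the 60-second window length. All function names are hypothetical and the gateway's actual scheme may differ.

```shell
WINDOW_SECS=60                        # rate limit window length, per the text above
SESSION_TTL_SECS=$(( 24 * 60 * 60 ))  # session keys expire after 24 hours

# ratelimit_key ORG_ID ENDPOINT [NOW_EPOCH]
# Assumption: {window} is the epoch time divided by the window length.
ratelimit_key() {
  printf 'GOVERN:ratelimit:%s:%s:%s\n' "$1" "$2" \
    "$(( ${3:-$(date +%s)} / WINDOW_SECS ))"
}

# ratelimit_ttl [NOW_EPOCH]
# Seconds remaining until the current window ends, i.e. the key's TTL.
ratelimit_ttl() {
  echo "$(( WINDOW_SECS - ${1:-$(date +%s)} % WINDOW_SECS ))"
}

# session_key SESSION_ID
session_key() {
  printf 'GOVERN:session:%s\n' "$1"
}
```

For example, `ratelimit_key org_42 probes 125` prints GOVERN:ratelimit:org_42:probes:2, and `ratelimit_ttl 125` prints 55, because second 125 is 5 seconds into a window that started at second 120.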

R2 Storage Monitoring

R2 (jarvis-artifacts bucket) stores:

  • Assessment report PDFs
  • Probe container build artifacts
  • Exported data files

Storage usage check

Navigate to: dash.cloudflare.com → R2 → jarvis-artifacts → Settings

This view shows object count, storage used, and requests (30 days).

Storage budget

| Tier | Budget | Alert at |
| --- | --- | --- |
| Storage | 10 GB | 8 GB (80%) |
| Class A ops (writes) | 1M/month | 800K |
| Class B ops (reads) | 10M/month | 8M |
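
Every alert level in the budget is 80% of its limit, so one comparison covers all three rows. This is a minimal sketch with a hypothetical helper name; usage and limit must be whole numbers in the same unit (GB, operations, and so on), read manually from the dashboard.

```shell
# at_or_over_threshold USED LIMIT [PERCENT]
# Exit 0 when USED is at or above PERCENT (default 80) of LIMIT.
# Cross-multiplies instead of dividing so integer arithmetic stays exact.
at_or_over_threshold() {
  [ "$(( $1 * 100 ))" -ge "$(( $2 * ${3:-80} ))" ]
}

# Usage sketch:
#   at_or_over_threshold 8 10 && echo "R2 storage alert"
#   at_or_over_threshold 640000 1000000 || echo "Class A ops within budget"
```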

Orphaned file cleanup

Periodically check for orphaned artifacts (files no longer referenced by any database record):

-- Find artifact rows that nothing references and that are older than 30 days
-- (Run monthly as maintenance)
SELECT r2_key, created_at, size_bytes
FROM artifacts
WHERE referenced_by IS NULL
  AND created_at < NOW() - INTERVAL '30 days';

Monitoring Checklist (Weekly)

Run this checklist every Monday morning:

  • API Gateway health endpoint returns 200 with all checks “ok”
  • Error rate in Cloudflare Analytics < 1% for the past 7 days
  • Supabase: no incidents on status page
  • Database: no tables > 1 GB
  • Database: no queries averaging > 100ms
  • Redis: memory usage < 80% of plan
  • R2: storage < 8 GB
  • Cost Governor: last 7 days total < weekly budget
  • Deploy Watchdog: all targets “healthy”
  • No unacknowledged alerts in #ops-alerts

External Status Pages

Bookmark these for incident response:

| Service | Status page |
| --- | --- |
| Cloudflare | https://www.cloudflarestatus.com/ |
| Supabase | https://status.supabase.com/ |
| Upstash | https://status.upstash.com/ |
| Anthropic | https://status.anthropic.com/ |
| OpenAI | https://status.openai.com/ |