Infrastructure Monitoring

This page covers operational monitoring of the GOVERN infrastructure — the Cloudflare Workers, Supabase database, Upstash Redis cache, and R2 storage that the platform runs on.

Infrastructure Components

Component	Provider	Monitor via
API Gateway	Cloudflare Workers	Cloudflare Analytics + health endpoint
Durable Objects (Coordinator, AutonomyKernel)	Cloudflare DOs	Cloudflare Analytics
Database	Supabase (Postgres)	Supabase Dashboard + custom queries
Cache	Upstash Redis	Upstash Console
Object Storage	Cloudflare R2 (`jarvis-artifacts`)	Cloudflare Dashboard
Email (if used)	Resend	Resend Dashboard

Cloudflare Worker Health

Health endpoint

# API Gateway health check
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq .

# Expected response
{
  "status": "ok",
  "timestamp": "2026-04-12T...",
  "version": "0.12.0",
  "environment": "production",
  "checks": {
    "supabase": "ok",
    "redis": "ok",
    "r2": "ok"
  }
}
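For scripted checks, it can help to reduce the response to a list of failing dependencies. The sketch below assumes the response shape shown above (a top-level `status` plus a `checks` object). The `failing_checks` helper and the `degraded` sample payload are hypothetical illustrations, not part of the GOVERN codebase or real gateway output.

```python
import json


def failing_checks(payload: dict) -> list[str]:
    """Return the names of health checks that are not reporting "ok".

    Assumes the response shape shown above: a top-level "status" string
    and a "checks" object mapping dependency name to a status string.
    """
    failures = []
    if payload.get("status") != "ok":
        failures.append("status")
    for name, state in sorted(payload.get("checks", {}).items()):
        if state != "ok":
            failures.append(name)
    return failures


# Hypothetical degraded response, for illustration only.
sample = json.loads(
    '{"status": "degraded", "checks": {"supabase": "ok", "redis": "error", "r2": "ok"}}'
)
print(failing_checks(sample))
```

An empty list means every check reported `ok`; anything else names the component to look at first.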

Cloudflare Analytics

Navigate to: dash.cloudflare.com → Workers & Pages → jarvis-api-gateway → Analytics

Key metrics to watch:

Metric	Healthy	Warning	Critical
Error rate	< 1%	1–5%	> 5%
P95 response time	< 200ms	200–500ms	> 500ms
Requests/minute	Normal range	Sudden spike → investigate	—
CPU time	< 50ms avg	50–100ms	> 100ms
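The bands in this table can be applied mechanically when metrics are exported to a script. The following is a minimal sketch: the thresholds are copied from the table, while the `classify` function, the metric names, and the sample readings are assumptions made for illustration.

```python
def classify(value: float, warning_at: float, critical_above: float) -> str:
    """Map a metric reading onto the Healthy / Warning / Critical bands.

    Below warning_at is healthy, above critical_above is critical, and
    anything in between (both bounds included) is a warning.
    """
    if value > critical_above:
        return "critical"
    if value >= warning_at:
        return "warning"
    return "healthy"


# Thresholds from the table above, as (warning_at, critical_above).
THRESHOLDS = {
    "error_rate_pct": (1.0, 5.0),
    "p95_response_ms": (200.0, 500.0),
    "cpu_time_ms": (50.0, 100.0),
}

# Made-up sample readings, not real measurements.
readings = {"error_rate_pct": 0.4, "p95_response_ms": 350.0, "cpu_time_ms": 120.0}
for metric, value in readings.items():
    print(metric, classify(value, *THRESHOLDS[metric]))
```

Requests/minute is left out because the table defines it relative to a "normal range" rather than fixed numbers.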

Durable Object monitoring

# Check Coordinator DO health
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/coordinator/health \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

# Check AutonomyKernel DO status
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/api/autonomy/status \
  -H "Authorization: Bearer $AUTH_SECRET" | jq .

Worker logs (real-time)

# Stream live logs from production worker
npx wrangler tail --env production

# Filter for errors only
npx wrangler tail --env production --format pretty | grep -i error

Supabase Database Monitoring

Connection pool health

# Check connection pool utilization (calls the get_pool_stats RPC function)
curl -s "https://supabase-your-project.supabase.co/rest/v1/rpc/get_pool_stats" \
  -H "apikey: $SUPABASE_SERVICE_ROLE_KEY" \
  -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" | jq .

Table sizes (storage growth check)

-- Run in Supabase SQL editor
-- format('%I.%I', ...) quotes identifiers so mixed-case table names resolve correctly
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS total_size,
  pg_size_pretty(pg_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS table_size,
  pg_size_pretty(pg_indexes_size(format('%I.%I', schemaname, tablename)::regclass)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 20;

Slow queries

-- Find slow queries (requires pg_stat_statements extension)
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time,
  rows
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- Queries averaging > 100ms
ORDER BY mean_exec_time DESC
LIMIT 20;

Database health check

-- Quick health check: recent activity
SELECT
  'monitoring_events last hour' AS check_name,
  COUNT(*) AS value
FROM monitoring_events
WHERE created_at > NOW() - INTERVAL '1 hour'

UNION ALL

SELECT
  'build_events last hour',
  COUNT(*)
FROM build_events
WHERE created_at > NOW() - INTERVAL '1 hour'

UNION ALL

SELECT
  'active connections',
  COUNT(*)
FROM pg_stat_activity
WHERE state = 'active';

Supabase status page

Check https://status.supabase.com/ before investigating database issues. If Supabase is reporting an incident, wait for it to be resolved before debugging on the GOVERN side.

Upstash Redis Monitoring

Redis is used for:

Rate limit counters (sliding window per org)
Session state caching
Build event deduplication

Redis health check

# Via API gateway health endpoint (checks Redis internally)
curl -s https://jarvis-api-gateway.ben-c1f.workers.dev/health | jq '.checks.redis'

Redis metrics (Upstash Console)

Navigate to: console.upstash.com → your Redis database

Key metrics:

Commands/second — Healthy: < 1000/s in normal operation
Memory usage — Alert at 80% of plan limit
Latency P99 — Alert if > 50ms
Connection errors — Any errors → investigate

Redis key patterns

# Rate limit keys
GOVERN:ratelimit:{orgId}:{endpoint}:{window}

# Session cache keys
GOVERN:session:{sessionId}

# Build event dedup keys
GOVERN:dedup:{eventHash}

Keys expire automatically. Rate limit keys expire at window end (60s). Session keys expire after 24 hours.
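To make the key patterns and expiry times concrete, here is a minimal sketch of how such keys could be built. It rests on one loud assumption: `{window}` is treated as the index of a 60-second bucket, which is not confirmed by this page and may differ from the gateway's actual encoding. The function names and the sample `org_123` and `assess` values are hypothetical.

```python
RATE_LIMIT_WINDOW_SECONDS = 60       # window length stated above
SESSION_TTL_SECONDS = 24 * 60 * 60   # session keys expire after 24 hours


def rate_limit_key(org_id: str, endpoint: str, epoch_seconds: int) -> str:
    # ASSUMPTION: {window} is the index of the 60-second bucket that
    # contains the request time. The real encoding is not documented here.
    window = epoch_seconds // RATE_LIMIT_WINDOW_SECONDS
    return f"GOVERN:ratelimit:{org_id}:{endpoint}:{window}"


def seconds_until_window_end(epoch_seconds: int) -> int:
    # TTL that makes a rate limit key expire exactly at its window end.
    return RATE_LIMIT_WINDOW_SECONDS - (epoch_seconds % RATE_LIMIT_WINDOW_SECONDS)


def session_key(session_id: str) -> str:
    return f"GOVERN:session:{session_id}"


print(rate_limit_key("org_123", "assess", 125))
print(seconds_until_window_end(125))
print(session_key("abc"))
```

When inspecting keys in the Upstash Console, a key whose remaining TTL exceeds these values suggests it was written by something other than the paths described here.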

R2 Storage Monitoring

R2 (jarvis-artifacts bucket) stores:

Assessment report PDFs
Probe container build artifacts
Exported data files

Storage usage check

Navigate to: dash.cloudflare.com → R2 → jarvis-artifacts → Settings

This view shows object count, storage used, and requests over the last 30 days.

Storage budget

Resource	Budget	Alert at
Storage	10 GB	8 GB (80%)
Class A ops (writes)	1M/month	800K
Class B ops (reads)	10M/month	8M
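The alert thresholds are all 80% of the corresponding budget, so a usage check reduces to one comparison per resource. The sketch below takes the budgets from the table; the `over_alert_threshold` function, the dictionary keys, and the sample usage figures are assumptions for illustration, and the usage numbers would have to be read from the Cloudflare dashboard by hand or by other tooling.

```python
# Budgets from the table above.
BUDGETS = {
    "storage_gb": 10,
    "class_a_ops": 1_000_000,
    "class_b_ops": 10_000_000,
}
ALERT_PERCENT = 80  # alert at 80% of budget


def over_alert_threshold(usage: dict) -> list[str]:
    """Return the resources whose usage has reached the alert level."""
    # Percentages are compared with multiplication to avoid float rounding.
    return [
        name
        for name, budget in BUDGETS.items()
        if usage.get(name, 0) * 100 >= budget * ALERT_PERCENT
    ]


# Made-up usage figures, not real measurements.
usage = {"storage_gb": 8.5, "class_a_ops": 120_000, "class_b_ops": 8_000_000}
print(over_alert_threshold(usage))
```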

Orphaned file cleanup

Periodically check for orphaned artifacts (files no longer referenced by any database record):

-- Find artifact records that nothing references and that are older than 30 days
-- (Run monthly as maintenance)
SELECT r2_key, created_at, size_bytes
FROM artifacts
WHERE referenced_by IS NULL
  AND created_at < NOW() - INTERVAL '30 days';

Monitoring Checklist (Weekly)

Run this checklist every Monday morning:

External Status Pages

Bookmark these for incident response:

Service	Status page
Cloudflare	https://www.cloudflarestatus.com/
Supabase	https://status.supabase.com/
Upstash	https://status.upstash.com/
Anthropic	https://status.anthropic.com/
OpenAI	https://status.openai.com/