Operations Monitoring
GOVERN monitors the platform with Cloudflare Analytics, the Supabase and Upstash Redis dashboards, and Langfuse.
Primary observability surfaces
| Tool | What it monitors |
|---|---|
| Cloudflare Workers Analytics | API latency, error rates, CPU time, request volume |
| Cloudflare Pages Analytics | Web app traffic, error rates |
| Supabase Dashboard | Database query performance, connection pool, disk usage |
| Upstash Redis Dashboard | Cache hit rate, connection count, memory usage |
| Langfuse | AI call traces, model latency, token usage |
Key metrics to watch
| Metric | Warning | Critical |
|---|---|---|
| API P95 latency | > 300ms | > 1000ms |
| Error rate | > 0.5% | > 2% |
| Assessment throughput | Drop > 20% | Drop > 50% |
| DB connection pool | > 70% used | > 90% used |
| Redis memory | > 70% used | > 85% used |
| Disk usage (Supabase) | > 70% | > 85% |
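
As a rough illustration, the thresholds in the table above can be encoded as data and used to classify a metric reading. This is a minimal sketch, not GOVERN's alerting configuration: the metric keys, the `Threshold` class, and the `classify` function are hypothetical names, and the sketch assumes a reading breaches a threshold only when it is strictly greater than the listed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Values copied from the table above. Latency is in milliseconds,
# everything else is a percentage. The keys are hypothetical names
# chosen for this sketch, not identifiers used by GOVERN.
THRESHOLDS = {
    "api_p95_latency_ms": Threshold(warning=300, critical=1000),
    "error_rate_pct": Threshold(warning=0.5, critical=2),
    "assessment_throughput_drop_pct": Threshold(warning=20, critical=50),
    "db_pool_used_pct": Threshold(warning=70, critical=90),
    "redis_memory_used_pct": Threshold(warning=70, critical=85),
    "disk_used_pct": Threshold(warning=70, critical=85),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a single metric reading."""
    threshold = THRESHOLDS[metric]
    # Check the critical bound first so it takes precedence over warning.
    if value > threshold.critical:
        return "critical"
    if value > threshold.warning:
        return "warning"
    return "ok"
```

For example, `classify("error_rate_pct", 1.2)` returns `"warning"`, because 1.2% is above the 0.5% warning level but below the 2% critical level.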
Alerting
Critical and high-severity alerts are routed through PagerDuty. Medium- and low-severity alerts go to the #govern-ops Slack channel.
On-call rotation is managed in PagerDuty. The current on-call schedule is visible in the PagerDuty portal.
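
The routing rule described above can be sketched as a small function. This is an illustrative assumption about how the rule might be expressed in code; `Severity`, `route_alert`, and the destination constants are hypothetical names, and the actual routing is presumably configured in the alerting tools themselves.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Hypothetical destination labels for this sketch.
PAGERDUTY = "pagerduty"
SLACK_CHANNEL = "#govern-ops"


def route_alert(severity: Severity) -> str:
    """Return the destination for an alert of the given severity."""
    # Critical and high severity go to PagerDuty; medium and low go to Slack.
    if severity in (Severity.CRITICAL, Severity.HIGH):
        return PAGERDUTY
    return SLACK_CHANNEL
```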
Synthetic monitoring
Synthetic health checks run every 60 seconds from three regions (US, EU, APAC):
GET https://govern-api.archetypal.ai/health

GET https://govern-dashboard.pages.dev

GET https://govern-docs.pages.dev

Failure triggers a PagerDuty incident immediately.
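
For local verification, one pass of the same three checks can be approximated with the Python standard library. This is a minimal sketch under stated assumptions: the source does not define what counts as a failure, so the sketch treats any non-2xx status or connection error as one. `fetch_status` and `run_checks` are hypothetical names, and the 60-second schedule, multi-region execution, and PagerDuty integration are not modeled.

```python
import urllib.error
import urllib.request
from typing import Callable

# Endpoints listed in the synthetic monitoring section.
ENDPOINTS = [
    "https://govern-api.archetypal.ai/health",
    "https://govern-dashboard.pages.dev",
    "https://govern-docs.pages.dev",
]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return 0


def run_checks(fetch: Callable[[str], int] = fetch_status) -> list[str]:
    """Run one pass over ENDPOINTS and return the URLs that failed.

    Assumption: a check fails when the status is outside the 2xx range.
    """
    return [url for url in ENDPOINTS if not 200 <= fetch(url) < 300]
```

Passing `fetch` as a parameter keeps the pass/fail logic testable without network access. Calling `run_checks()` with no arguments performs real requests and returns an empty list when all three endpoints respond with a 2xx status.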