Testing Dashboard — Overview
The Testing Dashboard (Surface 01) is the internal pre-release validation environment for the GOVERN platform, deployed at jarvis-dashboard-v6.pages.dev.
This is where all GOVERN surfaces are tested before they reach customers. The promotion path is:
Testing Dashboard (jarvis-dashboard-v6.pages.dev) → finalized testing → Internal Dashboard (Surface 02) → finalized for customer → Customer SOC (Surface 04)
Every SOC-style dashboard is first built and validated here, then promoted to the Internal Dashboard after testing passes, then copied to the Customer SOC surface for customer-facing deployment.
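The promotion path can be read as a small, strictly ordered state machine. The sketch below is only an illustration of that ordering: the `Surface` type, `PROMOTION_PATH`, and `nextSurface` are hypothetical names, not part of the GOVERN codebase, and "finalized" is modeled as a simple boolean.

```typescript
// Hypothetical names for illustration; only the ordering comes from the
// promotion path described above.
type Surface = "testing-dashboard" | "internal-dashboard" | "customer-soc";

const PROMOTION_PATH: readonly Surface[] = [
  "testing-dashboard",
  "internal-dashboard",
  "customer-soc",
];

// Returns the surface a build may be promoted to next, or null when the
// current stage is not finalized or the build is already on the last surface.
function nextSurface(current: Surface, finalized: boolean): Surface | null {
  if (!finalized) return null;
  const index = PROMOTION_PATH.indexOf(current);
  return PROMOTION_PATH[index + 1] ?? null;
}
```

A build on the Testing Dashboard with finalized testing would move to `"internal-dashboard"`; without finalization it stays where it is.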
Purpose
GOVERN is a governance platform — it certifies AI systems for customers. That means our own quality bar must be higher than theirs. The Testing Dashboard enforces this by requiring every component to pass a full validation stack before release.
The dashboard answers three questions:
- Does it work? Unit tests, integration tests, and e2e flows confirm functional correctness.
- Does it look right? Playwright visual regression confirms UI components haven’t regressed.
- Is it world-class? Gate II (V(Q) >= 85%) and Gate IV (5-point polish check) confirm quality meets the Home Standard.
Scope — The 10 GOVERN Components
Every GOVERN surface gets tested before release:
| # | Component | Test Focus |
|---|---|---|
| 01 | Testing Dashboard (this surface) | Meta — tests the test infrastructure itself |
| 02 | Internal Dashboard | Build event wiring, pipeline integrity, cost accuracy |
| 03 | Customer Dashboard | Assessment flows, report generation, benchmark display |
| 04 | GOVERN API | Route coverage, auth, response shapes, rate limits |
| 05 | Monitoring Probe (Docker) | Container build, agent detection, telemetry emit |
| 06 | Browser Extension | Content script injection, overlay rendering, data capture |
| 07 | OS Agent (Desktop) | Process monitoring, system event capture, telemetry |
| 08 | Mobile App | iOS/Android flows, offline behavior, sync |
| 09 | Developer SDK | Integration flows, TypeScript types, example apps |
| 10 | GOVERN Docs | Link integrity, content accuracy, search index |
Testing Philosophy
The Testing Dashboard follows the GOVERN convergence doctrine: no deploy without convergence, no convergence without evidence.
This means:
- Tests are not optional pre-commit theater; they are the evidence that convergence has been reached
- Every test failure is a blocked deploy, not a warning
- Visual regression captures what metrics cannot — that the product looks right, not just that it ran
- Probe tests run in Docker to match production deployment conditions exactly
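The "blocked deploy, not a warning" rule can be sketched as a stage runner that stops at the first failure instead of collecting warnings. This is a minimal illustration, not the dashboard's actual runner: `Stage`, `PipelineResult`, and `runPipeline` are hypothetical, the stage names simply echo the test types mentioned in this document, and stages are modeled as synchronous functions for brevity.

```typescript
// Hypothetical stage model; a real stage would shell out to a tool and
// would most likely be asynchronous.
interface Stage {
  name: string;
  run: () => boolean; // true means the stage passed
}

interface PipelineResult {
  passed: boolean;
  completed: string[];     // stages that passed before any failure
  failedAt: string | null; // first failing stage, if any
}

// Runs stages in order and stops at the first failure, so later stages
// never execute once the deploy is blocked.
function runPipeline(stages: Stage[]): PipelineResult {
  const completed: string[] = [];
  for (const stage of stages) {
    if (!stage.run()) {
      return { passed: false, completed, failedAt: stage.name };
    }
    completed.push(stage.name);
  }
  return { passed: true, completed, failedAt: null };
}

// Example run with stubbed stages: the integration stage fails.
const exampleRun = runPipeline([
  { name: "typecheck", run: () => true },
  { name: "unit", run: () => true },
  { name: "integration", run: () => false },
  { name: "visual-regression", run: () => true },
]);
```

In this example `exampleRun.failedAt` is `"integration"` and the visual regression stage is never reached.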
Quality Gates
Two constitutional quality gates govern every release:
Gate II: V(Q) >= 85%. The convergence score across all test dimensions must be 85% or higher. A score below 85% means the component is not ready to ship. No exceptions.
Gate IV: 5-Point Polish Check. Before any UI surface ships, all five points must pass:
- Scene moves at idle (ambient animation active)
- Orb is the interface, not a textarea
- Page matches the reference benchmark (archetypal-app.pages.dev)
- Energy layers L1/L2/L3 visible
- Text is ambient, not dominant
Any point failing means the UI is blocked until it passes.
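The two gates above can be sketched as simple predicates. The source does not define how V(Q) is computed, so this example assumes it is the unweighted mean of per-dimension scores on a 0 to 100 scale; that formula, and every identifier below, is an assumption made for illustration.

```typescript
const GATE_II_THRESHOLD = 85;

// ASSUMPTION: V(Q) is modeled as the unweighted mean of dimension scores
// (0-100). The real formula is not given in this document.
function convergenceScore(dimensionScores: number[]): number {
  if (dimensionScores.length === 0) return 0;
  const total = dimensionScores.reduce((sum, score) => sum + score, 0);
  return total / dimensionScores.length;
}

function passesGateII(dimensionScores: number[]): boolean {
  return convergenceScore(dimensionScores) >= GATE_II_THRESHOLD;
}

// Gate IV: one flag per checklist point. Field names are hypothetical.
interface PolishCheck {
  sceneMovesAtIdle: boolean;
  orbIsInterface: boolean;
  matchesReferenceBenchmark: boolean;
  energyLayersVisible: boolean;
  textIsAmbient: boolean;
}

// Lists the checklist points that failed; an empty list means Gate IV passes.
function failedPolishPoints(check: PolishCheck): string[] {
  const keys = Object.keys(check) as (keyof PolishCheck)[];
  return keys.filter((key) => !check[key]);
}

function passesGateIV(check: PolishCheck): boolean {
  return failedPolishPoints(check).length === 0;
}
```

Under the assumed formula, dimension scores of 92, 88, 79, and 90 average to 87.25 and pass Gate II, while 92, 80, 79, and 85 average to 84 and block the release. Returning the failed points rather than a bare boolean lets a gate status panel show which item is blocking.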
Test Execution Flow
Engineer pushes change
↓
Typecheck (pnpm typecheck): must return 0 errors
↓
Unit tests (pnpm test): all suites must pass
↓
Integration tests: API endpoints, database queries
↓
Playwright visual regression: screenshots compared
↓
API spot-checks: curl against staging endpoints
↓
Probe container test: Docker build + local proxy
↓
QA score calculated: must be >= 85%
↓
Polish check: 5-point UI checklist
↓
Gate opens → component eligible for release
Dashboard Layout
The Testing Dashboard UI shows:
- Test run status — current pass/fail for each component
- Coverage heatmap — which areas have the most test coverage
- Visual diff viewer — side-by-side Playwright screenshot comparison
- Gate status panel — Gate II score + Gate IV checklist per component
- Recent test history — last 10 runs per component with trends
- Failure detail drawer — full output, stack traces, and suggested fixes
Relationship to CI/CD
The Testing Dashboard is not a replacement for CI/CD — it is the quality layer that gates deployment. The flow is:
- CI (GitHub Actions) runs a fast subset of tests on every PR
- Testing Dashboard runs the full stack before release approval
- Release is blocked until Testing Dashboard shows green across all gates
The dashboard reads test results from CI artifacts and enriches them with visual regression, probe tests, and quality scoring that CI does not produce.
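One way to picture this enrichment is a function that combines a CI result with the dashboard-only checks and reports what still blocks release. The record shapes, field names, and blocker labels here are hypothetical; the document does not describe the actual artifact format.

```typescript
// Hypothetical shape of a per-component result read from CI artifacts.
interface CiArtifact {
  component: string;
  testsPassed: boolean;
}

// Hypothetical shape of the checks the dashboard adds on top of CI.
interface DashboardChecks {
  visualRegressionPassed: boolean;
  probeTestPassed: boolean;
  qualityScore: number; // 0-100, compared against the Gate II threshold
  polishCheckPassed: boolean;
}

interface ReleaseStatus {
  component: string;
  green: boolean;
  blockers: string[];
}

// Merges CI and dashboard results; any single blocker keeps the release closed.
function releaseStatus(ci: CiArtifact, checks: DashboardChecks): ReleaseStatus {
  const blockers: string[] = [];
  if (!ci.testsPassed) blockers.push("ci-tests");
  if (!checks.visualRegressionPassed) blockers.push("visual-regression");
  if (!checks.probeTestPassed) blockers.push("probe-test");
  if (checks.qualityScore < 85) blockers.push("gate-ii");
  if (!checks.polishCheckPassed) blockers.push("gate-iv");
  return { component: ci.component, green: blockers.length === 0, blockers };
}

// Example: CI is green, but the quality score is below the threshold.
const apiRelease = releaseStatus(
  { component: "GOVERN API", testsPassed: true },
  {
    visualRegressionPassed: true,
    probeTestPassed: true,
    qualityScore: 82,
    polishCheckPassed: true,
  },
);
```

Here `apiRelease.green` is `false` with `"gate-ii"` as the only blocker, which illustrates why a passing CI run alone is not enough for release approval.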