Test Intelligence v1: mapping every site to identify coverage gaps

2026-04-07 · v2.6.0 · Test Intelligence

TLDR

We mapped all 6 sites end to end — every page type, every analytics event, every navigation path. We now know exactly where our tests do not reach.

New tests are added with a quarantine period before they can fail CI, so the suites stay stable while coverage grows.

The change

The most significant change this cycle is in how we decide what to test. Until now our suites have grown incrementally: a gap is noticed, a test is written, the test ships. That approach works at a small scale and produces undetected coverage gaps at a larger one. With six sites and dozens of template families, those gaps accumulate silently. v1 replaces the incremental model with a systematic one.

The method is straightforward. Before writing any new test, we run a reconnaissance pass across all six sites and record everything it finds: URL patterns, template families, analytics events, navigation paths. We then diff that inventory against what our existing tests actually cover. The remaining uncovered surface becomes a prioritized work list, scored by business risk.
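The diff-and-prioritize step can be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the SurfaceItem shape, the surfaceKey and prioritizeGaps helpers, and the numeric riskScore values are all assumptions made for the example.

```typescript
// Hypothetical shape of one item recorded by the reconnaissance pass.
interface SurfaceItem {
  site: string;           // e.g. "blesk"
  templateFamily: string; // e.g. "gallery"
  riskScore: number;      // assumed business-risk score; higher means riskier
}

// Stable key used to compare the inventory with the covered surface.
function surfaceKey(site: string, templateFamily: string): string {
  return `${site}::${templateFamily}`;
}

// Return inventory items that no existing test covers, highest risk first.
function prioritizeGaps(inventory: SurfaceItem[], coveredKeys: Set<string>): SurfaceItem[] {
  return inventory
    .filter((item) => !coveredKeys.has(surfaceKey(item.site, item.templateFamily)))
    .sort((a, b) => b.riskScore - a.riskScore);
}

// Illustrative data only; scores are invented for the example.
const inventory: SurfaceItem[] = [
  { site: "blesk", templateFamily: "article", riskScore: 5 },
  { site: "blesk", templateFamily: "podcast", riskScore: 3 },
  { site: "e15", templateFamily: "video", riskScore: 8 },
];
const coveredKeys = new Set<string>([surfaceKey("blesk", "article")]);

const workList = prioritizeGaps(inventory, coveredKeys);
const workListKeys = workList.map((g) => surfaceKey(g.site, g.templateFamily));
console.log(workListKeys); // [ 'e15::video', 'blesk::podcast' ]
```

Keying on site and template family keeps the comparison cheap; a real inventory would presumably also key on analytics events and navigation paths.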

Two tools, two distinct roles

Firecrawl performs the reconnaissance. It crawls each site, extracts clean content, and runs interactive sessions in its own Playwright sandbox to capture real dataLayer events during navigation. It is fast, broad, and cheap to operate.

Firecrawl is explicitly not the source of truth. Every finding is treated as a hypothesis until MCP Playwright verifies it against the live DOM, confirming that the selector exists and that the event actually fires. Only verified findings become test assertions. If Firecrawl and Playwright disagree, Playwright is authoritative. This separation matters because reconnaissance tools reliably surface plausible findings that do not hold up against the running page.
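The hypothesis-then-verify rule can be expressed as a small filter. The sketch below is an assumption-laden illustration: ReconFinding, LiveCheck, LiveVerifier, and keepVerified are hypothetical names, the verifier is a synchronous stub standing in for a real (asynchronous) Playwright check, and the selectors and the promo_click event are invented sample data.

```typescript
// A reconnaissance finding: a hypothesis, not yet a fact.
interface ReconFinding {
  id: string;
  selector: string;
  eventName: string;
}

// Outcome of checking one finding against the live page (hypothetical shape).
interface LiveCheck {
  selectorExists: boolean;
  eventFired: boolean;
}

// Stand-in for the authoritative check; a real one would drive a browser.
type LiveVerifier = (finding: ReconFinding) => LiveCheck;

// Keep only findings the verifier fully confirms. The verifier's answer
// always wins over whatever reconnaissance reported.
function keepVerified(findings: ReconFinding[], verify: LiveVerifier): ReconFinding[] {
  return findings.filter((finding) => {
    const check = verify(finding);
    return check.selectorExists && check.eventFired;
  });
}

// Illustrative data: the second finding looks plausible but the event never fires.
const findings: ReconFinding[] = [
  { id: "gallery-open", selector: "[data-test=gallery]", eventName: "gallery_open" },
  { id: "promo-click", selector: ".promo-banner", eventName: "promo_click" },
];
const liveChecks: Record<string, LiveCheck> = {
  "gallery-open": { selectorExists: true, eventFired: true },
  "promo-click": { selectorExists: true, eventFired: false },
};
const stubVerify: LiveVerifier = (f) =>
  liveChecks[f.id] ?? { selectorExists: false, eventFired: false };

const assertable = keepVerified(findings, stubVerify);
const assertableIds = assertable.map((f) => f.id);
console.log(assertableIds); // [ 'gallery-open' ]
```

Findings the verifier knows nothing about are treated as unverified, so nothing reaches an assertion by default.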

Delivered in v1

Reconnaissance complete for all 6 sites: blesk, auto, e15, reflex, isport, opinio. 23 JSON artifacts in data/reconnaissance/, 27 template families classified.
Journey maps for all 6 sites in docs/site-journey-maps/. These are now the authoritative reference for user navigation on each site.
Cross-site events catalog in docs/events-catalog.md with explicit P0/P1/P2 contract decisions.
Gap analysis: 42 gaps identified, 12 high-priority. Coverage model with 288 entries tracking each (site × suite × template-family) combination as discovered, classified, testable, covered, quarantine, intentionally-untested, blocked, or deprecated.
Events suite expanded to per-page-type P0 (page_ready) and P1 (gallery_open) coverage for blesk and e15. 9 tests where previously there was one gallery-only spec.
Content registry updated with page types identified during reconnaissance: live scores, premium landing, opinio author and podcast. 124 entries, 235 site slots.
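One way to picture a coverage model entry is as a record keyed by site, suite, and template family, carrying one of the eight states listed above. The following is only a sketch: the CoverageEntry shape, the countByState helper, and the sample rows are assumptions, not the project's actual schema.

```typescript
// The eight states named in the coverage model.
const COVERAGE_STATES = [
  "discovered",
  "classified",
  "testable",
  "covered",
  "quarantine",
  "intentionally-untested",
  "blocked",
  "deprecated",
] as const;
type CoverageState = (typeof COVERAGE_STATES)[number];

// Hypothetical entry: one (site x suite x template-family) combination.
interface CoverageEntry {
  site: string;
  suite: string;
  templateFamily: string;
  state: CoverageState;
}

// Count entries per state so the open surface can be read at a glance.
function countByState(entries: CoverageEntry[]): Map<CoverageState, number> {
  const counts = new Map<CoverageState, number>();
  for (const state of COVERAGE_STATES) {
    counts.set(state, 0);
  }
  for (const entry of entries) {
    counts.set(entry.state, (counts.get(entry.state) ?? 0) + 1);
  }
  return counts;
}

// Illustrative rows only.
const sampleEntries: CoverageEntry[] = [
  { site: "blesk", suite: "events", templateFamily: "gallery", state: "covered" },
  { site: "e15", suite: "events", templateFamily: "article", state: "quarantine" },
  { site: "opinio", suite: "e2e", templateFamily: "podcast", state: "intentionally-untested" },
  { site: "isport", suite: "e2e", templateFamily: "live-scores", state: "blocked" },
];

const stateCounts = countByState(sampleEntries);
console.log(stateCounts.get("covered"), stateCounts.get("discovered")); // 1 0
```

Modeling the state as a closed union means an entry cannot silently drift into an unnamed status.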

The rules that keep scope contained

The discipline applied to discovery matters as much as the discovery itself. Each candidate test must clear a promotion gate (structurally distinct, business critical, analytics relevant, or historically failure-prone) and survive a kill rule (no duplicates of existing failure-mode coverage, no editorial-dependent assertions, no non-deterministic ad-server-dependent checks). Candidates that fail either check remain documented in the journey map but do not become tests. Deliberate non-coverage — gaps we consciously do not close — is treated as a first-class output of this work.
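The two checks compose into a simple decision. The sketch below assumes that meeting any one promotion criterion clears the gate and that any one kill condition rejects the candidate; the TestCandidate shape, the function names, and the sample candidates are hypothetical.

```typescript
// Hypothetical description of one candidate test.
interface TestCandidate {
  name: string;
  // Promotion gate criteria (assumed: any one is enough).
  structurallyDistinct: boolean;
  businessCritical: boolean;
  analyticsRelevant: boolean;
  historicallyFailureProne: boolean;
  // Kill rule conditions (assumed: any one rejects).
  duplicatesExistingCoverage: boolean;
  editorialDependent: boolean;
  adServerDependent: boolean;
}

type CandidateDecision = "becomes-test" | "documented-only";

function passesPromotionGate(c: TestCandidate): boolean {
  return (
    c.structurallyDistinct ||
    c.businessCritical ||
    c.analyticsRelevant ||
    c.historicallyFailureProne
  );
}

function hitByKillRule(c: TestCandidate): boolean {
  return c.duplicatesExistingCoverage || c.editorialDependent || c.adServerDependent;
}

// A candidate becomes a test only if it clears the gate and survives the kill rule.
function decideCandidate(c: TestCandidate): CandidateDecision {
  return passesPromotionGate(c) && !hitByKillRule(c) ? "becomes-test" : "documented-only";
}

// Illustrative candidates.
const premiumLanding: TestCandidate = {
  name: "premium landing page_ready",
  structurallyDistinct: true,
  businessCritical: true,
  analyticsRelevant: true,
  historicallyFailureProne: false,
  duplicatesExistingCoverage: false,
  editorialDependent: false,
  adServerDependent: false,
};
const adSlotCheck: TestCandidate = {
  ...premiumLanding,
  name: "ad slot fill on article",
  adServerDependent: true,
};

console.log(decideCandidate(premiumLanding)); // becomes-test
console.log(decideCandidate(adSlotCheck));    // documented-only
```

A "documented-only" result corresponds to staying in the journey map without becoming a test.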

New tests deploy to quarantine first. They are non-blocking until they record at least 5 consecutive passes across at least 2 run windows, at which point they are promoted to blocking. A promoted test that records 3 or more flakes within 14 days is demoted back to quarantine pending root-cause analysis. The rollout policy exists to prevent a large wave of new coverage from destabilizing suites that gate deploys.
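The promotion and demotion thresholds can be sketched as two predicates. This is an illustration under assumptions: RunResult, readyForPromotion, and shouldDemote are hypothetical names, run windows are represented as opaque string identifiers, and "within 14 days" is read as a trailing 14-day window ending at the evaluation time.

```typescript
// Hypothetical record of one run of a quarantined test, oldest first in a history.
interface RunResult {
  runWindow: string; // assumed window identifier, e.g. "2026-04-07-am"
  passed: boolean;
}

const MIN_CONSECUTIVE_PASSES = 5;
const MIN_RUN_WINDOWS = 2;
const FLAKE_LIMIT = 3;
const FLAKE_WINDOW_MS = 14 * 24 * 60 * 60 * 1000;

// True when the latest unbroken pass streak has at least 5 passes
// spread over at least 2 distinct run windows.
function readyForPromotion(history: RunResult[]): boolean {
  const windows = new Set<string>();
  let streak = 0;
  for (const run of [...history].reverse()) {
    if (!run.passed) break; // the streak ends at the most recent failure
    streak += 1;
    windows.add(run.runWindow);
  }
  return streak >= MIN_CONSECUTIVE_PASSES && windows.size >= MIN_RUN_WINDOWS;
}

// True when 3 or more flakes fall inside the trailing 14 days (timestamps in ms).
function shouldDemote(flakeTimestamps: number[], now: number): boolean {
  const recent = flakeTimestamps.filter((t) => t <= now && now - t <= FLAKE_WINDOW_MS);
  return recent.length >= FLAKE_LIMIT;
}

// Illustrative histories.
const pass = (runWindow: string): RunResult => ({ runWindow, passed: true });
const twoWindows = [pass("w1"), pass("w1"), pass("w1"), pass("w2"), pass("w2")];
const oneWindow = [pass("w1"), pass("w1"), pass("w1"), pass("w1"), pass("w1")];

console.log(readyForPromotion(twoWindows)); // true
console.log(readyForPromotion(oneWindow));  // false
```

Requiring distinct run windows guards against five quick passes in a single favorable environment.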

Planned in v1.1+

v1 scope was deliberately constrained. The remaining work is scoped and tracked in the TODO list:

Events expansion — extend P0/P1 coverage from blesk and e15 to the remaining four sites (auto, isport, reflex, opinio).
E2E expansion by layout family — new journey specs for video, podcast, search, author, and topic pages across all sites. The current 71 E2E tests are concentrated on article, gallery, and share flows; the gap analysis identifies the uncovered layout families.
User-flows for the remaining authentication pairs. blesk/isport and auto/e15 are verified. e15/reflex SSO and Opinio edge cases are next.
Smoke, PDT, and Shadow alignment. Targeted additions for critical page types currently missing from these suites.
Mobile deltas — tests for user journeys where the mobile structure differs materially from desktop. Responsive-only variations are excluded.
Visual baselines for new layout families with genuine rendering risk. We do not plan to add visual snapshots for every page variation.

Impact on existing tests

Some effects are immediate; others will emerge over subsequent cycles.

Immediate. The events suite now exercises page_ready per page type on blesk and e15, so analytics regressions on either site are no longer silent. The content registry now covers page types that were not previously documented. Anyone authoring a new test has six journey maps and a coverage model as an authoritative reference.

Ongoing. Every new test is filtered through the promotion gate and the kill rule. The expected outcome is that fewer tests will be written than the gap list suggests, which is intended. Each gap is closed along one of three paths: it becomes an assertion, it is documented as intentionally-untested, or it is marked "blocked: missing-selector-contract" and raised as a product instrumentation request. That last category is meaningful: selector instability caused by missing targeting hooks is a product issue, not a QA issue, and is now tracked separately instead of being absorbed into maintenance work.

A temporary, controlled increase in flakiness is expected during the quarantine phase of new coverage. The rollout policy is designed to absorb this without affecting suites that gate deploys.

References

Spec: docs/superpowers/specs/2026-04-04-test-intelligence-gap-closure.md
Plan: docs/superpowers/plans/2026-04-04-test-intelligence-v1.md
Changelog: CHANGELOG.md — v2.6.0