# URL Monitor System

| Key | Value |
|-----|-------|
| Status | Active |
| Owner | QA Automation |
| Updated | 2026-03-26 |
| Scope | Link crawling, broken-link reporting, and URL-quality monitoring across CNC sites |

The URL Monitor exists because broken links create a very different kind of failure from selector or journey regressions. The page can load. The UI can look fine. The user can still land on a dead page, a redirect loop, or a quietly expired campaign destination.

## What The URL Monitor Does

| Capability | Why It Matters |
|------------|----------------|
| crawl pages | finds the links the sites actually expose |
| extract links intelligently | keeps the signal closer to real user navigation |
| check URLs asynchronously | makes large checks practical |
| deduplicate results | avoids alert floods |
| track source pages | helps people find where the bad link came from |
| keep history | lets the team spot recurring URL issues |
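The "check URLs asynchronously" and "deduplicate results" capabilities above can be sketched together. This is a minimal illustration, not the production implementation: the function names and the injected `fetch` callable are assumptions made so the sketch needs no network access.

```python
import asyncio

async def check_url(url, fetch):
    """Check one URL; `fetch` is injected so this sketch needs no network."""
    status = await fetch(url)
    return url, status

async def check_all(urls, fetch, concurrency=10):
    """Deduplicate first, then check URLs concurrently under a semaphore."""
    unique = sorted(set(urls))  # dedup before checking: one probe per URL
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url):
        async with sem:
            return await check_url(url, fetch)

    results = await asyncio.gather(*(bounded(u) for u in unique))
    return {url: status for url, status in results}

# Stub fetcher standing in for real HTTP (hypothetical behavior).
async def fake_fetch(url):
    return 404 if "dead" in url else 200

if __name__ == "__main__":
    urls = ["https://a/x", "https://a/x", "https://a/dead"]
    report = asyncio.run(check_all(urls, fake_fetch))
    print(report)
```

The semaphore is what makes large checks practical: concurrency is bounded, so a big sweep does not open thousands of connections at once.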

## Crawl Depth Options

| Crawl Mode | Best For |
|------------|----------|
| basic crawl | quick health check |
| nav-focused crawl | menu and structural link paths |
| full crawl | deeper audits and larger sweeps |
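The three crawl modes above reduce to a depth-limited breadth-first walk. A minimal sketch, assuming a mode-to-depth mapping (the depth values here are illustrative, not the configured ones) and an injected `get_links(url)` so the example stays network-free:

```python
from collections import deque

# Hypothetical mode -> limit mapping; real values live in configuration.
CRAWL_MODES = {
    "basic": {"max_depth": 1},
    "nav": {"max_depth": 2},
    "full": {"max_depth": 4},
}

def crawl(start_url, get_links, mode="basic"):
    """Breadth-first crawl bounded by the mode's max_depth.

    `get_links(url)` returns the links found on a page; it is injected
    so the sketch does not depend on a real HTTP client.
    """
    max_depth = CRAWL_MODES[mode]["max_depth"]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    found = []
    while queue:
        url, depth = queue.popleft()
        found.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: record the page, don't expand it
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return found
```

With a toy site graph, `"basic"` stops at the start page's direct links while `"full"` keeps walking, which is the practical difference between a quick health check and a deeper audit.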

## What Makes It Useful

The system is not just a raw link checker. It keeps enough context to help someone act on the result.

| Useful Context | Why It Helps |
|----------------|--------------|
| source tracking | you know where the bad URL came from |
| history | you know whether this is new or repeating |
| deduplication | one broken endpoint does not become dozens of noisy alerts |
| scheduled operation | issues surface without someone manually crawling the sites |
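Source tracking and deduplication combine naturally: collapse findings to one record per broken destination while keeping every source page that referenced it. A sketch, assuming findings arrive as `(source_page, broken_url, status)` tuples (that shape is an assumption for illustration):

```python
from collections import defaultdict

def dedupe_findings(findings):
    """Collapse repeated broken-link findings into one record per
    destination URL, keeping every source page that referenced it.

    `findings`: iterable of (source_page, broken_url, status) tuples;
    this shape is assumed for the sketch, not taken from the real system.
    """
    grouped = defaultdict(lambda: {"status": None, "sources": set()})
    for source, url, status in findings:
        grouped[url]["status"] = status
        grouped[url]["sources"].add(source)
    return {
        url: {"status": rec["status"], "sources": sorted(rec["sources"])}
        for url, rec in grouped.items()
    }
```

One broken endpoint referenced from fifty pages becomes a single record with fifty sources, which is exactly the shape an alert should have.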

## Typical Problems It Catches

| Problem Type | Example |
|--------------|---------|
| broken navigation targets | menu item or footer link goes nowhere |
| expired campaign pages | old marketing or editorial landing page now returns 404 |
| backend 500s | link exists but destination is broken |
| decommissioned subdomains | links on live pages still point to a subdomain that no longer resolves |
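The problem types above map roughly onto HTTP status classes. A triage sketch; the category names and thresholds are assumptions for illustration, not the monitor's real taxonomy:

```python
def classify(status):
    """Rough triage of a failed check; categories are illustrative."""
    if status is None:
        return "unreachable"      # DNS failure, dead subdomain, timeout
    if status >= 500:
        return "backend error"    # link exists but destination is broken
    if status in (404, 410):
        return "missing page"     # expired campaign or removed content
    if status >= 400:
        return "client error"     # auth walls, bad requests, etc.
    return "ok"
```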

## Slack And Reporting

The URL Monitor should feel like an operator tool, not a crawler dump.

| Reporting Goal | How It Should Feel |
|----------------|--------------------|
| clear severity | is this one broken page or a broad issue? |
| source context | where was the link found? |
| trend awareness | is it recurring or brand new? |
| low noise | one incident, not fifty near-identical messages |
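The four reporting goals above can be folded into one message per broken destination. A sketch of what that formatting might look like; the field names, wording, and severity threshold are illustrative, not the production Slack format:

```python
def format_report(url, status, sources, seen_before):
    """Build one compact Slack-style message per broken destination.

    Severity, trend wording, and the 3-source threshold are assumptions
    made for this sketch.
    """
    severity = "broad" if len(sources) > 3 else "isolated"
    trend = "recurring" if seen_before else "new"
    plural = "s" if len(sources) != 1 else ""
    lines = [
        f"*Broken link:* {url} (HTTP {status})",
        f"*Severity:* {severity} ({len(sources)} source page{plural})",
        f"*Trend:* {trend}",
        "*Found on:* " + ", ".join(sources[:5]),  # cap listed sources
    ]
    return "\n".join(lines)
```

One message carries severity, source context, and trend, so the operator never has to reconstruct them from fifty near-identical alerts.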

## Why It Belongs In This Repo

Links are part of product quality. For news sites, bad links affect:

- reader trust
- editorial campaigns
- SEO
- subscription journeys
- archive content

That makes URL quality a real operational concern, not just a maintenance nicety.

## Good Operator Workflow

When the URL monitor reports problems:

1. check whether the failures cluster on one destination or one source surface
2. confirm whether the breakage is editorial, backend, or infrastructure
3. fix the source link if it is a content issue
4. escalate the destination if the source is correct but the page is broken

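Step 1 of the workflow above (checking whether failures cluster) is easy to sketch: count failures per destination host and per source page. This assumes the same hypothetical `(source_page, broken_url, status)` finding shape used for illustration:

```python
from collections import Counter
from urllib.parse import urlparse

def cluster_failures(findings):
    """Count failures per destination host and per source page so an
    operator can see whether breakage clusters on one destination or
    one source surface. The findings shape is an assumption.
    """
    by_destination = Counter(urlparse(url).netloc for _, url, _ in findings)
    by_source = Counter(source for source, _, _ in findings)
    return by_destination, by_source
```

Many failures on one host point at a destination problem to escalate; many failures from one source page point at a content fix on that page.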

## Related Pages

| Need | Go To |
|------|-------|
| general monitors | [Monitoring](../wiki/monitoring.md) |
| commands | [CLI Reference](../wiki/cli-reference.md) |
