## Roadmap (post v0.0.1)
Prioritized from easiest/low-risk to more involved work. Check off as we ship.
### Quick wins (target v0.0.2)
- [ ] Add crawl metadata (startedAt, finishedAt, durationMs)
- [ ] Include run parameters in report (maxDepth, concurrency, timeout, userAgent, sameHostOnly)
- [ ] Status histogram (2xx/3xx/4xx/5xx totals) in summary
- [ ] Normalize and dedupe trailing `/.` URL variants in output
- [ ] Add compact `reportSummary` text block to JSON
- [ ] Top external domains with counts
- [ ] Broken links sample (first N) + per-domain broken counts
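
As a rough illustration of the status histogram item, the sketch below buckets status codes by class. The function name `status_histogram` and the `other` bucket (for failed fetches or unexpected codes) are assumptions for illustration, not part of the current report format.

```python
def status_histogram(status_codes):
    """Count HTTP status codes per class (2xx/3xx/4xx/5xx).

    Anything outside 200-599, including None for a failed fetch,
    lands in a hypothetical "other" bucket.
    """
    buckets = {"2xx": 0, "3xx": 0, "4xx": 0, "5xx": 0, "other": 0}
    for code in status_codes:
        if isinstance(code, int) and 200 <= code <= 599:
            buckets[f"{code // 100}xx"] += 1
        else:
            buckets["other"] += 1
    return buckets
```

Keeping every bucket present even when its count is zero helps the summary keep a fixed shape from run to run.
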
### Moderate scope
- [ ] Robots.txt summary (present, fetchedAt, sample disallow rules)
- [ ] Sitemap extras (index → child sitemaps, fetch errors)
- [ ] Per-page response time (responseTimeMs) and content length
- [ ] Basic page metadata: `<title>`, canonical (if present)
- [ ] Depth distribution (count of pages by depth)
- [ ] Duplicate title/canonical detection (lists of URLs)
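
Duplicate title detection could be as simple as grouping URLs by title. This is a minimal sketch that assumes each page record is a dict with hypothetical `url` and `title` keys; the real page shape may differ.

```python
from collections import defaultdict

def duplicate_titles(pages):
    """Return {title: [urls]} for titles shared by more than one page."""
    by_title = defaultdict(list)
    for page in pages:
        # Missing or blank titles are skipped rather than grouped together.
        title = (page.get("title") or "").strip()
        if title:
            by_title[title].append(page["url"])
    # Sort titles and URLs so the output is stable between runs.
    return {
        title: sorted(urls)
        for title, urls in sorted(by_title.items())
        if len(urls) > 1
    }
```

The same grouping would work for canonical URLs by swapping the key.
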
### Content/asset analysis
- [ ] Extract assets (images/css/js) per page with status/type/size
- [ ] Mixed-content detection (http assets on https pages)
- [ ] Image accessibility metric (alt present ratio)
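
For mixed-content detection, one possible check compares URL schemes. The sketch assumes asset URLs have already been resolved to absolute form; `find_mixed_content` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def find_mixed_content(page_url, asset_urls):
    """List assets loaded over http from a page served over https."""
    if urlsplit(page_url).scheme != "https":
        return []  # mixed content only applies to https pages
    # urlsplit lowercases the scheme; relative URLs have an empty scheme
    # and are therefore not flagged.
    return [url for url in asset_urls if urlsplit(url).scheme == "http"]
```
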
### Security and quality signals
- [ ] Security headers by host (HSTS, CSP, X-Frame-Options, Referrer-Policy)
- [ ] Insecure forms (http action on https page)
- [ ] Large pages and slow pages (p95 thresholds) summary
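
One reading of "p95 thresholds" is to flag pages whose response time exceeds the run's own 95th percentile. The sketch below uses the nearest-rank percentile method, which is an assumption; the `url` key and function names are hypothetical, and `responseTimeMs` follows the field named earlier in this roadmap.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]

def slow_pages(pages, pct=95):
    """Return (threshold, urls) for pages slower than the pct-th percentile."""
    threshold = percentile([p["responseTimeMs"] for p in pages], pct)
    slow = [p["url"] for p in pages if p["responseTimeMs"] > threshold]
    return threshold, slow
```

The same helper could be applied to content length for the large-pages half of the item.
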
### Link behavior and graph
- [ ] Redirect map (from → to, hops; count summary)
- [ ] Indegree/outdegree stats; small graph summary
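
Indegree and outdegree can be derived from a flat list of link edges. This sketch assumes edges are `(source_url, target_url)` tuples and counts each distinct edge once; both choices are illustrative.

```python
from collections import Counter

def degree_stats(edges):
    """Return (indegree, outdegree) Counters keyed by URL."""
    unique_edges = set(edges)  # repeated links between the same pair count once
    outdegree = Counter(src for src, _ in unique_edges)
    indegree = Counter(dst for _, dst in unique_edges)
    return indegree, outdegree
```

Because `Counter` returns 0 for missing keys, pages with no inbound links can be looked up without special handling.
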
### Outputs and UX
- [ ] CSV exports: pages.csv, links.csv, assets.csv
- [ ] NDJSON export option for streaming pipelines
- [ ] Optional: include file/line anchors in JSON for large outputs
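
NDJSON is one JSON object per line, which lets downstream tools process records as they arrive. A minimal sketch, assuming records are plain dicts (the `ndjson_lines` name is hypothetical):

```python
import json

def ndjson_lines(records):
    """Yield one compact JSON document per record, newline-terminated."""
    for record in records:
        # json.dumps never emits raw newlines, so each record stays on one line.
        yield json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
```

A caller could stream the result with `sys.stdout.writelines(ndjson_lines(pages))` instead of building the whole report in memory.
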
### Notes
- Keep JSON stable and sorted; avoid breaking changes. If we change fields, bump minor version and document in `reports/REPORT_SCHEMA.md`.
- Favor opt-in flags for heavier analyses (assets, headers) to keep default runs fast.
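
To illustrate what "stable and sorted" could mean in practice, the sketch below serializes a report deterministically: keys are sorted and pages are ordered by URL. The `pages` and `url` names are assumptions about the report shape.

```python
import json

def dump_report(report):
    """Serialize a report so identical crawls produce identical JSON text."""
    stable = dict(report)  # shallow copy; the caller's report is not mutated
    stable["pages"] = sorted(report.get("pages", []), key=lambda p: p["url"])
    return json.dumps(stable, sort_keys=True, indent=2, ensure_ascii=False)
```

Deterministic output keeps diffs between runs meaningful, since only real changes show up.
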