## Roadmap (post v0.0.1) Prioritized from easiest/low-risk to more involved work. Check off as we ship. ### Quick wins (target v0.0.2) - [x] Add crawl metadata (startedAt, finishedAt, durationMs) - [x] Include run parameters in report (maxDepth, concurrency, timeout, userAgent, sameHostOnly) - [x] Status histogram (2xx/3xx/4xx/5xx totals) in summary - [x] Normalize and dedupe trailing `/.` URL variants in output - [ ] Add compact `reportSummary` text block to JSON - [ ] Top external domains with counts - [ ] Broken links sample (first N) + per-domain broken counts ### Moderate scope - [ ] Robots.txt summary (present, fetchedAt, sample disallow rules) - [ ] Sitemap extras (index → child sitemaps, fetch errors) - [ ] Per-page response time (responseTimeMs) and content length - [ ] Basic page metadata: ``, canonical (if present) - [ ] Depth distribution (count of pages by depth) - [ ] Duplicate title/canonical detection (lists of URLs) ### Content/asset analysis - [ ] Extract assets (images/css/js) per page with status/type/size - [ ] Mixed-content detection (http assets on https pages) - [ ] Image accessibility metric (alt present ratio) ### Security and quality signals - [ ] Security headers by host (HSTS, CSP, X-Frame-Options, Referrer-Policy) - [ ] Insecure forms (http action on https page) - [ ] Large pages and slow pages (p95 thresholds) summary ### Link behavior and graph - [ ] Redirect map (from → to, hops; count summary) - [ ] Indegree/outdegree stats; small graph summary ### Outputs and UX - [ ] CSV exports: pages.csv, links.csv, assets.csv - [ ] NDJSON export option for streaming pipelines - [ ] Optional: include file/line anchors in JSON for large outputs ### Notes - Keep JSON stable and sorted; avoid breaking changes. If we change fields, bump minor version and document in `reports/REPORT_SCHEMA.md`. - Favor opt-in flags for heavier analyses (assets, headers) to keep default runs fast.