Roadmap (post v0.0.1)
Prioritized from easiest/low-risk to more involved work. Check off as we ship.
Quick wins (target v0.0.2)
- Add crawl metadata (startedAt, finishedAt, durationMs)
- Include run parameters in report (maxDepth, concurrency, timeout, userAgent, sameHostOnly)
- Status histogram (2xx/3xx/4xx/5xx totals) in summary
- Normalize and dedupe trailing `/.` URL variants in output
- Add compact `reportSummary` text block to JSON
- Top external domains with counts
- Broken links sample (first N) + per-domain broken counts
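
Two of the quick wins above (the status histogram and the trailing-slash dedupe) could be sketched roughly as follows. Python is used only for illustration, and the function names are hypothetical, not part of the current codebase. Treating `/docs`, `/docs/`, and `/docs/.` as the same page is an assumed policy, since a server may serve them differently.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def status_histogram(status_codes):
    """Count responses per status class (2xx/3xx/4xx/5xx)."""
    counts = Counter(
        f"{code // 100}xx" for code in status_codes if 200 <= code <= 599
    )
    # Always emit all four buckets so the summary shape stays stable.
    return {bucket: counts.get(bucket, 0) for bucket in ("2xx", "3xx", "4xx", "5xx")}


def normalize_url(url):
    """Collapse trailing "/" and "/." path variants to one form (assumed policy)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/."):
        path = path[:-1]         # "/docs/." -> "/docs/"
    if len(path) > 1:
        path = path.rstrip("/")  # "/docs/" -> "/docs"
    if path == "":
        path = "/"               # keep the site root as "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def dedupe_urls(urls):
    """Return the sorted set of URLs after normalization."""
    return sorted({normalize_url(u) for u in urls})
```

Emitting all four buckets even when a count is zero keeps the summary easy to diff between runs.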
Moderate scope
- Robots.txt summary (present, fetchedAt, sample disallow rules)
- Sitemap extras (index → child sitemaps, fetch errors)
- Per-page response time (responseTimeMs) and content length
- Basic page metadata: `<title>`, canonical (if present)
- Depth distribution (count of pages by depth)
- Duplicate title/canonical detection (lists of URLs)
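
As a rough sketch of the depth distribution and duplicate-title items, assuming each page record is a dict with hypothetical `url`, `depth`, and `title` keys (the real report schema may differ):

```python
from collections import Counter, defaultdict


def depth_distribution(pages):
    """Map crawl depth -> number of pages found at that depth."""
    counts = Counter(page["depth"] for page in pages)
    return {depth: counts[depth] for depth in sorted(counts)}


def duplicate_titles(pages):
    """Map each title shared by two or more pages to the sorted URLs using it."""
    by_title = defaultdict(list)
    for page in pages:
        title = (page.get("title") or "").strip()
        if title:  # pages without a title are not treated as duplicates
            by_title[title].append(page["url"])
    return {
        title: sorted(urls)
        for title, urls in sorted(by_title.items())
        if len(urls) > 1
    }
```

The same grouping would apply to canonical URLs by swapping the key that is read from each record.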
Content/asset analysis
- Extract assets (images/css/js) per page with status/type/size
- Mixed-content detection (http assets on https pages)
- Image accessibility metric (alt present ratio)
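
A minimal sketch of the mixed-content and alt-ratio checks, using only the Python standard library. The class name, function name, and output keys are hypothetical. It only inspects `img`, `script`, and stylesheet `link` tags, and it counts an empty `alt=""` as present, which is an assumption about how the metric should be defined.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AssetAudit(HTMLParser):
    """Collects mixed-content references and image alt coverage for one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_is_https = urlsplit(page_url).scheme == "https"
        self.mixed = []
        self.images = 0
        self.images_with_alt = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        ref = None
        if tag == "img":
            self.images += 1
            if "alt" in attr:  # assumption: empty alt still counts as present
                self.images_with_alt += 1
            ref = attr.get("src")
        elif tag == "script":
            ref = attr.get("src")
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower().split():
            ref = attr.get("href")
        if ref and self.page_is_https:
            # Resolve relative references before checking the scheme.
            absolute = urljoin(self.page_url, ref)
            if urlsplit(absolute).scheme == "http":
                self.mixed.append(absolute)


def audit_page(page_url, html):
    """Return mixed-content URLs and the alt-present ratio for one page."""
    audit = AssetAudit(page_url)
    audit.feed(html)
    ratio = audit.images_with_alt / audit.images if audit.images else None
    return {
        "mixedContent": audit.mixed,
        "imageCount": audit.images,
        "altPresentRatio": ratio,
    }
```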
Security and quality signals
- Security headers by host (HSTS, CSP, X-Frame-Options, Referrer-Policy)
- Insecure forms (http action on https page)
- Large pages and slow pages (p95 thresholds) summary
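
The header check and the p95 threshold could look something like the sketch below. The helper names are hypothetical. It assumes response headers arrive as a name-to-value mapping and uses the nearest-rank definition of the 95th percentile, which is one of several common definitions.

```python
# Header names behind the abbreviations in the list above.
SECURITY_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "Content-Security-Policy",    # CSP
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(headers):
    """Return the tracked security headers absent from one response."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return [name for name in SECURITY_HEADERS if name.lower() not in present]


def p95(values):
    """Nearest-rank 95th percentile; None for an empty input."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Pages whose `responseTimeMs` or content length exceeds the corresponding `p95` value could then be listed in the "slow" or "large" summary.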
Link behavior and graph
- Redirect map (from → to, hops; count summary)
- Indegree/outdegree stats; small graph summary
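
For the indegree/outdegree item, a small sketch assuming links are available as `(from_url, to_url)` pairs (a hypothetical shape, not the current output format):

```python
from collections import Counter


def degree_stats(edges):
    """Return per-URL in/out link counts, ignoring repeated identical links."""
    unique = set(edges)
    outdeg = Counter(src for src, _ in unique)
    indeg = Counter(dst for _, dst in unique)
    nodes = {node for edge in unique for node in edge}
    # Counter returns 0 for URLs that never appear on one side of an edge.
    return {node: {"in": indeg[node], "out": outdeg[node]} for node in sorted(nodes)}
```

Pages with an indegree of zero, other than the start URL, are candidates for an "orphan pages" note in the graph summary.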
Outputs and UX
- CSV exports: pages.csv, links.csv, assets.csv
- NDJSON export option for streaming pipelines
- Optional: include file/line anchors in JSON for large outputs
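
The NDJSON export amounts to writing one JSON object per line, so consumers can process records as they arrive. A minimal sketch, with a hypothetical `write_ndjson` helper:

```python
import json


def write_ndjson(records, stream):
    """Write each record as one compact JSON line; return the number written."""
    count = 0
    for record in records:
        # Compact separators and sorted keys keep each line short and stable.
        line = json.dumps(record, sort_keys=True, separators=(",", ":"))
        stream.write(line + "\n")
        count += 1
    return count
```

Because `records` can be any iterable, a generator that yields pages during the crawl would stream without holding the full report in memory.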
Notes
- Keep JSON stable and sorted; avoid breaking changes. If we change fields, bump minor version and document in `reports/REPORT_SCHEMA.md`.
- Favor opt-in flags for heavier analyses (assets, headers) to keep default runs fast.
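
One way to keep JSON output stable and sorted is to sort both object keys and the page list before serializing, so two crawls of an unchanged site produce byte-identical reports. A sketch assuming a hypothetical top-level `pages` list whose entries carry a `url` key:

```python
import json


def stable_report(report):
    """Serialize a report with sorted keys and pages ordered by URL."""
    ordered = dict(report)  # shallow copy so the caller's dict is untouched
    ordered["pages"] = sorted(report.get("pages", []), key=lambda page: page["url"])
    return json.dumps(ordered, sort_keys=True, indent=2) + "\n"
```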