1.2 KiB
1.2 KiB
Roadmap (post v0.0.2)
Prioritized from easiest/low-risk to more involved work. Check off as we ship.
Shipped in v0.0.2
- Add crawl metadata (startedAt, finishedAt, durationMs)
- Include run parameters in report (maxDepth, concurrency, timeout, userAgent, sameHostOnly)
- Status histogram (2xx/3xx/4xx/5xx totals) in summary
- Normalize and dedupe trailing
/.
URL variants in output - Add compact
reportSummary
text block to JSON - Top external domains with counts
- Broken links sample (first N) + per-domain broken counts
- Robots.txt summary (present, fetchedAt)
- Sitemap extras (index → child sitemaps, fetch errors)
- Per-page response time (responseTimeMs) and content length (basic)
- Basic page metadata:
<title>
- Depth distribution (count of pages by depth)
- Redirect map summary (from → to domain counts)
Next (target v0.0.3)
- CSV exports: pages.csv, links.csv
- NDJSON export option for streaming pipelines
Notes
- All report metrics must be gathered by default with zero flags required.
- Keep JSON stable and sorted; update
reports/REPORT_SCHEMA.md
when fields change.