Commit Graph

1 Commit

Author: Leopere
SHA1: 5b49685ae9
Message:
Harden resilience: auto-restart harvester, poison-safe mutexes, graceful shutdown
- Replace all Mutex::lock().unwrap() with lock_or_recover() that recovers
  from poisoned mutexes instead of panicking (cascading failure prevention)
- Wrap harvester loop in catch_unwind with a supervisor thread that
  automatically restarts it after a panic (requires panic=unwind in release profile)
- Add exponential backoff with jitter for camera reconnection (2s base,
  60s cap) instead of fixed 10s intervals
- Enforce frame deadline: frames exceeding FRAME_TIMEOUT are treated as
  errors rather than just logged
- Add graceful shutdown via SIGINT/SIGTERM with axum's
  with_graceful_shutdown
- Track harvester restart count via AtomicU64 for diagnostics
- Extract docs/MCP handlers into src/docs_handlers.rs to keep main.rs
  under 400 lines
- Change release profile from panic=abort to panic=unwind so
  catch_unwind actually works in production
- Add tokio signal feature for shutdown handling

Co-authored-by: Cursor <cursoragent@cursor.com>
Date: 2026-02-09 13:47:23 -05:00
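
Two of the changes listed in this commit, the poison-safe lock helper and the reconnection backoff, can be illustrated with a minimal std-only sketch. This is not the code from the commit: only the name lock_or_recover, the 2s base, and the 60s cap come from the message above. The helper body, the names BASE_DELAY, MAX_DELAY, and backoff_delay, the jitter_unit parameter, and the jitter formula (scaling the capped delay into the range 50% to 100%) are all assumptions made for illustration.

```rust
use std::sync::{Mutex, MutexGuard};
use std::time::Duration;

/// Lock a mutex, recovering the guard if an earlier holder panicked.
/// Assumed body: the commit names `lock_or_recover` but does not show it.
fn lock_or_recover<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    // A poisoned lock still owns its data; `into_inner` on the error
    // hands back the guard instead of propagating the panic.
    m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

// Values taken from the commit message; the constant names are hypothetical.
const BASE_DELAY: Duration = Duration::from_secs(2);
const MAX_DELAY: Duration = Duration::from_secs(60);

/// Delay before reconnect attempt `attempt` (0-based).
/// `jitter_unit` is a caller-supplied random value in [0.0, 1.0];
/// passing it in keeps this sketch free of a random-number dependency.
fn backoff_delay(attempt: u32, jitter_unit: f64) -> Duration {
    // 2^attempt, saturating once the shift would overflow a u32.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // Exponential growth, clamped to the 60s cap.
    let capped = BASE_DELAY.saturating_mul(factor).min(MAX_DELAY);
    // Assumed jitter scheme: scale the capped delay into [50%, 100%].
    let j = if jitter_unit.is_finite() {
        jitter_unit.clamp(0.0, 1.0)
    } else {
        1.0
    };
    capped.mul_f64(0.5 + 0.5 * j)
}
```

Recovering a poisoned guard assumes the protected data is still usable after the panic. That is a design trade-off, and state that can be left half-updated may need to be reset after recovery.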