sharded-gotify/benchmark/MEMORY_ANALYSIS.md


Memory Usage Analysis

Current Memory Per Connection: ~75KB

This is higher than ideal. Let's break down where the memory is going:

Per-Connection Memory Breakdown

  1. WebSocket Buffers: 16KB ⚠️ Largest contributor

    • Read buffer: 8KB
    • Write buffer: 8KB
    • These are allocated per connection regardless of usage
  2. Channel Buffer: ~2KB

    • 10 messages * ~200 bytes per message
    • Helps prevent blocking but uses memory
  3. Goroutine Stacks: ~4KB

    • 2 goroutines per connection (read + write handlers)
    • ~2KB stack per goroutine (default Go stack size)
  4. Client Struct: ~100 bytes

    • Minimal overhead
  5. Map Overhead: Variable

    • Nested map structure: map[uint]map[string][]*client
    • Each map level has hash table overhead
    • Pointer storage overhead
  6. Go Runtime Overhead: ~2-4KB

    • GC metadata
    • Runtime structures
  7. Docker/System Overhead: Shared

    • Container base memory
    • System libraries

Sharding Structure Analysis

Current Structure:

```go
map[uint]map[string][]*client  // userID -> token -> []client
```

Memory Impact:

  • Good: Sharding reduces lock contention significantly
  • ⚠️ Concern: Nested maps add overhead
    • Each map carries its own header and hash-bucket structures
    • Buckets hold 8 key/value slots plus metadata, so sparsely filled maps waste space
    • For 256 shards with a sparse distribution, this adds up

Is Sharding Okay?

  • Yes, sharding is necessary for performance at scale
  • ⚠️ But we could optimize the structure for memory efficiency

Optimization Opportunities

1. Reduce Buffer Sizes (Quick Win)

Current: 8KB read + 8KB write = 16KB
Optimized: 2KB read + 2KB write = 4KB
Savings: ~12KB per connection (~16% of the 75KB total)

Trade-off: More syscalls, but acceptable for most use cases
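Assuming the server uses gorilla/websocket (the library Gotify is built on), the buffer sizes are set on the Upgrader; a fragment sketching the reduced values:

```go
// Sketch: smaller per-connection buffers on the gorilla/websocket Upgrader.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  2048, // was 8192
	WriteBufferSize: 2048, // was 8192
}
```

The library also exposes a WriteBufferPool field for sharing write buffers across connections, which can cut this further for mostly idle clients.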

2. Flatten Map Structure (Memory Optimization)

Current: map[uint]map[string][]*client
Optimized: map[string][]*client with a composite key (a slice is still needed, since several sockets can share one token)
Savings: eliminates one level of map overhead

Trade-off: Slightly more complex key generation, but lower memory overhead

3. Reduce Channel Buffer Size

Current: 10 messages
Optimized: 5 messages
Savings: ~1KB per connection

Trade-off: Slightly higher chance of blocking, but usually acceptable

4. Connection Pooling (Advanced)

Reuse connections or reduce goroutine overhead

Option A: Quick Memory Reduction (Easy)

```yaml
# Reduce buffer sizes
readbuffersize: 2048    # from 8192
writebuffersize: 2048   # from 8192
channelbuffersize: 5    # from 10
```

Expected savings: ~13KB per connection (~75% reduction in buffer memory)

Option B: Structure Optimization (Medium)

Flatten the nested map structure to reduce overhead:

```go
// Instead of: map[uint]map[string][]*client
// Use: map[string][]*client with key = fmt.Sprintf("%d:%s", userID, token)
```

Expected: ~2-5KB per connection savings

Option C: Hybrid Approach (Best)

Combine buffer reduction + structure optimization.
Expected savings: ~15-20KB per connection (down from ~75KB to ~55-60KB)

Real-World Expectations

For 1M connections:

  • Current: ~75GB (75KB * 1M)
  • Optimized: ~55-60GB (55KB * 1M)
  • Savings: ~15-20GB

For 10M connections:

  • Current: ~750GB (not feasible)
  • Optimized: ~550-600GB (still large, but more manageable)

Conclusion

Sharding is good: it's essential for performance. The memory issue comes from:

  1. Large WebSocket buffers (16KB) - biggest issue
  2. Nested map overhead - moderate issue
  3. Channel buffers - minor issue

Recommendation:

  1. Keep sharding (it's working well)
  2. ⚠️ Reduce buffer sizes for memory-constrained environments
  3. ⚠️ Consider flattening map structure if memory is critical
  4. Test with reduced buffers to validate performance

The 75KB figure includes Docker and Go runtime overhead. Actual application memory per connection is likely ~25-30KB, which is more reasonable but could still be optimized.