sharded-gotify/benchmark/MEMORY_ANALYSIS.md


Memory Usage Analysis

Current Memory Per Connection: ~75KB

This is higher than ideal. Let's break down where the memory is going:

Per-Connection Memory Breakdown

  1. WebSocket Buffers: 16KB ⚠️ Largest contributor

    • Read buffer: 8KB
    • Write buffer: 8KB
    • These are allocated per connection regardless of usage
  2. Channel Buffer: ~2KB

    • 10 messages * ~200 bytes per message
    • Helps prevent blocking but uses memory
  3. Goroutine Stacks: ~4KB

    • 2 goroutines per connection (read + write handlers)
    • ~2KB stack per goroutine (default Go stack size)
  4. Client Struct: ~100 bytes

    • Minimal overhead
  5. Map Overhead: Variable

    • Nested map structure: map[uint]map[string][]*client
    • Each map level has hash table overhead
    • Pointer storage overhead
  6. Go Runtime Overhead: ~2-4KB

    • GC metadata
    • Runtime structures
  7. Docker/System Overhead: Shared

    • Container base memory
    • System libraries

Sharding Structure Analysis

Current Structure:

```go
map[uint]map[string][]*client  // userID -> token -> []client
```

Memory Impact:

  • Good: Sharding reduces lock contention significantly
  • ⚠️ Concern: Nested maps add overhead
    • Each map carries its own header and hash-bucket structures
    • Buckets hold 8 key/value slots plus metadata, so sparsely filled maps waste space
    • For 256 shards with a sparse distribution, this adds up

Is Sharding Okay?

  • Yes, sharding is necessary for performance at scale
  • ⚠️ But we could optimize the structure for memory efficiency

Optimization Opportunities

1. Reduce Buffer Sizes (Quick Win)

Current: 8KB read + 8KB write = 16KB
Optimized: 2KB read + 2KB write = 4KB
Savings: ~12KB per connection (~16% of the 75KB total)

Trade-off: More syscalls, but acceptable for most use cases
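Assuming the server uses gorilla/websocket (the library Gotify is built on), the buffer sizes are set on the Upgrader; a fragment sketching the reduced values:

```go
// Sketch: smaller per-connection buffers on the gorilla/websocket Upgrader.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  2048, // was 8192
	WriteBufferSize: 2048, // was 8192
}
```

The library also exposes a WriteBufferPool field for sharing write buffers across connections, which can cut this further for mostly idle clients.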

2. Flatten Map Structure (Memory Optimization)

Current: map[uint]map[string][]*client
Optimized: map[string][]*client with a composite key (a slice is still needed, since several sockets can share one token)
Savings: eliminates one level of map overhead

Trade-off: Slightly more complex key generation, but lower memory overhead

3. Reduce Channel Buffer Size

Current: 10 messages
Optimized: 5 messages
Savings: ~1KB per connection

Trade-off: Slightly higher chance of blocking, but usually acceptable

4. Connection Pooling (Advanced)

Reuse connections or reduce goroutine overhead

Option A: Quick Memory Reduction (Easy)

```yaml
# Reduce buffer sizes
readbuffersize: 2048    # from 8192
writebuffersize: 2048   # from 8192
channelbuffersize: 5    # from 10
```

Expected savings: ~13KB per connection (~75% reduction in buffer memory)

Option B: Structure Optimization (Medium)

Flatten the nested map structure to reduce overhead:

```go
// Instead of: map[uint]map[string][]*client
// Use: map[string][]*client with key = fmt.Sprintf("%d:%s", userID, token)
```

Expected: ~2-5KB per connection savings

Option C: Hybrid Approach (Best)

Combine buffer reduction + structure optimization.
Expected savings: ~15-20KB per connection (down from ~75KB to ~55-60KB)

Real-World Expectations

For 1M connections:

  • Current: ~75GB (75KB * 1M)
  • Optimized: ~55-60GB (55KB * 1M)
  • Savings: ~15-20GB

For 10M connections:

  • Current: ~750GB (not feasible)
  • Optimized: ~550-600GB (still large, but more manageable)

Conclusion

Sharding is good: it's essential for performance. The memory issue comes from:

  1. Large WebSocket buffers (16KB) - biggest issue
  2. Nested map overhead - moderate issue
  3. Channel buffers - minor issue

Recommendation:

  1. Keep sharding (it's working well)
  2. ⚠️ Reduce buffer sizes for memory-constrained environments
  3. ⚠️ Consider flattening map structure if memory is critical
  4. Test with reduced buffers to validate performance

The 75KB figure includes Docker and Go runtime overhead. Actual application memory per connection is likely ~25-30KB, which is more reasonable but could still be optimized.