## Memory Usage Analysis

### Current Memory Per Connection: ~75KB

This is higher than ideal. Let's break down where the memory is going.
### Per-Connection Memory Breakdown

- **WebSocket Buffers: 16KB** ⚠️ Largest contributor
  - Read buffer: 8KB
  - Write buffer: 8KB
  - These are allocated per connection regardless of usage
- **Channel Buffer: ~2KB**
  - 10 messages * ~200 bytes per message
  - Helps prevent blocking but uses memory
- **Goroutine Stacks: ~4KB**
  - 2 goroutines per connection (read + write handlers; sketched after this list)
  - ~2KB stack per goroutine (Go's default initial stack size)
- **Client Struct: ~100 bytes**
  - Minimal overhead
- **Map Overhead: Variable**
  - Nested map structure: `map[uint]map[string][]*client`
  - Each map level has hash table overhead
  - Pointer storage overhead
- **Go Runtime Overhead: ~2-4KB**
  - GC metadata
  - Runtime structures
- **Docker/System Overhead: Shared**
  - Container base memory
  - System libraries
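To make these numbers concrete, here is a minimal sketch of that per-connection shape: one struct, a buffered outbound channel, and a read/write goroutine pair. It assumes gorilla/websocket; the `client`, `readPump`, and `writePump` names are illustrative, not the actual implementation.

```go
package hub

import "github.com/gorilla/websocket"

// client is the per-connection unit from the breakdown above.
type client struct {
	conn *websocket.Conn // carries the 8KB read + 8KB write buffers
	send chan []byte     // buffered channel: 10 messages ≈ 2KB
}

func newClient(conn *websocket.Conn) *client {
	c := &client{
		conn: conn,
		send: make(chan []byte, 10),
	}
	go c.readPump()  // goroutine 1: ~2KB initial stack
	go c.writePump() // goroutine 2: ~2KB initial stack
	return c
}

// readPump drains inbound frames until the connection errors out.
func (c *client) readPump() {
	defer c.conn.Close()
	for {
		if _, _, err := c.conn.ReadMessage(); err != nil {
			return
		}
	}
}

// writePump serializes all writes through the send channel.
func (c *client) writePump() {
	defer c.conn.Close()
	for msg := range c.send {
		if err := c.conn.WriteMessage(websocket.TextMessage, msg); err != nil {
			return
		}
	}
}
```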
### Sharding Structure Analysis

Current structure:

```go
map[uint]map[string][]*client // userID -> token -> []client
```
Memory impact:

- ✅ Good: Sharding reduces lock contention significantly
- ⚠️ Concern: Nested maps add overhead
  - Each map has bucket overhead (~8 bytes per bucket)
  - Hash table structure overhead
  - For 256 shards with sparse distribution, this adds up
#### Is Sharding Okay?

- ✅ Yes, sharding is necessary for performance at scale
- ⚠️ But we could optimize the structure for memory efficiency (see the registry sketch below)
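One plausible shape for that sharded registry, assuming shards are selected by `userID % 256` and each shard owns its own lock; the shard-selection and locking details are assumptions, not taken from the actual code:

```go
package hub

import "sync"

const numShards = 256 // shard count discussed above

// shard owns one slice of the registry behind its own lock, so
// operations on different shards never contend.
type shard struct {
	mu      sync.RWMutex
	clients map[uint]map[string][]*client // userID -> token -> clients
}

type registry struct {
	shards [numShards]*shard
}

func newRegistry() *registry {
	r := &registry{}
	for i := range r.shards {
		r.shards[i] = &shard{clients: make(map[uint]map[string][]*client)}
	}
	return r
}

func (r *registry) add(userID uint, token string, c *client) {
	s := r.shards[userID%numShards]
	s.mu.Lock()
	defer s.mu.Unlock()
	byToken, ok := s.clients[userID]
	if !ok {
		byToken = make(map[string][]*client)
		s.clients[userID] = byToken
	}
	byToken[token] = append(byToken[token], c)
}
```

This makes the memory cost visible: every shard pre-allocates a hash table, and each populated user adds an inner map on top of the client slice itself.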
### Optimization Opportunities
#### 1. Reduce Buffer Sizes (Quick Win)

- Current: 8KB read + 8KB write = 16KB
- Optimized: 2KB read + 2KB write = 4KB
- Savings: ~12KB per connection (16% of the ~75KB total)

Trade-off: more syscalls, but acceptable for most use cases.
#### 2. Flatten Map Structure (Memory Optimization)

- Current: `map[uint]map[string][]*client`
- Optimized: `map[string][]*client` with a composite key
- Savings: eliminates one level of map overhead

Trade-off: slightly more complex key generation, but better memory usage.
#### 3. Reduce Channel Buffer Size

- Current: 10 messages
- Optimized: 5 messages
- Savings: ~1KB per connection

Trade-off: slightly higher chance of blocking, but usually acceptable (see the non-blocking send sketch below).
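With a smaller buffer, the usual guard is a non-blocking send that drops a slow client instead of stalling the sender. A minimal sketch, reusing the illustrative `client` type from earlier; dropping vs. queueing is a policy choice, not something the original code specifies:

```go
// trySend enqueues a message without blocking. A full channel means
// the client cannot keep up, so it is disconnected rather than
// allowed to stall every other sender.
func (c *client) trySend(msg []byte) bool {
	select {
	case c.send <- msg:
		return true
	default:
		close(c.send) // writePump drains, then closes the connection
		return false
	}
}
```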
#### 4. Connection Pooling (Advanced)

Reuse connections or reduce goroutine overhead. One buffer-pooling variant is sketched below.
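The section does not pin down a technique, but if gorilla/websocket is in use, its documented `WriteBufferPool` option is one concrete form of this idea: write buffers are borrowed from a shared pool only while a write is in flight, instead of being pinned to each connection for its lifetime.

```go
package hub

import (
	"net/http"
	"sync"

	"github.com/gorilla/websocket"
)

// With WriteBufferPool set, connections share write buffers instead
// of each holding its own for its entire lifetime.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  2048,
	WriteBufferPool: &sync.Pool{},
}

func serveWS(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return // Upgrade has already written the HTTP error
	}
	_ = newClient(conn) // per-connection setup from the earlier sketch
}
```

This trades a little pool contention for not holding an idle write buffer per connection, which matters most when connections are mostly quiet.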
### Recommended Optimizations
#### Option A: Quick Memory Reduction (Easy)

```yaml
# Reduce buffer sizes
readbuffersize: 2048   # from 8192
writebuffersize: 2048  # from 8192
channelbuffersize: 5   # from 10
```

Expected savings: ~13KB per connection (read/write/channel buffer overhead drops from ~18KB to ~5KB, roughly 70%).
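How those values might be wired in, assuming gorilla/websocket; the config struct and field names here mirror the YAML keys above but the plumbing itself is an assumption:

```go
package hub

import "github.com/gorilla/websocket"

type wsConfig struct {
	ReadBufferSize    int `yaml:"readbuffersize"`
	WriteBufferSize   int `yaml:"writebuffersize"`
	ChannelBufferSize int `yaml:"channelbuffersize"`
}

func newUpgrader(cfg wsConfig) websocket.Upgrader {
	return websocket.Upgrader{
		ReadBufferSize:  cfg.ReadBufferSize,  // 2048 instead of 8192
		WriteBufferSize: cfg.WriteBufferSize, // 2048 instead of 8192
	}
}

// newSendChannel sizes the per-connection send channel from the same config.
func newSendChannel(cfg wsConfig) chan []byte {
	return make(chan []byte, cfg.ChannelBufferSize) // 5 instead of 10
}
```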
#### Option B: Structure Optimization (Medium)

Flatten the nested map structure to reduce overhead:

```go
// Instead of: map[uint]map[string][]*client
// Use:        map[string][]*client with key = fmt.Sprintf("%d:%s", userID, token)
```

Expected savings: ~2-5KB per connection.
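A minimal sketch of that flattened registry, using the composite key suggested above; the lock and helper names are illustrative:

```go
package hub

import (
	"fmt"
	"sync"
)

// flatRegistry collapses the two map levels into one composite
// "userID:token" key. The value stays []*client so one (user, token)
// pair can still hold several simultaneous connections.
type flatRegistry struct {
	mu      sync.RWMutex
	clients map[string][]*client
}

func compositeKey(userID uint, token string) string {
	return fmt.Sprintf("%d:%s", userID, token)
}

func (r *flatRegistry) add(userID uint, token string, c *client) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.clients == nil {
		r.clients = make(map[string][]*client)
	}
	k := compositeKey(userID, token)
	r.clients[k] = append(r.clients[k], c)
}
```

One caveat worth weighing: "all connections for user X" queries now require a scan or a secondary index, so this only pays off if per-user iteration is not a hot path.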
#### Option C: Hybrid Approach (Best)

Combine buffer reduction with structure optimization.

Expected savings: ~15-20KB per connection (down from ~75KB to ~55-60KB).
### Real-World Expectations

For 1M connections:

- Current: ~75GB (75KB * 1M)
- Optimized: ~55-60GB (~55-60KB * 1M)
- Savings: ~15-20GB

For 10M connections:

- Current: ~750GB (not feasible)
- Optimized: ~550-600GB (still large, but more manageable)
### Conclusion

Sharding is good; it is essential for performance. The memory pressure comes from:

- Large WebSocket buffers (16KB): the biggest issue
- Nested map overhead: a moderate issue
- Channel buffers: a minor issue

Recommendation:

- ✅ Keep sharding (it's working well)
- ⚠️ Reduce buffer sizes for memory-constrained environments
- ⚠️ Consider flattening the map structure if memory is critical
- ✅ Test with reduced buffers to validate performance

The ~75KB figure includes Docker and Go runtime overhead; the application's own memory per connection is likely ~25-30KB, which is more reasonable but still worth optimizing.