# Memory Usage Analysis

## Current Memory Per Connection: ~75KB

This is higher than ideal. Let's break down where the memory is going:

### Per-Connection Memory Breakdown

1. **WebSocket Buffers: 16KB** ⚠️ **Largest contributor**
   - Read buffer: 8KB
   - Write buffer: 8KB
   - These are allocated per connection regardless of usage
2. **Channel Buffer: ~2KB**
   - 10 messages * ~200 bytes per message
   - Helps prevent blocking but uses memory
3. **Goroutine Stacks: ~4KB**
   - 2 goroutines per connection (read + write handlers)
   - ~2KB stack per goroutine (default initial Go stack size)
4. **Client Struct: ~100 bytes**
   - Minimal overhead
5. **Map Overhead: Variable**
   - Nested map structure: `map[uint]map[string][]*client`
   - Each map level has hash table overhead
   - Pointer storage overhead
6. **Go Runtime Overhead: ~2-4KB**
   - GC metadata
   - Runtime structures
7. **Docker/System Overhead: Shared**
   - Container base memory
   - System libraries

### Sharding Structure Analysis

**Current Structure:**

```go
map[uint]map[string][]*client // userID -> token -> []client
```

**Memory Impact:**
- ✅ **Good**: Sharding reduces lock contention significantly
- ⚠️ **Concern**: Nested maps add overhead
  - Each map has bucket overhead (~8 bytes per bucket)
  - Hash table structure overhead
  - For 256 shards with sparse distribution, this adds up

**Is Sharding Okay?**
- ✅ **Yes, sharding is necessary** for performance at scale
- ⚠️ **But** we could optimize the structure for memory efficiency

### Optimization Opportunities

#### 1. Reduce Buffer Sizes (Quick Win)

- **Current:** 8KB read + 8KB write = 16KB
- **Optimized:** 2KB read + 2KB write = 4KB
- **Savings:** ~12KB per connection (16% reduction of the ~75KB total)
- **Trade-off:** More syscalls, but acceptable for most use cases

#### 2. Flatten Map Structure (Memory Optimization)

- **Current:** `map[uint]map[string][]*client`
- **Optimized:** `map[string][]*client` with a composite `userID:token` key
- **Savings:** Eliminates one level of map overhead
- **Trade-off:** Slightly more complex key generation, but better memory

#### 3. Reduce Channel Buffer Size

- **Current:** 10 messages
- **Optimized:** 5 messages
- **Savings:** ~1KB per connection
- **Trade-off:** Slightly higher chance of blocking, but usually acceptable

#### 4. Connection Pooling (Advanced)

Reuse connections or reduce goroutine overhead.

### Recommended Optimizations

#### Option A: Quick Memory Reduction (Easy)

```yaml
# Reduce buffer sizes
readbuffersize: 2048    # from 8192
writebuffersize: 2048   # from 8192
channelbuffersize: 5    # from 10
```

**Expected savings:** ~12-15KB per connection (a 60-80% reduction in buffer overhead)

#### Option B: Structure Optimization (Medium)

Flatten the nested map structure to reduce overhead:

```go
// Instead of:
map[uint]map[string][]*client

// Use a single level with a composite key:
map[string][]*client // key = fmt.Sprintf("%d:%s", userID, token)
```

**Expected savings:** ~2-5KB per connection

#### Option C: Hybrid Approach (Best)

Combine buffer reduction + structure optimization.

**Expected savings:** ~15-20KB per connection (down from ~75KB to ~55-60KB)

### Real-World Expectations

**For 1M connections:**
- Current: ~75GB (75KB * 1M)
- Optimized: ~55-60GB (55KB * 1M)
- Savings: ~15-20GB

**For 10M connections:**
- Current: ~750GB (not feasible)
- Optimized: ~550-600GB (still large, but more manageable)

### Conclusion

**Sharding is good** - it's essential for performance. The memory issue comes from:

1. Large WebSocket buffers (16KB) - biggest issue
2. Nested map overhead - moderate issue
3. Channel buffers - minor issue

**Recommendation:**

1. ✅ Keep sharding (it's working well)
2. ⚠️ Reduce buffer sizes for memory-constrained environments
3. ⚠️ Consider flattening the map structure if memory is critical
4. ✅ Test with reduced buffers to validate performance

The 75KB includes Docker and Go runtime overhead. Actual application memory per connection is likely ~25-30KB, which is more reasonable but could still be optimized.
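To make the map-flattening piece of Options B and C concrete, here is a minimal sketch of a composite-key registry. The `client`, `registry`, and helper names are illustrative, not taken from the actual codebase; the real client struct would carry the WebSocket connection and a send channel (buffer 5 rather than 10), and the smaller read/write buffers would be configured on the server's upgrader, not here.

```go
package main

import (
	"fmt"
	"sync"
)

// client stands in for the real per-connection struct, which would hold
// the connection plus a buffered send channel.
type client struct {
	id int
}

// registry replaces map[uint]map[string][]*client with a single map keyed
// by "userID:token". This removes the inner map's hash-table overhead while
// keeping the slice, so multiple connections can still share one
// (userID, token) pair.
type registry struct {
	mu      sync.RWMutex
	clients map[string][]*client
}

func newRegistry() *registry {
	return &registry{clients: make(map[string][]*client)}
}

// key folds the two former map levels into one composite key.
func key(userID uint, token string) string {
	return fmt.Sprintf("%d:%s", userID, token)
}

func (r *registry) add(userID uint, token string, c *client) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := key(userID, token)
	r.clients[k] = append(r.clients[k], c)
}

func (r *registry) lookup(userID uint, token string) []*client {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.clients[key(userID, token)]
}

func main() {
	r := newRegistry()
	r.add(42, "tok-a", &client{id: 1})
	r.add(42, "tok-a", &client{id: 2})
	fmt.Println(key(42, "tok-a"), len(r.lookup(42, "tok-a"))) // 42:tok-a 2
}
```

This does not conflict with keeping sharding: the same composite key can feed the shard selector (e.g. hash the key and take it modulo 256), with one such registry per shard.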