# Memory Usage Analysis

## Current Memory Per Connection: ~75KB

This is higher than ideal. Let's break down where the memory is going:

### Per-Connection Memory Breakdown

1. **WebSocket Buffers: 16KB** ⚠️ **Largest contributor**
   - Read buffer: 8KB
   - Write buffer: 8KB
   - These are allocated per connection regardless of usage

2. **Channel Buffer: ~2KB**
   - 10 messages * ~200 bytes per message
   - Helps prevent blocking but uses memory

3. **Goroutine Stacks: ~4KB**
   - 2 goroutines per connection (read + write handlers); see the sketch after this list
   - ~2KB stack per goroutine (Go's initial stack size)

4. **Client Struct: ~100 bytes**
   - Minimal overhead

5. **Map Overhead: Variable**
   - Nested map structure: `map[uint]map[string][]*client`
   - Each map level has hash table overhead
   - Pointer storage overhead

6. **Go Runtime Overhead: ~2-4KB**
   - GC metadata
   - Runtime structures

7. **Docker/System Overhead: Shared**
   - Container base memory
   - System libraries

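To make the first three items concrete, here is a minimal sketch of the per-connection pieces, assuming a gorilla/websocket-style setup; the `hub` package and the `client`, `readPump`, and `writePump` names are illustrative, not taken from the actual codebase.

```go
package hub

import "github.com/gorilla/websocket"

// Upgrader matching the numbers above: two fixed 8KB buffers are
// allocated for every upgraded connection, used or not.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  8192, // 8KB read buffer per connection
	WriteBufferSize: 8192, // 8KB write buffer per connection
}

// Per-connection state.
type client struct {
	conn *websocket.Conn
	send chan []byte // buffered: 10 messages * ~200 bytes ≈ 2KB
}

// Each connection runs two goroutines (~2KB initial stack each):
// readPump loops over c.conn.ReadMessage, writePump drains c.send.
func (c *client) readPump()  { /* c.conn.ReadMessage loop */ }
func (c *client) writePump() { /* range over c.send, write each with c.conn.WriteMessage */ }
```

With these numbers, every accepted connection pays for the two fixed buffers, the channel's backing array, and two goroutine stacks up front, which accounts for most of the ~25-30KB of application-level memory mentioned in the conclusion.
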
### Sharding Structure Analysis

**Current Structure:**

```go
map[uint]map[string][]*client // userID -> token -> []client
```

**Memory Impact:**

- ✅ **Good**: Sharding reduces lock contention significantly
- ⚠️ **Concern**: Nested maps add overhead
  - Each map level allocates its own hash-table buckets, with per-bucket header and overflow-pointer overhead on top of the stored keys and values
  - For 256 shards with a sparse key distribution, this adds up

**Is Sharding Okay?**

- ✅ **Yes, sharding is necessary** for performance at scale
- ⚠️ **But** we could optimize the structure for memory efficiency

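For reference, a sharded registry along the lines described above might look like the following sketch; the 256-shard count comes from this analysis, while `registry`, `shard`, and `shardFor` are assumed names and `client` is the type from the earlier sketch.

```go
package hub

import "sync"

const shardCount = 256

// Each shard owns its own lock and its own slice of the nested
// userID -> token -> connections map, so contention stays per shard.
type shard struct {
	mu      sync.RWMutex
	clients map[uint]map[string][]*client
}

type registry struct {
	shards [shardCount]*shard
}

// shardFor routes a user to a shard; only that shard's lock is taken.
func (r *registry) shardFor(userID uint) *shard {
	return r.shards[userID%shardCount]
}
```

The lock layout is the part worth keeping; the memory concern is the two map levels inside each shard, not the sharding itself.
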
### Optimization Opportunities

#### 1. Reduce Buffer Sizes (Quick Win)

- **Current:** 8KB read + 8KB write = 16KB
- **Optimized:** 2KB read + 2KB write = 4KB
- **Savings:** ~12KB per connection (a ~16% reduction of the ~75KB total)
- **Trade-off:** more syscalls (smaller reads and writes), but acceptable for most use cases

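Assuming gorilla/websocket (the library is not named in this document), the change amounts to replacing the 8192 values on the Upgrader from the first sketch. Gorilla can also hand out write buffers from a shared pool instead of dedicating one to each connection, which helps further when most connections sit idle:

```go
package hub

import (
	"sync"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	ReadBufferSize:  2048, // was 8192
	WriteBufferSize: 2048, // was 8192

	// Optional: take write buffers from a shared pool only for the duration
	// of a write, so idle connections hold no write buffer at all.
	WriteBufferPool: &sync.Pool{},
}
```
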
#### 2. Flatten Map Structure (Memory Optimization)

- **Current:** `map[uint]map[string][]*client`
- **Optimized:** `map[string]*client` keyed by a composite `"userID:token"` string (or `map[string][]*client` if multiple connections per token must still be supported)
- **Savings:** eliminates one level of map overhead
- **Trade-off:** slightly more complex key generation, but better memory usage

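A sketch of the flattened lookup (names assumed); this version keeps the `[]*client` slice so the current semantics of multiple connections per token are preserved, and the flat map can still live inside each shard:

```go
package hub

import "fmt"

// Flattened registry: one map per shard, keyed by "<userID>:<token>".
type flatShard struct {
	clients map[string][]*client
}

func connKey(userID uint, token string) string {
	return fmt.Sprintf("%d:%s", userID, token)
}

func (s *flatShard) add(userID uint, token string, c *client) {
	k := connKey(userID, token)
	s.clients[k] = append(s.clients[k], c)
}

func (s *flatShard) get(userID uint, token string) []*client {
	return s.clients[connKey(userID, token)]
}
```

A comparable struct key (e.g. `struct{ userID uint; token string }`) would avoid the `fmt.Sprintf` allocation on every lookup without costing extra memory.
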
#### 3. Reduce Channel Buffer Size

- **Current:** 10 messages
- **Optimized:** 5 messages
- **Savings:** ~1KB per connection
- **Trade-off:** slightly higher chance of blocking, but usually acceptable

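The buffer size is set wherever the send channel is created; in terms of the first sketch's assumed `client` type, a hypothetical constructor would change one line:

```go
// Hypothetical constructor: only the channel capacity changes.
func newClient(conn *websocket.Conn) *client {
	return &client{
		conn: conn,
		send: make(chan []byte, 5), // was 10: ~1KB less per connection
	}
}
```
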
#### 4. Connection Pooling (Advanced)

Reuse connection objects and buffers, or reduce per-connection goroutine overhead (see the sketch below).

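One concrete way to cut the goroutine and channel overhead, sketched under the assumption that gorilla/websocket is in use: drop the dedicated write goroutine and its buffered channel, and serialize writes with a mutex instead (gorilla allows only one concurrent writer, so the mutex is what enforces that). The names below are hypothetical.

```go
package hub

import (
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

// Lean client: no writePump goroutine and no buffered send channel;
// callers write directly under a mutex.
type leanClient struct {
	conn    *websocket.Conn
	writeMu sync.Mutex
}

func (c *leanClient) sendMessage(msg []byte) error {
	c.writeMu.Lock()
	defer c.writeMu.Unlock()
	// A write deadline keeps a dead or slow client from blocking senders forever.
	_ = c.conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
	return c.conn.WriteMessage(websocket.TextMessage, msg)
}
```

This saves roughly the ~2KB write-goroutine stack plus the ~2KB channel per connection; the cost is that a slow client can now block its senders for up to the write deadline.
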
### Recommended Optimizations

#### Option A: Quick Memory Reduction (Easy)

```yaml
# Reduce buffer sizes
readbuffersize: 2048 # from 8192
writebuffersize: 2048 # from 8192
channelbuffersize: 5 # from 10
```

**Expected savings:** ~12-15KB per connection (roughly 60-80% of the buffer overhead)

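Assuming these keys are loaded from YAML, a matching config struct could look like the following; the struct and field names are hypothetical and the tags simply mirror the lowercase keys above:

```go
package hub

// wsConfig mirrors the YAML keys shown above.
type wsConfig struct {
	ReadBufferSize    int `yaml:"readbuffersize"`
	WriteBufferSize   int `yaml:"writebuffersize"`
	ChannelBufferSize int `yaml:"channelbuffersize"`
}
```
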
#### Option B: Structure Optimization (Medium)

Flatten the nested map structure to reduce overhead:

```go
// Instead of: map[uint]map[string][]*client
// Use: map[string]*client with key = fmt.Sprintf("%d:%s", userID, token)
```

**Expected savings:** ~2-5KB per connection

#### Option C: Hybrid Approach (Best)

Combine buffer reduction + structure optimization.

**Expected savings:** ~15-20KB per connection (taking the total from ~75KB down to ~55-60KB)

### Real-World Expectations

**For 1M connections:**

- Current: ~75GB (75KB * 1M)
- Optimized: ~55-60GB (55-60KB * 1M)
- Savings: ~15-20GB

**For 10M connections:**

- Current: ~750GB (not feasible)
- Optimized: ~550-600GB (still large, but more manageable)

### Conclusion

**Sharding is good** - it's essential for performance. The memory issue comes from:

1. Large WebSocket buffers (16KB) - biggest issue
2. Nested map overhead - moderate issue
3. Channel buffers - minor issue

**Recommendation:**

1. ✅ Keep sharding (it's working well)
2. ⚠️ Reduce buffer sizes for memory-constrained environments
3. ⚠️ Consider flattening the map structure if memory is critical
4. ✅ Test with reduced buffers to validate performance

The 75KB figure includes Docker and Go runtime overhead. Actual application memory per connection is likely ~25-30KB, which is more reasonable but could still be optimized.