# Memory Usage Analysis
## Current Memory Per Connection: ~75KB
This is higher than ideal. Let's break down where the memory is going:
### Per-Connection Memory Breakdown
1. **WebSocket Buffers: 16KB** ⚠️ **Largest contributor**
   - Read buffer: 8KB
   - Write buffer: 8KB
   - Allocated per connection regardless of usage
2. **Channel Buffer: ~2KB**
   - 10 messages × ~200 bytes per message
   - Helps prevent blocking, but uses memory
3. **Goroutine Stacks: ~4KB**
   - 2 goroutines per connection (read + write handlers)
   - ~2KB initial stack per goroutine (Go's default since 1.4; stacks grow on demand)
   - Items 1-3 are sketched in code right after this list
4. **Client Struct: ~100 bytes**
   - Minimal overhead
5. **Map Overhead: Variable**
   - Nested map structure: `map[uint]map[string][]*client`
   - Each map level has its own hash-table overhead
   - Plus pointer storage overhead
6. **Go Runtime Overhead: ~2-4KB**
   - GC metadata
   - Runtime structures
7. **Docker/System Overhead: Shared**
   - Container base memory
   - System libraries
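
To make items 1-3 concrete, here is a minimal sketch of the per-connection pieces, assuming a gorilla/websocket-style setup; the identifiers (`client`, `readPump`, `writePump`) are illustrative, not the actual sources:

```go
package stream

import (
	"time"

	"github.com/gorilla/websocket"
)

// Per-connection pieces that account for most of the ~75KB figure.
// Buffer sizes mirror the defaults described above.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  8192, // 8KB read buffer per connection
	WriteBufferSize: 8192, // 8KB write buffer per connection
}

type client struct {
	conn *websocket.Conn
	send chan []byte // buffered: 10 messages × ~200B ≈ 2KB
}

func newClient(conn *websocket.Conn) *client {
	c := &client{conn: conn, send: make(chan []byte, 10)}
	go c.readPump()  // goroutine #1: ~2KB initial stack
	go c.writePump() // goroutine #2: ~2KB initial stack
	return c
}

func (c *client) readPump() {
	defer c.conn.Close()
	for {
		if _, _, err := c.conn.ReadMessage(); err != nil {
			return
		}
	}
}

func (c *client) writePump() {
	defer c.conn.Close()
	for msg := range c.send {
		c.conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
		if err := c.conn.WriteMessage(websocket.TextMessage, msg); err != nil {
			return
		}
	}
}
```

Every accepted connection pays for both buffers, the 10-slot channel, and two goroutine stacks up front, which is why the first three items dominate the breakdown.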
### Sharding Structure Analysis
**Current Structure:**
```go
map[uint]map[string][]*client // userID -> token -> []client
```
**Memory Impact:**
- ✅ **Good**: Sharding reduces lock contention significantly
- ⚠️ **Concern**: Nested maps add overhead
  - Each map level carries its own hash-table header and bucket bookkeeping
  - For 256 shards with sparse distribution, this adds up

**Is Sharding Okay?**
- ✅ **Yes, sharding is necessary** for performance at scale (see the sketch below)
- ⚠️ **But** we could optimize the structure for memory efficiency
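
For reference, a minimal sketch of what the sharded registry looks like, assuming 256 shards selected by `userID % 256` and reusing the `client` type from the earlier sketch; identifiers are hypothetical:

```go
package stream

import "sync"

const shardCount = 256

// shard guards one slice of the userID space; contention is limited to
// connections whose userIDs land in the same shard.
type shard struct {
	mu      sync.RWMutex
	clients map[uint]map[string][]*client // userID -> token -> clients
}

type registry struct {
	shards [shardCount]shard
}

func (r *registry) shardFor(userID uint) *shard {
	return &r.shards[userID%shardCount]
}

func (r *registry) add(userID uint, token string, c *client) {
	s := r.shardFor(userID)
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.clients == nil {
		s.clients = make(map[uint]map[string][]*client)
	}
	byToken, ok := s.clients[userID]
	if !ok {
		byToken = make(map[string][]*client)
		s.clients[userID] = byToken
	}
	byToken[token] = append(byToken[token], c)
}
```

Lock contention stays confined to a single shard, which is why sharding is worth keeping even though each shard's nested maps carry their own headers.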
### Optimization Opportunities
#### 1. Reduce Buffer Sizes (Quick Win)
- **Current:** 8KB read + 8KB write = 16KB
- **Optimized:** 2KB read + 2KB write = 4KB
- **Savings:** ~12KB per connection (~16% of the ~75KB total)
- **Trade-off:** More syscalls for large payloads, but acceptable for most use cases
#### 2. Flatten Map Structure (Memory Optimization)
- **Current:** `map[uint]map[string][]*client`
- **Optimized:** `map[string]*client` with a composite key
- **Savings:** Eliminates one level of map overhead
- **Trade-off:** Slightly more complex key generation, but a smaller memory footprint
#### 3. Reduce Channel Buffer Size
- **Current:** 10 messages
- **Optimized:** 5 messages
- **Savings:** ~1KB per connection
- **Trade-off:** Slightly higher chance of blocking, but usually acceptable
#### 4. Connection Pooling (Advanced)
Reuse connections or reduce per-connection goroutine overhead; one variant is sketched below.
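
A hedged sketch of the goroutine-reduction idea: drop the per-connection writer goroutine and `send` channel, and serialize writes with a mutex instead. Names here are hypothetical:

```go
package stream

import (
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

// Hypothetical variant: no writer goroutine, no send channel. Writes are
// serialized by a per-connection mutex, saving ~2KB of goroutine stack and
// ~2KB of channel buffer per connection.
type leanClient struct {
	conn *websocket.Conn
	mu   sync.Mutex
}

func (c *leanClient) send(msg []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
	return c.conn.WriteMessage(websocket.TextMessage, msg)
}
```

The trade-off is that a slow reader now stalls the broadcasting goroutine for up to the write deadline, so this fits deployments where per-connection fan-out is cheap.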
### Recommended Optimizations
#### Option A: Quick Memory Reduction (Easy)
```yaml
# Reduce buffer sizes
readbuffersize: 2048 # from 8192
writebuffersize: 2048 # from 8192
channelbuffersize: 5 # from 10
```
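
How these values would plumb through to the WebSocket upgrader, sketched under the assumption that the config keys above map onto gorilla/websocket's `Upgrader` fields; the struct and key names are assumptions, not the actual configuration schema:

```go
package stream

import "github.com/gorilla/websocket"

// Hypothetical config wiring for the YAML keys above.
type StreamConfig struct {
	ReadBufferSize    int `yaml:"readbuffersize"`
	WriteBufferSize   int `yaml:"writebuffersize"`
	ChannelBufferSize int `yaml:"channelbuffersize"`
}

func newUpgrader(cfg StreamConfig) *websocket.Upgrader {
	return &websocket.Upgrader{
		ReadBufferSize:  cfg.ReadBufferSize,  // 2048 instead of 8192
		WriteBufferSize: cfg.WriteBufferSize, // 2048 instead of 8192
	}
}
```

The channel size would apply where the client is created, i.e. `make(chan []byte, cfg.ChannelBufferSize)`.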
**Expected savings:** ~12-15KB per connection (a 60-80% reduction in buffer and channel overhead)
#### Option B: Structure Optimization (Medium)
Flatten the nested map structure to reduce overhead:
```go
// Instead of: map[uint]map[string][]*client
// Use: map[string]*client with key = fmt.Sprintf("%d:%s", userID, token)
```
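
Fleshed out slightly, under the assumption that each (user, token) pair has at most one live connection (keep a slice value otherwise), and reusing the `client` type from earlier; all identifiers are illustrative:

```go
package stream

import (
	"fmt"
	"sync"
)

// Flattened registry: one map level, composite "userID:token" key.
type flatRegistry struct {
	mu      sync.RWMutex
	clients map[string]*client
}

func key(userID uint, token string) string {
	return fmt.Sprintf("%d:%s", userID, token)
}

func (r *flatRegistry) add(userID uint, token string, c *client) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.clients == nil {
		r.clients = make(map[string]*client)
	}
	r.clients[key(userID, token)] = c
}

func (r *flatRegistry) get(userID uint, token string) (*client, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.clients[key(userID, token)]
	return c, ok
}
```

Flattening composes with sharding: pick the shard from `userID` first, then use the composite key inside that shard's single map.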
**Expected savings:** ~2-5KB per connection
#### Option C: Hybrid Approach (Best)
Combine buffer reduction + structure optimization
**Expected savings:** ~15-20KB per connection (down from ~75KB to ~55-60KB)
### Real-World Expectations
**For 1M connections:**
- Current: ~75GB (75KB × 1M)
- Optimized: ~55-60GB (55-60KB × 1M)
- Savings: ~15-20GB

**For 10M connections:**
- Current: ~750GB (not feasible on a single machine)
- Optimized: ~550-600GB (still large, but more manageable)
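
These figures are straight multiplication; a throwaway helper to reproduce them:

```go
package main

import "fmt"

// Back-of-the-envelope fleet sizing: per-connection KB × connection count,
// converted to decimal GB (1 GB = 1e6 KB).
func fleetGB(perConnKB float64, conns int) float64 {
	return perConnKB * float64(conns) / 1e6
}

func main() {
	fmt.Printf("1M  current:   %.0f GB\n", fleetGB(75, 1_000_000))  // 75 GB
	fmt.Printf("1M  optimized: %.0f GB\n", fleetGB(55, 1_000_000))  // 55 GB
	fmt.Printf("10M current:   %.0f GB\n", fleetGB(75, 10_000_000)) // 750 GB
}
```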
### Conclusion
**Sharding is good** - it's essential for performance. The memory issue comes from:
1. Large WebSocket buffers (16KB) - biggest issue
2. Nested map overhead - moderate issue
3. Channel buffers - minor issue
**Recommendation:**
1. ✅ Keep sharding (it's working well)
2. ⚠️ Reduce buffer sizes for memory-constrained environments
3. ⚠️ Consider flattening map structure if memory is critical
4. ✅ Test with reduced buffers to validate performance
The measured ~75KB includes Docker and Go runtime overhead; the application's own per-connection footprint is likely ~25-30KB, which is more reasonable but still worth optimizing.