# Memory Usage Analysis
## Current Memory Per Connection: ~75KB
This is higher than ideal. Let's break down where the memory is going:
### Per-Connection Memory Breakdown
1. **WebSocket Buffers: 16KB** ⚠️ **Largest contributor**
   - Read buffer: 8KB
   - Write buffer: 8KB
   - Allocated per connection regardless of usage
2. **Channel Buffer: ~2KB**
   - 10 messages × ~200 bytes per message
   - Helps prevent blocking, but uses memory
3. **Goroutine Stacks: ~4KB**
   - 2 goroutines per connection (read + write handlers)
   - ~2KB initial stack per goroutine (Go's default since 1.4; stacks grow on demand)
   - Items 1-3 are sketched in code right after this list
4. **Client Struct: ~100 bytes**
   - Minimal overhead
5. **Map Overhead: Variable**
   - Nested map structure: `map[uint]map[string][]*client`
   - Each map level has its own hash-table overhead
   - Plus pointer storage overhead
6. **Go Runtime Overhead: ~2-4KB**
   - GC metadata
   - Runtime structures
7. **Docker/System Overhead: Shared**
   - Container base memory
   - System libraries
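
To make items 1-3 concrete, here is a minimal sketch of the per-connection pieces, assuming a gorilla/websocket-style setup; the identifiers (`client`, `readPump`, `writePump`) are illustrative, not the actual sources:

```go
package stream

import (
	"time"

	"github.com/gorilla/websocket"
)

// Per-connection pieces that account for most of the ~75KB figure.
// Buffer sizes mirror the defaults described above.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  8192, // 8KB read buffer per connection
	WriteBufferSize: 8192, // 8KB write buffer per connection
}

type client struct {
	conn *websocket.Conn
	send chan []byte // buffered: 10 messages × ~200B ≈ 2KB
}

func newClient(conn *websocket.Conn) *client {
	c := &client{conn: conn, send: make(chan []byte, 10)}
	go c.readPump()  // goroutine #1: ~2KB initial stack
	go c.writePump() // goroutine #2: ~2KB initial stack
	return c
}

func (c *client) readPump() {
	defer c.conn.Close()
	for {
		if _, _, err := c.conn.ReadMessage(); err != nil {
			return
		}
	}
}

func (c *client) writePump() {
	defer c.conn.Close()
	for msg := range c.send {
		c.conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
		if err := c.conn.WriteMessage(websocket.TextMessage, msg); err != nil {
			return
		}
	}
}
```

Every accepted connection pays for both buffers, the 10-slot channel, and two goroutine stacks up front, which is why the first three items dominate the breakdown.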
### Sharding Structure Analysis
**Current Structure:**
```go
map[uint]map[string][]*client // userID -> token -> []client
```
**Memory Impact:**
- ✅ **Good**: Sharding reduces lock contention significantly
- ⚠️ **Concern**: Nested maps add overhead
  - Each map level carries its own hash-table header and bucket bookkeeping
  - For 256 shards with sparse distribution, this adds up

**Is Sharding Okay?**
- ✅ **Yes, sharding is necessary** for performance at scale (see the sketch below)
- ⚠️ **But** we could optimize the structure for memory efficiency
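
For reference, a minimal sketch of what the sharded registry looks like, assuming 256 shards selected by `userID % 256` and reusing the `client` type from the earlier sketch; identifiers are hypothetical:

```go
package stream

import "sync"

const shardCount = 256

// shard guards one slice of the userID space; contention is limited to
// connections whose userIDs land in the same shard.
type shard struct {
	mu      sync.RWMutex
	clients map[uint]map[string][]*client // userID -> token -> clients
}

type registry struct {
	shards [shardCount]shard
}

func (r *registry) shardFor(userID uint) *shard {
	return &r.shards[userID%shardCount]
}

func (r *registry) add(userID uint, token string, c *client) {
	s := r.shardFor(userID)
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.clients == nil {
		s.clients = make(map[uint]map[string][]*client)
	}
	byToken, ok := s.clients[userID]
	if !ok {
		byToken = make(map[string][]*client)
		s.clients[userID] = byToken
	}
	byToken[token] = append(byToken[token], c)
}
```

Lock contention stays confined to a single shard, which is why sharding is worth keeping even though each shard's nested maps carry their own headers.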
### Optimization Opportunities
#### 1. Reduce Buffer Sizes (Quick Win)
- **Current:** 8KB read + 8KB write = 16KB
- **Optimized:** 2KB read + 2KB write = 4KB
- **Savings:** ~12KB per connection (~16% of the ~75KB total)
- **Trade-off:** More syscalls for large payloads, but acceptable for most use cases
#### 2. Flatten Map Structure (Memory Optimization)
- **Current:** `map[uint]map[string][]*client`
- **Optimized:** `map[string]*client` with a composite key
- **Savings:** Eliminates one level of map overhead
- **Trade-off:** Slightly more complex key generation, but a smaller memory footprint
#### 3. Reduce Channel Buffer Size
- **Current:** 10 messages
- **Optimized:** 5 messages
- **Savings:** ~1KB per connection
- **Trade-off:** Slightly higher chance of blocking, but usually acceptable
#### 4. Connection Pooling (Advanced)
Reuse connections or reduce per-connection goroutine overhead; one variant is sketched below.
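
A hedged sketch of the goroutine-reduction idea: drop the per-connection writer goroutine and `send` channel, and serialize writes with a mutex instead. Names here are hypothetical:

```go
package stream

import (
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

// Hypothetical variant: no writer goroutine, no send channel. Writes are
// serialized by a per-connection mutex, saving ~2KB of goroutine stack and
// ~2KB of channel buffer per connection.
type leanClient struct {
	conn *websocket.Conn
	mu   sync.Mutex
}

func (c *leanClient) send(msg []byte) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conn.SetWriteDeadline(time.Now().Add(10 * time.Second))
	return c.conn.WriteMessage(websocket.TextMessage, msg)
}
```

The trade-off is that a slow reader now stalls the broadcasting goroutine for up to the write deadline, so this fits deployments where per-connection fan-out is cheap.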
### Recommended Optimizations
#### Option A: Quick Memory Reduction (Easy)
```yaml
# Reduce buffer sizes
readbuffersize: 2048 # from 8192
writebuffersize: 2048 # from 8192
channelbuffersize: 5 # from 10
```
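
How these values would plumb through to the WebSocket upgrader, sketched under the assumption that the config keys above map onto gorilla/websocket's `Upgrader` fields; the struct and key names are assumptions, not the actual configuration schema:

```go
package stream

import "github.com/gorilla/websocket"

// Hypothetical config wiring for the YAML keys above.
type StreamConfig struct {
	ReadBufferSize    int `yaml:"readbuffersize"`
	WriteBufferSize   int `yaml:"writebuffersize"`
	ChannelBufferSize int `yaml:"channelbuffersize"`
}

func newUpgrader(cfg StreamConfig) *websocket.Upgrader {
	return &websocket.Upgrader{
		ReadBufferSize:  cfg.ReadBufferSize,  // 2048 instead of 8192
		WriteBufferSize: cfg.WriteBufferSize, // 2048 instead of 8192
	}
}
```

The channel size would apply where the client is created, i.e. `make(chan []byte, cfg.ChannelBufferSize)`.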
**Expected savings:** ~12-15KB per connection (a 60-80% reduction in buffer and channel overhead)
#### Option B: Structure Optimization (Medium)
Flatten the nested map structure to reduce overhead:
```go
// Instead of: map[uint]map[string][]*client
// Use: map[string]*client with key = fmt.Sprintf("%d:%s", userID, token)
```
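
Fleshed out slightly, under the assumption that each (user, token) pair has at most one live connection (keep a slice value otherwise), and reusing the `client` type from earlier; all identifiers are illustrative:

```go
package stream

import (
	"fmt"
	"sync"
)

// Flattened registry: one map level, composite "userID:token" key.
type flatRegistry struct {
	mu      sync.RWMutex
	clients map[string]*client
}

func key(userID uint, token string) string {
	return fmt.Sprintf("%d:%s", userID, token)
}

func (r *flatRegistry) add(userID uint, token string, c *client) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.clients == nil {
		r.clients = make(map[string]*client)
	}
	r.clients[key(userID, token)] = c
}

func (r *flatRegistry) get(userID uint, token string) (*client, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.clients[key(userID, token)]
	return c, ok
}
```

Flattening composes with sharding: pick the shard from `userID` first, then use the composite key inside that shard's single map.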
**Expected savings:** ~2-5KB per connection
#### Option C: Hybrid Approach (Best)
Combine buffer reduction + structure optimization
**Expected savings:** ~15-20KB per connection (down from ~75KB to ~55-60KB)
### Real-World Expectations
**For 1M connections:**
- Current: ~75GB (75KB × 1M)
- Optimized: ~55-60GB (55-60KB × 1M)
- Savings: ~15-20GB

**For 10M connections:**
- Current: ~750GB (not feasible on a single machine)
- Optimized: ~550-600GB (still large, but more manageable)
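
These figures are straight multiplication; a throwaway helper to reproduce them:

```go
package main

import "fmt"

// Back-of-the-envelope fleet sizing: per-connection KB × connection count,
// converted to decimal GB (1 GB = 1e6 KB).
func fleetGB(perConnKB float64, conns int) float64 {
	return perConnKB * float64(conns) / 1e6
}

func main() {
	fmt.Printf("1M  current:   %.0f GB\n", fleetGB(75, 1_000_000))  // 75 GB
	fmt.Printf("1M  optimized: %.0f GB\n", fleetGB(55, 1_000_000))  // 55 GB
	fmt.Printf("10M current:   %.0f GB\n", fleetGB(75, 10_000_000)) // 750 GB
}
```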
### Conclusion
**Sharding is good** - it's essential for performance. The memory issue comes from:
1. Large WebSocket buffers (16KB) - biggest issue
2. Nested map overhead - moderate issue
3. Channel buffers - minor issue
**Recommendation:**
1. ✅ Keep sharding (it's working well)
2. ⚠️ Reduce buffer sizes for memory-constrained environments
3. ⚠️ Consider flattening map structure if memory is critical
4. ✅ Test with reduced buffers to validate performance
The measured ~75KB includes Docker and Go runtime overhead; the application's own per-connection footprint is likely ~25-30KB, which is more reasonable but still worth optimizing.