sharded-gotify/benchmark/HARDWARE_RECOMMENDATIONS.md

235 lines
6.5 KiB
Markdown

# Hardware Recommendations for WebSocket Benchmarking
## Testing Millions of WebSocket Connections
To properly test Gotify's ability to handle millions of concurrent WebSocket connections, you need to consider several hardware and system factors.
## M4 Mac Mini Considerations
### Pros:
- **Powerful CPU**: M4 chip has excellent single-threaded and multi-threaded performance
- **Unified Memory**: Fast memory access
- **Energy Efficient**: Can run tests for extended periods
### Cons:
- **Limited RAM Options**: Max 24GB (M4 Pro) or 36GB (M4 Max) - may be limiting for millions of connections
- **macOS Limitations**:
- Lower default file descriptor limits (~10,000)
- Docker Desktop overhead
- Network stack differences from Linux
- **Memory Per Connection**:
- Each WebSocket connection uses ~2-4KB of memory
- 1M connections = ~2-4GB just for connections
- Plus OS overhead, buffers, etc.
- Realistically need 8-16GB+ for 1M connections
### M4 Mac Mini Verdict:
**Good for**: Testing up to 100K-500K connections, development, validation
**Limited for**: Testing true millions of connections (1M+)
## Recommended Hardware for Full-Scale Testing
### Option 1: Linux Server (Recommended)
**Best for: 1M+ connections**
**Minimum Specs:**
- **CPU**: 8+ cores (Intel Xeon or AMD EPYC)
- **RAM**: 32GB+ (64GB+ recommended for 1M+ connections)
- **Network**: 10Gbps+ network interface
- **OS**: Linux (Ubuntu 22.04+ or similar)
- **Storage**: SSD for database
**Why Linux:**
- Higher file descriptor limits (can be set to 1M+)
- Better network stack performance
- Native Docker (no Desktop overhead)
- More control over system resources
**System Tuning Required:**
```bash
# Increase file descriptor limits
ulimit -n 1000000
echo "* soft nofile 1000000" >> /etc/security/limits.conf
echo "* hard nofile 1000000" >> /etc/security/limits.conf
# Network tuning
echo 'net.core.somaxconn = 65535' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog = 65535' >> /etc/sysctl.conf
echo 'net.ipv4.ip_local_port_range = 1024 65535' >> /etc/sysctl.conf
sysctl -p
```
### Option 2: Cloud Instance (AWS/GCP/Azure)
**Best for: Flexible scaling**
**Recommended Instance Types:**
- **AWS**: c6i.4xlarge or larger (16+ vCPUs, 32GB+ RAM)
- **GCP**: n2-standard-16 or larger
- **Azure**: Standard_D16s_v3 or larger
**Benefits:**
- Can scale up/down as needed
- High network bandwidth
- Easy to replicate test environments
- Can test with multiple instances
### Option 3: Dedicated Server
**Best for: Consistent long-term testing**
- **CPU**: 16+ cores
- **RAM**: 64GB+
- **Network**: 10Gbps+
- **Cost**: $200-500/month for good hardware
## Memory Requirements by Connection Count
| Connections | Estimated RAM | Recommended RAM | Notes |
|------------|---------------|------------------|-------|
| 10K | 2-4GB | 8GB | M4 Mac Mini ✅ |
| 100K | 4-8GB | 16GB | M4 Mac Mini ⚠️ (may work) |
| 500K | 8-16GB | 32GB | M4 Mac Mini ❌ |
| 1M | 16-32GB | 64GB | Linux Server ✅ |
| 5M | 80-160GB | 256GB | High-end Server ✅ |
*Note: RAM estimates include OS, Docker, and application overhead*
## Network Requirements
### Bandwidth Calculation:
- Each WebSocket connection: ~1-2KB initial handshake
- Ping/pong messages: ~10 bytes every 45 seconds
- Message delivery: Variable (depends on message size)
**For 1M connections:**
- Initial connection burst: ~1-2GB
- Sustained: ~100-200MB/s for ping/pong
- Message delivery: Depends on message rate
**Recommendation**: 10Gbps network for 1M+ connections
## Testing Strategy by Hardware
### M4 Mac Mini (24-36GB RAM)
1. **Start Small**: Test with 10K connections
2. **Scale Gradually**: 50K → 100K → 250K
3. **Monitor Memory**: Watch for OOM conditions
4. **Focus on**: Shard comparison, latency testing, throughput at moderate scale
**Commands:**
```bash
# Test with 10K connections
./benchmark/run-benchmark.sh scale 1k
# Compare shard configurations
./benchmark/run-benchmark.sh all
```
### Linux Server (32GB+ RAM)
1. **Full Scale Testing**: 100K → 500K → 1M+ connections
2. **Multiple Instances**: Test horizontal scaling
3. **Stress Testing**: Find breaking points
4. **Production Simulation**: Real-world scenarios
**Commands:**
```bash
# Test with 100K connections
SCALE=10k ./benchmark/run-benchmark.sh scale 10k
# Test with custom high connection count
# (modify k6 scripts for higher VU counts)
```
## System Limits to Check
### macOS (M4 Mac Mini)
```bash
# Check current limits
ulimit -n # File descriptors
sysctl kern.maxfiles # Max open files
sysctl kern.maxfilesperproc # Max files per process
# Increase limits (temporary)
ulimit -n 65536
```
### Linux
```bash
# Check limits
ulimit -n
cat /proc/sys/fs/file-max
# Increase limits (permanent)
# Edit /etc/security/limits.conf
```
## Docker Considerations
### Docker Desktop (macOS)
- **Overhead**: ~2-4GB RAM for Docker Desktop
- **Performance**: Slightly slower than native Linux
- **Limits**: Subject to macOS system limits
### Native Docker (Linux)
- **Overhead**: Minimal (~500MB)
- **Performance**: Near-native
- **Limits**: Can use full system resources
## Recommendations
### For Development & Initial Testing:
**M4 Mac Mini is fine**
- Test up to 100K-250K connections
- Validate shard configurations
- Test latency and throughput
- Develop and debug
### For Production-Scale Testing:
**Use Linux Server**
- Test 1M+ connections
- Validate true scalability
- Stress testing
- Production simulation
### Hybrid Approach:
1. **Develop on M4 Mac Mini**: Quick iteration, smaller scale tests
2. **Validate on Linux Server**: Full-scale testing before production
## Quick Start on M4 Mac Mini
```bash
# 1. Increase file descriptor limits
ulimit -n 65536
# 2. Start single instance for testing
docker-compose -f docker-compose.benchmark.yml up -d gotify-256
# 3. Run small-scale test
./benchmark/run-benchmark.sh 256 websocket-simple.js
# 4. Monitor resources
docker stats gotify-bench-256
```
## Quick Start on Linux Server
```bash
# 1. Tune system limits (see above)
# 2. Start all instances
docker-compose -f docker-compose.benchmark.yml up -d --build
# 3. Run full-scale test
./benchmark/run-benchmark.sh scale 10k
# 4. Monitor system resources
htop
docker stats
```
## Conclusion
**M4 Mac Mini**: Great for development and testing up to ~250K connections
**Linux Server**: Required for testing true millions of connections
Start with the M4 Mac Mini to validate the setup and optimizations, then move to a Linux server for full-scale production validation.