# Gotify WebSocket Performance Benchmarking
This directory contains tools and configurations for benchmarking Gotify's WebSocket performance with different shard configurations.
## Overview
The benchmarking setup allows you to:
- Test multiple Gotify instances with different shard counts (64, 128, 256, 512, 1024)
- Measure WebSocket connection performance, latency, and throughput
- Compare performance across different shard configurations
- Test connection scaling (1K, 10K, 100K+ concurrent connections)
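The shard count under test controls how a sharded connection registry splits its client map: each user hashes to one shard, and each shard has its own lock, so connects and broadcasts touching different shards never contend. A minimal Python sketch of the idea (the hash function and data layout here are illustrative, not Gotify's actual implementation):

```python
import hashlib
import threading

class ShardedRegistry:
    """Connection registry split into N independently locked shards."""

    def __init__(self, shard_count=256):
        self.shard_count = shard_count
        # One lock per shard: registrations on different shards never contend.
        self.shards = [{"lock": threading.Lock(), "clients": {}}
                       for _ in range(shard_count)]

    def shard_for(self, user_id):
        # Stable hash -> shard index; all sessions of a user land on one shard.
        digest = hashlib.sha256(str(user_id).encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.shard_count

    def register(self, user_id, conn):
        shard = self.shards[self.shard_for(user_id)]
        with shard["lock"]:
            shard["clients"].setdefault(user_id, []).append(conn)
```

More shards means finer-grained locking (less contention under load) at the cost of a little more bookkeeping memory, which is exactly the trade-off these benchmarks measure.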
## Prerequisites
- Docker and Docker Compose installed
- At least 8GB of available RAM (for running multiple instances)
- Sufficient CPU cores (recommended: 4+ cores)
## Quick Start
### 1. Start All Benchmark Instances
```bash
# Build and start all Gotify instances with different shard counts
docker-compose -f docker-compose.benchmark.yml up -d --build
```
This will start 5 Gotify instances:
- `gotify-64` on port 8080 (64 shards)
- `gotify-128` on port 8081 (128 shards)
- `gotify-256` on port 8082 (256 shards, default)
- `gotify-512` on port 8083 (512 shards)
- `gotify-1024` on port 8084 (1024 shards)
### 2. Verify Services Are Running
```bash
# Check health of all instances
curl http://localhost:8080/health
curl http://localhost:8081/health
curl http://localhost:8082/health
curl http://localhost:8083/health
curl http://localhost:8084/health
```
### 3. Run Benchmarks
#### Run All Benchmarks (Compare All Shard Counts)
```bash
./benchmark/run-benchmark.sh all
```
#### Run Benchmark Against Specific Instance
```bash
# Test instance with 256 shards
./benchmark/run-benchmark.sh 256
# Test instance with 512 shards
./benchmark/run-benchmark.sh 512
```
#### Run Connection Scaling Test
```bash
# Test with 1K connections
./benchmark/run-benchmark.sh scale 1k
# Test with 10K connections
./benchmark/run-benchmark.sh scale 10k
```
#### Stop All Services
```bash
./benchmark/run-benchmark.sh stop
```
## Manual k6 Testing
You can also run k6 tests manually for more control:
### Simple Connection Test
```bash
docker run --rm -i --network gotify_benchmark-net \
  -v "$(pwd)/benchmark/k6:/scripts" \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-simple.js
```
### Full WebSocket Test
```bash
docker run --rm -i --network gotify_benchmark-net \
  -v "$(pwd)/benchmark/k6:/scripts" \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-test.js
```
### Connection Scaling Test
```bash
docker run --rm -i --network gotify_benchmark-net \
  -v "$(pwd)/benchmark/k6:/scripts" \
  -e BASE_URL="http://gotify-256:80" \
  -e SCALE="10k" \
  grafana/k6:latest run /scripts/connection-scaling.js
```
## Test Scripts
### `websocket-simple.js`
- Quick validation test
- 100 virtual users for 2 minutes
- Basic connection and message delivery checks
### `websocket-test.js`
- Comprehensive performance test
- Gradual ramp-up: 1K → 5K → 10K connections
- Measures connection time, latency, throughput
- Includes thresholds for performance validation
### `connection-scaling.js`
- Tests different connection scales
- Configurable via `SCALE` environment variable (1k, 10k, 100k)
- Measures connection establishment time
- Tracks message delivery latency
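The `SCALE` values are shorthand for connection counts. A small parser like the following turns them into numbers (a hypothetical sketch of what the scaling script does; the actual `connection-scaling.js` logic may differ):

```python
def parse_scale(scale):
    """Turn a SCALE value like '1k', '10k', or '100k' into a connection count."""
    scale = scale.strip().lower()
    if scale.endswith("k"):
        return int(float(scale[:-1]) * 1000)
    return int(scale)
```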
## Metrics Collected
The benchmarks collect the following metrics:
### Connection Metrics
- **Connection Time**: Time to establish WebSocket connection
- **Connection Success Rate**: Percentage of successful connections
- **Connection Duration**: How long connections stay alive
### Message Metrics
- **Message Latency**: Time from message creation to delivery (P50, P95, P99)
- **Messages Per Second**: Throughput of message delivery
- **Message Success Rate**: Percentage of messages successfully delivered
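P50, P95, and P99 are percentiles over the collected latency samples. k6 computes these for you; the sketch below just shows the math, using the nearest-rank method (one common convention among several):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest sample >= p% of the distribution."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * p // 100)  # ceil(n * p / 100)
    return ordered[max(int(rank), 1) - 1]
```

A P95 of 80 ms means 95% of messages were delivered within 80 ms; the tail percentiles (P95/P99) are usually where shard-count differences show up first.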
### Resource Metrics
- **CPU Usage**: Per-instance CPU utilization
- **Memory Usage**: Per-instance memory consumption
- **Memory Per Connection**: Average memory used per WebSocket connection
## Interpreting Results
### Shard Count Comparison
When comparing different shard counts, look for:
1. **Connection Time**: Lower is better
- More shards should reduce lock contention
- Expect 64 shards to have higher connection times under load
- 256-512 shards typically provide optimal balance
2. **Message Latency**: Lower is better
- P95 latency should be < 100ms for most scenarios
- Higher shard counts may reduce latency under high concurrency
3. **Throughput**: Higher is better
- Messages per second should scale with shard count up to a point
- Diminishing returns after optimal shard count
4. **Memory Usage**: Lower is better
- More shards = slightly more memory overhead
- Balance between performance and memory
### Optimal Shard Count
Based on testing, the recommended shard counts are:
- **< 10K connections**: 128-256 shards
- **10K-100K connections**: 256-512 shards
- **100K-1M connections**: 512-1024 shards
- **> 1M connections**: 1024+ shards (may need a custom build)
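The bands above can be written as a small lookup helper (the thresholds come straight from the list; the function name and tuple return are ours):

```python
def recommended_shards(connections):
    """Shard-count band suggested above for an expected connection load."""
    if connections < 10_000:
        return (128, 256)
    if connections < 100_000:
        return (256, 512)
    if connections <= 1_000_000:
        return (512, 1024)
    return (1024, None)  # 1024+ shards; beyond 1M may need a custom build
```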
## Benchmark Scenarios
### Scenario 1: Connection Scaling
Test how many concurrent connections each configuration can handle:
```bash
./benchmark/run-benchmark.sh scale 1k # Start with 1K
./benchmark/run-benchmark.sh scale 10k # Then 10K
./benchmark/run-benchmark.sh scale 100k # Finally 100K
```
### Scenario 2: Shard Comparison
Compare performance across all shard configurations:
```bash
./benchmark/run-benchmark.sh all
```
### Scenario 3: Message Throughput
Test message delivery rate with different connection counts:
- Modify k6 scripts to send messages via REST API
- Measure delivery latency through WebSocket
### Scenario 4: Latency Testing
Focus on P50, P95, P99 latency metrics:
- Run tests with steady connection count
- Send messages at controlled rate
- Analyze latency distribution
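"Controlled rate" here just means fixed pacing per sender: divide the target aggregate rate across the virtual users and sleep the corresponding gap between sends. The arithmetic, as a sketch:

```python
def send_interval_ms(target_rate_per_sec, senders):
    """Gap (ms) each sender waits so the aggregate send rate hits the target."""
    return 1000.0 * senders / target_rate_per_sec
```

For example, 100 VUs targeting 1,000 messages/second in total should each pause 100 ms between sends.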
## Configuration
### Adjusting Shard Counts
Edit `docker-compose.benchmark.yml` to modify shard counts:
```yaml
environment:
  - GOTIFY_SERVER_STREAM_SHARDCOUNT=256
```
### Adjusting Buffer Sizes
Modify buffer sizes in config files or environment variables:
```yaml
environment:
  - GOTIFY_SERVER_STREAM_READBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_WRITEBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_CHANNELBUFFERSIZE=10
```
### Custom k6 Test Parameters
Modify k6 test scripts to adjust:
- Virtual users (VUs)
- Test duration
- Ramp-up/ramp-down stages
- Thresholds
## Troubleshooting
### Services Won't Start
1. Check Docker resources:
```bash
docker system df
docker system prune # If needed
```
2. Verify ports are available:
```bash
lsof -i :8080-8084
```
3. Check logs:
```bash
docker-compose -f docker-compose.benchmark.yml logs
```
### High Connection Failures
1. Increase system limits:
```bash
# Linux: Increase file descriptor limits
ulimit -n 65536
```
2. Check Docker resource limits:
- Increase memory allocation
- Increase CPU allocation
3. Reduce concurrent connections in test scripts
### Memory Issues
1. Monitor memory usage:
```bash
docker stats
```
2. Reduce number of instances running simultaneously
3. Adjust shard counts (fewer shards = less memory)
### Slow Performance
1. Check CPU usage: `docker stats`
2. Verify network connectivity between containers
3. Check for resource contention
4. Consider running tests sequentially instead of parallel
## Results Storage
Benchmark results are stored in:
- `benchmark/results/` - Detailed logs per shard configuration
- k6 output includes summary statistics
## Advanced Usage
### Custom Test Scenarios
Create custom k6 scripts in `benchmark/k6/`:
```javascript
import ws from 'k6/ws';
import { check } from 'k6';

export const options = {
  vus: 1000,
  duration: '5m',
};

export default function () {
  // Your custom test logic
}
```
### Monitoring with Prometheus
Add Prometheus to `docker-compose.benchmark.yml` for detailed metrics collection.
### Load Balancer Testing
Test with a load balancer in front of multiple instances to simulate production scenarios.
## Performance Expectations
Based on the optimizations implemented, expect roughly:
- **Connection Capacity**: 100K-1M+ concurrent connections per instance
- **Message Latency**: P95 < 100ms for most scenarios
- **Throughput**: 10K+ messages/second per instance
- **Memory**: ~2-4KB per connection (varies by shard count)
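The ~2-4 KiB/connection figure translates directly into instance sizing. A back-of-envelope estimator (assuming the per-connection figure above; real usage varies with shard count and buffer sizes):

```python
def estimated_memory_mib(connections, bytes_per_conn=4 * 1024):
    """Back-of-envelope instance memory from the ~2-4 KiB/connection figure."""
    return connections * bytes_per_conn / (1024 * 1024)
```

At 4 KiB per connection, 100K connections need roughly 390 MiB just for connection state, which is why the Prerequisites section asks for 8 GB of RAM when running five instances side by side.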
## Contributing
When adding new benchmark scenarios:
1. Add k6 script to `benchmark/k6/`
2. Update this README with usage instructions
3. Add configuration if needed
4. Test and validate results
## References
- [k6 WebSocket Documentation](https://k6.io/docs/javascript-api/k6-ws/)
- [Gotify Configuration](https://gotify.net/docs/config)
- [WebSocket Performance Best Practices](https://www.ably.com/topic/websockets)