History

Colin a4f41768ba Optimize WebSocket implementation for millions of connections - Implement sharded client storage (256 shards by default) to eliminate mutex contention - Replace slice-based storage with map structure for O(1) token lookup - Increase WebSocket buffer sizes (8192 bytes) and channel buffers (10 messages) - Optimize Notify method with per-shard locking - Add configuration options for shard count and buffer sizes - Add comprehensive benchmarking setup with docker-compose - Include k6 load testing scripts for WebSocket performance testing - All existing tests pass with new sharded implementation		2025-11-20 14:43:33 -05:00
..
configs	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00
k6	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00
HARDWARE_RECOMMENDATIONS.md	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00
MEMORY_ANALYSIS.md	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00
README.md	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00
run-benchmark.sh	Optimize WebSocket implementation for millions of connections	2025-11-20 14:43:33 -05:00

README.md

Gotify WebSocket Performance Benchmarking

This directory contains tools and configurations for benchmarking Gotify's WebSocket performance with different shard configurations.

Overview

The benchmarking setup allows you to:

Test multiple Gotify instances with different shard counts (64, 128, 256, 512, 1024)
Measure WebSocket connection performance, latency, and throughput
Compare performance across different shard configurations
Test connection scaling (1K, 10K, 100K+ concurrent connections)

Prerequisites

Docker and Docker Compose installed
At least 8GB of available RAM (for running multiple instances)
Sufficient CPU cores (recommended: 4+ cores)

Quick Start

1. Start All Benchmark Instances

# Build and start all Gotify instances with different shard counts
docker-compose -f docker-compose.benchmark.yml up -d --build

This will start 5 Gotify instances:

gotify-64 on port 8080 (64 shards)
gotify-128 on port 8081 (128 shards)
gotify-256 on port 8082 (256 shards, default)
gotify-512 on port 8083 (512 shards)
gotify-1024 on port 8084 (1024 shards)

2. Verify Services Are Running

# Check health of all instances
curl http://localhost:8080/health
curl http://localhost:8081/health
curl http://localhost:8082/health
curl http://localhost:8083/health
curl http://localhost:8084/health

3. Run Benchmarks

Run All Benchmarks (Compare All Shard Counts)

./benchmark/run-benchmark.sh all

Run Benchmark Against Specific Instance

# Test instance with 256 shards
./benchmark/run-benchmark.sh 256

# Test instance with 512 shards
./benchmark/run-benchmark.sh 512

Run Connection Scaling Test

# Test with 1K connections
./benchmark/run-benchmark.sh scale 1k

# Test with 10K connections
./benchmark/run-benchmark.sh scale 10k

Stop All Services

./benchmark/run-benchmark.sh stop

Manual k6 Testing

You can also run k6 tests manually for more control:

Simple Connection Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-simple.js

Full WebSocket Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-test.js

Connection Scaling Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  -e SCALE="10k" \
  grafana/k6:latest run /scripts/connection-scaling.js

Test Scripts

`websocket-simple.js`

Quick validation test
100 virtual users for 2 minutes
Basic connection and message delivery checks

`websocket-test.js`

Comprehensive performance test
Gradual ramp-up: 1K → 5K → 10K connections
Measures connection time, latency, throughput
Includes thresholds for performance validation

`connection-scaling.js`

Tests different connection scales
Configurable via SCALE environment variable (1k, 10k, 100k)
Measures connection establishment time
Tracks message delivery latency

Metrics Collected

The benchmarks collect the following metrics:

Connection Metrics

Connection Time: Time to establish WebSocket connection
Connection Success Rate: Percentage of successful connections
Connection Duration: How long connections stay alive

Message Metrics

Message Latency: Time from message creation to delivery (P50, P95, P99)
Messages Per Second: Throughput of message delivery
Message Success Rate: Percentage of messages successfully delivered

Resource Metrics

CPU Usage: Per-instance CPU utilization
Memory Usage: Per-instance memory consumption
Memory Per Connection: Average memory used per WebSocket connection

Interpreting Results

Shard Count Comparison

When comparing different shard counts, look for:

Connection Time: Lower is better
- More shards should reduce lock contention
- Expect 64 shards to have higher connection times under load
- 256-512 shards typically provide optimal balance
Message Latency: Lower is better
- P95 latency should be < 100ms for most scenarios
- Higher shard counts may reduce latency under high concurrency
Throughput: Higher is better
- Messages per second should scale with shard count up to a point
- Diminishing returns after optimal shard count
Memory Usage: Lower is better
- More shards = slightly more memory overhead
- Balance between performance and memory

Optimal Shard Count

Based on testing, recommended shard counts:

< 10K connections: 128-256 shards
10K-100K connections: 256-512 shards
100K-1M connections: 512-1024 shards
> 1M connections: 1024+ shards (may need custom build)

Benchmark Scenarios

Scenario 1: Connection Scaling

Test how many concurrent connections each configuration can handle:

./benchmark/run-benchmark.sh scale 1k   # Start with 1K
./benchmark/run-benchmark.sh scale 10k  # Then 10K
./benchmark/run-benchmark.sh scale 100k # Finally 100K

Scenario 2: Shard Comparison

Compare performance across all shard configurations:

./benchmark/run-benchmark.sh all

Scenario 3: Message Throughput

Test message delivery rate with different connection counts:

Modify k6 scripts to send messages via REST API
Measure delivery latency through WebSocket

Scenario 4: Latency Testing

Focus on P50, P95, P99 latency metrics:

Run tests with steady connection count
Send messages at controlled rate
Analyze latency distribution

Configuration

Adjusting Shard Counts

Edit docker-compose.benchmark.yml to modify shard counts:

environment:
  - GOTIFY_SERVER_STREAM_SHARDCOUNT=256

Adjusting Buffer Sizes

Modify buffer sizes in config files or environment variables:

environment:
  - GOTIFY_SERVER_STREAM_READBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_WRITEBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_CHANNELBUFFERSIZE=10

Custom k6 Test Parameters

Modify k6 test scripts to adjust:

Virtual users (VUs)
Test duration
Ramp-up/ramp-down stages
Thresholds

Troubleshooting

Services Won't Start

Check Docker resources:

docker system df
docker system prune  # If needed

Verify ports are available:
```
lsof -i :8080-8084
```

Check logs:

docker-compose -f docker-compose.benchmark.yml logs

High Connection Failures

Increase system limits:

# Linux: Increase file descriptor limits
ulimit -n 65536

Check Docker resource limits:
- Increase memory allocation
- Increase CPU allocation
Reduce concurrent connections in test scripts

Memory Issues

Monitor memory usage:
```
docker stats
```
Reduce number of instances running simultaneously
Adjust shard counts (fewer shards = less memory)

Slow Performance

Check CPU usage: docker stats
Verify network connectivity between containers
Check for resource contention
Consider running tests sequentially instead of parallel

Results Storage

Benchmark results are stored in:

benchmark/results/ - Detailed logs per shard configuration
k6 output includes summary statistics

Advanced Usage

Custom Test Scenarios

Create custom k6 scripts in benchmark/k6/:

import ws from 'k6/ws';
import { check } from 'k6';

export const options = {
  vus: 1000,
  duration: '5m',
};

export default function() {
  // Your custom test logic
}

Monitoring with Prometheus

Add Prometheus to docker-compose.benchmark.yml for detailed metrics collection.

Load Balancer Testing

Test with a load balancer in front of multiple instances to simulate production scenarios.

Performance Expectations

Based on optimizations implemented:

Connection Capacity: 100K-1M+ concurrent connections per instance
Message Latency: P95 < 100ms for most scenarios
Throughput: 10K+ messages/second per instance
Memory: ~2-4KB per connection (varies by shard count)

Contributing

When adding new benchmark scenarios:

Add k6 script to benchmark/k6/
Update this README with usage instructions
Add configuration if needed
Test and validate results