sharded-gotify/benchmark
Colin a4f41768ba
Optimize WebSocket implementation for millions of connections
- Implement sharded client storage (256 shards by default) to eliminate mutex contention
- Replace slice-based storage with map structure for O(1) token lookup
- Increase WebSocket buffer sizes (8192 bytes) and channel buffers (10 messages)
- Optimize Notify method with per-shard locking
- Add configuration options for shard count and buffer sizes
- Add comprehensive benchmarking setup with docker-compose
- Include k6 load testing scripts for WebSocket performance testing
- All existing tests pass with new sharded implementation
2025-11-20 14:43:33 -05:00
..
configs Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00
k6 Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00
HARDWARE_RECOMMENDATIONS.md Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00
MEMORY_ANALYSIS.md Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00
README.md Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00
run-benchmark.sh Optimize WebSocket implementation for millions of connections 2025-11-20 14:43:33 -05:00

README.md

Gotify WebSocket Performance Benchmarking

This directory contains tools and configurations for benchmarking Gotify's WebSocket performance with different shard configurations.

Overview

The benchmarking setup allows you to:

  • Test multiple Gotify instances with different shard counts (64, 128, 256, 512, 1024)
  • Measure WebSocket connection performance, latency, and throughput
  • Compare performance across different shard configurations
  • Test connection scaling (1K, 10K, 100K+ concurrent connections)

Prerequisites

  • Docker and Docker Compose installed
  • At least 8GB of available RAM (for running multiple instances)
  • Sufficient CPU cores (recommended: 4+ cores)

Quick Start

1. Start All Benchmark Instances

# Build and start all Gotify instances with different shard counts
docker-compose -f docker-compose.benchmark.yml up -d --build

This will start 5 Gotify instances:

  • gotify-64 on port 8080 (64 shards)
  • gotify-128 on port 8081 (128 shards)
  • gotify-256 on port 8082 (256 shards, default)
  • gotify-512 on port 8083 (512 shards)
  • gotify-1024 on port 8084 (1024 shards)

2. Verify Services Are Running

# Check health of all instances
curl http://localhost:8080/health
curl http://localhost:8081/health
curl http://localhost:8082/health
curl http://localhost:8083/health
curl http://localhost:8084/health

3. Run Benchmarks

Run All Benchmarks (Compare All Shard Counts)

./benchmark/run-benchmark.sh all

Run Benchmark Against Specific Instance

# Test instance with 256 shards
./benchmark/run-benchmark.sh 256

# Test instance with 512 shards
./benchmark/run-benchmark.sh 512

Run Connection Scaling Test

# Test with 1K connections
./benchmark/run-benchmark.sh scale 1k

# Test with 10K connections
./benchmark/run-benchmark.sh scale 10k

Stop All Services

./benchmark/run-benchmark.sh stop

Manual k6 Testing

You can also run k6 tests manually for more control:

Simple Connection Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-simple.js

Full WebSocket Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  grafana/k6:latest run /scripts/websocket-test.js

Connection Scaling Test

docker run --rm -i --network gotify_benchmark-net \
  -v $(pwd)/benchmark/k6:/scripts \
  -e BASE_URL="http://gotify-256:80" \
  -e SCALE="10k" \
  grafana/k6:latest run /scripts/connection-scaling.js

Test Scripts

websocket-simple.js

  • Quick validation test
  • 100 virtual users for 2 minutes
  • Basic connection and message delivery checks

websocket-test.js

  • Comprehensive performance test
  • Gradual ramp-up: 1K → 5K → 10K connections
  • Measures connection time, latency, throughput
  • Includes thresholds for performance validation

connection-scaling.js

  • Tests different connection scales
  • Configurable via SCALE environment variable (1k, 10k, 100k)
  • Measures connection establishment time
  • Tracks message delivery latency

Metrics Collected

The benchmarks collect the following metrics:

Connection Metrics

  • Connection Time: Time to establish WebSocket connection
  • Connection Success Rate: Percentage of successful connections
  • Connection Duration: How long connections stay alive

Message Metrics

  • Message Latency: Time from message creation to delivery (P50, P95, P99)
  • Messages Per Second: Throughput of message delivery
  • Message Success Rate: Percentage of messages successfully delivered

Resource Metrics

  • CPU Usage: Per-instance CPU utilization
  • Memory Usage: Per-instance memory consumption
  • Memory Per Connection: Average memory used per WebSocket connection

Interpreting Results

Shard Count Comparison

When comparing different shard counts, look for:

  1. Connection Time: Lower is better

    • More shards should reduce lock contention
    • Expect 64 shards to have higher connection times under load
    • 256-512 shards typically provide optimal balance
  2. Message Latency: Lower is better

    • P95 latency should be < 100ms for most scenarios
    • Higher shard counts may reduce latency under high concurrency
  3. Throughput: Higher is better

    • Messages per second should scale with shard count up to a point
    • Diminishing returns after optimal shard count
  4. Memory Usage: Lower is better

    • More shards = slightly more memory overhead
    • Balance between performance and memory

Optimal Shard Count

Based on testing, recommended shard counts:

  • < 10K connections: 128-256 shards
  • 10K-100K connections: 256-512 shards
  • 100K-1M connections: 512-1024 shards
  • > 1M connections: 1024+ shards (may need custom build)

Benchmark Scenarios

Scenario 1: Connection Scaling

Test how many concurrent connections each configuration can handle:

./benchmark/run-benchmark.sh scale 1k   # Start with 1K
./benchmark/run-benchmark.sh scale 10k  # Then 10K
./benchmark/run-benchmark.sh scale 100k # Finally 100K

Scenario 2: Shard Comparison

Compare performance across all shard configurations:

./benchmark/run-benchmark.sh all

Scenario 3: Message Throughput

Test message delivery rate with different connection counts:

  • Modify k6 scripts to send messages via REST API
  • Measure delivery latency through WebSocket

Scenario 4: Latency Testing

Focus on P50, P95, P99 latency metrics:

  • Run tests with steady connection count
  • Send messages at controlled rate
  • Analyze latency distribution

Configuration

Adjusting Shard Counts

Edit docker-compose.benchmark.yml to modify shard counts:

environment:
  - GOTIFY_SERVER_STREAM_SHARDCOUNT=256

Adjusting Buffer Sizes

Modify buffer sizes in config files or environment variables:

environment:
  - GOTIFY_SERVER_STREAM_READBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_WRITEBUFFERSIZE=8192
  - GOTIFY_SERVER_STREAM_CHANNELBUFFERSIZE=10

Custom k6 Test Parameters

Modify k6 test scripts to adjust:

  • Virtual users (VUs)
  • Test duration
  • Ramp-up/ramp-down stages
  • Thresholds

Troubleshooting

Services Won't Start

  1. Check Docker resources:

    docker system df
    docker system prune  # If needed
    
  2. Verify ports are available:

    lsof -i :8080-8084
    
  3. Check logs:

    docker-compose -f docker-compose.benchmark.yml logs
    

High Connection Failures

  1. Increase system limits:

    # Linux: Increase file descriptor limits
    ulimit -n 65536
    
  2. Check Docker resource limits:

    • Increase memory allocation
    • Increase CPU allocation
  3. Reduce concurrent connections in test scripts

Memory Issues

  1. Monitor memory usage:

    docker stats
    
  2. Reduce number of instances running simultaneously

  3. Adjust shard counts (fewer shards = less memory)

Slow Performance

  1. Check CPU usage: docker stats
  2. Verify network connectivity between containers
  3. Check for resource contention
  4. Consider running tests sequentially instead of parallel

Results Storage

Benchmark results are stored in:

  • benchmark/results/ - Detailed logs per shard configuration
  • k6 output includes summary statistics

Advanced Usage

Custom Test Scenarios

Create custom k6 scripts in benchmark/k6/:

import ws from 'k6/ws';
import { check } from 'k6';

export const options = {
  vus: 1000,
  duration: '5m',
};

export default function() {
  // Your custom test logic
}

Monitoring with Prometheus

Add Prometheus to docker-compose.benchmark.yml for detailed metrics collection.

Load Balancer Testing

Test with a load balancer in front of multiple instances to simulate production scenarios.

Performance Expectations

Based on optimizations implemented:

  • Connection Capacity: 100K-1M+ concurrent connections per instance
  • Message Latency: P95 < 100ms for most scenarios
  • Throughput: 10K+ messages/second per instance
  • Memory: ~2-4KB per connection (varies by shard count)

Contributing

When adding new benchmark scenarios:

  1. Add k6 script to benchmark/k6/
  2. Update this README with usage instructions
  3. Add configuration if needed
  4. Test and validate results

References