Benchmarks
Swytch is designed for production workloads where performance and durability both matter. This page presents benchmark results comparing Swytch to Redis under various conditions.
All benchmarks were run on:
- CPU: AMD Ryzen 5 3600 (6 cores / 12 threads)
- RAM: 64GB DDR4
- Storage: Samsung NVMe in RAID0
- OS: Ubuntu 24.04
- Connection: Unix socket (lowest latency)
Swytch’s performance advantage comes from a lock-free architecture designed from the ground up for multicore systems:
- **Lock-free transactional index:** Concurrent reads and writes proceed without blocking each other; no thread waits for another to release a lock.
- **Novel eviction algorithm:** A self-tuning, lock-free eviction system that adapts to your workload in real time. This is an area of active research; expect further improvements as we refine the algorithm.
- **FASTER-inspired storage:** The persistent storage layer uses techniques from Microsoft's FASTER research project, enabling lock-free append-only writes with indexed lookups that don't block the hot path.
Redis processes commands on a single thread. Swytch processes commands in parallel across all available cores while maintaining the same consistency guarantees. The result: near-linear scaling with core count instead of a single-threaded bottleneck.
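The append-only pattern behind the storage layer is easy to sketch. The following is an illustrative Python toy, not Swytch's actual engine: records are only ever appended to the log, and a write becomes visible by publishing its new offset in the index, so a reader of the previous version is never blocked by an in-place overwrite.

```python
import os
import struct
import tempfile

class AppendLog:
    """Append-only value log with an in-memory index (illustrative sketch)."""

    def __init__(self, path):
        self.f = open(path, "ab+")   # created if missing; reads allowed
        self.index = {}              # key -> file offset of latest record

    def put(self, key: bytes, value: bytes) -> None:
        # Records are only ever appended; publishing the new offset in the
        # index is what makes the write visible. Old records are never
        # modified, so a concurrent reader never sees a torn value.
        offset = self.f.seek(0, os.SEEK_END)
        self.f.write(struct.pack(">I", len(value)) + value)
        self.index[key] = offset

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)
        (length,) = struct.unpack(">I", self.f.read(4))
        return self.f.read(length)

log = AppendLog(os.path.join(tempfile.mkdtemp(), "log.bin"))
log.put(b"user:1", b"alice")
log.put(b"user:1", b"bob")   # supersedes the old record instead of overwriting it
assert log.get(b"user:1") == b"bob"
```

In a real engine the index itself must be lock-free (e.g. a hash index with atomic offset swaps, as in FASTER) and the log is periodically compacted; both are elided here.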
Using redis-benchmark with 4 threads, 100 clients, 500K operations per command, 16-byte values, Unix socket:
| Command | Swytch | Redis | Ratio |
|---|---|---|---|
| PING | 199,760 ops/s | 90,876 ops/s | 2.2x |
| SET | 181,620 ops/s | 83,292 ops/s | 2.2x |
| GET | 199,760 ops/s | 86,926 ops/s | 2.3x |
| INCR | 166,556 ops/s | 83,292 ops/s | 2.0x |
| LPUSH | 181,686 ops/s | 83,306 ops/s | 2.2x |
| LPOP | 181,620 ops/s | 83,306 ops/s | 2.2x |
| SADD | 166,611 ops/s | 90,876 ops/s | 1.8x |
| HSET | 153,799 ops/s | 83,278 ops/s | 1.8x |
| LRANGE_100 | 133,191 ops/s | 64,483 ops/s | 2.1x |
| LRANGE_600 | 47,519 ops/s | 28,128 ops/s | 1.7x |
| MSET (10 keys) | 111,062 ops/s | 64,483 ops/s | 1.7x |
Swytch achieves 2x or better throughput on most operations while providing full per-operation durability. Redis was configured with `appendfsync everysec` (up to one second of potential data loss).
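The ratio column is simply Swytch throughput divided by Redis throughput, e.g. for PING:

```python
swytch_ping, redis_ping = 199_760, 90_876   # ops/s, from the table above
ratio = swytch_ping / redis_ping
print(f"{ratio:.1f}x")   # → 2.2x
```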
High-throughput pipeline test (4 threads, 10 clients, pipeline 50, 10M write operations):
| System | Throughput | p50 Latency |
|---|---|---|
| Swytch | 643,836 ops/s | 2.34ms |
| Redis | 622,854 ops/s | 3.18ms |
Large value test (4 threads, 20 clients, 4KB values, 1:10 write:read ratio, rate-limited):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 203,266 ops/s | 86,238 ops/s | 2.4x |
| GET p50 | 0.35ms | 0.90ms | 2.6x |
| GET p99 | 1.00ms | 1.76ms | 1.8x |
Zipf 0.99 distribution (4 threads, 50 clients, 256-byte values, 1:10 write:read ratio):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 213,912 ops/s | 89,235 ops/s | 2.4x |
| p50 | 0.91ms | 2.21ms | 2.4x |
| p99 | 1.53ms | 4.38ms | 2.9x |
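A Zipf key pattern means a small set of hot keys dominates the request stream, and with exponent 0.99 the skew is substantial. A quick way to see how skewed the access distribution is (the key count matches the memtier run; the rest is just the Zipf definition):

```python
def zipf_weights(n: int, s: float) -> list[float]:
    # P(rank k) ∝ 1 / k^s, normalized over n keys
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(100_000, 0.99)
top_100_share = sum(weights[:100])
print(f"top 100 of 100,000 keys get {top_100_share:.0%} of requests")
```

With s = 0.99, roughly 40% of all requests land on the hottest 100 keys, which is why cache-friendliness under contention dominates this benchmark.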
High-concurrency pipeline (8 threads, 100 clients, pipeline 10, Zipf 1.1, 128-byte values):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 1,473,367 ops/s | 580,041 ops/s | 2.5x |
| p50 | 4.51ms | 13.63ms | 3.0x |
| p99 | 19.20ms | 26.88ms | 1.4x |
We replayed production cache traces from published academic datasets to measure real-world hit rates and backend impact. These traces capture actual access patterns from hyperscale deployments.
Swytch can use disk as extended storage, not just for durability. While Redis evicts data when RAM fills up, Swytch transparently tiers cold data to disk and serves it with minimal latency penalty. This is ideal for key-value store workloads where you want all data accessible, or highly cacheable workloads with predictable access patterns.
Alibaba Block Storage Trace (10MB RAM, 8.6M operations, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 99.69% | 33.06% |
| Hits | 8,581,109 | 2,845,487 |
| Misses | 26,317 | 5,761,939 |
| Throughput | 14,253 ops/s | 12,951 ops/s |
| Avg GET Latency | 70.2µs | 77.2µs |
| Backend Reduction | 99.5% | baseline |
Redis is RAM-only: when the 10MB limit is reached, it evicts data aggressively, resulting in a 33% hit rate. Swytch keeps hot data in RAM and tiers the rest to NVMe, achieving near-perfect hit rates with **no meaningful latency penalty** (70µs avg).
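The hit rates follow directly from the hit and miss counts in the table:

```python
def hit_rate(hits: int, misses: int) -> float:
    return hits / (hits + misses)

print(f"Swytch: {hit_rate(8_581_109, 26_317):.2%}")    # → Swytch: 99.69%
print(f"Redis:  {hit_rate(2_845_487, 5_761_939):.2%}")  # → Redis:  33.06%
```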
When to use this mode:
- Key-value store replacing a database, where all keys should remain accessible
- Workloads with predictable, cacheable access patterns
- Situations where disk is cheap but cache misses are expensive
When NOT to use this mode:
- Traditional cache-aside patterns where misses are expected and acceptable
- Workloads with unbounded key growth (disk isn’t infinite either)
For pure caching workloads, see the memory-constrained benchmarks below where both systems operate under the same RAM limits.
When both systems are constrained to the same RAM limit—a fair apples-to-apples comparison—Swytch’s adaptive eviction algorithm outperforms Redis LRU.
Alibaba Block Storage (18MB cache limit, 48-hour trace):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 82.13% | 73.31% |
| Backend Reduction | 33% | — |
Swytch maintains a 9 percentage point advantage in hit rate under identical memory constraints. The algorithm accounts for access frequency, recency, and object size—not just recency like Redis LRU.
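Swytch's exact algorithm isn't spelled out here, but the general idea of combining the three signals can be sketched. The functional form and the half-life constant below are hypothetical, in the spirit of GreedyDual-style size-aware policies; the cache evicts the entry with the lowest score:

```python
def keep_score(freq: int, seconds_since_access: float, size_bytes: int,
               half_life: float = 60.0) -> float:
    # Frequency and recency raise the score; size lowers it, so a cold
    # 1 MB object is evicted before a hot 100-byte one.
    recency = 0.5 ** (seconds_since_access / half_life)
    return freq * recency / size_bytes

hot_small = keep_score(freq=50, seconds_since_access=5, size_bytes=100)
cold_large = keep_score(freq=50, seconds_since_access=300, size_bytes=1_000_000)
assert hot_small > cold_large   # the hot small object survives
```

Pure LRU collapses all of this to recency alone, which is why it gives a megabyte-sized cold object the same treatment as a hot 100-byte counter.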
When memory is sufficient for the working set, both systems achieve similar hit rates, but Swytch maintains its throughput advantage.
Alibaba Block Storage (40GB cache, 48-hour trace, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 99.69% | 99.69% |
| Throughput | 21,186 ops/s | 19,493 ops/s |
| Avg GET Latency | 47.2µs | 51.3µs |
Twitter Cluster (40GB cache, 20-minute trace, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 86.94% | 86.95% |
| Throughput | 29,793 ops/s | 30,909 ops/s |
| Avg GET Latency | 33.6µs | 32.4µs |
Hit rates are effectively identical. Choose Swytch for the durability guarantees without sacrificing performance.
Tencent Photo CDN (40GB cache, 5.5M operations, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 41.94% | 39.14% |
| Backend Reduction | 4.6% | — |
| RPS Saved | 128 req/s | — |
With highly variable object sizes (typical of CDN workloads), Swytch’s size-aware eviction provides a modest but consistent advantage.
With adequate memory, Swytch delivers a larger share of requests in the sub-100µs bucket:
GET Latency (Alibaba trace, adequate memory):
| Bucket | Swytch | Redis |
|---|---|---|
| <100µs | 98.7% | 96.0% |
| 100µs–500µs | 1.2% | 4.0% |
| 500µs–1ms | 0.0% | 0.0% |
GET Latency (Alibaba trace, memory-constrained 18MB):
| Bucket | Swytch | Redis |
|---|---|---|
| <100µs | 95.4% | 97.8% |
| 100µs–500µs | 4.4% | 2.2% |
| 500µs–1ms | 0.0% | 0.0% |
Under memory pressure, Swytch trades slightly more latency variance for dramatically better hit rates—a worthwhile tradeoff when each miss costs a database round-trip.
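The tradeoff is easy to quantify: what the application feels is the effective GET latency including misses. Using the 18MB-constrained hit rates, the average cache latencies from the Alibaba runs above as a stand-in, and a hypothetical 1000µs database round-trip per miss:

```python
def effective_get_latency_us(hit_rate: float, cache_us: float,
                             miss_penalty_us: float) -> float:
    # Hits cost the cache lookup; misses cost the backend round-trip.
    return hit_rate * cache_us + (1 - hit_rate) * miss_penalty_us

# 1000 µs DB round-trip is an assumption, not a measured number
swytch = effective_get_latency_us(0.8213, 70.2, 1000)   # ≈ 236 µs
redis = effective_get_latency_us(0.7331, 77.2, 1000)    # ≈ 323 µs
```

The higher hit rate dominates: even with slightly more latency variance per hit, the effective latency seen by the application drops by roughly a quarter.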
Swytch’s tiered storage provides strong durability (at most 10ms of data loss) with minimal performance impact.
Test: memtier_benchmark, 4 threads, 50 clients, 256-byte values, Unix socket
| Workload | Write-Through | Ghost Mode |
|---|---|---|
| 100% writes | 247,000 ops/s | 397,000 ops/s |
| 100% reads | 418,000 ops/s | — |
| 50/50 mixed | 336,000 ops/s | — |
| Workload | Mode | p50 | p99 | p99.9 |
|---|---|---|---|---|
| 100% writes | Write-through | 0.52ms | 3.36ms | 6.50ms |
| 100% writes | Ghost | 0.43ms | 2.19ms | 5.41ms |
| 100% reads | Write-through | 0.42ms | 1.77ms | 4.51ms |
| 50/50 mixed | Write-through | 0.44ms | 2.98ms | 5.44ms |
Write-through mode (full durability) adds minimal latency overhead. Ghost mode (write-back) offers higher write throughput when eventual persistence is acceptable.
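A durability window translates directly into writes at risk. At the write-through throughput above, comparing Swytch's 10ms window against Redis's `appendfsync everysec` (1s):

```python
write_rate = 247_000   # ops/s, write-through throughput from the table above

def writes_at_risk(window_seconds: float) -> int:
    # Upper bound on acknowledged writes lost if the process dies
    # just before the next sync.
    return int(write_rate * window_seconds)

print(writes_at_risk(0.010))   # Swytch, 10 ms window  → 2470
print(writes_at_risk(1.0))     # Redis, everysec       → 247000
```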
| Scenario | Swytch Advantage |
|---|---|
| Single-op throughput | 1.7-2.3x faster |
| Pipeline throughput | 2.5x faster at high concurrency |
| Disk-tiered storage | Near-perfect hit rates with NVMe backend |
| Memory-constrained caching | 9 percentage points better hit rate |
| Durability | 10ms vs 1000ms max data loss |
Swytch delivers higher throughput, lower latency, and better cache efficiency under memory pressure—all while providing stronger durability guarantees than Redis. For workloads that benefit from disk-tiered storage, Swytch can serve as a high-performance key-value store with near-perfect data availability.
```shell
# Single operations (matches our test parameters)
redis-benchmark -s /path/to/socket \
  -t ping_inline,ping_mbulk,set,get,incr,lpush,rpush,lpop,rpop,sadd,hset,spop,lrange_100,lrange_300,lrange_500,lrange_600,mset \
  --csv -d 16 --threads 4 -c 100 -n 500000

# High-throughput pipeline (write-heavy)
memtier_benchmark --protocol=redis -S /path/to/socket \
  -t 4 -c 10 --pipeline=50 \
  --key-minimum=1 --key-maximum=10000000 \
  --key-pattern=P:P --ratio=1:0 -n allkeys \
  --hide-histogram

# Large values with rate limiting
memtier_benchmark --protocol=redis -S /path/to/socket \
  -t 4 -c 20 --pipeline=1 --rate-limiting=50000 \
  --key-minimum=1 --key-maximum=2000000 \
  --ratio=1:10 -n 200000 --data-size=4096 \
  --hide-histogram

# Zipf distribution (hot keys)
memtier_benchmark --protocol=redis -S /path/to/socket \
  --threads=4 --clients=50 --requests=100000 \
  --ratio=1:10 --key-pattern=Z:Z \
  --key-zipf-exp=0.99 --key-maximum=100000 \
  --data-size=256 --hide-histogram

# High-concurrency pipeline
memtier_benchmark --protocol=redis -S /path/to/socket \
  --threads=8 --clients=100 --requests=100000 \
  --ratio=1:20 --key-pattern=Z:Z \
  --key-zipf-exp=1.1 --key-maximum=50000 \
  --data-size=128 --pipeline=10 --hide-histogram
```
Our trace-bench tool replays real production traces against both Redis and Swytch:
```shell
# Memory-constrained with persistence
./trace-bench --real --real-vsize \
  --swytch-path ./swytch \
  --time-limit 48h \
  --gb 0.010 --ram 1 --cpus 4 \
  --noscale \
  --trace alibabaBlock_277.oracleGeneral.zst \
  --persistent-everysec

# Adequate memory
./trace-bench --real --real-vsize \
  --swytch-path ./swytch \
  --time-limit 48h \
  --gb 40 --ram 50 --cpus 16 \
  --noscale \
  --trace alibabaBlock_277.oracleGeneral.zst
```
Trace files are available from the CacheMon cache_dataset project.