Benchmarks
Swytch is designed for production workloads where performance and durability both matter. This page presents benchmark results comparing Swytch to Redis under various conditions.
All benchmarks were run on:
- CPU: AMD Ryzen 5 3600 (6 cores / 12 threads)
- RAM: 64GB DDR4
- Storage: Samsung NVMe in RAID0
- OS: Ubuntu 24.04
- Connection: Unix socket (lowest latency)
Swytch’s performance advantage comes from a lock-free architecture designed from the ground up for multicore systems:
- **Lock-free transactional index:** Concurrent reads and writes proceed without blocking each other; no thread waits for another to release a lock.
- **Novel eviction algorithm:** A self-tuning, lock-free eviction system that adapts to your workload in real time. This is an area of active research; expect further improvements as we refine the algorithm.
- **FASTER-inspired storage:** The persistent storage layer uses techniques from Microsoft's FASTER research project, enabling lock-free append-only writes with indexed lookups that don't block the hot path.
Redis processes commands on a single thread. Swytch processes commands in parallel across all available cores while maintaining the same consistency guarantees. The result: near-linear scaling with core count instead of a single-threaded bottleneck.
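The append-only pattern behind the storage layer is easy to sketch. The following is an illustrative Python toy, not Swytch's actual engine: records are only ever appended to the log, and a write becomes visible by publishing its new offset in the index, so a reader of the previous version is never blocked by an in-place overwrite.

```python
import os
import struct
import tempfile

class AppendLog:
    """Append-only value log with an in-memory index (illustrative sketch)."""

    def __init__(self, path):
        self.f = open(path, "ab+")   # created if missing; reads allowed
        self.index = {}              # key -> file offset of latest record

    def put(self, key: bytes, value: bytes) -> None:
        # Records are only ever appended; publishing the new offset in the
        # index is what makes the write visible. Old records are never
        # modified, so a concurrent reader never sees a torn value.
        offset = self.f.seek(0, os.SEEK_END)
        self.f.write(struct.pack(">I", len(value)) + value)
        self.index[key] = offset

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.f.seek(offset)
        (length,) = struct.unpack(">I", self.f.read(4))
        return self.f.read(length)

log = AppendLog(os.path.join(tempfile.mkdtemp(), "log.bin"))
log.put(b"user:1", b"alice")
log.put(b"user:1", b"bob")   # supersedes the old record instead of overwriting it
assert log.get(b"user:1") == b"bob"
```

In a real engine the index itself must be lock-free (e.g. a hash index with atomic offset swaps, as in FASTER) and the log is periodically compacted; both are elided here.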
Using redis-benchmark with 4 threads, 100 clients, 500K operations per command, 16-byte values, Unix socket:
| Command | Swytch | Redis | Ratio |
|---|---|---|---|
| PING | 199,760 ops/s | 90,876 ops/s | 2.2x |
| SET | 181,620 ops/s | 83,292 ops/s | 2.2x |
| GET | 199,760 ops/s | 86,926 ops/s | 2.3x |
| INCR | 166,556 ops/s | 83,292 ops/s | 2.0x |
| LPUSH | 181,686 ops/s | 83,306 ops/s | 2.2x |
| LPOP | 181,620 ops/s | 83,306 ops/s | 2.2x |
| SADD | 166,611 ops/s | 90,876 ops/s | 1.8x |
| HSET | 153,799 ops/s | 83,278 ops/s | 1.8x |
| LRANGE_100 | 133,191 ops/s | 64,483 ops/s | 2.1x |
| LRANGE_600 | 47,519 ops/s | 28,128 ops/s | 1.7x |
| MSET (10 keys) | 111,062 ops/s | 64,483 ops/s | 1.7x |
Swytch achieves 2x or better throughput on most operations while providing full per-operation durability. Redis was configured with `appendfsync everysec` (up to one second of potential data loss).
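The ratio column is simply Swytch throughput divided by Redis throughput, e.g. for PING:

```python
swytch_ping, redis_ping = 199_760, 90_876   # ops/s, from the table above
ratio = swytch_ping / redis_ping
print(f"{ratio:.1f}x")   # → 2.2x
```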
High-throughput pipeline test (4 threads, 10 clients, pipeline 50, 10M write operations):
| System | Throughput | p50 Latency |
|---|---|---|
| Swytch | 643,836 ops/s | 2.34ms |
| Redis | 622,854 ops/s | 3.18ms |
Large value test (4 threads, 20 clients, 4KB values, 1:10 write:read ratio, rate-limited):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 203,266 ops/s | 86,238 ops/s | 2.4x |
| GET p50 | 0.35ms | 0.90ms | 2.6x |
| GET p99 | 1.00ms | 1.76ms | 1.8x |
Zipf 0.99 distribution (4 threads, 50 clients, 256-byte values, 1:10 write:read ratio):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 213,912 ops/s | 89,235 ops/s | 2.4x |
| p50 | 0.91ms | 2.21ms | 2.4x |
| p99 | 1.53ms | 4.38ms | 2.9x |
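A Zipf key pattern means a small set of hot keys dominates the request stream, and with exponent 0.99 the skew is substantial. A quick way to see how skewed the access distribution is (the key count matches the memtier run; the rest is just the Zipf definition):

```python
def zipf_weights(n: int, s: float) -> list[float]:
    # P(rank k) ∝ 1 / k^s, normalized over n keys
    raw = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(100_000, 0.99)
top_100_share = sum(weights[:100])
print(f"top 100 of 100,000 keys get {top_100_share:.0%} of requests")
```

With s = 0.99, roughly 40% of all requests land on the hottest 100 keys, which is why cache-friendliness under contention dominates this benchmark.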
High-concurrency pipeline (8 threads, 100 clients, pipeline 10, Zipf 1.1, 128-byte values):
| Metric | Swytch | Redis | Ratio |
|---|---|---|---|
| Throughput | 1,473,367 ops/s | 580,041 ops/s | 2.5x |
| p50 | 4.51ms | 13.63ms | 3.0x |
| p99 | 19.20ms | 26.88ms | 1.4x |
We replayed production cache traces from published academic datasets to measure real-world hit rates and backend impact. These traces capture actual access patterns from hyperscale deployments.
Swytch can use disk as extended storage, not just for durability. While Redis evicts data when RAM fills up, Swytch transparently tiers cold data to disk and serves it with minimal latency penalty. This is ideal for key-value store workloads where you want all data accessible, or highly cacheable workloads with predictable access patterns.
Alibaba Block Storage Trace (10MB RAM, 8.6M operations, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 99.69% | 33.06% |
| Hits | 8,581,109 | 2,845,487 |
| Misses | 26,317 | 5,761,939 |
| Throughput | 14,253 ops/s | 12,951 ops/s |
| Avg GET Latency | 70.2µs | 77.2µs |
| Backend Reduction | 99.5% | baseline |
Redis is RAM-only: when the 10MB limit is reached, it evicts data aggressively, resulting in a 33% hit rate. Swytch keeps hot data in RAM and tiers the rest to NVMe, achieving near-perfect hit rates with **no meaningful latency penalty** (70µs avg).
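The hit rates follow directly from the hit and miss counts in the table:

```python
def hit_rate(hits: int, misses: int) -> float:
    return hits / (hits + misses)

print(f"Swytch: {hit_rate(8_581_109, 26_317):.2%}")    # → Swytch: 99.69%
print(f"Redis:  {hit_rate(2_845_487, 5_761_939):.2%}")  # → Redis:  33.06%
```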
When to use this mode:
- Key-value store replacing a database, where all keys should remain accessible
- Workloads with predictable, cacheable access patterns
- Situations where disk is cheap but cache misses are expensive
When NOT to use this mode:
- Traditional cache-aside patterns where misses are expected and acceptable
- Workloads with unbounded key growth (disk isn’t infinite either)
For pure caching workloads, see the memory-constrained benchmarks below where both systems operate under the same RAM limits.
When both systems are constrained to the same RAM limit—a fair apples-to-apples comparison—Swytch’s adaptive eviction algorithm outperforms Redis LRU.
Alibaba Block Storage (18MB cache limit, 48-hour trace):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 82.13% | 73.31% |
| Backend Reduction | 33% | — |
Swytch maintains a 9 percentage point advantage in hit rate under identical memory constraints. The algorithm accounts for access frequency, recency, and object size—not just recency like Redis LRU.
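Swytch's exact algorithm isn't spelled out here, but the general idea of combining the three signals can be sketched. The functional form and the half-life constant below are hypothetical, in the spirit of GreedyDual-style size-aware policies; the cache evicts the entry with the lowest score:

```python
def keep_score(freq: int, seconds_since_access: float, size_bytes: int,
               half_life: float = 60.0) -> float:
    # Frequency and recency raise the score; size lowers it, so a cold
    # 1 MB object is evicted before a hot 100-byte one.
    recency = 0.5 ** (seconds_since_access / half_life)
    return freq * recency / size_bytes

hot_small = keep_score(freq=50, seconds_since_access=5, size_bytes=100)
cold_large = keep_score(freq=50, seconds_since_access=300, size_bytes=1_000_000)
assert hot_small > cold_large   # the hot small object survives
```

Pure LRU collapses all of this to recency alone, which is why it gives a megabyte-sized cold object the same treatment as a hot 100-byte counter.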
When memory is sufficient for the working set, both systems achieve similar hit rates, but Swytch maintains its throughput advantage.
Alibaba Block Storage (40GB cache, 48-hour trace, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 99.69% | 99.69% |
| Throughput | 21,186 ops/s | 19,493 ops/s |
| Avg GET Latency | 47.2µs | 51.3µs |
Twitter Cluster (40GB cache, 20-minute trace, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 86.94% | 86.95% |
| Throughput | 29,793 ops/s | 30,909 ops/s |
| Avg GET Latency | 33.6µs | 32.4µs |
Hit rates are effectively identical. Choose Swytch for the durability guarantees without sacrificing performance.
Tencent Photo CDN (40GB cache, 5.5M operations, appendfsync everysec):
| Metric | Swytch | Redis |
|---|---|---|
| Hit Rate | 41.94% | 39.14% |
| Backend Reduction | 4.6% | — |
| RPS Saved | 128 req/s | — |
With highly variable object sizes (typical of CDN workloads), Swytch’s size-aware eviction provides a modest but consistent advantage.
With adequate memory, Swytch delivers a larger share of requests in the sub-100µs bucket:
GET Latency (Alibaba trace, adequate memory):
| Bucket | Swytch | Redis |
|---|---|---|
| <100µs | 98.7% | 96.0% |
| 100µs–500µs | 1.2% | 4.0% |
| 500µs–1ms | 0.0% | 0.0% |
GET Latency (Alibaba trace, memory-constrained 18MB):
| Bucket | Swytch | Redis |
|---|---|---|
| <100µs | 95.4% | 97.8% |
| 100µs–500µs | 4.4% | 2.2% |
| 500µs–1ms | 0.0% | 0.0% |
Under memory pressure, Swytch trades slightly more latency variance for dramatically better hit rates—a worthwhile tradeoff when each miss costs a database round-trip.
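The tradeoff is easy to quantify: what the application feels is the effective GET latency including misses. Using the 18MB-constrained hit rates, the average cache latencies from the Alibaba runs above as a stand-in, and a hypothetical 1000µs database round-trip per miss:

```python
def effective_get_latency_us(hit_rate: float, cache_us: float,
                             miss_penalty_us: float) -> float:
    # Hits cost the cache lookup; misses cost the backend round-trip.
    return hit_rate * cache_us + (1 - hit_rate) * miss_penalty_us

# 1000 µs DB round-trip is an assumption, not a measured number
swytch = effective_get_latency_us(0.8213, 70.2, 1000)   # ≈ 236 µs
redis = effective_get_latency_us(0.7331, 77.2, 1000)    # ≈ 323 µs
```

The higher hit rate dominates: even with slightly more latency variance per hit, the effective latency seen by the application drops by roughly a quarter.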
Swytch’s tiered storage provides strong durability (at most 10ms of data loss) with minimal performance impact.
Test: memtier_benchmark, 4 threads, 50 clients, 256-byte values, Unix socket
| Workload | Write-Through | Ghost Mode |
|---|---|---|
| 100% writes | 247,000 ops/s | 397,000 ops/s |
| 100% reads | 418,000 ops/s | — |
| 50/50 mixed | 336,000 ops/s | — |
| Workload | Mode | p50 | p99 | p99.9 |
|---|---|---|---|---|
| 100% writes | Write-through | 0.52ms | 3.36ms | 6.50ms |
| 100% writes | Ghost | 0.43ms | 2.19ms | 5.41ms |
| 100% reads | Write-through | 0.42ms | 1.77ms | 4.51ms |
| 50/50 mixed | Write-through | 0.44ms | 2.98ms | 5.44ms |
Write-through mode (full durability) adds minimal latency overhead. Ghost mode (write-back) offers higher write throughput when eventual persistence is acceptable.
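A durability window translates directly into writes at risk. At the write-through throughput above, comparing Swytch's 10ms window against Redis's `appendfsync everysec` (1s):

```python
write_rate = 247_000   # ops/s, write-through throughput from the table above

def writes_at_risk(window_seconds: float) -> int:
    # Upper bound on acknowledged writes lost if the process dies
    # just before the next sync.
    return int(write_rate * window_seconds)

print(writes_at_risk(0.010))   # Swytch, 10 ms window  → 2470
print(writes_at_risk(1.0))     # Redis, everysec       → 247000
```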
| Scenario | Swytch Advantage |
|---|---|
| Single-op throughput | 1.7-2.3x faster |
| Pipeline throughput | 2.5x faster at high concurrency |
| Disk-tiered storage | Near-perfect hit rates with NVMe backend |
| Memory-constrained caching | 9 percentage points better hit rate |
| Durability | 10ms vs 1000ms max data loss |
Swytch delivers higher throughput, lower latency, and better cache efficiency under memory pressure—all while providing stronger durability guarantees than Redis. For workloads that benefit from disk-tiered storage, Swytch can serve as a high-performance key-value store with near-perfect data availability.
```shell
# Single operations (matches our test parameters)
redis-benchmark -s /path/to/socket \
  -t ping_inline,ping_mbulk,set,get,incr,lpush,rpush,lpop,rpop,sadd,hset,spop,lrange_100,lrange_300,lrange_500,lrange_600,mset \
  --csv -d 16 --threads 4 -c 100 -n 500000

# High-throughput pipeline (write-heavy)
memtier_benchmark --protocol=redis -S /path/to/socket \
  -t 4 -c 10 --pipeline=50 \
  --key-minimum=1 --key-maximum=10000000 \
  --key-pattern=P:P --ratio=1:0 -n allkeys \
  --hide-histogram

# Large values with rate limiting
memtier_benchmark --protocol=redis -S /path/to/socket \
  -t 4 -c 20 --pipeline=1 --rate-limiting=50000 \
  --key-minimum=1 --key-maximum=2000000 \
  --ratio=1:10 -n 200000 --data-size=4096 \
  --hide-histogram

# Zipf distribution (hot keys)
memtier_benchmark --protocol=redis -S /path/to/socket \
  --threads=4 --clients=50 --requests=100000 \
  --ratio=1:10 --key-pattern=Z:Z \
  --key-zipf-exp=0.99 --key-maximum=100000 \
  --data-size=256 --hide-histogram

# High-concurrency pipeline
memtier_benchmark --protocol=redis -S /path/to/socket \
  --threads=8 --clients=100 --requests=100000 \
  --ratio=1:20 --key-pattern=Z:Z \
  --key-zipf-exp=1.1 --key-maximum=50000 \
  --data-size=128 --pipeline=10 --hide-histogram
```
Our trace-bench tool replays real production traces against both Redis and Swytch:
```shell
# Memory-constrained with persistence
./trace-bench --real --real-vsize \
  --swytch-path ./swytch \
  --time-limit 48h \
  --gb 0.010 --ram 1 --cpus 4 \
  --noscale \
  --trace alibabaBlock_277.oracleGeneral.zst \
  --persistent-everysec

# Adequate memory
./trace-bench --real --real-vsize \
  --swytch-path ./swytch \
  --time-limit 48h \
  --gb 40 --ram 50 --cpus 16 \
  --noscale \
  --trace alibabaBlock_277.oracleGeneral.zst
```
Trace files are available from the CacheMon cache_dataset project.