Benchmarks
The short version: on workloads where the cluster sees mixed-key traffic, Swytch matches Redis at one node and pulls ahead at two-plus, with one to four orders of magnitude less network traffic. On workloads where every node has to coordinate on every key, Swytch’s per-key write coordination dominates and throughput drops with node count. The shape of your traffic decides which regime you’re in.
All numbers below come from trace-bench, an internal harness (not yet open source) that drives Redis and Swytch with identical workloads on identical hardware and reports the results side by side. Traces are real production captures from the CMU PDL twemcache workload dataset.
The single biggest difference between Redis and Swytch in these benchmarks is what crosses the network. Redis serializes full values over TCP for every operation, even when the client is on the same machine. Swytch runs as a nearcache reached over a Unix socket, so a single-node deployment generates little or no network traffic. In a multi-node deployment, only coordination metadata crosses the wire (roughly 100 bytes per operation per peer), not the values themselves.
| Trace | Nodes | Redis network | Swytch network | Ratio |
|---|---|---|---|---|
| alibabaBlock_277 | 1 | 78.52 GB | 0 B | ∞ |
| alibabaBlock_277 | 4 | 78.52 GB | 766.7 MB | 102× |
| cluster17 | 1 | 2.38 GB | 0 B | ∞ |
| cluster17 | 4 | 2.38 GB | 1.03 GB | 2.3× |
| w01 | 1 | 61.10 GB | 3.0 MB | 20,367× |
At one node, the comparison is unbounded: Swytch serves everything in-process, Redis serializes the same data over TCP. The Redis network total is unchanged across Swytch node counts because the Redis baseline is single-server; Redis traffic depends on what the client does, not on what Swytch does next to it.
alibabaBlock_277 is a block I/O trace in which writes are distributed across a large keyspace. In the 24-hour replay the working set fits in cache; in the 72-hour replay, eviction is present but minimal. Swytch’s aggregate throughput rises with node count because each node serves requests in parallel, although the gain flattens beyond four nodes; single-node Redis throughput stays flat at roughly 29,000 req/s.
24-hour replay, real value sizes
| Nodes | redis-mem | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) |
|---|---|---|---|---|---|
| 1 | 0.5 GB | 99.52 / 99.52 | 29,568 / 24,805 | 120.5 / 142.8 | 132.7 / 165.7 |
| 2 | 1 GB | 99.52 / 99.52 | 29,461 / 36,433 | 120.9 / 162.7 | 135.9 / 336.2 |
| 4 | 3 GB | 99.52 / 99.52 | 29,267 / 42,601 | 121.8 / 214.3 | 134.1 / 703.2 |
| 6 | 3 GB | 99.52 / 99.52 | 29,240 / 43,754 | 122.0 / 255.0 | 134.8 / 1,046.8 |
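The 24-hour table supports a quick scaling check: divide each Swytch throughput by its one-node figure (speedup), divide that by node count (parallel efficiency), and compare with Redis on the same row. This sketch only does arithmetic on the published numbers; "efficiency" here is the usual speedup-per-node definition, not a metric trace-bench reports.

```go
package main

import "fmt"

// row holds one line of the 24-hour alibabaBlock_277 table (requests/second).
type row struct {
	nodes         int
	redis, swytch float64
}

// scaling returns Swytch's speedup over its own one-node throughput, that
// speedup divided by node count (parallel efficiency), and Swytch/Redis.
func scaling(base, r row) (speedup, efficiency, vsRedis float64) {
	speedup = r.swytch / base.swytch
	efficiency = speedup / float64(r.nodes)
	vsRedis = r.swytch / r.redis
	return
}

func main() {
	rows := []row{
		{1, 29568, 24805},
		{2, 29461, 36433},
		{4, 29267, 42601},
		{6, 29240, 43754},
	}
	for _, r := range rows {
		s, e, v := scaling(rows[0], r)
		fmt.Printf("%d nodes: speedup %.2fx, efficiency %.0f%%, vs Redis %.2fx\n", r.nodes, s, e*100, v)
	}
}
```

On these figures, Swytch gains about 1.47× going from one to two nodes but only about 1.76× from one to six, so per-node efficiency falls from roughly 73% to 29%. Relative to Redis, it moves from about 0.84× at one node to about 1.5× at six.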
72-hour replay, 4 nodes
| Metric | Redis | Swytch |
|---|---|---|
| Hit rate | 99.80% | 99.80% |
| Throughput | 28,185/s | 46,731/s |
| Avg GET | 123.2 µs | 201.5 µs |
| Avg SET | 129.1 µs | 744.2 µs |
At 72 hours, Swytch reaches 46,731 req/s (roughly 1.7× Redis) and throughput continues improving as the cache warms.
cluster17 is a Twitter trace with small values and heavy key skew. This benchmark runs under full-sharing replay, where every node subscribes to every key, which puts an upper bound on cross-node coordination cost. In production, nodes subscribe only to the keys their application actually touches, so full sharing is not a realistic workload. The result here is the worst case, not the typical one.
| Nodes | redis-mem | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) |
|---|---|---|---|---|---|
| 1 | 0.5 GB | 97.70 / 97.70 | 31,216 / 31,121 | 95.9 / 92.7 | 95.3 / 115.2 |
| 2 | 1 GB | 97.70 / 97.70 | 31,112 / 27,065 | 95.8 / 139.5 | 95.2 / 223.0 |
| 4 | 2 GB | 97.70 / 97.69 | 31,165 / 20,513 | 95.9 / 213.7 | 95.0 / 456.2 |
| 6 | 3 GB | 97.70 / 97.68 | 30,870 / 16,058 | 97.0 / 286.7 | 96.1 / 655.3 |
At one node, Swytch matches Redis on throughput (31,121 vs 31,216 req/s) and beats it on GET latency (92.7 µs vs 95.9 µs), although its SET latency is higher (115.2 µs vs 95.3 µs). For small-value reads, the Unix socket path is faster than Redis’s TCP path, even on the same machine.
At higher node counts under full-sharing, two costs stack. On the write path, concurrent writes to the same key on the same node serialize behind a per-key lock. On the read path, every node has accumulated effects for every key from every other node, and each read walks the local causal DAG and evaluates fork-choice over those effects to materialize the current value. The more concurrent effects on a key, the heavier the read.
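Both costs can be illustrated with a toy model. The sketch below is not Swytch’s implementation: it assumes a per-key mutex for the write path and, for the read path, a hypothetical fork-choice rule (highest Lamport clock wins, ties broken by node ID) evaluated over every concurrent effect stored for a key. The toy never collapses settled effects, which makes the structural point visible: each read scans all stored effects, so its cost grows with the number of concurrent effects on that key.

```go
package main

import (
	"fmt"
	"sync"
)

// effect is a hypothetical write effect: a value stamped with a Lamport
// clock and the ID of the node that produced it.
type effect struct {
	clock  uint64
	nodeID int
	value  string
}

// entry is one key's state in this toy model.
type entry struct {
	mu      sync.Mutex // per-key lock: writers to the same key queue here
	effects []effect   // concurrent effects, never collapsed in this toy
}

// store maps keys to entries; its own mutex guards only the map.
type store struct {
	mu      sync.Mutex
	entries map[string]*entry
}

func newStore() *store { return &store{entries: map[string]*entry{}} }

// get returns the entry for key, creating it on first use.
func (s *store) get(key string) *entry {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[key]
	if !ok {
		e = &entry{}
		s.entries[key] = e
	}
	return e
}

// write records an effect; concurrent writes to one key serialize on its lock.
func (s *store) write(key string, ef effect) {
	e := s.get(key)
	e.mu.Lock()
	defer e.mu.Unlock()
	e.effects = append(e.effects, ef)
}

// wins is the hypothetical fork-choice rule: the higher clock wins, and equal
// clocks fall back to the higher node ID so every node picks the same value.
func wins(a, b effect) bool {
	if a.clock != b.clock {
		return a.clock > b.clock
	}
	return a.nodeID > b.nodeID
}

// read materializes the current value by running fork-choice over every
// stored effect (linear in their number) and reports how many it examined.
func (s *store) read(key string) (value string, examined int, ok bool) {
	e := s.get(key)
	e.mu.Lock() // held here only for memory safety in this sketch
	defer e.mu.Unlock()
	if len(e.effects) == 0 {
		return "", 0, false
	}
	best := e.effects[0]
	for _, ef := range e.effects[1:] {
		if wins(ef, best) {
			best = ef
		}
	}
	return best.value, len(e.effects), true
}

func main() {
	s := newStore()
	var wg sync.WaitGroup
	// Six nodes write the same hot key with the same clock: fully concurrent.
	for node := 1; node <= 6; node++ {
		wg.Add(1)
		go func(node int) {
			defer wg.Done()
			s.write("hot", effect{clock: 7, nodeID: node, value: fmt.Sprintf("v%d", node)})
		}(node)
	}
	wg.Wait()
	v, n, _ := s.read("hot")
	fmt.Println(v, n) // v6 6
}
```

Because all six effects carry the same clock, the tie-break selects node 6’s value regardless of goroutine order, and the read has to examine all six effects to get there.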
cluster17 is a real production trace from Twitter, including its real hot-key skew. trace-bench replays it at an accelerated timescale, packing operations back-to-back as fast as the harness can drive them. In the original production traffic, requests against the same hot key were spread across real wall-clock time, milliseconds apart, and no one was actually waiting behind anyone else. The per-key lock existed; nothing queued on it. The fork-choice cost existed; the effect-set for a given key didn’t accumulate because effects propagated and were processed between requests.
Accelerated replay collapses that spacing. Operations that were a millisecond apart in production become operations that hit the cache in the same microsecond. Now they queue on the per-key lock. Now the DAG accumulates concurrent effects faster than reads can amortize the fork-choice work. The contention is an artifact of the harness’s compressed timeline, not a property the same trace would exhibit at its original pace.
w01 is a CloudPhysics trace whose working set far exceeds the 6 GB cache, producing ~21% hit rates on both systems. This is an out-of-regime stress test: the cache is undersized for the workload and both systems are under heavy eviction pressure.
| Nodes | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) | Network (R / S) | Errors |
|---|---|---|---|---|---|---|
| 1 | 21.18 / 20.73 | 19,229 / 8,789 | 94.6 / 199.4 | 101.5 / 260.8 | 61.10 GB / 3.0 MB | 0 |
| 2 | 21.17 / 21.79 | 19,227 / 1,947 | 94.5 / 2,199.4 | 101.5 / 1,179.3 | 61.10 GB / 17.86 GB | 8 |
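At two nodes, Swytch’s hit rate slightly exceeds Redis’s (+0.62 pp) because local misses can be served from the peer. The cost of cross-node coordination under extreme eviction pressure is severe, though: throughput drops to 1,947 req/s (about a tenth of Redis’s 19,227 req/s), average GET latency rises to about 2.2 ms (vs 94.5 µs for Redis), and Swytch’s network traffic grows from 3.0 MB to 17.86 GB. This is the regime where sharding (Swytch cluster mode) or durable backing (Swytch Cloud) would take over from the nearcache.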
Same trace (alibabaBlock_277), same configuration (4 nodes, 500 MB cache, 30-minute run); only the value size changes.
| Value size | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) | Network (R / S) | Errors |
|---|---|---|---|---|---|---|
| 30 KB | 97.22 / 97.22 | 29,051 / 22,172 | 118.7 / 371.1 | 159.8 / 1,470.3 | 1.72 GB / 116.4 MB | 0 |
| 1 MB | 97.22 / 91.08 | 1,920 / 466 | 1,968.4 / 5,914.4 | 2,274.9 / 6,803.9 | 58.43 GB / 12.24 GB | 2,320 |
At 30 KB values, hit rates match and the run completes with zero errors, although Swytch’s throughput is about 24% lower than Redis’s (22,172 vs 29,051 req/s). At 1 MB values, the 500 MB cache holds only about 500 keys; the per-key overhead of the causal DAG reduces Swytch’s effective capacity relative to Redis, opening a 6 pp hit-rate gap (91.08% vs 97.22%). Sizing the cache for the value size closes the gap; this benchmark intentionally tests the undersized case.
The Errors column counts client-side errors during the run (timeouts, connection resets, capacity-exhaustion failures). It is zero in the in-regime benchmarks and non-zero only when the cache is pushed past its sized capacity.
| Component | Specification |
|---|---|
| Server | Hetzner dedicated, 64 GB RAM, Helsinki (hel1) |
| CPUs per cache node | 2 dedicated |
| RAM per cache node | 3 GB (10 GB for w01) |
| Trace | Source | Character |
|---|---|---|
| alibabaBlock_277 | Alibaba block I/O | Writes spread across keys, no eviction at 24h |
| cluster17 (sample10) | Twitter | Small values, hot-key skew, no eviction |
| w01 | CloudPhysics | Large working set, heavy eviction (~21% hit rate) |
trace-bench runs in sequential mode, replaying each trace against Redis and Swytch in turn under identical conditions. Value sizes match the original trace (unless noted otherwise in the value-size benchmark). Each system sees the same operations in the same order; only the system being tested changes.
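Sequential mode can be sketched as a loop that feeds one fixed operation list to each backend in turn, never concurrently. The `op`, `Backend`, `mapBackend`, and `replay` names below are hypothetical stand-ins: trace-bench’s real interfaces are not public, and a real run would wrap a Redis client and a Swytch client rather than in-memory maps.

```go
package main

import "fmt"

// op is one replayed trace operation (hypothetical shape).
type op struct {
	get   bool
	key   string
	value string
}

// Backend is a hypothetical minimal cache interface for the sketch.
type Backend interface {
	Name() string
	Get(key string) (string, bool)
	Set(key, value string)
}

// mapBackend is an in-memory stand-in so the sketch runs without servers.
type mapBackend struct {
	name string
	m    map[string]string
}

func (b *mapBackend) Name() string { return b.name }

func (b *mapBackend) Get(k string) (string, bool) {
	v, ok := b.m[k]
	return v, ok
}

func (b *mapBackend) Set(k, v string) { b.m[k] = v }

// stats is what this sketch records per backend.
type stats struct{ gets, hits int }

// replay applies the same operations, in the same order, to one backend.
func replay(b Backend, ops []op) stats {
	var s stats
	for _, o := range ops {
		if o.get {
			s.gets++
			if _, ok := b.Get(o.key); ok {
				s.hits++
			}
			continue
		}
		b.Set(o.key, o.value)
	}
	return s
}

func main() {
	ops := []op{
		{get: true, key: "a"},
		{key: "a", value: "1"},
		{get: true, key: "a"},
		{get: true, key: "b"},
	}
	backends := []Backend{
		&mapBackend{name: "redis", m: map[string]string{}},
		&mapBackend{name: "swytch", m: map[string]string{}},
	}
	// Sequential mode: one backend at a time, each seeing identical input.
	for _, b := range backends {
		s := replay(b, ops)
		fmt.Printf("%s: %d/%d hits\n", b.Name(), s.hits, s.gets)
	}
}
```

Keeping the operation list immutable and replaying it per backend is what guarantees that only the system under test changes between runs.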
The internal trace-bench harness isn’t open source yet, but Swytch ships with built-in micro-benchmarks for the cache engine, Redis command handling, and effects layer. Swytch also works with the standard redis-benchmark tool.
```sh
go test -bench=. -benchmem ./cache/
```
These benchmarks exercise the CloxCache (L0 in-memory layer) directly, without Redis protocol overhead:
| Benchmark | Description |
|---|---|
| BenchmarkCloxCacheGet | Parallel GET on 10k keys |
| BenchmarkCloxCachePut | Parallel PUT operations |
| BenchmarkCloxCacheMixed | 80% read / 20% write workload, reports hit rate and evictions |
| BenchmarkCloxCacheZipf | Zipf distribution (theta=0.99) simulating realistic hotspot access |
| BenchmarkCloxCacheContention | High contention on 100 hot keys |
| BenchmarkCloxCacheSizes | Scaling across Small, Medium, Large, and XLarge cache sizes |
| BenchmarkCloxCachePointers | Pointer-type values |
Each benchmark has a sync.Map variant for comparison.
```sh
go test -bench=. -benchmem ./redis/
```
Exercises the full Redis command pipeline including parsing, execution, and response writing:
| Benchmark | Description |
|---|---|
| BenchmarkHandler_Set | SET command throughput |
| BenchmarkHandler_Get | GET command throughput (pre-populated 10k keys) |
| BenchmarkHandler_Incr | INCR atomic counter |
| BenchmarkHandler_IncrBy | INCRBY atomic counter |
| BenchmarkHandler_LPush | LPUSH list operations |
| BenchmarkHandler_RPush | RPUSH list operations |
| BenchmarkHandler_LPop | LPOP list operations |
| BenchmarkHandler_RPop | RPOP list operations |
| BenchmarkHandler_LRange | LRANGE range queries |
| BenchmarkHandler_LIndex | LINDEX positional lookup |
| BenchmarkHandler_HSet | HSET hash field writes |
| BenchmarkHandler_HGet | HGET hash field reads |
| BenchmarkHandler_HGetAll | HGETALL full hash retrieval |
| BenchmarkHandler_HIncrBy | HINCRBY atomic hash field increment |
| BenchmarkHandler_Mixed | 80/20 read/write mixed workload |
| BenchmarkHandler_Parallel | Multi-goroutine GET/SET |
| BenchmarkHandler_LargeValues | 10 KB value SET/GET |
| BenchmarkHandler_LPushScalability | LPUSH scaling from 1k to 10M list elements |
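Because Swytch works with redis-benchmark, the parsing stage of this pipeline accepts commands in the Redis serialization protocol (RESP), where a client sends each command as an array of bulk strings. The minimal encoder below illustrates that framing as described in the public RESP specification; it is not Swytch code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeCommand frames a command as a RESP array of bulk strings:
// "*<argc>\r\n", then for each argument "$<byte length>\r\n<bytes>\r\n".
func encodeCommand(args ...string) string {
	var b strings.Builder
	b.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, a := range args {
		b.WriteString("$" + strconv.Itoa(len(a)) + "\r\n") // len is in bytes
		b.WriteString(a)
		b.WriteString("\r\n")
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", encodeCommand("SET", "key", "value"))
	// "*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"
}
```

Length prefixes count bytes, not characters, which is why a parser can read bulk values of arbitrary binary content without scanning for delimiters.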
```sh
go test -bench=. -benchmem ./effects/
```
Benchmarks for the causal effect resolution layer.
Swytch is compatible with the standard redis-benchmark tool:

```sh
# Basic throughput test
redis-benchmark -p 6379 -n 100000 -c 50
# GET/SET only
redis-benchmark -p 6379 -t set,get -n 1000000 -c 100
# Pipeline mode (higher throughput)
redis-benchmark -p 6379 -t set,get -n 1000000 -P 16
# 256-byte values, random keys drawn from a keyspace of 1,000,000
redis-benchmark -p 6379 -t set,get -d 256 -r 1000000
```
| Flag | Description |
|---|---|
| -n | Total number of requests |
| -c | Number of parallel connections |
| -P | Pipeline N requests per connection |
| -t | Comma-separated list of commands to benchmark |
| -d | Data size in bytes for SET values |
| -r | Use random keys from a range of this size |
| -q | Quiet mode (show only requests/sec) |
For reproducible cache evaluations, trace files are available from the CMU PDL twemcache workload dataset.