Swytch Documentation

Benchmarks

The short version: on workloads where the cluster sees mixed-key traffic, Swytch comes within about 16% of Redis’s throughput at one node and pulls ahead from two nodes up. In every run below it also sends less data over the network than Redis, anywhere from 2.3× less to none at all. On workloads where every node has to coordinate on every key, Swytch’s per-key coordination (write locking plus read-time fork-choice) dominates and throughput drops as nodes are added. The shape of your traffic decides which regime you’re in.

All numbers below come from trace-bench, an internal harness (not yet open source) that drives Redis and Swytch with identical workloads on identical hardware and reports side-by-side. Traces are real production captures from the CMU PDL twemcache workload dataset.


Network overhead

The single biggest difference between Redis and Swytch in these benchmarks is what crosses the network. Redis serializes full values over TCP for every operation, even when the client is on the same machine. Swytch runs as a nearcache that clients reach over a Unix socket, so a single-node deployment generates little or no network traffic. In a multi-node deployment, only coordination metadata crosses the wire (roughly 100 bytes per operation per peer), not the values themselves, except when a local miss is served from a peer.
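Because Swytch handles Redis commands, a client reaching it over a Unix socket would send the same RESP bytes a TCP client sends; only the transport changes. The sketch below, using only Go's standard library, encodes a command in RESP and sends PING over a Unix domain socket. The socket path `/tmp/swytch.sock` is hypothetical, and the assumption that Swytch's socket accepts plain RESP is ours; check your deployment's configuration.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strconv"
	"strings"
	"time"
)

// encodeRESP encodes a command as a RESP array of bulk strings,
// the request format every Redis client sends.
func encodeRESP(args []string) string {
	var sb strings.Builder
	sb.WriteString("*" + strconv.Itoa(len(args)) + "\r\n")
	for _, a := range args {
		sb.WriteString("$" + strconv.Itoa(len(a)) + "\r\n" + a + "\r\n")
	}
	return sb.String()
}

// pingUnix sends PING over a Unix domain socket and returns the first
// reply line. No TCP stack is involved: the bytes never leave the host.
func pingUnix(socketPath string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(encodeRESP([]string{"PING"}))); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

func main() {
	fmt.Printf("%q\n", encodeRESP([]string{"GET", "user:42"}))
	// Hypothetical socket path; substitute the one your deployment uses.
	reply, err := pingUnix("/tmp/swytch.sock")
	if err != nil {
		fmt.Println("could not reach server:", err)
		return
	}
	fmt.Println(reply) // a Redis-compatible server replies "+PONG"
}
```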

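| Trace | Nodes | Redis network | Swytch network | Ratio |
|---|---|---|---|---|
| alibabaBlock_277 | 1 | 78.52 GB | 0 B | unbounded |
| alibabaBlock_277 | 4 | 78.52 GB | 766.7 MB | 102× |
| cluster17 | 1 | 2.38 GB | 0 B | unbounded |
| cluster17 | 4 | 2.38 GB | 1.03 GB | 2.3× |
| w01 | 1 | 61.10 GB | 3.0 MB | 20,367× |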

On alibabaBlock_277 and cluster17 at one node, Swytch sends nothing over the network, so the ratio is unbounded: Swytch serves everything locally while Redis serializes the same data over TCP. The Redis network total is unchanged across Swytch node counts because the Redis baseline is a single server; Redis traffic depends on what the client does, not on how many Swytch nodes run alongside it.
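The spread between rows follows from what each system moves per operation. A back-of-the-envelope sketch, assuming the ~100 bytes per operation per peer quoted above and treating Redis traffic as one value per operation (keys, RESP framing, and misses ignored): the ratio is roughly the average value size divided by 100 × (nodes - 1), so large block values give a large ratio and small values a small one. This model is ours and does not reproduce the measured numbers, which also depend on the read/write mix and miss traffic.

```go
package main

import "fmt"

// Assumption taken from the text: ~100 bytes of coordination metadata
// per operation per peer.
const metadataBytesPerOpPerPeer = 100

// swytchBytes estimates multi-node coordination traffic: each operation
// sends metadata to every other node, so a single node sends nothing.
func swytchBytes(ops, nodes int64) int64 {
	return ops * (nodes - 1) * metadataBytesPerOpPerPeer
}

// redisBytes roughly models a client/server cache: each operation moves
// one value over TCP. Keys, protocol framing, and misses are ignored.
func redisBytes(ops, avgValueBytes int64) int64 {
	return ops * avgValueBytes
}

func main() {
	const ops = 1_000_000
	for _, v := range []int64{1024, 4096, 32768} {
		r, s := redisBytes(ops, v), swytchBytes(ops, 4)
		fmt.Printf("value %5d B: Redis %d B, Swytch %d B, ratio %.1fx\n",
			v, r, s, float64(r)/float64(s))
	}
}
```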


Throughput

alibabaBlock_277: writes spread across keys

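Block I/O trace where writes are distributed across a large keyspace. At 24-hour replay, the working set fits in cache; at 72 hours, eviction is present but minimal. Swytch’s aggregate throughput rises with node count because each node serves requests in parallel, from 24,805/s at one node to 43,754/s at six, with most of the gain reached by four nodes. Per-request latency rises too, most sharply for SET (165.7 µs at one node, 1,046.8 µs at six). Single-node Redis throughput stays flat at roughly 29,200 to 29,600/s.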

24-hour replay, real value sizes (R = Redis, S = Swytch)

| Nodes | redis-mem | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) |
|---|---|---|---|---|---|
| 1 | 0.5 GB | 99.52 / 99.52 | 29,568 / 24,805 | 120.5 / 142.8 | 132.7 / 165.7 |
| 2 | 1 GB | 99.52 / 99.52 | 29,461 / 36,433 | 120.9 / 162.7 | 135.9 / 336.2 |
| 4 | 3 GB | 99.52 / 99.52 | 29,267 / 42,601 | 121.8 / 214.3 | 134.1 / 703.2 |
| 6 | 3 GB | 99.52 / 99.52 | 29,240 / 43,754 | 122.0 / 255.0 | 134.8 / 1,046.8 |
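To quantify how Swytch's aggregate throughput scales in this table, the sketch below computes speedup and per-node efficiency from the Swytch throughput column (values copied from the table). Efficiency of 1.0 would mean perfectly linear scaling; here it falls from about 73% at two nodes to about 29% at six.

```go
package main

import "fmt"

// Swytch aggregate throughput (req/s) from the 24-hour alibabaBlock_277 table.
var swytch24h = map[int]float64{1: 24805, 2: 36433, 4: 42601, 6: 43754}

// speedup is throughput at n nodes relative to one node.
func speedup(n int) float64 { return swytch24h[n] / swytch24h[1] }

// efficiency is speedup per node; 1.0 would be perfectly linear scaling.
func efficiency(n int) float64 { return speedup(n) / float64(n) }

func main() {
	for _, n := range []int{1, 2, 4, 6} {
		fmt.Printf("%d nodes: speedup %.2fx, efficiency %.0f%%\n",
			n, speedup(n), 100*efficiency(n))
	}
}
```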

72-hour replay, 4 nodes

| Metric | Redis | Swytch |
|---|---|---|
| Hit rate | 99.80% | 99.80% |
| Throughput | 28,185/s | 46,731/s |
| Avg GET | 123.2 µs | 201.5 µs |
| Avg SET | 129.1 µs | 744.2 µs |

At 72 hours, Swytch reaches 46,731 req/s on four nodes, roughly 1.7× Redis’s 28,185 req/s and up from 42,601 req/s at 24 hours: throughput keeps improving as the cache warms.

cluster17 (Twitter): hot-key skew

Twitter trace with small values and heavy key skew. This benchmark runs under full-sharing replay, where every node subscribes to every key (an upper bound on cross-node coordination cost). In production, nodes subscribe only to the keys their application actually touches; full-sharing isn’t a realistic workload. The result here is the worst case, not the typical one.

| Nodes | redis-mem | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) |
|---|---|---|---|---|---|
| 1 | 0.5 GB | 97.70 / 97.70 | 31,216 / 31,121 | 95.9 / 92.7 | 95.3 / 115.2 |
| 2 | 1 GB | 97.70 / 97.70 | 31,112 / 27,065 | 95.8 / 139.5 | 95.2 / 223.0 |
| 4 | 2 GB | 97.70 / 97.69 | 31,165 / 20,513 | 95.9 / 213.7 | 95.0 / 456.2 |
| 6 | 3 GB | 97.70 / 97.68 | 30,870 / 16,058 | 97.0 / 286.7 | 96.1 / 655.3 |

At one node, Swytch matches Redis on throughput (31,121/s vs 31,216/s) and beats it on GET latency (92.7 µs vs 95.9 µs), though its SET latency is higher (115.2 µs vs 95.3 µs). The Unix socket path is faster than Redis’s TCP for small values, even on the same machine.

At higher node counts under full-sharing, two costs stack. On the write path, concurrent writes to the same key on the same node serialize behind a per-key lock. On the read path, every node has accumulated effects for every key from every other node, and each read walks the local causal DAG and evaluates fork-choice over those effects to materialize the current value. The more concurrent effects on a key, the heavier the read.
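The two costs can be sketched in a few lines of Go. This is a simplified model under stated assumptions, not Swytch's implementation: effects are kept as a flat list per key rather than a causal DAG, each carries a counter and a node ID, and last-writer-wins (highest counter, ties broken by node ID) stands in for the real fork-choice rule. Writes on one node serialize behind a per-key mutex; reads scan every effect accumulated for the key, so read cost grows with the number of concurrent effects.

```go
package main

import (
	"fmt"
	"sync"
)

// Effect is one write to a key. The counter/node ordering is an
// assumption for this sketch, not Swytch's effect format.
type Effect struct {
	Counter uint64
	Node    string
	Value   string
}

// Store keeps a write lock and every effect seen so far, per key.
type Store struct {
	node    string
	mu      sync.Mutex // guards locks and effects
	locks   map[string]*sync.Mutex
	effects map[string][]Effect
}

func NewStore(node string) *Store {
	return &Store{node: node, locks: map[string]*sync.Mutex{}, effects: map[string][]Effect{}}
}

func (s *Store) keyLock(key string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks[key] == nil {
		s.locks[key] = &sync.Mutex{}
	}
	return s.locks[key]
}

func (s *Store) snapshot(key string) []Effect {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Effect(nil), s.effects[key]...)
}

// Write is the local write path. Reading the highest counter and appending
// the new effect must not interleave with another write to the same key,
// so concurrent writers to one key queue on its lock.
func (s *Store) Write(key, value string) {
	l := s.keyLock(key)
	l.Lock()
	defer l.Unlock()
	var top uint64
	for _, e := range s.snapshot(key) {
		if e.Counter > top {
			top = e.Counter
		}
	}
	s.mu.Lock()
	s.effects[key] = append(s.effects[key], Effect{Counter: top + 1, Node: s.node, Value: value})
	s.mu.Unlock()
}

// Merge records an effect received from a peer, under the same key lock.
func (s *Store) Merge(key string, e Effect) {
	l := s.keyLock(key)
	l.Lock()
	defer l.Unlock()
	s.mu.Lock()
	s.effects[key] = append(s.effects[key], e)
	s.mu.Unlock()
}

// Get is the read path: it evaluates fork-choice over every accumulated
// effect, so its cost is linear in the number of effects on the key.
func (s *Store) Get(key string) (string, bool) {
	effs := s.snapshot(key)
	if len(effs) == 0 {
		return "", false
	}
	best := effs[0]
	for _, e := range effs[1:] {
		if e.Counter > best.Counter || (e.Counter == best.Counter && e.Node > best.Node) {
			best = e
		}
	}
	return best.Value, true
}

func main() {
	s := NewStore("a")
	s.Write("hot", "va")                                       // local effect, counter 1
	s.Merge("hot", Effect{Counter: 1, Node: "b", Value: "vb"}) // concurrent peer effects
	s.Merge("hot", Effect{Counter: 1, Node: "c", Value: "vc"})
	v, _ := s.Get("hot")
	fmt.Println(v) // vc: three effects tie at counter 1, node "c" wins the tie
	s.Write("hot", "va2") // counter 2 supersedes all three
	v, _ = s.Get("hot")
	fmt.Println(v) // va2
}
```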

cluster17 is a real production trace from Twitter, including its real hot-key skew. trace-bench replays it at an accelerated timescale, packing operations back-to-back as fast as the harness can drive them. In the original production traffic, requests against the same hot key were spread across real wall-clock time, milliseconds apart, and no one was actually waiting behind anyone else. The per-key lock existed; nothing queued on it. The fork-choice cost existed; the effect-set for a given key didn’t accumulate because effects propagated and were processed between requests.

Accelerated replay collapses that spacing. Operations that were a millisecond apart in production become operations that hit the cache in the same microsecond. Now they queue on the per-key lock. Now the DAG accumulates concurrent effects faster than reads can amortize the fork-choice work. The contention is an artifact of the harness’s compressed timeline, not a property the same trace would exhibit at its original pace.
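The effect of compressing the timeline on the per-key lock can be shown with a deterministic queueing sketch. It applies Lindley's recursion for a single first-come-first-served lock with an assumed 100 µs hold time per write (a placeholder, not a measured Swytch figure): five requests 1 ms apart never wait, while the same five requests arriving in the same instant wait 0, 100, 200, 300, and 400 µs.

```go
package main

import (
	"fmt"
	"time"
)

// waits returns how long each request waits for one FIFO lock, given
// arrival times in order and a fixed hold time, via Lindley's recursion:
// W[0] = 0, W[i] = max(0, W[i-1] + hold - (arrival[i] - arrival[i-1])).
func waits(arrivals []time.Duration, hold time.Duration) []time.Duration {
	w := make([]time.Duration, len(arrivals))
	for i := 1; i < len(arrivals); i++ {
		next := w[i-1] + hold - (arrivals[i] - arrivals[i-1])
		if next < 0 {
			next = 0
		}
		w[i] = next
	}
	return w
}

// arrivalsEvery returns n arrival times spaced gap apart (gap 0 = all at once).
func arrivalsEvery(n int, gap time.Duration) []time.Duration {
	a := make([]time.Duration, n)
	for i := range a {
		a[i] = time.Duration(i) * gap
	}
	return a
}

func main() {
	const hold = 100 * time.Microsecond // hypothetical lock hold time per write
	fmt.Println(waits(arrivalsEvery(5, time.Millisecond), hold)) // [0s 0s 0s 0s 0s]
	fmt.Println(waits(arrivalsEvery(5, 0), hold))                // [0s 100µs 200µs 300µs 400µs]
}
```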

w01 (CloudPhysics): undersized cache

CloudPhysics trace with a working set far exceeding the 6 GB cache, producing ~21% hit rates on both systems. This is an out-of-regime stress test: the cache is undersized for the workload and both systems are under heavy eviction pressure.

| Nodes | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) | Network (R / S) | Errors |
|---|---|---|---|---|---|---|
| 1 | 21.18 / 20.73 | 19,229 / 8,789 | 94.6 / 199.4 | 101.5 / 260.8 | 61.10 GB / 3.0 MB | 0 |
| 2 | 21.17 / 21.79 | 19,227 / 1,947 | 94.5 / 2,199.4 | 101.5 / 1,179.3 | 61.10 GB / 17.86 GB | 8 |

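At two nodes, Swytch’s hit rate slightly exceeds Redis’s (21.79% vs 21.17%, +0.62 pp) because local misses can be served from the peer. The cost of cross-node coordination under extreme eviction pressure is severe, though: throughput falls to 1,947/s (about a tenth of Redis’s 19,227/s), average GET latency rises to 2,199.4 µs, Swytch’s network traffic grows to 17.86 GB, and 8 client errors appear. Even at one node, Swytch runs at under half of Redis’s throughput here (8,789/s vs 19,229/s). This is the regime where sharding (Swytch cluster mode) or durable backing (Swytch Cloud) would take over from the nearcache.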
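The peer fallback behind that hit-rate edge can be sketched as a two-tier lookup: check the local cache, then ask each peer. The `Tier` interface and `mapTier` type are hypothetical stand-ins, not Swytch types. In a real deployment each peer probe would be a network round trip that can carry the value back, one plausible contributor to the jump in Swytch's network traffic and GET latency at two nodes.

```go
package main

import "fmt"

// Tier is anything that can answer a GET. Hypothetical, for illustration.
type Tier interface {
	Get(key string) (string, bool)
}

// mapTier is an in-memory Tier standing in for a cache node.
type mapTier map[string]string

func (m mapTier) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// lookup tries the local cache first, then each peer in order. A key the
// local node has evicted can still hit if a peer holds it.
func lookup(key string, local Tier, peers []Tier) (value string, hit, fromPeer bool) {
	if v, ok := local.Get(key); ok {
		return v, true, false
	}
	for _, p := range peers {
		// In a real deployment this is a network round trip.
		if v, ok := p.Get(key); ok {
			return v, true, true
		}
	}
	return "", false, false
}

func main() {
	local := mapTier{"a": "1"}
	peer := mapTier{"b": "2"} // "b" was evicted locally but survives on the peer
	for _, k := range []string{"a", "b", "c"} {
		v, hit, fromPeer := lookup(k, local, []Tier{peer})
		fmt.Printf("%s: value=%q hit=%t fromPeer=%t\n", k, v, hit, fromPeer)
	}
}
```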

Value size impact

Same trace (alibabaBlock_277), same configuration (4 nodes, 500 MB cache, 30-minute run); only the value size changes.

| Value size | Hit % (R / S) | Throughput /s (R / S) | Avg GET µs (R / S) | Avg SET µs (R / S) | Network (R / S) | Errors |
|---|---|---|---|---|---|---|
| 30 KB | 97.22 / 97.22 | 29,051 / 22,172 | 118.7 / 371.1 | 159.8 / 1,470.3 | 1.72 GB / 116.4 MB | 0 |
| 1 MB | 97.22 / 91.08 | 1,920 / 466 | 1,968.4 / 5,914.4 | 2,274.9 / 6,803.9 | 58.43 GB / 12.24 GB | 2,320 |

At 30 KB values, hit rates match and Swytch completes the run with zero errors, at about 76% of Redis’s throughput (22,172/s vs 29,051/s). At 1 MB values, the 500 MB cache holds only ~500 keys; the per-key overhead of the causal DAG reduces Swytch’s effective capacity relative to Redis, opening a 6 pp hit-rate gap (91.08% vs 97.22%). Sizing the cache for the value size closes the gap; this benchmark intentionally tests the undersized case.

The Errors column counts client-side errors during the run (timeouts, connection resets, capacity-exhaustion failures). It is zero in the in-regime benchmarks and non-zero only when the cache is pushed past its sized capacity.
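For the 1 MB case in the value-size benchmark, the capacity arithmetic can be sketched as how many values fit when each key also carries some per-key metadata. The document does not state Swytch's actual per-key DAG overhead, so the 50 KB figure below is purely hypothetical and only shows the direction of the effect. By contrast, 30 KB values leave room for over 16,000 keys in the same 500 MB.

```go
package main

import "fmt"

// keysThatFit returns how many values of valueBytes fit in cacheBytes when
// each key also costs perKeyOverhead bytes of metadata.
func keysThatFit(cacheBytes, valueBytes, perKeyOverhead int64) int64 {
	return cacheBytes / (valueBytes + perKeyOverhead)
}

func main() {
	const cache = 500_000_000 // 500 MB, as in the value-size benchmark
	fmt.Println(keysThatFit(cache, 1_000_000, 0))      // 500
	fmt.Println(keysThatFit(cache, 1_000_000, 50_000)) // 476 (hypothetical 50 KB overhead)
	fmt.Println(keysThatFit(cache, 30_000, 0))         // 16666
}
```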


Methodology

Hardware

| Resource | Specification |
|---|---|
| Server | Hetzner dedicated, 64 GB RAM, Helsinki (hel1) |
| CPUs per cache node | 2 dedicated |
| RAM per cache node | 3 GB (10 GB for w01) |

Traces

| Trace | Source | Characteristics |
|---|---|---|
| alibabaBlock_277 | Alibaba block I/O | Writes spread across keys, no eviction at 24h |
| cluster17 (sample10) | Twitter | Small values, hot-key skew, no eviction |
| w01 | CloudPhysics | Large working set, heavy eviction (~21% hit rate) |

How the benchmarks run

trace-bench runs in sequential mode, replaying each trace against Redis and Swytch in turn under identical conditions. Value sizes match the original trace except in the value-size benchmark. Each system sees the same operations in the same order; only the system under test changes.
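The sequential procedure can be sketched as a replay loop that issues the same operations, in the same order, against each system in turn and counts hits. Everything here is hypothetical: trace-bench is not public, and the `Op` and `Cache` types and the in-memory stand-in backends are ours. Operations are issued back-to-back, ignoring the trace's original timestamps, in the spirit of the accelerated replay described for cluster17.

```go
package main

import "fmt"

// Op is one trace record reduced to what a replay needs. Real traces carry
// more fields (timestamps, sizes, TTLs); this shape is an assumption.
type Op struct {
	Kind  string // "get" or "set"
	Key   string
	Value []byte
}

// Cache is the minimal surface a backend exposes to the replayer.
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// Result counts GETs and GET hits for one run.
type Result struct{ Gets, Hits int }

// HitRate returns hits as a percentage of GETs.
func (r Result) HitRate() float64 {
	if r.Gets == 0 {
		return 0
	}
	return 100 * float64(r.Hits) / float64(r.Gets)
}

// replay issues ops back-to-back, in trace order, ignoring original timing.
func replay(c Cache, ops []Op) Result {
	var r Result
	for _, op := range ops {
		switch op.Kind {
		case "get":
			r.Gets++
			if _, ok := c.Get(op.Key); ok {
				r.Hits++
			}
		case "set":
			c.Set(op.Key, op.Value)
		}
	}
	return r
}

// mapCache is an unbounded in-memory stand-in for a real backend.
type mapCache map[string][]byte

func (m mapCache) Get(k string) ([]byte, bool) {
	v, ok := m[k]
	return v, ok
}

func (m mapCache) Set(k string, v []byte) { m[k] = v }

func main() {
	ops := []Op{{"get", "a", nil}, {"set", "a", []byte("1")}, {"get", "a", nil}, {"get", "b", nil}}
	systems := []struct {
		name  string
		cache Cache
	}{{"redis (stand-in)", mapCache{}}, {"swytch (stand-in)", mapCache{}}}
	// Sequential mode: each system starts empty and sees the identical op list.
	for _, s := range systems {
		r := replay(s.cache, ops)
		fmt.Printf("%s: %d GETs, hit rate %.1f%%\n", s.name, r.Gets, r.HitRate())
	}
}
```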


Running your own benchmarks

The internal trace-bench harness isn’t open source yet, but Swytch ships with built-in micro-benchmarks for the cache engine, Redis command handling, and effects layer. Swytch also works with the standard redis-benchmark tool.

Cache engine

go test -bench=. -benchmem ./cache/

These benchmarks exercise the CloxCache (L0 in-memory layer) directly, without Redis protocol overhead:

| Benchmark | Description |
|---|---|
| BenchmarkCloxCacheGet | Parallel GET on 10k keys |
| BenchmarkCloxCachePut | Parallel PUT operations |
| BenchmarkCloxCacheMixed | 80% read / 20% write workload, reports hit rate and evictions |
| BenchmarkCloxCacheZipf | Zipf distribution (theta=0.99) simulating realistic hotspot access |
| BenchmarkCloxCacheContention | High contention on 100 hot keys |
| BenchmarkCloxCacheSizes | Scaling across Small, Medium, Large, and XLarge cache sizes |
| BenchmarkCloxCachePointers | Pointer-type values |

Each benchmark has a sync.Map variant for comparison.
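The table does not say how BenchmarkCloxCacheZipf generates its keys. One common way to draw Zipf-distributed keys with theta = 0.99 is the Gray et al. generator popularized by YCSB; Go's `rand.NewZipf` cannot be used directly here because it requires an exponent s > 1. The sketch below is that generator in Go, as an illustration of the access pattern rather than the benchmark's actual code; `newZipf` and its seed parameter are our own names.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// zipf draws indices in [0, n) where low indices are far more likely,
// following the Gray et al. generator used by YCSB. It accepts theta < 1.
type zipf struct {
	n                        int
	theta, alpha, zetan, eta float64
	r                        *rand.Rand
}

// zeta returns the generalized harmonic number sum_{i=1..n} 1/i^theta.
func zeta(n int, theta float64) float64 {
	sum := 0.0
	for i := 1; i <= n; i++ {
		sum += 1 / math.Pow(float64(i), theta)
	}
	return sum
}

func newZipf(seed int64, n int, theta float64) *zipf {
	zetan := zeta(n, theta)
	return &zipf{
		n:     n,
		theta: theta,
		alpha: 1 / (1 - theta),
		zetan: zetan,
		eta:   (1 - math.Pow(2/float64(n), 1-theta)) / (1 - zeta(2, theta)/zetan),
		r:     rand.New(rand.NewSource(seed)),
	}
}

// next returns the next key index.
func (z *zipf) next() int {
	u := z.r.Float64()
	uz := u * z.zetan
	if uz < 1 {
		return 0
	}
	if uz < 1+math.Pow(0.5, z.theta) {
		return 1
	}
	i := int(float64(z.n) * math.Pow(z.eta*u-z.eta+1, z.alpha))
	if i >= z.n { // guard against floating-point rounding at the top end
		i = z.n - 1
	}
	return i
}

func main() {
	z := newZipf(1, 10_000, 0.99)
	counts := make([]int, 10_000)
	for i := 0; i < 100_000; i++ {
		counts[z.next()]++
	}
	fmt.Println("key 0:", counts[0], "key 10:", counts[10], "key 1000:", counts[1000])
}
```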

Redis commands

go test -bench=. -benchmem ./redis/

These benchmarks exercise the full Redis command pipeline, including parsing, execution, and response writing:

| Benchmark | Description |
|---|---|
| BenchmarkHandler_Set | SET command throughput |
| BenchmarkHandler_Get | GET command throughput (pre-populated 10k keys) |
| BenchmarkHandler_Incr | INCR atomic counter |
| BenchmarkHandler_IncrBy | INCRBY atomic counter |
| BenchmarkHandler_LPush | LPUSH list operations |
| BenchmarkHandler_RPush | RPUSH list operations |
| BenchmarkHandler_LPop | LPOP list operations |
| BenchmarkHandler_RPop | RPOP list operations |
| BenchmarkHandler_LRange | LRANGE range queries |
| BenchmarkHandler_LIndex | LINDEX positional lookup |
| BenchmarkHandler_HSet | HSET hash field writes |
| BenchmarkHandler_HGet | HGET hash field reads |
| BenchmarkHandler_HGetAll | HGETALL full hash retrieval |
| BenchmarkHandler_HIncrBy | HINCRBY atomic hash field increment |
| BenchmarkHandler_Mixed | 80/20 read/write mixed workload |
| BenchmarkHandler_Parallel | Multi-goroutine GET/SET |
| BenchmarkHandler_LargeValues | 10 KB value SET/GET |
| BenchmarkHandler_LPushScalability | LPUSH scaling from 1k to 10M list elements |

Effects engine

go test -bench=. -benchmem ./effects/

Benchmarks for the causal effect resolution layer.

redis-benchmark

Swytch is compatible with the standard tool:

# Basic throughput test
redis-benchmark -p 6379 -n 100000 -c 50

# GET/SET only
redis-benchmark -p 6379 -t set,get -n 1000000 -c 100

# Pipeline mode (higher throughput)
redis-benchmark -p 6379 -t set,get -n 1000000 -P 16

# 256-byte SET values, random keys drawn from a range of 1,000,000
redis-benchmark -p 6379 -t set,get -d 256 -r 1000000

| Flag | Description |
|---|---|
| -n | Total number of requests |
| -c | Number of parallel connections |
| -P | Pipeline N requests per connection |
| -t | Comma-separated list of commands to benchmark |
| -d | Data size in bytes for SET values |
| -r | Use random keys from a range of this size |
| -q | Quiet mode (show only requests/sec) |

Trace datasets

For reproducible cache evaluations, trace files are available from the CMU PDL twemcache workload dataset.