Tiered Storage
Swytch’s tiered storage transforms Redis from a cache into a fully durable database. Every write is persisted to the OS page cache immediately, and fsync runs every 10ms. Process crashes lose nothing; power loss loses at most 10 ms of writes. This page explains how tiered storage works and when to use it.
Important: High-throughput warning. Tiered storage can saturate disk throughput. Under sustained write loads exceeding 1 GB/s, you may experience stalls while waiting for the disk to catch up.
Disk recommendations:
- Recommended: NVMe SSD for write-heavy workloads
- Acceptable: SATA SSD for read-heavy or moderate write workloads
- Not recommended: Spinning disks (random L2 reads will bottleneck)
Traditional Redis offers two persistence options, both with significant trade-offs:
| Feature | Redis RDB | Redis AOF | Swytch Tiered |
|---|---|---|---|
| Durability | Minutes of loss | Seconds of loss | 10ms of loss |
| Write latency | Unaffected | fsync overhead | Batched fsync |
| Recovery time | Fast (snapshot) | Slow (replay) | Fast (indexed) |
| Memory usage | 2x during save | 1x + AOF buffer | 1x (passthrough) |
| Disk usage | Compact | Large (rewrite needed) | Compact (auto-compaction) |
Swytch’s tiered storage is not a cache with persistence bolted on—it’s a database that happens to keep hot data in memory.
Tiered storage uses a two-level architecture:
┌─────────────────────────────────────────────────────────┐
│ Client Request │
└────────────────────────────┬────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ L1: Memory Cache │
│ • Lock-free design │
│ • Self-tuning frequency-based eviction │
│ • Hot data stays in memory │
└────────────────────────────┬────────────────────────────┘
│
┌──────────────┴──────────────┐
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ Write Path │ │ Read Path │
│ (Write-through) │ │ (L1 miss → L2 lookup) │
└────────────┬────────────┘ └────────────┬────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────┐
│ L2: FASTER Log │
│ • Append-only log with indexed lookups │
│ • Memory-mapped for lock-free reads │
│ • Batched fsync every 10ms │
│ • Automatic compaction │
└─────────────────────────────────────────────────────────┘
The L2 layer uses a log structure inspired by Microsoft FASTER—an append-only log with an in-memory hash index for O(1) lookups. Unlike a traditional write-ahead log that replays commands, this stores values directly and rebuilds the index on startup by scanning entry metadata.
In the default write-through mode, every write follows this path:
1. Client sends SET command
2. Write to L1 cache – immediate, lock-free
3. Write to L2 log – append to the FASTER log
4. Batched fsync – a background thread syncs every 10 ms
5. Return OK to client – after the L2 write completes
The write is durable once step 4 completes. In the worst case (power loss immediately after step 3), you lose at most 10 ms of writes.
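The steps above can be sketched in a few lines of Python: a dict for L1, an append-only file for L2, and a background thread batching fsync every 10 ms. The class name and on-disk entry format here are illustrative assumptions, not Swytch's actual implementation.

```python
import os
import threading
import time

class WriteThroughLog:
    """Toy model of the write-through path: every SET lands in L1 and is
    appended to the L2 log; a background thread batches fsync."""

    def __init__(self, path, fsync_interval=0.010):
        self.l1 = {}                        # step 2: in-memory cache
        self.log = open(path, "ab")         # step 3: append-only L2 log
        self.lock = threading.Lock()
        threading.Thread(target=self._fsync_loop,
                         args=(fsync_interval,), daemon=True).start()

    def set(self, key, value):
        with self.lock:
            self.l1[key] = value            # write to L1
            # illustrative entry format: "<klen> <vlen> <key><value>"
            self.log.write(b"%d %d %s%s" % (len(key), len(value), key, value))
            self.log.flush()                # into the OS page cache
        return "OK"                         # step 5: ack after the L2 write

    def _fsync_loop(self, interval):
        while True:                         # step 4: one fsync per batch
            time.sleep(interval)
            with self.lock:
                os.fsync(self.log.fileno())
```

A crash between the `flush()` and the next `fsync()` is exactly the "at most 10 ms of writes" window described above.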
Reads check L1 first, falling back to L2:
- Client sends GET command
- Check L1 cache - Lock-free lookup
- If L1 hit – Return immediately
- If L1 miss – Check L2 log (memory-mapped, lock-free)
- If L2 hit – Promote to L1, return value
- If L2 miss – Return nil
This is passthrough semantics: L1 acts as a transparent cache over L2. Data evicted from L1 is never lost—it remains in L2 and can be retrieved on the next access.
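The passthrough read path can be sketched the same way. `TieredReader` and its dict-backed tiers are hypothetical stand-ins for illustration, not Swytch's API.

```python
class TieredReader:
    """Sketch of passthrough reads: check L1, fall back to L2,
    promote on an L2 hit."""

    def __init__(self, l1, l2):
        self.l1 = l1              # dict acting as the in-memory cache
        self.l2 = l2              # dict standing in for the indexed L2 log

    def get(self, key):
        if key in self.l1:        # L1 hit: return immediately
            return self.l1[key]
        value = self.l2.get(key)  # L1 miss: indexed L2 lookup
        if value is not None:
            self.l1[key] = value  # promote to L1 for future reads
        return value              # None models the Redis nil on an L2 miss
```

Note that eviction from L1 never discards data: a later `get` simply takes the L2 branch and re-promotes the value.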
Durability guarantees:
- Writes are durable within 10 ms of the write completing
- All acknowledged writes survive restart (except the 10 ms window)
- Reads always see the latest write (strong consistency)
On process crash (the process dies but the machine stays up):
- No data loss – writes are in the OS page cache and will be flushed to disk
- No data corruption – the FASTER log uses checksums
- Fast recovery – index rebuild from the log, not a full replay
On power loss (the whole machine goes down):
- At most 10 ms of writes may be lost (the unflushed batch)
- The OS page cache is lost, but fsync runs every 10 ms to persist data
- No data corruption – partial writes are detected and truncated on recovery
| Scenario | Redis (AOF everysec) | Swytch Tiered |
|---|---|---|
| Normal write | Buffered ~1s | Durable in 10ms |
| Power loss | Lose ~1s of writes | Lose ~10ms of writes |
| Process crash | Lose buffered writes | No data loss |
| Recovery time | Replay entire AOF | Index scan (faster) |
# Basic persistent mode
swytch redis --persistent --db-path=/data/redis.db
# With 4GB memory limit
swytch redis --persistent --db-path=/data/redis.db --maxmemory=4gb
# Defragment log on startup (reclaims space from deleted keys)
swytch redis --persistent --db-path=/data/redis.db --defragment
Warning: Disk usage is unbounded. Unlike Redis, --maxmemory only limits the L1 memory cache; it does not limit disk usage. The L2 log file will grow as you write data, just like any database. Swytch does not automatically evict data to free disk space. Monitor disk usage and add storage or delete old data before the disk fills; if it does fill, writes fail silently (data stays in L1 memory but is not persisted to disk).
This is a fundamental difference from Redis's eviction model. In Redis, maxmemory caps total data. In Swytch tiered mode, maxmemory only caps what's kept hot in memory; all data lives on disk.
Note: File locking prevents accidents. Swytch acquires an exclusive lock on the database file at startup. If you accidentally try to start a second instance pointing at the same file, it will fail immediately with:
database is locked by another process
This protects against data corruption during deployments or misconfiguration.
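A common way to implement this kind of exclusive lock is POSIX flock. The sketch below is an assumption about the mechanism (the docs above don't specify how Swytch takes the lock), but it shows the behavior: the second attempt fails immediately rather than corrupting the file.

```python
import fcntl

def acquire_exclusive(path):
    """Take an exclusive, non-blocking lock on the database file.
    Raises BlockingIOError if another process already holds the lock."""
    f = open(path, "a+b")
    fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    return f  # keep the handle open for the lifetime of the process
```

With flock, the lock is tied to the open file description, so it is released automatically if the process dies; no stale lock files to clean up after a crash.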
Ghost mode changes the write path to write-back semantics:
- Writes go to L1 only – No immediate L2 write
- L2 write on eviction – When L1 evicts an entry, it’s written to L2
- L2 write on shutdown – Clean shutdown flushes all L1 entries
swytch redis --persistent --ghost --db-path=/data/redis.db
Trade-offs:
| Aspect | Write-Through (default) | Ghost Mode |
|---|---|---|
| Write latency | Higher (L2 write) | Lower (L1 only) |
| Durability | No data loss on crash | May lose unflushed L1 |
| Best for | Databases | Caches with persistence |
Note: Ghost mode rarely provides benefits. It only helps when you’re saturating disk I/O throughput. In practice, most workloads are CPU-bound, not disk-bound. The L2 write in write-through mode is a sequential append that modern SSDs handle with minimal overhead. Unless you’ve profiled and confirmed disk is your bottleneck, use the default write-through mode.
Use ghost mode only when:
- You’ve confirmed disk I/O is your bottleneck (not CPU)
- You can tolerate data loss on unclean shutdown
- Data can be reconstructed from another source
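Ghost mode's write-back behavior can be sketched with a small capacity-bounded L1. The class and its oldest-first eviction policy are illustrative assumptions, not Swytch's actual eviction logic (which is frequency-based).

```python
from collections import OrderedDict

class GhostL1:
    """Sketch of ghost-mode write-back: writes land in L1 only; L2 sees a
    value only when it is evicted from L1 or flushed at clean shutdown."""

    def __init__(self, capacity, l2):
        self.capacity = capacity
        self.l1 = OrderedDict()
        self.l2 = l2                       # dict standing in for the L2 log

    def set(self, key, value):
        self.l1[key] = value               # L1 only: no immediate L2 write
        self.l1.move_to_end(key)
        if len(self.l1) > self.capacity:   # evict the oldest entry to L2
            old_key, old_val = self.l1.popitem(last=False)
            self.l2[old_key] = old_val

    def shutdown(self):                    # clean shutdown flushes all of L1
        self.l2.update(self.l1)
        self.l1.clear()
```

The durability gap is visible in the sketch: kill the process before `shutdown()` and everything still sitting in `l1` is gone.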
On startup, Swytch:
- Opens the FASTER log file
- Rebuilds the index by scanning committed entries
- Validates checksums to detect corruption
- Truncates corrupted tail if any (crash recovery)
This is faster than Redis AOF replay because:
- Index is rebuilt from metadata, not by re-executing commands
- Only committed entries are scanned
- No command parsing or execution needed
Recovery time: Approximately 10 seconds per GB of data. A 10GB database recovers in under 2 minutes. Plan maintenance windows accordingly.
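To make the recovery steps concrete, here is a minimal sketch of an index rebuild over a checksummed append-only log. The entry layout (a length-prefixed header carrying a CRC32) is an assumption for illustration, not the FASTER log's actual format.

```python
import struct
import zlib

HEADER = struct.Struct("<III")  # key length, value length, CRC32 of key+value

def append_entry(log, key, value):
    """Append one checksummed entry to the (in-memory) log."""
    header = HEADER.pack(len(key), len(value), zlib.crc32(key + value))
    return log + header + key + value

def recover(log):
    """Scan the log, rebuild the key -> value-offset index, and truncate a
    corrupted tail (e.g. a partial write from a crash)."""
    index, pos = {}, 0
    while pos + HEADER.size <= len(log):
        klen, vlen, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + klen + vlen
        if end > len(log):                  # partial write at the tail
            break
        key = log[start:start + klen]
        value = log[start + klen:end]
        if zlib.crc32(key + value) != crc:  # corruption detected
            break
        index[key] = start + klen           # offset of the value
        pos = end
    return index, log[:pos]                 # truncated, consistent log
```

No commands are parsed or re-executed; the scan only reads entry metadata and verifies checksums, which is why this is faster than an AOF replay.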
Over time, deleted keys leave holes in the log. How Swytch handles this depends on your operating system.
On operating systems that support hole punching (fallocate with FALLOC_FL_PUNCH_HOLE), Swytch automatically
reclaims space from deleted keys by punching holes in the log file. The file’s apparent size stays the same, but
actual disk usage shrinks.
You can see the difference:
# Apparent size (includes holes)
$ ls -lh redis.db
-rw-r--r-- 1 user user 1.2G Jan 15 10:00 redis.db
# Actual disk usage (excludes holes)
$ du -h redis.db
245M redis.db
In this example, the log has 1.2GB of data written over time, but only 245MB is actually used on disk.
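You can reproduce the apparent-size vs. actual-usage gap with an ordinary sparse file. This demo seeks past a hole rather than punching one, but on filesystems that support sparse files, `st_size` and `st_blocks` show the same discrepancy that `ls` and `du` report.

```python
import os
import tempfile

# Create a sparse file: seek past a 1 MiB hole, then write 4 bytes.
path = os.path.join(tempfile.mkdtemp(), "sparse.db")
with open(path, "wb") as f:
    f.seek(1024 * 1024)   # the skipped range becomes a hole
    f.write(b"data")

apparent = os.path.getsize(path)         # what `ls -l` reports
actual = os.stat(path).st_blocks * 512   # what `du` reports
print(apparent, actual)  # apparent is just over 1 MiB; actual is far smaller
```

Hole punching works the same way in reverse: FALLOC_FL_PUNCH_HOLE deallocates blocks inside an existing file, shrinking `du` output without changing `ls` output.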
On operating systems without hole punching support, the log file grows unbounded as keys are deleted and new keys are written. Disk space is not reclaimed automatically.
To reclaim space, restart the server with the --defragment flag:
swytch redis --persistent --db-path=/data/redis.db --defragment
This compacts the log in-place, removing deleted entries and reclaiming space. Plan for downtime proportional to your data size.
Even on Linux/macOS, you may want to run manual defragmentation occasionally to:
- Consolidate data for faster sequential reads
- Prepare for backup (smaller file to copy)
- Reset after a large bulk delete operation
# Stop the server, defragment, restart
swytch redis --persistent --db-path=/data/redis.db --defragment
Operators can safely copy the database file while the server is running. The FASTER log format is append-only with checksummed entries, so a copy made during writes will be consistent up to the point of the copy—any partial write at the end is detected and truncated on recovery.
# Copy the database file while server is running
cp /data/redis.db /backup/redis-$(date +%Y%m%d-%H%M%S).db
For the smallest possible backup, defragment first (this compacts the file in-place):
# Stop the server, then start with --defragment (compacts the log on startup)
swytch redis --persistent --db-path=/data/redis.db --defragment
# Stop the server again and copy the compacted file
cp /data/redis.db /backup/redis-$(date +%Y%m%d-%H%M%S).db
# Restart for production
swytch redis --persistent --db-path=/data/redis.db
To restore from a backup, simply replace the database file:
# Stop the server
# Replace the database file
cp /backup/redis-20240115-100000.db /data/redis.db
# Start the server
swytch redis --persistent --db-path=/data/redis.db
The server will validate checksums and rebuild its index on startup.
Enable metrics to monitor tiered storage:
swytch redis --persistent --db-path=/data/redis.db --metrics-port=9090
Key metrics for tiered storage:
| Metric | Description |
|---|---|
| swytch_redis_l2_hits_total | L2 (disk) cache hits |
| swytch_redis_l2_misses_total | L2 cache misses |
| swytch_redis_l2_writes_total | L2 writes |
| swytch_redis_cache_hits_total | L1 cache hits |
| swytch_redis_cache_misses_total | L1 cache misses |
| swytch_redis_evictions_total | L1 evictions |
| swytch_redis_memory_bytes | Current memory usage |
| swytch_redis_memory_max_bytes | Configured memory limit |
L2 Hit Rate = l2_hits / (l2_hits + l2_misses)
- High L2 hit rate (>90%): Working set exceeds L1 but fits in L2. Consider increasing memory.
- Low L2 hit rate (<50%): Many requests for non-existent keys, or very large working set.
- L2 hits = 0: All data fits in L1 (ideal for cache workloads).
L2 Write Rate = l2_writes / total_writes
- Should be ~100% in write-through mode
- Lower in ghost mode (writes only on eviction)
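Both ratios are easy to derive from the counters above; a couple of small helpers, for illustration:

```python
def l2_hit_rate(l2_hits, l2_misses):
    """L2 hit rate from the swytch_redis_l2_* counters."""
    total = l2_hits + l2_misses
    return l2_hits / total if total else 0.0

def l2_write_rate(l2_writes, total_writes):
    """Fraction of writes reaching L2 (~1.0 in write-through mode,
    lower in ghost mode)."""
    return l2_writes / total_writes if total_writes else 0.0
```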
Benchmarked on AMD Ryzen 5 3600 (6 cores), 64GB RAM, Samsung NVMe RAID0, Ubuntu 24.04. Using Unix socket, memtier_benchmark with 4 threads, 50 connections, 256-byte values.
| Workload | Write-Through | Ghost Mode |
|---|---|---|
| 100% writes | 247k ops/sec | 397k ops/sec |
| 100% reads | 418-427k ops/sec | - |
| 50/50 mixed | 336k ops/sec | - |
| Workload | Mode | p50 | p99 | p99.9 |
|---|---|---|---|---|
| 100% writes | Write-through | 0.52ms | 3.36ms | 6.50ms |
| 100% writes | Ghost | 0.43ms | 2.19ms | 5.41ms |
| 100% reads | Write-through | 0.42ms | 1.45-1.77ms | 4.19-4.51ms |
| 50/50 mixed | Write-through | 0.44ms | 2.98ms | 5.44ms |
The p99 latency spikes in write workloads reflect SSD write buffer flushes. Under sustained writes, the NVMe’s internal buffer fills and must flush to NAND, causing brief stalls.
The disk I/O pattern is SSD-friendly:
- Sequential writes only – no random I/O for writes
- Random reads for L2 lookups – memory-mapped, the OS handles caching
- Batched fsync – one fsync per 10 ms, not per write
Use tiered storage when:
- You need durability – data must survive restarts
- You’re using Redis as a database – primary data store, not just a cache
- Your working set exceeds memory – L2 extends effective capacity
- You need fast recovery – indexed recovery beats AOF replay
Stick with plain in-memory mode when:
- Pure caching – data can be regenerated from the source
- Maximum throughput – no persistence overhead
- Ephemeral data – sessions, rate limits, temporary state
Using Swytch as a primary database for user sessions:
# Start with persistence and monitoring
swytch redis \
--persistent \
--db-path=/data/sessions.db \
--maxmemory=2gb \
--metrics-port=9090
import redis
r = redis.Redis(host='localhost', port=6379)
# Store session - durable within 10ms
r.hset('session:abc123', mapping={
'user_id': '42',
'created_at': '2024-01-15T10:00:00Z',
'permissions': 'read,write'
})
r.expire('session:abc123', 86400) # 24 hour TTL
# Read session - from L1 or L2
session = r.hgetall('session:abc123')
Even if Swytch crashes and restarts, the session data survives.
| Aspect | Redis AOF | Swytch Tiered |
|---|---|---|
| Max data loss | 1 second (everysec) | 10ms |
| Write amplification | High (full commands) | Low (values only) |
| Recovery | Replay commands | Index rebuild |
| Compaction | Manual BGREWRITEAOF | Automatic |
| Aspect | Redis RDB | Swytch Tiered |
|---|---|---|
| Max data loss | Minutes | 10ms |
| Memory during save | 2x (fork) | 1x |
| Point-in-time backup | Yes | No (continuous) |
| Aspect | KeyDB/Dragonfly | Swytch Tiered |
|---|---|---|
| Persistence model | Redis-compatible | FASTER log |
| Durability | Same as Redis | 10ms batched |
| Multi-threaded | Yes | Yes |
| Memory efficiency | Similar to Redis | Passthrough (no duplication) |