Swytch Documentation

Tiered Storage

Swytch’s tiered storage transforms Redis from a cache into a fully durable database. Every write is persisted to the OS page cache immediately, and fsync runs every 10ms. Process crashes lose nothing; power loss loses at most 10 ms of writes. This page explains how tiered storage works and when to use it.

Important

High throughput warning: Tiered storage can saturate disk throughput. Under sustained write loads exceeding 1GB/s, you may experience stalls while waiting for the disk to catch up.

Disk recommendations:

  • Recommended: NVMe SSD for write-heavy workloads
  • Acceptable: SATA SSD for read-heavy or moderate write workloads
  • Not recommended: Spinning disks (random L2 reads will bottleneck)

Why Tiered Storage?

Traditional Redis offers two persistence options, both with significant trade-offs:

| Feature | Redis RDB | Redis AOF | Swytch Tiered |
|---|---|---|---|
| Durability | Minutes of loss | Seconds of loss | 10 ms of loss |
| Write latency | Unaffected | fsync overhead | Batched fsync |
| Recovery time | Fast (snapshot) | Slow (replay) | Fast (indexed) |
| Memory usage | 2x during save | 1x + AOF buffer | 1x (passthrough) |
| Disk usage | Compact | Large (rewrite needed) | Compact (auto-compaction) |

Swytch’s tiered storage is not a cache with persistence bolted on—it’s a database that happens to keep hot data in memory.

How It Works

Architecture

Tiered storage uses a two-level architecture:

┌─────────────────────────────────────────────────────────┐
│                      Client Request                     │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    L1: Memory Cache                     │
│  • Lock-free design                                     │
│  • Self-tuning frequency-based eviction                 │
│  • Hot data stays in memory                             │
└────────────────────────────┬────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
┌─────────────────────────┐   ┌─────────────────────────┐
│      Write Path         │   │       Read Path         │
│  (Write-through)        │   │  (L1 miss → L2 lookup)  │
└────────────┬────────────┘   └────────────┬────────────┘
             │                             │
             ▼                             ▼
┌─────────────────────────────────────────────────────────┐
│                   L2: FASTER Log                        │
│  • Append-only log with indexed lookups                 │
│  • Memory-mapped for lock-free reads                    │
│  • Batched fsync every 10ms                             │
│  • Automatic compaction                                 │
└─────────────────────────────────────────────────────────┘

The L2 layer uses a log structure inspired by Microsoft FASTER—an append-only log with an in-memory hash index for O(1) lookups. Unlike a traditional write-ahead log that replays commands, this stores values directly and rebuilds the index on startup by scanning entry metadata.
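
To make the design concrete, here is a toy sketch of such a log store. The length-prefixed record format, the class name, and the method names are all invented for illustration; they are not Swytch's actual on-disk format. What it does share with the description above: values live in an append-only log, an in-memory hash index maps each key to its latest offset for O(1) lookups, and on startup the index is rebuilt by scanning entry headers rather than replaying commands.

```python
import os
import struct

class LogStore:
    """Minimal FASTER-style sketch: append-only value log plus an
    in-memory dict mapping each key to the offset of its latest record."""

    def __init__(self, path):
        self.index = {}                 # key -> offset of latest record
        self.log = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self):
        """Startup: scan record headers only; no command replay."""
        self.log.seek(0)
        while True:
            off = self.log.tell()
            header = self.log.read(8)
            if len(header) < 8:
                break
            klen, vlen = struct.unpack("<II", header)
            key = self.log.read(klen)
            self.log.seek(vlen, os.SEEK_CUR)   # skip value bytes
            self.index[key] = off              # later records win

    def set(self, key, value):
        self.log.seek(0, os.SEEK_END)
        off = self.log.tell()
        self.log.write(struct.pack("<II", len(key), len(value)) + key + value)
        self.log.flush()
        self.index[key] = off

    def get(self, key):
        off = self.index.get(key)
        if off is None:
            return None
        self.log.seek(off)
        klen, vlen = struct.unpack("<II", self.log.read(8))
        self.log.seek(klen, os.SEEK_CUR)
        return self.log.read(vlen)
```

Reopening the file simulates a restart: the index comes back from a single scan, and overwritten keys resolve to their newest record.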

Write Path (Write-Through)

In the default write-through mode, every write follows this path:

  1. Client sends SET command
  2. Write to L1 cache – Immediate, lock-free
  3. Write to L2 log – Append to FASTER log
  4. Batched fsync – Background thread syncs every 10 ms
  5. Return OK to client – After L2 write completes

The write is durable once step 4 completes. In the worst case (power loss immediately after step 3), you lose at most 10 ms of writes.
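
The steps above can be sketched as a toy model (not Swytch's actual code): every SET lands in an in-memory L1 dict and is appended to an L2 log file, while a background thread batches the expensive fsync into a 10 ms window. The class name and log format are invented for illustration.

```python
import os
import threading
import time

class WriteThroughStore:
    """Toy write-through model: L1 dict + appended L2 log,
    with fsync batched every 10 ms by a background thread."""

    def __init__(self, path):
        self.l1 = {}
        self.log = open(path, "ab")
        self.lock = threading.Lock()
        self.dirty = False
        threading.Thread(target=self._fsync_loop, daemon=True).start()

    def set(self, key, value):
        with self.lock:
            self.l1[key] = value                         # step 2: L1 write
            self.log.write(f"{key}={value}\n".encode())  # step 3: L2 append
            self.dirty = True
        # step 5: acknowledge now; durability arrives with the next batch fsync

    def _fsync_loop(self):
        while True:
            time.sleep(0.010)                            # step 4: 10 ms batch
            with self.lock:
                if self.dirty:
                    self.log.flush()
                    os.fsync(self.log.fileno())
                    self.dirty = False
```

A crash between the append and the next batch fsync is exactly the at-most-10 ms loss window described above.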

Read Path (Passthrough Semantics)

Reads check L1 first, falling back to L2:

  1. Client sends GET command
  2. Check L1 cache – Lock-free lookup
  3. If L1 hit – Return immediately
  4. If L1 miss – Check L2 log (memory-mapped, lock-free)
  5. If L2 hit – Promote to L1, return value
  6. If L2 miss – Return nil

This is passthrough semantics: L1 acts as a transparent cache over L2. Data evicted from L1 is never lost—it remains in L2 and can be retrieved on the next access.
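
A minimal sketch of these passthrough semantics, using plain dicts to stand in for the lock-free L1 cache and the on-disk L2 log (names are illustrative, not Swytch internals):

```python
class TieredReader:
    """L1 is a transparent cache over L2: misses fall through,
    hits in L2 are promoted back into L1."""

    def __init__(self, l2):
        self.l1 = {}
        self.l2 = l2    # stands in for the on-disk FASTER log

    def get(self, key):
        if key in self.l1:       # steps 2-3: L1 hit, return immediately
            return self.l1[key]
        if key in self.l2:       # steps 4-5: L2 hit, promote to L1
            value = self.l2[key]
            self.l1[key] = value
            return value
        return None              # step 6: true miss
```

Note how an entry evicted from L1 would simply be re-promoted on the next access, which is why eviction never loses data.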

Durability Guarantees

Normal Operation

  • Writes are durable within 10 ms of the write completing
  • All acknowledged writes survive restart (except the 10 ms window)
  • Reads always see the latest write (strong consistency)

Process Crash

  • No data loss – Writes are in the OS page cache and will be flushed to disk
  • No data corruption – FASTER log uses checksums
  • Fast recovery – Index rebuild from log, not full replay

Power Loss

  • At most 10 ms of writes may be lost (the unflushed batch)
  • The OS page cache contents are lost, but the batched fsync (every 10 ms) means everything up to the last flush is already on disk
  • No data corruption – Partial writes are detected and truncated on recovery

Comparison to Redis

| Scenario | Redis (AOF everysec) | Swytch Tiered |
|---|---|---|
| Normal write | Buffered ~1 s | Durable in 10 ms |
| Power loss | Lose ~1 s of writes | Lose ~10 ms of writes |
| Process crash | Lose buffered writes | No data loss |
| Recovery time | Replay entire AOF | Index scan (faster) |

Configuration

Enabling Tiered Storage

# Basic persistent mode
swytch redis --persistent --db-path=/data/redis.db

# With 4GB memory limit
swytch redis --persistent --db-path=/data/redis.db --maxmemory=4gb

# Defragment log on startup (reclaims space from deleted keys)
swytch redis --persistent --db-path=/data/redis.db --defragment

Warning

Disk usage is unbounded. Unlike Redis, --maxmemory only limits the L1 memory cache—it does not limit disk usage. The L2 log file will grow as you write data, just like any database. Swytch does not automatically evict data to free disk space. Monitor disk usage and add storage or delete old data before it fills—if the disk fills up, writes fail silently (data stays in L1 memory but is not persisted to disk).

This is a fundamental difference from Redis’s eviction model. In Redis, maxmemory caps total data. In Swytch tiered mode, maxmemory only caps what’s kept hot in memory—all data lives on disk.
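
Because the L2 log grows without bound, a simple headroom check is worth wiring into monitoring. This sketch uses Python's standard `shutil.disk_usage`; the 85% threshold is an arbitrary example, not a Swytch recommendation:

```python
import shutil

def disk_headroom(path, warn_fraction=0.85):
    """Return (used_fraction, should_alert) for the filesystem
    holding `path` (e.g. the directory containing the L2 log)."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    return used, used >= warn_fraction
```

Run this against the `--db-path` directory and page someone before the alert fires, since a full disk means writes silently stop being persisted.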

Note

File locking prevents accidents. Swytch acquires an exclusive lock on the database file at startup. If you accidentally try to start a second instance pointing at the same file, it will fail immediately with:

database is locked by another process

This protects against data corruption during deployments or misconfiguration.
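
For illustration, here is how this kind of exclusive file lock is typically implemented on POSIX systems using `fcntl.flock`. This is a generic sketch, not Swytch's documented mechanism; the function name and error handling are invented:

```python
import fcntl

def acquire_db_lock(path):
    """Take an exclusive, non-blocking lock on the database file.
    The returned handle must stay open for the lock's lifetime."""
    handle = open(path, "a")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        raise RuntimeError("database is locked by another process")
    return handle
```

The non-blocking flag makes a second starter fail immediately instead of hanging, which matches the fail-fast behavior described above.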

Ghost Mode (Write-Back)

Ghost mode changes the write path to write-back semantics:

  1. Writes go to L1 only – No immediate L2 write
  2. L2 write on eviction – When L1 evicts an entry, it’s written to L2
  3. L2 write on shutdown – Clean shutdown flushes all L1 entries

swytch redis --persistent --ghost --db-path=/data/redis.db

Trade-offs:

| Aspect | Write-Through (default) | Ghost Mode |
|---|---|---|
| Write latency | Higher (L2 write) | Lower (L1 only) |
| Durability | No data loss on crash | May lose unflushed L1 |
| Best for | Databases | Caches with persistence |

Note
Ghost mode rarely provides benefits. It only helps when you’re saturating disk I/O throughput. In practice, most workloads are CPU-bound, not disk-bound. The L2 write in write-through mode is a sequential append that modern SSDs handle with minimal overhead. Unless you’ve profiled and confirmed disk is your bottleneck, use the default write-through mode.

Use ghost mode only when:

  • You’ve confirmed disk I/O is your bottleneck (not CPU)
  • You can tolerate data loss on unclean shutdown
  • Data can be reconstructed from another source

Recovery and Startup

Normal Startup

On startup, Swytch:

  1. Opens the FASTER log file
  2. Rebuilds the index by scanning committed entries
  3. Validates checksums to detect corruption
  4. Truncates corrupted tail if any (crash recovery)

This is faster than Redis AOF replay because:

  • Index is rebuilt from metadata, not by re-executing commands
  • Only committed entries are scanned
  • No command parsing or execution needed

Recovery time: Approximately 10 seconds per GB of data. A 10GB database recovers in under 2 minutes. Plan maintenance windows accordingly.

Defragmentation

Over time, deleted keys leave holes in the log. How Swytch handles this depends on your operating system.

Linux and macOS (Hole Punching)

On operating systems that support hole punching (fallocate with FALLOC_FL_PUNCH_HOLE), Swytch automatically reclaims space from deleted keys by punching holes in the log file. The file’s apparent size stays the same, but actual disk usage shrinks.

You can see the difference:

# Apparent size (includes holes)
$ ls -lh redis.db
-rw-r--r-- 1 user user 1.2G Jan 15 10:00 redis.db

# Actual disk usage (excludes holes)
$ du -h redis.db
245M    redis.db

In this example, the log has 1.2GB of data written over time, but only 245MB is actually used on disk.
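
You can observe the same apparent-versus-allocated distinction programmatically via `os.stat`, which is handy for dashboards. This sketch creates its own sparse file to demonstrate (on filesystems that support sparse files); the helper names are illustrative:

```python
import os

def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file — the same
    distinction `ls -l` vs `du` shows for a hole-punched log."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks is in 512-byte units

def make_sparse(path, size=1024 * 1024):
    """Create a demo sparse file: 1 MiB apparent, almost nothing allocated."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # skip ahead without writing — leaves a hole
        f.write(b"\0")
```

For a hole-punched Swytch log, `allocated_bytes` is the number that matters for capacity planning.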

Windows and Other OSes

On operating systems without hole punching support, the log file grows unbounded as keys are deleted and new keys are written. Disk space is not reclaimed automatically.

To reclaim space, restart the server with the --defragment flag:

swytch redis --persistent --db-path=/data/redis.db --defragment

This compacts the log in-place, removing deleted entries and reclaiming space. Plan for downtime proportional to your data size.

Manual Defragmentation

Even on Linux/macOS, you may want to run manual defragmentation occasionally to:

  • Consolidate data for faster sequential reads
  • Prepare for backup (smaller file to copy)
  • Reset after a large bulk delete operation

# Stop the server, defragment, restart
swytch redis --persistent --db-path=/data/redis.db --defragment

Backup and Restore

Live Backup

Operators can safely copy the database file while the server is running. The FASTER log format is append-only with checksummed entries, so a copy made during writes will be consistent up to the point of the copy—any partial write at the end is detected and truncated on recovery.

# Copy the database file while server is running
cp /data/redis.db /backup/redis-$(date +%Y%m%d-%H%M%S).db

For the smallest possible backup, defragment first (this compacts the file in-place):

# Stop the server, defragment, then copy
swytch redis --persistent --db-path=/data/redis.db --defragment
# Once the defragmented server has started, stop it again and copy
cp /data/redis.db /backup/redis-$(date +%Y%m%d-%H%M%S).db
# Restart for production
swytch redis --persistent --db-path=/data/redis.db

Restore

To restore from a backup, simply replace the database file:

# Stop the server
# Replace the database file
cp /backup/redis-20240115-100000.db /data/redis.db
# Start the server
swytch redis --persistent --db-path=/data/redis.db

The server will validate checksums and rebuild its index on startup.

Monitoring

Prometheus Metrics

Enable metrics to monitor tiered storage:

swytch redis --persistent --db-path=/data/redis.db --metrics-port=9090

Key metrics for tiered storage:

| Metric | Description |
|---|---|
| swytch_redis_l2_hits_total | L2 (disk) cache hits |
| swytch_redis_l2_misses_total | L2 cache misses |
| swytch_redis_l2_writes_total | L2 writes |
| swytch_redis_cache_hits_total | L1 cache hits |
| swytch_redis_cache_misses_total | L1 cache misses |
| swytch_redis_evictions_total | L1 evictions |
| swytch_redis_memory_bytes | Current memory usage |
| swytch_redis_memory_max_bytes | Configured memory limit |

Understanding L2 Metrics

L2 Hit Rate = l2_hits / (l2_hits + l2_misses)

  • High L2 hit rate (>90%): Working set exceeds L1 but fits in L2. Consider increasing memory.
  • Low L2 hit rate (<50%): Many requests for non-existent keys, or very large working set.
  • L2 hits = 0: All data fits in L1 (ideal for cache workloads).

L2 Write Rate = l2_writes / total_writes

  • Should be ~100% in write-through mode
  • Lower in ghost mode (writes only on eviction)
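
The two ratios above are straightforward to compute from the Prometheus counters, with one edge case worth handling explicitly (no L2 lookups at all). A small sketch:

```python
def l2_hit_rate(l2_hits, l2_misses):
    """L2 hit rate as defined above; None when L2 was never consulted
    (i.e. everything fit in L1)."""
    total = l2_hits + l2_misses
    return l2_hits / total if total else None

def l2_write_rate(l2_writes, total_writes):
    """Fraction of writes reaching L2 (~1.0 in write-through mode)."""
    return l2_writes / total_writes if total_writes else None
```

Feed these the `swytch_redis_l2_*` counter values; a `None` hit rate is the "all data fits in L1" case rather than a 0% rate.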

Performance Characteristics

Benchmarked on AMD Ryzen 5 3600 (6 cores), 64GB RAM, Samsung NVMe RAID0, Ubuntu 24.04. Using Unix socket, memtier_benchmark with 4 threads, 50 connections, 256-byte values.

Throughput

| Workload | Write-Through | Ghost Mode |
|---|---|---|
| 100% writes | 247k ops/sec | 397k ops/sec |
| 100% reads | 418-427k ops/sec | - |
| 50/50 mixed | 336k ops/sec | - |

Latency

| Workload | Mode | p50 | p99 | p99.9 |
|---|---|---|---|---|
| 100% writes | Write-through | 0.52 ms | 3.36 ms | 6.50 ms |
| 100% writes | Ghost | 0.43 ms | 2.19 ms | 5.41 ms |
| 100% reads | Write-through | 0.42 ms | 1.45-1.77 ms | 4.19-4.51 ms |
| 50/50 mixed | Write-through | 0.44 ms | 2.98 ms | 5.44 ms |

The p99 latency spikes in write workloads reflect SSD write buffer flushes. Under sustained writes, the NVMe’s internal buffer fills and must flush to NAND, causing brief stalls.

Disk I/O

  • Sequential writes only – No random I/O for writes
  • Random reads for L2 lookups – Memory-mapped, OS handles caching
  • Batched fsync – One fsync per 10 ms, not per write

When to Use Tiered Storage

Use Tiered Storage When:

  • You need durability – Data must survive restarts
  • You’re using Redis as a database – Primary data store, not just cache
  • Your working set exceeds memory – L2 extends effective capacity
  • You need fast recovery – Indexed recovery beats AOF replay

Use In-Memory Mode When:

  • Pure caching – Data can be regenerated from source
  • Maximum throughput – No persistence overhead
  • Ephemeral data – Sessions, rate limits, temporary state

Example: Database Use Case

Using Swytch as a primary database for user sessions:

# Start with persistence and monitoring
swytch redis \
  --persistent \
  --db-path=/data/sessions.db \
  --maxmemory=2gb \
  --metrics-port=9090

import redis

r = redis.Redis(host='localhost', port=6379)

# Store session - durable within 10ms
r.hset('session:abc123', mapping={
    'user_id': '42',
    'created_at': '2024-01-15T10:00:00Z',
    'permissions': 'read,write'
})
r.expire('session:abc123', 86400)  # 24 hour TTL

# Read session - from L1 or L2
session = r.hgetall('session:abc123')

Even if Swytch crashes and restarts, the session data survives.

Comparison to Other Solutions

vs. Redis with AOF

| Aspect | Redis AOF | Swytch Tiered |
|---|---|---|
| Max data loss | 1 second (everysec) | 10 ms |
| Write amplification | High (full commands) | Low (values only) |
| Recovery | Replay commands | Index rebuild |
| Compaction | Manual BGREWRITEAOF | Automatic |

vs. Redis with RDB

| Aspect | Redis RDB | Swytch Tiered |
|---|---|---|
| Max data loss | Minutes | 10 ms |
| Memory during save | 2x (fork) | 1x |
| Point-in-time backup | Yes | No (continuous) |

vs. KeyDB/Dragonfly

| Aspect | KeyDB/Dragonfly | Swytch Tiered |
|---|---|---|
| Persistence model | Redis-compatible | FASTER log |
| Durability | Same as Redis | 10 ms batched |
| Multi-threaded | Yes | Yes |
| Memory efficiency | Similar to Redis | Passthrough (no duplication) |