This document describes how StrataDB persists data and recovers from crashes.

Write-Ahead Log (WAL)

All data changes are first written to the WAL before being applied to the in-memory store. This ensures that committed transactions can be recovered after a crash.

WAL Entry Types

EntryType TagDescription
BeginTxn0x01Start of a transaction
Write0x02Key-value write with version
Delete0x03Key deletion with version
CommitTxn0x04Transaction committed
AbortTxn0x05Transaction aborted
Checkpoint0x06Snapshot boundary marker
JsonCreate0x20JSON document creation
JsonSet0x21JSON path-level update
JsonDelete0x22JSON field deletion
JsonDestroy0x23Entire JSON document deletion
VectorCollectionCreate0x70Vector collection creation
VectorCollectionDelete0x71Vector collection deletion
VectorUpsert0x72Vector insert/update
VectorDelete0x73Vector deletion

Binary Format

Each WAL entry is encoded as:

+-------------------+-------------------+-------------------+-------------------+
| Length (4 bytes)  | Type Tag (1 byte) | Payload (N bytes) | CRC32 (4 bytes)  |
+-------------------+-------------------+-------------------+-------------------+
  • Length — total size of the entry (type + payload + CRC32)
  • Type tag — identifies the entry type, enables forward compatibility
  • Payload — bincode-serialized entry data
  • CRC32 — checksum of type tag + payload, detects corruption

Durability Modes

ModeBehavior
NoneNo WAL writes (in-memory only)
BatchedWAL writes buffered, fsync every ~100ms or ~1000 writes
StrictImmediate fsync after every commit

Snapshots

Snapshots are periodic full-state captures written to disk.

Snapshot File Format

+---------------------------+
| Magic: "INMEM_SNAP" (10B) |
+---------------------------+
| Format Version (4B)       |
+---------------------------+
| Timestamp (8B)            |
+---------------------------+
| WAL Offset (8B)           |
+---------------------------+
| Transaction Count (8B)    |
+---------------------------+
| Primitive Count (1B)      |
+---------------------------+
| Primitive Section 1       |
|   Type ID (1B)            |
|   Data Length (8B)         |
|   Data (N bytes)           |
+---------------------------+
| ... more sections ...     |
+---------------------------+
| CRC32 (4B)                |
+---------------------------+

Primitive IDs

PrimitiveID
KV1
JSON2
Event3
State4
Branch6
Vector7

Snapshot Benefits

  • Bounded recovery time — replay only WAL entries after the snapshot
  • WAL truncation — entries before the snapshot can be removed
  • Atomic writes — snapshots use temp file + rename for crash safety

Recovery Flow

When a database opens and finds existing WAL/snapshot files:

1. Load latest snapshot (if any)
   └── Restores all primitives to snapshot state

2. Read WAL entries after snapshot's WAL offset
   └── Parses entries, validates CRC32

3. Group entries by transaction ID
   └── Builds per-transaction operation lists

4. Apply committed transactions only
   ├── Transactions with CommitTxn → apply
   └── Transactions without CommitTxn → discard (crashed mid-txn)

5. Restore version counters
   └── Preserves exact version numbers from WAL

Recovery Properties

  • Deterministic — same WAL + same snapshot always produces the same state
  • Idempotent — running recovery again produces identical results
  • Prefix-consistent — no partial transactions are visible
  • All primitives — KV, JSON, Event, State, Branch, and Vector are all recovered

Branch Bundles

Branch bundles package a single branch’s data into a portable archive.

Bundle Format

<branch_id>.branchbundle.tar.zst
  branchbundle/
    MANIFEST.json     # Format version (1), xxh3 file checksums
    BRANCH.json          # Branch metadata (id, status, tags, timestamps)
    WAL.branchlog     # Binary WAL entries for this branch

WAL.branchlog Format

Header (16 bytes):
  Magic: "STRATA_WAL" (10 bytes)
  Version: u16 (2 bytes)
  Entry Count: u32 (4 bytes)

Per entry:
  Length: u32 (4 bytes)
  Data: bincode-serialized WALEntry (N bytes)
  CRC32: u32 (4 bytes)

Import Process

  1. Decompress and untar the archive
  2. Validate MANIFEST checksums
  3. Parse BRANCH.json for branch metadata
  4. Read WAL.branchlog entries
  5. Replay entries into the target database (same as crash recovery)

Design Principles

  1. WAL-first — all changes go through WAL before memory
  2. CRC32 everywhere — every entry and snapshot has checksums
  3. Atomic writes — snapshots and bundles use temp file + rename
  4. Branch isolation — all WAL entries include branch_id for filtering
  5. Forward compatibility — type tags allow skipping unknown entry types