
    Ledger Architecture

    Deep dive into InferaDB's storage layer — a per-vault blockchain with Raft consensus.

    Overview

    The Ledger is InferaDB’s storage layer. It provides durable, replicated, cryptographically verifiable storage for all authorization data. Every write commits to a per-vault blockchain via Raft, producing Merkle proofs for independent integrity verification.

    Consensus

    The Ledger uses OpenRaft. Clusters require an odd number of nodes (typically 3 or 5). The leader handles all writes; reads can be served by any node depending on consistency requirements.
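
The preference for 3 or 5 nodes follows from Raft's majority rule. The arithmetic below is a small illustration, not Ledger code; the function names `quorum_size` and `tolerated_failures` are made up for this example.

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster with `nodes` voting members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates={tolerated_failures(n)}")
```

A fourth node raises the quorum to 3 without tolerating any more failures than a 3-node cluster, which is why even cluster sizes add cost without adding availability.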

    Storage Engine

    Custom B+ tree storage engine built for authorization workloads:

    Property        Detail
    Tables          21 internal tables
    Write model     Single-writer (serialized through Raft)
    Page checksums  XXHash per page for corruption detection
    Compression     zstd for snapshots

    Key Format

    All keys in the storage engine follow a composite format:

    vault_id (8 bytes) + bucket_id (1 byte) + local_key (variable)
    

    This ensures vault-level data locality and efficient per-vault range scans.
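
A minimal sketch of how such a composite key could be packed and unpacked. The field widths come from the layout above; the big-endian unsigned encoding and the helper names (`encode_key`, `decode_key`, `vault_prefix`) are assumptions made for illustration, not the engine's documented encoding.

```python
import struct

# ASSUMPTION: vault_id and bucket_id are unsigned and big-endian, so that
# byte-wise key order matches numeric vault order. The source only gives widths.
HEADER = struct.Struct(">QB")  # 8-byte vault_id + 1-byte bucket_id


def encode_key(vault_id: int, bucket_id: int, local_key: bytes) -> bytes:
    """Build vault_id (8 bytes) + bucket_id (1 byte) + local_key (variable)."""
    return HEADER.pack(vault_id, bucket_id) + local_key


def decode_key(key: bytes) -> tuple[int, int, bytes]:
    """Split a composite key back into its three parts."""
    vault_id, bucket_id = HEADER.unpack(key[:HEADER.size])
    return vault_id, bucket_id, key[HEADER.size:]


def vault_prefix(vault_id: int) -> bytes:
    """Prefix shared by every key in a vault; a range scan can start here."""
    return struct.pack(">Q", vault_id)


key = encode_key(7, 1, b"doc:1")
print(key.hex(), decode_key(key))
```

Because every key of a vault shares the same 8-byte prefix, sorting keys byte-wise places a vault's data in one contiguous range, which is what makes per-vault scans cheap in a B+ tree.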

    State Root and Merkle Proofs

    State Root Computation

    The state root is computed over 256 buckets and updated incrementally. Only the buckets affected by a write are recomputed, so the cost of updating the root stays constant regardless of total data size.
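
The idea can be sketched as follows. This is an illustration under stated assumptions, not the Ledger's implementation: SHA-256, the key-to-bucket mapping, and the `BucketedState` class are all hypothetical, and this toy version rehashes a whole dirty bucket, whereas a real engine would presumably maintain each bucket digest incrementally as well.

```python
import hashlib

NUM_BUCKETS = 256


class BucketedState:
    """Toy key-value state with one digest per bucket and a combined root."""

    def __init__(self) -> None:
        self.buckets = [dict() for _ in range(NUM_BUCKETS)]
        self.digests = [self._hash_bucket({}) for _ in range(NUM_BUCKETS)]

    @staticmethod
    def bucket_of(key: bytes) -> int:
        # ASSUMPTION: bucket chosen from the first byte of the key's hash.
        return hashlib.sha256(key).digest()[0]

    @staticmethod
    def _hash_bucket(entries: dict) -> bytes:
        # Hash length-prefixed entries in sorted key order so the digest
        # does not depend on insertion order.
        h = hashlib.sha256()
        for k in sorted(entries):
            v = entries[k]
            h.update(len(k).to_bytes(4, "big") + k)
            h.update(len(v).to_bytes(4, "big") + v)
        return h.digest()

    def apply(self, writes: dict) -> int:
        """Apply writes, recompute only dirty buckets, return how many."""
        dirty = set()
        for k, v in writes.items():
            b = self.bucket_of(k)
            self.buckets[b][k] = v
            dirty.add(b)
        for b in dirty:
            self.digests[b] = self._hash_bucket(self.buckets[b])
        return len(dirty)

    def state_root(self) -> bytes:
        """Combine the 256 bucket digests into a single root."""
        return hashlib.sha256(b"".join(self.digests)).digest()


state = BucketedState()
recomputed = state.apply({b"user:alice#viewer": b"doc:1"})
print(recomputed, state.state_root().hex())
```

A single-key write touches one bucket, so only one of the 256 digests is recomputed before the root is rebuilt from the fixed-size digest list.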

    Merkle Proofs

    Proof Type             Verifiable?  Description
    Point read             Yes          Prove that a specific key has a specific value
    Transaction inclusion  Yes          Prove that a transaction was included in a specific block
    List operation         No           List results are not individually provable

    Clients can verify point reads and transaction inclusion without trusting any Ledger node.
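
To make client-side verification concrete, here is a generic Merkle inclusion proof over a binary tree. The tree shape, SHA-256, the leaf/node prefix bytes, and the duplicate-last-node padding rule are assumptions chosen for illustration; the Ledger's actual proof format is not specified here, and all function names are hypothetical.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, from leaf hashes up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_leaf(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd level: pair the last node with itself
            level = level + [level[-1]]
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_right) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):  # node was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 0))
        index //= 2
    return proof


def verify(leaf_data: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from the leaf and proof; no tree access needed."""
    h = _leaf(leaf_data)
    for sibling, sibling_is_right in proof:
        h = _node(h, sibling) if sibling_is_right else _node(sibling, h)
    return h == root


txs = [b"tx0", b"tx1", b"tx2"]
levels = build_levels(txs)
root = levels[-1][0]
print(verify(b"tx2", prove(levels, 2), root))
```

The verifier needs only the leaf, the proof, and a trusted root, which is what lets a client check a result without trusting the node that served it.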

    Write Path

    Client
      → Leader node
        → Raft AppendEntries to followers
          → Quorum acknowledgment
            → Commit to local B+ tree
              → Produce block with state_root
                → Return revision token to client
    

    Each committed write produces a block with the updated state_root, linked to the vault’s chain.
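
The chain linkage can be pictured with a small hash chain. The `Block` fields other than `state_root`, the use of SHA-256, and the all-zero genesis parent are assumptions for this sketch; the real block layout is not described in this section.

```python
import hashlib
from dataclasses import dataclass

GENESIS_PREV = bytes(32)  # ASSUMPTION: first block points at 32 zero bytes


@dataclass(frozen=True)
class Block:
    height: int
    prev_hash: bytes   # hash of the previous block in the vault's chain
    state_root: bytes  # state root after applying this block's writes

    def hash(self) -> bytes:
        return hashlib.sha256(
            self.height.to_bytes(8, "big") + self.prev_hash + self.state_root
        ).digest()


def append_block(chain: list[Block], state_root: bytes) -> Block:
    """Create the next block, linked to the current chain tip."""
    prev = chain[-1].hash() if chain else GENESIS_PREV
    block = Block(len(chain), prev, state_root)
    chain.append(block)
    return block


def verify_chain(chain: list[Block]) -> bool:
    """Check that heights are consecutive and every link matches."""
    prev = GENESIS_PREV
    for height, block in enumerate(chain):
        if block.height != height or block.prev_hash != prev:
            return False
        prev = block.hash()
    return True


chain: list[Block] = []
for i in range(3):
    append_block(chain, hashlib.sha256(f"state-{i}".encode()).digest())
print(verify_chain(chain))
```

Because each block commits to its predecessor's hash, altering any historical `state_root` invalidates every later link.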

    Write Latency

    Percentile  Latency
    p50         ~3–4 ms
    p99         ~10–15 ms

    Adaptive Batching

    Writes are batched under load:

    Parameter       Default  Description
    Max batch size  100      Maximum writes per batch
    Batch timeout   5 ms     Maximum wait time before flushing a batch
    Eager commit    On       Commit immediately when a single write arrives

    With eager commit (default), single writes commit immediately. Under high throughput, writes batch automatically.
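
One way to model this behavior is sketched below. The three parameters mirror the table above, but the `Batcher` class, its method names, and the rule that eager commit applies only when no commit is already in flight are assumptions; the Ledger's actual batching logic may differ.

```python
class Batcher:
    """Toy adaptive batcher driven by an explicit clock (milliseconds)."""

    def __init__(self, max_batch_size=100, batch_timeout_ms=5.0, eager_commit=True):
        self.max_batch_size = max_batch_size
        self.batch_timeout_ms = batch_timeout_ms
        self.eager_commit = eager_commit
        self.pending = []
        self.first_arrival_ms = None
        self.in_flight = False  # ASSUMPTION: at most one eager commit at a time

    def _flush(self):
        batch, self.pending = self.pending, []
        self.first_arrival_ms = None
        self.in_flight = True
        return batch

    def submit(self, write, now_ms):
        """Queue a write; return a batch to commit now, or None to keep waiting."""
        if not self.pending:
            self.first_arrival_ms = now_ms
        self.pending.append(write)
        if len(self.pending) >= self.max_batch_size:
            return self._flush()  # size limit reached
        if self.eager_commit and not self.in_flight:
            return self._flush()  # idle system: commit the lone write at once
        return None

    def tick(self, now_ms):
        """Flush if the oldest pending write has waited past the timeout."""
        if self.pending and now_ms - self.first_arrival_ms >= self.batch_timeout_ms:
            return self._flush()
        return None

    def commit_done(self):
        """Signal that the in-flight commit finished."""
        self.in_flight = False


b = Batcher()
print(b.submit("w1", now_ms=0))  # committed immediately
print(b.submit("w2", now_ms=1))  # queued behind the in-flight commit
print(b.tick(now_ms=6))          # timeout flushes the queued write
```

In this model a quiet system pays no batching delay, while writes that arrive during a commit accumulate until the size limit or the timeout flushes them together.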

    Idempotency

    Two-tier idempotency for safe retries:

    1. In-memory cache — Moka LRU for fast deduplication of immediate retries
    2. Replicated entries — Idempotency keys persisted through Raft, surviving leader failover
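
The two tiers can be sketched like this. An `OrderedDict` stands in for the in-memory cache and a plain dict stands in for the Raft-replicated entries; the `IdempotencyStore` class and its methods are hypothetical and only illustrate the lookup order.

```python
from collections import OrderedDict


class IdempotencyStore:
    """Toy two-tier deduplication keyed by a client-supplied idempotency key."""

    def __init__(self, cache_capacity: int = 1024) -> None:
        self.cache = OrderedDict()  # tier 1: local LRU, lost on failover
        self.replicated = {}        # tier 2: stand-in for Raft-persisted keys
        self.cache_capacity = cache_capacity

    def _remember(self, key, result) -> None:
        self.cache[key] = result
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def execute(self, key, operation):
        """Run `operation` once per key; later calls return the stored result."""
        if key in self.cache:               # fast path: immediate retry
            self.cache.move_to_end(key)
            return self.cache[key]
        if key in self.replicated:          # slow path: survives failover
            result = self.replicated[key]
            self._remember(key, result)
            return result
        result = operation()
        self.replicated[key] = result
        self._remember(key, result)
        return result

    def simulate_failover(self) -> None:
        """A new leader starts with an empty local cache."""
        self.cache.clear()


store = IdempotencyStore()
applied = []
write = lambda: applied.append("w") or len(applied)
print(store.execute("req-1", write), store.execute("req-1", write))
store.simulate_failover()
print(store.execute("req-1", write), len(applied))
```

The retry after the simulated failover misses the cache but still finds the replicated entry, so the write is applied only once.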

    Snapshots

    Property          Detail
    Format            Binary, zstd-compressed
    Trigger interval  Every 5 minutes or 10,000 blocks
    Purpose           Faster node recovery and log compaction

    New nodes receive a snapshot instead of replaying the entire Raft log.

    Vault Health Monitoring

    Continuous per-vault chain health monitoring:

    • Detects gaps in the block sequence
    • Validates page checksums on read
    • Triggers auto-recovery from peer replicas when corruption is detected
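
As an illustration of the first check, the sketch below finds missing ranges in a list of block heights. It assumes blocks are numbered with consecutive integers; `find_gaps` is a hypothetical helper, not part of the Ledger API.

```python
def find_gaps(heights: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (start, end) ranges of heights missing from the input."""
    gaps = []
    ordered = sorted(set(heights))
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


print(find_gaps([0, 1, 2, 5, 6, 9]))  # [(3, 4), (7, 8)]
```

Each reported range identifies the blocks a node would need to fetch from a peer replica to repair its copy of the chain.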

    Multi-Region Deployment

    Each region runs its own independent Raft group. Data residency is enforced at the vault level:

    # Pin a Ledger node to a specific region
    inferadb-ledger --region us-east-1
    

    Vaults are pinned to a region at creation. Data is stored only on nodes within that region for GDPR and data residency compliance.