delta-backup — Planning Document
Concept
A CLI tool for space-efficient directory backups using binary deltas. Instead of storing full snapshots each run, it stores the difference between the previous and current state, making backup storage grow proportionally to what actually changed.
Directory Roles
| Name | Purpose |
|---|---|
| SOURCE | Live data, possibly remote (e.g. rsync-accessible path) |
| PREV | Last known good state — the base for delta generation |
| PEND | Working area — assembled current state before diffing |
| DELTAS | Stored deltas + manifests + state tracking |
Full Run Sequence
- Clear PEND — remove all contents
- rsync PREV → PEND — seed locally (fast)
- rsync SOURCE → PEND — apply remote changes (only diffs travel over the wire)
- Generate delta — parse rsync itemize output to get change list, produce per-file deltas + manifest
- Commit delta — write to DELTAS atomically
- Promote PEND → PREV — swap working area to become new base
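The two rsync steps above can be expressed as argument arrays before anything is executed, which also makes the dry-run mode easy to print. This is a minimal sketch: the function name `buildRsyncSteps` and the flag choices (`-a`, `--delete`, `--itemize-changes`) are assumptions, not settled parts of the plan.

```javascript
// Hypothetical helper: returns the argv arrays for the two rsync steps.
// Flag choices are assumptions; adjust to the real requirements.
function buildRsyncSteps({ source, prev, pend }) {
  // A trailing slash tells rsync to copy the directory's contents,
  // not the directory itself.
  const dir = (p) => (p.endsWith("/") ? p : p + "/");
  return [
    // Seed PEND from PREV locally (PEND was cleared in the step before).
    { name: "seed", cmd: "rsync", args: ["-a", dir(prev), dir(pend)] },
    // Pull changes from SOURCE. --delete is needed so removed files are
    // reported; --itemize-changes produces the change list.
    {
      name: "sync",
      cmd: "rsync",
      args: ["-a", "--delete", "--itemize-changes", dir(source), dir(pend)],
    },
  ];
}
```

In a dry run, the tool could print each `cmd` and `args` pair instead of spawning it.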
Safety / State Machine
Sequence numbers (not timestamps) identify each delta. A state.json in DELTAS tracks progress:
{ "next_seq": 5, "last_complete": 4 }
Phase transitions are written to state.json so an aborted run can be detected and recovered.
Atomic commit strategy:
- Write delta files to DELTAS/tmp/N/
- Rename DELTAS/tmp/N/ → DELTAS/N/ (atomic on same filesystem)
- Promote PEND → PREV
- Update state.json
The presence of a fully-renamed DELTAS/N/ directory is the canonical "delta committed" marker.
state.json is a recoverable cache: it can be reconstructed by scanning DELTAS.
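The write-then-rename part of the commit could look roughly like the sketch below. `commitDelta` and its `writeFiles` callback are hypothetical names, and the handling of leftover staging directories is an assumption.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: stage delta N under DELTAS/tmp/N/, then rename it
// into place. `writeFiles` is an assumed callback that fills the staging dir.
function commitDelta(deltasDir, seq, writeFiles) {
  const staging = path.join(deltasDir, "tmp", String(seq));
  const committed = path.join(deltasDir, String(seq));
  if (fs.existsSync(committed)) {
    throw new Error(`delta ${seq} is already committed`);
  }
  // Assumption: leftovers from an aborted attempt at the same
  // sequence number are safe to discard.
  fs.rmSync(staging, { recursive: true, force: true });
  fs.mkdirSync(staging, { recursive: true });
  writeFiles(staging);
  // The rename is atomic only when both paths are on the same filesystem.
  fs.renameSync(staging, committed);
  return committed;
}
```

Because DELTAS/N/ only appears through the rename, a crash while `writeFiles` is running leaves nothing under DELTAS/N/ and the run can simply be repeated.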
Recovery rules:
- DELTAS/N/ exists but last_complete is N-1 → finish promotion, update state
- state.json missing → reconstruct from directory scan
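The recovery rules above could be checked at startup with a small pure function. This is a sketch under assumptions: `planRecovery` is a hypothetical name, sequence numbers are assumed to start at 1, and committed deltas are assumed to be the entries in DELTAS whose names are plain integers.

```javascript
// Hypothetical helper: decide what a run must do before starting.
// `entries` is the list of names found in DELTAS; `state` is the parsed
// state.json, or null if the file is missing.
function planRecovery(entries, state) {
  // Committed deltas are the entries named with a plain sequence number.
  const seqs = entries.filter((name) => /^\d+$/.test(name)).map(Number);
  const newest = seqs.length > 0 ? Math.max(...seqs) : 0;
  const rebuilt = { next_seq: newest + 1, last_complete: newest };

  if (state === null) {
    return { action: "rebuild-state", state: rebuilt };
  }
  if (newest === state.last_complete + 1) {
    // DELTAS/N/ exists but last_complete is N-1.
    return { action: "finish-promotion", state: rebuilt };
  }
  if (newest === state.last_complete) {
    return { action: "none", state };
  }
  throw new Error(
    `DELTAS (newest ${newest}) and state.json (last_complete ${state.last_complete}) disagree`
  );
}
```

One caveat with this sketch: a directory scan alone cannot tell whether PEND → PREV promotion finished for the newest delta, so the rebuilt state may still need a promotion check.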
Change Detection
No directory walk needed. rsync SOURCE→PEND is run with --itemize-changes, producing a
machine-readable list of exactly what changed. Output is captured (not streamed) and parsed:
| rsync prefix | Meaning |
|---|---|
| >f+++++++++ | New file |
| >f.st...... | Modified file (any combination of change flags) |
| *deleting | Deleted file |
| cd+++++++++ | New directory (ignored for delta purposes) |
Lines starting with >f or *deleting are extracted. The path is the remainder after the
11-character itemize code + space. This becomes the change list fed directly into delta generation
— no separate directory walk required.
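A parser for the captured output might look like the sketch below. `parseItemize` is a hypothetical name, the status labels follow the manifest format, and skipping deleted directories (paths ending in a slash) is an assumption.

```javascript
// Hypothetical parser for captured `rsync --itemize-changes` output.
function parseItemize(output) {
  const changes = [];
  for (const line of output.split("\n")) {
    // 11-character itemize code, one space, then the path.
    const code = line.slice(0, 11);
    const file = line.slice(12);
    if (line.startsWith(">f")) {
      // A code of all "+" after the first two characters marks a new file.
      const status = code.slice(2) === "+++++++++" ? "added" : "modified";
      changes.push({ path: file, status });
    } else if (line.startsWith("*deleting") && !file.endsWith("/")) {
      // Assumption: deleted directories are skipped, like new ones.
      changes.push({ path: file, status: "deleted" });
    }
  }
  return changes;
}
```

The returned array has the same `path` and `status` fields as the manifest's `changes` entries, so it can be passed on without reshaping.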
Delta Format
Pluggable backend interface with two operations:
backend.createDelta(prevFile, newFile, outFile) // spawn process, no shell strings
backend.applyDelta(prevFile, deltaFile, outFile) // spawn process, no shell strings
Default backend: zstd
- Modified files: zstd --patch-from=prev new -o out.zst
- New files: zstd new -o out.zst (no base)
- Deleted files: manifest entry only, no delta file
Planned backends: xdelta3, bsdiff
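As an illustration, the zstd backend's two operations can be reduced to building argument arrays that are then handed to a process spawner. The object name `zstdArgs` and the convention that `prevFile === null` means "new file" are assumptions; `-d`, `-o`, and `--patch-from` are existing zstd options.

```javascript
// Hypothetical argv builders for the zstd backend.
// prevFile === null means there is no base (a newly added file).
const zstdArgs = {
  createDelta(prevFile, newFile, outFile) {
    const base = prevFile === null ? [] : [`--patch-from=${prevFile}`];
    return [...base, newFile, "-o", outFile];
  },
  applyDelta(prevFile, deltaFile, outFile) {
    // Decompression needs the same base file that was used to create the delta.
    const base = prevFile === null ? [] : [`--patch-from=${prevFile}`];
    return ["-d", ...base, deltaFile, "-o", outFile];
  },
};
```

Each array would be passed as the argument list to `zstd` with no shell involved. Very large files may need a larger match window (zstd's `--long` option) on both sides; that tuning is left out of this sketch.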
Manifest Format
Each delta DELTAS/N/ contains:
- manifest.json: lists all changed files with their status (added/modified/deleted) and metadata
- files/: per-file delta or compressed blobs
{
"seq": 5,
"timestamp": "2026-03-07T12:00:00Z",
"prev_seq": 4,
"backend": "zstd",
"changes": [
{ "path": "src/main.js", "status": "modified", "delta": "files/src__main.js.zst" },
{ "path": "assets/logo.png", "status": "added", "delta": "files/assets__logo.png.zst" },
{ "path": "old/thing.txt", "status": "deleted" }
]
}
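A manifest like the one above could be assembled from a change list as sketched here. The helper names are hypothetical; the "/" to "__" file-name mapping is inferred from the example, and `prev_seq = seq - 1` is an assumption.

```javascript
// Hypothetical helper: flatten a relative path into one file name under files/.
// The "/" -> "__" mapping is inferred from the example manifest.
function deltaName(relPath, ext = "zst") {
  return `files/${relPath.split("/").join("__")}.${ext}`;
}

// Hypothetical helper: build the manifest object for delta `seq`.
// Assumption: the previous delta is always seq - 1.
function buildManifest(seq, changes, backend = "zstd", now = new Date()) {
  return {
    seq,
    // Drop milliseconds to match the timestamp style in the example.
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
    prev_seq: seq - 1,
    backend,
    changes: changes.map(({ path, status }) =>
      status === "deleted"
        ? { path, status }
        : { path, status, delta: deltaName(path) }
    ),
  };
}
```

The flat naming is not collision-free: `a/b` and `a__b` would both map to `files/a__b.zst`, so a real implementation may need escaping or a nested layout.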
CLI Interface
delta-backup [options] <command>
Commands:
run Full backup run
status Show current state (sequences, last run, pending recovery)
restore Apply deltas to reconstruct a point in time (future)
Options:
--source <path> SOURCE directory (required)
--prev <path> PREV directory (required)
--pend <path> PEND directory (required)
--deltas <path> DELTAS directory (required)
--backend <name> Delta backend: zstd (default), xdelta3
--dry-run Print what would happen, execute nothing
--config <file> Load options from JSON config file (flags override)
Guards: refuse to run if any required path is missing from args AND config. Never fall back to CWD or implicit defaults for directories — explicit is safer.
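The guard could be implemented as a single resolution step that merges config and flags and then refuses to continue if a path is missing. `resolveOptions`, the `REQUIRED` list, and the option key names are hypothetical.

```javascript
const REQUIRED = ["source", "prev", "pend", "deltas"];

// Hypothetical resolver: flags override config, and directories never
// get implicit defaults.
function resolveOptions(flags, config = {}) {
  // Drop flags that were not given so they do not mask config values.
  const given = Object.fromEntries(
    Object.entries(flags).filter(([, value]) => value !== undefined)
  );
  const merged = { backend: "zstd", dryRun: false, ...config, ...given };
  const missing = REQUIRED.filter(
    (key) => typeof merged[key] !== "string" || merged[key] === ""
  );
  if (missing.length > 0) {
    const names = missing.map((key) => `--${key}`).join(", ");
    throw new Error(`missing required path(s): ${names}`);
  }
  return merged;
}
```

Only `backend` and `dryRun` receive defaults here; the four directory options must come from the command line or the config file.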
Process Spawning
All external tools (rsync, zstd, xdelta3) are spawned with explicit argument arrays.
No shell string interpolation ever. Use Node's child_process.spawn or similar.
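A thin wrapper keeps this rule in one place. The sketch below uses `child_process.spawnSync` for brevity, since the plan captures output rather than streaming it; `runTool` is a hypothetical name and the buffer limit is an arbitrary assumption.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical wrapper: run a tool with an argument array and return its
// captured stdout. No shell is involved, so arguments are never re-parsed.
function runTool(cmd, args) {
  const result = spawnSync(cmd, args, {
    shell: false,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // assumed limit for large change lists
  });
  // result.error is set when the process could not be started at all,
  // for example when the tool is not installed.
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`${cmd} exited with status ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Because each argument is passed as its own array element, a path containing spaces or shell metacharacters reaches the tool unchanged.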
Occasional Snapshots
Delta chains are space-efficient but become fragile as they grow, because reconstructing a given state depends on every delta back to the last full copy. Periodic full snapshots (every N deltas, or on demand) bound the reconstruction blast radius. Snapshot support is planned but not in scope for the initial implementation.
Implementation Phases
- Phase 1 (now): Arg parsing, config, dry-run, guards, rsync steps
- Phase 2: Delta generation with zstd backend, manifest writing, atomic commit
- Phase 3: PREV promotion, state.json management, recovery logic
- Phase 4: status and restore commands
- Future: Additional backends, snapshot support, scheduling