From 2fbdde541a0fe9110a9aca287bca176faba45da4 Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Sat, 7 Mar 2026 01:04:14 +0000
Subject: [PATCH] Add planning document

---
 PLAN.md | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100644 PLAN.md

diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..c61a6e2
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,126 @@
+# delta-backup — Planning Document
+
+## Concept
+
+A CLI tool for space-efficient directory backups using binary deltas. Instead of storing full
+snapshots each run, it stores the *difference* between the previous and current state, making
+backup storage grow proportionally to what actually changed.
+
+## Directory Roles
+
+| Name   | Purpose |
+|--------|---------|
+| SOURCE | Live data, possibly remote (e.g. rsync-accessible path) |
+| PREV   | Last known good state — the base for delta generation |
+| PEND   | Working area — assembled current state before diffing |
+| DELTAS | Stored deltas + manifests + state tracking |
+
+## Full Run Sequence
+
+1. **Clear PEND** — remove all contents
+2. **rsync PREV → PEND** — seed locally (fast)
+3. **rsync SOURCE → PEND** — apply remote changes (only diffs travel over the wire)
+4. **Generate delta** — diff PREV vs PEND, produce per-file deltas + manifest
+5. **Commit delta** — write to DELTAS atomically
+6. **Promote PEND → PREV** — swap working area to become new base
+
+## Safety / State Machine
+
+Sequence numbers (not timestamps) identify each delta. A `state.json` in DELTAS tracks progress:
+
+```json
+{ "next_seq": 5, "last_complete": 4 }
+```
+
+Phase transitions are written to state.json so an aborted run can be detected and recovered.
+
+**Atomic commit strategy:**
+1. Write delta files to `DELTAS/tmp/N/`
+2. Rename `DELTAS/tmp/N/` → `DELTAS/N/` (atomic on same filesystem)
+3. Promote PEND → PREV
+4. Update state.json
+
+The presence of a fully-renamed `DELTAS/N/` directory is the canonical "delta committed" marker.
+State.json is a recoverable cache — can be reconstructed by scanning DELTAS.
+
+**Recovery rules:**
+- `DELTAS/N/` exists but `last_complete` is N-1 → finish promotion, update state
+- state.json missing → reconstruct from directory scan
+
+## Delta Format
+
+Pluggable backend interface with two operations:
+
+```js
+backend.createDelta(prevFile, newFile, outFile) // spawn process, no shell strings
+backend.applyDelta(prevFile, deltaFile, outFile) // spawn process, no shell strings
+```
+
+**Default backend: zstd**
+- Modified files: `zstd --patch-from=prev new -o out.zst`
+- New files: `zstd new -o out.zst` (no base)
+- Deleted files: manifest entry only, no delta file
+
+**Planned backends:** xdelta3, bsdiff
+
+## Manifest Format
+
+Each delta `DELTAS/N/` contains:
+- `manifest.json` — lists all changed files with their status (added/modified/deleted) and metadata
+- `files/` — per-file delta or compressed blobs
+
+```json
+{
+  "seq": 5,
+  "timestamp": "2026-03-07T12:00:00Z",
+  "prev_seq": 4,
+  "backend": "zstd",
+  "changes": [
+    { "path": "src/main.js", "status": "modified", "delta": "files/src__main.js.zst" },
+    { "path": "assets/logo.png", "status": "added", "delta": "files/assets__logo.png.zst" },
+    { "path": "old/thing.txt", "status": "deleted" }
+  ]
+}
+```
+
+## CLI Interface
+
+```
+delta-backup [options]
+
+Commands:
+  run        Full backup run
+  status     Show current state (sequences, last run, pending recovery)
+  restore    Apply deltas to reconstruct a point in time (future)
+
+Options:
+  --source   SOURCE directory (required)
+  --prev     PREV directory (required)
+  --pend     PEND directory (required)
+  --deltas   DELTAS directory (required)
+  --backend  Delta backend: zstd (default), xdelta3
+  --dry-run  Print what would happen, execute nothing
+  --config   Load options from JSON config file (flags override)
+```
+
+Guards: refuse to run if any required path is missing from args AND config. Never fall back to
+CWD or implicit defaults for directories — explicit is safer.
+
+## Process Spawning
+
+All external tools (rsync, zstd, xdelta3) are spawned with explicit argument arrays.
+No shell string interpolation ever. Use Node's `child_process.spawn` or similar.
+
+## Occasional Snapshots
+
+Delta chains are efficient but fragile over long chains. Periodic full snapshots (every N deltas,
+or on demand) bound the reconstruction blast radius. Snapshot support is planned but not in scope
+for initial implementation.
+
+## Implementation Phases
+
+1. **Phase 1 (now):** Arg parsing, config, dry-run, guards, rsync steps
+2. **Phase 2:** Delta generation with zstd backend, manifest writing, atomic commit
+3. **Phase 3:** PREV promotion, state.json management, recovery logic
+4. **Phase 4:** `status` and `restore` commands
+5. **Future:** Additional backends, snapshot support, scheduling
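
Note on the "Atomic commit strategy" and "Recovery rules" in the plan: the rename-based commit and the "reconstruct from directory scan" rule could be sketched roughly as below, using only Node built-ins. This is a minimal sketch, not the planned implementation. The names `commitDelta`, `reconstructState`, and the `writeFiles` callback are hypothetical, and it assumes that sequence numbers start at 1 and that `last_complete` is 0 when no delta has been committed yet; the plan does not specify either.

```javascript
// Sketch only: function names and the "sequences start at 1" rule are assumptions.
const fs = require('fs');
const path = require('path');

// Steps 1-2 of the commit strategy: write into DELTAS/tmp/N/, then rename
// to DELTAS/N/. The rename is atomic only if both paths share a filesystem.
function commitDelta(deltasDir, seq, writeFiles) {
  const tmpDir = path.join(deltasDir, 'tmp', String(seq));
  const finalDir = path.join(deltasDir, String(seq));
  // Discard leftovers from a previously aborted attempt at this sequence.
  fs.rmSync(tmpDir, { recursive: true, force: true });
  fs.mkdirSync(tmpDir, { recursive: true });
  writeFiles(tmpDir); // caller writes manifest.json and files/ here
  fs.renameSync(tmpDir, finalDir); // the "delta committed" marker
  return finalDir;
}

// Rebuild the state.json contents by scanning DELTAS. Only directories with
// all-digit names count as committed; DELTAS/tmp is ignored by the filter.
function reconstructState(deltasDir) {
  const seqs = fs.readdirSync(deltasDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && /^\d+$/.test(entry.name))
    .map((entry) => Number(entry.name));
  const lastComplete = seqs.length > 0 ? Math.max(...seqs) : 0;
  return { next_seq: lastComplete + 1, last_complete: lastComplete };
}
```

A directory scan can only show which deltas were committed. It cannot tell whether the PEND → PREV promotion for the newest one finished, so a real recovery path would still need the "finish promotion" step from the recovery rules.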