# delta-backup — Planning Document
## Concept
A CLI tool for space-efficient directory backups using binary deltas. Instead of storing a full
snapshot on each run, it stores the *difference* between the previous and current state, so
backup storage grows in proportion to what actually changed.
## Directory Roles
| Name | Purpose |
|--------|---------|
| SOURCE | Live data, possibly remote (e.g. rsync-accessible path) |
| PREV | Last known good state — the base for delta generation |
| PEND | Working area — assembled current state before diffing |
| DELTAS | Stored deltas + manifests + state tracking |
## Full Run Sequence
1. **Clear PEND** — remove all contents
2. **rsync PREV → PEND** — seed locally (fast)
3. **rsync SOURCE → PEND** — apply remote changes (only diffs travel over the wire)
4. **Generate delta** — diff PREV vs PEND, produce per-file deltas + manifest
5. **Commit delta** — write to DELTAS atomically
6. **Promote PEND → PREV** — swap working area to become new base
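As a rough illustration of steps 2 and 3, the two rsync invocations could be built as argument arrays like this. The helper name `buildSyncPlan`, the `-a` and `--delete` flags, and the example paths are assumptions for the sketch, not decisions made in this plan.

```js
// Hypothetical helper: returns the rsync invocations for steps 2 and 3 as
// [command, args] pairs, ready to print in a dry run or pass to spawn.
function buildSyncPlan({ source, prev, pend }) {
  // A trailing slash makes rsync copy the directory's contents rather than
  // the directory itself.
  const contents = (dir) => (dir.endsWith("/") ? dir : dir + "/");
  return [
    // Step 2: seed PEND from PREV (PEND was emptied in step 1).
    ["rsync", ["-a", contents(prev), contents(pend)]],
    // Step 3: overlay SOURCE. --delete removes files that no longer exist
    // in SOURCE, so deletions show up when PREV and PEND are diffed.
    ["rsync", ["-a", "--delete", contents(source), contents(pend)]],
  ];
}

// Example paths are placeholders.
const plan = buildSyncPlan({
  source: "backup@host:/srv/data",
  prev: "/backup/prev",
  pend: "/backup/pend",
});
for (const [command, args] of plan) console.log(command, args.join(" "));
```

Without `--delete` in step 3, a file removed from SOURCE would survive in PEND (it was seeded from PREV) and the deletion would never reach a manifest.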
## Safety / State Machine
Sequence numbers (not timestamps) identify each delta. A `state.json` in DELTAS tracks progress:
```json
{ "next_seq": 5, "last_complete": 4 }
```
Phase transitions are written to state.json so an aborted run can be detected and recovered.
**Atomic commit strategy:**
1. Write delta files to `DELTAS/tmp/N/`
2. Rename `DELTAS/tmp/N/` → `DELTAS/N/` (atomic on same filesystem)
3. Promote PEND → PREV
4. Update state.json
The presence of a fully-renamed `DELTAS/N/` directory is the canonical "delta committed" marker.
`state.json` is a recoverable cache: it can be reconstructed by scanning DELTAS.
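The stage-then-rename part of the commit strategy might look like the following minimal sketch. `commitDelta` and its `writeFiles` callback are hypothetical names, and synchronous `fs` calls are used only to keep the example short.

```js
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: stage delta `seq` under DELTAS/tmp/<seq>/, then
// publish it with a single rename. `writeFiles(dir)` is an assumed callback
// that writes manifest.json and files/ into the staging directory.
function commitDelta(deltasDir, seq, writeFiles) {
  const staging = path.join(deltasDir, "tmp", String(seq));
  const final = path.join(deltasDir, String(seq));
  if (fs.existsSync(final)) {
    throw new Error(`delta ${seq} is already committed`);
  }
  // Discard leftovers from an aborted earlier attempt at this sequence.
  fs.rmSync(staging, { recursive: true, force: true });
  fs.mkdirSync(staging, { recursive: true });
  writeFiles(staging);
  // The rename is atomic only when both paths are on the same filesystem,
  // which holds here because both live under DELTAS.
  fs.renameSync(staging, final);
  return final;
}
```

Because readers only ever look at `DELTAS/N/`, a crash before the rename leaves nothing but a stale `tmp/N/` directory, which the next attempt removes.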
**Recovery rules:**
- `DELTAS/N/` exists but `last_complete` is N-1 → finish promotion, update state
- state.json missing → reconstruct from directory scan
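A startup check implementing the two rules above could be sketched as follows. The function names, the action labels, and the convention that sequence numbers start at 1 are assumptions. The `inconsistent` outcome is not one of the rules in this plan; it is added so the sketch stops rather than guesses.

```js
const fs = require("node:fs");

// Hypothetical helper: list committed sequence numbers by scanning DELTAS.
// Only directories with all-digit names count, so DELTAS/tmp/ is ignored.
function committedSeqs(deltasDir) {
  return fs
    .readdirSync(deltasDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && /^\d+$/.test(entry.name))
    .map((entry) => Number(entry.name))
    .sort((a, b) => a - b);
}

// Hypothetical helper: decide what to do at startup. `state` is the parsed
// state.json, or null when the file is missing.
function recoveryAction(state, seqs) {
  // Assumption: sequence numbers start at 1, so 0 means "no deltas yet".
  const newest = seqs.length > 0 ? seqs[seqs.length - 1] : 0;
  if (state === null) {
    // Assumption: the newest delta is treated as fully promoted. A scan
    // alone cannot prove that, so a real tool would also need to check PREV.
    return {
      action: "reconstruct",
      state: { next_seq: newest + 1, last_complete: newest },
    };
  }
  if (newest === state.last_complete + 1) {
    return { action: "finish-promotion", seq: newest };
  }
  if (newest === state.last_complete) {
    return { action: "none" };
  }
  // Not covered by the recovery rules: stop rather than guess.
  return { action: "inconsistent" };
}
```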
## Delta Format
Pluggable backend interface with two operations:
```js
backend.createDelta(prevFile, newFile, outFile) // spawn process, no shell strings
backend.applyDelta(prevFile, deltaFile, outFile) // spawn process, no shell strings
```
**Default backend: zstd**
- Modified files: `zstd --patch-from=prev new -o out.zst`
- New files: `zstd new -o out.zst` (no base)
- Deleted files: manifest entry only, no delta file
**Planned backends:** xdelta3, bsdiff
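One possible shape for the zstd backend is sketched below. `makeZstdBackend` is a hypothetical factory, and `run(command, args)` is an assumed function that spawns a process and returns a promise. Passing `null` as `prevFile` to mean "new file, no base" is also an assumption of this sketch.

```js
// Hypothetical factory: builds a backend object with the two operations
// from the interface above. `run(command, args)` is an assumed function
// that spawns the process with an argument array.
function makeZstdBackend(run) {
  // prevFile === null means there is no base file (a newly added file).
  const baseArgs = (prevFile) =>
    prevFile === null ? [] : [`--patch-from=${prevFile}`];

  return {
    name: "zstd",
    // Modified file: zstd --patch-from=prev new -o out
    // New file:      zstd new -o out
    createDelta(prevFile, newFile, outFile) {
      return run("zstd", [...baseArgs(prevFile), newFile, "-o", outFile]);
    },
    // Decompression needs the same base file that was used to create the patch.
    applyDelta(prevFile, deltaFile, outFile) {
      return run("zstd", ["-d", ...baseArgs(prevFile), deltaFile, "-o", outFile]);
    },
  };
}
```

Injecting `run` keeps the backend testable without zstd installed: a test can pass a function that records the argument arrays instead of spawning anything.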
## Manifest Format
Each delta `DELTAS/N/` contains:
- `manifest.json` — lists all changed files with their status (added/modified/deleted) and metadata
- `files/` — per-file delta or compressed blobs
```json
{
  "seq": 5,
  "timestamp": "2026-03-07T12:00:00Z",
  "prev_seq": 4,
  "backend": "zstd",
  "changes": [
    { "path": "src/main.js", "status": "modified", "delta": "files/src__main.js.zst" },
    { "path": "assets/logo.png", "status": "added", "delta": "files/assets__logo.png.zst" },
    { "path": "old/thing.txt", "status": "deleted" }
  ]
}
```
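To show how such a manifest could be derived, here is a minimal sketch that compares two file listings. `buildManifest` and `deltaName` are hypothetical names, and the inputs are assumed to be Maps from relative path to some content fingerprint (a hash, or size plus mtime). How files are compared is not specified in this plan.

```js
// Hypothetical helper: flatten a relative path into a delta file name,
// following the naming seen in the example manifest.
function deltaName(relPath) {
  return "files/" + relPath.split("/").join("__") + ".zst";
}

// Hypothetical helper: prevFiles and pendFiles are Maps of
// relative path -> fingerprint (an assumed representation).
function buildManifest({ seq, prevSeq, backend, timestamp, prevFiles, pendFiles }) {
  const changes = [];
  for (const [filePath, fingerprint] of pendFiles) {
    if (!prevFiles.has(filePath)) {
      changes.push({ path: filePath, status: "added", delta: deltaName(filePath) });
    } else if (prevFiles.get(filePath) !== fingerprint) {
      changes.push({ path: filePath, status: "modified", delta: deltaName(filePath) });
    }
  }
  for (const filePath of prevFiles.keys()) {
    if (!pendFiles.has(filePath)) {
      changes.push({ path: filePath, status: "deleted" });
    }
  }
  // Sort by path so the same input always produces the same manifest.
  changes.sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return { seq, timestamp, prev_seq: prevSeq, backend, changes };
}
```

One caveat with the flattened names: `a/b` and `a__b` both map to `files/a__b.zst`, so a real implementation would need an escaping rule or a different naming scheme.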
## CLI Interface
```
delta-backup [options] <command>

Commands:
  run        Full backup run
  status     Show current state (sequences, last run, pending recovery)
  restore    Apply deltas to reconstruct a point in time (future)

Options:
  --source <path>     SOURCE directory (required)
  --prev <path>       PREV directory (required)
  --pend <path>       PEND directory (required)
  --deltas <path>     DELTAS directory (required)
  --backend <name>    Delta backend: zstd (default), xdelta3
  --dry-run           Print what would happen, execute nothing
  --config <file>     Load options from JSON config file (flags override)
Guards: refuse to run if any required path is missing from both the command-line arguments and
the config file. Never fall back to CWD or implicit defaults for directories; explicit is safer.
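The merge-and-guard behaviour could be sketched like this. `resolveOptions` is a hypothetical name, and the option keys are assumed to mirror the flag names with the leading dashes removed.

```js
// Directory options that must be given explicitly (flag or config file).
const REQUIRED_PATHS = ["source", "prev", "pend", "deltas"];

// Hypothetical helper: merge parsed flags over config values, then refuse
// to continue if any required directory is still unset.
function resolveOptions(flags, config = {}) {
  // Only the backend has a default; directories never do.
  const merged = { backend: "zstd", ...config };
  for (const [key, value] of Object.entries(flags)) {
    // An unset flag must not erase a value that came from the config file.
    if (value !== undefined) merged[key] = value;
  }
  const missing = REQUIRED_PATHS.filter(
    (key) => typeof merged[key] !== "string" || merged[key] === ""
  );
  if (missing.length > 0) {
    const names = missing.map((key) => `--${key}`).join(", ");
    throw new Error(`missing required path(s): ${names}`);
  }
  return merged;
}
```

For example, `resolveOptions({ source: "/s" }, {})` throws and names `--prev`, `--pend`, and `--deltas`, rather than silently using the current directory.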
## Process Spawning
All external tools (rsync, zstd, xdelta3) are spawned with explicit argument arrays.
No shell string interpolation ever. Use Node's `child_process.spawn` or similar.
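A thin promise wrapper around `child_process.spawn` is one way to enforce this rule in a single place. The name `run` and the `dryRun` option are assumptions for this sketch.

```js
const { spawn } = require("node:child_process");

// Hypothetical wrapper: runs a command with an explicit argument array and
// resolves with 0 on success. Nothing is ever passed through a shell, so a
// file name containing spaces or `;` stays a single argument.
function run(command, args, { dryRun = false } = {}) {
  if (dryRun) {
    // JSON keeps argument boundaries visible in the printed plan.
    console.log(JSON.stringify([command, ...args]));
    return Promise.resolve(0);
  }
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "inherit", shell: false });
    // "error" fires when the process cannot be started (e.g. not installed).
    child.on("error", reject);
    child.on("close", (code, signal) => {
      if (code === 0) {
        resolve(0);
        return;
      }
      const reason = signal ? `signal ${signal}` : `exit code ${code}`;
      reject(new Error(`${command} failed (${reason})`));
    });
  });
}
```

Rejecting on any non-zero exit lets the run sequence stop at the first failed step instead of committing a delta built from a partial sync.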
## Occasional Snapshots
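Delta chains are space-efficient but fragile when long: reconstructing a state requires every
delta in the chain, so a single damaged delta breaks every state after it. Periodic full snapshots
(every N deltas, or on demand) bound the reconstruction blast radius. Snapshot support is planned
but not in scope for the initial implementation.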
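To make the bounding effect concrete, the sketch below computes which deltas a restore would need. It assumes a snapshot labelled S holds the state after delta S, and that base 0 means the empty state before the first delta. Neither convention is fixed by this plan, and `restoreChain` is a hypothetical name.

```js
// Hypothetical helper: given a target sequence and the sequences at which
// full snapshots exist, return the starting point and the deltas to apply.
function restoreChain(targetSeq, snapshotSeqs) {
  // Newest snapshot at or before the target; 0 means "start from empty".
  const base = Math.max(0, ...snapshotSeqs.filter((seq) => seq <= targetSeq));
  const deltas = [];
  for (let seq = base + 1; seq <= targetSeq; seq++) {
    deltas.push(seq);
  }
  return { base, deltas };
}
```

With a snapshot every N deltas, a restore never applies more than N - 1 deltas, and a damaged delta affects only the states between it and the next snapshot.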
## Implementation Phases
1. **Phase 1 (now):** Arg parsing, config, dry-run, guards, rsync steps
2. **Phase 2:** Delta generation with zstd backend, manifest writing, atomic commit
3. **Phase 3:** PREV promotion, state.json management, recovery logic
4. **Phase 4:** `status` and `restore` commands
5. **Future:** Additional backends, snapshot support, scheduling