Use numeric indices for delta filenames, document limitations
- Delta files now named 0.zst, 1.zst etc. This avoids path length issues and ambiguous separator substitution; the manifest maps index to path
- PLAN.md: document delta naming rationale
- PLAN.md: document cross-file deduplication limitation and possible future approaches (zstd dictionary training, content-addressing, tar stream)
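The two problems named in the first bullet can be reproduced with a small sketch. The helper names and example paths below are hypothetical; only the `replaceAll('/', '__')` substitution and the `<index><ext>` naming come from the diff. The 255-byte figure is the per-name limit on many common filesystems (ext4, for example), not something this commit states.

```javascript
// Old scheme from the diff: flatten a relative path by replacing '/' with '__'.
function flattenPath(path, ext = '.zst') {
  return path.replaceAll('/', '__') + ext;
}

// New scheme: the name depends only on the position in the change list.
function indexedName(index, ext = '.zst') {
  return `${index}${ext}`;
}

// Ambiguity: two distinct source paths (hypothetical) flatten to the same name.
const nested = flattenPath('a/b.txt');   // 'a__b.txt.zst'
const literal = flattenPath('a__b.txt'); // 'a__b.txt.zst'
console.log(nested === literal);         // true: the second delta would overwrite the first

// Length: every '/' grows into two characters, so a deep tree produces a long name.
const deep = Array.from({ length: 40 }, (_, i) => `dir${i}`).join('/') + '/file.bin';
console.log(flattenPath(deep).length > 255); // true for this 41-component path

// Indexed names stay short and unique regardless of the source path.
console.log(indexedName(0), indexedName(1)); // 0.zst 1.zst
```

Because the indexed name carries no path information, the manifest becomes the only place where the index is tied back to its source path.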
19
PLAN.md
@@ -161,6 +161,25 @@ rsync meaningful exit codes:
 Currently basic: any non-zero exit code throws. Finer-grained handling planned as part of the
 operation abstraction refactor.
 
+## Known Limitations
+
+### Delta file naming
+Delta files are named by numeric index (e.g. `0.zst`, `1.zst`) rather than by path. The manifest
+maps each index to its source path. Path-based naming was considered but rejected because:
+- Deep directory trees can exceed filesystem filename length limits
+- Path separator substitution (e.g. `/` → `__`) is ambiguous for filenames containing that sequence
+
+### Cross-file deduplication
+Per-file deltas cannot exploit similarity between different files — each file is compressed/diffed
+in isolation. Identical or near-identical files in different locations get no benefit from each
+other. Approaches that could address this:
+- `zstd --train` to build a shared dictionary from the corpus, then compress all deltas against it
+- Content-addressed storage (deduplicate at the block or file level before delta generation)
+- Tar the entire PEND tree and delta against the previous tar (single-stream, cross-file repetition
+  is visible to the compressor — but random access for restore becomes harder)
+
+These are significant complexity increases and out of scope for now.
+
 ## Occasional Snapshots
 
 Delta chains are efficient but fragile over long chains. Periodic full snapshots (every N deltas,
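As an illustration of the content-addressing idea listed under "Cross-file deduplication", here is a minimal sketch of file-level deduplication by content hash. It is not part of this commit, which explicitly leaves the approach out of scope. The function name, the input shape (a `Map` from path to `Buffer`), and the example paths are all assumptions. It catches only byte-identical files; near-identical files would need block-level hashing or a shared dictionary.

```javascript
// CommonJS form so the snippet runs standalone; an ES module would use `import` instead.
const { createHash } = require('node:crypto');

// Hypothetical helper: group paths whose contents are byte-identical.
// `files` is assumed to be a Map of relative path -> Buffer.
function groupByContent(files) {
  const byDigest = new Map();
  for (const [path, content] of files) {
    const digest = createHash('sha256').update(content).digest('hex');
    const group = byDigest.get(digest) ?? [];
    group.push(path);
    byDigest.set(digest, group);
  }
  return byDigest;
}

// Hypothetical input: two copies of the same bytes in different locations.
const groups = groupByContent(new Map([
  ['a/x.bin', Buffer.from('same bytes')],
  ['b/y.bin', Buffer.from('same bytes')],
  ['c/z.bin', Buffer.from('other bytes')],
]));

// Only one delta per group would need to be generated and stored.
for (const [digest, paths] of groups) {
  console.log(digest.slice(0, 12), paths);
}
```

With a grouping like this, a manifest could point several paths at a single delta file, which the index-based naming already permits since names no longer encode the path.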
@@ -65,16 +65,16 @@ export async function runCommand(config) {
   }
 
   const manifestChanges = [];
+  let fileIndex = 0;
 
   for (const change of changes) {
-    const deltaFilename = change.path.replaceAll('/', '__') + backend.ext;
-    const outFile = join(filesDir, deltaFilename);
-
     if (change.status === 'deleted') {
       manifestChanges.push({ path: change.path, status: 'deleted' });
       continue;
     }
 
+    const deltaFilename = `${fileIndex}${backend.ext}`;
+    const outFile = join(filesDir, deltaFilename);
     const prevFile = join(prev, change.path);
     const newFile = join(pend, change.path);
 
@@ -97,6 +97,8 @@ export async function runCommand(config) {
       status: change.status,
       delta: join('files', deltaFilename),
     });
+
+    fileIndex++;
   }
 
   // ── Phase 5: Write manifest + atomic commit ──────────────────
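To show how the indexed names are resolved again, the following sketch rebuilds the manifest entries the way the loop above appears to, then looks a delta up by source path. The entry fields `path`, `status`, and `delta` are taken from the diff; the function names, the `'modified'` and `'added'` status values, and the example paths are assumptions, and the real manifest may carry more fields.

```javascript
// CommonJS form so the snippet runs standalone; the project itself uses ES modules.
const { join } = require('node:path');

// Mirrors the loop in the diff: deleted entries get no delta and consume no index.
function buildManifestChanges(changes, ext = '.zst') {
  const manifestChanges = [];
  let fileIndex = 0;
  for (const change of changes) {
    if (change.status === 'deleted') {
      manifestChanges.push({ path: change.path, status: 'deleted' });
      continue;
    }
    manifestChanges.push({
      path: change.path,
      status: change.status,
      delta: join('files', `${fileIndex}${ext}`),
    });
    fileIndex++;
  }
  return manifestChanges;
}

// Hypothetical restore-side lookup: source path -> delta file, or null if there is none.
function deltaFor(manifestChanges, path) {
  const entry = manifestChanges.find((c) => c.path === path);
  return entry?.delta ?? null;
}

// Hypothetical change list; the status values other than 'deleted' are assumed.
const manifest = buildManifestChanges([
  { path: 'docs/a.txt', status: 'modified' },
  { path: 'old/b.txt', status: 'deleted' },
  { path: 'src/c.js', status: 'added' },
]);

console.log(deltaFor(manifest, 'docs/a.txt')); // files/0.zst on POSIX
console.log(deltaFor(manifest, 'old/b.txt'));  // null
console.log(deltaFor(manifest, 'src/c.js'));   // files/1.zst on POSIX
```

Incrementing the index only after a delta is recorded keeps the numbering dense: the deleted entry in the middle does not leave a gap between `0.zst` and `1.zst`.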