# Storage System Design
> [!NOTE]
> This document was written by Claude (Sonnet 4.6, by Anthropic) and has not yet been vetted by Mikael Lövqvist.
## Overview
A reusable, minimum-footprint storage component written for Node.js, intended as a foundation across multiple projects. Backends for other languages are viable as long as they can serialize to JSON. The design aims to provide [ACID](https://en.wikipedia.org/wiki/ACID) guarantees while keeping the implementation surface manageable.
---
## Foundational Types
### Primitives
The base layer. Integers, floats, strings, octet buffers and similar. Initial support covers a limited set of types, with expansion planned. Primitives are immutable — a string or number cannot be changed in place, only replaced.
### Collections
- **Array** — ordered, allows duplicates
- **Set** — unordered (though insertion order may be preserved in practice), unique members only. Uniqueness makes identity a non-trivial concern, particularly when members are represented as [Proxy](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy) objects.
Higher-level set operations (intersection, symmetric/asymmetric difference) are out of initial scope.
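The identity concern can be made concrete with a small sketch. In JavaScript a Proxy is a distinct object from its target, and two proxies over the same target are distinct from each other, so a native `Set` counts them as separate members. The `proxyFor` helper and its `WeakMap` cache are hypothetical names for one possible mitigation, not a decided part of this design.

```javascript
// A proxy never compares equal (===) to its target, nor to another proxy
// wrapping the same target, so a native Set sees three distinct members.
const record = { id: 1 };
const first = new Proxy(record, {});
const second = new Proxy(record, {});
const members = new Set([record, first, second]);
console.log(members.size); // 3

// Hypothetical mitigation: hand out one canonical proxy per target.
const proxyCache = new WeakMap();

function proxyFor(target) {
  let proxy = proxyCache.get(target);
  if (proxy === undefined) {
    proxy = new Proxy(target, {});
    proxyCache.set(target, proxy);
  }
  return proxy;
}

const canonical = new Set([proxyFor(record), proxyFor(record)]);
console.log(canonical.size); // 1
```

A canonical-proxy cache only helps within one scope; whether identity should instead be keyed on a stored record identifier is left open here.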
### Mappings
A map from keys to values. Not required to be bijective: several keys may map to the same value.
### Records
Analogous to plain objects in ECMAScript — a map from string keys to arbitrary values. May be a simplified subset of the general mapping type.
---
## API Surface: Proxy-Based Live Objects
The primary interaction model exposes storage-backed objects as [ES6 Proxy](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy) objects, with mutations intercepted transparently and written to the intent log or a pending transaction.
Three retrieval modes are supported:
| Mode | Description |
|------|-------------|
| **Live proxy** | Mutations apply directly to the backend or active transaction. The primary mode for transactional edits. |
| **Detached copy** | A plain native object, disconnected from the backend. Changes are local only and will not persist unless explicitly committed back. |
| **Frozen copy** | A detached copy that is recursively [frozen](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/freeze). In strict-mode code, which includes every ES module, any mutation attempt throws a `TypeError` immediately, making the detached nature loud rather than silent. This prevents the failure mode of mutating a copy and losing the change without realising it. |

The difference between detached and frozen is purely about guardrails; the underlying data model is the same.
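As a minimal sketch of the detached and frozen modes, assuming stored values are plain data that `structuredClone` (Node.js 17+) can copy: `detachedCopy`, `deepFreeze`, `frozenCopy` and `tryRename` are illustrative names, not an existing API.

```javascript
// Hypothetical helpers; the names are illustrative only.
function detachedCopy(source) {
  // structuredClone deep-copies plain data, cutting all ties to the source.
  return structuredClone(source);
}

function deepFreeze(value) {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.freeze(value); // freeze first so cyclic graphs terminate
    for (const key of Reflect.ownKeys(value)) deepFreeze(value[key]);
  }
  return value;
}

function frozenCopy(source) {
  return deepFreeze(detachedCopy(source));
}

function tryRename(user) {
  'use strict'; // ES modules are always strict; stated here for clarity
  try {
    user.profile.name = 'changed';
    return 'mutated';
  } catch (error) {
    return error.name;
  }
}

const stored = { profile: { name: 'original' } };
console.log(tryRename(detachedCopy(stored))); // mutated
console.log(tryRename(frozenCopy(stored)));   // TypeError
console.log(stored.profile.name);             // original
```

In both cases the stored value is untouched; the frozen copy simply makes the lost write visible at the point where it happens.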
### Nested Traversal
Accessing a nested object through a live proxy returns another proxy rather than a plain value. This means mutations at any depth are intercepted correctly. The practical consequence is that a query or transaction operates over a graph of proxies, all sharing the same scope.
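The traversal rule can be sketched with a `get` trap that wraps object-valued properties in a further proxy and a `set` trap that journals before applying. The `liveProxy` function and the shape of the log entries are assumptions made for illustration.

```javascript
// Hypothetical sketch: every object reached through the proxy is itself
// wrapped, so writes at any depth pass through the set trap.
function liveProxy(target, log, path = []) {
  return new Proxy(target, {
    get(obj, key) {
      const value = Reflect.get(obj, key);
      if (value !== null && typeof value === 'object') {
        return liveProxy(value, log, [...path, key]);
      }
      return value;
    },
    set(obj, key, value) {
      // Journal the intent first, then apply it to the underlying object.
      log.push({ op: 'set', path: [...path, key], value });
      return Reflect.set(obj, key, value);
    },
  });
}

const log = [];
const root = { user: { address: { city: 'Oslo' } } };
const live = liveProxy(root, log);

live.user.address.city = 'Bergen';
console.log(log[0].path.join('.')); // user.address.city
console.log(root.user.address.city); // Bergen
```

This version creates a fresh proxy on every access, so `live.user !== live.user`; a real implementation would likely cache proxies per target to keep identity stable within a scope.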
### Scope
Each query or transaction defines an explicit scope. Proxies operate within that scope and do not traverse outside it by default. Auto-expansion of scope on traversal is supported but is treated as a less preferred path — the design philosophy is to make convenient-but-dangerous approaches available while being somewhat opinionated against them in practice (e.g. requiring an explicit opt-in flag, or emitting a warning).
Because all scope expansions are journaled in the intent log, the expansion history is always recoverable after the fact.
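One way the opt-in could look, as a sketch only: a scope that rejects access to records outside it unless an `autoExpand` flag is set, and journals every expansion. `createScope`, `touch` and the `expand-scope` entry are hypothetical names.

```javascript
// Hypothetical scope object: explicit by default, expandable on opt-in.
function createScope(initialIds, { autoExpand = false } = {}) {
  const members = new Set(initialIds);
  const journal = [];
  return {
    journal,
    has: (id) => members.has(id),
    touch(id) {
      if (members.has(id)) return;
      if (!autoExpand) {
        throw new RangeError(`record "${id}" is outside the current scope`);
      }
      // Expansion is journaled so the history can be recovered later.
      journal.push({ op: 'expand-scope', id });
      members.add(id);
    },
  };
}

const strict = createScope(['a']);
let rejected = false;
try {
  strict.touch('b');
} catch {
  rejected = true;
}

const lenient = createScope(['a'], { autoExpand: true });
lenient.touch('b');
lenient.touch('b'); // already a member, so nothing new is journaled
console.log(rejected, lenient.journal.length); // true 1
```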
### Proxy Revocation
When a transaction is committed or rolled back, the proxies associated with it become semantically invalid. The system uses [Proxy.revocable()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy/revocable) so that any access to a proxy after its scope closes throws immediately, rather than silently operating against stale state. An alternative softer behaviour — detaching the proxy into a plain copy on close — is a possible design variant.
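The hard-revocation variant can be sketched directly with `Proxy.revocable()`. The `openScope` wrapper is a hypothetical name; the empty handler stands in for whatever traps the real proxies would install.

```javascript
// Hypothetical wrapper: closing the scope revokes the proxy.
function openScope(target) {
  const { proxy, revoke } = Proxy.revocable(target, {});
  return { proxy, close: revoke };
}

const scope = openScope({ count: 1 });
scope.proxy.count = 2; // allowed while the scope is open
scope.close();

let errorName = 'none';
try {
  void scope.proxy.count; // any operation on a revoked proxy throws
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // TypeError
```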
---
## Storage and Mutation Model
### Intent Log
All mutations (record adjustments, set additions/removals, collection reorderings) are journaled as high-level operations before being applied to internal state. This serves the same purpose as a [Write-Ahead Log (WAL)](https://en.wikipedia.org/wiki/Write-ahead_logging) in traditional database terminology.
Each journal entry also records the relevant prior state of the objects it touches, enabling rollback of in-flight transactions as well as undo of transactions that have already been committed.
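A minimal sketch of journaling with prior state, assuming a single flat record and a `set` operation only. `logAndSet`, `rollback` and the entry fields are illustrative names.

```javascript
// Hypothetical journal entry: the operation plus enough prior state to undo it.
function logAndSet(record, log, key, value) {
  log.push({
    op: 'set',
    key,
    value,
    existed: Object.hasOwn(record, key), // distinguishes "absent" from undefined
    prior: record[key],
  });
  record[key] = value; // applied only after the entry is journaled
}

// Undo entries newest-first, restoring the recorded prior state.
function rollback(record, log) {
  while (log.length > 0) {
    const entry = log.pop();
    if (entry.existed) record[entry.key] = entry.prior;
    else delete record[entry.key];
  }
}

const record = { name: 'Ada' };
const log = [];
logAndSet(record, log, 'name', 'Grace');
logAndSet(record, log, 'role', 'admin');
rollback(record, log);
console.log(JSON.stringify(record)); // {"name":"Ada"}
```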
### Transactions
Changes can be staged as a pending transaction and committed atomically. To maintain consistency under concurrent access, a transaction must record the state of the data at the time it was opened. If that state has changed by commit time, the transaction must be rejected — a standard [optimistic concurrency control](https://en.wikipedia.org/wiki/Optimistic_concurrency_control) approach.
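The commit-time check can be sketched with a single store-wide version counter, which is the coarsest possible form of optimistic concurrency control. `VersionedStore` and its fields are assumptions for illustration; a real implementation would more likely track versions per object.

```javascript
// Hypothetical store: a transaction remembers the version it was opened at
// and is rejected if the store has moved on by commit time.
class VersionedStore {
  constructor(data = {}) {
    this.data = { ...data };
    this.version = 0;
  }

  begin() {
    return { baseVersion: this.version, writes: {} };
  }

  commit(tx) {
    if (tx.baseVersion !== this.version) return false; // state changed: reject
    Object.assign(this.data, tx.writes); // apply all staged writes together
    this.version += 1;
    return true;
  }
}

const store = new VersionedStore({ balance: 100 });
const txA = store.begin();
const txB = store.begin();
txA.writes.balance = 150;
txB.writes.balance = 50;

console.log(store.commit(txA)); // true
console.log(store.commit(txB)); // false
console.log(store.data.balance); // 150
```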
### Snapshots
Writing a full snapshot is expensive, so it happens infrequently; the intent log is what is continuously flushed to disk. On crash recovery, the latest snapshot is loaded and the log entries recorded after that snapshot are replayed in order, a standard [checkpoint/recovery](https://en.wikipedia.org/wiki/Database_transaction#Checkpoint) pattern.
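Recovery can be sketched in memory, assuming each log entry carries a monotonically increasing sequence number and the snapshot records the last sequence number it includes. The `recover` function, the `seq` field and the two operations are hypothetical.

```javascript
// Hypothetical recovery: start from the snapshot, replay only newer entries.
function recover(snapshot, log) {
  const state = structuredClone(snapshot.state);
  for (const entry of log) {
    if (entry.seq <= snapshot.seq) continue; // already part of the snapshot
    if (entry.op === 'set') state[entry.key] = entry.value;
    else if (entry.op === 'delete') delete state[entry.key];
  }
  return state;
}

const snapshot = { seq: 2, state: { a: 1, b: 2 } };
const log = [
  { seq: 1, op: 'set', key: 'a', value: 1 },
  { seq: 2, op: 'set', key: 'b', value: 2 },
  { seq: 3, op: 'set', key: 'c', value: 3 },
  { seq: 4, op: 'delete', key: 'a' },
];

console.log(JSON.stringify(recover(snapshot, log))); // {"b":2,"c":3}
```

Because entries at or before the snapshot's sequence number are skipped, the log can be truncated up to that point once the snapshot is safely on disk.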
---
## Relationships and the Cross-Boundary Problem
If relationships between records are encoded externally (e.g. as a stored reference from one record to another), a rollback becomes non-local: the rolled-back changes may leave dangling references in objects outside the transaction boundary. Detecting and reversing these requires tracking reverse dependencies, which adds significant complexity.
The preferred approach is to encode relationships internally to the storage system so that the system itself can reason about them during rollback and consistency checks. The exact mechanism is not yet worked out and is one of the primary open design questions.
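The non-local rollback problem can be illustrated with a sketch in which references are plain stored identifiers. The record layout (`refs` arrays in a `Map`) and `findDanglingReferences` are assumptions for illustration and say nothing about the eventual mechanism.

```javascript
// Hypothetical check: report every stored reference whose target is missing.
function findDanglingReferences(records) {
  const dangling = [];
  for (const [id, record] of records) {
    for (const ref of record.refs ?? []) {
      if (!records.has(ref)) dangling.push({ from: id, to: ref });
    }
  }
  return dangling;
}

const records = new Map([['order-1', { refs: [] }]]);

// A transaction creates customer-7 ...
records.set('customer-7', { refs: [] });
// ... and a record outside the transaction boundary starts pointing at it.
records.get('order-1').refs.push('customer-7');
// Rolling back the transaction removes customer-7 but not the reference.
records.delete('customer-7');

const dangling = findDanglingReferences(records);
console.log(dangling.length, dangling[0].from, dangling[0].to); // 1 order-1 customer-7
```

A full scan like this is only a detector; avoiding it is the motivation for letting the storage system track relationships itself.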
---
## Open Questions
- Which primitive types to support in the initial version
- Exact semantics of relationship encoding and cross-transaction consistency
- Whether proxy-close behaviour should be hard revocation (throws) or soft detach (becomes a copy)
- Isolation level: rejecting any transaction whose opening state has changed by commit time implies [serializable isolation](https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable), the strictest and most expensive level; weaker levels may be worth supporting later
---
## Scope Concern
This component is intended to remain small and reusable, but several of the features above — particularly snapshot/WAL recovery, transaction isolation, and relationship integrity — each carry substantial implementation surface on their own. Keeping the initial scope tight will be important.