docs: split architecture.md into focused sub-documents

architecture.md is now a concise overview (~155 lines) with a Documentation section linking to all sub-docs.

New sub-docs in docs/:
- transport.md: wire modes, frame header, serialization, web peer
- relay.md: delivery modes, memory model, congestion, scheduler
- codec.md: stream metadata, format negotiation, codec backends
- xorg.md: screen grab, viewer sink, render loop, overlays
- discovery.md: multicast announcements, multi-site, site gateways
- node-state.md: wanted/current state, reconciler, stats, queries
- device-resilience.md: device loss handling, stream events, audio (future)

All cross-references updated to file links. Every sub-doc links back to architecture.md. docs/transport.md links to docs/protocol.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 23:23:54 +00:00
parent beaeea8dab
commit 4e40223478
8 changed files with 690 additions and 678 deletions
--- a/docs/node-state.md
+++ b/docs/node-state.md
@@ -0,0 +1,93 @@
+# Node State Model
+
+See [Architecture Overview](../architecture.md).
+
+## Wanted vs Current State
+
+Each node maintains two independent views of its configuration:
+
+**Wanted state** — the declared intent for this node. Set by the controller via protocol commands and persisted independently of whether the underlying resources are actually running. Examples: "ingest /dev/video0 as stream 3, send to 192.168.1.2:8001", "display stream 3 in a window". Wanted state survives connection drops, device loss, and restarts — it represents what the node *should* be doing.
+
+**Current state** — what the node is actually doing right now. Derived from which file descriptors are open, which transport connections are established, which processes are running. Changes as resources are acquired or released.
+
+The controller queries both views to construct the graph. Wanted state gives the topology (what is configured). Current state gives the runtime overlay (what is live, with stats).
+
+This separation means the web UI can show an edge as grey (configured but not connected), green (connected and streaming), or red (configured but failed) without any special-casing: the colour follows from comparing wanted state with current state, including any error the current state reports.
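
The colour rule can be sketched as a pure function of the two views. This is a minimal illustration, not the project's code: `EdgeView`, `EdgeColour`, and `edge_colour()` are hypothetical names, and the sketch assumes the current state exposes a "streaming" flag and an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical edge status for the web UI; names are illustrative. */
typedef enum { EDGE_GREY, EDGE_GREEN, EDGE_RED } EdgeColour;

/* Hypothetical per-edge summary distilled from the two state views. */
typedef struct {
    bool wanted;     /* wanted state: the edge is configured */
    bool streaming;  /* current state: connected and streaming */
    int  error_code; /* current state: nonzero if the resource failed */
} EdgeView;

/* Map one configured edge to a colour by comparing wanted and current. */
static EdgeColour edge_colour(const EdgeView *e)
{
    assert(e != NULL && e->wanted);  /* only configured edges are drawn */
    if (e->streaming)       return EDGE_GREEN; /* wanted and current agree */
    if (e->error_code != 0) return EDGE_RED;   /* configured but failed */
    return EDGE_GREY;                          /* configured, not connected */
}
```

The function has no per-resource branches, which is the point of the separation: any resource type that reports these three facts gets the same rendering.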
+
+## Reconciler
+
+A generic reconciler closes the gap between wanted and current state. It is invoked:
+
+- **On event** — transport disconnect, device error, process exit, `STREAM_OPEN` received: fast response to state changes
+- **On periodic tick** — safety net; catches external failures that produce no callback (e.g. a device that silently disappears and reappears)
+
+The reconciler does not know what a "stream" or a "V4L2 device" is. It operates on abstract state machines, each representing one resource. Resources declare their states, transitions, and dependencies; the reconciler finds the path from current to wanted state and executes the transitions in order.
+
+## Resource State Machines
+
+Each managed resource is described as a directed graph:
+
+- **Nodes** are states (e.g. `CLOSED`, `OPEN`, `STREAMING`)
+- **Edges** are transitions with associated actions (e.g. `open_fd`, `close_fd`, `connect_transport`, `spawn_process`)
+- **Dependencies** between resources constrain ordering (e.g. transport connection requires device open)
+
+The state graphs are small and defined at compile time. Pathfinding is BFS — with 3–5 states per resource the cost is negligible. The benefit is that adding a new resource type (e.g. an ffmpeg subprocess for codec work) requires only defining its state graph and declaring its dependencies; the reconciler's core logic is unchanged.
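
A rough sketch of the BFS step, assuming the graph is stored as a small adjacency matrix. `StateGraph`, `MAX_STATES`, and `next_state()` are hypothetical names introduced for illustration; the example graph encodes the V4L2 capture transitions listed in this document.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 8  /* assumption: state graphs stay this small */

/* Hypothetical compile-time state graph: edge[from][to] is true when a
 * direct transition exists. */
typedef struct {
    int  state_count;
    bool edge[MAX_STATES][MAX_STATES];
} StateGraph;

/* Breadth-first search from `from` to `to`.
 * Returns the first state to move to on a shortest path, `from` itself
 * if already there, or -1 if `to` is unreachable. */
static int next_state(const StateGraph *g, int from, int to)
{
    int prev[MAX_STATES];
    int queue[MAX_STATES];
    int head = 0, tail = 0;

    assert(g->state_count > 0 && g->state_count <= MAX_STATES);
    assert(from >= 0 && from < g->state_count);
    assert(to >= 0 && to < g->state_count);
    if (from == to) return from;

    for (int i = 0; i < g->state_count; i++) prev[i] = -1;
    prev[from] = from;
    queue[tail++] = from;  /* each state is enqueued at most once */

    while (head < tail) {
        int s = queue[head++];
        for (int n = 0; n < g->state_count; n++) {
            if (!g->edge[s][n] || prev[n] != -1) continue;
            prev[n] = s;
            if (n == to) {
                /* Walk back until the predecessor is the start state. */
                while (prev[n] != from) n = prev[n];
                return n;
            }
            queue[tail++] = n;
        }
    }
    return -1;
}

/* The V4L2 capture graph from this document. */
enum { CLOSED, OPEN, STREAMING };
static const StateGraph v4l2_graph = {
    .state_count = 3,
    .edge = {
        [CLOSED]    = { [OPEN] = true },                   /* open_fd */
        [OPEN]      = { [CLOSED] = true,                   /* close_fd */
                        [STREAMING] = true },              /* start_capture */
        [STREAMING] = { [OPEN] = true },                   /* stop_capture */
    },
};
```

Calling `next_state(&v4l2_graph, CLOSED, STREAMING)` yields `OPEN`, so the reconciler would run `open_fd` first and search again from the new state.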
+
+**Example resource state graphs:**
+
+V4L2 capture device:
+```
+CLOSED → OPEN → STREAMING
+```
+Transitions: `open_fd` (CLOSED→OPEN), `start_capture` (OPEN→STREAMING), `stop_capture` (STREAMING→OPEN), `close_fd` (OPEN→CLOSED).
+
+Outbound transport connection:
+```
+DISCONNECTED → CONNECTING → CONNECTED
+```
+Transitions: `connect` (DISCONNECTED→CONNECTING), `connected_cb` (CONNECTING→CONNECTED), `disconnect` (CONNECTED→DISCONNECTED).
+
+External codec process:
+```
+STOPPED → STARTING → RUNNING
+```
+Transitions: `spawn` (STOPPED→STARTING), `ready_cb` (STARTING→RUNNING), `kill` (RUNNING→STOPPED).
+
+Dependency example: "outbound transport connection" requires "V4L2 device open". The reconciler will not attempt to connect the transport until the device is in state `OPEN` or `STREAMING`.
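
One way such a dependency check could look, as a sketch only. The `Dependency` record, the bitmask encoding, and `deps_satisfied()` are assumptions made for this example; the document does not specify how dependencies are stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEV_CLOSED, DEV_OPEN, DEV_STREAMING };

/* Hypothetical dependency record: satisfied while the required
 * resource is in one of the states selected by allowed_mask. */
typedef struct {
    const int *state;        /* current state of the required resource */
    unsigned   allowed_mask; /* bit i set: state i satisfies the dependency */
} Dependency;

/* True when every dependency in the list is currently satisfied. */
static bool deps_satisfied(const Dependency *deps, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(deps[i].state != NULL);
        assert(*deps[i].state >= 0 && *deps[i].state < 32);
        if (!((deps[i].allowed_mask >> *deps[i].state) & 1u))
            return false;
    }
    return true;
}
```

For the transport example, the mask would be `(1u << DEV_OPEN) | (1u << DEV_STREAMING)`, and the `connect` transition would be skipped on any tick where `deps_satisfied()` returns false.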
+
+## Generic Implementation
+
+The reconciler is implemented as a standalone module (`reconciler`) that is not specific to video. It operates on:
+
+```c
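+typedef struct {
+    int   state_count;    /* number of states in this resource's graph */
+    int   current_state;  /* state the resource is in right now */
+    int   wanted_state;   /* state the reconciler should drive it to */
+    /* transition table: [from][to] → action fn + dependency list */
+} Resource;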
+```
+
+This makes it reusable across any node component in the project — not just video ingest. The video node registers its resources (device, transport connection, display sink) and their dependencies, then calls `reconciler_tick()` on events and periodically.
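
To make the tick concrete, here is a simplified sketch. All `Demo*` and `demo_*` names are hypothetical, and the table layout is an assumption about the transition table hinted at in the struct comment. The path step is deliberately reduced to "move one state index toward the wanted state", which only holds for linear chains such as the V4L2 graph; it stands in for the BFS described earlier and is not the design's pathfinding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_STATES 4

typedef bool (*DemoAction)(void *ctx);  /* returns true on success */

/* Hypothetical resource: action[from][to] is NULL when no direct
 * transition exists. */
typedef struct {
    int        state_count;
    int        current_state;
    int        wanted_state;
    DemoAction action[DEMO_MAX_STATES][DEMO_MAX_STATES];
    void      *ctx;
} DemoResource;

/* Advance one resource by at most one transition. Returns true if the
 * resource is at its wanted state after the call. */
static bool demo_step(DemoResource *r)
{
    assert(r->state_count > 0 && r->state_count <= DEMO_MAX_STATES);
    assert(r->current_state >= 0 && r->current_state < r->state_count);
    assert(r->wanted_state >= 0 && r->wanted_state < r->state_count);
    if (r->current_state == r->wanted_state) return true;

    /* Simplification: linear chain, so step to the adjacent index. */
    int next = r->current_state +
               (r->wanted_state > r->current_state ? 1 : -1);
    DemoAction fn = r->action[r->current_state][next];
    if (fn == NULL || !fn(r->ctx)) return false;  /* retry on a later tick */

    r->current_state = next;
    return r->current_state == r->wanted_state;
}

/* Step every registered resource once; true when all have converged. */
static bool demo_reconciler_tick(DemoResource *res, size_t count)
{
    bool converged = true;
    for (size_t i = 0; i < count; i++)
        if (!demo_step(&res[i])) converged = false;
    return converged;
}

/* Stub action standing in for open_fd, start_capture, and so on:
 * it counts invocations through ctx and always succeeds. */
static bool demo_ok(void *ctx) { ++*(int *)ctx; return true; }
```

Running one transition per resource per tick keeps each tick short, and a failed action simply leaves the resource where it was until the next event or periodic tick.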
+
+## Per-Stream Stats
+
+Live fps and throughput are tracked per stream using a header-only rolling-window tracker (`include/stream_stats.h`). The ingest and sink paths call `stream_stats_record_frame()` on each frame; the tracker maintains a 0.5 s window and recomputes `fps` and `mbps` each time `stream_stats_tick()` returns true. The resulting values feed the runtime state reported to the controller.
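
The following sketch shows one plausible shape for such a tracker. It is not the contents of `stream_stats.h`: the `demo_stats_*` functions, their signatures, the explicit nanosecond timestamps, the window that restarts after each recompute, and the reading of `mbps` as megabits per second are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WINDOW_NS 500000000ULL  /* 0.5 s window */

typedef struct {
    uint64_t window_start_ns;
    uint32_t frames;  /* frames recorded in the current window */
    uint64_t bytes;   /* payload bytes recorded in the current window */
    double   fps;     /* result of the last completed window */
    double   mbps;    /* assumed to mean megabits per second */
} DemoStreamStats;

static void demo_stats_init(DemoStreamStats *s, uint64_t now_ns)
{
    s->window_start_ns = now_ns;
    s->frames = 0;
    s->bytes  = 0;
    s->fps    = 0.0;
    s->mbps   = 0.0;
}

/* Called once per frame by the ingest or sink path. */
static void demo_stats_record_frame(DemoStreamStats *s, uint32_t nbytes)
{
    s->frames++;
    s->bytes += nbytes;
}

/* Returns true when a window has elapsed and fps/mbps were recomputed. */
static bool demo_stats_tick(DemoStreamStats *s, uint64_t now_ns)
{
    assert(now_ns >= s->window_start_ns);
    uint64_t elapsed = now_ns - s->window_start_ns;
    if (elapsed < DEMO_WINDOW_NS) return false;

    double secs = (double)elapsed / 1e9;
    s->fps  = (double)s->frames / secs;
    s->mbps = (double)s->bytes * 8.0 / 1e6 / secs;

    /* Start the next window. */
    s->window_start_ns = now_ns;
    s->frames = 0;
    s->bytes  = 0;
    return true;
}
```

Dividing by the measured elapsed time rather than the nominal 0.5 s keeps the figures accurate when ticks arrive late. In real use the timestamp would come from a monotonic clock.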
+
+## Node State Queries
+
+Two protocol commands allow the controller to query a node's state (planned — not yet implemented in the protocol module):
+
+**`GET_CONFIG_STATE`** — returns the wanted state: which streams the node is configured to produce or consume, their destinations/sources, format, stream ID. This is the topology view — what is configured regardless of whether it is currently active.
+
+**`GET_RUNTIME_STATE`** — returns the current state: which resources are in which reconciler state, live fps/mbps per stream (from `stream_stats`), error codes for any failed resources.
+
+The controller queries all discovered nodes, correlates streams by ID and peer address, and reconstructs the full graph from the union of responses. No central authority is needed — the graph emerges from the node state reports.
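
The correlation step might look roughly like this. Since the query commands are not yet implemented, `StreamReport`, `Edge`, and `build_edges()` are hypothetical, and the sketch assumes each node's reply has already been flattened into one record per stream endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened report entry, one per stream endpoint. */
typedef struct {
    const char *node;      /* address of the reporting node */
    const char *peer;      /* address of the other end */
    int         stream_id;
    bool        is_sender; /* true: node sends to peer; false: receives */
} StreamReport;

typedef struct {
    const char *src;
    const char *dst;
    int         stream_id;
} Edge;

/* Pair each sender report with the matching receiver report.
 * Writes at most max_edges edges and returns the number written. */
static size_t build_edges(const StreamReport *r, size_t n,
                          Edge *out, size_t max_edges)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!r[i].is_sender) continue;
        for (size_t j = 0; j < n && count < max_edges; j++) {
            if (r[j].is_sender) continue;
            if (r[j].stream_id != r[i].stream_id) continue;
            if (strcmp(r[i].peer, r[j].node) != 0) continue;
            if (strcmp(r[j].peer, r[i].node) != 0) continue;
            out[count++] = (Edge){ r[i].node, r[j].node, r[i].stream_id };
            break;  /* one receiver per sender report */
        }
    }
    return count;
}
```

A sender report with no matching receiver produces no edge here; a fuller version would likely keep it as a one-sided edge so the UI can still show what is configured.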
+
+## Stream ID Assignment
+
+Stream IDs are assigned by the controller, not by individual nodes. This ensures that when node A reports "I am sending stream 3 to B" and node B reports "I am receiving stream 3 from A", the IDs match and the edge can be reconstructed. Each `START_INGEST` or `START_SINK` command from the controller includes the stream ID to use.
+
+## Connection Direction
+
+The source node connects outbound to the sink's transport server port. A single TCP port per node is the default: all traffic (video frames, control messages, state queries) flows through it in both directions. Dedicated per-stream ports on separate listening sockets are a future option for high-bandwidth links. If added, they must be represented in state reporting so that the graph reconstructs correctly regardless of which port a connection uses.