docs: split architecture.md into focused sub-documents

architecture.md is now a concise overview (~155 lines) with a
Documentation section linking to all sub-docs.

New sub-docs in docs/:
  transport.md        — wire modes, frame header, serialization, web peer
  relay.md            — delivery modes, memory model, congestion, scheduler
  codec.md            — stream metadata, format negotiation, codec backends
  xorg.md             — screen grab, viewer sink, render loop, overlays
  discovery.md        — multicast announcements, multi-site, site gateways
  node-state.md       — wanted/current state, reconciler, stats, queries
  device-resilience.md — device loss handling, stream events, audio (future)

All cross-references updated to file links. Every sub-doc links back
to architecture.md. docs/transport.md links to docs/protocol.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 23:23:54 +00:00
parent beaeea8dab
commit 4e40223478
8 changed files with 690 additions and 678 deletions

docs/relay.md Normal file

@@ -0,0 +1,94 @@
# Relay Design
See [Architecture Overview](../architecture.md).

A relay receives frames from one or more upstream sources and distributes them to any number of outputs. Each output is independently configured with a **delivery mode** that determines how it handles the tension between latency and completeness.
## Output Delivery Modes
**Low-latency mode** — minimize delay, accept loss

The output holds at most one pending frame. When a new frame arrives:
- If the slot is empty, the frame occupies it and is sent as soon as the transport allows
- If the slot is already occupied (transport not ready), the stale pending frame is dropped and replaced by the incoming frame

The consumer always receives the most recent frame the transport could deliver. Frame loss is expected and acceptable.
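A minimal sketch of the single-slot behaviour, in Rust; `Frame`, `LowLatencyOutput`, and all field names here are illustrative assumptions, not the project's real types:

```rust
// Sketch only: these types are assumptions, not the project's API.
struct Frame {
    seq: u64,
    bytes: Vec<u8>,
}

/// Single-slot output: holds at most one pending frame; a newer frame
/// replaces a stale one instead of queueing behind it.
struct LowLatencyOutput {
    pending: Option<Frame>,
    dropped: u64, // congestion signal: frames discarded before sending
}

impl LowLatencyOutput {
    fn new() -> Self {
        Self { pending: None, dropped: 0 }
    }

    /// Called for every frame arriving from upstream.
    fn on_frame(&mut self, frame: Frame) {
        // `replace` returns the previously pending frame, if any; it was
        // never sent, so count it as dropped.
        if self.pending.replace(frame).is_some() {
            self.dropped += 1;
        }
    }

    /// Called when the transport becomes ready: yields the newest frame.
    fn take_ready(&mut self) -> Option<Frame> {
        self.pending.take()
    }
}
```

Note that the slot never blocks the producer: `on_frame` always returns immediately, which is what makes loss, rather than delay, the failure mode.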
**Completeness mode** — minimize loss, accept delay

The output maintains a queue. When a new frame arrives it is enqueued. The transport drains the queue in order. When the queue is full, a drop policy is applied — either drop the oldest frame (preserve recency) or drop the newest (preserve continuity). Which policy fits depends on the consumer: an archiver may prefer continuity; a scrubber may prefer recency.
## Memory Model
Compressed frames have variable sizes (I-frames vs P-frames, quality settings, scene complexity), so fixed-slot buffers waste memory unpredictably. The preferred model is **per-frame allocation** with explicit bookkeeping.

Each allocated frame is tracked with at minimum:
- Byte size
- Sequence number or timestamp
- Which outputs still hold a reference

Limits are enforced per output independently — not as a shared pool — so a slow completeness output cannot starve a low-latency output or exhaust global memory. Per-output limits have two axes:
- **Frame count** — cap on number of queued frames
- **Byte budget** — cap on total bytes in flight for that output

Both limits should be configurable. Either limit being reached triggers the drop policy.
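The completeness queue with both limits and both drop policies can be sketched as follows; the types and names are assumptions for illustration, not the project's API:

```rust
use std::collections::VecDeque;

// Illustrative types only, not the project's API.
struct Frame {
    seq: u64,
    bytes: Vec<u8>,
}

enum DropPolicy {
    Oldest, // preserve recency (e.g. a scrubber)
    Newest, // preserve continuity (e.g. an archiver)
}

struct CompletenessOutput {
    queue: VecDeque<Frame>,
    queued_bytes: usize,
    max_frames: usize, // frame-count limit
    max_bytes: usize,  // byte budget
    policy: DropPolicy,
    dropped: u64,
}

impl CompletenessOutput {
    fn new(max_frames: usize, max_bytes: usize, policy: DropPolicy) -> Self {
        Self {
            queue: VecDeque::new(),
            queued_bytes: 0,
            max_frames,
            max_bytes,
            policy,
            dropped: 0,
        }
    }

    /// True if admitting a frame of `incoming` bytes would exceed either limit.
    fn over_budget(&self, incoming: usize) -> bool {
        self.queue.len() + 1 > self.max_frames
            || self.queued_bytes + incoming > self.max_bytes
    }

    /// Enqueue a frame, applying the drop policy when either limit is hit.
    fn on_frame(&mut self, frame: Frame) {
        match self.policy {
            // Drop the incoming frame: the queued sequence stays unbroken.
            DropPolicy::Newest => {
                if self.over_budget(frame.bytes.len()) {
                    self.dropped += 1;
                    return;
                }
            }
            // Drop from the front until the new frame fits: keep recency.
            DropPolicy::Oldest => {
                while !self.queue.is_empty() && self.over_budget(frame.bytes.len()) {
                    let old = self.queue.pop_front().expect("non-empty");
                    self.queued_bytes -= old.bytes.len();
                    self.dropped += 1;
                }
            }
        }
        self.queued_bytes += frame.bytes.len();
        self.queue.push_back(frame);
    }
}
```

One edge case worth deciding explicitly: under `Oldest`, a single frame larger than the byte budget would empty the queue and still be admitted over budget in this sketch; a real implementation should pick a rule for that.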
## Congestion: Two Sides
Congestion can arise at both ends of the relay and must be handled explicitly on each.
**Inbound congestion (upstream → relay)**

If the upstream source produces frames faster than any output can dispatch them:
- Low-latency outputs are unaffected by design — they always hold at most one frame
- Completeness outputs will see their queues grow; limits and drop policy absorb the excess

The relay never signals backpressure to the upstream. It is the upstream's concern to produce frames at a sustainable rate; the relay's concern is only to handle whatever arrives without blocking.
**Outbound congestion (relay → downstream transport)**

If the transport layer cannot accept a frame immediately:
- Low-latency mode: the pending frame is dropped when the next frame arrives; the transport sends the newest frame it can when it becomes ready
- Completeness mode: the frame stays in the queue; the queue grows until the transport catches up or limits are reached

The interaction between outbound congestion and the byte budget is important: a transport that is consistently slow will fill the completeness queue to its byte budget limit, at which point the drop policy engages. This is the intended safety valve — the budget defines the maximum acceptable latency inflation before the system reverts to dropping.
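That relationship between budget and latency can be made concrete: with a byte budget and a sustained transport throughput, a full queue delays a frame by at most roughly budget divided by throughput. A hypothetical helper (not project API) expressing this:

```rust
/// Rough upper bound on the latency a full completeness queue can add:
/// with byte budget `budget_bytes` and a transport draining at a sustained
/// `throughput_bytes_per_sec`, queued data delays a frame by at most about
/// budget / throughput seconds. Illustrative helper, not project API.
fn max_latency_inflation_secs(budget_bytes: u64, throughput_bytes_per_sec: u64) -> f64 {
    budget_bytes as f64 / throughput_bytes_per_sec as f64
}
```

For example, a 4 MB budget over a transport sustaining 1 MB/s caps the inflation at about 4 seconds; sizing the budget is effectively choosing that bound.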
## Congestion Signals
Even though the relay does not apply backpressure, it should emit **observable congestion signals** — drop counts, queue depth, byte utilization — on the control plane so that the controller can make decisions: reduce upstream quality, reroute, alert, or adjust budgets dynamically.
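A per-output report might carry a shape like the following; the struct and field names are assumptions, since the control-plane schema is not specified here:

```rust
// Hypothetical per-output congestion report for the control plane;
// the real schema is not specified in this document.
struct CongestionSignals {
    dropped_frames: u64,
    queue_depth: usize,
    queued_bytes: usize,
    byte_budget: usize,
}

impl CongestionSignals {
    /// Fraction of the byte budget currently in use, in 0.0..=1.0.
    fn byte_utilization(&self) -> f64 {
        if self.byte_budget == 0 {
            0.0 // low-latency outputs have no byte budget to utilize
        } else {
            self.queued_bytes as f64 / self.byte_budget as f64
        }
    }
}
```

A controller watching `byte_utilization` trend toward 1.0 can react before drops begin, whereas `dropped_frames` only reports after the fact.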
## Multi-Input Scheduling
When a relay has multiple input sources feeding the same output, it needs a policy for which source's frame to forward next when the link is under pressure or when frames from multiple sources are ready simultaneously. This policy is the **scheduler**.

The scheduler is a separate concern from delivery mode (low-latency vs completeness) — delivery mode governs buffering and drop behaviour per output; the scheduler governs which input is served when multiple compete.

Candidate policies (not exhaustive — the design should keep the scheduler pluggable):

| Policy | Behaviour |
|---|---|
| **Strict priority** | Always prefer the highest-priority source; lower-priority sources are only forwarded when no higher-priority frame is pending |
| **Round-robin** | Cycle evenly across all active inputs — one frame from each in turn |
| **Weighted round-robin** | Each input has a weight; forwarding interleaves at the given ratio (e.g. 1:3 means one frame from source A per three from source B) |
| **Deficit round-robin** | Byte-fair rather than frame-fair variant of weighted round-robin; useful when sources have very different frame sizes |
| **Source suppression** | A congested or degraded link simply stops forwarding from a given input entirely until conditions improve |

Priority remains a property of the path (set at connection time). The scheduler uses those priorities plus runtime state (queue depths, drop rates) to make per-frame decisions.

The `relay` module should expose a scheduler interface so policies are interchangeable without touching routing logic. Which policies to implement first is an open question — see [Open Questions](../architecture.md#open-questions).
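One possible shape for such an interface, sketched in Rust with strict priority as the example policy; every name here is an assumption, not the module's actual API:

```rust
// Illustrative per-input state as the scheduler might see it.
struct InputState {
    priority: u32,      // property of the path, set at connection time
    queue_depth: usize, // runtime state
}

/// Pluggable scheduling policy: given per-input state, pick which
/// input's frame to forward next.
trait Scheduler {
    /// Index of the input to serve next, or None if all inputs are idle.
    fn pick(&mut self, inputs: &[InputState]) -> Option<usize>;
}

/// Strict priority: always serve the highest-priority input with work pending.
struct StrictPriority;

impl Scheduler for StrictPriority {
    fn pick(&mut self, inputs: &[InputState]) -> Option<usize> {
        inputs
            .iter()
            .enumerate()
            .filter(|(_, s)| s.queue_depth > 0) // only inputs with pending frames
            .max_by_key(|(_, s)| s.priority)
            .map(|(i, _)| i)
    }
}
```

`pick` takes `&mut self` so that stateful policies (round-robin cursors, deficit counters) can implement the same trait; strict priority simply ignores the mutability.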
```mermaid
graph TD
UP1[Upstream Source A] -->|encapsulated stream| RELAY[Relay]
UP2[Upstream Source B] -->|encapsulated stream| RELAY
RELAY --> LS[Low-latency Output<br>single-slot<br>drop on collision]
RELAY --> CS[Completeness Output<br>queued<br>drop on budget exceeded]
RELAY --> OB[Opaque Output<br>byte pipe<br>no frame awareness]
LS -->|encapsulated| LC[Low-latency Consumer<br>e.g. preview display]
CS -->|encapsulated| CC[Completeness Consumer<br>e.g. archiver]
OB -->|opaque| RAW[Raw Consumer<br>e.g. disk writer]
RELAY -.->|drop count<br>queue depth<br>byte utilization| CTRL[Controller node]
```