docs: split architecture.md into focused sub-documents

architecture.md is now a concise overview (~155 lines) with a
Documentation section linking to all sub-docs.

New sub-docs in docs/:
  transport.md        — wire modes, frame header, serialization, web peer
  relay.md            — delivery modes, memory model, congestion, scheduler
  codec.md            — stream metadata, format negotiation, codec backends
  xorg.md             — screen grab, viewer sink, render loop, overlays
  discovery.md        — multicast announcements, multi-site, site gateways
  node-state.md       — wanted/current state, reconciler, stats, queries
  device-resilience.md — device loss handling, stream events, audio (future)

All cross-references updated to file links. Every sub-doc links back
to architecture.md. docs/transport.md links to docs/protocol.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 23:23:54 +00:00
parent beaeea8dab
commit 4e40223478
8 changed files with 690 additions and 678 deletions

docs/relay.md Normal file

@@ -0,0 +1,94 @@
# Relay Design
See [Architecture Overview](../architecture.md).

A relay receives frames from one or more upstream sources and distributes them to any number of outputs. Each output is independently configured with a **delivery mode** that determines how it handles the tension between latency and completeness.
## Output Delivery Modes
**Low-latency mode** — minimize delay, accept loss

The output holds at most one pending frame. When a new frame arrives:
- If the slot is empty, the frame occupies it and is sent as soon as the transport allows
- If the slot is already occupied (transport not ready), the stale pending frame is dropped and replaced by the incoming frame

The consumer always receives the most recent frame the transport could deliver. Frame loss is expected and acceptable.
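A minimal sketch of the single-slot behaviour, in Rust; `Frame`, `LowLatencyOutput`, and all field names here are illustrative assumptions, not the project's real types:

```rust
// Sketch only: these types are assumptions, not the project's API.
struct Frame {
    seq: u64,
    bytes: Vec<u8>,
}

/// Single-slot output: holds at most one pending frame; a newer frame
/// replaces a stale one instead of queueing behind it.
struct LowLatencyOutput {
    pending: Option<Frame>,
    dropped: u64, // congestion signal: frames discarded before sending
}

impl LowLatencyOutput {
    fn new() -> Self {
        Self { pending: None, dropped: 0 }
    }

    /// Called for every frame arriving from upstream.
    fn on_frame(&mut self, frame: Frame) {
        // `replace` returns the previously pending frame, if any; it was
        // never sent, so count it as dropped.
        if self.pending.replace(frame).is_some() {
            self.dropped += 1;
        }
    }

    /// Called when the transport becomes ready: yields the newest frame.
    fn take_ready(&mut self) -> Option<Frame> {
        self.pending.take()
    }
}
```

Note that the slot never blocks the producer: `on_frame` always returns immediately, which is what makes loss, rather than delay, the failure mode.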
**Completeness mode** — minimize loss, accept delay

The output maintains a queue. When a new frame arrives it is enqueued. The transport drains the queue in order. When the queue is full, a drop policy is applied — either drop the oldest frame (preserve recency) or drop the newest (preserve continuity). Which policy fits depends on the consumer: an archiver may prefer continuity; a scrubber may prefer recency.
## Memory Model
Compressed frames have variable sizes (I-frames vs P-frames, quality settings, scene complexity), so fixed-slot buffers waste memory unpredictably. The preferred model is **per-frame allocation** with explicit bookkeeping.

Each allocated frame is tracked with at minimum:
- Byte size
- Sequence number or timestamp
- Which outputs still hold a reference

Limits are enforced per output independently — not as a shared pool — so a slow completeness output cannot starve a low-latency output or exhaust global memory. Per-output limits have two axes:
- **Frame count** — cap on number of queued frames
- **Byte budget** — cap on total bytes in flight for that output

Both limits should be configurable. Either limit being reached triggers the drop policy.
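The completeness queue with both limits and both drop policies can be sketched as follows; the types and names are assumptions for illustration, not the project's API:

```rust
use std::collections::VecDeque;

// Illustrative types only, not the project's API.
struct Frame {
    seq: u64,
    bytes: Vec<u8>,
}

enum DropPolicy {
    Oldest, // preserve recency (e.g. a scrubber)
    Newest, // preserve continuity (e.g. an archiver)
}

struct CompletenessOutput {
    queue: VecDeque<Frame>,
    queued_bytes: usize,
    max_frames: usize, // frame-count limit
    max_bytes: usize,  // byte budget
    policy: DropPolicy,
    dropped: u64,
}

impl CompletenessOutput {
    fn new(max_frames: usize, max_bytes: usize, policy: DropPolicy) -> Self {
        Self {
            queue: VecDeque::new(),
            queued_bytes: 0,
            max_frames,
            max_bytes,
            policy,
            dropped: 0,
        }
    }

    /// True if admitting a frame of `incoming` bytes would exceed either limit.
    fn over_budget(&self, incoming: usize) -> bool {
        self.queue.len() + 1 > self.max_frames
            || self.queued_bytes + incoming > self.max_bytes
    }

    /// Enqueue a frame, applying the drop policy when either limit is hit.
    fn on_frame(&mut self, frame: Frame) {
        match self.policy {
            // Drop the incoming frame: the queued sequence stays unbroken.
            DropPolicy::Newest => {
                if self.over_budget(frame.bytes.len()) {
                    self.dropped += 1;
                    return;
                }
            }
            // Drop from the front until the new frame fits: keep recency.
            DropPolicy::Oldest => {
                while !self.queue.is_empty() && self.over_budget(frame.bytes.len()) {
                    let old = self.queue.pop_front().expect("non-empty");
                    self.queued_bytes -= old.bytes.len();
                    self.dropped += 1;
                }
            }
        }
        self.queued_bytes += frame.bytes.len();
        self.queue.push_back(frame);
    }
}
```

One edge case worth deciding explicitly: under `Oldest`, a single frame larger than the byte budget would empty the queue and still be admitted over budget in this sketch; a real implementation should pick a rule for that.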
## Congestion: Two Sides
Congestion can arise at both ends of the relay and must be handled explicitly on each.
**Inbound congestion (upstream → relay)**

If the upstream source produces frames faster than any output can dispatch them:
- Low-latency outputs are unaffected by design — they always hold at most one frame
- Completeness outputs will see their queues grow; limits and drop policy absorb the excess

The relay never signals backpressure to the upstream. It is the upstream's concern to produce frames at a sustainable rate; the relay's concern is only to handle whatever arrives without blocking.
**Outbound congestion (relay → downstream transport)**

If the transport layer cannot accept a frame immediately:
- Low-latency mode: the pending frame is dropped when the next frame arrives; the transport sends the newest frame it can when it becomes ready
- Completeness mode: the frame stays in the queue; the queue grows until the transport catches up or limits are reached

The interaction between outbound congestion and the byte budget is important: a transport that is consistently slow will fill the completeness queue to its byte budget limit, at which point the drop policy engages. This is the intended safety valve — the budget defines the maximum acceptable latency inflation before the system reverts to dropping.
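That relationship between budget and latency can be made concrete: with a byte budget and a sustained transport throughput, a full queue delays a frame by at most roughly budget divided by throughput. A hypothetical helper (not project API) expressing this:

```rust
/// Rough upper bound on the latency a full completeness queue can add:
/// with byte budget `budget_bytes` and a transport draining at a sustained
/// `throughput_bytes_per_sec`, queued data delays a frame by at most about
/// budget / throughput seconds. Illustrative helper, not project API.
fn max_latency_inflation_secs(budget_bytes: u64, throughput_bytes_per_sec: u64) -> f64 {
    budget_bytes as f64 / throughput_bytes_per_sec as f64
}
```

For example, a 4 MB budget over a transport sustaining 1 MB/s caps the inflation at about 4 seconds; sizing the budget is effectively choosing that bound.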
## Congestion Signals
Even though the relay does not apply backpressure, it should emit **observable congestion signals** — drop counts, queue depth, byte utilization — on the control plane so that the controller can make decisions: reduce upstream quality, reroute, alert, or adjust budgets dynamically.
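A per-output report might carry a shape like the following; the struct and field names are assumptions, since the control-plane schema is not specified here:

```rust
// Hypothetical per-output congestion report for the control plane;
// the real schema is not specified in this document.
struct CongestionSignals {
    dropped_frames: u64,
    queue_depth: usize,
    queued_bytes: usize,
    byte_budget: usize,
}

impl CongestionSignals {
    /// Fraction of the byte budget currently in use, in 0.0..=1.0.
    fn byte_utilization(&self) -> f64 {
        if self.byte_budget == 0 {
            0.0 // low-latency outputs have no byte budget to utilize
        } else {
            self.queued_bytes as f64 / self.byte_budget as f64
        }
    }
}
```

A controller watching `byte_utilization` trend toward 1.0 can react before drops begin, whereas `dropped_frames` only reports after the fact.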
## Multi-Input Scheduling
When a relay has multiple input sources feeding the same output, it needs a policy for which source's frame to forward next when the link is under pressure or when frames from multiple sources are ready simultaneously. This policy is the **scheduler**.

The scheduler is a separate concern from delivery mode (low-latency vs completeness) — delivery mode governs buffering and drop behaviour per output; the scheduler governs which input is served when multiple compete.

Candidate policies (not exhaustive — the design should keep the scheduler pluggable):

| Policy | Behaviour |
|---|---|
| **Strict priority** | Always prefer the highest-priority source; lower-priority sources are only forwarded when no higher-priority frame is pending |
| **Round-robin** | Cycle evenly across all active inputs — one frame from each in turn |
| **Weighted round-robin** | Each input has a weight; forwarding interleaves at the given ratio (e.g. 1:3 means one frame from source A per three from source B) |
| **Deficit round-robin** | Byte-fair rather than frame-fair variant of weighted round-robin; useful when sources have very different frame sizes |
| **Source suppression** | A congested or degraded link simply stops forwarding from a given input entirely until conditions improve |

Priority remains a property of the path (set at connection time). The scheduler uses those priorities plus runtime state (queue depths, drop rates) to make per-frame decisions.

The `relay` module should expose a scheduler interface so policies are interchangeable without touching routing logic. Which policies to implement first is an open question — see [Open Questions](../architecture.md#open-questions).
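One possible shape for such an interface, sketched in Rust with strict priority as the example policy; every name here is an assumption, not the module's actual API:

```rust
// Illustrative per-input state as the scheduler might see it.
struct InputState {
    priority: u32,      // property of the path, set at connection time
    queue_depth: usize, // runtime state
}

/// Pluggable scheduling policy: given per-input state, pick which
/// input's frame to forward next.
trait Scheduler {
    /// Index of the input to serve next, or None if all inputs are idle.
    fn pick(&mut self, inputs: &[InputState]) -> Option<usize>;
}

/// Strict priority: always serve the highest-priority input with work pending.
struct StrictPriority;

impl Scheduler for StrictPriority {
    fn pick(&mut self, inputs: &[InputState]) -> Option<usize> {
        inputs
            .iter()
            .enumerate()
            .filter(|(_, s)| s.queue_depth > 0) // only inputs with pending frames
            .max_by_key(|(_, s)| s.priority)
            .map(|(i, _)| i)
    }
}
```

`pick` takes `&mut self` so that stateful policies (round-robin cursors, deficit counters) can implement the same trait; strict priority simply ignores the mutability.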
```mermaid
graph TD
UP1[Upstream Source A] -->|encapsulated stream| RELAY[Relay]
UP2[Upstream Source B] -->|encapsulated stream| RELAY
RELAY --> LS[Low-latency Output<br>single-slot<br>drop on collision]
RELAY --> CS[Completeness Output<br>queued<br>drop on budget exceeded]
RELAY --> OB[Opaque Output<br>byte pipe<br>no frame awareness]
LS -->|encapsulated| LC[Low-latency Consumer<br>e.g. preview display]
CS -->|encapsulated| CC[Completeness Consumer<br>e.g. archiver]
OB -->|opaque| RAW[Raw Consumer<br>e.g. disk writer]
RELAY -.->|drop count<br>queue depth<br>byte utilization| CTRL[Controller node]
```