# Relay Design

See [Architecture Overview](../architecture.md).

A relay receives frames from one or more upstream sources and distributes them to any number of outputs. Each output is independently configured with a **delivery mode** that determines how it handles the tension between latency and completeness.

## Output Delivery Modes

**Low-latency mode** — minimize delay, accept loss

The output holds at most one pending frame. When a new frame arrives:

- If the slot is empty, the frame occupies it and is sent as soon as the transport allows
- If the slot is already occupied (transport not ready), the pending frame is dropped and replaced by the incoming one — the pending frame is already stale

The consumer always receives the most recent frame the transport could deliver. Frame loss is expected and acceptable.

**Completeness mode** — minimize loss, accept delay

The output maintains a queue. When a new frame arrives it is enqueued. The transport drains the queue in order. When the queue is full, a drop policy is applied — either drop the oldest frame (preserve recency) or drop the newest (preserve continuity). Which policy fits depends on the consumer: an archiver may prefer continuity; a scrubber may prefer recency.

## Memory Model

Compressed frames have variable sizes (I-frames vs P-frames, quality settings, scene complexity), so fixed-slot buffers waste memory unpredictably. The preferred model is **per-frame allocation** with explicit bookkeeping. Each allocated frame is tracked with at minimum:

- Byte size
- Sequence number or timestamp
- Which outputs still hold a reference

Limits are enforced per output independently — not as a shared pool — so a slow completeness output cannot starve a low-latency output or exhaust global memory. Per-output limits have two axes:

- **Frame count** — cap on number of queued frames
- **Byte budget** — cap on total bytes in flight for that output

Both limits should be configurable. Reaching either limit triggers the drop policy.
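The two delivery modes and the per-output limits can be sketched as follows. This is a minimal illustration, not the implementation; the `Frame`, `LowLatencyOutput`, and `CompletenessOutput` names are hypothetical, and the transport is reduced to explicit `offer`/`take` calls.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    seq: int   # sequence number
    size: int  # byte size of the compressed payload


class LowLatencyOutput:
    """Single pending slot: a newer frame replaces (drops) a stale pending one."""

    def __init__(self):
        self.pending = None
        self.dropped = 0

    def offer(self, frame: Frame):
        if self.pending is not None:
            self.dropped += 1  # pending frame is stale; replace it
        self.pending = frame

    def take(self):
        """Called when the transport becomes ready; yields the newest frame."""
        frame, self.pending = self.pending, None
        return frame


class CompletenessOutput:
    """Bounded queue with a frame-count cap, a byte budget, and a drop policy."""

    def __init__(self, max_frames: int, max_bytes: int, drop_oldest: bool = True):
        self.queue = deque()
        self.max_frames = max_frames
        self.max_bytes = max_bytes
        self.drop_oldest = drop_oldest  # True: preserve recency; False: preserve continuity
        self.bytes_in_flight = 0
        self.dropped = 0

    def offer(self, frame: Frame):
        self.queue.append(frame)
        self.bytes_in_flight += frame.size
        # Reaching either limit (frame count or byte budget) triggers the drop policy.
        while len(self.queue) > self.max_frames or self.bytes_in_flight > self.max_bytes:
            victim = self.queue.popleft() if self.drop_oldest else self.queue.pop()
            self.bytes_in_flight -= victim.size
            self.dropped += 1

    def take(self):
        """Transport drains the queue in order."""
        if not self.queue:
            return None
        frame = self.queue.popleft()
        self.bytes_in_flight -= frame.size
        return frame
```

Note that both limits are checked in the same place: whichever is hit first engages the drop policy, which is what makes the byte budget effective for variable-size frames.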
## Congestion: Two Sides

Congestion can arise at both ends of the relay and must be handled explicitly on each.

**Inbound congestion (upstream → relay)**

If the upstream source produces frames faster than any output can dispatch them:

- Low-latency outputs are unaffected by design — they always hold at most one frame
- Completeness outputs will see their queues grow; limits and drop policy absorb the excess

The relay never signals backpressure to the upstream. It is the upstream's concern to produce frames at a sustainable rate; the relay's concern is only to handle whatever arrives without blocking.

**Outbound congestion (relay → downstream transport)**

If the transport layer cannot accept a frame immediately:

- Low-latency mode: the pending frame is dropped when the next frame arrives; the transport sends the newest frame it can when it becomes ready
- Completeness mode: the frame stays in the queue; the queue grows until the transport catches up or limits are reached

The interaction between outbound congestion and the byte budget is important: a transport that is consistently slow will fill the completeness queue to its byte budget, at which point the drop policy engages. This is the intended safety valve — the budget defines the maximum acceptable latency inflation before the system reverts to dropping.

## Congestion Signals

Even though the relay does not apply backpressure, it should emit **observable congestion signals** — drop counts, queue depth, byte utilization — on the control plane so that the controller can make decisions: reduce upstream quality, reroute, alert, or adjust budgets dynamically.

## Multi-Input Scheduling

When a relay has multiple input sources feeding the same output, it needs a policy for which source's frame to forward next when the link is under pressure or when frames from multiple sources are ready simultaneously. This policy is the **scheduler**.
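The congestion signals described above can be modelled as a small per-output snapshot published on the control plane. The field names here are illustrative, not a defined wire format:

```python
from dataclasses import dataclass


@dataclass
class CongestionSignals:
    """Per-output metrics the relay emits on the control plane (illustrative names)."""
    dropped_frames: int    # cumulative drops under either delivery mode
    queue_depth: int       # current frames queued (completeness outputs)
    bytes_in_flight: int   # current bytes queued for this output
    byte_budget: int       # configured byte budget for this output

    def byte_utilization(self) -> float:
        """Fraction of the byte budget currently in use, in [0, 1]."""
        return self.bytes_in_flight / self.byte_budget if self.byte_budget else 0.0


# A controller might react when utilization stays high for some window,
# e.g. by lowering upstream quality or raising the budget.
snap = CongestionSignals(dropped_frames=4, queue_depth=12,
                         bytes_in_flight=900_000, byte_budget=1_000_000)
```

Because the relay itself never applies backpressure, these signals are the only feedback path: all corrective action happens at the controller, not inside the data path.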
The scheduler is a separate concern from delivery mode (low-latency vs completeness) — delivery mode governs buffering and drop behaviour per output; the scheduler governs which input is served when multiple compete.

Candidate policies (not exhaustive — the design should keep the scheduler pluggable):

| Policy | Behaviour |
|---|---|
| **Strict priority** | Always prefer the highest-priority source; lower-priority sources are only forwarded when no higher-priority frame is pending |
| **Round-robin** | Cycle evenly across all active inputs — one frame from each in turn |
| **Weighted round-robin** | Each input has a weight; forwarding interleaves at the given ratio (e.g. 1:3 means one frame from source A per three from source B) |
| **Deficit round-robin** | Byte-fair rather than frame-fair variant of weighted round-robin; useful when sources have very different frame sizes |
| **Source suppression** | A congested or degraded link simply stops forwarding from a given input entirely until conditions improve |

Priority remains a property of the path (set at connection time). The scheduler uses those priorities plus runtime state (queue depths, drop rates) to make per-frame decisions. The `relay` module should expose a scheduler interface so policies are interchangeable without touching routing logic. Which policies to implement first is an open question — see [Open Questions](../architecture.md#open-questions).

```mermaid
graph TD
    UP1[Upstream Source A] -->|encapsulated stream| RELAY[Relay]
    UP2[Upstream Source B] -->|encapsulated stream| RELAY
    RELAY --> LS["Low-latency Output<br/>single-slot<br/>drop on collision"]
    RELAY --> CS["Completeness Output<br/>queued<br/>drop on budget exceeded"]
    RELAY --> OB["Opaque Output<br/>byte pipe<br/>no frame awareness"]
    LS -->|encapsulated| LC["Low-latency Consumer<br/>e.g. preview display"]
    CS -->|encapsulated| CC["Completeness Consumer<br/>e.g. archiver"]
    OB -->|opaque| RAW["Raw Consumer<br/>e.g. disk writer"]
    RELAY -.->|"drop count<br/>queue depth<br/>byte utilization"| CTRL[Controller node]
```