# Relay Design

See [Architecture Overview](../architecture.md).

A relay receives frames from one or more upstream sources and distributes them to any number of outputs. Each output is independently configured with a **delivery mode** that determines how it handles the tension between latency and completeness.

## Output Delivery Modes

**Low-latency mode** — minimize delay, accept loss

The output holds at most one pending frame. When a new frame arrives:

- If the slot is empty, the frame occupies it and is sent as soon as the transport allows
- If the slot is already occupied (transport not ready), the pending frame is replaced — it is already stale, and the newest frame should win

The consumer always receives the most recent frame the transport could deliver. Frame loss is expected and acceptable.

**Completeness mode** — minimize loss, accept delay

The output maintains a queue. When a new frame arrives it is enqueued, and the transport drains the queue in order. When the queue is full, a drop policy is applied — either drop the oldest frame (preserve recency) or drop the newest (preserve continuity). Which policy fits depends on the consumer: an archiver may prefer continuity; a scrubber may prefer recency.

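The two modes can be sketched in a few lines of Python. This is an illustrative model rather than an implementation — the names (`LowLatencyOutput`, `CompletenessOutput`, `offer`, `take`) are hypothetical:

```python
from collections import deque

class LowLatencyOutput:
    """At most one pending frame; a newer frame replaces a stale pending one."""
    def __init__(self):
        self.pending = None
        self.dropped = 0

    def offer(self, frame):
        if self.pending is not None:
            self.dropped += 1      # stale pending frame is discarded
        self.pending = frame       # newest frame always wins

    def take(self):
        """Called when the transport becomes ready."""
        frame, self.pending = self.pending, None
        return frame

class CompletenessOutput:
    """Bounded queue; on overflow a drop policy picks the victim."""
    def __init__(self, max_frames, drop_oldest=True):
        self.queue = deque()
        self.max_frames = max_frames
        self.drop_oldest = drop_oldest
        self.dropped = 0

    def offer(self, frame):
        if len(self.queue) >= self.max_frames:
            self.dropped += 1
            if self.drop_oldest:
                self.queue.popleft()   # preserve recency
            else:
                return                 # drop newest: preserve continuity
        self.queue.append(frame)

    def take(self):
        return self.queue.popleft() if self.queue else None
```

The `drop_oldest` flag is where the per-consumer policy choice from above would plug in.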
## Memory Model

Compressed frames have variable sizes (I-frames vs P-frames, quality settings, scene complexity), so fixed-slot buffers waste memory unpredictably. The preferred model is **per-frame allocation** with explicit bookkeeping.

Each allocated frame is tracked with at minimum:

- Byte size
- Sequence number or timestamp
- Which outputs still hold a reference

Limits are enforced per output independently — not as a shared pool — so a slow completeness output cannot starve a low-latency output or exhaust global memory. Per-output limits have two axes:

- **Frame count** — cap on the number of queued frames
- **Byte budget** — cap on total bytes in flight for that output

Both limits should be configurable. Either limit being reached triggers the drop policy.
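A minimal sketch of that per-output bookkeeping, under hypothetical names (`OutputLimits`, `offer`); it enforces both axes and invokes the drop policy as soon as either would be exceeded:

```python
from collections import deque

class OutputLimits:
    """Per-output bookkeeping: frame-count cap plus byte budget."""
    def __init__(self, max_frames, max_bytes):
        self.max_frames = max_frames
        self.max_bytes = max_bytes
        self.queue = deque()           # (sequence number, byte size) entries
        self.bytes_in_flight = 0
        self.dropped = 0

    def offer(self, seq, size, drop_oldest=True):
        # Either limit being reached triggers the drop policy.
        while (len(self.queue) >= self.max_frames
               or self.bytes_in_flight + size > self.max_bytes):
            if drop_oldest and self.queue:
                _, old_size = self.queue.popleft()
                self.bytes_in_flight -= old_size
                self.dropped += 1
            else:
                self.dropped += 1      # drop the incoming (newest) frame
                return False
        self.queue.append((seq, size))
        self.bytes_in_flight += size
        return True
```

Because the limits are per output, a slow consumer only ever churns its own queue; no shared pool exists to exhaust.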

## Congestion: Two Sides

Congestion can arise at both ends of the relay and must be handled explicitly on each.

**Inbound congestion (upstream → relay)**

If the upstream source produces frames faster than any output can dispatch them:

- Low-latency outputs are unaffected by design — they always hold at most one frame
- Completeness outputs will see their queues grow; limits and the drop policy absorb the excess

The relay never signals backpressure to the upstream. It is the upstream's concern to produce frames at a sustainable rate; the relay's concern is only to handle whatever arrives without blocking.

**Outbound congestion (relay → downstream transport)**

If the transport layer cannot accept a frame immediately:

- Low-latency mode: the pending frame is dropped when the next frame arrives; the transport sends the newest frame available once it becomes ready
- Completeness mode: the frame stays in the queue; the queue grows until the transport catches up or limits are reached

The interaction between outbound congestion and the byte budget is important: a transport that is consistently slow will fill the completeness queue to its byte budget limit, at which point the drop policy engages. This is the intended safety valve — the budget defines the maximum acceptable latency inflation before the system reverts to dropping.
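The safety-valve arithmetic is worth making explicit. With assumed illustrative numbers — an 8 MiB byte budget and a 2 MiB/s sustained transport — a full queue drains in budget divided by rate seconds, which is the worst-case latency inflation before dropping resumes:

```python
# Illustrative numbers, not prescriptions.
byte_budget = 8 * 1024 * 1024       # completeness-output byte budget: 8 MiB
transport_rate = 2 * 1024 * 1024    # sustained transport throughput: 2 MiB/s

# A full queue drains in budget / rate seconds; beyond this the
# drop policy engages, so it bounds latency inflation.
max_latency_inflation = byte_budget / transport_rate
print(max_latency_inflation)        # 4.0 (seconds)
```

Sizing the budget is therefore a latency decision as much as a memory one.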

## Congestion Signals

Even though the relay does not apply backpressure, it should emit **observable congestion signals** — drop counts, queue depth, byte utilization — on the control plane so that the controller can make decisions: reduce upstream quality, reroute, alert, or adjust budgets dynamically.
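One possible shape for such a signal — the field schema here is illustrative, not specified anywhere in the design:

```python
import json
from dataclasses import dataclass

@dataclass
class OutputStats:
    """Counters an output would maintain anyway (names are illustrative)."""
    dropped_frames: int
    queue_depth: int
    bytes_in_flight: int
    byte_budget: int

def congestion_signal(output_name, stats):
    """Serialize one output's congestion state for the control plane."""
    return json.dumps({
        "output": output_name,
        "dropped_frames": stats.dropped_frames,
        "queue_depth": stats.queue_depth,
        "byte_utilization": round(stats.bytes_in_flight / stats.byte_budget, 3),
    })
```

The controller consumes these to reduce upstream quality, reroute, alert, or resize budgets; the relay itself never acts on them.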

## Multi-Input Scheduling

When a relay has multiple input sources feeding the same output, it needs a policy for which source's frame to forward next when the link is under pressure or when frames from multiple sources are ready simultaneously. This policy is the **scheduler**.

The scheduler is a separate concern from delivery mode (low-latency vs completeness) — delivery mode governs buffering and drop behaviour per output; the scheduler governs which input is served when multiple compete.

Candidate policies (not exhaustive — the design should keep the scheduler pluggable):

| Policy | Behaviour |
|---|---|
| **Strict priority** | Always prefer the highest-priority source; lower-priority sources are only forwarded when no higher-priority frame is pending |
| **Round-robin** | Cycle evenly across all active inputs — one frame from each in turn |
| **Weighted round-robin** | Each input has a weight; forwarding interleaves at the given ratio (e.g. 1:3 means one frame from source A per three from source B) |
| **Deficit round-robin** | Byte-fair rather than frame-fair variant of weighted round-robin; useful when sources have very different frame sizes |
| **Source suppression** | A congested or degraded link simply stops forwarding from a given input entirely until conditions improve |


Priority remains a property of the path (set at connection time). The scheduler uses those priorities plus runtime state (queue depths, drop rates) to make per-frame decisions.

The `relay` module should expose a scheduler interface so policies are interchangeable without touching routing logic. Which policies to implement first is an open question — see [Open Questions](../architecture.md#open-questions).
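As one example of what a pluggable policy might look like, here is a sketch of deficit round-robin under assumed names (`DeficitRoundRobin`, `enqueue`, `next_frame`): each input accrues a byte quantum per round and may forward frames while it has credit, making the schedule byte-fair rather than frame-fair:

```python
from collections import deque

class DeficitRoundRobin:
    """Byte-fair input scheduling. One candidate policy; the two-method
    surface (enqueue / next_frame) is what should stay interchangeable."""
    def __init__(self, quantum=4096):
        self.quantum = quantum   # bytes of credit granted per round
        self.queues = {}         # input name -> deque of pending frame sizes
        self.deficit = {}        # input name -> accumulated byte credit
        self.active = deque()    # round-robin order of inputs with frames

    def enqueue(self, name, frame_bytes):
        if name not in self.queues:
            self.queues[name] = deque()
            self.deficit[name] = 0
        if not self.queues[name]:
            self.active.append(name)
        self.queues[name].append(frame_bytes)

    def next_frame(self):
        """Return (input name, frame size) to forward next, or None if idle."""
        if not self.active:
            return None
        while True:
            name = self.active[0]
            q = self.queues[name]
            if self.deficit[name] < q[0]:
                self.deficit[name] += self.quantum  # new round for this input
                self.active.rotate(-1)              # move it to the back
                continue
            size = q.popleft()
            self.deficit[name] -= size
            if not q:
                self.deficit[name] = 0              # idle inputs keep no credit
                self.active.popleft()
            return name, size
```

A strict-priority or weighted variant would implement the same two-method surface, which is what keeps the routing logic untouched when policies are swapped.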

```mermaid
graph TD
    UP1[Upstream Source A] -->|encapsulated stream| RELAY[Relay]
    UP2[Upstream Source B] -->|encapsulated stream| RELAY

    RELAY --> LS[Low-latency Output<br>single-slot<br>drop on collision]
    RELAY --> CS[Completeness Output<br>queued<br>drop on budget exceeded]
    RELAY --> OB[Opaque Output<br>byte pipe<br>no frame awareness]

    LS -->|encapsulated| LC[Low-latency Consumer<br>eg. preview display]
    CS -->|encapsulated| CC[Completeness Consumer<br>eg. archiver]
    OB -->|opaque| RAW[Raw Consumer<br>eg. disk writer]

    RELAY -.->|drop count<br>queue depth<br>byte utilization| CTRL[Controller node]
```