# Transport Protocol

See [Architecture Overview](../architecture.md).

Transport between nodes operates in one of two modes. The choice is per-edge and has direct implications for what the relay on that edge can do.

## Opaque Binary Stream

The transport forwards bytes as they arrive with no understanding of frame boundaries. The relay acts as a pure byte pipe.

- Zero framing overhead
- Cannot drop frames (frame boundaries are unknown)
- Cannot multiplex multiple streams (no way to distinguish them)
- Cannot do per-frame accounting (byte budgets become byte-rate estimates only)
- Low-latency output is not available, because the relay cannot tell where a frame ends and so cannot discard one

This mode is appropriate for simple point-to-point forwarding where the consumer handles all framing, and where the relay has no need for frame-level intelligence.

## Frame-Encapsulated Stream

Each message is prefixed with a small fixed-size header. This applies to both video frames and control messages, so the transport is unified.

Header fields:

| Field | Size | Purpose |
|---|---|---|
| `message_type` | 2 bytes | Determines how the payload is interpreted |
| `payload_length` | 4 bytes | Byte length of the following payload |

The header is intentionally minimal (6 bytes in total). Any node, including a relay that does not recognise a message type, can skip or forward the frame by reading the header and then exactly `payload_length` bytes, without needing to understand the payload. All message-specific identifiers (stream ID, correlation ID, etc.) live inside the payload and are handled by the relevant message type handler.

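The header layout described above can be sketched in C as follows. This is a minimal illustration rather than the project's implementation: the names `frame_header`, `frame_header_pack`, and `frame_header_unpack` are hypothetical, and both the field order (`message_type` first) and the little-endian byte order are assumptions taken from the table order and from the Protocol Serialization section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wire layout: message_type (u16) then payload_length (u32),
 * both little-endian, for a fixed 6-byte header. */
enum { FRAME_HEADER_SIZE = 6 };

struct frame_header {
    uint16_t message_type;
    uint32_t payload_length;
};

/* Write the header into buf, which must hold FRAME_HEADER_SIZE bytes.
 * Each byte is placed explicitly, so struct padding never reaches the wire. */
static void frame_header_pack(uint8_t *buf, const struct frame_header *h)
{
    assert(buf != NULL && h != NULL);
    buf[0] = (uint8_t)(h->message_type & 0xff);
    buf[1] = (uint8_t)(h->message_type >> 8);
    buf[2] = (uint8_t)(h->payload_length & 0xff);
    buf[3] = (uint8_t)((h->payload_length >> 8) & 0xff);
    buf[4] = (uint8_t)((h->payload_length >> 16) & 0xff);
    buf[5] = (uint8_t)(h->payload_length >> 24);
}

/* Read a header back from FRAME_HEADER_SIZE bytes. */
static void frame_header_unpack(const uint8_t *buf, struct frame_header *h)
{
    assert(buf != NULL && h != NULL);
    h->message_type = (uint16_t)(buf[0] | (buf[1] << 8));
    h->payload_length = (uint32_t)buf[2]
                      | ((uint32_t)buf[3] << 8)
                      | ((uint32_t)buf[4] << 16)
                      | ((uint32_t)buf[5] << 24);
}
```

Because the header has a fixed size, a receiver can always read it in full before deciding whether to parse, forward, or skip the payload.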
**Message types and their payload structure:**

| Value | Type | Payload starts with |
|---|---|---|
| `0x0001` | Video frame | `stream_id` (u16), then compressed frame data |
| `0x0002` | Control request | `request_id` (u16), then command-specific fields |
| `0x0003` | Control response | `request_id` (u16), then result-specific fields |
| `0x0004` | Stream event | `stream_id` (u16), `event_code` (u8), then event-specific fields |

Node-level messages (not tied to any stream or request) have no prefix beyond the header: the payload begins with the message-specific fields directly.

Control payloads are binary-serialized structures; see [Protocol Serialization](#protocol-serialization). Stream events carry lifecycle signals; see [Device Resilience](./device-resilience.md).

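To illustrate how a node can step over message types it does not understand, here is a hedged sketch that walks a buffer of back-to-back frames and counts the video frames belonging to one stream. The function name `count_video_frames` and the helper names are hypothetical. The sketch assumes a 6-byte little-endian header with `message_type` first, and that `stream_id` occupies the first two payload bytes of a video frame, as the table above states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE ((size_t)6)
#define MSG_VIDEO_FRAME 0x0001

static uint16_t rd_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count complete video frames for stream_id in buf[0..len).
 * Any other message type, known or unknown, is skipped using
 * payload_length alone. A truncated trailing frame ends the walk. */
static size_t count_video_frames(const uint8_t *buf, size_t len,
                                 uint16_t stream_id)
{
    size_t pos = 0;
    size_t count = 0;

    assert(buf != NULL || len == 0);
    while (len - pos >= HEADER_SIZE) {
        uint16_t type = rd_u16(buf + pos);
        uint32_t payload_len = rd_u32(buf + pos + 2);
        const uint8_t *payload = buf + pos + HEADER_SIZE;

        if (payload_len > len - pos - HEADER_SIZE)
            break; /* incomplete frame: more bytes are needed */

        if (type == MSG_VIDEO_FRAME && payload_len >= 2 &&
            rd_u16(payload) == stream_id)
            count++;

        pos += HEADER_SIZE + (size_t)payload_len;
    }
    return count;
}
```

The loop never inspects a payload unless the type is one it handles, which is the property that lets a relay forward frames it does not recognise.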
## Unified Control and Video on One Connection

By carrying control messages on the same transport as video frames, the system avoids managing separate connections per peer. A node that receives a video stream can be queried or commanded over the same socket.

This directly enables **remote device enumeration**: a connecting node can issue a control request asking what V4L2 devices the remote host exposes, and receive the list in a control response, before any video streams are established. Discovery and streaming share the same channel.

The V4L2 control operations map naturally to control request/response pairs:

| Operation | Direction |
|---|---|
| Enumerate devices | request → response |
| Get device controls (parameters, ranges, menus) | request → response |
| Get control values | request → response |
| Set control values | request → response (ack/fail) |

Control messages are low-volume and can be interleaved with the video frame stream without meaningful overhead.

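Since responses share the socket with video frames, a requester needs some way to pair each control response with the request that caused it. The sketch below shows one possible approach using the `request_id` field that both message types carry. It is an assumption about how a node might track requests, not a description of the actual implementation. The names `pending_table`, `pending_begin`, and `pending_complete`, and the limit of eight in-flight requests, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8 /* arbitrary small limit for the sketch */

struct pending_table {
    uint16_t ids[MAX_PENDING];
    int      in_use[MAX_PENDING];
    uint16_t next_id; /* wraps at 65535; collisions are ignored here */
};

/* Reserve a slot and hand back the request_id to place in the
 * control request. Returns 0 on success, -1 if the table is full. */
static int pending_begin(struct pending_table *t, uint16_t *id_out)
{
    size_t i;

    assert(t != NULL && id_out != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            t->ids[i] = t->next_id++;
            *id_out = t->ids[i];
            return 0;
        }
    }
    return -1;
}

/* Called when a control response arrives. Returns 0 if request_id
 * matches an outstanding request (and releases it), -1 otherwise. */
static int pending_complete(struct pending_table *t, uint16_t request_id)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < MAX_PENDING; i++) {
        if (t->in_use[i] && t->ids[i] == request_id) {
            t->in_use[i] = 0;
            return 0;
        }
    }
    return -1;
}
```

Video frames that arrive between a request and its response are unaffected, because they carry a different `message_type` and never reach this table.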
## Capability Implications

| Feature | Opaque | Encapsulated |
|---|---|---|
| Simple forwarding | yes | yes |
| Low-latency drop | **no** | yes |
| Per-frame byte accounting | **no** | yes |
| Multi-stream over one transport | **no** | yes |
| Sequence numbers / timestamps | **no** | yes (via extension) |
| Control / command channel | **no** | yes |
| Remote device enumeration | **no** | yes |
| Stream lifecycle signals | **no** | yes |

The most important forcing function is **low-latency relay**: to drop a pending frame when a newer one arrives, the relay must know where frames begin and end. An opaque stream cannot support this, so any edge that requires low-latency output must use encapsulation.

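The drop-on-newer behaviour can be pictured as a one-slot mailbox: a new frame overwrites whatever is still waiting, and the consumer only ever sees the most recent one. The sketch below is a simplified, single-threaded illustration under that assumption. The names `latest_slot`, `slot_put`, and `slot_take` and the tiny fixed capacity are hypothetical, and the relay's real memory model is described elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_CAPACITY 64 /* tiny on purpose, for illustration only */

struct latest_slot {
    uint8_t  data[SLOT_CAPACITY];
    size_t   len;
    int      pending; /* 1 while a frame is waiting to be sent */
    unsigned dropped; /* frames overwritten before being sent */
};

/* Store a complete frame. If an older frame is still pending it is
 * discarded, which is only possible because the relay receives whole
 * frames with known boundaries. */
static void slot_put(struct latest_slot *s, const uint8_t *frame, size_t len)
{
    assert(s != NULL && frame != NULL);
    assert(len <= SLOT_CAPACITY);
    if (s->pending)
        s->dropped++;
    memcpy(s->data, frame, len);
    s->len = len;
    s->pending = 1;
}

/* Copy the pending frame into out and clear the slot.
 * Returns the frame length, or 0 if nothing is pending. */
static size_t slot_take(struct latest_slot *s, uint8_t *out, size_t cap)
{
    size_t n;

    assert(s != NULL && out != NULL);
    if (!s->pending)
        return 0;
    assert(s->len <= cap);
    n = s->len;
    memcpy(out, s->data, n);
    s->pending = 0;
    return n;
}
```

With an opaque byte stream there is no safe point at which `slot_put` could overwrite earlier data, since any cut might land in the middle of a frame.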
Opaque streams are a valid optimization for leaf edges where the downstream consumer (e.g. an archiver writing raw bytes to disk) does its own framing, requires no relay intelligence, and has no need for remote control.

---

## Protocol Serialization

Control message payloads use a compact binary format. The wire encoding is **little-endian** throughout: all target platforms (Raspberry Pi ARM, x86 laptop) are little-endian, and little-endian is also the convention of other widely used protocols such as USB and Bluetooth LE.

### Serialization Layer

A `serial` module provides the primitive read/write operations on byte buffers:

- `put_u8`, `put_u16`, `put_u32`, `put_i32`, `put_u64`: write a value at a position in a buffer
- `get_u8`, `get_u16`, `get_u32`, `get_i32`, `get_u64`: read a value from a position in a buffer

These are pure buffer operations with no I/O. Fields are never written by casting a struct to bytes. Each field is placed explicitly, which eliminates struct padding and alignment assumptions.

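A minimal sketch of what such primitives might look like is shown below. The function names come from the list above, but the signatures (buffer, byte position, value) are an assumption, and only the 16-bit and 32-bit variants are shown. The 8-bit and 64-bit variants would follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed signatures: (buffer, byte position, value). The caller is
 * responsible for making sure the buffer is large enough. */

static void put_u16(uint8_t *buf, size_t pos, uint16_t v)
{
    assert(buf != NULL);
    buf[pos]     = (uint8_t)(v & 0xff);
    buf[pos + 1] = (uint8_t)(v >> 8);
}

static uint16_t get_u16(const uint8_t *buf, size_t pos)
{
    assert(buf != NULL);
    return (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
}

static void put_u32(uint8_t *buf, size_t pos, uint32_t v)
{
    assert(buf != NULL);
    buf[pos]     = (uint8_t)(v & 0xff);
    buf[pos + 1] = (uint8_t)((v >> 8) & 0xff);
    buf[pos + 2] = (uint8_t)((v >> 16) & 0xff);
    buf[pos + 3] = (uint8_t)(v >> 24);
}

static uint32_t get_u32(const uint8_t *buf, size_t pos)
{
    assert(buf != NULL);
    return (uint32_t)buf[pos]
         | ((uint32_t)buf[pos + 1] << 8)
         | ((uint32_t)buf[pos + 2] << 16)
         | ((uint32_t)buf[pos + 3] << 24);
}

/* Signed values travel as their two's complement bit pattern. */
static void put_i32(uint8_t *buf, size_t pos, int32_t v)
{
    put_u32(buf, pos, (uint32_t)v);
}

static int32_t get_i32(const uint8_t *buf, size_t pos)
{
    uint32_t u = get_u32(buf, pos);
    int32_t v;

    memcpy(&v, &u, sizeof v); /* reinterpret bits without a lossy cast */
    return v;
}
```

Because every byte is addressed individually, the same code produces the same wire bytes regardless of host endianness or alignment rules.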
### Protocol Layer

A `protocol` module builds on `serial` and the transport to provide typed message functions:

```c
write_v4l2_set_control(stream, id, value);
write_v4l2_get_control(stream, id);
write_v4l2_enumerate_controls(stream);
```

Each `write_*` function knows the exact wire layout of its message, packs the full frame (header + payload) into a stack buffer using `put_*`, then issues a single write to the stream. The corresponding `read_*` functions unpack responses using `get_*`.

This gives a clean two-layer separation: `serial` handles byte layout, `protocol` handles message semantics and I/O.

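The pack-then-write pattern can be sketched as below. This is an illustration under several stated assumptions, not the real `write_v4l2_set_control`: the payload layout after `request_id`, the `CMD_SET_CONTROL` value, and the `write_fn` callback that stands in for the stream are all placeholders. The authoritative layout is in the protocol reference. Only the header layout and the `0x0002` control request type come from this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    HEADER_SIZE         = 6,
    MSG_CONTROL_REQUEST = 0x0002, /* from the message type table */
    CMD_SET_CONTROL     = 0x0010, /* placeholder value, not from the spec */
    SET_CONTROL_PAYLOAD = 12      /* request_id + command + id + value */
};

/* Caller-supplied sink standing in for the transport's write call. */
typedef int (*write_fn)(void *ctx, const uint8_t *data, size_t len);

static void le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)(v >> 24);
}

/* Pack header and payload into one stack buffer, then write once. */
static int write_set_control_sketch(write_fn sink, void *ctx,
                                    uint16_t request_id,
                                    uint32_t control_id, int32_t value)
{
    uint8_t frame[HEADER_SIZE + SET_CONTROL_PAYLOAD];

    assert(sink != NULL);
    le16(frame + 0, MSG_CONTROL_REQUEST);
    le32(frame + 2, SET_CONTROL_PAYLOAD);
    le16(frame + 6, request_id);
    le16(frame + 8, CMD_SET_CONTROL);  /* hypothetical command code */
    le32(frame + 10, control_id);
    le32(frame + 14, (uint32_t)value); /* two's complement bit pattern */
    return sink(ctx, frame, sizeof frame);
}

/* In-memory sink, useful for inspecting the bytes in a test. */
struct capture {
    uint8_t data[32];
    size_t  len;
    int     calls;
};

static int capture_write(void *ctx, const uint8_t *data, size_t len)
{
    struct capture *c = ctx;

    if (len > sizeof c->data)
        return -1;
    memcpy(c->data, data, len);
    c->len = len;
    c->calls++;
    return 0;
}
```

Issuing a single write per frame keeps a message from being interleaved with another writer's bytes on the same stream, provided writers are otherwise serialized.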
### Web Interface as a Protocol Peer

The web interface (Node.js/Express) participates in the graph as a first-class protocol peer: it speaks the same binary protocol as any C node. There is no JSON bridge or special C code to serve the web layer. The boundary is:

- **Socket side**: binary protocol, framed messages, little-endian fields read with `DataView` (`dataView.getUint32(offset, true)` maps directly to `get_u32`)
- **Browser side**: HTTP/WebSocket, JSON, standard web APIs

A `protocol.mjs` module in the web layer mirrors the C `protocol` module, with the same message types and the same wire layout in a different language. This lets the web interface connect to any video node, send control requests (V4L2 enumeration, parameter get/set, device discovery), and receive structured responses.

Treating the web node as a peer also means it exercises the real protocol, which surfaces bugs that a JSON bridge would hide.

### Future: Single Source of Truth via Preprocessor

The C `protocol` module and the JavaScript `protocol.mjs` currently encode the same wire format in two languages. This duplication is a drift risk, because a change to a message layout must be applied in both places.

A future preprocessor will eliminate this. Protocol messages will be defined once in a language-agnostic schema, and the preprocessor will emit both:

- C source: `put_*`/`get_*` calls, struct definitions, `write_*`/`read_*` functions
- ESM JavaScript: `DataView`-based encode/decode, typed constants

The preprocessor is the same tool planned for generating error location codes (see `common/error`). The protocol schema becomes a single source of truth, and both the C and JavaScript implementations are derived artifacts.

---

For the full message payload schemas, see [Protocol Reference](./protocol.md).