Add protocol reference document

Covers all layers: TCP → transport frame → message type dispatcher →
payload schemas. Documents frame format (6-byte header), all message
types, STREAM_OPEN/CLOSE lifecycle, codec/pixel_format/origin tables,
stream events, and discovery announcement wire format.

Also fixes stale channel_id reference in audio section of architecture.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:03:02 +00:00
parent 197ab7d5db
commit 57e46af57b
3 changed files with 265 additions and 2 deletions


@@ -7,6 +7,7 @@ Designed to run on resource-constrained hardware (Raspberry Pi capturing raw MJP
## Documentation
- [architecture.md](architecture.md) — system design: graph model, transport protocol, relay design, codec layer, discovery, multi-site plan, device resilience, X11 integration
- [docs/protocol.md](docs/protocol.md) — wire protocol reference: frame format, all message types, payload schemas, stream lifecycle, discovery
- [planning.md](planning.md) — module build order and current status
- [conventions.md](conventions.md) — C code and project conventions


@@ -386,11 +386,11 @@ Scale and crop are applied at render time — the incoming frame is stretched or
Audio streams are not in scope for the initial implementation but the transport is designed to accommodate them without structural changes.
-The `channel_id` field already provides stream multiplexing on a single connection. A future audio channel is just another channel on an existing transport connection — no new connection type is needed. The message type table has room for an `audio_frame` type alongside `video_frame`.
+A future audio stream is just another message type on an existing transport connection — no new connection type or header field is needed. `stream_id` in the payload already handles multiplexing. The message type table has room for an `audio_frame` type alongside `video_frame`.
The main open question is codec and container: raw PCM is trivial to handle but large; compressed formats (Opus, AAC) need framing conventions. This is deferred until video is solid.
-The frame allocator, relay, and archive modules should not make assumptions that `channel_id` implies video — they operate on opaque byte payloads with a message type and length, so audio frames will pass through the same infrastructure unchanged.
+The frame allocator, relay, and archive modules should not assume that a frame implies video — they operate on opaque byte payloads with a message type and length, so audio frames will pass through the same infrastructure unchanged.
---

docs/protocol.md Normal file

@@ -0,0 +1,262 @@
# Protocol Reference
This document describes the full wire protocol used between nodes. All multi-byte integers are **little-endian**. Serialization primitives (`put_u16`, `get_u32`, etc.) are defined in [`serial.h`](../../include/serial.h).
---
## Layers
```mermaid
flowchart TD
A[TCP connection] --> B[Transport layer<br>frame header + length-prefixed payload]
B --> C[Message type dispatcher<br>routes payload to handler]
C --> D1[Video frame handler<br>stream_id + compressed data]
C --> D2[Control request/response handler<br>request_id + command fields]
C --> D3[Stream event handler<br>stream_id + event_code]
```
- **TCP** — reliable byte stream; provides ordering and delivery but no message boundaries
- **Transport layer** — adds a 6-byte header to every message, giving a length so any node can skip or forward unknown messages
- **Message type dispatcher** — reads `message_type` and routes the payload to the appropriate handler
- **Handlers** — interpret the payload according to their own schema; the transport layer has no knowledge of handler internals
---
## Transport Layer
### Frame Format
Every message on the wire is a frame:
```
+-------------------+---------------------+--------------------+
| message_type: u16 | payload_length: u32 | payload: bytes ... |
+-------------------+---------------------+--------------------+
       2 bytes              4 bytes        payload_length bytes
```
Total header size: **6 bytes**.
`payload_length` is the byte count of the payload only — it does not include the 6-byte header itself.
A node that does not recognise `message_type` can skip the frame by consuming exactly `payload_length` bytes and discarding them. This allows relays and future nodes to forward or ignore unknown message types without understanding their structure.
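As an illustration of the header layout above, here is a minimal sketch of encoding and decoding the 6-byte header by hand. The names `frame_header`, `frame_header_encode`, `frame_header_decode`, and `frame_total_size` are hypothetical and are not taken from `serial.h`; real code would presumably use the project's own serialization primitives.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_HEADER_SIZE 6u

/* Hypothetical in-memory form of the transport header. */
struct frame_header {
    uint16_t message_type;
    uint32_t payload_length; /* payload only, excludes the 6-byte header */
};

/* Write the header as 6 little-endian bytes. */
static void frame_header_encode(const struct frame_header *h,
                                uint8_t out[FRAME_HEADER_SIZE])
{
    out[0] = (uint8_t)(h->message_type & 0xFF);
    out[1] = (uint8_t)(h->message_type >> 8);
    out[2] = (uint8_t)(h->payload_length & 0xFF);
    out[3] = (uint8_t)((h->payload_length >> 8) & 0xFF);
    out[4] = (uint8_t)((h->payload_length >> 16) & 0xFF);
    out[5] = (uint8_t)((h->payload_length >> 24) & 0xFF);
}

/* Read the header back from 6 little-endian bytes. */
static void frame_header_decode(const uint8_t in[FRAME_HEADER_SIZE],
                                struct frame_header *h)
{
    h->message_type = (uint16_t)(in[0] | (in[1] << 8));
    h->payload_length = (uint32_t)in[2] | ((uint32_t)in[3] << 8) |
                        ((uint32_t)in[4] << 16) | ((uint32_t)in[5] << 24);
}

/* Bytes occupied on the wire by the whole frame (header plus payload). */
static size_t frame_total_size(const struct frame_header *h)
{
    return FRAME_HEADER_SIZE + (size_t)h->payload_length;
}
```

A relay that does not understand a frame only needs `frame_total_size` to know how many bytes to forward or discard.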
### Byte Order
All fields are little-endian. This applies to the header and to all fields within payloads.
### Connection Model
Each TCP connection carries a single logical channel between two peers. Multiple streams (video, control, events) are multiplexed on the same connection by `message_type` and by stream/request identifiers inside payloads. There is no connection-level stream identifier in the header — that information belongs to the payload.
---
## Message Types
| Value | Name | Description |
|---|---|---|
| `0x0001` | `VIDEO_FRAME` | One video frame for a stream, encoded in the format chosen at stream open |
| `0x0002` | `CONTROL_REQUEST` | Request from one node to another |
| `0x0003` | `CONTROL_RESPONSE` | Response to a prior control request |
| `0x0004` | `STREAM_EVENT` | Lifecycle signal for a stream (interrupted, resumed) |
| `0x0010` | `DISCOVERY_ANNOUNCE` | UDP multicast node announcement (see [Discovery](#discovery)) |
Values not listed are reserved. A node receiving an unknown type must skip the payload (`payload_length` bytes) and continue reading.
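The skip rule can be sketched as a small frame iterator over a receive buffer. This is an illustration under assumptions: `frame_view`, `frame_next`, and `message_type_known` are hypothetical names, and the buffer is assumed to start on a frame boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_HEADER_SIZE 6u

enum message_type {
    MSG_VIDEO_FRAME        = 0x0001,
    MSG_CONTROL_REQUEST    = 0x0002,
    MSG_CONTROL_RESPONSE   = 0x0003,
    MSG_STREAM_EVENT       = 0x0004,
    MSG_DISCOVERY_ANNOUNCE = 0x0010
};

/* Hypothetical view of one frame inside a receive buffer. */
struct frame_view {
    uint16_t message_type;
    uint32_t payload_length;
    const uint8_t *payload; /* points into the caller's buffer */
};

/* Parse one frame from buf. Returns the number of bytes the frame occupies,
 * or 0 if the buffer does not yet hold a complete frame. */
static size_t frame_next(const uint8_t *buf, size_t len, struct frame_view *out)
{
    uint32_t plen;

    if (len < FRAME_HEADER_SIZE)
        return 0;
    plen = (uint32_t)buf[2] | ((uint32_t)buf[3] << 8) |
           ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 24);
    if (plen > len - FRAME_HEADER_SIZE)
        return 0;
    out->message_type = (uint16_t)(buf[0] | (buf[1] << 8));
    out->payload_length = plen;
    out->payload = buf + FRAME_HEADER_SIZE;
    return FRAME_HEADER_SIZE + (size_t)plen;
}

/* Nonzero if the type appears in the message type table. */
static int message_type_known(uint16_t type)
{
    switch (type) {
    case MSG_VIDEO_FRAME:
    case MSG_CONTROL_REQUEST:
    case MSG_CONTROL_RESPONSE:
    case MSG_STREAM_EVENT:
    case MSG_DISCOVERY_ANNOUNCE:
        return 1;
    default:
        return 0;
    }
}
```

A reader loop would advance by the return value of `frame_next` on every frame and hand the payload to a handler only when `message_type_known` is nonzero, so unknown frames are skipped without being interpreted.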
---
## Payload Schemas
### `VIDEO_FRAME` (0x0001)
```
+----------------+---------------------------------------------+
| stream_id: u16 | frame data (encoded per STREAM_OPEN format) |
+----------------+---------------------------------------------+
     2 bytes       payload_length - 2 bytes
```
`stream_id` identifies which video stream this frame belongs to. The codec is established at stream open time (see [Stream Lifecycle](#stream-lifecycle)) and does not appear in every frame.
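A receiver can split the payload without copying. The sketch below assumes the transport layer has already delivered a complete payload; `video_frame_view` and `video_frame_parse` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zero-copy view of a VIDEO_FRAME payload. */
struct video_frame_view {
    uint16_t stream_id;
    const uint8_t *data; /* points into the caller's payload buffer */
    size_t data_len;     /* payload_length - 2 */
};

/* Returns 0 on success, -1 if the payload is too short to hold stream_id. */
static int video_frame_parse(const uint8_t *payload, size_t payload_length,
                             struct video_frame_view *out)
{
    if (payload_length < 2)
        return -1;
    out->stream_id = (uint16_t)(payload[0] | (payload[1] << 8));
    out->data = payload + 2;
    out->data_len = payload_length - 2;
    return 0;
}
```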
### `CONTROL_REQUEST` (0x0002)
```
+-----------------+--------------+-------------------------+
| request_id: u16 | command: u16 | command-specific fields |
+-----------------+--------------+-------------------------+
      2 bytes         2 bytes          remaining bytes
```
`request_id` is chosen by the sender and echoed in the matching `CONTROL_RESPONSE`. It is used to correlate responses to requests when multiple requests are in flight simultaneously.
`command` values:
| Value | Command | Description |
|---|---|---|
| `0x0001` | `STREAM_OPEN` | Open a new video stream on this connection |
| `0x0002` | `STREAM_CLOSE` | Close a video stream |
| `0x0003` | `ENUM_DEVICES` | List V4L2 devices on the remote node |
| `0x0004` | `ENUM_CONTROLS` | List V4L2 controls for a device |
| `0x0005` | `GET_CONTROL` | Get a V4L2 control value |
| `0x0006` | `SET_CONTROL` | Set a V4L2 control value |
| `0x0007` | `ENUM_MONITORS` | List X11 monitors (XRandR) on the remote node |
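Building the request payload might look like the following sketch. It treats the command-specific fields as opaque bytes supplied by the caller, because only the `STREAM_OPEN` and `STREAM_CLOSE` layouts are specified in this document. `control_request_encode` and the `CMD_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum control_command {
    CMD_STREAM_OPEN   = 0x0001,
    CMD_STREAM_CLOSE  = 0x0002,
    CMD_ENUM_DEVICES  = 0x0003,
    CMD_ENUM_CONTROLS = 0x0004,
    CMD_GET_CONTROL   = 0x0005,
    CMD_SET_CONTROL   = 0x0006,
    CMD_ENUM_MONITORS = 0x0007
};

/* Write a CONTROL_REQUEST payload (without the 6-byte transport header).
 * Returns the number of payload bytes written, or 0 if out is too small. */
static size_t control_request_encode(uint8_t *out, size_t cap,
                                     uint16_t request_id, uint16_t command,
                                     const uint8_t *fields, size_t fields_len)
{
    if (cap < 4 || fields_len > cap - 4)
        return 0;
    out[0] = (uint8_t)(request_id & 0xFF);
    out[1] = (uint8_t)(request_id >> 8);
    out[2] = (uint8_t)(command & 0xFF);
    out[3] = (uint8_t)(command >> 8);
    if (fields_len > 0)
        memcpy(out + 4, fields, fields_len);
    return 4 + fields_len;
}
```

The return value is what the sender would place in `payload_length` of the surrounding transport frame.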
### `CONTROL_RESPONSE` (0x0003)
```
+-----------------+-------------+--------------------------+
| request_id: u16 | status: u16 | response-specific fields |
+-----------------+-------------+--------------------------+
      2 bytes         2 bytes         remaining bytes
```
`request_id` matches the originating request. `status` values:
| Value | Meaning |
|---|---|
| `0x0000` | OK |
| `0x0001` | Error — generic failure |
| `0x0002` | Error — unknown command |
| `0x0003` | Error — invalid parameters |
| `0x0004` | Error — resource not found |
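On the receiving side, a response can be decoded and its status mapped to a readable label for logging. This is a sketch only: `control_response_view`, `control_response_parse`, and `control_status_name` are hypothetical names, and the fallback label for unlisted status values is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

enum control_status {
    STATUS_OK              = 0x0000,
    STATUS_ERROR           = 0x0001,
    STATUS_UNKNOWN_COMMAND = 0x0002,
    STATUS_INVALID_PARAMS  = 0x0003,
    STATUS_NOT_FOUND       = 0x0004
};

/* Hypothetical zero-copy view of a CONTROL_RESPONSE payload. */
struct control_response_view {
    uint16_t request_id;
    uint16_t status;
    const uint8_t *fields; /* response-specific bytes, may be empty */
    size_t fields_len;
};

/* Returns 0 on success, -1 if the payload is shorter than the fixed part. */
static int control_response_parse(const uint8_t *payload, size_t payload_length,
                                  struct control_response_view *out)
{
    if (payload_length < 4)
        return -1;
    out->request_id = (uint16_t)(payload[0] | (payload[1] << 8));
    out->status = (uint16_t)(payload[2] | (payload[3] << 8));
    out->fields = payload + 4;
    out->fields_len = payload_length - 4;
    return 0;
}

/* Label for a status value, following the table above. */
static const char *control_status_name(uint16_t status)
{
    switch (status) {
    case STATUS_OK:              return "OK";
    case STATUS_ERROR:           return "generic failure";
    case STATUS_UNKNOWN_COMMAND: return "unknown command";
    case STATUS_INVALID_PARAMS:  return "invalid parameters";
    case STATUS_NOT_FOUND:       return "resource not found";
    default:                     return "unlisted status"; /* assumption */
    }
}
```

The caller would compare `request_id` against its outstanding requests to find the one this response answers.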
### `STREAM_EVENT` (0x0004)
```
+----------------+----------------+-----------------------+
| stream_id: u16 | event_code: u8 | event-specific fields |
+----------------+----------------+-----------------------+
     2 bytes           1 byte          remaining bytes
```
`event_code` values:
| Value | Name | Meaning |
|---|---|---|
| `0x01` | `STREAM_INTERRUPTED` | Device lost; frames will stop. Receiver should reset parser state and discard any partial frame. |
| `0x02` | `STREAM_RESUMED` | Device recovered; a clean frame follows. |
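The receiver behaviour described in the table could be sketched as follows. `stream_rx_state` is a hypothetical per-stream record, and ignoring unlisted event codes is an assumption rather than something this document specifies.

```c
#include <stddef.h>
#include <stdint.h>

enum stream_event_code {
    STREAM_INTERRUPTED = 0x01,
    STREAM_RESUMED     = 0x02
};

/* Hypothetical receiver-side state for one stream. */
struct stream_rx_state {
    int interrupted;    /* nonzero while the device is lost */
    size_t partial_len; /* bytes of a partially assembled frame */
};

/* Apply a STREAM_EVENT payload to the state of the stream it names.
 * Returns 0 if handled, 1 if the event code is not recognised,
 * -1 if the payload is too short. */
static int stream_event_apply(const uint8_t *payload, size_t payload_length,
                              uint16_t *stream_id, struct stream_rx_state *st)
{
    if (payload_length < 3)
        return -1;
    *stream_id = (uint16_t)(payload[0] | (payload[1] << 8));
    switch (payload[2]) {
    case STREAM_INTERRUPTED:
        st->interrupted = 1;
        st->partial_len = 0; /* discard any partial frame */
        return 0;
    case STREAM_RESUMED:
        st->interrupted = 0; /* a clean frame follows */
        return 0;
    default:
        return 1; /* assumption: leave state untouched */
    }
}
```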
---
## Stream Lifecycle
Before video frames can flow, a stream must be opened. This establishes the codec and pixel format so receivers can decode frames without per-frame metadata.
### Opening a Stream (`STREAM_OPEN` request)
The sender issues a `CONTROL_REQUEST` with command `STREAM_OPEN`:
```
+-----------------+--------------+----------------+-------------+-------------------+-------------+
| request_id: u16 | command: u16 | stream_id: u16 | format: u16 | pixel_format: u16 | origin: u16 |
+-----------------+--------------+----------------+-------------+-------------------+-------------+
```
| Field | Description |
|---|---|
| `stream_id` | Chosen by the sender; identifies this stream on subsequent `VIDEO_FRAME` and `STREAM_EVENT` messages |
| `format` | Wire format of the frame data, compressed or raw (see [Codec Formats](#codec-formats)) |
| `pixel_format` | Pixel layout for raw formats; zero for compressed formats (see [Pixel Formats](#pixel-formats)) |
| `origin` | How the frames were produced; informational only, does not affect decoding (see [Origins](#origins)) |
The receiver responds with `CONTROL_RESPONSE`. On `status = OK` the stream is open and `VIDEO_FRAME` messages with that `stream_id` may follow immediately.
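Putting the layers together, a complete `STREAM_OPEN` message is 18 bytes on the wire: the 6-byte transport header followed by a 12-byte payload. The sketch below writes those bytes directly; `stream_open`, `stream_open_encode`, `put16`, and `put32` are hypothetical names and stand in for whatever `serial.h` provides.

```c
#include <stddef.h>
#include <stdint.h>

#define STREAM_OPEN_FRAME_SIZE 18u /* 6-byte header + 12-byte payload */

/* Hypothetical parameter block for a STREAM_OPEN request. */
struct stream_open {
    uint16_t request_id;
    uint16_t stream_id;
    uint16_t format;
    uint16_t pixel_format; /* zero for compressed formats */
    uint16_t origin;
};

/* Little-endian writers that return the advanced cursor. */
static uint8_t *put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
    return p + 2;
}

static uint8_t *put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
    return p + 4;
}

/* Write the full frame: transport header, then the STREAM_OPEN payload. */
static void stream_open_encode(const struct stream_open *so,
                               uint8_t out[STREAM_OPEN_FRAME_SIZE])
{
    uint8_t *p = out;

    p = put16(p, 0x0002); /* message_type: CONTROL_REQUEST */
    p = put32(p, 12);     /* payload_length */
    p = put16(p, so->request_id);
    p = put16(p, 0x0001); /* command: STREAM_OPEN */
    p = put16(p, so->stream_id);
    p = put16(p, so->format);
    p = put16(p, so->pixel_format);
    put16(p, so->origin);
}
```

For example, opening stream 7 as device-native MJPEG would use `format = 0x0001`, `pixel_format = 0`, and `origin = 0x0001`.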
### Closing a Stream (`STREAM_CLOSE` request)
```
+-----------------+--------------+----------------+
| request_id: u16 | command: u16 | stream_id: u16 |
+-----------------+--------------+----------------+
```
After a close response, no further `VIDEO_FRAME` messages should be sent for that `stream_id`.
---
## Codec Formats
Carried in the `format` field of `STREAM_OPEN`.
| Value | Format |
|---|---|
| `0x0001` | MJPEG |
| `0x0002` | H.264 |
| `0x0003` | H.265 / HEVC |
| `0x0004` | AV1 |
| `0x0005` | FFV1 |
| `0x0006` | ProRes |
| `0x0007` | QOI |
| `0x0008` | Raw pixels (requires `pixel_format`) |
| `0x0009` | Raw pixels + ZSTD (requires `pixel_format`) |
## Pixel Formats
Carried in the `pixel_format` field of `STREAM_OPEN`. For compressed formats the sender sets it to zero and the receiver ignores it.
| Value | Layout |
|---|---|
| `0x0001` | BGRA 8:8:8:8 |
| `0x0002` | RGBA 8:8:8:8 |
| `0x0003` | BGR 8:8:8 |
| `0x0004` | YUV 4:2:0 planar |
| `0x0005` | YUV 4:2:2 packed |
## Origins
Carried in the `origin` field of `STREAM_OPEN`. Informational — does not affect decoding.
| Value | Origin |
|---|---|
| `0x0001` | Device native (camera or capture card encoded it directly) |
| `0x0002` | libjpeg-turbo |
| `0x0003` | ffmpeg (libavcodec) |
| `0x0004` | ffmpeg (subprocess) |
| `0x0005` | VA-API direct |
| `0x0006` | NVENC direct |
| `0x0007` | Software (other) |
---
## Discovery
Node discovery uses UDP multicast. The wire format is a standard transport frame (same 6-byte header) sent to the multicast group rather than a TCP peer.
### Announcement Frame (`DISCOVERY_ANNOUNCE`, 0x0010)
Sent periodically by every node and immediately on startup.
```
+----------------------+--------------+---------------+---------------------+--------------+-------------+
| protocol_version: u8 | site_id: u16 | tcp_port: u16 | function_flags: u16 | name_len: u8 | name: bytes |
+----------------------+--------------+---------------+---------------------+--------------+-------------+
```
| Field | Description |
|---|---|
| `protocol_version` | Wire format version; currently `1` |
| `site_id` | Site this node belongs to; `0` = local / unassigned |
| `tcp_port` | Port where this node accepts transport connections |
| `function_flags` | Bitfield of node capabilities (see below) |
| `name_len` | Byte length of the following name string |
| `name` | Node name in `namespace:instance` form, e.g. `v4l2:microscope` |
`function_flags` bits:
| Bit | Mask | Role |
|---|---|---|
| 0 | `0x0001` | Source — produces video |
| 1 | `0x0002` | Relay — receives and distributes streams |
| 2 | `0x0004` | Sink — consumes video |
| 3 | `0x0008` | Controller — has a user-facing control interface |
A node may set multiple bits.
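Decoding an announcement payload and testing its role bits might look like the sketch below. The fixed part is 8 bytes, followed by `name_len` bytes of name. `announce`, `announce_decode`, and the `FLAG_*` macros are hypothetical names; NUL-terminating the name is a convenience of this sketch and not part of the wire format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLAG_SOURCE     0x0001
#define FLAG_RELAY      0x0002
#define FLAG_SINK       0x0004
#define FLAG_CONTROLLER 0x0008

#define ANNOUNCE_FIXED_SIZE 8u

/* Hypothetical decoded form of a DISCOVERY_ANNOUNCE payload. */
struct announce {
    uint8_t protocol_version;
    uint16_t site_id;
    uint16_t tcp_port;
    uint16_t function_flags;
    uint8_t name_len;
    char name[256]; /* up to 255 bytes plus a terminating NUL */
};

/* Returns 0 on success, -1 if the payload is truncated. */
static int announce_decode(const uint8_t *p, size_t len, struct announce *a)
{
    if (len < ANNOUNCE_FIXED_SIZE)
        return -1;
    a->protocol_version = p[0];
    a->site_id = (uint16_t)(p[1] | (p[2] << 8));
    a->tcp_port = (uint16_t)(p[3] | (p[4] << 8));
    a->function_flags = (uint16_t)(p[5] | (p[6] << 8));
    a->name_len = p[7];
    if ((size_t)a->name_len > len - ANNOUNCE_FIXED_SIZE)
        return -1;
    memcpy(a->name, p + ANNOUNCE_FIXED_SIZE, a->name_len);
    a->name[a->name_len] = '\0';
    return 0;
}
```

A controller looking for video producers would then check `a.function_flags & FLAG_SOURCE`.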
### Multicast Parameters
| Parameter | Value |
|---|---|
| Group | `224.0.0.251` |
| Port | `5353` |
| TTL | `1` (LAN only) |
No Avahi or Bonjour dependency is required: nodes open a plain UDP multicast socket directly using standard POSIX socket APIs.