Redesign stream metadata: separate format, pixel_format, and origin

format (u16): what the bytes are; drives decode, stable across encoder changes
pixel_format (u16): layout for raw formats, ignored otherwise
origin (u16): how it was produced; informational only, no effect on decode

Eliminates numerical range assumptions (0x01xx ffmpeg range). A camera outputting MJPEG natively and libjpeg-turbo encoding MJPEG are the same format with different origins; receiver handles both identically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 22:49:57 +00:00
parent 8260d456aa
commit ffaa66ab96
1 changed file with 38 additions and 13 deletions
--- a/architecture.md
+++ b/architecture.md
@@ -246,22 +246,47 @@ graph TD

 A `codec` module provides per-frame encode and decode operations for pixel data. It sits between raw pixel buffers and the transport — sources call encode before sending, sinks call decode after receiving. The relay and transport layers never need to understand pixel formats; they carry opaque payloads.

-### Codec Identification
+### Stream Metadata

-Receivers must know what format a frame payload is in. This is communicated at stream setup time via a control message that associates a `channel_id` with a codec identifier, rather than tagging every frame header. The codec identifier is a `u16`:
+Receivers must know what format a frame payload is in before they can decode it. This is communicated once at stream setup via a `stream_open` control message, rather than by tagging every frame header. Along with the `channel_id` it applies to, the message carries three fields:

-| Value | Codec |
+**`format` (u16)** — the wire format of the payload bytes; determines how the receiver decodes the frame:
+
+| Value | Format |
 |---|---|
-| `0x0001` | MJPEG — same format as V4L2 hardware-encoded output; libjpeg-turbo on the encode side |
-| `0x0002` | QOI — lossless, single-header implementation, fast; good for screen content |
-| `0x0003` | Raw pixels + ZSTD — lossless; raw BGRA/RGBA compressed with ZSTD at a low level |
-| `0x0004` | H.264 intra — single I-frames via VA-API hardware encode; high compression, GPU required |
-| `0x0100` | H.265 / HEVC — via ffmpeg (libavcodec or subprocess); hardware or software encode |
-| `0x0101` | AV1 — via ffmpeg; best compression, hardware encode on modern GPUs |
-| `0x0102` | FFV1 — via ffmpeg; lossless archival format |
-| `0x0103` | ProRes — via ffmpeg; near-lossless, post-production compatible |
+| `0x0001` | MJPEG |
+| `0x0002` | H.264 |
+| `0x0003` | H.265 / HEVC |
+| `0x0004` | AV1 |
+| `0x0005` | FFV1 |
+| `0x0006` | ProRes |
+| `0x0007` | QOI |
+| `0x0008` | Raw pixels (see `pixel_format`) |
+| `0x0009` | Raw pixels + ZSTD (see `pixel_format`) |

-V4L2 camera streams typically arrive pre-encoded as MJPEG from hardware; no encode step is needed on that path. The `0x01xx` range is reserved for ffmpeg-backed formats; the receiver cares only about the wire format, not which encoder produced it.
+**`pixel_format` (u16)** — pixel layout for raw formats; zero and ignored for compressed formats:
+
+| Value | Layout |
+|---|---|
+| `0x0001` | BGRA 8:8:8:8 |
+| `0x0002` | RGBA 8:8:8:8 |
+| `0x0003` | BGR 8:8:8 |
+| `0x0004` | YUV 4:2:0 planar |
+| `0x0005` | YUV 4:2:2 packed |
+
+**`origin` (u16)** — how the frame was produced; informational only, does not affect decoding; useful for diagnostics, quality inference, and routing decisions:
+
+| Value | Origin |
+|---|---|
+| `0x0001` | Device native — camera or capture card encoded it directly |
+| `0x0002` | libjpeg-turbo |
+| `0x0003` | ffmpeg (libavcodec) |
+| `0x0004` | ffmpeg (subprocess) |
+| `0x0005` | VA-API direct |
+| `0x0006` | NVENC direct |
+| `0x0007` | Software (other) |
+
+A V4L2 camera outputting MJPEG has `format=MJPEG, origin=device_native`. MJPEG encoded in-process by libjpeg-turbo has `format=MJPEG, origin=libjpeg-turbo`. The receiver decodes both identically; the distinction is available for logging and diagnostics without polluting the format identifier.

 ### Format Negotiation

@@ -303,7 +328,7 @@ The subprocess approach fits naturally into the completeness output path of the
 | FFV1 | Lossless, designed for archival; good compression for video content; the format used by film archives |
 | ProRes | Near-lossless, widely accepted in post-production toolchains; large files but easy to edit downstream |

-The codec identifier table uses the `0x01xx` range for ffmpeg-backed formats to distinguish them from native implementations. The actual format is fixed at stream open time via `stream_open` — the receiver does not need to know whether the encoder is libavcodec or a native implementation, only what the wire format is.
+The encoder backend is recorded in the `origin` field of `stream_open`; the receiver cares only about `format`, not how the bytes were produced. Switching from a subprocess encode to libavcodec, or from software to hardware encoding, changes only `origin` and requires no protocol or decoder change.

 ---
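
The three-field `stream_open` metadata described in this change can be sketched as follows. The commit does not specify a byte layout, so this is a minimal sketch assuming a hypothetical little-endian message of four u16 values (`channel_id`, `format`, `pixel_format`, `origin`); the enum member names, `StreamOpen`, and `decoder_key` are invented for illustration and are not the project's API. It shows the two rules from the text: `pixel_format` is zero and ignored for compressed formats, and decoder selection never looks at `origin`.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Format(IntEnum):
    # Values from the `format` table; member names are hypothetical.
    MJPEG = 0x0001
    H264 = 0x0002
    HEVC = 0x0003
    AV1 = 0x0004
    FFV1 = 0x0005
    PRORES = 0x0006
    QOI = 0x0007
    RAW = 0x0008
    RAW_ZSTD = 0x0009


class PixelFormat(IntEnum):
    NONE = 0x0000  # sent for every compressed format
    BGRA8888 = 0x0001
    RGBA8888 = 0x0002
    BGR888 = 0x0003
    YUV420_PLANAR = 0x0004
    YUV422_PACKED = 0x0005


class Origin(IntEnum):
    DEVICE_NATIVE = 0x0001
    LIBJPEG_TURBO = 0x0002
    FFMPEG_LIBAVCODEC = 0x0003
    FFMPEG_SUBPROCESS = 0x0004
    VAAPI = 0x0005
    NVENC = 0x0006
    SOFTWARE_OTHER = 0x0007


RAW_FORMATS = {Format.RAW, Format.RAW_ZSTD}

# ASSUMPTION: hypothetical wire layout, four little-endian u16 values.
_LAYOUT = struct.Struct("<4H")


def _origin(value: int) -> int:
    # origin is informational, so unknown values are kept as plain ints
    # instead of rejecting the stream.
    try:
        return Origin(value)
    except ValueError:
        return value


@dataclass(frozen=True)
class StreamOpen:
    channel_id: int
    format: Format
    pixel_format: PixelFormat
    origin: int  # an Origin member, or a raw int if the value is unknown

    def pack(self) -> bytes:
        # Compressed formats always carry pixel_format = 0 on the wire.
        pf = self.pixel_format if self.format in RAW_FORMATS else PixelFormat.NONE
        return _LAYOUT.pack(self.channel_id, self.format, pf, self.origin)

    @classmethod
    def unpack(cls, data: bytes) -> "StreamOpen":
        channel_id, fmt_value, pf_value, origin_value = _LAYOUT.unpack(data)
        fmt = Format(fmt_value)  # an unknown format cannot be decoded: raises ValueError
        # pixel_format is ignored unless the format is raw.
        pf = PixelFormat(pf_value) if fmt in RAW_FORMATS else PixelFormat.NONE
        return cls(channel_id, fmt, pf, _origin(origin_value))


def decoder_key(msg: StreamOpen) -> tuple:
    # Decoder selection uses format (plus pixel_format for raw), never origin.
    if msg.format in RAW_FORMATS:
        return (msg.format, msg.pixel_format)
    return (msg.format,)
```

With this sketch, a device-native MJPEG stream and a libjpeg-turbo MJPEG stream map to the same `decoder_key`, mirroring the V4L2 example above. Tolerating unknown `origin` values while rejecting unknown `format` values is a design choice of the sketch, following from `origin` having no effect on decoding.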