From ffaa66ab962fe5adc15ae5f45a75ee914d2cda42 Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Wed, 25 Mar 2026 22:49:57 +0000
Subject: [PATCH] Redesign stream metadata: separate format, pixel_format, and origin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

format (u16): what the bytes are — drives decode, stable across encoder changes
pixel_format (u16): layout for raw formats, ignored otherwise
origin (u16): how it was produced — informational only, no effect on decode

Eliminates numerical range assumptions (0x01xx ffmpeg range). A camera outputting MJPEG natively and libjpeg-turbo encoding MJPEG are the same format with different origins; receiver handles both identically.

Co-Authored-By: Claude Sonnet 4.6
---
 architecture.md | 51 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/architecture.md b/architecture.md
index 823fb2c..d0fa96f 100644
--- a/architecture.md
+++ b/architecture.md
@@ -246,22 +246,47 @@ graph TD
 
 A `codec` module provides per-frame encode and decode operations for pixel data. It sits between raw pixel buffers and the transport — sources call encode before sending, sinks call decode after receiving. The relay and transport layers never need to understand pixel formats; they carry opaque payloads.
 
-### Codec Identification
+### Stream Metadata
 
-Receivers must know what format a frame payload is in. This is communicated at stream setup time via a control message that associates a `channel_id` with a codec identifier, rather than tagging every frame header. The codec identifier is a `u16`:
+Receivers must know what format a frame payload is in before they can decode it. This is communicated once at stream setup via a `stream_open` control message rather than tagging every frame header. The message carries three fields:
 
-| Value | Codec |
+**`format` (u16)** — the wire format of the payload bytes; determines how the receiver decodes the frame:
+
+| Value | Format |
 |---|---|
-| `0x0001` | MJPEG — same format as V4L2 hardware-encoded output; libjpeg-turbo on the encode side |
-| `0x0002` | QOI — lossless, single-header implementation, fast; good for screen content |
-| `0x0003` | Raw pixels + ZSTD — lossless; raw BGRA/RGBA compressed with ZSTD at a low level |
-| `0x0004` | H.264 intra — single I-frames via VA-API hardware encode; high compression, GPU required |
-| `0x0100` | H.265 / HEVC — via ffmpeg (libavcodec or subprocess); hardware or software encode |
-| `0x0101` | AV1 — via ffmpeg; best compression, hardware encode on modern GPUs |
-| `0x0102` | FFV1 — via ffmpeg; lossless archival format |
-| `0x0103` | ProRes — via ffmpeg; near-lossless, post-production compatible |
+| `0x0001` | MJPEG |
+| `0x0002` | H.264 |
+| `0x0003` | H.265 / HEVC |
+| `0x0004` | AV1 |
+| `0x0005` | FFV1 |
+| `0x0006` | ProRes |
+| `0x0007` | QOI |
+| `0x0008` | Raw pixels (see `pixel_format`) |
+| `0x0009` | Raw pixels + ZSTD (see `pixel_format`) |
 
-V4L2 camera streams typically arrive pre-encoded as MJPEG from hardware; no encode step is needed on that path. The `0x01xx` range is reserved for ffmpeg-backed formats; the receiver cares only about the wire format, not which encoder produced it.
+**`pixel_format` (u16)** — pixel layout for raw formats; zero and ignored for compressed formats:
+
+| Value | Layout |
+|---|---|
+| `0x0001` | BGRA 8:8:8:8 |
+| `0x0002` | RGBA 8:8:8:8 |
+| `0x0003` | BGR 8:8:8 |
+| `0x0004` | YUV 4:2:0 planar |
+| `0x0005` | YUV 4:2:2 packed |
+
+**`origin` (u16)** — how the frame was produced; informational only, does not affect decoding; useful for diagnostics, quality inference, and routing decisions:
+
+| Value | Origin |
+|---|---|
+| `0x0001` | Device native — camera or capture card encoded it directly |
+| `0x0002` | libjpeg-turbo |
+| `0x0003` | ffmpeg (libavcodec) |
+| `0x0004` | ffmpeg (subprocess) |
+| `0x0005` | VA-API direct |
+| `0x0006` | NVENC direct |
+| `0x0007` | Software (other) |
+
+A V4L2 camera outputting MJPEG has `format=MJPEG, origin=device_native`. The same format re-encoded in process has `format=MJPEG, origin=libjpeg-turbo`. The receiver decodes both identically; the distinction is available for logging and diagnostics without polluting the format identifier.
 
 ### Format Negotiation
 
@@ -303,7 +328,7 @@ The subprocess approach fits naturally into the completeness output path of the
 | FFV1 | Lossless, designed for archival; good compression for video content; the format used by film archives |
 | ProRes | Near-lossless, widely accepted in post-production toolchains; large files but easy to edit downstream |
 
-The codec identifier table uses the `0x01xx` range for ffmpeg-backed formats to distinguish them from native implementations. The actual format is fixed at stream open time via `stream_open` — the receiver does not need to know whether the encoder is libavcodec or a native implementation, only what the wire format is.
+The encoder backend is recorded in the `origin` field of `stream_open` — the receiver cares only about `format`, not how the bytes were produced. Switching from a subprocess encode to libavcodec, or from software to hardware, requires no protocol change.
 
 ---
 
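
Note on the patch: the three-field split can be sketched in a few lines of Python to check that `origin` never influences decoder selection. This is an illustration only, not code from the repository. The numeric values are taken from the tables in the patch; the byte layout (three little-endian `u16` values back to back, with no `channel_id` or other header), the `StreamMeta` class, the enum member names, and the `decoder_key` helper are all assumptions made for the example.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Format(IntEnum):
    MJPEG = 0x0001
    H264 = 0x0002
    H265 = 0x0003
    AV1 = 0x0004
    FFV1 = 0x0005
    PRORES = 0x0006
    QOI = 0x0007
    RAW = 0x0008
    RAW_ZSTD = 0x0009


class PixelFormat(IntEnum):
    NONE = 0x0000  # zero: used for compressed formats
    BGRA8888 = 0x0001
    RGBA8888 = 0x0002
    BGR888 = 0x0003
    YUV420_PLANAR = 0x0004
    YUV422_PACKED = 0x0005


class Origin(IntEnum):
    DEVICE_NATIVE = 0x0001
    LIBJPEG_TURBO = 0x0002
    FFMPEG_LIBAVCODEC = 0x0003
    FFMPEG_SUBPROCESS = 0x0004
    VAAPI_DIRECT = 0x0005
    NVENC_DIRECT = 0x0006
    SOFTWARE_OTHER = 0x0007


RAW_FORMATS = frozenset({Format.RAW, Format.RAW_ZSTD})

# Assumption: the patch does not specify byte order or field packing.
_LAYOUT = struct.Struct("<HHH")


@dataclass(frozen=True)
class StreamMeta:
    """Hypothetical in-memory form of the stream_open metadata fields."""
    format: Format
    pixel_format: PixelFormat
    origin: Origin

    def pack(self) -> bytes:
        return _LAYOUT.pack(self.format, self.pixel_format, self.origin)

    @classmethod
    def unpack(cls, data: bytes) -> "StreamMeta":
        fmt, pix, org = _LAYOUT.unpack(data)
        return cls(Format(fmt), PixelFormat(pix), Origin(org))


def decoder_key(meta: StreamMeta) -> tuple:
    """Pick a decoder from format, plus pixel_format for raw payloads only.

    origin is deliberately not consulted.
    """
    if meta.format in RAW_FORMATS:
        return (meta.format, meta.pixel_format)
    return (meta.format, PixelFormat.NONE)


# The two MJPEG cases from the patch: same decoder, different origin.
camera = StreamMeta(Format.MJPEG, PixelFormat.NONE, Origin.DEVICE_NATIVE)
reencoded = StreamMeta(Format.MJPEG, PixelFormat.NONE, Origin.LIBJPEG_TURBO)
assert decoder_key(camera) == decoder_key(reencoded)
assert StreamMeta.unpack(camera.pack()) == camera
```

Keying the decoder lookup on `format` alone (plus `pixel_format` for the two raw formats) is what lets an encoder backend change without a protocol change: only the `origin` value in the message differs.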