Redesign stream metadata: separate format, pixel_format, and origin

format (u16): what the bytes are — drives decode, stable across encoder changes
pixel_format (u16): layout for raw formats, ignored otherwise
origin (u16): how it was produced — informational only, no effect on decode

Eliminates numerical range assumptions (0x01xx ffmpeg range). A camera outputting MJPEG natively and libjpeg-turbo encoding MJPEG are the same format with different origins; receiver handles both identically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -246,22 +246,47 @@ graph TD
 
 A `codec` module provides per-frame encode and decode operations for pixel data. It sits between raw pixel buffers and the transport — sources call encode before sending, sinks call decode after receiving. The relay and transport layers never need to understand pixel formats; they carry opaque payloads.
 
-### Codec Identification
+### Stream Metadata
 
-Receivers must know what format a frame payload is in. This is communicated at stream setup time via a control message that associates a `channel_id` with a codec identifier, rather than tagging every frame header. The codec identifier is a `u16`:
+Receivers must know what format a frame payload is in before they can decode it. This is communicated once at stream setup via a `stream_open` control message rather than tagging every frame header. The message carries three fields:
 
-| Value | Codec |
+**`format` (u16)** — the wire format of the payload bytes; determines how the receiver decodes the frame:
+
+| Value | Format |
 |---|---|
-| `0x0001` | MJPEG — same format as V4L2 hardware-encoded output; libjpeg-turbo on the encode side |
-| `0x0002` | QOI — lossless, single-header implementation, fast; good for screen content |
-| `0x0003` | Raw pixels + ZSTD — lossless; raw BGRA/RGBA compressed with ZSTD at a low level |
-| `0x0004` | H.264 intra — single I-frames via VA-API hardware encode; high compression, GPU required |
-| `0x0100` | H.265 / HEVC — via ffmpeg (libavcodec or subprocess); hardware or software encode |
-| `0x0101` | AV1 — via ffmpeg; best compression, hardware encode on modern GPUs |
-| `0x0102` | FFV1 — via ffmpeg; lossless archival format |
-| `0x0103` | ProRes — via ffmpeg; near-lossless, post-production compatible |
+| `0x0001` | MJPEG |
+| `0x0002` | H.264 |
+| `0x0003` | H.265 / HEVC |
+| `0x0004` | AV1 |
+| `0x0005` | FFV1 |
+| `0x0006` | ProRes |
+| `0x0007` | QOI |
+| `0x0008` | Raw pixels (see `pixel_format`) |
+| `0x0009` | Raw pixels + ZSTD (see `pixel_format`) |
 
-V4L2 camera streams typically arrive pre-encoded as MJPEG from hardware; no encode step is needed on that path. The `0x01xx` range is reserved for ffmpeg-backed formats; the receiver cares only about the wire format, not which encoder produced it.
+**`pixel_format` (u16)** — pixel layout for raw formats; zero and ignored for compressed formats:
+
+| Value | Layout |
+|---|---|
+| `0x0001` | BGRA 8:8:8:8 |
+| `0x0002` | RGBA 8:8:8:8 |
+| `0x0003` | BGR 8:8:8 |
+| `0x0004` | YUV 4:2:0 planar |
+| `0x0005` | YUV 4:2:2 packed |
+
+**`origin` (u16)** — how the frame was produced; informational only, does not affect decoding; useful for diagnostics, quality inference, and routing decisions:
+
+| Value | Origin |
+|---|---|
+| `0x0001` | Device native — camera or capture card encoded it directly |
+| `0x0002` | libjpeg-turbo |
+| `0x0003` | ffmpeg (libavcodec) |
+| `0x0004` | ffmpeg (subprocess) |
+| `0x0005` | VA-API direct |
+| `0x0006` | NVENC direct |
+| `0x0007` | Software (other) |
+
+A V4L2 camera outputting MJPEG has `format=MJPEG, origin=device_native`. The same format re-encoded in process has `format=MJPEG, origin=libjpeg-turbo`. The receiver decodes both identically; the distinction is available for logging and diagnostics without polluting the format identifier.
 
 ### Format Negotiation
 
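The three `stream_open` fields introduced in the hunk above could be modelled roughly as follows. This is only a sketch: the diff does not specify the byte layout of the message, so the field order, the big-endian encoding, the `u16` width of `channel_id`, and every Python name here (`StreamOpen`, `RAW_FORMATS`, `origin_name`) are assumptions made for illustration. The numeric values are taken from the tables in the diff.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Format(IntEnum):
    MJPEG = 0x0001
    H264 = 0x0002
    HEVC = 0x0003
    AV1 = 0x0004
    FFV1 = 0x0005
    PRORES = 0x0006
    QOI = 0x0007
    RAW = 0x0008
    RAW_ZSTD = 0x0009


class PixelFormat(IntEnum):
    NONE = 0x0000  # compressed formats carry zero here
    BGRA8888 = 0x0001
    RGBA8888 = 0x0002
    BGR888 = 0x0003
    YUV420_PLANAR = 0x0004
    YUV422_PACKED = 0x0005


class Origin(IntEnum):
    DEVICE_NATIVE = 0x0001
    LIBJPEG_TURBO = 0x0002
    FFMPEG_LIBAVCODEC = 0x0003
    FFMPEG_SUBPROCESS = 0x0004
    VAAPI_DIRECT = 0x0005
    NVENC_DIRECT = 0x0006
    SOFTWARE_OTHER = 0x0007


# Only these formats consult pixel_format.
RAW_FORMATS = {Format.RAW, Format.RAW_ZSTD}

# Assumed layout (not from the diff): channel_id, format, pixel_format,
# origin, each a big-endian u16.
_LAYOUT = struct.Struct(">HHHH")


@dataclass(frozen=True)
class StreamOpen:
    channel_id: int
    format: Format
    pixel_format: PixelFormat
    origin: int  # kept as a plain int: unknown origins must not break decode

    def pack(self) -> bytes:
        return _LAYOUT.pack(self.channel_id, self.format, self.pixel_format, self.origin)

    @classmethod
    def unpack(cls, data: bytes) -> "StreamOpen":
        channel_id, fmt, pix, origin = _LAYOUT.unpack(data)
        fmt = Format(fmt)  # an unknown format is a hard error
        if fmt in RAW_FORMATS:
            pix = PixelFormat(pix)
            if pix is PixelFormat.NONE:
                raise ValueError("raw format requires a pixel_format")
        else:
            pix = PixelFormat.NONE  # ignored for compressed formats
        return cls(channel_id, fmt, pix, origin)

    @property
    def origin_name(self) -> str:
        """Human-readable origin for logs; never used for decoding."""
        try:
            return Origin(self.origin).name
        except ValueError:
            return "unknown"
```

In this sketch an unrecognised `format` raises, because the receiver cannot decode the payload, while an unrecognised `origin` is tolerated and merely reported as `"unknown"`, which mirrors the "informational only" wording in the diff.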
@@ -303,7 +328,7 @@ The subprocess approach fits naturally into the completeness output path of the
 | FFV1 | Lossless, designed for archival; good compression for video content; the format used by film archives |
 | ProRes | Near-lossless, widely accepted in post-production toolchains; large files but easy to edit downstream |
 
-The codec identifier table uses the `0x01xx` range for ffmpeg-backed formats to distinguish them from native implementations. The actual format is fixed at stream open time via `stream_open` — the receiver does not need to know whether the encoder is libavcodec or a native implementation, only what the wire format is.
+The encoder backend is recorded in the `origin` field of `stream_open` — the receiver cares only about `format`, not how the bytes were produced. Switching from a subprocess encode to libavcodec, or from software to hardware, requires no protocol change.
 
 ---
 
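The replacement paragraph in this hunk says the receiver selects a decoder from `format` alone and that changing the encoder backend needs no protocol change. A minimal sketch of what that implies on the receiving side is shown below. The decoder functions are placeholder stubs rather than real codec bindings, and `DECODERS` and `decode_frame` are hypothetical names; only the numeric values come from the tables in the diff.

```python
from typing import Callable, Dict, List, Optional

FORMAT_MJPEG = 0x0001
FORMAT_HEVC = 0x0003

ORIGIN_FFMPEG_LIBAVCODEC = 0x0003
ORIGIN_FFMPEG_SUBPROCESS = 0x0004


def decode_mjpeg(payload: bytes) -> str:
    # Placeholder: a real receiver would call a JPEG decoder here.
    return f"mjpeg frame, {len(payload)} bytes"


def decode_hevc(payload: bytes) -> str:
    # Placeholder: a real receiver would call an HEVC decoder here.
    return f"hevc frame, {len(payload)} bytes"


# The dispatch table is keyed on format only; origin never appears in it.
DECODERS: Dict[int, Callable[[bytes], str]] = {
    FORMAT_MJPEG: decode_mjpeg,
    FORMAT_HEVC: decode_hevc,
}


def decode_frame(
    fmt: int,
    origin: int,
    payload: bytes,
    seen_origins: Optional[List[int]] = None,
) -> str:
    """Decode a payload by format; record origin for diagnostics only."""
    decoder = DECODERS.get(fmt)
    if decoder is None:
        raise ValueError(f"unsupported format 0x{fmt:04x}")
    if seen_origins is not None:
        seen_origins.append(origin)
    return decoder(payload)
```

Calling `decode_frame` with `ORIGIN_FFMPEG_SUBPROCESS` and then with `ORIGIN_FFMPEG_LIBAVCODEC` for the same HEVC payload takes the same code path, which is the property the commit relies on when it drops the `0x01xx` range.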