# Codec Module

See [Architecture Overview](../architecture.md).

A `codec` module provides per-frame encode and decode operations for pixel data. It sits between raw pixel buffers and the transport — sources call encode before sending, sinks call decode after receiving. The relay and transport layers never need to understand pixel formats; they carry opaque payloads.

## Stream Metadata

Receivers must know what format a frame payload is in before they can decode it. This is communicated once at stream setup via a `stream_open` control message rather than tagging every frame header. The message carries three fields:

**`format` (u16)** — the wire format of the payload bytes; determines how the receiver decodes the frame:

| Value | Format |
|---|---|
| `0x0001` | MJPEG |
| `0x0002` | H.264 |
| `0x0003` | H.265 / HEVC |
| `0x0004` | AV1 |
| `0x0005` | FFV1 |
| `0x0006` | ProRes |
| `0x0007` | QOI |
| `0x0008` | Raw pixels (see `pixel_format`) |
| `0x0009` | Raw pixels + ZSTD (see `pixel_format`) |

**`pixel_format` (u16)** — pixel layout for raw formats; zero and ignored for compressed formats:

| Value | Layout |
|---|---|
| `0x0001` | BGRA 8:8:8:8 |
| `0x0002` | RGBA 8:8:8:8 |
| `0x0003` | BGR 8:8:8 |
| `0x0004` | YUV 4:2:0 planar |
| `0x0005` | YUV 4:2:2 packed |

**`origin` (u16)** — how the frame was produced; informational only, does not affect decoding; useful for diagnostics, quality inference, and routing decisions:

| Value | Origin |
|---|---|
| `0x0001` | Device native — camera or capture card encoded it directly |
| `0x0002` | libjpeg-turbo |
| `0x0003` | ffmpeg (libavcodec) |
| `0x0004` | ffmpeg (subprocess) |
| `0x0005` | VA-API direct |
| `0x0006` | NVENC direct |
| `0x0007` | Software (other) |

A V4L2 camera outputting MJPEG has `format=MJPEG, origin=device_native`. The same format re-encoded in process has `format=MJPEG, origin=libjpeg-turbo`.
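As a concrete illustration, the three fields above could be packed into a `stream_open` payload like this. This is a minimal sketch: the byte order (big-endian), the field order, and the constant names are assumptions for the example, not part of the spec above.

```python
import struct

# Hypothetical constant names for the identifiers tabulated above.
FORMAT_MJPEG = 0x0001
FORMAT_RAW = 0x0008
FORMAT_RAW_ZSTD = 0x0009
PIXFMT_NONE = 0x0000
PIXFMT_BGRA8888 = 0x0001
ORIGIN_DEVICE_NATIVE = 0x0001
ORIGIN_LIBJPEG_TURBO = 0x0002

def pack_stream_open(fmt: int, pixel_format: int, origin: int) -> bytes:
    """Pack the three stream_open fields as three big-endian u16s.

    pixel_format is only meaningful for the raw formats; the spec text
    says it is zero and ignored for compressed formats, so enforce zero.
    """
    if fmt not in (FORMAT_RAW, FORMAT_RAW_ZSTD) and pixel_format != 0:
        raise ValueError("pixel_format must be zero for compressed formats")
    return struct.pack(">HHH", fmt, pixel_format, origin)

def unpack_stream_open(payload: bytes) -> tuple[int, int, int]:
    """Inverse of pack_stream_open: returns (format, pixel_format, origin)."""
    return struct.unpack(">HHH", payload)

# The V4L2 example from the text: camera-encoded MJPEG.
msg = pack_stream_open(FORMAT_MJPEG, PIXFMT_NONE, ORIGIN_DEVICE_NATIVE)
```

The same payload re-encoded in process would differ only in the third field (`ORIGIN_LIBJPEG_TURBO` instead of `ORIGIN_DEVICE_NATIVE`), which is exactly why the receiver can decode both identically.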
The receiver decodes both identically; the distinction is available for logging and diagnostics without polluting the format identifier.

## Format Negotiation

When a source node opens a stream channel it sends a `stream_open` control message that includes the codec identifier. The receiver can reject the codec if it has no decoder for it. This keeps codec knowledge at the edges — relay nodes are unaffected.

## libjpeg-turbo

JPEG is the natural first codec: libjpeg-turbo provides SIMD-accelerated encode on both x86 and ARM, the output format is identical to what V4L2 cameras already produce (so the ingest and archive paths treat them the same), and it is universally decodable, including in browsers via `<img>` or `createImageBitmap`. Lossy, but quality is configurable.

## QOI

QOI (Quite OK Image Format) is a strong candidate for lossless screen grabs: it encodes and decodes in a single pass with no external dependencies, performs well on content with large uniform regions (UIs, text, diagrams), and the reference implementation is a single `.h` file. Output is larger than JPEG but decode is simpler and there is no quality loss. Worth benchmarking against JPEG at high quality settings for screen content.

## ZSTD over Raw Pixels

ZSTD at compression level 1 is extremely fast and can achieve meaningful ratios on screen content (which tends to be repetitive). No pixel format conversion is needed — capture raw, compress raw, decompress raw, display raw. This avoids any colour space or chroma subsampling decisions and is entirely lossless. The downside is that even compressed, the payload is larger than JPEG for photographic content; for UI-heavy screens it can be competitive.

## VA-API (Hardware H.264 Intra)

Intra-only H.264 via VA-API gives very high compression with GPU offload. This is the most complex option to set up and introduces a GPU dependency, but may be worthwhile for high-resolution grabs over constrained links. Deferred until simpler codecs are validated.
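The ZSTD-over-raw-pixels path described above is a plain lossless round trip, which can be sketched in a few lines. This sketch substitutes zlib level 1 (which ships with Python) for zstd level 1 purely so the example is self-contained — the real path would use zstd, which is much faster at comparable ratios on this kind of content.

```python
import zlib

# zlib stands in for zstd here only to keep the example dependency-free;
# both are lossless byte compressors, so the flow is identical.
def encode_raw(pixels: bytes) -> bytes:
    return zlib.compress(pixels, level=1)

def decode_raw(payload: bytes) -> bytes:
    return zlib.decompress(payload)

# Synthetic "screen content": a 640x480 BGRA frame that is one flat colour,
# i.e. the best case for the large-uniform-region argument in the text.
frame = bytes([0x20, 0x20, 0x20, 0xFF]) * (640 * 480)
payload = encode_raw(frame)

assert decode_raw(payload) == frame   # entirely lossless, as the text claims
print(f"{len(frame)} raw bytes -> {len(payload)} compressed bytes")
```

No pixel format conversion happens anywhere in this path; the `pixel_format` field from `stream_open` tells the receiver how to display the decompressed bytes.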
## ffmpeg Backend

ffmpeg (via libavcodec or subprocess) is a practical escape hatch that gives access to a large number of codecs, container formats, and hardware acceleration paths without implementing them from scratch. It is particularly useful for archival formats where the encode latency of a more complex codec is acceptable.

**Integration options:**

- **libavcodec** — link directly against the library; programmatic API, tight integration, same process; introduces a large build dependency but gives full control over codec parameters and hardware acceleration (NVENC, VA-API, VideoToolbox, etc.)
- **subprocess pipe** — spawn `ffmpeg`, pipe raw frames to stdin, read encoded output from stdout; simpler, no build dependency, more isolated from the rest of the node process; latency is higher due to process overhead but acceptable for archival paths where real-time delivery is not required

The subprocess approach fits naturally into the completeness output path of the relay: frames arrive in order, there is no real-time drop pressure, and the ffmpeg process can be restarted independently if it crashes without taking down the node. libavcodec is the better fit for low-latency encoding (e.g. screen grab over a constrained link).

**Archival formats of interest:**

| Format | Notes |
|---|---|
| H.265 / HEVC | ~50% better compression than H.264 at same quality; NVENC and VA-API hardware support widely available |
| AV1 | Best open-format compression; software encode is slow, hardware encode (AV1 NVENC on RTX 30+) is fast |
| FFV1 | Lossless, designed for archival; good compression for video content; the format used by film archives |
| ProRes | Near-lossless, widely accepted in post-production toolchains; large files but easy to edit downstream |

The encoder backend is recorded in the `origin` field of `stream_open` — the receiver cares only about `format`, not how the bytes were produced.
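To make the subprocess-pipe option concrete, here is one way to build the `ffmpeg` command line for it. The specific codec and preset choices are illustrative, not prescribed by this document; only the flags shown (`-f rawvideo`, `-pix_fmt`, `-s`, `-r`, `-i -`, `-c:v`, `-f`) are standard ffmpeg options.

```python
def ffmpeg_encode_argv(width: int, height: int, fps: int,
                       pix_fmt: str = "bgra") -> list[str]:
    """Build an argv for the subprocess-pipe integration: raw frames are
    written to ffmpeg's stdin and an H.264 elementary stream is read back
    from its stdout. Codec and preset are example choices."""
    return [
        "ffmpeg",
        "-f", "rawvideo",              # input: headerless pixel data
        "-pix_fmt", pix_fmt,           # must match the capture pixel layout
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",                     # read frames from stdin
        "-c:v", "libx264",             # example software encoder
        "-preset", "fast",
        "-f", "h264",                  # raw H.264 elementary stream out
        "-",                           # write encoded output to stdout
    ]

argv = ffmpeg_encode_argv(1280, 720, 30)
# Usage sketch: subprocess.Popen(argv, stdin=PIPE, stdout=PIPE); write one
# width*height*4 byte frame per tick, and restart the child on exit — the
# isolation the text describes falls out of the process boundary for free.
```

Swapping `libx264` for a hardware encoder (or replacing the subprocess with libavcodec calls) changes only this argv and the `origin` field, never the wire protocol.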
Switching from a subprocess encode to libavcodec, or from software to hardware, requires no protocol change.