mikael-lovqvists-claude-agent/video-setup

mikael-lovqvists-claude-agent 4e40223478 docs: split architecture.md into focused sub-documents

architecture.md is now a concise overview (~155 lines) with a
Documentation section linking to all sub-docs.

New sub-docs in docs/:
  transport.md        — wire modes, frame header, serialization, web peer
  relay.md            — delivery modes, memory model, congestion, scheduler
  codec.md            — stream metadata, format negotiation, codec backends
  xorg.md             — screen grab, viewer sink, render loop, overlays
  discovery.md        — multicast announcements, multi-site, site gateways
  node-state.md       — wanted/current state, reconciler, stats, queries
  device-resilience.md — device loss handling, stream events, audio (future)

All cross-references updated to file links. Every sub-doc links back
to architecture.md. docs/transport.md links to docs/protocol.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-28 23:23:54 +00:00

5.8 KiB

Codec Module

See Architecture Overview.

A codec module provides per-frame encode and decode operations for pixel data. It sits between raw pixel buffers and the transport — sources call encode before sending, sinks call decode after receiving. The relay and transport layers never need to understand pixel formats; they carry opaque payloads.

Stream Metadata

Receivers must know what format a frame payload is in before they can decode it. This is communicated once at stream setup via a stream_open control message rather than tagging every frame header. The message carries three fields:

format (u16) — the wire format of the payload bytes; determines how the receiver decodes the frame:

Value	Format
`0x0001`	MJPEG
`0x0002`	H.264
`0x0003`	H.265 / HEVC
`0x0004`	AV1
`0x0005`	FFV1
`0x0006`	ProRes
`0x0007`	QOI
`0x0008`	Raw pixels (see `pixel_format`)
`0x0009`	Raw pixels + ZSTD (see `pixel_format`)

pixel_format (u16) — pixel layout for raw formats; zero and ignored for compressed formats:

Value	Layout
`0x0001`	BGRA 8:8:8:8
`0x0002`	RGBA 8:8:8:8
`0x0003`	BGR 8:8:8
`0x0004`	YUV 4:2:0 planar
`0x0005`	YUV 4:2:2 packed

origin (u16) — how the frame was produced; informational only, does not affect decoding; useful for diagnostics, quality inference, and routing decisions:

Value	Origin
`0x0001`	Device native — camera or capture card encoded it directly
`0x0002`	libjpeg-turbo
`0x0003`	ffmpeg (libavcodec)
`0x0004`	ffmpeg (subprocess)
`0x0005`	VA-API direct
`0x0006`	NVENC direct
`0x0007`	Software (other)

A V4L2 camera outputting MJPEG has format=MJPEG, origin=device_native. The same format re-encoded in process has format=MJPEG, origin=libjpeg-turbo. The receiver decodes both identically; the distinction is available for logging and diagnostics without polluting the format identifier.
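The three fields can be sketched as enumerations plus a pack/unpack pair. This is only an illustration: the numeric values come from the tables above, but the byte order (big-endian here), the field order on the wire, and the helper names `pack_stream_meta` / `unpack_stream_meta` are assumptions, not the protocol's actual definition.

```python
import struct
from enum import IntEnum


class Format(IntEnum):
    MJPEG = 0x0001
    H264 = 0x0002
    HEVC = 0x0003
    AV1 = 0x0004
    FFV1 = 0x0005
    PRORES = 0x0006
    QOI = 0x0007
    RAW = 0x0008
    RAW_ZSTD = 0x0009


class PixelFormat(IntEnum):
    NONE = 0x0000          # used for compressed formats
    BGRA8888 = 0x0001
    RGBA8888 = 0x0002
    BGR888 = 0x0003
    YUV420_PLANAR = 0x0004
    YUV422_PACKED = 0x0005


class Origin(IntEnum):
    DEVICE_NATIVE = 0x0001
    LIBJPEG_TURBO = 0x0002
    FFMPEG_LIBAVCODEC = 0x0003
    FFMPEG_SUBPROCESS = 0x0004
    VAAPI_DIRECT = 0x0005
    NVENC_DIRECT = 0x0006
    SOFTWARE_OTHER = 0x0007


# Assumption: three consecutive big-endian u16 values in this order.
_FIELDS = struct.Struct(">HHH")
RAW_FORMATS = frozenset({Format.RAW, Format.RAW_ZSTD})


def pack_stream_meta(fmt, origin, pixel_format=PixelFormat.NONE):
    """Serialize format, pixel_format and origin into 6 bytes."""
    if fmt in RAW_FORMATS:
        if pixel_format == PixelFormat.NONE:
            raise ValueError("raw formats need a pixel_format")
    else:
        # pixel_format is zero and ignored for compressed formats.
        pixel_format = PixelFormat.NONE
    return _FIELDS.pack(fmt, pixel_format, origin)


def unpack_stream_meta(data):
    """Parse the 6 bytes back into (Format, PixelFormat, Origin)."""
    fmt, pix, origin = _FIELDS.unpack(data[:_FIELDS.size])
    return Format(fmt), PixelFormat(pix), Origin(origin)
```

With these definitions, the camera case above would be `pack_stream_meta(Format.MJPEG, Origin.DEVICE_NATIVE)` and the in-process re-encode `pack_stream_meta(Format.MJPEG, Origin.LIBJPEG_TURBO)`; only the last two bytes differ.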

Format Negotiation

When a source node opens a stream channel, it sends a stream_open control message carrying the format identifier described above. A receiver that has no decoder for that format can reject the stream. This keeps codec knowledge at the edges: relay nodes are unaffected.
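A receiver-side sketch of this accept/reject decision follows. The `Receiver` class, its method names, and the boolean accept/reject result are hypothetical; the source does not specify how a rejection is signalled on the wire.

```python
from typing import Callable, Dict, Optional

Decoder = Callable[[bytes], bytes]

FORMAT_RAW = 0x0008   # value from the format table
FORMAT_AV1 = 0x0004


class Receiver:
    """Models one stream channel on a sink node (hypothetical API)."""

    def __init__(self) -> None:
        self._decoders: Dict[int, Decoder] = {}
        self._active: Optional[Decoder] = None

    def register(self, fmt: int, decoder: Decoder) -> None:
        self._decoders[fmt] = decoder

    def on_stream_open(self, fmt: int, origin: int) -> bool:
        # origin is informational only, so it plays no part in the decision.
        self._active = self._decoders.get(fmt)
        return self._active is not None   # False means "reject"

    def on_frame(self, payload: bytes) -> bytes:
        if self._active is None:
            raise RuntimeError("no stream open on this channel")
        return self._active(payload)


receiver = Receiver()
receiver.register(FORMAT_RAW, lambda payload: payload)  # raw needs no decode
```

Because the decoder is chosen once at stream open, per-frame handling is a single call and no frame header needs to carry the format.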

libjpeg-turbo

JPEG is the natural first codec: libjpeg-turbo provides SIMD-accelerated encode on both x86 and ARM, the output format is identical to what V4L2 cameras already produce (so the ingest and archive paths treat them the same), and it is universally decodable including in browsers via <img> or createImageBitmap. Lossy, but quality is configurable.

QOI

QOI (Quite OK Image Format) is a strong candidate for lossless screen grabs: it encodes and decodes in a single pass with no external dependencies, performs well on content with large uniform regions (UIs, text, diagrams), and the reference implementation is a single .h file. Output is larger than JPEG but decode is simpler and there is no quality loss. Worth benchmarking against JPEG at high quality settings for screen content.

ZSTD over Raw Pixels

ZSTD at compression level 1 is extremely fast and can achieve meaningful ratios on screen content (which tends to be repetitive). No pixel format conversion is needed — capture raw, compress raw, decompress raw, display raw. This avoids any colour space or chroma subsampling decisions and is entirely lossless. The downside is that even compressed, the payload is larger than JPEG for photographic content; for UI-heavy screens it can be competitive.
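To make the payload sizes concrete, the sketch below computes the raw frame size for each pixel_format value and runs a lossless compress/decompress round trip. Note the substitution: it uses Python's standard-library zlib at level 1 purely as a stand-in for ZSTD, since the shape of the code path (compress raw, decompress raw, compare) is the same. The function names are illustrative, and the size calculation assumes even frame dimensions for the subsampled YUV layouts.

```python
import zlib

# Bits per pixel for each pixel_format value in the table above.
BITS_PER_PIXEL = {
    0x0001: 32,  # BGRA 8:8:8:8
    0x0002: 32,  # RGBA 8:8:8:8
    0x0003: 24,  # BGR 8:8:8
    0x0004: 12,  # YUV 4:2:0 planar
    0x0005: 16,  # YUV 4:2:2 packed
}


def raw_frame_bytes(width: int, height: int, pixel_format: int) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * BITS_PER_PIXEL[pixel_format] // 8


def compress_frame(raw: bytes) -> bytes:
    # Stand-in for a ZSTD level 1 compress call.
    return zlib.compress(raw, 1)


def decompress_frame(payload: bytes, expected_size: int) -> bytes:
    raw = zlib.decompress(payload)
    if len(raw) != expected_size:
        raise ValueError("decompressed size does not match frame geometry")
    return raw


# A uniform 64x64 BGRA frame, the best case for screen-like content.
frame = bytes([0x20, 0x20, 0x20, 0xFF]) * (64 * 64)
payload = compress_frame(frame)
restored = decompress_frame(payload, raw_frame_bytes(64, 64, 0x0001))
```

For scale, `raw_frame_bytes(1920, 1080, 0x0001)` is 8,294,400 bytes, so an uncompressed 1080p BGRA stream at, say, 30 frames per second would be roughly 249 MB/s.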

VA-API (Hardware H.264 Intra)

Intra-only H.264 via VA-API gives very high compression with GPU offload. This is the most complex option to set up and introduces a GPU dependency, but may be worthwhile for high-resolution grabs over constrained links. Deferred until simpler codecs are validated.

ffmpeg Backend

ffmpeg (via libavcodec or subprocess) is a practical escape hatch that gives access to a large number of codecs, container formats, and hardware acceleration paths without implementing them from scratch. It is particularly useful for archival formats where the encode latency of a more complex codec is acceptable.

Integration options:

libavcodec — link directly against the library; programmatic API, tight integration, same process; introduces a large build dependency but gives full control over codec parameters and hardware acceleration (NVENC, VA-API, VideoToolbox, etc.)
subprocess pipe — spawn ffmpeg, pipe raw frames to stdin, read encoded output from stdout; simpler, no build dependency, more isolated from the rest of the node process; latency is higher due to process overhead but acceptable for archival paths where real-time delivery is not required

The subprocess approach fits naturally into the completeness output path of the relay: frames arrive in order, there is no real-time drop pressure, and the ffmpeg process can be restarted independently if it crashes without taking down the node. libavcodec is the better fit for low-latency encoding (e.g. screen grab over a constrained link).
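A minimal sketch of the subprocess pipe, assuming an `ffmpeg` binary on the PATH and raw BGRA input. The helper names are hypothetical, and the choice of FFV1 in a Matroska container is only an example; the command-line options are standard ffmpeg options for the rawvideo demuxer.

```python
import subprocess
from typing import List


def build_ffmpeg_argv(width: int, height: int, fps: int,
                      pix_fmt: str = "bgra",
                      codec: str = "ffv1",
                      container: str = "matroska") -> List[str]:
    """Build an ffmpeg command that reads raw frames on stdin
    and writes the encoded stream to stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "rawvideo",                    # input has no container
        "-pixel_format", pix_fmt,
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", "pipe:0",                      # raw frames from stdin
        "-c:v", codec,
        "-f", container,
        "pipe:1",                            # encoded output to stdout
    ]


def spawn_encoder(argv: List[str]) -> subprocess.Popen:
    # The child is a separate process, so it can crash or be restarted
    # without taking the node process down with it.
    return subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)


argv = build_ffmpeg_argv(1920, 1080, 30)
```

A real node would write frames to `proc.stdin` and drain `proc.stdout` from a separate thread or event loop; doing both from one blocking loop can deadlock once the pipe buffers fill.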

Archival formats of interest:

Format	Notes
H.265 / HEVC	Commonly cited as needing up to ~50% less bitrate than H.264 at comparable quality; NVENC and VA-API hardware support widely available
AV1	Best open-format compression; software encode is slow, hardware encode (e.g. NVENC on RTX 40-series and newer GPUs) is fast
FFV1	Lossless, designed for archival; good compression for video content; widely used by film and audiovisual archives
ProRes	Near-lossless, widely accepted in post-production toolchains; large files but easy to edit downstream

The encoder backend is recorded in the origin field of stream_open — the receiver cares only about format, not how the bytes were produced. Switching from a subprocess encode to libavcodec, or from software to hardware, requires no protocol change.
