Add xorg module plan and audio forward-compatibility note

xorg module: XRandR geometry queries, screen grab source (XShmGetImage),
frame viewer sink (XShmPutImage, fullscreen per monitor). All exposed as
standard source/sink node roles on the existing transport.

Audio: deferred but transport is already compatible — channel_id mux,
audio_frame message type slot reserved, relay/allocator are payload-agnostic.

Also marks serial as done in planning.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 22:42:19 +00:00
parent c58c211fee
commit 5cea34caf5
2 changed files with 52 additions and 4 deletions


@@ -35,9 +35,9 @@ Node types:
| Type | Role |
|---|---|
| **Source** | Produces video — V4L2 camera, file, test signal |
| **Source** | Produces video — V4L2 camera, screen grab, file, test signal |
| **Relay** | Receives one or more input streams and distributes to one or more outputs, each with its own delivery mode and buffer; never blocks upstream |
| **Sink** | Consumes video — display, archiver, encoder output |
| **Sink** | Consumes video — display window, archiver, encoder output |
A relay with multiple inputs is what would traditionally be called a mux — it combines streams from several sources and forwards them, possibly over a single transport. The dispatch and buffering logic is the same regardless of input count.
@@ -242,6 +242,53 @@ graph TD
---
## X11 / Xorg Integration
An `xorg` module provides three capabilities that complement the V4L2 camera pipeline: screen geometry queries, a screen grab source, and an X11-based video feed viewer. The grab source and the viewer operate as first-class node roles.
### Screen Geometry Queries (XRandR)
Using the XRandR extension, the module can enumerate connected outputs and retrieve their geometry — resolution, position within the desktop coordinate space, physical size, and refresh rate. This is useful for:
- **Routing decisions**: knowing the resolution of the target display before deciding how to scale or crop an incoming stream
- **Screen grab source**: determining the exact rectangle to capture for a given monitor
- **Multi-monitor layouts**: placing viewer windows correctly in a multi-head setup without guessing offsets
Queries are exposed as control request/response pairs on the standard transport, so a remote node can ask "what monitors does this machine have?" and receive structured geometry data without any X11 code on the asking side.
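As a rough illustration of what "structured geometry data" could look like on the wire, the sketch below packs one monitor's geometry into a fixed-size little-endian record. The struct `monitor_geom`, its field order, the millihertz unit, and the helpers `put_u32_le`/`get_u32_le` are all assumptions made for this example; the actual layout would be defined by the `protocol` module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry record for one XRandR output (illustrative only). */
struct monitor_geom {
    int32_t  x;           /* position within the desktop coordinate space */
    int32_t  y;
    uint32_t width;       /* resolution in pixels */
    uint32_t height;
    uint32_t mm_width;    /* physical size in millimetres */
    uint32_t mm_height;
    uint32_t refresh_mhz; /* refresh rate in millihertz (assumed unit) */
};

enum { MONITOR_GEOM_WIRE_SIZE = 28 }; /* 7 fields x 4 bytes */

/* Write a 32-bit value in little-endian order; returns the next offset. */
static size_t put_u32_le(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off + 0] = (uint8_t)(v & 0xff);
    buf[off + 1] = (uint8_t)((v >> 8) & 0xff);
    buf[off + 2] = (uint8_t)((v >> 16) & 0xff);
    buf[off + 3] = (uint8_t)((v >> 24) & 0xff);
    return off + 4;
}

/* Read a 32-bit little-endian value at the given offset. */
uint32_t get_u32_le(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off] |
           ((uint32_t)buf[off + 1] << 8) |
           ((uint32_t)buf[off + 2] << 16) |
           ((uint32_t)buf[off + 3] << 24);
}

/* Encode one record; returns bytes written, or 0 if buf is too small. */
size_t monitor_geom_encode(const struct monitor_geom *g, uint8_t *buf, size_t cap)
{
    size_t off = 0;

    assert(g && buf);
    if (cap < MONITOR_GEOM_WIRE_SIZE)
        return 0;
    off = put_u32_le(buf, off, (uint32_t)g->x);
    off = put_u32_le(buf, off, (uint32_t)g->y);
    off = put_u32_le(buf, off, g->width);
    off = put_u32_le(buf, off, g->height);
    off = put_u32_le(buf, off, g->mm_width);
    off = put_u32_le(buf, off, g->mm_height);
    off = put_u32_le(buf, off, g->refresh_mhz);
    return off;
}
```

A fixed-size record keeps the asking side free of X11 types: it only needs to read seven integers per monitor.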
### Screen Grab Source
The module can act as a video source by capturing the contents of a screen region with `XShmGetImage` (MIT-SHM extension). The X server writes the pixels directly into a shared memory segment, which avoids sending image data over the X connection; this requires the module to run on the same machine as the X server. The captured region is a configurable rectangle: typically one full monitor as reported by XRandR, but any sub-region works.
The grab loop produces frames at a configured rate, encapsulates them, and feeds them into the transport like any other video source. Combined with geometry queries, a remote controller can enumerate monitors, select one, and start a screen grab stream without manual coordinate configuration.
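One way to hold a grab loop at a steady rate is to sleep until absolute deadlines instead of sleeping for a fixed interval, so that per-frame capture time does not accumulate as drift. The sketch below covers only the deadline arithmetic. `frame_period_ns`, `timespec_add_ns`, and `grab_frame` are hypothetical names, and the loop shown in the comment assumes POSIX `clock_nanosleep`.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Frame period in nanoseconds for a rate given in frames per second. */
int64_t frame_period_ns(uint32_t fps)
{
    assert(fps > 0);
    return (int64_t)NSEC_PER_SEC / fps;
}

/* Advance a timespec by a non-negative number of nanoseconds. */
struct timespec timespec_add_ns(struct timespec t, int64_t ns)
{
    assert(ns >= 0);
    t.tv_sec  += (time_t)(ns / NSEC_PER_SEC);
    t.tv_nsec += (long)(ns % NSEC_PER_SEC);
    if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_nsec -= NSEC_PER_SEC;
        t.tv_sec  += 1;
    }
    return t;
}

/*
 * Intended use inside a grab loop (not executed here):
 *
 *   struct timespec next;
 *   clock_gettime(CLOCK_MONOTONIC, &next);
 *   for (;;) {
 *       grab_frame();   // hypothetical: capture, encapsulate, send
 *       next = timespec_add_ns(next, frame_period_ns(30));
 *       clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
 *   }
 */
```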
### Frame Viewer Sink
The module can act as a video sink by creating an X11 window and rendering the latest received frame into it. The window:
- Can be placed on a specific monitor using XRandR geometry
- Can be made fullscreen on a chosen output
- Renders using `XShmPutImage` (MIT-SHM) when the sink runs on the same machine as the X server, or `XPutImage` otherwise
- Displays the most recently received frame — it is driven by the low-latency output mode of the relay feeding it; it never buffers for completeness
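The "most recently received frame" behaviour can be pictured as a single-slot mailbox: each arriving frame replaces whatever is waiting, and the renderer takes the slot's contents when it is ready to draw. The following is a single-threaded sketch with hypothetical names (`latest_slot`, `slot_publish`, `slot_take`); a real sink would need synchronization between the receive and render paths and would release replaced frames through the frame allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a received frame (hypothetical layout). */
struct frame {
    uint64_t       seq;   /* arrival sequence number */
    const uint8_t *data;  /* pixel data, not owned by the slot */
    size_t         len;
};

/* One-deep mailbox: holds at most the newest undisplayed frame. */
struct latest_slot {
    struct frame current;
    int          has_frame;
    uint64_t     dropped;  /* frames overwritten before being drawn */
};

/* Store a new frame, replacing any frame that was never taken. */
void slot_publish(struct latest_slot *s, struct frame f)
{
    assert(s);
    if (s->has_frame)
        s->dropped++;
    s->current = f;
    s->has_frame = 1;
}

/* Take the newest frame if one is waiting; returns 1 on success, 0 if empty. */
int slot_take(struct latest_slot *s, struct frame *out)
{
    assert(s && out);
    if (!s->has_frame)
        return 0;
    *out = s->current;
    s->has_frame = 0;
    return 1;
}
```

If three frames arrive between two redraws, the renderer sees only the third, and the first two are counted as dropped.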
This makes it the display-side counterpart of the V4L2 capture source: the same frame that was grabbed from a camera on a Pi can be viewed on any machine in the network that runs an xorg sink node, with the relay handling the path and delivery mode between them.
Scale and crop are applied at render time — the incoming frame is stretched or cropped to fill the window. This allows a high-resolution screen grab from one machine to be displayed scaled-down on a smaller physical monitor elsewhere in the network.
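For the crop case, the arithmetic amounts to finding the largest source rectangle that has the window's aspect ratio, centred in the frame; that rectangle is then scaled to the window. The sketch below shows one possible policy, not necessarily the one the module will adopt. `crop_to_fill` and `struct rect` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct rect {
    uint32_t x, y, w, h;
};

/*
 * Return the centred sub-rectangle of a src_w x src_h frame whose aspect
 * ratio matches a win_w x win_h window (integer arithmetic, rounds down).
 */
struct rect crop_to_fill(uint32_t src_w, uint32_t src_h,
                         uint32_t win_w, uint32_t win_h)
{
    struct rect r = { 0, 0, src_w, src_h };
    uint64_t lhs, rhs;

    assert(src_w && src_h && win_w && win_h);

    /* Compare src_w/src_h with win_w/win_h by cross-multiplying. */
    lhs = (uint64_t)src_w * win_h;
    rhs = (uint64_t)win_w * src_h;

    if (lhs > rhs) {
        /* Source is wider than the window: trim left and right. */
        r.w = (uint32_t)(rhs / win_h);
        r.x = (src_w - r.w) / 2;
    } else if (lhs < rhs) {
        /* Source is taller than the window: trim top and bottom. */
        r.h = (uint32_t)(lhs / win_w);
        r.y = (src_h - r.h) / 2;
    }
    return r;
}
```

For example, a 3840x2160 grab shown in a 1280x1024 window keeps a 2700x2160 region starting at x = 570.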
---
## Audio (Future)
Audio streams are not in scope for the initial implementation but the transport is designed to accommodate them without structural changes.
The `channel_id` field already provides stream multiplexing on a single connection. A future audio channel is just another channel on an existing transport connection — no new connection type is needed. The message type table has room for an `audio_frame` type alongside `video_frame`.
The main open question is codec and container: raw PCM is trivial to handle but large; compressed formats (Opus, AAC) need framing conventions. This is deferred until video is solid.
The frame allocator, relay, and archive modules should not assume that `channel_id` implies video. They operate on opaque byte payloads tagged with a message type and length, so audio frames will pass through the same infrastructure unchanged.
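The payload-agnostic property can be illustrated with an accounting routine that reads only the header and never branches on the message type or looks inside the payload. The header layout used here (`msg_type`, `channel_id`, `length`, each little-endian), the type codes, and the names `hdr_parse` and `relay_account` are invented for the example; the real frame header is defined by the `transport` module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; the real message type table may differ. */
enum { MSG_VIDEO_FRAME = 1, MSG_AUDIO_FRAME = 2 };

/* Assumed header: u16 msg_type, u16 channel_id, u32 length (8 bytes). */
enum { HDR_SIZE = 8, MAX_CHANNELS = 16 };

struct msg_hdr {
    uint16_t msg_type;
    uint16_t channel_id;
    uint32_t length;   /* payload bytes following the header */
};

struct relay_stats {
    uint32_t msgs[MAX_CHANNELS];
    uint64_t bytes[MAX_CHANNELS];
};

/* Parse the header; returns 0 on success, -1 if the buffer is too short. */
int hdr_parse(const uint8_t *buf, size_t n, struct msg_hdr *h)
{
    assert(buf && h);
    if (n < HDR_SIZE)
        return -1;
    h->msg_type   = (uint16_t)(buf[0] | (buf[1] << 8));
    h->channel_id = (uint16_t)(buf[2] | (buf[3] << 8));
    h->length     = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                    ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    return (n - HDR_SIZE >= h->length) ? 0 : -1;
}

/* Account one message per channel; msg_type is deliberately not inspected. */
int relay_account(struct relay_stats *st, const uint8_t *buf, size_t n)
{
    struct msg_hdr h;

    assert(st);
    if (hdr_parse(buf, n, &h) != 0 || h.channel_id >= MAX_CHANNELS)
        return -1;
    st->msgs[h.channel_id]  += 1;
    st->bytes[h.channel_id] += h.length;
    return 0;
}
```

Because the routine keys only on `channel_id` and `length`, a message carrying the assumed `MSG_AUDIO_FRAME` code is handled exactly like a video frame.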
---
## Device Resilience
Nodes that read from hardware devices (V4L2 cameras, media devices) must handle transient device loss — a USB camera that disconnects and reconnects, a device node that briefly disappears during a mode switch, or a stream that errors out and can be retried. This is not an early implementation concern but has structural implications that should be respected from the start.


@@ -45,14 +45,15 @@ Modules are listed in intended build order. Each depends only on modules above i
| 1 | `common` | done | Error types, base definitions — no dependencies |
| 2 | `media_ctrl` | done | Media Controller API — device and topology enumeration, pad format config |
| 3 | `v4l2_ctrl` | done | V4L2 controls — enumerate, get, set camera parameters |
| 4 | `serial` | not started | `put`/`get` primitives for little-endian binary serialization into byte buffers |
| 4 | `serial` | done | `put`/`get` primitives for little-endian binary serialization into byte buffers |
| 5 | `transport` | not started | Encapsulated transport — frame header, TCP stream abstraction, single-write send |
| 6 | `protocol` | not started | Typed `write_*`/`read_*` functions for all message types; builds on serial + transport |
| 7 | `frame_alloc` | not started | Per-frame allocation with bookkeeping (byte budget, ref counting) |
| 8 | `relay` | not started | Input dispatch to output queues (low-latency and completeness modes) |
| 9 | `ingest` | not started | MJPEG frame parser (two-pass EOI state machine, opaque stream → discrete frames) |
| 10 | `archive` | not started | Write frames to disk, control messages to binary log |
| 11 | `web node` | not started | Node.js/Express peer — speaks binary protocol on socket side, HTTP/WebSocket to browser; `protocol.mjs` mirrors C protocol module |
| 11 | `xorg` | not started | X11 screen geometry queries (XRandR), screen grab source, frame viewer sink — see architecture.md |
| 12 | `web node` | not started | Node.js/Express peer — speaks binary protocol on socket side, HTTP/WebSocket to browser; `protocol.mjs` mirrors C protocol module |
---