From 14926f5421b2123b620abd4ca859a1eb0e6dcf6d Mon Sep 17 00:00:00 2001
From: mikael-lovqvists-claude-agent
Date: Sat, 28 Mar 2026 20:35:49 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20redesign=20frame=20viewer=20sink=20?=
 =?UTF-8?q?=E2=80=94=20GLFW+OpenGL=20now,=20Vulkan=20as=20future=20alt?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace XShmPutImage approach with GLFW+OpenGL as the initial renderer.
Documents the two-renderer plan: GLFW handles window/input for both;
only the rendering backend differs. Notes that both renderers should
conform to the same internal interface for swappability.

Adds input event forwarding (keyboard/mouse → INPUT_EVENT upstream)
as a first-class capability of the viewer sink.

Co-Authored-By: Claude Sonnet 4.6
---
 architecture.md | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/architecture.md b/architecture.md
index bb1b584..7d9ed9e 100644
--- a/architecture.md
+++ b/architecture.md
@@ -389,16 +389,38 @@ The grab loop produces frames at a configured rate, encapsulates them, and feeds
 
 ### Frame Viewer Sink
 
-The module can act as a video sink by creating an X11 window and rendering the latest received frame into it. The window:
+The module can act as a video sink by creating a window and rendering the latest received frame into it. The window:
 
-- Can be placed on a specific monitor using XRandR geometry
+- Geometry (size and monitor placement) is specified at stream open time, using XRandR data when targeting a specific output
 - Can be made fullscreen on a chosen output
-- Renders using `XShmPutImage` (MIT-SHM) when the source is local, or `XPutImage` otherwise
-- Displays the most recently received frame — it is driven by the low-latency output mode of the relay feeding it; it never buffers for completeness
+- Displays the most recently received frame — driven by the low-latency output mode of the relay; never buffers for completeness
+- Forwards keyboard and mouse events back upstream as `INPUT_EVENT` protocol messages, enabling remote control use cases
 
-This makes it the display-side counterpart of the V4L2 capture source: the same frame that was grabbed from a camera on a Pi can be viewed on any machine in the network that runs an xorg sink node, with the relay handling the path and delivery mode between them.
+Scale and crop are applied in the renderer — the incoming frame is stretched or letterboxed to fill the window. This allows a high-resolution source (Pi camera, screen grab) to be displayed scaled-down on a different machine.
 
-Scale and crop are applied at render time — the incoming frame is stretched or cropped to fill the window. This allows a high-resolution screen grab from one machine to be displayed scaled-down on a smaller physical monitor elsewhere in the network.
+This makes it the display-side counterpart of the V4L2 capture source: a frame grabbed from a camera on a Pi can be viewed on any machine in the network running a viewer sink node, with the relay handling the path and delivery mode.
+
+#### Renderer: GLFW + OpenGL
+
+The initial implementation uses **GLFW** for window and input management and **OpenGL** for rendering.
+
+GLFW handles window creation, the event loop, resize, and input callbacks — it also supports Vulkan surface creation using the same API, which makes a future renderer swap straightforward. Input events (keyboard, mouse) are normalised by GLFW before being encoded as protocol messages.
+
+The OpenGL renderer:
+1. Receives a decoded frame as a pixel buffer (libjpeg-turbo for MJPEG, raw for uncompressed formats)
+2. Uploads it as a 2D texture
+3. Runs a fragment shader that handles YUV→RGB conversion (where needed) and scaling/filtering
+4. Presents via GLFW's swap-buffers call
+
+This keeps CPU load low — chroma conversion and scaling happen on the GPU — while keeping the implementation simple relative to a full Vulkan pipeline.
+
+#### Renderer: Vulkan (future alternative)
+
+A Vulkan renderer is planned as an alternative to the OpenGL one. GLFW's surface creation API is renderer-agnostic, so the window management and input handling code is shared. Only the renderer backend changes.
+
+Vulkan offers more explicit control over presentation timing, multi-queue workloads, and compute shaders (e.g. on-GPU MJPEG decode via a compute pass if a suitable library is available). It is not needed for the initial viewer but worth having for high-frame-rate or multi-stream display scenarios.
+
+The renderer selection should be a compile-time or runtime option — both implementations conform to the same internal interface (`render_frame(pixel_buffer, width, height, format)`).
 
 ---
 
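
The shared renderer interface named in the last added paragraph could look roughly like the C sketch below. It is not part of the patch and rests on assumptions: only the `render_frame(pixel_buffer, width, height, format)` parameter list comes from the text, while `struct renderer`, `enum pixel_format`, `select_renderer`, and the stub backend are hypothetical names chosen for illustration. The stub only validates its arguments and counts frames; a real backend would upload a texture and present it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pixel formats; the patch does not enumerate them. */
enum pixel_format { PIXEL_FORMAT_RGB24, PIXEL_FORMAT_YUYV };

/* Shared interface: both backends expose the same render_frame entry point. */
struct renderer {
    const char *name;
    int (*render_frame)(struct renderer *self, const uint8_t *pixel_buffer,
                        int width, int height, enum pixel_format format);
    unsigned frames_rendered; /* bookkeeping for this sketch only */
};

/* Stub backend (assumption): stands in for the OpenGL or Vulkan code path.
 * Returns 0 on success, -1 on invalid arguments. */
int stub_render_frame(struct renderer *self, const uint8_t *pixel_buffer,
                      int width, int height, enum pixel_format format)
{
    (void)format; /* a real backend would pick a shader path from this */
    if (pixel_buffer == NULL || width <= 0 || height <= 0)
        return -1;
    self->frames_rendered++;
    return 0;
}

static struct renderer opengl_renderer = { "opengl", stub_render_frame, 0 };
static struct renderer vulkan_renderer = { "vulkan", stub_render_frame, 0 };

/* Runtime selection by name; returns NULL for an unknown backend. */
struct renderer *select_renderer(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "opengl") == 0)
        return &opengl_renderer;
    if (strcmp(name, "vulkan") == 0)
        return &vulkan_renderer;
    return NULL;
}
```

A caller would obtain a backend with `select_renderer("opengl")` and then call `r->render_frame(r, pixels, width, height, PIXEL_FORMAT_RGB24)` for each frame. Because the window and input code only ever sees `struct renderer`, switching backends would be a one-line change at the call site, or a build flag if the choice is made at compile time.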