feat: xorg text overlays, font atlas generator, v4l2_view_cli

- tools/gen_font_atlas: Python/Pillow build tool; skyline-packs DejaVu Sans glyphs 32-255 into a grayscale atlas and emits build/gen/font_atlas.h with the pixel data and a Font_Glyph[256] metrics table
- xorg: bitmap font atlas text overlay rendering (GL_R8 atlas texture, alpha-blended glyph quads, dark background rect per overlay)
- xorg: add xorg_viewer_set_overlay_text / clear_overlays API
- xorg: add xorg_viewer_handle_events for streaming use (events only, no redundant render)
- xorg_cli: show today's date as white text overlay
- v4l2_view_cli: new tool; V4L2 capture with format auto-selection (highest FPS, then largest resolution), MJPEG/YUYV, measured FPS overlay
- docs: update README, planning, architecture to reflect current status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 22:13:59 +00:00
parent 7fd79e6120
commit 611376dbc1
11 changed files with 1204 additions and 46 deletions
--- a/architecture.md
+++ b/architecture.md
@@ -418,7 +418,7 @@ The initial implementation uses **GLFW** for window and input management and **O
 GLFW handles window creation, the event loop, resize, and input callbacks — it also supports Vulkan surface creation using the same API, which makes a future renderer swap straightforward. Input events (keyboard, mouse) are normalised by GLFW before being encoded as protocol messages.

 The OpenGL renderer:
-1. For **MJPEG**: calls `tjDecompressToYUV2` (libjpeg-turbo) to decompress directly to planar YUV — no CPU-side color conversion. JPEG stores YCbCr internally so this is the minimal decode path: Huffman + DCT output lands directly in YUV planes.
+1. For **MJPEG**: calls `tjDecompressToYUVPlanes` (libjpeg-turbo) to decompress directly to planar YUV, with no CPU-side color conversion. JPEG stores YCbCr internally, so this is the minimal decode path: Huffman decoding and the inverse DCT write their output straight into the YUV planes.
 2. Uploads Y, Cb, Cr as separate `GL_RED` textures (chroma at half width for 4:2:2, or half width and half height for 4:2:0, as delivered by most V4L2 cameras).
 3. Fragment shader samples the three planes and applies the BT.601 matrix to produce RGB — a few lines of GLSL.
 4. Scaling and filtering happen in the same shader pass.
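As a CPU-side reference for step 3, the per-pixel BT.601 conversion can be written in C as below. This is a sketch that assumes full-range (JFIF) coefficients, which is what JPEG and therefore MJPEG uses; limited-range YUYV (luma 16 to 235) would need an offset and scale first. The helper names are hypothetical and not part of the viewer, whose shader does this work on the GPU.

```c
#include <stdint.h>

/* Clamp to [0, 255] and round to the nearest integer. */
static uint8_t clamp_u8(float v)
{
    if (v <= 0.0f) { return 0; }
    if (v >= 255.0f) { return 255; }
    return (uint8_t)(v + 0.5f);
}

/* Full-range (JFIF) BT.601 YCbCr -> RGB for one pixel.
 * Cb and Cr are stored with a +128 bias, so remove it first. */
static void ycbcr601_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                            uint8_t *r, uint8_t *g, uint8_t *b)
{
    float fy  = (float)y;
    float fcb = (float)cb - 128.0f;
    float fcr = (float)cr - 128.0f;

    *r = clamp_u8(fy + 1.402f * fcr);
    *g = clamp_u8(fy - 0.344136f * fcb - 0.714136f * fcr);
    *b = clamp_u8(fy + 1.772f * fcb);
}
```

In GLSL the same three lines become a `mat3` multiply on `vec3(y, cb - 0.5, cr - 0.5)` with texture samples normalised to [0, 1].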
@@ -428,17 +428,19 @@ For **raw pixel formats** (BGRA, YUV planar from the wire): uploaded directly wi

 This keeps CPU load minimal: the only CPU work for MJPEG is Huffman decoding and the inverse DCT, which libjpeg-turbo accelerates with SIMD. All color conversion and scaling happen on the GPU.

-#### Text overlays (future)
+#### Text overlays

-Two tiers are planned, implemented in order:
+Two tiers, implemented in order:

-**Tier 1 — bitmap font atlas (initial)**
+**Tier 1 — bitmap font atlas (done)**

-A build-time script (Python Pillow) renders glyphs from a TTF font into a packed PNG atlas and emits a metadata file (JSON or generated C header) with per-glyph UV rects and advance widths. At runtime the atlas is uploaded as a `GL_RGBA` texture and each character is rendered as a small quad, alpha-blended over the frame. Simple skyline packing keeps the atlas compact.
+`tools/gen_font_atlas/gen_font_atlas.py` (Python/Pillow) renders glyphs 32–255 from DejaVu Sans at 16pt into a packed grayscale atlas using a skyline bin packer and emits `build/gen/font_atlas.h` — a C header with the pixel data as a `static const uint8_t` array and a `Font_Glyph[256]` metrics table indexed by codepoint.
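The skyline packing step can be sketched as follows. The real generator is Python; this C version only illustrates the idea, and the names (`Skyline`, `skyline_insert`), the fixed segment capacity, and the lowest-then-leftmost placement rule are assumptions rather than a transcription of `gen_font_atlas.py`. The skyline is a left-to-right list of segments describing the current top edge of everything packed so far; each rectangle goes where its own top edge ends up lowest, and the segments it covers collapse into one.

```c
#include <stdbool.h>
#include <string.h>

#define SKY_MAX_SEGS 512

typedef struct { int x, y, w; } Sky_Seg;     /* segment [x, x+w) at height y */

typedef struct {
    int width, height;                       /* atlas size in pixels */
    int count;                               /* live segments, sorted by x */
    Sky_Seg seg[SKY_MAX_SEGS];
} Skyline;

static void skyline_init(Skyline *s, int width, int height)
{
    s->width = width;
    s->height = height;
    s->count = 1;
    s->seg[0] = (Sky_Seg){ 0, 0, width };
}

/* y at which a w-wide rect rests if its left edge is at segment i,
 * or -1 if it would run past the right edge of the atlas. */
static int skyline_fit(const Skyline *s, int i, int w)
{
    if (s->seg[i].x + w > s->width) { return -1; }
    int y = 0, remaining = w;
    for (int j = i; remaining > 0; j++) {
        if (s->seg[j].y > y) { y = s->seg[j].y; }
        remaining -= s->seg[j].w;
    }
    return y;
}

/* Place a w x h rect at the lowest (then leftmost) position.
 * Returns false if it does not fit or the segment array is full. */
static bool skyline_insert(Skyline *s, int w, int h, int *out_x, int *out_y)
{
    int best = -1, best_y = 0, best_top = 0;
    for (int i = 0; i < s->count; i++) {
        int y = skyline_fit(s, i, w);
        if (y < 0 || y + h > s->height) { continue; }
        if (best < 0 || y + h < best_top) {
            best = i;
            best_y = y;
            best_top = y + h;
        }
    }
    if (best < 0) { return false; }

    int x = s->seg[best].x, end = x + w;
    int j = best;                            /* first segment not fully covered */
    while (j < s->count && s->seg[j].x + s->seg[j].w <= end) { j++; }
    int removed = j - best;
    if (removed == 0 && s->count == SKY_MAX_SEGS) { return false; }

    if (j < s->count && s->seg[j].x < end) { /* trim a partial overlap */
        s->seg[j].w -= end - s->seg[j].x;
        s->seg[j].x = end;
    }
    /* Replace segments [best, j) with one segment for the new rect. */
    memmove(&s->seg[best + 1], &s->seg[j],
            (size_t)(s->count - j) * sizeof s->seg[0]);
    s->seg[best] = (Sky_Seg){ x, best_top, w };
    s->count += 1 - removed;

    *out_x = x;
    *out_y = best_y;
    return true;
}
```

A caller would insert each glyph's bitmap size in turn (often with a pixel of padding so bilinear filtering doesn't bleed between neighbours) and record the returned position as that glyph's atlas rect.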

-The generator lives in `tools/gen_font_atlas/` and runs as part of `make build`. Sufficient for ASCII overlays: timestamps, stream labels, debug info.
+At runtime the atlas is uploaded as a `GL_R8` texture. Each overlay is rendered as a batch of alpha-blended glyph quads, preceded by a semi-transparent dark background rect drawn with a separate minimal screen-space shader that derives the rect corners from `gl_VertexID`. The public API is `xorg_viewer_set_overlay_text(v, idx, x, y, text, r, g, b)` and `xorg_viewer_clear_overlays(v)`; up to 8 independent overlays are supported.
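How one overlay string might turn into glyph quads is sketched below. `Glyph_Info` is a hypothetical stand-in for the generated `Font_Glyph` entry, whose actual fields aren't shown here, and the coordinates assume a top-left pixel origin with y pointing down and the pen at the top of the text line. In this sketch each byte indexes the 256-entry table directly, so text is treated as Latin-1 rather than UTF-8.

```c
#include <stddef.h>

/* Hypothetical stand-in for the generated Font_Glyph entry. */
typedef struct {
    int atlas_x, atlas_y;       /* top-left of the glyph bitmap in the atlas */
    int w, h;                   /* bitmap size in pixels (0 for blank glyphs) */
    int bearing_x, bearing_y;   /* offset from pen position to bitmap top-left */
    int advance;                /* horizontal pen advance in pixels */
} Glyph_Info;

typedef struct {
    float x0, y0, x1, y1;       /* screen rect in pixels, y down */
    float u0, v0, u1, v1;       /* atlas texture coordinates in [0, 1] */
} Glyph_Quad;

/* Lay out text starting at pen (x, y). Writes at most max_quads quads and
 * returns how many were written. Blank glyphs (e.g. space) only advance. */
static size_t layout_text(const Glyph_Info glyphs[256], int atlas_w, int atlas_h,
                          const char *text, float x, float y,
                          Glyph_Quad *out, size_t max_quads)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        const Glyph_Info *g = &glyphs[*p];
        if (g->w > 0 && g->h > 0 && n < max_quads) {
            Glyph_Quad *q = &out[n++];
            q->x0 = x + (float)g->bearing_x;
            q->y0 = y + (float)g->bearing_y;
            q->x1 = q->x0 + (float)g->w;
            q->y1 = q->y0 + (float)g->h;
            q->u0 = (float)g->atlas_x / (float)atlas_w;
            q->v0 = (float)g->atlas_y / (float)atlas_h;
            q->u1 = (float)(g->atlas_x + g->w) / (float)atlas_w;
            q->v1 = (float)(g->atlas_y + g->h) / (float)atlas_h;
        }
        x += (float)g->advance;
    }
    return n;
}
```

The quads for one overlay could then be written into a single vertex buffer and drawn with one call after its background rect, with the fragment shader using the `GL_R8` sample as alpha.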

-**Tier 2 — HarfBuzz + FreeType (later)**
+The generator runs automatically as a `make` dependency before `xorg.c` is compiled. Pillow is the only Python dependency and is needed only at build time; there are no runtime font dependencies.
+
+**Tier 2 — HarfBuzz + FreeType (future)**

 A proper runtime font stack for full typography: correct shaping, kerning, ligatures, bidirectional text, non-Latin scripts. Added as a feature flag with its own runtime deps alongside the blit path.

@@ -446,20 +448,27 @@ When Tier 2 is implemented, the Pillow build dependency may be replaced by a pur

 #### Render loop

-The viewer is driven by incoming frames rather than a fixed-rate loop. The intended pattern for callers:
+The viewer is driven by incoming frames rather than a fixed-rate loop. Two polling functions are provided depending on the use case:
+
+**Static image / test tool** — `xorg_viewer_poll(v)` processes events then re-renders from existing textures:

 ```c
-while (xorg_viewer_poll(v)) {
-    if (new_frame_available()) {
-        xorg_viewer_push_yuv420(v, ...);  /* upload + render */
+while (xorg_viewer_poll(v)) { /* wait for close */ }
+```
+
+**Live stream** — the push functions (`push_yuv420`, `push_mjpeg`, etc.) already upload and render. Use `xorg_viewer_handle_events(v)` to process window events without an extra render:
+
+```c
+while (1) {
+    /* block on V4L2/network fd until frame or timeout */
+    if (frame_available) {
+        xorg_viewer_push_mjpeg(v, data, size);  /* upload + render */
     }
-    /* no new frame → no redundant GPU work */
+    if (!xorg_viewer_handle_events(v)) { break; }
 }
 ```
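The "block on V4L2/network fd" step in the loop above could be a plain `poll(2)` on the capture device or socket. This is a minimal sketch assuming a single descriptor; `wait_readable` is a hypothetical helper. A finite timeout matters here: it keeps `xorg_viewer_handle_events` running when no frames arrive, so close and resize are still handled.

```c
#include <errno.h>
#include <poll.h>

/* Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, and -1 if poll() fails (errno set)
 * or the fd reports an error/hangup/invalid state without data.
 * On EINTR the wait restarts with the full timeout, for simplicity. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0 && errno == EINTR) { continue; }
        if (rc < 0) { return -1; }
        if (rc == 0) { return 0; }
        return (pfd.revents & POLLIN) ? 1 : -1;
    }
}
```

For V4L2 streaming I/O, the capture fd reports `POLLIN` once a filled buffer is ready to be dequeued with `VIDIOC_DQBUF`.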

-`xorg_viewer_poll` calls `glfwPollEvents` which dispatches input and resize events. A `framebuffer_size_callback` registered on the window calls `render()` synchronously during the resize, so the image tracks the window edge without a one-frame lag. This avoids both a busy render loop and the latency of waiting for the next poll iteration.
-
-For a static image (test tool, paused stream), `glfwWaitEventsTimeout(interval)` is a better substitute for `glfwPollEvents` — it sleeps until an event arrives or the timeout expires, eliminating idle CPU usage.
+A `framebuffer_size_callback` registered on the window calls `render()` synchronously during resize, so the image tracks the window edge without a one-frame lag.

 Threading note: the GL context must be used from the thread that created it. In the video node, incoming frames arrive on a network receive thread. A frame queue between the receive thread and the render thread (which owns the GL context) is the correct model — the render thread drains the queue each poll iteration rather than having the network thread call push functions directly.
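For a live view where showing the newest frame matters more than showing every frame, the frame queue can shrink to a single-slot mailbox. This is one possible design under that assumption, not the video node's implementation; `Frame`, `Frame_Mailbox`, and its functions are hypothetical. The receive thread hands over each frame with `mailbox_put`, overwriting any frame the renderer hasn't taken yet, and the render thread calls `mailbox_take` once per loop iteration, passing any frame it gets to a push function on the GL thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    unsigned char *data;        /* malloc'd encoded or raw frame */
    size_t size;
} Frame;

typedef struct {
    pthread_mutex_t lock;
    Frame pending;              /* data == NULL when empty */
    unsigned long dropped;      /* frames overwritten before being taken */
} Frame_Mailbox;

static void mailbox_init(Frame_Mailbox *m)
{
    pthread_mutex_init(&m->lock, NULL);
    m->pending = (Frame){ NULL, 0 };
    m->dropped = 0;
}

/* Receive thread: transfer ownership of a malloc'd frame to the mailbox. */
static void mailbox_put(Frame_Mailbox *m, Frame f)
{
    pthread_mutex_lock(&m->lock);
    if (m->pending.data) {
        free(m->pending.data);  /* stale: the renderer never saw it */
        m->dropped++;
    }
    m->pending = f;
    pthread_mutex_unlock(&m->lock);
}

/* Render thread: take the newest frame, if any. Caller frees out->data. */
static bool mailbox_take(Frame_Mailbox *m, Frame *out)
{
    pthread_mutex_lock(&m->lock);
    bool have = m->pending.data != NULL;
    if (have) {
        *out = m->pending;
        m->pending = (Frame){ NULL, 0 };
    }
    pthread_mutex_unlock(&m->lock);
    return have;
}

static void mailbox_destroy(Frame_Mailbox *m)
{
    free(m->pending.data);
    pthread_mutex_destroy(&m->lock);
}
```

If every frame must be displayed, a bounded FIFO with the same locking works instead. To let the render thread sleep rather than spin, the mailbox could also signal a condition variable or a descriptor that the render loop blocks on.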