feat: xorg text overlays, font atlas generator, v4l2_view_cli

- tools/gen_font_atlas: Python/Pillow build tool — skyline packs DejaVu
  Sans glyphs 32-255 into a grayscale atlas, emits build/gen/font_atlas.h
  with pixel data and Font_Glyph[256] metrics table
- xorg: bitmap font atlas text overlay rendering (GL_R8 atlas texture,
  alpha-blended glyph quads, dark background rect per overlay)
- xorg: add xorg_viewer_set_overlay_text / clear_overlays API
- xorg: add xorg_viewer_handle_events for streaming use (events only,
  no redundant render)
- xorg_cli: show today's date as white text overlay
- v4l2_view_cli: new tool — V4L2 capture with format auto-selection
  (highest FPS then largest resolution), MJPEG/YUYV, measured FPS overlay
- docs: update README, planning, architecture to reflect current status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 22:13:59 +00:00
parent 7fd79e6120
commit 611376dbc1
11 changed files with 1204 additions and 46 deletions


@@ -418,7 +418,7 @@ The initial implementation uses **GLFW** for window and input management and **O
GLFW handles window creation, the event loop, resize, and input callbacks — it also supports Vulkan surface creation using the same API, which makes a future renderer swap straightforward. Input events (keyboard, mouse) are normalised by GLFW before being encoded as protocol messages.
The OpenGL renderer:
1. For **MJPEG**: calls `tjDecompressToYUV2` (libjpeg-turbo) to decompress directly to planar YUV — no CPU-side color conversion. JPEG stores YCbCr internally so this is the minimal decode path: Huffman + DCT output lands directly in YUV planes.
1. For **MJPEG**: calls `tjDecompressToYUVPlanes` (libjpeg-turbo) to decompress directly to planar YUV — no CPU-side color conversion. JPEG stores YCbCr internally so this is the minimal decode path: Huffman + DCT output lands directly in YUV planes.
2. Uploads Y, Cb, Cr as separate `GL_RED` textures (chroma at half resolution for 4:2:0 / 4:2:2 as delivered by most V4L2 cameras).
3. Fragment shader samples the three planes and applies the BT.601 matrix to produce RGB — a few lines of GLSL.
4. Scaling and filtering happen in the same shader pass.
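The colour conversion in step 3 can be checked against a CPU reference. The sketch below assumes the full-range (JFIF) variant of BT.601, which is what JPEG data normally carries; the function names are illustrative and not part of the viewer.

```c
#include <stdint.h>

/* Clamp a float to [0, 255] and round to the nearest integer. */
static uint8_t clamp_u8(float x)
{
    if (x < 0.0f)   return 0;
    if (x > 255.0f) return 255;
    return (uint8_t)(x + 0.5f);
}

/* Full-range (JFIF) BT.601 YCbCr -> RGB for one pixel.
 * Assumption: chroma is centred on 128 and luma spans 0..255. */
static void ycbcr_to_rgb_bt601(uint8_t y, uint8_t cb, uint8_t cr,
                               uint8_t rgb[3])
{
    float fy  = (float)y;
    float fcb = (float)cb - 128.0f;
    float fcr = (float)cr - 128.0f;

    rgb[0] = clamp_u8(fy + 1.402f * fcr);                     /* R */
    rgb[1] = clamp_u8(fy - 0.344136f * fcb - 0.714136f * fcr); /* G */
    rgb[2] = clamp_u8(fy + 1.772f * fcb);                     /* B */
}
```

A fragment shader would apply the same three dot products to normalised samples, with the 128 offset becoming 0.5.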
@@ -428,17 +428,19 @@ For **raw pixel formats** (BGRA, YUV planar from the wire): uploaded directly wi
This keeps CPU load minimal — the only CPU work for MJPEG is Huffman decode and DCT, which libjpeg-turbo runs with SIMD. All color conversion and scaling is on the GPU.
#### Text overlays (future)
#### Text overlays
Two tiers are planned, implemented in order:
Two tiers, implemented in order:
**Tier 1 — bitmap font atlas (initial)**
**Tier 1 — bitmap font atlas (done)**
A build-time script (Python Pillow) renders glyphs from a TTF font into a packed PNG atlas and emits a metadata file (JSON or generated C header) with per-glyph UV rects and advance widths. At runtime the atlas is uploaded as a `GL_RGBA` texture and each character is rendered as a small quad, alpha-blended over the frame. Simple skyline packing keeps the atlas compact.
`tools/gen_font_atlas/gen_font_atlas.py` (Python/Pillow) renders glyphs 32-255 from DejaVu Sans at 16pt into a packed grayscale atlas using a skyline bin packer and emits `build/gen/font_atlas.h`, a C header with the pixel data as a `static const uint8_t` array and a `Font_Glyph[256]` metrics table indexed by codepoint.
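To illustrate the packing idea, here is a minimal per-column skyline placer. The real generator is a Python script and may use a different skyline variant; this sketch is written in C only to match the rest of the document, and `ATLAS_W`, `Packer`, and `skyline_place` are hypothetical names.

```c
#include <stdint.h>

#define ATLAS_W 64  /* hypothetical atlas width for this sketch */

/* skyline[x] is the first free row in column x; zero-initialise before use. */
typedef struct {
    uint16_t skyline[ATLAS_W];
} Packer;

/* Place a w x h rect at the lowest available y (leftmost on ties).
 * Returns 1 and writes the position, or 0 if the rect does not fit
 * within the atlas width or within max_h rows. */
static int skyline_place(Packer *p, int w, int h, int max_h,
                         int *out_x, int *out_y)
{
    if (w <= 0 || w > ATLAS_W) return 0;

    int best_x = -1, best_y = 0;
    for (int x = 0; x + w <= ATLAS_W; x++) {
        /* The rect rests on the tallest column it spans. */
        int y = 0;
        for (int i = 0; i < w; i++)
            if (p->skyline[x + i] > y) y = p->skyline[x + i];
        if (best_x < 0 || y < best_y) { best_x = x; best_y = y; }
    }
    if (best_y + h > max_h) return 0;

    /* Raise the skyline under the placed rect. */
    for (int i = 0; i < w; i++)
        p->skyline[best_x + i] = (uint16_t)(best_y + h);

    *out_x = best_x;
    *out_y = best_y;
    return 1;
}
```

Sorting glyphs by descending height before placing them usually tightens the result, at the cost of a sort in the build step.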
The generator lives in `tools/gen_font_atlas/` and runs as part of `make build`. Sufficient for ASCII overlays: timestamps, stream labels, debug info.
At runtime the atlas is uploaded as a `GL_R8` texture. Each overlay is rendered as a batch of alpha-blended glyph quads preceded by a semi-transparent dark background rect (using a separate minimal screen-space rect shader driven by `gl_VertexID`). The public API is `xorg_viewer_set_overlay_text(v, idx, x, y, text, r, g, b)` and `xorg_viewer_clear_overlays(v)`. Up to 8 independent overlays are supported.
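The glyph-quad step can be sketched as a layout pass that walks the string, looks up each byte in the metrics table, and emits one screen rect plus one atlas UV rect per visible glyph. The field names below are assumptions for illustration; the generated `Font_Glyph` struct may be laid out differently, and `Example_Glyph`, `Glyph_Quad`, and `layout_text` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-glyph record; the real Font_Glyph may differ. */
typedef struct {
    uint16_t x, y;                 /* top-left of the glyph in the atlas, px */
    uint16_t w, h;                 /* glyph bitmap size, px */
    int16_t  bearing_x, bearing_y; /* offset from pen to bitmap top-left, px */
    uint16_t advance;              /* horizontal pen advance, px */
} Example_Glyph;

typedef struct {
    float x0, y0, x1, y1;  /* screen rect in pixels */
    float u0, v0, u1, v1;  /* atlas rect, normalised to 0..1 */
} Glyph_Quad;

/* Lay out `text` starting at (pen_x, pen_y); returns the quad count.
 * Glyphs with an empty bitmap (e.g. space) only advance the pen. */
static size_t layout_text(const Example_Glyph table[256],
                          int atlas_w, int atlas_h, const char *text,
                          float pen_x, float pen_y,
                          Glyph_Quad *out, size_t max_quads)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)text; *p; p++) {
        const Example_Glyph *g = &table[*p];
        if (g->w > 0 && g->h > 0) {
            if (n == max_quads) break;  /* output buffer full */
            Glyph_Quad *q = &out[n++];
            q->x0 = pen_x + (float)g->bearing_x;
            q->y0 = pen_y + (float)g->bearing_y;
            q->x1 = q->x0 + (float)g->w;
            q->y1 = q->y0 + (float)g->h;
            q->u0 = (float)g->x / (float)atlas_w;
            q->v0 = (float)g->y / (float)atlas_h;
            q->u1 = (float)(g->x + g->w) / (float)atlas_w;
            q->v1 = (float)(g->y + g->h) / (float)atlas_h;
        }
        pen_x += (float)g->advance;
    }
    return n;
}
```

The bounding box of the emitted quads, padded by a few pixels, gives the rect for the dark background drawn behind each overlay.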
**Tier 2 — HarfBuzz + FreeType (later)**
The generator runs automatically as a `make` dependency before compiling `xorg.c`. The Pillow build tool is the only Python dependency; there are no runtime font deps.
**Tier 2 — HarfBuzz + FreeType (future)**
A proper runtime font stack for full typography: correct shaping, kerning, ligatures, bidirectional text, non-Latin scripts. Added as a feature flag with its own runtime deps alongside the blit path.
@@ -446,20 +448,27 @@ When Tier 2 is implemented, the Pillow build dependency may be replaced by a pur
#### Render loop
The viewer is driven by incoming frames rather than a fixed-rate loop. The intended pattern for callers:
The viewer is driven by incoming frames rather than a fixed-rate loop. Two polling functions are provided depending on the use case:
**Static image / test tool**: `xorg_viewer_poll(v)` processes events, then re-renders from the existing textures:
```c
while (xorg_viewer_poll(v)) {
if (new_frame_available()) {
xorg_viewer_push_yuv420(v, ...); /* upload + render */
while (xorg_viewer_poll(v)) { /* wait for close */ }
```
**Live stream** — the push functions (`push_yuv420`, `push_mjpeg`, etc.) already upload and render. Use `xorg_viewer_handle_events(v)` to process window events without an extra render:
```c
while (1) {
    /* block on V4L2/network fd until frame or timeout */
    if (frame_available) {
        xorg_viewer_push_mjpeg(v, data, size);  /* upload + render */
    }
    /* no new frame → no redundant GPU work */
    if (!xorg_viewer_handle_events(v)) { break; }
}
```
`xorg_viewer_poll` calls `glfwPollEvents` which dispatches input and resize events. A `framebuffer_size_callback` registered on the window calls `render()` synchronously during the resize, so the image tracks the window edge without a one-frame lag. This avoids both a busy render loop and the latency of waiting for the next poll iteration.
For a static image (test tool, paused stream), `glfwWaitEventsTimeout(interval)` is a better substitute for `glfwPollEvents` — it sleeps until an event arrives or the timeout expires, eliminating idle CPU usage.
A `framebuffer_size_callback` registered on the window calls `render()` synchronously during resize, so the image tracks the window edge without a one-frame lag.
Threading note: the GL context must be used from the thread that created it. In the video node, incoming frames arrive on a network receive thread. A frame queue between the receive thread and the render thread (which owns the GL context) is the correct model — the render thread drains the queue each poll iteration rather than having the network thread call push functions directly.
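One way to build such a queue, assuming exactly one receive thread and one render thread, is a fixed-size single-producer/single-consumer ring using C11 atomics. This is a sketch rather than the project's implementation; `Frame`, `Frame_Queue`, and the function names are hypothetical, and buffer ownership and recycling are left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_QUEUE_CAP 4u  /* must be a power of two */

typedef struct {
    uint8_t *data;  /* encoded frame bytes, owned by the caller */
    size_t   size;
} Frame;

typedef struct {
    Frame       slots[FRAME_QUEUE_CAP];
    atomic_uint head;  /* next slot to write; advanced by the producer only */
    atomic_uint tail;  /* next slot to read; advanced by the consumer only */
} Frame_Queue;

static void frame_queue_init(Frame_Queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side (receive thread). Returns 0 when the queue is full,
 * in which case the caller drops or reuses the frame. */
static int frame_queue_push(Frame_Queue *q, Frame f)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == FRAME_QUEUE_CAP) return 0;
    q->slots[head % FRAME_QUEUE_CAP] = f;
    /* Release: the slot write must be visible before the new head. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side (render thread). Returns 0 when the queue is empty. */
static int frame_queue_pop(Frame_Queue *q, Frame *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) return 0;
    *out = q->slots[tail % FRAME_QUEUE_CAP];
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}
```

The render thread would call `frame_queue_pop` in a loop each iteration, pass the newest frame to `xorg_viewer_push_mjpeg`, and then call `xorg_viewer_handle_events`, so every GL call stays on the thread that owns the context.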