video-setup/docs/discovery.md
mikael-lovqvists-claude-agent f3a6be0701 docs: update discovery behaviour — targeted unicast replies, not multicast
Document that immediate re-announcements go directly to the triggering peer
(unicast) rather than to the multicast group, and explain the two conditions
that trigger a reply: new peer and restarted peer (site_id change).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 04:35:28 +00:00


Node Discovery and Multi-Site

See Architecture Overview.

Node Discovery

Standard mDNS (RFC 6762) sends UDP multicast to 224.0.0.251:5353 and carries DNS-SD service records. The wire protocol is well-defined and the multicast group is already in active use on most LANs. The standard service discovery stack (Avahi, Bonjour, nss-mdns) provides that transport but brings significant overhead: persistent daemons, D-Bus dependencies, a complex configuration surface, and substantial resident memory. None of that is needed here.

The approach: reuse the multicast transport, define our own wire format.

Instead of the DNS wire format, node announcements are encoded as binary frames using the same serialization layer (serial) and frame header as the video transport. A node joins the multicast group, multicasts periodic announcements, and listens for announcements from peers.

Announcement Frame

| Field | Size | Purpose |
|---|---|---|
| message_type | 2 bytes | Discovery message type (e.g. 0x0010 for node announcement) |
| channel_id | 2 bytes | Reserved / zero |
| payload_length | 4 bytes | Byte length of payload |
| Payload | variable | Encoded node identity and capabilities |

Payload fields:

| Field | Type | Purpose |
|---|---|---|
| protocol_version | u8 | Wire format version |
| site_id | u16 | Site this node belongs to (0 = local / unassigned) |
| tcp_port | u16 | Port where this node accepts transport connections |
| function_flags | u16 | Bitfield declaring node capabilities (see below) |
| name_len | u8 | Length of name string |
| name | bytes | Node name (namespace:instance, e.g. v4l2:microscope) |
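
As an illustration of the layout above, here is a minimal Python sketch that packs and unpacks an announcement frame. The text does not state byte order, padding, or the name encoding, so big-endian, unpadded fields and UTF-8 names are assumptions here, as are the function names and the default protocol_version of 1.

```python
import struct

# Assumptions (not specified in the text): big-endian, no padding, UTF-8 names.
HEADER = struct.Struct(">HHI")           # message_type, channel_id, payload_length
PAYLOAD_FIXED = struct.Struct(">BHHHB")  # protocol_version, site_id, tcp_port,
                                         # function_flags, name_len
MSG_NODE_ANNOUNCEMENT = 0x0010           # example value given in the frame table


def encode_announcement(site_id, tcp_port, function_flags, name, protocol_version=1):
    """Build one announcement frame: 8-byte header followed by the payload."""
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 255:
        raise ValueError("name does not fit in a u8 length field")
    payload = PAYLOAD_FIXED.pack(
        protocol_version, site_id, tcp_port, function_flags, len(name_bytes)
    ) + name_bytes
    # channel_id is reserved and sent as zero.
    return HEADER.pack(MSG_NODE_ANNOUNCEMENT, 0, len(payload)) + payload


def decode_announcement(frame):
    """Parse a frame produced by encode_announcement; raise ValueError if malformed."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    message_type, _channel_id, payload_length = HEADER.unpack_from(frame, 0)
    if message_type != MSG_NODE_ANNOUNCEMENT:
        raise ValueError("not a node announcement")
    payload = frame[HEADER.size:HEADER.size + payload_length]
    if len(payload) != payload_length or payload_length < PAYLOAD_FIXED.size:
        raise ValueError("truncated payload")
    version, site_id, tcp_port, flags, name_len = PAYLOAD_FIXED.unpack_from(payload, 0)
    name = payload[PAYLOAD_FIXED.size:PAYLOAD_FIXED.size + name_len]
    if len(name) != name_len:
        raise ValueError("truncated name")
    return {
        "protocol_version": version,
        "site_id": site_id,
        "tcp_port": tcp_port,
        "function_flags": flags,
        "name": name.decode("utf-8"),
    }
```

Under these assumptions the header and the fixed part of the payload are 8 bytes each, so an announcement for v4l2:microscope occupies 31 bytes in total.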

function_flags bits:

| Bit | Mask | Meaning |
|---|---|---|
| 0 | 0x0001 | Source: produces video |
| 1 | 0x0002 | Relay: receives and distributes streams |
| 2 | 0x0004 | Sink: consumes video (display, archiver, etc.) |
| 3 | 0x0008 | Controller: participates in control plane coordination |

A node may set multiple bits — a relay that also archives sets both RELAY and SINK.
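
One way to model the bitfield in code is an integer flag enum. The sketch below uses Python's IntFlag; the class name NodeFunction is hypothetical, while the masks are the ones listed in the table.

```python
from enum import IntFlag


class NodeFunction(IntFlag):
    """Capability bits carried in function_flags (name of the class is illustrative)."""
    SOURCE = 0x0001
    RELAY = 0x0002
    SINK = 0x0004
    CONTROLLER = 0x0008


# A relay that also archives sets both RELAY and SINK.
archiving_relay = NodeFunction.RELAY | NodeFunction.SINK
wire_value = int(archiving_relay)       # value to place in the u16 field

# Decoding a received field and testing a single capability.
received = NodeFunction(wire_value)
can_consume = NodeFunction.SINK in received
can_produce = NodeFunction.SOURCE in received
```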

Behaviour

  • Nodes send announcements periodically (e.g. every 5 s) and immediately on startup via multicast
  • No daemon — the node process itself sends and listens; no background service required
  • On receiving an announcement the control plane records the peer (address, port, name, function) and can initiate a transport connection if needed
  • A node going silent for a configured number of announcement intervals is considered offline
  • Announcements are informational only — the hub validates identity at connection time
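
The record-and-expire behaviour listed above could be sketched as a small peer table. This is only an illustration: the class names, the choice of (address, tcp_port) as the key, and the default of three missed intervals are assumptions, not part of the design text.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    address: str
    tcp_port: int
    name: str
    function_flags: int
    site_id: int
    last_seen: float


class PeerTable:
    """Tracks peers by (address, tcp_port) and drops those that go silent."""

    def __init__(self, interval_s=5.0, missed_intervals=3):
        # A peer is considered offline after this many seconds of silence.
        self.timeout_s = interval_s * missed_intervals
        self.peers = {}

    def record(self, address, tcp_port, name, function_flags, site_id, now):
        """Insert or refresh a peer from a received announcement."""
        self.peers[(address, tcp_port)] = Peer(
            address, tcp_port, name, function_flags, site_id, last_seen=now
        )

    def expire(self, now):
        """Remove and return every peer not heard from within the timeout."""
        silent = [key for key, peer in self.peers.items()
                  if now - peer.last_seen > self.timeout_s]
        return [self.peers.pop(key) for key in silent]
```

Passing the current time in explicitly (for example from time.monotonic()) keeps the table easy to test and independent of wall-clock adjustments.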

Targeted replies

Multicast is used only for the startup announcement and the periodic keepalive announcements. When a node receives an announcement from a peer it does not yet know, or detects that a known peer has restarted (its site_id changed for the same address and port), it sends an immediate unicast reply directly to that peer's IP address. The new or restarted peer therefore learns about this node quickly instead of waiting up to one announcement interval (interval_ms), and the reply does not wake every other node on the subnet the way a multicast re-announcement would.

Steady-state keepalive packets from already-known peers do not trigger any reply.
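
The reply decision described in this section reduces to a small predicate. The sketch below assumes the node keeps a dictionary from (address, port) to the last site_id seen; the function name and that data structure are hypothetical.

```python
def needs_unicast_reply(known_peers, address, port, site_id):
    """Update known_peers and report whether an immediate unicast reply is due.

    known_peers maps (address, port) -> last seen site_id.
    Returns True for a new peer or for a known peer whose site_id changed
    (the restart signal described in the text), False for steady-state keepalives.
    """
    key = (address, port)
    previous_site_id = known_peers.get(key)
    known_peers[key] = site_id
    if previous_site_id is None:
        return True                      # peer not seen before
    return previous_site_id != site_id   # treated as a restart
```

When the predicate returns True, the node would send its own announcement frame to the peer's address with a plain sendto call on the discovery socket. The text does not say which UDP port the reply targets, so that detail is left open here.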

No Avahi/Bonjour Dependency

The system does not link against, depend on, or interact with Avahi or Bonjour. It opens a plain UDP socket and joins the multicast group itself, which requires only standard POSIX socket APIs. This keeps the runtime dependency footprint minimal and the behaviour predictable.
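
For illustration, joining the group with nothing but the standard socket API might look like the following Python sketch. The function names are hypothetical, and binding to all interfaces is an assumption; the group and port are the ones named earlier in this document.

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_discovery_socket(group=MDNS_GROUP, port=MDNS_PORT):
    """Open a UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow sharing the port with other listeners on the same host.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

On some platforms, sharing port 5353 with an mDNS responder that is already running may additionally require SO_REUSEPORT; that is worth verifying on each target system.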


Multi-Site (Forward Compatibility)

The immediate use case is a single LAN. A planned future use case is site-to-site linking — two independent networks (e.g. a lab and a remote location) connected by a tunnel (SSH port-forward, WireGuard, etc.), where nodes on both sites are reachable from either side.

Site Identity

Every node carries a site_id (u16) in its announcement. In a single-site deployment this is always 0. When sites are joined, each site is assigned a distinct non-zero ID; nodes retain their IDs across the join and are fully addressable by (site_id, name) from anywhere in the combined network.

This field is reserved from day one so that multi-site never requires a wire format change or a rename of existing identifiers.

Site Gateway Node

A site gateway is a node that participates in both networks simultaneously — it has a connection on the local transport and a connection over the inter-site tunnel. It:

  • Bridges discovery announcements between sites (rewriting site_id appropriately)
  • Forwards encapsulated transport frames across the tunnel on behalf of cross-site edges
  • Is itself a named node, so the control plane can see and reason about it

The tunnel transport is out of scope for now. The gateway is a node type, not a special infrastructure component — it uses the same wire protocol as everything else.

Site ID Translation

Both sides of a site-to-site link will independently default to site_id = 0. A gateway cannot simply forward announcements across the boundary — every node on both sides would appear as site 0 and be indistinguishable.

The gateway is responsible for site ID translation: it assigns a distinct non-zero site_id to each side of the link and rewrites the site_id field in all announcements and any protocol messages that carry a site_id as they cross the boundary. From each side's perspective, remote nodes appear with the translated ID assigned by the gateway; local nodes retain their own IDs.

This means site_id = 0 should be treated as "local / unassigned" and never forwarded across a site boundary without translation. A node that receives an announcement with site_id = 0 on a cross-site link should treat it as a protocol error from the gateway.
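
A gateway's rewrite step and the receiver-side check could be sketched as below. The sketch assumes the payload layout from the announcement section (protocol_version as one byte, then site_id as a u16) and big-endian byte order, which the text does not specify; both function names are hypothetical.

```python
import struct

SITE_ID_OFFSET = 1  # payload starts with protocol_version (u8), then site_id (u16)


def translate_site_id(payload, assigned_site_id):
    """Return a copy of the payload with site_id replaced by the gateway's ID."""
    if not 1 <= assigned_site_id <= 0xFFFF:
        raise ValueError("assigned site_id must be a non-zero u16")
    rewritten = bytearray(payload)
    struct.pack_into(">H", rewritten, SITE_ID_OFFSET, assigned_site_id)
    return bytes(rewritten)


def check_cross_site_payload(payload):
    """Return the site_id of a payload received over a cross-site link.

    site_id 0 means the gateway forwarded without translating, which the
    text classifies as a protocol error.
    """
    (site_id,) = struct.unpack_from(">H", payload, SITE_ID_OFFSET)
    if site_id == 0:
        raise ValueError("untranslated site_id 0 on cross-site link")
    return site_id
```

Only the two site_id bytes change; every other field of the announcement passes through the gateway untouched.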

Addressing

A fully-qualified node address is site_id:namespace:instance. Within a single site, site_id is implicit and can be omitted. The control plane and discovery layer must store site_id alongside every peer record from the start, even if it is always 0, so that the upgrade to multi-site addressing requires only configuration and a gateway node — not code changes.
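
A parser for this address form might look like the sketch below. It assumes site_id is written in decimal and that neither namespace nor instance contains a colon; neither point is stated in the text, and the names NodeAddress and parse_node_address are hypothetical.

```python
from typing import NamedTuple


class NodeAddress(NamedTuple):
    site_id: int
    namespace: str
    instance: str


def parse_node_address(text, default_site_id=0):
    """Parse 'namespace:instance' or 'site_id:namespace:instance'."""
    parts = text.split(":")
    if len(parts) == 2:
        # site_id omitted: the address is local to the current site.
        site_id = default_site_id
        namespace, instance = parts
    elif len(parts) == 3:
        site_id = int(parts[0])
        namespace, instance = parts[1], parts[2]
    else:
        raise ValueError(f"malformed node address: {text!r}")
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site_id must fit in a u16")
    if not namespace or not instance:
        raise ValueError("namespace and instance must be non-empty")
    return NodeAddress(site_id, namespace, instance)
```

Always returning a site_id, even for the short form, matches the requirement that every peer record stores one from the start.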