# CI Architecture
This document captures architectural decisions made for the efforting.tech CI system and flags what is still open.
---
## Core Concepts
### Image declarations vs task declarations
Images and tasks are declared separately. A task references an image by name. Multiple tasks may reference the same image — the image is built once and reused, rather than each task re-specifying its full setup.
**Status: decided.**
---
### Worktree layers (overlayfs)
A run may check out one or more repositories. Each repository produces two separately managed Docker volumes:
**Base worktree** — one named Docker volume per repository entry in the checkout definition, each containing a plain `git checkout` of that repository at its configured revision. The volume name is derived from a hash of the full checkout entry (owner, repo URL, revision, key, submodules flag, etc.) with keys sorted before hashing so field order in the config does not affect the result. Any change to any field produces a distinct volume. Base worktree volumes are read-only from the task container's perspective. Whether each is reused or re-checked-out on each run depends on whether the revision is mutable (see below).
**Worktree mutations** — a named Docker volume holding the overlayfs upper layer for a specific run. Unique per run. Contains only what the container wrote or modified — the base worktree is never touched.
Each repository is mounted at a distinct path inside the container under `/src/<name>`. The mount name defaults to the repository basename (mimicking `git clone` behaviour) but can be overridden per entry in the task declaration. From the outside, the CI system names and manages volumes per repository entry independently.
Using named Docker volumes means the checkout is a straightforward `git clone` or `git checkout` into a volume — no need for the `--git-dir` / `--work-tree` / `GIT_INDEX_FILE` technique. That approach remains useful for direct host-based deployments but is out of scope here.
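
The volume naming described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `ci-base-` prefix, the SHA-256 choice, the 32-character truncation, and the example entry fields are all assumptions made for the example.

```python
import hashlib
import json


def base_volume_name(entry: dict, prefix: str = "ci-base-") -> str:
    """Derive a deterministic volume name from a full checkout entry."""
    # sort_keys=True makes the serialisation independent of field order.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncating keeps names readable; the length is an arbitrary choice.
    return prefix + digest[:32]


# Hypothetical checkout entry; the field names are illustrative only.
entry = {
    "owner": "alice",
    "url": "git@gitea.example:alice/site.git",
    "revision": "main",
    "submodules": False,
}
reordered = dict(reversed(list(entry.items())))
changed = {**entry, "revision": "release"}
```

Here `base_volume_name(entry)` and `base_volume_name(reordered)` give the same name, while `changed` maps to a different volume because one field differs.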
#### Two-step sequence
The worktree is prepared in two distinct container runs:
1. **Checkout container** — mounts the base worktree volume as writable. User credentials are injected here. Runs `git clone` / `git checkout` for the target revision, then exits. Keeps git operations and credential handling sandboxed away from the host.
2. **Task container** — mounts base worktree read-only (overlayfs lower) and mutations volume writable (overlayfs upper). Runs the actual CI job. Secrets are mounted only if explicitly declared in the task configuration.
The base worktree volume is writable only during step 1. In step 2 it is always mounted `:ro`, which the kernel enforces. Any writes from the task container go to the mutations volume via overlayfs and cannot affect the base worktree.
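
As a rough sketch of the two steps, the helpers below build `docker run` argument lists with the mount modes described above. The image names, the `/worktree`, `/lower`, `/upper` and `/run/secrets/deploy_key` paths are placeholders invented for the example, and how the overlay is actually composed inside Docker is still open, so the task mounts only show the read-only / writable split.

```python
def checkout_argv(base_volume: str, key_path: str, image: str = "ci-checkout") -> list[str]:
    """Step 1: base worktree writable, deploy key mounted read-only."""
    return [
        "docker", "run", "--rm",
        "-v", f"{base_volume}:/worktree",
        "-v", f"{key_path}:/run/secrets/deploy_key:ro",
        image,
    ]


def task_argv(base_volume: str, mutations_volume: str, name: str, image: str) -> list[str]:
    """Step 2: base worktree read-only (lower), mutations writable (upper)."""
    return [
        "docker", "run",
        "-v", f"{base_volume}:/lower/{name}:ro",
        "-v", f"{mutations_volume}:/upper/{name}",
        image,
    ]


step1 = checkout_argv("ci-base-abc", "/srv/ci-secrets/git-ssh/alice/private/site")
step2 = task_argv("ci-base-abc", "ci-mut-run42", "site", "ci-task")
```

No credentials appear in `step2`: secrets would be added there only when the task configuration declares them.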
#### Mutable vs immutable revisions
Each repository entry in a task declaration is flagged as mutable or immutable:
- **Immutable** (`mutable: false`) — the revision is a fixed sha. The checkout container runs once; on subsequent runs the existing volume is reused as-is.
- **Mutable** (`mutable: true`) — the revision is a ref name (branch, tag). The checkout container runs on every run to pick up any changes. The user is responsible for asserting correctness — the system trusts the flag.
There is no automatic resolution of ref names to shas. Attempting to detect whether a string is a sha or a ref name without querying the remote is unreliable and a potential security issue, and querying the remote for sha verification requires credentials and a round trip. The mutable flag keeps this explicit and honest.
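
The rule above reduces to a small decision, sketched here. The function name and the `volume_exists` input are assumptions; the system would need some way to know whether the base volume has already been populated.

```python
def needs_checkout(mutable: bool, volume_exists: bool) -> bool:
    """Decide whether the checkout container must run for this entry."""
    if mutable:
        # Ref names may have moved, so always re-run the checkout.
        return True
    # Fixed sha: check out once, then reuse the existing volume as-is.
    return not volume_exists
```

Note that the flag alone drives the decision. The revision string is never inspected, which matches the "trust the flag" stance above.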
> [!NOTE]
> The checkout container should restore file mtimes from git history after cloning (e.g. via `git restore-mtime`). This matters for deployment chains where downstream steps may rely on mtimes to detect what changed.
#### Credentials
Each `(owner, repo)` registration generates a unique ed25519 SSH key pair. The private key is stored in the CI secrets store scoped to that registration. The public key is presented to the user to add as a read-only deploy key on the repo in Gitea.
At checkout time only the private key for that specific repo is mounted into the checkout container; no other keys are present. A compromised container can therefore reach only the repos on which that key is registered as a deploy key.
Initially the public key is registered manually by the user. Automating this via the Gitea API (`POST /repos/{owner}/{repo}/keys`) is a future improvement.
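
For reference, a request for that future automation might be built roughly as below. This is a sketch under assumptions: the `/api/v1` prefix, the `title` / `key` / `read_only` body fields and the `Authorization: token ...` header reflect Gitea's API as I understand it and should be checked against the instance's Swagger documentation; the function name and all argument values are hypothetical. The request is only constructed, never sent.

```python
import json
import urllib.request


def deploy_key_request(base_url: str, owner: str, repo: str,
                       title: str, public_key: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that registers a read-only deploy key."""
    body = json.dumps({"title": title, "key": public_key, "read_only": True})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/repos/{owner}/{repo}/keys",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; needs a user token with repo admin rights.
            "Authorization": f"token {token}",
        },
    )


request = deploy_key_request(
    "https://gitea.example", "alice", "site",
    "ci-checkout", "ssh-ed25519 EXAMPLE ci", "placeholder-token",
)
```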
Keys are a first-class resource managed independently from tasks. The user creates and names them via the CI management dashboard, views the public half to register it in Gitea, and deletes them when no longer needed (with a warning if any tasks still reference the key). The private half never leaves the secrets store.
The CI system enforces no scoping policy — the user decides how broadly or narrowly to scope each key. They may use one key per repo, one key for all their repos, or any grouping that makes sense to them. Tasks reference a key by ID.
**Status: decided. Secrets store layout decided (see below). Automated key registration via Gitea API is open.**
#### Service user
All CI host operations (key generation, secrets storage, mounting volumes into containers) run as a dedicated service user — not a human account, no login shell. This user owns `secrets_base` and the CI server process runs under it.
Initially the service user is created manually during bring-up. A future Debian package should create it automatically via `adduser --system` if it does not already exist, following standard Debian packaging conventions. The package should also ensure ownership of `secrets_base` is set correctly on install.
The CI server process running as this user has access to all secrets in `secrets_base`. The checkout container always gets the deploy key for its repo mounted for the duration of the clone only. Whether secrets are also mounted into the task container is a per-task configuration decision — some tasks need no secrets (e.g. rsync to a local bind mount), others require them (e.g. SSH deploy to a remote server, pushing to a registry, publishing a package). A secrets broker would just move the same level of trust up one hop without meaningfully changing what gets mounted where.
**Status: manual setup for now. Debian packaging is a future goal.**
---
#### Secrets store layout
```
<secrets_base>/
  git-ssh/
    <user>/
      public/
        <key_name>    (public key — safe to display)
      private/
        <key_name>    (private key — mode 0600, never leaves the store)
```
`secrets_base` is a host directory (e.g. `/srv/ci-secrets`). The `git-ssh/` scope leaves room for future secret types without polluting the root. Key names are chosen by the user and have no enforced format.
The private key is mounted into the checkout container for the duration of the clone and unmounted when the container exits.
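
The layout above could be written to disk along these lines. The helper names are made up for the example, and key generation itself is left out: the sketch takes the key material as bytes and only shows the paths and the `0600` mode.

```python
import os
import stat
import tempfile
from pathlib import Path


def key_paths(secrets_base: Path, user: str, key_name: str) -> tuple[Path, Path]:
    """Return (public_path, private_path) following the store layout."""
    root = secrets_base / "git-ssh" / user
    return root / "public" / key_name, root / "private" / key_name


def store_key_pair(secrets_base: Path, user: str, key_name: str,
                   public_key: bytes, private_key: bytes) -> tuple[Path, Path]:
    public_path, private_path = key_paths(secrets_base, user, key_name)
    public_path.parent.mkdir(parents=True, exist_ok=True)
    private_path.parent.mkdir(parents=True, exist_ok=True)
    public_path.write_bytes(public_key)
    # Create the private file with mode 0600 from the start, and refuse to
    # overwrite an existing key (O_EXCL).
    fd = os.open(private_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(private_key)
    return public_path, private_path


# Demonstration against a throwaway directory standing in for secrets_base.
with tempfile.TemporaryDirectory() as tmp:
    public_path, private_path = store_key_pair(
        Path(tmp), "alice", "site-deploy", b"example public key\n", b"example private key\n"
    )
    private_mode = stat.S_IMODE(private_path.stat().st_mode)
    relative_private = private_path.relative_to(tmp).as_posix()
```

Because key names become file names here, a name containing a path separator would escape its directory; the sketch does not guard against that.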
---
#### Worktree scoping
Base worktree volumes are scoped per owner. Access control is handled entirely by Gitea via SSH deploy keys — the checkout container only has the key for the specific repo it is cloning, so Gitea enforces what it can and cannot access. The per-owner scoping is about cache ownership and lifecycle: each owner manages and cleans up their own volumes independently.
#### Cleanup
Cleanup of base worktrees and worktree mutations is deferred. It may involve both manual and automatic steps. No eviction policy is defined yet.
**Status: decided. Implementation details (how overlayfs upper layer is composed with the base worktree volume inside Docker) are still open.**
---
### Container reuse (stop/start vs recreate)
Containers are not destroyed between runs unless explicitly evicted. A stopped container retains its internal state. This enables:
- **Debugging**: exec into a stopped container, patch something, restart without rebuilding.
- **Multi-image pipelines**: checkpoint after stage A, retry stage B without re-running A.
- **Faster iteration**: start/stop overhead is lower than image pull + container create.
Lifecycle states: `created → running → stopped → (restarted | removed)`
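
One way to read the lifecycle line is as a small transition table, sketched below. Modelling "restarted" as a move from `stopped` back to `running` is an interpretation, and the names are illustrative.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"
    REMOVED = "removed"


# Allowed moves, following the lifecycle line above.
TRANSITIONS = {
    State.CREATED: {State.RUNNING},
    State.RUNNING: {State.STOPPED},
    State.STOPPED: {State.RUNNING, State.REMOVED},  # restart, or evict
    State.REMOVED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

Only the `stopped → removed` edge depends on the eviction policy, which is the part still open.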
**Status: decided. Eviction policy (when to actually remove containers) is open.**
---
### Caches as bind mounts
Caches (e.g. Rust crate registry, npm cache, Maven local repo) are bind-mounted into containers at well-known paths. Cache selection is configurable per task.
#### Ownership model
| Type | Description |
|---|---|
| **Private** | Owned by one project. Removed when that project releases it. |
| **Shared** | Multiple projects declare ownership. Reference-counted — not removed until all owners release it. |
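
The ownership table can be modelled as reference counting over a set of owners, with a private cache being the single-owner case. The class and method names below are assumptions for illustration; persistence and quota are not covered.

```python
class CacheRegistry:
    """Tracks which projects currently own each cache."""

    def __init__(self) -> None:
        self._owners: dict[str, set[str]] = {}

    def acquire(self, cache: str, project: str) -> None:
        """Declare that `project` owns `cache`."""
        self._owners.setdefault(cache, set()).add(project)

    def release(self, cache: str, project: str) -> bool:
        """Drop one owner; return True when the cache may be removed."""
        owners = self._owners.get(cache, set())
        owners.discard(project)
        if owners:
            return False
        self._owners.pop(cache, None)
        return True


registry = CacheRegistry()
registry.acquire("cargo-registry", "project-a")
registry.acquire("cargo-registry", "project-b")
first = registry.release("cargo-registry", "project-a")
second = registry.release("cargo-registry", "project-b")
```

In this run `first` is `False` (another owner remains) and `second` is `True` (the last owner released it).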
#### Quota
Each cache has a quota. The system must tally usage and enforce limits. Shared caches split responsibility across owners (exact apportionment policy is open).
**Status: concept decided. The following are open:**
- Quota enforcement mechanism (inotify + du polling? quotafs? btrfs subvolume quota?)
- How "releasing" a cache is triggered (explicit API call, TTL, run completion hook?)
- How shared cache conflicts are handled if owners have incompatible contents (e.g. different versions of a tool populating the same cache path)
- Whether caches can be layered the same way worktrees are (read-only shared base + writable upper)
---
### Multi-image pipelines
Tasks may be orchestrated in sequence. The output layer of one stage can become an additional lower layer in the overlayfs stack for the next stage. This avoids re-running earlier stages when retrying later ones.
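
To make the layer stacking concrete, here is a sketch that builds the option string for a kernel overlay mount from an ordered list of layers. The paths are hypothetical, and whether the system ends up mounting overlayfs this way is part of the open mechanism question; the only fixed point is the kernel's convention that the leftmost `lowerdir` entry is the topmost layer.

```python
def overlay_options(lower_layers: list[str], upper: str, work: str) -> str:
    """Build overlayfs mount options.

    `lower_layers` is ordered bottom to top: base worktree first, then the
    output of each completed stage.
    """
    if not lower_layers:
        raise ValueError("at least one lower layer is required")
    # overlayfs stacks lowerdir entries right to left, so reverse the list
    # to put the most recent stage output on top.
    lowerdir = ":".join(reversed(lower_layers))
    return f"lowerdir={lowerdir},upperdir={upper},workdir={work}"


options = overlay_options(["/layers/base", "/layers/stage-a"], "/layers/stage-b", "/layers/work")
```

Retrying stage B then means discarding `/layers/stage-b` and mounting again; the stage A layer stays untouched underneath.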
**Status: directionally decided. Exact mechanism for passing outputs between stages is open.**
---
### Deployment bind mounts
Certain directories on the host (e.g. `sites.efforting.tech` web root) are bind-mounted into containers as write targets. The task writes its output there directly, which constitutes deployment.
Access control (which tasks are allowed to write which mounts) is a security consideration that must be addressed before this is used in untrusted contexts.
**Status: approach decided. Permission model is open.**
---
### Progress reporting
stdout/stderr from containers is captured by streaming `docker compose up` output directly to the CI server process. The container runs its task and exits naturally — no long-running idle process, no exec. This requires no networking inside the container.
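
A minimal sketch of that capture loop, assuming the CI server is a process that can spawn the command directly. The function name and the callback shape are invented for the example, and a harmless Python one-liner stands in for the real `docker compose up` invocation so the block runs anywhere.

```python
import subprocess
import sys
from typing import Callable, Sequence


def stream_output(argv: Sequence[str], on_line: Callable[[bytes], None]) -> int:
    """Run a command, forward each output line as raw bytes, return the exit code."""
    process = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into one ordered stream
    )
    assert process.stdout is not None
    for line in process.stdout:
        # Lines stay as bytes so non-UTF-8 output is passed through unchanged.
        on_line(line)
    return process.wait()


# Stand-in for something like ["docker", "compose", "up"].
lines: list[bytes] = []
exit_code = stream_output(
    [sys.executable, "-c", "print('hello'); print('done')"],
    lines.append,
)
```

Merging stderr into stdout is a simplification; keeping the two streams separate would need two readers.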
For richer or asynchronous progress reporting (e.g. from a long-running build that wants to emit status mid-run), a future option is a callback channel from inside the container to the CI server — likely a Unix socket or HTTP POST to a known address. This requires limited networking back to the host only, not full WAN access.
A dedicated **text stream server** will handle log distribution:
- Live runs: consumers can subscribe and receive output as it arrives
- Past runs: logs are stored and retrievable by run ID
- The CI server pipes container stdout/stderr into this service rather than handling log storage itself
This keeps log concerns out of the CI orchestrator and gives a single place to tail, replay, or archive output.
**Status: stdout capture via `compose up` is the current approach. Text stream server is planned but not yet designed.**
A **web interface** will consume the text stream server and render output in the browser with full ANSI escape support. We will implement our own renderer — asciinema was considered but its `.cast` format silently replaces non-UTF-8 bytes with U+FFFD and has no binary escape hatch, making it a poor fit for a general log store.
The set of ANSI escapes seen in real CI output is small and well-defined:
- SGR colour/style codes (`\e[...m`) — foreground/background colours, bold, dim, reset
- Cursor movement (`\e[A/B/C/D`, `\e[H`, `\e[f`) — used by progress bars
- Erase (`\e[K`, `\e[2J`) — line/screen clearing, also used by progress bars
A purpose-built renderer targeting only these sequences is straightforward and gives full control over the UI.
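
As an illustration of how small the surface is, the sketch below splits text into plain runs and the escape sequences listed above. It is a tokeniser only, not a renderer; the token names are made up, it works on decoded text for brevity (a log store holding raw bytes would use a bytes pattern instead), and any escape outside the listed set is left in the text untouched.

```python
import re

# CSI sequences limited to the final bytes listed above.
CSI = re.compile(r"\x1b\[([0-9;]*)([A-DHfKJm])")

KINDS = {
    "m": "sgr",
    "A": "cursor", "B": "cursor", "C": "cursor", "D": "cursor",
    "H": "cursor", "f": "cursor",
    "K": "erase", "J": "erase",
}


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", ...) runs and (kind, params + final) escapes."""
    tokens: list[tuple[str, str]] = []
    position = 0
    for match in CSI.finditer(text):
        if match.start() > position:
            tokens.append(("text", text[position:match.start()]))
        params, final = match.groups()
        tokens.append((KINDS[final], params + final))
        position = match.end()
    if position < len(text):
        tokens.append(("text", text[position:]))
    return tokens


tokens = tokenize("\x1b[31mred\x1b[0m ok\x1b[K")
```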
**Status: decided (custom renderer). Design and implementation deferred.**
To maximise colour output from tools running inside containers, we set a combination of environment variables since there is no standard "enable colour, no interaction" signal — tools each have their own heuristic. The baseline set:
| Variable | Effect |
|---|---|
| `TERM=xterm-256color` | Tells tools the terminal supports 256 colours |
| `FORCE_COLOR=1` | Respected by Node.js ecosystem tools |
| `CLICOLOR_FORCE=1` | Respected by many Unix tools |
| `NO_COLOR` | Must be *unset* (tools that follow the `NO_COLOR` convention disable colour whenever it is set to a non-empty value) |
Individual tools may need additional flags (e.g. `--color=always` for git, cargo). These can be set per image or per task declaration.
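
Applied to a container's environment, the baseline set might look like this. The function name is an assumption; the point is that `NO_COLOR` is removed rather than set to an empty or false-looking value.

```python
COLOUR_ENV = {
    "TERM": "xterm-256color",
    "FORCE_COLOR": "1",
    "CLICOLOR_FORCE": "1",
}


def colour_env(base: dict[str, str]) -> dict[str, str]:
    """Return a copy of `base` with the baseline colour variables applied."""
    env = dict(base)
    # NO_COLOR must be absent entirely, not merely set to something else.
    env.pop("NO_COLOR", None)
    env.update(COLOUR_ENV)
    return env


env = colour_env({"PATH": "/usr/bin", "NO_COLOR": "1", "TERM": "dumb"})
```

Per-image or per-task overrides can then be layered on top of the returned dictionary.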
> [!NOTE]
> **Future / low priority:** Allocating a PTY for the container process would make `isatty()` return true, solving the colour detection problem universally without any env var hacks. The trade-off is added complexity in I/O handling (terminal control sequences, input plumbing). Not worth pursuing until the env var approach proves insufficient.
---
### Network isolation
WAN access is both expected and fine for most tasks — containers need to fetch dependencies, push to registries, etc. The specific concern is containers reaching the host's localhost, which could expose internal services as an unintended back channel.
The default network policy is therefore: WAN allowed, host localhost blocked. Network policy is configurable per task — tasks that need no network at all (e.g. the rsync publish experiment) can use `network_mode: none`.
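How to enforce the host localhost restriction is still open. Inside a container, `127.0.0.1` is the container's own loopback, so on a bridge network the host is reached through its bridge gateway address instead. The likely mechanism is therefore a firewall rule on the Docker bridge interface that blocks container traffic addressed to the host itself.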
**Status: default policy decided. Enforcement mechanism is open.**
---
### Cross-platform builds
Builds targeting non-Linux platforms (Windows, macOS) would require QEMU or separate VPS instances. This is explicitly deferred — it will not be part of the initial architecture.
**Status: deferred.**
---
## Summary of open questions
| Topic | What's open |
|---|---|
| Overlayfs + Docker | How to compose overlayfs worktree mounts with Docker's own storage driver |
| Worktree cache invalidation | How and when owner worktree caches are invalidated on access revocation |
| Container eviction | When and how stopped containers are removed |
| Cache quota enforcement | Mechanism for measuring and enforcing per-cache quotas |
| Cache release trigger | How a project signals it no longer needs a cache |
| Shared cache conflicts | How to handle incompatible writes from different owners |
| Layered caches | Whether caches can use the same overlay approach as worktrees |
| Deployment permissions | Which tasks are allowed to write which bind-mounted targets |
| SSH key registration | Automate public key registration via Gitea API (`POST /repos/{owner}/{repo}/keys`) — requires a Gitea user token with repo admin permissions, which is a separate credential. Only worth pursuing if the CI system already holds a user token for other reasons (status checks, repo metadata etc.). |
| Network isolation | How to block host localhost from container network namespaces |
| Progress reporting | In-container callback channel (socket/HTTP) for async mid-run status |
| Multi-stage output passing | Exact format/protocol for stage-to-stage data handoff |