Add WORKFLOWS.md — use cases and workflow descriptions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
77
WORKFLOWS.md
Normal file
@@ -0,0 +1,77 @@
# Voice Assistant — Workflows

Different ways this system is useful. Each workflow is supported by the current implementation to a varying degree; some depend on the remote voice capability, which is still in progress.

---

## Hands-free while doing something else

**Current state: works**

The most immediate use case. You're cleaning, cooking, packing, or doing physical work: anything where your hands and eyes are occupied but your mind is partly free. Instead of stopping, finding a device, and typing, you just speak.

Useful for:

- Capturing ideas before they disappear
- Adding tasks to a list
- Asking quick factual questions
- Dictating notes or reminders
- Asking for information that would otherwise interrupt the activity

Today's session was largely this: several hours of house chores while producing a voice wishlist, a cleanup plan, a system architecture, a dozen TODO items, and several code fixes, all without touching a keyboard.

---

## Outdoor thinking walk

**Current state: requires remote voice (planned)**

Walking outdoors, especially in nature, shifts the brain into a different mode: more associative, better for divergent thinking and idea generation. The old workaround was to record a stream-of-consciousness audio note, come home, and manually transcribe the useful parts. This system replaces that workflow entirely.

Instead of recording everything and filtering later:

- The STT and classifier process speech in real time
- Only actionable fragments are acted on or saved
- Ideas are captured the moment they arrive, not after a lossy review process
- The walk becomes a working session, not a recording session

Requires the phone VAD architecture: Silero VAD runs on the phone, only speech segments are sent to the server, and silence is never transmitted. Latency resets cleanly at every silence boundary.
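The segment gating described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `speech_segments`, the `Frame` type, and the `hangover` parameter are hypothetical names, and `is_speech` is a stand-in for whatever per-frame decision Silero VAD would provide on the phone.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes  # one fixed-size chunk of audio samples (assumed representation)


def speech_segments(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    hangover: int = 2,
) -> Iterator[List[Frame]]:
    """Yield runs of speech frames; silent frames are dropped, never yielded.

    `hangover` is the number of consecutive silent frames that closes a
    segment (a hypothetical tuning knob, not taken from the project).
    """
    segment: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if is_speech(frame):
            segment.append(frame)
            silent_run = 0
        elif segment:
            silent_run += 1
            if silent_run >= hangover:
                yield segment  # silence boundary: hand off and start fresh
                segment = []
                silent_run = 0
    if segment:
        yield segment  # flush speech still buffered when the stream ends
```

Each yielded segment would be the unit uploaded to the server. Because the buffer is emptied at every silence boundary, nothing carries over from one utterance to the next.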

---

## Developer assistant at the desk

**Current state: works**

Voice is a faster interface than typing for certain tasks: asking questions while reading code, dictating commit messages or notes, triggering common operations. The voice pipeline already integrates with Claude Code via window injection.

Particularly useful for:

- Questions where the answer matters but the typing is tedious
- Dispatching tasks to Claude while focused on something else in the same session
- Reviewing and commenting on code hands-free

---

## Ambient brainstorming session

**Current state: works**

No specific task. Free-form thinking out loud, with the system acting as a sounding board and scribe. Ideas, observations, and plans are captured as notes or TODO items. The AI responds, pushes back, or asks clarifying questions.

The activation phrase model (planned) makes this more natural: say "computer" to open a query, think out loud, and say "go" when you have something worth capturing. The silence timeout handles cases where a thought resolves on its own.
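One way to picture the activation phrase model is as a small state machine. The sketch below is an assumption-laden illustration, not the planned design: `QuerySession`, its fields, and the 10-second default are hypothetical, and it assumes that a query left silent past the timeout is simply dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuerySession:
    """Collects utterances between an open phrase and a commit phrase."""

    open_phrase: str = "computer"
    commit_phrase: str = "go"
    silence_timeout: float = 10.0  # seconds; assumed value
    listening: bool = False
    buffer: List[str] = field(default_factory=list)
    last_heard: float = 0.0

    def on_utterance(self, text: str, now: float) -> Optional[str]:
        """Feed one transcribed utterance; return a query when committed."""
        word = text.strip().lower()
        # The timeout is checked lazily, when the next utterance arrives.
        if self.listening and now - self.last_heard > self.silence_timeout:
            self._reset()  # assumption: a timed-out query is dropped
        if not self.listening:
            if word == self.open_phrase:
                self.listening = True
                self.last_heard = now
            return None  # speech outside an open query is ignored
        self.last_heard = now
        if word == self.commit_phrase:
            query = " ".join(self.buffer)
            self._reset()
            return query or None
        self.buffer.append(text.strip())
        return None

    def _reset(self) -> None:
        self.listening = False
        self.buffer = []
```

A caller would feed each STT result to `on_utterance` along with a monotonic timestamp and dispatch whatever non-`None` string comes back.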

---

## Background note-taking during another activity

**Current state: partial** (requires query journal, planned)

Passively capturing observations, ideas, or reminders while doing something else, without expecting a response. A lightweight "note" activation phrase would skip the classifier and Claude dispatch entirely and just log the utterance to a journal. No dispatch latency, no generation cost.
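The "note" shortcut could look something like the sketch below. Everything here is hypothetical: `route_utterance`, `NOTE_PHRASE`, and the JSON-lines journal format are illustrative choices, and `dispatch` stands in for the existing classifier and Claude path.

```python
import json
from typing import Callable, TextIO

NOTE_PHRASE = "note"  # hypothetical activation phrase


def route_utterance(
    text: str,
    journal: TextIO,
    dispatch: Callable[[str], None],
    timestamp: float,
) -> str:
    """Log "note ..." utterances to the journal; dispatch everything else."""
    words = text.strip().split(maxsplit=1)
    if words and words[0].lower().rstrip(",.:") == NOTE_PHRASE:
        body = words[1] if len(words) > 1 else ""
        # One JSON object per line, so the journal stays append-only.
        journal.write(json.dumps({"t": timestamp, "note": body}) + "\n")
        return "journal"
    dispatch(text)  # stand-in for the classifier + Claude path
    return "dispatch"
```

The note branch never calls `dispatch`, which is where the savings would come from: the only work done is a single append to the journal.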

Related to the outdoor walk workflow but applies to any activity: a commute, a gym session, watching a show and having a thought you want to keep.

---

## The bigger picture

These workflows share a core property: **the interface fits around life, rather than demanding you fit around the interface**. A keyboard and screen require you to stop, sit, focus, and type. Voice removes those requirements: when the pipeline is running, the channel is always open.

The planned improvements (activation phrases, remote access via phone, chimes, better classifier) each incrementally close the gap between the current prototype and a system that genuinely disappears into the background.