OpenHome Abilities have access to raw audio recording from the device microphone — independent of the built-in STT pipeline. This means an Ability can:
  1. Start recording while OpenHome’s normal conversation flow keeps running
  2. Capture audio for seconds, minutes, or hours — as long as the Ability is active
  3. Retrieve the raw audio bytes when recording stops
  4. Send those bytes anywhere — Deepgram, ElevenLabs, any audio API
Until now, Abilities could only access what the user said (as text, after STT processed it). Now they can access what the microphone heard — multiple speakers, background sounds, music, ambient noise, tone of voice, everything. Combined with Deepgram’s API (Nova-3 transcription, diarization, sentiment, topic detection, summarization), this unlocks an entirely new class of Abilities.

The core pattern

Every hot-mic Ability follows the same architecture:
import requests

# 1. Start recording (mic stays hot — OpenHome STT still works in parallel)
self.capability_worker.start_audio_recording()

# 2. Do whatever you need while recording runs...
#    - Listen for a stop command via user_response()
#    - Run a timer with session_tasks.sleep()
#    - Continue normal conversation

# 3. Stop recording
self.capability_worker.stop_audio_recording()

# 4. Get the raw audio bytes
audio_bytes = self.capability_worker.get_audio_recording()
recording_length = self.capability_worker.get_audio_recording_length()

# 5. Send to Deepgram (or any audio API)
response = requests.post(
    "https://api.deepgram.com/v1/listen",
    headers={
        "Authorization": "Token YOUR_DEEPGRAM_KEY",
        "Content-Type": "audio/wav",
    },
    params={
        "model": "nova-3",
        "diarize": "true",
        "smart_format": "true",
        "utterances": "true",
        "punctuate": "true",
        "language": "en",
    },
    data=audio_bytes,
)
deepgram_result = response.json()
OpenHome handles the mic. Deepgram handles the intelligence. Your Ability handles the logic.

What Deepgram gives you

When you send audio to Deepgram’s pre-recorded endpoint, you’re not just getting text back. Depending on the parameters you pass:
| Feature | Parameter | What it returns |
| --- | --- | --- |
| Speaker diarization | `diarize=true` | Labels each segment with a speaker ID (Speaker 0, Speaker 1) |
| Utterances | `utterances=true` | Groups speech into speaker turns with timestamps |
| Smart formatting | `smart_format=true` | Adds punctuation, capitalization, paragraph breaks |
| Keyword boosting | `keyterm=["OpenHome"]` | Improves accuracy for domain-specific words |
| Language detection | `detect_language=true` | Auto-detects the spoken language |
| Summarization | `summarize=v2` | Auto-generated summary of the audio |
| Topic detection | `detect_topics=true` | Identifies topics discussed |
| Sentiment analysis | `sentiment=true` | Positive / negative / neutral per utterance |
| Word timestamps | always included | Start/end time for every word |
All features can be combined in a single API call — a diarized, summarized, sentiment-analyzed transcript with topic labels and keyword boosting, all from the same audio bytes.
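Once the combined response comes back, the interesting fields live at a few nested JSON paths. A minimal parsing sketch follows; the paths mirror Deepgram's documented pre-recorded response shape (`results.channels`, `results.utterances`, `results.summary`), and `deepgram_result` below is a hard-coded stand-in for `response.json()` — verify both against the current API reference before relying on them:

```python
# Stand-in for response.json() from a diarized, summarized request.
deepgram_result = {
    "results": {
        "channels": [{"alternatives": [{"transcript": "Hello everyone."}]}],
        "utterances": [
            {"speaker": 0, "start": 0.0, "end": 1.2, "transcript": "Hello everyone."}
        ],
        "summary": {"short": "A brief greeting."},
    }
}

results = deepgram_result["results"]

# Full transcript lives under the first channel's top alternative.
transcript = results["channels"][0]["alternatives"][0]["transcript"]

# Summary is only present when summarize=v2 was requested, hence .get().
summary = results.get("summary", {}).get("short", "")

# Diarized speaker turns, one line per utterance.
turns = [
    f"Speaker {u['speaker']}: {u['transcript']}"
    for u in results.get("utterances", [])
]
```

The `.get()` fallbacks matter in practice: fields like `summary` and `utterances` only appear when the matching parameter was passed in the request.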

Recording methods

| Method | What it does |
| --- | --- |
| `start_audio_recording()` | Opens the mic buffer. Recording runs in the background. |
| `stop_audio_recording()` | Closes the mic buffer. |
| `get_audio_recording()` | Returns raw audio as bytes. |
| `get_audio_recording_length()` | Returns duration in seconds. |
| `flush_audio_recording()` | Clears the buffer so the next recording starts fresh. |
OpenHome’s normal STT/TTS pipeline keeps running while recording is active. The user can still talk to OpenHome, trigger other commands, and interact normally. Recording happens in parallel — a background capture, not a modal takeover.

Showcase examples

Five Ability ideas built on this pattern. For the full catalog of 20+, see the Simple Abilities Cookbook.

1. Meeting Notes

“Hey OpenHome, take notes.” Records the entire meeting. On “meeting finished,” sends audio to Deepgram with diarization. Returns formatted notes with speaker labels, summary, and action items.
self.capability_worker.start_audio_recording()
while True:
    user_input = await self.capability_worker.user_response()
    if "meeting finished" in user_input.lower():
        break
self.capability_worker.stop_audio_recording()
audio_bytes = self.capability_worker.get_audio_recording()
# → Send to Deepgram with diarize=true, utterances=true

2. Baby Monitor

“Hey OpenHome, listen to the nursery.” Records ambient audio on a rolling basis (e.g., 30-second windows). Analyzes each window for crying, coughing, or silence-breaking events. Alerts via TTS.
while self.is_monitoring:
    self.capability_worker.flush_audio_recording()  # start each window fresh
    self.capability_worker.start_audio_recording()
    await self.worker.session_tasks.sleep(30)
    self.capability_worker.stop_audio_recording()
    audio_bytes = self.capability_worker.get_audio_recording()
    # Analyze with Deepgram or sound classifier
    # If event detected → alert via speak()

3. Public Speaking Coach

“Hey OpenHome, coach my presentation.” Records a practice run. Analyzes pace (words per minute from timestamps), filler count (“um”, “uh”, “like”), and sentence variation. LLM generates coaching notes.
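The pace and filler metrics fall out of the per-word timestamps directly. A sketch, where `words` mimics the per-word entries in a Deepgram response (`results.channels[0].alternatives[0].words` — an assumed path, check the API reference) rather than a live API call:

```python
# Stand-in for the per-word entries of a transcription response.
words = [
    {"word": "um", "start": 0.0, "end": 0.3},
    {"word": "welcome", "start": 0.4, "end": 0.9},
    {"word": "to", "start": 1.0, "end": 1.1},
    {"word": "the", "start": 1.2, "end": 1.3},
    {"word": "demo", "start": 1.4, "end": 2.0},
]

FILLERS = {"um", "uh", "like"}

# Pace: words divided by elapsed speaking time, in minutes.
duration_min = (words[-1]["end"] - words[0]["start"]) / 60
wpm = len(words) / duration_min

# Filler count: case-insensitive match against a small filler set.
filler_count = sum(1 for w in words if w["word"].lower() in FILLERS)
```

Feed `wpm` and `filler_count` into the LLM prompt alongside the transcript and it has concrete numbers to coach against.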

4. Voice Journal

“Hey OpenHome, start my journal.” Records a free-form spoken entry. Transcribes with Deepgram, then LLM formats it into a clean journal entry with date, mood detection (sentiment), and key topics.
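Assembling the entry itself is plain string work once the transcript and sentiment are in hand. A sketch, assuming the mood string was pulled from Deepgram's average sentiment (a field name to verify against the API reference) — both values are hard-coded stand-ins here:

```python
from datetime import date

# Stand-ins for the Deepgram transcript and average sentiment label.
transcript = "Spent the morning hiking. Felt great about the week ahead."
mood = "positive"

# Format a dated journal entry with the detected mood.
entry = f"## {date.today().isoformat()}\nMood: {mood}\n\n{transcript}"
```

An LLM pass can then tidy the transcript and tag key topics before the entry is stored.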

5. Noise Level Monitor

“Hey OpenHome, monitor the noise level.” Analyzes raw PCM amplitude without any external API. Reports quiet stretches and noisy spikes. Triggers focus reminders.
import struct

# Assumes 16-bit little-endian PCM samples
samples = struct.unpack(f"<{len(audio_bytes)//2}h", audio_bytes)
peak = max(abs(s) for s in samples)
rms = (sum(s**2 for s in samples) / len(samples)) ** 0.5
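To turn that RMS value into something reportable, convert it to dBFS relative to the 16-bit full scale (32768) and compare against a threshold. The -30 dBFS cutoff below is an arbitrary example, not a recommended value, and `rms` is a stand-in for the value computed above:

```python
import math

rms = 3276.8  # stand-in for the RMS computed from the PCM samples

# dBFS: decibels relative to full scale for 16-bit audio (0 dBFS = 32768).
dbfs = 20 * math.log10(rms / 32768)

# Flag noisy windows; tune the threshold per room.
is_noisy = dbfs > -30.0
```

Because this is pure arithmetic on the raw bytes, the whole Ability runs locally with no network round-trip.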

Why this matters

Before the hot mic, Abilities were reactive — they activated on a trigger word, had a conversation, and exited. The microphone was a command input device. Now the microphone is a sensor. It can run continuously, capture rich audio data, and feed it to external intelligence services. This turns OpenHome from a voice assistant into an ambient computing platform. Because the output is just bytes, you can send them anywhere:
  • Deepgram for transcription, diarization, sentiment, topics, summarization
  • ElevenLabs for voice cloning (already used in AI Twin)
  • Any sound classification API for non-speech audio events
  • Your own models for custom audio analysis
  • Local processing for amplitude analysis, silence detection, etc.

See also