OpenHome Abilities have access to raw audio recording from the device microphone — independent of the built-in STT pipeline. This means an Ability can:
  1. Start recording while OpenHome’s normal conversation flow keeps running
  2. Capture audio for seconds, minutes, or hours — as long as the Ability is active
  3. Retrieve the raw audio bytes when recording stops
  4. Send those bytes anywhere — Deepgram, ElevenLabs, any audio API
Until now, Abilities could only access what the user said (as text, after STT processed it). Now they can access what the microphone heard — multiple speakers, background sounds, music, ambient noise, tone of voice, everything. Combined with Deepgram’s API (Nova-3 transcription, diarization, sentiment, topic detection, summarization), this unlocks an entirely new class of Abilities.

The core pattern

Every hot-mic Ability follows the same architecture:
import requests

# 1. Start recording (mic stays hot — OpenHome STT still works in parallel)
self.capability_worker.start_audio_recording()

# 2. Do whatever you need while recording runs...
#    - Listen for a stop command via user_response()
#    - Run a timer with session_tasks.sleep()
#    - Continue normal conversation

# 3. Stop recording
self.capability_worker.stop_audio_recording()

# 4. Get the raw audio bytes
audio_bytes = self.capability_worker.get_audio_recording()
recording_length = self.capability_worker.get_audio_recording_length()

# 5. Send to Deepgram (or any audio API)
response = requests.post(
    "https://api.deepgram.com/v1/listen",
    headers={
        "Authorization": "Token YOUR_DEEPGRAM_KEY",
        "Content-Type": "audio/wav",
    },
    params={
        "model": "nova-3",
        "diarize": "true",
        "smart_format": "true",
        "utterances": "true",
        "punctuate": "true",
        "language": "en",
    },
    data=audio_bytes,
)
deepgram_result = response.json()
OpenHome handles the mic. Deepgram handles the intelligence. Your Ability handles the logic.

What Deepgram gives you

When you send audio to Deepgram’s pre-recorded endpoint, you’re not just getting text back. Depending on the parameters you pass:
| Feature | Parameter | What it returns |
| --- | --- | --- |
| Speaker diarization | `diarize=true` | Labels each segment with a speaker ID (Speaker 0, Speaker 1) |
| Utterances | `utterances=true` | Groups speech into speaker turns with timestamps |
| Smart formatting | `smart_format=true` | Adds punctuation, capitalization, paragraph breaks |
| Keyword boosting | `keyterm=["OpenHome"]` | Improves accuracy for domain-specific words |
| Language detection | `detect_language=true` | Auto-detects the spoken language |
| Summarization | `summarize=v2` | Auto-generated summary of the audio |
| Topic detection | `detect_topics=true` | Identifies topics discussed |
| Sentiment analysis | `sentiment=true` | Positive / negative / neutral per utterance |
| Word timestamps | always included | Start/end time for every word |
All features can be combined in a single API call — a diarized, summarized, sentiment-analyzed transcript with topic labels and keyword boosting, all from the same audio bytes.
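Once the combined response comes back, the interesting fields live at a few nested JSON paths. A minimal parsing sketch follows; the paths mirror Deepgram's documented pre-recorded response shape (`results.channels`, `results.utterances`, `results.summary`), and `deepgram_result` below is a hard-coded stand-in for `response.json()` — verify both against the current API reference before relying on them:

```python
# Stand-in for response.json() from a diarized, summarized request.
deepgram_result = {
    "results": {
        "channels": [{"alternatives": [{"transcript": "Hello everyone."}]}],
        "utterances": [
            {"speaker": 0, "start": 0.0, "end": 1.2, "transcript": "Hello everyone."}
        ],
        "summary": {"short": "A brief greeting."},
    }
}

results = deepgram_result["results"]

# Full transcript lives under the first channel's top alternative.
transcript = results["channels"][0]["alternatives"][0]["transcript"]

# Summary is only present when summarize=v2 was requested, hence .get().
summary = results.get("summary", {}).get("short", "")

# Diarized speaker turns, one line per utterance.
turns = [
    f"Speaker {u['speaker']}: {u['transcript']}"
    for u in results.get("utterances", [])
]
```

The `.get()` fallbacks matter in practice: fields like `summary` and `utterances` only appear when the matching parameter was passed in the request.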

Recording methods

| Method | What it does |
| --- | --- |
| `start_audio_recording()` | Opens the mic buffer. Recording runs in the background. |
| `stop_audio_recording()` | Closes the mic buffer. |
| `get_audio_recording()` | Returns raw audio as bytes. |
| `get_audio_recording_length()` | Returns duration in seconds. |
| `flush_audio_recording()` | Clears the buffer so the next recording starts fresh. |
OpenHome’s normal STT/TTS pipeline keeps running while recording is active. The user can still talk to OpenHome, trigger other commands, and interact normally. Recording happens in parallel — a background capture, not a modal takeover.

Showcase examples

Five Ability ideas built on this pattern. For the full catalog of 20+, see the Simple Abilities Cookbook.

1. Meeting Notes

“Hey OpenHome, take notes.” Records the entire meeting. On “meeting finished,” sends audio to Deepgram with diarization. Returns formatted notes with speaker labels, summary, and action items.
self.capability_worker.start_audio_recording()
while True:
    user_input = await self.capability_worker.user_response()
    if "meeting finished" in user_input.lower():
        break
self.capability_worker.stop_audio_recording()
audio_bytes = self.capability_worker.get_audio_recording()
# → Send to Deepgram with diarize=true, utterances=true

2. Baby Monitor

“Hey OpenHome, listen to the nursery.” Records ambient audio on a rolling basis (e.g., 30-second windows). Analyzes each window for crying, coughing, or silence-breaking events. Alerts via TTS.
while self.is_monitoring:
    self.capability_worker.flush_audio_recording()  # start each window fresh
    self.capability_worker.start_audio_recording()
    await self.worker.session_tasks.sleep(30)
    self.capability_worker.stop_audio_recording()
    audio_bytes = self.capability_worker.get_audio_recording()
    # Analyze with Deepgram or sound classifier
    # If event detected → alert via speak()

3. Public Speaking Coach

“Hey OpenHome, coach my presentation.” Records a practice run. Analyzes pace (words per minute from timestamps), filler count (“um”, “uh”, “like”), and sentence variation. LLM generates coaching notes.
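The pace and filler metrics fall out of the per-word timestamps directly. A sketch, where `words` mimics the per-word entries in a Deepgram response (`results.channels[0].alternatives[0].words` — an assumed path, check the API reference) rather than a live API call:

```python
# Stand-in for the per-word entries of a transcription response.
words = [
    {"word": "um", "start": 0.0, "end": 0.3},
    {"word": "welcome", "start": 0.4, "end": 0.9},
    {"word": "to", "start": 1.0, "end": 1.1},
    {"word": "the", "start": 1.2, "end": 1.3},
    {"word": "demo", "start": 1.4, "end": 2.0},
]

FILLERS = {"um", "uh", "like"}

# Pace: words divided by elapsed speaking time, in minutes.
duration_min = (words[-1]["end"] - words[0]["start"]) / 60
wpm = len(words) / duration_min

# Filler count: case-insensitive match against a small filler set.
filler_count = sum(1 for w in words if w["word"].lower() in FILLERS)
```

Feed `wpm` and `filler_count` into the LLM prompt alongside the transcript and it has concrete numbers to coach against.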

4. Voice Journal

“Hey OpenHome, start my journal.” Records a free-form spoken entry. Transcribes with Deepgram, then LLM formats it into a clean journal entry with date, mood detection (sentiment), and key topics.
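Assembling the entry itself is plain string work once the transcript and sentiment are in hand. A sketch, assuming the mood string was pulled from Deepgram's average sentiment (a field name to verify against the API reference) — both values are hard-coded stand-ins here:

```python
from datetime import date

# Stand-ins for the Deepgram transcript and average sentiment label.
transcript = "Spent the morning hiking. Felt great about the week ahead."
mood = "positive"

# Format a dated journal entry with the detected mood.
entry = f"## {date.today().isoformat()}\nMood: {mood}\n\n{transcript}"
```

An LLM pass can then tidy the transcript and tag key topics before the entry is stored.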

5. Noise Level Monitor

“Hey OpenHome, monitor the noise level.” Analyzes raw PCM amplitude without any external API. Reports quiet stretches and noisy spikes. Triggers focus reminders.
import struct

# Assumes 16-bit little-endian PCM samples
samples = struct.unpack(f"<{len(audio_bytes)//2}h", audio_bytes)
peak = max(abs(s) for s in samples)
rms = (sum(s**2 for s in samples) / len(samples)) ** 0.5
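To turn that RMS value into something reportable, convert it to dBFS relative to the 16-bit full scale (32768) and compare against a threshold. The -30 dBFS cutoff below is an arbitrary example, not a recommended value, and `rms` is a stand-in for the value computed above:

```python
import math

rms = 3276.8  # stand-in for the RMS computed from the PCM samples

# dBFS: decibels relative to full scale for 16-bit audio (0 dBFS = 32768).
dbfs = 20 * math.log10(rms / 32768)

# Flag noisy windows; tune the threshold per room.
is_noisy = dbfs > -30.0
```

Because this is pure arithmetic on the raw bytes, the whole Ability runs locally with no network round-trip.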

Why this matters

Before the hot mic, Abilities were reactive — they activated on a trigger word, had a conversation, and exited. The microphone was a command input device. Now the microphone is a sensor. It can run continuously, capture rich audio data, and feed it to external intelligence services. This turns OpenHome from a voice assistant into an ambient computing platform. Because the output is just bytes, you can send them anywhere:
  • Deepgram for transcription, diarization, sentiment, topics, summarization
  • ElevenLabs for voice cloning (already used in AI Twin)
  • Any sound classification API for non-speech audio events
  • Your own models for custom audio analysis
  • Local processing for amplitude analysis, silence detection, etc.

See also