Every method below is accessed through self.capability_worker (the SDK) or self.worker (the Agent). This is the complete toolkit for building any Ability.
Twenty essential SDK methods
| # | Method | What it does | Async? | Object |
|---|---|---|---|---|
| 1 | speak(text) | Speak text aloud using the Agent’s default voice | Yes | cap_worker |
| 2 | text_to_speech(text, voice_id) | Speak with a specific ElevenLabs voice ID | Yes | cap_worker |
| 3 | user_response() | Wait for the user’s next spoken input, returns string | Yes | cap_worker |
| 4 | wait_for_complete_transcription() | Wait until the user fully finishes speaking before returning | Yes | cap_worker |
| 5 | run_io_loop(text) | Speak text, then wait for user reply (speak + listen combo) | Yes | cap_worker |
| 6 | run_confirmation_loop(text) | Speak text, loop until user says yes or no. Returns bool | Yes | cap_worker |
| 7 | text_to_text_response(prompt, history, system) | Generate LLM text response. The only sync method — no await | No | cap_worker |
| 8 | start_audio_recording() | Begin recording from device mic (runs in background) | No | cap_worker |
| 9 | stop_audio_recording() | Stop the current mic recording | No | cap_worker |
| 10 | get_audio_recording() | Returns recorded audio as .wav bytes | No | cap_worker |
| 11 | play_from_audio_file(filename) | Play an audio file bundled with your Ability | Yes | cap_worker |
| 12 | play_audio(file_content) | Play audio from bytes or file-like object | Yes | cap_worker |
| 13 | resume_normal_flow() | Hand control back to the Personality. Required on every main.py exit | No | cap_worker |
| 14 | send_interrupt_signal() | Stop current assistant output. Call before daemon speak() | Yes | cap_worker |
| 15 | write_file(name, content, temp) | Write or append to persistent or session file storage | Yes | cap_worker |
| 16 | read_file(name, temp) | Read contents of a stored file as string | Yes | cap_worker |
| 17 | check_if_file_exists(name, temp) | Returns bool — always check before reading | Yes | cap_worker |
| 18 | get_full_message_history() | Full conversation transcript from current session | No | cap_worker |
| 19 | get_timezone() | User’s timezone string, e.g. "America/Chicago" | No | cap_worker |
| 20 | session_tasks.create(coro) | Launch a managed async task. Use this instead of asyncio.create_task | No | worker |
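The methods above compose naturally. Below is a minimal sketch of how a few of them fit together in one flow; the method names match the table, but the `call_flow` helper and the surrounding scaffolding are simplified for illustration.

```python
import asyncio

# Illustrative flow: speak + listen, confirm, generate, speak, then exit.
async def call_flow(cap_worker):
    reply = await cap_worker.run_io_loop("What should I look up?")   # speak, then listen
    ok = await cap_worker.run_confirmation_loop(f"You said {reply}. Correct?")
    if ok:
        answer = cap_worker.text_to_text_response(reply)  # the one sync method: no await
        await cap_worker.speak(answer)
    cap_worker.resume_normal_flow()  # required on every exit path
```

Note that text_to_text_response() is called without await, matching row 7 of the table.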
Bonus methods
delete_file() · get_audio_recording_length() · flush_audio_recording() · send_data_over_websocket() · send_devkit_action() · get_token() · stream_init() / stream_end() · create_key() / update_key() / delete_key() / get_single_key() / get_all_keys() · update_personality_agent_prompt() · exec_local_command() · session_tasks.sleep()
OpenRouter models
Use OpenRouter (openrouter.ai) as a single API endpoint to access any model. Pick by job: fast/cheap for routing, multimodal for audio, high-quality for user-facing responses.
| # | Model | Speed | Best for | Notes |
|---|---|---|---|---|
| 1 | google/gemini-2.0-flash-001 | Very fast | Routing, general | Great all-rounder, cheap, supports audio input |
| 2 | google/gemini-2.5-flash-preview | Fast | Deep reasoning | Thinking model, more capable than 2.0 Flash |
| 3 | google/gemini-3-flash-preview | Fast | Audio analysis | Latest generation, strong multimodal |
| 4 | anthropic/claude-sonnet-4 | Medium | Quality responses | Excellent reasoning and tone control |
| 5 | anthropic/claude-haiku-4-5 | Very fast | Routing, speed | Cheapest Anthropic option, solid quality |
| 6 | openai/gpt-4o | Medium | General, vision | Strong all-rounder with multimodal support |
| 7 | openai/gpt-4o-mini | Very fast | Routing, cheap | Fast and affordable for utility tasks |
| 8 | meta-llama/llama-3.3-70b-instruct | Fast | Open source | Great quality, fast via Groq/Cerebras |
| 9 | deepseek/deepseek-r1 | Slow | Deep analysis | Reasoning model, best for complex background tasks |
| 10 | mistralai/mistral-large-latest | Medium | Multilingual | Strong European language support |
Mix models in a single Ability. Use fast/cheap (Gemini Flash, Haiku, GPT-4o-mini) for intent routing and keyword extraction. Use quality models (Claude Sonnet, GPT-4o) for user-facing spoken responses. Use multimodal (Gemini Flash/Pro) for audio analysis.
Battle-tested prompt patterns
Each prompt below is designed for voice output — short, spoken, no markdown.
1. Intent router (JSON classification)
Classify this user input. Return ONLY valid JSON, nothing else.
{"intent": "weather|timer|music|chat", "confidence": 0.0-1.0}
User: {user_input}
Use with text_to_text_response(). Always strip markdown fences before parsing JSON.
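Models often wrap JSON in markdown fences despite the "ONLY valid JSON" instruction. A small helper like this (the name `parse_intent_json` is illustrative) strips fences before parsing:

```python
import json

def parse_intent_json(raw: str) -> dict:
    """Strip any markdown code fences the model added, then parse the JSON."""
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("```")[1]        # keep the text between the fences
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]      # drop a "json" language tag
    return json.loads(cleaned.strip())
```

Pair it with the raw string returned by text_to_text_response().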
2. Persona system prompt (voice character)
You are Marcus, a brutally honest venture capitalist. You speak in short,
punchy sentences. 2-4 sentences max. No markdown, no lists. This is spoken
aloud, not a blog post. Never say "as a VC" or "in my experience".
Use as system_prompt parameter. Keep persona prompts specific about length, format, and forbidden phrases.
3. Audio analysis — Pass 1 (general)
You are an expert audio analyst. Listen carefully to this recording and
provide a detailed analysis. Describe: what type of sound, environment,
acoustic characteristics (rhythm, pitch, texture, layers), and anything
unusual. Do NOT address the user. Write as pure third-person analysis.
Use with an OpenRouter audio-capable model (Gemini Flash/Pro). Send alongside base64 WAV.
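A request body for this can be built in the OpenAI-compatible chat format that OpenRouter accepts. The exact "input_audio" content-part field names below are an assumption borrowed from the OpenAI chat schema; confirm them against OpenRouter's current docs before relying on them.

```python
import base64

def build_audio_payload(wav_bytes: bytes, prompt: str,
                        model: str = "google/gemini-2.0-flash-001") -> dict:
    # Chat-completions body in the OpenAI-compatible shape OpenRouter accepts.
    # The "input_audio" content part follows the OpenAI schema; verify the
    # field names against OpenRouter's docs, as they may differ.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "input_audio",
                 "input_audio": {
                     "data": base64.b64encode(wav_bytes).decode("ascii"),
                     "format": "wav",
                 }},
            ],
        }],
    }
```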
4. Audio analysis — Pass 2 (specific with context)
Here is a general analysis already completed:
{general_analysis}
Now answer this specific question about the audio: "{user_question}"
Be precise. Use timestamps and specific details where possible.
Inject Pass 1 results as context. The two-pass pattern hides latency while providing deep answers.
5. Conversational response (with history)
You are [persona] in conversation about [topic].
--- ANALYSIS ---
{analysis}
--- CONVERSATION ---
{chat_history}
The user just said: "{user_input}"
Respond in 1-3 sentences, spoken aloud. Don't repeat yourself.
Inject accumulated analysis + full chat history. Context compounds with every turn.
6. LLM-driven time parser (alarm pattern)
You are an alarm time parser. Current: {now_iso}, Timezone: {tz_name}
If day/date missing, respond: QUESTION:at what day?
If time missing, respond: QUESTION:at what time?
When complete, return ONLY valid JSON:
{"target_iso": "...", "human_time": "...", "timezone": "..."}
Loop up to 6 rounds. If response starts with QUESTION:, ask the user and continue.
7. Grocery list extraction (structured output)
Extract a grocery list from this transcript. Organize by section
(produce, dairy, meat, pantry). Deduplicate and clean up.
Transcript: {transcript}
Grocery List:
Turns stream-of-consciousness rambling into structured, organized output.
8. Restart vs continue intent detection
A user is in a conversation about a sound they played. Determine if
they want to listen to a NEW sound (restart) or are asking about the
current sound (continue).
User said: "{user_input}"
Return ONLY valid JSON: {"intent": "restart or continue", "confidence": 0.0}
Two-tier approach: check fast keywords first, fall back to LLM only for ambiguous input.
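The two-tier approach might look like this; the keyword set and helper name are illustrative, and `llm_classify` stands in for a text_to_text_response() call with the prompt above.

```python
RESTART_WORDS = {"new", "another", "again", "different", "next"}  # illustrative list

def detect_restart(user_input: str, llm_classify) -> str:
    """Tier 1: free keyword check. Tier 2: LLM call, only for ambiguous input."""
    words = set(user_input.lower().split())
    if words & RESTART_WORDS:
        return "restart"          # unambiguous: no LLM call needed
    return llm_classify(user_input)
```

Most turns resolve in tier 1, so the LLM is only paid for when the wording is genuinely ambiguous.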
9. Contextual voice assistant
You are a concise voice assistant for [domain] management.
USER: {name} | LOCATION: {city} | TIME: {current_time}
Rules: Keep responses to 2-4 sentences max. Be conversational.
Never say "as an AI" or "I don't have feelings".
Inject user context (name, location, time) for natural, personalized responses.
10. Farewell / exit summary
The conversation is ending. Here's the full history:
{history}
Give a 1-2 sentence parting thought. If the idea improved during the
conversation, acknowledge it. If not, give one last honest nudge.
Generate a contextual goodbye instead of a generic sign-off. Makes exits feel natural.
Architecture patterns
Ability categories
See Ability Types for the full breakdown.
| Category | Behavior |
|---|---|
| Skill | Trigger-word Ability. User says hotword → runs a flow → exits with resume_normal_flow() |
| Brain Skill | Personality’s brain auto-triggers when it can’t fully answer or needs to delegate an action |
| Background Daemon | Auto-starts on session. Runs continuously. Works in sleep mode. See Background Abilities |
| Local | Runs directly on Raspberry Pi hardware. Under development — see Local Ability |
File structure
| Type | Files | Description |
|---|---|---|
| Standard interactive | main.py only | Triggered by hotwords, runs, exits with resume_normal_flow() |
| Standalone daemon | background.py only | Auto-starts on session. Background monitoring, logging, note-taking |
| Interactive + daemon | main.py + background.py | Interactive handles user requests. Daemon monitors. Coordinate via shared files |
main.py vs background.py
| Aspect | main.py | background.py |
|---|---|---|
| call() signature | call(self, worker) | call(self, worker, background_daemon_mode) |
| CapabilityWorker init | CapabilityWorker(self) | CapabilityWorker(self) |
| Triggered by | User hotwords | Automatically on session start |
| Lifecycle | Runs once, then exits | Continuous while True loop |
| resume_normal_flow() | Required on every exit path | Not needed (independent thread) |
| Works in sleep mode | No | Yes |
Core patterns
The loop template (multi-turn conversation)
Greet → loop (listen → process → respond) → exit on command. Most common pattern for interactive Abilities.
```python
while True:
    user_input = await self.capability_worker.user_response()
    if any(word in user_input.lower() for word in EXIT_WORDS):
        break
    response = self.capability_worker.text_to_text_response(user_input)
    await self.capability_worker.speak(response)
self.capability_worker.resume_normal_flow()
```
The two-pass analysis pattern
Pass 1 fires in background immediately (general analysis). While it runs, the Ability talks to the user. Pass 2 fires with Pass 1 context injected, answering the user’s specific question from depth.
- Pass 1: fire-and-forget via session_tasks.create(asyncio.to_thread(run_general))
- Talk to user while Pass 1 runs (hides 10–15s of latency)
- Pass 2: inject Pass 1 results as context, answer the specific question
- Each follow-up turn fires a background re-analysis, enriching future turns
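The steps above can be sketched as a small coroutine. All four callables are placeholders, and in a real Ability the first task would go through self.worker.session_tasks.create() rather than asyncio.create_task():

```python
import asyncio

async def two_pass_analysis(audio, question, analyze, refine, chat_filler):
    """Sketch of the two-pass pattern; callables stand in for real handlers."""
    pass1 = asyncio.create_task(asyncio.to_thread(analyze, audio))  # fire Pass 1
    await chat_filler()               # talk to the user while Pass 1 runs
    general = await pass1             # join: Pass 1 is usually finished by now
    return refine(general, question)  # Pass 2 answers from depth
```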
The rolling window pattern (ambient audio)
For always-on audio monitoring. Continuously record, slice the last N seconds, send to model on a fixed cadence. Fire-and-forget — never await inside the loop.
- 10-second window, 3-second refresh cadence
- API call fires as background task, poll loop never waits
- Responses arrive asynchronously and log themselves
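Slicing the window is simple byte math. The sketch below assumes headerless mono PCM at a known rate; note that get_audio_recording() returns .wav bytes, so in practice the WAV header would need to be stripped or re-attached first.

```python
def last_window(pcm: bytes, seconds: float, rate: int = 16000, width: int = 2) -> bytes:
    """Return the last N seconds of raw mono PCM (rate and sample width assumed)."""
    n = int(seconds * rate * width)   # bytes per window
    return pcm[-n:] if n else b""
```

Call it on each 3-second tick, hand the slice to a fire-and-forget task, and never await the API call inside the poll loop.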
The coordination pattern (main.py + background.py)
Main writes data to persistent file storage. Background polls that file on a timer and acts on it. This is how alarms, reminders, and scheduled tasks work.
- main.py: parse user input, write to JSON file, resume_normal_flow()
- background.py: poll file every 15–30 seconds, check conditions, act
- Use delete + write for JSON files — append corrupts JSON
- Call send_interrupt_signal() before speaking from a daemon
The pending state pattern (multi-step collection)
Track what info you’re waiting for with a dictionary. Each loop iteration checks pending state first and routes input to the correct handler.
```python
self.pending_create = {"waiting_for": "title"}
# Next turn: user gives title → update to {"title": "X", "waiting_for": "time"}
# Next turn: user gives time → all info collected, execute action
```
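The per-turn routing can be factored into a pure function, which also makes it easy to test. Field names here match the snippet above but the helper name is illustrative:

```python
def route_turn(pending: dict, user_input: str) -> dict:
    """Advance the pending-state machine by one conversational turn."""
    if pending.get("waiting_for") == "title":
        return {"title": user_input, "waiting_for": "time"}
    if pending.get("waiting_for") == "time":
        done = dict(pending, time=user_input)
        done.pop("waiting_for")
        return done   # all info collected: the caller executes the action
    return pending    # nothing pending: fall through to normal handling
```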
Sandbox rules
Breaking these rules will fail the Ability scanner.
- Never write register_capability() by hand — always use the platform tag
- No import os, no import json at the top level outside the register block
- No raw open() — use play_from_audio_file() for audio, the file storage API for data
- No signal module — even in docstrings or comments, the scanner catches it
- Always call resume_normal_flow() on every exit path in main.py
- Use session_tasks.sleep() and session_tasks.create() — not raw asyncio
- Wrap all blocking HTTP calls in asyncio.to_thread()
- No print() — use editor_logging_handler
- Blocked imports: redis, connection_manager, user_config, exec(), eval(), pickle
Voice UX best practices
- Keep speak() to 1–2 sentences. This is voice, not text
- Fill the silence: say “One sec” before any API call over 1 second
- Read your speak() strings out loud before shipping
- Handle messy voice input: use the LLM to extract clean data from noisy transcription
- Offer exit at every loop iteration: check for “done”, “stop”, “quit”, etc.
- Use run_confirmation_loop() before destructive actions (send, delete, cancel)
- Idle detection: 1 empty response = keep going, 2 in a row = offer to leave
- Namespace your filenames: smarthub_prefs.json not data.json
- JSON persistence: always delete + write (append corrupts JSON)
- API calls: always set timeout=10, wrap in try/except, speak errors to user