This is the single source of truth for everything available inside an Ability.
If a method or property isn’t listed here, it either doesn’t exist or hasn’t been documented yet.
Found something missing? Let us know on Discord.
Quick Orientation
Inside any Ability, you have access to two objects:
| Object | What it is | Access via |
|---|---|---|
| self.capability_worker | The SDK — all I/O, speech, audio, LLM, files, and flow control | CapabilityWorker(self) |
| self.worker | The Agent — logging, session management, memory, user connection info | Passed into call() |
Runtime Entry Points (main.py vs background.py)
| Runtime | Required file | call() signature | Lifecycle |
|---|---|---|---|
| Interactive Skill / Brain Skill | main.py | call(self, worker) | Triggered on demand, exits with resume_normal_flow() |
| Background Daemon | background.py | call(self, worker, background_daemon_mode) | Auto-starts on session begin, runs continuously |
background.py must be named exactly background.py to be detected as a background daemon.
Table of Contents
- Speaking / TTS
- Listening / User Input
- Combined Speak + Listen
- LLM / Text Generation
- Audio Playback
- Audio Recording
- Audio Streaming
- File Storage (User Data + Ability Directory)
- Ability Context Storage (Key-Value)
- WebSocket Communication
- Flow Control
- Logging
- Session Tasks
- User Connection Info
- Custom API Keys
- Conversation Memory & History
- Music Mode
- Common Patterns
- Appendix: What You CAN’T Do (Yet)
- Appendix: Blocked Imports
1. Speaking / TTS
speak(tokens, file_content=None)
Converts text to speech using the Agent’s default voice. Streams audio to the user.
await self.capability_worker.speak("Hello! How can I help?")
- Async: Yes (`await`)
- Voice: Uses whatever voice is configured on the Agent
- Tip: Keep it to 1-2 sentences. This is voice, not text.
text_to_speech(prompt, voice_id)
Converts text to speech using a specific Voice ID (e.g., from ElevenLabs). Use when your Ability needs its own distinct voice.
await self.capability_worker.text_to_speech("Welcome aboard.", "pNInz6obpgDQGcFmaJgB")
- Async: Yes (`await`)
- Voice: Overrides the Agent’s default
- See: Voice ID catalog at the bottom of this doc
2. Listening / User Input
user_response()
Waits for the user’s next spoken or typed input. Returns it as a string.
user_input = await self.capability_worker.user_response()
- Async: Yes (`await`)
- Returns: `str` — the transcribed user input
- Tip: Always check for empty strings (`if not user_input: continue`)
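A minimal listening loop built on these tips — `is_exit_command`, `listen_loop`, and the exit-word list are illustrative helpers for this sketch, not SDK names:

```python
EXIT_WORDS = {"exit", "stop", "quit"}

def is_exit_command(text: str) -> bool:
    # Match whole words, so "shopping" doesn't trigger on "stop"
    return any(word in text.lower().split() for word in EXIT_WORDS)

async def listen_loop(self):
    while True:
        user_input = await self.capability_worker.user_response()
        if not user_input:
            continue  # empty transcription — keep listening
        if is_exit_command(user_input):
            break
        await self.capability_worker.speak(f"You said: {user_input}")
    self.capability_worker.resume_normal_flow()
```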
wait_for_complete_transcription()
Waits until the user has completely finished speaking before returning the final transcription.
full_input = await self.capability_worker.wait_for_complete_transcription()
- Async: Yes (`await`)
- Returns: `str` — the final transcribed input
- When to use:
- Long-form input (descriptions, dictation, storytelling)
- Cases where partial STT results may break your logic
- Flows that need the entire spoken sentence before processing
- The first step of an ability when capturing the trigger sentence
Capturing the full trigger sentence
When a trigger word starts an ability immediately, this method still returns the full spoken sentence, including both the trigger phrase and the actual request.
Example trigger word: remind
User says: remind me to call Alex tomorrow at 6 PM
import re

async def first_function(self):
    full_input = await self.capability_worker.wait_for_complete_transcription()
    reminder_text = re.sub(r"^\s*remind\b", "", full_input, flags=re.IGNORECASE).strip()
    await self.capability_worker.speak(f"Creating reminder: {reminder_text}")
In this flow:
- The ability is triggered by `remind`
- wait_for_complete_transcription() returns: `remind me to call Alex tomorrow at 6 PM`
- The extracted request becomes: `me to call Alex tomorrow at 6 PM`
3. Combined Speak + Listen
run_io_loop(tokens)
Speaks the text, then waits for the user’s response. Returns the user’s reply. A convenience wrapper around speak() + user_response().
answer = await self.capability_worker.run_io_loop("What's your favorite color?")
- Async: Yes (`await`)
- Returns: `str` — user’s reply
- Note: Uses the Agent’s default voice (not a custom voice ID)
run_confirmation_loop(tokens)
Speaks the text (appends “Please respond with ‘yes’ or ‘no’”), then loops until the user clearly says yes or no.
confirmed = await self.capability_worker.run_confirmation_loop("Should I send this email?")
if confirmed:
    # send it
- Async: Yes (`await`)
- Returns: `bool` — True for yes, False for no
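A sketch of gating a destructive action behind the confirmation loop — `confirm_and_send` and its wording are hypothetical, not SDK names:

```python
async def confirm_and_send(self, summary: str) -> bool:
    # Ask for an explicit yes/no before doing anything irreversible
    confirmed = await self.capability_worker.run_confirmation_loop(
        f"I'm about to {summary}."
    )
    if confirmed:
        await self.capability_worker.speak("Done.")
    else:
        await self.capability_worker.speak("Okay, I won't do that.")
    return confirmed
```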
4. LLM / Text Generation
text_to_text_response(prompt_text, history=[], system_prompt="")
Generates a text response using the configured LLM.
response = self.capability_worker.text_to_text_response(
    "What's the capital of France?",
    history=[
        {"role": "user", "content": "Let's do geography trivia"},
        {"role": "assistant", "content": "Great, I'll ask you questions!"}
    ],
    system_prompt="You are a geography quiz host. Keep answers under 1 sentence."
)
- ⚠️ THIS IS THE ONLY SYNCHRONOUS METHOD. Do NOT use `await`.
- Returns: `str` — the LLM’s response
- Parameters:
  - `prompt_text` (str): The current prompt/question
  - `history` (list): Optional conversation history for multi-turn context. Each item: `{"role": "user"|"assistant", "content": "..."}`
  - `system_prompt` (str): Optional system prompt to control LLM behavior
- Tip: LLMs often wrap JSON in markdown fences. Always strip them:
clean = response.replace("```json", "").replace("```", "").strip()
5. Audio Playback
play_audio(file_content)
Plays audio directly from bytes or a file-like object.
import requests
audio = requests.get("https://example.com/song.mp3")
await self.capability_worker.play_audio(audio.content)
- Async: Yes (`await`)
- Input: `bytes` or file-like object
- Tip: For anything longer than a TTS clip, use Music Mode
play_from_audio_file(file_name)
Plays an audio file stored in the Ability’s directory (same folder as main.py).
await self.capability_worker.play_from_audio_file("notification.mp3")
- Async: Yes (`await`)
- Input: Filename (string) — must be in the same folder as your `main.py`
6. Audio Recording
Record audio from the user’s microphone during a session.
start_audio_recording()
Begins recording audio from the user’s mic.
self.capability_worker.start_audio_recording()
stop_audio_recording()
Stops the current audio recording.
self.capability_worker.stop_audio_recording()
get_audio_recording()
Returns the recorded audio as a .wav file.
wav_data = self.capability_worker.get_audio_recording()
get_audio_recording_length()
Returns the length/duration of the current recording.
length = self.capability_worker.get_audio_recording_length()
flush_audio_recording()
Clears the current recording buffer/file so the next recording starts fresh.
self.capability_worker.flush_audio_recording()
Recording Example
async def record_voice_note(self):
    await self.capability_worker.speak("I'll record a voice note. Start speaking.")
    self.capability_worker.start_audio_recording()
    await self.worker.session_tasks.sleep(10)  # Record for 10 seconds
    self.capability_worker.stop_audio_recording()
    duration = self.capability_worker.get_audio_recording_length()
    wav_file = self.capability_worker.get_audio_recording()
    await self.capability_worker.speak(f"Got it. Recorded {duration} of audio.")
    self.capability_worker.resume_normal_flow()
7. Audio Streaming
For streaming audio in chunks rather than loading it all into memory at once.
stream_init()
Initializes an audio streaming session.
await self.capability_worker.stream_init()
send_audio_data_in_stream(file_content, chunk_size=4096)
Streams audio data in chunks. Handles mono conversion and resampling automatically.
await self.capability_worker.send_audio_data_in_stream(audio_bytes, chunk_size=4096)
- Input: `bytes`, file-like object, or `httpx.Response`
- chunk_size: Bytes per chunk (default: 4096)
stream_end()
Ends the streaming session and cleans up.
await self.capability_worker.stream_end()
Streaming Example
import requests

async def stream_long_audio(self):
    await self.capability_worker.stream_init()
    response = requests.get("https://example.com/long-audio.mp3")
    await self.capability_worker.send_audio_data_in_stream(response.content)
    await self.capability_worker.stream_end()
8. File Storage (User Data + Ability Directory)
Use in_ability_directory to choose where the file operation runs:
- in_ability_directory=False (default): user data storage (shared across that user’s abilities)
- in_ability_directory=True: current Ability directory
Allowed file types: .txt, .csv, .json, .md, .log, .yaml, .yml
check_if_file_exists(file_name, in_ability_directory=False)
exists = await self.capability_worker.check_if_file_exists(
    "user_prefs.json",
    in_ability_directory=False
)
- Async: Yes (`await`)
- Returns: `bool`
write_file(file_name, content=None, in_ability_directory=False, mode="a+")
await self.capability_worker.write_file(
    "user_prefs.json",
    '{"theme": "dark"}',
    False
)
- Async: Yes (`await`)
- Modes: `mode="a+"` (default, append) or `mode="w"` (overwrite)
- Default behavior (`a+`): Appends to existing file; creates file if it doesn’t exist
read_file(file_name, in_ability_directory=False)
data = await self.capability_worker.read_file(
    "user_prefs.json",
    in_ability_directory=False
)
- Async: Yes (`await`)
- Returns: `str`
delete_file(file_name, in_ability_directory=False)
await self.capability_worker.delete_file(
    "user_prefs.json",
    in_ability_directory=False
)
get_user_data_file_names()
Returns filenames in user data storage.
files = await self.capability_worker.get_user_data_file_names()
- Async: Yes (`await`)
- Returns: `list[str]`
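Listing and reading can be combined into a bulk read — `json_files` and `read_all_json` are illustrative helpers for this sketch, not SDK methods:

```python
def json_files(file_names: list[str]) -> list[str]:
    # Keep only the .json entries from a storage listing
    return [name for name in file_names if name.endswith(".json")]

async def read_all_json(self) -> dict[str, str]:
    files = await self.capability_worker.get_user_data_file_names()
    contents = {}
    for name in json_files(files):
        contents[name] = await self.capability_worker.read_file(
            name, in_ability_directory=False
        )
    return contents
```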
⚠️ The JSON Rule: Always Delete + Write, Never Append
Because write_file defaults to append mode (a+), writing JSON to an existing file can silently produce invalid JSON ({"a":1}{"a":1,"b":2}) — no error is thrown, but your file is broken and unreadable. There are two safe ways to overwrite:
- Delete + Write — explicitly delete the file first, then write the new content
- mode="w" — passed as a parameter to write_file, it overwrites the file in place instead of appending
# ✅ RECOMMENDED — delete + write (explicit, safe)
async def save_json(self, filename, data):
    if await self.capability_worker.check_if_file_exists(filename, False):
        await self.capability_worker.delete_file(filename, False)
    await self.capability_worker.write_file(filename, json.dumps(data), False)
# ✅ ALTERNATIVE — mode="w" (shorthand, but easy to forget)
await self.capability_worker.write_file("prefs.json", json.dumps(data), False, mode="w")
# ❌ WRONG — appending to JSON
await self.capability_worker.write_file("prefs.json", json.dumps(new_data), False)
# Result: {"old":"data"}{"new":"data"} ← broken JSON, no error thrown
9. Ability Context Storage (Key-Value)
CapabilityWorker includes a key-value context store for structured user/session state.
- Each key stores a JSON object (`dict`) as the value.
- These methods are synchronous (do not `await`).
- Great for conversation memory, user preferences, cart/session state, multi-step workflows, feature flags, and API cache metadata.
create_key(key: str, value: dict)
Creates a new key-value pair.
result = self.capability_worker.create_key(
    key="user_preferences",
    value={
        "language": "en",
        "theme": "dark",
        "notifications": True
    }
)
- Async: No (synchronous)
- Parameters:
  - `key` (str): Unique key
  - `value` (dict): JSON object to store
- Note: If the key already exists, the backend may return an error.
update_key(key: str, value: dict)
Updates an existing key.
result = self.capability_worker.update_key(
    key="user_preferences",
    value={
        "language": "en",
        "theme": "light",
        "notifications": False
    }
)
- Async: No (synchronous)
- Parameters: same as `create_key`
delete_key(key: str)
Deletes a key-value pair permanently.
result = self.capability_worker.delete_key("user_preferences")
- Async: No (synchronous)
- Parameters: `key` (str): Key to delete
get_all_keys()
Returns all stored key-value pairs.
all_context = self.capability_worker.get_all_keys()
- Async: No (synchronous)
- Returns: Backend response containing all keys/values
get_single_key(key: str)
Returns one key’s stored value.
preferences = self.capability_worker.get_single_key("user_preferences")
- Async: No (synchronous)
- Parameters: `key` (str): Key to retrieve
Example: Multi-Step Conversation State
# 1) Create state
self.capability_worker.create_key(
    key="conversation_1234",
    value={
        "last_intent": "book_flight",
        "destination": "Dubai",
        "travel_date": "2026-04-01",
        "step": "awaiting_confirmation"
    }
)
# 2) Update state
self.capability_worker.update_key(
    key="conversation_1234",
    value={
        "last_intent": "book_flight",
        "destination": "Dubai",
        "travel_date": "2026-04-01",
        "step": "confirmed"
    }
)
# 3) Read state
context = self.capability_worker.get_single_key("conversation_1234")
Best Practices
- Use descriptive keys (for example `user_123_preferences`, `conversation_456_state`, `cart_session_789`).
- Always store structured JSON objects, not raw strings.
- Handle missing keys safely before update:
existing = self.capability_worker.get_single_key("user_preferences")
if existing:
    self.capability_worker.update_key("user_preferences", updated_value)
else:
    self.capability_worker.create_key("user_preferences", updated_value)
Choosing Between File Storage and Ability Context Storage
Use file storage when you are producing human-readable artifacts (for example notes.md, activity.log, report.txt, data.csv, user_prefs.json) that a user or developer might open in an editor to read or export, and writes are infrequent or append-only.
Use ability context storage (key-value) when you need internal, structured JSON state that your code reads and writes frequently (conversation state, carts, workflows, feature flags), especially when multiple Abilities or processes might touch the same state.
File Storage vs Ability Context Storage
| Aspect | File Storage | Ability Context Storage (Key-Value) |
|---|---|---|
| Data shape | Any allowed text format; you define the structure. | One JSON object (dict) stored under each key. |
| API style | Async file ops: read_file, write_file, delete_file, etc. | Sync key ops: create_key, update_key, delete_key, get_single_key. |
| Best for | Logs, notes, reports, markdown context, CSV/JSON exports. | Conversation/workflow state, carts, fast-changing preferences, feature flags. |
| Write pattern | Infrequent writes or append-only logs. | Frequent small reads/writes during interactions. |
| Concurrency / corruption | Be careful with JSON and multiple writers (delete-then-write or mode="w"). | Safer atomic key updates for concurrent access. |
| Rule of thumb | Use when you want a file a human might open in an editor. | Use when you want live structured state your code updates often. |
10. WebSocket Communication
send_data_over_websocket(data_type, data)
Sends structured data over WebSocket. Used for custom events (music mode, DevKit actions, etc.).
await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "on"})
- Async: Yes (`await`)
- Parameters:
  - `data_type` (str): Event type identifier
  - `data` (dict): Payload
send_devkit_action(action)
Sends a hardware action to a connected DevKit device.
await self.capability_worker.send_devkit_action("led_on")
11. Flow Control
resume_normal_flow()
⚠️ CRITICAL FOR main.py SKILLS: You MUST call this when an interactive skill is done. It hands control back to the Agent. Without it, the Agent goes silent and the user has to restart the conversation.
self.capability_worker.resume_normal_flow()
- Async: No (synchronous)
- When to call: On EVERY exit path:
  - End of your main logic (happy path)
  - After a `break` in a loop
  - Inside `except` blocks (error fallback)
  - After timeout
  - After the user says “exit”/“stop”/“quit”
Before shipping any Ability, verify that every one of these exit paths calls resume_normal_flow().
Do not call this in background.py daemon loops. Background daemons are independent threads and should keep running until session end.
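One way to guarantee the every-exit-path rule is a try/except/finally wrapper around your main logic — `run_and_resume` is a sketch for illustration, not an SDK method:

```python
async def run_and_resume(self, main_coro):
    # finally guarantees resume_normal_flow() on success, error, or early return
    try:
        await main_coro
    except Exception as e:
        self.worker.editor_logging_handler.error(f"Ability failed: {e}")
        await self.capability_worker.speak("Something went wrong. Handing you back.")
    finally:
        self.capability_worker.resume_normal_flow()
```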
send_interrupt_signal()
Sends an interrupt event to stop the current assistant output (speech/audio) and switch back to user input.
interrupt_signal = await self.capability_worker.send_interrupt_signal()
- Async: Yes (`await`)
- Use case: Manual cutoffs when your Ability needs to immediately stop ongoing output and listen for fresh input
- Background daemon rule: Call this before daemon `speak()`, `play_audio()`, or `play_from_audio_file()` to avoid audio overlap.
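The daemon rule can be packaged as a tiny helper so it never gets forgotten — `daemon_announce` is an illustrative name, not an SDK method:

```python
async def daemon_announce(self, message: str):
    # Cut off any in-progress output first so daemon speech doesn't overlap it
    await self.capability_worker.send_interrupt_signal()
    await self.capability_worker.speak(message)
```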
12. Logging
editor_logging_handler
Always use this. Never use print().
self.worker.editor_logging_handler.info("Something happened")
self.worker.editor_logging_handler.error("Something broke")
self.worker.editor_logging_handler.warning("Something suspicious")
self.worker.editor_logging_handler.debug("Debugging")
- Tip: Log before and after API calls so you can see what’s happening in the Live Editor:
self.worker.editor_logging_handler.info(f"Calling weather API for {city}...")
response = requests.get(url, timeout=10)
self.worker.editor_logging_handler.info(f"Weather API returned: {response.status_code}")
13. Session Tasks
OpenHome’s managed task system. Ensures async work gets properly cancelled when sessions end. Raw asyncio tasks can outlive a session — if the user hangs up or switches abilities, your task keeps running as a ghost process. session_tasks ensures everything gets cleaned up properly.
session_tasks.create(coroutine)
Launches an async task within the agent’s managed lifecycle.
self.worker.session_tasks.create(self.my_async_method())
- Use instead of: `asyncio.create_task()` (which can leak tasks)
session_tasks.sleep(seconds)
Pauses execution for the specified duration.
await self.worker.session_tasks.sleep(5.0)
- Use instead of: `asyncio.sleep()` (which can’t be cleanly cancelled)
- Daemon best practice: `background.py` loops should always use this for polling intervals.
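A background.py polling loop following this practice might look like the sketch below — `due_reminders`, `poll_loop`, and the reminder shape (`{"at": <timestamp>, "text": ...}`) are assumptions for illustration:

```python
import time

def due_reminders(reminders: list[dict], now: float) -> list[dict]:
    # Pure helper: reminders whose 'at' timestamp has passed
    return [r for r in reminders if r["at"] <= now]

async def poll_loop(self):
    while True:
        for r in due_reminders(self.reminders, time.time()):
            await self.capability_worker.send_interrupt_signal()
            await self.capability_worker.speak(r["text"])
            self.reminders.remove(r)
        await self.worker.session_tasks.sleep(30.0)  # managed sleep, never asyncio.sleep
```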
14. User Connection Info
get_timezone()
Returns the timezone for the active user/session when available.
timezone = self.capability_worker.get_timezone()
- Async: No (synchronous)
- Returns: Timezone string (e.g. `America/Chicago`) or empty/None when unavailable
- Use case: Time-aware scheduling, local date/time formatting, reminders
- Common daemon use: Alarm/reminder checks aligned to the user’s local timezone
get_token(platform)
Returns the linked account access token for the current user.
token = self.capability_worker.get_token("google")
self.worker.editor_logging_handler.info(token)
- Async: No (synchronous)
- Parameters: `platform` (str): Platform name. Supported values: Google (`"google"`), Slack (`"slack"`), Discord (`"discord"`), Microsoft (`"microsoft"`), Tesla (`"tesla"`)
- Returns: Access token string for that linked platform
- Use case: Calling Google/Slack/Discord/Microsoft/Tesla APIs on behalf of the linked user account
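A hedged sketch of using the token with Google's public calendarList endpoint — `bearer_headers` and `read_calendars` are illustrative names, not SDK methods:

```python
import requests

def bearer_headers(token: str) -> dict:
    # Standard OAuth 2.0 bearer-token header
    return {"Authorization": f"Bearer {token}"}

async def read_calendars(self):
    token = self.capability_worker.get_token("google")
    if not token:
        await self.capability_worker.speak("Please link your Google account first.")
        return
    resp = requests.get(
        "https://www.googleapis.com/calendar/v3/users/me/calendarList",
        headers=bearer_headers(token),
        timeout=10,
    )
    self.worker.editor_logging_handler.info(f"Calendar API status: {resp.status_code}")
```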
user_socket.client.host
The user’s public IP address at connection time.
user_ip = self.worker.user_socket.client.host
self.worker.editor_logging_handler.info(f"User connected from: {user_ip}")
- Use case: IP-based geolocation, timezone detection, personalization
- Tip: Cloud/datacenter IPs won’t give useful location data. Check the ISP name for keywords like `"amazon"`, `"aws"`, `"google cloud"` before using for geolocation.
Example: IP Geolocation
import requests

def get_user_location(self):
    """Get user's city and timezone from their IP address."""
    try:
        ip = self.worker.user_socket.client.host
        resp = requests.get(f"http://ip-api.com/json/{ip}", timeout=5)
        if resp.status_code == 200:
            data = resp.json()
            if data.get("status") == "success":
                isp = data.get("isp", "").lower()
                cloud_indicators = ["amazon", "aws", "google", "microsoft", "azure", "digitalocean"]
                if any(c in isp for c in cloud_indicators):
                    self.worker.editor_logging_handler.warning("Cloud IP detected, location may be inaccurate")
                    return None
                return {
                    "city": data.get("city"),
                    "region": data.get("regionName"),
                    "country": data.get("country"),
                    "timezone": data.get("timezone"),
                    "lat": data.get("lat"),
                    "lon": data.get("lon"),
                }
    except Exception as e:
        self.worker.editor_logging_handler.error(f"Geolocation error: {e}")
    return None
15. Custom API Keys
Overview
Use Custom API Keys for third-party services that are not natively integrated with OpenHome.
Keys are stored securely in backend storage, values are managed in Settings → API Keys, and abilities read them at runtime via get_api_keys() instead of hardcoding secrets.
get_api_keys()
Retrieves all custom API keys and passwords stored securely in backend storage for the current account.
Returns a dict that maps each declared key name to the value set in Settings.
keys = self.capability_worker.get_api_keys()
gmail_password = keys.get("gmail_app_password")
self.worker.editor_logging_handler.info(keys)
- Async: No (synchronous)
- Returns: `dict` — all key/value pairs stored by this user
Key Flow
- For Developers (2 ways to add custom API keys):
  1. Ability Behaviour → API Keys: add key names while creating/editing the ability.
  2. Settings → API Keys → Edit: add keys directly from settings.
- Important developer rule: Adding a key in Ability Behaviour adds the key name, but the value must still be set in Settings → API Keys → Edit.
- For Users (install flow): When the ability is published to the marketplace, highlighted API key tags are used to notify users that these custom keys are required.
- User setup: Those required key names are auto-added to the user’s Settings → API Keys, and the user sets the values there.
- If tags are not highlighted: Users may not know which custom API keys are required.
- Runtime: The ability reads key values from backend storage with get_api_keys().
Runtime Example
async def call(self, worker):
    self.worker = worker
    self.capability_worker = CapabilityWorker(self)
    keys = self.capability_worker.get_api_keys()
    required_keys = ["openai_api_key", "sendgrid_api_key"]
    missing = [k for k in required_keys if not keys.get(k)]
    if missing:
        missing_list = ", ".join(missing)
        await self.capability_worker.speak(
            f"I'm missing required keys: {missing_list}. "
            "Please set them in Settings under API Keys."
        )
        self.capability_worker.resume_normal_flow()
        return
    openai_key = keys.get("openai_api_key")
    sendgrid_key = keys.get("sendgrid_api_key")
    self.worker.editor_logging_handler.info("Required API keys found. Calling service...")
    # Use openai_key and sendgrid_key here
get_api_keys() vs get_token()
| Aspect | get_api_keys() | get_token(platform) |
|---|---|---|
| What it returns | User-defined custom keys/passwords | OAuth access token for a linked platform |
| Set by | Key names can be declared by developer tags in Ability Behaviour; values are set in Settings (manual or install-triggered) | OAuth flow when linking a Google/Slack/Discord/Microsoft/Tesla account |
| Supports | Any third-party service (OpenAI, SendGrid, Twilio, etc.) | Google, Slack, Discord, Microsoft and Tesla only |
| Format | dict of all key/value pairs | Single token string for one platform |
| Declared by developer | Yes — in Ability Behaviour → API Keys panel | No — platform is specified in code |
Use get_api_keys() for any raw API key or password your Ability needs that isn’t covered by the built-in OAuth integrations.
Never hardcode secrets in your Ability code. Any key baked into main.py is visible to anyone with access to the source. Always retrieve secrets at runtime via get_api_keys().
Naming convention: Use lowercase snake_case for key names — e.g. openai_api_key, twilio_auth_token, sendgrid_api_key. Document the exact names your Ability expects so users know precisely what to enter.
16. Conversation Memory & History
get_full_message_history()
Access the full conversation message history from the current session through CapabilityWorker.
history = self.capability_worker.get_full_message_history()
self.worker.editor_logging_handler.info(f"Messages so far: {len(history)}")
- Returns: The complete message history for the active session
- Use case: Building context-aware abilities that know what was said before the ability was triggered
- Common daemon use: Live conversation monitoring for note-taking, summarization, and event detection
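A sketch that feeds recent history into the LLM for a summary — `recent_transcript` and `summarize_session` are illustrative helpers, and the `{"role", "content"}` item shape is assumed to match the history format text_to_text_response accepts:

```python
def recent_transcript(history: list[dict], max_turns: int = 10) -> str:
    # Assumes {'role', 'content'} items, as used with text_to_text_response
    lines = [f"{m.get('role', '?')}: {m.get('content', '')}" for m in history[-max_turns:]]
    return "\n".join(lines)

async def summarize_session(self):
    history = self.capability_worker.get_full_message_history()
    summary = self.capability_worker.text_to_text_response(  # synchronous — no await
        "Summarize this conversation in two sentences:\n" + recent_transcript(history)
    )
    await self.capability_worker.speak(summary)
```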
delete_conversation_history()
Deletes the conversation history for the current session.
self.capability_worker.delete_conversation_history()
update_personality_agent_prompt(prompt_addition)
Append additional instructions/context to the active Agent personality prompt.
self.capability_worker.update_personality_agent_prompt(
    "The user prefers concise answers and metric units."
)
- Async: No (synchronous)
- Use case: Persist behavior/context updates into the Agent’s prompt for later turns
Maintaining History in a Looping Ability
The text_to_text_response method accepts a history parameter. Use it to maintain multi-turn conversation context:
self.history = []

async def main_loop(self):
    system = "You are a helpful cooking assistant. Keep answers under 2 sentences."
    while True:
        user_input = await self.capability_worker.user_response()
        if "exit" in user_input.lower():
            break
        self.history.append({"role": "user", "content": user_input})
        response = self.capability_worker.text_to_text_response(  # No await!
            user_input,
            history=self.history,
            system_prompt=system
        )
        self.history.append({"role": "assistant", "content": response})
        await self.capability_worker.speak(response)
    self.capability_worker.resume_normal_flow()
Passing Context Back After resume_normal_flow()
After an Ability finishes, you can carry context forward in a few ways. When resume_normal_flow() fires, direct execution returns to the Agent.
What you CAN do:
- Save to conversation history — Anything spoken during the Ability (via speak()) becomes part of the conversation history, which the Agent’s LLM can see in subsequent turns.
- Update the Agent prompt — Use update_personality_agent_prompt(prompt_addition) to append durable instructions/context to the Agent’s personality prompt.
- Use file storage — Write data to persistent files (see File Storage) that other Abilities can read later. The Agent itself won’t read these files directly, but your Abilities can share data through them.
- Memory feature — OpenHome has a new memory feature that can persist user context. (Details TBD as this feature evolves.)
What you CANNOT do (yet):
- Silently inject hidden conversation-history entries without speaking them
- Inject arbitrary structured runtime objects directly into the Agent’s LLM context without using prompt/history/file mechanisms
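The supported mechanisms can be combined in a single wrap-up step before handing back control — `finish_with_context` and the file name `last_result.json` are illustrative, not SDK names:

```python
import json

async def finish_with_context(self, result: dict):
    # 1) Durable instruction the Agent's LLM sees on later turns
    self.capability_worker.update_personality_agent_prompt(
        f"The user just completed this task: {json.dumps(result)}."
    )
    # 2) File other Abilities can read later (delete + write keeps JSON valid)
    if await self.capability_worker.check_if_file_exists("last_result.json", False):
        await self.capability_worker.delete_file("last_result.json", False)
    await self.capability_worker.write_file("last_result.json", json.dumps(result), False)
    # 3) Hand control back to the Agent
    self.capability_worker.resume_normal_flow()
```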
17. Music Mode
When playing audio that’s longer than a TTS utterance (music, sound effects, long recordings), you need to signal the system to stop listening and not interrupt.
Full Pattern
async def play_track(self, audio_bytes):
    # 1. Enter music mode (system stops listening, won't interrupt)
    self.worker.music_mode_event.set()
    await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "on"})
    # 2. Play the audio
    await self.capability_worker.play_audio(audio_bytes)
    # 3. Exit music mode (system resumes listening)
    await self.capability_worker.send_data_over_websocket("music-mode", {"mode": "off"})
    self.worker.music_mode_event.clear()
What happens if you skip Music Mode: The system may try to transcribe the audio playback as user speech, or interrupt the playback thinking the user is talking.
18. Common Patterns
LLM as Intent Router
Use the LLM to classify user intent and route to different actions:
def classify_intent(self, user_input: str) -> dict:
    prompt = (
        "Classify this user input. Return ONLY valid JSON.\n"
        '{"intent": "weather|timer|music|chat", "confidence": 0.0-1.0}\n\n'
        f"User: {user_input}"
    )
    raw = self.capability_worker.text_to_text_response(prompt)  # No await!
    clean = raw.replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(clean)
    except json.JSONDecodeError:
        return {"intent": "chat", "confidence": 0.0}
Error Handling for Voice
Always speak errors to the user and always resume:
async def do_something(self):
    try:
        response = requests.get("https://api.example.com/data", timeout=10)
        if response.status_code == 200:
            data = response.json()
            await self.capability_worker.speak(f"Here's what I found: {data['result']}")
        else:
            await self.capability_worker.speak("Sorry, I couldn't get that information right now.")
    except Exception as e:
        self.worker.editor_logging_handler.error(f"API error: {e}")
        await self.capability_worker.speak("Something went wrong. Let me hand you back.")
    self.capability_worker.resume_normal_flow()  # ALWAYS called
Using a Custom Voice
ABILITY_VOICE_ID = "pNInz6obpgDQGcFmaJgB"  # Deep, American, male narration voice

async def speak(self, text: str):
    await self.capability_worker.text_to_speech(text, ABILITY_VOICE_ID)
Voice ID Quick Reference
Use with text_to_speech(text, voice_id) to give your Ability its own voice.
| Voice ID | Accent | Gender | Tone | Good For |
|---|---|---|---|---|
| 21m00Tcm4TlvDq8ikWAM | American | Female | Calm | Narration |
| EXAVITQu4vr4xnSDxMaL | American | Female | Soft | News |
| XrExE9yKIg1WjnnlVkGX | American | Female | Warm | Audiobook |
| pMsXgVXv3BLzUgSXRplE | American | Female | Pleasant | Interactive |
| ThT5KcBeYPX3keUQqHPh | British | Female | Pleasant | Children |
| ErXwobaYiN019PkySvjV | American | Male | Well-rounded | Narration |
| GBv7mTt0atIp3Br8iCZE | American | Male | Calm | Meditation |
| TxGEqnHWrfWFTfGW9XjX | American | Male | Deep | Narration |
| pNInz6obpgDQGcFmaJgB | American | Male | Deep | Narration |
| onwK4e9ZLuTAKqWW03F9 | British | Male | Deep | News |
| D38z5RcWu1voky8WS1ja | Irish | Male | Sailor | Games |
| IKne3meq5aSn9XLyUdCD | Australian | Male | Casual | Conversation |
Full catalog with 40+ voices available in the OpenHome Dashboard.
19. Appendix: What You CAN’T Do (Yet)
Being explicit about limitations saves developers hours of guessing:
| You might want to… | Status |
|---|---|
| Directly replace the full Agent system prompt from an Ability | ⚠️ Not supported — use update_personality_agent_prompt(prompt_addition) to append instructions |
| Pass structured data back to the Agent after resume_normal_flow() | ❌ Not possible — use conversation history, prompt updates, or file storage as workarounds |
| Access other Abilities from within an Ability | ❌ Not supported |
| Run background tasks for the active session | ✅ Supported via background.py background daemons |
| Keep tasks alive after the session ends | ❌ Not supported — session tasks are cancelled on session end |
| Access a database directly (Redis, SQL, etc.) | ❌ Blocked — use File Storage API instead |
| Use print() | ❌ Blocked — use editor_logging_handler |
| Use asyncio.sleep() or asyncio.create_task() | ❌ Blocked — use session_tasks |
| Use open() for raw file access | ❌ Blocked — use File Storage API |
| Import redis, user_config | ❌ Blocked |
20. Appendix: Blocked Imports
These will cause your Ability to be rejected by the sandbox:
| Import | Why | Use Instead |
|---|---|---|
| redis | Direct datastore coupling | File Storage API |
| user_config | Can leak global state | File Storage API |
Also avoid: exec(), eval(), pickle, dill, shelve, marshal, hardcoded secrets, MD5, ECB cipher mode.
Last updated: March 2026
Found an undocumented method? Report it on Discord so we can add it here.