Introduction

Custom Abilities are the cornerstone of extending OpenHome’s functionality. They allow developers to:
  • Add personalized features to AI agents.
  • Integrate third-party APIs for dynamic interactions.
  • Customize logic for enhanced user engagement.
This guide walks you through:
  • Structuring and registering an Ability.
  • Using CapabilityWorker for seamless I/O management.
  • Examples showcasing how to create powerful custom Abilities.

Adding an Ability

File Structure

Each Ability lives in its own folder and requires a main.py file that defines its logic.
<YourAbilityFolder>/
└── main.py

Example File: main.py

Here’s a basic template for building a new Ability:
import json
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class YourCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None
    
    # Do not change following tag of register capability
    #{{register capability}}

    def call(
        self,
        worker: AgentWorker,
    ):
        # Your capability logic here
        return "Your Capability Called!"

Key Components

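  • #{{register capability}}: Registration placeholder. Keep it exactly as written in the template (same spelling and spacing), because the platform relies on it when registering the Ability.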
  • call: Executes the Ability’s logic when triggered.

Understanding CapabilityWorker

The CapabilityWorker class simplifies I/O interactions, enabling:
  • Speech synthesis: Using text-to-speech (TTS).
  • Listening for user input: Capturing and processing responses.
  • Running interaction loops: Supporting conversational flows.

CapabilityWorker Quick Reference

Use these functions directly on self.capability_worker.

Conversation

Speak text to the user using the configured TTS.
async def speak(self, tokens: str, file_content: str = None):
Wait for a single user reply and return the transcription.
async def user_response(self):
Speak once, then wait for one reply. Good for simple question/answer steps.
async def run_io_loop(self, tokens: str):
Ask a yes/no question and return True or False.
async def run_confirmation_loop(self, tokens: str):
Wait for the current user speech to finish before reading the final text.
async def wait_for_complete_transcription(self):
Call at the end of an Ability (long-running ones included) to hand control back to the agent's normal flow.
def resume_normal_flow(self):
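These calls are usually chained inside one coroutine. The following is a hedged sketch rather than OpenHome code: FakeCapabilityWorker and confirm_task are hypothetical names, the fake replays scripted replies so the flow runs offline, and it assumes user_response returns the transcription as a string and run_confirmation_loop returns a bool, as described above.

```python
import asyncio


class FakeCapabilityWorker:
    """Hypothetical offline stand-in for CapabilityWorker; not part of OpenHome."""

    def __init__(self, replies, confirmations):
        self.replies = list(replies)              # scripted results for user_response()
        self.confirmations = list(confirmations)  # scripted results for run_confirmation_loop()
        self.spoken = []                          # every string passed to speak()
        self.resumed = False

    async def speak(self, tokens, file_content=None):
        self.spoken.append(tokens)

    async def user_response(self):
        return self.replies.pop(0)

    async def run_io_loop(self, tokens):
        # Speak once, then wait for one reply.
        await self.speak(tokens)
        return await self.user_response()

    async def run_confirmation_loop(self, tokens):
        # Ask a yes/no question; the fake replays a scripted bool.
        await self.speak(tokens)
        return self.confirmations.pop(0)

    def resume_normal_flow(self):
        self.resumed = True


async def confirm_task(cw):
    """Illustrative flow: ask, confirm, answer, then resume the agent."""
    task = await cw.run_io_loop("What should I note down?")
    if await cw.run_confirmation_loop(f"I'll note: {task}. Is that right?"):
        await cw.speak("Got it.")
    else:
        await cw.speak("Okay, cancelled.")
    # Hand control back to the agent once the Ability is done.
    cw.resume_normal_flow()
```

Inside an Ability you would pass self.capability_worker instead of the fake and start the coroutine with self.worker.session_tasks.create(...).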

Text Generation

Return plain text from the model. It does not speak the response.
def text_to_text_response(self, prompt_text: str, history: list = [], system_prompt: str = ""):
Same idea as above, but routed through OpenRouter.
def generate_ttt_using_openrouter(self, prompt_text: str, system_prompt: str = "", history: list = []):
Short, web-search-backed answer. Use when you need up-to-date facts.
def llm_search(self, query: str, system_prompt: str = "", history: list = []) -> str:
Tool-calling flow. Returns the model message (tool call or final answer).
def llm_tools(self, query: str, tools: list, system_prompt: str = "", history: list = []):

File Helpers (per-user storage)

Read a file. in_ability_directory=False uses per-user storage; True uses the Ability folder.
async def read_file(self, file_name: str, in_ability_directory=False):
Append text to a file in per-user storage or the Ability folder.
async def write_file(self, file_name: str, content: str = None, in_ability_directory=False):
Delete a file.
async def delete_file(self, file_name: str, in_ability_directory=False) -> bool:
Check if a file exists.
async def check_if_file_exists(self, file_name: str, in_ability_directory=False) -> bool:
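Since write_file appends rather than overwrites, one way to keep a single JSON document (for example user preferences) is to delete the file before writing the new version. This sketch assumes the helper semantics listed above; InMemoryStorage is a hypothetical stand-in that mimics those method names so the pattern runs offline, and save_json/load_json are illustrative helpers, not OpenHome APIs.

```python
import asyncio
import json


class InMemoryStorage:
    """Hypothetical stand-in mimicking the CapabilityWorker file helpers."""

    def __init__(self):
        self.files = {}

    async def check_if_file_exists(self, file_name, in_ability_directory=False):
        return file_name in self.files

    async def read_file(self, file_name, in_ability_directory=False):
        return self.files[file_name]

    async def write_file(self, file_name, content=None, in_ability_directory=False):
        # Append, matching the documented write_file behaviour.
        self.files[file_name] = self.files.get(file_name, "") + (content or "")

    async def delete_file(self, file_name, in_ability_directory=False):
        return self.files.pop(file_name, None) is not None


async def save_json(cw, file_name, data):
    # Delete first so the append-only write leaves exactly one JSON document.
    if await cw.check_if_file_exists(file_name, False):
        await cw.delete_file(file_name, False)
    await cw.write_file(file_name, json.dumps(data), False)


async def load_json(cw, file_name, default=None):
    # Return the stored object, or the default when nothing was saved yet.
    if not await cw.check_if_file_exists(file_name, False):
        return default
    return json.loads(await cw.read_file(file_name, False))


async def demo():
    storage = InMemoryStorage()
    await save_json(storage, "prefs.json", {"units": "metric"})
    await save_json(storage, "prefs.json", {"units": "imperial"})
    return await load_json(storage, "prefs.json")
```

Calling asyncio.run(demo()) returns {'units': 'imperial'}: the second save replaces the first instead of appending to it. In an Ability you would pass self.capability_worker as cw.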

Audio Helpers

Speak with a specific voice ID.
async def text_to_speech(self, prompt, voice_id):
Play audio from bytes or a file-like object.
async def play_audio(self, file_content):
Play audio from a file in the Ability folder.
async def play_from_audio_file(self, file_name: str = None):
Stream audio over WebSocket in chunks.
async def send_audio_data_in_stream(self, file_content: bytes | IO[bytes] | httpx.Response, chunk_size=4096):
Start an audio streaming session.
async def stream_init(self):
End the audio streaming session.
async def stream_end(self):
Play a local audio file by path.
def play_audio_file(self, file_path: str):
Return the latest user recording as WAV bytes (if available).
def get_audio_recording(self):
Return the recording length in seconds (if available).
def get_audio_recording_length(self):
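Because get_audio_recording returns WAV bytes, Python's standard wave module can read the header without a third-party dependency. describe_wav and make_silent_wav are illustrative helpers (the second only builds test audio); the sketch assumes the bytes form a complete WAV file, and whether the wave module is on the platform's allow-list is also an assumption.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic properties of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono, 16-bit, silent WAV in memory for trying describe_wav offline."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

describe_wav(make_silent_wav(0.5)) reports one channel, 16000 Hz, 2-byte samples, and a duration of 0.5 seconds. With a real recording you would pass the result of self.capability_worker.get_audio_recording() after checking that it is not empty.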

WebSocket / Device Actions

Send structured data to the client WebSocket.
async def send_data_over_websocket(self, data_type: str = "", data: dict = {}):
Trigger a Devkit action.
async def send_devkit_action(self, action: str = ""):
Trigger a Devkit MQTT action.
async def send_devkit_mqtt_action(self, topic: str = "", action: str = "", value: str = "", command: str = ""):
Send an iOS notification.
async def send_notification_to_ios(self, title: str = "", body: str = "", time_interval: int = 1):
Send a text reply without TTS.
async def send_agent_message_without_audio(self, value: str):

Using Specific Voice IDs for Text-to-Speech

CapabilityWorker can also speak in a specific voice: pass a Voice ID from the list below to text_to_speech instead of relying on the configured TTS voice that speak uses.

Available Voice IDs

You can use any of the following Voice IDs for TTS:
{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "29vD33N1CtxCmqQRPOHJ",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "2EiwWnXFnvU5JabPnv8n",
      "labels": {
        "accent": "american",
        "description": "war veteran",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "5Q0t7uMcjvnagumLfvZi",
      "labels": {
        "accent": "american",
        "description": "ground reporter",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "CYw3kZ02Hs0563khs1Fj",
      "labels": {
        "accent": "british-essex",
        "description": "conversational",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "D38z5RcWu1voky8WS1ja",
      "labels": {
        "accent": "irish",
        "description": "sailor",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "labels": {
        "accent": "american",
        "description": "soft",
        "age": "young",
        "gender": "female",
        "use case": "news"
      }
    },
    {
      "voice_id": "ErXwobaYiN019PkySvjV",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "GBv7mTt0atIp3Br8iCZE",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "male",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "IKne3meq5aSn9XLyUdCD",
      "labels": {
        "accent": "australian",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "JBFqnCBsd6RMkjVDRZzb",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "LcfcDJNUP1GQjkzn1xUU",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "MF3mGyEYCl7XYWbV9V6O",
      "labels": {
        "accent": "american",
        "description": "emotional",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "N2lVS1w4EtoT3dr4eOWO",
      "labels": {
        "accent": "american",
        "description": "hoarse",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "ODq5zmih8GrVes37Dizd",
      "labels": {
        "accent": "american",
        "description": "shouty",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "SOYHLrjzK2X1ezoPC6cr",
      "labels": {
        "accent": "american",
        "description": "anxious",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "TX3LPaxmHKxFdv7VOQHJ",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "narration",
        "description": "neutral"
      }
    },
    {
      "voice_id": "ThT5KcBeYPX3keUQqHPh",
      "labels": {
        "accent": "british",
        "description": "pleasant",
        "age": "young",
        "gender": "female",
        "use case": "children's stories"
      }
    },
    {
      "voice_id": "TxGEqnHWrfWFTfGW9XjX",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "VR6AewLTigWG4xSOukaG",
      "labels": {
        "accent": "american",
        "description": "crisp",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "XB0fDUnXU5powFXDhCwa",
      "labels": {
        "accent": "english-swedish",
        "description": "seductive",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "Xb7hH8MSUJpSbSDYk0k2",
      "labels": {
        "accent": "british",
        "description": "confident",
        "age": "middle aged",
        "gender": "female",
        "featured": "new",
        "use case": "news"
      }
    },
    {
      "voice_id": "XrExE9yKIg1WjnnlVkGX",
      "labels": {
        "accent": "american",
        "description": "warm",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "ZQe5CZNOzWyzPSCn5a3c",
      "labels": {
        "accent": "australian",
        "description": "calm",
        "age": "old",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "Zlb1dXrM653N07WRdFW3",
      "labels": {
        "accent": "british",
        "age": "middle aged",
        "gender": "male",
        "use case": "news",
        "description": "ground reporter"
      }
    },
    {
      "voice_id": "bVMeCyTHy58xNoL34h3p",
      "labels": {
        "accent": "american-irish",
        "description": "excited",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "flq6f7yk4E4fJM5XTYuZ",
      "labels": {
        "accent": "american",
        "age": "old",
        "gender": "male",
        "use case": "audiobook",
        "description": "orotund"
      }
    },
    {
      "voice_id": "g5CIjZEefAph4nQFvHAz",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "ASMR",
        "description": "whisper"
      }
    },
    {
      "voice_id": "iP95p4xoKVk53GoZ742B",
      "labels": {
        "accent": "american",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "jBpfuIE2acCO8z3wKNLl",
      "labels": {
        "accent": "american",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    },
    {
      "voice_id": "jsCqWAovK2LkecY7zXl4",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "description": "overhyped",
        "use case": "video games"
      }
    },
    {
      "voice_id": "nPczCjzI2devNBz1zQrb",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "narration"
      }
    },
    {
      "voice_id": "oWAxZDx7w5VEj9dCyTzz",
      "labels": {
        "accent": "american-southern",
        "age": "young",
        "gender": "female",
        "use case": "audiobook",
        "description": "gentle"
      }
    },
    {
      "voice_id": "onwK4e9ZLuTAKqWW03F9",
      "labels": {
        "accent": "british",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "news presenter"
      }
    },
    {
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "pMsXgVXv3BLzUgSXRplE",
      "labels": {
        "accent": "american",
        "description": "pleasant",
        "age": "middle aged",
        "gender": "female",
        "use case": "interactive"
      }
    },
    {
      "voice_id": "pNInz6obpgDQGcFmaJgB",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "piTKgcLEGmPE4e6mEKli",
      "labels": {
        "accent": "american",
        "description": "whisper",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "pqHfZKP75CvOlQylNhV4",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "middle aged",
        "gender": "male",
        "use case": "documentary"
      }
    },
    {
      "voice_id": "t0jbNlBVZ17f02VDIeMI",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "yoZ06aMxZJJ28mfd3POQ",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "z9fAnlkpzviPz146aGWa",
      "labels": {
        "accent": "american",
        "description": "witch",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "zcAOhNBS3c14rBihAFp1",
      "labels": {
        "accent": "english-italian",
        "description": "foreigner",
        "age": "young",
        "gender": "male",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "zrHiDhphv9ZnVXBqCLjz",
      "labels": {
        "accent": "english-swedish",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    }
  ]
}

text_to_speech Function

The text_to_speech function converts the provided text into speech using the specified Voice ID and streams it to the user via WebSocket.
async def text_to_speech(self, text: str, voice_id: str):

Parameters

  • text (str): The text to be converted into speech.
  • voice_id (str): The Voice ID to be used for speech synthesis.
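Selecting a voice by its labels instead of hard-coding an ID can make an Ability easier to read. In this sketch VOICES is a three-entry excerpt of the list above and find_voice is a hypothetical helper; it normalizes label keys and values (spaces, underscores, case) so small spelling differences do not break a lookup.

```python
VOICES = [
    {"voice_id": "21m00Tcm4TlvDq8ikWAM",
     "labels": {"accent": "american", "description": "calm", "age": "young",
                "gender": "female", "use case": "narration"}},
    {"voice_id": "GBv7mTt0atIp3Br8iCZE",
     "labels": {"accent": "american", "description": "calm", "age": "young",
                "gender": "male", "use case": "meditation"}},
    {"voice_id": "onwK4e9ZLuTAKqWW03F9",
     "labels": {"accent": "british", "description": "deep", "age": "middle aged",
                "gender": "male", "use case": "news presenter"}},
]


def _norm(key: str) -> str:
    # "use case", "usecase", and "use_case" all become "usecase".
    return key.replace(" ", "").replace("_", "").lower()


def find_voice(voices, **wanted):
    """Return the first voice_id whose labels match every keyword, else None."""
    targets = {_norm(k): str(v).strip().lower() for k, v in wanted.items()}
    for voice in voices:
        labels = {_norm(k): str(v).strip().lower() for k, v in voice["labels"].items()}
        if all(labels.get(k) == v for k, v in targets.items()):
            return voice["voice_id"]
    return None


# Assumed usage inside an Ability:
#   voice_id = find_voice(VOICES, gender="female", use_case="narration")
#   await self.capability_worker.text_to_speech("Once upon a time...", voice_id)
```

For example, find_voice(VOICES, gender="male", description="calm") returns "GBv7mTt0atIp3Br8iCZE", and a query with no match returns None, which an Ability can treat as "fall back to speak".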

Advanced CapabilityWorker Functions

Audio Processing Functions

The CapabilityWorker provides these audio helpers:
  • play_audio: Play raw audio bytes or a file-like object
  • play_from_audio_file: Play audio files stored in the capability directory
  • send_audio_data_in_stream: Stream audio data over WebSocket in chunks

Text Generation Functions

Multiple options for text generation:
  • text_to_text_response: Standard text generation with history and system prompts
  • generate_ttt_using_openrouter: Alternate text generation using OpenRouter
  • llm_search: Web-search-backed short answer
  • llm_tools: Tool-calling with the model

Streaming and Communication

Advanced communication features:
  • stream_init and stream_end: Manage audio streaming sessions
  • send_data_over_websocket: Send custom data over WebSocket
  • send_agent_message_without_audio: Send a text reply without TTS
  • send_devkit_action: Trigger a Devkit action
  • send_devkit_mqtt_action: Trigger a Devkit MQTT action

Recording and Local Audio

  • get_audio_recording: Load the latest user recording bytes
  • get_audio_recording_length: Duration in seconds for the latest recording
  • play_audio_file: Play a local audio file by path in your Ability directory

Session Task Utilities (replace raw asyncio usage)

To ensure Abilities run within the agent’s managed lifecycle, avoid using raw asyncio helpers directly.
  • Use self.worker.session_tasks.sleep(seconds: float) instead of asyncio.sleep(...):
async def some_task(self):
    await self.worker.session_tasks.sleep(1.5)
    await self.capability_worker.speak("Thanks for waiting!")
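These helpers ensure proper cancellation, cleanup, and session scoping. In the same way, the examples below start their main coroutine with self.worker.session_tasks.create(...) rather than asyncio.create_task(...).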
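To illustrate why session-scoped helpers matter, here is a minimal sketch of what such a helper could do, assuming nothing about OpenHome's internals: it remembers each task it starts so all of them can be cancelled when the session ends. SessionTaskGroup is a hypothetical name; inside an Ability, keep using self.worker.session_tasks.

```python
import asyncio


class SessionTaskGroup:
    """Hypothetical illustration of session-scoped tasks; not OpenHome's implementation."""

    def __init__(self):
        self._tasks = set()

    def create(self, coro):
        # Start the coroutine and remember it until it finishes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def close(self):
        # Session ended: cancel whatever is still running and wait for cleanup.
        pending = list(self._tasks)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)


async def demo():
    group = SessionTaskGroup()
    log = []

    async def long_running_ability():
        try:
            await asyncio.sleep(60)  # stands in for slow work
            log.append("finished")
        except asyncio.CancelledError:
            log.append("cancelled")  # cleanup runs here
            raise

    group.create(long_running_ability())
    await asyncio.sleep(0)  # let the task start
    await group.close()     # simulate the session ending
    return log
```

asyncio.run(demo()) returns ['cancelled']: the long-running task is interrupted and its cleanup branch runs. An untracked, fire-and-forget task has no owner that would cancel it when the session ends.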

Example 1: Basic Capability

This Ability creates a daily life advisor that:
  1. Asks the user for a problem: Initiates a conversation to gather user input.
  2. Provides advice: Offers a solution based on user input.
  3. Collects feedback: Captures user satisfaction with the advice.

Code

import json
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class BasicCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None
    
    # Do not change following tag of register capability
    #{{register capability}}

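    async def give_advice(self):
        # Introduce the advisor and wait for the user to describe a problem.
        await self.capability_worker.speak("Hi! I'm your daily life advisor. Tell me your problem.")
        user_problem = await self.capability_worker.user_response()

        # Generate advice; text_to_text_response returns text and does not speak it.
        solution_prompt = f"Provide a solution for: {user_problem}"
        solution = self.capability_worker.text_to_text_response(solution_prompt)

        # Speak the advice and capture the user's reply as feedback
        # (store or act on user_feedback as your Ability requires).
        user_feedback = await self.capability_worker.run_io_loop(
            solution + " Are you satisfied with the advice?"
        )
        await self.capability_worker.speak("Thank you for using the advisor.")

        # Hand control back to the agent's normal conversation flow.
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        # Set up the workers, then run the conversation as a session-managed task.
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        self.worker.session_tasks.create(self.give_advice())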

Key Functions

  • speak: Introduces the advisor and thanks the user at the end.
  • user_response: Captures user input (e.g., their problem).
  • run_io_loop: Combines speaking the solution and listening for feedback.
  • resume_normal_flow: Resumes the agent’s default workflow after interaction.

Example 2: Weather Capability

This Ability integrates a weather API to fetch and share weather updates based on user-provided locations.

Code

import requests
from geopy.geocoders import Nominatim  # geocoder used to turn place names into coordinates
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class WeatherDocsCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None
    
    # Do not change following tag of register capability
    #{{register capability}}

    async def first_setup(self, location: str):
        # Ask for a location if none was provided.
        if not location:
            await self.capability_worker.speak("Which location?")
            location = await self.capability_worker.user_response()

        geolocator = Nominatim(user_agent="weather_agent")
        loc = geolocator.geocode(location)

        if loc:
            try:
                # Fetch the current temperature; the timeout keeps a slow API from stalling the session.
                response = requests.get(
                    f"https://api.open-meteo.com/v1/forecast?latitude={loc.latitude}&longitude={loc.longitude}&current=temperature_2m",
                    timeout=10,
                )
                response.raise_for_status()
                result = response.json()
                weather_report = f"The temperature in {location} is {result['current']['temperature_2m']}°C."
            except (requests.RequestException, KeyError):
                weather_report = "Sorry, I couldn't get the weather right now."
            await self.capability_worker.speak(weather_report)
        else:
            await self.capability_worker.speak("Invalid location. Try again.")
        
        # Hand control back to the agent.
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self.worker)
        self.worker.session_tasks.create(self.first_setup(""))

Key Features

  • External API Call: Fetches real-time weather data.
  • Geolocation: Validates and processes user-provided locations.
  • Error Handling: Provides meaningful feedback for invalid inputs.
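The Open-Meteo request in this example can be split into small pure functions that are easy to test: one builds the URL from coordinates, the other reads current.temperature_2m defensively. The function names are illustrative; the endpoint and field names are the ones the example already uses, and the sample dictionary below mirrors only those fields rather than a full API response.

```python
from typing import Optional
from urllib.parse import urlencode

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build the current-temperature request used by WeatherDocsCapability."""
    query = urlencode({"latitude": latitude, "longitude": longitude, "current": "temperature_2m"})
    return f"{FORECAST_URL}?{query}"


def extract_temperature(result: dict) -> Optional[float]:
    """Return current.temperature_2m, or None if the reply lacks it."""
    current = result.get("current") or {}
    value = current.get("temperature_2m")
    return float(value) if value is not None else None


def weather_sentence(location: str, result: dict) -> str:
    """Turn a parsed reply into the sentence the Ability speaks."""
    temp = extract_temperature(result)
    if temp is None:
        return f"Sorry, I couldn't get the temperature for {location}."
    return f"The temperature in {location} is {temp}°C."
```

For instance, build_forecast_url(52.52, 13.41) yields https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&current=temperature_2m, and weather_sentence("Berlin", {"current": {"temperature_2m": 21.5}}) yields "The temperature in Berlin is 21.5°C." In the Ability, the URL would be passed to requests.get(..., timeout=10).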

Allowed/Disallowed Libraries and Patterns

The following imports, keywords, and patterns are not allowed in Abilities. Use the safe alternatives.

Blocked Imports and Keywords

  • redis: Direct datastore coupling and security concerns; not portable across deployments.
  • RedisHandler (src.utils.db_handler): Bypasses platform abstractions; risks data integrity and sandbox boundaries.
  • connection_manager: Central system control; direct use from Abilities breaks isolation and multi-tenant safety.
  • user_config: Raw config access can leak or mutate global state; use the provided APIs on CapabilityWorker/worker.
  • print: Bypasses structured logging; noisy and untraceable in production; use editor_logging_handler.
  • open (raw): Unmanaged filesystem access; security and portability risks; prefer approved helpers/per-user storage.
Guidance:
  • Avoid direct storage/infra access. Use platform-provided helpers within CapabilityWorker/worker or request an API if needed.
  • Use the provided logging (editor_logging_handler) instead of prints.
  • For files, prefer platform abstractions and per-user capability folders; ask for an approved helper if you need persistent storage.

Security Guidance

Avoid insecure or unsafe patterns such as runtime assert checks, exec() of dynamic code, binding servers to all interfaces, hardcoded secrets, swallowing exceptions, insecure deserialization (pickle/dill/shelve/marshal), weak hashes like MD5, or weak cipher modes (e.g., ECB). If you have a special case, request approval and an approved wrapper/utility.
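Several of these patterns have standard-library replacements. The sketch below uses hashlib.sha256 instead of MD5, json instead of pickle for stored state, and catches one specific exception instead of swallowing everything. The helper names are illustrative, and whether these modules are permitted inside an Ability is an assumption to confirm against the platform's rules.

```python
import hashlib
import json


def fingerprint(text: str) -> str:
    # SHA-256 instead of MD5 for content hashes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def serialize_state(state: dict) -> str:
    # JSON instead of pickle/marshal: loading it cannot execute code.
    return json.dumps(state, sort_keys=True)


def parse_state(raw: str) -> dict:
    # Handle the one expected failure explicitly rather than "except: pass".
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to empty state; a real Ability should also log this.
        return {}
    return data if isinstance(data, dict) else {}
```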

Conclusion

Building Abilities in OpenHome lets developers add custom functionality to AI agents. Between the examples (the Basic Advisor and the Weather Capability) and the CapabilityWorker reference, this guide covers:
  • Core Communication: Use speak, run_io_loop, and user_response for basic interactions.
  • Advanced Audio: Play custom audio files and stream audio data.
  • Text Generation: Leverage multiple text-to-text options with history and system prompts.
  • Voice Customization: Use specific voice IDs for varied and engaging responses.
  • External APIs: Integrate third-party services for dynamic functionality.
The examples cover conversational flows, text generation, and external API calls, while the CapabilityWorker reference adds audio streaming, file storage, and device actions for building more sophisticated, interactive Abilities.
Start creating innovative Abilities and push the boundaries of voice AI with OpenHome! 🎉
Note: We recommend using the requests module to call third-party APIs and avoiding other libraries. If a special case requires another library, you can ask us to add it.

Example 3: Read/Write File (from example_main.py)

This is the simplest pattern for per-user storage.
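from time import time

from src.agent.capability import MatchingCapability

class ReadwriteFileCapability(MatchingCapability):
    async def perform_action(self):
        # Wait until the user finishes speaking, then use the final transcription.
        user_response = await self.capability_worker.wait_for_complete_transcription()
        await self.capability_worker.speak("Writing last transcription to file.")

        # write_file appends, so prefix a newline only when earlier entries exist.
        # The final False argument selects per-user storage rather than the Ability folder.
        if await self.capability_worker.check_if_file_exists("temp_data.txt", False):
            await self.capability_worker.write_file(
                "temp_data.txt",
                "\n%s: %s" % (time(), user_response),
                False,
            )
        else:
            await self.capability_worker.write_file(
                "temp_data.txt",
                "%s: %s" % (time(), user_response),
                False,
            )

        # Each line is "<timestamp>: <text>"; split on the first ": " only,
        # so a transcription that contains a colon is kept whole.
        file_data = await self.capability_worker.read_file("temp_data.txt", False)
        last_written_line = file_data.split("\n")[-1].split(": ", 1)[1]
        await self.capability_worker.speak("Last Written Line: %s" % last_written_line)

        self.capability_worker.resume_normal_flow()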
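Example 3 stores one "<timestamp>: <text>" entry per line. Keeping the formatting and parsing in two pure functions makes the colon handling easy to check without a device; format_entry and last_entry_text are illustrative names, while the line format mirrors the example.

```python
def format_entry(text: str, timestamp: float, file_exists: bool) -> str:
    """Format one '<timestamp>: <text>' entry; prefix a newline when appending."""
    entry = "%s: %s" % (timestamp, text)
    return "\n" + entry if file_exists else entry


def last_entry_text(file_data: str) -> str:
    """Return the text of the last non-empty entry, keeping any colons inside it."""
    lines = [line for line in file_data.split("\n") if line.strip()]
    if not lines:
        return ""
    # partition splits on the first ": " only, after the timestamp.
    _, _, text = lines[-1].partition(": ")
    return text
```

For data built from format_entry("buy milk", 1700000000.0, False) + format_entry("note: call Sam", 1700000060.0, True), last_entry_text returns "note: call Sam". In the Ability, the timestamp would come from time() and file_exists from check_if_file_exists.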