Introduction

Custom Abilities are the cornerstone of extending OpenHome’s functionality. They allow developers to:
  • Add personalized features to AI agents.
  • Integrate third-party APIs for dynamic interactions.
  • Customize logic for enhanced user engagement.
This guide walks you through:
  • Structuring and registering an Ability.
  • Using CapabilityWorker for seamless I/O management.
  • Examples showcasing how to create powerful custom Abilities.

Adding an Ability

File Structure

Each Ability resides in its own folder and requires a main.py file that defines its logic.
<YourAbilityFolder>
├── __init__.py
├── README.md
└── main.py

Example File: main.py

Here’s a basic template for building a new Ability:
import json
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class YourCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    # Do not change following tag of register capability
    #{{register capability}}

    def call(
        self,
        worker: AgentWorker,
    ):
        # Your capability logic here
        return "Your Capability Called!"

Key Components

  • #{{register capability}}: Required registration tag. Leave it exactly as it appears in the template.
  • call: Executes the Ability’s logic when triggered.

Understanding CapabilityWorker

The CapabilityWorker class simplifies I/O interactions, enabling:
  • Speech synthesis: Using text-to-speech (TTS).
  • Listening for user input: Capturing and processing responses.
  • Running interaction loops: Supporting conversational flows.

CapabilityWorker Quick Reference

Use these functions directly on self.capability_worker.

Conversation

Speak text to the user using the configured TTS.
async def speak(self, tokens: str, file_content: str = None):
Wait for a single user reply and return the transcription.
async def user_response(self):
Speak once, then wait for one reply.
async def run_io_loop(self, tokens: str):
Ask a yes/no question and return True or False.
async def run_confirmation_loop(self, tokens: str):
Wait for the user’s full transcription.
async def wait_for_complete_transcription(self):
Get full session message history.
def get_full_message_history(self):
Get the current user’s timezone.
def get_timezone(self):
Get linked account access token.
def get_token(self, platform: str):
Append context/instructions to the active Agent prompt.
def update_personality_agent_prompt(self, prompt_addition):
Return control back to normal Agent flow when your ability is done.
def resume_normal_flow(self):
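
As a rough illustration of how these conversation helpers combine, the sketch below asks a yes/no question, branches on the result, and then hands control back. FakeCapabilityWorker is a hypothetical stand-in written only so the snippet runs outside OpenHome. It mirrors the signatures listed above, not the real behavior, and the prompts are made up.

```python
import asyncio


class FakeCapabilityWorker:
    """Hypothetical stand-in that mimics the documented signatures."""

    def __init__(self, scripted_confirmation: bool):
        self.scripted_confirmation = scripted_confirmation
        self.spoken = []
        self.resumed = False

    async def speak(self, tokens: str, file_content: str = None):
        self.spoken.append(tokens)

    async def run_confirmation_loop(self, tokens: str):
        self.spoken.append(tokens)
        return self.scripted_confirmation

    def resume_normal_flow(self):
        self.resumed = True


async def offer_reminder(capability_worker) -> bool:
    """Ask a yes/no question, branch on the answer, then hand control back."""
    try:
        wants_reminder = await capability_worker.run_confirmation_loop(
            "Would you like a reminder tomorrow?"
        )
        if wants_reminder:
            await capability_worker.speak("Okay, I will remind you.")
        else:
            await capability_worker.speak("No problem.")
        return wants_reminder
    finally:
        # Always return control to the Agent, even if something above fails.
        capability_worker.resume_normal_flow()


fake = FakeCapabilityWorker(scripted_confirmation=True)
accepted = asyncio.run(offer_reminder(fake))
```

Inside an Ability, the same calls would be made on self.capability_worker. Placing resume_normal_flow() in a finally block is one way to make sure the Agent regains control even when an earlier step raises.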

Text Generation

Return plain text from the model (no speech).
def text_to_text_response(self, prompt_text: str, history: list = [], system_prompt: str = ""):
Alternate text generation routed through OpenRouter.
def generate_ttt_using_openrouter(self, prompt_text: str, system_prompt: str = "", history: list = []):
Web-search-backed short answer.
def llm_search(self, query: str, system_prompt: str = "", history: list = []) -> str:
Tool-calling flow.
def llm_tools(self, query: str, tools: list, system_prompt: str = "", history: list = []):

File Helpers

Read a file.
async def read_file(self, file_name: str, in_ability_directory=False):
Write content to a file.
async def write_file(self, file_name: str, content: str = None, in_ability_directory=False, mode: str = "a+"):
Delete a file.
async def delete_file(self, file_name: str, in_ability_directory=False) -> bool:
Check if a file exists.
async def check_if_file_exists(self, file_name: str, in_ability_directory=False) -> bool:
List user data files.
async def get_user_data_file_names(self) -> list:
Note: Storage Scope Usage
  • Use in_ability_directory=False for persistent user-level storage shared across abilities.
  • Use in_ability_directory=True for ability-scoped data that should remain isolated within the ability session.
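
Because write_file defaults to mode "a+", structured data such as JSON needs some care. The sketch below shows one possible read-modify-write pattern that uses only the helpers listed above. FakeFileWorker is a hypothetical in-memory stand-in, the file name preferences.json is made up, and the append behavior of "a+" is an assumption based on the usual meaning of that mode.

```python
import asyncio
import json


class FakeFileWorker:
    """Hypothetical in-memory stand-in for the documented file helpers."""

    def __init__(self):
        self.files = {}

    async def check_if_file_exists(self, file_name: str, in_ability_directory=False) -> bool:
        return (file_name, in_ability_directory) in self.files

    async def read_file(self, file_name: str, in_ability_directory=False):
        return self.files[(file_name, in_ability_directory)]

    async def write_file(self, file_name: str, content: str = None,
                         in_ability_directory=False, mode: str = "a+"):
        # Assumption: the default "a+" mode appends to existing content.
        key = (file_name, in_ability_directory)
        self.files[key] = self.files.get(key, "") + (content or "")

    async def delete_file(self, file_name: str, in_ability_directory=False) -> bool:
        return self.files.pop((file_name, in_ability_directory), None) is not None


async def save_preference(capability_worker, name: str, value: str) -> dict:
    """Read-modify-write a small JSON document in user-level storage."""
    file_name = "preferences.json"  # hypothetical file name
    preferences = {}
    if await capability_worker.check_if_file_exists(file_name, in_ability_directory=False):
        raw = await capability_worker.read_file(file_name, in_ability_directory=False)
        preferences = json.loads(raw)
        # Delete first so appending does not concatenate two JSON documents.
        await capability_worker.delete_file(file_name, in_ability_directory=False)
    preferences[name] = value
    await capability_worker.write_file(
        file_name, json.dumps(preferences), in_ability_directory=False
    )
    return preferences


async def demo():
    worker = FakeFileWorker()
    await save_preference(worker, "units", "metric")
    await save_preference(worker, "city", "Lisbon")
    return json.loads(await worker.read_file("preferences.json"))


stored = asyncio.run(demo())
```

Passing in_ability_directory=True to every call instead would keep the same data scoped to the Ability.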

Context Storage (Key-Value)

def create_key(self, key: str, value: dict):
def update_key(self, key: str, value: dict):
def delete_key(self, key: str):
def get_all_keys(self):
def get_single_key(self, key: str):
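
Since create_key and update_key are separate calls, an Ability often needs a "create or update" step. The following is a minimal sketch of that pattern, not the platform's implementation. FakeContextStore is a hypothetical stand-in, and it assumes that get_single_key returns None for a missing key; the real return value should be checked before relying on this.

```python
class FakeContextStore:
    """Hypothetical in-memory stand-in for the key-value helpers."""

    def __init__(self):
        self.data = {}

    def create_key(self, key: str, value: dict):
        self.data[key] = value

    def update_key(self, key: str, value: dict):
        self.data[key] = value

    def get_single_key(self, key: str):
        # Assumption: a missing key yields None; verify against the real helper.
        return self.data.get(key)


def upsert_key(capability_worker, key: str, value: dict) -> str:
    """Create the key if it is missing, otherwise update it."""
    if capability_worker.get_single_key(key) is None:
        capability_worker.create_key(key, value)
        return "created"
    capability_worker.update_key(key, value)
    return "updated"


store = FakeContextStore()
first = upsert_key(store, "favorite_city", {"name": "Lisbon"})
second = upsert_key(store, "favorite_city", {"name": "Porto"})
```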

Audio and Streaming

async def text_to_speech(self, prompt, voice_id):
async def play_audio(self, file_content):
async def play_from_audio_file(self, file_name: str = None):
async def send_audio_data_in_stream(self, file_content, chunk_size=4096):
async def stream_init(self):
async def stream_end(self):
def get_audio_recording(self):
def get_audio_recording_length(self):
def flush_audio_recording(self):

WebSocket / Device Actions

async def send_data_over_websocket(self, data_type: str = "", data: dict = {}):
async def send_interrupt_signal(self):
async def send_devkit_action(self, action: str = ""):
async def send_devkit_mqtt_action(self, topic: str = "", action: str = "", value: str = "", command: str = ""):
async def send_notification_to_ios(self, title: str = "", body: str = "", time_interval: int = 1):
async def send_agent_message_without_audio(self, value: str):

Using Specific Voice IDs for Text-to-Speech

The CapabilityWorker class supports the use of specific Voice IDs for text-to-speech (TTS) functionality. This allows you to customize the voice used for speech synthesis by specifying a Voice ID from the provided list.

Available Voice IDs

You can use any of the following Voice IDs for TTS:
{
  "voices": [
    {
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "29vD33N1CtxCmqQRPOHJ",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "2EiwWnXFnvU5JabPnv8n",
      "labels": {
        "accent": "american",
        "description": "war veteran",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "5Q0t7uMcjvnagumLfvZi",
      "labels": {
        "accent": "american",
        "description": "ground reporter",
        "age": "middle aged",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "AZnzlk1XvdvUeBnXmlld",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "CYw3kZ02Hs0563khs1Fj",
      "labels": {
        "accent": "british-essex",
        "description": "conversational",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "D38z5RcWu1voky8WS1ja",
      "labels": {
        "accent": "irish",
        "description": "sailor",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "EXAVITQu4vr4xnSDxMaL",
      "labels": {
        "accent": "american",
        "description": "soft",
        "age": "young",
        "gender": "female",
        "use case": "news"
      }
    },
    {
      "voice_id": "ErXwobaYiN019PkySvjV",
      "labels": {
        "accent": "american",
        "description": "well-rounded",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "GBv7mTt0atIp3Br8iCZE",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "male",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "IKne3meq5aSn9XLyUdCD",
      "labels": {
        "accent": "australian",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "JBFqnCBsd6RMkjVDRZzb",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "LcfcDJNUP1GQjkzn1xUU",
      "labels": {
        "accent": "american",
        "description": "calm",
        "age": "young",
        "gender": "female",
        "use case": "meditation"
      }
    },
    {
      "voice_id": "MF3mGyEYCl7XYWbV9V6O",
      "labels": {
        "accent": "american",
        "description": "emotional",
        "age": "young",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "N2lVS1w4EtoT3dr4eOWO",
      "labels": {
        "accent": "american",
        "description": "hoarse",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "ODq5zmih8GrVes37Dizd",
      "labels": {
        "accent": "american",
        "description": "shouty",
        "age": "middle aged",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "SOYHLrjzK2X1ezoPC6cr",
      "labels": {
        "accent": "american",
        "description": "anxious",
        "age": "young",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "TX3LPaxmHKxFdv7VOQHJ",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "narration",
        "description": "neutral"
      }
    },
    {
      "voice_id": "ThT5KcBeYPX3keUQqHPh",
      "labels": {
        "accent": "british",
        "description": "pleasant",
        "age": "young",
        "gender": "female",
        "use case": "children's stories"
      }
    },
    {
      "voice_id": "TxGEqnHWrfWFTfGW9XjX",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "VR6AewLTigWG4xSOukaG",
      "labels": {
        "accent": "american",
        "description": "crisp",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "XB0fDUnXU5powFXDhCwa",
      "labels": {
        "accent": "english-swedish",
        "description": "seductive",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "Xb7hH8MSUJpSbSDYk0k2",
      "labels": {
        "accent": "british",
        "description": "confident",
        "age": "middle aged",
        "gender": "female",
        "featured": "new",
        "use case": "news"
      }
    },
    {
      "voice_id": "XrExE9yKIg1WjnnlVkGX",
      "labels": {
        "accent": "american",
        "description": "warm",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "ZQe5CZNOzWyzPSCn5a3c",
      "labels": {
        "accent": "australian",
        "description": "calm",
        "age": "old",
        "gender": "male",
        "use case": "news"
      }
    },
    {
      "voice_id": "Zlb1dXrM653N07WRdFW3",
      "labels": {
        "accent": "british",
        "age": "middle aged",
        "gender": "male",
        "use case": "news",
        "description": "ground reporter"
      }
    },
    {
      "voice_id": "bVMeCyTHy58xNoL34h3p",
      "labels": {
        "accent": "american-irish",
        "description": "excited",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "flq6f7yk4E4fJM5XTYuZ",
      "labels": {
        "accent": "american",
        "age": "old",
        "gender": "male",
        "use case": "audiobook",
        "description": "orotund"
      }
    },
    {
      "voice_id": "g5CIjZEefAph4nQFvHAz",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "male",
        "use case": "ASMR",
        "description": "whisper"
      }
    },
    {
      "voice_id": "iP95p4xoKVk53GoZ742B",
      "labels": {
        "accent": "american",
        "description": "casual",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "conversational"
      }
    },
    {
      "voice_id": "jBpfuIE2acCO8z3wKNLl",
      "labels": {
        "accent": "american",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    },
    {
      "voice_id": "jsCqWAovK2LkecY7zXl4",
      "labels": {
        "accent": "american",
        "age": "young",
        "gender": "female",
        "description": "overhyped",
        "use case": "video games"
      }
    },
    {
      "voice_id": "nPczCjzI2devNBz1zQrb",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "featured": "new",
        "use case": "narration"
      }
    },
    {
      "voice_id": "oWAxZDx7w5VEj9dCyTzz",
      "labels": {
        "accent": "american-southern",
        "age": "young",
        "gender": "female",
        "use case": "audiobook",
        "description": "gentle"
      }
    },
    {
      "voice_id": "onwK4e9ZLuTAKqWW03F9",
      "labels": {
        "accent": "british",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "news presenter"
      }
    },
    {
      "voice_id": "pFZP5JQG7iQjIQuC4Bku",
      "labels": {
        "accent": "british",
        "description": "raspy",
        "age": "middle aged",
        "gender": "female",
        "use case": "narration"
      }
    },
    {
      "voice_id": "pMsXgVXv3BLzUgSXRplE",
      "labels": {
        "accent": "american",
        "description": "pleasant",
        "age": "middle aged",
        "gender": "female",
        "use case": "interactive"
      }
    },
    {
      "voice_id": "pNInz6obpgDQGcFmaJgB",
      "labels": {
        "accent": "american",
        "description": "deep",
        "age": "middle aged",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "piTKgcLEGmPE4e6mEKli",
      "labels": {
        "accent": "american",
        "description": "whisper",
        "age": "young",
        "gender": "female",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "pqHfZKP75CvOlQylNhV4",
      "labels": {
        "accent": "american",
        "description": "strong",
        "age": "middle aged",
        "gender": "male",
        "use case": "documentary"
      }
    },
    {
      "voice_id": "t0jbNlBVZ17f02VDIeMI",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "old",
        "gender": "male",
        "use case": "video games"
      }
    },
    {
      "voice_id": "yoZ06aMxZJJ28mfd3POQ",
      "labels": {
        "accent": "american",
        "description": "raspy",
        "age": "young",
        "gender": "male",
        "use case": "narration"
      }
    },
    {
      "voice_id": "z9fAnlkpzviPz146aGWa",
      "labels": {
        "accent": "american",
        "description": "witch",
        "age": "middle aged",
        "gender": "female",
        "use case": "video games"
      }
    },
    {
      "voice_id": "zcAOhNBS3c14rBihAFp1",
      "labels": {
        "accent": "english-italian",
        "description": "foreigner",
        "age": "young",
        "gender": "male",
        "use case": "audiobook"
      }
    },
    {
      "voice_id": "zrHiDhphv9ZnVXBqCLjz",
      "labels": {
        "accent": "english-swedish",
        "description": "childish",
        "age": "young",
        "gender": "female",
        "use case": "animation"
      }
    }
  ]
}

text_to_speech Function

The text_to_speech function converts the provided text into speech using the specified Voice ID and streams it to the user via WebSocket.
async def text_to_speech(self, text: str, voice_id: str):

Parameters

  • text (str): The text to be converted into speech.
  • voice_id (str): The Voice ID to be used for speech synthesis.
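
To pick a Voice ID programmatically rather than by eye, the labels in the list above can be filtered. The sketch below copies four entries from that list and matches on label values. The helper find_voice_ids is hypothetical (it is not part of CapabilityWorker), and it strips whitespace from label keys and values defensively before comparing.

```python
VOICES = [
    {"voice_id": "21m00Tcm4TlvDq8ikWAM",
     "labels": {"accent": "american", "description": "calm", "age": "young",
                "gender": "female", "use case": "narration"}},
    {"voice_id": "IKne3meq5aSn9XLyUdCD",
     "labels": {"accent": "australian", "description": "casual", "age": "middle aged",
                "gender": "male", "use case": "conversational"}},
    {"voice_id": "Xb7hH8MSUJpSbSDYk0k2",
     "labels": {"accent": "british", "description": "confident", "age": "middle aged",
                "gender": "female", "featured": "new", "use case": "news"}},
    {"voice_id": "onwK4e9ZLuTAKqWW03F9",
     "labels": {"accent": "british", "description": "deep", "age": "middle aged",
                "gender": "male", "use case": "news presenter"}},
]


def find_voice_ids(voices, **wanted):
    """Return voice IDs whose labels match every requested key/value pair."""
    matches = []
    for voice in voices:
        # Strip stray whitespace from keys and values before comparing.
        labels = {k.strip(): v.strip() for k, v in voice["labels"].items()}
        # Keyword names use underscores, label keys use spaces ("use case").
        if all(labels.get(key.replace("_", " ")) == value for key, value in wanted.items()):
            matches.append(voice["voice_id"])
    return matches


british_female = find_voice_ids(VOICES, accent="british", gender="female")
narrators = find_voice_ids(VOICES, use_case="narration")
```

Inside an Ability, the selected ID could then be passed as the voice_id argument, for example await self.capability_worker.text_to_speech("Good evening.", british_female[0]).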

Advanced CapabilityWorker Functions

Audio Processing Functions

The CapabilityWorker provides comprehensive audio handling capabilities:
  • play_audio: Play audio content or file objects directly
  • play_from_audio_file: Play audio files stored in the Ability directory
  • send_audio_data_in_stream: Stream processed audio data over WebSocket

Text Generation Functions

Multiple options for text generation:
  • text_to_text_response: Standard text generation with history and system prompts
  • generate_ttt_using_openrouter: Alternate text generation using OpenRouter
  • llm_search: Web-search-backed short answer
  • llm_tools: Tool-calling with the model

Streaming and Communication

Advanced communication features:
  • stream_init and stream_end: Manage audio streaming sessions
  • send_data_over_websocket: Send custom data over WebSocket
  • send_interrupt_signal: Interrupt ongoing output and hand control back to user input
  • send_agent_message_without_audio: Send a text reply without TTS
  • send_devkit_action: Trigger a Devkit action
  • send_devkit_mqtt_action: Trigger a Devkit MQTT action

Context and Session Helpers

  • get_timezone: Read the current user’s timezone for local-time-aware behavior
  • get_token: Read linked account access token for Google ("google"), Slack ("slack"), or Discord ("discord")
  • get_full_message_history: Read full session message history for context-aware responses
  • update_personality_agent_prompt: Append context/instructions to the Agent personality prompt
  • create_key / update_key / delete_key: Manage structured key-value context storage
  • get_single_key / get_all_keys: Read one or all stored context entries

Recording and Local Audio

  • get_audio_recording: Load the latest user recording bytes
  • get_audio_recording_length: Duration in seconds for the latest recording
  • flush_audio_recording: Clear the current recording before a new capture
  • play_from_audio_file: Play an audio file stored in the Ability directory

Session Task Utilities (replace raw asyncio usage)

To ensure Abilities run within the agent’s managed lifecycle, avoid using raw asyncio helpers directly.
  • Use self.worker.session_tasks.sleep(seconds: float) instead of asyncio.sleep(...):
async def some_task(self):
    await self.worker.session_tasks.sleep(1.5)
    await self.capability_worker.speak("Thanks for waiting!")
These helpers ensure proper cancellation, cleanup, and session scoping.

Background Daemon Entry Point (background.py)

Background daemons run automatically when a session starts. Use a separate background.py file with this entry signature:
def call(self, worker: AgentWorker, background_daemon_mode: bool):
    self.worker = worker
    self.background_daemon_mode = background_daemon_mode
    self.capability_worker = CapabilityWorker(self)
    self.worker.session_tasks.create(self.background_loop())
Daemon rules:
  • Keep daemon logic inside a continuous while True loop.
  • Use await self.worker.session_tasks.sleep(...) between cycles.
  • Do not call resume_normal_flow() inside daemon loops.
  • Call await self.capability_worker.send_interrupt_signal() before daemon speech/audio.
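
The daemon rules above could be sketched as the loop below. This is an illustration under stated assumptions, not the platform's code: FakeSessionTasks and FakeCapabilityWorker are hypothetical stand-ins, the spoken text and interval are made up, and the fake sleep raises CancelledError after two cycles only so the demo terminates.

```python
import asyncio


class FakeSessionTasks:
    """Hypothetical stand-in: ends the loop after a few sleeps so the demo stops."""

    def __init__(self, max_sleeps: int):
        self.max_sleeps = max_sleeps
        self.sleeps = 0

    async def sleep(self, seconds: float):
        self.sleeps += 1
        if self.sleeps >= self.max_sleeps:
            raise asyncio.CancelledError  # simulates the session ending


class FakeCapabilityWorker:
    """Hypothetical stand-in that records the order of calls."""

    def __init__(self):
        self.events = []

    async def send_interrupt_signal(self):
        self.events.append("interrupt")

    async def speak(self, tokens: str, file_content: str = None):
        self.events.append(f"speak:{tokens}")


async def background_loop(session_tasks, capability_worker, interval: float = 60.0):
    """Continuous daemon loop following the rules listed above."""
    while True:
        # Interrupt any ongoing output before the daemon speaks.
        await capability_worker.send_interrupt_signal()
        await capability_worker.speak("Time to stretch.")
        # Managed sleep instead of asyncio.sleep; no resume_normal_flow() here.
        await session_tasks.sleep(interval)


async def demo():
    tasks, worker = FakeSessionTasks(max_sleeps=2), FakeCapabilityWorker()
    try:
        await background_loop(tasks, worker)
    except asyncio.CancelledError:
        pass  # expected: the fake session ended
    return worker.events


events = asyncio.run(demo())
```

In a real background.py the loop would be a method using self.worker.session_tasks and self.capability_worker, scheduled from call() as shown in the entry signature.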

Example 1: Basic Capability

This Ability creates a daily life advisor that:
  1. Asks the user for a problem: Initiates a conversation to gather user input.
  2. Provides advice: Offers a solution based on user input.
  3. Collects feedback: Captures user satisfaction with the advice.

Code

import json
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker

class BasicCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    # Do not change following tag of register capability
    #{{register capability}}

    async def give_advice(self):
        await self.capability_worker.speak("Hi! I'm your daily life advisor. Tell me your problem.")
        user_problem = await self.capability_worker.user_response()

        solution_prompt = f"Provide a solution for: {user_problem}"
        solution = self.capability_worker.text_to_text_response(solution_prompt)

        user_feedback = await self.capability_worker.run_io_loop(
            solution + " Are you satisfied with the advice?"
        )
        await self.capability_worker.speak("Thank you for using the advisor.")
        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self)
        self.worker.session_tasks.create(self.give_advice())

Key Functions

  • speak: Introduces the advisor and delivers the closing message.
  • user_response: Captures user input (e.g., their problem).
  • run_io_loop: Combines speaking the solution and listening for feedback.
  • resume_normal_flow: Resumes the agent’s default workflow after interaction.

Example 2: Weather Capability

This Ability integrates a weather API to fetch and share weather updates based on user-provided locations.

Code

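import json
import requests
from geopy.geocoders import Nominatim  # geocoder used below to resolve the location
from src.agent.capability import MatchingCapability
from src.main import AgentWorker
from src.agent.capability_worker import CapabilityWorker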

class WeatherDocsCapability(MatchingCapability):
    worker: AgentWorker = None
    capability_worker: CapabilityWorker = None

    # Do not change following tag of register capability
    #{{register capability}}
    async def first_setup(self, location: str):
        if not location:
            await self.capability_worker.speak("Which location?")
            location = await self.capability_worker.user_response()

        geolocator = Nominatim(user_agent="weather_agent")
        loc = geolocator.geocode(location)

        if loc:
            result = requests.get(
                f"https://api.open-meteo.com/v1/forecast?latitude={loc.latitude}&longitude={loc.longitude}&current=temperature_2m"
            ).json()
            weather_report = f"The temperature in {location} is {result['current']['temperature_2m']}°C."
            await self.capability_worker.speak(weather_report)
        else:
            await self.capability_worker.speak("Invalid location. Try again.")

        self.capability_worker.resume_normal_flow()

    def call(self, worker: AgentWorker):
        self.worker = worker
        self.capability_worker = CapabilityWorker(self)
        self.worker.session_tasks.create(self.first_setup(""))

Key Features

  • External API Call: Fetches real-time weather data.
  • Geolocation: Validates and processes user-provided locations.
  • Error Handling: Provides meaningful feedback for invalid inputs.
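
The request and response handling in this example can be separated into small pure functions, which makes the error path easier to test. The sketch below uses the same Open-Meteo URL and fields as the code above. The helper names build_forecast_url and format_weather_report are hypothetical, and the fallback sentence is made up.

```python
from urllib.parse import urlencode

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build the same Open-Meteo request as the example, with encoded parameters."""
    query = urlencode({
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m",
    })
    return f"{FORECAST_URL}?{query}"


def format_weather_report(location: str, payload: dict) -> str:
    """Turn a decoded response into a sentence, tolerating missing fields."""
    temperature = payload.get("current", {}).get("temperature_2m")
    if temperature is None:
        return f"Sorry, I could not read the temperature for {location}."
    return f"The temperature in {location} is {temperature}°C."


url = build_forecast_url(51.5, -0.12)
report = format_weather_report("London", {"current": {"temperature_2m": 14.2}})
fallback = format_weather_report("London", {"error": True})
```

In the Ability, the URL would be fetched with requests.get(url, timeout=10) and the decoded JSON passed to format_weather_report, so a malformed response produces a spoken apology instead of a KeyError.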

Allowed/Disallowed Libraries and Patterns

The following imports, keywords, and patterns are not allowed in Abilities. Use the safe alternatives.

Blocked Imports and Keywords

Name | Why not allowed
redis | Direct datastore coupling and security concerns; not portable across deployments.
user_config | Raw config access can leak or mutate global state; use provided APIs on CapabilityWorker/worker.
print | Bypasses structured logging; noisy and untraceable in production; use editor_logging_handler.
open (raw) | Unmanaged filesystem access; security and portability risks; prefer approved helpers/per-user storage.
Guidance:
  • Avoid direct storage/infra access. Use platform-provided helpers within CapabilityWorker/worker or request an API if needed.
  • Use the provided logging (editor_logging_handler) instead of prints.
  • For files, prefer platform abstractions and per-user capability folders; ask for an approved helper if you need persistent storage.
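
As an illustration of replacing print with structured logging, the sketch below logs both the normal path and the failure path. It assumes that editor_logging_handler exposes logging-style methods such as info and error; that interface is an assumption, so a standard library logger stands in for it here. The lookup_city helper and its data are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log lines so the demo can inspect them."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


# Hypothetical stand-in for the platform's editor_logging_handler.
editor_logging_handler = logging.getLogger("ability_demo")
editor_logging_handler.setLevel(logging.INFO)
editor_logging_handler.propagate = False
capture = ListHandler()
editor_logging_handler.addHandler(capture)


def lookup_city(cities: dict, name: str):
    """Log instead of print, and log failures instead of swallowing them."""
    editor_logging_handler.info("Looking up city %s", name)
    try:
        return cities[name]
    except KeyError:
        editor_logging_handler.error("Unknown city %s", name)
        return None


found = lookup_city({"Lisbon": (38.72, -9.14)}, "Lisbon")
missing = lookup_city({"Lisbon": (38.72, -9.14)}, "Atlantis")
```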

Security Guidance

Avoid insecure or unsafe patterns such as runtime assert checks, exec() of dynamic code, binding servers to all interfaces, hardcoded secrets, swallowing exceptions, insecure deserialization (pickle/dill/shelve/marshal), weak hashes like MD5, or weak cipher modes (e.g., ECB). If you have a special case, request approval and an approved wrapper/utility.
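
A few of these patterns have direct standard-library alternatives. The sketch below shows one possible set of substitutions: json in place of pickle, an explicit check in place of assert, a re-raised error in place of a swallowed exception, and SHA-256 in place of MD5. The function names and sample data are hypothetical.

```python
import hashlib
import json


def parse_settings(raw: str) -> dict:
    """Deserialize untrusted text with json instead of pickle."""
    try:
        settings = json.loads(raw)
    except json.JSONDecodeError as error:
        # Surface the failure with context instead of swallowing it.
        raise ValueError(f"Settings are not valid JSON: {error}") from error
    # Explicit check instead of assert, which is skipped under `python -O`.
    if not isinstance(settings, dict):
        raise ValueError("Settings must be a JSON object.")
    return settings


def fingerprint(content: bytes) -> str:
    """Hash with SHA-256 rather than MD5."""
    return hashlib.sha256(content).hexdigest()


settings = parse_settings('{"units": "metric"}')
digest = fingerprint(b"hello")
```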

Conclusion

Building Abilities in OpenHome lets developers add custom functionality to AI agents. Starting from examples like the Basic Advisor and Weather Capability, you can build on:
  • Core Communication: Use speak, run_io_loop, and user_response for basic interactions.
  • Advanced Audio: Play custom audio files and stream audio data.
  • Text Generation: Leverage multiple text-to-text options with history and system prompts.
  • Voice Customization: Use specific voice IDs for varied and engaging responses.
  • External APIs: Integrate third-party services for dynamic functionality.
The examples cover basic conversational flows, an external API call, and per-user file storage, while the quick reference lists the helpers for audio processing and device control. Together, these CapabilityWorker functions provide the tools needed to build interactive Abilities.
Start creating innovative Abilities and push the boundaries of voice AI with OpenHome! 🎉
Note: We recommend using the requests module to call third-party APIs and avoiding other libraries. If you need another library for a special case, you can ask us to add it.

Example 3: Read/Write File (from example_main.py)

This is the simplest pattern for per-user storage.
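from time import time  # timestamp prefix for each stored line

from src.agent.capability import MatchingCapability


class ReadwriteFileCapability(MatchingCapability):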
    async def perform_action(self):
        user_response = await self.capability_worker.wait_for_complete_transcription()
        await self.capability_worker.speak("Writing last transcription to file.")

        if await self.capability_worker.check_if_file_exists(
            "temp_data.txt",
            in_ability_directory=False,
        ):
            await self.capability_worker.write_file(
                "temp_data.txt",
                "\n%s: %s" % (time(), user_response),
                in_ability_directory=False,
            )
        else:
            await self.capability_worker.write_file(
                "temp_data.txt",
                "%s: %s" % (time(), user_response),
                in_ability_directory=False,
            )

        file_data = await self.capability_worker.read_file(
            "temp_data.txt",
            in_ability_directory=False,
        )
        # Split on the first colon only, so replies that contain colons stay intact.
        last_written_line = file_data.split("\n")[-1].split(":", 1)[1].strip()
        await self.capability_worker.speak("Last Written Line: %s" % last_written_line)

        self.capability_worker.resume_normal_flow()