- Description and Purpose: Defines the Agent’s role and how it behaves within the chosen LLM.
- Voice: Tailored to best represent the Agent, aligning with your preferences or project needs.
- Dynamic Feedback: Agents evolve based on user interactions, learning from conversations to provide more personalized responses over time.

How Agents Work
Every Agent is powered by three core modules working in sequence:

| Module | Role |
|---|---|
| STT (Speech-to-Text) | Converts your spoken input into text |
| TTT (Text-to-Text / LLM) | Processes the text and generates a response — OpenHome supports 20+ LLMs |
| TTS (Text-to-Speech) | Converts the response back into natural, human-like speech |

The Full Workflow
- Speech Input — The system listens for voice commands, initiated by a cold-start message.
- STT Transcription — Your speech is converted into text.
- LLM Processing — The transcribed text is sent to the designated LLM, which generates a response using the Agent’s prompt, conversation history, and any injected memory context.
- TTS Synthesis — The response is spoken back using the Agent’s configured voice.
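
The four steps above can be sketched as a single turn handler. This is a minimal illustration, not OpenHome's API: the `AgentPipeline` class, its field names, and the three stub functions are all hypothetical stand-ins for the STT, LLM, and TTS modules, and the way the prompt, memory, and history are joined into one string is an assumption made only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signatures for the three modules (assumptions, not OpenHome's API).
SpeechToText = Callable[[bytes], str]        # audio in, text out
TextToText = Callable[[str], str]            # prompt text in, reply text out
TextToSpeech = Callable[[str, str], bytes]   # (text, voice) in, audio out


@dataclass
class AgentPipeline:
    prompt: str
    voice: str
    stt: SpeechToText
    llm: TextToText
    tts: TextToSpeech
    history: List[str] = field(default_factory=list)
    memory: List[str] = field(default_factory=list)

    def build_llm_input(self, user_text: str) -> str:
        # Combine the Agent's prompt, injected memory, and prior turns.
        parts = [self.prompt, *self.memory, *self.history, f"User: {user_text}"]
        return "\n".join(parts)

    def handle_turn(self, audio: bytes) -> bytes:
        user_text = self.stt(audio)                         # STT transcription
        reply = self.llm(self.build_llm_input(user_text))   # LLM processing
        self.history += [f"User: {user_text}", f"Agent: {reply}"]
        return self.tts(reply, self.voice)                  # TTS synthesis


# Stub modules so the sketch runs without any real provider.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    last_line = text.splitlines()[-1]
    return "You said: " + last_line[len("User: "):]


def fake_tts(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


pipeline = AgentPipeline(
    prompt="You are a friendly guide.",
    voice="warm",
    stt=fake_stt,
    llm=fake_llm,
    tts=fake_tts,
)
audio_out = pipeline.handle_turn(b"hello")
```

Passing the modules in as plain callables mirrors the idea that each stage can be backed by a different provider without changing the turn logic.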

Agents vs. Abilities
Agents are the voice personality — they speak, listen, and respond. Abilities are the skills attached to an Agent that give it superpowers: fetching data, controlling devices, running background tasks, and more. An Agent without Abilities is still a fully functional conversational character. Abilities extend what that character can do.

What Makes a Good Agent
- A clear purpose and personality defined in the description prompt
- A voice that matches the character’s tone
- A short, natural starting message
- Well-chosen LLM and STT/TTS providers for the use case
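
As a rough illustration, the checklist above can be treated as a small configuration record with a review step. Everything here is hypothetical: the `AgentConfig` fields, the `review` helper, and the 20-word limit on the starting message are assumptions for the example, not OpenHome settings. Whether a voice actually matches the character's tone is a human judgment that this sketch does not try to check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentConfig:
    # Hypothetical field names mirroring the checklist items.
    description: str
    voice: str
    starting_message: str
    llm: str
    stt: str
    tts: str


def review(config: AgentConfig, max_greeting_words: int = 20) -> List[str]:
    """Return checklist warnings; an empty list means nothing was flagged."""
    warnings = []
    if not config.description.strip():
        warnings.append("description is empty")
    if not config.voice.strip():
        warnings.append("voice is not set")
    # The word limit is an arbitrary assumption standing in for "short".
    if len(config.starting_message.split()) > max_greeting_words:
        warnings.append("starting message is too long")
    for name in ("llm", "stt", "tts"):
        if not getattr(config, name).strip():
            warnings.append(f"{name} provider is not set")
    return warnings


good = AgentConfig(
    description="A calm cooking coach who explains each step plainly.",
    voice="warm-alto",
    starting_message="Hi, what are we cooking today?",
    llm="example-llm",
    stt="example-stt",
    tts="example-tts",
)
good_warnings = review(good)
```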

