⚙️ Steps To Make It Work
This guide explains only the essential settings.
You can find tooltips for each field directly in the Unity Editor.
Step 1. 🧪 Settings
Go to UnityNeuroSpeech → Create Settings in the Unity menu bar.
Default settings are recommended.
Don't forget to click the button (the same applies to every step)!
Step 2. UNS Manager
UnityNeuroSpeech Manager is a GameObject in your scene that controls all non-agent scripts.
Without it, no agent (talkable AI) will work.
Create a `Dropdown` in your scene.
Then go to UnityNeuroSpeech → Create UNS Manager.
The important setting there is:
- Whisper model path in StreamingAssets: the path to your downloaded Whisper model (`.bin`) inside the `StreamingAssets` folder, written relative to `StreamingAssets` (do not include the `Assets/StreamingAssets/` prefix).

Example:
If the full path is `Assets/StreamingAssets/UnityNeuroSpeech/Whisper/ggml-medium.bin`, then you should enter `UnityNeuroSpeech/Whisper/ggml-medium.bin`.
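If you're wondering how a StreamingAssets-relative path like that becomes a real file path at runtime, here is a minimal sketch using Unity's standard API. It only illustrates the convention; it is not UnityNeuroSpeech's actual code, and the class and method names are made up for the example.

```csharp
using System.IO;
using UnityEngine;

// Illustrative sketch only: NOT UnityNeuroSpeech source code.
// Shows how a StreamingAssets-relative path like
// "UnityNeuroSpeech/Whisper/ggml-medium.bin" is typically resolved in Unity.
public static class WhisperPathExample
{
    public static string ResolveModelPath(string relativePath)
    {
        // Application.streamingAssetsPath already points at
        // "<project>/Assets/StreamingAssets" in the Editor, so only the part
        // after "StreamingAssets/" should be passed in.
        return Path.Combine(Application.streamingAssetsPath, relativePath);
    }
}
```

In the Editor, `ResolveModelPath("UnityNeuroSpeech/Whisper/ggml-medium.bin")` resolves to `<project>/Assets/StreamingAssets/UnityNeuroSpeech/Whisper/ggml-medium.bin`, which is why you leave the `Assets/StreamingAssets/` prefix out of the field.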
Step 3. 🧠 Agent
An Agent in UnityNeuroSpeech is a GameObject that can listen, respond, and talk using LLMs.
Once you create your first agent, you'll be able to talk with your AI!
Add a `Button` and an `AudioSource` to your scene.
Then go to UnityNeuroSpeech → Create Agent.
Here are some important settings:
- Agent index: the index mentioned in the Quick Start. It links an agent to its voice file.
  ⚠️ Each agent must have a unique index!
- Emotions: the AI can respond with emotion tags. Example:
  User: How are you, DeepSeek?
  AI: `<happy>` I'm feeling grateful. What about you?
  The word inside `< >` is the emotion chosen by the AI. Emotions are used for monitoring via the Agent API. The system prompt (generated automatically by UNS) defines how emotions are used. (See the parsing sketch after this list.)
- Actions: optional behavior tags like `"turn_off_lights"`, `"enable_cutscene_123"`, `"play_horror_sound"`, etc.
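To make the emotion tags concrete, here is a small, hypothetical sketch of how a tag like `<happy>` could be separated from the rest of a reply. UnityNeuroSpeech handles this for you internally; the class, method, and regex below are illustrative assumptions, not the library's API.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of UnityNeuroSpeech: extracts a leading
// emotion tag such as "<happy>" from an LLM reply and returns the
// remaining text separately.
public static class EmotionTagExample
{
    private static readonly Regex TagPattern = new Regex(@"^\s*<(?<emotion>\w+)>\s*");

    public static (string emotion, string text) Split(string response)
    {
        var match = TagPattern.Match(response);
        if (!match.Success)
            return (null, response); // no tag found, return the reply unchanged

        return (match.Groups["emotion"].Value, response.Substring(match.Length));
    }
}
```

For example, `EmotionTagExample.Split("<happy> I'm feeling grateful.")` returns `("happy", "I'm feeling grateful.")`.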
Click Generate Agent, then Create Agent In Scene (only in that order!).
🎉 That's it! When you run the game:
- Select a microphone in the dropdown
- Click the button to start recording
- Speak, then click the button again to stop recording
- The AI responds with voice
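The microphone dropdown is backed by Unity's own microphone API. The sketch below shows the standard pattern for filling a `Dropdown` with the available devices, assuming the UNS Manager does something along these lines; it is an illustration, not the manager's actual code.

```csharp
using UnityEngine;
using UnityEngine.UI;

// Illustrative only: the standard Unity pattern for listing microphones
// in a Dropdown. The UNS Manager wires this up for you automatically.
public class MicrophoneDropdownExample : MonoBehaviour
{
    [SerializeField] private Dropdown microphoneDropdown;

    private void Start()
    {
        microphoneDropdown.ClearOptions();

        // Microphone.devices lists every recording device the OS exposes.
        foreach (var device in Microphone.devices)
            microphoneDropdown.options.Add(new Dropdown.OptionData(device));

        microphoneDropdown.RefreshShownValue();
    }
}
```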
After you click Generate Agent, two files will be created:
- `AgentNameController.cs`: your agent controller (you don't need to modify it)
- `AgentNameSettings.asset`: a ScriptableObject with agent settings (system prompt, model name, index, etc.)
You can edit the settings as you wish.
Emotions and actions cannot be modified yet; stay tuned for updates.
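If you're curious what that settings asset roughly holds, the hypothetical sketch below simply mirrors the fields listed above. Every class and field name here is an assumption made for illustration; the real generated class may differ.

```csharp
using UnityEngine;

// Hypothetical illustration only: the actual generated AgentNameSettings
// class may look different. It just mirrors the fields mentioned above.
[CreateAssetMenu(menuName = "UnityNeuroSpeech/Example Agent Settings")]
public class ExampleAgentSettings : ScriptableObject
{
    [TextArea] public string systemPrompt; // how the agent should behave (emotions, actions, personality)
    public string modelName;               // e.g. "deepseek-r1:7b"
    public int agentIndex;                 // must be unique per agent; links the agent to its voice file
}
```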
Agent performance ("speed") depends on:
- LLM model size
- Whisper model size
- Voice file length
- AI response size
Small models like deepseek-r1:7b or ggml-tiny.bin run fast but may ignore system prompts (emotions, actions, etc.).
Large models like ggml-large.bin usually work perfectly, but they will be slow as hell.
Choose models depending on your goals.
Is that a problem? Maybe. But it only takes a bit of testing to find the perfect setup and build something amazing with this tech.
On first load, TTS may respond slowly; that's normal. It will work faster next time.