# Provider
If you’re self-hosting Nysa, you’ll need to set up a provider and choose the models you want to use.
You’ll need five models: one each for chat completions, embeddings, transcription, text-to-speech, and context compaction. All of them can come from the same provider, but mixing providers is supported for flexibility: you might, for example, use a heavier model from OpenRouter for chat completions, a self-hosted embedding model, and a context compaction model from a faster provider like Groq.
Any provider that exposes an OpenAI-compatible API is supported. OpenRouter is the recommended provider, as it supports a wide range of models and is easy to set up. If you wish to use a local model, look into llama.cpp and whisper.cpp (their `-server` binaries, specifically), but this guide won’t cover how to set those up.
Recommended models for each category are listed at the bottom of this page.
## Setting up Models

Add these categories to your `config.toml` file.
### Chat Completions

```toml
[ai.chat]
base_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-..."
model = "anthropic/claude-sonnet-4.6"
temperature = 0.9
max_completion_tokens = 4096
```

The `base_url` must end in `/api/v1`. The model slug depends on the provider; for OpenRouter, the format is `company/model-name`. A sensible default for `temperature` is 0.9, but you can go higher for more creative outputs. `max_completion_tokens` defaults to 4096, but you can adjust this as needed.
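Since the endpoint is OpenAI-compatible, the request Nysa ultimately sends is an ordinary chat completions POST. As a minimal sketch (the helper name and placeholder API key are illustrative, not part of Nysa), this is roughly what a request built from the config above looks like, using only the standard library:

```python
import json
import urllib.request

# Values mirroring the [ai.chat] example above; the API key is a placeholder.
BASE_URL = "https://openrouter.ai/api/v1"
API_KEY = "sk-or-v1-..."

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.9,
        "max_completion_tokens": 4096,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",  # why base_url must end in /api/v1
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("anthropic/claude-sonnet-4.6", "Hello!")
# Sending it would be: urllib.request.urlopen(req)
```

This is also why `base_url` must end in `/api/v1`: the client appends `/chat/completions` to it.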
### Embeddings

```toml
[ai.embedding]
base_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-..."
model = "openai/text-embedding-3-small"
dimensions = 512
```

Text Embedding 3 Small is a 1,536-dimensional embedding model, but it can be downgraded to 512 dimensions for lower cost without any noticeable degradation in performance. Be sure to check a model’s dimension support before downgrading. Avoid switching embedding models later: certain database tables would have to be dropped and recreated, and you’d lose your agent’s memories.
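The reason changing embedding models loses memories is that stored vectors are only comparable to new vectors from the same model, and models with different dimension counts can’t even share a table column. A minimal sketch (the function is illustrative, not Nysa’s code) of the comparison that memory retrieval relies on:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings; only meaningful if both come from the same model."""
    if len(a) != len(b):
        # Embeddings of different dimensions (e.g. 512 vs 1536) are incomparable;
        # this is why switching embedding models invalidates stored memories.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Even when two models happen to share a dimension count, their vector spaces differ, so similarity scores across models are meaningless.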
### Transcriptions

```toml
[ai.transcription]
base_url = "http://localhost:8080/inference"
api_key = ""
model = ""
```

This example assumes you’re running Whisper Large v3 Turbo locally with whisper-server. Replace the port in `base_url`, and fill in `api_key`, if you set your own values. Note that Nysa always hands the transcription model a 16k sample rate WAV file; it’s designed around Whisper workflows. The `model` field is provided for people who use the Groq API (or a similar one).
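To clarify the audio format mentioned above: mono, 16-bit PCM at a 16 kHz sample rate is the conventional Whisper input. A minimal sketch (the function name is illustrative; a sine tone stands in for real speech) of producing such a file with only the standard library:

```python
import math
import struct
import wave

def write_16k_wav(path: str, seconds: float = 1.0, freq: float = 440.0) -> None:
    """Write a mono, 16-bit PCM WAV at 16 kHz -- the format Nysa hands to the
    transcription model. A sine tone stands in for real speech here."""
    rate = 16_000
    n_frames = int(rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * t / rate))
        for t in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
```

If your transcription backend expects anything other than this Whisper-style input, it will need to resample on its side, since Nysa does not make the format configurable.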
### Text-to-Speech

Nothing to see here yet ;).
### Context Compaction

```toml
[ai.compaction]
base_url = "http://localhost:8081/api/v1"
api_key = ""
model = ""
```

This example assumes you’re running a local inference server with llama.cpp. Replace `base_url` and `api_key` with your own values, if you set any.
## Recommended Models

| Category | Model | Available on OpenRouter |
|---|---|---|
| Chat Completions | Claude Sonnet 4.6 / Kimi K2.5 | Yes |
| Embeddings | Text Embedding 3 Small (OpenAI) | Yes |
| Transcriptions | Whisper Large v3 Turbo | No |
| Text-to-Speech | Fish Audio S1 / Qwen3 TTS 1.7B (CustomVoice version) | No |
| Context Compaction | Qwen 3.5 9B | Yes |
The hard part is over! Now you can continue with setting up a Discord bot.