# Provider
If you’re self-hosting Nysa, you’ll need to set up a provider and choose the models you want to use.
You’ll need five models: one each for chat completions, embeddings, transcription, text-to-speech, and context compaction. All of them can come from the same provider, but mixing providers is supported for flexibility: you might, for example, use a heavier model from OpenRouter for chat completions, a self-hosted embedding model, and a context compaction model from a faster provider like Groq.
Any provider that exposes an OpenAI-compatible API is supported. OpenRouter is the recommended provider, as it supports a wide range of models and is easy to set up. If you wish to use a local model, look into llama.cpp and whisper.cpp (their `-server` binaries, specifically), but this guide won’t cover how to set those up.
Recommended models for each category are listed at the bottom of this page.
## Setting up Models

Add these categories to your `config.toml` file.
### Chat Completions

```toml
[ai.chat]
base_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-..."
model = "anthropic/claude-sonnet-4.6"
temperature = 0.9
max_completion_tokens = 4096
```

The `base_url` must end in `/api/v1`. The model slug depends on the provider; for OpenRouter, the format is `company/model-name`. A sensible default for `temperature` is 0.9, but you can go higher for more creative outputs. `max_completion_tokens` defaults to 4096, but you can adjust this as needed.
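Since the endpoint is OpenAI-compatible, the request Nysa ultimately sends is an ordinary chat completions POST. As a minimal sketch (the helper name and placeholder API key are illustrative, not part of Nysa), this is roughly what a request built from the config above looks like, using only the standard library:

```python
import json
import urllib.request

# Values mirroring the [ai.chat] example above; the API key is a placeholder.
BASE_URL = "https://openrouter.ai/api/v1"
API_KEY = "sk-or-v1-..."

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.9,
        "max_completion_tokens": 4096,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",  # why base_url must end in /api/v1
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("anthropic/claude-sonnet-4.6", "Hello!")
# Sending it would be: urllib.request.urlopen(req)
```

This is also why `base_url` must end in `/api/v1`: the client appends `/chat/completions` to it.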
### Embeddings

```toml
[ai.embedding]
base_url = "https://openrouter.ai/api/v1"
api_key = "sk-or-v1-..."
model = "openai/text-embedding-3-small"
dimensions = 512
```

Text Embedding 3 Small is a 1,536-dimensional embedding model, but it can be downgraded to 512 dimensions for lower cost without any noticeable degradation in performance. Be sure to check a model’s dimension support before downgrading. Avoid switching embedding models later: certain database tables would have to be dropped and recreated, and you’d lose your agent’s memories.
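The reason changing embedding models loses memories is that stored vectors are only comparable to new vectors from the same model, and models with different dimension counts can’t even share a table column. A minimal sketch (the function is illustrative, not Nysa’s code) of the comparison that memory retrieval relies on:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings; only meaningful if both come from the same model."""
    if len(a) != len(b):
        # Embeddings of different dimensions (e.g. 512 vs 1536) are incomparable;
        # this is why switching embedding models invalidates stored memories.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Even when two models happen to share a dimension count, their vector spaces differ, so similarity scores across models are meaningless.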
### Transcriptions

```toml
[ai.transcription]
base_url = "http://localhost:8080/inference"
api_key = ""
model = ""
```

This example assumes you’re running Whisper Large v3 Turbo locally with whisper-server. Replace the port in `base_url`, and fill in `api_key`, if you set your own values. Note that Nysa always hands the transcription model a 16k sample rate WAV file; it’s designed around Whisper workflows. The `model` field is provided for people who use the Groq API (or a similar one).
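To clarify the audio format mentioned above: mono, 16-bit PCM at a 16 kHz sample rate is the conventional Whisper input. A minimal sketch (the function name is illustrative; a sine tone stands in for real speech) of producing such a file with only the standard library:

```python
import math
import struct
import wave

def write_16k_wav(path: str, seconds: float = 1.0, freq: float = 440.0) -> None:
    """Write a mono, 16-bit PCM WAV at 16 kHz -- the format Nysa hands to the
    transcription model. A sine tone stands in for real speech here."""
    rate = 16_000
    n_frames = int(rate * seconds)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * t / rate))
        for t in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
```

If your transcription backend expects anything other than this Whisper-style input, it will need to resample on its side, since Nysa does not make the format configurable.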
### Text-to-Speech

Nothing to see here yet ;).
### Context Compaction

```toml
[ai.compaction]
base_url = "http://localhost:8081/api/v1"
api_key = ""
model = ""
```

This example assumes you’re running a local inference server with llama.cpp. Replace `base_url` and `api_key` with your own values, if you set any.
## Recommended Models

| Category | Model | Available on OpenRouter |
|---|---|---|
| Chat Completions | Claude Sonnet 4.6 / Kimi K2.5 | Yes |
| Embeddings | Text Embedding 3 Small (OpenAI) | Yes |
| Transcriptions | Whisper Large v3 Turbo | No |
| Text-to-Speech | Fish Audio S1 / Qwen3 TTS 1.7B (CustomVoice version) | No |
| Context Compaction | Qwen 3.5 9B | Yes |
The hard part is over! Now you can continue with setting up a Discord bot.