Ollama
Local model runner with an OpenAI-compatible API. The default backend for musegpt.
Ollama runs models locally on your machine and serves an OpenAI-compatible HTTP API at http://localhost:11434, which musegpt connects to out of the box.
Configuration
[backend]
name = "ollama"
url = "http://localhost:11434"
model = "llama3.2"
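Before launching musegpt you can confirm that the configured url is reachable and that the configured model has been pulled. A minimal sketch (plain Python with the requests package, not part of musegpt) that queries Ollama's /api/tags endpoint, which lists the models available locally:

import requests

OLLAMA_URL = "http://localhost:11434"   # matches the url value above
MODEL = "llama3.2"                      # matches the model value above

# GET /api/tags returns the models Ollama has stored locally.
resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=5)
resp.raise_for_status()

names = [m["name"] for m in resp.json()["models"]]
# Local names usually carry a tag suffix, e.g. "llama3.2:latest".
if any(n == MODEL or n.startswith(MODEL + ":") for n in names):
    print(f"{MODEL} is available")
else:
    print(f"{MODEL} not found; run `ollama pull {MODEL}` first")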
Supported Features
- Streaming chat completions via SSE (see the sketch after this list)
- Structured output (JSON mode)
- Model switching at runtime
- GPU acceleration (automatic)
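Streaming follows the standard OpenAI server-sent-events format: each event is a data: line carrying a JSON chunk, and the stream ends with data: [DONE]. A minimal sketch (plain Python with requests; the prompt is only an illustration) that prints the deltas as they arrive:

import json
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Name three chord progressions."}],
        "stream": True,  # ask the server for SSE chunks instead of one final reply
    },
    stream=True,  # let requests yield the response body line by line
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break  # end of stream
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()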
Default Endpoint
POST http://localhost:11434/v1/chat/completions
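Because this endpoint mirrors the OpenAI chat completions API, any OpenAI-compatible client can talk to it. A minimal sketch using the official openai Python package (the api_key value is a placeholder; Ollama ignores it, but the client requires one, and the prompt is only an illustration):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible base URL
    api_key="ollama",                      # required by the client, unused by Ollama
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Suggest a tempo for a lo-fi track."}],
)
print(response.choices[0].message.content)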