
llama.cpp

High-performance local inference engine with OpenAI-compatible server mode.

llama.cpp provides efficient CPU and GPU inference for GGUF models. Its built-in server mode exposes an OpenAI-compatible API.

Configuration

[backend]
name = "llamacpp"
url = "http://localhost:8080"
model = "default"

Default Endpoint

POST http://localhost:8080/v1/chat/completions
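
The endpoint accepts the standard OpenAI chat-completions payload. A minimal raw-HTTP sketch using Python's requests library (prompt and sampling parameters are illustrative):

import requests

# Minimal chat-completions request against the default llama.cpp endpoint.
payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "Summarize GGUF in one sentence."}],
    "temperature": 0.7,
}
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])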
