whisper.cpp

Local speech recognition engine for audio-to-text transcription in the musegpt audio pipeline.


whisper.cpp provides local speech recognition. In the audio evaluation pipeline, musegpt uses it for audio-to-text transcription, passing the result of transcribe() on to chat().

Configuration

[backend]
name = "whisper"
url = "http://localhost:8080"

Wire Protocol

Unlike the chat backends, whisper.cpp expects a multipart body carrying the binary audio at its /inference endpoint. The response is a JSON object whose text field holds the transcript.

Default Endpoint

POST http://localhost:8080/inference
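The request and response shapes described above can be sketched with the Python standard library alone. This is a minimal illustration, not musegpt's implementation: the form field name file and the multipart/form-data content type follow the upstream whisper.cpp server example and are assumptions here, and build_multipart_body, parse_transcription, and transcribe are hypothetical names.

```python
import json
import urllib.request
import uuid
from typing import Optional, Tuple


def build_multipart_body(audio: bytes, filename: str = "audio.wav",
                         boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode raw audio bytes as a single-part multipart/form-data body.

    Returns the body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        # Assumption: the server reads the audio from a form field named "file".
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + audio + tail, f"multipart/form-data; boundary={boundary}"


def parse_transcription(raw: bytes) -> str:
    """Extract the text field from the JSON response body."""
    return json.loads(raw)["text"]


def transcribe(audio: bytes,
               url: str = "http://localhost:8080/inference") -> str:
    """POST the audio to the endpoint and return the transcript."""
    body, content_type = build_multipart_body(audio)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_transcription(response.read())
```

With a server listening on the default endpoint, calling transcribe() with the bytes of an audio file would return the transcript string. The body builder and the response parser are kept separate from the network call so they can be checked without a running server.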
