whisper.cpp
Local speech recognition engine for audio-to-text transcription in the musegpt audio pipeline.
whisper.cpp provides local speech recognition. musegpt uses it in the audio evaluation pipeline to turn audio into text: the transcript returned by transcribe() is passed on as input to chat().
Configuration
[backend]
name = "whisper"
url = "http://localhost:8080"
Wire Protocol
Unlike the chat backends, whisper.cpp expects a multipart/form-data request body, with the audio file sent as one of the parts, at its /inference endpoint. The response is a JSON object whose text field holds the transcript.
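A standard-library-only sketch of the request and response shapes. The form field names (file for the audio, response_format set to json) follow the whisper.cpp server example; treat them as assumptions and confirm them against the server build you run. The helper names are hypothetical, and the audio is assumed to already be in a format the server accepts (typically WAV).

```python
import json
import urllib.request
import uuid


def build_inference_request(url: str, audio: bytes, filename: str = "audio.wav") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio in the "file" field."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the audio
    parts = [
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        # Assumed field: ask the server for a JSON body with a "text" key.
        b'Content-Disposition: form-data; name="response_format"\r\n\r\n',
        b"json\r\n",
        f"--{boundary}--\r\n".encode(),  # closing boundary ends the body
    ]
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def parse_inference_response(raw: bytes) -> str:
    """Extract the transcript from the JSON object's "text" field."""
    return json.loads(raw)["text"]


def transcribe(url: str, audio: bytes, timeout: float = 60.0) -> str:
    """Send the audio to the server and return the transcript (needs a live server)."""
    req = build_inference_request(url, audio)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_inference_response(resp.read())
```

With a server listening on the default port, transcribe("http://localhost:8080/inference", open("clip.wav", "rb").read()) would return the recognized text. Keeping request building and response parsing as separate pure functions lets both be checked without a network connection.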
Default Endpoint
POST http://localhost:8080/inference