Threading Model

Three-thread architecture for real-time audio, responsive UI, and background inference in the musegpt VST3 plugin.


musegpt runs inference via HTTP API calls to external backends. A single call can take anywhere from milliseconds to minutes, particularly when the response is streamed. The threading model ensures the audio thread is never blocked, the UI remains responsive, and inference runs safely in the background.

The Three Threads

Audio Thread (Real-Time, DAW-Controlled)

Runs processBlock(), which the DAW calls roughly every 1-10 ms
Must never block: no allocations, no locks, no syscalls, no HTTP, no I/O
Reads inference results (e.g., generated MIDI) from a lock-free FIFO
Never initiates inference requests
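
The lock-free FIFO the audio thread reads from could look roughly like the following. This is a minimal sketch, not the plugin's actual code: the names SpscFifo and MidiEvent, the event fields, and the fixed power-of-two capacity are all assumptions made for illustration. It shows a single-producer/single-consumer ring buffer in which the consumer side performs no allocation, takes no lock, and makes no system call.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical result type; the real plugin's MIDI representation is not shown here.
struct MidiEvent {
    std::uint8_t status;
    std::uint8_t data1;
    std::uint8_t data2;
    std::uint32_t sampleOffset;
};

// Single-producer / single-consumer ring buffer (illustrative name).
// Exactly one thread may call tryPush and exactly one thread may call tryPop.
template <typename T, std::size_t Capacity>
class SpscFifo {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
    static_assert(std::is_trivially_copyable<T>::value, "T must be copyable without allocating");

public:
    // Producer side (background thread). Returns false when the buffer is full.
    bool tryPush(const T& item) noexcept {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        buffer_[head & (Capacity - 1)] = item;
        // Release: the slot write above becomes visible before the new head does.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side (audio thread). Returns false when nothing is available.
    bool tryPop(T& out) noexcept {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = buffer_[tail & (Capacity - 1)];
        // Release: the slot read above completes before the slot is handed back.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Inside processBlock(), the audio thread would call tryPop in a loop until it returns false and simply carry on when the FIFO is empty. If the buffer is full, the producer decides whether to drop or retry; the audio thread is never asked to wait.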

UI Thread (Main Thread)

Handles user interaction and GUI repaints
Enqueues inference requests to the background thread (non-blocking)
Reads streaming response tokens from a lock-free queue
Never waits on inference completion

Background Thread (Inference Worker)

Owns the HTTP client lifecycle
Picks up requests from a thread-safe queue
Makes blocking HTTP calls to the configured inference backend
Streams response tokens back via a lock-free queue
The only thread that is allowed to block on I/O
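
The request path into the background thread can be sketched as a mutex-protected queue with a condition variable. This is an illustration under assumptions, not the plugin's implementation: InferenceWorker, submit, and the Handler callback are hypothetical names, and the handler stands in for the blocking HTTP call, which is not shown.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical worker: owns one thread that processes requests in FIFO order.
class InferenceWorker {
public:
    // Stand-in for the blocking backend call; runs on the worker thread only.
    using Handler = std::function<void(const std::string&)>;

    explicit InferenceWorker(Handler handler)
        : handler_(std::move(handler)), thread_([this] { run(); }) {}

    // Signals shutdown, lets the worker finish any queued requests, then joins.
    ~InferenceWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    InferenceWorker(const InferenceWorker&) = delete;
    InferenceWorker& operator=(const InferenceWorker&) = delete;

    // Called from the UI thread. Holds the mutex only long enough to enqueue,
    // so it returns without waiting for inference.
    void submit(std::string prompt) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(prompt));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::string prompt;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;  // stopping and nothing left to do
                prompt = std::move(pending_.front());
                pending_.pop_front();
            }
            handler_(prompt);  // blocking work happens outside the lock
        }
    }

    Handler handler_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> pending_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so every other member exists before it starts
};
```

The handler is invoked with the lock released, so a slow backend call never delays submit. A mutex is used here because this queue sits between the UI and background threads only; nothing in this class should be touched from the audio thread.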

Data Flow

sequenceDiagram
    participant UI as UI Thread
    participant BG as Background Thread
    participant Audio as Audio Thread

    UI->>BG: user sends message (queue)
    BG->>BG: HTTP call to backend
    BG->>UI: streaming tokens (lock-free)
    UI->>UI: update display
    BG->>Audio: MIDI result (lock-free FIFO)
    Audio->>Audio: read MIDI (never blocks)

Rules

No mutexes on the audio thread. All communication with the audio thread uses lock-free structures only (SPSC ring buffers, atomic flags).
Audio thread reads results, never writes requests. It consumes results from a lock-free FIFO and never initiates inference.
UI thread never waits on inference. It enqueues requests and polls/receives responses asynchronously.
Background thread owns all I/O. It is the only thread that makes HTTP calls or touches the network.
UI-to-background communication uses a thread-safe queue. A mutex is acceptable here since neither thread is real-time.
Backend lifecycle is managed on the background thread. Starting, stopping, and health-checking backends never happens on the audio or UI thread.
Streaming tokens are delivered incrementally. The background thread pushes tokens as they arrive via SSE; the UI thread consumes them for display without waiting for completion.
Cancellation is cooperative. The UI thread sets an atomic cancel flag; the background thread checks it between tokens and, once it sees the flag set, aborts the HTTP call.
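
The cooperative cancellation rule can be illustrated with a small sketch. Everything named here is hypothetical: CancelToken and streamTokens are not taken from the plugin, and the nextToken callable stands in for reading the next SSE token from the HTTP response. The point is only the shape of the loop, which checks the atomic flag before each token.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Hypothetical flag shared by the UI thread (which sets it) and the
// background thread (which polls it).
struct CancelToken {
    std::atomic<bool> cancelled{false};

    void cancel() noexcept { cancelled.store(true, std::memory_order_release); }
    bool isCancelled() const noexcept { return cancelled.load(std::memory_order_acquire); }
};

// Stand-in for the streaming loop on the background thread.
// nextToken(std::string&) returns false when the stream has ended;
// deliver(const std::string&) hands one token to the UI side.
// Returns the number of tokens delivered before the stream ended or was cancelled.
template <typename NextToken, typename Deliver>
std::size_t streamTokens(const CancelToken& cancel, NextToken nextToken, Deliver deliver) {
    std::size_t delivered = 0;
    std::string token;
    // The flag is checked between tokens, so cancellation takes effect at the
    // next token boundary rather than interrupting a read in progress.
    while (!cancel.isCancelled() && nextToken(token)) {
        deliver(token);
        ++delivered;
    }
    return delivered;
}
```

In a real worker, leaving the loop is where the HTTP request would be aborted and the connection released. How quickly cancellation takes effect depends on how long a single token read can block, so a read timeout on the HTTP client is worth considering.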

Testing Strategy

Interface contract tests: verify the InferenceBackend abstraction behaves correctly using a mock backend
Async request/response tests: submit a request, confirm the response arrives on a callback, confirm the caller never blocks
Audio thread safety tests: prove the audio thread can read results without blocking, even during active inference
Concurrent stress tests: multiple requests, backend start/stop during inflight requests, rapid submit/cancel
ThreadSanitizer (TSan): all tests compile and run clean with -fsanitize=thread
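
One inexpensive complement to the audio thread safety tests is a guard that the atomic types on the audio path are actually lock-free on the target platform, since std::atomic may fall back to an internal lock for some types. The sketch below assumes C++17 and assumes the audio path uses only std::atomic<bool> and std::atomic<std::size_t>; the function name is hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Compile-time guard (C++17): the build fails on any platform where these
// atomics would be implemented with a hidden lock.
static_assert(std::atomic<bool>::is_always_lock_free,
              "atomic<bool> must be lock-free on the audio path");
static_assert(std::atomic<std::size_t>::is_always_lock_free,
              "atomic<size_t> must be lock-free on the audio path");

// Runtime counterpart that a unit test can assert on.
inline bool audioPathAtomicsAreLockFree() noexcept {
    std::atomic<bool> flag{false};
    std::atomic<std::size_t> index{0};
    return flag.is_lock_free() && index.is_lock_free();
}
```

This does not replace TSan, which detects data races at run time. It only rules out one way a nominally lock-free structure could still block.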