Inter-Process Communication Model
Ports and adapters architecture for thread-to-thread and plugin-to-backend communication in musegpt.
musegpt has two distinct communication boundaries:
- In-process (thread-to-thread): the plugin's UI thread, inference worker thread, and audio thread exchange data via queues and atomics within the same process.
- Out-of-process (plugin-to-backend): the inference worker thread communicates with external backend processes (Ollama, vLLM, llama.cpp server, etc.) over HTTP, gRPC, Unix sockets, or stdio pipes depending on the backend.
This document covers boundary #1. Boundary #2 is abstracted by the InferenceBackend interface; each backend adapter handles its own wire protocol internally.
Architecture: Ports and Adapters
The key insight is separating what is communicated from how it's delivered:
- Message protocol: the typed commands and events that flow between threads
- Transport: how those messages are delivered (thread queues, lock-free FIFOs, etc.)
By defining the boundary as a port (an abstract interface), we can swap transports without changing the protocol, and test the protocol without real threads.
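As a concrete sketch, here is roughly what that boundary can look like in C++. The type and method names are illustrative placeholders, not the actual musegpt API; the placeholder Command/Event structs stand in for the tagged unions described in the tables below.

```cpp
#include <optional>
#include <string>

// Placeholder message types for this sketch; the real protocol is the
// command/event set described in the tables below.
struct Command { std::string payload; };
struct Event   { std::string payload; };

// The port is defined purely by the message protocol: the UI thread pushes
// Commands in and polls Events out. How delivery happens (thread-safe
// queues, inline calls, recording) is up to the adapter behind it.
class InferencePort {
public:
    virtual ~InferencePort() = default;

    // UI thread: hand a command to the inference worker.
    virtual void send(const Command& command) = 0;

    // UI thread (e.g. from a timer callback): fetch the next pending event,
    // or std::nullopt if nothing has arrived yet.
    virtual std::optional<Event> poll() = 0;
};
```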
In-Process Communication
The Port
A port is a pair of typed message channels between the UI thread and the inference worker:
sequenceDiagram
participant UI as UI Thread
participant Port as InferencePort
participant Worker as Inference Worker
UI->>Port: Command
Port->>Worker: receives
Worker->>Port: emits
Port->>UI: Event
A separate lock-free channel carries results to the audio thread:
sequenceDiagram
participant Worker as Inference Worker
participant AP as AudioPort (lock-free)
participant Audio as Audio Thread
Worker->>AP: MidiResult
AP->>Audio: reads
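The audio-side channel must never block or allocate on the consumer side, because it is drained from the real-time audio callback. A minimal single-producer/single-consumer ring buffer along these lines would do the job; this is an illustrative sketch, not the exact queue musegpt uses (JUCE's AbstractFifo or an off-the-shelf SPSC queue fills the same role).

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer / single-consumer ring buffer. The worker thread
// pushes, the audio thread pops; neither side blocks or allocates, which is
// what makes it safe to call from the audio callback.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Producer side (inference worker). Returns false if the queue is full.
    bool push(const T& value) {
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                        // full: drop or retry later
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side (audio thread). Returns std::nullopt if empty.
    std::optional<T> pop() {
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                 // empty
        T value = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};           // written only by the producer
    std::atomic<std::size_t> tail_{0};           // written only by the consumer
};
```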
Commands (UI to Inference Worker)
| Command | Description |
|---|---|
| SubmitRequest | Start inference with a prompt, temperature, max tokens |
| CancelRequest | Abort the current inference |
| ChangeBackend | Swap the active inference backend |
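One natural C++ encoding of this command set is a tagged union, so the worker can dispatch exhaustively with std::visit. The struct and field names below are assumptions for illustration, not the actual musegpt types.

```cpp
#include <string>
#include <variant>

// Hypothetical command payloads mirroring the table above.
struct SubmitRequest {
    std::string prompt;
    float       temperature = 0.8f;
    int         maxTokens   = 512;
};

struct CancelRequest {};

struct ChangeBackend {
    std::string name;          // e.g. "ollama", "llama.cpp"
    std::string configJson;    // backend-specific settings
};

// One closed type for everything the UI can ask the worker to do.
using Command = std::variant<SubmitRequest, CancelRequest, ChangeBackend>;
```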
Events (Inference Worker to UI)
| Event | Description |
|---|---|
| TokenReceived | A single streamed token arrived |
| InferenceComplete | Final result ready (full response, error/cancel status) |
| Error | Backend error or connection failure |
| BackendStatusChanged | Backend started, stopped, or health check result |
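The event set can be encoded the same way; again, names and fields are illustrative assumptions rather than the real types.

```cpp
#include <string>
#include <variant>

// Hypothetical event payloads mirroring the table above.
struct TokenReceived {
    std::string token;
};

struct InferenceComplete {
    std::string fullResponse;
    bool        cancelled = false;
    bool        error     = false;
};

struct Error {
    std::string message;
};

struct BackendStatusChanged {
    std::string backendName;
    bool        running = false;
};

// Everything the worker can report back to the UI.
using Event = std::variant<TokenReceived, InferenceComplete,
                           Error, BackendStatusChanged>;
```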
Events (Inference Worker to Audio Thread)
| Event | Description |
|---|---|
| MidiResultReady | Parsed MIDI data available for playback |
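Because this message crosses the lock-free queue into the audio thread, it pays to keep the payload trivially copyable and bounded so neither side allocates. A hypothetical layout (the actual fields are an assumption):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size MIDI result. Trivially copyable and bounded, so it
// can be pushed through the SPSC queue without touching the heap.
struct MidiNote {
    std::uint8_t noteNumber;   // 0-127
    std::uint8_t velocity;     // 0-127
    double       startBeats;   // position in beats
    double       lengthBeats;  // duration in beats
};

struct MidiResult {
    std::array<MidiNote, 256> notes{};
    std::size_t               noteCount = 0;
};
```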
Port Implementations
Production: ThreadedInferencePort
- Commands delivered via mutex-protected queue (safe: neither UI nor worker is real-time)
- Events delivered via lock-free SPSC queue (UI polls)
- MIDI results delivered via lock-free SPSC queue (audio thread polls)
- Worker runs on a dedicated background thread
- Condition variable wakes worker when commands arrive
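A stripped-down version of this transport, assuming the Command/Event types and SpscQueue from the earlier sketches, might look like the following. The real implementation also owns the backend, delivers MIDI results, and handles shutdown ordering; this sketch shows only the command queue, wake-up, and event path.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

class ThreadedInferencePort /* : public InferencePort */ {
public:
    ThreadedInferencePort() : worker_([this] { run(); }) {}

    ~ThreadedInferencePort() {
        {
            std::lock_guard<std::mutex> lock(commandMutex_);
            shuttingDown_ = true;
        }
        commandReady_.notify_one();
        worker_.join();
    }

    // UI thread: neither UI nor worker is real-time, so a mutex is fine here.
    void send(const Command& command) {
        {
            std::lock_guard<std::mutex> lock(commandMutex_);
            commands_.push_back(command);
        }
        commandReady_.notify_one();              // wake the worker
    }

    // UI thread polls; events arrive via the lock-free SPSC queue.
    std::optional<Event> poll() { return events_.pop(); }

private:
    void run() {
        for (;;) {
            Command command;
            {
                std::unique_lock<std::mutex> lock(commandMutex_);
                commandReady_.wait(lock, [this] {
                    return shuttingDown_ || !commands_.empty();
                });
                if (shuttingDown_) return;
                command = commands_.front();
                commands_.pop_front();
            }
            handle(command);                     // runs inference, pushes Events
        }
    }

    void handle(const Command& command) {
        // ...dispatch on the command, stream tokens, push Events / MidiResults...
        (void)command;
    }

    std::mutex              commandMutex_;
    std::condition_variable commandReady_;
    std::deque<Command>     commands_;
    bool                    shuttingDown_ = false;

    SpscQueue<Event, 1024>  events_;             // worker -> UI
    std::thread             worker_;             // declared last: started in ctor
};
```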
Testing: SyncInferencePort
- Commands processed inline, synchronously
- Events returned immediately
- No threads, no timing, fully deterministic
- Used for protocol correctness tests
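A sketch of what that test double can look like, reusing the Command/Event types from the earlier snippets (illustrative, not the actual class):

```cpp
#include <deque>
#include <optional>

// Test transport: each command is handled inline on the calling thread, so by
// the time send() returns, every resulting event is already queued and a test
// can assert on poll() immediately. No threads, no timing.
class SyncInferencePort /* : public InferencePort */ {
public:
    void send(const Command& command) {
        handle(command);                         // same dispatch logic as the threaded port
    }

    std::optional<Event> poll() {
        if (pending_.empty()) return std::nullopt;
        Event event = pending_.front();
        pending_.pop_front();
        return event;
    }

private:
    void handle(const Command& command) {
        // ...dispatch on the command via a mock backend and push the
        // resulting events into pending_...
        (void)command;
    }

    std::deque<Event> pending_;
};
```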
Testing: RecordingPort
- Records all commands sent and events received
- Assertions against the recorded sequence
- Used for verifying command/event ordering
Out-of-Process Communication
The inference worker thread communicates with external backend processes. The wire protocol depends on the backend:
| Backend | Protocol | Typical endpoint |
|---|---|---|
| Ollama | HTTP (OpenAI-compatible) | localhost:11434 |
| llama.cpp server | HTTP (OpenAI-compatible) | localhost:8080 |
| vLLM | HTTP (OpenAI-compatible) | localhost:8000 |
| SGLang | HTTP (OpenAI-compatible) | localhost:30000 |
| MLX (mlx-lm) | HTTP (OpenAI-compatible) | localhost:8080 |
| llamafile | HTTP (OpenAI-compatible) | localhost:8080 |
| TensorRT-LLM | HTTP (OpenAI-compatible) or gRPC | varies |
| whisper.cpp | HTTP | localhost:8080 |
The InferenceBackend abstract interface hides these details. Each backend adapter implements connection management, request serialization, response streaming, and error handling for its specific protocol.
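In rough strokes, that backend port might be shaped like this; the method names and signatures are assumptions for illustration, not the exact musegpt interface.

```cpp
#include <functional>
#include <string>

// How the inference worker sees any backend, whatever its wire protocol.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;

    // Connect to / health-check the backend process (HTTP, gRPC, stdio, ...).
    virtual bool connect(const std::string& endpoint) = 0;

    // Run a streaming completion. onToken is invoked once per streamed token;
    // returning false from it asks the backend to stop early (cancellation).
    // Returns the full response text.
    virtual std::string complete(
        const std::string& prompt,
        const std::function<bool(const std::string&)>& onToken) = 0;
};
```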
Message Flow Examples
Happy path: submit and complete
UI sends: SubmitRequest { prompt: "Write a melody in C major" }
Worker emits: TokenReceived { "Here" }
Worker emits: TokenReceived { " is" }
Worker emits: TokenReceived { " a" }
Worker emits: TokenReceived { " melody" }
Worker emits: InferenceComplete { full_response: "Here is a melody", cancelled: false }
Worker emits: MidiResultReady { notes: [...] } (to audio thread)
Cancellation mid-stream
UI sends: SubmitRequest { prompt: "Generate a drum pattern" }
Worker emits: TokenReceived { "Kick" }
Worker emits: TokenReceived { " on" }
UI sends: CancelRequest {}
Worker emits: InferenceComplete { cancelled: true }
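One plausible mechanism for this, assuming the streaming-callback backend interface sketched earlier, is an atomic flag that the CancelRequest handler sets and the token callback checks between tokens. pushEvent() below is a hypothetical stand-in for enqueueing onto the worker-to-UI event queue.

```cpp
#include <atomic>
#include <string>

std::atomic<bool> cancelRequested{false};       // set by the CancelRequest handler

void pushEvent(const Event& event);             // declaration only: enqueue worker -> UI

void runInference(InferenceBackend& backend, const std::string& prompt) {
    cancelRequested.store(false);

    const std::string response = backend.complete(prompt,
        [](const std::string& token) {
            pushEvent(TokenReceived{token});    // stream each token to the UI
            return !cancelRequested.load();     // false => backend stops streaming
        });

    // Completion event records whether the request was cancelled mid-stream.
    pushEvent(InferenceComplete{response, cancelRequested.load()});
}
```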
Backend swap
UI sends: ChangeBackend { name: "ollama", config: { ... } }
Worker emits: BackendStatusChanged { running: false, name: "llama.cpp" }
Worker emits: BackendStatusChanged { running: true, name: "ollama" }
Error
UI sends: SubmitRequest { prompt: "..." }
Worker emits: Error { message: "Connection refused: backend not running" }
Testing Strategy
| What | How | Threads? |
|---|---|---|
| Message protocol correctness | SyncInferencePort + mock backend | No |
| Command/event sequencing | RecordingPort | No |
| Thread safety of transport | ThreadedInferencePort + TSan | Yes |
| Lock-free audio path | SPSC queue tests | Yes (minimal) |
| Full integration | ThreadedInferencePort + mock backend | Yes |
The majority of tests are deterministic and fast. Only transport-level tests require real threads.
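For example, a protocol-level test built on the SyncInferencePort sketch can check the happy-path ordering with no threads at all (plain assert here; adapt to the project's test framework). The port is assumed to be wired to a mock backend that returns a canned response.

```cpp
#include <cassert>
#include <variant>

void testTokensArriveBeforeCompletion() {
    SyncInferencePort port;                      // assumed wired to a mock backend

    port.send(SubmitRequest{"Write a melody in C major"});

    bool seenCompletion = false;
    while (auto event = port.poll()) {
        if (std::holds_alternative<TokenReceived>(*event))
            assert(!seenCompletion);             // no tokens after completion
        if (std::holds_alternative<InferenceComplete>(*event))
            seenCompletion = true;
    }
    assert(seenCompletion);                      // matches the happy path above
}
```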