Inter-Process Communication Model
Ports and adapters architecture for thread-to-thread and plugin-to-backend communication in musegpt.
musegpt has two distinct communication boundaries:
- In-process (thread-to-thread): the plugin's UI thread, inference worker thread, and audio thread exchange data via queues and atomics within the same process.
- Out-of-process (plugin-to-backend): the inference worker thread communicates with external backend processes (Ollama, vLLM, llama.cpp server, etc.) over HTTP, gRPC, Unix sockets, or stdio pipes depending on the backend.
This document covers boundary #1. Boundary #2 is abstracted by the InferenceBackend interface; each backend adapter handles its own wire protocol internally.
Architecture: Ports and Adapters
The key insight is separating what is communicated from how it's delivered:
- Message protocol: the typed commands and events that flow between threads
- Transport: how those messages are delivered (thread queues, lock-free FIFOs, etc.)
By defining the boundary as a port (an abstract interface), we can swap transports without changing the protocol, and test the protocol without real threads.
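As a concrete sketch, here is roughly what that boundary can look like in C++. The type and method names are illustrative placeholders, not the actual musegpt API; the placeholder Command/Event structs stand in for the tagged unions described in the tables below.

```cpp
#include <optional>
#include <string>

// Placeholder message types for this sketch; the real protocol is the
// command/event set described in the tables below.
struct Command { std::string payload; };
struct Event   { std::string payload; };

// The port is defined purely by the message protocol: the UI thread pushes
// Commands in and polls Events out. How delivery happens (thread-safe
// queues, inline calls, recording) is up to the adapter behind it.
class InferencePort {
public:
    virtual ~InferencePort() = default;

    // UI thread: hand a command to the inference worker.
    virtual void send(const Command& command) = 0;

    // UI thread (e.g. from a timer callback): fetch the next pending event,
    // or std::nullopt if nothing has arrived yet.
    virtual std::optional<Event> poll() = 0;
};
```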
In-Process Communication
The Port
A port is a pair of typed message channels between the UI thread and the inference worker:
sequenceDiagram
participant UI as UI Thread
participant Port as InferencePort
participant Worker as Inference Worker
UI->>Port: Command
Port->>Worker: receives
Worker->>Port: emits
Port->>UI: Event
A separate lock-free channel carries results to the audio thread:
sequenceDiagram
participant Worker as Inference Worker
participant AP as AudioPort (lock-free)
participant Audio as Audio Thread
Worker->>AP: MidiResult
AP->>Audio: reads
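The audio-side channel must never block or allocate on the consumer side, because it is drained from the real-time audio callback. A minimal single-producer/single-consumer ring buffer along these lines would do the job; this is an illustrative sketch, not the exact queue musegpt uses (JUCE's AbstractFifo or an off-the-shelf SPSC queue fills the same role).

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer / single-consumer ring buffer. The worker thread
// pushes, the audio thread pops; neither side blocks or allocates, which is
// what makes it safe to call from the audio callback.
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Producer side (inference worker). Returns false if the queue is full.
    bool push(const T& value) {
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                        // full: drop or retry later
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side (audio thread). Returns std::nullopt if empty.
    std::optional<T> pop() {
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                 // empty
        T value = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};           // written only by the producer
    std::atomic<std::size_t> tail_{0};           // written only by the consumer
};
```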
Commands (UI to Inference Worker)
| Command | Description |
|---|---|
| SubmitRequest | Start inference with a prompt, temperature, max tokens |
| CancelRequest | Abort the current inference |
| ChangeBackend | Swap the active inference backend |
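One natural C++ encoding of this command set is a tagged union, so the worker can dispatch exhaustively with std::visit. The struct and field names below are assumptions for illustration, not the actual musegpt types.

```cpp
#include <string>
#include <variant>

// Hypothetical command payloads mirroring the table above.
struct SubmitRequest {
    std::string prompt;
    float       temperature = 0.8f;
    int         maxTokens   = 512;
};

struct CancelRequest {};

struct ChangeBackend {
    std::string name;          // e.g. "ollama", "llama.cpp"
    std::string configJson;    // backend-specific settings
};

// One closed type for everything the UI can ask the worker to do.
using Command = std::variant<SubmitRequest, CancelRequest, ChangeBackend>;
```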
Events (Inference Worker to UI)
| Event | Description |
|---|---|
| TokenReceived | A single streamed token arrived |
| InferenceComplete | Final result ready (full response, error/cancel status) |
| Error | Backend error or connection failure |
| BackendStatusChanged | Backend started, stopped, or health check result |
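The event set can be encoded the same way; again, names and fields are illustrative assumptions rather than the real types.

```cpp
#include <string>
#include <variant>

// Hypothetical event payloads mirroring the table above.
struct TokenReceived {
    std::string token;
};

struct InferenceComplete {
    std::string fullResponse;
    bool        cancelled = false;
    bool        error     = false;
};

struct Error {
    std::string message;
};

struct BackendStatusChanged {
    std::string backendName;
    bool        running = false;
};

// Everything the worker can report back to the UI.
using Event = std::variant<TokenReceived, InferenceComplete,
                           Error, BackendStatusChanged>;
```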
Events (Inference Worker to Audio Thread)
| Event | Description |
|---|---|
| MidiResultReady | Parsed MIDI data available for playback |
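Because this message crosses the lock-free queue into the audio thread, it pays to keep the payload trivially copyable and bounded so neither side allocates. A hypothetical layout (the actual fields are an assumption):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size MIDI result. Trivially copyable and bounded, so it
// can be pushed through the SPSC queue without touching the heap.
struct MidiNote {
    std::uint8_t noteNumber;   // 0-127
    std::uint8_t velocity;     // 0-127
    double       startBeats;   // position in beats
    double       lengthBeats;  // duration in beats
};

struct MidiResult {
    std::array<MidiNote, 256> notes{};
    std::size_t               noteCount = 0;
};
```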
Port Implementations
Production: ThreadedInferencePort
- Commands delivered via mutex-protected queue (safe: neither UI nor worker is real-time)
- Events delivered via lock-free SPSC queue (UI polls)
- MIDI results delivered via lock-free SPSC queue (audio thread polls)
- Worker runs on a dedicated background thread
- Condition variable wakes worker when commands arrive
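A stripped-down version of this transport, assuming the Command/Event types and SpscQueue from the earlier sketches, might look like the following. The real implementation also owns the backend, delivers MIDI results, and handles shutdown ordering; this sketch shows only the command queue, wake-up, and event path.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>

class ThreadedInferencePort /* : public InferencePort */ {
public:
    ThreadedInferencePort() : worker_([this] { run(); }) {}

    ~ThreadedInferencePort() {
        {
            std::lock_guard<std::mutex> lock(commandMutex_);
            shuttingDown_ = true;
        }
        commandReady_.notify_one();
        worker_.join();
    }

    // UI thread: neither UI nor worker is real-time, so a mutex is fine here.
    void send(const Command& command) {
        {
            std::lock_guard<std::mutex> lock(commandMutex_);
            commands_.push_back(command);
        }
        commandReady_.notify_one();              // wake the worker
    }

    // UI thread polls; events arrive via the lock-free SPSC queue.
    std::optional<Event> poll() { return events_.pop(); }

private:
    void run() {
        for (;;) {
            Command command;
            {
                std::unique_lock<std::mutex> lock(commandMutex_);
                commandReady_.wait(lock, [this] {
                    return shuttingDown_ || !commands_.empty();
                });
                if (shuttingDown_) return;
                command = commands_.front();
                commands_.pop_front();
            }
            handle(command);                     // runs inference, pushes Events
        }
    }

    void handle(const Command& command) {
        // ...dispatch on the command, stream tokens, push Events / MidiResults...
        (void)command;
    }

    std::mutex              commandMutex_;
    std::condition_variable commandReady_;
    std::deque<Command>     commands_;
    bool                    shuttingDown_ = false;

    SpscQueue<Event, 1024>  events_;             // worker -> UI
    std::thread             worker_;             // declared last: started in ctor
};
```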
Testing: SyncInferencePort
- Commands processed inline, synchronously
- Events returned immediately
- No threads, no timing, fully deterministic
- Used for protocol correctness tests
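A sketch of what that test double can look like, reusing the Command/Event types from the earlier snippets (illustrative, not the actual class):

```cpp
#include <deque>
#include <optional>

// Test transport: each command is handled inline on the calling thread, so by
// the time send() returns, every resulting event is already queued and a test
// can assert on poll() immediately. No threads, no timing.
class SyncInferencePort /* : public InferencePort */ {
public:
    void send(const Command& command) {
        handle(command);                         // same dispatch logic as the threaded port
    }

    std::optional<Event> poll() {
        if (pending_.empty()) return std::nullopt;
        Event event = pending_.front();
        pending_.pop_front();
        return event;
    }

private:
    void handle(const Command& command) {
        // ...dispatch on the command via a mock backend and push the
        // resulting events into pending_...
        (void)command;
    }

    std::deque<Event> pending_;
};
```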
Testing: RecordingPort
- Records all commands sent and events received
- Assertions against the recorded sequence
- Used for verifying command/event ordering
Out-of-Process Communication
The inference worker thread communicates with external backend processes. The wire protocol depends on the backend:
| Backend | Protocol | Typical endpoint |
|---|---|---|
| Ollama | HTTP (OpenAI-compatible) | localhost:11434 |
| llama.cpp server | HTTP (OpenAI-compatible) | localhost:8080 |
| vLLM | HTTP (OpenAI-compatible) | localhost:8000 |
| SGLang | HTTP (OpenAI-compatible) | localhost:30000 |
| MLX (mlx-lm) | HTTP (OpenAI-compatible) | localhost:8080 |
| llamafile | HTTP (OpenAI-compatible) | localhost:8080 |
| TensorRT-LLM | HTTP (OpenAI-compatible) or gRPC | varies |
| whisper.cpp | HTTP | localhost:8080 |
The InferenceBackend abstract interface hides these details. Each backend adapter implements connection management, request serialization, response streaming, and error handling for its specific protocol.
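In rough strokes, that backend port might be shaped like this; the method names and signatures are assumptions for illustration, not the exact musegpt interface.

```cpp
#include <functional>
#include <string>

// How the inference worker sees any backend, whatever its wire protocol.
class InferenceBackend {
public:
    virtual ~InferenceBackend() = default;

    // Connect to / health-check the backend process (HTTP, gRPC, stdio, ...).
    virtual bool connect(const std::string& endpoint) = 0;

    // Run a streaming completion. onToken is invoked once per streamed token;
    // returning false from it asks the backend to stop early (cancellation).
    // Returns the full response text.
    virtual std::string complete(
        const std::string& prompt,
        const std::function<bool(const std::string&)>& onToken) = 0;
};
```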
Message Flow Examples
Happy path: submit and complete
UI sends: SubmitRequest { prompt: "Write a melody in C major" }
Worker emits: TokenReceived { "Here" }
Worker emits: TokenReceived { " is" }
Worker emits: TokenReceived { " a" }
Worker emits: TokenReceived { " melody" }
Worker emits: InferenceComplete { full_response: "Here is a melody", cancelled: false }
Worker emits: MidiResultReady { notes: [...] } (to audio thread)
Cancellation mid-stream
UI sends: SubmitRequest { prompt: "Generate a drum pattern" }
Worker emits: TokenReceived { "Kick" }
Worker emits: TokenReceived { " on" }
UI sends: CancelRequest {}
Worker emits: InferenceComplete { cancelled: true }
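One plausible mechanism for this, assuming the streaming-callback backend interface sketched earlier, is an atomic flag that the CancelRequest handler sets and the token callback checks between tokens. pushEvent() below is a hypothetical stand-in for enqueueing onto the worker-to-UI event queue.

```cpp
#include <atomic>
#include <string>

std::atomic<bool> cancelRequested{false};       // set by the CancelRequest handler

void pushEvent(const Event& event);             // declaration only: enqueue worker -> UI

void runInference(InferenceBackend& backend, const std::string& prompt) {
    cancelRequested.store(false);

    const std::string response = backend.complete(prompt,
        [](const std::string& token) {
            pushEvent(TokenReceived{token});    // stream each token to the UI
            return !cancelRequested.load();     // false => backend stops streaming
        });

    // Completion event records whether the request was cancelled mid-stream.
    pushEvent(InferenceComplete{response, cancelRequested.load()});
}
```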
Backend swap
UI sends: ChangeBackend { name: "ollama", config: { ... } }
Worker emits: BackendStatusChanged { running: false, name: "llama.cpp" }
Worker emits: BackendStatusChanged { running: true, name: "ollama" }
Error
UI sends: SubmitRequest { prompt: "..." }
Worker emits: Error { message: "Connection refused: backend not running" }
Testing Strategy
| What | How | Threads? |
|---|---|---|
| Message protocol correctness | SyncInferencePort + mock backend | No |
| Command/event sequencing | RecordingPort | No |
| Thread safety of transport | ThreadedInferencePort + TSan | Yes |
| Lock-free audio path | SPSC queue tests | Yes (minimal) |
| Full integration | ThreadedInferencePort + mock backend | Yes |
The majority of tests are deterministic and fast. Only transport-level tests require real threads.
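For example, a protocol-level test built on the SyncInferencePort sketch can check the happy-path ordering with no threads at all (plain assert here; adapt to the project's test framework). The port is assumed to be wired to a mock backend that returns a canned response.

```cpp
#include <cassert>
#include <variant>

void testTokensArriveBeforeCompletion() {
    SyncInferencePort port;                      // assumed wired to a mock backend

    port.send(SubmitRequest{"Write a melody in C major"});

    bool seenCompletion = false;
    while (auto event = port.poll()) {
        if (std::holds_alternative<TokenReceived>(*event))
            assert(!seenCompletion);             // no tokens after completion
        if (std::holds_alternative<InferenceComplete>(*event))
            seenCompletion = true;
    }
    assert(seenCompletion);                      // matches the happy path above
}
```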