Inter-Process Communication Model

Ports and adapters architecture for thread-to-thread and plugin-to-backend communication in musegpt.

musegpt has two distinct communication boundaries:

  1. In-process (thread-to-thread): the plugin's UI thread, inference worker thread, and audio thread exchange data via queues and atomics within the same process.
  2. Out-of-process (plugin-to-backend): the inference worker thread communicates with external backend processes (Ollama, vLLM, llama.cpp server, etc.) over HTTP, gRPC, Unix sockets, or stdio pipes depending on the backend.

This document focuses on boundary #1. Boundary #2 is abstracted behind the InferenceBackend interface and only summarized below; each backend adapter handles its own wire protocol internally.

Architecture: Ports and Adapters

The key insight is separating what is communicated from how it's delivered:

  1. Message protocol: the typed commands and events that flow between threads
  2. Transport: how those messages are delivered (thread queues, lock-free FIFOs, etc.)

By defining the boundary as a port (an abstract interface), we can swap transports without changing the protocol, and test the protocol without real threads.
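
For concreteness, the boundary might be declared roughly as follows. This is a minimal sketch: `InferencePort` is the name used in the diagrams below, but the method names and the placeholder `Command`/`Event` types here are illustrative, not the project's exact API.

```cpp
#include <optional>

// Placeholder message types; the concrete commands and events are listed in
// the tables further down this page.
struct Command {};
struct Event {};

// The port is the boundary: the UI depends only on this interface, never on
// the transport behind it.
class InferencePort
{
public:
    virtual ~InferencePort() = default;

    virtual void send (const Command& command) = 0;   // UI -> inference worker
    virtual std::optional<Event> poll() = 0;          // worker -> UI, non-blocking
};
```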

In-Process Communication

The Port

A port is a pair of typed message channels between the UI thread and the inference worker:

```mermaid
sequenceDiagram
    participant UI as UI Thread
    participant Port as InferencePort
    participant Worker as Inference Worker

    UI->>Port: Command
    Port->>Worker: receives
    Worker->>Port: emits
    Port->>UI: Event
```

A separate lock-free channel carries results to the audio thread:

```mermaid
sequenceDiagram
    participant Worker as Inference Worker
    participant AP as AudioPort (lock-free)
    participant Audio as Audio Thread

    Worker->>AP: MidiResult
    AP->>Audio: reads
```
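
A minimal sketch of what such a lock-free channel can look like, assuming a hand-rolled single-producer/single-consumer ring buffer (the actual implementation may use a library FIFO instead). Only one thread may push and only one may pop; the element type and capacity are chosen by the caller.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer / single-consumer ring buffer: the inference worker is the
// only writer, the audio thread the only reader, so two atomic indices are
// enough. No locks, and pop() never blocks; real-time safety also requires
// that copying T does not allocate.
template <typename T, std::size_t Capacity>
class SpscQueue
{
public:
    bool push (const T& item)                    // worker thread only
    {
        const auto head = head_.load (std::memory_order_relaxed);
        const auto next = (head + 1) % Capacity;
        if (next == tail_.load (std::memory_order_acquire))
            return false;                        // full; caller decides whether to retry
        slots_[head] = item;
        head_.store (next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop()                       // audio thread only
    {
        const auto tail = tail_.load (std::memory_order_relaxed);
        if (tail == head_.load (std::memory_order_acquire))
            return std::nullopt;                 // empty; nothing new this block
        T item = slots_[tail];
        tail_.store ((tail + 1) % Capacity, std::memory_order_release);
        return item;
    }

private:
    std::array<T, Capacity> slots_ {};
    std::atomic<std::size_t> head_ { 0 };        // next write index (producer-owned)
    std::atomic<std::size_t> tail_ { 0 };        // next read index (consumer-owned)
};
```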

Commands (UI to Inference Worker)

| Command | Description |
| --- | --- |
| SubmitRequest | Start inference with a prompt, temperature, max tokens |
| CancelRequest | Abort the current inference |
| ChangeBackend | Swap the active inference backend |

Events (Inference Worker to UI)

| Event | Description |
| --- | --- |
| TokenReceived | A single streamed token arrived |
| InferenceComplete | Final result ready (full response, error/cancel status) |
| Error | Backend error or connection failure |
| BackendStatusChanged | Backend started, stopped, or health check result |

Events (Inference Worker to Audio Thread)

| Event | Description |
| --- | --- |
| MidiResultReady | Parsed MIDI data available for playback |
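
Taken together, the protocol can be modeled as plain data types behind tagged unions. The sketch below is illustrative: the message names match the tables above, but the field names, types, and defaults are assumptions.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Commands: UI thread -> inference worker
struct SubmitRequest  { std::string prompt; float temperature = 0.7f; int maxTokens = 256; };
struct CancelRequest  {};
struct ChangeBackend  { std::string name; std::string config; };
using Command = std::variant<SubmitRequest, CancelRequest, ChangeBackend>;

// Events: inference worker -> UI thread
struct TokenReceived        { std::string token; };
struct InferenceComplete    { std::string fullResponse; bool cancelled = false; };
struct Error                { std::string message; };
struct BackendStatusChanged { std::string name; bool running = false; };
using Event = std::variant<TokenReceived, InferenceComplete, Error, BackendStatusChanged>;

// Result: inference worker -> audio thread. A vector keeps the sketch short;
// a real-time-safe version would use preallocated, fixed-size storage.
struct MidiResultReady { std::vector<std::uint8_t> midiBytes; };
```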

Port Implementations

Production: ThreadedInferencePort

  • Commands delivered via mutex-protected queue (safe: neither UI nor worker is real-time)
  • Events delivered via lock-free SPSC queue (UI polls)
  • MIDI results delivered via lock-free SPSC queue (audio thread polls)
  • Worker runs on a dedicated background thread
  • Condition variable wakes worker when commands arrive
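
A skeleton of this arrangement, assuming the `Command`/`Event` types and the `SpscQueue` sketched earlier on this page; method names and details are illustrative.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

class ThreadedInferencePort
{
public:
    ThreadedInferencePort()  { worker_ = std::thread ([this] { run(); }); }

    ~ThreadedInferencePort()
    {
        { std::lock_guard<std::mutex> lock (mutex_); stopRequested_ = true; }
        wake_.notify_one();
        worker_.join();
    }

    // UI thread: neither side is real-time here, so a mutex-protected queue is fine.
    void send (Command command)
    {
        { std::lock_guard<std::mutex> lock (mutex_); commands_.push_back (std::move (command)); }
        wake_.notify_one();
    }

    // UI thread: drain pending events without blocking (e.g. from a UI timer).
    std::optional<Event> poll()  { return events_.pop(); }

private:
    void run()
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock (mutex_);
            wake_.wait (lock, [this] { return stopRequested_ || ! commands_.empty(); });
            if (stopRequested_)
                return;
            Command command = std::move (commands_.front());
            commands_.pop_front();
            lock.unlock();

            // Dispatch `command` to the active backend and stream results back,
            // e.g. events_.push (TokenReceived { token }) per token and
            // events_.push (InferenceComplete { ... }) at the end.
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<Command> commands_;
    bool stopRequested_ = false;

    SpscQueue<Event, 256> events_;              // worker -> UI, lock-free; UI polls
    SpscQueue<MidiResultReady, 64> midiResults_;// worker -> audio thread, lock-free
    std::thread worker_;                        // dedicated background thread
};
```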

Testing: SyncInferencePort

  • Commands processed inline, synchronously
  • Events returned immediately
  • No threads, no timing, fully deterministic
  • Used for protocol correctness tests
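
A sketch of the synchronous variant, again assuming the message types above. Commands are handled inline by a caller-supplied handler (for example a mock backend), so a test can drive the whole protocol on one thread.

```cpp
#include <functional>
#include <optional>
#include <utility>
#include <vector>

class SyncInferencePort
{
public:
    // The handler plays the role of a mock backend: given a command, it
    // returns the events that command should produce.
    using Handler = std::function<std::vector<Event> (const Command&)>;

    explicit SyncInferencePort (Handler handler) : handler_ (std::move (handler)) {}

    // Processed inline on the caller's thread; no queues, no timing.
    void send (const Command& command)
    {
        auto produced = handler_ (command);
        events_.insert (events_.end(), produced.begin(), produced.end());
    }

    std::optional<Event> poll()
    {
        if (events_.empty())
            return std::nullopt;
        Event next = std::move (events_.front());
        events_.erase (events_.begin());
        return next;
    }

private:
    Handler handler_;
    std::vector<Event> events_;
};
```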

Testing: RecordingPort

  • Records all commands sent and events received
  • Assertions against the recorded sequence
  • Used for verifying command/event ordering
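
A sketch of the recording variant; it only stores traffic so a test can assert on the exact sequence afterwards.

```cpp
#include <vector>

class RecordingPort
{
public:
    void send (const Command& command)  { sentCommands.push_back (command); }
    void emit (const Event& event)      { receivedEvents.push_back (event); }

    std::vector<Command> sentCommands;      // everything the UI side sent
    std::vector<Event>   receivedEvents;    // everything the worker side emitted
};
```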

Out-of-Process Communication

The inference worker thread communicates with external backend processes. The wire protocol depends on the backend:

| Backend | Protocol | Typical endpoint |
| --- | --- | --- |
| Ollama | HTTP (OpenAI-compatible) | localhost:11434 |
| llama.cpp server | HTTP (OpenAI-compatible) | localhost:8080 |
| vLLM | HTTP (OpenAI-compatible) | localhost:8000 |
| SGLang | HTTP (OpenAI-compatible) | localhost:30000 |
| MLX (mlx-lm) | HTTP (OpenAI-compatible) | localhost:8080 |
| llamafile | HTTP (OpenAI-compatible) | localhost:8080 |
| TensorRT-LLM | HTTP (OpenAI-compatible) or gRPC | varies |
| whisper.cpp | HTTP | localhost:8080 |

The InferenceBackend abstract interface hides these details. Each backend adapter implements connection management, request serialization, response streaming, and error handling for its specific protocol.
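
As a sketch, the backend port could look roughly like this, assuming a streaming-callback style; the real interface's names and signatures may differ.

```cpp
#include <functional>
#include <string>

// Each adapter (Ollama, llama.cpp server, vLLM, ...) implements this interface
// and keeps its wire protocol (HTTP, gRPC, stdio) private.
class InferenceBackend
{
public:
    virtual ~InferenceBackend() = default;

    // Connection management: process discovery, health checks, reconnects.
    virtual bool connect() = 0;
    virtual void disconnect() = 0;

    // Streamed generation: onToken is called once per token as it arrives
    // over the wire; the full response is returned when the stream ends.
    virtual std::string generate (const std::string& prompt,
                                  std::function<void (const std::string&)> onToken) = 0;

    // Cooperative cancellation of the in-flight request.
    virtual void cancel() = 0;
};
```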

Message Flow Examples

Happy path: submit and complete

```
UI sends:       SubmitRequest { prompt: "Write a melody in C major" }
Worker emits:   TokenReceived { "Here" }
Worker emits:   TokenReceived { " is" }
Worker emits:   TokenReceived { " a" }
Worker emits:   TokenReceived { " melody" }
Worker emits:   InferenceComplete { full_response: "Here is a melody", cancelled: false }
Worker emits:   MidiResultReady { notes: [...] }   (to audio thread)
```

Cancellation mid-stream

```
UI sends:       SubmitRequest { prompt: "Generate a drum pattern" }
Worker emits:   TokenReceived { "Kick" }
Worker emits:   TokenReceived { " on" }
UI sends:       CancelRequest {}
Worker emits:   InferenceComplete { cancelled: true }
```

Backend swap

```
UI sends:       ChangeBackend { name: "ollama", config: { ... } }
Worker emits:   BackendStatusChanged { running: false, name: "llama.cpp" }
Worker emits:   BackendStatusChanged { running: true, name: "ollama" }
```

Error

```
UI sends:       SubmitRequest { prompt: "..." }
Worker emits:   Error { message: "Connection refused: backend not running" }
```

Testing Strategy

| What | How | Threads? |
| --- | --- | --- |
| Message protocol correctness | SyncInferencePort + mock backend | No |
| Command/event sequencing | RecordingPort | No |
| Thread safety of transport | ThreadedInferencePort + TSan | Yes |
| Lock-free audio path | SPSC queue tests | Yes (minimal) |
| Full integration | ThreadedInferencePort + mock backend | Yes |

The majority of tests are deterministic and fast. Only transport-level tests require real threads.
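
For example, a protocol-correctness test built from the `SyncInferencePort` sketch above could look like this (deterministic, no threads, no backend process; the mock handler and assertions are illustrative):

```cpp
#include <cassert>
#include <variant>
#include <vector>

void testSubmitProducesTokensThenCompletion()
{
    SyncInferencePort port ([] (const Command& cmd) -> std::vector<Event>
    {
        // Mock backend: every SubmitRequest yields two tokens and a completion.
        if (std::holds_alternative<SubmitRequest> (cmd))
            return { TokenReceived { "Here" },
                     TokenReceived { " is" },
                     InferenceComplete { "Here is", false } };
        return {};
    });

    port.send (SubmitRequest { "Write a melody in C major", 0.7f, 128 });

    assert (std::holds_alternative<TokenReceived> (*port.poll()));
    assert (std::holds_alternative<TokenReceived> (*port.poll()));
    assert (std::holds_alternative<InferenceComplete> (*port.poll()));
    assert (! port.poll().has_value());
}
```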
