Replace the LLM-per-request flow in /pm with a stateful Python backend
(OpenSquawk-LiveATC-api). The backend owns session state, does regex-first
routing with readback evaluation, and returns the next state + ATC speech.
The frontend keeps its local cursor (communicationsEngine) for TTS and
monitoring UI, but no longer calls /api/llm/decide.
Changes:
app/composables/useRadioBackend.ts (new)
Typed Nuxt composable wrapping the Python REST API:
createSession, transmit, deleteSession, fetchFlows.
Base URL read from NUXT_PUBLIC_RADIO_BACKEND_URL (default 127.0.0.1:8000).
nuxt.config.ts
Expose radioBackendUrl as a public runtime config key so the composable
and communicationsEngine can both reach the Python backend.
shared/utils/communicationsEngine.ts
- fetchRuntimeTree now accepts an optional baseUrl so it fetches from the
Python backend instead of the Nuxt server when a URL is provided.
- renderTpl handles both {var} (old MongoDB schema) and {{var}} (new YAML
schema) — double-brace matched first to avoid partial matches.
- stateSayTpl / stateUtteranceTpl helpers unify say_tpl|say_template and
utterance_tpl|expected_pilot_template across both schema versions.
- auto_transitions from the new YAML schema are included when collecting
eligible transitions in collectAtcStatesUntilPilotTurn.
shared/types/decision.ts
RuntimeDecisionState extended with say_template and expected_pilot_template
fields (new YAML schema field names alongside the existing legacy names).
app/pages/pm.vue
- startMonitoring: loads tree from Python backend, then creates a backend
session (backendSessionId). Cursor synced to session.current_state.
- handlePilotTransmission: calls radioBackend.transmit instead of
/api/llm/decide. Applies auto_advanced_states via moveToSilent, then
the final state. Speaks controller_say_template via TTS.
- Both fetchRuntimeTree calls now pass radioBackendUrl so they hit the
Python backend, not the Nuxt flow-from-MongoDB path.
AGENTS.md (new)
Project guide updated to document the new two-backend architecture,
the Python backend session lifecycle, and the dual template schema.
docs/plans/2026-05-06-pm-python-runtime-contract.md (new)
Implementation plan and API contract written before the work started.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PM Python Runtime Contract and Reimplementation Plan
Date: 2026-05-06
Scope: Rebuild the backend/runtime for the PM radio training endpoint in Python while keeping the current frontend usable.
Audience: Developers who do not know the current repository.
Goal
Rebuild the PM radio training backend as a Python runtime for interactive ATC communication training.
The existing frontend remains in place during the first implementation phase. The Python backend must therefore expose a compatibility API that preserves the current frontend contract. Internally, the new implementation should be cleanly structured around domain concepts, not around the current file layout.
The system trains aviation radio communication. A user speaks or types a pilot transmission. The backend evaluates it against the current scenario state, selects the next allowed state, updates runtime variables and flags, may switch between decision flows, and returns controller speech plus trace/debug information.
Non-Goals
- Do not port the existing TypeScript files line by line.
- Do not let the LLM own the state machine.
- Do not require frontend refactoring for the first backend replacement.
- Do not duplicate phrase normalization, readback checks, or transition logic across individual flows.
Required Compatibility API
The Python backend must initially provide these endpoints, even if internally they map to cleaner services.
GET /api/decision-flows/runtime
Returns all available runtime flows.
Required shape:
{
  "schema_version": "string",
  "main_flow": "string",
  "flows": {
    "flow-slug": {
      "slug": "flow-slug",
      "schema_version": "string",
      "name": "string",
      "description": "string",
      "start_state": "string",
      "end_states": ["string"],
      "variables": {},
      "flags": {},
      "policies": {},
      "hooks": {},
      "roles": ["pilot", "atc", "system"],
      "phases": ["string"],
      "states": {
        "STATE_ID": {
          "role": "pilot",
          "phase": "clearance",
          "name": "string",
          "summary": "string",
          "say_tpl": "string",
          "utterance_tpl": "string",
          "readback_required": ["callsign", "runway"],
          "next": [{ "to": "STATE_ID", "label": "string", "guard": "string" }],
          "ok_next": [{ "to": "STATE_ID" }],
          "bad_next": [{ "to": "STATE_ID" }],
          "timer_next": [{ "to": "STATE_ID", "after_s": 10 }],
          "auto_transitions": [],
          "triggers": [],
          "conditions": [],
          "actions": [],
          "handoff": { "to": "tower", "freq": "118.700" },
          "frequency": "118.700",
          "frequencyName": "Tower"
        }
      },
      "entry_mode": "main"
    }
  }
}
POST /api/llm/decide
Accepts the current frontend-built decision context and returns the next decision.
Input compatibility shape:
{
  "state_id": "CURRENT_STATE",
  "state": {},
  "candidates": [
    { "id": "NEXT_STATE", "flow": "flow-slug", "state": {} }
  ],
  "variables": {},
  "flags": {},
  "pilot_utterance": "Lufthansa 359 ready for taxi",
  "flow_slug": "flow-slug"
}
Output compatibility shape:
{
  "decision": {
    "next_state": "NEXT_STATE",
    "updates": {},
    "flags": {},
    "controller_say_tpl": "Lufthansa 359, taxi to holding point runway 25R via N3 U4",
    "radio_check": false,
    "activate_flow": null,
    "resume_previous": false,
    "off_schema": false
  },
  "trace": {
    "calls": [],
    "fallback": { "used": false },
    "candidateTimeline": { "steps": [] },
    "autoSelection": null
  },
  "active_nodes": [],
  "pilot_intent": "taxi_request"
}
The backend must never return a next_state that was not allowed by the runtime model unless it marks the result as an explicit fallback/error. Invalid LLM output must be rejected and normalized before reaching the frontend.
POST /api/atc/say
Generates speech audio for a text phrase.
Input:
{
  "text": "Lufthansa tree fife niner, contact tower wun wun eight decimal seven",
  "voice": "string"
}
Output:
{
  "audio": "base64-encoded-audio",
  "mimeType": "audio/mpeg"
}
POST /api/atc/ptt
Accepts recorded audio from push-to-talk and returns a transcription. The response may also include a decision result, but the transcription alone is sufficient for compatibility.
Output:
{
  "transcription": "Lufthansa 359 ready for taxi"
}
Supporting Data Endpoints
The frontend may also need:
- GET /api/vatsim/flightplans
- GET /api/vatsim/metar
- GET /api/airports/{icao}/frequencies
These can be implemented as separate provider-backed services. They should not be coupled to the decision engine.
Core Domain Model
Use Pydantic models as the canonical source of truth. Avoid passing untyped dictionaries through the core runtime.
Flow
A DecisionFlow is a named scenario or scenario segment, such as clearance, taxi, tower, departure, approach, radio check, or abnormal event.
Fields:
- slug
- name
- description
- schema_version
- start_state
- end_states
- variables
- flags
- policies
- states
- entry_mode: main, linear, or parallel
State
A DecisionState is one step in a radio interaction.
Roles:
- pilot: system waits for pilot input
- atc: controller speaks
- system: internal transition, action, guard, timer, or flow operation
Important fields:
- id
- role
- phase
- summary
- say_template
- utterance_template
- readback_required
- transitions
- triggers
- conditions
- actions
- handoff
- frequency
Transition
Transitions must use one shared model.
Types:
- next: ordinary route
- ok: correct pilot/readback route
- bad: incorrect or incomplete route
- timer: time-based route
- auto: guard/trigger route
- interrupt: suspend current flow and enter another flow
- return: exit current flow and resume previous flow
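A minimal sketch of one shared transition model, written with stdlib dataclasses so it runs without dependencies (the plan itself calls for Pydantic models). The names `TransitionType`, `Transition`, `target`, and `transitions_from_compat` are assumptions for illustration, as is the idea of flattening the compatibility keys `next`, `ok_next`, `bad_next`, and `timer_next` into a single list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class TransitionType(str, Enum):
    NEXT = "next"
    OK = "ok"
    BAD = "bad"
    TIMER = "timer"
    AUTO = "auto"
    INTERRUPT = "interrupt"
    RETURN = "return"


@dataclass(frozen=True)
class Transition:
    type: TransitionType
    target: str                      # id of the state this transition leads to
    guard: Optional[str] = None      # guard expression, if the flow declares one
    after_s: Optional[float] = None  # delay in seconds, used by timer transitions


# Hypothetical mapping from compatibility keys to the shared transition type.
_COMPAT_KEYS = {
    "next": TransitionType.NEXT,
    "ok_next": TransitionType.OK,
    "bad_next": TransitionType.BAD,
    "timer_next": TransitionType.TIMER,
}


def transitions_from_compat(state: dict[str, Any]) -> list[Transition]:
    """Flatten the per-key transition lists of a compatibility state into one list."""
    result: list[Transition] = []
    for key, kind in _COMPAT_KEYS.items():
        for raw in state.get(key) or []:
            result.append(
                Transition(
                    type=kind,
                    target=raw["to"],
                    guard=raw.get("guard"),
                    after_s=raw.get("after_s"),
                )
            )
    return result
```

With one list per state, candidate building and tracing can iterate over a single collection instead of special-casing each key.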
Runtime Session
A RuntimeSession holds mutable user state:
- session_id
- main_flow
- active_flow
- current_state
- variables
- flags
- flow_stack
- parallel_flows
- message_history
- decision_history
- timers
The first compatibility implementation may still accept stateless frontend context. Internally, the runtime should be designed around sessions so the frontend can later become thinner.
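As an illustration, the session fields above could be held in a structure like the following, with a helper that builds a throwaway session from a stateless compatibility request. This is a sketch using stdlib dataclasses rather than Pydantic; the helper name `session_from_compat`, the element types, and the choice to fall back to `main_flow` when `flow_slug` is missing are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4


@dataclass
class RuntimeSession:
    main_flow: str
    active_flow: str
    current_state: str
    session_id: str = field(default_factory=lambda: uuid4().hex)
    variables: dict[str, Any] = field(default_factory=dict)
    flags: dict[str, Any] = field(default_factory=dict)
    # (flow, state) pairs to resume when the active flow ends
    flow_stack: list[tuple[str, str]] = field(default_factory=list)
    parallel_flows: list[str] = field(default_factory=list)
    message_history: list[dict[str, Any]] = field(default_factory=list)
    decision_history: list[dict[str, Any]] = field(default_factory=list)
    timers: dict[str, dict[str, Any]] = field(default_factory=dict)


def session_from_compat(payload: dict[str, Any], main_flow: str) -> RuntimeSession:
    """Build a short-lived session from a stateless POST /api/llm/decide body."""
    flow = payload.get("flow_slug") or main_flow
    return RuntimeSession(
        main_flow=main_flow,
        active_flow=flow,
        current_state=payload["state_id"],
        # Copy so the runtime never mutates the request body in place
        variables=dict(payload.get("variables") or {}),
        flags=dict(payload.get("flags") or {}),
    )
```

The same engine code can then run against either a persisted session or one rebuilt from frontend context.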
Runtime Architecture
Recommended Python modules:
app/api/
  decision_routes.py
  speech_routes.py
  data_routes.py
app/domain/
  models.py
  session.py
  flow_registry.py
  decision_engine.py
  flow_orchestrator.py
  candidate_builder.py
  guards.py
  readback.py
  templates.py
  radio_normalizer.py
  trace.py
app/services/
  radio_training_service.py
  speech_service.py
  transcription_service.py
  flight_data_service.py
  llm_router.py
app/infrastructure/
  repositories.py
  llm_provider.py
  tts_provider.py
  stt_provider.py
  vatsim_client.py
  airport_data_client.py
API routes should only validate, adapt, call services, and return responses. They should not contain state machine logic.
Decision Algorithm
For each pilot transmission:
- Load or build the current RuntimeSession.
- Resolve current flow and current state.
- Build candidate states from allowed transitions, active parallel flows, and valid interrupt flows.
- Evaluate guards and conditions deterministically.
- Evaluate regex or structured triggers deterministically.
- If the current state requires a readback, run the centralized ReadbackEvaluator.
- If one candidate remains, select it without an LLM call.
- If multiple candidates remain, call the LLM router with only those candidates.
- Validate the LLM response against the allowed candidate set.
- Apply variable and flag updates through a controlled update mechanism.
- Run flow activation, interruption, return, or resume behavior through FlowOrchestrator.
- Advance through ATC and system states until the next pilot state.
- Return the selected decision, controller templates, updated session state, and trace.
The compatibility response may include only the fields the current frontend expects, but the internal service should already compute the richer result.
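The core of these steps (deterministic narrowing first, the router only on ambiguity, validation of whatever the router returns) can be sketched as below. Guards, readbacks, updates, and flow orchestration are left out. `Candidate`, `Decision`, `decide`, and the `pattern` field are hypothetical names, and `llm_router` stands in for whatever router the service injects.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    state_id: str
    pattern: Optional[str] = None  # hypothetical regex trigger; None means "always eligible"


@dataclass
class Decision:
    next_state: Optional[str]      # None signals an explicit fallback
    used_llm: bool = False
    trace: list[str] = field(default_factory=list)


def decide(
    utterance: str,
    candidates: list[Candidate],
    llm_router: Callable[[str, list[str]], str],
) -> Decision:
    """Narrow candidates deterministically; ask the router only on ambiguity."""
    trace: list[str] = []
    remaining = [
        c for c in candidates
        if c.pattern is None or re.search(c.pattern, utterance, re.IGNORECASE)
    ]
    allowed = [c.state_id for c in remaining]
    trace.append(f"candidates after triggers: {allowed}")

    if not remaining:
        trace.append("no candidate matched; fallback")
        return Decision(next_state=None, trace=trace)
    if len(remaining) == 1:
        trace.append("single candidate; no LLM call")
        return Decision(next_state=allowed[0], trace=trace)

    # Ambiguous: the router sees only the allowed ids, and its answer is checked
    choice = llm_router(utterance, allowed)
    if choice not in allowed:
        trace.append(f"router returned {choice!r}, not in allowed set; fallback")
        return Decision(next_state=None, used_llm=True, trace=trace)
    trace.append(f"router selected {choice}")
    return Decision(next_state=choice, used_llm=True, trace=trace)
```

Because the router is passed in as a plain callable, tests can substitute a stub that returns valid or invalid state ids.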
LLM Rules
The LLM is a router, not the source of truth.
Allowed:
- classify pilot intent
- choose among explicit candidate states
- help evaluate ambiguous readbacks
- extract structured values when deterministic parsing is uncertain
Forbidden:
- invent states
- skip guards
- modify variables outside an allowed schema
- generate controller phraseology that conflicts with the selected state
- decide flow activation outside declared flow rules
Every LLM decision must be validated. Invalid output becomes a traceable fallback, not an unchecked runtime decision.
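One way to enforce this rule is to parse the raw router output and turn anything unexpected into an explicit fallback. The sketch below assumes the router returns a JSON object with a `next_state` key; the function name `validate_router_output` and the `reason` key are hypothetical, while the `fallback.used` shape follows the compatibility trace shown earlier.

```python
import json
from typing import Any


def validate_router_output(raw: str, allowed_states: set[str]) -> dict[str, Any]:
    """Return a decision dict; anything invalid becomes an explicit fallback."""

    def fallback(reason: str) -> dict[str, Any]:
        # The reason is kept so the trace can explain why the output was rejected
        return {"next_state": None, "fallback": {"used": True, "reason": reason}}

    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return fallback("router output is not valid JSON")
    if not isinstance(parsed, dict):
        return fallback("router output is not a JSON object")

    next_state = parsed.get("next_state")
    if not isinstance(next_state, str) or next_state not in allowed_states:
        return fallback(f"next_state {next_state!r} is not an allowed candidate")
    return {"next_state": next_state, "fallback": {"used": False}}
```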
Readback and Phrase Normalization
Centralize all aviation phrase handling.
Components:
- TemplateRenderer: fills templates with variables.
- RadioPhraseNormalizer: converts rendered text to speech-friendly aviation phraseology.
- ReadbackEvaluator: checks pilot response against required values.
- CallsignNormalizer: handles airline codes, tail numbers, and spoken variants.
- FrequencyNormalizer: handles 121.800, 121.8, and spoken variants.
- RunwayNormalizer: handles 25R, runway two five right, etc.
- NumberNormalizer: handles ICAO digit pronunciation.
Do not implement these per-flow or per-state.
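To make the idea concrete, here is a deliberately naive sketch of a central readback check that compares written and spoken forms after collapsing both to one canonical string. The function names `compact` and `evaluate_readback` and the substring comparison are assumptions; a real evaluator would need per-item rules (for example, to avoid matching 25R inside a longer digit run).

```python
import re

# Spoken tokens mapped to their written form (ICAO and plain-English variants).
_SPOKEN = {
    "zero": "0", "wun": "1", "one": "1", "too": "2", "two": "2",
    "tree": "3", "three": "3", "fower": "4", "four": "4",
    "fife": "5", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "niner": "9", "nine": "9",
    "left": "L", "right": "R", "center": "C", "centre": "C",
}


def compact(text: str) -> str:
    """Collapse an utterance into one comparable string of digits and letters.

    'runway two five right' and 'Runway 25R' both become 'RUNWAY25R'.
    """
    out = []
    for token in re.findall(r"[a-z]+|\d", text.lower()):
        out.append(_SPOKEN.get(token, token.upper()))
    return "".join(out)


def evaluate_readback(pilot_text: str, required: dict[str, str]) -> dict[str, bool]:
    """Report, per required item, whether its value appears in the pilot's readback."""
    heard = compact(pilot_text)
    return {name: compact(value) in heard for name, value in required.items()}
```

For example, `evaluate_readback("line up runway two five left, Lufthansa 359", {"callsign": "Lufthansa 359", "runway": "25R"})` reports the callsign as read back and the runway as missing.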
Flow Switching
The runtime must support moving between flows.
Flow activation modes:
- main: replace the current main flow.
- linear: enter a flow and return when it ends.
- parallel: run another flow beside the current one.
- interrupt: suspend the active flow and handle a higher-priority flow.
- return: finish current flow and resume the previous stacked flow.
Examples:
- Taxi flow interrupted by radio check.
- Ground flow activates tower handoff.
- Tower flow activates departure flow after takeoff.
- Abnormal event flow temporarily interrupts approach.
All flow switching must go through FlowOrchestrator. Individual states may declare flow operations, but they must not implement them directly.
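A stack-based orchestrator is one plausible way to implement these modes. In the sketch below, `FlowPosition`, `activate`, and `finish_current` are hypothetical names, and clearing the stack when a new main flow starts is an assumption the plan does not state.

```python
from dataclasses import dataclass, field


@dataclass
class FlowPosition:
    flow: str
    state: str


@dataclass
class FlowOrchestrator:
    current: FlowPosition
    stack: list[FlowPosition] = field(default_factory=list)
    parallel: list[FlowPosition] = field(default_factory=list)

    def activate(self, flow: str, start_state: str, mode: str) -> None:
        """Enter another flow according to the declared activation mode."""
        target = FlowPosition(flow, start_state)
        if mode == "main":
            # Assumption: replacing the main flow drops anything suspended under it
            self.stack.clear()
            self.current = target
        elif mode in ("linear", "interrupt"):
            # Suspend the current position so it can be resumed later
            self.stack.append(self.current)
            self.current = target
        elif mode == "parallel":
            self.parallel.append(target)
        else:
            raise ValueError(f"unknown activation mode: {mode}")

    def finish_current(self) -> bool:
        """Resume the most recently suspended flow; return False if none is stacked."""
        if not self.stack:
            return False
        self.current = self.stack.pop()
        return True
```

In the taxi example, a radio check would call `activate("radio-check", ..., "interrupt")` and later `finish_current()` to land back on the suspended taxi state.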
Code Patterns and Principles
Use these patterns:
- Pydantic DTOs at API boundaries.
- Pydantic domain models inside the runtime.
- Repository pattern for persistence.
- Provider interfaces for LLM, TTS, STT, VATSIM, airport data.
- Strategy pattern for trigger and condition evaluators.
- Pure functions for rendering, normalization, parsing, and guard evaluation.
- Trace-first decision design.
- Adapter pattern for current frontend compatibility.
Principles:
- Deterministic logic before LLM logic.
- One canonical model for flows and states.
- One central evaluator for readbacks.
- One central renderer and normalizer for phraseology.
- One orchestrator for flow switching.
- The frontend displays and records; the backend owns scenario truth.
- Every state transition must be explainable in a trace.
Known Risks
Frontend-owned state progression
The current frontend applies decisions locally and advances through ATC/system states itself. This is acceptable for compatibility, but the target architecture should move this responsibility into the backend session runtime.
Risk: frontend and backend can disagree about the current state.
Mitigation: return enough compatibility data now, but design the Python service to produce full runtime results internally.
Duplicate normalization
If transcription, routing, readback checking, and TTS each normalize differently, errors will be hard to debug.
Mitigation: one shared phrase normalization package in the Python runtime.
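As a sketch of what such a shared package might expose, the two functions below cover both directions for frequencies: written to spoken for TTS, and any variant to one canonical form for routing and readback checks. The function names are hypothetical, and the digit spellings follow the examples used in this document.

```python
import re

# ICAO radiotelephony digit words, spelled as in this document's examples.
ICAO_DIGITS = {
    "0": "zero", "1": "wun", "2": "too", "3": "tree", "4": "fower",
    "5": "fife", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}
_WORD_TO_DIGIT = {word: digit for digit, word in ICAO_DIGITS.items()}
# Plain-English variants a transcription service is likely to produce.
_WORD_TO_DIGIT.update(
    {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5", "nine": "9"}
)


def speak_digits(value: str) -> str:
    """Render digits and a decimal point as spoken words; other characters are skipped."""
    words = []
    for ch in value:
        if ch in ICAO_DIGITS:
            words.append(ICAO_DIGITS[ch])
        elif ch == ".":
            words.append("decimal")
    return " ".join(words)


def canonical_frequency(text: str) -> str:
    """Map '121.800', '121.8', and spoken forms to one canonical string."""
    chars = []
    for tok in re.findall(r"[a-z]+|\d|\.", text.lower()):
        if tok in ("decimal", "point", "."):
            chars.append(".")
        elif tok.isdigit():
            chars.append(tok)
        elif tok in _WORD_TO_DIGIT:
            chars.append(_WORD_TO_DIGIT[tok])
    joined = "".join(chars)
    if "." in joined:
        # Drop trailing zeros so 121.800 and 121.8 compare equal
        joined = joined.rstrip("0").rstrip(".")
    return joined
```

If transcription, routing, readback checking, and TTS all import from the same module, a pronunciation fix lands in one place.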
LLM overreach
An LLM can select invalid states or produce plausible but unsafe phraseology.
Mitigation: candidate-constrained routing, response validation, and fallbacks.
Flow collisions
Parallel or interrupted flows may write the same variable or flag.
Mitigation: session-scoped update policy, namespaced flow-local variables where useful, and explicit allowed update schemas.
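A controlled update mechanism could look roughly like this: each flow declares which keys it may write, and anything else is rejected before it touches the session. The names `apply_updates` and `UpdateRejected` and the shape of the `allowed` mapping are assumptions.

```python
from typing import Any


class UpdateRejected(Exception):
    """Raised when a flow tries to write a key it has not declared."""


def apply_updates(
    variables: dict[str, Any],
    updates: dict[str, Any],
    flow: str,
    allowed: dict[str, set[str]],
) -> dict[str, Any]:
    """Return a new variables dict with the flow's declared updates applied."""
    permitted = allowed.get(flow, set())
    rejected = sorted(key for key in updates if key not in permitted)
    if rejected:
        raise UpdateRejected(f"flow {flow!r} may not write: {', '.join(rejected)}")
    # Copy first so a rejected or failed update never leaves partial state behind
    merged = dict(variables)
    merged.update(updates)
    return merged
```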
Infinite auto transitions
System/ATC auto-advance can loop forever.
Mitigation: max-hop limits, visited-state detection, and traceable loop errors.
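Both protections fit in a small helper. The sketch below assumes a callable `next_auto_state` that returns the next automatically reachable state id, or `None` once a pilot state is reached; that callable, the exception name, and the default limit of 20 are all hypothetical.

```python
from typing import Callable, Optional


class AutoAdvanceLoop(Exception):
    """Raised when auto-advance revisits a state or exceeds the hop limit."""


def auto_advance(
    start: str,
    next_auto_state: Callable[[str], Optional[str]],
    max_hops: int = 20,
) -> list[str]:
    """Follow automatic transitions from `start` and return the visited path."""
    path = [start]
    seen = {start}
    current = start
    while True:
        nxt = next_auto_state(current)
        if nxt is None:
            return path
        if nxt in seen:
            raise AutoAdvanceLoop(f"state {nxt!r} revisited; path so far: {path}")
        if len(path) - 1 >= max_hops:
            raise AutoAdvanceLoop(f"more than {max_hops} hops; path so far: {path}")
        path.append(nxt)
        seen.add(nxt)
        current = nxt
```

The path carried in the exception message gives the trace enough detail to show where the loop started.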
Timer duplication
Timer transitions can fire more than once if stored only in frontend state.
Mitigation: backend session timers with ids and consumed status.
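A minimal sketch of that mitigation: each timer carries an id and a consumed flag, and firing marks it consumed so a second poll cannot fire it again. `SessionTimer`, `fire_due_timers`, and the use of a caller-supplied `now` value are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SessionTimer:
    timer_id: str
    target_state: str
    due_at: float          # seconds on whatever clock the caller uses
    consumed: bool = False


def fire_due_timers(timers: dict[str, SessionTimer], now: float) -> list[str]:
    """Mark due timers as consumed and return their target states, earliest first."""
    due = sorted(
        (t for t in timers.values() if not t.consumed and t.due_at <= now),
        key=lambda t: t.due_at,
    )
    for timer in due:
        timer.consumed = True
    return [t.target_state for t in due]
```

Passing `now` in explicitly keeps the function pure enough to test without sleeping.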
Implementation Plan
Phase 1: Contracts and Static Runtime
- Define Pydantic models for flows, states, transitions, sessions, decisions, and traces.
- Implement GET /api/decision-flows/runtime using static fixture data.
- Implement POST /api/llm/decide without LLM, using deterministic candidate selection.
- Return the current frontend-compatible response shape.
- Add unit tests for model validation and simple state transitions.
Phase 2: Deterministic Decision Engine
- Implement CandidateBuilder.
- Implement guards, conditions, triggers, and fallback logic.
- Implement centralized readback evaluation.
- Implement template rendering and radio normalization.
- Add tests for taxi, clearance, tower handoff, bad readback, and radio check.
Phase 3: Flow Orchestration
- Add RuntimeSession.
- Add FlowOrchestrator.
- Support main, linear, parallel, interrupt, and return.
- Add loop protection and flow-stack tests.
- Keep compatibility mode for stateless frontend calls.
Phase 4: LLM Router
- Add provider abstraction for LLM calls.
- Call LLM only when deterministic routing is ambiguous.
- Validate selected state against candidates.
- Store request, response, and fallback reason in trace.
- Add tests with mocked LLM output, including invalid output.
Phase 5: Speech and External Data
- Add TTS provider behind SpeechService.
- Add STT provider behind TranscriptionService.
- Add VATSIM and airport frequency providers behind FlightDataService.
- Keep these services independent from the decision engine.
Phase 6: Persistence and Migration
- Choose persistence for flows and sessions.
- Implement repositories.
- Import or author initial production flows.
- Add versioning for flow schemas.
- Add admin/editor compatibility only if needed.
Later Frontend Refactor: Remove or Change
This section is intentionally explicit. These items are not required for the first Python backend, but should be removed or changed once the backend owns sessions and full progression.
Remove frontend decision ownership
Current behavior to remove later:
- frontend builds the full LLM decision context
- frontend applies next_state
- frontend mutates variables and flags
- frontend advances through ATC/system states until the next pilot turn
- frontend infers active candidates
Target behavior:
- frontend sends session_id, pilot_utterance, audio metadata, and optional UI context
- backend returns updated session state, visible state summary, messages to speak, trace, and available actions
Future endpoint:
POST /api/radio/session/{session_id}/transmissions
Future response:
{
  "session": {
    "id": "string",
    "active_flow": "tower",
    "current_state": "TOWER_LINEUP",
    "variables": {},
    "flags": {}
  },
  "messages": [
    {
      "role": "atc",
      "template": "Lufthansa 359, line up runway 25R",
      "rendered": "Lufthansa 359, line up runway 25R",
      "normalized": "Lufthansa tree fife niner, line up runway too fife right"
    }
  ],
  "trace": {},
  "expected_pilot": []
}
Replace compatibility field names
Fields like say_tpl, utterance_tpl, next_state, and controller_say_tpl exist for compatibility. Later frontend code can move to clearer names:
- say_tpl -> sayTemplate or template
- utterance_tpl -> expectedPilotTemplate
- controller_say_tpl -> controllerMessage.template
- next_state -> transition.targetState
Do not change these during the compatibility phase.
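Until then, the renaming can live in the compatibility adapter so the domain model keeps its own names. The sketch below assumes the internal state uses `say_template` and `utterance_template` (the field names listed under State) and maps them to the compatibility names; the function names are hypothetical.

```python
from typing import Any

# Internal field name -> compatibility field name expected by the current frontend.
_INTERNAL_TO_COMPAT = {
    "say_template": "say_tpl",
    "utterance_template": "utterance_tpl",
}
_COMPAT_TO_INTERNAL = {compat: internal for internal, compat in _INTERNAL_TO_COMPAT.items()}


def to_compat_state(state: dict[str, Any]) -> dict[str, Any]:
    """Rename internal keys to the names the current frontend expects."""
    return {_INTERNAL_TO_COMPAT.get(key, key): value for key, value in state.items()}


def from_compat_state(state: dict[str, Any]) -> dict[str, Any]:
    """Inverse of to_compat_state, for stateless requests coming from the frontend."""
    return {_COMPAT_TO_INTERNAL.get(key, key): value for key, value in state.items()}
```

Keeping both directions next to each other means a later frontend rename only has to change this one mapping.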
Remove frontend ATC speech scheduling assumptions
The frontend currently receives one decision and then schedules speech from locally collected ATC states.
Target behavior:
- backend returns an ordered messages array
- frontend only plays messages in order
- frontend does not need to know how ATC/system auto-advance works
Remove frontend flow-stack logic
Any later UI state for active flows should be display-only. Flow activation, return, interrupt, and parallel execution should be backend session state.
Simplify frontend debug panels
The frontend may still show trace data, but it should not reconstruct trace logic. The backend should return trace steps that are ready for display:
- candidates considered
- candidates eliminated
- guard failures
- readback result
- LLM call, if any
- fallback, if any
Replace stateless decision calls
The current compatibility call sends the whole state context each time. Later, the frontend should call session-based endpoints:
- create session
- get session
- submit transmission
- reset session
- select scenario/flow
This reduces frontend complexity and prevents backend/frontend state drift.
Acceptance Criteria
- Current /pm frontend can run against the Python backend without functional changes.
- A developer can define a new flow without writing routing code.
- Deterministic routes work without LLM calls.
- Ambiguous routes use LLM only within allowed candidates.
- Readback checks are centralized and tested.
- Flow switching is handled by one orchestrator.
- Every decision returns a useful trace.
- Later frontend refactor work is isolated to removing the compatibility adapter and replacing frontend-owned runtime behavior with session API calls.