mirror of https://github.com/OpenSquawk/OpenSquawk synced 2026-05-13 01:46:08 +08:00


leubeem 9464d37293 Wire /pm to Python backend for stateful ATC training sessions

Replace the LLM-per-request flow in /pm with a stateful Python backend
(OpenSquawk-LiveATC-api). The backend owns session state, does regex-first
routing with readback evaluation, and returns the next state + ATC speech.
The frontend keeps its local cursor (communicationsEngine) for TTS and
monitoring UI, but no longer calls /api/llm/decide.

Changes:

app/composables/useRadioBackend.ts (new)
  Typed Nuxt composable wrapping the Python REST API:
  createSession, transmit, deleteSession, fetchFlows.
  Base URL read from NUXT_PUBLIC_RADIO_BACKEND_URL (default 127.0.0.1:8000).

nuxt.config.ts
  Expose radioBackendUrl as a public runtime config key so the composable
  and communicationsEngine can both reach the Python backend.

shared/utils/communicationsEngine.ts
  - fetchRuntimeTree now accepts an optional baseUrl so it fetches from the
    Python backend instead of the Nuxt server when a URL is provided.
  - renderTpl handles both {var} (old MongoDB schema) and {{var}} (new YAML
    schema) — double-brace matched first to avoid partial matches.
  - stateSayTpl / stateUtteranceTpl helpers unify say_tpl|say_template and
    utterance_tpl|expected_pilot_template across both schema versions.
  - auto_transitions from the new YAML schema are included when collecting
    eligible transitions in collectAtcStatesUntilPilotTurn.

shared/types/decision.ts
  RuntimeDecisionState extended with say_template and expected_pilot_template
  fields (new YAML schema field names alongside the existing legacy names).

app/pages/pm.vue
  - startMonitoring: loads tree from Python backend, then creates a backend
    session (backendSessionId). Cursor synced to session.current_state.
  - handlePilotTransmission: calls radioBackend.transmit instead of
    /api/llm/decide. Applies auto_advanced_states via moveToSilent, then
    the final state. Speaks controller_say_template via TTS.
  - Both fetchRuntimeTree calls now pass radioBackendUrl so they hit the
    Python backend, not the Nuxt flow-from-MongoDB path.

AGENTS.md (new)
  Project guide updated to document the new two-backend architecture,
  the Python backend session lifecycle, and the dual template schema.

docs/plans/2026-05-06-pm-python-runtime-contract.md (new)
  Implementation plan and API contract written before the work started.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-09 17:49:28 +02:00

17 KiB


PM Python Runtime Contract and Reimplementation Plan

Date: 2026-05-06
Scope: Rebuild the backend/runtime for the PM radio training endpoint in Python while keeping the current frontend usable.
Audience: Developers who do not know the current repository.

Goal

Rebuild the PM radio training backend as a Python runtime for interactive ATC communication training.

The existing frontend remains in place during the first implementation phase. The Python backend must therefore expose a compatibility API that preserves the current frontend contract. Internally, the new implementation should be cleanly structured around domain concepts, not around the current file layout.

The system trains aviation radio communication. A user speaks or types a pilot transmission. The backend evaluates it against the current scenario state, selects the next allowed state, updates runtime variables and flags, may switch between decision flows, and returns controller speech plus trace/debug information.

Non-Goals

Do not port the existing TypeScript files line by line.
Do not let the LLM own the state machine.
Do not require frontend refactoring for the first backend replacement.
Do not duplicate phrase normalization, readback checks, or transition logic across individual flows.

Required Compatibility API

The Python backend must initially provide these endpoints, even if internally they map to cleaner services.

`GET /api/decision-flows/runtime`

Returns all available runtime flows.

Required shape:

{
  "schema_version": "string",
  "main_flow": "string",
  "flows": {
    "flow-slug": {
      "slug": "flow-slug",
      "schema_version": "string",
      "name": "string",
      "description": "string",
      "start_state": "string",
      "end_states": ["string"],
      "variables": {},
      "flags": {},
      "policies": {},
      "hooks": {},
      "roles": ["pilot", "atc", "system"],
      "phases": ["string"],
      "states": {
        "STATE_ID": {
          "role": "pilot",
          "phase": "clearance",
          "name": "string",
          "summary": "string",
          "say_tpl": "string",
          "utterance_tpl": "string",
          "readback_required": ["callsign", "runway"],
          "next": [{ "to": "STATE_ID", "label": "string", "guard": "string" }],
          "ok_next": [{ "to": "STATE_ID" }],
          "bad_next": [{ "to": "STATE_ID" }],
          "timer_next": [{ "to": "STATE_ID", "after_s": 10 }],
          "auto_transitions": [],
          "triggers": [],
          "conditions": [],
          "actions": [],
          "handoff": { "to": "tower", "freq": "118.700" },
          "frequency": "118.700",
          "frequencyName": "Tower"
        }
      },
      "entry_mode": "main"
    }
  }
}
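States reference each other only by id, so a loaded flow can point at a state that does not exist. Below is a minimal integrity check over the shape above. The function name `dangling_targets` is hypothetical. It assumes `auto_transitions` entries also carry a `"to"` key (the example shows them empty) and that targets live in the same flow. If the real schema allows cross-flow targets, those would need a registry-wide check instead.

```python
from typing import Any

# Transition lists in the runtime shape that carry a "to" target.
# Assumption: auto_transitions entries use "to" like the other lists.
TRANSITION_KEYS = ("next", "ok_next", "bad_next", "timer_next", "auto_transitions")


def dangling_targets(flow: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target is not a state of this flow."""
    states: dict[str, Any] = flow.get("states", {})
    problems: list[tuple[str, str]] = []
    # The declared start and end states must exist as well.
    start = flow.get("start_state")
    if start is not None and start not in states:
        problems.append(("<start_state>", start))
    for end in flow.get("end_states", []):
        if end not in states:
            problems.append(("<end_state>", end))
    for state_id, state in states.items():
        for key in TRANSITION_KEYS:
            for transition in state.get(key) or []:
                target = transition.get("to")
                if target is not None and target not in states:
                    problems.append((state_id, target))
    return problems
```

Running this once when flows are loaded lets the registry reject a broken flow before any session reaches it.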

`POST /api/llm/decide`

Accepts the current frontend-built decision context and returns the next decision.

Input compatibility shape:

{
  "state_id": "CURRENT_STATE",
  "state": {},
  "candidates": [
    { "id": "NEXT_STATE", "flow": "flow-slug", "state": {} }
  ],
  "variables": {},
  "flags": {},
  "pilot_utterance": "Lufthansa 359 ready for taxi",
  "flow_slug": "flow-slug"
}

Output compatibility shape:

{
  "decision": {
    "next_state": "NEXT_STATE",
    "updates": {},
    "flags": {},
    "controller_say_tpl": "Lufthansa 359, taxi to holding point runway 25R via N3 U4",
    "radio_check": false,
    "activate_flow": null,
    "resume_previous": false,
    "off_schema": false
  },
  "trace": {
    "calls": [],
    "fallback": { "used": false },
    "candidateTimeline": { "steps": [] },
    "autoSelection": null
  },
  "active_nodes": [],
  "pilot_intent": "taxi_request"
}

The backend must never return a next_state that the runtime model does not allow, unless it marks the result as an explicit fallback or error. Invalid LLM output must be rejected and replaced with a normalized fallback result before it reaches the frontend.
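One way to enforce this rule is to check the proposed `next_state` against the candidate ids the backend built itself, and to turn anything else into an explicit fallback. This is a minimal sketch. The function name is hypothetical, and so is the policy of keeping the session in its current state on invalid output. The fallback dict mirrors the `trace.fallback` field of the output shape above, with an added `reason` key that is an assumption.

```python
from typing import Any, Optional


def constrain_next_state(
    proposed: Optional[str],
    current_state: str,
    candidate_ids: list[str],
) -> tuple[str, dict[str, Any]]:
    """Return (next_state, fallback_trace) with next_state restricted to the candidates.

    Hypothetical policy: a missing or unknown proposal keeps the session in its
    current state and is reported as a fallback instead of being passed on.
    """
    if proposed in candidate_ids:
        return proposed, {"used": False}
    reason = "missing next_state" if proposed is None else f"'{proposed}' is not a candidate"
    return current_state, {"used": True, "reason": reason}
```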

`POST /api/atc/say`

Generates speech audio for a text phrase.

Input:

{
  "text": "Lufthansa tree fife niner, contact tower wun wun eight decimal seven",
  "voice": "string"
}

Output:

{
  "audio": "base64-encoded-audio",
  "mimeType": "audio/mpeg"
}

`POST /api/atc/ptt`

Accepts recorded audio from push-to-talk and returns a transcription. It may optionally include a decision result, but transcription alone is sufficient for compatibility.

Output:

{
  "transcription": "Lufthansa 359 ready for taxi"
}

Supporting Data Endpoints

The frontend may also need:

GET /api/vatsim/flightplans
GET /api/vatsim/metar
GET /api/airports/{icao}/frequencies

These can be implemented as separate provider-backed services. They should not be coupled to the decision engine.

Core Domain Model

Use Pydantic models as the canonical source of truth. Avoid passing untyped dictionaries through the core runtime.

Flow

A DecisionFlow is a named scenario or scenario segment, such as clearance, taxi, tower, departure, approach, radio check, or abnormal event.

Fields:

slug
name
description
schema_version
start_state
end_states
variables
flags
policies
states
entry_mode: main, linear, or parallel

State

A DecisionState is one step in a radio interaction.

Roles:

pilot: system waits for pilot input
atc: controller speaks
system: internal transition, action, guard, timer, or flow operation

Important fields:

id
role
phase
summary
say_template
utterance_template
readback_required
transitions
triggers
conditions
actions
handoff
frequency

Transition

Transitions must use one shared model.

Types:

next: ordinary route
ok: correct pilot/readback route
bad: incorrect or incomplete route
timer: time-based route
auto: guard/trigger route
interrupt: suspend current flow and enter another flow
return: exit current flow and resume previous flow
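The compatibility schema spreads routes over `next`, `ok_next`, `bad_next`, `timer_next`, and `auto_transitions`. A loader could fold all of them into one model tagged with the types above. This sketch uses stdlib `dataclasses` and `enum` as a stand-in for the recommended Pydantic models, so it runs without dependencies. The field names `label`, `guard`, and `after_s` come from the runtime JSON shape. `interrupt` and `return` have no compatibility key there, so they are only declared.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class TransitionType(str, Enum):
    NEXT = "next"
    OK = "ok"
    BAD = "bad"
    TIMER = "timer"
    AUTO = "auto"
    INTERRUPT = "interrupt"
    RETURN = "return"


# Compatibility keys from GET /api/decision-flows/runtime mapped to the shared type.
COMPAT_KEYS = {
    "next": TransitionType.NEXT,
    "ok_next": TransitionType.OK,
    "bad_next": TransitionType.BAD,
    "timer_next": TransitionType.TIMER,
    "auto_transitions": TransitionType.AUTO,
}


@dataclass(frozen=True)
class Transition:
    kind: TransitionType
    to: str
    label: Optional[str] = None
    guard: Optional[str] = None
    after_s: Optional[float] = None


def transitions_from_compat(state: dict[str, Any]) -> list[Transition]:
    """Collect every route of one compatibility state into the shared model."""
    result: list[Transition] = []
    for key, kind in COMPAT_KEYS.items():
        for raw in state.get(key) or []:
            result.append(
                Transition(
                    kind=kind,
                    to=raw["to"],
                    label=raw.get("label"),
                    guard=raw.get("guard"),
                    after_s=raw.get("after_s"),
                )
            )
    return result
```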

Runtime Session

A RuntimeSession holds mutable user state:

session_id
main_flow
active_flow
current_state
variables
flags
flow_stack
parallel_flows
message_history
decision_history
timers

The first compatibility implementation may still accept stateless frontend context. Internally, the runtime should be designed around sessions so the frontend can later become thinner.
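The two paragraphs above can coexist: the runtime always works on a session, and the stateless compatibility call simply builds a throwaway one from the `/api/llm/decide` input. Below is a minimal sketch using stdlib dataclasses in place of Pydantic. The constructor name is hypothetical. The compatibility input carries no main flow, so the sketch assumes `flow_slug` unless one is passed in.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RuntimeSession:
    main_flow: str
    active_flow: str
    current_state: str
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    variables: dict[str, Any] = field(default_factory=dict)
    flags: dict[str, Any] = field(default_factory=dict)
    flow_stack: list[str] = field(default_factory=list)
    parallel_flows: list[str] = field(default_factory=list)
    message_history: list[dict[str, Any]] = field(default_factory=list)
    decision_history: list[dict[str, Any]] = field(default_factory=list)
    timers: list[dict[str, Any]] = field(default_factory=list)

    @classmethod
    def from_stateless_context(
        cls, ctx: dict[str, Any], main_flow: Optional[str] = None
    ) -> "RuntimeSession":
        """Build a temporary session from a stateless /api/llm/decide request body."""
        flow = ctx["flow_slug"]
        return cls(
            main_flow=main_flow or flow,  # assumption: no main flow in the compat input
            active_flow=flow,
            current_state=ctx["state_id"],
            # Copy so the runtime never mutates the request payload in place.
            variables=dict(ctx.get("variables", {})),
            flags=dict(ctx.get("flags", {})),
        )
```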

Runtime Architecture

Recommended Python modules:

app/api/
  decision_routes.py
  speech_routes.py
  data_routes.py

app/domain/
  models.py
  session.py
  flow_registry.py
  decision_engine.py
  flow_orchestrator.py
  candidate_builder.py
  guards.py
  readback.py
  templates.py
  radio_normalizer.py
  trace.py

app/services/
  radio_training_service.py
  speech_service.py
  transcription_service.py
  flight_data_service.py
  llm_router.py

app/infrastructure/
  repositories.py
  llm_provider.py
  tts_provider.py
  stt_provider.py
  vatsim_client.py
  airport_data_client.py

API routes should only validate, adapt, call services, and return responses. They should not contain state machine logic.

Decision Algorithm

For each pilot transmission:

Load or build the current RuntimeSession.
Resolve current flow and current state.
Build candidate states from allowed transitions, active parallel flows, and valid interrupt flows.
Evaluate guards and conditions deterministically.
Evaluate regex or structured triggers deterministically.
If the current state requires a readback, run the centralized ReadbackEvaluator.
If one candidate remains, select it without an LLM call.
If multiple candidates remain, call the LLM router with only those candidates.
Validate the LLM response against the allowed candidate set.
Apply variable and flag updates through a controlled update mechanism.
Run flow activation, interruption, return, or resume behavior through FlowOrchestrator.
Advance through ATC and system states until the next pilot state.
Return the selected decision, controller templates, updated session state, and trace.
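The candidate filtering and routing steps above (guards, triggers, single-candidate shortcut, LLM call, validation) can be compressed into one function. This is a sketch under several assumptions. Guards are modeled as Python callables and triggers as regex strings, although the runtime JSON stores `guard` as a string. The router signature is hypothetical. Readback evaluation, updates, and flow orchestration are left out.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Candidate:
    id: str
    guard: Optional[Callable[[dict[str, Any]], bool]] = None  # hypothetical compiled guard
    trigger: Optional[str] = None  # hypothetical regex matched against the utterance


# Hypothetical router: receives the utterance and the remaining ids, returns one id.
Router = Callable[[str, list[str]], Optional[str]]


def select_candidate(
    utterance: str, candidates: list[Candidate], context: dict[str, Any], router: Router
) -> tuple[Optional[str], dict[str, Any]]:
    trace: dict[str, Any] = {"eliminated": [], "llm_called": False}
    remaining: list[str] = []
    for c in candidates:
        # Deterministic checks first: guards, then triggers.
        if c.guard is not None and not c.guard(context):
            trace["eliminated"].append((c.id, "guard"))
        elif c.trigger is not None and not re.search(c.trigger, utterance, re.IGNORECASE):
            trace["eliminated"].append((c.id, "trigger"))
        else:
            remaining.append(c.id)
    if len(remaining) == 1:
        return remaining[0], trace  # exactly one left: no LLM call
    if not remaining:
        trace["fallback"] = "no candidate left"
        return None, trace
    trace["llm_called"] = True
    choice = router(utterance, remaining)  # the LLM only sees the remaining candidates
    if choice not in remaining:
        trace["fallback"] = f"router returned {choice!r}"
        return None, trace
    return choice, trace
```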

The compatibility response may contain only the fields the current frontend expects, but the internal service should already compute the richer result.

LLM Rules

The LLM is a router, not the source of truth.

Allowed:

classify pilot intent
choose among explicit candidate states
help evaluate ambiguous readbacks
extract structured values when deterministic parsing is uncertain

Forbidden:

invent states
skip guards
modify variables outside an allowed schema
generate controller phraseology that conflicts with the selected state
decide flow activation outside declared flow rules

Every LLM decision must be validated. Invalid output becomes a traceable fallback, not an unchecked runtime decision.
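The rule against modifying variables outside an allowed schema could be enforced by splitting every LLM-proposed update into accepted values and traceable rejections. This minimal sketch assumes a simple schema form: a map from writable variable name to expected Python type. The real update policy may be richer. `validate_updates` is a hypothetical name.

```python
from typing import Any


def validate_updates(
    proposed: dict[str, Any], allowed: dict[str, type]
) -> tuple[dict[str, Any], list[str]]:
    """Split LLM-proposed variable updates into accepted ones and rejection reasons."""
    accepted: dict[str, Any] = {}
    rejected: list[str] = []
    for name, value in proposed.items():
        expected = allowed.get(name)
        if expected is None:
            rejected.append(f"{name}: not writable")  # outside the allowed schema
        elif not isinstance(value, expected):
            rejected.append(f"{name}: expected {expected.__name__}")
        else:
            accepted[name] = value
    return accepted, rejected
```

The rejection list is meant to go into the trace, so a dropped update is visible instead of silently lost.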

Readback and Phrase Normalization

Centralize all aviation phrase handling.

Components:

TemplateRenderer: fills templates with variables.
RadioPhraseNormalizer: converts rendered text to speech-friendly aviation phraseology.
ReadbackEvaluator: checks pilot response against required values.
CallsignNormalizer: handles airline codes, tail numbers, and spoken variants.
FrequencyNormalizer: handles 121.800, 121.8, and spoken variants.
RunwayNormalizer: handles 25R, runway two five right, etc.
NumberNormalizer: handles ICAO digit pronunciation.

Do not implement these per-flow or per-state.
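The following is a minimal sketch of a shared phrase package covering a subset of these components: the TemplateRenderer, the NumberNormalizer, a numbers-only RadioPhraseNormalizer, the FrequencyNormalizer, and a simplified ReadbackEvaluator. Callsign and runway normalization are not covered. The `{{var}}`/`{var}` handling mirrors what the commit message above describes for `renderTpl`. The digit spellings `wun`, `too`, `tree`, `fife`, `niner`, and `eight` follow this document's examples, while the others are assumptions. All function names are hypothetical.

```python
import re
from typing import Optional

# Spoken digits; "zero", "fower", "six", "seven" are assumed spellings.
DIGIT_WORDS = {
    "0": "zero", "1": "wun", "2": "too", "3": "tree", "4": "fower",
    "5": "fife", "6": "six", "7": "seven", "8": "eight", "9": "niner",
}
# Reverse lookup, also accepting plain English digit words from transcriptions.
WORD_DIGITS = {word: digit for digit, word in DIGIT_WORDS.items()}
WORD_DIGITS.update({"one": "1", "two": "2", "three": "3", "four": "4",
                    "five": "5", "ait": "8", "nine": "9"})
TEMPLATE_RE = re.compile(r"\{\{\s*(\w+)\s*\}\}|\{(\w+)\}")  # {{var}} tried before {var}
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")  # skips tokens glued to letters, e.g. 25R
FREQ_RE = re.compile(r"\b(1[1-3]\d)\.(\d{1,3})\b")  # rough VHF airband shape


def render_template(template: str, variables: dict[str, object]) -> str:
    """TemplateRenderer: fill {{var}} and {var}; unknown names stay untouched."""
    def fill(m: re.Match) -> str:
        name = m.group(1) or m.group(2)
        return str(variables[name]) if name in variables else m.group(0)
    return TEMPLATE_RE.sub(fill, template)


def speak_digits(token: str) -> str:
    """NumberNormalizer: '359' -> 'tree fife niner', '.' -> 'decimal'."""
    return " ".join("decimal" if ch == "." else DIGIT_WORDS[ch] for ch in token)


def to_speech(text: str) -> str:
    """RadioPhraseNormalizer (numbers only): spell out standalone numbers for TTS."""
    return NUMBER_RE.sub(lambda m: speak_digits(m.group(0)), text)


def normalize_spoken(text: str) -> str:
    """Lowercase and collapse spoken digits: 'wun wun eight decimal seven' -> '118.7'."""
    parts: list[str] = []
    for token in re.findall(r"[a-z0-9.]+|,", text.lower()):
        if token in WORD_DIGITS:
            parts.append(WORD_DIGITS[token])
        elif token in ("decimal", "point"):
            parts.append(".")
        else:
            # Commas and ordinary words separate digit groups.
            parts.append(" " if token == "," else f" {token} ")
    return " ".join("".join(parts).split())


def canonical_frequency(text: str) -> Optional[str]:
    """FrequencyNormalizer: first frequency in written or spoken form as 'NNN.NNN'."""
    m = FREQ_RE.search(normalize_spoken(text))
    return f"{m.group(1)}.{m.group(2).ljust(3, '0')}" if m else None


def readback_missing(utterance: str, required: dict[str, str]) -> list[str]:
    """Simplified ReadbackEvaluator: names of required values not read back."""
    spoken = normalize_spoken(utterance)
    missing: list[str] = []
    for name, value in required.items():
        if name == "frequency":
            expected = canonical_frequency(value)
            ok = expected is not None and canonical_frequency(utterance) == expected
        else:
            # Whole-token match so "Lufthansa 35" does not satisfy "Lufthansa 359".
            pattern = rf"(?<!\S){re.escape(normalize_spoken(value))}(?!\S)"
            ok = re.search(pattern, spoken) is not None
        if not ok:
            missing.append(name)
    return missing
```

Transcription, readback checking, and TTS all call the same functions, so `121.8`, `121.800`, and `wun too wun decimal eight` end up in one canonical form.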

Flow Switching

The runtime must support moving between flows.

Flow activation modes:

main: replace the current main flow.
linear: enter a flow and return when it ends.
parallel: run another flow beside the current one.
interrupt: suspend the active flow and handle a higher-priority flow.
return: finish current flow and resume the previous stacked flow.

Examples:

Taxi flow interrupted by radio check.
Ground flow activates tower handoff.
Tower flow activates departure flow after takeoff.
Abnormal event flow temporarily interrupts approach.

All flow switching must go through FlowOrchestrator. Individual states may declare flow operations, but they must not implement them directly.
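Here is a minimal FlowOrchestrator sketch showing how the five modes could map onto a stack of suspended flows plus a set of parallel ones. Several choices are assumptions, not part of the plan. `main` clears the stack. `linear` and `interrupt` both push the current flow, and the priority difference between them is not modeled. `return` is exposed as `finish()`. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowFrame:
    flow: str
    state: str  # state to resume when this frame becomes active again


@dataclass
class FlowOrchestrator:
    main_flow: str
    active: FlowFrame
    stack: list[FlowFrame] = field(default_factory=list)
    parallel: dict[str, str] = field(default_factory=dict)  # flow slug -> current state

    def activate(self, mode: str, flow: str, start_state: str) -> None:
        if mode == "main":
            # Replace the main flow; suspended flows belonged to the old scenario.
            self.main_flow = flow
            self.stack.clear()
            self.active = FlowFrame(flow, start_state)
        elif mode in ("linear", "interrupt"):
            # Suspend the current flow; finish() resumes it later.
            self.stack.append(self.active)
            self.active = FlowFrame(flow, start_state)
        elif mode == "parallel":
            self.parallel[flow] = start_state
        else:
            raise ValueError(f"unknown activation mode: {mode}")

    def finish(self) -> Optional[FlowFrame]:
        """'return': end the active flow and resume the previous stacked flow, if any."""
        if not self.stack:
            return None
        self.active = self.stack.pop()
        return self.active
```

For the "taxi flow interrupted by radio check" example, the taxi frame is pushed with its current state and popped back unchanged once the radio-check flow ends.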

Code Patterns and Principles

Use these patterns:

Pydantic DTOs at API boundaries.
Pydantic domain models inside the runtime.
Repository pattern for persistence.
Provider interfaces for LLM, TTS, STT, VATSIM, airport data.
Strategy pattern for trigger and condition evaluators.
Pure functions for rendering, normalization, parsing, and guard evaluation.
Trace-first decision design.
Adapter pattern for current frontend compatibility.

Principles:

Deterministic logic before LLM logic.
One canonical model for flows and states.
One central evaluator for readbacks.
One central renderer and normalizer for phraseology.
One orchestrator for flow switching.
The frontend displays and records; the backend owns scenario truth.
Every state transition must be explainable in a trace.

Known Risks

Frontend-owned state progression

The current frontend applies decisions locally and advances through ATC/system states itself. This is acceptable for compatibility, but the target architecture should move this responsibility into the backend session runtime.

Risk: frontend and backend can disagree about the current state.

Mitigation: return enough compatibility data now, but design the Python service to produce full runtime results internally.

Duplicate normalization

If transcription, routing, readback checking, and TTS each normalize differently, errors will be hard to debug.

Mitigation: one shared phrase normalization package in the Python runtime.

LLM overreach

An LLM can select invalid states or produce plausible but unsafe phraseology.

Mitigation: candidate-constrained routing, response validation, and fallbacks.

Flow collisions

Parallel or interrupted flows may write the same variable or flag.

Mitigation: session-scoped update policy, namespaced flow-local variables where useful, and explicit allowed update schemas.

Infinite auto transitions

System/ATC auto-advance can loop forever.

Mitigation: max-hop limits, visited-state detection, and traceable loop errors.
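The mitigation above combines a max-hop limit, visited-state detection, and a traceable loop error. A minimal sketch follows. It assumes the next automatic state depends only on the current state during one advance, so revisiting a state means the chain would never reach a pilot turn. The callback signatures and names are hypothetical.

```python
from typing import Callable, Optional


class AutoTransitionLoop(Exception):
    """Traceable loop error carrying the path that was followed."""

    def __init__(self, path: list[str], reason: str) -> None:
        super().__init__(f"{reason}: {' -> '.join(path)}")
        self.path = path
        self.reason = reason


def advance_until_pilot(
    start: str,
    role_of: Callable[[str], str],
    next_auto: Callable[[str], Optional[str]],
    max_hops: int = 20,
) -> list[str]:
    """Follow ATC/system states from `start` until a pilot state; return the path."""
    path = [start]
    current = start
    while role_of(current) != "pilot":
        if len(path) - 1 >= max_hops:
            raise AutoTransitionLoop(path, "max hops exceeded")
        nxt = next_auto(current)
        if nxt is None:
            break  # dead end; the caller decides how to report it
        if nxt in path:
            raise AutoTransitionLoop(path + [nxt], "state revisited")
        path.append(nxt)
        current = nxt
    return path
```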

Timer duplication

Timer transitions can fire more than once if stored only in frontend state.

Mitigation: backend session timers with ids and consumed status.
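With ids and a consumed flag stored in the backend session, each timer can fire at most once, however often the session is polled. This is a minimal sketch. Marking a timer consumed once the session has left its owning state is an assumed policy. Times are plain seconds, for example from `time.monotonic()`, and the names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionTimer:
    state_id: str  # state that declared the timer_next route
    to: str        # target state when the timer fires
    due_at: float  # absolute time in seconds
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    consumed: bool = False


def fire_due_timers(timers: list[SessionTimer], now: float, current_state: str) -> list[SessionTimer]:
    """Return the timers that fire now; each timer fires at most once."""
    fired: list[SessionTimer] = []
    for t in timers:
        if t.consumed:
            continue
        if t.state_id != current_state:
            t.consumed = True  # assumed policy: leaving the state cancels its timer
            continue
        if now >= t.due_at:
            t.consumed = True
            fired.append(t)
    return fired
```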

Implementation Plan

Phase 1: Contracts and Static Runtime

Define Pydantic models for flows, states, transitions, sessions, decisions, and traces.
Implement GET /api/decision-flows/runtime using static fixture data.
Implement POST /api/llm/decide without LLM, using deterministic candidate selection.
Return the current frontend-compatible response shape.
Add unit tests for model validation and simple state transitions.

Phase 2: Deterministic Decision Engine

Implement CandidateBuilder.
Implement guards, conditions, triggers, and fallback logic.
Implement centralized readback evaluation.
Implement template rendering and radio normalization.
Add tests for taxi, clearance, tower handoff, bad readback, and radio check.

Phase 3: Flow Orchestration

Add RuntimeSession.
Add FlowOrchestrator.
Support main, linear, parallel, interrupt, and return.
Add loop protection and flow-stack tests.
Keep compatibility mode for stateless frontend calls.

Phase 4: LLM Router

Add provider abstraction for LLM calls.
Call LLM only when deterministic routing is ambiguous.
Validate selected state against candidates.
Store request, response, and fallback reason in trace.
Add tests with mocked LLM output, including invalid output.

Phase 5: Speech and External Data

Add TTS provider behind SpeechService.
Add STT provider behind TranscriptionService.
Add VATSIM and airport frequency providers behind FlightDataService.
Keep these services independent from the decision engine.

Phase 6: Persistence and Migration

Choose persistence for flows and sessions.
Implement repositories.
Import or author initial production flows.
Add versioning for flow schemas.
Add admin/editor compatibility only if needed.

Later Frontend Refactor: Remove or Change

This section is intentionally explicit. These items are not required for the first Python backend, but should be removed or changed once the backend owns sessions and full progression.

Remove frontend decision ownership

Current behavior to remove later:

frontend builds the full LLM decision context
frontend applies next_state
frontend mutates variables and flags
frontend advances through ATC/system states until the next pilot turn
frontend infers active candidates

Target behavior:

frontend sends session_id, pilot_utterance, audio metadata, and optional UI context
backend returns updated session state, visible state summary, messages to speak, trace, and available actions

Future endpoint:

POST /api/radio/session/{session_id}/transmissions

Future response:

{
  "session": {
    "id": "string",
    "active_flow": "tower",
    "current_state": "TOWER_LINEUP",
    "variables": {},
    "flags": {}
  },
  "messages": [
    {
      "role": "atc",
      "template": "Lufthansa 359, line up runway 25R",
      "rendered": "Lufthansa 359, line up runway 25R",
      "normalized": "Lufthansa tree fife niner, line up runway too fife right"
    }
  ],
  "trace": {},
  "expected_pilot": []
}

Replace compatibility field names

Fields like say_tpl, utterance_tpl, next_state, and controller_say_tpl exist for compatibility. Later frontend code can move to clearer names:

say_tpl -> sayTemplate or template
utterance_tpl -> expectedPilotTemplate
controller_say_tpl -> controllerMessage.template
next_state -> transition.targetState

Do not change these during the compatibility phase.
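Once the compatibility phase ends, the renames above could live in one adapter function instead of being spread through the code. This is a minimal sketch for that later phase. It picks `sayTemplate` from the two options listed for `say_tpl`, treats dotted targets as nested objects, and uses a hypothetical function name.

```python
from typing import Any

# Compatibility name -> later name; dots create nested objects.
RENAMES = {
    "say_tpl": "sayTemplate",
    "utterance_tpl": "expectedPilotTemplate",
    "controller_say_tpl": "controllerMessage.template",
    "next_state": "transition.targetState",
}


def to_new_names(payload: dict[str, Any]) -> dict[str, Any]:
    """Rename known compatibility keys; unknown keys pass through unchanged."""
    out: dict[str, Any] = {}
    for key, value in payload.items():
        *parents, leaf = RENAMES.get(key, key).split(".")
        target = out
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return out
```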

Remove frontend ATC speech scheduling assumptions

The frontend currently receives one decision and then schedules speech from locally collected ATC states.

Target behavior:

backend returns an ordered messages array
frontend only plays messages in order
frontend does not need to know how ATC/system auto-advance works

Remove frontend flow-stack logic

Any later UI state for active flows should be display-only. Flow activation, return, interrupt, and parallel execution should be backend session state.

Simplify frontend debug panels

The frontend may still show trace data, but it should not reconstruct trace logic. The backend should return trace steps that are ready for display:

candidates considered
candidates eliminated
guard failures
readback result
LLM call, if any
fallback, if any

Replace stateless decision calls

The current compatibility call sends the whole state context each time. Later, the frontend should call session-based endpoints:

create session
get session
submit transmission
reset session
select scenario/flow

This reduces frontend complexity and prevents backend/frontend state drift.

Acceptance Criteria

Current /pm frontend can run against the Python backend without functional changes.
A developer can define a new flow without writing routing code.
Deterministic routes work without LLM calls.
Ambiguous routes use LLM only within allowed candidates.
Readback checks are centralized and tested.
Flow switching is handled by one orchestrator.
Every decision returns a useful trace.
Later frontend refactor work is isolated to removing the compatibility adapter and replacing frontend-owned runtime behavior with session API calls.
