Date: 2026-03-15 (updated 2026-05-23)
Author: Ravi Natarajan
Motivation
Modern AI systems rely on multiple models with different strengths, costs, latency profiles, and governance constraints. A single model is rarely the right choice for every task. Hardcoding model choices inside every agent creates tight coupling, drives up cost, and makes compliance impossible to enforce consistently.
K9-AIF introduces the K9ModelRouter — a weighted-scoring model router built into the framework’s inference layer. Every agent call goes through it. Model selection is centralized, governed, and fully observable.
What the K9ModelRouter Does
The K9ModelRouter selects the most appropriate model for each InferenceRequest by scoring all catalog candidates and choosing the highest scorer. It is not a round-robin or a hardcoded priority list — it is a signal-driven decision.
Four signals drive the score:
| Signal | Condition | Score |
|---|---|---|
| Capability match | task_type appears in model’s capabilities[] | +3 |
| Sensitivity gate | sensitivity == "confidential" and model has "confidential" capability | +2 |
| Latency match | latency_budget matches model’s latency_tier | +2 |
| Cost match | cost_profile matches model’s cost_tier | +2 |
The model with the highest total score wins. If no candidate scores above zero — no signal fired — the router falls back to the configured default_model.
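The scoring rule can be sketched in a few lines of Python. This is a minimal illustration of the table above, not the framework's actual implementation: the `Model` dataclass, the `score` and `route` helpers, and the two-entry catalog are hypothetical names introduced for the example, and resolving ties in favor of the first catalog entry is an assumption the text does not confirm.

```python
from dataclasses import dataclass

# Weights taken from the signal table above.
CAPABILITY_WEIGHT = 3.0
SENSITIVITY_WEIGHT = 2.0
LATENCY_WEIGHT = 2.0
COST_WEIGHT = 2.0


@dataclass(frozen=True)
class Model:
    """Hypothetical catalog entry; field names follow the table's wording."""
    alias: str
    capabilities: tuple
    latency_tier: str
    cost_tier: str


def score(model, task_type=None, sensitivity=None,
          latency_budget=None, cost_profile=None):
    """Add up the weight of every signal that fires for this model."""
    total = 0.0
    if task_type is not None and task_type in model.capabilities:
        total += CAPABILITY_WEIGHT
    if sensitivity == "confidential" and "confidential" in model.capabilities:
        total += SENSITIVITY_WEIGHT
    if latency_budget is not None and latency_budget == model.latency_tier:
        total += LATENCY_WEIGHT
    if cost_profile is not None and cost_profile == model.cost_tier:
        total += COST_WEIGHT
    return total


def route(catalog, default_model, **signals):
    """Return the highest-scoring alias, or default_model if no signal fired."""
    # max() keeps the first of several equal scorers (assumed tie-break).
    best = max(catalog, key=lambda m: score(m, **signals), default=None)
    if best is None or score(best, **signals) <= 0:
        return default_model
    return best.alias


catalog = [
    Model("small", ("chat",), "realtime", "minimal"),
    Model("large", ("reasoning",), "interactive", "standard"),
]
picked = route(catalog, "small", task_type="reasoning")
fallback = route(catalog, "small", task_type="translation")
```

Here `picked` resolves to the `large` alias through the capability match, while `fallback` resolves to the configured default because no model advertises a `translation` capability and no other signal was supplied.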
Architectural Position
The router belongs to the inference layer and sits between agents and model providers. Agents never call model providers directly.
    Agent
      └─ llm_invoke(config, InferenceRequest)
           └─ ModelRouterFactory.get_router(config)      # cached router instance
                └─ K9ModelRouter.route(request)          # scores all catalog models
                     └─ ModelCatalog.get_model(alias)    # looks up llm_ref
                          └─ LLMFactory.get(llm_ref)     # cached OllamaLLM instance
                               └─ OllamaLLM.invoke(prompt)
This chain is the same for every agent in every squad. Agents declare a task_type in their YAML — the router resolves it to a model. No agent ever names a model directly.
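To make the chain concrete, here is a rough sketch built from stub classes. The class and function names mirror the chain above, but every body is a stand-in written for this example and not the framework's code: the config layout, the dict-shaped request, the `id(config)` cache key, the use of the model name as `llm_ref`, and the capability-only router are all assumptions.

```python
class OllamaLLM:
    """Stub provider client; a real one would send the prompt to Ollama."""

    def __init__(self, llm_ref):
        self.llm_ref = llm_ref

    def invoke(self, prompt):
        return f"[{self.llm_ref}] {prompt}"


class LLMFactory:
    _instances = {}

    @classmethod
    def get(cls, llm_ref):
        # One client per llm_ref, created on first use and then reused.
        if llm_ref not in cls._instances:
            cls._instances[llm_ref] = OllamaLLM(llm_ref)
        return cls._instances[llm_ref]


class ModelCatalog:
    def __init__(self, models):
        self._models = models  # alias -> {"llm_ref": ..., "capabilities": [...]}

    def aliases(self):
        return list(self._models)

    def get_model(self, alias):
        return self._models[alias]


class K9ModelRouter:
    """Stub router: capability match only, to keep the sketch short."""

    def __init__(self, catalog, default_model):
        self.catalog = catalog
        self.default_model = default_model

    def route(self, request):
        for alias in self.catalog.aliases():
            capabilities = self.catalog.get_model(alias)["capabilities"]
            if request.get("task_type") in capabilities:
                return alias
        return self.default_model


class ModelRouterFactory:
    _routers = {}

    @classmethod
    def get_router(cls, config):
        key = id(config)  # assumption: one cached router per config object
        if key not in cls._routers:
            catalog = ModelCatalog(config["models"])
            cls._routers[key] = K9ModelRouter(catalog, config["default_model"])
        return cls._routers[key]


def llm_invoke(config, request):
    """Single entry point: the agent never sees a model name."""
    router = ModelRouterFactory.get_router(config)
    alias = router.route(request)
    llm_ref = router.catalog.get_model(alias)["llm_ref"]
    return LLMFactory.get(llm_ref).invoke(request["prompt"])


config = {
    "default_model": "general",
    "models": {
        "general": {"llm_ref": "llama3.2:1b", "capabilities": ["general", "chat"]},
        "reasoning": {"llm_ref": "granite3-dense:2b", "capabilities": ["reasoning", "fraud"]},
    },
}
reply = llm_invoke(config, {"prompt": "Assess this claim", "task_type": "reasoning"})
```

The point of the layering is that the agent only passes a config and a request. Swapping the model behind the `reasoning` alias changes the catalog entry, not the agent.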
InferenceRequest — the Routing Contract
Agents build an InferenceRequest to signal what they need:
    from k9_aif_abb.k9_inference.models.inference_request import InferenceRequest

    req = InferenceRequest(
        prompt="Assess this claim for fraud indicators...",
        task_type="reasoning",           # +3 for any model with "reasoning" capability
        sensitivity="confidential",      # +2 for models with "confidential" capability
        latency_budget="interactive",    # +2 if model's latency_tier matches
        cost_profile="standard",         # +2 if model's cost_tier matches
        metadata={"agent": "FraudDetectionAgent"},
    )
All fields except prompt are optional. Omitting a field only means its signal never fires, so existing callers that set nothing but prompt keep working: with no signal fired, the router falls back to the configured default_model.
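The effect of leaving fields out can be illustrated with a small stand-in type. `RequestSketch` and `signals_that_can_fire` are hypothetical names for this example; the real InferenceRequest class may be defined differently, and the sketch only assumes the field names shown in the constructor call above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in for InferenceRequest; only prompt is required."""
    prompt: str
    task_type: Optional[str] = None
    sensitivity: Optional[str] = None
    latency_budget: Optional[str] = None
    cost_profile: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def signals_that_can_fire(req):
    """List the routing signals this request is able to trigger."""
    signals = []
    if req.task_type is not None:
        signals.append("capability")
    if req.sensitivity == "confidential":
        signals.append("sensitivity")
    if req.latency_budget is not None:
        signals.append("latency")
    if req.cost_profile is not None:
        signals.append("cost")
    return signals


prompt_only = RequestSketch(prompt="Summarize this policy.")
full = RequestSketch(
    prompt="Assess this claim for fraud indicators...",
    task_type="reasoning",
    sensitivity="confidential",
    latency_budget="interactive",
    cost_profile="standard",
)
```

A prompt-only request yields an empty signal list, which is the case where the router uses default_model. The fully populated request can trigger all four signals.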
Concrete Example — EOC Model Catalog
The K9X Enterprise Insurance Operations Center configures four models:
| Alias | Model | Capabilities | Latency | Cost |
|---|---|---|---|---|
| general | llama3.2:1b | general, chat, summarization | realtime | minimal |
| reasoning | granite3-dense:2b | reasoning, adjudication, fraud, policy_compliance | interactive | standard |
| guardian | granite3-guardian:latest | guardrails, policy, confidential, pii_detection | interactive | standard |
| extraction | granite3-dense:2b | extraction, structured_output, ocr_post_processing | interactive | standard |
For a FraudDetectionAgent request with task_type="reasoning":
- reasoning model scores: +3 (capability match) = 3.0
- guardian model scores: 0 (no “reasoning” capability)
- general model scores: 0 (no “reasoning” capability)
- extraction model scores: 0
Winner: granite3-dense:2b via the reasoning alias.
For a GuardAgent request with task_type="guardrails" and sensitivity="confidential":
- guardian model scores: +3 (guardrails) + 2 (confidential) = 5.0
- All others: 0
Winner: granite3-guardian:latest. No fallback. Hard requirement.
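Both worked examples can be reproduced by scoring the EOC catalog directly. The sketch below transcribes the catalog table into a plain dict and applies the four signal weights. The dict layout and the `score_all` helper are illustrative assumptions, not the framework's data model.

```python
# EOC catalog transcribed from the table above (dict layout is an assumption).
CATALOG = {
    "general": {
        "model": "llama3.2:1b",
        "capabilities": {"general", "chat", "summarization"},
        "latency_tier": "realtime",
        "cost_tier": "minimal",
    },
    "reasoning": {
        "model": "granite3-dense:2b",
        "capabilities": {"reasoning", "adjudication", "fraud", "policy_compliance"},
        "latency_tier": "interactive",
        "cost_tier": "standard",
    },
    "guardian": {
        "model": "granite3-guardian:latest",
        "capabilities": {"guardrails", "policy", "confidential", "pii_detection"},
        "latency_tier": "interactive",
        "cost_tier": "standard",
    },
    "extraction": {
        "model": "granite3-dense:2b",
        "capabilities": {"extraction", "structured_output", "ocr_post_processing"},
        "latency_tier": "interactive",
        "cost_tier": "standard",
    },
}


def score_all(task_type=None, sensitivity=None,
              latency_budget=None, cost_profile=None):
    """Score every catalog alias with the four signals (+3, +2, +2, +2)."""
    scores = {}
    for alias, entry in CATALOG.items():
        total = 0.0
        if task_type in entry["capabilities"]:
            total += 3.0
        if sensitivity == "confidential" and "confidential" in entry["capabilities"]:
            total += 2.0
        if latency_budget is not None and latency_budget == entry["latency_tier"]:
            total += 2.0
        if cost_profile is not None and cost_profile == entry["cost_tier"]:
            total += 2.0
        scores[alias] = total
    return scores


fraud_scores = score_all(task_type="reasoning")
guard_scores = score_all(task_type="guardrails", sensitivity="confidential")

fraud_winner = max(fraud_scores, key=fraud_scores.get)
guard_winner = max(guard_scores, key=guard_scores.get)
```

With only `task_type="reasoning"` set, the `reasoning` alias is the sole non-zero scorer at 3.0. With `task_type="guardrails"` and `sensitivity="confidential"`, the `guardian` alias reaches 5.0 and every other alias stays at 0.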
Persistence
After every routing decision, the router persists to the state store (SQLite in development, PostgreSQL in production):
- Session — created or resumed per request
- Turn — the user prompt, role, token count
- Routing decision — selected model, rationale, complexity_score, governance_score, score, provider metadata
complexity_score is derived from task_type (reasoning=0.8, extraction=0.6, general=0.3). governance_score is 1.0 when sensitivity=="confidential", 0.0 otherwise. Both are stored alongside every decision — enabling compliance reporting and routing analytics.
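The two derived scores reduce to a lookup and a comparison. In this minimal sketch the function names are hypothetical, and returning `None` for a task type outside the three listed values is an assumption, since the text does not say how other task types are scored.

```python
from typing import Optional

# Values quoted in the text above.
COMPLEXITY_BY_TASK = {"reasoning": 0.8, "extraction": 0.6, "general": 0.3}


def complexity_score(task_type: Optional[str]) -> Optional[float]:
    """Look up the complexity for a task type; None if the text lists no value."""
    return COMPLEXITY_BY_TASK.get(task_type)


def governance_score(sensitivity: Optional[str]) -> float:
    """1.0 for confidential requests, 0.0 for everything else."""
    return 1.0 if sensitivity == "confidential" else 0.0


fraud_complexity = complexity_score("reasoning")
fraud_governance = governance_score("confidential")
```

For the fraud request shown earlier, this gives a complexity of 0.8 and a governance score of 1.0.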
Persistence backend is configured in config.yaml:
    inference:
      router:
        persistence: sqlite   # or "postgres" for production
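As an illustration of what persisting a routing decision to SQLite might look like, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table name, column names, and helper functions are hypothetical. Only the list of stored fields comes from the text, and the framework's real schema may differ.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a SQLite store and create the (hypothetical) decisions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS routing_decisions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            selected_model TEXT NOT NULL,
            rationale TEXT,
            complexity_score REAL,
            governance_score REAL,
            score REAL,
            provider_metadata TEXT
        )
        """
    )
    return conn


def record_decision(conn, session_id, selected_model, rationale,
                    complexity_score, governance_score, score,
                    provider_metadata):
    """Insert one routing decision and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO routing_decisions "
            "(session_id, selected_model, rationale, complexity_score, "
            "governance_score, score, provider_metadata) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (session_id, selected_model, rationale, complexity_score,
             governance_score, score, json.dumps(provider_metadata)),
        )
    return cursor.lastrowid


conn = open_store()
row_id = record_decision(
    conn,
    session_id="session-1",
    selected_model="granite3-guardian:latest",
    rationale="capability match + sensitivity gate",
    complexity_score=None,
    governance_score=1.0,
    score=5.0,
    provider_metadata={"provider": "ollama"},
)
stored = conn.execute(
    "SELECT selected_model, governance_score, score FROM routing_decisions "
    "WHERE id = ?",
    (row_id,),
).fetchone()
```

Keeping the scores in their own columns, and not inside a serialized blob, is what makes compliance queries such as "all decisions with governance_score = 1.0" straightforward.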
Why This Is an Architectural Concern
In most AI projects, model selection is buried in application code. K9-AIF treats it as a first-class architectural concern because the model chosen affects:
- operational cost
- latency
- compliance posture — some models must never handle PII
- deployment flexibility — swap providers without touching application code
- long-term maintainability
Centralizing this behind the router makes every model decision explicit, auditable, and changeable without modifying agents.
For Developers
The full invocation pattern, how to add a model to the catalog, and how to extend the scoring signals are documented in SKILLS.md (Skills 2, 3, 8).
The EOC is the reference implementation showing the router in production across 7 squads and 8 agents: → K9X Enterprise Insurance Operations Center
Architecture Diagram

K9-AIF is an architecture-first framework for governed, enterprise-scale multi-agent AI systems.
More at k9x.ai · github.com/k9aif/k9-aif-framework