# ot.load_router

> Load the semantic auto-router with pre-trained weights

```python theme={null}
ot.load_router(
    weights_path: Optional[Path] = None,
    weights_name: str = "default",
    embedding_model: str = "all-MiniLM-L6-v2",
    cost_weight: float = 0.0,
    use_soft_assignment: bool = True,
    allowed_models: Optional[list[str]] = None,
    download_if_missing: bool = True,
    verbose: bool = True,
    engine: str = "go",
) -> UniRouteRouter
```

Returns a router that picks a model for each prompt from learned per-cluster
error profiles and a cost weight. See
[Auto-routing](/concepts/auto-routing) for the conceptual model.

## Parameters

| Name                  | Type         | Description                                                                                                                                                                                                        |
| --------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `weights_path`        | `Path?`      | Directory containing `clusters/`, `profiles/`, and `manifest.json`. If `None`, resolves to the wheel-bundled `weights-mmlu-v1/` (no network access on the default path); non-default packs are downloaded from the hub. |
| `weights_name`        | `str`        | Named weights package. `"default"` (alias for `weights-mmlu-v1`) ships in every wheel so `load_router()` works offline on first call. Pass a different name to fetch from the hub.                                 |
| `embedding_model`     | `str`        | SentenceTransformer model name. Only used by `engine="python"` — the Go backend uses its bundled ONNX MiniLM.                                                                                                      |
| `cost_weight`         | `float`      | λ in the decision rule `score = expected_error + λ · cost`. Range `[0, ∞)`. `0.0` = quality-first; `0.5` = balanced; `1.0+` = cheap-first.                                                                         |
| `use_soft_assignment` | `bool`       | If `True`, compute a probability distribution over all 100 clusters instead of hard-assigning to one. Slightly slower but more robust at cluster boundaries.                                                       |
| `allowed_models`      | `list[str]?` | Restrict candidates. E.g. `["gpt-4o-mini", "gpt-4o"]`.                                                                                                                                                             |
| `download_if_missing` | `bool`       | Download the weights package on first run. Default `True`.                                                                                                                                                         |
| `verbose`             | `bool`       | Print progress / health info on load.                                                                                                                                                                              |
| `engine`              | `str`        | `"go"` (default, production path — bundled binary), `"python"` (pure Python, no subprocess), or `"auto"` (prefer Go, silent fallback).                                                                             |
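The decision rule behind `cost_weight` and `allowed_models` can be illustrated in plain Python. This is a toy sketch assuming hard assignment to a single cluster; the model names, Ψ values, and costs are invented, and the real router's cost units and normalization are not documented on this page. `pick_model` is a hypothetical helper, not part of the library.

```python
# A toy illustration of score = expected_error + λ · cost with hard assignment.
# Model names, Ψ values, and costs are invented; they are not real weights.
from typing import Optional

# PSI[model][cluster]: expected error of each model on each cluster (made up).
PSI = {
    "small-model": [0.40, 0.10],
    "large-model": [0.15, 0.08],
}
# Per-model cost on an arbitrary normalized scale (made up).
COST = {"small-model": 0.05, "large-model": 0.60}


def pick_model(cluster_id: int, cost_weight: float,
               allowed_models: Optional[list[str]] = None) -> str:
    """Return the candidate minimizing expected_error + cost_weight * cost."""
    candidates = allowed_models if allowed_models is not None else list(PSI)
    scores = {m: PSI[m][cluster_id] + cost_weight * COST[m] for m in candidates}
    return min(scores, key=scores.get)


print(pick_model(0, cost_weight=0.0))  # large-model (quality-first)
print(pick_model(0, cost_weight=1.0))  # small-model (cheap-first)
print(pick_model(0, cost_weight=0.0, allowed_models=["small-model"]))  # small-model
```

Raising λ shifts the choice toward the cheaper model even though its error on cluster 0 is higher; `allowed_models` just shrinks the candidate set before taking the minimum.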

## Returns — `UniRouteRouter`

### Attributes

```python theme={null}
router.registry                # LLMRegistry — get_model_ids(), get(id) → LLMProfile
router.cluster_assigner        # KMeansClusterAssigner — .num_clusters, .assign(vec)
router.embedder                # PromptEmbedder — .embed(text), .dimension (=384)
router.cost_weight             # the λ you passed
router.allowed_models          # list[str] or None
router.stats                   # dict of per-model routing counts / latencies
```

### Methods

#### `.route(prompt, available_models=None, cost_weight_override=None) → RoutingDecision`

Routes a single prompt.

```python theme={null}
d = router.route("Write a Python function that reverses a linked list.")
d.selected_model          # str   — the chosen model id
d.cluster_id              # int   — 0..99
d.expected_error          # float — Ψ[model, cluster]
d.cost_adjusted_score     # float — error + λ·cost
d.all_scores              # dict[str, float] — scores for every candidate
d.cluster_probabilities   # np.ndarray(100,) — soft distribution if enabled
d.reasoning               # str   — human-readable explanation
```
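With `use_soft_assignment=True`, one natural reading of `expected_error` is a probability-weighted average of Ψ over clusters, with `cluster_probabilities` as the weights. The sketch below assumes that reading, and also assumes the probabilities come from a softmax over negative squared distances to the centroids; neither the weighting scheme nor any temperature is documented here, so treat both as hypothetical. It uses 2-D toy centroids instead of 384-dimensional embeddings.

```python
import math


def soft_cluster_probs(vec, centroids, temperature=1.0):
    """Softmax over negative squared Euclidean distances (hypothetical scheme)."""
    logits = [-sum((v - c) ** 2 for v, c in zip(vec, cen)) / temperature
              for cen in centroids]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_expected_error(psi_row, probs):
    """Expected error of one model: sum over k of p_k * Psi[model, k]."""
    return sum(p * e for p, e in zip(probs, psi_row))


centroids = [(0.0, 0.0), (1.0, 0.0)]
vec = (0.5, 0.0)  # equidistant from both centroids
probs = soft_cluster_probs(vec, centroids)
print(probs)  # [0.5, 0.5]
print(soft_expected_error([0.4, 0.1], probs))  # 0.25
```

A prompt sitting on a boundary gets an error estimate that blends both clusters instead of jumping to one side, which is the robustness the parameter table refers to.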

#### `.route_batch(prompts: list[str]) → list[RoutingDecision]`

Batched routing. Embeds everything in one call for throughput.

#### `.route_and_execute(prompt, messages, **kwargs) → ModelResponse`

Convenience method that routes `prompt`, then immediately calls
`ot.completion` on the chosen model, saving you from writing the two calls
yourself.
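Conceptually, the method collapses a route call and a completion call into one. This sketch assumes `ot.completion` accepts `model=` and `messages=` keyword arguments and that extra `**kwargs` are forwarded unchanged; the router and completion function are passed in so the logic can be exercised with stand-ins. `route_then_complete` is a hypothetical name, not part of the library.

```python
from typing import Any, Callable


def route_then_complete(router: Any, completion: Callable[..., Any],
                        prompt: str, messages: list[dict], **kwargs: Any) -> Any:
    """Hypothetical two-step equivalent of router.route_and_execute(...)."""
    decision = router.route(prompt)  # pick a model for this prompt
    # Call the chosen model; extra kwargs (e.g. temperature) pass straight through.
    return completion(model=decision.selected_model, messages=messages, **kwargs)
```

Under those assumptions, `route_then_complete(router, ot.completion, prompt, messages)` would behave like the single-call convenience method.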

#### `.get_best_model_for_cluster(cluster_id) → str`

Given a cluster id, returns the model that minimizes `expected_error +
λ·cost`. Useful for analytics ("what would I route to on cluster 42?").

#### `.analyze_routing_distribution(prompts) → dict`

Returns a histogram: `{model_id: count}` over a batch of prompts.
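The histogram amounts to counting `selected_model` across the batch's routing decisions. A minimal sketch, assuming the decisions come from `.route_batch` (the real method may compute the counts differently); `routing_histogram` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def routing_histogram(selected_models: Iterable[str]) -> dict[str, int]:
    """Count how often each model id was selected."""
    return dict(Counter(selected_models))


# With the real router, something like:
#   decisions = router.route_batch(prompts)
#   routing_histogram(d.selected_model for d in decisions)
print(routing_histogram(["gpt-4o-mini", "gpt-4o", "gpt-4o-mini"]))
# {'gpt-4o-mini': 2, 'gpt-4o': 1}
```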

#### `.reset_stats()`

Zero out the routing counters.

## Examples

### Simple load and route

```python theme={null}
import opentracy as ot

router = ot.load_router(cost_weight=0.5)

for p in ["What is 2+2?", "Prove √2 is irrational.", "Write a haiku."]:
    d = router.route(p)
    print(d.selected_model, "→", p)
```

### Restrict candidates (cost ceiling)

```python theme={null}
import opentracy as ot

router = ot.load_router(
    cost_weight=0.5,
    allowed_models=["ministral-3b-latest", "gpt-4o-mini", "gpt-4o"],
)
```

### Override λ per-call

```python theme={null}
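import opentracy as ot

router = ot.load_router(cost_weight=0.5)
prompt = "Summarize the plot of Hamlet in two sentences."

# Normal mode: balanced (uses the router's λ = 0.5)
d = router.route(prompt)

# Cost-sensitive burst: force cheap for this call only
d_cheap = router.route(prompt, cost_weight_override=2.0)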
```

### Python backend (introspection)

```python theme={null}
import opentracy as ot

# Useful when you want to inspect profiles, centroids, etc.
router = ot.load_router(engine="python")
for mid in router.registry.get_model_ids():
    p = router.registry.get(mid)
    print(mid, p.cost_per_1k_tokens, p.psi_vector[:5])
```

## Failure modes

| Error                                                    | Cause                                                               | Fix                                                                                     |
| -------------------------------------------------------- | ------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| `FileNotFoundError: opentracy-engine binary not bundled` | Running on a platform without a published wheel (e.g. macOS Intel). | Fall back to `engine="python"` or file an issue for the missing platform.               |
| `ImportError: sentence-transformers package required`    | `engine="python"` without `[research]` extra.                       | `pip install opentracy[research]`.                                                      |
| `ValueError: Unknown package 'weights-default'`          | `hub/index.json` missing (old wheel).                               | Upgrade: `pip install -U opentracy`.                                                    |
| Network error on first run (non-default pack)            | Hub download failed for a non-default weights name.                 | The default pack (`"default"` / `"weights-mmlu-v1"`) ships in the wheel — try it first. |
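When passing a custom `weights_path`, a quick pre-flight check can turn a confusing load failure into a clear message. This sketch assumes only the three entries named in the parameter table (`clusters/`, `profiles/`, `manifest.json`) are required and that the manifest is plain JSON; the actual loader may check more. `check_weights_dir` is a hypothetical helper, not part of the library.

```python
import json
from pathlib import Path


def check_weights_dir(weights_path: Path) -> list[str]:
    """Return problems found in a weights directory (empty list = looks OK)."""
    problems: list[str] = []
    for sub in ("clusters", "profiles"):
        if not (weights_path / sub).is_dir():
            problems.append(f"missing directory: {sub}/")
    manifest = weights_path / "manifest.json"
    if not manifest.is_file():
        problems.append("missing file: manifest.json")
    else:
        try:
            json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"manifest.json is not valid JSON: {exc}")
    return problems
```

Run it before `ot.load_router(weights_path=...)` and stop early if the returned list is non-empty.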

## Engine backends — why the default matters

`engine="go"` is the default because the Go backend:

* Is 5–10× faster per routing decision (sub-millisecond vs a few ms).
* Runs in a subprocess so Python GIL contention doesn't slow routing.
* Uses the same ONNX runtime on every platform, so routing behavior is deterministic across platforms.

The Python backend exists for:

* Research / inspection (you can swap the cluster assigner, monkey-patch
  profiles, etc.).
* Environments that forbid process spawning (some sandboxes, Lambda, etc.).

**Avoid `engine="auto"`** in production. It silently falls back to Python
if the Go binary is missing, which usually means a misconfigured install
rather than an intentional choice. Explicit `"go"` fails loudly with a
clear message.
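If a deployment genuinely needs a fallback, one option is to make it explicit and logged rather than relying on `engine="auto"`. A minimal sketch, assuming a missing binary surfaces as the `FileNotFoundError` listed in the failure-mode table; the loader is passed in as a parameter (in practice `ot.load_router`), and `load_router_with_fallback` is a hypothetical helper, not part of the library. Remember that the Python engine needs the `[research]` extra.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def load_router_with_fallback(load: Callable[..., Any], **kwargs: Any) -> Any:
    """Try the Go engine first; fall back to Python with a warning, not silently."""
    try:
        return load(engine="go", **kwargs)
    except FileNotFoundError as exc:
        # Surface the likely misconfiguration instead of hiding it.
        logger.warning("Go engine unavailable (%s); falling back to engine='python'", exc)
        return load(engine="python", **kwargs)
```

Usage under these assumptions: `router = load_router_with_fallback(ot.load_router, cost_weight=0.5)`.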
