# POST /v1/route

> Get the router's decision without running the completion

Returns the model the auto-router would pick for a given prompt, along
with the full per-candidate scores. **No provider call is made** and
**no trace is written** — this is a pure routing decision.

Useful for:

* Pre-flight A/B experiments (see which model would be picked before sending the request)
* Analytics dashboards ("what percentage of my traffic lands in cluster 47?")
* Custom client logic that wants to make the final routing call itself
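
As a rough illustration of the analytics use case, the sketch below tallies the `cluster_id` field of a batch of responses and reports each cluster's share of traffic. The helper name `cluster_share` and the sample data are hypothetical; only the `cluster_id` field comes from the response format documented on this page.

```python
from collections import Counter


def cluster_share(responses):
    """Return {cluster_id: fraction of responses} for a list of /v1/route responses."""
    counts = Counter(r["cluster_id"] for r in responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster_id: n / total for cluster_id, n in counts.items()}


# Hypothetical responses, trimmed to the one field this sketch needs.
sample = [{"cluster_id": 47}, {"cluster_id": 47}, {"cluster_id": 12}, {"cluster_id": 47}]
shares = cluster_share(sample)
print(f"cluster 47: {shares[47]:.0%}")  # cluster 47: 75%
```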

```http
POST /v1/route HTTP/1.1
Host: localhost:8080
Content-Type: application/json
```

## Request body

```json
{
  "prompt": "Prove the square root of 2 is irrational.",
  "cost_weight": 0.5,
  "available_models": ["gpt-4o-mini", "gpt-4o", "ministral-3b-latest"]
}
```

| Field              | Type              | Description                                                            |
| ------------------ | ----------------- | ---------------------------------------------------------------------- |
| `prompt`           | `string`          | The user text. Required unless `embedding` is supplied.                |
| `embedding`        | `array of float`  | Optional pre-computed vector (384-d MiniLM). Skips the embedder.       |
| `available_models` | `array of string` | Restrict candidates to this subset. If omitted, all registered models. |
| `cost_weight`      | `float`           | Override λ for this call. `0.0` = quality-first; `1.0+` = cheap-first. |
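
A client can assemble this body programmatically. The following is a minimal sketch, assuming only the four fields in the table above; the helper name `build_route_payload` is hypothetical, and the client-side check simply mirrors the rule that `prompt` is required unless `embedding` is supplied.

```python
import json


def build_route_payload(prompt=None, embedding=None, available_models=None, cost_weight=None):
    """Build a /v1/route request body, leaving out fields that were not set."""
    if prompt is None and embedding is None:
        raise ValueError("either prompt or embedding is required")
    payload = {}
    if prompt is not None:
        payload["prompt"] = prompt
    if embedding is not None:
        payload["embedding"] = [float(x) for x in embedding]
    if available_models is not None:
        payload["available_models"] = list(available_models)
    if cost_weight is not None:
        payload["cost_weight"] = float(cost_weight)
    return payload


# Reproduces the example body shown above.
body = json.dumps(build_route_payload(
    prompt="Prove the square root of 2 is irrational.",
    cost_weight=0.5,
    available_models=["gpt-4o-mini", "gpt-4o", "ministral-3b-latest"],
))
```

Omitting `available_models` and `cost_weight` when they are unset lets the server apply its own defaults instead of receiving explicit `null` values.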

## Response body

```json
{
  "selected_model": "gpt-4o",
  "cluster_id": 47,
  "expected_error": 0.001,
  "cost_adjusted_score": 0.0031,
  "all_scores": {
    "gpt-4o": 0.0031,
    "gpt-4o-mini": 0.0412,
    "ministral-3b-latest": 0.1821
  },
  "cache_hit": false,
  "usage": {
    "routing_ms": 1.3,
    "embedding_ms": 0.8
  }
}
```

| Field                 | Type     | Meaning                                                                   |
| --------------------- | -------- | ------------------------------------------------------------------------- |
| `selected_model`      | `string` | The model that minimizes `error + λ·cost` on this cluster.                |
| `cluster_id`          | `int`    | Which of the 100 semantic clusters the prompt landed in.                  |
| `expected_error`      | `float`  | Predicted error rate for `selected_model` on `cluster_id`.                |
| `cost_adjusted_score` | `float`  | Minimized score: `expected_error + λ · cost_per_1k`, λ = `cost_weight`.   |
| `all_scores`          | `object` | Score for every candidate considered.                                     |
| `cache_hit`           | `bool`   | Whether the decision came from the LRU cache (same prompt seen recently). |
| `usage.routing_ms`    | `float`  | Wall-clock time spent in routing logic, in milliseconds.                  |
| `usage.embedding_ms`  | `float`  | Embedding time in milliseconds (0 if `embedding` was supplied).           |
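
The selection rule in the table can be sketched in a few lines. This is an illustration of the formula `expected_error + λ · cost_per_1k`, not the router's actual implementation; the function names, model names, error rates, and prices below are all hypothetical.

```python
def score_candidates(expected_error, cost_per_1k, cost_weight):
    """Cost-adjusted score per model: expected_error + cost_weight * cost_per_1k."""
    return {
        model: expected_error[model] + cost_weight * cost_per_1k[model]
        for model in expected_error
    }


def select_model(scores):
    """Pick the candidate with the lowest score."""
    return min(scores, key=scores.get)


# Hypothetical per-cluster error rates and prices, for illustration only.
errors = {"small-model": 0.10, "large-model": 0.02}
prices = {"small-model": 0.001, "large-model": 0.10}

quality_first = select_model(score_candidates(errors, prices, cost_weight=0.0))
cheap_first = select_model(score_candidates(errors, prices, cost_weight=1.0))
print(quality_first, cheap_first)  # large-model small-model

# The example response above follows the same rule: selected_model is the
# key with the lowest value in all_scores.
all_scores = {"gpt-4o": 0.0031, "gpt-4o-mini": 0.0412, "ministral-3b-latest": 0.1821}
assert select_model(all_scores) == "gpt-4o"
```

With `cost_weight` at `0.0` the lowest-error model wins; raising it to `1.0` lets the price term dominate, so the cheaper model is selected.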

## Curl

```bash
curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Classify this ticket: billing | technical | other",
    "cost_weight": 0.7
  }'
```
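
The same call can be made from Python with only the standard library. This is a sketch, assuming the host and port from the curl example; `make_route_request` and `route` are hypothetical helper names, not part of any official client.

```python
import json
import urllib.request

ROUTE_URL = "http://localhost:8080/v1/route"  # host and port taken from the curl example


def make_route_request(payload, url=ROUTE_URL):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def route(payload, url=ROUTE_URL, timeout=5.0):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(make_route_request(payload, url), timeout=timeout) as resp:
        return json.load(resp)


# Building the request performs no network I/O.
req = make_route_request({
    "prompt": "Classify this ticket: billing | technical | other",
    "cost_weight": 0.7,
})
```

Calling `route(...)` with the same payload against a running server should return the response body described above. Non-2xx statuses surface as `urllib.error.HTTPError`.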

## Using a pre-computed embedding

If you already have the embedding (e.g. from another pipeline stage),
pass it to skip the embedder:

```bash
curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.021, -0.114, ...384 floats...],
    "cost_weight": 0.5
  }'
```

`usage.embedding_ms` will be `0` when the embedding is supplied.
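
Before sending a pre-computed vector, it may be worth checking its shape on the client. The sketch below assumes the 384-dimension figure from the request table; the helper `validate_embedding` is hypothetical, and how the server treats a vector of the wrong length is not specified on this page.

```python
import math

EMBEDDING_DIM = 384  # dimension stated in the request table


def validate_embedding(vector, dim=EMBEDDING_DIM):
    """Return the vector as a list of plain floats, or raise ValueError."""
    values = [float(x) for x in vector]
    if len(values) != dim:
        raise ValueError(f"expected {dim} floats, got {len(values)}")
    # NaN and infinity have no representation in standard JSON.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return values


# A zero vector is used purely as a placeholder of the right length.
embedding = validate_embedding([0.0] * EMBEDDING_DIM)
```

Converting to plain `float` also handles vectors that arrive as NumPy scalars or other numeric types that `json.dumps` would reject.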

## Errors

| Status | `error.code`       | Meaning                                                       |
| ------ | ------------------ | ------------------------------------------------------------- |
| `400`  | `invalid_request`  | Neither `prompt` nor `embedding` provided, or malformed body. |
| `503`  | `router_not_ready` | Router weights still loading. Check `/health`.                |
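
A client might use these codes to decide whether a retry is worthwhile. The sketch below assumes the error body has the shape `{"error": {"code": "..."}}`, which is inferred from the `error.code` column header and not confirmed by this page; `parse_error` is a hypothetical helper.

```python
import json

RETRYABLE_CODES = {"router_not_ready"}


def parse_error(status, body_text):
    """Return (error_code, retryable) for a non-2xx response."""
    try:
        # Assumed body shape: {"error": {"code": "..."}}
        code = json.loads(body_text)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        code = None  # body was not JSON or did not have the assumed shape
    return code, (status == 503 and code in RETRYABLE_CODES)


print(parse_error(503, '{"error": {"code": "router_not_ready"}}'))
# ('router_not_ready', True)
print(parse_error(400, '{"error": {"code": "invalid_request"}}'))
# ('invalid_request', False)
```

A `503` with `router_not_ready` is transient, so polling `/health` and retrying is reasonable. A `400` indicates a problem with the request itself and will not succeed on retry.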
