# Python SDK

> Using opentracy directly — the Python-first path for new apps

The Python SDK (`opentracy`) is the native entry point. Use it if you're
starting a new project or if you want features (auto-routing, distillation,
trace ingestion) that aren't part of the OpenAI API shape.

## Install

```bash theme={null}
pip install opentracy
```

One install pulls a platform-specific wheel that bundles the Go engine
binary, the ONNX embedder, and pre-trained routing weights. The core path
needs no extras; optional extras add heavier dependencies:

```bash theme={null}
pip install "opentracy[distill]"    # adds training deps (torch, unsloth, peft, trl)
pip install "opentracy[research]"   # adds sentence-transformers for the Python router backend
pip install "opentracy[server]"     # adds FastAPI + ClickHouse for self-hosting
pip install "opentracy[anthropic]"  # native Anthropic SDK path
pip install "opentracy[all]"        # everything
```

## The four things you'll do

### 1. One-off completion

Just a chat completion, no routing, no trace.

```python theme={null}
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    temperature=0,
)
print(resp.choices[0].message.content)
```

Full API: [completion reference](/api-reference/completion).

### 2. Explicit router with fallbacks

When you want explicit rules ("spread traffic across GPT-4o and Claude,
and fall back to DeepSeek if they fail"), use the `Router` class:

```python theme={null}
import opentracy as ot

router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    strategy="round-robin",   # or "least-cost", "lowest-latency", "weighted-random"
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",   # logical alias, resolved to one of the deployments
    messages=[{"role": "user", "content": "..."}],
)
```
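
To picture what alias resolution with fallbacks means, here is a small stand-alone sketch. It is not OpenTracy's implementation: `SketchRouter`, its `call` parameter, and the rule that any exception triggers the next candidate are all hypothetical, and retries and timeouts are left out.

```python
from itertools import cycle
from typing import Callable, Dict, List


class SketchRouter:
    """Toy model of alias resolution: round-robin pick first, fallbacks after."""

    def __init__(self, deployments: Dict[str, List[str]],
                 fallbacks: Dict[str, List[str]]):
        self._fallbacks = fallbacks
        # One independent round-robin cursor per logical alias.
        self._cursors = {alias: cycle(models) for alias, models in deployments.items()}

    def completion(self, alias: str, call: Callable[[str], str]) -> str:
        primary = next(self._cursors[alias])
        candidates = [primary] + self._fallbacks.get(alias, [])
        last_error = None
        for model in candidates:
            try:
                return call(model)
            except Exception as exc:  # assumption: any error moves to the next candidate
                last_error = exc
        raise RuntimeError(f"all deployments failed for {alias!r}") from last_error


def echo_call(model: str) -> str:
    """Stand-in provider call that always succeeds."""
    return model


def flaky_call(model: str) -> str:
    """Stand-in provider call where only the fallback model succeeds."""
    if model != "deepseek/deepseek-chat":
        raise TimeoutError(model)
    return model


sketch = SketchRouter(
    deployments={"smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-6"]},
    fallbacks={"smart": ["deepseek/deepseek-chat"]},
)
first = sketch.completion("smart", echo_call)
second = sketch.completion("smart", echo_call)
rescued = sketch.completion("smart", flaky_call)
```

Successive calls alternate between the two `smart` deployments, and the fallback list is consulted only after the round-robin pick fails.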

Full API: [Router reference](/api-reference/router).

### 3. Semantic auto-router

Load the pre-trained router once; it picks the right model per prompt:

```python theme={null}
import opentracy as ot

auto = ot.load_router(cost_weight=0.5)

decision = auto.route("Write a haiku about autumn")
print(decision.selected_model)      # e.g. "ministral-3b-latest"
print(decision.cluster_id)          # e.g. 87
print(decision.expected_error)      # e.g. 0.212
print(decision.all_scores)          # full score dict
```

Combined with `ot.completion` this becomes a cost-optimizing client:

```python theme={null}
# Reuses the `auto` router loaded in the previous snippet.
def smart_call(prompt: str, api_key: str) -> str:
    d = auto.route(prompt)
    resp = ot.completion(
        model=d.selected_model,
        messages=[{"role": "user", "content": prompt}],
        api_key=api_key,
    )
    return resp.choices[0].message.content
```
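
How a `cost_weight` knob can trade quality against price is easiest to see with a toy scoring rule. The formula below (expected error plus `cost_weight` times a normalized cost), the function `pick_model`, and all numbers are assumptions made for illustration; the scoring that `load_router` actually uses may differ.

```python
from typing import Dict, Tuple


def pick_model(expected_error: Dict[str, float],
               cost_per_1k: Dict[str, float],
               cost_weight: float) -> Tuple[str, Dict[str, float]]:
    """Return the lowest-scoring model together with the full score table."""
    max_cost = max(cost_per_1k.values())
    scores = {
        # Lower is better: predicted error plus a weighted, normalized price.
        model: err + cost_weight * (cost_per_1k[model] / max_cost)
        for model, err in expected_error.items()
    }
    return min(scores, key=scores.get), scores


# Made-up numbers: a cheap small model and a more accurate, pricier one.
errors = {"small-model": 0.21, "large-model": 0.08}
costs = {"small-model": 0.04, "large-model": 2.50}

cheap_pick, cheap_scores = pick_model(errors, costs, cost_weight=0.5)
quality_pick, _ = pick_model(errors, costs, cost_weight=0.0)
```

With `cost_weight=0.5` the price term dominates and the small model wins; with `cost_weight=0.0` only the expected error counts and the large model wins.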

Full API: [load\_router reference](/api-reference/load-router).

### 4. Distillation

The one-call path is `ot.distill()`: it runs the full 4-phase pipeline
in-process and returns a callable `Student`. It requires the
`opentracy[distill]` extra and a CUDA GPU.

```python theme={null}
import opentracy as ot

student = ot.distill(
    dataset="tickets.jsonl",          # path, list[dict], or a callable
    teacher="openai/gpt-4o",
    student="llama-3.2-1b",
    steps=100,
    quantize="q4_k_m",                # or None to skip GGUF export
)

print(student("Classify: refund please"))       # local inference, $0

# Ship it behind a logical name — app code never changes
student.deploy("ticket-classifier")
resp = ot.completion(model="ticket-classifier", messages=[...])
```

Full API: [ot.distill reference](/api-reference/distill).

For the long-running, queued REST flow against a self-hosted engine
(ClickHouse-backed jobs, UI observability), use
[`Distiller`](/api-reference/distiller) instead — same engine,
different deployment shape.

## Async

Everything that has a sync version also has an async counterpart:

```python theme={null}
import asyncio
import opentracy as ot

async def main():
    resp = await ot.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```

`acompletion` shares its request-preparation path with the sync version,
so `force_engine`, `force_direct`, fallbacks, and engine-prefix handling
all behave identically.
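
The usual reason to go async is to run many requests concurrently. The sketch below shows one way to fan out with bounded concurrency; `fan_out`, `max_concurrency`, and `fake_complete` are hypothetical names, and the fake coroutine stands in for a real provider call so the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fan_out(prompts: List[str],
                  complete: Callable[[str], Awaitable[str]],
                  max_concurrency: int = 4) -> List[str]:
    """Run `complete` over all prompts, at most `max_concurrency` at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with gate:
            return await complete(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(p) for p in prompts))


async def fake_complete(prompt: str) -> str:
    """Stand-in for a real call; yields control once, then answers."""
    await asyncio.sleep(0)
    return prompt.upper()


results = asyncio.run(fan_out(["a", "b", "c"], fake_complete))
```

In real use, `complete` would be a small coroutine that awaits `ot.acompletion(...)` and returns `resp.choices[0].message.content`, as in the example above.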

## Trace ingestion

If you have existing logs from another LLM provider and want to use them
for dataset building or distillation in OpenTracy, you can import them
directly:

```python theme={null}
from opentracy import add_trace, add_traces, import_traces

# Single trace
add_trace({
    "prompt": "Classify: ...",
    "response": "billing",
    "model": "openai/gpt-4o",
    "total_cost_usd": 0.00025,
    "latency_ms": 340,
    "metadata": {"source": "legacy-log-export"},
})

# Batch
add_traces([{...}, {...}, {...}])

# From a JSONL file
import_traces("path/to/exported-traces.jsonl")
```
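
If your existing logs use different field names, a small conversion step can produce records in the shape shown above. In this sketch the legacy field names (`input`, `output`, `engine`, `cost`, `ms`) and the helpers `to_trace` and `write_jsonl` are hypothetical; the output keys are the ones used in the `add_trace` example. Which keys are required is not stated here, so treat the schema as an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def to_trace(row: Dict) -> Dict:
    """Map one legacy log row (hypothetical schema) to a trace dict."""
    return {
        "prompt": row["input"],
        "response": row["output"],
        "model": row["engine"],
        "total_cost_usd": float(row.get("cost", 0.0)),
        "latency_ms": int(row.get("ms", 0)),
        "metadata": {"source": "legacy-log-export"},
    }


def write_jsonl(rows: Iterable[Dict], path: Path) -> int:
    """Write one JSON object per line and return the number of records."""
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(to_trace(row)) + "\n")
            count += 1
    return count


legacy = [{"input": "Classify: refund please", "output": "billing",
           "engine": "openai/gpt-4o", "cost": 0.00025, "ms": 340}]
out_path = Path(tempfile.gettempdir()) / "exported-traces.jsonl"
written = write_jsonl(legacy, out_path)
first_record = json.loads(out_path.read_text(encoding="utf-8").splitlines()[0])
```

The resulting file can then be handed to `import_traces`, or the converted dicts passed straight to `add_traces`.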

## Engine routing opt-in

By default the SDK calls providers directly. To route through an OpenTracy
engine (for observability, aliases, etc.), set the env var **once**:

```bash theme={null}
export OPENTRACY_ENGINE_URL="http://localhost:8080"
```

From that point on, `ot.completion(...)` routes through the engine.
Per-call overrides:

```python theme={null}
# Always engine (even if OPENTRACY_ENGINE_URL is unset):
ot.completion(..., force_engine=True)

# Always direct (even if OPENTRACY_ENGINE_URL is set):
ot.completion(..., force_direct=True)
```

Why isn't this automatic? Because silently routing through whatever happens
to be listening on `localhost:8080` is a footgun. Opt-in is explicit.
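
The precedence described above can be summarized as a small decision function. This is a sketch of the documented rules, not the SDK's code: `resolve_target` is a hypothetical name, and raising when both flags are set is an assumption, since that combination is not covered here.

```python
import os
from typing import Mapping, Optional


def resolve_target(force_engine: bool = False,
                   force_direct: bool = False,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Return "engine" or "direct" following the opt-in rules above."""
    env = os.environ if env is None else env
    if force_engine and force_direct:
        # Assumption: the two flags are treated as mutually exclusive.
        raise ValueError("force_engine and force_direct cannot both be set")
    if force_direct:
        return "direct"
    if force_engine:
        return "engine"
    # No per-call override: the env var alone decides.
    return "engine" if env.get("OPENTRACY_ENGINE_URL") else "direct"


engine_env = {"OPENTRACY_ENGINE_URL": "http://localhost:8080"}
default_unset = resolve_target(env={})
default_set = resolve_target(env=engine_env)
direct_override = resolve_target(force_direct=True, env=engine_env)
engine_override = resolve_target(force_engine=True, env={})
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.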

## 13 providers via `create_client`

If you want a first-class `LLMClient` object (for profiling, or to fit into
custom routing code), `create_client` covers every provider:

```python theme={null}
c = ot.create_client("openai",   "gpt-4o-mini")       # dedicated class
c = ot.create_client("deepseek", "deepseek-chat")     # UnifiedClient wrapper
c = ot.create_client("together", "meta-llama/Llama-3")  # UnifiedClient wrapper

out = c.generate("Hello", max_tokens=64, temperature=0.0)
print(out.text, out.latency_ms, out.tokens_used)
```

Five providers have dedicated classes (OpenAI, Anthropic, Google, Groq,
Mistral); seven more (DeepSeek, Perplexity, Cerebras, Sambanova,
Together, Fireworks, Cohere) route through a `UnifiedClient` that speaks
the OpenAI chat protocol. Bedrock, the thirteenth, is registered but raises
a clear error on construction because `UnifiedClient` does not handle AWS
SigV4 yet; use `ot.completion(..., force_engine=True)` instead.
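
Since client objects are suggested for profiling, here is a minimal sketch of what that could look like. It relies only on the `generate` method and the `latency_ms` and `tokens_used` fields shown above; `profile`, `FakeClient`, and `FakeResponse` are hypothetical stand-ins so the example runs without network access.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


def profile(client, prompts: List[str], **kwargs) -> Dict[str, float]:
    """Call client.generate on each prompt and aggregate latency and tokens."""
    outs = [client.generate(p, **kwargs) for p in prompts]
    return {
        "mean_latency_ms": mean(o.latency_ms for o in outs),
        "total_tokens": sum(o.tokens_used for o in outs),
    }


@dataclass
class FakeResponse:
    text: str
    latency_ms: float
    tokens_used: int


class FakeClient:
    """Offline stand-in: latency and token count scale with prompt length."""

    def generate(self, prompt: str, **kwargs) -> FakeResponse:
        return FakeResponse(text=prompt[::-1],
                            latency_ms=10.0 * len(prompt),
                            tokens_used=len(prompt))


stats = profile(FakeClient(), ["hi", "hello"], max_tokens=64, temperature=0.0)
```

Any object returned by `create_client` could be passed in place of `FakeClient()`, provided its responses expose the same two fields.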

## Public API

Everything `import opentracy as ot` exposes publicly:

```python theme={null}
# Core
ot.completion, ot.acompletion, ot.Router, ot.ModelResponse, ot.StreamChunk, ot.parse_model
# Multi-provider
ot.create_client, ot.LLMResponse
# Pricing
ot.model_cost, ot.get_model_info, ot.supported_models
# Trace ingestion
ot.add_trace, ot.add_traces, ot.import_traces
# Distillation — one-call + REST client
ot.distill, ot.DistillError, ot.Student, ot.StudentError
ot.Distiller, ot.TrainingClient, ot.DistillerError
# Local alias registry (distilled students map to logical model names)
ot.set_alias, ot.unset_alias, ot.list_aliases, ot.get_alias
# Version
ot.__version__
```

Lazy research APIs (`load_router`, `UniRouteRouter`, `RouterEvaluator`,
`LLMJudge`, ...) resolve via `__getattr__` — they import the first
time you touch them, so they don't slow down the initial `import opentracy`.
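
Lazy resolution through a module-level `__getattr__` is a standard Python feature (PEP 562). The sketch below demonstrates the mechanism on a throwaway module object; the `_LAZY` table and the name `lazy_pkg_sketch` are hypothetical, and the standard-library targets stand in for heavy research imports. It is not OpenTracy's actual `__init__.py`.

```python
import importlib
import types

# Hypothetical mapping: public name -> (module to import, attribute to fetch).
_LAZY = {"loads": ("json", "loads"), "sqrt": ("math", "sqrt")}

pkg = types.ModuleType("lazy_pkg_sketch")


def _lazy_getattr(name: str):
    """Module-level __getattr__: runs only when normal attribute lookup fails."""
    try:
        module_name, attr = _LAZY[name]
    except KeyError:
        raise AttributeError(
            f"module {pkg.__name__!r} has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(pkg, name, value)  # cache, so later lookups bypass __getattr__
    return value


# In a real package this function would simply be named __getattr__
# at the top level of __init__.py.
pkg.__getattr__ = _lazy_getattr

resolved_before = "sqrt" in vars(pkg)
root = pkg.sqrt(16)
resolved_after = "sqrt" in vars(pkg)
```

The first access pays the import cost and caches the result on the module; every later access is an ordinary attribute lookup.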

<Info>
  Legacy code using `import lunar_router as lr` keeps working via a
  backwards-compat shim that redirects to `opentracy` and emits a
  `DeprecationWarning`. New code should use `import opentracy as ot`.
</Info>

## Next

<CardGroup cols={2}>
  <Card title="Self-host" icon="server" href="/guides/self-host">
    Run engine + ClickHouse + UI locally or in your cloud.
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/completion">
    Every parameter and return value.
  </Card>
</CardGroup>
