# Quickstart

> Install, then one call: your first completion with cost + latency. No server, no setup.

By the end of this page, **in about three minutes**, you'll have made a
real LLM call, seen the cost and latency on the response, swapped providers
with one string change, and added automatic fallbacks. No server, no
Docker, no config files.

<Info>
  **What you need right now:** an OpenAI API key (or Anthropic, Groq, etc.
  — any of the 13 providers). Nothing else.
</Info>

## 1. Install — 30 seconds

```bash theme={null}
pip install opentracy
```

```bash theme={null}
export OPENAI_API_KEY=sk-...
```
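
Not sure which keys your shell already has? Here is a minimal sketch that checks the environment before you make a call. The variable names are the ones this page mentions. `PROVIDER_KEYS` and `configured_providers` are hypothetical helpers, not part of OpenTracy.

```python
import os

# Env var names as listed on this page; other providers use their own names.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    print("configured:", configured_providers() or "none")
```

If the list comes back empty, export at least one key before moving on to step 2.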

## 2. Your first call — 30 seconds

```python theme={null}
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

print(resp.choices[0].message.content)
print(f"cost: ${resp._cost:.6f}  latency: {resp._latency_ms:.0f}ms")
```

```text theme={null}
Hi there, friend!
cost: $0.000008  latency: 612ms
```

<Tip>
  **This is the hook.** Every response already carries `_cost` and
  `_latency_ms`. You didn't wire up any observability — it's on by default.
  `ot.completion` is OpenAI-compatible, so `resp.choices[0].message.content`,
  `resp.usage`, and streaming all work like you'd expect.
</Tip>
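
If you want those fields in one place for logging, a small helper is enough. This is a sketch, not an OpenTracy API: `summarize` is a hypothetical name, and `usage.total_tokens` is assumed from the OpenAI response shape. A stand-in response object is used so the snippet runs without an API key.

```python
from types import SimpleNamespace


def summarize(resp):
    """Collect the text, cost, latency, and token count from a response."""
    usage = getattr(resp, "usage", None)
    return {
        "text": resp.choices[0].message.content,
        "cost_usd": resp._cost,
        "latency_ms": resp._latency_ms,
        # Assumption: usage follows the OpenAI shape and exposes total_tokens.
        "total_tokens": getattr(usage, "total_tokens", None),
    }


# Stand-in response mirroring the sample output above (token count is made up).
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hi there, friend!"))],
    usage=SimpleNamespace(total_tokens=17),
    _cost=0.000008,
    _latency_ms=612.0,
)

print(summarize(fake))
```

With a real call, pass the object returned by `ot.completion(...)` instead of `fake`.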

## 3. Switch providers with one string — 1 minute

Same function, same message shape, different provider. No new SDK, no new
auth code:

```python theme={null}
# Anthropic
resp = ot.completion(
    model="anthropic/claude-haiku-4-5-20251001",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

# Groq (Llama, sub-second)
resp = ot.completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

# DeepSeek (low-cost chat model)
resp = ot.completion(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)
```

Each provider reads its own env var (`ANTHROPIC_API_KEY`, `GROQ_API_KEY`,
`DEEPSEEK_API_KEY`, ...). The 13-provider matrix is in the [completion
reference](/api-reference/completion#parameters).
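
Because every response carries `_cost` and `_latency_ms`, comparing providers on the same prompt is a short loop. The sketch below takes the completion function as a parameter so it runs offline. `compare` and `fake_complete` are hypothetical names, and the Groq numbers in the stub are made up for illustration.

```python
from types import SimpleNamespace


def compare(models, messages, complete):
    """Call each model once; return (model, cost, latency_ms) rows, cheapest first."""
    rows = []
    for model in models:
        resp = complete(model=model, messages=messages)
        rows.append((model, resp._cost, resp._latency_ms))
    return sorted(rows, key=lambda row: row[1])


def fake_complete(model, messages):
    """Offline stand-in for a completion call. Numbers are illustrative only."""
    table = {
        "openai/gpt-4o-mini": (0.000008, 612.0),
        "groq/llama-3.3-70b-versatile": (0.000005, 240.0),
    }
    cost, latency = table[model]
    return SimpleNamespace(_cost=cost, _latency_ms=latency)


messages = [{"role": "user", "content": "Say hello in three words."}]
models = ["openai/gpt-4o-mini", "groq/llama-3.3-70b-versatile"]

for model, cost, latency in compare(models, messages, fake_complete):
    print(f"{model:<32} ${cost:.6f}  {latency:.0f}ms")
```

To run it against real providers, pass `ot.completion` as the `complete` argument and make sure each provider's key is exported.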

## 4. Add fallbacks — 1 minute

Production calls that survive one provider being down:

```python theme={null}
resp = ot.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Draft a pithy tagline."}],
    fallbacks=[
        "anthropic/claude-sonnet-4-6",
        "deepseek/deepseek-chat",
    ],
    num_retries=1,
)

print(resp._provider)   # which one actually answered
```

If OpenAI rate-limits you, Anthropic picks up. If Anthropic is degraded,
DeepSeek does. You don't get paged.
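
Conceptually, a fallback chain is an ordered retry loop. The sketch below illustrates the idea only. It is not OpenTracy's implementation, and it assumes `num_retries` applies to each model in turn, which this page does not confirm. `complete_with_fallbacks` and `flaky` are hypothetical names.

```python
def complete_with_fallbacks(models, call, num_retries=1):
    """Try each model in order; give each one num_retries extra attempts."""
    last_error = None
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return model, call(model)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
    raise RuntimeError("all models failed") from last_error


def flaky(model):
    """Stand-in provider call: OpenAI is 'rate-limited', the others answer."""
    if model.startswith("openai/"):
        raise TimeoutError("simulated rate limit")
    return f"answer from {model}"


chain = ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "deepseek/deepseek-chat"]
provider, answer = complete_with_fallbacks(chain, flaky)
print(provider)
```

The returned model name plays the same role as `resp._provider` above: it tells you which link in the chain actually answered.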

## 5. Done. What you now have.

<CardGroup cols={2}>
  <Card title="OpenAI-compatible" icon="check">
    Same message format, same response shape. Any existing code moves over.
  </Card>

  <Card title="13 providers" icon="shuffle">
    Switch at any time. One string change, no auth rewrite.
  </Card>

  <Card title="Cost + latency by default" icon="dollar-sign">
    `_cost` and `_latency_ms` on every response. No setup.
  </Card>

  <Card title="Production fallbacks" icon="shield-halved">
    Survive provider outages without writing retry logic yourself.
  </Card>
</CardGroup>

## Where to go next

<CardGroup cols={2}>
  <Card title="Drop in over the OpenAI SDK" icon="arrow-right-arrow-left" href="/guides/drop-in-openai">
    Point existing OpenAI code at OpenTracy — zero library changes.
    <br />**\~2 minutes.**
  </Card>

  <Card title="Semantic auto-routing" icon="route" href="/concepts/auto-routing">
    Let the router pick the cheapest model that's good enough per prompt.
    <br />**\~5 minutes** (downloads \~100 MB of weights once).
  </Card>

  <Card title="Full observability" icon="magnifying-glass" href="/guides/self-host">
    Self-host to capture every trace in ClickHouse + a UI for analytics.
    <br />**\~30 minutes** (needs Docker).
  </Card>

  <Card title="Distill your own model" icon="wand-magic-sparkles" href="/concepts/distillation">
    Fine-tune a tiny student from your traffic. The cost-reduction wedge.
    <br />**\~2 hours** (needs self-host + a GPU).
  </Card>
</CardGroup>

## Optional: try the semantic auto-router

If you want to see the full pipeline in action, including the router
picking a model per prompt based on learned error profiles, load the
pre-trained router. This downloads \~100 MB of weights on first run and
caches them in `~/.local/share/opentracy/`.

```python theme={null}
import opentracy as ot

router = ot.load_router(cost_weight=0.5)

for prompt in [
    "What is the capital of France?",
    "Prove the square root of 2 is irrational.",
    "Write a haiku about autumn.",
]:
    d = router.route(prompt)
    print(f"[{d.selected_model:<24}] cluster={d.cluster_id:>3}  {prompt}")
```

```text theme={null}
[ministral-3b-latest     ] cluster= 84  What is the capital of France?
[gpt-4o                  ] cluster= 47  Prove the square root of 2 is irrational.
[ministral-3b-latest     ] cluster= 29  Write a haiku about autumn.
```

Easy trivia and a short haiku → a cheap small model. Math proof → a
strong model. **No rules from you.** See
[Auto-routing](/concepts/auto-routing) for the full picture.
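
To see how a batch of your own prompts would split across models, you can tally the router's decisions. This sketch relies only on `router.route(prompt)` and `selected_model` as shown above. `routing_mix` and `StubRouter` are hypothetical names, and the stub hard-codes the sample decisions so the snippet runs without downloading weights.

```python
from collections import Counter
from types import SimpleNamespace


def routing_mix(router, prompts):
    """Count how many prompts the router sends to each model."""
    return Counter(router.route(p).selected_model for p in prompts)


class StubRouter:
    """Stand-in router with hard-coded decisions mirroring the sample output."""

    def route(self, prompt):
        model = "gpt-4o" if prompt.startswith("Prove") else "ministral-3b-latest"
        return SimpleNamespace(selected_model=model)


prompts = [
    "What is the capital of France?",
    "Prove the square root of 2 is irrational.",
    "Write a haiku about autumn.",
]

for model, count in routing_mix(StubRouter(), prompts).most_common():
    print(f"{model:<24} {count}")
```

Swap `StubRouter()` for the object returned by `ot.load_router(...)` to run the same tally on real routing decisions.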
