> ## Documentation Index
> Fetch the complete documentation index at: https://opentracy.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# The Pipeline

> How requests, traces, datasets, and distilled models fit together into one closed loop

<img src="https://mintcdn.com/opentracy/GPH9CFICBELzB50g/images/diagram.jpeg?fit=max&auto=format&n=GPH9CFICBELzB50g&q=85&s=0d5da28aa6a9fb2587cb90eba9814eb8" alt="OpenTracy loop diagram: your app → engine → providers, with the trace/dataset/distillation/alias loop feeding back in" width="1280" height="714" data-path="images/diagram.jpeg" />

OpenTracy is one loop, four pieces. Every piece exists to feed the next one.
Once you understand the loop, the rest of the docs are detail.

## Why each piece exists

### ① Gateway

Your app needs *something* between it and thirteen different provider APIs.
OpenTracy is OpenAI-compatible, so you point your OpenAI SDK at the engine
URL and **none of your code changes**. On top of that, the gateway gives
you retries, fallbacks, provider-level observability, and the ability to
swap a model's implementation without redeploying.

Without a gateway: every new model adoption is a code change. With it:
models are a routing config.

### ② Traces

Every request the gateway handles is recorded: **prompt, response, model,
provider, cost in USD, latency in ms, token counts, and any metadata you
attach**. That's the trace. Traces are the single asset that makes the rest
of the pipeline possible — you can't distill from data you didn't capture.

Traces are stored in ClickHouse (self-hosted) and exposed via the REST API
and UI. See [Traces](/concepts/traces) for the schema and where they live.

### ③ Datasets

A single trace isn't useful. A thousand traces grouped by intent is a
dataset. OpenTracy clusters traces automatically using prompt embeddings,
names each cluster with an LLM ("JavaScript Concepts", "Invoice Classification",
etc.), and lets you curate: keep the good ones, drop the hallucinations,
add a judge's verdict per row.

A dataset is the bridge between "I have traffic" and "I can train something".
See [Datasets](/concepts/datasets).

### ④ Distillation

Pick a teacher (the expensive model — GPT-4o, Claude Sonnet, etc.) and a
student (a small open model — llama-3.2-1b, qwen3-0.6b, mistral-small).
The teacher generates high-quality labels for each prompt in your dataset;
the student is fine-tuned on those labels using the BOND (best-of-N
distillation) loss. You end up with a small LoRA adapter that **matches the
teacher on your specific workload**, at a fraction of the cost.

This is the wedge — the thing no generic gateway gives you. See
[Distillation](/concepts/distillation).

### ⑤ Auto-routing + alias swap

The router picks a model per prompt based on a learned error profile per
cluster. An **alias** (e.g. `model="smart"`) is a logical name that the
engine resolves at routing time. When a distilled student is ready, you
re-point the alias — the app keeps calling `model="smart"` and its cost
drops overnight.

This is how the loop closes. See [Auto-routing](/concepts/auto-routing).

## Concretely, what a week looks like

<Steps>
  <Step title="Day 0 — install and point traffic">
    `pip install opentracy` and change `base_url` in your OpenAI client
    to the engine URL. Your app now flows through OpenTracy.
  </Step>

  <Step title="Day 0 — same day, zero code changes ">
    Every request is captured as a trace. Cost and latency per call are
    visible in the UI. The auto-router is already picking cheaper models
    for easy prompts.
  </Step>

  <Step title="Day 2–5 — auto-clustering">
    Traces accumulate. The engine clusters prompts by intent. You review
    clusters in the UI, give them names, pick the ones worth distilling.
  </Step>

  <Step title="Day 5–7 — first distillation run">
    Submit a distillation job: teacher = your current expensive model,
    student = a small open model. Training runs on your GPU (or the engine's
    GPU if you're using a self-host with one). Output: a LoRA adapter.
  </Step>

  <Step title="Day 7 onward — alias swap">
    Point alias `smart` at the distilled student. The app keeps calling
    `model="smart"`. Cost curve drops. Rinse and repeat for other clusters.
  </Step>
</Steps>

## What OpenTracy does NOT do (yet)

* **Train from scratch.** Distillation is always teacher → student fine-tuning.
  If you need a model from raw text, use something else.
* **Handle vision or audio end-to-end.** The pipeline is chat-completion
  shaped (messages in, messages out). Images-in-prompts work; full
  multimodal training does not.
* **Replace your evaluation harness for novel research.** OpenTracy has
  evaluations for "is this distilled student as good as the teacher", not
  for "which of these 20 new models is best on a benchmark we just invented".

## Next

<CardGroup cols={2}>
  <Card title="Traces" icon="database" href="/concepts/traces">
    What's captured, the schema, where it lives.
  </Card>

  <Card title="Datasets" icon="diagram-project" href="/concepts/datasets">
    How traces become training-ready data.
  </Card>

  <Card title="Auto-routing" icon="route" href="/concepts/auto-routing">
    How the router picks a model per prompt.
  </Card>

  <Card title="Distillation" icon="wand-magic-sparkles" href="/concepts/distillation">
    Teacher, student, LoRA, alias swap.
  </Card>
</CardGroup>
