> ## Documentation Index
> Fetch the complete documentation index at: https://opentracy.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# ot.distill

> One-call distillation — train a custom student from a dataset and get back a callable Student

`ot.distill()` runs the full 4-phase BOND distillation pipeline
(data-generation → curation → training → export) **in-process** and
returns a callable [`Student`](#the-student-returned-by-distill). No
FastAPI service, no ClickHouse, no job-polling — if you just want to
train a student from a dataset and use it, this is the API.

For the long-running, queued, multi-tenant REST flow, see the
[`Distiller`](/api-reference/distiller) client — same engine, different
surface.

```python theme={null}
import opentracy as ot

student = ot.distill(
    dataset="tickets.jsonl",
    teacher="openai/gpt-4o",
    student="llama-3.2-1b",
    steps=60,
)

# Use it directly
print(student("Classify: please refund the double charge"))

# Or serve it under a logical name
student.deploy("ticket-classifier")
ot.completion(model="ticket-classifier", messages=[...])
```

<Note>
  `ot.distill()` needs `opentracy[distill]` and a CUDA GPU. It fails fast
  (before any teacher API spend) if `torch` can't import or
  `torch.cuda.is_available()` is `False`.
</Note>

## Signature

```python theme={null}
ot.distill(
    dataset: str | PathLike | list[dict] | Callable[[], Iterable[dict]],
    *,
    teacher: str = "openai/gpt-4o",
    student: str = "llama-3.2-1b",
    num_prompts: Optional[int] = None,
    steps: int = 500,
    n_samples: int = 4,
    bond_beta: float = 0.5,
    bond_gamma: float = 0.1,
    temperature: float = 0.8,
    output_dir: Optional[str | PathLike] = None,
    quantize: Optional[str | list[str]] = "q4_k_m",
    engine_url: Optional[str] = None,
    on_progress: Optional[Callable[[dict], None]] = None,
) -> Student
```

## Parameters

| Name          | Type                       | Description                                                                                                                                         |
| ------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dataset`     | path / list / callable     | A `.jsonl` / `.json` file, a list of `{"prompt": ..., "response": ...}` dicts, or a zero-arg callable that yields them (for streaming from traces). |
| `teacher`     | `str`                      | Provider-prefixed teacher model (e.g. `"openai/gpt-4o"`, `"anthropic/claude-sonnet-4-6"`).                                                          |
| `student`     | `str`                      | Short alias (`"llama-3.2-1b"`) or a full HF repo id. Aliases map via `opentracy.distillation.schemas.STUDENT_MODEL_MAP`.                            |
| `num_prompts` | `int?`                     | Cap on prompts consumed from `dataset`. Default: all of them.                                                                                       |
| `steps`       | `int`                      | Fine-tune optimizer steps. Small datasets need fewer (60–100 works for 20–50 rows).                                                                 |
| `n_samples`   | `int`                      | Best-of-N candidates generated by the teacher per prompt (BOND).                                                                                    |
| `bond_beta`   | `float`                    | BOND preference weight. Defaults fine for classification.                                                                                           |
| `bond_gamma`  | `float`                    | KL regularization strength. Raise if the student overfits.                                                                                          |
| `temperature` | `float`                    | Teacher sampling temperature for candidate generation.                                                                                              |
| `output_dir`  | `path?`                    | Where artifacts land. Defaults to a fresh temp dir.                                                                                                 |
| `quantize`    | `str \| list[str] \| None` | GGUF quantization(s) to export. `"q4_k_m"` (default) is \~500 MB; `None` skips the GGUF phase and returns a PEFT adapter.                           |
| `engine_url`  | `str?`                     | Override the Go engine URL used for teacher calls. If unset, a fresh engine is spawned for the duration and torn down at the end.                   |
| `on_progress` | `callable?`                | Fires once per pipeline phase transition plus any log line, with a dict `{"job_id", "phase", "status", "progress", "log"}`.                         |

## The `Student` returned by `distill()`

A thin wrapper around the freshest artifact.

```python theme={null}
student = ot.distill(...)

student.backend      # "gguf" if a quantization was exported, else "peft"
student.model_path   # absolute path to the .gguf file or the adapter dir
student.base_model   # HF repo id of the base model (needed for PEFT load)

student("Classify: refund please")          # direct inference
student.batch(["Classify: ...", "Classify: ..."])
student.generate(messages=[...])            # full OpenAI-shape response

student.save("./ticket-classifier-v1")      # copy artifact to a durable path
student.deploy("ticket-classifier")         # register under a local alias
```

After `.deploy(alias)`, calling `ot.completion(model=alias, ...)`
dispatches to this student locally — no provider call, no HTTP hop.

See the [`Student` reference](#student-class) below for the full API.

## Dataset shapes

All three are equivalent:

```python theme={null}
# 1. Path to a .jsonl (one dict per line) or .json (single list)
ot.distill(dataset="tickets.jsonl", ...)

# 2. List of dicts
rows = [
    {"prompt": "Classify: ...", "response": "billing"},
    {"prompt": "Classify: ...", "response": "technical"},
]
ot.distill(dataset=rows, ...)

# 3. Callable that yields dicts — useful for streaming from traces
def from_clickhouse():
    for trace in my_trace_source():
        yield {"prompt": trace["prompt"], "response": trace["label"]}

ot.distill(dataset=from_clickhouse, ...)
```

Row field aliases: `prompt` / `input` / `text` all work for the input;
`response` / `expected_output` all work for the gold answer.

## Progress callback

Useful for building a UI around the run or just keeping a tidy timeline
in a notebook:

```python theme={null}
last_phase = None
def on_progress(evt):
    global last_phase
    if evt["phase"] and evt["phase"] != last_phase:
        print(f"→ {evt['phase']}")
        last_phase = evt["phase"]
    if evt["log"]:
        print(f"   {evt['log']}")

ot.distill(dataset=rows, on_progress=on_progress)
```

## Graceful export fallback

If phase 4 (GGUF conversion) fails — for example, `llama.cpp` isn't
installed on the host — `ot.distill()` does **not** crash. It logs a
warning and returns a `Student(backend="peft", model_path=<adapter>)`
pointing at the LoRA adapter that was successfully trained. You still
get a working model; you just serve it via PEFT (1 GB base model in VRAM)
instead of a standalone GGUF file.

To force a GGUF-only path and raise on failure, call the REST-backed
[`Distiller`](/api-reference/distiller) instead.

## Errors — `DistillError`

`ot.distill()` raises `opentracy.DistillError` for pipeline failures.
Common causes:

| Message                                                   | Cause                                        | Fix                                              |
| --------------------------------------------------------- | -------------------------------------------- | ------------------------------------------------ |
| `Training needs PyTorch, but \`import torch\` failed.\`   | `torch` not installed.                       | `pip install -U opentracy[distill]`.             |
| `No CUDA GPU is visible to PyTorch.`                      | Training phase is CUDA-only.                 | Run on a GPU host; on Colab switch Runtime → T4. |
| `Dataset is empty — distill() needs at least one prompt.` | Empty dataset file or all rows filtered out. | Check the file format + field names.             |
| `Distillation requires the \`\[distill]\` extra.\`        | `opentracy` installed without training deps. | `pip install -U 'opentracy[distill]'`.           |

The preflight that raises the torch/CUDA errors can be bypassed in tests
by setting `OPENTRACY_SKIP_DISTILL_PREFLIGHT=1`. Don't set this in
production — it'll let a GPU-less job burn money on teacher calls before
dying in phase 3.

***

## Student class

`opentracy.Student` is callable and serializes to disk. It's what
`ot.distill()` returns, but you can also instantiate it yourself to load
a previously trained adapter.

```python theme={null}
from opentracy import Student

# Load a PEFT adapter from a path you saved earlier
student = Student(
    backend="peft",
    model_path="./ticket-classifier-v1",
    base_model="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
)

student("Classify: refund the double charge")
```

### Constructor

```python theme={null}
Student(
    backend: Literal["peft", "gguf"],
    model_path: str,
    base_model: Optional[str] = None,
    metadata: dict = {},
)
```

| Name         | Type                 | Description                                                                                                                                                                  |
| ------------ | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `backend`    | `"peft"` or `"gguf"` | `"peft"` loads a LoRA adapter onto a base model (requires `transformers`, `peft`, `torch`). `"gguf"` loads a GGUF file via `llama_cpp` — CPU-friendly, no base model needed. |
| `model_path` | `str`                | Absolute path to the adapter directory (PEFT) or the `.gguf` file (GGUF).                                                                                                    |
| `base_model` | `str?`               | HF repo id. Required for `peft` — read from `adapter_config.json` if omitted.                                                                                                |
| `metadata`   | `dict`               | Free-form metadata persisted to disk via `.save()` and surfaced through the alias registry.                                                                                  |

### Methods

#### `student(prompt, max_new_tokens=512, temperature=0.0, **kwargs) → str`

Single-prompt inference. Returns the text response.

#### `student.batch(prompts, max_new_tokens=512, temperature=0.0) → list[str]`

Many prompts in one call (sequential).

#### `student.generate(messages, *, max_tokens=512, temperature=0.0, top_p=None, stop=None) → dict`

Full-chat-shape generation. Returns an OpenAI-shaped dict — this is what
`ot.completion(model=<Student instance>, ...)` dispatches to internally.

#### `student.save(path) → Path`

Copy the artifact (adapter dir or `.gguf` file) to a durable location.
Returns the resolved destination path.

#### `student.deploy(alias, engine_url=None) → dict`

Register the student under `alias` in the local file-based registry
(`~/.opentracy/aliases.json`). After this, `ot.completion(model=alias,
...)` resolves to this student. If `engine_url` is provided, the call
also POSTs to the engine's `/v1/models/register` so server-side callers
see the alias — failures there only emit a warning.

### Preflights at load time

Loading a PEFT student checks:

* `torch`, `transformers`, and `peft` are importable — else raises
  `StudentError` pointing at `pip install opentracy[distill]`.
* `jinja2 >= 3.1` is present — else raises `StudentError` with the exact
  `{sys.executable} -m pip install -U 'jinja2>=3.1'` command for the
  interpreter currently running. Stale `jinja2 3.0.x` in a system
  Python is a common footgun when a `uvicorn` on PATH picks up a
  different interpreter than the one that has `opentracy` installed.

***

## Alias registry

Aliases map a logical name to a `Student`. The registry lives at
`~/.opentracy/aliases.json` (or `$OPENTRACY_DATA_HOME/aliases.json`) and
is read by `ot.completion()` on every call.

```python theme={null}
import opentracy as ot

# Register — equivalent to student.deploy("ticket-classifier")
ot.set_alias(
    "ticket-classifier",
    backend="peft",
    model_path="/abs/path/to/adapter",
    base_model="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
)

# Introspect
ot.list_aliases()
# {"ticket-classifier": {"backend": "peft", "model_path": "...", "base_model": "...", "metadata": {...}, "registered_at": "..."}}

ot.get_alias("ticket-classifier")  # single entry or None

# Remove
ot.unset_alias("ticket-classifier")  # True if removed, False if not registered
```

Any `ot.completion(model="ticket-classifier", ...)` call from any Python
process owned by the same user will resolve through this registry and
dispatch locally — no provider call, no HTTP hop, no shared state with a
remote engine.

The alias-swap pattern:

```python theme={null}
# Day 1 — alias points at a provider
ot.set_alias("smart", backend="peft", ...)          # or call through the engine

# Day 10 — distill a student from traffic
student = ot.distill(dataset=recent_traces, ...)
student.deploy("smart")                              # atomic re-point

# App code for "smart" never changed
ot.completion(model="smart", messages=[...])
```

## Serving the alias as an OpenAI-compatible HTTP endpoint

For a network-accessible endpoint, wrap the alias in a few lines of
FastAPI:

```python theme={null}
# serve.py
from fastapi import FastAPI
from pydantic import BaseModel
import opentracy as ot

app = FastAPI()

class ChatRequest(BaseModel):
    model: str = "ticket-classifier"
    messages: list
    max_tokens: int = 64
    temperature: float = 0.0

@app.post("/v1/chat/completions")
def complete(req: ChatRequest):
    return ot.completion(
        model=req.model,
        messages=req.messages,
        max_tokens=req.max_tokens,
        temperature=req.temperature,
    )
```

Launch with the interpreter that has `opentracy` installed (not a random
`uvicorn` on PATH):

```bash theme={null}
python -m pip install fastapi uvicorn
python -m uvicorn serve:app --host 0.0.0.0 --port 9000
```

## Next

<CardGroup cols={2}>
  <Card title="Distillation concepts" icon="book-open" href="/concepts/distillation">
    What the 4-phase pipeline is and when to retrain.
  </Card>

  <Card title="Distiller (REST client)" icon="code" href="/api-reference/distiller">
    Long-running, queued jobs against a remote engine.
  </Card>
</CardGroup>
