# Distillation endpoints

> Create jobs, poll status, fetch artifacts over plain HTTP

The management API on port `8000` exposes the distillation pipeline
over REST. Use it when you're driving training from somewhere the
Python client isn't available: CI jobs, a TypeScript backend, or a
Rust CLI, for example.

<Info>
  These are the REST endpoints behind the Python [`Distiller`](/api-reference/distiller)
  client. Any call you make through the SDK can also be made over HTTP.
</Info>

## POST /v1/distillation

Create a new distillation job. Returns immediately with `status: "pending"`;
training happens asynchronously on the engine host.

```http theme={null}
POST /v1/distillation HTTP/1.1
Host: localhost:8000
Content-Type: application/json
```

### Request body

```json theme={null}
{
  "tenant_id": "default",
  "name": "ticket-triage v1",
  "description": "distill GPT-4o onto a 1B llama for support tickets",
  "config": {
    "teacher_model": "openai/gpt-4o",
    "student_model": "llama-3.2-1b",
    "num_prompts": 500,
    "n_samples": 4,
    "training_steps": 100,
    "bond_beta": 0.5,
    "bond_gamma": 0.1,
    "temperature": 0.8,
    "export_gguf": true,
    "quantization_types": ["q4_k_m", "q8_0"]
  }
}
```

| Field                       | Type              | Notes                                            |
| --------------------------- | ----------------- | ------------------------------------------------ |
| `tenant_id`                 | `string`          | Workspace key. Defaults to `"default"`.          |
| `name`                      | `string`          | Human label.                                     |
| `description`               | `string`          | Optional.                                        |
| `config.teacher_model`      | `string`          | Provider-prefixed, e.g. `openai/gpt-4o`.         |
| `config.student_model`      | `string`          | HF-style ID, e.g. `llama-3.2-1b`.                |
| `config.num_prompts`        | `int`             | Cap on dataset rows to use.                      |
| `config.n_samples`          | `int`             | Best-of-N candidates per prompt (default 4).     |
| `config.training_steps`     | `int`             | Fine-tune steps.                                 |
| `config.bond_beta`          | `float`           | BOND preference weight (default 0.5).            |
| `config.bond_gamma`         | `float`           | KL regularization strength (default 0.1).        |
| `config.temperature`        | `float`           | Sampling temperature, e.g. `0.8`.                |
| `config.export_gguf`        | `bool`            | Convert trained adapter to GGUF after training.  |
| `config.quantization_types` | `array of string` | Quantization flavors, e.g. `["q4_k_m", "q8_0"]`. |
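
If you build the request body in code, a small typed wrapper can keep field names in one place. The sketch below is a hypothetical client-side helper, not part of any OpenTracy SDK: `DistillationConfig` and `build_create_body` are illustrative names, and treating the four fields without defaults as required is an assumption based on the request examples on this page. Optional fields left as `None` are omitted so that the server can apply its own defaults.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DistillationConfig:
    """Hypothetical client-side mirror of the documented `config` object."""

    # Assumed required: these four appear in every request example on this page.
    teacher_model: str
    student_model: str
    num_prompts: int
    training_steps: int
    # Optional knobs: None means "omit the key and let the server decide".
    n_samples: Optional[int] = None
    bond_beta: Optional[float] = None
    bond_gamma: Optional[float] = None
    temperature: Optional[float] = None
    export_gguf: Optional[bool] = None
    quantization_types: Optional[List[str]] = None


def build_create_body(
    name: str,
    config: DistillationConfig,
    tenant_id: str = "default",
    description: Optional[str] = None,
) -> dict:
    """Return a JSON-serializable body for POST /v1/distillation."""
    # Drop unset optional fields instead of sending explicit nulls.
    cfg = {key: value for key, value in asdict(config).items() if value is not None}
    body = {"tenant_id": tenant_id, "name": name, "config": cfg}
    if description is not None:
        body["description"] = description
    return body
```

Calling `build_create_body("demo", DistillationConfig("openai/gpt-4o-mini", "llama-3.2-1b", 50, 30))` yields a dictionary you can pass to `json.dumps` and send as the request body.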

### Response

```json theme={null}
{
  "id": "job_abc123",
  "name": "ticket-triage v1",
  "tenant_id": "default",
  "status": "pending",
  "phase": "initializing",
  "progress": {},
  "results": {},
  "cost_accrued": 0.0,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}
```

### Curl

```bash theme={null}
curl -X POST http://localhost:8000/v1/distillation \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo",
    "config": {
      "teacher_model": "openai/gpt-4o-mini",
      "student_model": "llama-3.2-1b",
      "num_prompts": 50,
      "training_steps": 30
    }
  }'
```
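
The same request can be sent from Python's standard library when the SDK isn't installed. This is a minimal sketch: `make_create_request` and `create_job` are hypothetical helper names, and the base URL simply reuses the `localhost:8000` host from the examples above. Building the `Request` is kept separate from sending it so the request can be inspected without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port taken from the examples above


def make_create_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/distillation request."""
    return urllib.request.Request(
        url=f"{base_url}/v1/distillation",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(body: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded job object."""
    request = make_create_request(body, base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running, `create_job({"name": "demo", "config": {...}})["id"]` gives you the job ID to use with the endpoints below.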

## GET /v1/distillation/{job_id}

Fetch the current state of a job.

```http theme={null}
GET /v1/distillation/job_abc123?tenant_id=default HTTP/1.1
Host: localhost:8000
```

### Response

```json theme={null}
{
  "id": "job_abc123",
  "status": "running",
  "phase": "data_generation",
  "progress": {
    "prompts_done": 120,
    "prompts_total": 500,
    "training_step": 0
  },
  "results": {},
  "cost_accrued": 0.82,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:03:41Z"
}
```

`status` moves through `pending` → `running` and ends in one of
`completed`, `failed`, or `cancelled`. `phase` is more granular:
`initializing` → `data_generation` → `curation` → `training` → `export`,
after which the job is done.

### Polling idiom

```bash theme={null}
while true; do
  state=$(curl -s "http://localhost:8000/v1/distillation/$JOB?tenant_id=default")
  status=$(echo "$state" | jq -r .status)
  echo "status=$status phase=$(echo "$state" | jq -r .phase)"
  case "$status" in completed|failed|cancelled) break ;; esac
  sleep 10
done
```
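
For longer-running automation, the same idea can be written as a Python helper with an overall timeout. This is a sketch under stated assumptions: `wait_for_job` is a hypothetical name, and `fetch_state` is any callable you supply that returns the decoded JSON from `GET /v1/distillation/{job_id}`. It treats all three terminal statuses (`completed`, `failed`, `cancelled`) as reasons to stop.

```python
import time
from typing import Callable

# Terminal statuses as listed in the status progression above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(
    fetch_state: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state.get("status") in TERMINAL_STATUSES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state.get('status')!r} after {timeout}s")
        sleep(interval)
```

Passing `sleep` as a parameter is only there to make the loop easy to exercise in tests; in normal use the default `time.sleep` is fine.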

## GET /v1/distillation — list jobs

```bash theme={null}
curl -s "http://localhost:8000/v1/distillation?tenant_id=default&limit=20"
```

### Response

```json theme={null}
{
  "jobs": [ /* same shape as GET /{id} */ ],
  "total": 42,
  "has_more": true
}
```

Supported query params: `tenant_id`, `status`, `limit` (max `100`),
`offset`.
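
To walk every job rather than a single page, you can advance `offset` until `has_more` is false. The sketch below assumes `offset` counts jobs skipped (not pages), which this page does not state explicitly. `list_url` and `iter_jobs` are hypothetical helper names, and `fetch_page` is a callable you provide that performs the HTTP request and returns the decoded response.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def list_url(base_url: str, tenant_id: str, limit: int, offset: int) -> str:
    """Build the list URL with the documented query parameters."""
    query = urlencode({"tenant_id": tenant_id, "limit": limit, "offset": offset})
    return f"{base_url}/v1/distillation?{query}"


def iter_jobs(fetch_page: Callable[[int, int], dict], limit: int = 100) -> Iterator[dict]:
    """Yield jobs page by page; fetch_page(limit, offset) returns one response."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        jobs = page.get("jobs", [])
        yield from jobs
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not jobs:
            return
        offset += len(jobs)  # assumption: offset is a count of jobs to skip
```

The default `limit` of `100` matches the documented maximum, which keeps the number of round trips low.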

## POST /v1/distillation/{job_id}/cancel

Cancel a running job. Safe at any phase — partial artifacts are kept.

```bash theme={null}
curl -X POST http://localhost:8000/v1/distillation/job_abc123/cancel
```

## GET /v1/distillation/{job_id}/artifacts

Fetch the file paths on the engine host for the trained adapter and
GGUF exports. Paths live under the engine's `OPENTRACY_DATA_DIR`; the
example response below is from a host where that appears to be `/app/data`.

```json theme={null}
{
  "adapter_path": "/app/data/distillation/job_abc123/adapter/",
  "gguf_paths": {
    "q4_k_m": "/app/data/distillation/job_abc123/gguf/model-q4_k_m.gguf",
    "q8_0":   "/app/data/distillation/job_abc123/gguf/model-q8_0.gguf"
  },
  "tokenizer_path": "/app/data/distillation/job_abc123/adapter/tokenizer.model",
  "config_path":    "/app/data/distillation/job_abc123/train_config.json"
}
```
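
A client usually wants one GGUF file out of the `gguf_paths` map. The helper below is a hypothetical sketch (`pick_gguf` is not an SDK function) that returns the first available quantization from a preference list, using only the response shape shown above.

```python
from typing import Optional, Sequence


def pick_gguf(
    artifacts: dict,
    preferred: Sequence[str] = ("q4_k_m", "q8_0"),
) -> Optional[str]:
    """Return the path of the first preferred quantization that was exported."""
    # gguf_paths may be absent or empty if GGUF export was not requested.
    paths = artifacts.get("gguf_paths") or {}
    for quant in preferred:
        if quant in paths:
            return paths[quant]
    return None
```

Keep in mind that the returned value is a path on the engine host, so the caller still needs some way to read that filesystem.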

## Errors

| Status | `error.code`           | Meaning                                                |
| ------ | ---------------------- | ------------------------------------------------------ |
| `400`  | `invalid_config`       | Unknown model, missing required field, or bad range.   |
| `402`  | `insufficient_credits` | Cost estimate exceeds tenant's budget.                 |
| `404`  | `job_not_found`        | `job_id` doesn't exist (or belongs to another tenant). |
| `409`  | `job_already_running`  | Attempted to mutate a terminal job.                    |
| `500`  | `training_error`       | Subprocess crashed — see `logs` endpoint for details.  |
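
On the client side, it can help to turn these responses into a single exception type. The sketch below assumes the error body is JSON shaped like `{"error": {"code": "..."}}`, which is inferred from the `error.code` column header rather than confirmed by this page. `DistillationApiError` and `parse_error` are hypothetical names.

```python
import json


class DistillationApiError(Exception):
    """Carries the HTTP status and the machine-readable error code."""

    def __init__(self, status: int, code: str):
        super().__init__(f"HTTP {status}: {code}")
        self.status = status
        self.code = code


def parse_error(status: int, raw_body: bytes) -> DistillationApiError:
    """Build an exception from an error response, tolerating odd bodies."""
    code = "unknown"
    try:
        # Assumed body shape: {"error": {"code": "job_not_found", ...}}
        code = str(json.loads(raw_body)["error"]["code"])
    except (ValueError, KeyError, TypeError):
        pass  # non-JSON or unexpected shape: keep "unknown"
    return DistillationApiError(status, code)


# Usage with urllib (not executed here):
#   except urllib.error.HTTPError as exc:
#       raise parse_error(exc.code, exc.read()) from exc
```

Callers can then branch on `err.code`, for example treating `job_not_found` differently from `insufficient_credits`.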
