# POST /v1/chat/completions

> The OpenAI-compatible completion endpoint — every provider, optional semantic routing

The gateway's main endpoint. Accepts OpenAI-format chat requests, routes
to any of the 13 supported providers, streams responses back, and writes
a trace to ClickHouse on the way out.

```http
POST /v1/chat/completions HTTP/1.1
Host: localhost:8080
Content-Type: application/json
```

## Request body

```json
{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "temperature": 0.7,
  "max_tokens": 200,
  "stream": false
}
```

| Field         | Type                                         | Description                                                                             |
| ------------- | -------------------------------------------- | --------------------------------------------------------------------------------------- |
| `model`       | `string` (required)                          | `provider/model` (e.g. `openai/gpt-4o`), a bare name, or `"auto"` for semantic routing. |
| `messages`    | `array` (required)                           | OpenAI-format messages. `role` is one of `user`, `assistant`, `system`, or `tool`.      |
| `temperature` | `float`                                      | `0.0`–`2.0`. If omitted, the provider default applies.                                  |
| `max_tokens`  | `int`                                        | Maximum number of output tokens.                                                        |
| `top_p`       | `float`                                      | Nucleus sampling.                                                                       |
| `stream`      | `bool`                                       | `true` → Server-Sent Events. See [Streaming](#streaming).                               |
| `stop`        | `string` or `array`                          | Stop sequence(s).                                                                       |
| `tools`       | `array`                                      | OpenAI-format tool definitions. The engine translates them to provider-native shapes.   |
| `tool_choice` | `"auto"`, `"none"`, `"required"`, or `object` | Force a specific tool or let the model pick.                                            |

Any OpenAI field not listed above is passed through to the provider
untouched.
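
As a rough illustration of the table above, the request body can be modeled as a TypeScript type with a small client-side check. This is a minimal sketch: the names `ChatRequest` and `validateChatRequest` are hypothetical and not part of any OpenTracy SDK, `content` is simplified to a plain string, and rejecting an empty `messages` array is an assumption rather than documented gateway behavior.

```typescript
// Hypothetical types mirroring the request table; not an official SDK.
type Role = "user" | "assistant" | "system" | "tool";

interface ChatMessage {
  role: Role;
  content: string; // simplified: plain-text content only
}

interface ChatRequest {
  model: string; // "provider/model", a bare name, or "auto"
  messages: ChatMessage[];
  temperature?: number; // 0.0 to 2.0
  max_tokens?: number;
  top_p?: number;
  stream?: boolean;
  stop?: string | string[];
  // Unlisted OpenAI fields are passed through, so extra keys are allowed.
  [extra: string]: unknown;
}

// Returns a list of problems; an empty list means the body looks well-formed.
function validateChatRequest(req: ChatRequest): string[] {
  const problems: string[] = [];
  if (req.model.trim() === "") {
    problems.push("model is required");
  }
  // Assumption: an empty messages array is treated as invalid.
  if (req.messages.length === 0) {
    problems.push("messages must not be empty");
  }
  const t = req.temperature;
  if (t !== undefined && (t < 0 || t > 2)) {
    problems.push("temperature must be between 0.0 and 2.0");
  }
  return problems;
}
```

Checking locally like this only catches obvious mistakes before the round trip; the gateway remains the authority on what it accepts.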

## Response body (non-streaming)

```json
{
  "id": "chatcmpl-xyz",
  "object": "chat.completion",
  "created": 1713465600,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 2,
    "total_tokens": 10
  },
  "cost": {
    "input_cost_usd": 0.0000012,
    "output_cost_usd": 0.0000012,
    "total_cost_usd": 0.0000024
  }
}
```

The `cost` object is an OpenTracy extra; the rest matches OpenAI exactly.
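
Because `cost` sits alongside the standard `usage` block, a client can total spend across calls without a separate pricing table. The sketch below assumes the field names shown in the sample response; the `summarizeSpend` helper and its interfaces are hypothetical, and `cost` is typed as optional so the same code still works against a backend that omits it.

```typescript
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface Cost {
  input_cost_usd: number;
  output_cost_usd: number;
  total_cost_usd: number;
}

// Only the fields this sketch needs from a completion response.
interface CompletionWithCost {
  usage: Usage;
  cost?: Cost; // assumption: may be absent on non-OpenTracy backends
}

// Adds up tokens and USD cost over a batch of responses.
function summarizeSpend(responses: CompletionWithCost[]): { tokens: number; usd: number } {
  let tokens = 0;
  let usd = 0;
  for (const r of responses) {
    tokens += r.usage.total_tokens;
    usd += r.cost?.total_cost_usd ?? 0; // missing cost counts as zero
  }
  return { tokens, usd };
}
```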

## Response headers

| Header                       | Example       | Meaning                                             |
| ---------------------------- | ------------- | --------------------------------------------------- |
| `X-OpenTracy-Selected-Model` | `gpt-4o-mini` | Which concrete model answered.                      |
| `X-OpenTracy-Cluster-ID`     | `84`          | Semantic cluster assigned to the prompt (0–99).     |
| `X-OpenTracy-Expected-Error` | `0.08`        | Predicted error rate for the selected model.        |
| `X-OpenTracy-Routing-Ms`     | `1.3`         | Time spent in routing decision.                     |
| `X-OpenTracy-Session-Id`     | `sess_af91`   | For multi-turn tool calls — echo back on next call. |
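
To make the routing headers easier to log or assert on, they can be parsed into a typed object. The following is a sketch under assumptions: `readRoutingInfo` and `RoutingInfo` are hypothetical names, and the input is anything with a `get(name)` method, which is the shape of the `Headers` object returned by `fetch`. Headers that are missing or non-numeric come back as `null`.

```typescript
// Minimal structural type; fetch's Headers object satisfies it.
interface HeaderSource {
  get(name: string): string | null;
}

interface RoutingInfo {
  selectedModel: string | null;
  clusterId: number | null;
  expectedError: number | null;
  routingMs: number | null;
  sessionId: string | null;
}

// Parses a numeric header value; returns null when absent or not a number.
function parseNumberHeader(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readRoutingInfo(headers: HeaderSource): RoutingInfo {
  return {
    selectedModel: headers.get("X-OpenTracy-Selected-Model"),
    clusterId: parseNumberHeader(headers.get("X-OpenTracy-Cluster-ID")),
    expectedError: parseNumberHeader(headers.get("X-OpenTracy-Expected-Error")),
    routingMs: parseNumberHeader(headers.get("X-OpenTracy-Routing-Ms")),
    sessionId: headers.get("X-OpenTracy-Session-Id"),
  };
}
```

With `fetch`, this would be called as `readRoutingInfo(response.headers)`.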

## Curl

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in three words."}]
  }'
```

## TypeScript / Node (openai SDK)

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "any", // engine holds provider keys
});

const resp = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(resp.choices[0].message.content);
```

## Go (net/http)

```go
// Requires imports: "bytes", "log", "net/http".
body := []byte(`{
  "model": "openai/gpt-4o-mini",
  "messages": [{"role": "user", "content": "Hello"}]
}`)
req, err := http.NewRequest("POST",
    "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
if err != nil {
    log.Fatal(err)
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err) // resp is nil on error, so check before deferring Close
}
defer resp.Body.Close()
```

## Semantic auto-routing

Pass `"model": "auto"` and the engine picks per-prompt based on its learned
cluster/error profiles:

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Prove √2 is irrational."}]
  }'
```

The response headers show you which model was picked:

```
X-OpenTracy-Selected-Model: gpt-4o
X-OpenTracy-Cluster-ID: 47
X-OpenTracy-Expected-Error: 0.01
```

See [`/v1/route`](/api-reference/rest/route) if you want the decision
without generating a completion.

## Streaming

Set `"stream": true`. Responses come back as Server-Sent Events:

```bash
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "count to 5"}],
    "stream": true
  }'
```

```
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":"1"},"index":0}]}
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":", 2"},"index":0}]}
...
data: [DONE]
```

The engine translates Anthropic and Bedrock event-streams into OpenAI's
SSE format, so clients don't need per-provider logic.
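
On the client side, the stream shown above reduces to reading `data:` lines until `[DONE]`. Here is a minimal sketch of that parsing step; `collectStreamText` is a hypothetical helper, and it assumes the input already contains whole lines. A real client reading from a socket would need to buffer partial lines between network chunks first.

```typescript
// Only the fields this sketch reads from each streamed chunk.
interface StreamChunk {
  id?: string;
  choices: { index: number; delta?: { content?: string } }[];
}

// Concatenates delta content for choice 0 from a block of complete SSE lines.
function collectStreamText(sseBody: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const rawLine of sseBody.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true; // end-of-stream sentinel
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    for (const choice of chunk.choices) {
      if (choice.index === 0) {
        text += choice.delta?.content ?? "";
      }
    }
  }
  return { text, done };
}
```

Clients built on the `openai` SDK do not need this, since the SDK handles SSE parsing itself; the sketch is mainly useful for raw HTTP clients.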

## Tool calls

Pass OpenAI-format `tools`. The engine maps them to provider-native
shapes (Anthropic `tools`, Gemini function declarations, etc.):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }]
  }'
```

The response comes back with `tool_calls` in the assistant message.
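
Once `tool_calls` arrive, the client runs each tool and sends the results back as `tool` messages on the next request. The sketch below assumes the standard OpenAI shapes: each call carries an `id` and a `function` object whose `arguments` field is a JSON-encoded string, and each result message references its call through `tool_call_id`. `runToolCalls` and the handler map are hypothetical names.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Runs each requested tool and builds the messages to append to the
// conversation. Unknown tools and thrown errors are reported back to the
// model as an error object instead of aborting the whole turn.
function runToolCalls(
  calls: ToolCall[],
  handlers: Record<string, ToolHandler>,
): ToolResultMessage[] {
  return calls.map((call): ToolResultMessage => {
    const handler = handlers[call.function.name];
    let result: unknown;
    if (handler === undefined) {
      result = { error: `unknown tool: ${call.function.name}` };
    } else {
      try {
        result = handler(JSON.parse(call.function.arguments));
      } catch (err) {
        result = { error: String(err) };
      }
    }
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}
```

The returned messages would be appended after the assistant message that contained the `tool_calls`, and the extended `messages` array sent in a follow-up request.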

## Errors

| Status | `error.code`      | Meaning                                                |
| ------ | ----------------- | ------------------------------------------------------ |
| `400`  | `invalid_request` | Malformed body / missing required field.               |
| `401`  | `unauthorized`    | Bearer token missing or invalid (if auth is enabled).  |
| `404`  | `model_not_found` | Unknown model string.                                  |
| `429`  | `rate_limit`      | Upstream provider rate-limited the request.            |
| `500`  | `provider_error`  | Provider returned an error; body echoes their message. |
| `504`  | `timeout`         | Upstream took longer than the configured timeout.      |
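
A client can use the table above to decide when a retry is worthwhile. The sketch below retries only on `429` and `504`, with capped exponential backoff. The helper names, the default delays, and the choice to treat `500` as non-retryable are all assumptions of this example, not documented gateway behavior. The error body is assumed to carry `error.code`, as the column header implies.

```typescript
// Shape implied by the `error.code` column; no other fields are assumed.
interface GatewayErrorBody {
  error?: { code?: string };
}

// 429 and 504 are upstream conditions that may clear on their own.
// Assumption: 500 is not retried, since the provider error may be permanent.
function isRetryable(status: number): boolean {
  return status === 429 || status === 504;
}

// Exponential backoff: baseMs, then 2x, 4x, ... capped at capMs.
function backoffMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Extracts error.code from a response body, or null if it is not present.
function errorCode(bodyText: string): string | null {
  try {
    const body = JSON.parse(bodyText) as GatewayErrorBody;
    return body.error?.code ?? null;
  } catch {
    return null; // body was not JSON or not an object
  }
}
```

A retry loop would call `isRetryable(response.status)` after each failed attempt and wait `backoffMs(attempt)` milliseconds before trying again.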