API endpoints
Base URL: https://api.cortexlayer.dev. All POST bodies are JSON; all responses are JSON or, for streaming endpoints, Server-Sent Events.
Authentication
Two credential types — pick by call site:
| Credential | Header | Where it’s safe to use | Endpoints |
|---|---|---|---|
| API key (ck_live_…) | Authorization: Bearer <key> | Server-side only; never ship to a browser | All admin/CRUD endpoints; /v1/widget/session mint |
| Session token (cs_…) | X-Cortex-Session: <token> | Browser-side, scoped to one agent + one origin, 15 min TTL | /v1/chat/stream (widget path) |
API keys are HMAC-SHA256 hashed at rest with a server-side pepper; the prefix is indexed for fast lookup, the secret is constant-time compared. Session tokens are opaque — they live in Redis and are revocable by deleting the entry.
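The storage scheme described above can be illustrated with a minimal sketch using only the Python standard library. This is not the service's actual implementation: the pepper value, the 12-character prefix length, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical pepper; a real deployment would load this from a secret store.
PEPPER = b"example-server-side-pepper"

# Hypothetical prefix length used as the lookup index.
PREFIX_LEN = 12


def hash_api_key(api_key: str, pepper: bytes = PEPPER) -> str:
    """Return the hex HMAC-SHA256 of the key, keyed by the server-side pepper."""
    return hmac.new(pepper, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def store_key(index: dict, api_key: str) -> None:
    """Store only the hash, indexed by the key prefix; the raw key is never kept."""
    index[api_key[:PREFIX_LEN]] = hash_api_key(api_key)


def verify_key(index: dict, presented: str) -> bool:
    """Look up by prefix, then compare the recomputed hash in constant time."""
    stored = index.get(presented[:PREFIX_LEN])
    if stored is None:
        return False
    return hmac.compare_digest(hash_api_key(presented), stored)
```

`hmac.compare_digest` avoids leaking, through timing, how many leading characters of the hash matched.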
Agents
POST /v1/agents
Create an agent. Auth: API key. Bodies use camelCase and additionalProperties: false is enforced — unknown fields are rejected.
Minimal runnable body (copy/paste into curl):

    {
      "name": "Support bot",
      "systemPrompt": "You are a friendly support agent for ACME Inc.",
      "modelPolicy": { "provider": "gemini", "model": "gemini-2.5-flash" },
      "budget": { "maxCostUsd": 0.05, "maxSteps": 8, "wallClockMs": 30000 },
      "tools": [],
      "allowedDomains": [],
      "allowedOrigins": ["https://your-site.com"]
    }

Field reference (omit optional fields entirely; do not send null):
| Field | Type | Required | Constraints |
|---|---|---|---|
| name | string | yes | 1–128 chars |
| systemPrompt | string | yes | 1–10000 chars |
| modelPolicy.provider | enum | yes | one of "gemini", "openai", "anthropic" |
| modelPolicy.model | string | yes | see Models; must be in tenant’s plan allowlist |
| modelPolicy.temperature | number | no | 0–2 |
| modelPolicy.maxOutputTokens | int | no | 1–16384 |
| fallback.provider / fallback.model | object | no | both fields required together if present |
| knowledgeBaseId | uuid | no | must reference a knowledge base owned by the tenant |
| budget.maxCostUsd | number | yes | 0–100 |
| budget.maxSteps | int | yes | 1–32 |
| budget.wallClockMs | int | yes | 1000–120000 |
| tools | array | yes | ToolDefinition[], max 16; pass [] for none |
| allowedDomains | array | yes | strings ≤253 chars, max 32; outbound HTTP allowlist for tools |
| allowedOrigins | array | no | full origin URLs (scheme://host[:port]), max 32 entries |
Returns the created Agent — its id field is a UUID.
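As a rough sketch of how a server-side caller might assemble this request, the helper below builds (but does not send) the POST using only the standard library. The function names and the `Content-Type: application/json` header are assumptions; the URL, auth header, and body fields come from the reference above.

```python
import json
import urllib.request

BASE_URL = "https://api.cortexlayer.dev"


def strip_none(value):
    """Recursively drop None values so optional fields are omitted, not sent as null."""
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(v) for v in value]
    return value


def build_create_agent_request(api_key: str, agent: dict) -> urllib.request.Request:
    """Build the POST /v1/agents request; the caller decides when to send it."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/agents",
        data=json.dumps(strip_none(agent)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; bodies are JSON
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because unknown fields are rejected, it is safer to build `agent` from an explicit set of documented keys than to forward arbitrary user input.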
PATCH /v1/agents/:id
Partial update. Same body shape, all fields optional. Pass "fallback": null to clear a previously set fallback (vs. omitting, which leaves it untouched).
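The difference between omitting a field and sending null is easy to lose in client code, where "not set" and `None` often collapse into one. One possible approach, sketched here with a hypothetical `UNSET` sentinel and helper name, keeps the two apart when building the PATCH body. Only `fallback` is documented as accepting null; the sketch does not assume other fields do.

```python
import json


class _Unset:
    """Sentinel type meaning 'leave this field untouched on the server'."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()


def build_patch_body(**fields) -> str:
    """Serialize only explicitly set fields.

    UNSET fields are omitted from the body; None is serialized as JSON null.
    """
    return json.dumps({k: v for k, v in fields.items() if v is not UNSET})
```

For example, `build_patch_body(name="Support bot v2", fallback=UNSET)` renames the agent and leaves the fallback alone, while `build_patch_body(fallback=None)` clears it.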
Dashboard playground sessions
POST /v1/agents/:id/playground-session mints a session for dashboard testing only. It does not require an Origin header and is authorized by the agents:write scope instead. It is not for public widget embeds; use /v1/widget/session for all production widget integrations.
Widget sessions
POST /v1/widget/session
Mint a short-lived browser-safe token bound to one agent + the requesting origin. Auth: API key.

    { "agentId": "<agent-uuid>" }

The Origin header is required. The server reads it and validates against the agent’s allowedOrigins. Requests without it are rejected:

    {
      "code": "origin_not_allowed",
      "message": "request is missing the Origin header; widget sessions require a browser origin",
      "http": 403
    }

Returns 201 Created:

    {
      "sessionToken": "cs_...",
      "expiresAt": "2026-04-21T15:30:00Z",
      "messageCap": 50
    }

The token’s TTL is 15 minutes; messageCap is the total messages allowed on this session before a fresh mint is required.
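Since a session can run out either by time or by message count, a client may want to track both and re-mint slightly early. The sketch below is one way to do that; the class name, the `messages_sent` counter, and the 30-second safety margin are assumptions, while the three response fields are those shown above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class WidgetSession:
    token: str
    expires_at: datetime
    message_cap: int
    messages_sent: int = 0  # incremented by the caller after each chat message

    @classmethod
    def from_response(cls, payload: dict) -> "WidgetSession":
        # expiresAt is shown as an ISO 8601 UTC timestamp with a trailing "Z".
        expires = datetime.fromisoformat(payload["expiresAt"].replace("Z", "+00:00"))
        return cls(payload["sessionToken"], expires, payload["messageCap"])

    def needs_refresh(
        self,
        now: Optional[datetime] = None,
        margin: timedelta = timedelta(seconds=30),  # assumed safety margin
    ) -> bool:
        """True when the token is about to expire or the message cap is used up."""
        now = now or datetime.now(timezone.utc)
        return now >= self.expires_at - margin or self.messages_sent >= self.message_cap
```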
Server-side proxy integration: If you proxy this call from your own backend (Express, Fastify, Django, Go, etc.), forward the inbound Origin header verbatim — don’t strip it or replace it with your server’s origin. The allowlist check is strict and rejects mismatches. See the quickstart for language-specific proxy examples.
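A framework-neutral sketch of the proxy step follows. It takes the inbound request headers as a plain mapping and builds the upstream mint request with the Origin copied across unchanged. The function name and the `Content-Type` header are assumptions, and how the inbound headers are obtained depends on your framework.

```python
import json
import urllib.request
from typing import Mapping

BASE_URL = "https://api.cortexlayer.dev"


def build_session_mint_request(
    api_key: str, agent_id: str, inbound_headers: Mapping[str, str]
) -> urllib.request.Request:
    """Build POST /v1/widget/session, forwarding the browser's Origin verbatim."""
    # Header names are case-insensitive, so match without regard to case.
    origin = next(
        (v for k, v in inbound_headers.items() if k.lower() == "origin"), None
    )
    if not origin:
        # Fail early rather than let the upstream call be rejected with a 403.
        raise ValueError("inbound request has no Origin header")
    return urllib.request.Request(
        f"{BASE_URL}/v1/widget/session",
        data=json.dumps({"agentId": agent_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; bodies are JSON
            "Origin": origin,  # forwarded as received, never rewritten
        },
        method="POST",
    )
```

The API key stays on the server; only the minted `sessionToken` from the response should be returned to the browser.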
Chat
POST /v1/chat/stream
Streaming chat (Server-Sent Events). Auth: session token (widget path, header X-Cortex-Session) or API key (server path, header Authorization: Bearer). The sessionToken is also required in the body for widget calls.
Minimal runnable body:

    {
      "agentId": "00000000-0000-0000-0000-000000000000",
      "sessionToken": "cs_...",
      "messages": [{ "role": "user", "content": "Hi" }]
    }

Field reference:
| Field | Type | Required | Constraints |
|---|---|---|---|
| agentId | uuid | yes | none |
| sessionToken | string | yes | 16–256 chars; same value as the X-Cortex-Session header |
| messages | array | yes | 1–64 items, each { role, content } |
| conversationId | uuid | no | server creates one if omitted |
| model | string | no | per-call override of modelPolicy.model |
| provider | enum | no | per-call override of modelPolicy.provider |
| temperature | number | no | 0–2 |
| maxOutputTokens | int | no | 1–16384 |
| requestId | string | no | 8–128 chars; for client-side correlation |
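The constraints in the field reference can be checked on the client before opening a stream, which gives faster feedback than a round trip. The validator below is a sketch covering only the limits listed above. The function name and messages are hypothetical, and the server's own validation remains authoritative.

```python
import uuid


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_chat_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the documented limits hold."""
    problems: list[str] = []
    try:
        # Note: uuid.UUID accepts a few forms the server might not (e.g. no hyphens).
        uuid.UUID(str(body.get("agentId")))
    except ValueError:
        problems.append("agentId must be a UUID")
    token = body.get("sessionToken")
    if not isinstance(token, str) or not 16 <= len(token) <= 256:
        problems.append("sessionToken must be a string of 16-256 chars")
    messages = body.get("messages")
    if not isinstance(messages, list) or not 1 <= len(messages) <= 64:
        problems.append("messages must be an array of 1-64 items")
    elif any(not isinstance(m, dict) or not {"role", "content"} <= set(m) for m in messages):
        problems.append("each message needs role and content")
    if "temperature" in body:
        t = body["temperature"]
        if not (_is_number(t) and 0 <= t <= 2):
            problems.append("temperature must be a number in 0-2")
    if "maxOutputTokens" in body:
        n = body["maxOutputTokens"]
        if not (isinstance(n, int) and not isinstance(n, bool) and 1 <= n <= 16384):
            problems.append("maxOutputTokens must be an int in 1-16384")
    if "requestId" in body:
        r = body["requestId"]
        if not isinstance(r, str) or not 8 <= len(r) <= 128:
            problems.append("requestId must be a string of 8-128 chars")
    return problems
```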
The response is always text/event-stream — there is no non-streaming mode. Frame types:
| Type | Payload | Notes |
|---|---|---|
| start | requestId, runId, provider, model, conversationId | First frame. |
| delta | text | Append to the current assistant bubble. |
| tool_call | name, args | Tool runtime is about to execute. |
| tool_result | name, output | Result of the preceding tool_call. |
| usage | inputTokens, outputTokens, costUsd | Emitted near end-of-run for cost reporting. |
| error | code, message | Recoverable; the run is over. |
| done | finishReason | Last frame. |
Errors that prevent the stream from starting are returned as JSON with the standard envelope.
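A consumer of the stream needs to split the SSE text into frames and react to each type. The sketch below shows one way to do that over an iterable of lines. It assumes each event carries a single JSON object in its `data:` field with a `type` key alongside the payload fields. The exact wire framing is not specified above, so adjust the parsing if the frame type is carried in the SSE `event:` field instead.

```python
import json
from typing import Iterable, Iterator


def iter_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON frame per SSE event (events end at a blank line)."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)


def collect_reply(frames: Iterable[dict]) -> str:
    """Concatenate delta text until done; raise if an error frame arrives."""
    parts: list[str] = []
    for frame in frames:
        if frame["type"] == "delta":
            parts.append(frame["text"])
        elif frame["type"] == "error":
            raise RuntimeError(f'{frame["code"]}: {frame["message"]}')
        elif frame["type"] == "done":
            break
    return "".join(parts)
```

Frames of other types (start, tool_call, tool_result, usage) are ignored here; a real widget would typically surface them in the UI or in logs.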
Rate limits
Limits stack — the strictest one wins. The numbers below are the production defaults; per-tenant overrides may apply on paid plans.
| Scope | Limit |
|---|---|
| Per-IP (global) | 100 req/min |
| Per-API-key | 60 req/min sliding window |
| Per-IP (chat stream) | 20 msg/min |
| Per-tenant | 10 simultaneous streams |
| Per-tenant/day | $2 soft (warn header) / $5 hard (429) |
| Per-session | 50 messages total over 15 min TTL |
A 429 response carries Retry-After (seconds) and the standard error envelope (see below) with code: "rate_limit_exceeded" or code: "plan_limit_exceeded".
You can read current consumption via GET /v1/usage (auth: API key with billing:read scope).
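The 429 behaviour described above suggests a small retry wrapper that honours Retry-After. The sketch below is transport-agnostic: `send` is a hypothetical callable returning `(status, headers, parsed_body)`. Treating `plan_limit_exceeded` as not worth retrying is an assumption, on the reasoning that a plan-level cap is unlikely to clear within seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# (HTTP status, response headers, parsed JSON body)
Response = Tuple[int, Mapping[str, str], dict]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry rate-limited calls, waiting Retry-After seconds between attempts."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        # Assumption: only rate_limit_exceeded is worth retrying automatically.
        if body.get("code") != "rate_limit_exceeded" or attempt == max_attempts:
            return status, headers, body
        # Retry-After is documented as seconds; default to 1 if it is missing.
        sleep(float(headers.get("Retry-After", "1")))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the function easy to test and lets callers swap in their own scheduling.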
Errors
All error responses share one top-level envelope — fields are not nested under an error object:

    {
      "code": "schema_validation_failed",  // stable machine-readable code
      "message": "...",                    // human-readable; do not parse
      "http": 400,                         // HTTP status, mirrored for convenience
      "details": { /* code-specific */ }   // optional; e.g. Ajv issues array for validation errors
    }

The request id is returned as the x-request-id response header, not in the body; include that header value in support tickets.
Codes are stable across versions; messages are not.
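Because the envelope is flat and the codes are stable, client code can map it to a single exception type and branch on `code` rather than on the message text. A minimal sketch, with a hypothetical class and function name:

```python
import json
from typing import Any, Mapping, Optional


class CortexApiError(Exception):
    """Client-side wrapper for the error envelope (name is illustrative)."""

    def __init__(
        self,
        code: str,
        message: str,
        http: int,
        details: Optional[Any] = None,
        request_id: Optional[str] = None,
    ) -> None:
        super().__init__(f"{code} (HTTP {http})")
        self.code = code
        self.message = message
        self.http = http
        self.details = details
        self.request_id = request_id


def parse_error(body: bytes, headers: Mapping[str, str]) -> CortexApiError:
    """Build an exception from the top-level envelope and the x-request-id header."""
    payload = json.loads(body)
    # Header names are case-insensitive, so match without regard to case.
    request_id = next(
        (v for k, v in headers.items() if k.lower() == "x-request-id"), None
    )
    return CortexApiError(
        payload["code"],
        payload["message"],
        payload["http"],
        payload.get("details"),  # optional field
        request_id,
    )
```

Callers can then write `if err.code == "rate_limit_exceeded": ...` and log `err.request_id` for support tickets.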