Management API · HostYourAI

Use the Management API to provision HostYourAI per tenant, build internal tooling, or integrate with a customer-facing platform. All endpoints live on the same host as the app (https://hostyourai.com) and follow standard REST conventions.

1. Authentication

Send a Bearer token with every request:

Authorization: Bearer {your_personal_access_token}

Get a token from your dashboard: Settings → API tokens → Create token. The plaintext token is shown only once, so store it securely. Tokens never expire automatically; revoke them when they are no longer needed.
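Any HTTP client can attach the header. Below is a minimal sketch that uses only Python's standard library and assumes JSON responses. build_request and send are hypothetical helper names, not part of an official SDK.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com"  # host stated at the top of this page


def build_request(method, path, token, body=None):
    """Build (without sending) a Management API request carrying the Bearer token."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies, as used by POST /api/dashboard/deploy and friends
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(req):
    """Send a request and decode its JSON body (None for an empty body).

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, send(build_request("GET", "/api/dashboard/instances", os.environ["PAT"])) lists your instances. This reads the token from the same PAT variable used in the end-to-end example and does not hard-code it.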

Two token types coexist:

Personal Access Token (this page): Bearer token for the Management API. It carries the same authority as your dashboard session.
Router API key (hyai-rt-*): for OpenAI-compatible inference traffic at /api/v1/chat/completions. See Pricing and the in-app /router page.
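Tooling that handles several credentials can tell them apart by prefix. Router keys (hyai-rt-*) also match the per-instance key prefix (hyai-*, section 2), so the code must test the longer prefix first. This is a sketch: the PAT format is not documented on this page, and classifying every other string as a PAT is an assumption. credential_kind is a hypothetical helper.

```python
def credential_kind(token: str) -> str:
    """Classify a HostYourAI credential by prefix (hypothetical helper)."""
    # Test the longer router prefix first: every "hyai-rt-" key also starts with "hyai-".
    if token.startswith("hyai-rt-"):
        return "router"      # inference via /api/v1/chat/completions
    if token.startswith("hyai-"):
        return "instance"    # per-instance inference key (section 2)
    return "management"      # assumption: anything else is a personal access token
```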

2. Instance lifecycle

List your instances

GET /api/dashboard/instances

Returns an array of your instances with status, region, model, endpoint, and cost.
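One use is a quick health overview of your fleet. This sketch assumes each element has at least the id and status keys that appear in the end-to-end example (section 9). The full response schema is not documented here, and summarize is a hypothetical helper.

```python
from collections import Counter


def summarize(instances):
    """Count instances per status and list the ids of failed ones.

    Assumes each element has at least "id" and "status" keys.
    """
    by_status = Counter(inst["status"] for inst in instances)
    failed_ids = [inst["id"] for inst in instances if inst["status"] == "failed"]
    return by_status, failed_ids
```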

Deploy a new instance

POST /api/dashboard/deploy
Content-Type: application/json

{
  "model_id":      "llama-3.1-8b",
  "region":        "EU",
  "provider":      "vastai",
  "instance_type": "text"
}

Returns the created instance row with status: "deploying". A background job allocates the GPU and pulls the model; endpoint_url is populated once vLLM is ready (typically within minutes). Poll GET /api/dashboard/instances/{id} until status is running. Webhooks are not yet available (see section 7).
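Deployment is asynchronous, so a provisioning script typically posts the deploy and then polls. The sketch below takes an injected send(method, path, body) transport that returns parsed JSON. That transport is hypothetical (for example, a thin wrapper around any HTTP client), and injecting it lets you test the control flow offline. The 5-second interval and 30-minute timeout are assumptions, not documented values. Treating stopped as a terminal outcome during a deploy is also an assumption.

```python
import time

TERMINAL = {"running", "failed", "stopped"}  # statuses after which polling stops


def deploy_and_wait(send, model_id, region="EU", provider="vastai",
                    instance_type="text", interval=5.0, timeout=1800.0,
                    sleep=time.sleep, clock=time.monotonic):
    """POST a deploy, then poll until the instance leaves pending/deploying."""
    inst = send("POST", "/api/dashboard/deploy", {
        "model_id": model_id,
        "region": region,
        "provider": provider,
        "instance_type": instance_type,
    })
    deadline = clock() + timeout
    while inst["status"] not in TERMINAL:
        if clock() >= deadline:
            raise TimeoutError(f"instance {inst['id']} still {inst['status']}")
        sleep(interval)  # 5 s keeps well below the 1 Hz polling ceiling
        inst = send("GET", f"/api/dashboard/instances/{inst['id']}", None)
    if inst["status"] != "running":
        raise RuntimeError(f"instance {inst['id']} ended in {inst['status']}")
    return inst
```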

Get instance status

GET /api/dashboard/instances/{id}

Alternatively, re-list all instances. Status values: pending, deploying, running, stopped, failed.

Retrieve an instance's API key

GET /api/dashboard/instances/{id}/api-key

Returns the hyai-* key for this instance's OpenAI-compatible endpoint. Send it as the Bearer token to /api/v1/chat/completions.

Destroy an instance

DELETE /api/dashboard/instances/{id}

Destroys the GPU at the provider and removes the instance row. Irreversible.
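Destroy is irreversible, so tooling that offboards tenants may want a confirmation guard in front of the DELETE. This is a sketch: destroy_instance and the injected send(method, path, body) transport are hypothetical.

```python
def destroy_instance(send, instance_id, confirm_id):
    """DELETE an instance, but only if the caller repeats its id.

    The confirmation check is a client-side safety net, not an API feature.
    """
    if confirm_id != instance_id:
        raise ValueError(f"refusing to destroy {instance_id}: confirmation mismatch")
    return send("DELETE", f"/api/dashboard/instances/{instance_id}", None)
```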

Operational helpers

POST /api/dashboard/instances/{id}/refresh: re-sync status from the provider.
POST /api/dashboard/instances/{id}/retry: retry a failed deploy.
POST /api/dashboard/instances/{id}/test: send a health-check chat request.
GET /api/dashboard/instances/{id}/logs: fetch recent vLLM logs.
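The helpers can be combined into a simple recovery routine. The sketch below encodes one possible policy. It refreshes first, because a stale status is the cheapest explanation, and then fetches logs and retries. The policy is hypothetical, as are recover_failed and the injected send(method, path, body) transport. It also assumes that refresh and retry return the instance row with a status field, which is not documented behavior.

```python
def recover_failed(send, instance_id):
    """Hypothetical recovery flow for a failed deploy.

    1. refresh: re-sync status from the provider (the row may be stale);
    2. if still failed, fetch recent vLLM logs for diagnosis, then retry.
    Returns (instance_row, logs_or_None).
    """
    base = f"/api/dashboard/instances/{instance_id}"
    inst = send("POST", f"{base}/refresh", None)
    if inst.get("status") != "failed":
        return inst, None  # refresh alone resolved it
    logs = send("GET", f"{base}/logs", None)
    retried = send("POST", f"{base}/retry", None)
    return retried, logs
```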

3. BYOK instances

Attach an upstream provider's API key (OpenAI, Anthropic, Google, Mistral) as a logical instance:

POST /api/dashboard/byok
Content-Type: application/json

{
  "name":         "OpenAI prod",
  "upstream":     "openai",
  "external_key": "sk-..."
}

It behaves like any other instance for routing, but inference goes to your upstream under your own contract. The upstream key is encrypted at rest and never logged, and prompt content is never logged.
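A client can validate the body before sending it. Only "openai" appears in the example above. The other three upstream identifiers are assumed lowercase spellings of the providers this section lists. byok_payload is a hypothetical helper.

```python
import json

# "openai" is shown above; "anthropic", "google", "mistral" are assumed spellings.
UPSTREAMS = {"openai", "anthropic", "google", "mistral"}


def byok_payload(name, upstream, external_key):
    """Validate and serialize the JSON body for POST /api/dashboard/byok."""
    if upstream not in UPSTREAMS:
        raise ValueError(f"unsupported upstream: {upstream!r}")
    if not external_key:
        raise ValueError("external_key must not be empty")
    return json.dumps({"name": name, "upstream": upstream, "external_key": external_key})
```

Read external_key from a secret store or environment variable instead of embedding it in source.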

4. Router API keys

List

GET /api/router/keys

Create

POST /api/router/keys
Content-Type: application/json

{
  "name":             "Klai production",
  "sovereignty_mode": true,
  "allowed_models":   ["mistral-nemo-12b", "llama-3.1-8b"]
}

Returns the plaintext key exactly once; store it securely.

Revoke

DELETE /api/router/keys/{id}

Discover available models

GET /api/router/catalog

Returns the full catalog (~400 models). Each entry lists the slug, display name, modality, context length, EUR pricing per million tokens, serveable (whether it can run on an EU GPU), and warm (whether it has a running instance now).
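To build an allowed_models list for a router key, you can filter the catalog on the client side. This sketch assumes each entry is a JSON object whose slug, serveable, and warm fields are named as in the description above. The exact key names and types are not confirmed here, and pick_models is a hypothetical helper.

```python
def pick_models(catalog, warm_only=False):
    """Return slugs of models that can run on an EU GPU, warm ones first.

    Assumes entries look like {"slug": str, "serveable": bool, "warm": bool, ...}.
    """
    candidates = [m for m in catalog if m.get("serveable")]
    if warm_only:
        candidates = [m for m in candidates if m.get("warm")]
    # False sorts before True, so negating "warm" puts warm models first;
    # ties are broken alphabetically by slug.
    candidates.sort(key=lambda m: (not m.get("warm"), m["slug"]))
    return [m["slug"] for m in candidates]
```

The returned list can then serve as the allowed_models value when creating a router key.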

5. Inference (routing)

Once you have a router key, hit the OpenAI-compatible endpoint:

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-rt-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.1-8b","messages":[{"role":"user","content":"Hi"}]}'

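The response depends on whether the requested model is warm:

Cold model, on-demand boot off: 503 model_unavailable.
Cold model, on-demand boot on: 503 cold_start with a Retry-After header; retry once that delay has passed.
Warm model: standard OpenAI-compatible response.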
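Client code can branch on these cases. This sketch assumes the 503 codes appear as error.code in the OpenAI-style error body described in section 8. It also assumes headers is a case-insensitive mapping, as returned by urllib or requests. retry_delay and its 10-second fallback are hypothetical.

```python
def retry_delay(status, body, headers, default=10.0):
    """Return seconds to wait before retrying, or None if a retry will not help.

    Assumes the codes above arrive as body["error"]["code"].
    """
    if status != 503:
        return None
    err = body.get("error") if isinstance(body, dict) else None
    code = err.get("code") if isinstance(err, dict) else None
    if code != "cold_start":
        return None  # e.g. model_unavailable: on-demand boot is off
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))  # delay-seconds form
    except (TypeError, ValueError):
        return default  # header missing or in HTTP-date form
```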

6. Usage & audit

Usage and audit endpoints exist but will be documented with the formal API v1. For now, daily and monthly usage is visible on the dashboard. Where a contract requires it, a full export of an account's data is provided within 14 days of a written request.

7. Webhooks

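Webhooks for instance state transitions and usage thresholds are on the roadmap. In the meantime, poll GET /api/dashboard/instances no more than once per second (≤ 1 Hz). At exactly 1 Hz, polling alone uses the whole default budget of 60 requests per minute (section 8), so poll more slowly if the same token makes other calls.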
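A polling loop can enforce a minimum gap between calls with a small pacer. In this sketch, Pacer is a hypothetical name. The clock and sleep functions are injectable so the timing logic can be tested without waiting.

```python
import time


class Pacer:
    """Enforce a minimum gap between calls (default 1 s, i.e. at most 1 Hz)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self._last = now
```

Call pacer.wait() right before each GET. Choosing min_interval above 1.0 leaves headroom in the per-token rate limit.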

8. Rate limits & conventions

Default rate limit: 60 requests / minute per token on Management endpoints.
All timestamps are ISO 8601, UTC.
Currency: EUR.
Errors follow the OpenAI error shape: { "error": { "message": "...", "type": "...", "code": "..." } }.
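Because every error shares one shape, a single helper can turn error responses into exceptions. This is a sketch: HostYourAIError and raise_for_error are hypothetical names. The helper also tolerates non-JSON error bodies, such as a proxy's HTML page, by falling back to a generic message.

```python
import json


class HostYourAIError(Exception):
    """Error response in the OpenAI-style shape (hypothetical class)."""

    def __init__(self, status, message, type_, code):
        super().__init__(f"{status} {code or type_ or 'error'}: {message}")
        self.status = status
        self.type = type_
        self.code = code


def raise_for_error(status, raw_body):
    """Return parsed JSON for 2xx responses; raise HostYourAIError otherwise."""
    if 200 <= status < 300:
        return json.loads(raw_body) if raw_body else None
    try:
        body = json.loads(raw_body) if raw_body else None
    except ValueError:
        body = None  # non-JSON error body
    err = body.get("error") if isinstance(body, dict) else None
    err = err if isinstance(err, dict) else {}
    raise HostYourAIError(status, err.get("message", "unknown error"),
                          err.get("type"), err.get("code"))
```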

9. End-to-end example: per-tenant deploy

# 1. Provision a dedicated instance for tenant X
curl -X POST https://hostyourai.com/api/dashboard/deploy \
  -H "Authorization: Bearer $PAT" \
  -H "Content-Type: application/json" \
  -d '{"model_id":"mistral-nemo-12b","region":"EU","provider":"vastai"}'
# → { "id": 42, "status": "deploying", ... }

# 2. Poll until running
curl https://hostyourai.com/api/dashboard/instances/42 \
  -H "Authorization: Bearer $PAT"
# → "status":"running", "endpoint_url":"http://..."

# 3. Get this instance's per-tenant inference key
curl https://hostyourai.com/api/dashboard/instances/42/api-key \
  -H "Authorization: Bearer $PAT"
# → { "api_key": "hyai-..." }

# 4. Use it from your tenant's app
curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"mistral-nemo-12b","messages":[{"role":"user","content":"Hi"}]}'

# 5. Offboarding: destroy the instance when the tenant leaves
curl -X DELETE https://hostyourai.com/api/dashboard/instances/42 \
  -H "Authorization: Bearer $PAT"

10. SDKs

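The OpenAI-compatible /api/v1 endpoint works with the official openai-python and openai-node SDKs. Set the base URL to https://hostyourai.com/api/v1 (base_url in Python, baseURL in Node) and pass a router key or per-instance key as the API key. A dedicated management SDK is planned; until then, any HTTP client works.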
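Here is a minimal sketch with openai-python. It assumes the openai package (v1 or later) is installed. make_client, ask, and the HYAI_ROUTER_KEY environment variable are hypothetical names. make_client imports the SDK itself, so the helper module loads even where the SDK is absent.

```python
BASE_URL = "https://hostyourai.com/api/v1"


def make_client(api_key):
    """Create an OpenAI SDK client pointed at HostYourAI (needs `pip install openai`)."""
    from openai import OpenAI  # deferred import; only needed when a client is built
    return OpenAI(base_url=BASE_URL, api_key=api_key)


def ask(client, model, prompt):
    """Send one user message and return the assistant's reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


# Usage (performs a real request):
#   import os
#   client = make_client(os.environ["HYAI_ROUTER_KEY"])
#   print(ask(client, "llama-3.1-8b", "Hi"))
```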

Questions?

Email info@hostyourai.com for technical questions as well as partner-tier and volume conversations.