Use the Management API when you provision HostYourAI per tenant, build internal tooling, or integrate with a customer-facing platform. All endpoints sit on the same host as the app (https://hostyourai.com) and follow standard REST conventions.
1. Authentication
Send a Bearer token with every request:
Authorization: Bearer {your_personal_access_token}
Get a token from your dashboard: Settings → API tokens → Create token. The plaintext token is shown once; store it safely. Tokens never expire automatically — revoke them when no longer needed.
Two token types coexist:
- Personal Access Token (this page): Bearer token for the Management API. Same authority as your dashboard session.
- Router API key (hyai-rt-*): for OpenAI-compatible inference traffic at /api/v1/chat/completions. See Pricing and the in-app /router page.
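The Bearer scheme above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_request and call are hypothetical, and it assumes every response body is JSON.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com"  # host stated on this page


def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated Management API request."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies are sent with an explicit Content-Type, as in the examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, token, body=None):
    """Send the request and decode the response (assumption: the body is JSON)."""
    request = build_request(method, path, token, body)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call("GET", "/api/dashboard/instances", token) would list your instances, with token read from an environment variable or secret store rather than hard-coded.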
2. Instances — lifecycle
List your instances
GET /api/dashboard/instances
Returns an array of your instances with status, region, model, endpoint, and cost.
Deploy a new instance
POST /api/dashboard/deploy
Content-Type: application/json
{
"model_id": "llama-3.1-8b",
"region": "EU",
"provider": "vastai",
"instance_type": "text"
}
Returns the created instance row with status: "deploying". A background job allocates the GPU and pulls the model; the endpoint_url appears once vLLM is ready (typically within minutes). Until webhooks ship (see section 7), poll status with GET /api/dashboard/instances.
Get instance status
GET /api/dashboard/instances/{id}
Returns a single instance; re-listing all instances works too. Status values: pending, deploying, running, stopped, failed.
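A polling loop over these status values might look like the sketch below. It assumes a hypothetical fetch_instance callable that performs GET /api/dashboard/instances/{id} and returns the parsed JSON. Treating "stopped" as a failure while waiting for a deploy is also an assumption, not documented behaviour.

```python
import time


def wait_until_running(fetch_instance, instance_id, interval_s=5.0,
                       timeout_s=900.0, sleep=time.sleep, clock=time.monotonic):
    """Poll an instance until it reports "running".

    fetch_instance(instance_id) -> dict is a hypothetical callable wrapping
    GET /api/dashboard/instances/{id}. Raises RuntimeError on "failed" or
    "stopped" (assumed terminal here) and TimeoutError after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        instance = fetch_instance(instance_id)
        status = instance.get("status")
        if status == "running":
            return instance
        if status in ("failed", "stopped"):
            raise RuntimeError(f"instance {instance_id} ended in status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"instance {instance_id} still {status!r} after {timeout_s}s")
        # "pending" and "deploying" are transient: wait and ask again.
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; production callers can leave them at their defaults.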
Retrieve an instance's API key
GET /api/dashboard/instances/{id}/api-key
Returns the hyai-* key for this instance's OpenAI-compatible endpoint (use it at /api/v1/chat/completions).
Destroy an instance
DELETE /api/dashboard/instances/{id}
Destroys the GPU at the provider and removes the instance row. Irreversible.
Operational helpers
- POST /api/dashboard/instances/{id}/refresh: re-sync status from the provider.
- POST /api/dashboard/instances/{id}/retry: retry a failed deploy.
- POST /api/dashboard/instances/{id}/test: fire a health-check chat.
- GET /api/dashboard/instances/{id}/logs: recent vLLM logs.
3. BYOK instances
Attach an upstream provider's API key (OpenAI, Anthropic, Google, Mistral) as a logical instance:
POST /api/dashboard/byok
Content-Type: application/json
{
"name": "OpenAI prod",
"upstream": "openai",
"external_key": "sk-..."
}
A BYOK instance behaves like any other instance for routing, but inference goes to your upstream provider under your own contract. We encrypt the upstream key at rest and never log it, and we never log prompt content.
4. Router API keys
List
GET /api/router/keys
Create
POST /api/router/keys
{
"name": "Klai production",
"sovereignty_mode": true,
"allowed_models": ["mistral-nemo-12b", "llama-3.1-8b"]
}
The response contains the plaintext key, which is shown exactly once.
Revoke
DELETE /api/router/keys/{id}
Discover available models
GET /api/router/catalog
Returns the full catalog (~400 models). Each entry includes the slug, display name, modality, context length, EUR pricing per million tokens, serveable (whether the model can run on an EU GPU), and warm (whether it has a running instance now).
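As an illustration of how the catalog could feed an allowed_models list, the sketch below filters entries client-side. The JSON field names ("slug", "modality", "serveable", "warm") are assumptions inferred from the description above; check them against a real response before relying on them.

```python
def pick_models(catalog, modality=None, require_warm=False):
    """Return slugs of serveable catalog entries.

    catalog is the parsed list from GET /api/router/catalog. The keys used
    here ("slug", "modality", "serveable", "warm") are assumed names.
    """
    slugs = []
    for entry in catalog:
        if not entry.get("serveable"):
            continue  # cannot run on an EU GPU
        if require_warm and not entry.get("warm"):
            continue  # no running instance right now
        if modality is not None and entry.get("modality") != modality:
            continue
        slugs.append(entry["slug"])
    return slugs
```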
5. Inference (routing)
Once you have a router key, hit the OpenAI-compatible endpoint:
curl https://hostyourai.com/api/v1/chat/completions \
-H "Authorization: Bearer hyai-rt-..." \
-H "Content-Type: application/json" \
-d '{"model":"llama-3.1-8b","messages":[{"role":"user","content":"Hi"}]}'
The response depends on whether the model is warm:
- Cold model, on-demand boot off: 503 model_unavailable.
- Cold model, on-demand boot on: 503 cold_start with a Retry-After header.
- Warm model: standard OpenAI-compatible response.
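A client can treat the cold_start case as retryable and everything else as final. The sketch below assumes a hypothetical send callable returning (status_code, headers, body), that the cold_start identifier appears in the error's "code" field, and that Retry-After carries a number of seconds; none of these details are confirmed by this page.

```python
import time


def chat_with_cold_start_retry(send, payload, max_attempts=5,
                               default_wait_s=10.0, sleep=time.sleep):
    """Retry a chat completion while the router reports 503 cold_start.

    send(payload) -> (status_code, headers_dict, body_dict) is a hypothetical
    transport function. Returns (status_code, body_dict) of the final attempt.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send(payload)
        if status != 503:
            return status, body
        code = (body.get("error") or {}).get("code")  # assumed location of cold_start
        if code != "cold_start" or attempt == max_attempts:
            # model_unavailable (or retries exhausted): waiting will not help.
            return status, body
        try:
            wait_s = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a plain number of seconds: fall back.
            wait_s = default_wait_s
        sleep(wait_s)
    raise ValueError("max_attempts must be at least 1")
```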
6. Usage & audit
Usage and audit endpoints are available, but their documentation will arrive with the formal API v1. For now, daily and monthly usage is visible on the dashboard. Contractually compelled exports of all account data are honoured on written request within 14 days.
7. Webhooks
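Webhooks for instance state transitions and usage thresholds are on the roadmap. In the meantime, poll GET /api/dashboard/instances no more than once per second. Polling at that maximum uses the entire default rate limit of 60 requests per minute per token, so a longer interval leaves headroom for other calls.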
8. Rate limits & conventions
- Default rate limit: 60 requests / minute per token on Management endpoints.
- All timestamps are ISO 8601, UTC.
- Currency: EUR.
- Errors follow the OpenAI error shape: { "error": { "message", "type", "code" } }.
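The error shape can be turned into a typed exception on the client side. The following is a sketch under the assumption that error bodies are JSON objects of exactly that shape; ApiError and parse_error are hypothetical names, and malformed bodies are handled defensively.

```python
import json


class ApiError(Exception):
    """Client-side wrapper for the documented error shape (hypothetical class)."""

    def __init__(self, status, message, type_=None, code=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code


def parse_error(status, raw_body):
    """Build an ApiError from an HTTP status and a raw response body."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        payload = {}  # body was not JSON (for example, a proxy error page)
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ApiError(
        status,
        error.get("message") or "unknown error",
        error.get("type"),
        error.get("code"),
    )
```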
9. End-to-end example: per-tenant deploy
# 1. Provision a dedicated instance for tenant X
curl -X POST https://hostyourai.com/api/dashboard/deploy \
-H "Authorization: Bearer $PAT" \
-H "Content-Type: application/json" \
-d '{"model_id":"mistral-nemo-12b","region":"EU","provider":"vastai"}'
# → { "id": 42, "status": "deploying", ... }
# 2. Poll until running
curl https://hostyourai.com/api/dashboard/instances/42 \
-H "Authorization: Bearer $PAT"
# → "status":"running", "endpoint_url":"http://..."
# 3. Get this instance's per-tenant inference key
curl https://hostyourai.com/api/dashboard/instances/42/api-key \
-H "Authorization: Bearer $PAT"
# → { "api_key": "hyai-..." }
# 4. Use it from your tenant's app
curl https://hostyourai.com/api/v1/chat/completions \
-H "Authorization: Bearer hyai-..." \
-H "Content-Type: application/json" \
-d '{"model":"mistral-nemo-12b","messages":[...]}'
# 5. Offboarding — destroy when the tenant leaves
curl -X DELETE https://hostyourai.com/api/dashboard/instances/42 \
-H "Authorization: Bearer $PAT"
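The same flow can be scripted. The sketch below is one possible Python arrangement, assuming a hypothetical api(method, path, body=None) callable that sends an authenticated request and returns parsed JSON. The response fields it reads ("id", "status", "endpoint_url", "api_key") are taken from the example output above.

```python
import time


def provision_tenant(api, model_id, region="EU", provider="vastai",
                     poll_s=5.0, max_polls=180, sleep=time.sleep):
    """Deploy an instance, wait until it runs, and fetch its inference key.

    api(method, path, body=None) -> dict is a hypothetical authenticated client.
    """
    created = api("POST", "/api/dashboard/deploy",
                  {"model_id": model_id, "region": region, "provider": provider})
    instance_id = created["id"]
    for _ in range(max_polls):
        instance = api("GET", f"/api/dashboard/instances/{instance_id}")
        if instance["status"] == "running":
            key = api("GET", f"/api/dashboard/instances/{instance_id}/api-key")
            return {
                "instance_id": instance_id,
                "endpoint_url": instance.get("endpoint_url"),
                "api_key": key["api_key"],
            }
        if instance["status"] == "failed":
            raise RuntimeError(f"deploy of instance {instance_id} failed")
        sleep(poll_s)
    raise TimeoutError(f"instance {instance_id} not running after {max_polls} polls")


def offboard_tenant(api, instance_id):
    """Destroy the tenant's instance. This is irreversible."""
    api("DELETE", f"/api/dashboard/instances/{instance_id}")
```

Store the returned api_key with the tenant record; it is what the tenant's application sends as its Bearer token in step 4.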
10. SDKs
The OpenAI-compatible endpoint works with the official openai-python and openai-node SDKs: set base_url (baseURL in openai-node) to "https://hostyourai.com/api/v1". A dedicated management SDK is planned. Until then, any HTTP client works.
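With openai-python (version 1.0 or later), the setup might look like the sketch below. The helper names make_client and ask are illustrative, and the openai import is deferred so the module loads even where the package is not installed.

```python
BASE_URL = "https://hostyourai.com/api/v1"  # base URL given on this page


def make_client(api_key):
    """Create an OpenAI SDK client pointed at HostYourAI (needs `pip install openai`)."""
    from openai import OpenAI  # imported lazily; requires openai>=1.0

    return OpenAI(api_key=api_key, base_url=BASE_URL)


def ask(client, model, prompt):
    """Send a single user message and return the assistant's text reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

A call such as ask(make_client(router_key), "llama-3.1-8b", "Hi") would then go through the router, where router_key is a hyai-rt-* key or a per-instance hyai-* key.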
Questions?
support@hostyourai.com for technical questions, partners@hostyourai.com for partner-tier & volume conversations.