HostYourAI offers three execution modes side by side on a single account and credit balance. Start with EU Hosted inference, move to dedicated capacity when the workload or compliance profile requires it.
1. EU Hosted Gateway — pay per token
One OpenAI-compatible API key, model catalog, hyai/auto, and scale-to-zero shared capacity. Best for SaaS integrations, agencies, agent apps, and experimentation. Indicative tariffs (EUR per million tokens):
| Model class | Input €/M | Output €/M |
|---|---|---|
| ≤ 8B (Llama 3.2, Qwen 3 small, Phi-4 mini) | 0.03 | 0.06 |
| ≤ 16B (Llama 3.1 8B, Qwen 2.5 7B, Mistral 7B) | 0.05 | 0.10 |
| ≤ 32B (Qwen 3 14B, Phi-4, Mistral Nemo 12B) | 0.10 | 0.18 |
| ≤ 48B (Mistral Small 3.1, DeepSeek Coder V2 Lite) | 0.15 | 0.25 |
| ≤ 80B (Qwen 3 32B, DeepSeek R1 distill 32B) | 0.25 | 0.40 |
| Large (70B+, Mixtral, DeepSeek-V3) | 0.40 | 0.60 |
Current beta framing: EU Hosted means EU-located GPU processing with shared router capacity. EU Sovereignty Mode is sold separately once a fully EU-sovereign provider chain, DPA, subprocessors, audit export, and support-access controls are active.
2. Dedicated EU Deployment — pay per hour
You pick a GPU class and region, deploy your own vLLM instance, and pay for as long as it runs. Best for custom Hugging Face models, BYOK upstreams, steady high-volume workloads, or when you need full control over the deployment.
| GPU class | Typical use | From (EUR / hr) |
|---|---|---|
| 1× L40S / RTX 4090 (24–48 GB) | Models up to ~20B | € 0.40 |
| 1× A100 80 GB / H100 80 GB | Models up to ~70B (quantised) | € 2.20 |
| 2× H100 (160 GB) | 70B fp16 / HA | € 4.40 |
| 4× H100 (320 GB) | Large models / high throughput | € 8.80 |
Indicative — live pricing depends on EU-region GPU availability at our providers. Exact price for each offer is shown before you deploy.
3. Single-Tenant Encrypted — capacity per month
Private runtime with dedicated, isolated GPU(s) per customer, at-rest encryption, optional BYOK (your own KMS key), and a private network policy. Best for healthcare, government, legal, finance, and workloads that cannot use shared capacity.
| Configuration | VRAM | Per month | Setup (one-off) |
|---|---|---|---|
| 1× L40S | 48 GB | € 1,200 | € 500 |
| 1× H100 | 80 GB | € 3,500 | € 1,000 |
| 2× H100 | 160 GB | € 6,500 | € 1,000 |
| 4× H100 | 320 GB | € 12,500 | € 1,500 |
Always-on or on-demand availability. BYOK onboarding +€ 1,000 one-off. Setup includes provisioning, isolation configuration, network policy, DPA onboarding. Confidential computing (TEE) is on the roadmap; we will not price what we have not yet validated.
BYOK — bring your own API key
You can attach your own OpenAI / Anthropic / Google / Mistral API key to an instance. We forward your traffic to the upstream under your contract with them — we don't bill compute, we only charge a thin proxy fee per request (€ 0.0005 / request, capped at €5 / month per BYOK instance) which covers our routing, audit log and uptime. Useful for hybrid setups that mix EU-hosted open-weights with frontier closed models.
Free tier
- 1 model:
llama-3.2-3b - 20 requests / minute, 100 000 tokens / day
- No credit card required
- Email verification + reasonable-use rate limits
Billing
- Currency: EUR. VAT added where applicable; reverse-charge for EU B2B with valid VAT number.
- Method: Stripe (credit card, iDEAL, SEPA direct debit). Invoices auto-issued from your dashboard.
- Credits: top up in advance; balance is consumed by all three modes from a single pool.
- Volume / partner tier: for €> 2 000 / month in tokens or one or more single-tenant deployments, we offer a partner tier with discounts, SLAs, and a dedicated technical contact. Contact partners@hostyourai.com.
What's not on the price list
Bespoke procurement, custom contracts, NEN 7510 / BIO audit packages, white-label / reseller arrangements, and confidential-computing deployments are quoted per project. Talk to us via /contact.