
Model garden · Router: available · Dedicated: on request

GLM 4.7 Flash

Name: GLM 4.7 Flash hosting (EU)
Brand: HostYourAI
Price: 0.15 EUR
Availability: InStock

Available directly through the EU router or as a dedicated GPU deployment. Data stays in Europe.

GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.


zai-org/GLM-4.7-Flash vLLM ready

text->text · zai-org · EU-hosted

Parameters: 31B · Context window: 203K · Minimum VRAM: 80 GB

POST /api/v1/chat/completions 200 OK

Specifications

Parameters: 31B
Context window: 202,752 tokens
Minimum VRAM: 80 GB
Architecture: Glm4MoeLiteForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: January 2026
Publisher: zai-org
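As a rough illustration of what the 202,752-token context window means in practice, the sketch below computes how many tokens remain for the completion once a prompt has been counted. It assumes that prompt and generated tokens share one window, which is how vLLM-style serving usually behaves but is not stated on this page. The function name and the `reserve` parameter are hypothetical.

```python
# Context window as listed in the specifications above.
CONTEXT_WINDOW_TOKENS = 202_752


def max_output_tokens(prompt_tokens: int, reserve: int = 0) -> int:
    """Return how many tokens are left for the completion.

    Assumption (not confirmed by the page): prompt and completion tokens
    are counted against the same context window.
    `reserve` is a hypothetical safety margin, e.g. for a system prompt.
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens - reserve
    # A prompt larger than the window leaves no room for output.
    return max(remaining, 0)
```

The prompt token count has to come from the model's own tokenizer; character or word counts are only an approximation.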

Pricing

Shared router · per token

Input (per 1M tokens): €0.15
Output (per 1M tokens): €0.40

Dedicated GPU · per hour

On request

Dedicated deployment, from 80 GB VRAM. Billed per GPU-hour.

The shared EU router is pay-per-token with scale-to-zero. Dedicated GPU deployments are billed per hour; see pricing.

✓ Verified working on 14 July 2026, with a response in 4452 ms on our EU infrastructure.

Call it directly

Drop-in replacement for OpenAI: change only the base URL and the API key. The Anthropic format (/v1/messages) is also supported.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
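The same request can be sketched in Python using only the standard library. The URL, header names, and model identifier come from the curl example; the helper names `build_chat_request` and `send` and the 30-second timeout are illustrative assumptions, and the API key is a placeholder.

```python
import json
import urllib.request

# Base URL and model id as shown on this page.
BASE_URL = "https://hostyourai.com/api/v1"
MODEL = "zai-org/GLM-4.7-Flash"


def build_chat_request(api_key: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (timeout is an assumed default)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `send(build_chat_request("hyai-...", "Hello!"))` with a real key would perform the call. Keeping request construction separate from sending makes the payload easy to inspect without network access.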

Frequently asked questions

Can I run GLM 4.7 Flash in the EU?

Yes. HostYourAI runs GLM 4.7 Flash on GPUs in European data centers using vLLM. Prompts and outputs do not leave the EU, and no US cloud provider is involved in the chain.

Is hosting GLM 4.7 Flash GDPR-compliant?

Yes. All processing takes place within the EU, a data processing agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean no training on your data.

What does GLM 4.7 Flash cost?

Through the shared EU router you pay €0.15 per million input tokens and €0.40 per million output tokens, with no fixed costs. For high volumes or isolation, you can also run GLM 4.7 Flash as a dedicated GPU instance billed per hour.
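To make the per-token pricing concrete, here is a minimal cost estimate using the two router prices quoted above. The function name is hypothetical, and the sketch covers only the shared router; dedicated GPU pricing is on request and is not modelled.

```python
from decimal import Decimal

# Shared-router prices from this page, in EUR per 1M tokens.
INPUT_EUR_PER_MILLION = Decimal("0.15")
OUTPUT_EUR_PER_MILLION = Decimal("0.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the shared-router cost in EUR for a given token volume."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_EUR_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_EUR_PER_MILLION
    )
    return total / TOKENS_PER_MILLION
```

For example, 2,000,000 input tokens and 500,000 output tokens come to 2 × €0.15 + 0.5 × €0.40 = €0.50. `Decimal` is used so that small per-request amounts do not pick up binary floating-point rounding noise.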

Is the API compatible with OpenAI?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is also supported as a drop-in.
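If the endpoint follows the OpenAI chat completion response shape, as the compatibility claim suggests, the reply text and token usage can be read as sketched below. The sample payload is hypothetical and hand-written for illustration; it is not output captured from this service, and the actual response may contain additional fields.

```python
import json

# Hypothetical response in the OpenAI chat completion format (illustrative only).
SAMPLE_RESPONSE = """
{
  "model": "zai-org/GLM-4.7-Flash",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
}
"""


def extract_reply(response: dict) -> tuple[str, int, int]:
    """Return (reply text, prompt tokens, completion tokens) from a response.

    Assumes the OpenAI-style layout: choices[0].message.content and usage.*.
    Missing usage fields fall back to 0 rather than raising.
    """
    content = response["choices"][0]["message"]["content"]
    usage = response.get("usage", {})
    return (
        content,
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
    )


reply, prompt_tokens, completion_tokens = extract_reply(json.loads(SAMPLE_RESPONSE))
```

The two token counts are the figures that per-token billing is based on, so logging them per request gives a running view of spend.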

Other models from Z.AI

GLM 5.2 FP8

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include: - Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work - Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency - Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou

1M context · View model →

GLM 5.1 FP8

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

203K context · View model →

GLM 5.1

754B · 203K context · View model →

GLM 5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context · View model →

GLM 5 FP8

203K context · View model →

GLM OCR

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR deliver

1.3B · 131K context · View model →

Try GLM 4.7 Flash for free

Creating an account takes a minute. Test GLM 4.7 Flash directly in the playground.
