
Router · available · Dedicated · on request

GLM 4 32B 0414


Available directly through the EU router or as a dedicated GPU deployment. Data stays in Europe.

The GLM family welcomes new members: the GLM-4-32B-0414 series, featuring 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports user-friendly local deployment. GLM-4-32B-Base-0414 was pre-tra...


zai-org/GLM-4-32B-0414 vLLM ready

text->text · zai-org · EU-hosted

33B parameters · 33K context window · 80 GB minimum VRAM


Specifications

Parameters: 33B
Context window: 32,768 tokens
Minimum VRAM: 80 GB
Architecture: Glm4ForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: April 2025
Publisher: zai-org
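A rough sanity check of the 80 GB minimum, assuming the weights are loaded unquantized in 16-bit precision (BF16, 2 bytes per parameter); the precision is an assumption, not something this page states:

```latex
\text{weights} \approx 33 \times 10^{9}\,\text{parameters} \times 2\,\tfrac{\text{bytes}}{\text{parameter}} \approx 66\,\text{GB},
\qquad
80\,\text{GB} - 66\,\text{GB} \approx 14\,\text{GB}
```

Under that assumption, roughly 14 GB remains for the KV cache, activations and runtime overhead, which makes 80 GB a plausible floor for the unquantized model. This is an estimate, not the publisher's sizing method.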

Pricing

Shared router · per token
Input: €0.15 per 1M tokens
Output: €0.40 per 1M tokens

Dedicated GPU · per hour
On request. Dedicated deployment from 80 GB VRAM, billed per GPU-hour.

Shared EU router: pay-per-token, scale-to-zero. Dedicated GPU deployments are billed per hour; see the pricing page.
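To turn the per-1M-token rates into the cost of a single request, a minimal sketch. The function and variable names are illustrative, and the rates are hard-coded from the table above, so re-check them before relying on the result:

```shell
#!/usr/bin/env bash
# Sketch: estimate the shared-router cost of one request in euros.
# Rates copied from the pricing table above (EUR per 1M tokens).
INPUT_EUR_PER_M=0.15
OUTPUT_EUR_PER_M=0.40

# Usage: request_cost_eur <input_tokens> <output_tokens>
request_cost_eur() {
  # awk does the floating-point arithmetic that plain bash cannot.
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_price="$INPUT_EUR_PER_M" -v out_price="$OUTPUT_EUR_PER_M" \
      'BEGIN { printf "%.6f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Example: a 2,000-token prompt with a 500-token answer.
request_cost_eur 2000 500
```

The example call prints `0.000500`, i.e. €0.0005 for that request.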

✓ Verified working on 17-07-2026, with a response in 896 ms on our EU infrastructure.

Call it directly

Drop-in replacement for OpenAI: change only the base URL and the API key. The Anthropic format (/v1/messages) is also supported.

```shell
# Replace hyai-... with your own HostYourAI API key
curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4-32B-0414",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
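The same "Hello!" request can be sent in the Anthropic Messages format. This sketch assumes the endpoint is `https://hostyourai.com/api/v1/messages` and that the router accepts Anthropic's `x-api-key` and `anthropic-version` headers; neither detail, nor the `HYAI_API_KEY` variable name, is confirmed on this page. Unlike the OpenAI format, `max_tokens` is a required field in Messages requests.

```shell
#!/usr/bin/env bash
# Sketch: the "Hello!" request in Anthropic Messages format.
# ASSUMED (not confirmed on this page): the endpoint URL, the x-api-key and
# anthropic-version headers, and the HYAI_API_KEY variable name.
HYAI_MESSAGES_URL="https://hostyourai.com/api/v1/messages"

send_message() {
  # ${VAR:?msg} stops with an error message if the key is not set.
  curl -sS "$HYAI_MESSAGES_URL" \
    -H "x-api-key: ${HYAI_API_KEY:?export HYAI_API_KEY first}" \
    -H "anthropic-version: 2023-06-01" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org/GLM-4-32B-0414",
      "max_tokens": 256,
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
}
```

Export `HYAI_API_KEY`, source the script, then run `send_message`.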

Frequently asked questions

Can I run GLM 4 32B 0414 in the EU?

Yes. HostYourAI runs GLM 4 32B 0414 on GPUs in European data centres using vLLM. Prompts and outputs do not leave the EU, and there is no US cloud provider in the chain.

Is hosting GLM 4 32B 0414 GDPR (AVG) compliant?

Yes. All processing takes place within the EU, a data processing agreement (DPA) is available, and the subprocessor list is public. Because the model runs on open-source weights, your data is also not used for training.

What does GLM 4 32B 0414 cost?

Through the shared EU router you pay €0.15 per million input tokens and €0.40 per million output tokens, with no fixed costs. For high volumes or isolation, you can also run GLM 4 32B 0414 as a dedicated GPU instance billed per hour.
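Dedicated pricing is on request, so the hourly rate in this sketch is an illustrative placeholder, not a HostYourAI price. It computes the throughput (tokens per hour) above which a dedicated GPU at that rate would cost less than the shared router, for a given share of input tokens; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: price crossover between the shared router and a dedicated GPU.
# The hourly rate passed in is a HYPOTHETICAL placeholder; use your own quote.
# Usage: breakeven_tokens_per_hour <eur_per_gpu_hour> <input_share 0..1>
breakeven_tokens_per_hour() {
  awk -v rate="$1" -v share="$2" 'BEGIN {
    # Blended router price per 1M tokens for this input/output mix
    # (0.15 EUR input, 0.40 EUR output, from the pricing above).
    blended = share * 0.15 + (1 - share) * 0.40
    # Tokens per hour at which router spend equals the hourly rate.
    printf "%.0f\n", rate / blended * 1000000
  }'
}

# Example: illustrative 2.00 EUR/hour rate, 75% of tokens are input.
breakeven_tokens_per_hour 2.00 0.75
```

The example prints `9411765`, so at that made-up rate a dedicated GPU only pays off above roughly 9.4M tokens per hour. This compares price only; it does not check whether one GPU can sustain that throughput.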

Is the API compatible with OpenAI?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is also supported as a drop-in.
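Instead of changing code, the official OpenAI SDKs can be pointed at the router through environment variables. This sketch relies on the SDKs reading `OPENAI_BASE_URL` and `OPENAI_API_KEY` when no base URL or key is passed to the client (as the current Python and Node SDKs do, to the best of our knowledge); requests must still use the model name `zai-org/GLM-4-32B-0414`.

```shell
#!/usr/bin/env bash
# Sketch: redirect OpenAI SDK clients to the HostYourAI EU router.
# Clients created without an explicit base URL or key pick these up.
export OPENAI_BASE_URL="https://hostyourai.com/api/v1"
export OPENAI_API_KEY="hyai-..."   # replace with your own HostYourAI key
```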

Other models from Z.AI

GLM 5.2 FP8

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou…

1M context

GLM 5.1 FP8

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

203K context

GLM 5.1

754B · 203K context

GLM 5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context

GLM 5 FP8

203K context

GLM OCR

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR deliver…

1.3B · 131K context

Try GLM 4 32B 0414 for free

Creating an account takes a minute. Test GLM 4 32B 0414 directly in the playground.