HostYourAI

AutoGLM Phone 9B

This model runs as a dedicated deployment on large GPUs and isn't in the shared playground by default. Get in touch and we'll set it up for you.

⚠️ This project is intended for research and educational purposes only. Any use for illegal data access, system interference, or unlawful activities is strictly prohibited. Please review our Terms of Use carefully.

zai-org/AutoGLM-Phone-9B On request

text+image->text · zai-org · EU-hosted

POST /api/v1/chat/completions On request

Specifications

Parameters: 9B
Context window: 65,536 tokens
Minimum VRAM: 32 GB
Architecture: Glm4vForConditionalGeneration (vLLM)
License: MIT
Modality: text+image->text
Released: December 2025
Publisher: zai-org

Pricing

Shared router · per token

On request

Not available on the shared router. Pricing on request as a dedicated GPU deployment.

Dedicated GPU · per hour

On request

Dedicated deployment, from 32 GB of VRAM. Billed per GPU-hour.

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see pricing.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/AutoGLM-Phone-9B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
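The same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_chat_request` and `send` are hypothetical helper names, and the image content part follows the OpenAI chat format on the assumption that the endpoint accepts it for this text+image model. Because the model is a dedicated deployment on request, the call is expected to succeed only once access has been arranged.

```python
import json
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"  # base URL quoted on this page
MODEL = "zai-org/AutoGLM-Phone-9B"


def build_chat_request(prompt, api_key, image_url=None):
    """Build (but do not send) a POST to /chat/completions."""
    if image_url is None:
        content = prompt
    else:
        # OpenAI-style content parts; assumed to be accepted for image input.
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    body = {"model": MODEL, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To mirror the curl example, build the request with `build_chat_request("Hello!", api_key)` and pass it to `send`. Keeping the build step separate from the network call makes the payload easy to inspect before anything is sent.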

Frequently asked questions

Can I run AutoGLM Phone 9B in the EU?

Yes. HostYourAI runs AutoGLM Phone 9B on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting AutoGLM Phone 9B GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean that your data is not used for training.

How much does AutoGLM Phone 9B cost?

The listed shared-router rate is €0.06 per million input tokens and €0.15 per million output tokens, with no fixed costs. This model is currently not available on the shared router, however, so in practice you run AutoGLM Phone 9B as a dedicated GPU instance billed per GPU-hour, with pricing on request.
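To make the per-token rates concrete, here is a small sketch of the arithmetic. It uses only the two rates quoted in this answer (€0.06 per million input tokens, €0.15 per million output tokens); the function name `estimate_cost_eur` is hypothetical, and the result is an estimate, not an invoice.

```python
from decimal import Decimal

# Rates quoted in the FAQ answer above, in EUR per million tokens.
INPUT_EUR_PER_MILLION = Decimal("0.06")
OUTPUT_EUR_PER_MILLION = Decimal("0.15")


def estimate_cost_eur(input_tokens, output_tokens):
    """Return the estimated pay-per-token cost in EUR as a Decimal."""
    million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * INPUT_EUR_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_EUR_PER_MILLION
    )
    return total / million
```

For example, 2 million input tokens and 500,000 output tokens come to 2 × 0.06 + 0.5 × 0.15 = €0.195. `Decimal` is used instead of floats so that small currency amounts do not pick up binary rounding noise.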

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.
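For the Anthropic-style route, a request might be built as in the sketch below. Several details are assumptions rather than documented behaviour: the full URL (the page names `/v1/messages` but not the complete path), the `x-api-key` and `anthropic-version` headers (these are the Anthropic API's own conventions and may or may not be required here), and the hypothetical helper name `build_messages_request`. The `max_tokens` field is included because the Anthropic Messages format requires it.

```python
import json
import urllib.request

# Assumption: the Anthropic-style route lives under the same base URL.
MESSAGES_URL = "https://hostyourai.com/api/v1/messages"


def build_messages_request(prompt, api_key, max_tokens=256):
    """Build (but do not send) an Anthropic Messages-style POST request."""
    body = {
        "model": "zai-org/AutoGLM-Phone-9B",
        "max_tokens": max_tokens,  # required in the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        MESSAGES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # Anthropic-style auth header; assumed accepted
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can be sent with `urllib.request.urlopen` once access to the model has been granted. If the service expects a `Bearer` token on this route instead, only the auth header needs to change.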

More models from Z.AI

GLM 5.2 FP8

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context. GLM-5.2's new capabilities include:

- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every fou

1M context

GLM 5.1 FP8

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

203K context

GLM 5.1

754B · 203K context

GLM 5

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.

754B · 203K context

GLM 5 FP8

203K context

GLM OCR

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR deliver

1.3B · 131K context

Request access

AutoGLM Phone 9B isn't available by default yet. Leave your details and we'll arrange a dedicated deployment.
