HostYourAI


Model garden Router · available Dedicated · available

Qwen 3 4B


Instantly via the EU router or as a dedicated GPU deployment. Data stays in Europe.

Qwen 3 4B is an open-weights language model from Qwen with 4B parameters and a 40,960-token (about 41K) context window, hosted on EU GPUs and served through an OpenAI-compatible API.


qwen-3-4b vLLM ready

text->text · qwen · EU-hosted

Parameters 4B

Context window 41K

Minimum VRAM 8 GB

POST /api/v1/chat/completions 200 OK

Specifications

Parameters 4B

Context window 40,960 tokens

Minimum VRAM 8 GB

Architecture Qwen3ForCausalLM (vLLM)

License open-weights

Modality text->text

Publisher qwen

Pricing

Shared router · per token

Input (per 1M tokens) €0.03

Output (per 1M tokens) €0.06

Dedicated GPU · per hour

from €1.16 per hour

Your own vLLM instance on a European cloud (8 GB VRAM), billed hourly.

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see pricing.

✓ Verified working on 15 July 2026; responded in 523 ms on our EU infrastructure.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-3-4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
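
The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send` are hypothetical, the URL, headers, and model ID are copied from the curl example, and the response is assumed to follow the OpenAI chat-completions shape.

```python
import json
import urllib.request

# Base URL as quoted on this page; everything else here is illustrative.
BASE_URL = "https://hostyourai.com/api/v1"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "qwen-3-4b") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    Needs a valid API key and network access.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a real key, `send(build_chat_request("hyai-...", "Hello!"))` should return the parsed JSON response; if the response follows the OpenAI format, the reply text sits under `["choices"][0]["message"]["content"]`.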

Frequently asked questions

Can I run Qwen 3 4B in the EU?

Yes. HostYourAI runs Qwen 3 4B on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting Qwen 3 4B GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open weights also mean that your data is not used for training.

How much does Qwen 3 4B cost?

Via the shared EU router you pay €0.03 per million input tokens and €0.06 per million output tokens, with no fixed costs. For high volume or isolation you can also run Qwen 3 4B as a dedicated hourly GPU instance.
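
The arithmetic behind these prices can be sketched as follows. The per-token rates and the €1.16 hourly figure are the ones listed on this page; the function names and the `input_share` parameter are hypothetical, and the break-even figure is a rough guide only, since the hourly price is a "from" price and the sketch ignores how many tokens a single GPU can actually serve per hour.

```python
from decimal import Decimal

# Prices as listed on this page (EUR).
INPUT_EUR_PER_MILLION = Decimal("0.03")
OUTPUT_EUR_PER_MILLION = Decimal("0.06")
DEDICATED_EUR_PER_HOUR = Decimal("1.16")  # "from" price, so a lower bound
TOKENS_PER_MILLION = Decimal(1_000_000)


def router_cost_eur(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of a workload on the shared router at the listed per-token rates."""
    total = (Decimal(input_tokens) * INPUT_EUR_PER_MILLION
             + Decimal(output_tokens) * OUTPUT_EUR_PER_MILLION)
    return total / TOKENS_PER_MILLION


def break_even_tokens_per_hour(input_share: Decimal) -> Decimal:
    """Hourly token volume at which router spend equals the dedicated price.

    input_share is the fraction of tokens that are input tokens (0 to 1).
    """
    blended = (input_share * INPUT_EUR_PER_MILLION
               + (1 - input_share) * OUTPUT_EUR_PER_MILLION)
    return DEDICATED_EUR_PER_HOUR / blended * TOKENS_PER_MILLION
```

For example, 2 million input tokens plus 500,000 output tokens come to €0.09 on the router. With an even split between input and output tokens, router spend only reaches the listed dedicated price at roughly 25.8 million tokens per hour.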

Is the API OpenAI-compatible?

Yes. You use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.

More models from Qwen

Qwen3.6 27B FP8

This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained FP8 quantization with a block size of 128, and its performance metrics are nearly identical to those of the original model.

28B 262K context View model →

Qwen3.6 27B

This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

28B 262K context View model →

Qwen3.6 35B A3B FP8

36B 262K context View model →

Qwen3.6 35B A3B

36B 262K context View model →

Qwen3.5 35B A3B GPTQ Int4

This repository contains int4-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

36B 262K context View model →

Qwen3.5 0.8B Base

This repository contains model weights and configuration files for the pre-trained-only model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc. The intended use cases are fine-tuning, in-context learning experiments, and other research or development purposes, not direct interaction. However, the control tokens, e.g., <|im_start|> and <|im_end|>, were trained to allow efficient LoRA-style PEFT with the official chat template, mitigating the need to fine-tune embeddings, a significant optimization given Qwen3.5's larger …

0.9B 262K context View model →

Try Qwen 3 4B for free

Creating an account takes a minute. Test Qwen 3 4B straight away in the playground.

Start for free