HostYourAI


Phi 3.5 MoE instruct


This model runs as a dedicated deployment on large GPUs and isn't in the shared playground by default. Get in touch and we'll set it up for you.

Phi-3.5-MoE is a lightweight, state-of-the-art open model built on the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model supports multiple languages and comes with a 128K context length (i...


microsoft/Phi-3.5-MoE-instruct vLLM ready

text->text · microsoft · EU-hosted

42B parameters · 131K context window · 160 GB minimum VRAM

POST /api/v1/chat/completions On request

Specifications

Parameters: 42B
Context window: 131,072 tokens
Minimum VRAM: 160 GB
Architecture: PhiMoEForCausalLM (vLLM)
License: MIT
Modality: text->text
Released: August 2024
Publisher: microsoft

Pricing

Shared router · per token: on request
Not available on the shared router. Pricing is on request as a dedicated GPU deployment.

Dedicated GPU · per hour: on request
Dedicated deployment, from 160 GB of VRAM. Billed per GPU-hour.

The shared EU router is pay-per-token and scales to zero. Dedicated GPU deployments are billed hourly; see pricing.

✓ Verified working on 17 July 2026; responded in 2295 ms on our EU infrastructure.

Call it now

Drop-in replacement for OpenAI: change only the base URL and API key. The Anthropic format (/v1/messages) is supported too.

curl https://hostyourai.com/api/v1/chat/completions \
  -H "Authorization: Bearer hyai-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/Phi-3.5-MoE-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
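The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL, endpoint path, and model name come from this page, while the `HOSTYOURAI_API_KEY` environment variable and the helper names are hypothetical, and the response shape (`choices[0].message.content`) is assumed to follow the OpenAI chat completions format.

```python
import json
import os
import urllib.request

BASE_URL = "https://hostyourai.com/api/v1"  # base URL stated on this page
MODEL = "microsoft/Phi-3.5-MoE-instruct"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # HOSTYOURAI_API_KEY is a hypothetical variable name for this sketch.
    key = os.environ.get("HOSTYOURAI_API_KEY")
    if key:
        reply = send(build_chat_request(key, "Hello!"))
        # Assumes an OpenAI-style response body.
        print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.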

Frequently asked questions

Can I run Phi 3.5 MoE instruct in the EU?

Yes. HostYourAI runs Phi 3.5 MoE instruct on GPUs in European datacenters via vLLM. Prompts and outputs never leave the EU and there is no US cloud provider in the chain.

Is hosting Phi 3.5 MoE instruct GDPR-compliant?

Yes. All processing happens inside the EU, a Data Processing Agreement (DPA) is available, and the subprocessor list is public. Open-source weights also mean no training on your data.

How much does Phi 3.5 MoE instruct cost?

Phi 3.5 MoE instruct needs several GPUs at once, so it runs as a dedicated deployment billed per GPU-hour rather than per token. Tell us your volume and we will work it out with you.
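A rough sizing sketch shows why the model spans several GPUs. The 42B parameter count and the 160 GB minimum come from this page; the 2 bytes per parameter (16-bit weights) and the 80 GB per-GPU capacity are assumptions made for illustration, and the per-GPU-hour rate is a placeholder you would replace with a quoted price.

```python
import math


def weight_footprint_gb(parameters: float, bytes_per_parameter: float = 2.0) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    The default of 2 bytes per parameter assumes 16-bit weights.
    KV cache and activations come on top of this figure.
    """
    return parameters * bytes_per_parameter / 1e9


def gpus_needed(min_vram_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM meets the minimum."""
    return math.ceil(min_vram_gb / vram_per_gpu_gb)


def deployment_cost(gpu_count: int, hours: float, rate_per_gpu_hour: float) -> float:
    """Cost of a dedicated deployment billed per GPU-hour.

    rate_per_gpu_hour is a placeholder: actual pricing is on request.
    """
    return gpu_count * hours * rate_per_gpu_hour


if __name__ == "__main__":
    print(weight_footprint_gb(42e9))  # weights alone at 16-bit precision
    print(gpus_needed(160, 80))       # assumes hypothetical 80 GB cards
```

Under these assumptions the weights alone exceed a single 80 GB card, which is consistent with the 160 GB minimum listed above.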

Is the API OpenAI-compatible?

Yes. You can use the standard OpenAI SDKs with a custom base URL (https://hostyourai.com/api/v1). The Anthropic Messages API is supported as a drop-in as well.

More models from Microsoft

GELab Zero 4B preview Sico Evolution

GELab Zero 4B preview Sico Evolution is a multimodal language model from Microsoft with 4.4B parameters, hosted on EU GPUs via an OpenAI-compatible API.

4.4B parameters

X Reasoner 7B

We introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks. More details can be found in the paper: X-Reasoner: T

8.3B · 128K context

OptiMind SFT

OptiMind-SFT is a specialized 20B parameter model designed to bridge the gap between natural language and executable optimization solvers. It automates the translation of complex decision-making problems—such as supply chain planning, scheduling, and resource allocation—into correct MILP formulations.

21B · 131K context

Fara 7B

Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

8.3B · 128K context

UserLM 8b

Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the "user" role in conversation (by training it to predict user turns in a large corpus of conversations called WildChat). This model is useful in simulating more realistic conversations, which is in turn useful in the development of more robust assistants.

8B · 8K context

MediPhi Instruct

The MediPhi Model Collection comprises 7 small language models of 3.8B parameters from the base model Phi-3.5-mini-instruct specialized in the medical and clinical domains. The collection is designed in a modular fashion. Five MediPhi experts are fine-tuned on various medical corpora (i.e. PubMed commercial, Medical Wikipedia, Medical Guidelines, Medical Coding, and open-source clinical documents) and merged back with the SLERP method in their base model to conserve general abilities. One model combined all five experts into one general expert with the multi-model merging method BreadCrumbs. F

3.8B · 131K context

Request access

Phi 3.5 MoE instruct isn't available by default yet. Leave your details and we'll arrange a dedicated deployment.
