
Qwen3-Instruct API

Open-source MoE for production — 262K context, fast instruction following, developer-first

Qwen3-235B-A22B-Instruct-2507 delivers Mixture-of-Experts (235B total / ~22B active) performance with a native 262K-token context, tuned for non-thinking instruction execution. Ideal for chat assistants, RAG pipelines, code help, and long-document synthesis — all under Apache-2.0.

API Access

To use the API for inference, please register an account first. You can view and manage your API token in the API Token dashboard.

All requests to the inference API require authentication via an API token. The token uniquely identifies your account and grants secure access to the service.

When calling the API, set the Authorization header to your API token, configure the request parameters as shown below, and send the request.
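A minimal sketch using the OpenAI Python SDK, assuming an OpenAI-compatible endpoint; the base URL below is a placeholder assumption, so substitute the endpoint and token from your dashboard:

```python
from openai import OpenAI

# Placeholder base URL -- replace with the endpoint from your dashboard.
client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_TOKEN",               # sent as the Authorization header
)

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(resp.choices[0].message.content)
```

The SDK sends the token as a Bearer value in the Authorization header, so no manual header handling is needed.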

Why Qwen3-Instruct API?

  • 235B total / ~22B active MoE — efficient expert routing for strong general performance.
  • Native 262K context — load long contracts, papers, or multi-file codebases in one call.
  • Non-thinking (instruction-tuned) — no <think> block; faster, more concise answers by default.
  • Open-source Apache-2.0 — audit, self-host, fine-tune; no vendor lock-in.
  • Developer-first — OpenAI-compatible endpoints, streaming, retries, observability (see the streaming sketch after this list).
  • FP8/BF16 options — latency-aware inference with vLLM/SGLang + tensor parallelism.
  • Enterprise-ready — role-based access, data privacy, and EU residency options.
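As referenced above, a short streaming sketch against the OpenAI-compatible interface; the base URL remains a placeholder assumption:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_TOKEN")

# stream=True yields incremental deltas instead of one final message.
stream = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Explain expert routing in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Guard against keep-alive chunks with empty choices or deltas.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```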
Model | Architecture | Native Context | Open Source | Best For
Qwen3-235B-A22B-Instruct-2507 | MoE (235B / ~22B active), non-thinking | 262K | Yes (Apache-2.0) | Fast instruction following, long documents, RAG
Qwen3-235B-A22B-Thinking-2507 | MoE (235B / ~22B active), explicit thinking | 262K | Yes | Complex multi-step reasoning, math & code
OpenAI o4-mini | Proprietary dense + thinking | 200K | No | General reasoning
Claude 3.5 Sonnet | Proprietary | 200K | No | Long-context assistance
Gemini 2.5 Pro | Proprietary | 1M | No | Multimodal tasks

Popular Use Cases of Qwen3-Instruct API

💬 Production Chat Assistants

High-throughput, low-latency assistants with guardrails, citations, and function calling.
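A sketch of function calling through the OpenAI-compatible interface, assuming the provider exposes the standard `tools` parameter; `get_order_status` is a hypothetical tool for illustration, not part of any shipped API:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_TOKEN")

# Hypothetical tool schema for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by its id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Where is order 8841?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to call the tool
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```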

📚 Long-Document Understanding (262K)

Ingest 100–200+ page specs, contracts, or literature and synthesize precise, cited summaries.
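A sketch of a single-call, long-document summary that leans on the 262K window; the file path and endpoint are assumptions, and the document must still fit within the context limit:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_TOKEN")

# Example input: a long contract exported to plain text (path is hypothetical).
with open("contract.txt", encoding="utf-8") as f:
    contract = f.read()

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "Cite the section number for every claim you make."},
        {"role": "user", "content": f"Summarize the termination clauses:\n\n{contract}"},
    ],
)
print(resp.choices[0].message.content)
```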

🔎 Retrieval-Augmented Generation (RAG)

Combine vector search with the 262K window to reduce truncation and hallucinations.
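A minimal RAG sketch under the same endpoint assumptions; the keyword retriever and toy corpus below are stand-ins for a real vector store (FAISS, pgvector, etc.):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_TOKEN")

# Toy corpus for illustration; in practice these are your indexed chunks.
CHUNKS = [
    "Enterprise tier: 99.9% uptime SLA with 24/7 support.",
    "Free tier: community support only, no uptime guarantee.",
]

def search(query: str, k: int = 2) -> list[str]:
    # Stand-in retriever: rank chunks by naive keyword overlap.
    words = query.lower().split()
    return sorted(CHUNKS, key=lambda c: -sum(w in c.lower() for w in words))[:k]

question = "What uptime does the enterprise tier guarantee?"
context = "\n\n".join(search(question))  # the 262K window fits many more chunks

resp = client.chat.completions.create(
    model="Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
```

Because many retrieved chunks fit in one request, top-k can be set generously, reducing the truncation that drives hallucinated answers.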

👨‍💻 Code Assistance

Explain unfamiliar code paths, propose fixes, and generate tests across multi-file repos.

🌐 Multilingual Knowledge Tasks

Robust English and Chinese performance for support, research, and analytics.

Benchmark Highlights (July 2025)

  • Arena-Hard v2: competitive head-to-head win rate for instruction following.
  • LiveCodeBench v6: robust real-world code tasks under non-thinking decoding.
  • MMLU-ProX / INCLUDE: strong multilingual and knowledge benchmarks.
  • Long-context demos: 200K+ token inputs with stable recall and coherence.

Frequently Asked Questions about the Qwen3-235B-A22B-Instruct-2507 API

Q What is the Qwen3-235B-A22B-Instruct-2507 API (open-source MoE LLM with 262K context)?

Q Qwen3 Instruct vs Thinking — which API should I choose for reasoning and chain-of-thought?

Q Does the Qwen 235B API support long documents (262K-context LLM API for contracts, papers, codebases)?

Q How do I integrate the Qwen3 235B API with OpenAI-compatible SDKs (Python/Node)?

Q Can I self-host Qwen3-235B-A22B with vLLM or SGLang (on-prem, private cloud)?

Q What benchmarks does the Qwen3 Instruct API perform well on (Arena-Hard, LiveCodeBench, MMLU-ProX)?

Q How do pricing, quotas, and rate limits work for the Qwen3-235B-A22B Instruct API?

Q Is the Qwen3-Instruct API suitable for enterprise (EU data residency, security, compliance)?

Q Qwen3-235B-A22B vs Llama/Mixtral — which long-context MoE LLM API should I pick?