Qwen3-Thinking API
Open-source reasoning at scale — MoE efficiency, 256K context, explicit thinking mode
Qwen3-235B-A22B-Thinking-2507 is a purpose-built reasoning model. With a Mixture-of-Experts architecture (235B total / ~22B active parameters) and a native **256K-token** context window, it excels at multi-step logic, math proofs, code reasoning and long-document synthesis — while remaining Apache-2.0 open source and production-ready.
To use the API for inference, first register an account. You can view and manage your API token in the API Token dashboard.
All requests to the inference API require authentication via an API token. The token uniquely identifies your account and grants secure access to the service.
When calling the API, set the `Authorization` header to your API token, configure the request parameters as shown below, and send the request.
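As a minimal sketch of the steps above, assuming an OpenAI-compatible `/chat/completions` endpoint (the base URL, environment variable name, and helper names below are placeholders, not part of the official client):

```python
import json
import os
import urllib.request

# Placeholder base URL; substitute your provider's OpenAI-compatible endpoint.
BASE_URL = "https://api.example.com/v1"

def build_chat_request(prompt: str, token: str) -> tuple[dict, dict]:
    """Construct the headers and JSON body for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {token}",  # API token in the Authorization header
        "Content-Type": "application/json",
    }
    body = {
        "model": "Qwen/Qwen3-235B-A22B-Thinking-2507",
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body

def send(prompt: str, token: str) -> str:
    """POST the request and return the assistant's reply text."""
    headers, body = build_chat_request(prompt, token)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    token = os.environ.get("API_TOKEN", "")
    if token:  # only call out when a real token is configured
        print(send("Prove that the square root of 2 is irrational.", token))
```

Because the endpoint follows the OpenAI wire format, official OpenAI SDKs can usually be pointed at it by overriding the base URL.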
Why Qwen3-Thinking API?
- 235B total / ~22B active MoE — deep reasoning with efficient expert routing.
- Native 256K context — load long contracts, papers or multi-file codebases in one call.
- Explicit thinking mode — `<think>…</think>` tags make multi-step reasoning transparent.
- Open-source Apache-2.0 — audit, self-host, fine-tune; no vendor lock-in.
- Developer-first — OpenAI-compatible endpoints, streaming, retries, observability.
- Designed for hard tasks — math & logic, code comprehension, research synthesis.
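Responses from the thinking mode interleave a `<think>…</think>` trace with the final answer. A small sketch of how a client might separate the two (the helper name is illustrative, not an official SDK function):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> reasoning trace from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no explicit trace present
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end() :]).strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>2 + 2 = 4, which is even.</think>The sum is even."
)
```

This lets you log or display the reasoning trace separately from the user-facing answer.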
| Model | Architecture | Native Context | Open Source | Best For |
|---|---|---|---|---|
| Qwen3-235B-A22B-Thinking-2507 | MoE (235B / ~22B active), explicit thinking | 262K | Yes (Apache-2.0) | Multi-step reasoning, long documents, math & code |
| OpenAI o4-mini | Proprietary dense + thinking | 200K | No | General reasoning |
| Claude 3.5 Sonnet | Proprietary | 200K | No | Long-context assistance |
| Gemini 2.5 Pro | Proprietary | 128K | No | Multimodal tasks |
Popular Use Cases of Qwen3-Thinking API
- Math & logic — Solve competition-level problems with step-by-step, verifiable chains of thought.
- Long-document analysis — Ingest 100–200+ page specs, contracts or literature and synthesize precise, cited summaries.
- Code comprehension — Explain unfamiliar code paths, propose fixes, and generate tests across multi-file repos.
- Research synthesis — Plan, decompose and reason through complex topics (GPQA-style) with transparent intermediate steps.
Benchmark Highlights (July 2025)
- AIME'25: strong competition-level math performance.
- LiveCodeBench v6: robust real-world code reasoning.
- GPQA / SuperGPQA: high-level graduate-style QA.
- Arena-Hard v2: competitive head-to-head win rate.
Frequently Asked Questions about the Qwen3-Thinking API
**What does "A22B" mean?**
In the Qwen3-235B-A22B-Thinking-2507 API, "A22B" indicates a Mixture-of-Experts (MoE) LLM with ~22B active parameters per token (out of ~235B total). This design gives the Qwen3 reasoning model API deep multi-step reasoning while keeping inference efficient.
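A quick back-of-envelope calculation shows why this matters for inference cost:

```python
# Fraction of parameters active per token in the MoE design (~22B of ~235B).
total_params = 235e9   # ~235B total parameters
active_params = 22e9   # ~22B routed/active per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights are active per token")
```

Only about 9% of the weights participate in each forward pass, which is why per-token compute is closer to a ~22B dense model than a 235B one.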
**How does the explicit thinking mode work?**
The Qwen3-Thinking API uses an explicit `<think>` phase (often called chain-of-thought) before the final answer. With Qwen/Qwen3-235B-A22B-Thinking-2507, you can request transparent, step-wise reasoning for logic, math and code tasks — ideal for users searching for a chain-of-thought API or an open-source reasoning LLM.
**Can it handle very long documents?**
Yes. The native 256K-token context lets this long-context LLM API process large contracts, research papers, and multi-file repos in a single call. For discovery, users often search for an LLM with a 256K context window or an AI model for long documents. Best practices: chunking, citations, and retrieval.
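Even with a 256K window, chunking very large corpora is a recommended practice. A minimal sketch of overlapping character-based chunking (the function name and default sizes are illustrative; in production you would chunk by tokens, not characters):

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character chunks.

    max_chars and overlap are illustrative defaults; tune them to your
    model's token budget. Overlap preserves context across boundaries.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + max_chars])
        start += max_chars - overlap
    return chunks

chunks = chunk_text("a" * 20_000, max_chars=8000, overlap=200)
```

Each chunk can then be summarized separately and the summaries synthesized in a final call, with citations back to the source chunks.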
**Is it really open source?**
Yes. Qwen3-Thinking is released under Apache-2.0, making it a leading open-source GPT-4 alternative for reasoning. You can self-host, fine-tune, and integrate it with your stack.
**How does it compare with proprietary reasoning models?**
The Qwen3 reasoning model API is competitive on public benchmarks (e.g., AIME, GPQA, LiveCodeBench) while remaining open and self-hostable. Many users discover it via searches like best open-source reasoning model or Qwen3 vs GPT-4 reasoning.
**How is the API priced?**
We offer a free tier for evaluation and usage-based paid plans. Popular queries include Qwen3 API pricing and Qwen3-235B-A22B-Thinking-2507 API cost.