Open-weight local LLM

Qwen 3.5 MoE (397B/17B active)

Flagship open-source Qwen 3.5. Only 17B active params despite 397B total — world-class quality at MoE efficiency. Matches GPT-4o on major benchmarks. Requires multi-GPU or server-grade hardware. Apache 2.0.

Server-grade 256 GB RAM Q4_K_M Server-grade AI deployment (API serving)
Parameters
397B (17B active)
Minimum RAM
256 GB
Model size
200 GB
Quantization
Q4_K_M

Can Qwen 3.5 MoE (397B/17B active) run locally?

Qwen 3.5 MoE (397B/17B active) is server-grade locally. Keep it for comparison unless you have very large unified memory, multiple GPUs or remote inference.

Search for qwen3.5-397b-a17b in LM Studio or another GGUF-compatible runtime.

chatcodereasoningquality

Install path

01
Check RAM fitMinimum 256 GB RAM. Start with the Q4_K_M quant.
02
Load the modelSearch qwen3.5-397b-a17b in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • 🏆 Flagship open-source Qwen 3.5 — best quality available
  • Only 17B active params despite 397B total = MoE efficiency
  • Matches GPT-4o on major benchmarks (MMLU, HumanEval, MATH)
  • 256K context window
  • Hybrid thinking: toggle deep reasoning on/off per request
  • Apache 2.0 — fully open-source and commercial

Limitations

  • Requires ~256GB RAM (multi-GPU server or Mac Pro Ultra)
  • Files are ~200GB+ even heavily quantized
  • Not suitable for consumer hardware
  • Practical only with multi-GPU rigs or NAS + PCIe 4.0

Best use cases

  • Server-grade AI deployment (API serving)
  • Maximum quality research tasks
  • Frontier AI applications open-source
  • Complex long-context analysis
  • Replacing GPT-4o/Claude on local infrastructure
  • Enterprise AI on-premise

Capability profile

speed
2
quality
10
coding
10
reasoning
10

Technical notes

Developer
Alibaba Cloud (Qwen Team)
License
Apache 2.0
Context window
262,144 tokens
Architecture
Flagship MoE — 397B total parameters, only 17B active per token. World-record scale for open-source MoE.

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next