Open-weight local LLM

Qwen 3.5 MoE (122B/10B active)

Large MoE model with only 10B active params. 60% cheaper to run than Qwen3-Max. 256K context. Top-tier reasoning, coding and multilingual. Hybrid think/non-think. Apache 2.0.

Large-memory workstation 80 GB RAM Q4_K_M Maximum quality AI tasks on local hardware
Parameters
122B (10B active)
Minimum RAM
80 GB
Model size
65 GB
Quantization
Q4_K_M

Can Qwen 3.5 MoE (122B/10B active) run locally?

Qwen 3.5 MoE (122B/10B active) needs a serious workstation with large unified memory or high VRAM.

Search for qwen3.5-122b-a10b in LM Studio or another GGUF-compatible runtime.

chatcodereasoningqualitypower

Install path

01
Check RAM fitMinimum 80 GB RAM. Start with the Q4_K_M quant.
02
Load the modelSearch qwen3.5-122b-a10b in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • 122B total params with only 10B active — 60% cheaper to run than Qwen3-Max
  • 256K context window
  • Top-tier reasoning, coding and multilingual quality
  • Hybrid thinking mode
  • Strong code generation rivaling specialized code models
  • Apache 2.0 fully commercial

Limitations

  • Requires ~80GB RAM (multi-GPU or Mac Pro/Studio Ultra)
  • MoE loading overhead
  • Files are 65GB+ even quantized
  • Primarily for enthusiasts with serious hardware

Best use cases

  • Maximum quality AI tasks on local hardware
  • Complex multi-step reasoning chains
  • Enterprise-grade code generation
  • Large codebase analysis (256K context)
  • Multilingual professional tasks
  • Research requiring frontier-level quality

Capability profile

speed
4
quality
10
coding
9
reasoning
10

Technical notes

Developer
Alibaba Cloud (Qwen Team)
License
Apache 2.0
Context window
262,144 tokens
Architecture
Mixture of Experts (MoE) — 122B total, 10B active per token. Large-scale sparse MoE with hybrid attention.

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next