Open-weight MoE
ZAYA1-8B
Zyphra's Apache-2.0 reasoning MoE: 8.4B total parameters with only ~760M active, 16 experts, 131K context, Compressed Convolutional Attention and strong math/code benchmarks. Experimental for local use today: currently needs Zyphra vLLM/Transformers forks; LM Studio/GGUF/MLX support is not yet verified.
32 GB power user
24 GB RAM
BF16 (Zyphra fork)
Mathematical reasoning research
Parameters
8.4B (760M active, MoE)
Minimum RAM
24 GB
Model size
17 GB
Quantization
BF16 (Zyphra fork)
Can ZAYA1-8B run locally?
ZAYA1-8B belongs on 32 GB machines when you want stronger quality without jumping to server hardware.
Search for Zyphra/ZAYA1-8B in LM Studio or another GGUF-compatible runtime.
Zyphra/ZAYA1-8Bchatcodereasoningmathexperimental
Install path
01
Check RAM fitMinimum 24 GB RAM. Start with the BF16 (Zyphra fork) quant.02
Load the modelSearch Zyphra/ZAYA1-8B in LM Studio.03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.Strengths
- Very high intelligence density: 8.4B total with ~760M active parameters
- Strong mathematics, coding and long-form reasoning benchmarks
- 131K context window
- Apache 2.0 license
- Designed for test-time-compute workflows such as Markovian RSA
Limitations
- Experimental local runtime support today
- Currently documented with Zyphra forks of vLLM and Transformers
- No verified LM Studio, Ollama, llama.cpp, GGUF or MLX support yet
- BF16 weights are too heavy for a clean Mac mini M4 16 GB setup
Best use cases
- Mathematical reasoning research
- Coding and algorithmic problem solving
- Reasoning benchmark experimentation
- Server/local lab evaluation with Zyphra runtime forks
- Future compact on-device MoE experiments once runtimes catch up
Capability profile
Technical notes
This model fits these next steps
Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.