What is ZAYA1-8B best for?

ZAYA1-8B is best used for Mathematical reasoning research.

Open-weight MoE

ZAYA1-8B

Q: Can ZAYA1-8B run locally?

ZAYA1-8B can run locally with at least 24 GB RAM. LocalClaw recommends BF16 (Zyphra fork) quantization.

Zyphra's Apache-2.0 reasoning MoE: 8.4B total parameters with only ~760M active, 16 experts, 131K context, Compressed Convolutional Attention and strong math/code benchmarks. Experimental for local use today: currently needs Zyphra vLLM/Transformers forks; LM Studio/GGUF/MLX support is not yet verified.

32 GB power user 24 GB RAM BF16 (Zyphra fork) Mathematical reasoning research

Run with LocalClaw Compare all models

Parameters

8.4B (760M active, MoE)

Minimum RAM

24 GB

Model size

17 GB

Quantization

BF16 (Zyphra fork)

Can ZAYA1-8B run locally?

ZAYA1-8B belongs on 32 GB machines when you want stronger quality without jumping to server hardware.

Search for Zyphra/ZAYA1-8B in LM Studio or another GGUF-compatible runtime.

Zyphra/ZAYA1-8B

chatcodereasoningmathexperimental

Install path

Check RAM fitMinimum 24 GB RAM. Start with the BF16 (Zyphra fork) quant.

Load the modelSearch Zyphra/ZAYA1-8B in LM Studio.

Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

Very high intelligence density: 8.4B total with ~760M active parameters
Strong mathematics, coding and long-form reasoning benchmarks
131K context window
Apache 2.0 license
Designed for test-time-compute workflows such as Markovian RSA

Limitations

Experimental local runtime support today
Currently documented with Zyphra forks of vLLM and Transformers
No verified LM Studio, Ollama, llama.cpp, GGUF or MLX support yet
BF16 weights are too heavy for a clean Mac mini M4 16 GB setup

Best use cases

Mathematical reasoning research
Coding and algorithmic problem solving
Reasoning benchmark experimentation
Server/local lab evaluation with Zyphra runtime forks
Future compact on-device MoE experiments once runtimes catch up

Capability profile

speed

quality

coding

reasoning

Technical notes

Developer

Zyphra

License

Apache 2.0

Context window

131,072 tokens

Architecture

Sparse MoE with Compressed Convolutional Attention (CCA), 16 experts, top-1 MLP router and learned residual scaling

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Comfortable headroomMac mini M4 Pro 48GB Mobile workstationMacBook Pro M4 Max 36GB Power-user picks32GB RAM guide

Similar models to compare

Qwen 3.5 (4B) 4B Qwen 3.6 (6.7B) 6.7B Gemma 4 E4B E4B DeepSeek R1 0528 Distill (8B) 8B

Where to go next

RAM guideFind models for this memory tier HardwareSee computers for local AI LocalClawControl OpenClaw from one native app