What Is Gemma 4?
Announced in April 2026, Gemma 4 is Google DeepMind's fourth generation of open-weights language models. Building on Gemma 3 and the PaliGemma vision experiments, Gemma 4 fully unifies language and vision into a single family — every model natively processes both text and images.
Architecturally, Google doubled down on two innovations from Gemma 3: interleaved local/global attention (enabling the 128K context at manageable memory cost) and grouped-query attention (GQA) for faster inference. The two smaller models (E2B, E4B) additionally use a Mixture-of-Experts (MoE) design inspired by Gemini Flash.
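To make the interleaved local/global idea concrete, here is a minimal sketch of how such a layer schedule might look. The 5:1 local-to-global ratio is an assumption borrowed from Gemma 3; Gemma 4's exact ratio isn't stated here.

```python
# Sketch of an interleaved local/global attention schedule.
# ASSUMPTION: a 5:1 local-to-global ratio, as in Gemma 3; Gemma 4's
# actual ratio may differ.
def attention_schedule(num_layers: int, locals_per_global: int = 5) -> list[str]:
    """Return 'local' or 'global' for each layer index.

    Every (locals_per_global + 1)-th layer attends globally; the rest
    use sliding-window (local) attention, which keeps the KV cache
    small and makes a 128K context affordable in memory.
    """
    return [
        "global" if (i + 1) % (locals_per_global + 1) == 0 else "local"
        for i in range(num_layers)
    ]

schedule = attention_schedule(12)
# -> layers 6 and 12 (1-indexed) are global, the other ten are local
```

The memory win comes from the local layers: their KV cache is bounded by the window size rather than the full 128K sequence.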
Architecture: Two Design Paradigms
The Gemma 4 family is split between MoE-based "E-series" models and denser larger models:
- E-series (MoE): ~16B / ~30B total params, only 2B / 4B active per token. Same knowledge, a fraction of the compute. Runs on any 8GB device.
- Larger models: full dense or hybrid-dense designs. The 26B uses selective MoE layers; the 31B is fully dense. Maximum quality for workstation-class hardware.
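The "fraction of the compute" claim is easy to make concrete. A quick sketch using the approximate parameter counts quoted above:

```python
# Per-token compute scales with ACTIVE parameters, not total.
# Figures below are the approximate counts quoted for the E-series.
MODELS = {
    "Gemma4-E2B": {"total_b": 16, "active_b": 2},
    "Gemma4-E4B": {"total_b": 30, "active_b": 4},
}

def active_fraction(name: str) -> float:
    m = MODELS[name]
    return m["active_b"] / m["total_b"]

for name in MODELS:
    print(f"{name}: ~{active_fraction(name):.1%} of weights active per token")
# Gemma4-E2B: ~12.5%, Gemma4-E4B: ~13.3%
```

Note that the full weight set still has to fit somewhere (disk, and ideally RAM), so total parameters drive storage while active parameters drive per-token speed.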
The Gemma 4 Lineup — All 4 Models
Every Gemma 4 model shares the same SigLIP 2 vision encoder and 128K context window. Here's how they differ:
- Gemma4-E2B: ~16B total · 2B active · ~2.5GB VRAM Q4
- Gemma4-E4B: ~30B total · 4B active · ~4.8GB VRAM Q4
- Gemma4-26B-A4B: 26B total · 4B active · ~15GB VRAM Q4
- Gemma4-31B: 31B dense · ~22GB VRAM Q4
Hardware Requirements — What Can You Run?
The table below lists requirements for each Gemma 4 model at Q4_K_M quantization:
| Model | Active Params | VRAM Q4 | Recommended Hardware | Vision |
|---|---|---|---|---|
| Gemma4-E2B | 2B (16B total) | ~2.5 GB | Any device, CPU-only, phones, Mac Mini M4 16GB ✅ | ✓ Native |
| Gemma4-E4B | 4B (30B total) | ~4.8 GB | M1/M2 8GB, RTX 3060 8GB, Mac Mini M4 16GB ✅ | ✓ Native |
| Gemma4-26B-A4B | 4B (26B total) | ~15 GB | Mac Studio M4 Max 32GB, RTX 3090 | ✓ Native |
| Gemma4-31B | 31B (dense) | ~22 GB | RTX 4090 24GB, M3 Ultra, Mac Pro | ✓ Native |
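A small helper makes the table actionable: given your VRAM (or unified memory) budget, which models fit? The figures are copied straight from the table above; the 1 GB headroom for KV cache and runtime overhead is a rough assumption, not a measured value.

```python
# Which Gemma 4 models fit a given memory budget, using the Q4_K_M
# VRAM figures from the table above (estimates, not measurements).
VRAM_Q4_GB = {
    "Gemma4-E2B": 2.5,
    "Gemma4-E4B": 4.8,
    "Gemma4-26B-A4B": 15.0,
    "Gemma4-31B": 22.0,
}

def models_that_fit(vram_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """ASSUMPTION: reserve ~1 GB for KV cache and runtime overhead."""
    return [m for m, need in VRAM_Q4_GB.items() if need + headroom_gb <= vram_gb]

print(models_that_fit(8))   # ['Gemma4-E2B', 'Gemma4-E4B']
print(models_that_fit(24))  # all four models
```

Long contexts eat additional memory for the KV cache, so treat the headroom as a floor, not a ceiling, if you plan to use the full 128K window.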
💡 Mac & GPU Quick Guide
- MacBook Air / Pro M1-M2 8GB → Gemma4-E2B or E4B ✅ Both run perfectly with vision
- Mac Mini M4 16GB → Gemma4-E2B & Gemma4-E4B ✅ Both are top picks — E2B at full speed, E4B at 70–90 tok/s with full vision
- Mac Mini M4 Pro 24GB → Gemma4-E4B Q5_K_M or Q8_0 ✅ Maximum E4B quality, very comfortable
- Mac Studio M4 Max 32GB → Gemma4-26B-A4B ✅ the sweet spot for power users
- RTX 4090 / M3 Ultra → Gemma4-31B ✅ flagship quality locally
- CPU-only PC → E2B runs at 3–8 tok/s on modern x86 — perfectly usable
Native Vision — What Can It Actually Do?
Unlike previous Gemma generations that relied on separate PaliGemma checkpoints, Gemma 4 integrates vision natively via SigLIP 2. Images are processed at up to 896×896px per tile, with up to 16 tiles per prompt.
Practical use cases out of the box:
- Feed a scanned PDF page and ask questions — no OCR layer needed. Works natively on all 4 models.
- Drop a screenshot of an error; Gemma 4 reads the code and identifies the bug. Works on E4B and up.
- Describe trends from data visualisations — useful for business reports and research summaries.
- Compare two images side-by-side in the same prompt. 16-tile support = up to ~3584×3584px effective resolution.
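The tile budget above maps to image size in a predictable way. Here is a sketch of the arithmetic, assuming a naive grid tiling; the real preprocessor may resize or pack tiles differently, so treat this as an illustration of the budget, not the exact algorithm.

```python
import math

TILE_PX = 896    # per-tile resolution quoted above
MAX_TILES = 16   # per-prompt tile budget

def tiles_needed(width: int, height: int) -> int:
    """Naive grid tiling: one 896px tile per covered cell, capped at 16.

    ASSUMPTION: the actual preprocessing pipeline may differ; this only
    shows how the 16-tile budget relates to image dimensions.
    """
    n = math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)
    return min(n, MAX_TILES)

tiles_needed(1920, 1080)   # 3 x 2 grid -> 6 tiles
tiles_needed(3584, 3584)   # 4 x 4 grid -> exactly the 16-tile cap
```

This is where the ~3584×3584px figure comes from: 4 × 896 = 3584 on each axis exhausts the 16-tile budget.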
How to Run Gemma 4 in LM Studio
- Open LM Studio 0.3.8+ (download at lmstudio.ai)
- Click the Search tab (🔍)
- Type `gemma4-e4b-instruct` (or your chosen model)
- Select the Q4_K_M quantization for the best quality/size balance
- Click Download, then load in the Chat tab
- To use vision: click the 📎 attachment icon to add images to your prompt
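Beyond the chat UI, LM Studio can also serve loaded models over an OpenAI-compatible local API (Developer tab, default `http://localhost:1234/v1`). A minimal sketch of building a chat request for it; the model identifier `gemma4-e4b-instruct` is an assumption — copy the exact name LM Studio shows after download.

```python
import json

# Build an OpenAI-compatible chat request for LM Studio's local server.
# ASSUMPTION: the model id "gemma4-e4b-instruct" matches what LM Studio
# displays for your downloaded model.
def build_chat_request(prompt: str, model: str = "gemma4-e4b-instruct") -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize this README in 3 bullets.")
# POST `body` to http://localhost:1234/v1/chat/completions
```

Because the endpoint mirrors the OpenAI schema, most existing OpenAI client libraries work by just pointing their base URL at `localhost:1234/v1`.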
How to Run Gemma 4 with Ollama

```
ollama pull gemma4:e4b
ollama pull gemma4:26b-a4b
ollama pull gemma4:31b
```
Requires Ollama 0.5.3+ for vision support. For multimodal queries, include the image's file path in your prompt, e.g. `ollama run gemma4:e4b "Describe this image: ./screenshot.png"`.
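For scripting, Ollama's REST API (`http://localhost:11434/api/generate`) accepts base64-encoded images alongside the prompt. A minimal payload builder; the model tag `gemma4:e4b` is taken from the pull commands above.

```python
import base64
import json

# Build a vision request for Ollama's /api/generate endpoint.
# The "images" field takes a list of base64-encoded image bytes.
def vision_payload(prompt: str, image_bytes: bytes, model: str = "gemma4:e4b") -> str:
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    })

with_image = vision_payload(
    "What error does this screenshot show?",
    open("screenshot.png", "rb").read() if False else b"<raw image bytes>",
)
# POST `with_image` to http://localhost:11434/api/generate
```

Setting `"stream": False` returns one complete JSON response instead of a token-by-token stream, which is simpler for scripts.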
⚠️ Community GGUF Availability
At launch, official GGUFs are available for E2B and E4B from Bartowski and LM Studio's team. The 26B-A4B and 31B GGUFs are community-contributed. Always verify the sha256 hash before loading. Use Q4_K_M for the best quality/size trade-off.
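The hash check takes a few lines of stdlib Python. Stream the file in chunks so a multi-gigabyte GGUF never has to fit in RAM:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the hash published on the model's download page:
# assert sha256_of("gemma4-e4b-instruct-Q4_K_M.gguf") == expected_hash
```

If the hash doesn't match, re-download before loading — a truncated or tampered GGUF can fail in confusing ways at load time.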
Gemma 4 vs. The Competition
| Model | Active Params | MMLU | HumanEval | Context | Vision | License |
|---|---|---|---|---|---|---|
| Gemma4-E4B | 4B active | 79.3 | 75.8 | 128K | ✓ | Gemma ToU |
| Qwen3-8B | 8B | 77.4 | 74.2 | 128K | ✗ | Apache 2.0 |
| Llama 4 Scout | 17B active | 76.8 | 71.0 | 10M | ✓ | Llama 4 ToU |
| Gemma4-31B | 31B dense | 91.2 | 90.4 | 128K | ✓ | Gemma ToU |
| Qwen3-32B | 32B | 90.1 | 88.5 | 128K | ✗ | Apache 2.0 |
Verdict — Which Gemma 4 Should You Download?
- MacBook Air / any 8GB device → Gemma4-E4B. The everyday workhorse. Runs on hardware you already own, handles images, delivers quality unthinkable at the 4B class a year ago.
- Mac Mini M4 16GB → Gemma4-E2B & Gemma4-E4B. Both are top picks for this machine. E2B blazes at full speed, E4B runs at 70–90 tok/s with complete multimodal vision. You can even load both in LM Studio and switch between them.
- You want maximum vision quality on a laptop → Gemma4-E4B. Q4_K_M on M1/M2 8GB, with full multimodal support at 75+ tok/s.
- Mac Studio 32GB / RTX 3090 → Gemma4-26B-A4B. Same VRAM footprint as E4B but draws on a 26B knowledge base. Best capability-per-watt on mid-tier workstations.
- RTX 4090 / M3 Ultra → Gemma4-31B. Near-GPT-4o quality with complete privacy. The most capable open-weights model for multimodal tasks as of April 2026.
- Edge / embedded → Gemma4-E2B. Full vision capability in under 3GB RAM. Runs on a Raspberry Pi 5 or phone SoC.
🦀 Find Your Perfect Gemma 4 Model
Not sure which Gemma 4 to pick? Use LocalClaw's model finder — enter your RAM and get a personalized recommendation in 30 seconds.