What Is Gemma 4?
Announced in April 2026, Gemma 4 is Google DeepMind's fourth generation of open-weights language models. Building on Gemma 3 and the PaliGemma vision experiments, Gemma 4 fully unifies language and vision into a single family — every model natively processes both text and images.
Architecturally, Google doubled down on two innovations from Gemma 3: interleaved local/global attention (enabling the 128K context at manageable memory cost) and grouped-query attention (GQA) for faster inference. The two smaller models (E2B, E4B) additionally use a Mixture-of-Experts (MoE) design inspired by Gemini Flash.
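To make the interleaved local/global idea concrete, here is a minimal sketch of how such a layer schedule might look. The 5:1 local-to-global ratio is an assumption borrowed from Gemma 3; Gemma 4's exact ratio isn't stated here.

```python
# Sketch of an interleaved local/global attention schedule.
# ASSUMPTION: a 5:1 local-to-global ratio, as in Gemma 3; Gemma 4's
# actual ratio may differ.
def attention_schedule(num_layers: int, locals_per_global: int = 5) -> list[str]:
    """Return 'local' or 'global' for each layer index.

    Every (locals_per_global + 1)-th layer attends globally; the rest
    use sliding-window (local) attention, which keeps the KV cache
    small and makes a 128K context affordable in memory.
    """
    return [
        "global" if (i + 1) % (locals_per_global + 1) == 0 else "local"
        for i in range(num_layers)
    ]

schedule = attention_schedule(12)
# -> layers 6 and 12 (1-indexed) are global, the other ten are local
```

The memory win comes from the local layers: their KV cache is bounded by the window size rather than the full 128K sequence.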
Architecture: Two Design Paradigms
The Gemma 4 family is split between MoE-based "E-series" models and denser larger models:
- E-series (MoE): ~16B / ~30B total params, only 2B / 4B active per token. Same knowledge, a fraction of the compute. Runs on any 8GB device.
- Larger models: full dense or hybrid-dense designs. The 26B uses selective MoE layers; the 31B is fully dense. Maximum quality for workstation-class hardware.
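The "fraction of the compute" claim is easy to make concrete. A quick sketch using the approximate parameter counts quoted above:

```python
# Per-token compute scales with ACTIVE parameters, not total.
# Figures below are the approximate counts quoted for the E-series.
MODELS = {
    "Gemma4-E2B": {"total_b": 16, "active_b": 2},
    "Gemma4-E4B": {"total_b": 30, "active_b": 4},
}

def active_fraction(name: str) -> float:
    m = MODELS[name]
    return m["active_b"] / m["total_b"]

for name in MODELS:
    print(f"{name}: ~{active_fraction(name):.1%} of weights active per token")
# Gemma4-E2B: ~12.5%, Gemma4-E4B: ~13.3%
```

Note that the full weight set still has to fit somewhere (disk, and ideally RAM), so total parameters drive storage while active parameters drive per-token speed.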
The Gemma 4 Lineup — All 4 Models
Every Gemma 4 model shares the same SigLIP 2 vision encoder and 128K context window. Here's how they differ:
- Gemma4-E2B: ~16B total · 2B active · ~2.5GB VRAM Q4
- Gemma4-E4B: ~30B total · 4B active · ~4.8GB VRAM Q4
- Gemma4-26B-A4B: 26B total · 4B active · ~15GB VRAM Q4
- Gemma4-31B: 31B dense · ~22GB VRAM Q4
Hardware Requirements — What Can You Run?
The table below lists requirements for each Gemma 4 model at Q4_K_M quantization:
| Model | Active Params | VRAM Q4 | Recommended Hardware | Vision |
|---|---|---|---|---|
| Gemma4-E2B | 2B (16B total) | ~2.5 GB | Any device, CPU-only, phones, Mac Mini M4 16GB ✅ | ✓ Native |
| Gemma4-E4B | 4B (30B total) | ~4.8 GB | M1/M2 8GB, RTX 3060 8GB, Mac Mini M4 16GB ✅ | ✓ Native |
| Gemma4-26B-A4B | 4B (26B total) | ~15 GB | Mac Studio M4 Max 32GB, RTX 3090 | ✓ Native |
| Gemma4-31B | 31B (dense) | ~22 GB | RTX 4090 24GB, M3 Ultra, Mac Pro | ✓ Native |
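A small helper makes the table actionable: given your VRAM (or unified memory) budget, which models fit? The figures are copied straight from the table above; the 1 GB headroom for KV cache and runtime overhead is a rough assumption, not a measured value.

```python
# Which Gemma 4 models fit a given memory budget, using the Q4_K_M
# VRAM figures from the table above (estimates, not measurements).
VRAM_Q4_GB = {
    "Gemma4-E2B": 2.5,
    "Gemma4-E4B": 4.8,
    "Gemma4-26B-A4B": 15.0,
    "Gemma4-31B": 22.0,
}

def models_that_fit(vram_gb: float, headroom_gb: float = 1.0) -> list[str]:
    """ASSUMPTION: reserve ~1 GB for KV cache and runtime overhead."""
    return [m for m, need in VRAM_Q4_GB.items() if need + headroom_gb <= vram_gb]

print(models_that_fit(8))   # ['Gemma4-E2B', 'Gemma4-E4B']
print(models_that_fit(24))  # all four models
```

Long contexts eat additional memory for the KV cache, so treat the headroom as a floor, not a ceiling, if you plan to use the full 128K window.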
💡 Mac & GPU Quick Guide
- MacBook Air / Pro M1-M2 8GB → Gemma4-E2B or E4B ✅ Both run perfectly with vision
- Mac Mini M4 16GB → Gemma4-E2B & Gemma4-E4B ✅ Both are top picks — E2B at full speed, E4B at 70–90 tok/s with full vision
- Mac Mini M4 Pro 24GB → Gemma4-E4B Q5_K_M or Q8_0 ✅ Maximum E4B quality, very comfortable
- Mac Studio M4 Max 32GB → Gemma4-26B-A4B ✅ the sweet spot for power users
- RTX 4090 / M3 Ultra → Gemma4-31B ✅ flagship quality locally
- CPU-only PC → E2B runs at 3–8 tok/s on modern x86 — perfectly usable
Native Vision — What Can It Actually Do?
Unlike previous Gemma generations that relied on separate PaliGemma checkpoints, Gemma 4 integrates vision natively via SigLIP 2. Images are processed at up to 896×896px per tile, with up to 16 tiles per prompt.
Practical use cases out of the box:
- Feed a scanned PDF page and ask questions — no OCR layer needed. Works natively on all 4 models.
- Drop a screenshot of an error; Gemma 4 reads the code and identifies the bug. Works on E4B and up.
- Describe trends from data visualisations — useful for business reports and research summaries.
- Compare two images side-by-side in the same prompt. 16-tile support = up to ~3584×3584px effective resolution.
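The tile budget above maps to image size in a predictable way. Here is a sketch of the arithmetic, assuming a naive grid tiling; the real preprocessor may resize or pack tiles differently, so treat this as an illustration of the budget, not the exact algorithm.

```python
import math

TILE_PX = 896    # per-tile resolution quoted above
MAX_TILES = 16   # per-prompt tile budget

def tiles_needed(width: int, height: int) -> int:
    """Naive grid tiling: one 896px tile per covered cell, capped at 16.

    ASSUMPTION: the actual preprocessing pipeline may differ; this only
    shows how the 16-tile budget relates to image dimensions.
    """
    n = math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)
    return min(n, MAX_TILES)

tiles_needed(1920, 1080)   # 3 x 2 grid -> 6 tiles
tiles_needed(3584, 3584)   # 4 x 4 grid -> exactly the 16-tile cap
```

This is where the ~3584×3584px figure comes from: 4 × 896 = 3584 on each axis exhausts the 16-tile budget.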
How to Run Gemma 4 in LM Studio
- Open LM Studio 0.3.8+ (download at lmstudio.ai)
- Click the Search tab (🔍)
- Type `gemma4-e4b-instruct` (or your chosen model)
- Select the Q4_K_M quantization for the best quality/size balance
- Click Download, then load in the Chat tab
- To use vision: click the 📎 attachment icon to add images to your prompt
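Beyond the chat UI, LM Studio can also serve loaded models over an OpenAI-compatible local API (Developer tab, default `http://localhost:1234/v1`). A minimal sketch of building a chat request for it; the model identifier `gemma4-e4b-instruct` is an assumption — copy the exact name LM Studio shows after download.

```python
import json

# Build an OpenAI-compatible chat request for LM Studio's local server.
# ASSUMPTION: the model id "gemma4-e4b-instruct" matches what LM Studio
# displays for your downloaded model.
def build_chat_request(prompt: str, model: str = "gemma4-e4b-instruct") -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize this README in 3 bullets.")
# POST `body` to http://localhost:1234/v1/chat/completions
```

Because the endpoint mirrors the OpenAI schema, most existing OpenAI client libraries work by just pointing their base URL at `localhost:1234/v1`.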
How to Run Gemma 4 with Ollama

```
ollama pull gemma4:e4b
ollama pull gemma4:26b-a4b
ollama pull gemma4:31b
```
Requires Ollama 0.5.3+ for vision support. For multimodal queries, include the image's file path in your prompt, e.g. `ollama run gemma4:e4b "Describe this image: ./screenshot.png"`.
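For scripting, Ollama's REST API (`http://localhost:11434/api/generate`) accepts base64-encoded images alongside the prompt. A minimal payload builder; the model tag `gemma4:e4b` is taken from the pull commands above.

```python
import base64
import json

# Build a vision request for Ollama's /api/generate endpoint.
# The "images" field takes a list of base64-encoded image bytes.
def vision_payload(prompt: str, image_bytes: bytes, model: str = "gemma4:e4b") -> str:
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    })

with_image = vision_payload(
    "What error does this screenshot show?",
    open("screenshot.png", "rb").read() if False else b"<raw image bytes>",
)
# POST `with_image` to http://localhost:11434/api/generate
```

Setting `"stream": False` returns one complete JSON response instead of a token-by-token stream, which is simpler for scripts.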
⚠️ Community GGUF Availability
At launch, official GGUFs are available for E2B and E4B from Bartowski and LM Studio's team. The 26B-A4B and 31B GGUFs are community-contributed. Always verify the sha256 hash before loading. Use Q4_K_M for the best quality/size trade-off.
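The hash check takes a few lines of stdlib Python. Stream the file in chunks so a multi-gigabyte GGUF never has to fit in RAM:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the hash published on the model's download page:
# assert sha256_of("gemma4-e4b-instruct-Q4_K_M.gguf") == expected_hash
```

If the hash doesn't match, re-download before loading — a truncated or tampered GGUF can fail in confusing ways at load time.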
Gemma 4 vs. The Competition
| Model | Active Params | MMLU | HumanEval | Context | Vision | License |
|---|---|---|---|---|---|---|
| Gemma4-E4B | 4B active | 79.3 | 75.8 | 128K | ✓ | Gemma ToU |
| Qwen3-8B | 8B | 77.4 | 74.2 | 128K | ✗ | Apache 2.0 |
| Llama 4 Scout | 17B active | 76.8 | 71.0 | 10M | ✓ | Llama 4 ToU |
| Gemma4-31B | 31B dense | 91.2 | 90.4 | 128K | ✓ | Gemma ToU |
| Qwen3-32B | 32B | 90.1 | 88.5 | 128K | ✗ | Apache 2.0 |
Verdict — Which Gemma 4 Should You Download?
- MacBook Air / any 8GB device → Gemma4-E4B. The everyday workhorse. Runs on hardware you already own, handles images, delivers quality unthinkable at the 4B class a year ago.
- Mac Mini M4 16GB → Gemma4-E2B & Gemma4-E4B. Both are top picks for this machine. E2B blazes at full speed, E4B runs at 70–90 tok/s with complete multimodal vision. You can even load both in LM Studio and switch between them.
- You want maximum vision quality on a laptop → Gemma4-E4B. Q4_K_M on M1/M2 8GB, with full multimodal support at 75+ tok/s.
- Mac Studio 32GB / RTX 3090 → Gemma4-26B-A4B. Same VRAM footprint as E4B but draws on a 26B knowledge base. Best capability-per-watt on mid-tier workstations.
- RTX 4090 / M3 Ultra → Gemma4-31B. Near-GPT-4o quality with complete privacy. The most capable open-weights model for multimodal tasks as of April 2026.
- Edge / embedded → Gemma4-E2B. Full vision capability in under 3GB RAM. Runs on a Raspberry Pi 5 or phone SoC.
🦀 Find Your Perfect Gemma 4 Model
Not sure which Gemma 4 to pick? Use LocalClaw's model finder — enter your RAM and get a personalized recommendation in 30 seconds.