RAM tier guide

Best local LLMs for 8GB RAM

A static, Google-indexable guide to the best local AI models that fit in a 8GB RAM budget. Built from the LocalClaw model database and ranked by quality, reasoning, coding and speed.

Run the hardware recommender Browse all models

Compatible models

Best pick

LFM2.5-8B-A1B

RAM tier

8GB

Hardware fit

entry-level laptops, MacBook Air 8GB, compact PCs and everyday local AI experiments

Quick answer

With 8GB RAM, prioritize models with minimum RAM at or below 8GB and avoid filling memory completely. For most users, start with LFM2.5-8B-A1B, then test a faster smaller model if latency matters.

8GB RAM 16GB RAM 32GB RAM 64GB RAM 128GB RAM

Top models for 8GB RAM

LFM2.5-8B-A1B

8.3B (1.5B active) · 8GB min · Q4_K_M · 5.2GB

Liquid AI hybrid model built for on-device assistants. 8.3B total / 1.5B active, 128K context, tool use, GGUF, ONNX, MLX, llama.cpp and LM Studio support. Open-weight under LFM 1.0.

chatcodereasoningspeedstandard

Granite 4.1 (8B)

8B · 8GB min · Q4_K_M · 5GB

IBM Granite 4.1 long-context instruct model. Apache 2.0, 131K context, tool calling, RAG, code tasks, multilingual dialog and business assistant workflows on normal 8-16 GB machines.

chatcodereasoningstandardgeneral

DeepSeek R1 0528 Distill (8B)

8B · 8GB min · Q4_K_M · 5GB

Updated R1 reasoning distilled to Qwen3-8B. Improved chain-of-thought with fewer hallucinations vs original R1 distills. MIT licensed.

reasoningstandard

Qwen 3.6 (6.7B)

6.7B · 8GB min · Q4_K_M · 4.5GB

Alibaba's hybrid-thinking micro-flagship. Toggles between instant answers and deep chain-of-thought reasoning on demand. 128K context, 29 languages, outperforms Qwen3-8B on reasoning benchmarks. Apache 2.0.

chatcodereasoningspeedgeneral

Llama-3.1-Nemotron-Nano (4B)

4B · 6GB min · Q5_K_M · 2.8GB

⭐ Mac Mini M4 16GB top pick! NVIDIA fine-tune of Llama 3.1. Hybrid /think • /no_think mode — deep reasoning on demand, instant chat otherwise. ~80–120 tok/s on Apple Silicon Metal. 128K context. Apache 2.0.

chatlightspeedreasoning

Qwen 3 (8B)

8B · 8GB min · Q5_K_M · 5.5GB

One of the best 8B models ever made. Thinking mode + lightning fast. The new king of 8B.

chatcodestandardgeneralreasoning

Nemotron 3 Nano (4B)

4B · 6GB min · Q5_K_M · 2.8GB

⭐ Mac Mini M4 16GB top pick! NVIDIA's hybrid model — distilled from 9B, keeps 95% of its quality. Hybrid attention + SSM layers = ~80–120 tok/s on Apple Silicon. Blazing fast, minimal RAM. NVIDIA Open Model License.

chatlightspeedreasoning

Qwen 3.5 (9B)

9B · 8GB min · Q4_K_M · 6GB

The best small Qwen 3.5 for everyday use. Strong reasoning, coding and chat at 9B scale with hybrid thinking mode and 256K context. Runs on 8-16 GB RAM. Great for Mac Mini M4 Pro. Apache 2.0.

chatcodereasoninggeneral

Granite 3.3 (8B Instruct)

8B · 8GB min · Q5_K_M · 4.9GB

IBM enterprise-grade 8B. Trained for RAG, tool-use and structured output. Strong function calling and long-context performance (128K). Apache 2.0 with full data provenance.

chatcodestandardgeneralreasoning

#10

Gemma 4 E4B

E4B · 8GB min · Q4_K_M · 4.6GB

Gemma 4 balanced edge model with strong multimodal quality and 256K context. Great for laptops and high-end mobile devices. Apache 2.0.

chatvisionstandardmultimodalreasoning

#11

DANTE-Mosaic-3.5B

3.08B · 8GB min · BF16 · 6.2GB

OdaxAI compact dense model based on SmolLM3-3B and distilled from Kimi K2. Strong small-model benchmark profile: GSM8K 74.45, HellaSwag 76.73 and MBPP 42.6. Apache 2.0, BF16 weights, practical for local Transformers/vLLM use.

chatreasoningcodelightmultilingual

#12

Qwen 3 Coder (8B)

8B · 8GB min · Q4_K_M · 5GB

Qwen coding specialist with long context. Great for agentic coding tasks. 477K downloads.

codestandard

#13

Gemma 3n (8B)

8B · 8GB min · Q4_K_M · 5GB

Google on-device powerhouse with vision. Designed for phones/tablets/laptops but punches far above its weight. Per-layer memory management for constrained devices. Apache 2.0.

chatvisionstandardgeneral

#14

DeepSeek R1 Distill (8B)

8B · 8GB min · Q5_K_M · 5.5GB

DeepSeek's reasoning model distilled to 8B. Shows its thought process step-by-step. Mind-blowing for logic.

chatreasoningstandard

#15

Cogito (8B)

8B · 8GB min · Q4_K_M · 5GB

Hybrid reasoning model outperforming peers. Strong general + reasoning at 8B. 558K downloads.

chatreasoningstandardgeneral

#16

EXAONE Deep (7.8B)

7.8B · 8GB min · Q4_K_M · 4.7GB

LG AI Research reasoning model. Strong at math and coding reasoning. 200K downloads.

reasoningstandard

#17

Qwen 2.5 (7B)

7B · 8GB min · Q4_K_M · 4.5GB

Alibaba's 18T token trained model. Excellent multilingual and coding. 14.9M downloads. Wide community support.

chatcodestandardgeneral

#18

InternVL3 (8B)

8B · 8GB min · Q4_K_M · 5GB

Shanghai AI Lab multimodal model. Strong vision understanding for documents, charts, and photos. MIT licensed. Note: primarily PyTorch/safetensors — community GGUF may vary.

visionstandard

How to choose at 8GB

Use Q4_K_M or Q5_K_M quantization for the best quality/speed balance.
Leave memory headroom for macOS/Windows, browser tabs and LM Studio overhead.
For coding, prioritize coding and reasoning scores. For chat, quality and speed matter more.
If a model feels slow, drop one size tier before changing hardware.