Open-weight local LLM

DANTE-Mosaic-3.5B

OdaxAI compact dense model based on SmolLM3-3B and distilled from Kimi K2. Strong small-model benchmark profile: GSM8K 74.45, HellaSwag 76.73 and MBPP 42.6. Apache 2.0, BF16 weights, practical for local Transformers/vLLM use.

Laptop ready 8 GB RAM BF16 Local chat on 8GB+ machines
Parameters
3.08B
Minimum RAM
8 GB
Model size
6.2 GB
Quantization
BF16

Can DANTE-Mosaic-3.5B run locally?

DANTE-Mosaic-3.5B is a good fit for normal laptops and compact desktops with 8 GB RAM or more.

Search for OdaxAI/DANTE-Mosaic-3.5B in LM Studio or another GGUF-compatible runtime.

chatreasoningcodelightmultilingual

Install path

01
Check RAM fitMinimum 8 GB RAM. Start with the BF16 quant.
02
Load the modelSearch OdaxAI/DANTE-Mosaic-3.5B in LM Studio.
03
Control locallyUse LocalClaw to manage models, agents, chat, channels and scheduled OpenClaw work.

Strengths

  • Compact 3.08B dense model that can run on modest local hardware
  • Apache 2.0 license with open weights, scripts, configs and evaluation assets
  • Distilled from Kimi K2 while retaining a practical small-model footprint
  • Strong reported small-model results: 74.45 GSM8K, 76.73 HellaSwag and 42.6 MBPP
  • Runs from standard Hugging Face Transformers and can be served locally with vLLM/SGLang-style stacks
  • Good candidate for laptop-friendly reasoning and coding experiments

Limitations

  • No official GGUF quantization in the main repository at listing time
  • BF16 weights are larger than a 3B Q4 GGUF would be
  • Not a frontier model; quality is bounded by small dense-model capacity
  • Context window is not clearly documented in the model card

Best use cases

  • Local chat on 8GB+ machines
  • Small-model reasoning experiments
  • Light coding help and MBPP-style programming tasks
  • Research on knowledge distillation from large MoE teachers
  • Multilingual assistants with a small memory footprint

Capability profile

speed
8
quality
7
coding
6
reasoning
7

Technical notes

Developer
OdaxAI
License
Apache 2.0
Context window
Unknown tokens
Architecture
Dense SmolLM3 causal language model fine-tune with 3.08B parameters. Distilled from Kimi K2 using generative cross-architecture / cross-tokenizer distillation.

This model fits these next steps

Hardware fit is based on LocalClaw's RAM tier, model size and quantization metadata. Always leave memory headroom for your OS and runtime.

Similar models to compare

Where to go next