Gemma

Google's family of lightweight open models for local and on-device AI

★★★★☆ Open Source 🔬 Research & Science
Gemma is Google DeepMind's family of open-weights language models, designed to run efficiently on laptops, workstations, and edge devices without requiring cloud infrastructure. The lineup began with 2B and 7B parameter models (Gemma 1), followed by Gemma 2 in 2B, 9B, and 27B variants that significantly improved benchmark performance. The models are released under Google's Gemma Terms of Use, which permit commercial use subject to a prohibited-use policy.

The models run on consumer hardware: the 2B model fits in under 4GB of RAM and runs on an M1 MacBook without GPU acceleration. Gemma models are available via HuggingFace, Ollama, Google AI Studio, and Vertex AI, and they integrate cleanly with popular inference frameworks such as vLLM and llama.cpp. The family also includes specialized variants: CodeGemma for code generation, PaliGemma for vision-language tasks, and RecurrentGemma for faster inference on long sequences.

For developers who want a capable, locally hostable model and find the commercial-use restrictions in Meta's Llama license limiting, Gemma is a practical default choice.
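The RAM figures above can be sanity-checked with back-of-envelope arithmetic: weights-only memory is roughly parameter count × bits per parameter ÷ 8. A minimal sketch follows; the parameter counts and quantization bit-widths are illustrative assumptions, not official figures, and real usage adds KV-cache and runtime overhead on top of the weights.

```python
# Rough weights-only memory estimate for local inference.
# Parameter counts below are approximate assumptions for illustration.
GIB = 1024 ** 3

def weights_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB

# Assumed parameter counts (approximate)
models = {
    "gemma-2b": 2.5e9,
    "gemma-7b": 8.5e9,
    "gemma-2-27b": 27.2e9,
}

for name, n in models.items():
    fp16 = weights_gib(n, 16)   # full half-precision weights
    q4 = weights_gib(n, 4.5)    # ~4-bit quantization (e.g. llama.cpp Q4_K_M)
    print(f"{name}: fp16 ~{fp16:.1f} GiB, 4-bit ~{q4:.1f} GiB")
```

Under these assumptions the 2B model needs well under 2 GiB for 4-bit weights, which is why it comfortably fits the "under 4GB of RAM" claim once quantized, while the 27B variant still wants a 16 GiB-class machine even at 4 bits.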

What the community says

Gemma 2 earned respect from the open-source AI community for competitive performance at its size class, particularly the 27B model, which performs close to larger proprietary models on many benchmarks. Developers appreciate the clean HuggingFace integration and Google-backed maintenance. Some users still prefer Llama 3 for its larger ecosystem and wider range of fine-tuned variants, and note that Gemma's terms of use carry usage restrictions in its prohibited-use policy that Llama's license handles differently.


Gemma Pricing Plans

Open Weights
Free
  • All model sizes
  • Commercial use (with terms)
  • HuggingFace download
  • Ollama support

