Small Language Models for On-Device and Private Intelligence

Mini T V

Sacred Heart College (Autonomous), Chalakudy, Kerala, India

Keywords:

Small Language Models, On-Device AI, Knowledge Distillation, Quantisation, GPTQ, AWQ, Edge AI, Private Inference

Abstract

While public discussion of language models has been dominated by ever-larger frontier systems, a parallel line of research is delivering capable models in the roughly one-to-eight-billion-parameter regime that run on phones, laptops, and embedded devices. Models such as Microsoft Phi, Google Gemma, Meta Llama 3.2, Apple OpenELM, and Alibaba Qwen-1.5B demonstrate that careful data curation, knowledge distillation, and aggressive post-training quantisation can produce on-device models that are competitive with much larger ones on common reasoning and instruction-following benchmarks. This paper surveys the small-language-model (SLM) landscape; the algorithmic techniques that make it possible, namely distillation, pruning, quantisation, and architectural innovation; and the use cases, deployment patterns, and open challenges of running language models privately on user devices.
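The following back-of-envelope calculation shows why post-training quantisation matters for on-device deployment. It estimates the memory needed only to store model weights at different bit widths. It is an illustration under simplifying assumptions, not a measurement. `weight_memory_gb` is a hypothetical helper. It ignores the KV cache, activations, the per-group scale and zero-point metadata that schemes such as GPTQ or AWQ add, and runtime overhead.

```python
# Hypothetical helper for illustration: estimates weight-only storage.
# Assumption: ignores KV cache, activations, quantisation scales/zero-points,
# and runtime overhead, so real footprints are somewhat larger.
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare 16-bit, 8-bit, and 4-bit storage for a few SLM sizes.
    for params in (1e9, 3e9, 8e9):
        for bits in (16, 8, 4):
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: "
                  f"{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, an 8B-parameter model drops from about 16 GB of weights at 16-bit precision to about 4 GB at 4-bit. A 3B model falls from about 6 GB to about 1.5 GB, which brings it within reach of the memory available on many current phones and laptops.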