Analyzing Quantized Small Language Models for Efficient Edge Deployment 


Vol. 50, No. 9, pp. 1364-1380, Sep. 2025
DOI: 10.7840/kics.2025.50.9.1364


Abstract

Quantized small language models (SLMs) offer a promising approach for deploying advanced natural language processing (NLP) services on resource-constrained edge devices. However, an in-depth examination of how different quantization configurations influence accuracy and efficiency remains underexplored. This paper systematically evaluates 72 quantized variants of Llama 3.2 (1B and 3B parameters) and Qwen 2.5 (1.5B and 3B parameters) across 13 quantization configurations, ranging from q2_K to q6_K. We use the MMLU-Pro benchmark to measure accuracy (both including and excluding random guesses), inference time, resource utilization, and power consumption on an NVIDIA Jetson Orin Nano. Our findings reveal that low-bit quantized models often rely heavily on random guessing, with modest accuracy improvements observed when such guesses are excluded. Furthermore, Qwen 2.5 models generally yield superior accuracy and lower latency than Llama 3.2, albeit with higher sensitivity to quantization, whereas Llama 3.2 exhibits more consistent performance across quantization configurations. CPU utilization remains low (approximately 1-4%), GPU utilization peaks at around 90%, and power consumption ranges from 9.2 W to 11.5 W. Variability across domains (computer science, engineering, and math) underscores the importance of selecting the appropriate model family, parameter size, and quantization configuration for a given application. We conclude by outlining future directions for improving on-device NLP, including mixed-precision quantization, hardware-specific optimizations, and broader assessments covering multilingual or multimodal tasks.
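
The abstract does not spell out how accuracy "excluding random guesses" is computed. As a minimal, hypothetical sketch (not the paper's code), one plausible bookkeeping scheme flags each MMLU-Pro item for which the model produced no parseable answer choice and a random option was substituted; the two scores then differ only in whether those flagged items are counted. Field names such as "correct" and "guessed" are illustrative assumptions.

    # Hypothetical sketch: inclusive vs. guess-excluded accuracy over per-item
    # evaluation records. Illustrative only; not taken from the paper.
    from dataclasses import dataclass

    @dataclass
    class EvalRecord:
        correct: bool   # answer matched the gold option
        guessed: bool   # True if a random choice was substituted for an unparseable answer

    def accuracy(records: list[EvalRecord], exclude_guesses: bool = False) -> float:
        """Fraction of correct items, optionally dropping randomly guessed ones."""
        pool = [r for r in records if not (exclude_guesses and r.guessed)]
        return sum(r.correct for r in pool) / len(pool) if pool else 0.0

    records = [
        EvalRecord(correct=True,  guessed=False),
        EvalRecord(correct=True,  guessed=True),   # lucky random guess
        EvalRecord(correct=True,  guessed=True),   # lucky random guess
        EvalRecord(correct=False, guessed=False),
    ]
    print(accuracy(records))                        # 0.75 with guesses counted
    print(accuracy(records, exclude_guesses=True))  # 0.50 with guesses excluded

Since MMLU-Pro items offer ten answer options, a random guess is correct only about 10% of the time, so a model that guesses on many items can score quite differently under the two metrics; this is the distinction the abstract draws for low-bit quantized models.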

Cite this article

[IEEE Style]

S. Jang, S. Yang, C. Choi, "Analyzing Quantized Small Language Models for Efficient Edge Deployment," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 9, pp. 1364-1380, 2025. DOI: 10.7840/kics.2025.50.9.1364.

[ACM Style]

Sooyoung Jang, Seungho Yang, and Changbeom Choi. 2025. Analyzing Quantized Small Language Models for Efficient Edge Deployment. The Journal of Korean Institute of Communications and Information Sciences, 50, 9, (2025), 1364-1380. DOI: 10.7840/kics.2025.50.9.1364.

[KICS Style]

Sooyoung Jang, Seungho Yang, Changbeom Choi, "Analyzing Quantized Small Language Models for Efficient Edge Deployment," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 9, pp. 1364-1380, 9. 2025. (https://doi.org/10.7840/kics.2025.50.9.1364)