ViT-Based Future Road Image Prediction: Evaluation via VLM 


Vol. 50, No. 10, pp. 1532-1535, Oct. 2025
DOI: 10.7840/kics.2025.50.10.1532


  Abstract

This paper proposes a Vision Transformer (ViT)-based model for predicting future driving scenes. The proposed ViT architecture processes input images as patches and leverages the attention mechanism to efficiently learn global visual information, while also integrating control inputs to effectively capture correlations between visual context and driving actions. Experimental results show that the ViT-based model generates sharper images than the baseline and achieves higher semantic similarity in explanation evaluations using a Vision-Language Model (VLM). These results suggest that the ViT architecture is effective not only for future prediction but also for explainable autonomous driving control.
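The paper does not publish its code, but the architecture the abstract describes can be illustrated with a minimal sketch: a PyTorch-style ViT that splits the road image into patches, projects the control inputs (steering and throttle are assumed here) into one extra token so the attention layers can relate driving actions to visual context, and decodes the attended patch tokens back into a predicted future frame. All layer sizes, the control-fusion scheme, and the decoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the authors' code) of a ViT future-frame
# predictor that fuses a control-input token with image patch tokens.
import torch
import torch.nn as nn

class ViTFuturePredictor(nn.Module):
    def __init__(self, img_size=128, patch=16, dim=256, depth=6, heads=8, ctrl_dim=2):
        super().__init__()
        self.patch = patch
        self.n_patches = (img_size // patch) ** 2
        # Patch embedding: non-overlapping patches via strided convolution.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Project control inputs (assumed: steering, throttle) to one token.
        self.ctrl_proj = nn.Linear(ctrl_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Decode each patch token back to its patch of pixels.
        self.head = nn.Linear(dim, 3 * patch * patch)

    def forward(self, img, ctrl):
        B = img.size(0)
        tokens = self.embed(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        ctrl_tok = self.ctrl_proj(ctrl).unsqueeze(1)         # (B, 1, dim)
        x = torch.cat([ctrl_tok, tokens], dim=1) + self.pos
        x = self.encoder(x)                                  # global attention
        patches = self.head(x[:, 1:])                        # drop control token
        # Reassemble patch predictions into the future frame.
        side = int(self.n_patches ** 0.5)
        out = patches.view(B, side, side, 3, self.patch, self.patch)
        out = out.permute(0, 3, 1, 4, 2, 5).reshape(
            B, 3, side * self.patch, side * self.patch)
        return torch.sigmoid(out)

model = ViTFuturePredictor()
pred = model(torch.rand(1, 3, 128, 128), torch.rand(1, 2))
print(pred.shape)  # torch.Size([1, 3, 128, 128])
```

Fusing the control inputs as an extra token lets every image patch attend to the driving action directly, which is one plausible realization of the abstract's claim that the model captures correlations between visual context and driving actions.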



  Cite this article

[IEEE Style]

D. Kim, J. Kwon, H. Nam, "ViT-Based Future Road Image Prediction: Evaluation via VLM," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 10, pp. 1532-1535, 2025. DOI: 10.7840/kics.2025.50.10.1532.

[ACM Style]

Donghyun Kim, Jaerock Kwon, and Haewoon Nam. 2025. ViT-Based Future Road Image Prediction: Evaluation via VLM. The Journal of Korean Institute of Communications and Information Sciences, 50, 10, (2025), 1532-1535. DOI: 10.7840/kics.2025.50.10.1532.

[KICS Style]

Donghyun Kim, Jaerock Kwon, Haewoon Nam, "ViT-Based Future Road Image Prediction: Evaluation via VLM," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 10, pp. 1532-1535, 10. 2025. (https://doi.org/10.7840/kics.2025.50.10.1532)