Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO

Taehyun Kim; Soohyun Park

Vol. 50, No. 5, pp. 790-792, May 2025

10.7840/kics.2025.50.5.790

Keywords

RLHF, LLMs

Abstract

As reinforcement learning (RL) is increasingly used to train large language models (LLMs), it has become necessary to identify the RL methodologies best suited to LLMs. The two fields continue to evolve together, with novel techniques in each contributing to the advancement of the other. This paper addresses current trends in RL algorithms aimed at enhancing the performance of LLMs.

Cite this article

[IEEE Style]

T. Kim and S. Park, "Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 5, pp. 790-792, 2025. DOI: 10.7840/kics.2025.50.5.790.

[ACM Style]

Taehyun Kim and Soohyun Park. 2025. Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO. The Journal of Korean Institute of Communications and Information Sciences, 50, 5, (2025), 790-792. DOI: 10.7840/kics.2025.50.5.790.

[KICS Style]

Taehyun Kim and Soohyun Park, "Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 5, pp. 790-792, May 2025. (https://doi.org/10.7840/kics.2025.50.5.790)
