Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO 


Vol. 50, No. 5, pp. 790-792, May 2025
10.7840/kics.2025.50.5.790


  Abstract

As reinforcement learning (RL) is increasingly used to train large language models (LLMs), it has become necessary to identify the RL methodologies best suited to LLMs. The two fields continue to evolve together, with novel techniques in each contributing to the advancement of the other. This paper examines current trends in RL algorithms aimed at improving the performance of LLMs.



  Cite this article

[IEEE Style]

T. Kim and S. Park, "Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 5, pp. 790-792, 2025. DOI: 10.7840/kics.2025.50.5.790.

[ACM Style]

Taehyun Kim and Soohyun Park. 2025. Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO. The Journal of Korean Institute of Communications and Information Sciences, 50, 5, (2025), 790-792. DOI: 10.7840/kics.2025.50.5.790.

[KICS Style]

Taehyun Kim and Soohyun Park, "Research on Reinforcement Learning Methodologies for Large Language Models Using TRPO, PPO, and DPO," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 5, pp. 790-792, 5. 2025. (https://doi.org/10.7840/kics.2025.50.5.790)