Digital Library [ Search Result ]
Search : "[ keyword: communication overhead ]" (2)
An Analysis on Inference Time, Accuracy, Communication, and GPU Memory Usage for Inference Batch of Large Language Models
Changyong Shin, Younghun Go, Yeonho Yoo, Gyeongsik Yang, Chuck Yoo
Vol. 49, No. 10, pp. 1377-1385, Oct. 2024
10.7840/kics.2024.49.10.1377