Young Jin Kim
Title: Principal Researcher
Microsoft Machine Translation Group
Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models whose compute cost grows sublinearly with the number of parameters. In contrast to dense models, the sparse architecture of MoE offers opportunities to drastically grow model size with significant accuracy gains while consuming a much lower compute budget. However, supporting large scale MoE training brings its own set of system and modeling challenges. This talk introduces training algorithms that stabilize MoE training in practice, along with highly efficient MoE implementations. In particular, MoE models often suffer from overfitting due to their large memory capacity by design. Several novel regularization techniques, including Stochastic Experts, Gating Dropout, and Random Token Selection, are introduced together with a multi-task training paradigm. Finally, the talk presents how these algorithms improve real-world large scale models.
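For readers unfamiliar with the sparse architecture the abstract describes, the sketch below shows a minimal top-1 gated MoE layer in PyTorch: each token is routed to a single expert, so only a fraction of the parameters are active per token, which is what makes compute sublinear in model size. This is an illustrative sketch, not the speaker's implementation; the class name SimpleMoELayer and the gate_dropout_p argument are hypothetical, and the random re-routing is only loosely inspired by the routing-regularization ideas (e.g., Gating Dropout) mentioned in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int,
                 gate_dropout_p: float = 0.0):
        super().__init__()
        self.num_experts = num_experts
        self.gate_dropout_p = gate_dropout_p
        # Router that scores each token against every expert.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)
        if self.training and self.gate_dropout_p > 0.0:
            # Randomly perturb routing for a fraction of tokens during
            # training -- a crude stand-in for routing regularization.
            mask = torch.rand(x.size(0), device=x.device) < self.gate_dropout_p
            logits = torch.where(mask.unsqueeze(-1),
                                 torch.rand_like(logits), logits)
        probs = F.softmax(logits, dim=-1)
        top_prob, top_idx = probs.max(dim=-1)  # top-1 routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = top_idx == e  # tokens routed to expert e
            if sel.any():
                # Only selected tokens run through this expert.
                out[sel] = top_prob[sel].unsqueeze(-1) * expert(x[sel])
        return out

if __name__ == "__main__":
    layer = SimpleMoELayer(d_model=16, d_hidden=32, num_experts=4,
                           gate_dropout_p=0.1)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])

In a real large-scale setting the experts are sharded across devices and routing involves all-to-all communication, which is where the system challenges and regularization techniques discussed in the talk come into play.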
Young Jin Kim, Ph.D., is a Principal Researcher at the Microsoft Machine Translation group, where he develops machine learning models with state-of-the-art techniques. His recent research focuses on designing efficient and effective algorithms and model architectures for large scale language models. Young received his Ph.D. from the Georgia Institute of Technology for his research in deep learning and high-performance computing.