Name: Seungwhan Moon
Affiliation: Meta Reality Labs
Host: Prof. 김건희
Date: December 14, 2023, 12:30 PM - 1:30 PM
Location: Bldg. 302, Room 209
Large Language Models (LLMs) have significantly enhanced the capacity of machines to understand and articulate human language. The progress in LLMs has also led to notable advancements in the vision-language domain, bridging the gap between image encoders and LLMs to combine their reasoning capabilities. However, most previous work has focused on relatively small-scale models and a single bi-modal pair (e.g., text and images), due to challenges in scaling model parameters and in data availability.
In this talk, I will describe our recent work on the Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, motion sensor) and generates textual responses. I will highlight several techniques we applied to efficiently reduce the training load while achieving state-of-the-art zero-shot multimodal reasoning capabilities.
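For readers unfamiliar with the adapter-style approach of bridging modality encoders and LLMs mentioned above, the following is a minimal illustrative sketch, not the AnyMAL implementation: features from a frozen modality encoder are projected into the LLM's token-embedding space and prepended to the text embeddings. All module names, dimensions, and the toy inputs (ModalityAdapter, enc_dim, llm_dim, num_tokens) are hypothetical.

```python
# Illustrative sketch of a generic modality-adapter pattern (hypothetical, not AnyMAL's code):
# project frozen-encoder features into the LLM embedding space as a few "soft tokens".
import torch
import torch.nn as nn


class ModalityAdapter(nn.Module):
    """Maps pooled encoder features to a small set of soft tokens in the LLM's embedding space."""

    def __init__(self, enc_dim: int, llm_dim: int, num_tokens: int = 32):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # A simple learned projection; real systems may use heavier resampler/attention modules.
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, num_tokens * llm_dim),
        )

    def forward(self, enc_feats: torch.Tensor) -> torch.Tensor:
        # enc_feats: (batch, enc_dim) pooled features from a frozen modality encoder.
        batch = enc_feats.shape[0]
        return self.proj(enc_feats).view(batch, self.num_tokens, self.llm_dim)


if __name__ == "__main__":
    # Toy dimensions for illustration only.
    enc_dim, llm_dim, batch = 1024, 4096, 2
    adapter = ModalityAdapter(enc_dim, llm_dim)

    image_feats = torch.randn(batch, enc_dim)      # stand-in for frozen image-encoder output
    text_embeds = torch.randn(batch, 16, llm_dim)  # stand-in for LLM embeddings of the text prompt

    # Prepend the projected modality tokens to the text embeddings; a frozen LLM would
    # then consume this sequence and generate a textual response.
    llm_input = torch.cat([adapter(image_feats), text_embeds], dim=1)
    print(llm_input.shape)  # torch.Size([2, 48, 4096])
```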
Seungwhan Moon is a Lead Research Scientist at Meta Reality Labs, conducting research in multimodal learning for AR/VR applications. His recent projects have focused on cutting-edge multimodal and knowledge-grounded conversational AI. He received his PhD in Language Technologies from the School of Computer Science, Carnegie Mellon University, under Prof. Jaime Carbonell. Before joining Meta, he also worked at research institutions including Snapchat Research and Samsung Research.