Spatial Physical Generative AI

Affiliation: KAIST
Host: Prof. Hanbyul Joo (주한별)
Date: 2026/3/30, 2:00 PM - 3:30 PM
Location: Bldg. 302, Room 106
Abstract

Recent breakthroughs in large language models (LLMs) and generative AI have demonstrated remarkable capabilities in text, image, and video synthesis. However, despite their scale and fluency, these models remain fundamentally limited in their understanding of 3D space, physical interaction, and embodied reasoning. This talk explores the next frontier beyond LLMs: Spatial Physical Generative AI — systems that not only generate content but understand and reason about the physical world.

We begin by examining the evolution from Transformers and large-scale pretraining toward vision-language and world models that attempt grounded intelligence. While current generative video and multimodal models achieve impressive visual realism, they often lack true spatial consistency, physical plausibility, and compositional generalization. Addressing these limitations requires integrating 3D scene representations, physics-based simulation, and generative diffusion frameworks.

The talk presents recent advances in 3D spatial AI, motion diffusion, physics-guided video generation, and category-level 6D pose estimation. We introduce methods that combine regression and diffusion modeling with score-scaling sampling to capture multi-hypothesis pose distributions efficiently. Furthermore, we highlight MPMAvatar, a hybrid mesh–3D Gaussian Splatting framework that enables physically accurate cloth simulation and photorealistic avatar rendering, demonstrating realistic deformation, collision handling, and zero-shot scene interaction.

By unifying generative models with spatial representation and physical simulation, we move toward AI systems capable of embodied reasoning, real-world interaction, and scalable physical intelligence. Spatial Physical Generative AI, also called world models, represents a critical step toward grounded artificial general intelligence (AGI), bridging perception, generation, and physical understanding.
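To give a flavor of the multi-hypothesis idea mentioned in the abstract, the toy sketch below draws several "pose hypotheses" from a multimodal distribution via Langevin dynamics on a scaled score. Everything here is an illustrative assumption, not the talk's actual method: the two-component Gaussian mixture stands in for a learned pose score model, and the `scale` parameter mimics guidance-style score scaling.

```python
import numpy as np

# Toy stand-in for a learned pose score model: the exact score of a
# two-component Gaussian mixture (two "pose modes" in 2D).
# These names and values are illustrative assumptions.
MEANS = np.array([[-2.0, 0.0], [2.0, 0.0]])
SIGMA2 = 0.25  # shared isotropic variance of each component

def score(x, scale=1.0):
    """Scaled score `scale * grad log p(x)` for a batch x of shape (N, 2)."""
    d = x[:, None, :] - MEANS[None, :, :]            # (N, K, 2) offsets
    logw = -np.sum(d**2, axis=-1) / (2 * SIGMA2)     # (N, K) log responsibilities
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # posterior component weights
    grad = -(w[:, :, None] * d).sum(axis=1) / SIGMA2
    return scale * grad

def sample_hypotheses(n=256, steps=300, eps=5e-3, scale=1.0, seed=0):
    """Unadjusted Langevin dynamics: n samples following the scaled score."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2)) * 3.0                # broad initialization
    for _ in range(steps):
        x += eps * score(x, scale) + np.sqrt(2 * eps) * rng.normal(size=x.shape)
    return x

hyps = sample_hypotheses(scale=2.0)
# Each row of `hyps` is one hypothesis; samples cluster around both modes,
# so the multimodal (multi-hypothesis) structure is preserved.
```

A larger `scale` sharpens samples around the modes while the stochastic term keeps both modes populated — loosely analogous to trading off precision against hypothesis diversity in pose estimation.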

Speaker Bio

Tae-Kyun (T-K) Kim has been a Professor and the director of the Computer Vision and Learning Lab at the School of Computing, KAIST, since 2020, and was an adjunct reader at Imperial College London (ICL), UK, from 2020 to 2024. He led the Computer Vision and Learning Lab at Imperial College from 2010 to 2020. He obtained his PhD from the University of Cambridge in 2008 and held a Junior Research Fellowship (governing body) at Sidney Sussex College, University of Cambridge, from 2007 to 2010. He received his BSc and MSc from KAIST in 1998 and 2000, respectively, and worked at Samsung AIT from 2000 to 2004 (military duty).

His research interests lie primarily in machine (deep) learning for 3D computer vision, generative AI, and physics-based AI, including articulated 3D hand/body reconstruction, face analysis and recognition, 6D object pose estimation, activity recognition, object detection/tracking, and active robot vision, leading to novel active and interactive visual sensing. He has co-authored over 100 academic papers in top-tier conferences and journals in the field, and co-organised the series of HANDS workshops and 6D Object Pose workshops (in conjunction with CVPR/ICCV/ECCV) from 2015 to 2020. He was the general chair of BMVC 2017 in London, the program co-chair of BMVC 2023, and is an Associate Editor of IEEE Transactions on PAMI, Pattern Recognition, and Image and Vision Computing. He regularly serves as an Area Chair for top-tier vision/ML conferences. He received the KUKA Best Service Robotics Paper Award at ICRA 2014 and the 2016 Best Paper Award from the ASCE Journal of Computing in Civil Engineering, was a best paper finalist at CVPR 2020, and his co-authored algorithm for face image representation is part of the MPEG-7 ISO/IEC international standard.