In this talk, I will introduce GaVA-CLIP, a new knowledge-augmented framework for gait video analysis, aimed at assessing diagnostic groups and gait impairment. Built on the powerful CLIP vision–language model, GaVA-CLIP learns from three complementary sources: gait videos, medical descriptions of classes, and numerical gait parameters.
Our contributions are twofold. First, we use a knowledge-aware prompt tuning strategy that leverages class-specific medical descriptions to guide text learning. Second, we incorporate paired gait parameters as “numerical text,” enhancing the model’s ability to reason quantitatively.
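To make the idea of "numerical text" concrete, here is a minimal sketch of how a class-level medical description and paired gait parameters might be rendered into a single text prompt for a CLIP-style text encoder. All names, the class label, the description, and the prompt template are illustrative assumptions, not GaVA-CLIP's actual implementation.

```python
# Hypothetical sketch: turning a medical class description and numerical
# gait parameters into a text prompt for a CLIP-style model.
# The template and parameter names are assumptions for illustration only.

def build_prompt(class_name: str, description: str, gait_params: dict) -> str:
    """Combine a medical description with gait parameters rendered as text."""
    # Render each paired numerical gait parameter as a readable fragment.
    numeric_text = ", ".join(f"{k} of {v:.2f}" for k, v in gait_params.items())
    return f"a gait video of a {class_name} patient, {description}, with {numeric_text}"

prompt = build_prompt(
    "parkinsonian",  # hypothetical diagnostic class
    "characterized by short shuffling steps and reduced arm swing",
    {"stride length (m)": 0.82, "cadence (steps/min)": 96.0},
)
print(prompt)
```

The intuition is that once quantitative measures are serialized into natural language, the same text encoder that handles the medical descriptions can also attend to the numbers, letting one model reason over both sources jointly.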
I will show how this approach not only surpasses state-of-the-art methods in classifying gait videos, but also generates human-readable explanations that combine medical terminology with quantitative gait measures. I’ll conclude by sharing how this opens the door to more interpretable, clinically relevant video analysis, and point to our public release of code and models.
Hyewon Seo is a research director at CNRS (Centre National de la Recherche Scientifique), affiliated with the Université de Strasbourg. She earned her B.Sc. and M.Sc. degrees in Computer Science from KAIST and completed her Ph.D. at MIRALab. Prior to joining CNRS, she served as an assistant professor at Chungnam National University in South Korea. Dr. Seo’s research expertise centers on 3D and 4D shape analysis and modeling, with a strong focus on human data. Over her career, she has authored around 70 peer-reviewed publications. Additionally, she has contributed significantly to the scientific community by serving on editorial boards, most notably as Associate Editor-in-Chief for The Visual Computer (2016–2020), and by co-organizing key international conferences such as CGI 2015, SPM/SMI 2020, and CASA 2025.