[Seminar] Enabling Scalable Learning on Massive Datasets

Date: 
Friday, April 12, 2019, 11:00am - 12:00pm
Location: 
Building 301, Room 306

Host: Prof. Bongki Moon

Summary

Data science has emerged as the next frontier for data-driven decision making, innovation, and discovery. As a result, there is tremendous interest in developing automated methods to extract insights from massive datasets. While statistical models provide an elegant framework for gaining knowledge from data, the volume and variety of big data arising in many domains demand a paradigm shift: datasets are heterogeneous, massive, and distributed in nature. Massive datasets are now stored and processed in large-scale commodity clusters, and several new frameworks have emerged for scalable machine learning (e.g., Parameter Server, Petuum, SystemML).

In this talk, we will present our efforts to scale Bayesian network structure learning to large datasets. Specifically, we will present a novel approach called DiSC (Distributed Score Computation) for the fast, approximate score computation required during Bayesian network structure learning. We will discuss the design of DiSC and present some key theoretical results along with an empirical comparison with MapReduce-style computation. Finally, we will briefly discuss our ongoing work on scaling statistical relational learning on social media data for cyber threat detection and deep learning for cervical cancer cell classification.

Speaker Bio

Dr. Praveen Rao is an associate professor in the Department of Computer Science & Electrical Engineering at the University of Missouri-Kansas City (UMKC). He has received funding from the National Science Foundation (NSF), Intel Labs, the University of Missouri Research Board, Amazon Web Services, IBM, and Kansas City Power and Light for projects ranging from Scalable RDF Query Processing Using a Cloud Infrastructure to health informatics. In 2013, IBM recognized him for developing curriculum to prepare a skilled workforce capable of tackling challenges in Big Data management. He was one of 14 professors worldwide selected for this IBM Big Data and Analytics Faculty Award.

Contact: DBS Lab (880-6575)