Data Analytics: Integration, Privacy, and Knowledge
Data analytics has become an important and challenging problem in disciplines such as computer science, biology, and medicine. As massive amounts of data become available for analysis, scalable integration techniques and knowledge bases are increasingly important. At the same time, new privacy issues arise because a person's sensitive information can easily be inferred from large amounts of data.
In my talk, I will first focus on the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. Next, I will introduce my work on managing information leakage, where the goal is to prevent important pieces of information from being resolved by ER in order to preserve data privacy. I will explain our information leakage model and propose using "disinformation" as a tool for reducing information leakage. Finally, I will talk about how knowledge bases are helping search engines understand data, and explain a new ontology being developed at Google Research that is specialized for search applications.
2012-present Research Scientist, Google Research (Structured Data Group)
2012 PhD in Computer Science, Stanford Univ.
2007 MS in Computer Science, Stanford Univ.
2003 BS in Computer Science, KAIST