Towards multi-modal extraction and summarization of conversations

Monday, November 23rd 2009, 10:30am
302 Building, Room 105


For many business intelligence applications, decision making depends critically on the information contained in all forms of "informal" text documents, such as emails, meeting summaries, attachments and web documents. For example, the topic of developing a new product might first be raised in a meeting. Subsequent follow-up emails then add comments and discussion, including links to web documents describing similar products in the market and user reviews of those products. A concise summary of this "conversation" is obviously valuable. However, existing technologies are inadequate in at least two fundamental ways. First, extracting "conversations" embedded in multi-genre documents is very challenging. Second, existing multi-document summarization techniques, which were designed mainly for formal documents, have proved to be highly ineffective when applied to informal documents like emails.

In this presentation, we give an overview of email summarization and meeting summarization methods. We give short demos of what we have developed so far. We discuss how some of the developed tools could be applied to SAP/BObj activities. We conclude by presenting several open problems that need to be solved for multi-modal extraction and summarization of conversations to become a reality.

Speaker Bio

Dr. Raymond Ng is a professor in Computer Science at the University of British Columbia. His main research area for the past two decades has been data mining, with a specific focus on health informatics and text mining. He has published over 150 peer-reviewed publications on data clustering, outlier detection, OLAP processing, health informatics and text mining. He is the recipient of two best paper awards: one from the 2001 ACM SIGKDD conference, which is the premier data mining conference worldwide, and one from the 2005 ACM SIGMOD conference, which is one of the top database conferences worldwide. He was one of the program co-chairs of the 2009 International Conference on Data Engineering, and one of the program co-chairs of the 2002 ACM SIGKDD conference. He was also one of the general co-chairs of the 2008 ACM SIGMOD conference. He was an editorial board member of the Very Large Database Journal and the IEEE Transactions on Knowledge and Data Engineering until 2008.

For the past decade, Dr. Ng has co-led several large-scale genomic projects, funded by Genome Canada, Genome BC and NSERC. The total funding for those projects was well over 40 million Canadian dollars. He is also affiliated with the Heart and Lung Institute at St. Paul's Hospital and the BC Cancer Research Centre.