With the ever-increasing popularity of email, it is now very common for a group of people to discuss specific issues, events, or tasks by email. These discussions can be viewed as email conversations and are valuable to the user as a personal information repository. In this talk, we discuss the problem of discovering and summarizing email conversations. We believe that solutions to this problem can greatly help users manage their email folders. However, the characteristics of email conversations, e.g., the lack of synchronization, the conversational structure, and the informal writing style, make this task particularly challenging. We tackle it by considering three aspects: discovering the emails that belong to one conversation, capturing the conversation structure, and summarizing the email conversation. We first study how to discover all emails belonging to one conversation. Second, we build a fragment quotation graph to capture the structure of an email conversation. Based on the quotation graph, we develop a novel email conversation summarizer, ClueWordSummarizer (CWS). Furthermore, we study several ways to improve its accuracy by considering more lexical features. A comparison with a state-of-the-art email summarizer and with a popular multi-document summarizer shows that the proposed method obtains higher accuracy in most cases. In addition, many of these improvements, especially the use of subjective words and phrases, significantly increase the accuracy.
Xiaodong Zhou is a Research Scientist at AOL China. He received his PhD from the University of British Columbia in 2008 under the supervision of Dr. Raymond T. Ng and Dr. Giuseppe Carenini. He also received an MSc degree from CS@UBC in 2002. Before coming to Canada, he received an M.Eng. degree from the Automation Department (National CIMS-ERC) at Tsinghua University and, in 1996, a B.Eng. degree from the Automation Department at Dalian Maritime University. His interests include email summarization, sentiment analysis, and many topics in text mining and information extraction.