

Capturing Global Coherence for Cross-source Information Extraction

Speaker: Heng Ji
CUNY, Queens College

March 19, 2013 - 4:00 p.m. to 5:00 p.m.
Location: CII(Low) 3051
Hosted By: Dr. Elliot Anshelevich (x6491)


Information Extraction (IE) is the task of identifying "facts", such as attack/arrest events, people's jobs, people's whereabouts, and merger and acquisition activity from news, patient diagnosis history from discharge summaries, and experiment chains from scientific papers. Traditional IE techniques assess the ability to extract information from individual documents in isolation. However, users need to gather information that may be scattered among a variety of sources. These facts may be redundant, complementary, incorrect or ambiguously worded. Furthermore, the information extracted from a document may need to augment an existing Knowledge Base (KB). This requires the ability to link events, entities and associated relations in a document to KB entries, and thus presents many unique challenges. In this talk, I define several new extensions to state-of-the-art IE and systematically present the foundation, methodologies, algorithms, and implementations needed for more accurate, coherent, complete, concise, and most importantly, dynamic and resilient extraction capabilities. More specifically, I will focus on presenting several inference frameworks that ensure global coherence and commonality across topically-related documents to reduce uncertainty. I will present a case study of resolving morphed and implicit information in data under active censorship. I will also briefly present my other research programs on cross-lingual IE, cross-genre IE and cross-media IE, as well as future research directions.


Heng Ji is an assistant professor in Computer Science at Queens College, and a doctoral faculty member in the Computer Science Department and Linguistics Department at the Graduate Center of the City University of New York. She received her Ph.D. in Computer Science from New York University in 2007. Her research interests focus on Natural Language Processing, especially on Cross-source Information Extraction and Knowledge Base Population. She has published over 90 papers. Her recent work on uncertainty reduction for Information Extraction was invited for publication in the Centennial Year Celebration of IEEE Proceedings. She received a Google Research Award in 2009, an NSF CAREER award in 2010, and a Sloan Junior Faculty award and an IBM Watson Faculty award in 2012. She served as the coordinator of the NIST TAC Knowledge Base Population task in 2010 and 2011, the Information Extraction area chair of NAACL-HLT 2012 and ACL 2013, and the co-leader of the information fusion task of the ARL NS-CTA program in 2011 and 2012. Her research has been funded by NSF, ARL, DARPA, Google and IBM.

Last updated: March 14, 2013