
Ph.D. Theses

Modeling Heterogeneous Networks for Information Ranking, Enrichment and Resolution on Microblogs

By Hongzhao Huang
Advisor: Heng Ji
April 9, 2015

Microblogging, a new type of online information-sharing platform built on short messages of up to 140 characters, has grown quickly and received increasing attention in recent years. A microblogging platform (e.g., Twitter) enables both individuals and organizations to disseminate information, from current affairs to breaking news, in a timely fashion, which makes it a valuable source of very fresh knowledge. For example, during Hurricane Irene in 2011, updates from users living in New York City and transportation and evacuation posts from the government were very useful for people tracking the disaster. Natural Language Processing (NLP) research on this new genre is therefore needed to assist knowledge mining and discovery.

Unlike semi-structured knowledge bases (e.g., Wikipedia) and traditional news, microblogs tend to be noisy, short, and informal, and implicit information is more prominent and pervasive in them. These characteristics pose unique challenges to people's reading and understanding of microblogs, as well as to many knowledge mining and discovery tasks. To alleviate these problems, in this thesis we propose to filter out noisy and uninformative content, enrich short microblogs with background knowledge from knowledge bases such as Wikipedia, and resolve informal and implicit information to its regular referents.

To achieve these goals, we propose to leverage and model heterogeneous information networks (HINs), in contrast to most existing NLP approaches for traditional genres (e.g., news), which explore only a single type of information (e.g., text). Microblogging contains heterogeneous types of information, from social network structures to cross-genre linkages, which together form rich HINs. By designing effective approaches that model both unstructured text and structured HINs, we can incorporate additional evidence from HIN structures beyond the text itself. In this thesis, we present different approaches to constructing HINs from cross-genre, cross-source, and cross-type information by incorporating existing clean social relations and by performing deep content analysis with well-developed NLP techniques. We also present various effective approaches, including unsupervised propagation, semi-supervised graph regularization, supervised learning-to-rank, and deep neural networks, to model HINs for ranking, classification, and similarity measurement. Our experimental results demonstrate that heterogeneous information network analysis is also powerful in the field of NLP. This thesis thus sheds light on the exploration of HINs for other NLP tasks, especially on microblogs.
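As a rough illustration of what unsupervised propagation over a heterogeneous network can look like, the sketch below spreads a prior score from a web document to the tweets and users connected to it. It is not the thesis's algorithm: the node types, the edge list, the damping factor `alpha`, and the function name `propagate_scores` are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_scores(edges, initial, alpha=0.5, iterations=50):
    """Spread prior scores over an undirected, weighted, typed graph.

    edges:   iterable of (node_a, node_b, weight); nodes are (type, id) tuples.
    initial: dict mapping some nodes to a prior score in [0, 1].
    alpha:   hypothetical damping factor; the share of a node's score that
             comes from its neighbors rather than from its own prior.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    nodes = set(neighbors) | set(initial)
    scores = {n: initial.get(n, 0.0) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            nbrs = neighbors.get(n, [])
            total = sum(w for _, w in nbrs)
            # Weighted average of neighbor scores; isolated nodes get 0.
            avg = sum(w * scores[m] for m, w in nbrs) / total if total else 0.0
            updated[n] = (1 - alpha) * initial.get(n, 0.0) + alpha * avg
        scores = updated
    return scores


# Toy network (assumed data): one user wrote two tweets, and one tweet
# links to a web document that carries all of the prior score.
t1, t2, t3 = ("tweet", "t1"), ("tweet", "t2"), ("tweet", "t3")
u1, d1 = ("user", "u1"), ("webdoc", "d1")

edges = [(u1, t1, 1.0), (u1, t2, 1.0), (t1, d1, 1.0)]
initial = {d1: 1.0, t3: 0.0}  # t3 is isolated and has no prior evidence

scores = propagate_scores(edges, initial)
ranked_tweets = sorted((t1, t2, t3), key=lambda n: scores[n], reverse=True)
print([node_id for _, node_id in ranked_tweets])
```

In this toy setup, the tweet linked to the web document collects evidence from both the document and its author, so it ranks above the tweet that is only connected through the shared author, and the isolated tweet stays at zero.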
