Domain Cartridge:

Unsupervised Framework for Shallow Domain Ontology Construction from Corpus

In this work, we propose an unsupervised framework to construct a shallow domain ontology from a corpus. Information Retrieval systems, Question-Answering systems, dialogue systems and similar applications need to identify the important concepts in a domain and the relationships between them. We first identify important domain terms, of which multi-words form a significant component. We show that incorporating multi-words improves parser output, which in turn improves the performance of an existing Question-Answering system by up to 7%. On a manually annotated smartphone dataset, the proposed system identifies 40.87% of the domain terms, compared to a recall of 22% obtained using WordNet, 43.77% using Yago and 53.74% using BabelNet. Unlike the compared systems, however, it does not use any manually annotated resource. We then propose a framework to construct a shallow ontology from the discovered domain terms by identifying four domain relations, namely Synonyms ('similar-to'), Type-Of ('isa'), Action-On ('methods') and Feature-Of ('attributes'). For these relations, we achieve a significant performance improvement over WordNet, BabelNet and Yago without any supervision or manual annotation.
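The abstract does not say how the resulting shallow ontology is stored. As a minimal sketch, assuming a simple in-memory graph, the four relations could be kept as typed, directed edges between domain terms, with multi-word terms such as "battery life" as ordinary nodes. The names `Relation`, `ShallowOntology`, `add_relation` and `related` are hypothetical, the example terms are illustrative rather than taken from the dataset, and treating 'similar-to' as symmetric is an assumption of this sketch.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    """The four domain relations named in the abstract (labels as given there)."""
    SYNONYM = "similar-to"
    TYPE_OF = "isa"
    ACTION_ON = "methods"
    FEATURE_OF = "attributes"


class ShallowOntology:
    """Hypothetical container for domain terms and typed relations between them.

    An edge (concept, relation, term) reads as "term <relation> concept",
    e.g. ("smartphone", FEATURE_OF, "battery life") means that
    "battery life" is one of the attributes of "smartphone".
    """

    def __init__(self):
        self.terms = set()
        # edges[relation][concept] -> set of related terms
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_relation(self, concept, relation, term):
        self.terms.update((concept, term))
        self.edges[relation][concept].add(term)
        if relation is Relation.SYNONYM:
            # Assumption: 'similar-to' is stored in both directions.
            self.edges[relation][term].add(concept)

    def related(self, concept, relation):
        """Return the terms linked to `concept` by `relation`, sorted."""
        return sorted(self.edges[relation].get(concept, set()))
```

For example, after adding `("smartphone", Relation.FEATURE_OF, "battery life")` and `("smartphone", Relation.FEATURE_OF, "screen")`, calling `related("smartphone", Relation.FEATURE_OF)` returns `['battery life', 'screen']`. Keying edges by relation first keeps per-relation lookups cheap, which fits an ontology with only four relation types.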

Publications

  • Subhabrata Mukherjee, Jitendra Ajmera and Sachindra Joshi.
    Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construction from Corpus
    Proc. of the 23rd ACM International Conference on Information and Knowledge Management (CIKM). 2014.

Downloads

Dataset used in the CIKM 2014 paper: