Automated Word Sense Disambiguation Using WordNet Ontology

Khaoula Belila, Okba Kazar, Mohammed Charaf Eddine Meftah
Copyright: © 2022 | Pages: 18
DOI: 10.4018/IJOCI.313604

Abstract

Automatic word sense disambiguation is a major challenge in the natural language processing domain. In recent years, many supervised and knowledge-based approaches have been developed to solve this problem. Using a sense inventory as background knowledge to disambiguate words is a useful alternative to supervised approaches, which need a large training corpus. This paper proposes a new approach to disambiguate words in text that uses the WordNet ontology as its sense inventory. The authors introduce a new word sense disambiguation (WSD) technique called Gloss+, which uses the glosses of the WordNet synonyms of the target word together with the local context in which the word is used. The technique exploits a particular behavior of polysemous synsets, which is detected during the disambiguation process and used to improve the results. In the experiments, the authors compare the proposed approach with methods that use synonyms only or glosses only.
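The abstract describes Gloss+ only at a high level, and its use of the behavior of polysemous synsets is not specified here, so the following is not the authors' method. As a point of reference, it sketches a simplified Lesk-style gloss-overlap baseline of the kind gloss-based WSD builds on: each candidate sense is scored by how many words its gloss and synonyms share with a small window of context around the target word. It assumes a tiny hypothetical sense inventory (SENSES, with made-up sense IDs such as bank#finance) in place of WordNet, an assumed stopword list, and hypothetical helper names.

```python
import re

# Hypothetical toy sense inventory standing in for WordNet. The sense IDs,
# synonyms and glosses below are illustrative only, not real WordNet data.
SENSES = {
    "bank": [
        {"id": "bank#finance",
         "synonyms": ["depository", "financial_institution"],
         "gloss": "a financial institution that accepts deposits and lends money"},
        {"id": "bank#river",
         "synonyms": ["riverbank", "shore"],
         "gloss": "sloping land beside a body of water such as a river"},
    ],
}

# Small assumed stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "and", "that", "of", "such", "as",
             "to", "on", "in", "at", "he", "she"}


def tokenize(text):
    """Lowercase and split on non-letters (this also splits '_' in lemmas)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def signature(sense):
    """Content words of a sense's gloss plus its synonym lemmas."""
    words = set(tokenize(sense["gloss"]))
    for syn in sense["synonyms"]:
        words.update(tokenize(syn))
    return words - STOPWORDS


def disambiguate(target, sentence, window=3):
    """Return the sense ID whose signature shares most words with the context.

    The context is the `window` tokens on each side of the first occurrence
    of `target`; ties go to the first sense listed. Raises ValueError if the
    target word does not occur in the sentence.
    """
    tokens = tokenize(sentence)
    pos = tokens.index(target)
    context = set(tokens[max(0, pos - window):pos] + tokens[pos + 1:pos + 1 + window])
    context -= STOPWORDS
    best = max(SENSES[target], key=lambda s: len(signature(s) & context))
    return best["id"]


print(disambiguate("bank", "He sat on the bank of the river"))
print(disambiguate("bank", "She deposits money at the bank"))
```

Run as written, the two calls print bank#river and bank#finance. To work with real WordNet data, SENSES could be filled from a WordNet interface such as NLTK's wordnet corpus reader, whose Synset.definition() and Synset.lemma_names() return a synset's gloss and synonyms.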
Article Preview

1. Introduction

Knowledge discovery in text (KDT), or text mining (TM), is the process of mining useful information or patterns from text documents. These patterns are implicit in the unstructured data contained in the text. Traditionally, the information in text data is represented as a bag of terms¹ or bag of words (BOW). In these methods, document features are weighted by word occurrence frequencies, which gives classification better computational efficiency than other methods. However, these approaches suffer from several problems. First, synonymy and polysemy (Zhong,N et al, 2012): because of synonymy, different words with the same sense are not grouped into the same class (sense), and because of polysemy, a word with multiple senses is not separated into its different senses. Second, they break a phrase such as “computer sciences” into independent features (Bing,L et al, 2015).

In the literature, many term-based methods can be found; they rely on corpus-based algorithms and are generally employed as supervised machine learning systems, so extensive training on a corpus is needed (Charhate,S et al, 2012) before they can be applied to the current data set. Among these methods are support vector machines (SVM) (Robertson, S. et al, 2002), naive Bayesian classifiers and probabilistic models (Baeza,Y,1999) (Li,X,2003). Other methods have been introduced to overcome the drawbacks of term-based approaches by representing features as phrases (Scott,S.,1999) (Lewis, D,20031992) (Sebastiani,F. 2002) (Ni,Pin. 2020). Although phrases are less ambiguous and more discriminative than individual terms (Zhong,N et al, 2012), these traditional techniques still have drawbacks, and knowledge-based approaches have been introduced to address them. Linguistic resources such as dictionaries (a dictionary can be built from the documents by a word segmentation method, as in (Yao,Linxia, et al.2019), or from a corpus) and ontologies are used as background knowledge. The goal is to improve the semantic representation of the extracted features by using semantic entities, which calls for a semantic representation of the text content.

This paper proposes a new approach, called Gloss+, to extract useful concepts from text based on the WordNet² ontology. This approach addresses the Word Sense Disambiguation (WSD) problem. WSD solutions are generally categorized into supervised and knowledge-based approaches. Supervised methods model WSD as a classification problem with one classifier per target word; they therefore need a large sense-annotated corpus, which makes them expensive. Knowledge-based approaches, in contrast, need no large sense-annotated corpus, only a knowledge source such as WordNet. For this reason, this category has developed rapidly in recent years.
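The difference between word-level features and concept-level features discussed in this introduction can be illustrated with a minimal sketch. It assumes a hypothetical, context-free word-to-concept table (CONCEPT_OF) standing in for the output of a WSD step over WordNet; the concept labels and function names are invented for illustration. In a plain bag of words, the synonyms "car" and "automobile" are two unrelated features, whereas mapping both to a shared concept merges their counts.

```python
from collections import Counter

# Hypothetical word-to-concept table standing in for the output of a WSD
# step over WordNet; the concept labels are illustrative, not synset IDs.
CONCEPT_OF = {
    "car": "concept:automobile",
    "automobile": "concept:automobile",
    "engine": "concept:engine",
}


def bag_of_words(text):
    """Term-frequency BOW: every distinct word is a separate feature."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Map each word to its concept (unknown words stay as-is), then count."""
    return Counter(CONCEPT_OF.get(w, w) for w in text.lower().split())


doc = "The car engine and the automobile engine"
words = bag_of_words(doc)
concepts = bag_of_concepts(doc)
print(words["car"], words["automobile"])   # synonyms counted as two features
print(concepts["concept:automobile"])      # synonyms merged into one concept
```

Because the table maps each word to a single concept regardless of context, it handles synonymy but not polysemy: a word such as "bank" would first need a sense chosen from its context, which is the WSD problem the paper addresses.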
