Semi-Supervised Event Extraction Incorporated With Topic Event Frame

Gongqing Wu, Zhuochun Miao, Shengjie Hu, Yinghuan Wang, Zan Zhang, Xianyu Bao

Source Title: Journal of Database Management (JDM) 34(1)

DOI: 10.4018/JDM.318453

Abstract

Supervised meta-event extraction suffers from two limitations: (1) the extracted meta-events contain only local semantic information and do not present the core content of the text; (2) model performance degrades easily when labeled samples are insufficient in number or poor in quality. To overcome these limitations, this study presents an approach called frame-incorporated semi-supervised topic event extraction (FISTEE), which aims to extract topic events containing global semantic information. Inspired by frame-based knowledge representation, a topic event frame is developed to integrate multiple meta-events into a topic event. Combined with the tri-training algorithm, a strategy for selecting unlabeled samples is designed to expand the training sets, and labeling models based on conditional random fields (CRF) are constructed to label meta-events. The experimental results show that the event extraction performance of FISTEE is better than that of supervised learning-based approaches. Furthermore, the extracted topic events can present the core content of the text.

Introduction

Rapid advances in Internet technologies have produced massive volumes of digitized text. In light of this ever-increasing textual data, technology that can automatically mine useful information from text is urgently needed. In this context, information extraction technology emerged and has been widely used (Fiori et al., 2014). Event extraction, the most challenging task in information extraction, aims to automatically extract the information that users are interested in from unstructured text and present it in the form of structured events (Ahn, 2006). Event extraction has given considerable impetus to knowledge graph construction (Bu et al., 2021), text mining (Lyu & Liu, 2021), information retrieval (Feng et al., 2021), and related areas.

At present, event extraction can be divided into meta-event extraction and topic event extraction: a meta-event describes only a simple action or state change, whereas a topic event describes the developmental process of things. Event extraction broadly involves two subtasks, trigger extraction and event argument extraction. A trigger is a keyword that clearly expresses the occurrence of an event, and an event argument is a related description of the event, such as its time, place, or participant. Identifying the trigger makes it possible to detect an event and determine its type. Each event type has its own representation frame, and each relevant entity in the sentence is checked against this frame to determine whether it is an event argument and, if so, what its argument role is.
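The two subtasks described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `TRIGGERS` lexicon, the `EVENT_FRAMES` table, the `MetaEvent` class, the `extract_event` helper, and the example sentence. Real systems learn triggers and roles from data rather than looking them up.

```python
from dataclasses import dataclass, field

# Hypothetical trigger lexicon: trigger word -> event type (illustrative only).
TRIGGERS = {"attacked": "Attack"}

# Hypothetical representation frames: event type -> argument roles it accepts.
EVENT_FRAMES = {"Attack": {"time", "place", "participant"}}


@dataclass
class MetaEvent:
    event_type: str
    trigger: str
    arguments: dict = field(default_factory=dict)  # role -> entity text


def extract_event(tokens, entities):
    """Return the first meta-event found in the sentence, or None.

    tokens:   list of words in the sentence
    entities: list of (entity_text, candidate_role) pairs
    """
    for token in tokens:
        # Subtask 1: the trigger signals the event and fixes its type.
        event_type = TRIGGERS.get(token.lower())
        if event_type is None:
            continue
        # Subtask 2: keep only entities whose role appears in the frame.
        frame = EVENT_FRAMES[event_type]
        arguments = {role: text for text, role in entities if role in frame}
        return MetaEvent(event_type, token, arguments)
    return None


sentence = "Rebels attacked the town on Monday".split()
candidates = [
    ("Rebels", "participant"),
    ("the town", "place"),
    ("Monday", "time"),
    ("heavy rain", "weather"),  # not a role in the Attack frame, so dropped
]
event = extract_event(sentence, candidates)
print(event)
```

The frame acts as a filter: an entity becomes an event argument only if its role is one the event type's frame declares.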

Traditional meta-event extraction approaches mainly adopt pattern matching or machine learning. Pattern matching detects and extracts meta-events under the guidance of meta-event templates and performs well in specific fields. However, building meta-event templates is time-consuming and laborious, and building a general-purpose template is difficult. Machine learning approaches model the problem as a multi-class classification task or a sequence labeling task and use extracted features as model inputs to complete the meta-event extraction. However, training models with a supervised learning strategy requires large volumes of labeled samples, and because these samples are generally annotated by experts, they are costly to produce. When the quantity of labeled samples is small and the categories are unbalanced, the extraction performance of the models decreases. To overcome this limitation, researchers have proposed adopting a semi-supervised learning strategy (Zhou & Li, 2010), which trains models on a small number of labeled samples together with a large number of unlabeled samples. Tri-training (Zhou & Li, 2005) is a classical semi-supervised learning algorithm that uses bootstrap sampling to train three classifiers, has them work together, and expands the training set by repeatedly introducing new training samples from the unlabeled sample set, with the goal of obtaining three well-performing classifiers. Because unlabeled samples are cheap and easy to obtain, using a semi-supervised strategy to train high-performance event extraction models is a current research hotspot.
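The tri-training idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure used in this paper: it omits the error-rate checks that the original algorithm (Zhou & Li, 2005) applies before accepting pseudo-labeled samples, and it substitutes a toy one-dimensional nearest-centroid classifier for a real labeling model such as a CRF. The names `NearestCentroid`, `bootstrap`, `tri_train`, and `vote`, and the sample data, are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Toy 1-D classifier: predicts the label whose mean value is closest."""

    def fit(self, samples):
        sums, counts = Counter(), Counter()
        for x, y in samples:
            sums[y] += x
            counts[y] += 1
        self.centroids = {y: sums[y] / counts[y] for y in counts}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def bootstrap(labeled, rng):
    """Sample with replacement to get a training set of the same size."""
    return [rng.choice(labeled) for _ in labeled]


def tri_train(labeled, unlabeled, rounds=5, seed=0):
    rng = random.Random(seed)
    # Step 1: three classifiers, each trained on its own bootstrap sample.
    models = [NearestCentroid().fit(bootstrap(labeled, rng)) for _ in range(3)]
    for _ in range(rounds):
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Step 2: an unlabeled sample is pseudo-labeled for classifier i
            # when the other two classifiers agree on its label.
            pseudo = [
                (x, models[j].predict(x))
                for x in unlabeled
                if models[j].predict(x) == models[k].predict(x)
            ]
            # Step 3: retrain classifier i on the expanded training set.
            updated.append(NearestCentroid().fit(labeled + pseudo))
        models = updated
    return models


def vote(models, x):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


labeled = [(0.0, "neg"), (1.0, "neg"), (9.0, "pos"), (10.0, "pos")]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]
models = tri_train(labeled, unlabeled)
print(vote(models, 0.2), vote(models, 9.7))
```

The key design point is that each classifier is taught by the other two: agreement between two classifiers stands in for the confidence estimate that single-classifier self-training would otherwise need.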

Compared with sentence-level meta-events, document-level topic events contain richer global semantic information, including meta-events covering multiple facets, and can therefore present the core content of the text from a global perspective. However, the information describing a topic event is scattered throughout the text, and existing meta-event extraction approaches cannot meet the demands of topic event extraction, which is a complicated procedure. The difficulty lies in identifying all topic-related meta-events within the scope of the document and then merging and extracting them. At present, some topic event extraction work applies an event frame or ontology to represent each component of the topic event and the relations between components, and this has achieved superior results in specific fields. Nevertheless, existing topic event extraction technologies are not yet mature; in particular, intra-textual semantic understanding and cross-textual event extraction need further research.
