Knowledge Extraction From National Standards for Natural Resources: A Method for Multi-Domain Texts


Taiyu Ban, Xiangyu Wang, Xin Wang, Jiarun Zhu, Lvzhou Chen, Yizhan Fan
Copyright © 2023 | Pages: 23
DOI: 10.4018/JDM.318456

Abstract

National standards for natural resources (NSNR) play an important role in promoting the efficient use of China's natural resources, setting standards for many domains such as marine and land resources. Their revision is difficult because standards in different domains may overlap or conflict. To facilitate this revision, this paper extracts structural knowledge from NSNR files. NSNR files are multi-domain texts, on which traditional knowledge extraction methods can fall short in recalling multi-domain entities. To address this issue, this paper proposes a knowledge extraction method for multi-domain texts comprising a sub-domain relation discovery (SRD) module and a domain semantic features fusion (DSFF) module. SRD splits NSNR into sub-domains to facilitate relation discovery. DSFF integrates relation features into the conditional random field (CRF) model to improve multi-domain entity recognition. Experimental results demonstrate that the proposed method can effectively extract structural knowledge from NSNR.
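As a rough illustration of the feature-fusion idea only (not the authors' implementation), the sketch below shows one way relation-level cues discovered per sub-domain might be injected into per-token CRF features. The `relation_hints` table, feature names, and example sentence are hypothetical placeholders.

```python
# Illustrative sketch: fusing domain-relation cues into per-token CRF features,
# in the spirit of the DSFF idea described above. All names are hypothetical.
from typing import Dict, List

# Hypothetical mapping from relation trigger words to relation types
relation_hints: Dict[str, str] = {
    "包含": "contains",      # "contains" trigger
    "适用于": "applies_to",  # "applies to" trigger
}

def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Standard lexical CRF features plus fused relation features."""
    tok = tokens[i]
    feats = {
        "bias": 1.0,
        "token": tok,
        "is_digit": tok.isdigit(),
        "prev_token": tokens[i - 1] if i > 0 else "<BOS>",
        "next_token": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }
    # Domain semantic fusion: flag tokens near a known relation trigger,
    # so the CRF can prefer entity labels around relation mentions.
    window = tokens[max(0, i - 2): i + 3]
    for w in window:
        if w in relation_hints:
            feats["near_relation:" + relation_hints[w]] = True
    return feats

def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    return [token_features(tokens, i) for i in range(len(tokens))]

if __name__ == "__main__":
    sent = ["海域", "使用", "标准", "适用于", "沿海", "城市"]
    for f in sentence_features(sent):
        print(f)
```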

1. Introduction

National Standards for Natural Resources (NSNR) (Huang, C. et al., 2018; Wang, S. et al., 2018) play an important role in the utilization of China's natural resources, such as marine, land, and urban resources. They contain the standards of many important industries and guide future national development and natural resource utilization. NSNR set standards for multiple domains (Huang, C. et al., 2018), and the standards of each domain are organized from the perspective of different disciplines, as shown in Figure 1. This characteristic can lead to duplicated and conflicting standards and makes the revision of NSNR difficult. Therefore, knowledge extraction from NSNR (Wu, X. et al., 2015) is significant for its analysis and revision.

For domain knowledge extraction (Chen, Y., 2018), entity recognition (Li, B., 2019) and relationship extraction (Wu, W. et al., 2019) are essential steps. Traditional methods for domain entity recognition and relationship extraction (Wang, B. et al., 2019) mainly include rule-based domain expert systems (Jun-Ke, Z. et al., 2019) and machine learning-based methods (Chen, H. et al., 2008, 2010a; Chen, Y. et al., 2020; Liu, Z., & Chen, H., 2017). These methods mostly target a single industry or subject rather than multiple domains. Expert systems achieve higher accuracy, but they are time-consuming and incur high labor costs (Eftimov, T. et al., 2017). Moreover, rule design is complicated: many detailed features of domain entities and relationships must be considered, which requires substantial manual effort. In addition, such methods port poorly; an expert system applies to only one domain and must be redesigned when the domain changes. Machine learning-based methods and mixed methods (Jiang, Y., 2019; Savova, G. K. et al., 2010) are more automatic (Qi, Y. et al., 2020) than expert systems, but they usually rely heavily on a tagged corpus or a domain relational database (Wang, J. et al., 2019). For NSNR, little tagged corpus exists, so methods that demand large amounts of labeled domain text are difficult to apply, as sketched in the baseline example below.
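To make the contrast concrete, here is a minimal sketch of such a machine learning baseline: a linear-chain CRF tagger over BIO labels, assuming the sklearn-crfsuite package is available. The toy sentences and tags stand in for a (scarce) labeled NSNR corpus and are purely hypothetical, not data from the paper.

```python
# Minimal sketch of a machine learning baseline for domain entity recognition:
# a linear-chain CRF over BIO tags, assuming the sklearn-crfsuite package.
# The toy training sentences and labels are hypothetical placeholders.
import sklearn_crfsuite

def feats(tokens, i):
    # Simple lexical context features for token i
    return {
        "token": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }

def to_features(sentence):
    return [feats(sentence, i) for i in range(len(sentence))]

# Toy BIO-labelled examples (placeholders, not real NSNR annotations)
train_sents = [["海域", "使用", "分类"], ["土地", "利用", "现状", "分类"]]
train_tags = [["B-RES", "O", "O"], ["B-RES", "O", "O", "O"]]

X = [to_features(s) for s in train_sents]
y = train_tags

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs", c1=0.1, c2=0.1,
    max_iterations=50, all_possible_transitions=True,
)
crf.fit(X, y)
print(crf.predict([to_features(["海域", "使用", "标准"])]))
```

With only a handful of labeled sentences, such a tagger generalizes poorly, which is exactly the data-scarcity limitation noted above for NSNR.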
