1. Introduction
National Standards for Natural Resources (NSNR) (Huang, C. et al., 2018; Wang, S. et al., 2018) plays an important role in the utilization of China’s natural resources, such as marine, land and urban resources. It contains standards for many important industries¹ and guides the future of national development and natural resource utilization. NSNR sets standards for multiple domains (Huang, C. et al., 2018), and the standards of each domain are organized from the perspective of different disciplines², as shown in Figure 1. This characteristic can lead to duplicated and conflicting standards, which makes revising NSNR difficult. Knowledge extraction from NSNR (Wu, X. et al., 2015) is therefore significant for the analysis and revision of NSNR³.
For domain knowledge extraction (Chen, Y., 2018), entity recognition (Li, B., 2019) and relationship extraction (Wu, W. et al., 2019) are essential steps. Traditional methods of domain entity recognition and relationship extraction (Wang, B. et al., 2019) mainly include rule-based domain expert systems (Jun-Ke, Z. et al., 2019) and machine learning (Chen, H. et al., 2008, 2010a) based methods (Chen, Y. et al., 2020; Liu, Z., & Chen, H., 2017). These methods mostly target a single domain with one industry or subject rather than multiple domains. An expert system achieves higher accuracy, but it is very time-consuming to build and incurs high labor costs (Eftimov, T. et al., 2017). Moreover, the rule design process is complicated: various detailed features of domain entities and relationships must be considered, which requires a great deal of manual labor. In addition, this kind of method has poor portability: an expert system is applicable to only one domain, and when the domain changes, it must be redesigned. Machine learning-based methods and some mixed methods (Jiang, Y., 2019; Savova, G. K. et al., 2010) are more automatic (Qi, Y. et al., 2020) than expert systems, but they usually rely heavily on a tagged corpus or a domain relational database (Wang, J. et al., 2019). For NSNR, little tagged corpus is available, so it is difficult to use methods that demand large amounts of labeled domain corpus.