Similarity Discriminating Algorithm for Scientific Research Projects

Chong Li, Jinjie Zhang, Anyu Wang, Xuemin Liu, Yunchsun Sun, Shibo Zhang, Zhixia Ji, Justin Z. Zhang

Source Title: Journal of Organizational and End User Computing (JOEUC) 35(1)

DOI: 10.4018/JOEUC.332008

Abstract

A major challenge in project management is identifying similar research projects accurately and efficiently among numerous proposals. To address this challenge, this paper proposes an algorithm that calculates the similarity between research projects using an improved method for generating fused word order sentence vectors based on USIF (unsupervised random walk sentence embeddings). The experimental results show that the proposed algorithm is about 15.8% more accurate than existing approaches. The authors also propose a pre-checking algorithm that uses a complex research cooperation graph to improve query efficiency. The results show that the pre-checking method reduces the query time cost by 96% on average.

Introduction

Checking the similarity of a scientific research project to other projects is the first step in determining whether it is worthy of funding. According to statistics, the duplication rate of research projects in China is 40% (Zhang et al., 2011). Duplicated research projects waste research resources and affect the national scientific and technological layout.

Similarity discrimination for research projects is a comprehensive technology that draws on natural language processing, knowledge graphs, information retrieval, and other fields. Combining multi-domain knowledge with research project data helps screen existing projects that are similar to those under application (in research content, research objects, or research objectives), providing a reference for reviewers and funding agencies. Current work on scientific project similarity discrimination focuses mostly on keyword extraction, text similarity calculation, and project clustering, and ignores the correlations embedded in the data. Deficiencies remain in the model design, accuracy, and query efficiency of existing algorithms.

To address these problems, this paper works with data on completed projects and project results from the National Natural Science Foundation of China. To improve the accuracy of research project similarity discrimination, we propose a method for generating fused word order sentence vectors (IUFWO) based on improved Unsupervised Random Walk Sentence Embeddings (USIF). The method improves the semantic characterization ability of USIF by introducing a part-of-speech weight and a position weight and by integrating word order features into the sentence vectors. Based on IUFWO, this paper designs a new research project similarity calculation method. It judges the similarity of two research projects by a weighted sum of the cosine similarities between their project names, abstracts, keywords, and conclusion summaries, which improves the accuracy of the similarity judgment.
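The weighted-sum step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each field already has a sentence vector (produced by IUFWO or any other encoder), and the field names and weight values in `FIELD_WEIGHTS` are hypothetical placeholders, since the preview does not state the weights the paper uses.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical weights for illustration only; the paper's values are not given here.
FIELD_WEIGHTS = {"name": 0.3, "abstract": 0.3, "keywords": 0.2, "conclusion": 0.2}

def project_similarity(p, q, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field cosine similarities between projects p and q.

    p and q map each field name to that field's sentence vector.
    """
    return sum(w * cosine(p[field], q[field]) for field, w in weights.items())
```

With weights that sum to 1, two projects whose fields all have identical vectors score 1, and the score falls as individual fields diverge, with heavily weighted fields contributing more to the drop.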

Projects submitted by scholars who cooperate closely are more likely to be similar or duplicates. To improve the query efficiency of similarity discrimination, the project cooperation relationships between scholars and entities are extracted to construct a scientific research cooperation network, which serves as the basis for the research project similarity discrimination algorithm. The algorithm prioritizes duplication checks on projects whose participants have a collaborative relationship. The experimental results show that the improved sentence vector generation method scores about 16% higher than the TF-IDF weighted method, so the sentence vector expresses the semantics of the text more accurately. The proposed similarity calculation method makes the similarity judgments more discriminative: compared with averaging the similarity of each content item, it improves results by about 15.8%. Basing the similarity discrimination algorithm on a research cooperation network makes the detection process more targeted. When duplicate projects exist among related scholars, the checking time is shortened by 96% on average, which improves the efficiency of large-scale checking.
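The pre-checking idea can be illustrated with a small sketch. It is an assumption-laden simplification rather than the paper's algorithm: the cooperation network is reduced to an undirected co-participation graph over scholars, "related" means the scholar or a direct collaborator, and the names `build_cooperation_graph` and `prioritized_candidates` are hypothetical.

```python
from collections import defaultdict

def build_cooperation_graph(projects):
    """Link every pair of scholars who took part in the same project.

    projects maps a project id to the list of its participants.
    """
    graph = defaultdict(set)
    for members in projects.values():
        for a in members:
            for b in members:
                if a != b:
                    graph[a].add(b)
    return graph

def prioritized_candidates(new_members, projects, graph):
    """Order existing projects so those involving related scholars come first."""
    # Related scholars: the applicants themselves plus their direct collaborators.
    related = set(new_members)
    for member in new_members:
        related |= graph.get(member, set())
    first = [pid for pid, members in projects.items() if related & set(members)]
    first_set = set(first)
    rest = [pid for pid in projects if pid not in first_set]
    return first + rest
```

A checker that walks this ordering and stops at the first project above a similarity threshold would reach duplicates among collaborators early, which is the kind of saving the reported reduction in query time refers to.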
