On Evaluating Web-Scale Extracted Knowledge Bases in a Comparative Way

Tong Ruan, Liang Zhao, Yang Li, Haofen Wang, Xu Dong
Copyright: © 2018 | Pages: 23
DOI: 10.4018/IJSWIS.2018010104

Abstract

In this article, the authors design two metric sets covering Richness and Correctness, based on a quasi-formal conceptual representation. They also design a novel metric set over the overlapped instances of different KBs to make the metric results comparable. Finally, they use random sampling techniques to reduce the human effort required to assess correctness. The authors comparatively evaluate three large Chinese KBs, namely DBpedia Chinese, Zhishi.me and SSCO, and further compare them with English KBs in terms of data set quality. They also compare different versions of DBpedia and YAGO. The findings not only give a detailed report on the current state of extracted KBs, but also show the effectiveness of the proposed methods in comparatively assessing the quality of Web-scale KBs.
Article Preview

1. Introduction

In recent years, an increasing number of semantic data sources have been published on the Web. These sources are further interlinked to form the Linked Open Data (LOD) cloud. Within LOD, DBpedia and YAGO are the two main data sources serving as hubs. The DBpedia project (Bizer et al., 2009) extracts structured information from Wikipedia and publishes it on the Web; DBpedia is currently one of the largest hubs of LOD. YAGO (Suchanek, Kasneci, & Weikum, 2007) is another large, well-known semantic knowledge base (KB), derived from Wikipedia, WordNet and GeoNames. Both DBpedia and YAGO continue to evolve and have been published in many versions.

Due to the multilingual nature of Wikipedia, both DBpedia and YAGO contain semantic data in Chinese. However, while Wikipedia is one of the largest encyclopedias on the Web, it has far fewer Chinese articles than English or German ones, so DBpedia and YAGO contain comparatively little Chinese knowledge relative to the knowledge they express in English. On the other hand, Hudong-Baike and Baidu-Baike, two Chinese encyclopedia Web sites, contain roughly ten times as many articles as the Chinese version of Wikipedia. Emerging projects such as Zhishi.me (Niu et al., 2011), SSCO (Hu, Shao, & Ruan, 2014) and XLore (Wang et al., 2013) extract structured Chinese information from a combination of Chinese encyclopedia Web sites, including Hudong-Baike, Baidu-Baike and the Chinese Wikipedia. Both Zhishi.me and SSCO offer Web sites with user-friendly GUIs for user access.

Since so many KBs in different languages are extracted from different sources via different methods, it is natural to ask questions such as: How does the quality of a KB change as new versions of its data sets are published? Is the quality of Chinese KBs comparable to, or better than, that of their English counterparts? How is the quality of a KB extracted from multiple data sources affected by each of those sources? Do these KBs share similar errors?

To address the assessment requirements of comparing Web-scale extracted KBs, we focus on two quality dimensions, namely Richness and Correctness. The reason is that whether a KB is Web-scale depends on the richness of its data, and extracted data is prone to errors. To find suitable metric sets for these quality dimensions, we survey the research on metrics and methodologies for LOD evaluation, as all the above KBs are inspired by the design principles of LOD. Zaveri et al. (2016) summarized 69 metrics and categorized them into 4 dimensions, namely Accessibility, Intrinsic, Contextual and Representational. The sub-dimensions of Intrinsic include Syntactic validity, Semantic accuracy, Consistency and Completeness. Our Richness dimension relates to the Completeness sub-dimension in Zaveri et al. (2016), and our Correctness dimension relates to Syntactic validity, Semantic accuracy and Consistency. However, the metrics within a sub-dimension's metric set in Zaveri et al. (2016) are collected from different research works; they logically overlap and interweave, and they do not share a unified representation. In another pilot study, Glenn and Dave listed 15 metrics to assess the quality of a data set, including Accuracy, Completeness, Typing and Currency. However, they do not provide any formulas for calculating these metrics.
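As the abstract notes, the authors use random sampling to reduce the human effort of assessing correctness. A minimal sketch of how such sampling-based estimation could work is given below; the function names are hypothetical illustrations under the assumption of simple random sampling with a normal-approximation confidence interval, not the authors' implementation.

import math
import random

def sample_for_assessment(triples, sample_size, seed=42):
    """Draw a simple random sample of triples for manual correctness judging."""
    rng = random.Random(seed)
    return rng.sample(triples, min(sample_size, len(triples)))

def estimate_correctness(num_correct, sample_size, z=1.96):
    """Estimate the correctness ratio of the whole KB from the sample,
    with a normal-approximation confidence interval (z=1.96 ~ 95%)."""
    p = num_correct / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Example: annotators judge 400 sampled triples and find 372 correct.
# p, low, high = estimate_correctness(num_correct=372, sample_size=400)
# -> roughly (0.93, 0.905, 0.955)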

We provide a graph-based conceptual representation for Web-scale KBs and define the metric sets of the two dimensions in a quasi-formal way. Different KBs are represented by the same conceptual representation; this approach differs from TripleCheckMate (Kontokostas, Zaveri, Auer, & Lehmann, 2013), which is tied to DBpedia. The conceptual representation consists of a schema graph and a data graph. The metrics are defined on these two graphs, and we focus on the metrics over the data graph because the Chinese KBs we evaluate have little schema information.
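To make the representation concrete, the following sketch models a data graph as a set of (subject, property, object) triples, defines an illustrative richness metric over it, and restricts the comparison to overlapped instances, echoing the comparable metric set described in the abstract. The class and metric definitions here are assumptions for illustration and do not reproduce the article's formal definitions.

from collections import defaultdict

class DataGraph:
    """A data graph: a set of (subject, property, object) triples."""
    def __init__(self, triples):
        self.props_by_instance = defaultdict(set)
        for s, p, o in triples:
            self.props_by_instance[s].add((p, o))

    def instances(self):
        return set(self.props_by_instance)

    def average_property_richness(self):
        """Average number of (property, value) pairs per instance."""
        if not self.props_by_instance:
            return 0.0
        total = sum(len(v) for v in self.props_by_instance.values())
        return total / len(self.props_by_instance)

def richness_on_overlap(kb_a, kb_b):
    """Compute each KB's average richness over the instances they share,
    so that the two results are directly comparable."""
    shared = kb_a.instances() & kb_b.instances()
    def avg(kb):
        counts = [len(kb.props_by_instance[i]) for i in shared]
        return sum(counts) / len(counts) if counts else 0.0
    return avg(kb_a), avg(kb_b)

# Example:
# a = DataGraph([("Beijing", "country", "China"), ("Beijing", "population", "21m")])
# b = DataGraph([("Beijing", "country", "China")])
# richness_on_overlap(a, b)  -> (2.0, 1.0)

Restricting the metric to the shared instances is what makes the numbers comparable across KBs: an absolute richness average would reward a KB simply for covering a different, sparser slice of the domain.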
