1. Introduction
In recent years, an increasing number of semantic data sources have been published on the Web. These sources are further interlinked to form Linking Open Data (LOD). Among LOD, DBpedia1 and YAGO2 are the two main data sources serving as hubs. The DBpedia project (Bizer et al., 2009) extracts structured information from Wikipedia and publishes this information on the Web; DBpedia is currently one of the largest hubs of LOD. YAGO (Suchanek, Kasneci, & Weikum, 2007) is another large and well-known semantic knowledge base (KB), derived from Wikipedia, WordNet and GeoNames. Both DBpedia and YAGO continue to evolve, and many versions of each have been published.
Due to the multilingual nature of Wikipedia, both DBpedia and YAGO contain semantic data in Chinese. However, while Wikipedia is one of the largest encyclopedias on the Web, the number of its Chinese articles is much smaller than the number of articles in English or German. Thus, DBpedia and YAGO do not contain adequate Chinese knowledge compared with the amount of knowledge they express in English. On the other hand, Hudong-Baike3 and Baidu-Baike4, two Chinese encyclopedia Web sites, each contain roughly ten times as many articles as the Chinese version of Wikipedia. Emerging projects such as Zhishi.me (Niu et al., 2011), SSCO (Hu, Shao, & Ruan, 2014) and XLore (Wang et al., 2013) try to extract structured Chinese information from a combination of Chinese encyclopedia Web sites, including Hudong-Baike, Baidu-Baike and Chinese Wikipedia. Both Zhishi.me5 and SSCO6 provide Web sites with user-friendly GUIs for user access.
Since there are so many KBs in different languages, extracted from different sources via different methods, it is natural to ask questions such as: How does the quality of a KB change as new versions of its data sets are published? Is the quality of Chinese KBs comparable to, or better than, that of their English counterparts? How is the quality of a KB extracted from multiple data sources affected by those sources? Do these KBs share similar errors?
To address the assessment requirements of comparing Web-scale extracted KBs, we focus on two quality dimensions, namely Richness and Correctness. The reason is that whether a KB is Web-scale depends on the richness of its data, and extracted data is prone to errors. To find suitable metric sets for measuring these quality dimensions, we survey the research on metrics and methodologies for LOD evaluation, as all the above KBs are inspired by the design principles of LOD. Zaveri et al. (2016) summarized 69 metrics and categorized them into 4 dimensions, namely Accessibility, Intrinsic, Contextual and Representational. The sub-dimensions of Intrinsic include Syntactic validity, Semantic accuracy, Consistency and Completeness. Our Richness dimension relates to the Completeness sub-dimension in Zaveri et al. (2016), and our Correctness dimension relates to Syntactic validity, Semantic accuracy and Consistency. However, the metrics in the metric set of a sub-dimension from Zaveri et al. (2016) are collected from different research works, and they logically overlap and interweave. Moreover, they do not share a unified representation. In another pilot study, Glenn and Dave7 listed 15 metrics to assess the quality of a data set, including Accuracy, Completeness, Typing and Currency, etc. However, they do not provide any formulas for calculating these metrics.
We provide a graph-based conceptual representation for Web-scale KBs and define metric sets for the two dimensions in a quasi-formal way. Different KBs are expressed in the same conceptual representation. This approach differs from TripleCheckMate (Kontokostas, Zaveri, Auer, & Lehmann, 2013), which is based solely on DBpedia. The conceptual representation consists of a schema graph and a data graph. The metrics are defined on these two graphs, and we focus on the metrics over the data graph because our Chinese KBs have little schema information.
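To make the idea concrete, the sketch below models a KB as a data graph of (subject, property, object) triples and computes one completeness-style metric over it. This is only an illustrative assumption of how such a representation might look in code; the function names, the `rdf:type` convention, and the specific metric are not taken from the paper's formal definitions.

```python
from collections import defaultdict

def build_data_graph(triples):
    """Index (subject, property, object) triples by subject,
    forming a simple adjacency view of the data graph."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def typed_entity_ratio(graph, type_property="rdf:type"):
    """A completeness-style metric (illustrative, not the paper's
    definition): the fraction of subject entities that have at
    least one type assertion."""
    if not graph:
        return 0.0
    typed = sum(
        1 for props in graph.values()
        if any(p == type_property for p, _ in props)
    )
    return typed / len(graph)

# Toy data graph: two subject entities, one of which lacks a type.
triples = [
    ("dbr:Beijing", "rdf:type", "dbo:City"),
    ("dbr:Beijing", "dbo:country", "dbr:China"),
    ("dbr:China", "dbo:capital", "dbr:Beijing"),  # no rdf:type triple
]
graph = build_data_graph(triples)
print(typed_entity_ratio(graph))  # 1 of 2 subjects typed -> 0.5
```

Because the metric is defined purely over the data graph, it can be computed uniformly for any KB expressed in this representation, regardless of how much schema information the KB provides.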