Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
OnDemand
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us

Pre-Cutoff Value Calculation Method for Accelerating Metric Space Outlier Detection

Honglong Xu, Zhonghao Liang, Kaide Huang, Guoshun Huang, Yan He

Source Title: International Journal of Grid and High Performance Computing (IJGHPC) 16(1)

DOI: 10.4018/IJGHPC.334125

Article PDF Download Open access articles are freely available for download

Abstract

Outlier detection is an important data mining technique. In this article, the triangle inequality of distances is leveraged to design a pre-cutoff value (PCV) algorithm that calculates the outlier degree pre-threshold without additional distance computations. This algorithm is suitable for accelerating various metric space outlier detection algorithms. Experimental results on multiple real datasets demonstrate that the PCV algorithm reduces the runtime and number of distance computations for the iORCA algorithm by 14.59% and 15.73%, respectively. Even compared to the new high-performance algorithm ADPOD, the PCV algorithm achieves 1.41% and 0.45% reductions. Notably, the non-outlier exclusion for the first data block in the dataset is significantly improved, with an exclusion rate of up to 36.5%, leading to a 23.54% reduction in detection time for that data block. While demonstrating excellent results, the PCV algorithm maintains the data type generality of metric space algorithms.

Article Preview

Top

Introduction

The continuous expansion of data volumes and innovative applications propels the burgeoning growth of the big data industry. The cumulative volume of data places significant stress on data storage capabilities. However, simply increasing storage devices is not a sustainable solution. Viable strategies include timely data analysis and mining, as they alleviate the pressure on storing raw data and maximize data value.

As a core step in data processing, data mining is an active academic research field, giving rise to various techniques, such as classification, clustering, association analysis, and outlier detection. While most data mining techniques focus on discovering regular patterns within datasets, non-regular patterns are sometimes equally valuable. Outlier detection algorithms constitute a vital branch of data mining designed to identify these non-regular patterns. They enable the discovery of distinctive data points within a dataset, known as outliers or anomalies. Thanks to outlier detection, we can promptly identify exceptional events within data, such as network attacks (Catillo et al., 2023), fraudulent transactions (Hilal et al., 2022), or equipment malfunctions (Nesa et al., 2018), thus reducing losses. Moreover, outlier detection contributes to enhancing data quality (Larson et al., 2019), recognizing exceptional outcomes within datasets, and even unearthing new knowledge.

Research into outlier detection algorithms focuses on either specialized or generic algorithms. Specialized algorithms are tailored and optimized for the characteristics of data in various domains, making full use of available information to expedite outlier detection. However, in the era of big data, the ever-expanding and diverse data types pose significant challenges to the design capacity of specialized algorithms. In such circumstances, generic outlier detection algorithms have emerged. Generic algorithms abstract and unify commonalities across data types in different domains, performing searches and mining using only partial information from the dataset, thereby enabling the application of a single algorithm across diverse data types. While there may be some performance trade-offs, generic algorithms significantly reduce data mining systems’ development and maintenance costs, bringing them significant attention in academic and industrial circles in recent years.

Fortunately, most data types can be designed with distance functions that adhere to the triangle inequality, allowing them to be mapped to metric spaces (Mao et al., 2015). This, in turn, facilitates the use of metric space algorithms for retrieval, analysis, and mining. Metric space outlier detection utilizes a definition of outliers entirely based on distance and detection algorithms that are also fully reliant on distance. It eschews the use of any information other than distance, making it applicable to a wide range of data types.

Metric space outlier detection algorithms belong to the distance-based outlier detection algorithm class. Much like their counterparts, they employ distance triangle inequality for pruning to eliminate non-outliers and accelerate the outlier detection process. This step heavily depends on the cutoff value of the outlier degree, where a higher value leads to improved efficiency in non-outlier exclusion. However, existing metric space outlier detection algorithms face a critical issue when detecting the first data block, where the available outlier degree cutoff value is set to 0, causing significant delays in detecting that block and severely impeding overall detection efficiency. To address this problem, we propose a pre-cutoff value calculation method that can be used to accelerate metric space outlier detection. By making full use of the distance triangle inequality, this method calculates an initial outlier degree cutoff value, hereafter referred to as the “pre-cutoff value,” through minimal computations.

The main contributions of this paper are summarized as follows:

(1)
Analyzing the time allocation of each data block in the outlier detection process reveals that the time cost of the first data block is significantly greater than that of other blocks, and we analyze the reasons behind this.
(2)
Introducing a pre-cutoff value calculation method to accelerate metric space outlier detection, utilizing the distance triangle inequality, and designing a method that does not require additional distance computations.
(3)
Proving mathematically that the pre-cutoff value–based accelerated outlier detection algorithm guarantees correct results.

Complete Article List

Search this Journal:

Reset

Volume 16: 1 Issue (2024)

Volume 15: 2 Issues (2023)

Volume 14: 6 Issues (2022): 1 Released, 5 Forthcoming

Volume 13: 4 Issues (2021)

Volume 12: 4 Issues (2020)

Volume 11: 4 Issues (2019)

Volume 10: 4 Issues (2018)

Volume 9: 4 Issues (2017)

Volume 8: 4 Issues (2016)

Volume 7: 4 Issues (2015)

Volume 6: 4 Issues (2014)

Volume 5: 4 Issues (2013)

Volume 4: 4 Issues (2012)

Volume 3: 4 Issues (2011)

Volume 2: 4 Issues (2010)

Volume 1: 4 Issues (2009)

View Complete Journal Contents Listing

MLA

APA

Chicago

Export Reference

Pre-Cutoff Value Calculation Method for Accelerating Metric Space Outlier Detection

Abstract

Introduction

Complete Article List