Index Key Compression and On-the-Fly Reconstruction of In-Memory Indexes

Index Key Compression and On-the-Fly Reconstruction of In-Memory Indexes

Yongsik Kwon, Cheol Ryu, Sang Kyun Cha, Arthur H. Lee, Kunsoo Park, Bongki Moon
Copyright: © 2022 |Pages: 17
DOI: 10.4018/JDM.305732
OnDemand:
(Individual Articles)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

This article proposes an index key compression scheme based on the notion of distinction bits. It proves that the distinction bits of index keys are sufficient information to determine the sorted order of the index keys. The actual compression ratio may vary depending on the characteristics of datasets (an average of 2.76:1 compression ratio was observed in the authors’ experiments). However, the index key compression scheme leads to significant performance improvements during the reconstruction of large-scale indexes. This study’s index key compression can be effectively used for database replication and index recovery in modern main-memory database systems.
Article Preview
Top

Introduction

Main-memory database systems have been used for many applications to keep the latency low and the transaction throughput high. In such a main-memory database system, indexes are often deployed without on-disk representations (Diaconu et al., 2013; Faerber et al., 2017; Microsoft, 2020; Zhang et al., 2016). By letting the indexes reside solely in memory, it can sustain the best attainable performance of the indexes (even in the presence of updates). Insertions, deletions, and updates to a database table will reflect to all associated indexes and the table itself. While those changes are applied to the table in both memory and disk, they will only be applied to the indexes residing in the main memory. Consequently, those update operations will not incur any disk accesses for keeping the indexes current. In addition, the changes to the indexes will not be written to the log or checkpointed to disk. The most recent copy of each index can be restored from its base table (Malviya et al., 2014).

None of the index updates are propagated to disk; however, all the indexes must be reconstructed from scratch when the database system restarts from a failure, an anti-cached table is loaded back from disk to memory, or an entire database is replicated from the master node to a slave node. For a table with many associated indexes, the cost of loading the table may increase due to the additional cost of constructing the indexes from the rows of the table. Therefore, it is an important challenge to limit the cost of index reconstruction so the database loading time and restart time can be kept to its minimum.

This article’s proposed approach relies on the distinction bits of the index keys. This compressed key sort method, which utilizes the distinction bits of the keys, is considered a kind of key compression. The compressed key sort approach assumes a range of key lengths. The experimental evaluation demonstrates that the overall cost of B+-tree construction can be reduced by 21% to 54% for real-world datasets. These experiments also show that the compressed key sort is readily parallelizable for multicore processors, yields near-linear speedup, and can build a B+-tree index faster than loading its image from disk or enterprise-class solid-state drive (SSD).

The key question posed in this article asks: What is the minimum amount of information required (in terms of the number of bits in the index keys) to determine the sorted order among the keys correctly? Whoever can determine the answer will extract the minimum number of bits from index keys and sort them correctly but more efficiently. Once the keys are sorted, the B+-tree index will be built by following the standard bulk-load procedure. In this process, as illustrated in Figure 1, the top flow shows the conventional steps for index construction. The bottom shows the proposed compressed key sort applied to index construction. Based on the formal definition of distinction bits presented in this article, the authors will prove that those bits are sufficient information to determine the sorted order of the index keys correctly.

This article is organized as follows. First, the article will discuss related work and introduce the authors’ compressed key sort. Then, the article will describe the index key formats for various data types and the metadata for efficient index rebuilding. The authors will explain the procedure of rebuilding the index before showing the results of their experiments. Finally, the article presents a conclusion.

Figure 1.

Compressed key sort

JDM.305732.f01
Top

An extensive amount of research on efficient index structures for database tables has measured efficiency through index size, search time, and concurrency control. For example, the following work focused on improving index structures with respect to one or more of those measures: preðx B-trees (Bayer & Unterauer, 1977), partial-key B-tree (Bohannon et al., 2001), CSB+-tree (Rao & Ross, 2000), pB+-tree (Chen et al., 2001), generalized prefix tree (Boehm et al., 2011), KISS-tree (Kissinger et al., 2012), FAST (Kim et al., 2010), adaptive radix tree (Leis et al., 2013), dynamic similarity index (Kozak et al., 2014), iCPI-tree (Andrzejewski & Boinski, 2015), spatiotemporal index (Eom & Lee, 2017), HOT (Binna et al., 2018), and SuRF (Zhang et al., 2018).

Complete Article List

Search this Journal:
Reset
Volume 35: 1 Issue (2024)
Volume 34: 3 Issues (2023)
Volume 33: 5 Issues (2022): 4 Released, 1 Forthcoming
Volume 32: 4 Issues (2021)
Volume 31: 4 Issues (2020)
Volume 30: 4 Issues (2019)
Volume 29: 4 Issues (2018)
Volume 28: 4 Issues (2017)
Volume 27: 4 Issues (2016)
Volume 26: 4 Issues (2015)
Volume 25: 4 Issues (2014)
Volume 24: 4 Issues (2013)
Volume 23: 4 Issues (2012)
Volume 22: 4 Issues (2011)
Volume 21: 4 Issues (2010)
Volume 20: 4 Issues (2009)
Volume 19: 4 Issues (2008)
Volume 18: 4 Issues (2007)
Volume 17: 4 Issues (2006)
Volume 16: 4 Issues (2005)
Volume 15: 4 Issues (2004)
Volume 14: 4 Issues (2003)
Volume 13: 4 Issues (2002)
Volume 12: 4 Issues (2001)
Volume 11: 4 Issues (2000)
Volume 10: 4 Issues (1999)
Volume 9: 4 Issues (1998)
Volume 8: 4 Issues (1997)
Volume 7: 4 Issues (1996)
Volume 6: 4 Issues (1995)
Volume 5: 4 Issues (1994)
Volume 4: 4 Issues (1993)
Volume 3: 4 Issues (1992)
Volume 2: 4 Issues (1991)
Volume 1: 2 Issues (1990)
View Complete Journal Contents Listing