Introduction
In this paper, we focus on query optimization techniques for relational Data Warehouses (RDW) deployed on top of cluster environments (Lima et al., 2009). An RDW is usually modeled by means of a star schema consisting of a huge fact table and a number of dimension tables, as shown in Figure 1 for the widely known data warehouse benchmark APB-1 release II (OLAP Council, 2010). Here, the fact table Sales is joined to four dimension tables: Product, Customer, Time, and Channel. Star queries are typically executed against RDW. They retrieve aggregate information (e.g., via standard SQL aggregate operators such as SUM and COUNT) from measures stored in the fact table by applying selection conditions on the columns of the joined dimension tables. They are extensively used as the conceptual basis for more complex OLAP queries, which, in turn, are exploited to extract useful summarized knowledge from RDW for decision-making purposes.
Figure 1. Logical schema of the data warehouse benchmark APB-1 release II
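To make the notion of a star query concrete, the following minimal sketch builds a toy star schema in an in-memory SQLite database and runs one star query against it. Only the table names (Sales, Product, Customer, Time, Channel) come from the schema described above; all column names, the sample rows, and the helper functions `build_toy_warehouse` and `star_query` are assumptions introduced for illustration and do not reproduce the actual APB-1 definitions.

```python
import sqlite3


def build_toy_warehouse():
    """Create an in-memory star schema with a few illustrative rows.

    Column names and data are hypothetical; they are not taken from APB-1.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables
        CREATE TABLE Product  (product_id  INTEGER PRIMARY KEY, family TEXT);
        CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, city   TEXT);
        CREATE TABLE Time     (time_id     INTEGER PRIMARY KEY, year   INTEGER);
        CREATE TABLE Channel  (channel_id  INTEGER PRIMARY KEY, name   TEXT);

        -- Fact table: one foreign key per dimension plus a measure
        CREATE TABLE Sales (
            product_id  INTEGER REFERENCES Product(product_id),
            customer_id INTEGER REFERENCES Customer(customer_id),
            time_id     INTEGER REFERENCES Time(time_id),
            channel_id  INTEGER REFERENCES Channel(channel_id),
            amount      REAL
        );
        """
    )
    conn.executemany("INSERT INTO Product VALUES (?, ?)",
                     [(1, "Audio"), (2, "Video")])
    conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                     [(1, "Rome"), (2, "Paris")])
    conn.executemany("INSERT INTO Time VALUES (?, ?)",
                     [(1, 2008), (2, 2009)])
    conn.executemany("INSERT INTO Channel VALUES (?, ?)",
                     [(1, "Retail"), (2, "Online")])
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, 100.0),
            (1, 2, 2, 1, 150.0),
            (2, 1, 2, 2, 200.0),
            (2, 2, 2, 1, 50.0),
            (1, 1, 2, 2, 75.0),
            (1, 1, 2, 1, 25.0),
        ],
    )
    return conn


def star_query(conn, year, channel_name):
    """Aggregate the fact-table measure, filtering on dimension columns."""
    sql = """
        SELECT p.family, SUM(s.amount), COUNT(*)
        FROM Sales AS s
        JOIN Product AS p ON s.product_id = p.product_id
        JOIN Time    AS t ON s.time_id    = t.time_id
        JOIN Channel AS c ON s.channel_id = c.channel_id
        WHERE t.year = ? AND c.name = ?
        GROUP BY p.family
        ORDER BY p.family
    """
    return conn.execute(sql, (year, channel_name)).fetchall()


if __name__ == "__main__":
    warehouse = build_toy_warehouse()
    for row in star_query(warehouse, 2009, "Retail"):
        print(row)
```

The query has the typical star shape: the fact table sits at the center of the joins, the selection predicates apply only to dimension columns, and the aggregates are computed over a fact-table measure. On a real RDW, the fact table would hold orders of magnitude more tuples than this toy instance.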
Unfortunately, evaluating OLAP queries over RDW typically demands high performance, which is difficult to ensure over large amounts of multidimensional data, also because such queries are usually complex in nature (Bellatreche & Boukhalfa, 2005). This complexity is mainly due to the presence of joins and aggregation operations over huge fact tables, which very often require billions of tuples to be accessed and processed. In order to speed up OLAP queries over RDW, several optimization approaches, mainly inherited from classical database technology, have been proposed in the literature. Among others, we recall materialized views (Gupta, 1999), indexing (Sarawagi, 1997), data partitioning (Bellatreche et al., 2009), and data compression (Cuzzocrea & Serafino, 2009). Despite this, it has been demonstrated that these approaches, when used in isolation, are not sufficient to evaluate OLAP queries over RDW efficiently (Stöhr et al., 2000). As a consequence, in order to overcome the limitations of these techniques, high performance in database technology, including RDW (Furtado, 2004; DeWitt et al., n.d.), has traditionally been achieved by means of parallel processing methodologies (Özsu & Valduriez, 1999).