Methods for solving the mean query execution time minimization problemEuropean Journal of Operational Research

About

Authors
Marek Łatuszko, Radosław Pytlak
Year
2015
DOI
10.1016/j.ejor.2015.04.041
Subject
Information Systems and Management / Management Science and Operations Research / Modelling and Simulation

Text

Innovative Applications of O.R.

Accepted Manuscript

Methods for Solving the Mean Query Execution Time Minimization

Problem

Marek Łatuszko, Radosław Pytlak

PII: S0377-2217(15)00334-3

DOI: 10.1016/j.ejor.2015.04.041

Reference: EOR 12913

To appear in: European Journal of Operational Research

Received date: 18 June 2014

Revised date: 21 January 2015

Accepted date: 21 April 2015

Please cite this article as: Marek Łatuszko, Radosław Pytlak, Methods for Solving the Mean Query

Execution Time Minimization Problem, European Journal of Operational Research (2015), doi: 10.1016/j.ejor.2015.04.041

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T

Highlights • It is shown that greedy algorithms can solve efficiently the view selection problem. • Several algorithms for the view selection problem are comprehensively compared. • Improvements of used algorithms for the view selection problem are considered. • The proposed methods can be used to solve the largest problems considered so far. 1

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T

Methods for Solving the Mean Query Execution Time

Minimization Problem

Marek Latuszkoa,∗, Rados law Pytlaka aInstitute of Automatic Control and Robotics, Warsaw University of Technology, S´w.

Andrzeja Boboli 8, 02-525 Warsaw, Poland

Abstract

One of the most significant and common techniques to accelerate user queries in multidimensional databases is view materialization. The problem of choosing an appropriate part of data structure for materialization under limited resources is known as the view selection problem. In this paper, the problem of the mean query execution time minimization under limited storage space is studied. Different heuristics based on a greedy method are examined, proofs regarding their performance are presented, and modifications for them are proposed, which not only improve the solution cost but also shorten the running time. Additionally, the heuristics and a widely used Integer Programming solver are experimentally compared with respect to the running time and the cost of solution. What distinguishes this comparison is its comprehensiveness, which is obtained by the use of performance profiles.

Two computational effort reduction schemas, which significantly accelerate heuristics as well as optimal algorithms without increasing the value of the cost function, are also proposed. The presented experiments were done on a large dataset with special attention to the large problems, rarely considered in previous experiments. The main disadvantage of a greedy method indicated in literature was its long running time. The results of the conducted experiments show that the modification of the greedy algorithm together with the computational effort reduction schemas presented in this paper result in the method which finds a solution in short time, even for large lattices. ∗Corresponding author: Marek Latuszko. Mob. +48 693 096 006.

Email addresses: M.Latuszko@mchtr.pw.edu.pl (Marek Latuszko),

R.Pytlak@mchtr.pw.edu.pl (Rados law Pytlak)

Preprint submitted to European Journal of Operational Research April 25, 2015

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP

T

Keywords: Decision support systems, heuristics, OLAP, view materialization, view selection problem. 2010 MSC: 68P15, 68T20, 90C59, 90C10 1. Introduction

On-Line Analytical Processing (OLAP) is one of the most important technologies applied in modern decision support systems. OLAP provides users with ability to perform on-line, multidimensional analysis requiring the computation of many aggregating functions, frequently on large volumes of data. The query response time is considered as the main measure of system efficiency. Many methods are used to meet the performance demands, beginning from the base OLAP characteristic–the specialized multidimensional structure, through data indexing strategies, partitioning, or query optimizers. One of the common techniques is precomputing (materializing) the part of data cube aggregates. The user queries can retrieve data from prepared structures instead of making all calculations on the fly [7]. The problem of choosing part of data structure for materialization under limited resources is universally known as the view selection problem, or as the warehouse view selection problem when used in designing data warehouses. The solution to this problem is not as simple as materializing the cells which are most frequently requested by users queries, since cells are dependent on each other and some may be even not asked at all but their materialization will greatly facilitate calculation of other cells. In general case, the problem of selecting the right part of cube for materialization is NP-hard [14].

To illustrate the concept of using materialized views to answer queries let us consider an example of simple star schema database [22], which is derived from [26]. The schema consists of four dimension tables: CUSTOMER,

SUPPLIER, PART and DATE and one fact table: LINEORDER. For simplification, assume that every dimension table has only one attribute, called the same as dimension name and fact table has only one measure Revenue.

The database schema is presented in Figure 1 (a). For example, a user may be interested in revenue by customers and parts, which is expressed by the following SQL query:

SELECT C.Customer, P.Part, SUM(L.Revenue)

FROM dbo.LINEORDER L, dbo.CUSTOMER C, dbo.PART P

WHERE L.Customer=C.Customer and L.Part=P.Part

GROUP BY C.Customer, P.Part 3

ACCEPTED MANUSCRIPT

AC

CE

PT

ED

M

AN

US

CR

IP