Parallel Query Support for Multidimensional Data

Intra-query parallelism is a well-established mechanism for achieving high performance in (object-)relational database systems. However, these methods have not yet been applied to the emerging field of multidimensional array databases. Specific properties of multidimensional array data require new parallel algorithms. A number of new techniques for parallelizing queries in multidimensional array database management systems are presented, along with a discussion of their implementation in the RasDaMan DBMS, the first DBMS for generic multidimensional array data. The efficiency of the techniques is demonstrated using typical queries on large multidimensional data volumes.

Recently, the integration of an application-domain-independent, generic type constructor for Multidimensional Discrete Data (MDD) into Database Management Systems (DBMS) has received growing attention. Current scientific contributions in this area mainly focus on MDD algebra and specialized storage architectures. Because MDD objects may be several MB in size or much larger, and operations on them can be very complex compared to operations on scalar values, their efficient evaluation becomes a critical factor for the overall query response time. Beyond query optimization, parallel query processing is the most promising technique to speed up complex operations on large data volumes.

One outcome of RasDaMan, the predecessor project of ESTEDI (European Spatio-Temporal Data Infrastructure) in which the RasDaMan array DBMS was developed, was the realization that most queries on multidimensional array data are in fact CPU-bound. Parallel processing is therefore a major research issue of the succeeding ESTEDI project. Furthermore, ESTEDI, an initiative of European software vendors and supercomputing centers, will establish a European standard for the storage and retrieval of multidimensional high-performance computing (HPC) data. It addresses a main technical obstacle, the bottleneck in delivering large HPC results to users, by augmenting high-volume data generators with a flexible data management and extraction tool for multidimensional array data. Special properties of array data, e.g. the size of a single data object combined with expensive cell operations, require adapted algorithms for parallel processing. Suitable concepts found in relational DBMS were implemented and evaluated in the RasDaMan array DBMS.
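
To make the idea of parallelizing a single query over one large array object more concrete, the following is a minimal sketch in Python. It is not RasDaMan's algorithm: the row-wise partitioning, the `cell_op` function, and all other names are assumptions chosen for illustration. It splits a two-dimensional array into row ranges, applies a per-cell operation to each range in a separate worker, and combines the partial aggregates.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def split_rows(n_rows, n_parts):
    """Return (start, stop) row ranges that together cover 0..n_rows."""
    if n_rows == 0:
        return []
    step = max(1, math.ceil(n_rows / n_parts))
    return [(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]


def cell_op(value):
    # Hypothetical stand-in for an expensive per-cell operation.
    return math.sqrt(value) * 2.0


def partial_sum(array, row_range):
    """Apply cell_op to every cell in the row range and sum the results."""
    start, stop = row_range
    return sum(cell_op(v) for row in array[start:stop] for v in row)


def parallel_sum(array, workers=4):
    """Evaluate the aggregate over disjoint row ranges, then merge."""
    ranges = split_rows(len(array), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: partial_sum(array, r), ranges))
    return sum(parts)


# A small 100 x 100 array of floats stands in for a large array object.
array = [[float(r * 100 + c) for c in range(100)] for r in range(100)]
sequential = partial_sum(array, (0, len(array)))
parallel = parallel_sum(array)
```

Threads keep the sketch portable, but in CPython a genuinely CPU-bound cell operation would more likely be spread across processes (for example with `ProcessPoolExecutor`) to obtain an actual speedup. The two results agree only up to floating-point rounding, because the partial sums are added in a different order.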

 
