The research concentrates on algorithms and data structures, with particular emphasis on the performance of database and telecommunication systems.
The research has been financed by the Academy of Finland, TEKES and Nokia Telecommunications. The members of the group in 1996 were Professor Eljas Soisalon-Soininen (leader), Vesa Hirvisalo, Antti-Pekka Liedes, Hannu Mallat, Lauri Malmi, Esko Nuutila, Sirkku-Liisa Paajanen, Kenneth Oksanen, Kerttu Pollari-Malmi, Tatu Saloranta and Kengatharan Sivalingam.
Projects

In 1995 a three-year research project, the Shadows project, was concluded at Helsinki University of Technology (HUT). The project studied and improved the shadow paging algorithm, which is used in some database systems to provide crash recovery. Currently Shadows is being reimplemented commercially as the basis for a transactional file system.
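The shadow paging idea mentioned above can be illustrated with a small sketch. This is the textbook copy-on-write scheme with hypothetical names, not the actual Shadows implementation:

```python
# Minimal sketch of shadow paging (illustrative only): updates go to a
# fresh copy of the page table, and commit atomically installs that copy.
# A crash before commit leaves the old, consistent page table intact.

class ShadowStore:
    def __init__(self):
        self.page_table = {}   # committed state: page id -> page contents
        self.shadow = None     # in-progress copy of the page table

    def begin(self):
        # Copy the page table, not the pages themselves.
        self.shadow = dict(self.page_table)

    def write(self, page_id, data):
        # Writes go to the shadow copy; committed pages are never overwritten.
        self.shadow[page_id] = data

    def commit(self):
        # The single atomic step: install the new page table.
        self.page_table = self.shadow
        self.shadow = None

    def crash(self):
        # Losing the shadow copy simply aborts the in-progress transaction.
        self.shadow = None
```

After a crash in the middle of a transaction, readers still see only the last committed page table, which is what gives crash recovery essentially for free.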
In 1996 a development project, called HiBase, was started as a joint effort mainly between HUT and Nokia Telecommunications. The aim of HiBase is to develop a persistent functional programming environment with challenging additional features required e.g. in telecom environments. These features include high performance, heterogeneous transaction load, complex dynamic database schemes, main-memory databases with real-time response, and very high availability ensured by replication and instantaneous takeover.
The research is led by Professor Eljas Soisalon-Soininen, and the project team includes Jukka-Pekka Iivonen (NTC), Susanna Lehtinen (NTC), Antti-Pekka Liedes (HUT), Kenneth Oksanen (HUT), Sirkku-Liisa Paajanen (HUT), Kengatharan Sivalingam (HUT) and Lars Wirzenius (HUT). The project is funded by TEKES (40 %) and Nokia Telecommunications (NTC).
This note summarizes the work done at HUT.
Shades
Early in 1996 a new recovery algorithm reminiscent of shadow paging was invented at HUT. The new algorithm, baptized Shades, is particularly apt for versions (snapshots), small objects, and a main-memory environment.
Avid development of Shades has furnished it with other desirable features, such as a real-time compacting garbage collector with negligible memory overhead. This suggests using Shades as the run-time system of a persistent programming language.
Depending on the update pattern, we have been able to achieve a sustained performance of 23,000 to 63,000 updates per second, or 15,000 TPC-B-like transactions per second, with 50% memory utilization on a 90 MHz Pentium.
Except for some optimizations in garbage collection and I/O, most of the implementation work on Shades has been done. Additional mechanisms for main-memory recovery and replication have been presented but not yet implemented.
Indexes

To date the following data structures have been implemented at HUT:
The execution mechanism of the language is a bytecode interpreter. The run-time organization is based on continuation passing style (CPS), which is more expressive than that of e.g. Java.
Since the bytecode interpreter's internal data structures are managed by Shades, running bytecode processes can be treated with the same tools as data. For example, processes are recoverable and can be automatically resumed after recovery. Migrating processes, e.g. for load balancing, is as easy as migrating data.
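The note gives no details of the HiBase bytecode itself, but the essential property of a continuation-passing run-time, namely that control state becomes ordinary data, can be illustrated with a toy CPS evaluator (all names and the expression format are hypothetical):

```python
# Toy continuation-passing evaluator (illustrative only; not the HiBase
# bytecode). Every step names its continuation explicitly as a value, so
# "where the computation will go next" is plain data that a run-time like
# Shades could store, recover, or migrate like any other object.

def ev(expr, env, k):
    """Evaluate expr in env, passing the result to continuation k."""
    tag = expr[0]
    if tag == "const":
        return k(expr[1])
    if tag == "var":
        return k(env[expr[1]])
    if tag == "add":
        # Evaluate the left operand, then the right, then add; each
        # intermediate step is captured in an explicit continuation.
        return ev(expr[1], env, lambda a:
               ev(expr[2], env, lambda b: k(a + b)))
    raise ValueError("unknown expression tag: %r" % (tag,))

# (add (const 40) (var x)) with x = 2, with the identity continuation.
result = ev(("add", ("const", 40), ("var", "x")), {"x": 2}, lambda v: v)
```

Because the pending work is reified in the continuation rather than hidden in a host-language call stack, suspending a process is just a matter of saving the current continuation as data.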
Currently the Shades database server can be interfaced to the outside world through TCP/IP connections, and in the future also through the OMG Interface Definition Language (IDL).
Future tasks include designing the programming language, developing a compiler, and writing an extensive set of auxiliary programming tools.
Administrativia

M.Sc. Kenneth Oksanen has acted as the chief surgeon under the auspices of Prof. Eljas Soisalon-Soininen. Oksanen will write his Ph.D. thesis on Shades and the bytecode interpreter, Sirkku-Liisa Paajanen is writing her M.Sc. thesis on index structures in Shades, and Kengatharan Sivalingam is finishing his M.Sc. thesis on versioning and the relational data model implementation of the preceding Shadows project. Antti-Pekka Liedes has participated in the project as a half-time employee, and Lars Wirzenius joined the HUT team at the beginning of 1997.
The Peak project is a joint venture of Helsinki University of Technology (HUT) and the University of Vaasa (UV), and is led by Professor Eljas Soisalon-Soininen. The HUT subproject team has four members: Vesa Hirvisalo, Hannu Mallat, Esko Nuutila, and Tatu Saloranta. The UV subproject team has three members: Jarmo T. Alander, Timo Mantere, and Ghodrat Moghadampour. The project is funded by TEKES and the industrial partners of the project.
Performance is a major criterion in designing, implementing, and using software systems. In many applications, for instance, in embedded systems, the software has strict performance requirements. In almost all applications, good performance is an important factor in comparing different software systems.
We study methods for improving performance of large software systems. Our research concentrates on methods for performance evaluation and prediction. We are especially interested in large embedded systems, which have real-time requirements on their performance.
We have two lines of work: developing new methods for improving software performance and applying the methods in practice. The practical work consists of studying the software products and development methods of our industrial partners. The practical work guides us in developing execution architectures for high-performance software and methods for performance evaluation and prediction.
During the first project year (1996), our focus was on getting experience of using well-known performance analysis methods in practice. Developing new methods and tools was secondary. The HUT subproject concentrated on software execution modeling and analysis. The UV subproject concentrated on software workload analysis.
We studied the performance of large embedded real-time software systems with our industrial partners. In workload analysis, we used genetic algorithms to search the input space and find out the inputs that yield bad performance. In execution analysis, we used execution graphs to model the software and queueing networks to model the execution environment.
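As an illustration of the workload-analysis approach described above, here is a minimal genetic-algorithm sketch that searches an input space for inputs with bad (high) running cost. The cost function is a hypothetical stand-in for measuring the real system on a given input, and all parameters are illustrative:

```python
import random

# Sketch of genetic search for worst-case inputs. Individuals are input
# vectors; fitness is the (hypothetical) running cost of the system on
# that input, and selection keeps the costliest inputs.

def cost(inp):
    # Hypothetical workload model: cost grows when many equal keys
    # collide (the sum of squared value counts).
    return sum(inp.count(x) for x in inp)

def evolve(pop_size=20, length=8, generations=30, seed=0):
    rnd = random.Random(seed)
    pop = [[rnd.randrange(4) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost, reverse=True)       # costliest inputs first
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rnd.sample(survivors, 2)
            cut = rnd.randrange(1, length)
            child = a[:cut] + b[cut:]          # one-point crossover
            if rnd.random() < 0.2:             # occasional mutation
                child[rnd.randrange(length)] = rnd.randrange(4)
            children.append(child)
        pop = survivors + children
    return max(pop, key=cost)
```

In practice the cost function would invoke the instrumented system under test, which is what makes the search expensive and the choice of population size and generation count a real engineering trade-off.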
To improve the performance of a software system, we tried to understand which inputs are hard and how the execution proceeds when those inputs are handled. Further, we tried to understand why the inputs are hard and why the execution is designed to proceed as it does. Without such understanding it is hard to decide which features of the system's operation are critical and how they could be improved.
In this practical work, we learned two lessons. Firstly, no single method or tool is adequate for analyzing and improving the performance of a large software system. Secondly, finding and managing the information needed in a performance evaluation is difficult. These seem to be common problems in performance evaluations of complicated software systems.
Based on the practical experience gained with our industrial partners, we have started to develop tools for software performance modeling and evaluation. The main idea is to make an integrated set of tools that supports the co-operation of many methods.
One problem that we are trying to solve is how to effectively combine all the information produced in the different phases of a performance study. To this end we have built a prototype of a tool-set kernel that collects and stores the information that the different tools produce and use.
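A minimal sketch of the kernel idea, assuming a simple publish/query interface; all names and record shapes here are hypothetical, not the actual prototype:

```python
# Sketch of a tool-set kernel: a shared store through which independent
# analysis tools exchange the data they produce and consume, tagged by
# origin and kind rather than wired tool-to-tool.

class Kernel:
    def __init__(self):
        self.records = []

    def publish(self, tool, kind, data):
        # A tool deposits a result, tagged with its origin and kind.
        self.records.append({"tool": tool, "kind": kind, "data": data})

    def query(self, kind):
        # Any tool retrieves everything of a given kind, regardless of
        # which tool produced it.
        return [r["kind"] == kind and r["data"]
                for r in self.records if r["kind"] == kind]
```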
This project concentrates on designing and evaluating new tree structures and algorithms for index structures in main memory. The project has one full time researcher, Lauri Malmi, in HUT.
In 1996, the project concentrated on two topics. First, we designed, analyzed and implemented algorithms for performing group operations in balanced binary search trees. This means that a large number of searches, insertions and/or deletions are performed on the tree at a time, and thereafter the tree is rebalanced as a whole. The method applies well to relaxed-balance trees, which store local information about their balance status in the nodes.
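A group insertion of this kind can be sketched as a two-phase procedure: insert the whole batch without any per-operation rebalancing, then rebalance the tree as a whole. In this illustrative sketch the rebalancing phase simply rebuilds a perfectly balanced tree from the in-order traversal; the project's actual algorithms use relaxed balancing instead:

```python
# Sketch of a group update on a binary search tree: phase 1 inserts the
# whole batch with no rebalancing, phase 2 rebalances the tree as a whole
# (here by rebuilding it perfectly balanced from the sorted key list).

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert_unbalanced(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert_unbalanced(root.left, key)
    elif key > root.key:
        root.right = insert_unbalanced(root.right, key)
    return root

def inorder(root):
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

def rebuild(keys):
    # Perfectly balanced tree from a sorted list of keys.
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], rebuild(keys[:mid]), rebuild(keys[mid + 1:]))

def group_insert(root, batch):
    for key in batch:
        root = insert_unbalanced(root, key)   # phase 1: no rebalancing
    return rebuild(inorder(root))             # phase 2: rebalance as a whole

def height(root):
    return 1 + max(height(root.left), height(root.right)) if root else 0
```

Deferring all rebalancing to a single pass over the tree is what makes group operations cheaper than performing the same updates one by one, each with its own rebalancing.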
Second, an environment for performing run-time tests of various search and update algorithms on trees has been implemented. This continues our previous work, in which we investigated how the layout of a tree in a hierarchical memory affects the performance of tree algorithms.