HUT HUT - SWT-lab - Research - Groups - DS2G

Data Structures and Database Systems Group (DS2G)

Main page

Staff

Projects

Publications

Database research

The focus of this research area is in developing algorithms for different topic of database management. One such topic is concurrency control and recovery of database transactions. This is a traditional area of research, but there are still many interesting problems to be solved. This is especially the case when the necessary search structure operations are included in the transaction model. The classical theory of concurrency control and recovery defines transactions as strings of abstract read and write operations. In our new model, the logical database is assumed to consist of records with unique keys taken from a totally ordered domain. The database operations include operations of the forms "fetch the first or next matching record", "insert a record", or "delete a record". Recently, we (I.Jaluta, S.Sippu, E.Soisalon-Soininen) have shown in a paper published in The VLDB Journal how to implement efficient concurrency control and recovery within this model, when the underlying index structure is a B-link tree. We present a new technique to implement a small structure modification, such as a page spit or a page merge as an atomic action, so that interrupted structure modifications are never rolled back (undone). This is in contrast to the previous algorithms, in which an interrupted tree-structure modification always has to be rolled back during restart recovery. Our transaction model has been extended to page-server database systems in an article published in the ACM Transactions on Database Systems. Future research includes further development of the theory, performance studies of the already proposed methods, implementation of the model in the case of multidimensional search structures such as R-trees, as well as introducing bulk updates as new basic operations applied to the index structure. Preliminary results about bulk operations (large amounts of search, insert, or delete operations performed at the same time) will be presented in 2007 at the ICDE (International Conference on Data Engineering) conference, which is a practically oriented top-level conference for database research.

As a new research topic, algorithmic problems related to data stream management will be considered. The focus is especially on problems that are relevant to document filtering and content-based routing. The idea in filtering is that the user defines her interests or profile, and the data producers transmit to the user those documents that match the user's profile. Especially filters defined by regular expressions will be considered. Then the problem is to find for a given document or string x those regular expressions in the filter that match x, i.e. those expressions whose language contains x. A preliminary article in this area has been published by Eljas Soisalon Soininen and Tatu Ylönen, but it seems that results obtained thus far can be considerably improved.

Searching for optimal algorithms

In some areas of algorithms research the algorithm designer's work can be successfully mechanized by modeling it as a two player game: the algorithm designer (or "oracle") makes a "move" - the choice of the next computational step - with the intent to find the result as efficiently as possible, and the imaginary input-player (or "adversary") counters with another "move" - the result of that computational step - attempting to postpone the solution as late possible. Suitable areas for such mechanized treatment are, for example, the selection problem (finding the i'th largest of n input values) and its many variations, and in finding distributed concurrency control algorithms for given legal schedules. So far we have found improvements to previously known upper or lower bounds for sixteen different selection problem instances, a counter-example to a previously published hypothesis of the exact complexity of selection, and confirmation to a theoretical hypothesis (Yao's hypothesis) up to a certain limit. In addition to results in the problem domain this work has lead us to find several interesting optimizations that speed up the search process and prune irrelevant or isomorphic parts of the search space.


This page is maintained by webmaster, Email: webmaster@cs.hut.fi
URL: http://www.cs.hut.fi/Research/DS2G/index.html