The Chair of Computer Science 6 (Data Management) was founded in 1979 with the appointment of Prof. Dr. Hartmut Wedekind. After his retirement, Prof. Dr. Klaus Meyer-Wegener was appointed as the new head in 2001. Since April 2007, the associated professorship has been held by Prof. Dr. Richard Lenz.
The chair conducts research on the foundations of data management and on the application-driven deployment of data-management technologies. We strive to transfer our research results and concepts directly into industrial practice through projects with partners from industry and the public sector. Together, research and project activities form important cornerstones of student education.
Database systems, as the most important data-management technology, have had a major impact on all areas of business and administration. The growing need to integrate different database systems and the increasing demand for efficient support of inter-system and inter-organizational business processes motivate application-oriented research on evolutionary information systems and data quality. In fundamental research, the chair works on technologies that support the scalability and modularity of database systems as well as their functional enhancement with data-stream processing.
Each semester, the chair offers the mandatory courses "Conceptual Modeling" and "Implementation of Database Systems" in the Bachelor program in Computer Science. In the Master program in Computer Science, the chair provides a specialization in "Database Systems" and contributes to the specialization in "Media Informatics." In addition, the chair participates intensively in teaching for other programs, namely International Information Systems, Mechanical Engineering, and Engineering and Business.
Focus of research
1. Application Integration and Evolutionary Information Systems
Database systems also play a major role in application integration. At the core of every integration project lies data integration, which requires semantic mapping on the one hand and inter-system synchronization on the other. Data must be exchanged and kept consistent among applications. The semantic integration of data types and instances requires substantial manual effort, so methods and technologies that minimize this effort are essential. An important constraint is the fact that commercial information systems undergo permanent change. In its research on evolutionary information systems, the chair investigates how the effort for demand-driven system evolution can be minimized and how organizational learning can be supported.
In the medITalk project, an ERP system for networks of medical offices is being developed (in the context of the leading-edge cluster on medical technology). From a research point of view, evolvability is of prime interest. Autonomous medical-office systems and other data sources are the starting point, and they must remain in operation as they are. The system being developed enables cooperation without complete integration in advance. Instead, new methods and technologies allow a demand-driven soft migration toward ever more closely cooperating system parts.
The project ProHTA (Prospective Health-Technology Assessment) was an interdisciplinary project in the leading-edge cluster on medical technology; it ended in 2015. It addressed early technology impact assessment in the public-health sector. From the data-management point of view, the development of methods and technologies for dealing with a strongly varying and dynamically growing information demand was of prime interest. In addition, issues of data quality played a significant role in this project. In particular, the requirement to measure the quality of simulation results and to control it through goal-oriented data management posed a great challenge.
The project on Speech-Act-Based Case Management addresses the increasing need to support knowledge-intensive processes that cannot be modelled in advance. We investigate how the explicit classification of interactions as speech acts can be used to support such processes or to integrate them better with explicitly modelled work schedules.
One of the main drivers of change in enterprises is the increased need to react rapidly to new trends and developments. However, the traditional enterprise data warehouse is not sufficient to support short-term decisions; it must be complemented with external data sources. In the project DM4DS (Data Management for Data Science), we investigate how a data scientist can be supported in identifying relevant data sources and integrating them with an existing enterprise data warehouse.
2. Data Quality
Database systems today offer only limited support for preserving data quality. Guaranteeing high data quality in information systems beyond the borders of individual database systems requires new methods and tools that appropriately support comprehensive data-quality management.
TDQMed was a BMBF-funded project in the leading-edge cluster on medical technology. It ended in 2015. Its goal was the analysis and optimization of test-data quality in the development of medical modalities. This included the investigation of the specific quality measures required for test data (e.g. closeness to reality) and of the automatic generation of high-value test data.
The aforementioned medITalk project also has to cope with the quality of the data delivered by the medical offices. It has developed methods to forecast future deliveries in order to evaluate the completeness of the actual deliveries once they arrive.
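The principle behind such a completeness check can be sketched as follows. This is a minimal, hypothetical illustration (not the actual medITalk implementation, and the function names and tolerance value are invented): forecast the expected record count of a delivery from past deliveries and flag the incoming delivery if it falls well short of that forecast.

```python
# Illustrative sketch of forecast-based completeness checking.
# A naive forecast (mean of past delivery sizes) stands in for the
# more sophisticated forecasting methods a real system would use.

def forecast_expected(history):
    """Naive forecast: mean record count of past deliveries."""
    return sum(history) / len(history)

def completeness_check(history, actual_count, tolerance=0.8):
    """Return (is_complete, expected_count).

    The delivery is flagged as potentially incomplete if it contains
    fewer than tolerance * forecast records.
    """
    expected = forecast_expected(history)
    return actual_count >= tolerance * expected, expected

# Example: past deliveries averaged 100 records, so a delivery of 50
# records is flagged as potentially incomplete.
past = [100, 110, 90]
ok, expected = completeness_check(past, 50)
```

A real deployment would replace the mean with a seasonally aware forecast (e.g. per weekday) and tune the tolerance per medical office.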
3. Database and Data-stream Systems
Database systems allow the efficient management of structured data. However, they reach their limits when dealing with time-stamped, incrementally arriving records that must be analysed as soon as possible. Such data streams arise in a growing number of scenarios (e.g. sensor networks), and relevant events must be extracted from them. Data-stream management systems (DSMS) apply techniques known from DBMS - most prominently the declarative programming inherent to queries - to address this issue.
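The incremental, windowed processing that distinguishes a DSMS from a classical DBMS can be illustrated with a small sketch (hypothetical code, not tied to any particular DSMS): a continuous average over the sensor readings that arrived within the last minute, updated on every insertion rather than recomputed from stored data.

```python
from collections import deque
from datetime import datetime, timedelta

class SlidingWindowAvg:
    """Continuously maintain the average of readings in the last `window_seconds`.

    Corresponds to a declarative sliding-window aggregate query in a DSMS;
    records are processed as they arrive and expire automatically.
    """

    def __init__(self, window_seconds):
        self.window = timedelta(seconds=window_seconds)
        self.buffer = deque()   # (timestamp, value) pairs in arrival order
        self.total = 0.0

    def insert(self, ts, value):
        self.buffer.append((ts, value))
        self.total += value
        # Expire records that have fallen out of the window.
        while self.buffer and ts - self.buffer[0][0] > self.window:
            _, old = self.buffer.popleft()
            self.total -= old

    def average(self):
        return self.total / len(self.buffer) if self.buffer else None
```

In a DSMS the same computation would be written declaratively (roughly, `SELECT AVG(value) ... RANGE 60 SECONDS`); the incremental maintenance shown here is what the engine performs under the hood.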
The project DSAM (Data-Stream Application Manager) strives to link heterogeneous DSMS in order to exploit the strengths of each individual system. For that purpose, cost models have been developed for queries on data streams, which are then used in the optimization of data-stream processing. Furthermore, a detailed semantic analysis of DSMS has been carried out to capture their differences and to use them in the distribution of queries. In the context of the DFG Research Unit 1508 BATS, this work is evaluated and advanced using the example of bat observation.
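The idea of using cost models to distribute queries over heterogeneous engines can be sketched as follows. This is a deliberately simplified, hypothetical illustration of cost-based placement, not DSAM's actual cost model: the engine names, operator types, and cost coefficients are all invented.

```python
# Hypothetical per-operator cost factors (arbitrary units per input tuple)
# for two fictional stream engines with different strengths.
COST_PER_TUPLE = {
    "EngineA": {"filter": 1.0, "window_agg": 4.0, "join": 9.0},
    "EngineB": {"filter": 1.5, "window_agg": 2.5, "join": 6.0},
}

def query_cost(engine, operators, input_rate):
    """Estimated cost: input rate times the summed operator cost factors."""
    table = COST_PER_TUPLE[engine]
    return input_rate * sum(table[op] for op in operators)

def place_query(operators, input_rate):
    """Assign the query to the engine with the lowest estimated cost."""
    return min(COST_PER_TUPLE, key=lambda e: query_cost(e, operators, input_rate))

# A join-heavy query is placed on the engine with the cheaper join,
# while a pure filter query goes to the other engine.
engine = place_query(["filter", "join"], input_rate=100)
```

A real optimizer would additionally account for inter-engine transfer costs and split a single query plan across engines rather than placing it as a whole.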
The project "Know Your Queries" aims at storing and analysing database queries for different purposes. The DSAM system already maps queries onto different DSMS; this can be generalized further. In addition, query-driven integration of different data sources is increasingly important: an a priori integration of different data sources requires a tremendous effort, whereas if the query is known, this effort can be reduced to the amount actually needed for that query. This principle has already been used in the medITalk project and is now to be generalized. The new project DM4DS is also based on the idea of demand-driven integration.
Another project aims at the assessment and classification of different types of new storage systems that are currently popular under the term "NoSQL." These systems typically provide reduced functionality, but offer more scalability and/or fault tolerance than traditional database systems. This, however, is only a rough and imprecise characterization. In this project we try to describe the differences and specific strengths more precisely in order to support the decision which system to use for which purpose. We start by analysing the few available benchmarks for NoSQL systems.