Seminar | Renmin University of China Database Research Group (WAMDM)

WAMDM PhD Seminar


2008

2008.01.08 Venue: FL1, Meeting Room, Information Building
Xiao Pan (Mobile Group)	Scalable Trigger Processing [pdf] Abstract: Current database trigger systems have extremely limited scalability. A key observation is that if a very large number of triggers are created, many will have the same structure, except for the appearance of different constant values. This paper proposes a way to develop a truly scalable trigger system.
Da Zhou (Mobile Group)	DynaMat: A Dynamic View Management System for Data Warehouses [pdf] Abstract: In this paper we present DynaMat, a system that dynamically materializes information at multiple levels of granularity to match the demand (workload) while also taking into account the maintenance restrictions of the warehouse, such as the downtime available for updating the views and the available space. DynaMat unifies the view selection and view maintenance problems under a single framework using a novel “goodness” measure for the materialized views. DynaMat constantly monitors incoming queries and materializes the best set of views subject to the space constraints. During updates, DynaMat reconciles the current materialized view selection and refreshes its most beneficial subset within a given maintenance window.
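The trigger abstract's key observation (many triggers share one structure and differ only in their constants) suggests storing the constants as data rather than testing each trigger separately. The sketch below is a minimal illustration under assumptions not taken from the paper: every trigger has the hypothetical form `table.column = <constant>`, and the names `Signature`, `TriggerIndex`, `create_trigger` and `fire` are invented for this example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Shared structure of a trigger group (hypothetical): `table.column = <constant>`."""
    table: str
    column: str


class TriggerIndex:
    """Hypothetical index: triggers grouped by signature, constants kept as hash keys."""

    def __init__(self) -> None:
        # signature -> constant value -> ids of the triggers that use that constant
        self.groups: dict[Signature, dict[object, list[str]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def create_trigger(self, trigger_id: str, table: str, column: str, constant: object) -> None:
        # Triggers in one group differ only in the constant, so it is stored as data
        self.groups[Signature(table, column)][constant].append(trigger_id)

    def fire(self, table: str, row: dict[str, object]) -> list[str]:
        """Return the ids of triggers whose condition matches a new or updated row."""
        fired: list[str] = []
        for sig, constants in self.groups.items():
            if sig.table != table or sig.column not in row:
                continue
            # One hash lookup per signature replaces one condition test per trigger
            fired.extend(constants.get(row[sig.column], []))
        return fired
```

With this layout the cost of matching a row grows with the number of distinct signatures, not with the number of triggers, which is the kind of scalability the abstract is after.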

2007

2007.12.25 Venue: FL1, Meeting Room, Information Building
Qiong Wu (Web Group)	Applying Model Management to Classical Meta Data Problems [pdf] Abstract: Model management is a new approach to meta data management that offers a higher-level programming interface. The main abstractions are models and mappings between models. It treats these abstractions as bulk objects and offers operations such as Match, Merge, Diff, Compose, Apply, and ModelGen.
Yukun Li (Web Group)	A RISC Machine Sort [pdf] Abstract: A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using Alpha AXP processors, commodity memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8
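To make the model management abstract concrete, here is a minimal sketch of one listed operation, Compose, treating mappings as bulk objects. It rests on a strong simplifying assumption not taken from the paper: a mapping is just a set of (source element, target element) pairs, so composing A to B with B to C becomes a join on the shared model B. Real model management mappings are richer than this, and the function name `compose` and the element names are hypothetical.

```python
from __future__ import annotations


def compose(m1: frozenset[tuple[str, str]],
            m2: frozenset[tuple[str, str]]) -> frozenset[tuple[str, str]]:
    """Compose a mapping A -> B with a mapping B -> C into a mapping A -> C.

    Simplification (hypothetical): a mapping is a set of element correspondences.
    """
    return frozenset(
        (a, c)
        for (a, b1) in m1
        for (b2, c) in m2
        if b1 == b2  # join on the shared middle model B
    )
```

For example, composing {Cust.name to Customer.fullName} with {Customer.fullName to Client.name} yields {Cust.name to Client.name}; any element of A with no counterpart in C simply drops out.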

2007.12.18 Venue: FL1, Meeting Room, Information Building
Xian Tang (Web Group)	Generalized Search Trees for Database Systems [pdf] Abstract: This paper introduces the Generalized Search Tree (GiST), an index structure supporting an extensible set of queries and data types. The GiST allows new data types to be indexed in a manner supporting queries natural to the types; this is in contrast to previous work on tree extensibility which only supported the traditional set of equality and range predicates. In a single data structure, the GiST provides all the basic search tree logic required by a database system, thereby unifying disparate structures such as B+-trees and R-trees in a single piece of code, and opening the application of search trees to general extensibility.
Chunjie Zhou (Mobile Group)	R-Tree: A Dynamic Index Structure for Spatial Searching [pdf] Abstract: To handle spatial data efficiently, this paper describes a dynamic index structure called the R-tree and gives algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well.
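The R-tree search mentioned in the abstract can be sketched as a recursive descent that only enters subtrees whose bounding rectangle overlaps the query window. This is a minimal sketch assuming 2-D axis-aligned rectangles and a tree built by hand; insertion and node splitting, which the paper also covers, are not shown, and the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: Rect) -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass
class Node:
    leaf: bool
    # Leaf entries: (rect, record id). Inner entries: (bounding rect of child, child Node).
    entries: list[tuple[Rect, object]] = field(default_factory=list)


def search(node: Node, query: Rect) -> list[object]:
    """Return ids of all leaf records whose rectangle overlaps `query`."""
    results: list[object] = []
    for rect, item in node.entries:
        if not rect.overlaps(query):
            continue  # the whole subtree lies outside the query window: prune it
        if node.leaf:
            results.append(item)
        else:
            results.extend(search(item, query))
    return results
```

Because sibling rectangles may overlap, a query can descend into more than one child; reducing that overlap is one of the goals of the R*-tree discussed in a later session.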

2007.12.11 Venue: FL1, Meeting Room, Information Building
Xiao Pan (Mobile Group)	Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals [pdf] Abstract: Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.
Da Zhou (Mobile Group)	The Dangers of Replication and a Solution [pdf] Abstract: Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up.
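The data cube abstract says the operator generalizes GROUP BY to N dimensions. A naive way to see what it produces is to evaluate SUM for every subset of the dimensions (2^N group-bys) and mark the aggregated-away dimensions with a special ALL value. This is a minimal sketch with a hypothetical `cube` function; it rescans the input once per group-by, whereas practical implementations derive coarser aggregates from finer ones.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # stands for "aggregated over every value of this dimension"


def cube(rows: list[dict], dims: list[str], measure: str) -> dict[tuple, int]:
    """SUM(measure) for every subset of dims: 2**len(dims) group-bys in one result."""
    result: dict[tuple, int] = defaultdict(int)
    for k in range(len(dims) + 1):
        for grouped in combinations(dims, k):
            for row in rows:
                # Dimensions not in this group-by are replaced by ALL
                key = tuple(row[d] if d in grouped else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)
```

With dimensions (model, year), the result contains the core GROUP BY model, year cells, the per-model and per-year sub-totals such as (Chevy, ALL), and the grand total (ALL, ALL).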

2007.12.04 Venue: FL1, Meeting Room, Information Building
Yukun Li (Web Group)	AutoAdmin "What-if" Index Analysis Utility [pdf] Abstract: As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data administration that critically influences performance is the ability to select indexes for a database. Furthermore, the DBA should have the ability to propose hypothetical (“what-if”) indexes and quantitatively analyze their impact on performance of the system. This paper describes a novel index analysis utility and the interfaces exposed by this utility. The authors also discuss the implementation techniques for efficiently supporting “what-if” indexes.
Qiong Wu (Web Group)	An Array-Based Algorithm for Simultaneous Multidimensional Aggregates [pdf] Abstract: In this paper, we present a MOLAP algorithm to compute the Cube, and compare it to a leading ROLAP algorithm. The comparison between the two is interesting, since although they are computing the same function, one is value-based (the ROLAP algorithm) whereas the other is position-based (the MOLAP algorithm). Our tests show that, given appropriate compression techniques, the MOLAP algorithm is significantly faster than the ROLAP algorithm. In fact, the difference is so pronounced that this MOLAP algorithm may be useful for ROLAP systems as well as MOLAP systems, since in many cases, instead of cubing a table directly, it is faster to first convert the table to an array, cube the array, then convert the result back to a table.
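The last sentence of the array-based abstract describes a three-step pipeline: convert the table to an array, cube the array, convert the result back. The sketch below shows that pipeline for two dimensions under simplifying assumptions: a small dense in-memory array, no chunking or compression (which the paper relies on), and a hypothetical `array_cube` function. Cells are addressed by position, and one scan of the array updates both one-dimensional group-bys at the same time.

```python
def array_cube(rows: list[dict], dim_a: str, dim_b: str, measure: str):
    """Cube a 2-D table by going through a dense array (position-based aggregation)."""
    # 1. Table -> array: give every distinct dimension value a position
    a_vals = sorted({r[dim_a] for r in rows})
    b_vals = sorted({r[dim_b] for r in rows})
    a_pos = {v: i for i, v in enumerate(a_vals)}
    b_pos = {v: j for j, v in enumerate(b_vals)}
    arr = [[0] * len(b_vals) for _ in a_vals]
    for r in rows:
        arr[a_pos[r[dim_a]]][b_pos[r[dim_b]]] += r[measure]

    # 2. Cube the array: one scan updates both 1-D group-bys simultaneously
    by_a = [0] * len(a_vals)
    by_b = [0] * len(b_vals)
    for i, row in enumerate(arr):
        for j, cell in enumerate(row):
            by_a[i] += cell
            by_b[j] += cell
    total = sum(by_a)  # the grand total is computed from the smaller by_a result

    # 3. Array -> table: map positions back to dimension values
    return dict(zip(a_vals, by_a)), dict(zip(b_vals, by_b)), total
```

Step 2 never compares or hashes dimension values; that is the value-based versus position-based distinction the abstract draws between the ROLAP and MOLAP algorithms.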

2007.11.27 Venue: FL1, Meeting Room, Information Building
Chunjie Zhou (Mobile Group)	The R*-tree: An Efficient and Robust Access Method for Points and Rectangles [pdf] Abstract: The R-tree is based on the heuristic optimization of the area of the enclosing rectangle in each inner node, while the R*-tree combines the optimization of the area, margin, and overlap of each enclosing rectangle in the directory. Experiments on a standard testbed show that the R*-tree outperforms the R-tree.
Xian Tang (Web Group)	BIRCH: An Efficient Data Clustering Method for Very Large Databases [pdf] Abstract: Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multidimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH, and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources. BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" effectively.
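BIRCH can cluster incrementally in a single scan because each subcluster is summarized by a small clustering feature (CF): the point count N, the linear sum LS of the points, and the sum of squared norms SS. CFs are additive, and the centroid and radius follow from them without revisiting the points. The sketch below shows only the CF arithmetic; the CF tree that BIRCH builds from these entries is not shown. The `try_absorb` rule (merge a point only while the radius stays within a threshold) is a hypothetical simplification, and all names here are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CF:
    """Clustering feature of a subcluster: (N, linear sum LS, square sum SS)."""
    n: int
    ls: tuple[float, ...]
    ss: float

    @staticmethod
    def of_point(p: tuple[float, ...]) -> CF:
        return CF(1, tuple(p), sum(x * x for x in p))

    def __add__(self, other: CF) -> CF:
        # CFs are additive, so merging two subclusters never revisits their points
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self) -> tuple[float, ...]:
        return tuple(x / self.n for x in self.ls)

    def radius(self) -> float:
        # Root-mean-square distance of the points to the centroid, from N, LS, SS alone;
        # max(..., 0.0) guards against a tiny negative value caused by rounding
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def try_absorb(cluster: CF, point: tuple[float, ...], threshold: float) -> CF | None:
    """Merge `point` into `cluster` if the radius stays within `threshold` (hypothetical rule)."""
    merged = cluster + CF.of_point(point)
    return merged if merged.radius() <= threshold else None
```

For the points (0, 0) and (2, 0) the CF is (2, (2, 0), 4), giving centroid (1, 0) and radius 1.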

2007.11.20 Venue: FL1, Meeting Room, Information Building
Yukun Li (Web Group)	NiagaraCQ: A Scalable Continuous Query System for Internet Databases [pdf] Abstract: While continuous query systems can transform a passive web into an active environment, they need to be able to support millions of queries due to the scale of the Internet. NiagaraCQ addresses this problem by grouping continuous queries based on the observation that many web queries share similar structures. This paper also presents the design of the NiagaraCQ system and gives some experimental results on the system’s performance and scalability.
Xiao Pan (Mobile Group)	Efficient Locking for Concurrent Operations on B-trees [pdf] Abstract: The B-tree and its variants have been found to be highly useful (both theoretically and in practice) for storing large amounts of information, especially on secondary storage devices. We examine the problem of overcoming the inherent difficulty of concurrent operations on such structures, using a practical storage model. A single additional “link” pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes. Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. An informal correctness proof for our system is given.
Da Zhou (Mobile Group)	What Goes Around Comes Around [pdf] Abstract: This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals.
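The B-tree locking abstract credits a single extra "link" pointer per node for letting a process recover from a concurrent split. The sketch below shows only the read side of that idea, single-threaded and with all locking omitted: each node carries a high key (an exclusive upper bound in this sketch) and a right-sibling link, and a search that lands on a node whose key range no longer covers the search key follows the link. The node layout and the names `BLinkNode`, `move_right` and `search` are assumptions made for illustration.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BLinkNode:
    keys: list[int]  # leaf: stored keys; inner node: separator keys
    children: list[BLinkNode] = field(default_factory=list)  # empty for a leaf
    high_key: Optional[int] = None  # exclusive upper bound; None means +infinity
    right: Optional[BLinkNode] = None  # the extra "link" pointer to the right sibling


def move_right(node: BLinkNode, key: int) -> BLinkNode:
    # If a split moved `key` out of this node, the link pointer leads to it
    while node.high_key is not None and key >= node.high_key:
        node = node.right
    return node


def search(root: BLinkNode, key: int) -> bool:
    node = move_right(root, key)
    while node.children:
        # children[i] covers keys[i-1] <= key < keys[i]
        node = move_right(node.children[bisect_right(node.keys, key)], key)
    return key in node.keys
```

For example, if a leaf holding 10, 20, 30, 40 has just split into [10, 20] and [30, 40] but the parent has not yet received the new separator, a search for 40 still reaches the left half and then follows its link to the right half.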

2007.11.13 Venue: FL1, Meeting Room, Information Building
Chunjie Zhou (Mobile Group)	Join Processing in Database Systems with Large Main Memories [pdf] Abstract: This paper introduces four kinds of join algorithms.
Qiong Wu (Web Group)	Access Path Selection in a Relational Database Management System [pdf] Abstract: In a high-level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single-relation) and complex queries (such as joins), given a user specification of the desired data as a boolean expression of predicates.
Xian Tang (Web Group)	Parallel Database Systems: The Future of High Performance Database Processing [pdf] Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.
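The join processing abstract does not name its four algorithms. As an illustration of the hash-based approach to equi-joins, here is a minimal in-memory hash join with its two phases: build a hash table on one input, then probe it with each row of the other. It assumes the build input fits in memory; partitioning schemes for inputs larger than memory are not shown, and `hash_join` and its parameters are hypothetical names.

```python
from collections import defaultdict


def hash_join(build_rows: list[dict], probe_rows: list[dict],
              build_key: str, probe_key: str) -> list[dict]:
    """Equi-join: build a hash table on one input, then probe it with the other."""
    # Build phase: hash the (ideally smaller) input on its join key
    table: dict = defaultdict(list)
    for r in build_rows:
        table[r[build_key]].append(r)
    # Probe phase: one scan of the other input, one hash lookup per row
    out: list[dict] = []
    for s in probe_rows:
        for r in table.get(s[probe_key], []):
            out.append({**r, **s})  # on column-name clashes the probe row's value wins
    return out
```

Each input is read once, so the in-memory case costs roughly one pass over both relations; the same build/probe pattern is what shared-nothing parallel systems run independently on each node after redistributing rows by join key.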

2007.11.07 Venue: FL1, Meeting Room, Information Building
Yukun Li (Web Group)	Anatomy of a Database System [pdf] Abstract: This paper is a survey of database systems. In this paper, we attempt to capture the main architectural aspects of modern database systems, with a discussion of advanced topics. Some of these appear in the literature, and we provide references where appropriate. Other issues are buried in product manuals, and some are simply part of the oral tradition of the community. Our goal here is not to glory in the implementation details of specific components. Instead, we focus on overall system design, and stress issues not typically discussed in textbooks. For cognoscenti, this paper should be entirely familiar, perhaps even simplistic. However, our hope is that for many readers this paper will provide useful context for the algorithms and techniques in the standard literature.
Xiao Pan (Mobile Group)	Anatomy of a Database System [pdf] Abstract: The early DBMSs are among the most influential software systems in computer science. Unfortunately, many of the architectural innovations implemented in high-end database systems are regularly reinvented both in academia and in other areas of the software industry. In this paper, we attempt to capture the main architectural aspects of modern database systems, with a discussion of advanced topics.
Da Zhou (Mobile Group)	What Goes Around Comes Around [pdf] Abstract: This paper provides a summary of 35 years of data model proposals, grouped into 9 different eras. We discuss the proposals of each era, and show that there are only a few basic data modeling ideas, and most have been around a long time. Later proposals inevitably bear a strong resemblance to certain earlier proposals. Hence, it is a worthwhile exercise to study previous proposals.

Maintained by WAMDM Administrator
