WAMDM Seminar  
  • 2012-04-13 Phase Change Memory Aware Data Management and Application by Jiangtao Wang
  • 2012-04-13 Storage Class Memory: Technology Overview and System Impact by Zhichao Liang
    2012
     2012.04.20  Topic: DASFAA Report
     (DASFAA Participants) DASFAA Report
    Abstract:
    DASFAA participants Jinzeng Zhang, Yingjie Shi, Zheng Huo and Qingling Cao share their experiences and each report on the sessions they attended.

     2012.04.13  Topic: PCM
     (Flash Group) Phase Change Memory Aware Data Management and Application [pptx]
    Abstract:
    Phase change memory (PCM) is an emerging memory technology that combines attractive features of both storage and memory, and integrating it into the memory/storage hierarchy can be highly effective for data management and applications. We discussed two ways to improve DBMS performance: using PCM as main memory and using it as auxiliary memory. PCM has two inherent characteristics: asymmetric read/write latency and limited write endurance. To address them, these strategies provide PCM-friendly data structures and algorithms that improve the availability and reliability of PCM.
     (Flash Group) Storage Class Memory: Technology Overview and System Impact [pdf]
    Abstract:
    Storage Class Memory (SCM) is IBM's term for a new class of data storage and memory devices. SCM devices are solid-state and non-volatile (data retention of about 10 years). They offer short access times (within an order of magnitude of DRAM) and a low cost per bit (comparable to disk). Because SCM blurs the distinction between main memory and storage, it has a large impact on the design of database systems. This report gives an overview of SCM technology and introduces phase change memory, a typical SCM device. It also discusses how database system design should be reconsidered for SCM.
     2012.04.06  Topic: DASFAA Pre-Report
     (DASFAA Participants) DASFAA Pre-Report
    Abstract:
    DASFAA participants Jinzeng Zhang, Yingjie Shi, Zheng Huo and Qingling Cao each present their pre-report.


     2012.03.30  Topic: Flash & Architecture
     (Flash Group) Flash Devices Aware RAID 
    Abstract:
    Researchers and industry have explored more and more properties of solid state disks (SSDs), such as internal parallelism, but problems remain both in SSDs and in the applications built on them. This report presents the integration of RAID and SSDs from three angles: (1) intra-SSD RAID; (2) inter-SSD RAID; (3) RAID across SSDs and HDDs.
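All three RAID variants above rely on redundancy across independent units: flash chips inside one SSD, or whole SSDs and HDDs. The report's exact layouts are not described here. The following is a minimal sketch that assumes RAID-5 style single XOR parity per stripe, with hypothetical names `make_parity` and `rebuild`. It shows how parity is computed and how one lost block is recovered:

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(stripe: List[bytes]) -> bytes:
    """RAID-5 style parity block (assumed layout): XOR of all data blocks in one stripe."""
    return reduce(xor_blocks, stripe)


def rebuild(stripe: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover the single missing block (marked None) from the survivors plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost block")
    survivors = [block for block in stripe if block is not None]
    restored = reduce(xor_blocks, survivors, parity)
    return [restored if block is None else block for block in stripe]


# Usage: three data blocks, e.g. on three flash chips or three SSDs.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(stripe)
print(rebuild([stripe[0], None, stripe[2]], parity) == stripe)  # True
```

Single parity tolerates one failed unit per stripe. Surviving two simultaneous failures needs a second, independent parity block.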
     (Flash Group) Flash Memory Aware Software Architectures and Applications 
    Abstract:
    Flash memory is widely used in laptops and enterprise applications. In these settings, most systems need storage with high throughput and low latency, which makes flash memory a strong choice as a non-volatile cache between RAM and hard disk. In these slides, we present two such system designs, FlashStore and SkimpyStash.
     2012.03.23  Topic: Cloud & RDF
     (Cloud Group) Scalable RDF Store Based on HBase and MapReduce [pptx]
    Abstract:
    As RDF datasets grow, they become too large to store in a traditional RDBMS, and conventional RDF storage structures cannot meet storage and query requirements. There is therefore an urgent need for an efficient storage schema and query processing approach.
     (Cloud Group) Jena-HBase: A Distributed, Scalable and Efficient RDF Triple Store 
    Abstract:
    Traditionally, RDF triples are stored on a single machine. However, with the emergence of Big Data, scalability has become one of the most important requirements for RDF storage. The paper introduces Jena-HBase, an efficient and scalable RDF triple store, to solve this problem.
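A common way to make triple patterns scannable in a sorted key-value store such as HBase is to keep each triple under several key orderings, so that any pattern becomes a prefix scan. The sketch below assumes three orderings (SPO, POS, OSP) and uses sorted Python lists in place of HBase tables. Jena-HBase itself offers several storage layouts, and this sketch is not claimed to match any one of them. `TripleIndex` and its methods are hypothetical names.

```python
import bisect
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Key orderings as permutations of (s, p, o). Every combination of bound
# positions is a prefix of one of these three orderings.
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    def __init__(self) -> None:
        # Each "table" is a sorted list of permuted keys, standing in for an
        # HBase table whose row keys are kept in sorted order.
        self.tables = {name: [] for name in ORDERINGS}

    def add(self, triple: Triple) -> None:
        for name, perm in ORDERINGS.items():
            bisect.insort(self.tables[name], tuple(triple[i] for i in perm))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        pattern = (s, p, o)
        bound = {i for i, v in enumerate(pattern) if v is not None}
        n = len(bound)
        for name, perm in ORDERINGS.items():
            if set(perm[:n]) != bound:
                continue
            prefix = tuple(pattern[i] for i in perm[:n])
            table = self.tables[name]
            # Prefix scan: start at the first key >= prefix, stop when it no longer matches.
            for key in table[bisect.bisect_left(table, prefix):]:
                if key[:n] != prefix:
                    break
                triple = [""] * 3
                for pos, i in enumerate(perm):
                    triple[i] = key[pos]  # undo the permutation
                yield (triple[0], triple[1], triple[2])
            return


# Usage
idx = TripleIndex()
idx.add(("ex:alice", "foaf:knows", "ex:bob"))
idx.add(("ex:bob", "foaf:knows", "ex:carol"))
idx.add(("ex:alice", "foaf:name", '"Alice"'))
print(sorted(idx.match(p="foaf:knows")))
```

The trade-off is write amplification, since every triple is stored three times, in exchange for answering every triple pattern with a single range scan.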
     2012.03.16  Topic: Introduction to WSDM2012
     (Web Group) An Overview of WSDM2012 
    Abstract:
    Analyzes current hot research issues based on the WSDM2012 papers, and introduces three papers related to social networks.
     (Web Group) An Overview of WSDM2012 II
    Abstract:
    Introduces two papers from WSDM2012 related to social networks.
     2012.03.11  Topic: Introduction to XLDB2011
     (Cloud Group) Introduction to XLDB 
    Abstract:
    A brief introduction to XLDB, focusing on XLDB 2011.
     (Cloud Group) Facebook Data Freeway [pptx]
    Abstract:
    We introduced the system architecture of Facebook's Data Freeway, which is used for log analysis. Facebook uses Scribe for log collection, and Calligraphus labels the category of each log and stores the logs in HDFS. Puma reads log lines from the storage system with PTail, performs aggregation, and periodically flushes the aggregation results into HBase.
     2012.03.02  Topic: Introduction to Linked Data
     (Web Group) Linked Data - The Story So Far 
    Abstract:
    We introduced linked data and its research issues, including the foundational concepts of linked data, guidelines for publishing linked data on the Web, and some applications based on linked data. We also presented the Linking Open Data project, a grassroots effort to publish openly licensed data on the Web as linked data. We concluded with some research directions for linked data.
     (Web Group) Introduction to RDF--Resource Description Framework 
    Abstract:
    RDF is the data format for linked data. It is a general data format that provides a resource description framework, so it can be used to describe anything in the world. In this report, we introduce RDF from six aspects: RDF's background, what RDF is, RDF syntax, RDF Schema, RDF applications, and query languages.

     2012.01.08  Topic: Inside and Outside SSD
     (Flash Group) Trading Flash Translation Layer For Performance and Lifetime [pptx]
    Abstract:
    The Flash Translation Layer (FTL) is software built on raw flash memory that performs address mapping, garbage collection and wear leveling. Address mapping translates virtual addresses to physical addresses and hides the erase-before-write characteristic of flash. Wear leveling makes wear more even across the flash and thereby extends its lifespan.
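The following toy page-level FTL makes the three FTL duties concrete. It assumes a tiny geometry, greedy garbage collection (erase the block with the fewest valid pages) and a simplified wear-leveling rule (open the least-erased free block). It is an illustration only, not the design discussed in the talk, and the class name is hypothetical. For brevity, live pages are buffered in RAM during garbage collection, whereas real FTLs copy them into another block.

```python
class PageMappingFTL:
    """Toy page-level FTL (illustrative geometry and policies, not a real device's)."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.num_blocks = num_blocks
        self.ppb = pages_per_block
        self.l2p = {}                 # address mapping: logical page -> (block, page)
        self.p2l = {}                 # reverse map for valid physical pages only
        self.nand = {}                # physical page -> data (stands in for raw flash)
        self.erase_count = [0] * num_blocks
        self.free = list(range(num_blocks))
        self.active = None            # block currently being filled
        self.next_page = pages_per_block  # "full", so the first write opens a block

    def read(self, lpn):
        return self.nand[self.l2p[lpn]]

    def write(self, lpn, data):
        if self.next_page == self.ppb:
            self._open_block()
        ppn = (self.active, self.next_page)
        self.next_page += 1
        old = self.l2p.get(lpn)
        if old is not None:
            del self.p2l[old]         # out-of-place update: the old copy becomes invalid
        self.l2p[lpn] = ppn
        self.p2l[ppn] = lpn
        self.nand[ppn] = data

    def _open_block(self):
        if self.free:
            # Wear leveling (simplified): prefer the free block erased least often.
            block = min(self.free, key=lambda b: self.erase_count[b])
            self.free.remove(block)
            self.active, self.next_page = block, 0
        else:
            self._garbage_collect()

    def _garbage_collect(self):
        def valid_pages(b):
            return [(b, p) for p in range(self.ppb) if (b, p) in self.p2l]

        # Greedy victim selection: the block with the fewest valid pages.
        victim = min(range(self.num_blocks), key=lambda b: len(valid_pages(b)))
        live = valid_pages(victim)
        if len(live) == self.ppb:
            raise RuntimeError("no invalid pages to reclaim: device is full")
        saved = []
        for ppn in live:
            lpn = self.p2l.pop(ppn)
            del self.l2p[lpn]
            saved.append((lpn, self.nand[ppn]))
        for p in range(self.ppb):     # erase the whole block (erase-before-write)
            self.nand.pop((victim, p), None)
        self.erase_count[victim] += 1
        self.active, self.next_page = victim, 0
        for lpn, data in saved:       # relocate live pages into the erased block
            self.write(lpn, data)


# Usage: repeatedly overwrite 5 logical pages on a 16-page device.
ftl = PageMappingFTL(num_blocks=4, pages_per_block=4)
for i in range(100):
    ftl.write(i % 5, f"v{i}".encode())
print(ftl.read(0), ftl.erase_count)
```

Hot logical pages never wear out a single physical location here. Every overwrite lands on a fresh page, and erases are spread by the allocation rule.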
     (Flash Group) Performance of SSD 
    Abstract:
    Many papers describe characteristics of SSDs, but we had not observed them in our own tests. We therefore conducted experiments on six SSDs and collected IOPS, MB/s and average response time. From the analysis, we identified characteristics common to these SSDs, and we also found some differing and unexpected results.
     (Web Group) TextDigger: Recovering Themes of Textual Documents 
    Abstract:
    This report introduces a new graph-based method for keyphrase extraction that can overcome the vocabulary gap problem.
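The abstract does not give TextDigger's algorithm. To illustrate graph-based keyphrase extraction in general, here is a TextRank-style sketch that runs PageRank over a word co-occurrence graph. The window size and damping factor are illustrative choices, as is the assumption that tokens are already stop-word-filtered. `textrank_keywords` is a hypothetical name.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph (TextRank-style).

    `tokens` is assumed to be pre-filtered: stop words removed, words normalized.
    """
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        # Words fewer than `window` positions apart are linked.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    score = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        # Each word passes its score evenly to its neighbours.
        score = {
            w: (1 - damping) + damping * sum(score[u] / len(neighbours[u]) for u in neighbours[w])
            for w in neighbours
        }
    return sorted(score, key=lambda w: (-score[w], w))[:top_k]


tokens = ["graph", "ranking", "graph", "keyphrase", "graph", "extraction", "graph", "vocabulary"]
print(textrank_keywords(tokens, top_k=1))  # ['graph']
```

Multi-word keyphrases are usually formed afterwards by merging top-ranked words that appear next to each other in the text.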

    2011
     2011.12.31  Topic: Primary Exploring of Differential Privacy II
     (Web Group) Graphical Query Optimization of Degree Sequence under Differential Privacy 
    Abstract:
    Many algorithms have been proposed for preserving the privacy of degree sequences in social networks and graph-structured datasets. However, these works all target specific attacks and cannot provide rigorous protection. In this paper, we propose a new problem: protecting degree sequences under differential privacy. Differential privacy strongly prevents disclosure of the degree sequence while still answering analysts' queries. However, the noise added to the true answer makes query errors large and utility low. To balance privacy and utility, we propose an effective graphical inference technique. Based on it, we present an efficient algorithm, GQODS, for this new problem. We prove theoretically that the inference technique and the proposed algorithm are correct.
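For orientation, here is a known baseline for releasing a degree sequence under edge-level differential privacy. It adds Laplace noise to the sorted sequence, then post-processes the result so it is non-decreasing again. This is constrained inference via isotonic regression, as in Hay et al. (2009). It is a swapped-in baseline, not GQODS. The function names are hypothetical, and the Laplace noise is drawn as the difference of two exponential variates.

```python
import random


def isotonic_non_decreasing(values):
    """Least-squares non-decreasing fit via the pool-adjacent-violators algorithm."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_degree_sequence(degrees, epsilon, rng):
    """Release a sorted degree sequence under edge-level epsilon-DP (baseline sketch)."""
    n = len(degrees)
    scale = 2.0 / epsilon  # adding/removing one edge changes two degrees by 1: L1 sensitivity 2
    noisy = [
        d + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)  # Laplace(0, scale)
        for d in sorted(degrees)
    ]
    # Post-processing (isotonic fit, rounding, clipping) does not weaken the guarantee.
    fitted = isotonic_non_decreasing(noisy)
    return [min(max(round(x), 0), n - 1) for x in fitted]


rng = random.Random(0)
print(private_degree_sequence([1, 1, 2, 2, 3, 3], epsilon=1.0, rng=rng))
```

The isotonic step exploits the known ordering of the true sequence. This is one way to reduce the error that the abstract identifies as the main utility problem.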
     (Web Group) Data Mining under Differential Privacy 
    Abstract:
    Differential privacy is a new and powerful privacy requirement. If an algorithm satisfies differential privacy, adding or removing any single individual's record changes the distribution of its output only slightly, so an adversary can learn almost nothing about any individual. I introduced two papers on data mining under differential privacy.
     2011.12.24  Topic: Series Reports on Flying Elephant in the Cloud I
     (Cloud Group) Update Efficient Indexing of Massive IoT Data in the Cloud 
    Abstract:
    Because IoT data has a high update frequency and a large volume, traditional DBMS techniques run into scalability problems and cannot sustain high insert throughput. We therefore explore how to manage IoT data efficiently in the cloud. In this report, we mainly analyzed the characteristics of IoT data and the shortcomings of existing cloud data management systems and their index solutions. We then proposed a new index framework for the cloud that supports high insert throughput and efficient multi-dimensional range queries.
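Key-value stores index only row keys. One widely used way to support multi-dimensional range queries on top of them is to encode each point with a Z-order (Morton) key. This is a sketch under that assumption, not the index framework proposed in the report. `interleave` and `ZOrderIndex` are hypothetical names, and a sorted list stands in for the store's sorted row keys.

```python
import bisect


def interleave(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


class ZOrderIndex:
    def __init__(self, bits: int = 16):
        self.bits = bits
        self.keys = []  # sorted Z-values (the "row keys")
        self.rows = []  # (x, y) kept in the same order as self.keys

    def insert(self, x: int, y: int) -> None:
        z = interleave(x, y, self.bits)
        i = bisect.bisect_right(self.keys, z)
        self.keys.insert(i, z)
        self.rows.insert(i, (x, y))

    def range_query(self, x1: int, x2: int, y1: int, y2: int):
        """One scan over [z(x1, y1), z(x2, y2)], then filter out false positives."""
        lo = bisect.bisect_left(self.keys, interleave(x1, y1, self.bits))
        hi = bisect.bisect_right(self.keys, interleave(x2, y2, self.bits))
        return [(x, y) for (x, y) in self.rows[lo:hi] if x1 <= x <= x2 and y1 <= y <= y2]


idx = ZOrderIndex(bits=8)
points = [((i * 37) % 256, (i * 91) % 256) for i in range(500)]  # synthetic sensor positions
for px, py in points:
    idx.insert(px, py)
print(len(idx.range_query(10, 100, 50, 200)))
```

Bit interleaving is monotone in each coordinate. Every point inside the query box therefore has a key between the keys of the box's lower-left and upper-right corners, so a single scan plus a filter is correct. Practical designs usually split the box into several smaller Z-ranges to skip more false positives.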
     (Cloud Group) Hadoop in SIGMOD 2011 [ppt]
    Abstract:
    To show the state of the art in Hadoop, we introduce several papers from SIGMOD 2011.
     2011.12.17  Topic: Series Reports on Flying Elephant in the Cloud: the Amazing MapReduce World
     (Cloud Group) Online Aggregation over MapReduce 
    Abstract:
    With the development of cloud computing, online aggregation (OLA), first introduced in 1997, has regained interest. In this report, we discussed the challenges of implementing OLA in the cloud and proposed an initial solution.
     (Cloud Group) Introduction and Application of MapReduce 
    Abstract:
    MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers. Computation can run on data stored either in a file system (unstructured) or in a database (structured). Nowadays, more and more applications that deal with big data use MapReduce.
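The map, shuffle and reduce phases can be illustrated with a single-process word count. This sketch only mimics the data flow. It is not Hadoop's API, and `run_mapreduce`, `wc_map` and `wc_reduce` are hypothetical helpers.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of the MapReduce data flow."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle: group values by key
    # Reduce phase: keys visited in sorted order, as after a sort-based shuffle.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def wc_map(line):
    for word in line.lower().split():
        yield word, 1


def wc_reduce(word, counts):
    return sum(counts)


print(run_mapreduce(["the cat", "the dog"], wc_map, wc_reduce))
```

In a real cluster, each phase runs in parallel on many machines, and the shuffle moves the grouped values over the network to the reducers.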
     2011.12.10  Topic: Series Reports on Mobile Computing and Social Network: Spears vs. Shields
     (Mobile Group) Location Privacy in Geo-Social Networks 
    Abstract:
    With the boom in social networks and smartphones, geo-social networks have drawn more and more public attention. However, geographic information poses new challenges for privacy preservation. This report closely analyzed location privacy in geo-social networks and introduced possible solutions.
     (Mobile Group) Feel Free to Check-in: Privacy-preserving against Hidden Location Inference Attack in Geo-Social Networks 
    Abstract:
    With the development of geo-social and mobile social networks, location privacy has become one of users' greatest concerns. We analyze the characteristics of geo-social networks and of hidden location inference attacks. We then present a basic method for preserving location privacy against such attacks.
     2011.12.03  Topic: Series Reports on Mobile Computing and Social Network: When New Meets Old
     (Mobile Group) Privacy-Preserving Spatial Keyword Search over Encrypted Cloud Data 
    Abstract:
    With the development of cloud computing, many companies outsource their databases to the cloud to cut the financial and technical costs of data management. The cloud manages these databases and provides query services to users. However, the cloud is a potential attacker, so it is important to address the leakage of data privacy and query privacy. Our work encrypts both the databases and the queries to protect their privacy. It also designs a query processing technique that lets the cloud correctly process spatial keyword queries without decrypting the databases or queries.
     (Mobile Group) Virtual towards Reality-Exploration and Analysis of Geo-social Network 
    Abstract:
    A geo-social network is a type of social network in which geographic capabilities enable additional social dynamics, bridging the gap between the virtual and physical worlds. This talk has three parts. First, we introduce geo-social networks. Next, we analyze existing research from the following perspectives: mining and recommendation of locations and friends, friend locators, and trajectory queries. Finally, we present the challenging work for the next step.

     2011.11.26  Topic: XML Database
     (XML Group) New Version Of OrientX 
    Abstract:
    Recently, many IT developers at home and abroad have become keen on XML databases. Hundreds of companies are busy developing commercial non-structured databases, and it is also very important for us to develop our own XML database. OrientX, developed by WAMDM, is a representative of...
     (XML Group) Labeling Schemes in XML Databases 
    Abstract:
    When ID/IDREF links are considered, XML data must be modeled as a graph rather than a tree. This makes it more difficult to judge the ancestor-descendant (AD) relationship between nodes during query processing. Many labeling schemes have been proposed for XML data to deal with this problem. In this presentation, I introduce several labeling schemes for graph-structured XML data.
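For trees, the ancestor-descendant test is cheap with interval (region) labels. Each node is labeled with the range of DFS numbers covered by its subtree. A node is then an ancestor of another exactly when its range strictly contains the other node's start. The sketch below shows this tree case, and its function names are hypothetical. With ID/IDREF edges, a node can be reached along several paths, so one interval per node no longer suffices. The graph labeling schemes address that case.

```python
def interval_labels(children, root):
    """Label each node (start, end): its DFS preorder number and the largest one in its subtree."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter - 1)

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """True iff a is a proper ancestor of d: a's interval strictly encloses d's start."""
    (start_a, end_a), (start_d, _) = labels[a], labels[d]
    return start_a < start_d <= end_a


# A tiny XML tree: <book><title/><chapter><section/></chapter></book>
children = {"book": ["title", "chapter"], "chapter": ["section"]}
labels = interval_labels(children, "book")
print(is_ancestor(labels, "book", "section"), is_ancestor(labels, "title", "section"))  # True False
```

Each AD check is a constant-time comparison of two labels, with no traversal of the document at query time.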
     (XML Group) XML Database Testing [pptx]
    Abstract:
    About 1,000 test cases were used to test the XML databases. Analyzing the results shows the relative performance of the different XML databases.
     2011.11.19  Topic: Topic Detection and Tracking (TDT)
     (Web Group) Event Detection in Microblog 
    Abstract:
    An event is something that happens at a specific time and location. The real-time, distributed nature of microblog posts both makes event detection possible and brings many challenges. This report introduces the challenges of event detection in microblogs, related work, and some ideas for improvement.
     (Web Group) Topic Detection and Tracking - Review and Challenges 
    Abstract:
    Topic Detection and Tracking research mainly focuses on discovering and threading together topically related material in data streams such as newswire and radio transcripts. We introduced the tasks and research directions of Topic Detection and Tracking, and presented some work on the New Event Detection and Topic Tracking tasks. Finally, we outlined unsolved problems and challenges for Topic Detection and Tracking.
     2011.11.12  Topic: the Perfect Match: Log-Structure & SSD
     (Flash Group) Some key-value stores using log-structure [pptx]
    Abstract:
    The concept of log-structured storage was first introduced with the log-structured file system, a file system design first proposed in 1988 by John K. Ousterhout and Fred Douglis. Today, several key-value stores built on log-structured storage, including Riak, RethinkDB and LevelDB, are used in many industrial applications, each with its own log-structured implementation.
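The following minimal sketch illustrates the log-structured idea behind these stores. It is an append-only key-value store with an in-memory index from each key to its value's offset, in the spirit of Bitcask (one of Riak's storage backends). Several simplifications apply. The record format and the tombstone marker are hypothetical, and there is no compaction. It also does not model LevelDB, which uses an LSM-tree instead.

```python
import struct

HEADER = struct.Struct(">II")  # hypothetical record header: key length, value length
TOMBSTONE = 0xFFFFFFFF         # hypothetical value length marking a deletion


class LogStructuredKV:
    def __init__(self):
        self.log = bytearray()  # stands in for an append-only file on the SSD
        self.keydir = {}        # in-memory index: key -> (value offset, value length)

    def _append(self, key: bytes, value_len: int, value: bytes) -> int:
        offset = len(self.log)
        self.log += HEADER.pack(len(key), value_len) + key + value  # sequential write only
        return offset + HEADER.size + len(key)

    def put(self, key: bytes, value: bytes) -> None:
        self.keydir[key] = (self._append(key, len(value), value), len(value))

    def get(self, key: bytes):
        if key not in self.keydir:
            return None
        offset, length = self.keydir[key]
        return bytes(self.log[offset:offset + length])

    def delete(self, key: bytes) -> None:
        self._append(key, TOMBSTONE, b"")  # deletions are appended too
        self.keydir.pop(key, None)

    @classmethod
    def recover(cls, log: bytes) -> "LogStructuredKV":
        """Rebuild the in-memory index by replaying the log from the beginning."""
        kv = cls()
        kv.log = bytearray(log)
        pos = 0
        while pos < len(kv.log):
            klen, vlen = HEADER.unpack_from(kv.log, pos)
            pos += HEADER.size
            key = bytes(kv.log[pos:pos + klen])
            pos += klen
            if vlen == TOMBSTONE:
                kv.keydir.pop(key, None)
            else:
                kv.keydir[key] = (pos, vlen)
                pos += vlen
        return kv


kv = LogStructuredKV()
kv.put(b"a", b"1")
kv.put(b"b", b"2")
kv.put(b"a", b"3")  # the old version of "a" stays in the log until compaction
kv.delete(b"b")
print(kv.get(b"a"), kv.get(b"b"))  # b'3' None
```

All writes are sequential appends, which suits flash and avoids in-place updates. The cost is stale versions that a compaction step must eventually reclaim.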
     (Flash Group) Flash and SSD 
    Abstract:
    Thanks to its excellent characteristics, flash memory has been widely used in mobile and embedded devices. This report mainly covers flash memory and SSDs (solid state disks). It discusses the classification, performance, limitations and trends of flash memory, as well as SSD architectures and interface types. It also presents some recent test results on our SSDs.
     (Flash Group) Optimizations of Column-Store and Adaption for SSD 
    Abstract:
    Column stores usually apply three main optimizations: compression, block iteration and late materialization. Compression plays the most important role and can improve column-store performance by an order of magnitude. Given these features, column stores could gain even more on flash. However, flash has its own unique characteristics, so column storage must be adapted to fit flash.
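Run-length encoding (RLE) illustrates why compression matters so much in column stores. It collapses runs of equal values, and some predicates can be evaluated directly on the runs without decompressing. This sketch assumes a sorted or clustered column. The function names are hypothetical, and RLE is only one of several schemes that column stores use.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    return [value for value, length in runs for _ in range(length)]


def count_equal(runs, target):
    """COUNT(*) WHERE col = target, evaluated on the runs without decompressing."""
    return sum(length for value, length in runs if value == target)


column = ["CN", "CN", "CN", "US", "US", "CN"]
runs = rle_encode(column)
print(runs, count_equal(runs, "CN"))  # [('CN', 3), ('US', 2), ('CN', 1)] 4
```

On a well-clustered column, the number of runs is far smaller than the number of rows. Scanning the runs therefore reads fewer bytes from storage and does less CPU work.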

     2011.10.29  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Internet of Things and Cloud Computing [ppt]
    Abstract:
    Since IBM introduced the concept of a "Smarter Planet" in 2008, the Internet of Things (IoT) has attracted more and more attention. In general, IoT has a three-layer structure. RFID and sensor networks make up the perception layer; the Internet, Wi-Fi, 3G and other networks form the network layer; and applications serving various social needs make up the application layer. Cloud computing, a key technology in the IoT chain, will be an important cornerstone of IoT development.
     (Cloud Group) Introduction to Linux 
    Abstract:
    Mainly covered basic, frequently used commands and software, plus tips and experience in using them for testing.
     2011.10.21  Venue: FL1, Meeting Room, Information Building
     (Web Group) Personalized Privacy Protection in Social Networks [pptx]
    Abstract:
    Due to the popularity of social networks, many approaches have been proposed to protect privacy in them. All these works assume that attackers use the same background knowledge. In practice, however, different users have different privacy protection requirements. Assuming that all attackers have the same background knowledge therefore fails to meet personalized privacy requirements. It also misses the chance to achieve better utility by exploiting differences in users' privacy requirements. In this paper, we introduce a framework that provides privacy-preserving services based on each user's personal privacy requirements.
     2011.10.14  Venue: FL1, Meeting Room, Information Building
     (Flash Group) Flash-based Storage System Supporting Range Query 
    Abstract:
    Because hard disks and SSDs differ, especially in random write performance, SSDs adopt out-of-place updates whereas hard disks use in-place updates. Flash-based storage models include PAX, IPL and append-only. PAX offers high query performance but does not consider update operations. IPL and append-only offer high update performance but do not consider query processing, especially range queries. We therefore propose block-page storage management together with an in-memory B+-tree index.

     2011.09.24  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Index for Cloud Data Management [ppt]
    Abstract:
    Cloud data management systems have attracted more and more attention because of their high scalability and availability. So far, however, they only support efficient queries on the row key. They cannot efficiently support queries on non-row-key attributes or multi-dimensional queries. In this report, we surveyed index techniques for cloud data management, analyzed their pros and cons, and pointed out future work.
     (Web Group) An Introduction of Big Data [ppt]
    Abstract:
    Recently, many enterprises and research domains have begun to focus on Big Data. This seminar introduces Big Data in terms of its definition, framework, applications and challenges. Since Big Data differs from large-scale (massive) data, new computing models, algorithms and storage strategies must be designed. In this seminar, we mainly present three models for computing over Big Data: the random sampling model, the data streaming model and the sketching model.
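A Count-Min sketch is an example of the sketching model. It summarizes a stream in fixed memory and answers frequency queries that never underestimate the true count. In the standard analysis, width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉ bound the overestimate by ε times the stream length, with probability at least 1 - δ. The defaults below (ε ≈ 0.01, δ ≈ 0.007) are illustrative assumptions. So are the row hashes derived from SHA-256, which stand in for the pairwise-independent hash functions of the analysis.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary; estimates are never below the true counts."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item: str):
        # One hash per row, derived from SHA-256 with the row number as a prefix.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only add to a counter, so the smallest counter is the tightest bound.
        return min(self.table[row][col] for row, col in self._columns(item))


cms = CountMinSketch()
stream = ["a"] * 100 + ["b"] * 10 + [f"x{i}" for i in range(50)]
for token in stream:
    cms.add(token)
print(cms.estimate("a"), cms.estimate("b"))
```

The memory cost is width × depth counters no matter how many distinct items the stream contains. This is the property that makes sketches attractive for Big Data.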

     2011.06.24  Venue: FL1, Meeting Room, Information Building
     (Web Group) Privacy Scores Computing and Trust Predicting in Online Social Network 
    Abstract:
    Recently, privacy and trust problems in social networks have attracted a lot of attention. This seminar mainly addresses two questions. The first is computing privacy scores from individuals' profiles, for which MLE and EM methods are illustrated. The second is predicting trust between entities using balance theory and status theory.
     2011.06.17  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) An Efficient Tag-based Spatial Collaborative Search on Geo-social Networking 
    Abstract:
    The proliferation of geo-social networking enables users to generate large amounts of location information and descriptive tags. It also lets them find and connect with other users through mobile devices. In this context, users with similar interests often plan one or more social activities collaboratively. This report introduces a novel type of query, the Tag-based top-k Spatial Collaborative (TkSCo) query. To answer TkSCo queries efficiently, we propose two algorithms. Experimental results validate the efficiency of the proposed algorithms.
     (Mobile Group) You Can Walk Alone: Trajectory Privacy-Preserving through Stay Points Protection 
    Abstract:
    Stay points on trajectories contain more sensitive information than ordinary location samples. We therefore propose a novel method that protects trajectory privacy by protecting stay points, which sharply reduces information loss.
     2011.06.10  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Research & Demo on Flickr 
    Abstract:
    Research based on Flickr has been increasing recently. Facebook, Twitter and Flickr not only offer users excellent platforms but also help researchers in their studies. Much information, such as tags, titles and pictures, can be downloaded from Flickr through its API. Current Flickr-based research includes Flickr distance, tourism recommendation, using Flickr to predict information, image retrieval and so on.
     (XML Group) GILX: A Compressed Interval Labeling for Graph-Structured XML 
    Abstract:
    When ID/IDREF relationships are considered, XML documents are modeled as graphs rather than trees, which raises many new problems. Reachability queries on graphs are fundamental to XML databases. In this report, we introduce a novel compressed interval labeling scheme to support reachability queries.
     2011.06.03  Venue: FL1, Meeting Room, Information Building
     (Web Group) Finding the Bias and Prestige of Nodes in Networks Based on Trust Scores 
    Abstract:
    This paper proposes an algorithm to compute the bias and prestige of nodes in networks where edge weights denote trust scores.

     2011.05.27  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Join Algorithms Using MapReduce [ppt]
    Abstract:
    MapReduce, a useful parallel programming framework, makes it easy to develop scalable parallel applications that process vast amounts of data on large clusters of commodity machines. However, it does not directly support processing multiple related heterogeneous datasets, as needed for join queries.
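A standard workaround for joins in MapReduce is the reduce-side (repartition) join. Mappers tag each record with its source and emit it under the join key, and reducers then pair up records from the two sources. This single-process sketch mimics that data flow. The function and field names are hypothetical.

```python
from collections import defaultdict


def repartition_join(left, right, left_key, right_key):
    """Reduce-side (repartition) equi-join of two datasets, MapReduce style."""
    # Map + shuffle: route every record to its join key, remembering its source.
    groups = defaultdict(lambda: ([], []))
    for rec in left:
        groups[rec[left_key]][0].append(rec)
    for rec in right:
        groups[rec[right_key]][1].append(rec)
    # Reduce: within each key, pair every left record with every right record.
    joined = []
    for key in sorted(groups):
        left_recs, right_recs = groups[key]
        for lrec in left_recs:
            for rrec in right_recs:
                joined.append({**lrec, **rrec})
    return joined


users = [{"uid": 1, "name": "Ann"}, {"uid": 2, "name": "Bo"}]
visits = [{"uid": 1, "page": "a"}, {"uid": 1, "page": "b"}, {"uid": 3, "page": "c"}]
print(repartition_join(users, visits, "uid", "uid"))
```

The whole of both datasets crosses the shuffle. When one side is small, map-side (broadcast) joins are a common alternative.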
     (Mobile Group) Privacy-Preserving Query Processing in Cloud Computing [ppt]
    Abstract:
    With the development of cloud computing, database as a service (DaaS) in the cloud has become a trend. However, this service can leak the privacy of both queries and data. Two papers, published in ICDE 2011 and DASFAA 2011, give two different frameworks for preserving privacy in cloud computing. The first is based on privacy homomorphism, with clients leading query processing so that both query privacy and data privacy are protected. The second is based on a secret sharing scheme. Before outsourcing, the data is split into n shares by a secret sharing function and stored at n DSPs, which protects data privacy.
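The abstract does not say which secret sharing function the DASFAA 2011 paper uses. Shamir's (k, n) threshold scheme is a standard example of the idea. It hides a value as the constant term of a random degree-(k - 1) polynomial over a prime field. Any k shares recover the value, while fewer reveal nothing about it. The prime and the function names below are assumptions for illustration.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice


def split(secret: int, n: int, k: int):
    """Split `secret` into n shares (x, y); any k of them reconstruct it."""
    if not (0 <= secret < PRIME and 1 <= k <= n):
        raise ValueError("need 0 <= secret < PRIME and 1 <= k <= n")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]  # one share per DSP


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(123456789, n=5, k=3)
print(reconstruct(shares[:3]))  # 123456789
```

Each DSP stores one share, so no single provider (or any coalition smaller than k) learns the value. `pow(den, -1, PRIME)` requires Python 3.8 or later.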
     2011.05.20  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Semantic Rules Optimization and Data Cleaning on Knowledge Base 
    Abstract:
    Human language is very difficult to process, so building a knowledge base requires both semantic rule optimization and data cleaning.
     (Flash Group) Query Processing and Optimizing on SSDs [ppt]
    Abstract:
    A survey on query processing and optimizing for SSDs.
     2011.05.13  Venue: FL1, Meeting Room, Information Building
     (Flash Group) Considering Transaction on Append-Only Storage 
    Abstract:
    Recently, flash-based DBMSs have been proposed to exploit the advantages of flash while reducing random writes, whose performance on flash is very poor. There are three kinds of flash-based storage strategies: PAX, log-based and append-only, each with its own advantages and shortcomings. Append-only storage was first proposed for key-value data management systems. Migrating append-only storage into a DBMS raises many problems, such as indexing and transactions. Rollback and recovery are important components of transactions, so we propose improved flash-based rollback and recovery methods that speed them up.
     (Web Group) Topical Semantics of Twitter Links 
    Abstract:
    This report introduces a WSDM 2011 paper that analyzes link semantics in Twitter, and presents our experimental results on the Sina data set.
     2011.05.06  Venue: FL1, Meeting Room, Information Building
     (DASFAA Participants) DASFAA Report
    Abstract:
    DASFAA participants Teacher Cao, Yulei Wang, Zhichao Liang and Xiaoying Qi share their experiences and each report on the sessions they attended.


     2011.04.22  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Introduction to Redis, a Key-Value Memory Store 
    Abstract:
    Redis is an in-memory key-value store. Because Redis holds and processes data in memory, it achieves high performance. Memory, however, has limited capacity and is volatile, so Redis also supports virtual memory management and data persistence. These slides describe how Redis processes data and present a naive idea for improving its virtual memory management.
     (Flash Group) Logging in Flash-based DBMS 
    Abstract:
    Flash memory, a new kind of data storage medium, is expected to replace disk as the main storage device in the next generation. We analyze logging design issues in flash-memory-based databases and put forward some new solutions. The first method, HV-Logging, makes use of the historical versions of data that naturally arise in flash memory due to out-of-place updates. The second, a novel logging method called LB-logging, stores log records in a list structure instead of the sequential structure used by traditional databases.
     2011.04.15  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Online Aggregation [ppt]
    Abstract:
    Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and eventually the final answer is returned. In this paper, the authors propose a new online aggregation interface that lets users both observe the progress of their aggregation queries and control execution on the fly.
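The core of online aggregation is a running estimate with a shrinking confidence interval, which the user can watch and stop early. The sketch below assumes that rows are scanned in random order. It uses a normal-approximation interval for AVG with no finite-population correction. It illustrates the idea and is not the interface from the paper. `online_average` is a hypothetical name.

```python
import math
import random


def online_average(values, z=1.96, report_every=1000, seed=0):
    """Yield (rows_seen, running_mean, ci_half_width) while scanning rows in random order."""
    rows = list(values)
    random.Random(seed).shuffle(rows)  # online aggregation assumes a random scan order
    n, mean, m2 = 0, 0.0, 0.0
    for v in rows:
        n += 1
        delta = v - mean
        mean += delta / n              # Welford's running mean
        m2 += delta * (v - mean)       # running sum of squared deviations
        if n % report_every == 0 or n == len(rows):
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(variance / n)  # normal-approximation CI


# Usage: stop as soon as the estimate is precise enough (control on the fly).
data = [float(i % 100) for i in range(100_000)]
for seen, estimate, half_width in online_average(data):
    if half_width < 0.5:
        break
print(seen, round(estimate, 2), "+/-", round(half_width, 2))
```

Because the estimator is a generator, the caller decides when to stop. This mirrors the "observe progress and control execution" interface described in the abstract.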
     (Mobile Group) A Collaborative Location Privacy-preserving Method without Cloak Region 
    Abstract:
    Serious location privacy problems arise from the extensive use of location-based services. Location k-anonymity is currently one of the most popular location privacy-preserving methods. However, it requires a trusted third party as an anonymity server, which has been shown to be a performance bottleneck and a target of attacks. This lecture introduced a collaborative location privacy-preserving method that needs neither an anonymity server nor a cloaking region.
     2011.04.08  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Special Issue Theme: Interplay between Architecting a Software System And the Hardware Especially Evident in Data Management [ppts]
    Abstract:
    The theme introduces hardware such as tape, disk, flash/SSD and Storage Class Memory, and then analyzes the relationship and interplay between DBMSs and this hardware. It includes seven reports. One details the road map of magnetic tape, magnetic disk and a host of solid-state technologies. Three are about data management on NAND flash, and two discuss the software consequences of technologies beyond flash. The last investigates the energy efficiency of current SSDs.
     2011.04.02  Venue: FL1, Meeting Room, Information Building
     (Cloud Computing Group) Estimating the Progress of Queries on the Cloud 
    Abstract:
    There are many challenges in estimating the progress of queries on the cloud, such as task parallelism, variable execution speed, concurrent workloads, task failures and data skew. In this report, we introduce how existing methods address the problem and then propose our initial idea for progress estimation.
     (Cloud Group) System Performance Test Report of Cassandra and HBase 
    Abstract:
    A series of test cases for Cassandra and HBase, covering data scaling, multiple clients, multiple tables, consistency and so on.

     2011.03.25  Venue: FL1, Meeting Room, Information Building
     (Web Group) Mobile Apps Project Report 
    Abstract:
    With the great success of Apple's App Store, more and more handset manufacturers, carriers and ISPs are launching their own app stores. At the same time, finding the desired apps among so many has become a nightmare for users, so mobile app search and recommendation techniques deserve study. The author introduced the project's background, motivation, proposed solutions and the work done so far, and raised some open questions at the end.
     (Web Group) What is Twitter, a Social Network or a News Media 
    Abstract:
    Twitter is an extremely popular application all over the world. So what is Twitter? This report examines some high-level characteristics of Twitter based on the WWW 2010 paper "What is Twitter, a Social Network or a News Media?"
     2011.03.11  Venue: FL1, Meeting Room, Information Building
     (Web Group) Topical Authorities Identification and Search in Twitter 
    Abstract:
    In this report, we introduced two papers on identifying topical authorities in Twitter, from WSDM 2010 and WSDM 2011. TwitterRank is a graph-based approach to ranking Twitter users, while the WSDM 2011 paper uses Gaussian Mixture Model clustering to select authority candidates. We also presented a detailed comparison between microblog search and web search.
     (Web Group) Information Cascades on Twitter 
    Abstract:
    Twitter is a fast-growing microblogging service. In this report, we focused on information diffusion on Twitter and introduced two papers from the WSDM 2011 conference. The first studies correcting for missing data in information cascades; the second addresses quantifying influence on Twitter. Through these two papers, we learned about several issues in information diffusion on Twitter.

     2011.01.14  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Introduction to UDT 
    Abstract:
    UDT can perform much better than traditional network protocols such as TCP. However, some parameters must be tuned when network latency is high.
     (XML Group) XML Database Test Report 
    Abstract:
    A summary of test reports on four XML databases.
     (Cloud Group) Metadata Management 
    Abstract:
    In recent years, cluster storage has become increasingly popular to meet the need for large-scale data storage. Providing high access performance over such a huge number of files and such large directories is a major challenge for cluster file systems, and metadata management research aims to solve this problem. This report mainly introduces existing methods in metadata management research and some possible research directions for TaijiDB.
     2011.01.07  Venue: FL1, Meeting Room, Information Building
     (XML Group) XML Keyword Query Refinement 
    Abstract:
    In this report, we discussed the problem of query refinement for XML keyword search. First, I classified existing methods of XML keyword query refinement. Then we discussed my new method, which automatically transforms a keyword query into a structured query; the main part covered the tasks and approaches of XML keyword query refinement. Using the XML data, we classify the keywords into structure terms and content terms, and abstract the relationships among the structure terms into a relationship graph, which is a weighted digraph. We compute the best and the k best spanning rooted trees of this graph and take them as the best and top-k refined structured queries.
     (Cloud Group) Internship Report in Nokia Siemens Networks [ppt]
    Abstract:
    A summary of an internship at Nokia Siemens Networks. The main content of the report is performance testing of the UDT transfer protocol. First, UDT is introduced as a bulk data transfer protocol designed for high-speed WANs. Second, each part of the test script is described in detail.
     (Mobile Group) Trajectory Privacy Protection for Mobile Users 
    Abstract:
    Most existing work on trajectory privacy protection focuses on trajectory k-anonymity, but k-anonymity alone is not enough: even when an individual is hidden in a group, if the group lacks diversity in its sensitive attributes, an attacker can still associate that individual with sensitive information. We are therefore trying to find a way to offer strong privacy protection for trajectory data.
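    To illustrate the gap described above, here is a minimal Python sketch that checks k-anonymity (every group of indistinguishable trajectories has at least k members) and distinct l-diversity (every group contains at least l different sensitive values). The record layout, the field names `region_trace` and `visited`, and the toy data are hypothetical, and distinct l-diversity is only one of several diversity definitions; this is not the group's proposed method.

```python
from collections import defaultdict

def groups_by_quasi_id(records, qid_key):
    """Group records by their (generalized) quasi-identifier value."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[qid_key]].append(rec)
    return groups

def is_k_anonymous(records, qid_key, k):
    """Every equivalence class must contain at least k records."""
    return all(len(g) >= k for g in groups_by_quasi_id(records, qid_key).values())

def is_l_diverse(records, qid_key, sensitive_key, l):
    """Distinct l-diversity: every class holds at least l distinct sensitive values."""
    return all(
        len({rec[sensitive_key] for rec in g}) >= l
        for g in groups_by_quasi_id(records, qid_key).values()
    )

# Hypothetical toy data: three users share one generalized region trace,
# but all of them visited the same sensitive place.
records = [
    {"region_trace": "A->B->C", "visited": "hospital"},
    {"region_trace": "A->B->C", "visited": "hospital"},
    {"region_trace": "A->B->C", "visited": "hospital"},
]
print(is_k_anonymous(records, "region_trace", 3))           # True
print(is_l_diverse(records, "region_trace", "visited", 2))  # False
```

    The group is 3-anonymous, yet an attacker who knows a user's trace still learns that the user visited the hospital.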

    2010
     2010.12.24  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Continuous Density Query 
    Abstract:
    Gives a brief introduction to continuous density queries and describes the result-loss problem of previous work. An improved TPR-tree-based approach is proposed to solve this problem; in addition, the new approach returns all dense regions with higher accuracy.
     (Mobile Group) Research Review and Discussion 
    Abstract:
    Summarizes the research process over four years as a PhD candidate and shares some lessons learned.
     2010.12.17  Venue: FL1, Meeting Room, Information Building
     (Web Group) Opinion Retrieval in Blogs 
    Abstract:
    Web opinion monitoring is becoming a major focus of opinion monitoring tasks as a result of the huge amount of user-generated content such as blogs and forums. Opinion retrieval in blogs has long been studied by text retrieval researchers. We briefly reported the goals, frameworks and approaches of blog opinion retrieval in recent papers.
     (Web Group) Privacy Preserving on the Searchable Internet 
    Abstract:
    The Internet is the largest repository of information. With the advent of Web 2.0, the amount of personal information on the Internet has increased sharply. Malicious attackers may use search engines to collect a user's information scattered across the Web and obtain privacy-sensitive information. We have thus observed a new privacy problem on the Web: privacy mining attacks via search engines. In this report, we extend an existing method proposed by a former student of our group, Jing Ai, and propose a clustering method on bipartite graphs to address this problem.
     2010.12.10  Venue: FL1, Meeting Room, Information Building
     (Flash Group) A novel method to extend flash memory lifetime in flash-based DBMS 
    Abstract:
    As capacity increases and prices gradually drop, flash memory is becoming a promising replacement for disk, even in enterprise applications. However, flash memory suffers from erase-before-write and a limited number of write-erase cycles, which means that excessive writes, especially small random writes, wear flash blocks out quickly. We analyze free space management in traditional DBMSs and point out its disadvantages on flash devices. We also propose a new solution combining free space management and buffer management that extends the lifetime of flash memory by reducing the number of write I/Os.
     (Flash Group) An Operation Aware Flash Translation Layer for Enterprise-class SSDs 
    Abstract:
    The flash translation layer (FTL) is important firmware in flash-based devices and has a critical effect on their performance. When SSDs are used in enterprise-class environments, the FTL should therefore be redesigned to improve overall performance. In this report, we introduce an operation-aware flash translation layer for enterprise-class SSDs.
     2010.12.03  Venue: FL1, Meeting Room, Information Building
     (Web Group) A Structured Approach to Query Recommendation With Social Annotation Data [ppt]
    Abstract:
    Query recommendation has been recognized as an important means of helping users search and of improving the usability of search engines.
     (Web Group) Introduction to OpenScholar 
    Abstract:
    OpenScholar is a web system that builds scholars' homepages automatically. Its scholar-information search and dynamic maintenance features help users build their homepages easily and quickly.

     2010.11.26  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Research of query optimization in the cloud 
    Abstract:
    In cloud data management systems, data is partitioned into blocks and replicated, so some types of query processing require transferring data blocks between nodes. We therefore studied how to complete queries at low cost.
     (Web Group) Record Linkage with Uniqueness Constraints and Erroneous Values [ppt]
    Abstract:
    This paper presents challenges of record linkage and data fusion across heterogeneous data sources with uniqueness constraints and erroneous values, models the records as a K-partite graph, and proposes clustering and matching algorithms to cope with duplicate and conflicting data.
     2010.11.19  Venue: FL1, Meeting Room, Information Building
     (Web Group) Evaluating Entity Resolution Results [ppt]
    Abstract:
    Entity resolution (ER) is an important technique in data integration. Similar to clustering and partitioning, ER tries to identify records that refer to the same entity among masses of records. This report focuses on GMD (Generalized Merge Distance), a measure for evaluating ER results.
     (Cloud Group) Research on Query Processing 
    Abstract:
    Query processing is a difficult problem in both parallel databases and cloud-based databases. We briefly introduce the basic query processing steps in centralized and parallel databases, and discuss web-scale query processing, including the MapReduce debates and MapReduce-based join algorithms. Finally, we introduce the main idea of our work and some future work.
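    A reduce-side (repartition) join is one of the standard MapReduce-based join algorithms. The sketch below simulates its map, shuffle and reduce phases in plain Python on in-memory lists; it does not use Hadoop or any real MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(table_name, rows, key_index):
    """Emit (join_key, (table_name, row)) pairs, tagging each row with its source."""
    for row in rows:
        yield row[key_index], (table_name, row)

def reduce_side_join(left_rows, right_rows, left_key=0, right_key=0):
    # Shuffle: group all tagged rows by join key, as the framework would.
    groups = defaultdict(list)
    for key, tagged in map_phase("L", left_rows, left_key):
        groups[key].append(tagged)
    for key, tagged in map_phase("R", right_rows, right_key):
        groups[key].append(tagged)
    # Reduce: within each key group, pair every left row with every right row.
    output = []
    for key in sorted(groups):
        lefts = [row for tag, row in groups[key] if tag == "L"]
        rights = [row for tag, row in groups[key] if tag == "R"]
        output.extend(l + r for l in lefts for r in rights)
    return output
```

    For example, joining `[(1, "ann"), (2, "bob")]` with `[(1, "book"), (1, "pen"), (3, "cup")]` on the first column yields `[(1, "ann", 1, "book"), (1, "ann", 1, "pen")]`. Keys present on only one side produce no output, which is the inner-join behaviour.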
     2010.11.14  Venue: FL1, Meeting Room, Information Building
     (XML Group) Diversification for Keyword Search on Graph Data 
    Abstract:
    Keyword search is the de facto information retrieval mechanism for data on the World Wide Web. It has also proved effective for querying semi-structured and structured data because of its user-friendly query interface. Recently, query processing over graph-structured data has attracted increasing attention. In this report, we focus on the semantic diversification of results from keyword search on graphs.
     (Flash Group) Enterprise Application of SSD [ppt]
    Abstract:
    SSDs are becoming more and more popular in the enterprise, but is the platform ready for SSDs? This report addresses this question and also introduces SSD RAID.
     2010.11.06  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) CIKM2010 Story 
    Abstract:
    In this talk, I presented some papers and one panel related to cloud data management at CIKM 2010, and then gave a summary of CIKM 2010.
     (Cloud Group) RHP: a new partitioner to improve the efficiency of range queries in Cassandra 
    Abstract:
    Balancing data-access load and efficiently processing range queries are conflicting goals, which is why Cassandra does not support range queries well. The key point is how to trade them off.
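    The trade-off can be seen by comparing two baseline partitioning strategies: hashing keys spreads load evenly but scatters a key range over many nodes, while order-preserving range partitioning keeps a range on few nodes but can concentrate load when keys are skewed. The sketch below is a hypothetical illustration with four nodes and hand-picked split points; it is not RHP itself.

```python
import bisect
import hashlib

NUM_NODES = 4

def hash_node(key):
    """Hash partitioning: even load, but neighbouring keys land on different nodes."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES

# Order-preserving range partitioning with hypothetical, hand-picked split points:
# node 0 holds keys < 250, node 1 holds 250..499, node 2 holds 500..749, node 3 the rest.
SPLIT_POINTS = [250, 500, 750]

def range_node(key):
    return bisect.bisect_right(SPLIT_POINTS, key)

def nodes_for_range(lo, hi, partitioner):
    """Set of nodes a range query over the integer keys [lo, hi] must contact."""
    return {partitioner(k) for k in range(lo, hi + 1)}
```

    Here `nodes_for_range(100, 200, range_node)` contacts a single node, whereas the hash partitioner typically scatters the same 101 keys across all four nodes. Conversely, if most keys fell below 250, range partitioning would overload node 0, which is the load-balancing side of the dilemma.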

     2010.10.30  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Spatial-temporal sequence views query demo [ppt]
    Abstract:
    We collected information about scenic views on Flickr to analyze how to visit these views from a realistic perspective. If a user wants to visit the views within a limited time, there may be several possible routes, but which one is the most valuable? Based on our ideas, we give three solutions to this problem and will show them in our demo.
     (Cloud Group) Survey of Object-based Storage [ppt]
    Abstract:
    Object-based storage, a new approach to storage technology, is a subject of both academic research and development in the storage industry. This survey describes the main points of object-based storage technology from five aspects: why the concept of object-based storage was introduced, what it is, how to take advantage of it, what its status is in industry and academic research, and what we can do with it.
     (Mobile Group) Android Development tutorial [ppt]
    Abstract:
    Android, released by Google on Nov. 5th, 2007, is a Linux kernel-based operating system designed for smartphones. Over the past three years, Android has achieved a large market share, which is still increasing. Meanwhile, Android has attracted more and more developers, who have contributed more than 100,000 applications to Android Market, the second-largest online app store. This tutorial introduces application development on the Android platform as well as the mechanisms of Android.
     2010.10.23  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Flash-based Multi-Version Data Storage 
    Abstract:
    Because of the characteristics of flash memory and of PostgreSQL's data storage, many update operations and small random writes are issued to flash memory. These operations degrade DBMS performance and shorten the lifetime of flash memory. Flash-based Multi-Version Data Storage (FMVDS) is proposed to reduce update and write operations and ultimately reduce the number of erases. In FMVDS, transaction table entries with timestamps and data records with pointers to older versions enable high concurrency and quick recovery.
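    The version-chain idea in the abstract (each data record points to its older version, and versions carry transaction timestamps) can be sketched as follows. This is a hypothetical in-memory model assuming a single commit timestamp per version; it omits FMVDS's transaction table, flash page layout and recovery logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: str
    commit_ts: int                     # timestamp of the writing transaction
    older: Optional["Version"] = None  # pointer to the previous version

class VersionedRecord:
    def __init__(self):
        self.newest: Optional[Version] = None

    def write(self, value, commit_ts):
        # Append a new version instead of updating in place (no erase needed).
        self.newest = Version(value, commit_ts, self.newest)

    def read(self, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        v = self.newest
        while v is not None and v.commit_ts > snapshot_ts:
            v = v.older
        return None if v is None else v.value
```

    After `write("v1", 10)` and `write("v2", 20)`, a reader with snapshot 15 follows the pointer back and sees `"v1"`, while a reader with snapshot 25 sees `"v2"`; neither blocks the writer.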
     (MSRA) Context-Aware Search 
    Abstract:
    Introduces the research on context-aware search at MSRA.

     2010.09.25  Venue: FL1, Meeting Room, Information Building
     (Web Group) Entity Resolution with Evolving Rules [ppt]
    Abstract:
    Entity resolution (ER) identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process, but is constantly improved as the data, schema and application are better understood. We address the problem of keeping the ER result up-to-date when the ER logic “evolves” frequently. A naive approach that re-runs ER from scratch may not be tolerable for resolving large datasets. This paper investigates when and how we can instead exploit previous “materialized” ER results to save redundant work with evolved logic. We introduce algorithm properties that facilitate evolution, and we propose efficient rule evolution techniques for two clustering ER models: match-based clustering and distance-based clustering. Using real data sets, we illustrate the cost of materializations and the potential gains over the naive approach.
     (Mobile Group) VLDB paper report 
    Abstract:
    This report includes two parts. The first is retrieving the top-k prestige-based relevant spatial web objects: this method proposes the concept of prestige-based relevance, and the top-k spatial web objects are ranked according to both prestige-based relevance and location proximity. The second part introduces how to mine significant semantic locations from GPS data: this method models the relationships between locations, and between locations and users, with a two-layered graph. Based on this, the paper proposes a new ranking model that assigns significance to locations.
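    As a hedged illustration of ranking by both relevance and location proximity, the sketch below scores each object by a linear combination of a given relevance value and a normalized proximity term. The weighting `alpha`, the cut-off `max_dist` and the linear form are assumptions for illustration; the paper's prestige-based relevance computation is not reproduced, and relevance is simply taken as an input.

```python
import heapq
import math

def top_k_objects(objects, query_xy, k, alpha=0.5, max_dist=10.0):
    """Rank objects by a weighted mix of relevance and spatial proximity.

    objects: list of (name, (x, y), relevance in [0, 1]).
    alpha and max_dist are illustrative parameters, not values from the paper.
    """
    def score(obj):
        _name, (x, y), relevance = obj
        dist = math.hypot(x - query_xy[0], y - query_xy[1])
        proximity = max(0.0, 1.0 - dist / max_dist)  # 1 at the query point, 0 beyond max_dist
        return alpha * relevance + (1 - alpha) * proximity
    return [obj[0] for obj in heapq.nlargest(k, objects, key=score)]
```

    Raising `alpha` toward 1 favours relevant objects regardless of distance; lowering it favours nearby ones.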
     (Web Group) Paper Summary of VLDB2010 
    Abstract:
    VLDB 2010 papers on the cloud are classified into four aspects: cloud data management systems, benchmarks, query processing and open questions. This report introduces their motivation and key techniques, and what they suggest for our research work.
     2010.09.18  Venue: FL1, Meeting Room, Information Building
     (Graduate) New Experience in MSRA 
    Abstract:
    Introduces personal life and impressions at MSRA.
     (Graduate) Introduction to Cloud and Flash Memory Management
    Abstract:
    Shares new findings and thoughts about cloud computing and flash memory management.
       

     2010.06.19  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Privacy-preserving of Trajectory Data: A Survey [ppt]
    Abstract:
    This survey discussed trajectory data privacy preservation techniques in four motivating applications. For online trajectory data privacy preservation, the service is central and the trade-off is between QoS and privacy; for offline trajectory data privacy preservation, the data is central and the trade-off is between data quality and privacy.
     (XML Group) XML Keyword Query Refinement [ppt]
    Abstract:
    In this report, we discussed the problem of query refinement in traditional IR and in the newer setting of XML keyword search. The main part covered the tasks and approaches of XML keyword query refinement. In addition, we classified existing work on XML keyword query refinement and presented my own work on it.
     2010.06.12  Venue: FL1, Meeting Room, Information Building
     (Web Group) Credibility on the Web: A Survey 
    Abstract:
    This survey discussed credibility on the web from three kinds of entities
     (Web Group) Information Quality and Trustworthiness in Wikipedia 
    Abstract:
    In this talk we discussed the problem of information quality and trustworthiness in Wikipedia and introduced some research topics. We also gave a brief overview of current research papers on this topic in WWW, WICOW, etc.
     2010.06.05  Venue: FL1, Meeting Room, Information Building
     (Cloud Group) Index for cloud data management 
    Abstract:
    This report mainly introduces why we build indexes for cloud data management, related work on indexing for cloud data management, and our progress on index research.
     (Cloud Computing Group) NoSQL Overview [ppt]
    Abstract:
    This report briefly introduced NoSQL: four reasons why the NoSQL concept was introduced, its history and definition, three fundamental theories of NoSQL, and the categories of NoSQL databases.

     2010.05.29  Venue: FL1, Meeting Room, Information Building
     (XML Group) Keyword search on Graph 
    Abstract:
    In this report, I introduce methods that perform keyword search on graph data. Keyword search provides a simple, user-friendly interface for retrieving information from complicated data structures. In this discussion, I focus on three major challenges of keyword search on graphs: first, what qualifies as an answer to a keyword search on a graph; second, what constitutes a good answer, that is, how to rank the answers; third, how to perform keyword search efficiently.
     (XML Group) The Integration of Telecommunications Networks, Cable TV Networks and The Internet [ppt]
    Abstract:
    This report first introduces the concept of the integration of telecommunications networks, cable TV networks and the Internet, then presents its development process and advantages. Finally, I describe the current status of the integration of the three kinds of networks abroad.
     2010.05.22  Venue: FL1, Meeting Room, Information Building
     (Web Group) Elementary Structure-based Graph Matching 
    Abstract:
    Past graph matching techniques are vertex-based: they first find a candidate set for each node in the query, then run a search algorithm to find a match. This approach is too costly because each node may have many candidates, and together these candidates form a large search space. To reduce the search space, it is beneficial to raise the granularity of the matching algorithm from single vertices to elementary structures.
     (XML Group) Data deduplication 
    Abstract:
    This report introduces some methods of data deduplication, such as hash-based algorithms and delta algorithms.
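    Hash-based deduplication splits data into chunks, fingerprints each chunk with a cryptographic hash, and stores each distinct chunk only once. The minimal Python sketch below uses fixed-size chunks and SHA-256; real systems often use content-defined chunking instead, the tiny chunk size is an arbitrary illustrative choice, and delta algorithms are not shown.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4):
    """Fixed-size chunking with SHA-256 fingerprints.

    Returns (store, recipe): store maps fingerprint -> unique chunk,
    recipe lists the fingerprints needed to rebuild the data in order.
    """
    store, recipe = {}, []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep only the first copy of each chunk
        recipe.append(fp)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild the original bytes from the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

    For `b"ABCDABCDABCDXYZ"` with 4-byte chunks, the recipe has four entries but the store keeps only two unique chunks.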
     2010.05.08  Venue: FL1, Meeting Room, Information Building
     (Web Group) Benchmark results and analysis 
    Abstract:
    This report introduces the results of benchmarks run on cloud-based DBMSs and analyzes them.
     (Cloud Computing group) Architecture and Design of Distributed Database Systems [ppt]
    Abstract:
    This report introduces several kinds of architectures for distributed database systems based on the relational data model. It also introduces two horizontal fragmentation methods and one vertical fragmentation method, as well as the allocation model for DDBMSs.
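    A small sketch of the fragmentation ideas: primary horizontal fragmentation partitions rows by a predicate, and vertical fragmentation projects columns while keeping the key so that the original relation can be rebuilt by a join. Rows are modeled as Python dicts, the predicate and attribute names are hypothetical, and derived horizontal fragmentation and the allocation model are not covered.

```python
def horizontal_fragment(rows, predicate):
    """Primary horizontal fragmentation: rows satisfying predicate, and the rest."""
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])

def vertical_fragment(rows, key, attrs):
    """Project rows onto key + attrs; the key is kept so fragments can be rejoined."""
    return [{key: r[key], **{a: r[a] for a in attrs}} for r in rows]

def rejoin(frag_a, frag_b, key):
    """Reconstruct full rows from two vertical fragments by joining on the key."""
    by_key = {r[key]: r for r in frag_b}
    return [{**r, **by_key[r[key]]} for r in frag_a]
```

    Unioning the horizontal fragments, or joining the vertical fragments on the key, gives back the original relation, which is the reconstruction property a correct fragmentation must satisfy.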

     2010.04.24  Venue: FL1, Meeting Room, Information Building
    Xuan Zhou (CSIRO, Australia) Integrating User Interfaces of DB and IR Systems 
    Abstract:
    In contrast to classical databases and IR systems, real-world information systems increasingly have to deal with very vague and diverse data structures. While current object-relational database systems require clear and unified data schemas, IR systems usually ignore the structured information completely. Malleable schemas, as recently introduced, provide a novel way to deal with vagueness, ambiguity and diversity by incorporating imprecise and overlapping definitions of data structures. In this talk, I will introduce a novel query relaxation scheme that enables users to find the best-matching information by exploiting malleable schemas. Our scheme utilizes duplicates to discover the correlations within a malleable schema, and then uses these correlations to appropriately relax users' queries. Then, it ranks results of the relaxed queries according to their respective probability of satisfying the original query’s intent. Our experiments with real-world data confirmed its performance and practicality.
     2010.04.17  Venue: FL1, Meeting Room, Information Building
     (Flash Group) Hush-Tell You Something Novel About Flash Memory ! 
    Abstract:
    This report introduces work from the Non-Volatile Systems Laboratory at UCSD, where many tests on flash memory were performed. Based on the test results, several applications were devised, including a variation-aware FTL called Mango, a flash-aware data encoding, and a system architecture for data-centric applications called Gordon.
     (Mobile Group) Existed DBMS on SSD 
    Abstract:
    By analyzing the IOPS of HDD and SSD, we compare the two devices. By analyzing the TPC-C performance of MySQL and PostgreSQL on SSD and HDD, we compare the performance of existing DBMSs on SSD with that on HDD. We then propose some ideas.
     2010.04.03  Venue: FL1, Meeting Room, Information Building
     (Web Group) Web Pages Extraction Technologies in the Opinion Monitoring System 
    Abstract:
    This report introduces two web pages extraction technologies in our opinion monitoring system, and some popular tools for system development.
     (Mobile Group) An Introduction to Flex [ppt]
    Abstract:
    Flex is now very popular for developing Rich Internet Applications (RIAs). This report introduces what Flex is and its history, and discusses its mechanism, advantages, applications and differences from other RIA techniques.
     (Web Group) System Environment and MapReduce Framework 
    Abstract:
    This report includes the introduction of the construction of our cloud data management platform and a brief talk about MapReduce framework.
     (Flash Group) An Introduction to the Source Insight [ppt]
    Abstract:
    This report introduces Source Insight, a project-oriented program editor and code browser that parses your source code, dynamically maintains its own database of symbolic information while you work, and automatically presents useful contextual information to you.

     2010.03.27  Venue: FL1, Meeting Room, Information Building
     (Web Group) IO3:Interval-based Out-of-order Event Processing in Pervasive Computing 
    Abstract:
    In pervasive computing environments, complex event processing has become increasingly important in modern applications. A key aspect of complex event processing is extracting patterns from event streams to make informed decisions in real time. However, network latencies and machine failures may cause events to arrive out of order. In addition, existing literature assumes that events have no duration, yet events in many real-world applications do have durations, and the relationships among them are often complex. In this work, we first analyze the preliminaries of time semantics and propose a model for them. We also introduce a hybrid, time-interval-based solution for out-of-order events, which can switch between levels of output correctness based on real-time requirements. Our experimental study demonstrates the effectiveness of the approach.
     (Cloud Group) ICDE2010 Keynote - what's new in the cloud [ppt]
    Abstract:
    This report discusses why we should do cloud computing, how to do it, and what to do.
     (Web Group) Survey of ICDE2010 and SIGMOD2010 
    Abstract:
    Based on the accepted papers, this presentation made a survey on recent international database conferences ICDE2010 and SIGMOD2010, and analyzed the research focuses of database area.
     2010.03.20  Venue: FL1, Meeting Room, Information Building
     (Flash Group) RWConvertor: Random Write Optimization for SSD 
    Abstract:
    With the development of electronic technologies, Solid State Drives (SSDs) have emerged as a new data storage medium with low power consumption, high shock resistance and a lightweight form factor. Their most attractive characteristic is high random read speed, owing to the absence of mechanical latency. SSDs have therefore been widely used in place of hard disks in laptops, desktops and data servers over the past few years. However, poor random write performance has become the bottleneck for wider adoption: random writes are almost two orders of magnitude slower than both random reads and sequential access, so write-intensive applications perform very poorly on SSDs. In this paper, we propose for the first time to insert unmodified data into a random write sequence in order to convert random writes into sequential writes, so that the data sequence can be flushed at sequential-write speed. We further improve write performance with the Optimum Converted Write Sequence (OCWS); a rigorous mathematical proof determines the location and number of data items inserted when constructing the OCWS. We also optimize the throughput of OCWS, which is determined by its gain and granularity, when it is applied to data streams.
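    The core idea of inserting unmodified data to turn random writes into sequential ones can be illustrated on a list of dirty page numbers: small gaps between dirty pages are filled with clean pages so that each run can be flushed as one sequential write. The fixed `max_gap` threshold below is a hypothetical simplification; the paper instead derives the optimal location and number of inserted items (OCWS) by proof.

```python
def convert_to_sequential(dirty_pages, max_gap):
    """Group dirty page numbers into contiguous runs by padding small gaps.

    Pages inside a gap of at most max_gap are unmodified pages that would be
    rewritten unchanged so the run can be flushed as one sequential write.
    max_gap is a hypothetical tuning knob, not the paper's OCWS rule.
    Returns a list of (start, end, padded_pages).
    """
    pages = sorted(set(dirty_pages))
    if not pages:
        return []
    runs = []
    start = prev = pages[0]
    padded = 0
    for p in pages[1:]:
        gap = p - prev - 1
        if gap <= max_gap:
            padded += gap          # fill the hole with unmodified pages
        else:
            runs.append((start, prev, padded))
            start, padded = p, 0   # gap too large: start a new run
        prev = p
    runs.append((start, prev, padded))
    return runs
```

    For dirty pages `[10, 11, 13, 20, 21]` and `max_gap=1`, the sketch returns `[(10, 13, 1), (20, 21, 0)]`: page 12 is rewritten unchanged so that pages 10 to 13 form one sequential write. A larger `max_gap` yields fewer, longer runs at the cost of writing more unmodified pages, which is the gain-versus-overhead balance that OCWS optimizes.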
     2010.03.13  Venue: FL1, Meeting Room, Information Building
     (XML Group) Approaches to internet of things 
    Abstract:
    As the next generation of information technology, the Internet of Things has drawn public attention. It enables the Internet to reach out into the real world of physical objects. This report first presents the concept of the Internet of Things, then introduces its system architecture and key techniques, and gives three applications. Finally, I point out future directions.
     (Mobile Group) Related Work about Internet of Things [ppt]
    Abstract:
    This report gives an overview of related and future work on the Internet of Things and focuses on the RFID Ecosystem experience at the University of Washington.
     2010.03.06  Venue: FL1, Meeting Room, Information Building
     (Web Group) Open Source Cloud-based DBMS Experiments 
    Abstract:
    This report introduces existing experiment benchmarks for cloud-based DBMSs. We describe the testbed of our experiment and show the tasks and results.
     (Web Group) System Architecture Design and Implementation of Cloud-based Database System 
    Abstract:
    The Cloud-based Database project at WAMDM aims to research new storage and database systems that can support next-generation data storage and management and can be applied to mobile communications. This report introduces the architecture design and implementation of our cloud-based database system.

     2010.01.09  Venue: FL1, Meeting Room, Information Building
     
    (Invited Talk)
    Time series and Interactive media 
    Abstract:
    Time series and interactive media have wide applications, for example in computer games. One of the most important problems in pattern detection over streaming time series is how to define an effective distance metric. We propose a novel warping distance and an efficient approach for continuous pattern detection. For interactive media databases, the work focuses on indexes and storage structures for smart media objects, similarity metrics, and query processing on multimedia data.
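    The proposed warping distance is not specified in the abstract, so the sketch below shows only the classic dynamic time warping (DTW) distance that warping-based metrics build on. It uses the absolute difference as the point-wise cost and an O(nm) dynamic program; it is a baseline illustration, not the proposed metric.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] and b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b's point repeats
                                 cost[i][j - 1],      # b advances, a's point repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

    `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`, because warping lets the repeated 2 align with the same point, whereas a point-by-point distance is not even defined for series of different lengths.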
     (Flash Group) FTL Algorithms and Native Flash Experiments 
    Abstract:
    This report introduces five flash translation layer algorithms, including BAST, FAST, LAST and DFTL. We describe the main ideas of these algorithms and their implementation, and then introduce the native flash experiments.
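    BAST, FAST and LAST are hybrid FTLs that combine block-level mapping with log blocks, while DFTL is a page-level scheme that caches the mapping table on demand. The toy sketch below shows only the common core of page-level mapping: out-of-place updates that redirect a logical page to a fresh physical page and mark the old one invalid. Blocks, garbage collection and DFTL's cached mapping table are deliberately omitted, and the class name is hypothetical.

```python
class PageMappingFTL:
    """Toy page-level FTL with out-of-place updates (hypothetical simplification)."""

    def __init__(self, num_physical_pages):
        self.flash = [None] * num_physical_pages  # physical pages (None = free)
        self.l2p = {}                             # logical page -> physical page
        self.invalid = set()                      # stale physical pages awaiting erase
        self.next_free = 0                        # pages are allocated sequentially

    def write(self, lpn, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.invalid.add(old)  # flash cannot overwrite in place
        self.flash[self.next_free] = data
        self.l2p[lpn] = self.next_free
        self.next_free += 1

    def read(self, lpn):
        ppn = self.l2p.get(lpn)
        return None if ppn is None else self.flash[ppn]
```

    Rewriting logical page 0 leaves its old physical page in `invalid`; reclaiming such pages requires erasing whole blocks, which is exactly where the hybrid schemes and DFTL differ.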

    2009
     2009.12.26  Venue: FL1, Meeting Room, Information Building
    Dr. Rui Zhang 
    (Invited Talk)
    Continuous Intersection Joins Over Moving Objects 
    Abstract:
    The continuous intersection join query is computationally expensive yet important for various applications on moving objects. No previous study has specifically addressed this query type. We can adopt a naive algorithm or extend an existing technique (TP-Join) to process the query. However, they compute the answer for either too long or too short a time interval, which results in either a very large computation cost per object update or too frequent answer updates, respectively. This motivates us to optimize the query processing in the time dimension. In this study, we achieve this optimization by introducing the new concept of time-constrained (TC) processing. Further, TC processing enables a set of effective improvement techniques on traditional intersection join algorithms. With a thorough experimental study, we show that our algorithm outperforms the best adapted existing solution by several orders of magnitude.
    Dr. Jinchuan Chen 
    (Invited Talk)
    Uncertain Data Management 
    Abstract:
    Dr. Jinchuan Chen gave a brief introduction to the research frontier in uncertain data management and some typical methods for handling data uncertainty. He also proposed some research topics in uncertain data management.
     2009.12.19  Venue: FL1, Meeting Room, Information Building
     (Cloud Computing Group) cassandra and sigmod contest [ppt]
    Abstract:
    Cassandra is a highly scalable, second-generation distributed database that brings together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. The task of the SIGMOD Programming Contest 2010 is to implement a simple distributed query executor built on top of the previous year's main-memory index.
     (Mobile Group) Hammer & Nail 
    Abstract:
    "Research is actually a process of hammers (methods) hitting nails (problems)." This report first presents three hammers, i.e., three kinds of hash functions: signatures, OPMPHF (Order-Preserving Minimal Perfect Hash Function) and LSH (Locality-Sensitive Hashing). It then introduces a nail that uses these hammers, called Reverse k Spatial and Textual Nearest Neighbor (RkSTNN).
     2009.12.12  Venue: FL1, Meeting Room, Information Building
     (Web Group) Survey on Data Management in the Cloud 
    Abstract:
    With the development of computer and communication technology, large-scale data is being produced. Cloud-based databases are one solution for efficiently storing and analyzing this data. In this talk, we present several cloud-based databases and summarize them from different aspects.
     (Cloud Computing Group) Hive – A Warehousing Solution Over a MapReduce Framework [ppt]
    Abstract:
    Introduces Hive, a system built on top of Hadoop that supports managing and querying structured data, together with its query language.
     2009.12.05  Venue: FL1, Meeting Room, Information Building
     (Web Group) Trust Metric on Social Network 
    Abstract:
    This report introduces five trust metric mechanisms on social network, such as
     (Web Group) Data Fusion-Resolve Data Conflicts in Integration 
    Abstract:
    In this talk we gave a brief introduction to data fusion, including data conflict types, conflict resolution strategies, the role of data fusion in integration projects, and current approaches to data fusion. Then we addressed some challenges and open problems in data fusion research. Finally, we presented a brief summary of the talk.

     2009.11.28  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) ACR: an Adaptive Cost-Aware Buffer Replacement Algorithm for Flash Storage Devices 
    Abstract:
    In this talk, we propose ACR, an adaptive cost-aware buffer replacement algorithm that adapts to various access patterns on flash disks.
     (Mobile Group) Multi-version Concurrency Control of Database Based on Flash Memory 
    Abstract:
    Data may have multiple versions because of the not-in-place update feature and the in-page logging storage mechanism of flash memory. Multi-version concurrency control must be implemented according to serializability theory; it includes MV2PL (multi-version 2PL), MVTO (multi-version TO), MVSGT (multi-version SGT), TW (time warp) and ROMV (read-only multi-version). We evaluated the performance of these algorithms through experiments on existing DBMSs such as MS SQL Server, MySQL and PostgreSQL. Finally, we proposed some future work on multi-version concurrency control.
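    As a hedged sketch of one of the listed algorithms, the class below applies the usual multi-version timestamp ordering (MVTO) rules to a single data item: a read returns the version with the largest write timestamp not exceeding the reader's timestamp and records the reader, and a write is rejected if a younger transaction has already read the version it would supersede. Transaction management, commit, garbage collection of old versions and flash-specific storage are omitted; timestamps are assumed to be positive integers.

```python
class MVTOItem:
    """One data item under multi-version timestamp ordering (MVTO)."""

    def __init__(self, initial):
        # Each version is [write_ts, read_ts, value]; the initial version is written at ts 0.
        self.versions = [[0, 0, initial]]

    def _visible(self, ts):
        """Version with the largest write timestamp <= ts."""
        return max((v for v in self.versions if v[0] <= ts), key=lambda v: v[0])

    def read(self, ts):
        v = self._visible(ts)
        v[1] = max(v[1], ts)  # remember the youngest reader of this version
        return v[2]

    def write(self, ts, value):
        v = self._visible(ts)
        if v[1] > ts:
            return False      # a younger txn already read v: abort the writer
        if v[0] == ts:
            v[2] = value      # the same txn rewrites its own version
        else:
            self.versions.append([ts, ts, value])
        return True
```

    For example, after `read(5)` on the initial version, `write(3, ...)` returns `False` because transaction 5 has already read the version that the write would follow, while `write(7, ...)` succeeds; readers never block and never abort.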
     2009.11.21  Venue: FL1, Meeting Room, Information Building
     (XML Group) Efficient String Similarity Search Using Synonyms 
    Abstract:
    This report introduces gram-based string matching functions and a new similarity function.
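    Gram-based string matching compares strings through their sets of q-grams (substrings of length q). The sketch below computes padded 2-grams and their Jaccard similarity; it is a standard baseline only, uses sets rather than multisets, and does not model the synonym-aware similarity function the report introduces.

```python
def qgrams(s, q=2, pad="#"):
    """Set of q-grams of s, padded so characters at the ends get full grams."""
    padded = pad * (q - 1) + s + pad * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def jaccard(s, t, q=2):
    """Jaccard similarity of the q-gram sets of s and t."""
    a, b = qgrams(s, q), qgrams(t, q)
    return len(a & b) / len(a | b)
```

    `jaccard("abcd", "abce")` is 3/7: the strings share the grams `#a`, `ab` and `bc` out of seven distinct grams.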
     (XML Group) Reachability Queries on Large Directed Acyclic Graphs 
    Abstract:
    Graph reachability has attracted a lot of research attention, because reachability queries are not only common on graph databases but also serve as fundamental operations for many other graph queries. In this report, I introduce my new graph labeling scheme for speeding up reachability queries on DAGs; its index is small and easy to construct.
     (XML Group) Information Retrieval Model and Relevance Feedback 
    Abstract:
    This report first introduces four classic information retrieval models. Based on these models, we present two methods for improving retrieval results.
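    The abstract does not name its two methods; as one standard example of relevance feedback on top of the vector space model, the sketch below applies the Rocchio update to sparse term-weight vectors. The weights alpha=1.0, beta=0.75 and gamma=0.15 are common textbook defaults, not values from the report.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sparse term-weight vectors (dicts).

    q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant);
    terms whose weight is not positive are dropped.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)
    new_q = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:
            new_q[t] = w
    return new_q
```

    Starting from the query `{"flash": 1.0}`, one relevant document `{"flash": 1.0, "ssd": 1.0}` and one non-relevant document `{"camera": 1.0}`, the expanded query becomes `{"flash": 1.75, "ssd": 0.75}`: the term "ssd" is added from feedback, and "camera" is suppressed.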
     2009.11.14  Venue: FL1, Meeting Room, Information Building
     (Web Group) Review our studies on dataspace 
    Abstract:
    Reviewed our works on dataspace research, and introduced a work we are doing.
     (Web Group) Dataspace Research Report 
    Abstract:
    Introduced the progress of our dataspace research and system implementation.
     (Web Group) Leveraging Feature Context to Facilitate Sub-graph Query in Graph Database 
    Abstract:
    Previous techniques focus on feature selection strategies that filter out as many false graphs as possible. This approach has hit a bottleneck: even as features become more and more complicated, precision remains low. We therefore propose to investigate how feature context can help improve pruning power in sub-graph queries.
     2009.11.08  Venue: FL1, Meeting Room, Information Building
     (Web Group) About CIKM2009 Story 
    Abstract:
    A short summary of CIKM 2009 based on my impressions of the conference, with a focus on the three keynotes.
     (Flash Group) Review of CIKM 2009 [ppt]
    Abstract:
    CIKM is a high-level international conference. There are three tracks.
     (Web Group) Summary of CIKM2009 
    Abstract:
    In this talk, I presented three CIKM 2009 papers and one tutorial related to Web data management and click-log mining, and then gave a summary of the conference.
     (Web Group) IR is Interesting-CIKM 2009 Report 
    Abstract:
    In this presentation, I gave a brief summary and introduction of the CIKM 2009 conference and shared some of my own experiences there.

     2009.10.31  Venue: FL1, Meeting Room, Information Building
     (Web Group) An Efficient Multi-Dimensional Index for Cloud Data Management [ppt]
    Abstract:
    In this presentation, I introduced our work on a multi-dimensional index structure for cloud computing platforms.
     (Web Group) Supporting Context-based Query in Personal DataSpace [poster]
    Abstract:
    Many users need to refer to content in existing files (pictures, tables, emails, web pages, etc.) when they write documents (programs, presentations, proposals, etc.), and they often need to revisit these referenced files for review, revision or reconfirmation. In this paper, we propose an efficient solution to this problem. We first define a new personal data relationship
     (Flash Group) Pre-Report for CIKM 2009 [poster]
    Abstract:
    Solid State Drives (SSDs), an emerging data storage medium with high random read speed, have been widely used in laptops, desktops and data servers to replace hard disks during the past few years. However, poor random write performance has become the bottleneck in practice. In this paper, we propose inserting unmodified data into a random write sequence in order to convert random writes into sequential writes, so that the data sequence can be flushed at sequential write speed.
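    A minimal Python sketch of the idea follows. It assumes a hypothetical threshold `max_gap`: dirty pages separated by at most `max_gap` clean pages are merged into one sequential run, and the clean pages in between are rewritten unmodified. The abstract does not say how the unmodified data is chosen, so this gap-filling rule is only an illustration.

```python
def plan_sequential_writes(dirty_pages, max_gap):
    """Group dirty page numbers into sequential write runs.

    Two dirty pages separated by at most `max_gap` clean pages are merged into
    one run; the clean pages in between are re-written unmodified (padding).
    Returns (runs, padding) where runs is a list of inclusive (start, end)
    page ranges and padding is the number of unmodified pages written.
    """
    pages = sorted(set(dirty_pages))
    runs, padding = [], 0
    if not pages:
        return runs, padding
    start = prev = pages[0]
    for p in pages[1:]:
        gap = p - prev - 1  # clean pages strictly between prev and p
        if gap <= max_gap:
            padding += gap  # fill the hole with unmodified data
        else:
            runs.append((start, prev))  # gap too large: close the run
            start = p
        prev = p
    runs.append((start, prev))
    return runs, padding
```

    A larger `max_gap` yields fewer, longer sequential runs, but more unmodified pages must be written. The profitable setting would depend on how much faster the device writes sequentially than randomly.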
     2009.10.24  Venue: FL1, Meeting Room, Information Building
     (Web&Mobile Group) Overview of Talks in NDBC 2009 
    Abstract:
    Dr. Xiangye Xiao gave a brief review of the invited talks at NDBC 2009, given by Dr. Xin Dong from AT&T, Prof. Weiyi Meng from Binghamton Univ., Haixun Wang from MSRA and Lei Chen from HKUST.
     (Web Group) Report on SKG2009 
    Abstract:
    An introduction to SKG 2009, focusing on the conference's two keynotes.
     (Mobile Group) A new topic: queries with geo-information [ppt]
    Abstract:
    Discovering users' specific and implicit geographic intentions in web search can greatly help satisfy their information needs, and research on queries with geo-information has become popular in recent years. There are several kinds of methods. The first kind is training-data-based, and these methods need large volumes of query logs. The second kind is spatial and textual information retrieval, and these methods can only deal with local geo-information. The challenge is how to discover users' implicit geo-information in queries.
     (Web Group) trajectory pattern mining 
    Abstract:
    The pervasiveness of mobile devices and location-based services is leading to an increasing volume of mobility data, which provides an opportunity to analyze movement behaviors. Against this background, trajectory pattern mining has become a popular topic. This report introduces some representative work on this topic and points out some of its shortcomings.
     2009.10.11  Venue: FL1, Meeting Room, Information Building
     (Web Group) C-Rank -- A Credibility Evaluation Method for Deep Web Records 
    Abstract:
    Identifying and evaluating the credibility of information has become an increasingly important problem. To address this issue, we propose C-Rank, an effective credibility evaluation method that computes trust values of records in Deep Web databases by constructing an S-R Credibility Graph for each record.
     (Mobile Group) Privacy Preserving towards Continuous Query in Location-based Services 
    Abstract:
    With advances in wireless communication and mobile positioning technologies, location-based mobile services have gained increasing popularity in recent years. Privacy preservation, including location privacy and query privacy, has recently received considerable attention for location-based mobile services. Many location cloaking approaches have been proposed to protect the location privacy of mobile users. However, most of them focus on anonymizing snapshot queries based on the proximity of locations at query issue time, and they are therefore ill-suited to continuous queries. Continuous query anonymization causes privacy disclosure (of both location and query) and poor quality of service. To address these problems, a δp-privacy model and a δq-distortion model are proposed to balance the tradeoff between privacy preservation and quality of service. In addition, a temporal distortion model is proposed to measure the loss of location information over a time interval, and this loss is mapped to a temporal similarity distance between two queries. Finally, a greedy cloaking algorithm (GCA) is proposed, which is applicable to anonymizing both snapshot queries and continuous queries. The average cloaking success rate, cloaking time, processing time and anonymization cost of successful requests are evaluated with increasing privacy level (k). Experimental results validate the efficiency and effectiveness of the proposed algorithm.
     (XML Group) Algebra-based Transform query optimization strategy 
    Abstract:
    XQuery/Update defines a special Transform query, which is similar to a hypothetical query in relational databases and can be expressed as “Q when {U}”. In other words, the results of query Q are the same as the results obtained after executing the hypothetical update {U} on the original database, without actually updating the database. A Transform query copies nodes in the XML database and then updates the copies, so it does not affect the database. However, Transform queries usually copy and update many nodes that are useless for query Q, which results in high cost. Reducing the number of copied nodes and update operations is therefore critical for query optimization. In this paper, we propose a set of rules for Transform query optimization based on OrientXA, which are implemented in OrientX3.0.
     (Mobile Group) HF-Tree--An Update-Efficient Index for Flash Memory 
    Abstract:
    Due to the high write cost of flash memory, traditional disk-based indexes have poor update performance when applied directly to flash drives. In this talk, Da Zhou proposed a novel index called the HF-tree. It integrates the BF-tree with Tri-hash to improve update performance on flash memory.
     (Mobile Group) Sub-Join--A Query Optimization Algorithm for Flash-based Database 
    Abstract:
    Compared with hard disk drives (HDDs), SSDs have many advantages, such as high random read performance, low power consumption and a lightweight form factor. SSDs are therefore envisioned as the next generation of data storage, replacing HDDs. However, the query performance gain of flash-based databases falls short of the I/O performance ratio of SSDs to HDDs, because existing databases designed for HDDs cannot take full advantage of the high I/O performance of SSDs. In this paper, a new join algorithm, Sub-Join, is proposed. Sub-Join first projects the join column and the primary key into a Sub-Table, and then executes the join on the Sub-Tables. Finally, the results are fetched from the original tables according to the result of the join on the Sub-Tables. Comparative experiments using Oracle Berkeley DB show that Sub-Join outperforms the original indexed nested-loop join by about 40% to 100%, which demonstrates the efficiency of the method.
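    The three Sub-Join steps can be sketched over in-memory tables in Python, assuming tables are lists of dicts. The abstract does not say which join method runs on the Sub-Tables, so the hash join used in step 2 is an assumption. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def sub_join(left, right, left_key, right_key, left_pk, right_pk):
    """Sketch of Sub-Join over in-memory tables (lists of dicts)."""
    # Step 1: project (join column, primary key) into narrow Sub-Tables.
    left_sub = [(row[left_key], row[left_pk]) for row in left]
    right_sub = [(row[right_key], row[right_pk]) for row in right]

    # Step 2: join the Sub-Tables (a hash join here) to get matching key pairs.
    index = defaultdict(list)
    for key, pk in right_sub:
        index[key].append(pk)
    pk_pairs = [(lpk, rpk) for key, lpk in left_sub for rpk in index.get(key, [])]

    # Step 3: fetch full rows from the original tables by primary key.
    left_by_pk = {row[left_pk]: row for row in left}
    right_by_pk = {row[right_pk]: row for row in right}
    return [(left_by_pk[lpk], right_by_pk[rpk]) for lpk, rpk in pk_pairs]
```

    In a real flash-based DBMS, step 3 becomes a set of random reads by primary key. The method relies on SSDs serving such reads cheaply, while steps 1 and 2 work only on the narrow Sub-Tables.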

     2009.09.28  Venue: FL1, Meeting Room, Information Building
     (AT&T Research) Data Integration with Uncertainty 
    Abstract:
    Dr. Xin (Luna) Dong from Data Management Department at AT&T Research visited Web And Mobile Data Management (WAMDM) lab and gave an invited talk about Data Integration with Uncertainty. Her talk mainly focused on some important and valuable topics in uncertain data integration.
     2009.09.19  Venue: FL1, Meeting Room, Information Building
     (Web&Mobile Group) Efficient Co-Location Pattern Discovery 
    Abstract:
    Dr. Xiangye Xiao gave a brief talk about her research topics as a PhD candidate at the Hong Kong University of Science and Technology, including efficient co-location pattern discovery and Web browsing on mobile devices. She also proposed some ideas for future research.
     (XML Group) Keyword Search Techniques in Mobile Web 
    Abstract:
    Dr. Jiaheng Lu received a funding award from the National Natural Science Foundation of China (NSFC) for the project "keyword search in mobile web". He gave a detailed presentation of the project and proposed some possible research topics.

     2009.07.25  Venue: FL1, Meeting Room, Information Building
     (XML Group) OrientX4.0 - Supporting Keyword Search 
    Abstract:
    With the development of XML technology, more and more people use XML data. Traditionally, we use the standard query language XQuery to find the data we need. However, this requires learning XQuery and knowing the structure and content of the XML document, which is a great challenge for naive users. For this purpose, the new edition, OrientX4.0, supports XML keyword search, which solves the problems of using XQuery and makes XML easier to use.
     (XML Group) OrientX4.0 System Development Report [ppt]
    Abstract:
    The implementation of XML keyword search.
     2009.07.18  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Probabilistic kNN Query in Road Network 
    Abstract:
    Queries over moving objects in road networks, especially kNN (k nearest neighbor) queries, are very important and have received considerable attention. This talk discusses how to model uncertain data and process kNN queries in road networks.
     (Mobile Group) Report on Privacy Protection Demo Application Development 
    Abstract:
    In order to apply current privacy protection algorithms and integrate them into the 863 Pervasive Computing project, we decided to develop a demonstration application. This report introduces the technical and functional characteristics of the application as well as the development plan.
     (Mobile Group) Query Processing over Interval-based Out-of-order Event Streams 
    Abstract:
    Complex event processing has become increasingly important in modern applications, ranging from supply chain management for RFID tracking to real-time intrusion detection. A key aspect of complex event processing is extracting patterns from event streams to make informed decisions in real time. However, network latencies and machine failures may cause events to arrive out of order at the event processing engine. In addition, existing temporal pattern mining assumes that events have no duration, whereas events in many real-world applications do have durations, and the relationships among these events are often complex. In this work, we propose a solution for processing both sequence and parallel pattern queries over out-of-order event streams. First, we analyze the preliminaries and the problems caused by out-of-order data arrival. We then propose a method to detect out-of-order event patterns and introduce a new time-interval-based solution to out-of-order problems. Lastly, we conduct an experimental study demonstrating the effectiveness of our approach.
     2009.07.11  Venue: FL1, Meeting Room, Information Building
     (Flash Group) System Development Report of Flash Group [ppt]
    Abstract:
    Our goal is to develop a special flash-based DBMS, and we decided to build it by modifying an existing open-source DBMS. However, there are many open-source systems. Which one is the best choice? After a detailed analysis, we believe that MySQL, which includes Berkeley DB as one of its storage engines, is the answer to our problem.
     2009.07.04  Venue: FL1, Meeting Room, Information Building
     (Web Group) SIGMOD2009 Overview [ppt]
    Abstract:
    Analyzes current hot research issues based on the accepted papers of SIGMOD 2009 and introduces two papers from the conference.
     (Mobile Group) Flash Research Report [ppt]
    Abstract:
    Research on flash-based database systems is becoming increasingly popular. At SIGMOD 2009 and VLDB 2009, we were glad to see several papers on indexing, query processing and transaction processing. This report gives a rough overview of the motivations and ideas of these papers.
     (XML Group) XML Labeling and Query Optimization in Sigmod09 [ppt]
    Abstract:
    Optimization of complex XQueries combining many XPath steps and joins is currently hindered by the absence of good cardinality estimation and cost models for XQuery. Labeling schemes lie at the core of query processing in many XML database management systems, and designing labeling schemes for dynamic XML documents is an important problem that has received a lot of research attention. This presentation introduces DDE, a new labeling scheme, and ROX, a new runtime optimization approach, both from SIGMOD 2009.

     2009.06.27  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Logging in Flash-based Database Systems [ppt]
    Abstract:
    Synchronous transactional logging is the central mechanism for ensuring data persistence and recoverability in database systems. In this report, we discussed solutions that exploit different kinds of flash drives for synchronous logging, as well as the related recovery processing techniques.
     (Web Group) Location-based Database Selection 
    Abstract:
    Location-based database selection is a new topic. This report introduces the topic, including why we chose it, what the problem is, related work, and how to solve the problem.
     (Web Group) Snippet of Structured Data 
    Abstract:
    More and more people are expected to search the web while on the move. However, browsing web pages on mobile devices has many limitations, especially the small screen. A database record usually contains a lot of information, much of which is not useful to the user and too much for a small screen. We therefore try to extract the most useful attributes to return to the user.
     2009.06.20  Venue: FL1, Meeting Room, Information Building
     (XML Group) XML Keyword-Search engine 
    Abstract:
    XML has become the de facto standard for data exchange, so querying XML data is increasingly important. We can use XQuery and XPath, the standard XML query languages recommended by the W3C, to get what we need. However, users must be familiar with these query languages and must first know the content and structure of the XML data in order to write accurate queries. This is not easy for most users, which motivates the study of XML keyword search. With keyword search, users need not learn an XML query language or know the content and structure of the XML, which makes querying easier. The main feature of the next edition of OrientX (edition 4.0) is support for keyword search. In the presentation, Qingsong Guo analyzed existing XML keyword-search engines, compared them and identified their common features. Based on this analysis, we defined the main features of OrientX 4.0. Wei Wang analyzed the key technologies of XML keyword search, such as the principles and algorithms for computing SLCAs and the ranking of query results.
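    For readers unfamiliar with SLCA (smallest lowest common ancestor) semantics, the Python sketch below computes SLCAs directly from the definition, using Dewey labels represented as tuples. It takes the LCA of every combination of one match per keyword and keeps only those LCAs that have no other LCA below them. This brute-force version is exponential in the number of keywords and is not one of the efficient published SLCA algorithms. The function names are hypothetical.

```python
from itertools import product


def lca(a, b):
    """LCA of two Dewey labels, i.e. their longest common prefix."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def is_proper_ancestor(a, b):
    """True if Dewey label a is a proper ancestor of Dewey label b."""
    return len(a) < len(b) and b[: len(a)] == a


def slca(keyword_lists):
    """SLCAs for a list of keyword match lists (Dewey tuples per keyword)."""
    if not keyword_lists or any(not lst for lst in keyword_lists):
        return []
    lcas = set()
    for combo in product(*keyword_lists):  # one match per keyword
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        lcas.add(node)
    # Keep only LCAs that have no other LCA inside their subtree.
    return sorted(n for n in lcas if not any(is_proper_ancestor(n, m) for m in lcas))
```

    With matches (0,0,0), (0,1,0) for one keyword and (0,0,1), (0,2) for another, the candidate LCAs are (0,0) and (0,). Only (0,0) survives, because (0,) is its ancestor.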
     2009.06.13  Venue: FL1, Meeting Room, Information Building
     (XML Group) Query Processing over Graph-structured XML Data 
    Abstract:
    When XML documents are modeled as graphs, many research issues arise. In particular, there are many new challenges in query processing on graph-structured XML documents because traditional query processing techniques for tree-structured XML documents cannot be directly applied.
     (Mobile Group) MVCC on Flash Memory [ppt]
    Abstract:
    First, flash memory performs out-of-place updates, which leads to multiple versions of data on flash. Second, I introduce the basic principles and some protocols of MVCC, such as MVSR, MVCR, MVTO and MV2PL. Finally, I present some details of transaction processing in BDB and PG.
     2009.06.06  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Location, Location, Location 
    Abstract:
    This talk discusses the keynote given by Christian S. Jensen at MDM 2009.
     (Web Group) C-Query: Context-based Query in Personal DataSpace 
    Abstract:
    Many users need to refer to content in existing files (pictures, tables, emails, web pages, etc.) when they write documents (programs, presentations, proposals, etc.), and they often need to revisit the referenced files for review, revision or reconfirmation. In this paper, we propose an efficient method that lets users revisit these referenced files by identifying a context-based reference relationship.

     2009.05.23  Venue: FL1, Meeting Room, Information Building
     (XML Group) OrientX system development report [ppt]
    Abstract:
    The main features of OrientX3.5 and their implementation.
     2009.05.16  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Random Write Optimization for SSD 
    Abstract:
    Random writes on SSDs have low I/O performance compared with sequential writes and with reads. This paper proposes a novel method to avoid the poor performance of random writes.
     (Mobile Group) buffer management policy [ppt]
    Abstract:
    In this talk, I introduced several interesting buffer management algorithms. Some of them work well on disk-based DBMSs, and others are designed for flash-based DBMSs.
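    The abstract does not list the algorithms covered, so the sketch below illustrates one well-known flash-aware policy, CFLRU (clean-first LRU), rather than anything specific to the talk. Writing a dirty page back to flash costs much more than dropping a clean page. For this reason, CFLRU looks at the `window` least recently used pages and evicts a clean page there if one exists. The class name, the `writebacks` log and the parameters are illustrative assumptions.

```python
from collections import OrderedDict


class CFLRUBuffer:
    """Sketch of Clean-First LRU (CFLRU) page replacement."""

    def __init__(self, capacity: int, window: int):
        self.capacity = capacity
        self.window = window          # size of the clean-first region at the LRU end
        self.pages = OrderedDict()    # page_id -> dirty flag, least recent first
        self.writebacks = []          # dirty pages flushed to flash on eviction

    def access(self, page_id, write=False):
        """Reference a page; a write marks it dirty."""
        if page_id in self.pages:
            dirty = self.pages.pop(page_id) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page_id] = dirty   # (re)insert at the most recently used end

    def _evict(self):
        # Prefer the least recently used clean page inside the window.
        for pid, dirty in list(self.pages.items())[: self.window]:
            if not dirty:
                del self.pages[pid]   # clean page: no flash write needed
                return
        # All pages in the window are dirty: evict the true LRU page.
        pid, _ = self.pages.popitem(last=False)
        self.writebacks.append(pid)
```

    With `window` equal to 1, the policy degenerates to plain LRU. Larger windows save more flash writes but may evict pages that will soon be reused.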

     2009.04.25  Venue: FL1, Meeting Room, Information Building
     (Web Group) An Indexing Framework for Efficient Retrieval on the Cloud [ppt]
    Abstract:
    The emergence of the Cloud system has simplified the deployment of large-scale distributed systems for software vendors. The Cloud system provides a simple and unified interface between vendor and user, allowing vendors to focus more on the software itself rather than the underlying framework. Existing Cloud systems seek to improve performance by increasing parallelism. This paper explores an alternative solution: an indexing framework for the Cloud system based on a structured overlay. The framework reduces the amount of data transferred inside the Cloud and facilitates the deployment of database back-end applications.
     (Web Group) Data Management in the Cloud - Limitations and Opportunities [ppt]
    Abstract:
    Analysed which data management applications are suitable for moving to the cloud platform and discussed the remaining challenges of such a move.
     2009.04.18  Venue: FL1, Meeting Room, Information Building
     (XML Group) MCN: A New Semantics Towards Effective XML Keyword Search [ppt]
    Abstract:
    In this talk, we propose a new XML keyword search semantics that aims to capture meaningful results while avoiding meaningless ones. This contribution is based on the observation that, when it comes to relationships between data elements, users' query intention is always based on the relationships among real-world entities.
     (Web Group) Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration [ppt]
    Abstract:
    In Deep Web data integration, some Web database interfaces express exclusive predicates, which permit only one predicate to be selected at a time. Accurately and efficiently estimating the selectivity of each exclusive query Qe is critical to optimal query translation. In this paper, we mainly focus on selectivity estimation for infinite-value attributes, which is more difficult than for key attributes and categorical attributes. We start with two observations
     2009.04.11  Venue: FL1, Meeting Room, Information Building
     (Web Group) Summary of ICDE2009 keynotes [ppt]
    Abstract:
    These slides summarize the three keynotes of ICDE 2009.
     (Mobile Group) ICDE 2009 Introduction 
    Abstract:
    ICDE is a very important international conference on data management. This year's conference featured many works related to flash-based databases, and transactions have become an important topic in this field.
     (Flash Group) Demo in ICDE 2009 Conference [ppt]
    Abstract:
    WEST (Web Entity Search Technologies) does not return web pages related to any person who happens to have the queried name. Instead, it outputs a set of clusters of web pages, one cluster per distinct person. Fa is a new system for automated diagnosis of system failures, designed to address SLO violations. UQLIPS is a Web-based integrated platform that performs online detection of near-duplicate occurrences over continuous video streams, as well as retrieval of near-duplicate clips from segmented video collections.
     2009.04.04  Venue: FL1, Meeting Room, Information Building
     (Mobile Group) Distortion-based Anonymity towards Continuous Query in Mobile Services 
    Abstract:
    Privacy preservation has recently received considerable attention for location-based mobile services, and many location cloaking approaches have been proposed to protect the location privacy of mobile users. In this paper, we present the continuous query privacy disclosure and the poor QoS that result from anonymizing continuous queries.
     (Mobile Group) Complex Event Detection in Pervasive Computing 
    Abstract:
    In pervasive computing environments, wide deployment of sensor devices has generated an unprecedented volume of atomic events. However, most applications such as healthcare, surveillance and facility management, as well as environmental monitoring require such events to be filtered and correlated for complex event detection. Therefore how to extract interesting, useful and complex events from low-level atomic events is becoming more and more important in daily life. Due to the increasing importance of complex event detection, this paper proposes a framework of Complex Event Detection and Operation (CEDO) in pervasive computing. It gives an event model and extends current detection by incorporating temporal and spatial settings of events and different levels of granularity for event representation. We first show research issues, related works, and main research problems in this area. Then our current research works and the preliminary results are introduced. Finally, the research plan of my PhD project is presented for discussion.

     2009.03.28  Venue: FL1, Meeting Room, Information Building
     (Web Group) Deep Web Integration: Querying Structured Data on the Deep Web [ppt]
    Abstract:
    In this report, I will introduce the background of the Deep Web, the key technologies of Deep Web data integration and the active research groups. Then I will compare the metaquerier with metasearch engines. Finally, I will outline open research problems for the future.
     (Web Group) database selection 
    Abstract:
    Database selection is an important topic. This report introduces database selection and then presents our new problem.
     2009.03.21  Venue: FL1, Meeting Room, Information Building
     (Web Group) CoreSpace: A personal dataspace framework based on user activity 
    Abstract:
    Presents a new personal dataspace framework that highlights the relationships between users and ordinary objects, providing more effective ways of querying a personal dataspace.
     (Web Group) An efficient method to Identify personal task 
    Abstract:
    Presents a new method to identify personal tasks based on user access activity.
     2009.03.14  Venue: FL1, Meeting Room, Information Building
     (Cloud Computing) Research Report on Map/Reduce Framework Based on Hadoop [ppt]
    Abstract:
    Map/Reduce is the core programming model of Hadoop. It is a simple but powerful model for solving problems over massive data. In this report, I will introduce the concepts of Hadoop and Map/Reduce, and then explain in detail how the Map/Reduce framework executes jobs.
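    The flow of a Map/Reduce job can be simulated in a single Python process with the classic word-count example. This is an illustration of the programming model only. It does not use Hadoop's actual Java API, and the function names are hypothetical. In Hadoop, the framework runs map tasks in parallel over input splits and performs the shuffle across the network before the reduce tasks run.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    """Run map over every document, shuffle, then reduce each key group."""
    intermediate = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

    For instance, `word_count(["a b a", "b c"])` yields `{"a": 2, "b": 2, "c": 1}`.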
     (Web Group) Introduction to HBase [ppt]
    Abstract:
    As a sub-project of Hadoop, HBase focuses on providing storage for the Hadoop distributed computing environment. HBase is a column-oriented table store. Its three-layer file system provides a feasible scheme for distributed data storage, while its three-layer architecture solves the problems of region assignment and region location. To give an intuitive understanding of HBase, we compared it with MySQL in tests.
     (Web Group) The Progress of C-DBLP's Development and Future Plans 
    Abstract:
    Since the release of C-DBLP, its development team has added some attractive functions and features to the site based on user feedback and research needs. In addition, we are working on some interesting problems such as name disambiguation and mining relations among authors. This report presented the progress of C-DBLP's development and showed intuitive approaches to the research problems in C-DBLP. We also made a detailed plan for future work on C-DBLP.
     2009.03.07  Venue: FL1, Meeting Room, Information Building
     (Web Group) Study on Fast Approximate Membership Checking 
    Abstract:
    Introduces ISH for approximate membership checking and analyzes its disadvantages. We propose a new index and a corresponding algorithm. Experiments indicate that the new method is more efficient than ISH.
     (XML Group) String Similarity 
    Abstract:
    This report introduces methods for computing string similarity, including edit distance and gram-based similarity.
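    Edit distance is usually computed with the standard dynamic program. The Python sketch below computes Levenshtein distance (unit-cost insertions, deletions and substitutions) while keeping only one row of the table at a time. The function name is illustrative, and the report may use a different cost model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)    # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            )
        prev = cur
    return prev[-1]
```

    For example, `edit_distance("kitten", "sitting")` is 3: two substitutions and one insertion.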

     2009.02.28  Venue: FL1, Meeting Room, Information Building
     (Web Group) Faceted Search [ppt]
    Abstract:
    An introduction to faceted search, covering its evolution, its differences from navigational search and direct search, and the differences among clusters, tags and facets.
     (Web Group) Automatic Construction of Facet Hierarchies 
    Abstract:
    Facet hierarchies are the main form of data organization in facet search systems. They support facet-based navigation and let users refine search results through different facets. The construction of facet hierarchies is one of the most important research topics in facet search. Since most facet hierarchies in current systems are built manually, automatic construction methods are greatly needed. This presentation covered W. Dakka and P. G. Ipeirotis's research progress on the automatic construction of facet hierarchies.

     2009.01.11  Venue: FL1, Meeting Room, Information Building
     (XML Group) Survey of XML Database Technology [ppt]
    Abstract:
    In this talk, I present the main topics in XML databases and explain existing solutions with simple examples.
     (XML Group) Graph Databases 
    Abstract:
    This presentation introduces some research hotspots in graph databases, including index construction, containment query processing and reachability query answering.



    Seminars in 2008 and before