The development of digital and Internet technology has brought new characteristics to data management. The data people face today is typically diverse, unstructured, and heterogeneous, and conventional data management technologies, which depend on a fixed schema, fit these characteristics poorly. Dataspace is a new data management concept that focuses on modeling these characteristics. At the same time, the explosion of personal information has made managing personal data an urgent task. Besides sheer quantity, high heterogeneity is another characteristic of personal data, which makes it natural to combine dataspace research with personal data management. The result is the topic of Personal Dataspace Management, which is the idea behind the OrientSpace prototype system: a Personal Dataspace Management System.
OrientSpace is built to help users better manage large quantities of highly heterogeneous personal data. The system discovers and uses as much association information as possible, because we believe associations are valuable and important to users. In particular, we address several research problems:
- flexible query ability over unstructured personal data.
- efficient management of heterogeneous personal data.
- lightweight, pay-as-you-go personal data integration.
To see why you might want to try OrientSpace, consider how you would solve the following problems with existing tools:
- You are trying to find a certain document but can't remember its name or exact text; you only remember things related to it, such as other files or e-mails.
- You have acquired new information about a contact (such as the history of their workplace) and want to store it with the attributes available in the "contact function" of Outlook. You cannot, because Outlook's set of contact attributes cannot be changed.
- You have a lot of work going on every day and want to find the most important files for your current work as quickly as possible. All you can do is check the "Recent Used Files" list, but some of the files you want are not in it.
- You have many tasks going on simultaneously, and the files related to one task may span several directories, so it always takes a long time to locate the files you need for a given task.
OrientSpace is built to tackle problems in Personal Information Management including but not limited to those stated above.
Search your data by content-based associations
Most PIM applications let users issue keyword searches to find the files they want, but there are occasions when users don't remember the exact content of a file or don't know which keywords to enter.
If you run into this situation, OrientSpace can help with content-based associations among files. First, use keyword search to find some files that you think are related to the file you want. Then use the content-based associations to browse the files related to those search results. You may find the file you want among the related files; if not, repeat the process until you succeed.
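The search-then-browse loop can be illustrated with a minimal Python sketch. It is not OrientSpace's implementation: the text does not say how content-based associations are computed, so this sketch assumes a simple Jaccard overlap of word sets, and the function names, the `threshold` parameter, and the sample files are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased set of words in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def keyword_search(files, keyword):
    """Names of files whose content contains the keyword as a word."""
    kw = keyword.lower()
    return sorted(name for name, text in files.items() if kw in tokens(text))

def related_files(files, name, threshold=0.2):
    """Files associated with `name` by content overlap, most similar first."""
    base = tokens(files[name])
    scored = [(jaccard(base, tokens(text)), other)
              for other, text in files.items() if other != name]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored if score >= threshold]

# Hypothetical personal files: the user remembers a meeting e-mail,
# but not the name or wording of the plan document they actually want.
files = {
    "budget_mail.eml": "quarterly budget review meeting for project apollo",
    "apollo_plan.doc": "project apollo schedule and budget plan",
    "holiday.txt": "flight tickets and hotel booking",
}
hits = keyword_search(files, "meeting")    # step 1: keyword search
neighbours = related_files(files, hits[0]) # step 2: browse associations
print(hits, neighbours)
```

Here the keyword only matches the e-mail, and the association step surfaces the plan document that shares most of its vocabulary. Calling `related_files` again on a neighbour corresponds to repeating the process.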
Manage evolving data with a flexible schema
One of the challenges of PIM today is that data schemas sometimes evolve over time. For instance, you may keep getting new information about a person, but the contact attributes in Outlook never change, so you have to find other ways to store it.
In OrientSpace, users can manage their data schemas flexibly, creating and modifying them as they wish in a lightweight way. Typically, when you get a new piece of information about something, you add a new attribute and insert the new information. The new attribute becomes part of the schema, so data inserted later can use it as well.
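As a rough illustration of this behavior, the Python sketch below keeps a schema that grows whenever a new attribute is used. The class `FlexibleCollection` and its methods are hypothetical names chosen for the example; the actual OrientSpace data model is not described in this text.

```python
class FlexibleCollection:
    """A set of records whose schema can grow at any time (illustrative only)."""

    def __init__(self, name, attributes):
        self.name = name
        self.schema = list(attributes)  # ordered list of known attributes
        self.records = []

    def add_attribute(self, attribute):
        """Extend the schema; existing records simply have no value for it yet."""
        if attribute not in self.schema:
            self.schema.append(attribute)

    def insert(self, **values):
        """Insert a record; attributes not seen before are added to the schema."""
        for attribute in values:
            self.add_attribute(attribute)
        record = dict(values)
        self.records.append(record)
        return record

    def set_value(self, record, attribute, value):
        """Attach a value to an existing record, extending the schema if needed."""
        self.add_attribute(attribute)
        record[attribute] = value

    def get(self, record, attribute):
        """Value of a schema attribute for a record, or None if not set."""
        if attribute not in self.schema:
            raise KeyError(attribute)
        return record.get(attribute)

contacts = FlexibleCollection("contact", ["name", "email"])
alice = contacts.insert(name="Alice", email="alice@example.org")

# New information arrives: add an attribute instead of working around a fixed schema.
contacts.set_value(alice, "previous_employer", "Example Corp")

# Later insertions see the new attribute as part of the schema.
bob = contacts.insert(name="Bob", previous_employer="Another Corp")
print(contacts.schema)
```

The design choice in this sketch is that records store only the values they have, so extending the schema never requires rewriting existing data.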
Locate important data quickly using CoreSpace
Nowadays people keep a lot of documents on their computers, but only a fraction of those files is important to the user during any given period. By important, we mean that the user has accessed the file frequently in the recent past, so it matters more to the user than other files.
We implemented an algorithm that builds the CoreSpace of a user's whole dataspace. The CoreSpace records the most important files so that users can find their frequently used files quickly.
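The CoreSpace algorithm itself is not described here. As a hedged illustration of "frequently accessed recently", the Python sketch below scores each file by its accesses, with older accesses counting less through an exponential decay, and keeps the top `k` files. The function name `core_space`, the decay formula, the half-life, and the log format are all assumptions made for the example.

```python
from collections import defaultdict

def core_space(access_log, now, k=2, half_life_days=7.0):
    """Return the k files with the highest recency-weighted access count.

    access_log: iterable of (path, day) pairs, where day is a day number.
    Each access contributes 0.5 ** (age / half_life_days), so an access
    from one half-life ago counts half as much as one made today.
    """
    scores = defaultdict(float)
    for path, day in access_log:
        age = now - day
        scores[path] += 0.5 ** (age / half_life_days)
    # Highest score first; ties are broken by path for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:k]]

# Hypothetical access log: (file, day on which it was opened).
log = [
    ("old_report.doc", 1), ("old_report.doc", 2), ("old_report.doc", 3),
    ("paper.tex", 29), ("paper.tex", 30),
    ("slides.ppt", 30),
]
top = core_space(log, now=30)
print(top)
```

In this example `old_report.doc` has the most accesses overall, but they are about four weeks old, so the two recently used files rank above it.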
Manage Disordered Data with Automatically Organized Tasks
Nowadays the most commonly used way of organizing personal data and files is a hierarchy, in which each file belongs to one directory chosen by the user. Working with such a hierarchy is not very efficient, however, because its structure does not always agree with the structure of the "tasks" in the user's mind, which is people's most natural way of organizing data. When the two disagree, it is hard to find the data you want efficiently.
Based on this observation, OrientSpace organizes user data files around the concept of a task. We define a task as a set of files related to a common piece of the user's work, e.g., writing a paper or making a presentation. We extract tasks by analyzing the user's behavior log, and on top of them we provide various task-related services, including dynamic evolution of tasks, task browsing and search, and interactive task maintenance. We also integrate tasks into the "search by association" feature introduced above, which means you can explore both files and tasks by association.
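One simple way to picture task extraction from a behavior log is to group files that are opened close together in time. The Python sketch below does this with a union-find structure. It is an assumption-laden illustration, not the extraction method used by OrientSpace: the log format, the function name `extract_tasks`, and the `max_gap` threshold are all hypothetical.

```python
def extract_tasks(log, max_gap=30):
    """Group files into tasks by co-access in time.

    log: iterable of (minute, path) access events.
    Two consecutive accesses at most max_gap minutes apart are assumed
    to belong to the same task; tasks are the resulting connected groups.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    previous = None
    for minute, path in sorted(log):
        find(path)  # register the file even if it is never linked
        if previous is not None and minute - previous[0] <= max_gap:
            union(previous[1], path)
        previous = (minute, path)

    groups = {}
    for path in list(parent):
        groups.setdefault(find(path), set()).add(path)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical behavior log: (minute of the day, file opened).
log = [
    (0, "paper.tex"), (5, "refs.bib"), (12, "figure1.png"),
    (200, "slides.ppt"), (210, "demo.mov"),
    (400, "refs.bib"), (410, "paper.tex"),
]
tasks = extract_tasks(log)
print(tasks)
```

The files in each task may live in different directories, which is the point of organizing by task rather than by hierarchy. A known limitation of this sketch is that one file shared between two working sessions merges them into a single task, so a real system would need a more careful notion of correlation.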
OrientSpace is an open source system. The following steps will guide you:
- Extract the package to a directory whose path contains ONLY English characters.
- Note that OrientSpace runs on Windows XP.
- It also requires Java 1.6, which you can download separately.
- You can follow our Demonstration or read the README in the download package.
Download the OrientSpace software packages:
Publications
- Y. Li, X. Meng, X. Zhang: Research on Dataspace. Journal of Software, 2008, 19(8): 2018-2031. DOI: 10.3724/SP.J.1001.2008.02018.
- Y. Li, X. Meng: Research on Personal Dataspace Management. The Second SIGMOD PhD Workshop on Innovative Database Research (IDAR2008), Vancouver, BC, Canada, June 10-12, 2008.
- X. Zhang, J. Chen, Y. Li, X. Meng: TEXEM: An Entity-based Task Extraction Approach for Emails. Journal of Computer Research and Development, Vol. 45 Suppl. (NDBC2008, Guilin).
- X. Min, H. Wang, J. Yin, X. Meng: Integrity Auditing of Outsourced Data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB2007), pages 782-793, Vienna, Austria, September 24-28, 2007.
- Z. Bao, T. W. Ling, B. Chen, J. Lu: Effective XML Keyword Search with Relevance Oriented Ranking. ICDE 2009.
- A. Behm, S. Ji, C. Li, J. Lu: Space-Constrained Gram-Based Indexing for Efficient Approximate String Search. ICDE 2009.
- X. Min, H. Wang, J. Yin, X. Meng: Providing Freshness Guarantees for Outsourced Databases. In Proceedings of the 11th International Conference on Extending Database Technology (EDBT2008), pages 323-332, Nantes, France, March 25-30, 2008.
- J. Zhou, X. Meng, T. Ling: Efficient Processing of Partially Specified Twig Pattern Queries. Science in China Series F: Information Sciences.
- Z. Wang, J. Ai, X. Meng: A Data Driven Approach for Automatic Wrapper Generation and Maintenance. Journal of Computer Research and Development, Vol. 43, Suppl., 2008.11 (NDBC2008, Guilin).
- W. Liu, X. Meng, W. Meng: A Survey of Deep Web Data Integration. Chinese Journal of Computers, Vol. 30, No. 9, pp. 1475-1489, 2007.
- J. Zhou, X. Meng, X. Zhang, J. Huang: Keyword Based Multiple Query Processing over XML Streams. Journal of Computer Research and Development, Vol. 44, pp. 374-378, 2007.
- J. Huang, J. Xu, J. Zhou, X. Meng: MLCEA: An Entity Based Semantics for XML Keyword Search. Journal of Computer Research and Development, Vol. 45 Suppl., pp. 372-377, 2008.10 (NDBC2008, Guilin).
Xiaofeng Meng (xfmeng AT ruc.edu.cn)
Yukun Li (liyukun AT ruc.edu.cn)
Xiangyu Zhang (zhangxy AT live.com)
Yubo Kou (k3k3k33 AT 163.com)
Jing Zhao (zmfeiyinggtxy AT 163.com)
Bingbing Liu (rucbing AT gmail.com)