(Native XML Database
XML Group, WAMDM, Renmin University of China
|ARCHITECTURE OF ORIENTX|
|OrientX adopts a client-server architecture. The client provides graphical interfaces for users to manage and retrieve data. The server provides an API for accessing the database. Communication between the two is implemented with sockets. The overall architecture of OrientX is shown in Figure 1. We briefly introduce some modules here; the more important modules are covered in the following sections.|
|File Manager: The underlying file manager communicates with the file system to create, delete, open, and close data files, in units of a fixed size such as 8 MB.
Storage Manager: The storage manager manages the storage space of a file in units of physical pages, whose size is set to 8 KB. Its main tasks include allocating and freeing physical pages, creating and deleting datasets, etc.
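As a rough illustration of the page-level bookkeeping described above, the following Python sketch assumes an 8 MB file unit divided into 8 KB pages and a simple free list for reuse. The class and method names (PageAllocator, allocate_page, free_page, locate) are hypothetical and are not OrientX's actual interface.

```python
FILE_UNIT_SIZE = 8 * 1024 * 1024              # 8 MB file unit (from the text)
PAGE_SIZE = 8 * 1024                          # 8 KB physical page (from the text)
PAGES_PER_UNIT = FILE_UNIT_SIZE // PAGE_SIZE  # 1024 pages per file unit


class PageAllocator:
    """Hypothetical allocator: hands out page numbers and reuses freed ones."""

    def __init__(self):
        self.next_page = 0     # next never-used page number
        self.free_pages = []   # page numbers returned by free_page()

    def allocate_page(self):
        # Prefer a previously freed page before growing the file.
        if self.free_pages:
            return self.free_pages.pop()
        page_no = self.next_page
        self.next_page += 1
        return page_no

    def free_page(self, page_no):
        self.free_pages.append(page_no)

    @staticmethod
    def locate(page_no):
        """Map a page number to (file unit index, byte offset in that unit)."""
        return page_no // PAGES_PER_UNIT, (page_no % PAGES_PER_UNIT) * PAGE_SIZE
```

Under these assumptions, page 1025 would fall in the second file unit at byte offset 8192.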
Buffer Manager: The buffer mechanism has two layers: the lower layer is the page buffer, and the higher layer is the record buffer. As in an RDBMS, the page buffer manager manages physical pages with the LRU (Least Recently Used) method. Unlike in an RDBMS, a record in OrientX is a tree structure that must be generated from a byte stream, which costs some CPU time. The record buffer caches these tree structures to reduce generation time. Another main goal of the record buffer is to enable OrientX to query large documents. Through the record buffer, documents can be read in pieces (records), and unoccupied records can be freed to accommodate new ones. In the OrientX system, the record buffer is called treefrog, meaning that the current cursor can jump from record to record on the XML tree.
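The LRU policy of the page buffer can be sketched as follows. This is a minimal Python illustration, not OrientX's implementation: the PageBuffer class, its capacity parameter, and the read_page callback are all assumptions introduced for the example.

```python
from collections import OrderedDict


class PageBuffer:
    """Hypothetical LRU page buffer holding at most `capacity` pages."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # assumed callback: page_no -> bytes
        self.pages = OrderedDict()   # page_no -> bytes, least recent first

    def get(self, page_no):
        if page_no in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Miss: evict the least recently used page if the buffer is full.
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)
        data = self.read_page(page_no)
        self.pages[page_no] = data
        return data
```

A record buffer could follow the same pattern, caching decoded tree records instead of raw pages.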
Access Manager: The access manager provides a uniform access interface to data manager, index manager, and schema manager. Details of the buffer manager and storage manager are hidden.
Data Manager: The data manager provides functions for importing and exporting documents, retrieving the root of a document, etc. It formats a record (a memory object) into a byte stream and back.
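To make the conversion between a tree record and a byte stream concrete, here is a small Python sketch. The record shape (tag, text, children) and the length-prefixed layout are assumptions chosen for illustration; the source does not describe OrientX's actual on-disk format.

```python
import struct


def encode(record):
    """Encode (tag, text, children) as: tag, text, child count, children."""
    tag, text, children = record
    out = b""
    for s in (tag, text):
        raw = s.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw   # 4-byte length prefix
    out += struct.pack(">I", len(children))
    for child in children:
        out += encode(child)
    return out


def decode(buf, pos=0):
    """Inverse of encode; returns (record, position after the record)."""
    fields = []
    for _ in range(2):
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        fields.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    children = []
    for _ in range(count):
        child, pos = decode(buf, pos)
        children.append(child)
    return (fields[0], fields[1], children), pos
```

Decoding rebuilds the whole subtree, which is the CPU cost that a cache of decoded records is meant to avoid.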
Schema Manager: A schema-independent system can import XML data without a schema. To accelerate query processing, however, such a system needs to extract the schema from the data, and the extracted schema may become even larger and more complex than the data itself. Moreover, an extracted schema does not constrain the data, which limits its use in cases such as type checking in queries and updates. Like a traditional database, OrientX is schema-based. The schema strictly constrains the type and structure of the data, so data retrieval, updating, and storage are all guided by the schema. Schema information can be used in data layout, index selection, type checking, user access control, and query optimization. The schema in OrientX is consistent with the XML Schema standard. Schema information is stored as a special dataset in the database. Because the schema is stored as a tree structure, it is itself semi-structured, so it can restrict XML data without breaking the features of XML data. The schema manager provides a uniform interface for other modules to access schema information.
Data Processor: The data processor includes the query evaluator and the data updater. The former is described in Section 5; here we briefly introduce the latter. In an RDBMS, relationships between records are represented by foreign keys, and in an OODBMS, relationships between objects are represented by object containment. XML supports both: identity references and nested structure. OrientX maintains referential integrity during updates. When a complex element is deleted, all of its nested elements and values are removed. When an element referenced by other elements is deleted, the corresponding references are found through the value index and then deleted. Deleting a reference directly is also supported.
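The deletion rules above can be sketched in Python as follows. The Store class is a hypothetical in-memory stand-in, and its ref_index dictionary only plays the role of the value index mentioned in the text; none of these names come from OrientX.

```python
class Store:
    """Hypothetical in-memory stand-in for the element store."""

    def __init__(self):
        self.parent = {}     # element id -> parent id (None for a root)
        self.children = {}   # element id -> list of nested element ids
        self.refs = {}       # element id -> set of ids it references
        self.ref_index = {}  # target id -> set of elements referencing it

    def add(self, eid, parent=None):
        self.parent[eid] = parent
        self.children[eid] = []
        self.refs[eid] = set()
        if parent is not None:
            self.children[parent].append(eid)

    def add_ref(self, source, target):
        self.refs[source].add(target)
        self.ref_index.setdefault(target, set()).add(source)

    def delete_ref(self, source, target):
        # Deleting a reference directly.
        self.refs[source].discard(target)
        self.ref_index.get(target, set()).discard(source)

    def delete(self, eid):
        # Remove all nested elements first.
        for child in list(self.children[eid]):
            self.delete(child)
        # Remove references pointing at this element, found via the index.
        for source in self.ref_index.pop(eid, set()):
            self.refs[source].discard(eid)
        # Remove this element's own outgoing references from the index.
        for target in list(self.refs[eid]):
            self.delete_ref(eid, target)
        parent = self.parent.pop(eid)
        if parent is not None:
            self.children[parent].remove(eid)
        del self.children[eid], self.refs[eid]
```

For example, deleting an author element that a book element references would remove the author's nested elements and clear the book's reference to it.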
In our storage prototype, elements are stored as variable-length records. Each record holds a pointer to its parent record or its neighboring sibling record. Records may change address when their contents grow or shrink during update operations, which in turn changes the pointers to them. To reduce pointer modifications, we introduce the oid (object id). Each element has a unique oid, and an oid table stores each oid with its corresponding storage address. A record stores the oids of its parent and children as pointers rather than their storage addresses. Therefore, if the storage address of a record changes due to an update, we only need to update the oid table.
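A minimal Python sketch of this indirection follows. The OidTable class and the (page number, slot) address form are assumptions made for the example, not details given in the source.

```python
class OidTable:
    """Hypothetical oid table: oid -> current storage address."""

    def __init__(self):
        self.address = {}
        self.next_oid = 1

    def register(self, addr):
        oid = self.next_oid
        self.next_oid += 1
        self.address[oid] = addr
        return oid

    def move(self, oid, new_addr):
        # Only the table entry changes; records holding this oid are untouched.
        self.address[oid] = new_addr

    def resolve(self, oid):
        return self.address[oid]


table = OidTable()
parent_oid = table.register((3, 0))             # (page number, slot), assumed
child = {"oid": table.register((3, 1)), "parent": parent_oid}
table.move(parent_oid, (7, 2))                  # parent relocated by an update
parent_addr = table.resolve(child["parent"])    # child record was not rewritten
```

The cost of this design is one extra table lookup per pointer traversal, in exchange for not rewriting every record that points at a relocated one.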
To reduce address changes for updated records, we set a preserve factor on each page to reserve space for record updates. We also supply a garbage collection mechanism for space reuse.
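One way to read the preserve factor is as a fraction of each page that new insertions may not use, so that existing records can grow in place. The Python sketch below follows that interpretation; the Page class, the default factor of 0.2, and the method names are assumptions, since the source does not give concrete values or interfaces.

```python
PAGE_SIZE = 8 * 1024  # 8 KB page (from the text)


class Page:
    """Hypothetical page that reserves a fraction of its space for updates."""

    def __init__(self, preserve_factor=0.2):  # 0.2 is an assumed default
        self.insert_limit = int(PAGE_SIZE * (1 - preserve_factor))
        self.used = 0

    def insert(self, size):
        # New records may only fill the page up to the insert limit.
        if self.used + size > self.insert_limit:
            return False
        self.used += size
        return True

    def grow(self, extra):
        """Grow an existing record in place; may use the reserved space."""
        if self.used + extra > PAGE_SIZE:
            return False   # the record would have to move to another page
        self.used += extra
        return True
```

A larger factor wastes more space on pages that are rarely updated but makes it less likely that a growing record has to change its address.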