Databases on the Web and Semi Structured Data-ADBMS

Overview of XML

XML stands for Extensible Markup Language.
XML has been proposed as a possible model for data storage and retrieval.
XML can be used to provide information about the structure and meaning of the data in Web pages, rather than just specifying how the pages are formatted for display on the screen.

Structure of XML data

An XML document has a hierarchical structure. It must contain a root element, which in turn contains sub-elements, text and attributes.
An XML document starts with the root element and branches down to the lowest level of elements.
The XML specification has two key points: 1) The start, end and empty-element tags that delimit the elements are properly nested, with none missing and none overlapping. 2) A single root element contains all the other elements.
An XML tree can be represented graphically.
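
The two key points above can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags under a single root element.
good = "<product><name>Pen</name><price>10</price></product>"
# Overlapping tags: <name> is closed while <price> is still open.
overlapping = "<product><name>Pen<price></name>10</price></product>"
# Two top-level elements, so there is no single root.
two_roots = "<name>Pen</name><price>10</price>"

for sample in (good, overlapping, two_roots):
    print(is_well_formed(sample))
```

Only the first sample satisfies both rules; the parser rejects the other two.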

Example:

Product
-------Name
-------Details
-----Price
-----Description
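
The same hierarchy can be written as an XML document and walked by a program. A minimal sketch in Python, assuming that Price and Description are sub-elements of Details (the outline does not make the nesting explicit) and using invented sample values and an invented id attribute:

```python
import xml.etree.ElementTree as ET

# Assumed nesting and made-up values, for illustration only.
DOC = """
<Product id="p1">
  <Name>Pen</Name>
  <Details>
    <Price>10</Price>
    <Description>Blue ballpoint pen</Description>
  </Details>
</Product>
"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC)
print("\n".join(outline(root)))
print(root.attrib)                      # attributes of the root element
print(root.find("Details/Price").text)  # text content of a sub-element
```

The root element, its sub-elements, an attribute and a text value each appear once here, which covers the four building blocks named above.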

Document schema

A document schema is commonly written as an XML Schema Definition (XSD).
It describes the structure of an XML document.
An XML Schema defines elements, attributes and data types.
Types in an XML Schema can be defined in three ways:
simple type, complex type and global type.
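
As a rough illustration of the three kinds of type, the sketch below embeds a small hypothetical XSD in a Python string and inspects it with the standard library. The schema, its element names and the name DetailsType are assumptions made for this example only:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema for a product document; names and types are assumptions.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global type: named, declared at the top level, reusable by any element -->
  <xs:complexType name="DetailsType">
    <xs:sequence>
      <xs:element name="Price" type="xs:decimal"/>
      <xs:element name="Description" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Product">
    <!-- Complex type: an element that contains other elements or attributes -->
    <xs:complexType>
      <xs:sequence>
        <!-- Simple type: an element that holds only text of a built-in type -->
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Details" type="DetailsType"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(XSD)

# An XSD is itself an XML document, so it can be inspected like any other.
declared = {e.get("name"): e.get("type") for e in schema.iter(f"{{{XS}}}element")}
global_types = [t.get("name") for t in schema.findall(f"{{{XS}}}complexType")]
print(declared)
print(global_types)
```

Python's standard library does not validate documents against an XSD, so this sketch only reads the schema; actual validation would need a separate validating parser, which is not shown here.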

Global Type:

Querying XML data

XQuery is a language designed to query XML data.
XQuery finds and extracts elements and attributes from XML documents.
Example:
for $x in doc("product.xml")/store/products
where $x/price>100
order by $x/name
return $x/name
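
For readers without an XQuery processor, the four clauses of the query can be mirrored step by step in Python. This is only a sketch: the contents of product.xml are not given in these notes, so the sample document below (each products element holding a name and a price) is an assumption, and the function returns the text of each name rather than the element itself:

```python
import xml.etree.ElementTree as ET

# Assumed layout of product.xml; the sample data is made up.
DOC = """
<store>
  <products><name>Laptop</name><price>900</price></products>
  <products><name>Pen</name><price>10</price></products>
  <products><name>Desk</name><price>150</price></products>
</store>
"""

def expensive_names(root, threshold=100):
    """Mirror the query clauses: for, where, order by, return."""
    matches = [
        p for p in root.findall("products")            # for $x in /store/products
        if float(p.findtext("price")) > threshold      # where $x/price > 100
    ]
    matches.sort(key=lambda p: p.findtext("name"))     # order by $x/name
    return [p.findtext("name") for p in matches]       # return $x/name (text only)

root = ET.fromstring(DOC)
print(expensive_names(root))
```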

Storage of XML data

An XML database is used to store large amounts of data in XML format.
XML documents stored in the database can be queried using XQuery.
There are two types of XML database: 1) XML-enabled 2) Native XML (NXD).
An XML-enabled database converts the XML document to a relational form: the data is stored in tables, as rows and columns.
A native XML database (NXD) uses containers instead of tables to hold the data, and can hold large volumes of XML documents and data. It is queried with XPath expressions.
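
The XML-enabled approach can be sketched with Python's built-in sqlite3 module: each element of the document becomes one row of a table. The table layout, the function name shred and the sample document are assumptions for illustration, not a description of any particular product:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up document to be stored.
DOC = """
<products>
  <product id="1"><name>Pen</name><price>10</price></product>
  <product id="2"><name>Desk</name><price>150</price></product>
</products>
"""

def shred(conn, xml_text):
    """Map each <product> element to one row of a relational table."""
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    rows = [
        (int(p.get("id")), p.findtext("name"), float(p.findtext("price")))
        for p in ET.fromstring(xml_text).iter("product")
    ]
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
stored = shred(conn, DOC)
result = conn.execute("SELECT name FROM product WHERE price > 100").fetchall()
print(stored, result)
```

Once the data is in rows and columns it is queried with ordinary SQL; the original nesting of the document is no longer stored as such.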

XML applications

http://xml.coverpages.org/xml.html#applications

The semi-structured data model

A data model organizes elements of data and describes how they relate to one another.
The semi-structured model is a database model in which the data has some structure.
It lacks a fixed or rigid schema.
The data does not fit the tables of a relational database directly, but it has organisational properties that help in analysis.
With some processing, semi-structured data can still be stored in a relational database.
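
One simple way to keep such data in a relational database is to serialise each record into a text column. A minimal sketch with Python's json and sqlite3 modules; the records and the table name item are invented for the example:

```python
import json
import sqlite3

# Made-up records: each carries its own field names, and the fields differ.
records = [
    {"type": "email", "from": "a@example.com", "subject": "Hello"},
    {"type": "page", "url": "http://example.com", "title": "Home", "links": 3},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO item (body) VALUES (?)",
                 [(json.dumps(r),) for r in records])

# The structure is recovered from the data itself when a row is read back.
loaded = [json.loads(body) for (body,) in conn.execute("SELECT body FROM item ORDER BY id")]
for rec in loaded:
    print(sorted(rec))
```

The table has a fixed schema (an id and a text body), while the records inside it keep their own varying structure.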

Sources of semi-structured data:

Emails
XML and other markup languages
Zipped files
TCP/IP packets
Web pages
Binary executables

Implementation issues

The lack of a fixed or rigid schema makes the data difficult to store.
Queries are less efficient than over structured data.
The data has an irregular structure.
Because the structure of the data is implicit, it is difficult to interpret the relationships between data items.
There is no separate schema that the data is linked to; structural information is mixed in with the data itself.
Storage cost is high.
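
Several of these issues show up even in a tiny example. The sketch below uses made-up records in which a field may be missing or may change type, so the implicit schema has to be inferred from the data and every query has to scan and guard each record:

```python
from collections import Counter

# Made-up records with irregular structure: fields may be missing or differ in type.
records = [
    {"name": "Pen", "price": 10},
    {"name": "Desk", "price": "150", "color": "brown"},
    {"name": "Lamp"},
]

def field_counts(items):
    """Infer the implicit schema: how many records carry each field."""
    return Counter(field for item in items for field in item)

def cheaper_than(items, limit):
    """Scan every record; skip those without a price and normalise its type."""
    result = []
    for item in items:
        price = item.get("price")
        if price is not None and float(price) < limit:
            result.append(item["name"])
    return result

print(field_counts(records))
print(cheaper_than(records, 100))
```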

Indexes for text data

Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.
The index is a type of data structure. It is used to locate and access the data in a database table quickly.
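
The effect of an index can be observed with Python's built-in sqlite3 module by asking the query planner how it would run the same query before and after an index is created. The table, the data and the index name idx_product_name are assumptions for this sketch, and the exact wording of the plan differs between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO product (name, price) VALUES (?, ?)",
                 [(f"item{i}", float(i)) for i in range(1000)])

def plan(conn, sql):
    """Return the planner's description of how it would run sql."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM product WHERE name = 'item500'"
before = plan(conn, query)          # no index on name: the table is scanned
conn.execute("CREATE INDEX idx_product_name ON product (name)")
after = plan(conn, query)           # the planner can now use the index
print(before)
print(after)
```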

If the index is created on the primary key of the table, it is known as primary indexing. Primary key values are unique to each record, so there is a 1:1 relation between index entries and records.
Dense index: the index contains an index record for every search key value in the data file. This makes searching faster, but needs more space to store the index records.
Sparse index: index records appear for only a few items in the data file. Each index record points to a block.
Clustered index: the index is defined on an ordered data file. Sometimes the index is created on non-primary key columns, which may not be unique for each record.
Secondary indexing: another level of indexing is introduced to reduce the size of the mapping.
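
The difference between a dense and a sparse index can be sketched in a few lines of Python. The data file below is made up: sorted records grouped into blocks of three, with in-memory lists standing in for disk blocks:

```python
from bisect import bisect_right

# Made-up data file: sorted records grouped into blocks of three.
BLOCKS = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search key, pointing at (block, slot).
dense = {key: (b, s) for b, block in enumerate(BLOCKS) for s, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in BLOCKS]

def dense_lookup(key):
    """Jump straight to the record through its own index entry."""
    pos = dense.get(key)
    if pos is None:
        return None
    b, s = pos
    return BLOCKS[b][s][1]

def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in BLOCKS[b]:
        if k == key:
            return value
    return None

print(len(dense), len(sparse_keys))
print(dense_lookup(50), sparse_lookup(50))
```

The dense index holds one entry per record and the sparse index one entry per block, so the sparse index is smaller but has to finish each lookup with a scan inside the block.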
