Spatial Database Concepts


Introduction to Spatial Databases


       Spatial databases incorporate functionality that provides support for databases that keep track of objects in a multidimensional space.
       For example, cartographic databases that store maps include two-dimensional spatial descriptions of their objects—from countries and states to rivers, cities, roads, seas, and so on.
       The systems that manage geographic data and related applications are known as Geographical Information Systems (GIS).
       Spatial Database are used in areas such as environmental applications, transportation systems, emergency response systems, and battle management.
       Also they are use as meteorological databases for weather information, are three-dimensional, since temperatures and other meteorological information are related to three-dimensional spatial points.
       In general, a spatial database stores objects that have spatial characteristics that describe them and that have spatial relationships among them.
       The spatial relationships among the objects are important, and they are often needed when querying the database.
       Although a spatial database can in general refer to an n-dimensional space for any n, we will limit our discussion to two dimensions as an illustration.
       A spatial database is optimized to store and query data related to objects in space, including points, lines and polygons.
       Satellite images are a prominent example of spatial data.
       Queries posed on these spatial data, where predicates for selection deal with spatial parameters, are called spatial queries.
       For example, “What are the names of all bookstores within five miles of the College of Computing building at Georgia Tech?” is a spatial query.
       A query such as “List all the customers located within twenty miles of company headquarters” will require the processing of spatial data types typically outside the scope of standard relational algebra and may involve consulting an external geographic database that maps the company headquarters and each customer to a 2-D map based on their address.
       Effectively, each customer will be associated to a position. A traditional B+ -tree index based on customers’ zip codes or other nonspatial attributes cannot be used to process this query since traditional indexes are not capable of ordering multidimensional coordinate data.
        Therefore, there is a special need for databases tailored for handling spatial data and spatial queries.

Spatial Data Types

       Spatial data comes in three basic forms. These forms have become a de facto standard due to their wide use in commercial systems.
       Map Data
       Attribute data
       Image data
       Map Data: Includes various geographic or spatial features of objects in a map, such as an object’s shape and the location of the object within the map.
       The three basic types of features are points, lines, and polygons.
       Points are used to represent spatial characteristics of objects whose locations correspond to a single 2-d coordinate (x, y, or longitude/latitude) in the scale of a particular application.
       Depending on the scale, some examples of point objects could be buildings, cellular towers, or stationary vehicles.
       Moving vehicles and other moving objects can be represented by a sequence of point locations that change over time.
       Lines represent objects having length, such as roads or rivers, whose spatial characteristics can be approximated by a sequence of connected lines.
       Polygons are used to represent spatial characteristics of objects that have a boundary, such as countries, states, lakes, or cities.
       Notice that some objects, such as buildings or cities, can be represented as either points or polygons, depending on the scale of detail.
       Attribute data is the descriptive data that GIS systems associate with map features.
       For example, suppose that a map contains features that represent counties within a US state (such as Texas or Oregon).
       Attributes for each county feature (object) could include population, largest city/town, area in square miles, and so on. Other attribute data could be included for other features in the map, such as states, cities, congressional districts, census tracts, and so on.
       Image data includes data such as satellite images and aerial photographs, which are typically created by cameras. Objects of interest, such as buildings and roads, can be identified and overlaid on these images.
       Images can also be attributes of map features. One can add images to other map features so that clicking on the feature would display the image.
       Aerial and satellite images are typical examples of raster data.

Spatial Models

       Models of spatial information are sometimes grouped into two broad categories: field and object.
       A spatial application (such as remote sensing or highway traffic control) is modeled using either a field- or an object-based model, depending on the requirements and the traditional choice of model for the application.
       Field models are often used to model spatial data that is continuous in nature, such as terrain elevation, temperature data, and soil variation characteristics.
             object models have traditionally been used for applications such as transportation networks, land parcels, buildings, and other objects that possess both spatial and non-spatial attributes.

Spatial Operators

       Spatial operators are used to capture all the relevant geometric properties of objects embedded in the physical space and the relations between them, as well as to perform spatial analysis. Operators are classified into three broad categories:
       Topological operators
                -Projective operators
                -Metric operators
       Dynamic Spatial Operators
       Spatial Queries
       Topological properties are invariant when topological transformations are applied.
       These properties do not change after transformations like rotation, translation, or scaling.
       Topological operators are hierarchically structured in several levels, where the base level offers operators the ability to check for detailed topological relations between regions with a broad boundary, and the higher levels offer more abstract operators that allow users to query uncertain spatial data independent of the underlying geometric data model.
       Examples include open (region), close (region), and inside (point, loop).
       Projective operators: Projective operators, such as convex hull, are used to express predicates about the concavity/convexity of objects as well as other spatial relations (for example, being inside the concavity of a given object).
       Metric operators. Metric operators provide a more specific description of the object’s geometry. They are used to measure some global properties of single objects (such as the area, relative size of an object’s parts, compactness, and symmetry), and to measure the relative position of different objects in terms of distance and direction.
       Examples include length (arc) and distance (point, point).
       Dynamic Spatial Operators. The operations performed by the operators mentioned above are static, in the sense that the operands are not affected by the application of the operation.
       For example, calculating the length of the curve has no effect on the curve itself.
       Dynamic operations alter the objects upon which the operations act. The three fundamental dynamic operations are create, destroy, and update.
       A representative example of dynamic operations would be updating a spatial object that can be subdivided into translate (shift position), rotate (change orientation), scale up or down, reflect (produce a mirror image), and shear (deform).
       Spatial Queries. Spatial queries are requests for spatial data that require the use of spatial operations.
       The following categories illustrate three typical types of spatial queries:
       Range query. Finds the objects of a particular type that are within a given spatial area or within a particular distance from a given location. (For example, find all hospitals within the Metropolitan Atlanta city area, or find all ambulances within five miles of an accident location.)
       Nearest neighbor query. Finds an object of a particular type that is closest to a given location. (For example, find the police car that is closest to the location of crime.)
           ■ Spatial joins or overlays. Typically joins the objects of two types based on some spatial condition,    such as the objects intersecting or overlapping spatially or being within a certain distance of one another. (For example, find all townships located on a major highway between two cities or find all homes that are within two miles of a lake.)

Spatial Data Indexing

       A spatial index is used to organize objects into a set of buckets (which correspond to pages of secondary memory), so that objects in a particular spatial region can be easily located.
       Each bucket has a bucket region, a part of space containing all objects stored in the bucket.
       The bucket regions are usually rectangles; for point data structures, these regions are disjoint and they partition the space so that each point belongs to precisely one bucket.
            
           There are essentially two ways of providing a spatial index
  • 1. Specialized indexing structures that allow efficient search for data objects based on spatial search operations are included in the database system. These indexing structures would play a similar role to that performed by B+-tree indexes in traditional database systems.
  • Examples of these indexing structures are grid files and R-trees. Special types of spatial indexes, known as spatial join indexes, can be used to speed up spatial join operations.
  • 2. Instead of creating brand new indexing structures, the two-dimensional (2-d) spatial data is converted to single-dimensional (1-d) data, so that traditional indexing techniques (B+ -tree) can be used.

            The algorithms for converting from 2-d to 1-d are known as space filling curves. We will not discuss  these methods in detail.

Spatial Indexing Techniques

       Grid Files: The fixed-grid method divides an n-dimensional hyperspace into equal size buckets. The data structure that implements the fixed grid is an n-dimensional array.
       R-Trees: The R-tree is a height-balanced tree, which is an extension of the B+-tree for k-dimensions, where k > 1. For two dimensions (2-d), spatial objects are approximated in the R-tree by their minimum bounding rectangle (MBR), which is the smallest rectangle, with sides parallel to the coordinate system (x and y) axis, that contains the object.
•    Spatial Join Index: A spatial join index precomputes a spatial join operation and stores the pointers to the related object in an index structure. Join indexes improve the performance of recurring join queries over tables that have low update rates.

Spatial Data Mining

       Spatial data tends to be highly correlated. For example, people with similar characteristics, occupations, and backgrounds tend to cluster together in the same neighborhoods.
       The three major spatial data mining techniques are spatial classification, spatial association, and spatial clustering.
       The goal of classification is to estimate the value of an attribute of a relation based on the value of the relation’s other attributes.
       Spatial association rules are defined in terms of spatial predicates rather than items. A spatial association rule is of the form
P1 ^ P2 ^ ... ^ PnQ1 ^ Q2 ^ ... ^ Qm,
                where at least one of the Pi’s or Qj’s is a spatial predicate.
       Spatial Clustering attempts to group database objects so that the most similar objects are in the same cluster, and objects in different clusters are as dissimilar as possible.
       One application of spatial clustering is to group together seismic events in order to determine earthquake faults.
       An example of a spatial clustering algorithm is density-based clustering, which tries to find clusters based on the density of data points in a region.

Applications of Spatial Data


       Spatial data management is useful in many disciplines, including geography, remote sensing, urban planning, and natural resource management.
       Spatial database management is playing an important role in the solution of challenging scientific problems such as global climate change and genomics.
       Due to the spatial nature of genome data, GIS and spatial database management systems have a large role to play in the area of bioinformatics.
       Some of the typical applications include pattern recognition (for example, to check if the topology of a particular gene in the genome is found in any other sequence feature map in the database), genome browser development, and visualization maps.
     •         Another important application area of spatial data mining is the spatial outlier detection. A spatial outlier is a spatially referenced object whose nonspatial attribute values are significantly different   from those of other spatially referenced objects in its spatial neighborhood.

Comments