Deductive Database Concept

Overview


  • In a deductive database system we typically specify rules through a declarative language— a language in which we specify what to achieve rather than how to achieve it.
  • An inference engine (or deduction mechanism) within the system can deduce new facts from the database by interpreting these rules. The model used for deductive databases is closely related to the relational data model, and particularly to the domain relational calculus formalism (see Section 6.6). It is also related to the field of logic programming and the Prolog language.
  • The deductive database work based on logic has used Prolog as a starting point.
  • A variation of Prolog called Datalog is used to define rules declaratively in conjunction with an existing set of relations, which are themselves treated as literals in the language. Although the language structure of Datalog resembles that of Prolog, its operational semantics—that is, how a Datalog program is executed—is still different.
  • A deductive database uses two main types of specifications: facts and rules.
  • Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include the attribute names. Recall that a tuple in a relation describes some real-world fact whose meaning is partly determined by the attribute names.
  • In a deductive database, the meaning of an attribute value in a tuple is determined solely by its position within the tuple.
  • Rules are somewhat similar to relational views. They specify virtual relations that are not actually stored but that can be formed from the facts by applying inference mechanisms based on the rule specifications.
  • The main difference between rules and views is that rules may involve recursion and hence may yield virtual relations that cannot be defined in terms of basic relational views.

Prolog/Datalog Notation

  • The notation used in Prolog/Datalog is based on providing predicates with unique names.
  •  A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments.
  • If the arguments are all constant values, the predicate simply states that a certain fact is true. If, on the other hand, the predicate has variables as arguments, it is either considered as a query or as part of a rule or constraint.
  • In our discussion, we adopt the Prolog convention that all constant values in a predicate are either numeric or character strings; they are represented as identifiers (or names) that start with a lowercase letter, whereas variable names always start with an uppercase letter.
  • Consider the example shown in Figure 26.11, which is based on the relational database in Figure 3.6, but in a much simplified form. There are three predicate names:
  • supervise, superior, and subordinate.
  • The SUPERVISE predicate is defined via a set of facts, each of which has two arguments: a supervisor name, followed by the name of a direct supervisee (subordinate) of that supervisor.
  • These facts correspond to the actual data that is stored in the database, and they can be considered as constituting a set of tuples in a relation SUPERVISE with two attributes whose schema is

                                SUPERVISE(Supervisor, Supervisee)







Datalog Notation


  • In Datalog, as in other logic-based languages, a program is built from basic objects called atomic formulas.
  • It is customary to define the syntax of logic-based languages by describing the syntax of atomic formulas and identifying how they can be combined to form a program. In Datalog, atomic formulas are literals of the form p(a1, a2, ..., an), where p is the predicate name and n is the number of arguments for predicate p.
  • Different predicate symbols can have different numbers of arguments, and the number of arguments n of predicate p is sometimes called the arity or degree of p. The arguments can be either constant values or variable names.
  • As mentioned earlier, we use the convention that constant values either are numeric or start with a lowercase character, whereas variable names always start with an uppercase character.
  • A number of built-in predicates are included in Datalog, which can also be used to construct atomic formulas.
  • A literal is either an atomic formula as defined earlier—called a positive literal—or an atomic formula preceded by not.
  • The latter is a negated atomic formula, called a negative literal.
  • Datalog programs can be considered to be a subset of the predicate calculus formulas, which are somewhat similar to the formulas of the domain relational calculus.
  • In Datalog, however, these formulas are first converted into what is known as clausal form before they are expressed in Datalog, and only formulas given in a restricted clausal form, called Horn clauses, can be used in Datalog.

Clausal Form and Horn Clauses


  • A formula in the relational calculus is a condition that includes predicates called atoms (based on relation names). Additionally, a formula can have quantifiers—namely, the universal quantifier (for all) and the existential quantifier (there exists). In clausal form, a formula must be transformed into another formula with the following characteristics:
  • All variables in the formula are universally quantified. Hence, it is not necessary to include the universal quantifiers (for all) explicitly; the quantifiers are removed, and all variables in the formula are implicitly quantified by the universal quantifier.
  • In clausal form, the formula is made up of a number of clauses, where each clause is composed of a number of literals connected by OR logical connectives only. Hence, each clause is a disjunction of literals.
  • The clauses themselves are connected by AND logical connectives only, to form a formula. Hence, the clausal form of a formula is a conjunction of clauses.

Interpretations of Rules

  • There are two main alternatives for interpreting the theoretical meaning of rules: proof-theoretic and model-theoretic.
  • In the proof-theoretic interpretation of rules, we consider the facts and rules to be true statements, or axioms.
  • Ground axioms contain no variables.
  • The facts are ground axioms that are given to be true.
  • Rules are called deductive axioms, since they can be used to deduce new facts.
  • The second type of interpretation is called the model-theoretic interpretation.
  • Here, given a finite or an infinite domain of constant values,33 we assign to a predicate every possible combination of values as arguments.
  • We must then determine whether the predicate is true or false.
  • In general, it is sufficient to specify the combinations of arguments that make the predicate true, and to state that all other combinations make the predicate false. If this is done for every predicate, it is called an interpretation of the set of predicates.

Datalog Programs and Their Safety

  • There are two main methods of defining the truth values of predicates in actual Datalog programs.
  • Fact-defined predicates (or relations) are defined by listing all the combinations of values (the tuples) that make the predicate true.
  • Rule-defined predicates (or views) are defined by being the head (LHS) of one or more Datalog rules; they correspond to virtual relations whose contents can be inferred by the inference engine.


Comments