Posts Tagged ‘SQL query

Extensions of Relational and Object oriented Database Systems

In this approach a relational or object-oriented database system is extended to support SGML/XML data management. The proposed SGML extensions included, for example, a system where SGML files were mapped to the O2 database management system, and the extension of operators of SQL to accommodate structured text. All current commercial database systems provide some XML support. Examples of commercial systems are Oracle’s XML SQL Utility and IBM’s DB2 XML Extender. For the sake of discussion, we consider IBM’s DB2 XML Extender as representative of the many systems following this approach.

Data model: When conventional database systems are used for XML, data structuring is systematic and explicitly defined by a database schema. The data model of the original system is typically extended to encompass XML data, but the extensions define simplified tree models rather than rich XML documents.The XML extensions are intended primarily to support the management of enterprise data, wrapped as elements and attributes in an XML document. A problem in using the systems is the need for parallel understanding of two different kinds of data models.

Data definition: The extended systems require explicit definition of transformation of a DTD to the internal structures. XML elements are typically mapped to objects in object-oriented systems, but relational systems require more elaborate transformations to represent hierarchic and ordered structures in unordered tables. In the DB2 XML Extender the whole document can be stored either externally as a file or as a whole in a column of a table. Elements and attributes can also be stored separately inside tables, which can be accessed independently or used for selecting whole documents (as if the side tables were indexes). DTDs, which are stored in a special table, can be associated with XML documents and used to validate them.

Data manipulation: In relational extensions, whole documents and DTDs that are stored in tables can be accessed and manipulated through the SQL database language. As explained above, specific elements of XML data can be extracted when documents are loaded, maintained separately, and accessed directly through SQL. Support for accessing elements that have not been extracted as part of document loading is provided through limited XPath queries, and the DB2 XML Extender can be used together with DB2 UDB Text for full-text search. DB2 also provides document assembly via a function call that can be embedded in an SQL query.

Tags : , , , , , , , , , , , ,

Reducing Multi Block Queries to Single Block

The technique described in this section shows how under some conditions, it is possible to collapse a multi-block SQL query into a single block SQL query.

1. Merging Views

Let us consider a conjunctive query using SELECT ANY. If one or more relations in the query are views, but each is defined through a conjunctive query, then the view definitions can simply be “unfolded” to obtain a single block SQL query. For example, if a query Q = Join(R,V) and view V = Join(S,T), then the query Q can be unfolded to Join(R,Join(S,T)) and may be freely reordered. Such a step may require some renaming of the variables in the view definitions.

Figure 1. Group By and Join

Unfortunately, this simple unfolding fails to work when the views are more complex than simple SPJ queries. When one or more of the views contain SELECT DISTINCT, transformations to move or pull up DISTINCT need to be careful to preserve the number of duplicates correctly. More generally, when the view contains a group by operator, unfolding requires the ability to pull-up the group-by operator and then to freely reorder not only the joins but also the group-by operator to ensure optimality. In particular, we are given a query such as the one in Fig. 1(b) and we are trying to consider how we can transform it in a form such as Fig. 1(a) so that R1 and R2 may be freely reordered.

2. Merging Nested Subqueries

Consider the following example of a nested query from where Emp# and Dept# are keys of the corresponding relations:

SELECT Emp.Name

FROM Emp

WHERE Emp.Dept# IN

SELECT Dept.Dept# FROM Dept

WHERE Dept.Loc=‘Denver’

AND Emp.Emp# = Dept.Mgr

If tuple iteration semantics are used to answer the query, then the inner query is evaluated for each tuple of the Dept relation once.An obvious optimization applies when the inner query block contains no variables from the outer query block (uncorrelated).In such cases, the inner query block needs to be evaluated only once. However, when there is indeed a variable from the outer block, we say that the query blocks are correlated. For example,in the query above, Emp.Emp# acts as the correlated variable.Some have identified techniques to unnest a correlated nested SQL query and “flatten” it to a single query. For example, the above nested query reduces to:

SELECT E.Name

FROM Emp E, Dept D

WHERE E.Dept# = D.Dept#

AND D.Loc = ‘Denver’ AND E.Emp# = D.Mgr

The complexity of the problem depends on the structure of the nesting, i.e., whether the nested subquery has quantifiers (e.g., ALL, EXISTS), aggregates or neither. In the simplest case, of which the above query is an example, observed that the tuple semantics can be modeled as Semijoin(Emp,Dept, Emp.Dept# = Dept.Dept#). Once viewed this way, it is not hard to see why the query may be merged since:

Semijoin(Emp, Dept, Emp.Dept# = Dept. Dept#) =Project(Join(Emp, Dept), Emp.*)

Where Join(Emp, Dept) is on the predicate Emp.Dept# =Dept. Dept# . The second argument of the Project operator indicates that all columns of the relation Emp must be retained. The problem is more complex when aggregates are present in the nested subquery, as in the example below from since merging query blocks now requires pulling up the aggregation without violating the semantics of the nested query:

SELECT  Dept.name FROM Dept WHERE Dept.num-of-machines ≥(SELECT COUNT(Emp.*) FROM Emp WHERE Dept.name= Emp.Dept_name)

It is especially tricky to preserve duplicates and nulls. To appreciate the subtlety, observe that if for a specific value of Dept.name (say d), there are no tuples with a matching Emp.Dept_name, i.e., even if the predicate Dept.name=Emp.dept_name fails, then there is still an output tuple for the Dept tuple d. However, if we were to adopt the transformation used in the first query of this section, then there will be no output tuple for the dept d since the join predicate fails. Therefore, in the presence of aggregation, we must preserve all the tuples of the outer query block by a left outer join. In particular, the above query can be correctly transformed to:

SELECT Dept.name FROM Dept LEFT OUTER JOIN Emp

ON (Dept.name= Emp.dept_name )

GROUP BY Dept.name

HAVING Dept. num-of-machines < COUNT (Emp.*)

Thus, for this class of queries the merged single block query has outer joins. If the nesting structure among query blocks is linear, then this approach is applicable and transformations produce a single block query that consists of a linear sequence of joins and outer joins.

Tags : , , , , , , , , , , ,