Posts Tagged ‘SQL

Extensions of Relational and Object oriented Database Systems

In this approach a relational or object-oriented database system is extended to support SGML/XML data management. The proposed SGML extensions included, for example, a system where SGML files were mapped to the O2 database management system, and the extension of operators of SQL to accommodate structured text. All current commercial database systems provide some XML support. Examples of commercial systems are Oracle’s XML SQL Utility and IBM’s DB2 XML Extender. For the sake of discussion, we consider IBM’s DB2 XML Extender as representative of the many systems following this approach.

Data model: When conventional database systems are used for XML, data structuring is systematic and explicitly defined by a database schema. The data model of the original system is typically extended to encompass XML data, but the extensions define simplified tree models rather than rich XML documents.The XML extensions are intended primarily to support the management of enterprise data, wrapped as elements and attributes in an XML document. A problem in using the systems is the need for parallel understanding of two different kinds of data models.

Data definition: The extended systems require explicit definition of transformation of a DTD to the internal structures. XML elements are typically mapped to objects in object-oriented systems, but relational systems require more elaborate transformations to represent hierarchic and ordered structures in unordered tables. In the DB2 XML Extender the whole document can be stored either externally as a file or as a whole in a column of a table. Elements and attributes can also be stored separately inside tables, which can be accessed independently or used for selecting whole documents (as if the side tables were indexes). DTDs, which are stored in a special table, can be associated with XML documents and used to validate them.

Data manipulation: In relational extensions, whole documents and DTDs that are stored in tables can be accessed and manipulated through the SQL database language. As explained above, specific elements of XML data can be extracted when documents are loaded, maintained separately, and accessed directly through SQL. Support for accessing elements that have not been extracted as part of document loading is provided through limited XPath queries, and the DB2 XML Extender can be used together with DB2 UDB Text for full-text search. DB2 also provides document assembly via a function call that can be embedded in an SQL query.

Tags : , , , , , , , , , , , ,

Privilege Elevation via SQL Injection

Most organizations are familiar with the risk posed by SQL injection in web applications, but fewer are aware of the implications of SQL injection in stored procedures. Any component that dynamically creates and executes a SQL query could in theory be subject to SQL injection. In those databases where mechanisms exist to dynamically compose and execute strings, SQL injection in stored procedures can pose a risk.

In Oracle, for example, stored procedures can execute with either the privilege of the invoker of the procedure, or the definer of the procedure. If the definer was a high-privileged account, and the procedure contains a SQL injection flaw, attackers can use the flaw to execute statements at a higher level of privilege than they should be able to. The following procedures all allow privilege elevation in one form or another:

The DRILOAD.VALIDATE_STMT procedure is especially interesting since no “SQL injection” is really necessary; the procedure simply executes the specified statement with DBA privileges, and the procedure can be called by anyone,for example the default user “SCOTT” can execute the following:

exec CTXSYS.DRILOAD.VALIDATE_STMT(‘GRANT DBA TO PUBLIC’);

This will grant the “public” role DBA privileges.

In most other databases the effect of SQL injection in stored procedures is less dramatic — in Sybase, for example, “definer rights” immediately back down to “invoker rights” as soon as a stored procedure attempts to execute a dynamically created SQL statement. The same is true of Microsoft SQL Server.

It isn’t true to say that SQL injection in stored procedures has no effect in SQL Server, however — if an attacker can inject SQL into a stored procedure,he can directly modify the system catalog — but only if he already had permissions that would enable him to do so. The additional risk posed by this is slight, since the attacker would already have to be an administrator in order to take advantage of any SQL injection flaw in this way — and if he is a database administrator, there are many other, far more serious things he can do to the system.

One privilege elevation issue in SQL Server is related to the mechanism used to add jobs to be executed by the SQL Server Agent (#NISR15002002B).Essentially, all users were permitted to add jobs, and those jobs would then be executed with the privileges of the SQL Agent itself (by getting the SQL Agent to re-authenticate after it had dropped its privileges).

In general, patching is the answer to this class of problem. In the specific case of Oracle, it might be worth investigating which sets of default stored procedures you actually need in your environment and revoking access to“public” — but as we previously noted, this can cause permission problems that are hard to debug.

Tags : , , , , , ,

SQLJ

Along with host language embedded SQL type applications, there are also embedded Java applications, better known as SQLJ programs. SQLJ is a method for accessing DB2 from a Java application that supports static execution. Again, the benefits of a static execution are reduced resource consumption, improved diagnostics, improved security, and greater repeatability of SQL performance due to static access paths and plans. Everything you need to get from the data is already in the package bound at bind time.

SQLJ provides performance benefits of static query execution by embedding SQL queries into Java applications. SQLJ still utilizes the JDBC driver to access data source and is the layer above JDBC. SQLJ translator is used to process SQLJ source files with the extension .sqlj. It translates .sqlj source files into .java files and an SQLJ serialized profile into a form of .ser file. The serialized file contains all the SQL statements in original SQLJ source file. The translated resulting .java file will contain calls to SQLJ run-time libraries in place of SQL statements. In order to bind the application statically to a DB2 database, you usethe DB2 profile customizer tool called db2 sqlj customize. The db2 sqlj customize connects and binds a package on the target database using the serialized profile.The package bound in the target database using db2 sqlj customize will contain sections which correspond to each SQL query in the serialized profile.

Commands associated with SQLJ:

1. sqlj:

sqlj is the translator that takes an embedded SQLJ program and creates a .ser file used for binding and a .java file that will also be compiled into byte code, as typical Java programs are compiled.

2. db2 sql jcustomize:

This command will take the .ser file from the sqlj step, connect to the database against which the application will be run, and bind four bind files for this application, all with different isolation levels.

3. db2 sql jbind:

This command can be used to rebind this application against other databases; for example, it can be used for moving the application from the test to the production database.

The following packages need to be imported for SQLJ:

import java.sql.*;

import sqlj.runtime.*;

import sqlj.runtime.ref.*;

Tags : , , , , , , , , , , , , , ,

Tandem NonStop SQL

The Tandem NonStop SQL system is composed of processor clusters interconnected via 4-plexed fiber optic rings. The systems are typically configured at a disk per MIPS, so each ten MIPS, processor has about ten disks. Disks are typically duplexed and managed in the standard way [BITT88]. Each disk pair has a set of processes managing a large RAM cache, a set of  locks,and log records for the data on that disk pair. Considerable effort is spent on optimizing sequential scans by prefetching large units, and by filtering and manipulating the tuples with sql predicates at these disk servers.

Relations are range partitioned across multiple disk pairs [TAND87] which is the only strategy provided. The partitioning attribute is also used as the primary, clustering attribute on each node, making it impossible to decluster a relation by partitioning on one attribute and then constructing a clustered index at each node on a different attribute. Parallelization of operators in a query plan is achieved by inserting a parallel operator (which is similar to Volcano’s exchange operator [GRAE90]) between operator nodes in the query tree and joins are executed using nested or son-merge algorithms. Scans, aggregates, updates, and deletes are parallelized. In addition several utilities use parallelism (e.g. load, reorg, …). A hash join algorithm which uses hash declustering of intermediate and final results has recently been implemented [ZELL90]. [ENGL89] contains a performance evaluation of the speedup and scaleup characteristics of the parallel version of the Non-Stop SQL system.

The system is primary designed for transaction processing. The main parallelism feature for OLTP is parallel index update. Relational tables typically have five indices on them, although it is not uncommon to see ten indices on a table. These indices speed reads, but slow down inserts, updates, and deletes. By doing the index maintenance in parallel, index maintenance time can be held constant if the indices are spread among many processors and disks. In addition, the NonStop SQL is fault tolerant and suppons geographically distributed data and execution.

Tags : , , , , , , , , , , , , , , , , , , , , , , ,

Parallel Dataflow Approach to SQL Software

Terabyte online databases, consisting of billions of records, are becoming common as the price of online storage decreases. These databases are often represented and manipulated using the SQL relational model. A relational database consists of relations (files in COBOL terminology) that in turn contain tuples (records in COBOL terminology). All the tuples in a relation have the same set of attributes (fields in COBOL terminology).

Relations are created, updated, and queried by writing SQL statements. These statements are syntactic sugar for a simple set of operators chosen from the relational algebra. Select project, here called scan, is the simplest and most common operator – it produces a row-and column subset of a relational table. A scan of relation R using predicate P and attribute list L produces a relational data stream as output. The scan reads each tuple, t, of R and applies the predicate P to it. If P(t) is true, the scan discards any attributes of t not in L and inserts the resulting tuple in the scan output stream. Expressed in SQL, a scan of a telephone book relation to find the phone numbers of all people named Smith would be written:

SELECT  telephone_number  /* the output attribute(s) */

FROM  telephone_book  /* the input relation */

WHERE  last_name = ‘Smith’;  /* the predicate */

A scan’s output stream can be sent to another relational operator, returned to an application, displayed on a terminal, or printed in a report. Therein lies the beauty and utility of the relational model. The uniformity of the data and operators allow them to be arbitrarily composed into dataflow graphs. The output of a scan may be sent to a sort operator that will reorder the tuples based onan attribute sort criteria, optionally eliminating duplicates. SQL defines several aggregate operators to summarize attributes into a single value, for example, taking the sum, min, or max of an attribute, or counting the number of distinct values of the attribute. The insert operator adds tuples from a stream to an existing relation. The update and delete operators alter and delete tuples in a relation matching a scan stream.

The relational model defines several operators to combine and compare two or more relations. It provides the usual set operators union, intersection, difference, and some more exoticones like join and division. Discussion here will focus on the equi-join operator (here called join). The join operator composes two relations, A and B, on some attribute to produce a third relation. For each tuple, ta, in A, the join finds all tuples, tb, in B with attribute value equal to that of ta. For each matching pair of tuples, the join operator inserts into the output steam a tuple built by concatenating the pair. Codd, in a classic paper, showed that the relational data model can represent any form of data, and that these operators are complete. Today, SQL applications are typically a combination of conventional programs and SQL statements. The programs interact with clients, perform data display, and provide high-level direction of the SQL dataflow. The SQL data model was originally proposed to improve programmer productivity by offering a non-procedural database language. Data independence was and additional benefit; since the programs do not specify how the query is to be executed, SQL programs continue to operate as the logical and physical database schema evolves.

Parallelism is an unanticipated benefit of the relational model. Since relational queries are really just relational operators applied to very large collections of data, they offer many opportunities for parallelism. Since the queries are presented in a non-procedural language, they offer considerable latitude in executing the queries. Relational queries can be executed as a dataflow graph. As mentioned in the introduction, these graphs can use both pipelined parallelism and partitioned parallelism. If one operator sends its output to another, the two operators can execute in parallel giving potential speedup of two.

The benefits of pipeline parallelism are limited because of three factors: (1) Relational pipelines are rarely very long – a chain of length ten is unusual. (2) Some relational operators donot emit their first output until they have consumed all their inputs. Aggregate and sort operators have this property. One cannot pipeline these operators. (3) Often, the execution cost of one operator is much greater than the others (this is an example of skew). In such cases, the speedup obtained by pipelining will be very limited. Partitioned execution offers much better opportunities for speedup and scaleup. By taking the large relational operators and partitioning their inputs and outputs, it is possible to use divide-and-conquer to turn one big  job into many independent little ones. This is an ideal situation for speedup and scaleup. Partitioned data is the key to partitioned execution.

Tags : , , , , , , , , , , , , , ,

Blind XPath Injuction

Blind XPath Injection attack that enables an attacker to extract a complete XML document used for XPath querying, without prior knowledge of the XPath query. The attack is considered “complete” since all possible data is exposed. The attack makes use of two techniques –XPath crawling and Booleanization of XPath queries. Using this attack, it is possible to get hold of theXML “database” used in the Xpath query. This can be most powerful against sites that use XPath queries (and XML “databases”) for authentication, searching and other uses.

Compared to the SQL injection attacks, XPath Injection has the following upsides:

1. Since XPath is a standard (yet rich) language, it is possible to carry the attack ‘as-is’ for any XPath implementation. This is in contrast to SQL injection where different implementations have different SQL dialects (there is a common SQL language, but it is often too weak).

2. The XPath language can reference almost all parts of the XML document without access control restrictions, whereas with SQL, a “user” (which is a term undefined in the XPath/XML context) may be restricted to certain tables, columns or queries. So the outcome of the Blind XPath Injection attack is guaranteed to consist of the complete XML document, i.e. the complete database.

It is possible to take a more systematic approach to the XPath Injection problem. This approach is called “blind injection” (the foundations of which are laid in, in the SQL injection context). It assumes more or less nothing on the structure of the query except that the user data is injected in a Boolean expression context. It enables the attacker to extract a single bit of  information per a single query injection. This bit is realized, for example, as “Login successful” or “Login failed”.

This approach is even more powerful with XPath than it is with SQL, due to the following characteristics of XPath:

The technique we use is as follows:

We first show how to crawl an XPath document, using only scalar queries (that is, queries whose return type is “string”, “numeric” or “Boolean”). The crawling procedure assumes no knowledge of the document structure; yet at its end, the document, in its completeness, is reconstructed.

We then show how a scalar XPath query can be replaced by a series of  Boolean queries. This procedure is called a “Booleanization” of the query. A Boolean query is a query whose result is a Boolean value (true/false). So in a Booleanization process, a query whose result type is string or numeric is replaced with a series of queries whose result type is Boolean, and from which we can reconstruct the result ofthe original string or numeric query.

Finally, each Boolean query can be resolved by a single “blind” injection. That is, we show how it is possible to form an injection string, including the Boolean query, that when injected into an XPath query, causes the application to behave in one way if the Boolean query resolves into “true”, and in another way if the query resolves into “false”. This way, the attacker can determine a single bit – the Boolean query result.

The novelty in this approach towards XPath Injection is that it does not require much prior knowledge of the XPath query format, unlike the “traditional” approach described above. It does not require that data from the XML document be embedded in the response and that the whole XML document is eventually extracted, regardless of the format of the XPath query used by the application. It uses only a difference in the application behavior resulting from a difference in the XPath query return value to extract a single information bit.

 

Tags : , , , , , , , , , , , , , , , , , ,

Parallel Dataflow Approach to SQL Software

Terabyte online databases, consisting of billions of records, are becoming common as the price of online storage decreases. These databases are often represented and manipulated using the SQL relational model. A relational database consists of relations (files in COBOL terminology) that in turn contain tuples (records in COBOL terminology). All the tuples in a relation have the same set of attributes (fields in COBOL terminology).

Relations are created, updated, and queried by writing SQL statements. These statements are syntactic sugar for a simple set of operators chosen from the relational algebra. Select project, here called scan, is the simplest and most common operator – it produces a row-and column subset of a relational table. A scan of relation R using predicate P and attribute list L produces a relational data stream as output. The scan reads each tuple, t, of R and applies thepredicate P to it. If P(t) is true, the scan discards any attributes of t not in L and inserts the resulting tuple in the scan output stream. Expressed in SQL, a scan of a telephone book relation to find the phone numbers of all people named Smith would be written:

SELECT     telephone_number      /* the output attribute(s) */

FROM        telephone_book           /* the input relation */

WHERE     last_name = ‘Smith’;   /* the predicate */

A scan’s output stream can be sent to another relational operator, returned to an application, displayed on a terminal, or printed in a report. There in lies the beauty and utility of the relational model. The uniformity of the data and operators allow them to be arbitrarily composed into data flow graphs. The output of a scan may be sent to a sort operator that will reorder the tuples based onan attribute sort criteria, optionally eliminating duplicates. SQL defines several aggregate operators to summarize attributes into a single value, for example, taking the sum, min, or maxof an attribute, or counting the number of distinct values of the attribute. The insert operator adds tuples from a stream to an existing relation. The update and delete operators alter and delete tuples in a relation matching a scan stream.

The relational model defines several operators to combine and compare two or more relations. It provides the usual set operators union, inter section, difference, and some more exoticones like join and division. Discussion here will focus on the equi-join operator (here called join). The join operator composes two relations, A and B, on some attribute to produce a third relation. For each tuple, ta, in A, the join finds all tuples, tb, in B with attribute value equal to that of ta. For each matching pair of tuples, the join operator inserts into the output steam a tuple built by concatenating the pair. Codd, in a classic paper, showed that the relational data model can represent any form of data, and that these operators are complete [CODD70]. Today, SQL applications are typically a combination of conventional programs and SQL statements. The programs interact with clients, perform data display, and provide high-level direction of the SQL dataflow.

The SQL data model was originally proposed to improve programmer productivity by offering a non-procedural database language. Data independence was and additional benefit; since the programs do not specify how the query is to be executed, SQL programs continue to operate as the logical and physical database schema evolves. Parallelism is an unanticipated benefit of the relational model. Since relational queries are really just relational operators applied to very large collections of data, they offer many opportunities for parallelism. Since the queries are presented in a non-procedural language, they offer considerable latitude in executing the queries. Relational queries can be executed as a dataflow graph. As mentioned in the introduction, these graphs can use both pipelined parallelism and partitioned parallelism. If one operator sends its output to another, the two operators can execute in parallel giving potential speedup of two.

The benefits of pipeline parallelism are limited because of three factors: (1) Relational pipelines are rarely very long – a chain of length ten is unusual. (2) Some relational operators donot emit their first output until they have consumed all their inputs. Aggregate and sort operators have this property. One cannot pipeline these operators. (3) Often, the execution cost of one operator is much greater than the others (this is an example of skew). In such cases, the speedup obtained by pipelining will be very limited. Partitioned execution offers much better opportunities for speedup and scaleup. Bytaking the large relational operators and partitioning their inputs and outputs, it is possible to use divide-and-conquer to turn one big job into many independent little ones. This is an ideal situation for speedup and scaleup. Partitioned data is the key to partitioned execution.

Tags : , , , , , , , , , ,