Posts Tagged ‘XPath

Booleanization of XPath Scalar Queries

We’ll now describe how a query whose return type is string or number can be replaced with a sequence of XPath queries whose return type is Boolean. Let us assume that Q is a numeric XPath query. Let us further assume that we handle 32-bit signed integers (which is sufficient in a 32-bit architecture). We first extract the sign bit of Q:

(Q>=0)

True value indicates that Q is positive (or zero), and false value indicates that Q is negative. We then use –Q (instead of Q) if it is negative, and proceed. Assuming that Q is positive, we extract its 31 bits,also assuming we already know the most significant N bits (of the 31 bits), we can then find the next bit: Let K be the number formed by the known N high bits, then the N+1 bit is set to 1, then the rest, 30-N bits, set to 0.

((Q-K)>=0)

Yields true if the N+1 bit (from the left) is 1, and false if that bit is 0.

Thus, we can reconstruct a positive Q with 31 Boolean queries. We start with N=0 and iteratively extract the next bit until we get to N=30, inclusive.

A string query S is first factored into bytes (or, more accurately, Unicode symbols), as follows:

First, we query the string length using the XPath string-length function, which is a numeric query:

(string-length(S))

Then we can iterate over the symbols, reducing the query into a series of one byte (symbol) queries:

(substring(S,N,1))

Now, a single byte/symbol query B is in turn reduced into Boolean queries as follows: Let us assume that the list of possible symbols (excluding the double quote mark) in the document is known (denote it by C), and that the list’s length is L. L is hopefully small, e.g. if it is known that the XML document is in fact comprised of printable ASCII characters, including CR, LF and HT, excluding double quotes, then L is 97. We index each possible symbol, starting from 0 and going to (L-1). Denote by K=ceiling(log2(L)) – this is the number of bits that are required to determine the symbol. Now, we prepare K strings of length L. The Nth string is a list of bits in position N of the symbols. Let us designate the Nth string as CN.

First, we ensure that the byte is not a double quote:

If the expression returns true, then the byte is simply the double quote mark. If the expression is false,we proceed as follows: The Nth bit is extracted as following:

(number(translate(B,”C”,”CN”))=1)

If this yields true, then the Nth bit is 1, and if it yields false, the Nth bit is 0. Note, we must exclude the double quote mark from C, or else the XPath syntax will be broken. Thus we are able to extract string queries using Boolean queries.

 

Tags : , , , , ,

Blind XPath Injuction

Blind XPath Injection attack that enables an attacker to extract a complete XML document used for XPath querying, without prior knowledge of the XPath query. The attack is considered “complete” since all possible data is exposed. The attack makes use of two techniques –XPath crawling and Booleanization of XPath queries. Using this attack, it is possible to get hold of theXML “database” used in the Xpath query. This can be most powerful against sites that use XPath queries (and XML “databases”) for authentication, searching and other uses.

Compared to the SQL injection attacks, XPath Injection has the following upsides:

1. Since XPath is a standard (yet rich) language, it is possible to carry the attack ‘as-is’ for any XPath implementation. This is in contrast to SQL injection where different implementations have different SQL dialects (there is a common SQL language, but it is often too weak).

2. The XPath language can reference almost all parts of the XML document without access control restrictions, whereas with SQL, a “user” (which is a term undefined in the XPath/XML context) may be restricted to certain tables, columns or queries. So the outcome of the Blind XPath Injection attack is guaranteed to consist of the complete XML document, i.e. the complete database.

It is possible to take a more systematic approach to the XPath Injection problem. This approach is called “blind injection” (the foundations of which are laid in, in the SQL injection context). It assumes more or less nothing on the structure of the query except that the user data is injected in a Boolean expression context. It enables the attacker to extract a single bit of  information per a single query injection. This bit is realized, for example, as “Login successful” or “Login failed”.

This approach is even more powerful with XPath than it is with SQL, due to the following characteristics of XPath:

The technique we use is as follows:

We first show how to crawl an XPath document, using only scalar queries (that is, queries whose return type is “string”, “numeric” or “Boolean”). The crawling procedure assumes no knowledge of the document structure; yet at its end, the document, in its completeness, is reconstructed.

We then show how a scalar XPath query can be replaced by a series of  Boolean queries. This procedure is called a “Booleanization” of the query. A Boolean query is a query whose result is a Boolean value (true/false). So in a Booleanization process, a query whose result type is string or numeric is replaced with a series of queries whose result type is Boolean, and from which we can reconstruct the result ofthe original string or numeric query.

Finally, each Boolean query can be resolved by a single “blind” injection. That is, we show how it is possible to form an injection string, including the Boolean query, that when injected into an XPath query, causes the application to behave in one way if the Boolean query resolves into “true”, and in another way if the query resolves into “false”. This way, the attacker can determine a single bit – the Boolean query result.

The novelty in this approach towards XPath Injection is that it does not require much prior knowledge of the XPath query format, unlike the “traditional” approach described above. It does not require that data from the XML document be embedded in the response and that the whole XML document is eventually extracted, regardless of the format of the XPath query used by the application. It uses only a difference in the application behavior resulting from a difference in the XPath query return value to extract a single information bit.

 

Tags : , , , , , , , , , , , , , , , , , ,