# Posts Tagged ‘XPath query

### Booleanization of XPath Scalar Queries

We’ll now describe how a query whose return type is string or number can be replaced with a sequence of XPath queries whose return type is Boolean. Let us assume that Q is a numeric XPath query. Let us further assume that we handle 32-bit signed integers (which is sufficient in a 32-bit architecture). We first extract the sign bit of Q:

(Q>=0)

True value indicates that Q is positive (or zero), and false value indicates that Q is negative. We then use –Q (instead of Q) if it is negative, and proceed. Assuming that Q is positive, we extract its 31 bits,also assuming we already know the most significant N bits (of the 31 bits), we can then find the next bit: Let K be the number formed by the known N high bits, then the N+1 bit is set to 1, then the rest, 30-N bits, set to 0.

((Q-K)>=0)

Yields true if the N+1 bit (from the left) is 1, and false if that bit is 0.

Thus, we can reconstruct a positive Q with 31 Boolean queries. We start with N=0 and iteratively extract the next bit until we get to N=30, inclusive.

A string query S is first factored into bytes (or, more accurately, Unicode symbols), as follows:

First, we query the string length using the XPath **string-length** function, which is a numeric query:

(string-length(S))

Then we can iterate over the symbols, reducing the query into a series of one byte (symbol) queries:

(substring(S,N,1))

Now, a single byte/symbol query B is in turn reduced into Boolean queries as follows: Let us assume that the list of possible symbols (excluding the double quote mark) in the document is known (denote it by C), and that the list’s length is L. L is hopefully small, e.g. if it is known that the XML document is in fact comprised of printable ASCII characters, including CR, LF and HT, excluding double quotes, then L is 97. We index each possible symbol, starting from 0 and going to (L-1). Denote by K=ceiling(log2(L)) – this is the number of bits that are required to determine the symbol. Now, we prepare K strings of length L. The Nth string is a list of bits in position N of the symbols. Let us designate the Nth string as **C***N*.

First, we ensure that the byte is not a double quote:

If the expression returns true, then the byte is simply the double quote mark. If the expression is false,we proceed as follows: The Nth bit is extracted as following:

(number(translate(B,”C”,”CN”))=1)

If this yields true, then the Nth bit is 1, and if it yields false, the Nth bit is 0. Note, we must exclude the double quote mark from C, or else the XPath syntax will be broken. Thus we are able to extract string queries using Boolean queries.