Posts Tagged ‘XML’

Extensions of Relational and Object-Oriented Database Systems

In this approach a relational or object-oriented database system is extended to support SGML/XML data management. The proposed SGML extensions included, for example, a system where SGML files were mapped to the O2 database management system, and the extension of operators of SQL to accommodate structured text. All current commercial database systems provide some XML support. Examples of commercial systems are Oracle’s XML SQL Utility and IBM’s DB2 XML Extender. For the sake of discussion, we consider IBM’s DB2 XML Extender as representative of the many systems following this approach.

Data model: When conventional database systems are used for XML, data structuring is systematic and explicitly defined by a database schema. The data model of the original system is typically extended to encompass XML data, but the extensions define simplified tree models rather than rich XML documents. The XML extensions are intended primarily to support the management of enterprise data, wrapped as elements and attributes in an XML document. A drawback of these systems is that users must understand two different kinds of data models in parallel.

Data definition: The extended systems require explicit definition of transformation of a DTD to the internal structures. XML elements are typically mapped to objects in object-oriented systems, but relational systems require more elaborate transformations to represent hierarchic and ordered structures in unordered tables. In the DB2 XML Extender the whole document can be stored either externally as a file or as a whole in a column of a table. Elements and attributes can also be stored separately inside tables, which can be accessed independently or used for selecting whole documents (as if the side tables were indexes). DTDs, which are stored in a special table, can be associated with XML documents and used to validate them.

Data manipulation: In relational extensions, whole documents and DTDs that are stored in tables can be accessed and manipulated through the SQL database language. As explained above, specific elements of XML data can be extracted when documents are loaded, maintained separately, and accessed directly through SQL. Support for accessing elements that have not been extracted as part of document loading is provided through limited XPath queries, and the DB2 XML Extender can be used together with DB2 UDB Text for full-text search. DB2 also provides document assembly via a function call that can be embedded in an SQL query.
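The storage pattern described above (whole documents in a column, selected elements copied into side tables that act like indexes) can be sketched with Python's standard library. This is only an illustration using SQLite; it is not the DB2 XML Extender API, and the table names, element names and sample documents are assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents; element names are illustrative only.
DOCS = [
    "<order id='1'><customer>Ada</customer><total>40</total></order>",
    "<order id='2'><customer>Bob</customer><total>75</total></order>",
]

conn = sqlite3.connect(":memory:")
# Whole documents live in one column; selected elements go to a side table.
conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, xml TEXT)")
conn.execute("CREATE TABLE order_side (doc_id INTEGER, customer TEXT, total REAL)")

# Extract the chosen elements while each document is loaded.
for text in DOCS:
    root = ET.fromstring(text)
    doc_id = int(root.get("id"))
    conn.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, text))
    conn.execute(
        "INSERT INTO order_side VALUES (?, ?, ?)",
        (doc_id, root.findtext("customer"), float(root.findtext("total"))),
    )


def documents_with_total_above(limit):
    """Use the side table like an index to select whole documents."""
    rows = conn.execute(
        "SELECT d.xml FROM docs d JOIN order_side s ON d.doc_id = s.doc_id "
        "WHERE s.total > ? ORDER BY d.doc_id",
        (limit,),
    )
    return [row[0] for row in rows]
```

Calling documents_with_total_above(50) returns only the second document, without parsing any stored XML at query time.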


Native SGML/XML Systems

Native SGML/XML systems are designed especially for the management of SGML/XML data. The systems should include capabilities to define, create, store, validate, manipulate, publish, and retrieve SGML/XML documents and their parts. Some of the native systems, such as Astoria and Information Manager, are comprehensive document management systems with front-ends for users to work with documents. Some others, such as SIM and Tamino, are software packages intended for building applications for the management of SGML/XML data. A few systems, especially those that support semi-structured data, such as Lore, XYZFind, and dbXML, provide native support for tree-structured data but are limited in their support of rich XML documents because they do not rely extensively on DTDs or other document type definitions.

The data model: There is no single well-defined data model for XML data. The lack of a well-defined universal conceptual model causes problems in the native systems: for example, the underlying model for XML data is not explicitly defined in Astoria or Tamino, and system-specific notions and models have been invented in SIM. Many of the systems consist of packages of tools that do not share a common data model and may be limited in the kinds of XML documents they are able to store and manipulate. Unfortunately, because the systems do not highlight the details of the data model, such inconsistencies and constraints are often difficult to detect.

Data definition: The capability to define document types is an important characteristic of XML, and we consider the document type definition capability an essential feature in systems of this category. This aspect severely reduces the utility of semi-structured approaches for managing persistent XML resources. The systems originally developed for SGML are able to use DTDs directly as the document type definition with no translation to some other form of schema. Additional definitions may be needed, however, to support flexible manipulation and efficient implementation. In Astoria an important extension is provided by components, which form the data unit for many operations. For example, access rights are granted at the component level, components can have variants and versions, and simultaneous update to a document by several users is controlled at the component level.

Data manipulation: The lack of a standardized XML query language has led to various system-specific query languages. In addition, the simplified data models restrict query capabilities. For example, since Tamino does not store information about attribute types, queries utilizing IDs and IDREFs are impossible. The response to a Tamino query is an XML document containing the query result as tagged text, plus metadata related to the query (e.g. date and time). Thus the query language cannot be applied directly to query results unless a Tamino schema defines them as part of the database. In content management systems such as Astoria and Information Manager, parts of documents can be updated by structure editors integrated with the systems. In both of them style sheets can be associated with documents in their associated editors, and transformations can be defined by means of style sheets. Both of the systems also offer some capabilities for document assembly. In Tamino, database update is applied at the document level. The data storage mechanism for XML data (called X-Machine) has an associated programming language that includes commands for inserting and deleting documents. XSL is used to transform XML documents to HTML for Web publishing, but there is no additional support for defining transformations.


Why is signing XML documents different?

Why rely on XML to solve the “what you see is what you sign” problem? Our ideas can be summarized in two points:

  1. If a document to be signed is either not well-formed in the sense of XML, or not valid in the sense of its accompanying schema, or both, then it must strictly be assumed that the document has been manipulated. In consequence, it has to be dropped, and the user has to be notified.
  2. A smart card application can extract certain content items for display on the smart card reader from a structured and formally described document. The extraction and display operations are fully controlled by the tamper-proof smart card, which is the same environment that generates the digital signature.

The fundamental property of XML documents is well-formedness. According to the XML specification, every XML processing entity has to check and assert this property. For digital signing, well-formedness is important because it ensures that the XML document has a unique interpretation. Well-formedness also ensures the use of proper Unicode characters and the specification of their encoding. This matters for digital signatures as well, since character set manipulation can be used to perform “what you see is what you sign” attacks.
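As a small illustration of the well-formedness check, the following sketch uses Python's standard XML parser on a desktop machine. It only shows the kind of test meant here; it says nothing about how such a check would be implemented on a smart card, and the function name is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        # Covers structural errors (e.g. mismatched tags) as well as
        # byte sequences that are invalid in the declared encoding.
        return False
```

A document such as `<a><b></a>` is rejected because the tags do not nest properly, and a byte that is not valid in the declared encoding is rejected as well.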

Validity is a much more restrictive property of XML documents than well-formedness. A smart card that checks the validity of XML documents against a given schema before signing ensures, thanks to its tamper resistance, that only certain types of XML documents are signed. Consider, for example, a smart card which contains your private key but only signs XML documents that are valid with respect to a purchase order schema. You could give this card to your secretary, being sure that nothing other than a purchase order is signed with your signature. Additional constraints in the schema, e.g. restricting the maximum amount to 100 Euro, eliminate the last chance of misuse.
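The purchase order example can be made concrete with a hand-written check. Python's standard library has no schema validator, so the function below hard-codes the rules that a schema would express; the element names (purchaseOrder, item, amount) and the currency attribute are hypothetical, and a real card would validate against the schema itself.

```python
import xml.etree.ElementTree as ET

MAX_AMOUNT_EUR = 100  # the 100 Euro limit from the example above


def is_acceptable_purchase_order(data: bytes) -> bool:
    """Hand-coded stand-in for validation against a purchase order schema."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False  # not even well-formed
    if root.tag != "purchaseOrder":
        return False  # wrong document type
    # Content model: exactly one item followed by exactly one amount.
    if [child.tag for child in root] != ["item", "amount"]:
        return False
    amount = root.find("amount")
    if amount.get("currency") != "EUR":
        return False
    try:
        value = float(amount.text or "")
    except ValueError:
        return False
    # Value restriction: positive and at most the configured maximum.
    return 0 < value <= MAX_AMOUNT_EUR
```

Anything that is not a purchase order, or that exceeds the limit, is rejected before a signature could be produced.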

When operated in a class 3 card reader (i.e. a card reader including a display and a keypad), the card can display selected content and request user confirmation. This finally solves the “what you see is what you sign” problem. Obviously, XML processing is not an easy task to perform on resource-constrained smart cards. The following table therefore summarizes the challenging XML properties and the resulting opportunities for improving the signing process.


Checking and Signing XML Documents on Java Smart Cards

Smart card assistance for generating digital signatures is the current state of the art and best practice. This is mainly because smart cards nowadays have enough processing power to produce digital signatures for documents using on-card resources (processor and memory) only. This way the owner’s private signing key never has to leave the smart card: the signing key is and remains permanently stored in a tamper-proof environment. A closer look at the signing process, however, reveals a major security problem that still exists: the problem known as the “what you see is what you sign” problem. Before signing a document, the signer usually wants to check the document’s syntactic and semantic correctness.

When compared to the traditional process of signing a paper document with a handwritten signature, the difference can easily be identified: in the traditional case, it is relatively easy for the user to assert the correctness, because syntactic and semantic document checking and signature generation are in immediate context. Digitally signing an electronic document is completely different, because checking and signature generation are executed in two different environments with fundamentally different characteristics with respect to security on the one hand and processor, memory, and display resources on the other.

Traditionally, the signing application computes the document’s digest using a one-way hash function and sends the result to the smart card. The card encrypts the digest by an asymmetric cipher using the signing key stored on the card. The resulting value is the digital signature of the document. It is sent back to the signing application. The user can check neither the syntactic correctness of the document (in the case of XML documents: well-formedness, validity) nor the semantic correctness of the document. What really is signed is beyond the user’s control. It might, for instance, be the digest of a manipulated document. Even if the smart card can be regarded as tamper-proof, the terminal (e.g. a PC) and the programs running on it are vulnerable to viruses and Trojan horses. Such evildoers might obviously also affect signing applications and let them produce valid signatures for documents that are invalid from the user’s perspective. Such incidents invalidate the signing process as a whole.

We propose an enhanced architecture, called JXCS, which performs checking and signing of XML documents on Java smart cards. The basic idea of JXCS is to shift the syntactic validation and hash value generation from the vulnerable PC to the trusted smart card. Syntactic validation brings both a challenge and an opportunity. The challenge is the need to process XML documents on resource-constrained Java smart cards. The opportunity is the possibility of performing syntactic and even semantic checks on the XML document in a tamper-proof environment, which improves the security of the signing process. We propose three major checks on the XML documents to be signed: well-formedness, validity, and content acknowledgement using a class 3 card reader. Taken together, all three checks can defeat “what you see is what you sign” attacks.
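The difference between the traditional flow and the proposed one can be illustrated with a toy model. The class below is not JXCS and not Java Card code: ToyCard is a hypothetical stand-in, an HMAC replaces the asymmetric signature (the standard library has no asymmetric cipher), and only the well-formedness check is modeled.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET


class ToyCard:
    """Stand-in for a smart card. HMAC replaces the asymmetric signature."""

    def __init__(self, key: bytes):
        self._key = key  # the key stays inside the card object

    def sign_digest(self, digest: bytes) -> bytes:
        # Traditional flow: the card signs whatever digest the host sends.
        return hmac.new(self._key, digest, hashlib.sha256).digest()

    def check_and_sign(self, document: bytes) -> bytes:
        # Proposed flow: check and hash on the card, then sign.
        try:
            ET.fromstring(document)
        except ET.ParseError as exc:
            raise ValueError("not well-formed, refusing to sign") from exc
        return self.sign_digest(hashlib.sha256(document).digest())


card = ToyCard(b"demo-key")
shown = b"<order><amount>10</amount></order>"
tampered = b"<order><amount>9999</amount></order>"

# A compromised host can display `shown` but submit the digest of `tampered`;
# the card has no way to notice.
blind_signature = card.sign_digest(hashlib.sha256(tampered).digest())

# With on-card checking, the card sees the document it signs.
checked_signature = card.check_and_sign(shown)
```

Note that a well-formed but manipulated document would still pass this single check, which is why the text pairs it with validity checking and content acknowledgement on the reader's display.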


XML Encryption and Access Control

Subtree encryption (element wise)

The two published proposals by [Imamura] and [Simon, LaMacchia] have in common that they take a complete subtree (descendant-or-self(), with or without the attributes of self()), serialize this subtree into a text representation, encrypt it using some encryption mechanism such as a symmetric cipher, and replace the unencrypted part of the document with the resulting cipher text.

Subtree encryption is an end-to-end security approach in which the document includes all sensitive information in encrypted (secured) form. It allows multiple encrypted subtrees to be included, and depending on the chosen model and granularity, it is possible to select even single attributes for encryption. In the following illustration, the “Public Nodes” do not need to be confidential (encrypted), but the one at the bottom is encrypted in the subtree.

To encrypt a subtree, the nodes that should be secured are selected:
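The select, serialize, encrypt and replace steps can be sketched as follows. This is a toy illustration and does not follow either cited proposal: toy_cipher is a deliberately simple placeholder that must not be used for real confidentiality, the EncryptedData element name is used without any of the structure a real format would define, and all function names are assumptions made for the example.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. NOT secure; a placeholder only.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_subtree(root, parent_path, tag, key):
    """Replace each matching child subtree with an EncryptedData element."""
    for parent in root.findall(parent_path):
        for index, child in enumerate(list(parent)):
            if child.tag != tag:
                continue
            tail, child.tail = child.tail, None  # keep the tail out of the ciphertext
            plain = ET.tostring(child, encoding="utf-8")  # serialize the subtree
            placeholder = ET.Element("EncryptedData")
            placeholder.text = base64.b64encode(toy_cipher(key, plain)).decode("ascii")
            placeholder.tail = tail
            parent[index] = placeholder  # replace the plaintext subtree


def decrypt_subtree(element, key):
    """Recover the original subtree from an EncryptedData element."""
    return ET.fromstring(toy_cipher(key, base64.b64decode(element.text)))
```

For a document like `<order><item>book</item><card>...</card></order>`, calling encrypt_subtree(doc, ".", "card", key) leaves the item element public and turns the card subtree into opaque text.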

 

Server-side Access Control

The server-side access control scenarios are flexible in their content model:

Server-side AC can completely restructure and rebuild the tree, based on the access control lists. It is not forced to make a complete subtree opaque; it can leave some of an element’s children visible (unencrypted) to the client without requiring the root of the subtree (self()) to be visible.
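A minimal sketch of such a rebuilt view is given below. It assumes a very simple access control list, namely a set of element names the client may see; real access control lists would be richer, and the function names and the wrapping view element are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_view(element, allowed):
    """Return the visible nodes that stand in for `element` in the client view."""
    visible_children = []
    for child in element:
        visible_children.extend(build_view(child, allowed))
    if element.tag in allowed:
        clone = ET.Element(element.tag, dict(element.attrib))
        clone.text = element.text
        clone.extend(visible_children)
        return [clone]
    # The element itself is hidden, but its visible descendants are promoted
    # to the nearest visible ancestor.
    return visible_children


def view_for(root, allowed):
    """Wrap the visible nodes in a fresh root so the result is one document."""
    view = ET.Element("view")
    view.extend(build_view(root, allowed))
    return view
```

With a record element hidden and its billing child allowed, the billing element appears directly under the patient element in the view, something subtree encryption cannot express because it hides the subtree as a whole.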

 


Axis2 Web Services Framework

In recent years many Web Services frameworks have emerged. One of the most popular open source Web Services frameworks is Apache Axis2. The Rampart module of Axis2 contains an implementation of the WS-Security standard, which allows XML Encryption and XML Signature to be applied in SOAP messages.

To use a module in the Axis2 framework, the module must be engaged in the Axis2 message flow. A flow is a collection of modules, where each module takes the incoming SOAP message context, processes it, and passes it to the next module. When the SOAP message comes to the end of the flow, it is forwarded to a Message Receiver. The Message Receiver invokes the function implemented in the Service class and passes the result to the output flow.

The Axis2 flow typically consists of three modules, namely Transport, Security, and Dispatch. The Security module processes the security elements. In particular, encrypted elements are first decrypted and then parsed by an XML parser in order to update the SOAP message context. The decrypted and validated content is then passed on to the Dispatch module. Each module in the flow and the Message Receiver can stop the SOAP message processing if an error occurs. In this case the processing is terminated and an appropriate SOAP fault is returned.

We distinguish between two types of server responses. We say that a security fault is returned, if the server replies with a WSDoAllReceiver: security processing failed message. If an application-specific error or no error message is returned, then we say that the server replies with an application response.
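The flow and the two response types can be modeled in a few lines. The sketch below is a conceptual model only: Axis2 is a Java framework and none of these names come from its API. The message context is a plain dictionary, the decryption step is a placeholder, and the "echo" operation is invented for the example.

```python
class SoapFault(Exception):
    """Raised by a module or the receiver to stop message processing."""


def transport(ctx):
    ctx["received"] = True
    return ctx


def security(ctx):
    # Placeholder for decrypting and parsing the encrypted elements.
    if not ctx.get("decryptable", True):
        raise SoapFault("security processing failed")
    ctx["body"] = ctx.pop("encrypted_body")
    return ctx


def dispatch(ctx):
    ctx["operation"] = ctx["body"].get("operation")
    return ctx


def message_receiver(ctx):
    # Stands in for invoking the function implemented in the service class.
    if ctx["operation"] != "echo":
        raise SoapFault("unknown operation")
    return {"result": ctx["body"]["payload"]}


IN_FLOW = [transport, security, dispatch]


def handle(ctx):
    """Run the flow and classify the reply as the text above does."""
    try:
        for module in IN_FLOW:
            ctx = module(ctx)
        return ("application response", message_receiver(ctx))
    except SoapFault as fault:
        is_security = "security processing failed" in str(fault)
        kind = "security fault" if is_security else "application response"
        return (kind, {"fault": str(fault)})
```

A message that fails in the Security module yields a security fault, while an error raised later, or no error at all, counts as an application response.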


Parsing and representing XML and HTML documents with SWI-Prolog

The core of the Web is formed by document standards and exchange protocols. Here we describe tree-structured documents transferred as SGML or XML. HTML, an SGML application, is the most commonly used document format on the Web. HTML represents documents as a tree using a fixed set of elements (tags), where the SGML DTD (Document Type Definition) puts constraints on how elements can be nested. Each node in the hierarchy has a name (the element name), a set of name-value pairs known as its attributes, and content, which is a sequence of sub-elements and text (data).

XML is a rationalisation of SGML using the same tree model, but removing many rarely used features as well as abbreviations that were introduced in SGML to make the markup easier to type and read by humans. XML documents are used to represent text using custom application-oriented tags as well as a serialization format for arbitrary data exchange between computers. XHTML is HTML based on XML rather than SGML. A stable Prolog term representation for SGML/XML trees plays a role similar to that of the DOM (Document Object Model) representation in use in the object-oriented world.

Some issues have been subject to debate.

  1. Representation of text by a Prolog atom is biased by the use of SWI-Prolog, which has no length limit on atoms and whose atoms can represent Unicode text. At the same time, SWI-Prolog stacks are limited to 128 MB each. Using atoms, only the structure of the tree is represented on the stack, while the bulk of the data is stored on the unlimited heap. Using lists of character codes is another possibility, adopted by both PiLLoW and ECLiPSe. Two observations make lists less attractive: lists use two cells per character, while practical experience shows text is frequently processed as a unit only. For (HTML) text documents we profit from the compact representation of atoms. For XML documents representing serialized data structures we profit from frequent repetition of the same value.
  2. Attribute values of multi-value attributes (e.g. NAMES) are returned as a Prolog list. This implies the DTD must be available to get unambiguous results. With SGML this is always true, but not with XML.
  3. Optionally attribute values of type NUMBER or NUMBERS are mapped to Prolog numbers. In addition to the DTD issues mentioned above, this conversion also suffers from possible loss of information. Leading zeros and different floating point number notations used are lost after conversion. Prolog systems with bounded arithmetic may also not be able to represent all values. Still, automatic conversion is useful in many applications, especially involving serialized data-structures.
  4. Attribute values are represented as Name=Value. Using Name(Value) is an alternative. The Name=Value representation was chosen for its similarity to the SGML notation and because it avoids the need for univ (=..) for processing argument-lists.
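To make the shape of such a tree representation concrete, here is a rough Python analogue. It mirrors the element(Name, Attributes, Content) term that SWI-Prolog's SGML library produces, with attributes as name-value pairs and content as a mixed sequence of text and sub-elements, but it is only an analogy: the tuple layout and the function name are assumptions, and no DTD-driven conversion of attribute values is attempted.

```python
import xml.etree.ElementTree as ET


def to_term(element):
    """Map an element to ('element', name, [(attr, value), ...], content)."""
    content = []
    if element.text:
        content.append(element.text)  # text is kept as one unit, like an atom
    for child in element:
        content.append(to_term(child))
        if child.tail:
            content.append(child.tail)  # text following the child element
    return ("element", element.tag, sorted(element.attrib.items()), content)
```

For `<p class="x">Hi <b>there</b>!</p>` this yields a p node whose content is the text "Hi ", a nested b node, and the text "!".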

Implementation: The SWI-Prolog SGML/XML parser is implemented as a C library that has been built from scratch to arrive at a lightweight parser. The total source is 11,835 lines. The parser provides two interfaces. Most natural to Prolog is load_structure(+Src, -DOM, +Options), which parses a Prolog stream into a term as described above. Alternatively, sgml_parse/2 provides an event-based parser making call-backs on Prolog for the SGML events. The call-back mode can deal with unbounded documents in streaming mode. It can be mixed with the term-creation mode, where the handler for begin calls the parser to create a term representation for the content of the element. This feature is used to process long files with a repetitive record structure in limited memory.
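The idea of mixing streaming with per-record tree construction has a close analogue in Python's standard library, sketched below. This is not the SWI-Prolog interface; it uses ElementTree's incremental parser, and the function name and the flat record layout are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(stream, record_tag):
    """Yield one dict per record element, releasing memory as we go."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # The record's subtree is complete here; turn it into a small value.
            yield {child.tag: child.text for child in elem}
            root.clear()  # drop records that have already been processed
```

Only one record is held in memory at a time, so the length of the input file does not bound what can be processed.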


Blind XPath Injection

Blind XPath Injection is an attack that enables an attacker to extract a complete XML document used for XPath querying, without prior knowledge of the XPath query. The attack is considered “complete” since all possible data is exposed. The attack makes use of two techniques: XPath crawling and Booleanization of XPath queries. Using this attack, it is possible to get hold of the XML “database” used in the XPath query. This can be most powerful against sites that use XPath queries (and XML “databases”) for authentication, searching and other uses.

Compared to the SQL injection attacks, XPath Injection has the following upsides:

1. Since XPath is a standard (yet rich) language, it is possible to carry the attack ‘as-is’ for any XPath implementation. This is in contrast to SQL injection where different implementations have different SQL dialects (there is a common SQL language, but it is often too weak).

2. The XPath language can reference almost all parts of the XML document without access control restrictions, whereas with SQL, a “user” (which is a term undefined in the XPath/XML context) may be restricted to certain tables, columns or queries. So the outcome of the Blind XPath Injection attack is guaranteed to consist of the complete XML document, i.e. the complete database.

It is possible to take a more systematic approach to the XPath Injection problem. This approach is called “blind injection” (its foundations were laid in the SQL injection context). It assumes more or less nothing about the structure of the query except that the user data is injected in a Boolean expression context. It enables the attacker to extract a single bit of information per query injection. This bit is realized, for example, as “Login successful” or “Login failed”.

This approach is even more powerful with XPath than it is with SQL, due to the characteristics of XPath described above: a standardized language and the absence of access control within the document.

The technique we use is as follows:

We first show how to crawl an XML document using only scalar XPath queries (that is, queries whose return type is “string”, “numeric” or “Boolean”). The crawling procedure assumes no knowledge of the document structure; yet at its end, the document, in its completeness, is reconstructed.

We then show how a scalar XPath query can be replaced by a series of Boolean queries. This procedure is called a “Booleanization” of the query. A Boolean query is a query whose result is a Boolean value (true/false). So in a Booleanization process, a query whose result type is string or numeric is replaced with a series of queries whose result type is Boolean, and from which we can reconstruct the result of the original string or numeric query.

Finally, each Boolean query can be resolved by a single “blind” injection. That is, we show how it is possible to form an injection string, including the Boolean query, that when injected into an XPath query, causes the application to behave in one way if the Boolean query resolves into “true”, and in another way if the query resolves into “false”. This way, the attacker can determine a single bit – the Boolean query result.
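The Booleanization step can be illustrated without any XPath engine. In the sketch below the vulnerable application is replaced by a local oracle that answers one yes/no question per call about a hidden string; the questions are Python predicates, and all names are hypothetical. In XPath 1.0 the same questions might be phrased with string-length() comparisons and contains()/substring() tests, but that phrasing is not shown here.

```python
import string


def make_oracle(secret: str):
    """Local stand-in for the application: one Boolean answer per call."""
    calls = {"n": 0}

    def oracle(question) -> bool:
        calls["n"] += 1
        return bool(question(secret))

    return oracle, calls


def recover_length(oracle, max_len=1024):
    """Binary search on the length using 'is it longer than mid?' questions."""
    lo, hi = 0, max_len
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(lambda s: len(s) > mid):
            lo = mid + 1
        else:
            hi = mid
    return lo


def recover_char(oracle, position, alphabet):
    """Halve the candidate set with 'is the character in this half?' questions."""
    candidates = alphabet
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if oracle(lambda s: s[position] in half):
            candidates = half
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates[0]


def recover_string(oracle, alphabet=string.printable):
    """Rebuild the hidden string from Boolean answers only.

    Assumes every character of the string occurs in `alphabet`.
    """
    length = recover_length(oracle)
    return "".join(recover_char(oracle, i, alphabet) for i in range(length))
```

Each call to the oracle corresponds to one injected request, so the number of requests grows with the logarithm of the alphabet size per character rather than with the alphabet size itself.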

The novelty in this approach towards XPath Injection is that it does not require much prior knowledge of the XPath query format, unlike the “traditional” approach described above. It does not require that data from the XML document be embedded in the response, and the whole XML document is eventually extracted regardless of the format of the XPath query used by the application. It uses only a difference in the application behavior resulting from a difference in the XPath query return value to extract a single information bit.

 


XML Database Benchmarks

Semi-structured data models and query languages have been widely studied. Several storage strategies and mapping schemes for XML data using a relational database have been explored. Domain-specific database benchmarks for OLTP (TPC-C), decision support (TPC-H, TPC-R, APB-1), information retrieval, spatial data management (Sequoia), etc. are available. XOO7, XMach-1 and XMark are the three benchmarks currently available that test XML management systems (XMLMS) for their query processing abilities.

Table 1. Comparing Benchmarks over XML system characteristics

The XOO7 design attempts to harness the similarities in the data models of XML and object-oriented approaches. Although XML attempts to provide a framework for handling semistructured data, it encompasses most of the modeling features of complex object models. There are straightforward correspondences between the object-oriented schemas and instances and XML DTDs and data. XOO7 is an adaptation of the OO7 Benchmark for object-oriented database systems. XOO7 provides 18 query challenges. The current implementation of XOO7 tests XML management systems which store their data locally.

XMach-1 tests multi-user features provided by the systems. The benchmark is modeled for a web application using XML data. It evaluates standard and non-standard linguistic features such as insertion, deletion, querying URL and aggregate operations. Although the proposed workload and queries are interesting, the benchmark has not been applied and no results have been published yet. XMark, developed under the XML benchmark project at CWI, is a benchmark proposed for XML data stores. The benchmark consists of an application scenario which models an Internet auction site and 20 XQuery challenges designed to cover the essentials of XML query processing. These queries have been evaluated on an internal research prototype, Monet XML, to give a first baseline.

Table 1 compares the expressive power of queries from XOO7, XMark and XMach-1. As can be seen, XOO7 is the most comprehensive benchmark in terms of XML functionalities covered. Both XMark and XMach-1 focus on a data-centric usage of XML. All three benchmarks provide queries to test relational model characteristics like selection, projection and reduction. Properties like transaction processing, view manipulation, aggregation and update are not yet tested by any of the benchmarks. XMach-1 covers delete and insert operations, although the semantics of such operations are yet to be clearly defined under the XML query model.

XOO7 is a comprehensive benchmark, as can be seen from Table 1, and empirical evaluations show the ability of the XOO7 queries to distinguish all the desired functionalities supported by an XML database. In the absence of queries exploiting the document-centric features, XMark and XMach-1 may not be able to clearly distinguish XML-enabled systems from native XML management systems.
