   
2005-08-12
XPath & XQuery



Abstract

XQuery, which builds on XPath expressions, is the language for querying XML data. This newsletter explains how XPath expressions are formed, discusses XQuery syntax, and finally shows how queries are formulated by combining XQuery syntax with XPath expressions.


Introduction

Extensible Markup Language, or XML as it is most often known, is the markup language for encoding and describing data. XML can label diverse data sources, such as structured and semi-structured documents, relational databases and object repositories. As a result, XML is fast becoming the method of choice for representing all kinds of documents, from product catalogs and digital libraries to scientific data repositories. XML is not only used to represent documents. It is also an underlying technology for many applications. For instance, SOAP uses XML messages for communication, and RDF can be written in XML. XML is also acting as a catalyst for a new form of the web known as the Semantic Web. Taken together, this means that a huge pool of data is available in XML, so there is a need for an easy yet powerful and effective tool to search it. XQuery, which builds on XPath expressions, is the language for querying XML data.

XPath

XPath is a language for finding information in an XML document. Its primary purpose is to address parts of an XML document, such as elements and attributes. It also provides basic facilities for manipulating strings, numbers and Booleans. XPath uses a path notation to navigate through the hierarchical structure of an XML document, which is where the name XPath comes from. XPath is a major element of the W3C's XSLT standard, and XPointer and XQuery are also based on XPath expressions.

What is a Path Expression?

  • XPath uses path expressions to select nodes or node-sets in an XML document. Their syntax looks similar to the path notation of a traditional file system.
  • E.g. /book/price selects the price elements that are children of a root element named book.

XPath Terminology

Nodes

XPath models an XML document as a tree of nodes. There are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment and document nodes. The root of the tree is called the document node or root node. To better understand these nodes, let's look at the following XML document.

	<?xml version="1.0" encoding="ISO-8859-1"?>
	<bookstore>
		<book>
			<title lang="en">Harry Potter</title>
			<author>J K. Rowling</author>
			<year>2005</year>
			<price>29.99</price>
		</book>
	</bookstore>
    
Example 1.0
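  • <bookstore> is the root element, also called the document element. It is the single element child of the document node (root node), which represents the document as a whole.
  • <bookstore>, <book>, <title>, <author>, <year> and <price> are element nodes.
  • lang="en" is an attribute node. It qualifies the information in the title element.
  • The character content of an element, such as Harry Potter, is held in a text node.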

Relationship amongst Nodes

The nodes in an XML document bear certain relationships to each other: parent, children, siblings, ancestors and descendants, with their obvious meanings from English. Each element and attribute has exactly one parent node. Element nodes may have zero, one or more children. Nodes that have the same parent are siblings. A node's ancestors are its parent, its parent's parent, and so on. Similarly, its descendants are its children, its children's children, and so on.
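The relationships above can be made concrete with a short sketch. This sketch assumes Python's standard xml.etree.ElementTree module. ElementTree elements do not keep a link to their parent, so the sketch first builds a child-to-parent map. The helper names (siblings, ancestors, descendants) are illustrative only and do not belong to any XPath API.

```python
import xml.etree.ElementTree as ET

# The bookstore document from Example 1.0 (XML declaration omitted).
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# ElementTree has no parent pointers, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def siblings(elem):
    # Nodes that share elem's parent, excluding elem itself.
    parent = parent_of.get(elem)
    return [] if parent is None else [c for c in parent if c is not elem]

def ancestors(elem):
    # Parent, parent's parent, and so on up to the root element.
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

def descendants(elem):
    # elem.iter() yields elem itself first, so skip it.
    return list(elem.iter())[1:]

title = root.find("book/title")
print([e.tag for e in siblings(title)])     # ['author', 'year', 'price']
print([e.tag for e in ancestors(title)])    # ['book', 'bookstore']
print([e.tag for e in descendants(root)])   # ['book', 'title', 'author', 'year', 'price']
```

Children need no helper, because list(elem) already returns an element's child elements in document order.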

Selecting Nodes

Selecting nodes within an XML document is an integral part of XPath. XPath uses path expressions to select nodes: a node is selected by following a path, that is, a sequence of steps. The most useful path expressions are listed below.

Expression	Description
nodename	Selects all child elements of the current node that are named nodename
/	Selects from the root node
//	Selects matching nodes at any depth below the current node (or below the root node, when the expression starts with //), no matter where they are
.	Selects the current node
..	Selects the parent of the current node
@	Selects attributes

Let's formulate some path expressions for selecting nodes:


	bookstore: 		Selects all bookstore elements that are children of the current node.
	/bookstore: 		Selects the root element bookstore.
	/bookstore/book: 	Selects all book elements that are children of bookstore.
	//book: 		Selects all book elements, no matter where they are in the document.
	//@lang: 		Selects all attributes that are named lang.
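These path expressions can be tried out with Python's standard ElementTree module. Note that ElementTree supports only a limited subset of XPath. It evaluates paths relative to an element, has no separate document node, and cannot return attribute nodes. The sketch below therefore rewrites each absolute expression as its relative equivalent from the bookstore element, and the rewrites are given in the comments.

```python
import xml.etree.ElementTree as ET

# Example 1.0 document; ElementTree returns the bookstore element directly.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(XML)

# /bookstore: the root element itself.
print(bookstore.tag)                        # bookstore

# /bookstore/book: book children of bookstore (relative path "book").
books = bookstore.findall("book")
print(len(books))                           # 1

# //book: book elements at any depth below bookstore.
print(len(bookstore.findall(".//book")))    # 1

# //@lang: ElementTree cannot select attribute nodes, so find the elements
# that carry a lang attribute and read its value with .get().
langs = [e.get("lang") for e in bookstore.iter() if "lang" in e.attrib]
print(langs)                                # ['en']
```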

     

Predicates

Predicates are used to find a specific node, or nodes that contain a specific value. A predicate restricts the result of a path expression to the nodes that satisfy it. Predicates are always embedded in square brackets.

Examples of predicates applied to the path expressions used above:

	/bookstore/book[1]: 	 Selects the first book element that is a child of the bookstore element.
	/bookstore/book[last()]: Selects the last book element that is a child of the bookstore element.
	//title[@lang]: 	 Selects all title elements that have an attribute named lang.
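ElementTree's limited XPath subset accepts the positional predicates [1] and [last()] and the attribute test [@lang], so the three predicates above can be checked directly. The Example 1.0 document has only one book, which would make [1] and [last()] select the same element. The sketch therefore adds a second, hypothetical book with a placeholder title and no lang attribute.

```python
import xml.etree.ElementTree as ET

# Example 1.0 (shortened) plus a hypothetical second book, so that [1]
# and [last()] select different elements.
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title>Another Title</title>
    <price>39.95</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(XML)

# /bookstore/book[1]: positions start at 1, as in XPath.
first = bookstore.find("book[1]")
# /bookstore/book[last()]
last = bookstore.find("book[last()]")
# //title[@lang]: only the titles that carry a lang attribute.
with_lang = bookstore.findall(".//title[@lang]")

print(first.findtext("title"))        # Harry Potter
print(last.findtext("title"))         # Another Title
print([t.text for t in with_lang])    # ['Harry Potter']
```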

    

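XPath also allows wildcards, which select unknown nodes in an XML document. The * wildcard matches any element node, @* matches any attribute node, and node() matches any node of any kind. Two path expressions can be combined with the | (union) operator. For example, //title | //price selects all title elements and all price elements.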
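ElementTree understands the * wildcard but has no @*, node() or | operator. This sketch uses * directly and reads .attrib in place of @*. It then emulates the union by merging two result sets without duplicates and returning them in document order, which is how XPath processors conventionally deliver node-sets. The emulation is an assumption of this example and not an ElementTree feature.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(XML)

# /bookstore/book/*: every element child of book, whatever its name.
print([e.tag for e in bookstore.findall("book/*")])
# ['title', 'author', 'year', 'price']

# //title/@*: no @* step in ElementTree, but .attrib holds all attributes.
print(bookstore.find(".//title").attrib)    # {'lang': 'en'}

# //title | //price: merge both results as a set (no duplicates), then
# walk the tree once so the union comes out in document order.
matched = set(bookstore.findall(".//title")) | set(bookstore.findall(".//price"))
union = [e for e in bookstore.iter() if e in matched]
print([e.tag for e in union])               # ['title', 'price']
```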

Evaluation of XPath Expression

In XPath 1.0, evaluating an expression yields an object of one of the following four basic types:

  • node-set (an unordered collection of nodes without duplicates)
  • boolean (true or false)
  • number (a floating-point number)
  • string (a sequence of UCS characters)

Location Path

A location path is one important kind of expression. It selects a set of nodes relative to the context node, and the result of evaluating it is the node-set that contains the selected nodes. Location paths can recursively contain expressions, inside predicates, that filter sets of nodes. Every location path can be written in a straightforward but rather verbose unabbreviated syntax, in which each step names its axis explicitly, for example child:: or descendant::. The short forms used earlier, such as book or //book, are abbreviations of this syntax.

Example of Location Path
	child::text(): 		Selects all text node children of the context node.
	child::node(): 		Selects all the children of the context node, whatever their node type.
	descendant::para: 	Selects the para element descendants of the context node.
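ElementTree has no axis syntax. This sketch therefore implements the three example steps by hand so that their meaning is visible. ElementTree stores an element's text node children as its .text string and as the .tail string of each child element. By default it also discards comments and processing instructions, so the sketch covers only element and text nodes. The fragment and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical mixed-content fragment with text and elements interleaved.
XML = "<doc>Intro <para>One</para> middle <sec><para>Two</para></sec></doc>"
doc = ET.fromstring(XML)

def child_text(elem):
    """child::text(): the text node children of elem, held by
    ElementTree in elem.text and in each child's .tail."""
    parts = [elem.text] + [c.tail for c in elem]
    return [p for p in parts if p]

def child_node(elem):
    """child::node(): element and text children in document order."""
    result = [elem.text] if elem.text else []
    for c in elem:
        result.append(c)
        if c.tail:
            result.append(c.tail)
    return result

def descendant(elem, name):
    """descendant::name: matching elements below elem (never elem itself)."""
    return [e for e in elem.iter(name) if e is not elem]

print(child_text(doc))                               # ['Intro ', ' middle ']
print([n if isinstance(n, str) else n.tag for n in child_node(doc)])
# ['Intro ', 'para', ' middle ', 'sec']
print([p.text for p in descendant(doc, "para")])     # ['One', 'Two']
```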
    

A location path can be relative or absolute. A relative location path consists of a sequence of one or more location steps separated by /. The steps are composed from left to right: each step in turn selects a set of nodes relative to a context node, and every node selected by one step becomes a context node for the next step. An absolute location path consists of / optionally followed by a relative location path. A / by itself selects the root node of the document containing the context node.
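The left-to-right composition of steps can be sketched as a loop that feeds each step's result into the next step. This sketch supports only child-axis name tests (and *), which is a deliberate simplification. ElementTree has no document node, so the sketch wraps the root element in a hypothetical stand-in element that plays that role for absolute paths.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title>Another Title</title><price>39.95</price></book>
</bookstore>"""

def evaluate(context_nodes, steps):
    """Evaluate a relative location path written as a list of child-axis
    name tests, e.g. ["book", "title"] for book/title."""
    nodes = list(context_nodes)
    for name in steps:                        # steps compose left to right
        selected = []
        for node in nodes:                    # every node is a context node
            for child in node:                # child axis
                if (name == "*" or child.tag == name) and child not in selected:
                    selected.append(child)    # node-set: no duplicates
        nodes = selected
    return nodes

bookstore = ET.fromstring(XML)

# Stand-in for the document (root) node, needed for absolute paths.
document_node = ET.Element("document-node")
document_node.append(bookstore)

# Absolute path /bookstore/book/title starts at the document node.
titles = evaluate([document_node], ["bookstore", "book", "title"])
print([t.text for t in titles])   # ['Harry Potter', 'Another Title']

# Relative path book/price with bookstore as the context node.
prices = evaluate([bookstore], ["book", "price"])
print([p.text for p in prices])   # ['29.99', '39.95']
```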

Operators

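XPath supports the basic arithmetic, comparison and logical operators: +, -, *, div, mod, =, !=, <, <=, >, >=, and, or. Division is written div because / is already the path separator. The | operator computes the union of two node-sets.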
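Two of these operators behave differently from their Python counterparts, and this sketch mimics them. XPath numbers are IEEE 754 doubles, so div by zero yields an infinity or NaN instead of an error. XPath 1.0's mod truncates, so its result has the sign of the dividend: the specification gives 5 mod 2 = 1, 5 mod -2 = 1, -5 mod 2 = -1 and -5 mod -2 = -1. The function names are illustrative.

```python
import math

def xpath_div(a, b):
    """XPath 'div': double division; dividing by zero gives +/-infinity
    or NaN instead of raising ZeroDivisionError as Python does."""
    a, b = float(a), float(b)
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The sign of the infinity depends on the signs of both operands.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a, b):
    """XPath 'mod': remainder of truncating division (sign of the dividend).
    Python's % operator floors instead."""
    if b == 0:
        return math.nan
    return math.fmod(a, b)

print(xpath_div(7, 2))    # 3.5
print(xpath_div(1, 0))    # inf
for a, b in [(5, 2), (5, -2), (-5, 2), (-5, -2)]:
    print(f"{a} mod {b} = {xpath_mod(a, b):g}  (Python: {a % b})")
```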

XQuery

XQuery is a query language that uses the structure of XML intelligently to express queries across data that is either physically stored as XML or viewed as XML via middleware. It is designed so that queries are concise and easily understood, and it is flexible enough to query XML data held in documents or databases. XQuery is derived from an XML query language called Quilt, which in turn borrowed features from several other languages, such as XPath, XQL and SQL. XQuery is defined by the W3C and is supported by major database vendors such as IBM, Microsoft, Oracle and Software AG. It is best summed up by the phrase "XQuery is to XML what SQL is to database tables": it is the language for finding and extracting elements and attributes from XML documents.

Basics

The basic building block of XQuery is the expression, which is a string of Unicode characters. XQuery provides several kinds of expressions, which are built from keywords, symbols and operands. Like XML, XQuery is case sensitive. XQuery 1.0 and XPath 2.0 share the same data model and support the same functions and operators. XQuery is compatible with several W3C standards, such as XML, Namespaces, XSLT, XPath and XML Schema. XQuery 1.0 is, however, not yet a final W3C Recommendation.

Query Formulation

Functions

XQuery uses functions to extract data from XML documents. It provides a wide range of built-in functions covering string manipulation, date and time conversion, node manipulation, Boolean values, numeric values and more.

Path Expressions

XQuery uses path expressions to navigate through the XML document. The path expressions along with functions are collectively applied on the XML document to retrieve the data.

Predicate

XQuery uses predicates to restrict the extracted data from the XML document. Predicates are combined with the path expression.

FLWOR

The elements can be selected and filtered in XQuery either by using path expressions along with predicates or using FLWOR expression. FLWOR is an acronym that stands for:

  • For - binds a variable to each item returned by the [in] expression
  • Let - allows variable assignment (optional)
  • Where - specifies a filter condition (optional)
  • Order by - specifies the sort-order of the result (optional)
  • Return - specifies what to return in the result

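The advantage of a FLWOR expression is its SQL-like syntax. It also adds functionality and flexibility that path expressions alone lack, such as binding intermediate results to variables (let) and sorting (order by).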

    	for $x in doc("book.xml")/bookstore/book
	where $x/price>30
	order by $x/title
	return $x/title
    

In plain English, the FLWOR expression above reads as follows. Each book element in the document "book.xml" is bound in turn to the variable $x. The query keeps only the books whose price is greater than 30, sorts them by title, and returns their title elements.
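As a rough analogy, and not an XQuery processor, the four clauses map onto ordinary Python steps, and each clause appears as a comment above its counterpart. The contents of book.xml are hypothetical: Example 1.0 plus two made-up books, so that the price filter and the ordering have some effect. The price text is converted to a number, in the same way that XQuery casts untyped content when it compares with 30.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of book.xml (two made-up books added).
BOOK_XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title>Zebra Guide</title><price>45.00</price></book>
  <book><title>Apple Recipes</title><price>35.50</price></book>
</bookstore>"""

def flwor(bookstore):
    # for $x in doc("book.xml")/bookstore/book
    books = bookstore.findall("book")
    # where $x/price > 30  (price text compared as a number)
    matching = [x for x in books if float(x.findtext("price")) > 30]
    # order by $x/title
    matching.sort(key=lambda x: x.findtext("title"))
    # return $x/title
    return [x.find("title") for x in matching]

titles = flwor(ET.fromstring(BOOK_XML))
print([ET.tostring(t, encoding="unicode") for t in titles])
# ['<title>Apple Recipes</title>', '<title>Zebra Guide</title>']
```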

Basic Syntax Rules

  • XQuery is case sensitive, so a variable named $Book is not the same as one named $book.
  • XQuery element, attribute and variable names must be valid XML names.
  • An XQuery string value can be in single or double quotes.
  • An XQuery variable is written as a $ followed by a name, e.g. $bookstore.
  • XQuery comments are delimited by (: and :), e.g. (: XQuery Comment :)

XQuery Conditional Expression

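XQuery also supports the conditional expression if-then-else. It is commonly used inside the return clause of a FLWOR expression to return different results for different items. Unlike in many programming languages, the else branch is required, although it may simply be the empty sequence ().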
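The following Python analogy shows an if-then-else inside a return clause, with the XQuery version in the comment for reference. The element names expensive and cheap, the price threshold and the second book are hypothetical. Python's conditional expression has the same mandatory-else shape as XQuery's if.

```python
import xml.etree.ElementTree as ET

# Python analogy of this XQuery (element names are hypothetical):
#   for $x in doc("book.xml")/bookstore/book
#   return if ($x/price > 30)
#          then <expensive>{data($x/title)}</expensive>
#          else <cheap>{data($x/title)}</cheap>
BOOK_XML = """<bookstore>
  <book><title lang="en">Harry Potter</title><price>29.99</price></book>
  <book><title>Another Title</title><price>39.95</price></book>
</bookstore>"""

def classify(bookstore):
    results = []
    for x in bookstore.findall("book"):
        # Both branches must be present, as in XQuery's if-then-else.
        tag = "expensive" if float(x.findtext("price")) > 30 else "cheap"
        e = ET.Element(tag)
        e.text = x.findtext("title")
        results.append(e)
    return results

for e in classify(ET.fromstring(BOOK_XML)):
    print(ET.tostring(e, encoding="unicode"))
# <cheap>Harry Potter</cheap>
# <expensive>Another Title</expensive>
```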

XQuery Comparison

In XQuery there are two ways of comparing values:

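  • General comparisons: =, !=, <, <=, >, >=. These operators can compare whole sequences. The comparison is true if any item of the left operand and any item of the right operand satisfy the condition. For example, $bookstore//book/price > 30 is true if at least one book has a price greater than 30.
  • Value comparisons: eq, ne, lt, le, gt, ge. These operators compare single values, so each operand must evaluate to at most one item. For example, $bookstore//book/price gt 30 is true only if the expression returns exactly one price and that price is greater than 30. If it returns more than one price, an error is raised.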
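The difference between the two kinds of comparison can be sketched on plain Python lists that stand in for XQuery sequences. This is a simplification: the sketch assumes the operands are already atomized numbers, and it ignores XQuery's type casting rules. Following XQuery 1.0, a value comparison with an empty operand returns the empty sequence (an empty list here), and one with a multi-item operand raises the type error XPTY0004.

```python
import operator

def general_compare(op, left, right):
    """General comparison (=, >, ...): true if ANY pair of items, one from
    each sequence, satisfies op. Empty sequences give false."""
    return any(op(a, b) for a in left for b in right)

def value_compare(op, left, right):
    """Value comparison (eq, gt, ...): each operand must be a single item."""
    if not left or not right:
        return []                      # the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("XPTY0004: value comparison requires single items")
    return op(left[0], right[0])

prices = [29.99, 39.95]                # e.g. the values of //book/price

print(general_compare(operator.gt, prices, [30]))    # True (39.95 > 30)
print(value_compare(operator.gt, [29.99], [30]))     # False
try:
    value_compare(operator.gt, prices, [30])
except TypeError as err:
    print(err)                         # XPTY0004: value comparison requires single items
```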

Type Specification

  • XQuery is a strongly typed language.
  • Like Java, C# and other programming languages, it mixes static typing and dynamic typing.
  • Types in XQuery are different from classes in object-oriented programming.
  • Types correspond to XQuery's data model, and the built-in atomic types, such as xs:integer and xs:string, come from XML Schema.

Conclusion

  • XQuery is a powerful and convenient tool for analyzing or generating XML.
  • XQuery is protocol independent, so a query can be evaluated on any system with predictable results.
  • There is no single standard implementation, but the W3C XQuery site lists known implementations.
  • XQuery is compatible with other W3C standards, namely XML 1.1, Namespaces, XML Schema and XPath.
