XPath in Arbortext APP

Summary

This document describes the technical details required to use XPath statements in Arbortext APP.

4 Arbortext APP XPath examples

4.4 Add attribute values and show to variable

4.6 Output if "tag" starts with the text "section"

4.7 Combine/display values using concat-nodes

4.8 Count descendants of a stream from a script

4.9 Arbortext APP variables in XPath

4.10 Setting context node at a character position

5 Further Information and References

1 Introduction

This section is designed to introduce XPath and its power when combined with Arbortext APP. It covers the fundamentals of XPath and how it is incorporated with Arbortext APP in order to provide an easy, concise and practical tool for developing projects. This section also supplies the information needed to build customised XPath expressions and demonstrates how XPath can be used to address an XML document as required. However, this section is not a full XPath manual. XPath is a standard and the full specification is widely available in books and online at www.w3.org/TR/xpath. It is assumed that the reader is familiar with Arbortext APP and it is recommended that you read the XML chapter before reading the XPath chapter.

1.1	What is XPath

The XPath language was created in order to fulfil the operational needs of XSLT, the XSL transformation language, and XPointer, a method to address fragments of external XML documents (see <?xpointer>). Both needed a method of addressing parts of an XML document using standardized syntax. However, XPath covers only the core functionality shared by these standards, as their requirements are not identical. Consequently, Arbortext APP uses XPath as a foundation and adds its own extensions for addressing XML documents. XPath is a recommendation of the World Wide Web Consortium (W3C).

1.2	Why XPath in Arbortext APP

The addition of the XPath syntax greatly enhances Arbortext APP, allowing template creation and design to be simple and quick. XPath can be used to gain information about context, to test for the occurrence of particular elements, attributes and text and also to locate and address all parts of the XML instance.

XPath usage minimises the need for counters, yanks and other show string expressions as each test or manipulation can be performed when and where it is required.

2 Expressions

XPath treats an XML document as having a tree structure. To illustrate this the image below shows a simple XML instance and its tree structure representation:

XPath uses genealogical terms to describe the levels of the tree relative to either the root of the document or the currently selected point. For example, in the instance above, <doc> is the 'parent' of <chapter>.

Additionally, in the tree structure there are seven node types, one for each type of construct that can appear in an XML document:

element

attribute

text

processing instruction

comment

namespace

document root

Below is an example of a tree structure that demonstrates six node types. The Doc, Chap and Para nodes are elements.

The key to addressing parts of an XML document is the location path. A location path describes how something may be found. By way of analogy, you can think of a location path as describing where a certain paragraph is in a document, for example, "the second paragraph in the third chapter".

The context node is where you are in a document. A location path consists of one or more location steps separated by a slash '/' and can be considered as similar to describing a directory structure. The context node for each step is the result of the previous.

An XPath location step consists of a minimum of two parts; the axis and the node-test. It can also have an optional third part; the predicates, for example:

axis::nodetest[predicate]*

The double colon separates the axis from the node-test and the predicate is enclosed in square brackets. The * at the end indicates that there can be zero, one or more than one predicate.

2.1	Expression components

A node-test identifies a type of node in the document. Usually, it is the name of the element but it may also be a function that identifies a node type. The axis distinguishes between these node-tests: child::para finds the para elements that are children of the context node while preceding-sibling::para finds the preceding para elements that are siblings of the context node.

Below is a list of the axes and node-tests available in the XPath language to be used when making XPath expressions.

2.2	Axis values and node-tests

Axis	Semantics
child	A directly nested node.
descendant	Any nested element of a given node.
parent	The direct ancestor of a given node.
ancestor	Any ancestor of a given node (e.g. grandparent).
attribute	An attribute of a given element.
following	Any node following a given node in document order.
preceding	Any node preceding a given node in document order.
following-sibling	Any node that follows a given node in document order and is in the same nesting level (or DTD content model if you like).
preceding-sibling	Any node that precedes a given node in document order and is in the same nesting level (or DTD content model if you like).
namespace	The namespace nodes of a given element.
self	The current node.
descendant-or-self	Same as descendant and the current node.
ancestor-or-self	Same as ancestor and the current node.
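
As an illustration only, several of the axes listed above can be mimicked in Python with the standard library module xml.etree.ElementTree. This is a sketch and not Arbortext APP syntax: the sample document and every helper name (children, descendants, ancestors and so on) are assumptions made for the example, and only element nodes are modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = ET.fromstring(
    "<doc>"
    "<chapter id='c1'><para>one</para><para>two</para><note>n</note></chapter>"
    "<chapter id='c2'><para>three</para></chapter>"
    "</doc>"
)

# ElementTree keeps no parent pointers, so build a child -> parent lookup.
PARENT = {child: parent for parent in DOC.iter() for child in parent}


def children(node):
    """child::* (element children in document order)."""
    return list(node)


def descendants(node):
    """descendant::* (every nested element, excluding the node itself)."""
    return [n for n in node.iter() if n is not node]


def parent(node):
    """parent::* (None for the document element)."""
    return PARENT.get(node)


def ancestors(node):
    """ancestor::* (nearest ancestor first)."""
    found = []
    while node in PARENT:
        node = PARENT[node]
        found.append(node)
    return found


def following_siblings(node):
    """following-sibling::* (later elements with the same parent)."""
    siblings = list(PARENT[node])
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling::* (earlier elements with the same parent)."""
    siblings = list(PARENT[node])
    return siblings[:siblings.index(node)]


first_chapter = DOC[0]
first_para = first_chapter[0]
chapter_id = first_chapter.get("id")  # attribute::id
```

Each helper returns plain Python lists, so the results can be inspected directly, for example [n.tag for n in ancestors(first_para)].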

A node-test selects a set of nodes using the chosen axis. The principal node type of the axis (attribute for the attribute axis, namespace for the namespace axis and element for all other axes) affects the node-test.

node-test	Semantics
*	Any node of the principal node type of the axis.
memo	Elements or attributes (depending on the principal node type of the axis) named memo e.g. <memo id="red">.
text()	Any text node e.g. ….>some text<….
processing-instruction()	Any processing instruction node e.g. <?A3B2 th=12pt?>
node()	Any node.
comment()	Any comment node e.g. <!--comment-->

The axis and the node-test together form the 'basis' of the location step.

2.3	Predicates

A predicate is an expression which filters a node set with respect to an axis to produce a new node set. Here are a few examples:

Predicate	Semantics
[position()<=5]	True for the first five nodes.
[/book/preface]	True for all nodes, if the node set /book/preface is not empty.
[position()=last()]	True for the last node.
[substring(string(),2,4)="elec"]	True for each node whose string value contains the four characters "elec" starting at the second character.

A predicate can be added to any step of a location path. For example, adding a predicate to the first step of the location path /child::chap/attribute::id gives:

/child::chap[position()<3]/attribute::id

which effectively retrieves the id attribute of the first two <chap> elements.

Predicates can be chained, with the result of each predicate passed to the next, for example:

para[3][@ID="something"]

This first selects the third para child, then tests if its ID attribute equals "something".
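
The left-to-right behaviour of chained predicates can be sketched in Python. This is only an emulation for illustration: the sample document and the helper step() are assumptions made for the example, and positions are 1-based as in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with four para children.
context = ET.fromstring(
    "<chap>"
    "<para ID='a'>first</para>"
    "<para ID='something'>second</para>"
    "<para ID='something'>third</para>"
    "<para ID='something'>fourth</para>"
    "</chap>"
)


def step(nodes, *predicates):
    """Apply predicates left to right.

    Each predicate receives (node, position, last) where position and
    last refer to the node set left over by the previous predicate.
    """
    for predicate in predicates:
        last = len(nodes)
        nodes = [n for pos, n in enumerate(nodes, start=1) if predicate(n, pos, last)]
    return nodes


def is_third(node, pos, last):
    return pos == 3


def has_id_something(node, pos, last):
    return node.get("ID") == "something"


paras = context.findall("para")  # child::para

# para[position()<3]
first_two = step(paras, lambda node, pos, last: pos < 3)

# para[position()=last()]
last_one = step(paras, lambda node, pos, last: pos == last)

# para[3][@ID="something"]: take the third para, then test its ID.
third_then_id = step(paras, is_third, has_id_something)

# para[@ID="something"][3]: filter by ID first, then take the third survivor.
id_then_third = step(paras, has_id_something, is_third)
```

In this sample the two chained forms select different nodes (the third and the fourth para respectively), which shows why the order of predicates matters.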

2.4	Examples of axis values

Below is an example of a basic XML document.

The shaded area shows the current context.

This location step, using the child axis:

child::para

would point us to the para element.

The parent axis used like this

parent::folder

would point us to the root element folder of the XML document.

The ancestor axis used like this

ancestor::folder

would point us to the root element folder of the XML document.

The descendant axis used like this

descendant::para

would point us to the para element with element content "Second paragraph".

The attribute axis used like this

attribute::id

would point us to the id attribute.

With this basis, the location steps can now be combined. Using the example XML document above, we will use an XPath expression to point to the chap ids. The first step is to set the context node to the root of the node tree. This is done by starting the XPath expression with the / (slash) character, which is the abbreviated syntax for the root.

Next, we need to step to its child nodes, the nested chap elements. This can be done using the child axis:

child::chap

The final step uses the attribute axis to point to the id attributes:

attribute::id

These three steps can be combined by placing a / between the location steps:

/child::chap/attribute::id

This is similar to the method used to describe directory structures in Unix.

2.5 Functions

XPath functions can be used to manipulate or test the node-set that results from the location path.

There are four categories of functions: node-set, string, number and Boolean functions.

2.6	Node set functions

Function	Semantics
last()	Returns a number equal to the number of nodes which match the axis. It can therefore be used to denote the last node in a set, for example position()=last().
position()	Returns a number equal to the position of the selected context. This function can be used in a predicate to test if a node exists at a position. For example the predicate [position()=3] will return a value of true if there is a third node of the type specified by the axis and node test.
count()	Returns the number of nodes specified by the argument within its parentheses. It can also be used within a predicate to test if that number of nodes exists. For example, the predicate [count(para)=2] will select those elements within the set specified by the basis which contain exactly two para children.
name()	Returns a string which is equal to the name of the node selected by the argument specified within the parentheses. As an example, name(child::node()[position()=2]) would return the name of the second child node of the context node.
id()	Selects a node which has an ID attribute which matches the argument specified in the parentheses.
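
The following Python sketch mirrors count(), last() and name() on a small tree. It is an illustration under stated assumptions rather than Arbortext APP code: the sample document is invented, and xml.etree.ElementTree is used only as a convenient tree model.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document containing no text nodes.
doc = ET.fromstring(
    "<doc>"
    "<sect><para/><para/></sect>"
    "<sect><para/></sect>"
    "<sect><para/><para/><note/></sect>"
    "</doc>"
)

sects = doc.findall("sect")  # child::sect

# count(sect): the size of the node set.
sect_count = len(sects)

# sect[count(para)=2]: keep the sect elements with exactly two para children.
two_para_sects = [s for s in sects if len(s.findall("para")) == 2]

# sect[position()=last()]: the last node of the set.
last_sect = sects[-1]

# name(child::node()[position()=2]) evaluated on the last sect.
# The sample has no text nodes, so the second child node is an element.
second_child_name = last_sect[1].tag
```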

2.7	String functions

String functions operate on nodes as text and allow comparisons, concatenation and manipulation. String functions simplify tasks which would usually involve complicated yanks, show strings and mini-scripts.

Function	Semantics
string()	Converts the value returned by the argument specified in the parentheses to a string. This means that if a number 5 is returned, it is converted to the string "5".
concat()	Uses two or more arguments within the parentheses and simply joins the results together. The arguments can be location paths or literal strings enclosed within quotes. Arguments are separated by commas. Example: concat(child::para[position()=2], ' with ', child::para[position()=3]) joins the text of the second para child with the string ' with ' and the text of the third para child.
starts-with()	Takes two arguments and returns true if the string from the first argument starts with the string from the second, and false otherwise. For example, starts-with(child::para[position()=3], "X") would return a value of true if the text of the third para child of the context node starts with "X". Both arguments can be location paths.
contains()	Similar to the starts-with() function, contains() returns true if the first argument string contains the second argument string. For example, contains(child::para[position()=3], "hello") would return a value of true if the text of the third para child of the context node contains the string "hello". Again, both arguments can be location paths.
substring()	Uses three arguments to return the substring of the string specified by the first argument, starting at the position declared by the second argument and with a length declared by the third argument. For example: substring("12345", 2, 3) would return "234". It can also be used to return a true or false, for example the predicate [substring(string(), 4,3)="big"] would be true if the location specified by the preceding basis has the three character string "big" starting at the fourth character.
substring-before()	Returns the characters from the string specified by the first argument up to, but not including the string specified by the second argument. For example, substring-before("abcdefghij","f") would return "abcde".
substring-after()	Similar to substring-before(), but it returns the characters after the string declared by the second argument. For example, substring-after("abcdefghij", "f") would return "ghij".
string-length()	Returns the number of characters in the string declared by the argument contained within the parentheses. For example string-length(child::para[position()=3]) would return 24 if the string value of the third para child of the context node contains 24 characters. Note that in XPath 1.0 the string value of an element includes the text of all its descendants, so characters within its grandchildren are counted.
normalize-space()	Takes the string declared by the argument in the parentheses, removes any leading or trailing whitespace and reduces any run of multiple whitespace characters to a single space.
translate()	The translate() function returns the string declared by the first argument but with all occurrences of the characters declared by the second argument string replaced by the characters at the corresponding position declared in the third argument string. For example, translate("12345","24","78") would give "17385". That is, the characters "2" and "4" were replaced by "7" and "8".
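
To make the string function semantics above concrete, here is a minimal Python emulation of a few of them. The Python function names are assumptions made for this sketch, substring() is simplified to integer arguments, and the code is not how Arbortext APP or libxml implements these functions.

```python
import re


def substring(s, start, length=None):
    """XPath substring() for integer arguments; positions are 1-based."""
    end = len(s) + 1 if length is None else start + length
    # Keep the characters whose 1-based position p satisfies start <= p < end.
    return "".join(ch for p, ch in enumerate(s, start=1) if start <= p < end)


def substring_before(s, marker):
    """Text before the first occurrence of marker, or "" if it is absent."""
    index = s.find(marker)
    return s[:index] if index >= 0 else ""


def substring_after(s, marker):
    """Text after the first occurrence of marker, or "" if it is absent."""
    index = s.find(marker)
    return s[index + len(marker):] if index >= 0 else ""


def normalize_space(s):
    """Collapse runs of space, tab, CR and LF to one space and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s, from_chars, to_chars):
    """Replace each character of from_chars by its counterpart in to_chars.

    Characters of from_chars with no counterpart are removed; if a
    character is listed more than once, its first occurrence is used.
    """
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch not in mapping:
            mapping[ch] = to_chars[i] if i < len(to_chars) else ""
    return "".join(mapping.get(ch, ch) for ch in s)
```

The examples from the table can be checked directly, for instance translate("12345", "24", "78") or substring("12345", 2, 3).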

2.8	Number functions

Number functions deal with numerical values and allow mathematics to be performed within expressions.

Function	Semantics
number()	Converts the value given by the argument in the parentheses to its numerical value. Therefore, if the argument returns the string "7", it would be converted to the number 7. A value of true is converted to a value of 1 and false to 0. If the value is not a number, "NaN" (Not a Number) is returned.
sum()	Returns the sum of the numerical values of the node set declared by the argument within the parentheses. For example, sum(child::para/attribute::number) would return the sum of all the number attributes of the para children of the context node.
floor()	Simply rounds numerical values down to the nearest integer.
ceiling()	Rounds numerical values up to the nearest integer.
round()	Rounds numerical values to the nearest integer, up or down.
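
The conversions and rounding rules above can be sketched in Python as follows. The helper names xpath_number and xpath_round and the sample element are assumptions made for the example. Note that XPath 1.0 rounds a value exactly halfway between two integers towards positive infinity, which differs from Python's built-in round().

```python
import math
import re
import xml.etree.ElementTree as ET

# An XPath 1.0 number literal with optional surrounding whitespace.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(value):
    """Sketch of number(): booleans become 1 or 0, bad strings become NaN."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    return float(value) if _NUMBER.fullmatch(value) else math.nan


def xpath_round(x):
    """Sketch of round(): halves go towards positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))


# sum(child::para/attribute::number) on a hypothetical context node.
context = ET.fromstring("<sect><para number='2'/><para number='3.5'/><para/></sect>")
total = sum(
    xpath_number(p.get("number"))
    for p in context.findall("para")
    if p.get("number") is not None
)
```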

2.9	Boolean functions

Boolean values are true and false. We have already seen that these can be converted to their numerical equivalents. It is therefore possible to test if something exists, test its content etc. and get a simple result. There are two functions worth mentioning.

Function	Semantics
boolean()	Returns true for a number if it is neither zero nor NaN, for a string if its length is not zero, and for a node-set if it is not empty. Otherwise it returns false.
not()	Simply changes a Boolean true to false and false to true.
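
As a rough illustration of the XPath 1.0 conversion rules behind boolean(), the sketch below uses a Python list to stand in for a node-set. The function name xpath_boolean is an assumption made for the example.

```python
import math


def xpath_boolean(value):
    """Sketch of boolean() for booleans, numbers, strings and node-sets."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return not (value == 0 or math.isnan(value))
    # Strings are true when non-empty; a list models a node-set and is
    # true when it contains at least one node.
    return len(value) > 0


def xpath_not(value):
    """Sketch of not(): the inverse of the boolean conversion."""
    return not xpath_boolean(value)
```

One consequence worth noting is that the string "false" converts to true, because only its length is tested.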

2.10	Abbreviations

XPath allows for some of the more commonly used parts of an expression to be abbreviated to help readability and compactness. Some examples are given in the following list. The important ones are that the axis child:: can be omitted and that attribute can be abbreviated to "@". The parent of the context node can be abbreviated to ".." while "." selects the context node. The last one is that [position()=5] can be abbreviated to [5].

Expression	Semantics
para[3]	Selects the third para element child of the context node.
para[3]/@ID	Selects the ID attribute of the third para element child of the context node.
para[@ID="3"]	Selects the para children of the context node which have an ID attribute of 3.
//chapter	Selects all chapter elements in the entire document.
.//chapter	Selects all the chapter elements which are descendants of the context node.
..//chapter	Selects all the chapter descendants of the parent of the context node.
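
Some of these abbreviated forms can be tried outside Arbortext APP with Python's xml.etree.ElementTree, which implements a limited subset of XPath. The sketch below is for illustration only: the sample document is invented, and because ElementTree does not support an attribute step such as /@ID, the attribute is read with get() instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node, invented for this sketch.
context = ET.fromstring(
    "<sect>"
    "<para ID='1'>a</para><para ID='2'>b</para><para ID='3'>c</para>"
    "<chapter><chapter/></chapter>"
    "</sect>"
)

# para[3] is short for child::para[position()=3]
third = context.findall("para[3]")

# para[3]/@ID: the ID attribute of the third para child
third_id = third[0].get("ID")

# para[@ID="3"]: the para children whose ID attribute is "3"
by_id = context.findall("para[@ID='3']")

# .//chapter: all chapter descendants of the context node
nested = context.findall(".//chapter")
```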

2.11 Operators

When performing tests and evaluations, a number of operators are permitted within XPath.

Operator	Semantics
=	Equal to.
!=	Not equal to.
<,<=,>,>=	Less than, less than or equal to, greater than, greater than or equal to.
+, -, *	Add, subtract, multiply.
div	Division.
mod	Remainder of division.
and, or	Logical and, or.
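
The div and mod operators follow floating-point rules that differ slightly from the integer operators of many programming languages. In XPath 1.0, mod returns the remainder of a truncating division, so the result takes the sign of the left operand. The Python sketch below illustrates this; the function names are assumptions made for the example and the sign of a negative zero is ignored.

```python
import math


def xpath_div(a, b):
    """Sketch of 'a div b': floating-point division."""
    a, b = float(a), float(b)
    if b == 0:
        # Division by zero gives NaN or an infinity instead of an error.
        if a == 0 or math.isnan(a):
            return math.nan
        return math.inf if a > 0 else -math.inf
    return a / b


def xpath_mod(a, b):
    """Sketch of 'a mod b': remainder of a truncating division."""
    a, b = float(a), float(b)
    if b == 0:
        return math.nan
    return math.fmod(a, b)
```

For comparison, Python's own % operator gives -5 % 2 == 1, whereas -5 mod 2 is -1 in XPath.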

3 XPath in Arbortext APP

The XPath processor in Libxml is fully compliant with the W3C XPath 1.0 specification (for more information on parsers, see the XML in 3B2 chapter). XPath can be used in Arbortext APP to query and return parts of a document as inline commands, in show strings or in scripts.

Used inline, XPath can display the results of the expression and it can also pass information to variables and tags using the normal Arbortext APP syntax. An XPath expression is called using the <?xpath> keyword with the syntax below:

<?xpath expression>

In this example and with all the following examples 'expression' is an XPath expression.

This simply outputs the result where the expression is placed. The result of an expression can be assigned to a variable or string in the same way as it would be with show strings, by placing the target before the expression. For example to pass the result to a variable called "string":

<?xpath ^"string" expression>

XPath expressions can be used in show strings and scripts and, as they access information in memory (the document tree), they can be treated as variables using the syntax:

^!"expression"

Or in a function variable as:

<?show $^[xpath .()]>

Note the exclamation mark and quotes. For example the following syntax tests the result of an expression against the x(1) counter to produce the OUTPUT if they match.

<?show =^!"expression"^?x(1)OUTPUT>

Also, in a script, the following compares a variable value with that returned by the XPath expression and performs a similar action to that above:

?=^var,^!"expression"@output

3.1	Options and preferences

In order to integrate XPath into Arbortext APP more effectively, many enhancements have been made. These are described in the following sections.

Each of these enhancements is controlled by options under the XML Parse ▶ tab in the Document Preferences dialogue box. (See tdpref).

By default all these options are off, ensuring that only valid XPath statements are used. It is possible to return to this default by pressing the Strict button as shown below:

If you want to use the 3B2-XML options, press the 3B2-XML button and the following options are set as shown below:

Add Arbortext APP functions to global Namespace.

If you want to use the libxml options, press the libxml button and the following options are set as shown below:

Add Arbortext APP functions to x3b2 Namespace.

Alternatively, you can tick the options you require individually. For information about the different parsers in Arbortext APP see the XML in 3B2 chapter.

One enhancement that is not controlled by an option is the ability to double up quote marks. In order to allow quotes to be used inside XPath statements, it is possible to put two quote marks together, which will place a single quote mark in the string, for example:

"string('hello ''World''')"

The above syntax will output "hello 'World'".
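
When XPath expressions are generated by a program, the quote-doubling convention can be applied mechanically. The Python helper below is a hypothetical sketch (the name app_xpath_literal is an assumption); it only builds the expression text and does not evaluate it.

```python
def app_xpath_literal(text):
    """Wrap text in single quotes, doubling any single quote inside it."""
    return "'" + text.replace("'", "''") + "'"


# Builds the expression used in the example above.
expression = "string(" + app_xpath_literal("hello 'World'") + ")"
```

Note that the doubled quote is the Arbortext APP enhancement described above; standard XPath 1.0 string literals have no escape mechanism.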

3.1.1 Allow Arbortext APP variables in XPath statements

It is possible to access the contents of Arbortext APP variables in any place the XPath specification allows XPath variables. Arbortext APP variables are accessed using the ^ character, followed by the variable name. For example:

<?show "^testvar"hello>

will output "hello". It is also possible to extend this to evaluate dynamic XPath statements, as follows:

<?show $^!"string(^!'tx1#start-pos(//*)')">

3.1.2 Allow Arbortext APP node tests

Three extra node tests have been added to select nodes containing either text, whitespace, or CDATA sections. These can be used in addition to the existing node test functions. These are:

Node Test	Description
text-black()	Returns true if the node is a text node and contains valid non-whitespace characters.
text-white()	Returns true if the node is a text node and only contains whitespace characters.
text-cdata()	Returns true if the node is a CDATA node.

3.1.3 Add Arbortext APP functions to global Namespace

A set of extension XPath functions have been added that can be used wherever normal XPath functions can be used. This first option places the extension functions in the global namespace, i.e. you can use the functions as normal functions without specifying any namespaces. This option is only present for backward compatibility, and should not be used when creating new XML documents.

3.1.4 Add Arbortext APP functions to x3b2 Namespace

A set of extension XPath functions have been added that can be used wherever normal XPath functions can be used. This second option places the extension functions in the x3b2 namespace. In order to use these functions, you must prefix them with "x3b2:". As this results in the most compliant XPath usage, this is the preferred option. This option is also required to be selected to use the custom extension functions described below, and to enable the EXSLT extension functions listed in the EXSLT section below in their respective default namespace. If both options are turned off, the Arbortext APP extension functions are not available at all. See the XPath custom functions section for more information.

3.1.5 Context text stream or text blocks

Added ability (v8.11o) to change the XPath context stream when formatting text from a different stream. Added ability to select blocks of text during formatting based on XPath expressions, in addition to the existing start/end character positions.

Existing method:
<tx1#34,79>	Format a chunk from tx1 between char pos 34 and 79.

Additional methods:
<tx1#!"//item">	Find all item nodes in tx1 and format them (including the start and end elements for the returned node-set). One exclamation mark (!) introduces an XPath expression; two (!!) introduce an XPointer expression.
<tx1#-!"//item">	Find all item nodes in tx1 and format them (contents of the nodes only).
<tx1#+34,79> OR <tx1#+!"//item">	As above, but make tx1 the context stream for any XPath statements found during formatting.

3.1.6 EXSLT functions in XPath

(v8.11) EXSLT functions can be used in XPath expressions. The default namespace prefix for the relevant set of EXSLT functions must be used. For example, to use the common functions, prefix the function name with ‘exsl:’, for the date functions use ‘date:’, for dynamic use ‘dyn:’, for math use ‘math:’, for sets use ‘set:’ and for strings use ‘str:’.

3.2	Special node tests and functions

Extra functions were added to Arbortext APP in order to allow XPath to execute commands specific to Arbortext APP, such as show string like functions, gathering content and mark-up and finding the start and end positions of nodes. These Arbortext APP specific operators are listed in the following table.

The use of the Arbortext APP functions is controlled by two options in the XML Parse ▶ tab. The first option places the extension functions in the global namespace, i.e. you can use the functions as normal functions without specifying any namespaces. This option is only present for backward compatibility, and should not be used when creating new XML documents.

The second option places the extension functions in the x3b2 namespace. In order to use these functions, you must prefix them with "x3b2:". As this results in the most compliant XPath usage, this is the preferred option. If both options are turned off, the Arbortext APP extension functions are not available at all.

For further information on namespaces in XML, see the XML in 3B2 chapter.

Nodetest	Semantics
count(text-black())	Counts how many occurrences of black text like <para>text</para>.
count(text-white())	Counts how many occurrences of white or no text like <para> </para>.
count(text-cdata())	Counts cdata sections like …….

The commands above are classed as node tests because they are not affected by the x3b2 namespace prefix.

Function

Semantics

Additional

if()

If exp is true, do what is in arg.

These functions allow conditions to be tested for. With if() and if-else(), if the first parameter evaluates to a nonzero number, the results of the second parameter are returned. Otherwise, 0 (with if()) or the results of the third parameter (with if-else()) are returned. With if-not, the results of the second parameter are returned if the first parameter evaluates to 0.

if-not(exp, arg)

If exp is false, do what is in arg.

if-else(n,'alternative 1','alternative 2')

If n is true, do alternative 1, otherwise do alternative 2.

node-number(node())

The internal DOM equivalent number of the node.

This function returns the ID number of the first node in the parameter, or the current context node if not specified.

node-from-number(number)

The DOM node corresponding to the number.

This function returns the node with the ID number contained in the parameter, or the current context node if not specified.

node-list(node-set)

Returns a string containing all the node numbers of the node-set.

This function returns the list of ID numbers for all nodes contained in node-set specified by the parameter.

nodes-from-list('string')

Gives the node-set defined by the numbers in the string.

This function returns a node-set containing all nodes specified by the ID numbers listed in the parameters. The input parameters for this function are a set of strings containing comma-separated values. For example, "nodes-from-list('3,5','7','8,9')" will return a node-set containing the nodes with ID numbers of 3,5,7,8 and 9.

concat-nodes(node-set, 'string', 'function')

Concatenates the content of the nodes of the node set separated by ‘string’. Note that double or single quote marks need to be placed around the function when using libxml.

This function executes a function against all nodes in the node-set and returns the result as a string. This function takes up to three parameters - the node-set to work against, a string containing a separator, and a string containing the function to run against the nodes. For example, "concat-nodes(//@*,",",'name()')" will output a comma-separated list of the names of all attributes in the document.

IMPORTANT NOTE: Due to the difference in the way the 3B2-XML and Libxml parsers work, the syntax of this function has been changed. In the 3B2-XML parser, the function (the third parameter) was not in quotes, i.e. "concat-nodes(//@*,",",name())". For Libxml, this parameter MUST be contained in quotes as in the example shown above. As a result of this, any quotes contained in the function call must be doubled up as described earlier.

markup(node)

Everything beneath the actual node with markup and tags including the actual tag.

These functions return the text from the Arbortext APP stream located within the start and end positions for the first node specified in the parameter, for markup(), or start and end content positions for content-markup(). If no parameter is specified, the text for the current context node is returned.

content-markup(node)

Everything beneath the actual node with markup and tags.

start-pos(node)

Start position of the node in the string.

These functions return the required character position for the first node contained in the parameter, or the current context node if no parameter is specified.

end-pos(node)

End position of the node in the string.

content-start-pos(node)

Start position of the node content in the string.

content-end-pos(node)

End position of the node content in the string.

show-string(Arbortext APP show string)

Include return values of an Arbortext APP show string in an XPath expression.

This function returns the result of the parameter string after being evaluated as a show string.

unparsed-entity-uri

These functions return the URI and types of the specified NDATA or NOTATION entities from the DTD.

unparsed-entity-type

get-path()

get-path(node-set, mode, depth)

This function returns a string containing the location of the first node in the parameter, allowing you to specify the maximum depth to traverse printing the path, the separator character to use between levels, which character to use if a complete path is output, and an alternate character to use if the maximum depth is reached first. If no parameters are specified, it returns the location of the current node, separated by ’/’ characters. For example, for the following XML:
<?xml version="1.0"?>
<root>
<elem><child/></elem>
</root>
get-context(//child) returns "/root/elem/child"
get-context(//child,9999,",","-","+") returns "-root,elem,child"
get-context(//child,2,",","-","+") returns "+elem,child"
Note that this function does not return an accurate enough path to identify the input node in an XPath location set.

get-context()

get-context(node-set, depth, separator, complete_prefix, partial_prefix)

Return: a string containing the position of the first node in the node-set
Inputs:
node-set - the node to return the context for (default: context)
depth - how far back up the tree to look (default: 9999)
separator - the string to insert between levels (default: "/")
complete_prefix - the string to put at the start if there are no more nodes found higher in the tree (default: "/")
partial_prefix - the string to put at the start if the maximum depth was reached with parent nodes still left (default: "//")

valid-content()

8.12

Returns the list of valid children/values for the selected element or attribute. This list is presented in the same format as would occur in the DTD definition.
For this to work, a DTD must be defined for the document, but the document does NOT have to be validated first. If the DTD points to an external file, then the option to load external files in tdpref (or equivalent stream flag) must also be turned on.

match()

match(context node, "path", "path", ...)

8.13p

Works in a similar manner to <xslt:template match=""/>, but tests the paths in order specified in the parameters. If the context node is matched by any of the paths, then the function returns a string containing the path that matched.
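
The depth, separator and prefix parameters of get-context() listed in the table above can be illustrated with a small Python emulation. This is a sketch under stated assumptions and not the Arbortext APP implementation: the Python function takes the document root as an extra first argument (needed because ElementTree has no parent pointers), and it is checked only against the three example results given in the table.

```python
import xml.etree.ElementTree as ET


def get_context(root, node, depth=9999, separator="/",
                complete_prefix="/", partial_prefix="//"):
    """Emulate the documented get-context() results for one node."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    names = []
    current = node
    # Walk up the tree, collecting at most 'depth' element names.
    while current is not None and len(names) < depth:
        names.append(current.tag)
        current = parent_of.get(current)
    # If the walk ran out of ancestors the path is complete; otherwise
    # the depth limit was reached with parent nodes still left.
    prefix = complete_prefix if current is None else partial_prefix
    return prefix + separator.join(reversed(names))


root = ET.fromstring("<root><elem><child/></elem></root>")
child = root.find(".//child")

full_path = get_context(root, child)
custom_path = get_context(root, child, 9999, ",", "-", "+")
short_path = get_context(root, child, 2, ",", "-", "+")
```

With the sample XML from the table, the three calls give "/root/elem/child", "-root,elem,child" and "+elem,child" respectively.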

3.3	XPath Formatting (V9)

Previously, the primary method to include a fragment of text from a different stream during formatting was by using the following method:

<tx1#30,50>

which would, for example, format the text from stream tx1 between character positions 30 to 50. The problem with this is that if the contents of stream tx1 changed, then the character positions might no longer refer to the correct places in the stream and would need updating.

This method has been extended to allow XPath or XPointer expressions to select the content from the stream to be formatted. To use an XPath expression, an exclamation mark (!) should be placed after the '#', followed by a valid expression that returns a node-set. To use XPointer, two exclamation marks are required.

For example:

<tx1#!"//item"> <!-- XPath -->
<tx1#!!"//item"> <!-- XPointer -->

This will find all item nodes in tx1 and format them, including the start and end elements, i.e. all text between the start-pos() and end-pos() for each item node returned by the location path.

Placing a dash (-) after the # will only select the contents of the nodes, for example:

<tx1#-!"//item"> <!-- XPath -->
<tx1#-!!"//item"> <!-- XPointer -->

This will find all item nodes in tx1 and only format the contents of the nodes, i.e. between content-start-pos() and content-end-pos().
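
The difference between selecting whole nodes and selecting only their contents can be pictured with the following Python sketch. It uses a naive regular expression on a hypothetical stream purely for illustration; it is not how Arbortext APP locates nodes, and Python's 0-based offsets are not meant to match the values returned by start-pos() or content-start-pos().

```python
import re

# Hypothetical stream content, invented for this sketch.
stream = "<list><item>one</item><item>two</item></list>"

# Naive pattern: group 0 spans the whole element, group 1 only its content.
ITEM = re.compile(r"<item>(.*?)</item>")

# Comparable to the default form: start and end elements are included.
with_tags = [stream[m.start(0):m.end(0)] for m in ITEM.finditer(stream)]

# Comparable to the dash (-) form: only the contents of each node.
content_only = [stream[m.start(1):m.end(1)] for m in ITEM.finditer(stream)]
```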

Placing a plus (+) after the # will switch the XPath context to the node being formatted, so that any XPath expressions occurring in the stream will behave as if they were the main stream being formatted. For example:

<tx1#+30,50>
<tx1#+!"//item">

3.4	XPath Custom Functions (V9)

If the option to add the extension XPath functions to the x3b2 namespace is enabled, then it is possible to treat any Arbortext APP or Perl mini-script as a special custom extension function. This allows XPath to perform tasks not normally possible, or difficult to achieve, such as string manipulation or complex calculations. In order to enforce compliance with the relevant specifications, the custom functions are always only available through the x3b2 namespace, even if the "global namespace" option is selected.

Apart from setting the "x3b2 namespace" option and creating the required script, no further initialisation is required. When processing the XPath expression, if a function is called that doesn't exist, then libxml will check to see if a script exists in Arbortext APP with the same name (ignoring the x3b2 prefix). If it does, then this script is called, otherwise an error occurs as normal. Note that the pre-defined extension functions take priority over any custom functions, so the scripts should be named to avoid conflict with the existing functions.

In order to pass parameters into the mini-script and receive a return value, a set of standard Arbortext APP variables is used. Each parameter is stored in a variable of the form

^funcname_x

where "funcname" is the name of the function, and "x" is the number of the parameter. The total number of available parameters is placed in the variable

^funcname_n

and the result of the function should be placed in the variable of the form

^funcname

again, where "funcname" is the name of the function.

For example, if the following XPath expression was evaluated,

x3b2:property('line-height') * 2

then an Arbortext APP mini-script called property will be executed with the parameter "line-height" placed into an Arbortext APP variable "^property_1". As there is only one parameter, the variable "^property_n" will contain the value 1.

The result of the script should be placed in a variable called "^property". This value will then be used in the remainder of the expression and therefore multiplied by 2.

All input parameters, including node sets, are automatically converted as if they were passed to the string() function first. Care should be taken to ensure the return values are in the correct format for the rest of the expression. So in the previous example, if the property script returns a string, then XPath will be unable to multiply it by two and will not produce a valid result.
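The parameter-passing convention described above can be modelled outside Arbortext APP. The following Python sketch is illustrative only: a dictionary stands in for the APP variable store, and call_custom_function and property_script are hypothetical names, not part of the product. It shows the three variables involved in a call (funcname_1, funcname_n and funcname) under those assumptions.

```python
def call_custom_function(name, script, args, variables):
    """Model one call to x3b2:<name>(...) using the documented variable convention."""
    for position, value in enumerate(args, start=1):
        # Every parameter is handed over as a string, as if passed through string().
        variables[f"{name}_{position}"] = str(value)
    variables[f"{name}_n"] = len(args)  # total number of parameters
    script(variables)                   # the script reads its inputs and stores a result
    return variables[name]              # the result is read back from ^funcname


def property_script(variables):
    """Hypothetical stand-in for a mini-script called 'property'."""
    known_properties = {"line-height": 12.0}  # illustrative values only
    variables["property"] = known_properties[variables["property_1"]]


variables = {}
# Equivalent in spirit to evaluating: x3b2:property('line-height') * 2
result = call_custom_function("property", property_script, ["line-height"], variables) * 2
print(result, variables["property_n"])
```

Because the stand-in script stores a number rather than text, the multiplication by 2 in the surrounding expression succeeds.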

3.5	XPath Namespaces (V9)

As a short-cut when using location paths to select nodes in namespaces, it is possible to embed the XPath statements in an XML stream that declares the required namespace. For example,

<?xml version="1.0"?>
<root xmlns:p="http://random.com/ns">
<?xpath tx1#count(//p:elem)?>
</root>

will count all “elem” elements in the tx1 stream that are in the namespace “http://random.com/ns”. Note that the XPath stream must be well-formed XML, and the namespace must be declared with a prefix — default namespaces are ignored.
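The same prefix-to-URI matching can be tried with Python's standard library. This is a sketch, not Arbortext APP code: the sample content standing in for the tx1 stream is invented, and count_elems is a hypothetical helper. It illustrates that the prefix in the expression only has to map to the same namespace URI as the one used in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical content standing in for the tx1 stream. The document uses the
# prefix "q", while the expression below uses "p" for the same namespace URI.
TX1 = """<doc xmlns:q="http://random.com/ns">
  <q:elem/>
  <elem/>
  <section><q:elem/></section>
</doc>"""

NAMESPACES = {"p": "http://random.com/ns"}


def count_elems(xml_text):
    """Count elem elements in the mapped namespace, like count(//p:elem)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(".//p:elem", NAMESPACES))


elem_count = count_elems(TX1)
print(elem_count)
```

The unprefixed elem element is in no namespace, so it is not counted.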

4 Arbortext APP XPath examples

Below are some simple examples of how to perform tasks which would usually involve show strings in Arbortext APP.

4.1	Lists and numbering

<?xpath count (preceding-sibling::list_item)+1>

outputs the number of preceding siblings called <list_item> plus one. This can be used to number lists, chapters, sections – anything where the number depends on the number of elements of the same name which come before the current element. It is necessary to add one because count() returns only the number of <list_item> elements which come before the current one, which is zero for the first item.
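As an illustration of the arithmetic, the following Python sketch reproduces count(preceding-sibling::list_item)+1 for each item of an invented list. The sample markup and the item_numbers helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample: siblings of other names must not affect the numbering.
SAMPLE = (
    "<list><title>Steps</title><list_item>a</list_item><note/>"
    "<list_item>b</list_item><list_item>c</list_item></list>"
)


def item_numbers(parent):
    """Number each list_item as count(preceding-sibling::list_item) + 1."""
    numbered = []
    children = list(parent)
    for index, child in enumerate(children):
        if child.tag != "list_item":
            continue
        # Count only earlier siblings that are also called list_item.
        preceding = sum(1 for sibling in children[:index] if sibling.tag == "list_item")
        numbered.append((preceding + 1, child.text))
    return numbered


numbers = item_numbers(ET.fromstring(SAMPLE))
print(numbers)
```

The title and note siblings are skipped, so the three items are numbered 1, 2 and 3.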

4.2	Context Jump

<?}$^!"name(..)">

Jumps to the label whose name is the same as the parent of the current element. This is useful for formatting paragraphs depending on how they are nested.

4.3	Boolean context jump

<?}$^!"boolean(//body[@columns=2])">

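Jumps to the label <?:true> if any <body> element in the document has an attribute columns="2". The location path selects every matching <body> element, and boolean() returns true when at least one is found.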

4.4	Add attribute values and show to variable

<?xpath "^width"sum(cell/@colwidth)>

Adds up all the colwidth attributes of the cell children of the current element and puts the result into a variable called "width". If placed at the beginning of a table row, it would give you the total width of the row.
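A minimal Python sketch of the same calculation follows, assuming the colwidth values are plain numbers without units. The sample row and the row_width helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented table row; one cell has no colwidth attribute at all.
ROW = '<row><cell colwidth="30"/><cell colwidth="45.5"/><cell/><cell colwidth="24.5"/></row>'


def row_width(row):
    """Add up the colwidth attributes of the cell children, like sum(cell/@colwidth)."""
    widths = [cell.get("colwidth") for cell in row.findall("cell")]
    # Cells without the attribute contribute nothing to the total.
    return sum(float(width) for width in widths if width is not None)


width = row_width(ET.fromstring(ROW))
print(width)
```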

4.5	If not first of kind

<?show =^!"count(preceding-sibling::para)"^!^(0)&emspace;>

Outputs an &emspace; if there are one or more preceding <para> elements (i.e. if it is not the first in a section).

4.6	Output if "tag" starts with the text "section"

<?show =^!"number(starts-with(title[1],'Section'))"^?x(1)New>

This tests if the first <title> child of the context node starts with the text 'Section'. The comparison is case-sensitive. The result is converted to a number (1 for true, 0 for false), which is compared with the value stored in x(1) to determine whether to output 'New' or not.

This allows the value of x(1) to be varied contextually. Note the use of single quotes around "Section". This is to avoid confusion with the use of double quotes around the XPath expression.
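The conversion from a boolean test to a number can be sketched in Python as follows. The sample elements and the starts_with_flag helper are invented for this example; it models only the XPath part of the command, not the comparison with x(1).

```python
import xml.etree.ElementTree as ET


def starts_with_flag(context, prefix):
    """Model of number(starts-with(title[1], prefix)): 1 on a match, otherwise 0."""
    title = context.find("title")  # first <title> child, like title[1]
    # A missing title behaves like an empty string, which matches no non-empty prefix.
    text = "".join(title.itertext()) if title is not None else ""
    return int(text.startswith(prefix))


upper = ET.fromstring("<sect><title>Section 2: Setup</title></sect>")
lower = ET.fromstring("<sect><title>section 2: setup</title></sect>")
untitled = ET.fromstring("<sect><para/></sect>")

flags = tuple(starts_with_flag(sect, "Section") for sect in (upper, lower, untitled))
print(flags)
```

Only the first element produces 1, because the title of the second starts with a lower-case "s" and the third has no title.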

4.7	Combine/display values using concat-nodes

<?xpath concat-nodes(child::*/@id, ", ")>

This outputs the id attribute values of all child elements, separated by a comma and a space. This is the same as the showstring plus "destination" command but without the need for a variable. This can be very useful for generating the content of TOCs.
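The behaviour described for concat-nodes can be approximated in Python as shown below. This is a sketch based only on the description above, not on the actual implementation of the extension function; the sample markup and join_child_ids are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample: the second entry has no id attribute.
SAMPLE = '<toc><entry id="intro"/><entry/><entry id="setup"/><entry id="usage"/></toc>'


def join_child_ids(context, separator=", "):
    """Join the id attributes of the child elements, like concat-nodes(child::*/@id, ", ")."""
    ids = [child.get("id") for child in context if child.get("id") is not None]
    return separator.join(ids)


joined = join_child_ids(ET.fromstring(SAMPLE))
print(joined)
```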

4.8	Count descendants of a stream from a script

^chapter=^!"count(stream#//Chapter)"

This XPath expression can be used in a script. This example counts all the Chapter descendants of the Arbortext APP XML stream called "stream". This is then passed to the variable ^chapter. You can select the stream you want by prefixing the location path with the stream name and a hash symbol.

The ability to use XPath in Arbortext APP scripts adds even more scope and ease when creating and manipulating complex documents. The XPath expressions which have been covered in this chapter can be used in scripts with a slight change in syntax. As the example above shows, the only change is the removal of the chevrons and question mark, as with other Arbortext APP commands in scripts.

4.9	Arbortext APP variables in XPath

Variables can be incorporated into XPath in Arbortext APP.

To pass the results of an XPath command to a variable, any of the command structures in the following examples can be used.

This will find the name of the first descendant and place it into the variable:

<?xpath "^var" name(//*[1])>

This command first finds every element in the document. The predicate then tests whether the element's position is equal to the value stored in ^var; in standard XPath, position() in this form is counted among the element's siblings rather than across the whole document. Finally, the name() function returns the name of the first matching element in document order:

<?xpath name(//*[boolean(position()=^var)])?>

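This command is simpler. Because the parentheses apply the predicate to the whole node-set, it returns the name of the single element whose position in document order is equal to the number stored in ^var:

<?xpath name((//*)[^var])?>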
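In standard XPath 1.0, a predicate written directly after //* counts position among siblings, whereas a predicate applied to the parenthesised (//*) counts position in document order, so the two forms can select different nodes. The Python sketch below models both readings on an invented document; the helper names are hypothetical, and it assumes Arbortext APP follows the standard semantics here.

```python
import xml.etree.ElementTree as ET

DOC = "<book><chapter><title/><para/></chapter><appendix><title/><para/></appendix></book>"
root = ET.fromstring(DOC)


def nth_in_document(root, n):
    """Model of name((//*)[n]): the n-th element in document order."""
    elements = list(root.iter())  # root first, then descendants in document order
    return elements[n - 1].tag if 0 < n <= len(elements) else ""


def nth_among_siblings(root, n):
    """Model of //*[position()=n]: elements that are the n-th child element of their parent."""
    position = {root: 1}  # the root element is the first child of the document node
    for parent in root.iter():
        for index, child in enumerate(parent, start=1):
            position[child] = index
    # Keep document order so that the first entry is what name() would report.
    return [element.tag for element in root.iter() if position[element] == n]


document_order_name = nth_in_document(root, 2)
sibling_matches = nth_among_siblings(root, 2)
print(document_order_name, sibling_matches)
```

With the value 2, the parenthesised form yields chapter, while the sibling-based form matches three elements, the first of which is para.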

This joins the contents of ^var onto the end of 'some text'.

<?xpath concat('some text', ^var)?>

This outputs the contents of the variable as a string:

<?xpath string(^var)?>

4.10	Setting context node at a character position

Entering a character position in the XPath command makes Arbortext APP start working from that position as if it were the context node, so the end result is reached more quickly. In the example below, 179 is the character position at which the context node is effectively placed.

<?xpath tx1#179#node-list(*)>

Also if a minus character is entered before the number, Arbortext APP will look for the node position rather than the character position.

<?xpath tx1#-16#node-list(*)>

This feature is only available for use with the libxml parser which is implemented from version 8. For information on the libxml parser, see the XML in 3B2 chapter.

5 Further Information and References

5.1	Associated documentation

If using HTML (online or local) documentation, mouse over the links above for a summary.

5.2	Internet resources

You can find out more on XML and XPath by visiting:

The W3C website www.w3.org/TR/xpath

The XML website www.xml.com

6 Subject Index

Subject index only applicable to the PDF version.

Document created on 04-Jan-2002, last reviewed on 03-Oct-2005 (revision 4)
