Copyright © 2002-2004 authors and contributors. All rights reserved.
This document is the first release of the CLiX constraint language specification. It is not a generic user manual. The intended audience is developers who wish to implement CLiX and users who wish to look up specific details of the language.
1 Introduction
2 Overview
3 Namespaces
4 Header
4.1 Header Schema
5 Rules
5.1 Rules Schema
5.2 Rule Schema
5.3 Reporting
5.4 Constraint formulas
5.4.1 Namespace prefixes in paths
5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths
5.4.3 Node set conversion
5.4.4 clix:forall
5.4.5 clix:exists
5.4.6 clix:not
5.4.7 clix:and
5.4.8 clix:or
5.4.9 clix:implies
5.4.10 clix:iff
5.4.11 clix:equal
5.4.12 clix:notEqual
5.4.13 clix:same
5.4.14 clix:less
5.4.15 clix:lessOrEqual
5.4.16 clix:greater
5.4.17 clix:greaterOrEqual
5.5 Extension Mechanisms
5.5.1 clix:operator
5.5.2 macro:*
CLiX is a constraint language for XML. It is based on first order logic and [XPath] and enables the expression of constraints that make assertions about the content of documents. Elements in documents that do not meet these assertions violate the constraints.
CLiX is on a different level from traditional schema languages like [XML Schema] or [RELAX NG]. These languages are intended to specify the structure of XML documents. CLiX constraints assume that the basic structure is already in place and add additional restrictions. CLiX is thus complementary to schema validation and may be applied as an additional step in a validation pipeline.
Some similarities exist between CLiX and [Schematron], since both languages are assertion languages, and both are path based. Historically, CLiX was developed slightly before Schematron, but it was the result of research at University College London and hence was not as widely distributed as the Schematron community effort. CLiX differs from Schematron in its degree of expressiveness: first order logic is more expressive than boolean logic, and many constraints where Schematron requires scripting can be handled natively in CLiX.
The remainder of this document specifies the CLiX language. This document is normative: conforming CLiX implementations must implement the language semantics specified in this document.
This specification introduces rule files that contain a list of rules which are fed into a CLiX processor as input.
The result of applying the rules to specified documents is not formally specified at this point and is
left to the implementation, although the clix:report element can be used to associate diagnostic messages with each
constraint.
The specification begins by defining a generic metadata mechanism that is used to attach authorship and other information to CLiX rules.
CLiX language elements are associated with the namespace URI http://www.clixml.org/clix/1.0. In the remainder
of the specification, the namespace prefix clix will be used to refer to this namespace, as in
clix:rules.
Macro invocation is performed using the macro namespace, http://www.clixml.org/clix/1.0/Macro. We will use
the prefix macro to refer to this namespace.
The purpose of the header elements is to enable the specification of documentation and authorship information for structures, such as rules or macros, in the language. They effectively provide a simple way of attaching meta-data to these artefacts. Since they are reusable, they are defined here separately.
clix:header
<clix:header> <!-- Content: (clix:author | clix:comment | clix:description | *)* --> </clix:header>
clix:header is the containing element for the header information. The
header can contain a number of fixed CLiX elements, plus elements from other
namespaces that can contain arbitrary markup.
clix:author
<!-- Category: header-elements --> <clix:author> <!-- Content: #PCDATA --> </clix:author>
Author information can be attached using the clix:author element. Since the header content
model allows multiple occurrences of elements, multiple authors can be defined.
clix:comment
<!-- Category: header-elements --> <clix:comment> <!-- Content: #PCDATA --> </clix:comment>
The clix:comment element is intended to add additional information to CLiX elements, supplementing the
description. While the description provides a functional overview, the comment is intended as a means for providing
background information.
clix:description
<!-- Category: header-elements --> <clix:description> <!-- Content: (xhtml:* | #PCDATA)* --> </clix:description>
The clix:description element provides a functional description of the element that it is attached to. It can
contain text and elements from the XHTML namespace. This facilitates the production of documentation from description
elements using stylesheets.
* Any element from other namespaces may be included in the header. The content of such elements is completely unconstrained and can be used to attach supporting information that can be processed by tools.
The example below shows a simple header, with the three statically defined fields, plus two custom
elements meta:date and meta:version. These elements can be processed as data items
by third party applications.
<clix:header xmlns:meta="http://www.mycompany.com/clix/metadata/1.0"> <clix:author>Christian Nentwich</clix:author> <clix:description>Some sample header info</clix:description> <clix:comment>Just for the spec</clix:comment> <meta:date>23-04-2003</meta:date> <meta:version>2</meta:version> </clix:header>
All CLiX constraints are contained in rule files, which must start with the clix:rules
element. This is a top-level element and it must contain one or more clix:rule elements.

clix:rules
<clix:rules> <!-- Content: (clix:macros*,clix:variable*,clix:key*,clix:rule+) --> </clix:rules>
CLiX constraints are contained in rule files, which must start with the clix:rules
element. Any namespace prefixes bound at the clix:rules element become available for use in
path expressions in the rule file, as explained in 5.4.1 Namespace prefixes in paths.
Optionally, the clix:rules element can contain macros, variable and key elements:
clix:macros
<clix:macros href = URI />
Instructs the CLiX processor to include a macro file for macro processing. This mechanism is left unspecified in this version of the spec and will be updated in a minor release.
clix:variable
<clix:variable id = cdata xpath = absolute path />
clix:variable creates a global variable that all rules in a CLiX file may make use of. The
id attribute becomes the variable name and xpath is evaluated to become the variable
value. Since there is no context defined at this point, xpath must be an
absolute path, as defined in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths.
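As a sketch, a global variable for the restaurant document used later in this section might be declared as shown here. The variable name fav and the path /restaurant/favourite are illustrative assumptions, not names fixed by this specification.

```xml
<!-- Hypothetical global variable: $fav is bound to the favourite element. -->
<clix:variable id="fav" xpath="/restaurant/favourite"/>

<!-- Not permitted at this level: the path is relative to a variable rather than absolute. -->
<!-- <clix:variable id="favText" xpath="$fav/text()"/> -->
```

Rules in the same file could then refer to the value as $fav inside their path expressions.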
clix:key
<clix:key name = cdata match = absolute path use = relative path />
This creates a key for lookup using the key() function inside a path, in exactly the same
way as the [XSLT] key construct. The
name attribute is a unique identifier for the key. match must be an absolute path
as defined in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths, and use must be a path relative to the nodes retrieved
by evaluating the absolute path.
A key defined in such a way can then be accessed using the XPath function key(name, nodeset),
which evaluates its second path parameter to a node set and gets all elements from the key for which use
evaluates to the same value.
Suppose we have the following document:
<restaurant> <dinner> <dessert>Crepes</dessert> <price>2.50</price> </dinner> <dinner> <dessert>Chocolate gateaux</dessert> <price>3</price> </dinner> <favourite>Crepes</favourite> </restaurant>
Then we could define a key over the dinners by dessert names using <clix:key name="dessertKey" match="//dinner" use="dessert"/>. Assuming that we have a variable fav bound to the favourite element, we could check
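the price of our favourite dessert using: <clix:equal op1="number(key('dessertKey',$fav)/price)" op2="2.50"/>. The number function is applied because, under the casting rules in 5.4.11 clix:equal, a node set compared directly with a number would be compared as the strings "2.50" and "2.5", which differ.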
This is a more complete example of the structure of a rule file:
<clix:rules xmlns:clix="http://www.clixml.org/clix/1.0" version="1.0"> <clix:header> <clix:author>Michael Marconi</clix:author> <clix:description>A set of rule examples</clix:description> <clix:comment>This should help to demonstrate the rules element</clix:comment> </clix:header> <clix:rule id="rule1"> ... </clix:rule> <clix:rule id="rule2"> ... </clix:rule> </clix:rules>

clix:rule
<clix:rule id = ID disabled = boolean> <!-- Content: (clix:header?,clix:report?,formula) --> </clix:rule>
Every rule has a mandatory id attribute that can be used for referencing. It may
contain a header that provides information about the rule, as specified in
4 Header. This is followed by the reporting mechanism (see
5.3 Reporting) and a constraint formula. The constraint formula is represented
by one of the formula elements defined in 5.4 Constraint formulas.
The rule element may also have an optional disabled attribute whose possible values
include true and false. If the attribute is set to true,
CLiX processors should not execute the rule, though they are still allowed to load it.
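A minimal sketch of a disabled rule follows. The rule id, the comment text and the formula are illustrative assumptions; only the disabled attribute and the content order come from the syntax summary above.

```xml
<!-- A processor may load this rule but should not execute it. -->
<clix:rule id="price-required" disabled="true">
  <clix:header>
    <clix:comment>Switched off temporarily (hypothetical reason)</clix:comment>
  </clix:header>
  <clix:forall var="dinner" in="/restaurant/dinner">
    <clix:exists var="p" in="$dinner/price"/>
  </clix:forall>
</clix:rule>
```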
When a rule is violated, it is important to provide diagnostic feedback to the user or
a processing application that is evaluating the rule. The
clix:report element is used to hold markup and text that will describe how
the rule has been violated.
clix:report
<clix:report> <!-- Content: any --> </clix:report>
The content model of clix:report is completely unrestricted. This means that report
messages can contain arbitrary XML markup. They may also make use of XHTML elements, which can be
used to generate HTML or XHTML reports depending on the serializer that is attached to the
implementation.
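As an illustration, a rule with a report message containing XHTML markup might look like the sketch below. The rule id and the message wording are assumptions, and how the message is rendered is left to the implementation.

```xml
<clix:rule id="dessert-required">
  <!-- The report precedes the formula, following the clix:rule content model. -->
  <clix:report xmlns:xhtml="http://www.w3.org/1999/xhtml">
    A <xhtml:b>dinner</xhtml:b> entry does not include a dessert.
  </clix:report>
  <clix:forall var="dinner" in="/restaurant/dinner">
    <clix:exists var="d" in="$dinner/dessert"/>
  </clix:forall>
</clix:rule>
```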
Rules in CLiX are based on a fairly simple language that combines first order
logic with XPath: quantifiers are used to iterate over
sets of nodes, boolean operators like and and or
allow the construction of more complex formulae, and predicates like equals
and notequals compare sets of nodes to get a result.
The purpose of this section is to define an XML syntax for this formula language.
Every construct in the language makes use of XPath to retrieve elements from documents for processing. The XPath expressions that can be used to address elements in documents are restricted depending on the formula type. The next few sections define the permissible types of path expressions.
XPath location steps use prefixes to retrieve elements from certain namespaces. For example, given our
definition of the clix prefix above, /clix:rules/clix:rule would select all
rule elements in the namespace http://www.clixml.org/clix/1.0.
When CLiX formulas make use of namespace prefixes in their paths, those prefixes must be bound. CLiX processors
will accept all prefixes bound at the root of the rule file, clix:rules, as valid prefixes for
path expressions. The following example shows how to write a rule over elements in a custom namespace,
http://www.mycompany.com/mylanguage:
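A minimal sketch of such a rule file, assuming a hypothetical prefix my and hypothetical elements my:catalog, my:item and my:name in that namespace. The version attribute mirrors the rule file example in 5.1 Rules Schema.

```xml
<clix:rules xmlns:clix="http://www.clixml.org/clix/1.0"
            xmlns:my="http://www.mycompany.com/mylanguage"
            version="1.0">
  <clix:rule id="items-have-names">
    <!-- The my prefix is usable in paths because it is bound on clix:rules. -->
    <clix:forall var="item" in="/my:catalog/my:item">
      <clix:exists var="n" in="$item/my:name"/>
    </clix:forall>
  </clix:rule>
</clix:rules>
```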
The CLiX formula constructs make use of XPath to select elements from documents. Paths in predicate elements
like clix:equal may be relative to variables only; global variables must use
absolute path expressions; and quantifiers like clix:forall need path expressions that evaluate
to node sets, since they are iteration constructs. We will define these three path types below.
An absolute path starts either with a / or a //. If the path is
a union, then each path in the union must be absolute itself. For example,
/foo/bar is an absolute path and so is /foo | /bar.
/foo/bar is an absolute path that selects all elements
<bar> contained in <foo>.
/foo/@bar is an absolute path that selects the attribute
bar in all elements <foo>.
/foo/bar/text() is an absolute path that selects all
text nodes in <bar>.
The following examples are all illegal:
$x/foo is illegal because it is relative to a variable (x).
substring(/foo,1,5) is illegal because, even though the contained path
is absolute, the expression as a whole does not begin with an absolute path.
A quantifier path is either an absolute path or a path expression whose location paths
are relative to a variable. In addition, a quantifier path must evaluate
to a node set. This is because these paths are used by clix:forall and clix:exists, which are iteration
constructs.
/foo/bar is legal because it is an absolute path and selects a set of bar nodes.
/foo/@att is legal because it is an absolute path and selects a set containing the
att attribute, or an empty set if no such attribute is present.
$x/foo/@bar is legal because it is relative to a variable and selects an attribute node.
The following examples are all illegal:
foo/bar is illegal because it is neither absolute nor relative to a variable. It assumes an
implied context node.
substring(/foo/@att,1,5) is illegal because it results in a string.
A predicate path is a path expression that may contain only location paths that are relative to
variables, or expressions that yield strings, booleans or numbers. It is called a predicate path because it
is the only permissible type of path in the CLiX "predicates" like clix:equal. As an example, $x/bar
is a predicate path and so is 54.
Predicate paths are always evaluated to yield a value of type string, number or boolean. If a path evaluates to a node set, the node set is converted to a string using the conversion rules set out in 5.4.3 Node set conversion. The following are examples of legal predicate paths:
54 is legal because it selects a number.
true() is legal because it selects a boolean.
$x/foo/@bar is legal because it selects a node set that contains an attribute node.
substring($x/foo,5) is legal because it selects a string, and because the parameter
path is relative to a variable.
The following examples are all illegal predicate paths:
The result of evaluating a predicate path is always a string, number or boolean. Nevertheless,
paths that evaluate to node sets, for example $x/bar or $x/@bar, are legal predicate
paths. We therefore need to define conversion rules for converting node sets to strings.
A node set is converted to a string by determining the string value of each node in the set, and then concatenating the string values into a single string. The string values for the different types of nodes are defined as follows:
Elements: the string value is the concatenated value of all text node children of the element.
Attributes: the string value is the value of the attribute.
Text nodes: the value is the content of the text node.
Comments: the value is the content of the comment, excluding the <!--
and --> delimiters.
Processing instructions: the value is the string content of the processing instruction following
the target, but excluding the terminating ?> delimiter.
Here is an example document to illustrate these conversion rules:
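A document consistent with the values listed next can be sketched as follows. The root element name foo and the layout are assumptions; only the two bar elements, their id attributes and their text content are implied by the surrounding text.

```xml
<foo>
  <bar id="1">a</bar>
  <bar id="2">b</bar>
</foo>
```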
In this document, assuming that x is bound to the root element:
the value of $x/bar[1]/@id is "1",
the value of $x/bar/@id is "12",
and the value of $x/bar is "ab".
<clix:forall var = cdata in = quantifierpath> <!-- Content: formula --> </clix:forall>
clix:forall is a universal quantifier that iterates over a set of nodes and returns
true if and only if its subformula evaluates to true for all assignments
to the variable.
var is the variable that the quantifier will bind nodes to. It must be a valid
XPath variable identifier. In addition, no variable with the same name may be bound
in any ancestor formula. The following example is illegal:
<clix:forall var="x" in="/spec"> <clix:forall var="x" in="//p"> .. </clix:forall> </clix:forall>
in defines a quantifier path that is evaluated to a set of nodes to iterate
over. The path must follow the constraint on quantifier paths defined in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths.
Here is an example of a constraint that says all elements price must have a
currency attribute set to EUR:
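One way to write this constraint is sketched below. The path //price and the variable name p are assumptions about the target document, and the formula is meant to be placed inside a clix:rule.

```xml
<clix:forall var="p" in="//price">
  <!-- 'EUR' is quoted so that XPath treats it as a string literal. -->
  <clix:equal op1="$p/@currency" op2="'EUR'"/>
</clix:forall>
```

A price element without a currency attribute yields an empty node set, which converts to the empty string and therefore also violates the constraint.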
<clix:exists var = cdata in = quantifierpath> <!-- Content: formula? --> </clix:exists>
clix:exists is an existential quantifier that iterates over a node set, binding a variable to each
node in turn, and returns true if and only if its subformula returns true for at least
one assignment to the variable. If clix:exists has no subformula, it returns true if and
only if the set selected using the path expression is non-empty.
var is the variable that the quantifier will bind nodes to. It must be a valid
XPath variable identifier. No duplicate bindings are allowed, as specified in 5.4.4 clix:forall.
in defines a quantifier path that is evaluated to a set of nodes to iterate
over. The path must follow the constraint on quantifier paths defined in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths.
The following constraint expresses that every dinner element must include a
dessert element:
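A sketch of this constraint against the restaurant document from 5.1 Rules Schema, using clix:exists without a subformula. The variable names dinner and d are arbitrary choices.

```xml
<clix:forall var="dinner" in="/restaurant/dinner">
  <!-- True for a given dinner if $dinner/dessert selects at least one node. -->
  <clix:exists var="d" in="$dinner/dessert"/>
</clix:forall>
```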
<clix:not> <!-- Content: formula --> </clix:not>
clix:not implements logical negation. It returns true if its subformula
returns false, otherwise it returns false.
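As an illustration, negating an existential quantifier expresses that no matching node may be present. The starter element is a hypothetical name chosen for this sketch.

```xml
<!-- Holds only if the document contains no /restaurant/starter elements. -->
<clix:not>
  <clix:exists var="s" in="/restaurant/starter"/>
</clix:not>
```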
<clix:and> <!-- Content: (formula,formula) --> </clix:and>
clix:and implements logical conjunction: it returns true if and only if both of its
subformulae are true. This behaviour is captured in the following truth-table, where f1 refers
to the first subformula and f2 to the second:
| f1 | f2 | f1 and f2 |
|---|---|---|
| true | true | true |
| true | false | false |
| false | true | false |
| false | false | false |
The following constraint expresses that every dinner element must include a
dessert and a price:
<clix:forall var="dinner" in="/restaurant/dinner"> <clix:and> <clix:exists var="d" in="$dinner/dessert"/> <clix:exists var="d" in="$dinner/price"/> </clix:and> </clix:forall>
<clix:or> <!-- Content: (formula,formula) --> </clix:or>
clix:or implements logical disjunction: it returns true if at least one of its
subformulae is true. Another way of looking at this is that it returns false only if both
subformulae are false. This behaviour is captured in the following truth-table, where f1 refers
to the first subformula and f2 to the second:
| f1 | f2 | f1 or f2 |
|---|---|---|
| true | true | true |
| true | false | true |
| false | true | true |
| false | false | false |
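A sketch of a disjunction, assuming hypothetical price elements that carry a currency attribute: every price must be quoted in either EUR or USD.

```xml
<clix:forall var="p" in="//price">
  <clix:or>
    <clix:equal op1="$p/@currency" op2="'EUR'"/>
    <clix:equal op1="$p/@currency" op2="'USD'"/>
  </clix:or>
</clix:forall>
```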
<clix:implies> <!-- Content: (formula,formula) --> </clix:implies>
clix:implies provides logical implication. "a implies b", where a and b are
the two subformulae, expresses the constraint that if a is true then b must also be
true. It does not express any notion about what happens when a is false - in this case the outcome
of the implication is always true. This is captured more precisely in the following truth table:
| f1 | f2 | f1 implies f2 |
|---|---|---|
| true | true | true |
| true | false | false |
| false | true | true |
| false | false | true |
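A sketch of an implication over the restaurant document from 5.1 Rules Schema: if a dinner lists a dessert, it must also list a price. The variable names are arbitrary.

```xml
<clix:forall var="dinner" in="/restaurant/dinner">
  <clix:implies>
    <!-- f1: the dinner has a dessert -->
    <clix:exists var="d" in="$dinner/dessert"/>
    <!-- f2: the dinner has a price -->
    <clix:exists var="p" in="$dinner/price"/>
  </clix:implies>
</clix:forall>
```

A dinner without a dessert satisfies the formula regardless of whether it has a price, in line with the last two rows of the truth table.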
<clix:iff> <!-- Content: (formula,formula) --> </clix:iff>
clix:iff is a shorthand for "if and only if". It is true if both subformulae evaluate to
the same result, that is if they are either both true or both false. "a iff b" is
a common way of abbreviating the equivalent "(a implies b) and (b implies a)":
| f1 | f2 | f1 iff f2 |
|---|---|---|
| true | true | true |
| true | false | false |
| false | true | false |
| false | false | true |
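A sketch of an equivalence over the same restaurant document: a dinner has a dessert exactly when it has a price. The variable names are arbitrary.

```xml
<clix:forall var="dinner" in="/restaurant/dinner">
  <clix:iff>
    <clix:exists var="d" in="$dinner/dessert"/>
    <clix:exists var="p" in="$dinner/price"/>
  </clix:iff>
</clix:forall>
```

Unlike the clix:implies version, a dinner with a price but no dessert violates this formula.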
<clix:equal op1 = predicatepath op2 = predicatepath/>
clix:equal compares two values for equality and returns true if and only if the values
are equal. The two parameters are predicate paths and have to meet the constraints set out in
5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. If both parameters evaluate to the same types, they are compared directly, that is
strings are compared to strings using standard string comparison, numbers are compared to numbers using numeric
comparison (which disregards lexical artefacts so that, for example, 5.0 is equal to 5)
and booleans are similarly compared using a boolean comparison.
If either parameter evaluates to a node set, it is converted using the conversion rules specified in 5.4.3 Node set conversion and treated like a string. The following casting rules are then applied as normal.
If the two parameters do not evaluate to the same type, casting rules are applied to convert the
type of lower priority into the type of the other parameter. The order of the parameters is insignificant;
type priority is the overriding concern. Thus, Type 1 and Type 2 in the following table
do not refer to the types of the first and second parameter, but are simply a way of referring to a pair of
types. The Base Type column gives the type to which both parameters are converted:
| Type 1 | Type 2 | Base Type |
|---|---|---|
| Boolean | Number | Boolean |
| Boolean | String | String |
| String | Number | String |
There are thus three type conversions that may take place and their implementation proceeds as follows:
Number to Boolean: A non-zero number except NaN is converted to
true. 0 and NaN are converted to false.
Boolean to String: true is converted to "true" and
false is converted to "false".
Number to String: Numbers with zero remainders are converted by taking the string
representation of the integer part of the number on a decimal base, preceded by a minus sign if the
number is negative. Thus 5 is converted to "5". Numbers with non-zero remainders
are represented as decimal numbers with a period representing the decimal point, with at least one digit
in the integer part, and as many digits in the fractional part as are required to distinguish the number from
all other IEEE754 values. Thus 5.23400 is converted to "5.234". Finally,
NaN is converted to "NaN".
As a result of these conversions, care must be taken when comparing numbers to strings. For example,
<clix:equal op1="5" op2="'5.0'"/> returns false because the number
5 would be converted into the string "5", which is not equal to
"5.0". In general, it would be best to compare numbers to numbers directly, or to make use
of the XPath number function where this is not possible.
Assume the following example document, and that the variable x has been bound to
bar1 and y has been bound to bar2:
<foo> <bar1>A value</bar1> <bar2>5.0</bar2> </foo>
Then equal would behave as follows:
<clix:equal op1="$x" op2="'A value'"/> is true
<clix:equal op1="$y" op2="5"/> is false
<clix:equal op1="number($y)" op2="5"/> is true
<clix:notequal op1 = predicatepath op2 = predicatepath/>
clix:notequal compares two values for equality and returns true if and only if
the values are not equal. The two parameters are predicate paths and have to meet the
constraints set out in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. Differing parameter types are cast to a base type using the
rules specified for 5.4.11 clix:equal.
The behaviour of clix:notequal is such that the following
two formulae are equivalent:
<clix:notequal op1="$x/bar" op2="$y/bar"/> <clix:not> <clix:equal op1="$x/bar" op2="$y/bar"/> </clix:not>
<clix:same op1 = varref op2 = varref />
clix:same is the only predicate in CLiX that does not take two XPaths as parameters.
Instead, it takes two variable references of the form $varname and checks if
exactly the same nodes are bound to the two variables.
This is different from clix:equal, because the two parameters are not compared by
value. Take the following document as an example:
<foo> <bar1>A value</bar1> <bar2>A value</bar2> </foo>
If you assume that x is bound to bar1 and y is bound
to bar2, then:
<clix:same op1="$x" op2="$y"/> is false because x
does not point to the same node as y.
<clix:equal op1="$x" op2="$y"/> is true, because the two nodes
contain the same value.
<clix:same op1="$x" op2="$x"/> is true.
Same is useful for certain types of constraints like complex uniqueness constraints. We can
express that all elements foo are unique if compared by @id as follows:
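A sketch of such a uniqueness constraint: for every pair of foo elements, equal id values imply that both variables are bound to the same node. The path //foo and the variable names a and b are assumptions.

```xml
<clix:forall var="a" in="//foo">
  <clix:forall var="b" in="//foo">
    <clix:implies>
      <!-- If the two elements carry the same id value ... -->
      <clix:equal op1="$a/@id" op2="$b/@id"/>
      <!-- ... they must be one and the same node. -->
      <clix:same op1="$a" op2="$b"/>
    </clix:implies>
  </clix:forall>
</clix:forall>
```

Using clix:equal in place of clix:same here would make the constraint trivially true, since every node is equal in value to itself.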
<clix:less op1 = predicatepath op2 = predicatepath/>
clix:less compares its two parameters and returns true if the first parameter is either
lexicographically or numerically strictly less than the second, in the case of strings and numbers, or if it is
not equal to the second, in the case of booleans. The two parameters are predicate paths and
have to meet the constraints set out in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. Before comparison, different types
of parameters are cast to a common base as specified for 5.4.11 clix:equal.
Assume the following example document, and that the variable x has been bound to
bar1 and y has been bound to bar2:
<foo> <bar1>A value</bar1> <bar2>5.0</bar2> </foo>
Then clix:less would be evaluated as follows:
<clix:less op1="$x" op2="'No value'"/> is true
<clix:less op1="number($y)" op2="6"/> is true
<clix:less op1="number($y)" op2="5"/> is false - the value
needs to be strictly less.
<clix:less op1="true()" op2="false()"/> is true
<clix:lessOrEqual op1 = predicatepath op2 = predicatepath/>
clix:lessOrEqual is a combination of clix:less and clix:equal.
The two parameters are predicate paths and have to meet the constraints set out
in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. The following two formulae are equivalent:
<clix:lessOrEqual op1="$x" op2="$y"/> <clix:or> <clix:less op1="$x" op2="$y"/> <clix:equal op1="$x" op2="$y"/> </clix:or>
<clix:greater op1 = predicatepath op2 = predicatepath/>
clix:greater compares its two parameters and returns true if the first parameter is either
lexicographically or numerically strictly greater than the second, in the case of strings and numbers, or if it is
not equal to the second, in the case of booleans. The two parameters are predicate paths and
have to meet the constraints set out in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. Before comparison, different types
of parameters are cast to a common base as specified for 5.4.11 clix:equal.
With the exception of booleans, for which both clix:greater and clix:less return
true if and only if the two parameters evaluate to different boolean values, the following two formulae
are equivalent:
<clix:greater op1="$x" op2="$y"/> <clix:less op1="$y" op2="$x"/>
Assume the following example document, and that the variable x has been bound to
bar1 and y has been bound to bar2:
<foo> <bar1>A value</bar1> <bar2>5.0</bar2> </foo>
Then clix:greater would be evaluated as follows:
<clix:greater op1="$x" op2="'No value'"/> is false
<clix:greater op1="number($y)" op2="5"/> is false - the value
needs to be strictly greater.
<clix:greater op1="true()" op2="false()"/> is true
<clix:greaterOrEqual op1 = predicatepath op2 = predicatepath/>
clix:greaterOrEqual is a combination of clix:greater and clix:equal.
The two parameters are predicate paths and have to meet the constraints set out
in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. The following two formulae are equivalent:
<clix:greaterOrEqual op1="$x" op2="$y"/> <clix:or> <clix:greater op1="$x" op2="$y"/> <clix:equal op1="$x" op2="$y"/> </clix:or>
CLiX provides two extension mechanisms for the constraint language: plugin operators and macros. Plugin operators are scripted predicates that can take a number of parameters and return true or false. They can be used to perform a variety of tasks like database lookups, communication with legacy systems or extensive computation.
Macros are parameterised CLiX formulas that are kept in a macro definition file. They can be invoked in a rule to automatically expand the rule. This makes it possible to reuse frequently used constraint types. Macros are not further defined in this version of the spec but will be added in a minor release.
<clix:operator name = cdata> <!-- Content: clix:param* --> </clix:operator>
clix:operator can be used wherever any other predicate like clix:equal or
clix:notEqual can be used. Like these predicates, it takes a number of parameters and
returns true or false. The difference is that the operator can take any number of parameters, and its
computation is outside the scope of this specification.
It is assumed that a CLiX execution engine will make operators available for use in a formula. The way
this is done is language specific and thus this specification does not constrain it further. The only
requirement imposed here is that the name attribute of the operator must point to some
implementation construct with the same name, and implementors should clearly specify how this relationship
is established.
An operator can take a number of clix:param elements, which are defined as follows:
<clix:param name = cdata value = predicatepath/>
Every parameter has a name and a value. The name must bind to a parameter that the
externally specified implementation expects. For example, in an ECMAScript implementation, this will reference
a parameter passed to the function implementing the operator. The value must contain a
predicate path, as specified in 5.4.2 Absolute Paths, Quantifier Paths and Predicate Paths. Implementing CLiX processors will evaluate this
predicate path and pass its value on as a string, number or boolean to the operator implementation.
Suppose we want to check that every number element in a document contains a prime number, and we provide an operator implementation that can execute this check. We would make this operator available in some programming language and then invoke it as follows:
<clix:forall var="n" in="/mylist/number"> <clix:operator name="isPrime"> <clix:param name="num" value="number($n)"/> </clix:operator> </clix:forall>