RustyXML takes correctness seriously. With 1296+ tests across multiple test suites, including the complete W3C/OASIS XML Conformance Test Suite, RustyXML achieves 100% compliance with the industry-standard XML validation tests.
This document describes W3C XML 1.0 compliance, XPath 1.0 support, and the validation methodology.
RustyXML is designed to comply with W3C XML 1.0 (Fifth Edition).
| Section | Requirement | Status |
|---|---|---|
| 2.1 | Well-formed documents | ✅ |
| 2.2 | Characters (Unicode) | ✅ |
| 2.3 | Common syntactic constructs | ✅ |
| 2.4 | Character data | ✅ |
| 2.5 | Comments | ✅ |
| 2.6 | Processing instructions | ✅ |
| 2.7 | CDATA sections | ✅ |
| 2.8 | Prolog and document type declaration | ✅ Parsed |
| 2.9 | Standalone document declaration | ✅ |
| 2.10 | White space handling | ✅ |
| 2.11 | End-of-line handling | ✅ |
| 2.12 | Language identification | ✅ |
| Feature | Status | Notes |
|---|---|---|
| Start tags | ✅ | Full attribute support |
| End tags | ✅ | Name matching validation |
| Empty element tags | ✅ | <element/> syntax |
| Attributes | ✅ | Single and double quotes |
| Namespaces | ✅ | Prefix resolution |
| Default namespaces | ✅ | xmlns="..." |
| Feature | Status | Notes |
|---|---|---|
| Character references | ✅ | &#N; and &#xN; |
| Predefined entities | ✅ | `&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;` |
| Character encoding | ✅ | UTF-8 (primary), UTF-16 detection |
| Unicode characters | ✅ | Full Unicode support |
| BOM handling | ✅ | UTF-8/UTF-16 BOM detection |
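To make the two character reference forms in the table above concrete, here is a minimal pure-Elixir sketch that decodes `&#N;` (decimal) and `&#xN;` (hexadecimal) references. It is an illustration only, not RustyXML's implementation: the module name `CharRefSketch` is hypothetical, and the sketch skips the check a conforming parser must make that the decoded code point is a legal XML character.

```elixir
defmodule CharRefSketch do
  @moduledoc "Hypothetical helper for illustration; not part of RustyXML."

  # Matches &#N; (decimal) and &#xN; (hex). XML requires a lowercase "x".
  @char_ref ~r/&#(x[0-9A-Fa-f]+|[0-9]+);/

  @doc "Replaces every numeric character reference in `text` with its UTF-8 character."
  def decode(text) do
    Regex.replace(@char_ref, text, fn _whole, body ->
      codepoint =
        case body do
          "x" <> hex -> String.to_integer(hex, 16)
          dec -> String.to_integer(dec, 10)
        end

      # Raises ArgumentError for code points that cannot be encoded as UTF-8
      # (for example surrogates), which a real parser would report as an error.
      <<codepoint::utf8>>
    end)
  end
end

CharRefSketch.decode("&#65;&#x42;")  # => "AB"
CharRefSketch.decode("&#x20AC;")     # => "€"
```

Named entities such as `&amp;` are left untouched by this sketch; only numeric references are decoded.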
| Feature | Status | Notes |
|---|---|---|
| XML declaration | ✅ | Version, encoding, standalone |
| DOCTYPE declaration | ✅ | Parsed but not validated |
| Single root element | ✅ | Enforced |
| Comments | ✅ | <!-- ... --> |
| Processing instructions | ✅ | <?target data?> |
| CDATA sections | ✅ | <![CDATA[...]]> |
RustyXML makes practical concessions shared by most XML implementations:
- Non-validating parser - DTD declarations are parsed but entity definitions are not expanded (except predefined entities)
- Lenient character handling - Control characters in content generate warnings rather than errors
- Flexible encoding - Automatically detects UTF-8, UTF-16 LE/BE
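The encoding detection mentioned in the last item can be sketched with binary pattern matching on the byte order mark. This is a simplified illustration under stated assumptions, not RustyXML's detection logic: the module `BomSketch` is hypothetical, it looks only at the BOM, and it assumes UTF-8 when no BOM is present (a full implementation would also inspect the first bytes of the XML declaration).

```elixir
defmodule BomSketch do
  @moduledoc "Hypothetical helper: BOM sniffing only, for illustration."

  # UTF-8 BOM: EF BB BF
  def detect(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: {:utf8, rest}
  # UTF-16 big-endian BOM: FE FF
  def detect(<<0xFE, 0xFF, rest::binary>>), do: {{:utf16, :big}, rest}
  # UTF-16 little-endian BOM: FF FE
  def detect(<<0xFF, 0xFE, rest::binary>>), do: {{:utf16, :little}, rest}
  # No BOM: assume UTF-8 (a simplification made by this sketch)
  def detect(data) when is_binary(data), do: {:utf8, data}

  @doc "Strips the BOM, if any, and transcodes the remaining bytes to UTF-8."
  def to_utf8(data) do
    {encoding, rest} = detect(data)
    :unicode.characters_to_binary(rest, encoding, :utf8)
  end
end

# "<a/>" encoded as UTF-16 LE with a BOM
BomSketch.to_utf8(<<0xFF, 0xFE, 0x3C, 0x00, 0x61, 0x00, 0x2F, 0x00, 0x3E, 0x00>>)
# => "<a/>"
```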
RustyXML implements the complete XPath 1.0 specification.
All XPath axes are fully supported:
| Axis | Status | Description |
|---|---|---|
| `child` | ✅ | Direct children |
| `parent` | ✅ | Parent node |
| `self` | ✅ | Context node |
| `attribute` | ✅ | Attributes of context node |
| `descendant` | ✅ | All descendants |
| `descendant-or-self` | ✅ | Context node and descendants |
| `ancestor` | ✅ | All ancestors |
| `ancestor-or-self` | ✅ | Context node and ancestors |
| `following` | ✅ | All following nodes |
| `following-sibling` | ✅ | Following siblings |
| `preceding` | ✅ | All preceding nodes |
| `preceding-sibling` | ✅ | Preceding siblings |
| `namespace` | ✅ | Namespace nodes |
| Test | Status | Example |
|---|---|---|
| Node name | ✅ | child::book |
| Wildcard | ✅ | child::* |
| Node type | ✅ | text(), comment(), processing-instruction(), node() |
| Processing instruction target | ✅ | processing-instruction('xml-stylesheet') |
| Feature | Status | Example |
|---|---|---|
| Position predicates | ✅ | [1], [last()] |
| Attribute predicates | ✅ | [@id='1'] |
| Element predicates | ✅ | [title] |
| Comparison operators | ✅ | =, !=, <, >, <=, >= |
| Boolean operators | ✅ | and, or |
| Arithmetic operators | ✅ | +, -, *, div, mod |
| Nested predicates | ✅ | [item[@type='book']] |
| Function | Status | Description |
|---|---|---|
| `position()` | ✅ | Current position in node set |
| `last()` | ✅ | Size of node set |
| `count(node-set)` | ✅ | Number of nodes |
| `local-name()` | ✅ | Local part of name |
| `namespace-uri()` | ✅ | Namespace URI |
| `name()` | ✅ | Qualified name |
| `id(string)` | ✅ | Select by ID |
| Function | Status | Description |
|---|---|---|
| `string()` | ✅ | Convert to string |
| `concat(str, str, ...)` | ✅ | Concatenate strings |
| `starts-with(str, prefix)` | ✅ | Test string prefix |
| `contains(str, substr)` | ✅ | Test substring presence |
| `substring(str, start, len?)` | ✅ | Extract substring |
| `substring-before(str, delim)` | ✅ | String before delimiter |
| `substring-after(str, delim)` | ✅ | String after delimiter |
| `string-length(str?)` | ✅ | String length |
| `normalize-space(str?)` | ✅ | Normalize whitespace |
| `translate(str, from, to)` | ✅ | Character translation |
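Two of the string functions above have semantics that are easy to get wrong, so here is a small pure-Elixir sketch of how XPath 1.0 defines `translate` and `normalize-space`. The module `XPathStringSketch` is hypothetical and only models the specified behavior; it does not show how RustyXML implements these functions.

```elixir
defmodule XPathStringSketch do
  @moduledoc "Hypothetical model of two XPath 1.0 string functions."

  @doc """
  Replaces each character of `str` found in `from` with the character at the
  same position in `to`. Characters with no counterpart in `to` are removed.
  If a character appears more than once in `from`, the first occurrence wins.
  """
  def translate(str, from, to) do
    to_chars = String.codepoints(to)

    mapping =
      from
      |> String.codepoints()
      |> Enum.with_index()
      |> Enum.reduce(%{}, fn {ch, i}, acc ->
        # put_new keeps the first occurrence; Enum.at returns nil past the end of `to`
        Map.put_new(acc, ch, Enum.at(to_chars, i))
      end)

    str
    |> String.codepoints()
    |> Enum.map_join(fn ch ->
      case Map.fetch(mapping, ch) do
        {:ok, nil} -> ""
        {:ok, replacement} -> replacement
        :error -> ch
      end
    end)
  end

  @doc "Trims leading and trailing XML whitespace and collapses inner runs to one space."
  def normalize_space(str) do
    str
    |> String.split([" ", "\t", "\n", "\r"], trim: true)
    |> Enum.join(" ")
  end
end

XPathStringSketch.translate("bar", "abc", "ABC")       # => "BAr"
XPathStringSketch.translate("--aaa--", "abc-", "ABC")  # => "AAA"
XPathStringSketch.normalize_space("  a \n b  ")        # => "a b"
```

The two `translate` calls are the examples given in the XPath 1.0 specification.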
| Function | Status | Description |
|---|---|---|
| `boolean()` | ✅ | Convert to boolean |
| `not(bool)` | ✅ | Logical negation |
| `true()` | ✅ | Boolean true |
| `false()` | ✅ | Boolean false |
| `lang(lang)` | ✅ | Test language |
| Function | Status | Description |
|---|---|---|
| `number()` | ✅ | Convert to number |
| `sum(node-set)` | ✅ | Sum of node values |
| `floor(num)` | ✅ | Floor function |
| `ceiling(num)` | ✅ | Ceiling function |
| `round(num)` | ✅ | Round to nearest |
| Syntax | Expansion | Status |
|---|---|---|
| `//` | `/descendant-or-self::node()/` | ✅ |
| `.` | `self::node()` | ✅ |
| `..` | `parent::node()` | ✅ |
| `@attr` | `attribute::attr` | ✅ |
| `[n]` | `[position() = n]` | ✅ |
RustyXML includes a comprehensive conformance test suite based on W3C and OASIS standards.
| Category | Tests | Description |
|---|---|---|
| Well-Formedness | 18 | Basic XML structure |
| Characters | 12 | Unicode and special characters |
| Whitespace | 8 | Whitespace preservation and normalization |
| Entities | 10 | Entity references and escaping |
| CDATA | 8 | CDATA section handling |
| Comments | 6 | Comment parsing |
| Processing Instructions | 6 | PI parsing and data extraction |
| Namespaces | 12 | Namespace declaration and resolution |
| Attributes | 10 | Attribute parsing and quoting |
| Elements | 8 | Element naming and nesting |
| XML Declaration | 6 | Version, encoding, standalone |
| DOCTYPE | 4 | DOCTYPE declaration parsing |
| Edge Cases | 8 | Complex real-world scenarios |
| XPath Axes | 15 | All 13 axes plus edge cases |
| Total | 121 | Conformance tests |
`test/xml_conformance_test.exs`

```bash
# Run all conformance tests
mix test test/xml_conformance_test.exs
# Run specific category
mix test test/xml_conformance_test.exs --only wellformedness
mix test test/xml_conformance_test.exs --only xpath
```

RustyXML is tested against the official W3C XML Conformance Test Suite (xmlconf), the industry standard with 2000+ test cases from Sun, IBM, OASIS/NIST, and others.
| Category | Tests | Passed | Status |
|---|---|---|---|
| Valid documents (must accept) | 218 | 218 | ✅ 100% |
| Not-well-formed (must reject) | 871 | 871 | ✅ 100% |
| Invalid (DTD validation) | - | - | N/A (non-validating) |
RustyXML achieves 100% compliance with all 1089 applicable OASIS/W3C XML Conformance tests.
| Category | Tests | Passed | Status |
|---|---|---|---|
| Valid documents (must accept) | 218 | 218 | ✅ 100% |
| Not-well-formed (must reject) | 871 | 0 | |
| Invalid (DTD validation) | - | - | N/A (non-validating) |
Lenient mode accepts malformed XML for processing third-party or legacy documents.
RustyXML supports two modes:
Strict Mode (Default) - Matches SweetXml/xmerl behavior:
- Validates element and attribute names
- Checks comment content (no `--` sequences)
- Validates text content (no unescaped `]]>`)
- Raises `ParseError` for malformed documents
Lenient Mode (lenient: true) - Accepts malformed XML:
- Best for processing real-world documents that may have minor issues
- 100% acceptance of valid documents
- Does not reject malformed documents
```elixir
# Strict mode (default) - matches SweetXml
doc = RustyXML.parse("<root/>")
RustyXML.parse("<1invalid/>") # Raises ParseError

# Lenient mode - accepts malformed XML
doc = RustyXML.parse("<1invalid/>", lenient: true)

# Tuple-based error handling (no exceptions)
{:ok, doc} = RustyXML.parse_document("<root/>")
{:error, reason} = RustyXML.parse_document("<1invalid/>")
```

| Malformed Input | Strict Mode (Default) | Lenient Mode |
|---|---|---|
| `<!-- comment -- inside -->` | ❌ Error | ✅ Accepts |
| `<1invalid-name>` | ❌ Error | ✅ Accepts |
| `<valid>text ]]> more</valid>` | ❌ Error | ✅ Accepts |
| `<?XML version="1.0"?>` (wrong case) | ❌ Error | ✅ Accepts |
| `standalone="YES"` (wrong case) | ❌ Error | ✅ Accepts |
| `&undefined;` in attributes | ❌ Error | ✅ Accepts |
| External entity in attribute | ❌ Error | ✅ Accepts |
Rationale: Strict mode by default ensures SweetXml compatibility and full XML 1.0 compliance. Lenient mode is available for processing third-party or legacy XML that may have minor issues.
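As a rough illustration of what a few of these strict-mode checks involve, the following pure-Elixir sketch tests the comment, text, and name rules from XML 1.0. It is not RustyXML's validation code: the module `StrictChecksSketch` and its function names are hypothetical, and the name check is deliberately limited to ASCII, whereas the real XML name productions cover many Unicode ranges.

```elixir
defmodule StrictChecksSketch do
  @moduledoc "Hypothetical, simplified versions of three well-formedness checks."

  # A comment body may not contain "--" and may not end with "-",
  # because "--->" is not a valid comment close.
  def valid_comment?(body) do
    not String.contains?(body, "--") and not String.ends_with?(body, "-")
  end

  # Character data may not contain the CDATA close delimiter "]]>".
  def valid_text?(text), do: not String.contains?(text, "]]>")

  # ASCII-only simplification: a name must start with a letter, "_" or ":".
  def valid_name_start?(name), do: String.match?(name, ~r/^[A-Za-z_:]/)
end

StrictChecksSketch.valid_comment?(" comment -- inside ")  # => false
StrictChecksSketch.valid_text?("text ]]> more")           # => false
StrictChecksSketch.valid_name_start?("1invalid-name")     # => false
StrictChecksSketch.valid_name_start?("root")              # => true
```

In terms of this sketch, a strict parser would raise when a check returns `false`, and a lenient parser would carry on regardless.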
The W3C/OASIS XML Conformance Test Suite (~50MB of test data) is not included in the RustyXML package, which keeps the download size small. To run the conformance tests locally:
Option 1: Download directly from W3C

```bash
mkdir -p test/xmlconf && cd test/xmlconf
curl -LO https://www.w3.org/XML/Test/xmlts20130923.tar.gz
tar -xzf xmlts20130923.tar.gz && rm xmlts20130923.tar.gz
```

Option 2: Use the convenience script

```bash
./scripts/download-xmlconf.sh
```

The test suite version xmlts20130923 (September 2013) is the latest official release from the W3C. Since XML 1.0 Fifth Edition (2008) has been stable for over 15 years, no updates to the conformance tests have been necessary.
```bash
# Run all conformance tests (requires test suite download)
mix test test/oasis_conformance_test.exs
# Run only valid document tests
mix test test/oasis_conformance_test.exs --only valid
# Run only not-well-formed tests
mix test test/oasis_conformance_test.exs --only not_wf
# Include skipped tests (shows full results)
mix test test/oasis_conformance_test.exs --include skip
```

- W3C Test Suite: https://www.w3.org/XML/Test/
- OASIS Committee: https://www.oasis-open.org/committees/xml-conformance/
- Test Suite Archive: https://www.w3.org/XML/Test/xmlts20130923.tar.gz
XPath compliance is tested against:
- W3C XPath 1.0 specification examples
- XSLT/XPath conformance test suite
- Real-world query patterns from SweetXml users
RustyXML is designed as a drop-in replacement for SweetXml.
| Function | SweetXml | RustyXML | Status |
|---|---|---|---|
| `xpath/2` | ✅ | ✅ | Compatible |
| `xpath/3` | ✅ | ✅ | Compatible |
| `xmap/2` | ✅ | ✅ | Compatible |
| `xmap/3` | ✅ | ✅ | Compatible |
| `~x` sigil | ✅ | ✅ | Compatible |
| `stream_tags/2` | ✅ | ✅ | Compatible |
| `stream_tags/3` | ✅ | ✅ | Compatible |
| Modifier | SweetXml | RustyXML | Status |
|---|---|---|---|
| `s` (string) | ✅ | ✅ | Compatible |
| `l` (list) | ✅ | ✅ | Compatible |
| `e` (entities) | ✅ | ✅ | Compatible |
| `o` (optional) | ✅ | ✅ | Compatible |
| `i` (integer) | ✅ | ✅ | Compatible |
| `f` (float) | ✅ | ✅ | Compatible |
| `k` (keyword) | ✅ | ✅ | Compatible |
```elixir
# Before
import SweetXml
doc |> xpath(~x"//item"l)

# After
import RustyXML
doc |> xpath(~x"//item"l)
```

All parsing paths produce consistent output for the same input.
| Path | Description | Validates Against |
|---|---|---|
| Structural Index (`parse/1`) | Main parse path (~4x input memory) | All test suites |
| Streaming (`stream_tags/3`) | Bounded-memory chunks | All test suites |
| SAX (`sax_parse/1`) | Event-based processing | All test suites |
```elixir
# Paths are validated in test/rusty_xml_test.exs
test "parse produces consistent output" do
  xml = "<root><item>test</item></root>"
  doc = RustyXML.parse(xml)
  assert is_reference(doc)
end
```

The streaming parser (`stream_tags/3`) is validated for:
| Feature | Status | Notes |
|---|---|---|
| Complete element reconstruction | ✅ | Builds valid XML strings |
| Nested element handling | ✅ | Captures full subtrees |
| Whitespace preservation | ✅ | All whitespace preserved |
| Attribute handling | ✅ | All attributes captured |
| CDATA sections | ✅ | Preserved in output |
| Entity preservation | ✅ | Entities maintained |
| Chunk boundary handling | ✅ | Elements spanning chunks work correctly |
| Early termination | ✅ | Stream.take works without hanging |
RustyXML's streaming implementation addresses known SweetXml issues:
| Issue | SweetXml | RustyXML | Status |
|---|---|---|---|
| #97 - Stream.take hangs | ❌ Hangs | ✅ Works | Fixed |
| #50 - Nested text order | ❌ Wrong order | ✅ Correct | Fixed |
| Element boundary chunks | | ✅ Handles correctly | Fixed |
- Synthetic tests - Generated XML covering edge cases
- Real-world XML - RSS feeds, configuration files, SOAP messages
- Conformance suites - W3C and OASIS standard tests
- Fuzz testing - Random input to find parsing errors
- All tests run on every CI build
- Cross-platform testing (Linux, macOS, Windows)
- Multiple Elixir/OTP version matrix
- Memory leak detection with Valgrind (Rust side)
If you find XML that RustyXML doesn't handle correctly:
- Create a minimal reproduction case
- Open an issue with:
  - Input XML (or link to conformance test)
  - Expected output
  - Actual output
  - RustyXML version
| Suite | Tests | Purpose |
|---|---|---|
| OASIS/W3C Conformance | 1089 | Industry-standard XML validation |
| RustyXML Unit Tests | 207+ | API, XPath, streaming, SAX, sigils |
| Total | 1296+ | |
RustyXML's strict mode (default) implements comprehensive XML 1.0 validation:
- ✅ Element and attribute names (XML 1.0 Edition 4 NameStartChar/NameChar)
- ✅ Comment content (no `--` sequences)
- ✅ Text content (no unescaped `]]>`)
- ✅ Standalone declaration values (`yes` or `no` only)
- ✅ Document structure ordering (XMLDecl → DOCTYPE → root)
- ✅ Processing instruction target validation (`xml` reserved)
- ✅ Entity registry tracking (declared entities, types, values)
- ✅ Undefined entity detection in attribute values
- ✅ Case-sensitive entity matching
- ✅ External entity detection (SYSTEM/PUBLIC)
- ✅ WFC: No External Entity References in attributes
- ✅ Unparsed entity (NDATA) restrictions
- ✅ Entity replacement text validation:
  - Split character reference detection (`&` + `#`)
  - Balanced markup validation
  - Invalid name character detection (CombiningChar as first char)
  - XML declaration in entity prohibition
- XML 1.1 support - Minimal adoption, incompatible changes
- External entity resolution - Security concerns (XXE attacks)
- Full DTD processing - Complexity vs. benefit
- XPath 2.0 - Different specification, significant effort
- XSD validation - Out of scope for a parsing library
- W3C XML 1.0 (Fifth Edition) - XML specification
- W3C Namespaces in XML 1.0 - Namespace specification
- W3C XPath 1.0 - XPath specification
- OASIS XML Conformance - Test suite
- W3C XML Test Suite - Additional tests
- SweetXml - Elixir XML library (compatibility target)