Description
If I reuse a namespace prefix for different namespaces on different nodes (such as when I redeclare the default namespace with a blank prefix, i.e. xmlns="..."), a node can end up in the wrong namespace when it is reparented.
Code that Causes the Error
#! /usr/bin/env ruby
require 'nokogiri'

# child1 redeclares the default namespace; child2 inherits the default namespace from root.
doc = Nokogiri::XML('<root xmlns="urn:something"><child1 xmlns="urn:something_else"/><child2/></root>')
node = doc.at_xpath("//ns:child2", { "ns" => "urn:something" })
new_parent = doc.at_xpath("//ns:child1", { "ns" => "urn:something_else" })
# Move child2 under child1, which redefines the blank prefix.
new_parent.add_child(node)
puts doc.to_xml
Note that this will now output incorrect namespaces on the "child2" node, like the following:
<?xml version="1.0"?>
<root xmlns="urn:something">
  <child1 xmlns="urn:something_else">
    <child2/>
  </child1>
</root>
Note that the "child2" node has now been incorrectly 'moved' into the "urn:something_else" namespace.
A more correct output would be something similar to:
<?xml version="1.0"?>
<root xmlns="urn:something">
  <child1 xmlns="urn:something_else">
    <child2 xmlns="urn:something"/>
  </child1>
</root>
Source of the Behaviour
I've tracked this down to static void relink_namespace(xmlNodePtr reparented) in ext/nokogiri/xml_node.c.
It seems this code doesn't check whether a node above the current node is 'squatting' on a namespace prefix. The problem shows up when all of the following hold:
- The node being relinked has an ancestor node that has a matching, 'correct', namespace definition.
- The node being relinked also has an ancestor node that has a namespace definition that matches by prefix, but NOT by href.
- The node with the incorrect definition is not only an ancestor of the node being relinked, it is also a descendant of the node with the 'correct' definition, meaning that when traversing up the tree you will hit the node with the incorrect definition first.
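The three conditions above can be illustrated with a small, self-contained model. This is only a sketch in plain Ruby, not the code in ext/nokogiri/xml_node.c: ToyNode, lookup_by_prefix and needs_redeclaration? are hypothetical names invented for this example, and the check shown is one possible way to detect the 'squatting' ancestor, not necessarily the correction in the PR.

```ruby
# Toy model of an element tree. ToyNode is hypothetical and is NOT
# Nokogiri's or libxml2's real data structure.
# ns_defs maps a prefix to an href; "" stands for the blank (default) prefix.
ToyNode = Struct.new(:name, :parent, :ns_defs)

# Prefix-only lookup: walk up the tree and return the href of the
# nearest definition for the given prefix, or nil if none exists.
def lookup_by_prefix(node, prefix)
  while node
    return node.ns_defs[prefix] if node.ns_defs.key?(prefix)
    node = node.parent
  end
  nil
end

# A reparented node needs its own declaration when the nearest
# definition of its prefix no longer resolves to the href it had.
def needs_redeclaration?(node, prefix, wanted_href)
  lookup_by_prefix(node, prefix) != wanted_href
end

root   = ToyNode.new("root",   nil,    { "" => "urn:something" })
child1 = ToyNode.new("child1", root,   { "" => "urn:something_else" })
# child2 as it sits after being reparented under child1.
child2 = ToyNode.new("child2", child1, {})

# Walking up from child2 reaches child1 before root.
resolved  = lookup_by_prefix(child2, "")
redeclare = needs_redeclaration?(child2, "", "urn:something")
puts resolved
puts redeclare

# Redeclaring the blank prefix on child2 restores the original href.
child2.ns_defs[""] = "urn:something" if redeclare
fixed = lookup_by_prefix(child2, "")
puts fixed
```

In this model a lookup that matches on prefix alone stops at child1 and never sees the matching definition on root, which is the situation the list describes. Comparing the href as well as the prefix is what reveals that child2 needs its own xmlns declaration.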
Tests and PRs
A test that reproduces this issue and an associated correction are provided in PR #2495.