xmldiff — Implementation Plan
Goal
A Java library that takes two XML strings (left = expected, right = actual) and produces two HTML strings suitable for rendering a side-by-side diff. Each output is a <span> tree with inner spans annotated with CSS classes.
CSS Classes
| Class | Meaning |
|---|---|
| `neutral` | This token is identical on both sides |
| `correct` | This token is on the left side and differs from the right |
| `wrong` | This token is on the right side and differs from the left |
| `skipped` | Child content of an element whose tag name differs |
Diff Granularity Rules
| Token | If equal | If different |
|---|---|---|
| Element name | `neutral` | Left → `correct`, right → `wrong`; all content (attrs, children, text) → `skipped` |
| Attribute name | `neutral` | Left attr name → `correct`, right attr name → `wrong` |
| Attribute value | `neutral` | Attr name stays `neutral` on both sides; left value → `correct`, right value → `wrong` |
| Text content | `neutral` | Left text → `correct`, right text → `wrong` |
| Element present only on left | (n/a) | Left subtree → `correct`, right → empty `<span></span>` |
| Element present only on right | (n/a) | Right subtree → `wrong`, left → empty `<span></span>` |
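
The class selection in the table above can be sketched as a small enum. This is only an illustration: the names `DiffClass`, `css()` and `forToken()` are hypothetical and do not come from the plan.

```java
import java.util.Locale;

/** The four CSS classes from the plan; the enum itself is a hypothetical helper. */
enum DiffClass {
    NEUTRAL, CORRECT, WRONG, SKIPPED;

    /** CSS class name as written into the output, e.g. "neutral". */
    String css() {
        return name().toLowerCase(Locale.ROOT);
    }

    /**
     * Class for a single token: equal tokens are neutral, a differing token
     * is "correct" on the left side and "wrong" on the right side.
     */
    static DiffClass forToken(boolean differs, boolean leftSide) {
        if (!differs) {
            return NEUTRAL;
        }
        return leftSide ? CORRECT : WRONG;
    }
}
```

`SKIPPED` is deliberately not returned by `forToken()`: per the table it applies to content inside an element whose tag name differs, which is a decision made by the tree walk rather than per token.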
Attribute order is not significant: attributes are matched by name, not by position.
Output Format
Each output string is pretty-printed HTML. XML special characters (<, >, &, ") inside span text are HTML-escaped. Indentation uses 2 spaces per level. Output does not include an XML declaration.
Example shape:
<span class="neutral">&lt;root&gt;
  &lt;child </span><span class="correct">attr</span><span class="neutral">=&quot;</span><span class="correct">value</span><span class="neutral">&quot;&gt;
    </span><span class="correct">text here</span><span class="neutral">
  &lt;/child&gt;
&lt;/root&gt;</span>
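
In the example shape, adjacent tokens with the same class share one span and the span text is escaped. A minimal sketch of a buffer that does both is shown here; `SpanBuffer` and its methods are hypothetical names, and the escaping covers only the four characters listed under Output Format.

```java
/** Collects (class, text) tokens and merges consecutive runs of the same class. */
final class SpanBuffer {
    private final StringBuilder html = new StringBuilder();
    private final StringBuilder run = new StringBuilder();
    private String runClass; // CSS class of the pending run, or null if there is none

    /** Adds raw (unescaped) token text under the given CSS class. */
    void append(String cssClass, String text) {
        if (text.isEmpty()) {
            return;
        }
        if (!cssClass.equals(runClass)) {
            flush();
            runClass = cssClass;
        }
        run.append(text);
    }

    /** Closes the pending run and returns the accumulated HTML. */
    String finish() {
        flush();
        return html.toString();
    }

    private void flush() {
        if (runClass == null) {
            return;
        }
        html.append("<span class=\"").append(runClass).append("\">")
            .append(escape(run)).append("</span>");
        run.setLength(0);
        runClass = null;
    }

    /** Escapes &, <, > and "; the ampersand goes first so entities are not double-escaped. */
    private static String escape(CharSequence text) {
        return text.toString()
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;");
    }
}
```

Escaping at flush time keeps callers working with raw XML text, so only one place in the renderer has to know about HTML entities.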
Dependencies
<!-- XML diffing -->
<dependency>
  <groupId>org.xmlunit</groupId>
  <artifactId>xmlunit-core</artifactId>
  <version>2.10.0</version>
</dependency>
<!-- Testing -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>5.11.0</version>
  <scope>test</scope>
</dependency>
XMLUnit 2.x is the diffing engine. `Diff.getDifferences()` yields `Difference` objects, each wrapping a `Comparison` (via `getComparison()`) with:
- `getType()`: a `ComparisonType` enum value such as `ELEMENT_TAG_NAME`, `ATTR_VALUE`, `ATTR_NAME_LOOKUP`, `TEXT_VALUE`, `CHILD_NODELIST_LENGTH`, `HAS_CHILD_NODES`, etc.
- `getControlDetails().getXPath()`: XPath of the affected node on the left side
- `getTestDetails().getXPath()`: XPath of the affected node on the right side
Algorithm
Step 1 — Diff (DiffEngine)
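import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.ElementSelectors;

// Left is the control (expected) document, right is the test (actual) document.
Diff diff = DiffBuilder
    .compare(leftXml)
    .withTest(rightXml)
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
    .ignoreWhitespace()
    .build();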
For each Difference d in diff.getDifferences(), with c = d.getComparison():
  record (c.getControlDetails().getXPath(), c.getTestDetails().getXPath(), c.getType())
  into two maps:
    leftDiffs: XPath → ComparisonType
    rightDiffs: XPath → ComparisonType
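
The recording step can be sketched without XMLUnit on the classpath. `DiffRecord` is a hypothetical stand-in for the three values read from each comparison, and the type is kept as a plain string where a real version would presumably use `ComparisonType`. Two choices in this sketch are assumptions rather than part of the plan: each XPath maps to a set of types so that a second difference on the same node does not overwrite the first, and a null XPath (assumed to occur when a node exists on only one side) is skipped.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for (control XPath, test XPath, comparison type). */
final class DiffRecord {
    final String leftXPath;  // assumed null when the node exists only on the right
    final String rightXPath; // assumed null when the node exists only on the left
    final String type;       // e.g. "TEXT_VALUE"

    DiffRecord(String leftXPath, String rightXPath, String type) {
        this.leftXPath = leftXPath;
        this.rightXPath = rightXPath;
        this.type = type;
    }
}

/** Builds the two lookup maps used by the renderer. */
final class DiffIndex {
    final Map<String, Set<String>> leftDiffs = new LinkedHashMap<>();
    final Map<String, Set<String>> rightDiffs = new LinkedHashMap<>();

    static DiffIndex of(List<DiffRecord> records) {
        DiffIndex index = new DiffIndex();
        for (DiffRecord r : records) {
            add(index.leftDiffs, r.leftXPath, r.type);
            add(index.rightDiffs, r.rightXPath, r.type);
        }
        return index;
    }

    private static void add(Map<String, Set<String>> map, String xpath, String type) {
        if (xpath == null) {
            return; // nothing to annotate on this side
        }
        map.computeIfAbsent(xpath, key -> new LinkedHashSet<>()).add(type);
    }
}
```

If a single type per XPath turns out to be enough in practice, the `Set` can be collapsed back to the plain `XPath → ComparisonType` map described above.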
Step 2 — Render (HtmlRenderer)
Walk each DOM tree independently, pretty-printing to HTML. At each node, look up its XPath in the relevant diff map to determine its CSS class.
Element node:
  xp = xpathOf(node)
  if leftDiffs/rightDiffs contains xp with type ELEMENT_TAG_NAME:
    emit tag name as correct / wrong
    emit all attributes + children recursively as skipped
  else:
    emit tag name as neutral
    for each attribute (in document order):
      emit based on attr-level diff lookup
    recurse into children
Text node:
  xp = xpathOf(node)
  if leftDiffs/rightDiffs contains xp with type TEXT_VALUE:
    emit as correct / wrong
  else:
    emit as neutral
Missing child (CHILD_NODELIST_LENGTH or similar):
  emit present side as correct / wrong
  emit absent side as empty <span></span>
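
The element and text rules above can be sketched as a recursive walk over a standard `org.w3c.dom` tree. This is a simplified illustration, not the planned `HtmlRenderer`: it emits `class:text` tokens instead of HTML, leaves out pretty-printing, self-closing tags and the missing-child case, and takes the diff lookup as a hypothetical `BiPredicate` (node, comparison type name). Giving the angle brackets of a renamed element the `neutral` class is an assumption, since the plan does not say. Note also that `NamedNodeMap` does not guarantee document order for attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Walks one side's tree and emits "class:text" tokens. */
final class TokenWalker {
    private final String diffClass;                  // "correct" for the left tree, "wrong" for the right
    private final BiPredicate<Node, String> differs; // hypothetical lookup into leftDiffs / rightDiffs
    private final List<String> tokens = new ArrayList<>();

    TokenWalker(String diffClass, BiPredicate<Node, String> differs) {
        this.diffClass = diffClass;
        this.differs = differs;
    }

    List<String> walk(Node root) {
        visit(root, false);
        return tokens;
    }

    private void visit(Node node, boolean skipped) {
        short kind = node.getNodeType();
        if (kind == Node.TEXT_NODE) {
            emit(pick(skipped, differs.test(node, "TEXT_VALUE")), node.getNodeValue());
        } else if (kind == Node.ELEMENT_NODE) {
            boolean tagDiffers = !skipped && differs.test(node, "ELEMENT_TAG_NAME");
            boolean skipInside = skipped || tagDiffers; // content of a renamed element is skipped
            String punct = skipped ? "skipped" : "neutral";
            String inner = skipInside ? "skipped" : "neutral";
            emit(punct, "<");
            emit(pick(skipped, tagDiffers), node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                emit(inner, " ");
                emit(pick(skipInside, differs.test(attr, "ATTR_NAME_LOOKUP")), attr.getNodeName());
                emit(inner, "=\"");
                emit(pick(skipInside, differs.test(attr, "ATTR_VALUE")), attr.getNodeValue());
                emit(inner, "\"");
            }
            emit(punct, ">");
            for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                visit(child, skipInside);
            }
            emit(punct, "</");
            emit(pick(skipped, tagDiffers), node.getNodeName());
            emit(punct, ">");
        }
        // Other node kinds (comments, processing instructions) are ignored.
    }

    private String pick(boolean skipped, boolean changed) {
        if (skipped) {
            return "skipped";
        }
        return changed ? diffClass : "neutral";
    }

    private void emit(String css, String text) {
        tokens.add(css + ":" + text);
    }
}
```

Running the same walker once with `"correct"` and the left lookup, and once with `"wrong"` and the right lookup, mirrors the two independent passes described in Step 2.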
XPaths are computed from the DOM tree as each node is visited, matching the XPaths that XMLUnit generates (e.g. /root[1]/child[1]).
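
A minimal sketch of such an XPath computation over `org.w3c.dom` nodes follows. The `/root[1]/child[1]` form comes from the plan; the `/@name` form for attributes and the `/text()[n]` form for text nodes are assumed to match what XMLUnit reports and should be checked against real output. The sketch counts every text sibling, so it presumes that whitespace-only text nodes have already been removed from the tree being walked.

```java
import org.w3c.dom.Attr;
import org.w3c.dom.Node;

/** Hypothetical helper that derives a positional XPath for a DOM node. */
final class XPaths {
    private XPaths() {}

    static String xpathOf(Node node) {
        if (node == null || node.getNodeType() == Node.DOCUMENT_NODE) {
            return ""; // the document (or a detached parent) contributes no step
        }
        switch (node.getNodeType()) {
            case Node.ATTRIBUTE_NODE: {
                Attr attr = (Attr) node;
                return xpathOf(attr.getOwnerElement()) + "/@" + attr.getName();
            }
            case Node.ELEMENT_NODE:
                return xpathOf(node.getParentNode()) + "/" + node.getNodeName()
                    + "[" + position(node) + "]";
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                return xpathOf(node.getParentNode()) + "/text()[" + position(node) + "]";
            default:
                throw new IllegalArgumentException("unsupported node type: " + node.getNodeType());
        }
    }

    /** 1-based position among preceding siblings that produce the same XPath step. */
    private static int position(Node node) {
        int pos = 1;
        for (Node sib = node.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sameStep(sib, node)) {
                pos++;
            }
        }
        return pos;
    }

    private static boolean sameStep(Node candidate, Node node) {
        if (isText(node)) {
            return isText(candidate);
        }
        return candidate.getNodeType() == Node.ELEMENT_NODE
            && candidate.getNodeName().equals(node.getNodeName());
    }

    private static boolean isText(Node node) {
        return node.getNodeType() == Node.TEXT_NODE
            || node.getNodeType() == Node.CDATA_SECTION_NODE;
    }
}
```

Computing the path recursively on each visit is quadratic in the worst case; a renderer that already walks top-down could instead pass the parent's path and a per-name counter down the recursion.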
Step 3 — Output
XmlDiff.compare() calls DiffEngine, then calls HtmlRenderer once for the left tree and once for the right tree, returning a DiffResult.
Test Cases
| # | Scenario | Left class | Right class |
|---|---|---|---|
| 1 | Identical simple elements | all `neutral` | all `neutral` |
| 2 | Differing text content | text `correct` | text `wrong` |
| 3 | Differing attribute value | value `correct` | value `wrong` (name `neutral`) |
| 4 | Differing attribute name | name `correct` | name `wrong` |
| 5 | Differing element name | name `correct`, children `skipped` | name `wrong`, children `skipped` |
| 6 | Extra child on left only | child `correct` | empty span |
| 7 | Extra child on right only | empty span | child `wrong` |
| 8 | Attribute order differs | first mismatch `correct` | first mismatch `wrong` |
| 9 | Nested elements, partial diff | only differing subtree marked | same |
| 10 | Self-closing element, no diff | all `neutral` | all `neutral` |
Assumptions
- Comments, processing instructions, and CDATA sections are ignored.
- Whitespace-only text nodes between elements are ignored (XMLUnit `ignoreWhitespace()`).
- Namespace prefixes are treated as plain text; no namespace-aware comparison.
- The library is stateless; `XmlDiff.compare()` is safe to call concurrently.