xmldiff — Implementation Plan

Goal

A Java library that takes two XML strings (left = expected, right = actual) and produces two HTML strings suitable for rendering a side-by-side diff. Each output is a <span> tree with inner spans annotated with CSS classes.

CSS Classes

Class	Meaning
`neutral`	This token is identical in both sides
`correct`	This token is on the left side and differs from right
`wrong`	This token is on the right side and differs from left
`skipped`	Content (attributes, children, text) of an element whose tag name differs

Diff Granularity Rules

Token	If equal	If different
Element name	`neutral`	Left → `correct`, right → `wrong`; all content (attrs, children, text) → `skipped`
Attribute name	`neutral`	Left attr name → `correct`, right attr name → `wrong`
Attribute value	`neutral`	Attr name stays `neutral` on both sides; left value → `correct`, right value → `wrong`
Text content	`neutral`	Left text → `correct`, right text → `wrong`
Element present only on left	—	Left subtree → `correct`, right → empty `<span></span>`
Element present only on right	—	Right subtree → `wrong`, left → empty `<span></span>`

Attribute order is not significant: attributes are matched by name, so reordering them alone produces no difference.
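As a small illustration of order-insensitive attribute matching, the sketch below parses two elements with the JDK's DOM parser and compares their attributes as name-to-value maps. The class and method names (AttrOrderDemo, attrs) are hypothetical and not part of the planned library; the sketch only shows the idea of matching by name rather than by position.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class AttrOrderDemo {
    /** Parses an XML string and returns the root element's attributes keyed by name. */
    static Map<String, String> attrs(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> out = new TreeMap<>();
            NamedNodeMap map = root.getAttributes();
            for (int i = 0; i < map.getLength(); i++) {
                Node attr = map.item(i);
                out.put(attr.getNodeName(), attr.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> left = attrs("<item id=\"1\" name=\"a\"/>");
        Map<String, String> right = attrs("<item name=\"a\" id=\"1\"/>");
        // Same names and values in a different order compare as equal.
        System.out.println(left.equals(right));
    }
}
```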

Output Format

Each output string is pretty-printed HTML. XML special characters (<, >, &, ") inside span text are HTML-escaped. Indentation uses 2 spaces per level. Output does not include an XML declaration.

Example shape:

<span class="neutral">&lt;root&gt;
  &lt;child </span><span class="correct">attr</span><span class="neutral">=&quot;</span><span class="correct">value</span><span class="neutral">&quot;&gt;
    </span><span class="correct">text here</span><span class="neutral">
  &lt;/child&gt;
&lt;/root&gt;</span>
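The escaping and indentation rules can be sketched with a few small helpers. This is a minimal sketch under the rules stated in this section; the names SpanWriter, escape, span, and indent are hypothetical and not an existing API.

```java
// Hypothetical helpers illustrating the output format rules.
final class SpanWriter {
    /** Escapes the four characters the plan lists: <, >, & and ". */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps escaped text in a span carrying one of the CSS classes. */
    static String span(String cssClass, String text) {
        return "<span class=\"" + cssClass + "\">" + escape(text) + "</span>";
    }

    /** Two spaces per nesting level. */
    static String indent(int depth) {
        return "  ".repeat(depth);
    }

    public static void main(String[] args) {
        System.out.println(span("neutral", indent(1) + "<child attr=\"value\">"));
    }
}
```

Escaping happens only inside span, so callers pass raw XML text and cannot double-escape by accident.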

Dependencies

<!-- XML diffing -->
<dependency>
    <groupId>org.xmlunit</groupId>
    <artifactId>xmlunit-core</artifactId>
    <version>2.10.0</version>
</dependency>

<!-- Testing -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter</artifactId>
    <version>5.11.0</version>
    <scope>test</scope>
</dependency>

XMLUnit 2.x is the diffing engine. Diff.getDifferences() yields Difference objects, each wrapping a Comparison (via getComparison()) with:

getType(): ComparisonType enum, e.g. ELEMENT_TAG_NAME, ATTR_VALUE, ATTR_NAME_LOOKUP, TEXT_VALUE, CHILD_LOOKUP, CHILD_NODELIST_LENGTH, HAS_CHILD_NODES.
getControlDetails().getXPath(): XPath of the affected node on the left side (may be null when the node exists only on the right)
getTestDetails().getXPath(): XPath of the affected node on the right side (may be null when the node exists only on the left)

Algorithm

Step 1 — Diff (DiffEngine)

Diff diff = DiffBuilder
    .compare(leftXml)
    .withTest(rightXml)
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
    .ignoreWhitespace()
    .build();

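For each Difference d in diff.getDifferences():
    c = d.getComparison()
    add c.getType() under c.getControlDetails().getXPath() in leftDiffs
    add c.getType() under c.getTestDetails().getXPath() in rightDiffs
    (skip a side whose XPath is null; one XPath can carry several types)
        leftDiffs:  XPath → set of ComparisonType
        rightDiffs: XPath → set of ComparisonType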
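A minimal sketch of the two lookup maps follows. To stay runnable on the JDK alone it uses a stand-in enum (Kind) instead of XMLUnit's ComparisonType, and the class name DiffIndex is hypothetical. It keeps a set of types per XPath, on the assumption that one node can be involved in more than one difference, and it treats a null XPath as "node absent on that side".

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical index over reported differences, keyed by XPath.
final class DiffIndex {
    /** Stand-in for XMLUnit's ComparisonType so the sketch needs only the JDK. */
    enum Kind { ELEMENT_TAG_NAME, ATTR_VALUE, ATTR_NAME_LOOKUP, TEXT_VALUE, CHILD_LOOKUP }

    final Map<String, Set<Kind>> left = new HashMap<>();
    final Map<String, Set<Kind>> right = new HashMap<>();

    /** Records one difference; a null XPath is assumed to mean the node is absent on that side. */
    void record(String leftXPath, String rightXPath, Kind kind) {
        if (leftXPath != null) {
            left.computeIfAbsent(leftXPath, k -> EnumSet.noneOf(Kind.class)).add(kind);
        }
        if (rightXPath != null) {
            right.computeIfAbsent(rightXPath, k -> EnumSet.noneOf(Kind.class)).add(kind);
        }
    }

    /** True if the given side recorded a difference of this kind at this XPath. */
    static boolean has(Map<String, Set<Kind>> side, String xpath, Kind kind) {
        Set<Kind> kinds = side.get(xpath);
        return kinds != null && kinds.contains(kind);
    }
}
```

In the real DiffEngine, record would be called once per Difference with the two XPaths and the type taken from its Comparison.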

Step 2 — Render (HtmlRenderer)

Walk each DOM tree independently, pretty-printing to HTML. At each node, look up its XPath in the relevant diff map to determine its CSS class.

Element node:

xp = xpathOf(node)
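if this side's diff map (leftDiffs or rightDiffs) contains xp with type ELEMENT_TAG_NAME:
    emit tag name as correct (left) / wrong (right)
    emit all attributes + children recursively as skipped
else:
    emit tag name as neutral
    for each attribute (in document order):
        look up the attribute's XPath (xp + "/@" + name) for ATTR_NAME_LOOKUP / ATTR_VALUE
        emit name and value as neutral, correct, or wrong per the granularity rules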
    recurse into children

Text node:

xp = xpathOf(node)
if leftDiffs/rightDiffs contains xp with type TEXT_VALUE:
    emit as correct / wrong
else:
    emit as neutral

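Missing child (reported by XMLUnit as CHILD_LOOKUP with no node on the other side; CHILD_NODELIST_LENGTH on the parent only signals that the child counts differ):

emit the present side's subtree as correct (left) / wrong (right)
emit the absent side as an empty <span></span>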
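The class selection used in the cases above can be sketched as a pair of pure functions. The names CssClasses, Side, classFor, and oneSided are hypothetical; the sketch assumes the renderer already knows whether a token differs and whether it sits inside an element whose tag name differs.

```java
// Hypothetical helpers for choosing CSS classes during rendering.
final class CssClasses {
    enum Side { LEFT, RIGHT }

    /**
     * Picks the CSS class for one token.
     * insideRenamed: the token lies inside an element whose tag name differs.
     * differs: the token itself was reported as different.
     */
    static String classFor(Side side, boolean insideRenamed, boolean differs) {
        if (insideRenamed) {
            return "skipped";
        }
        if (differs) {
            return side == Side.LEFT ? "correct" : "wrong";
        }
        return "neutral";
    }

    /** A subtree present on one side only: fully marked there, empty span on the other side. */
    static String oneSided(Side presentOn, Side rendering, String escapedSubtree) {
        if (presentOn != rendering) {
            return "<span></span>";
        }
        return "<span class=\"" + classFor(rendering, false, true) + "\">"
                + escapedSubtree + "</span>";
    }
}
```

The differing tag name itself is passed with insideRenamed = false and differs = true, so it comes out as correct or wrong while everything nested below it is skipped.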

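XPaths are computed from the DOM tree as each node is visited and must match the XPaths that XMLUnit generates, e.g. /root[1]/child[1] for an element, /root[1]/child[1]/text()[1] for its text, and /root[1]/child[1]/@attr for an attribute. Because the diff runs with ignoreWhitespace(), whitespace-only text nodes should be skipped when counting text() positions.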
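One way to compute such positional XPaths from a JDK DOM tree is sketched below. The names XPaths, parse, and xpathOf are hypothetical, and the sketch assumes whitespace-only text nodes have already been removed from the tree. Whether the result matches XMLUnit's output in every case should be confirmed by tests against the real library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper that builds positional XPaths for element and text nodes.
final class XPaths {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    /** Returns e.g. /root[1]/child[2] for an element, /root[1]/child[2]/text()[1] for text. */
    static String xpathOf(Node node) {
        if (node == null || node.getNodeType() == Node.DOCUMENT_NODE) {
            return "";
        }
        String step;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            step = node.getNodeName();
        } else if (isText(node)) {
            step = "text()";
        } else {
            throw new IllegalArgumentException("unsupported node type: " + node.getNodeType());
        }
        // Position is 1-based and counts only preceding siblings with the same step.
        int index = 1;
        for (Node sib = node.getPreviousSibling(); sib != null; sib = sib.getPreviousSibling()) {
            if (sameStep(sib, node)) {
                index++;
            }
        }
        return xpathOf(node.getParentNode()) + "/" + step + "[" + index + "]";
    }

    private static boolean isText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE || n.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    private static boolean sameStep(Node a, Node b) {
        if (isText(a) && isText(b)) {
            return true;
        }
        return a.getNodeType() == Node.ELEMENT_NODE
                && b.getNodeType() == Node.ELEMENT_NODE
                && a.getNodeName().equals(b.getNodeName());
    }
}
```

An attribute's XPath would then be the owning element's path followed by "/@" and the attribute name.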

Step 3 — Output

XmlDiff.compare() calls DiffEngine, then calls HtmlRenderer once for the left tree and once for the right tree, returning a DiffResult.

Test Cases

#	Scenario	Left class	Right class
1	Identical simple elements	all `neutral`	all `neutral`
2	Differing text content	text `correct`	text `wrong`
3	Differing attribute value	value `correct`	value `wrong` (name neutral)
4	Differing attribute name	name `correct`	name `wrong`
5	Differing element name	name `correct`, children `skipped`	name `wrong`, children `skipped`
6	Extra child on left only	child `correct`	empty span
7	Extra child on right only	empty span	child `wrong`
8	Attribute order differs, same names and values	all `neutral`	all `neutral`
9	Nested elements, partial diff	only differing subtree marked	same
10	Self-closing element, no diff	all `neutral`	all `neutral`

Assumptions

Comments, processing instructions, and CDATA sections are ignored.
Whitespace-only text nodes between elements are ignored (XMLUnit ignoreWhitespace()).
Namespace prefixes are treated as plain text; no namespace-aware comparison.
The library is stateless; XmlDiff.compare() is safe to call concurrently.
