xmldiff/xmldiff.md
2026-03-13 10:52:43 +01:00

# xmldiff — Implementation Plan
## Goal
A Java library that takes two XML strings (left = expected, right = actual) and produces two HTML strings suitable for rendering a side-by-side diff. Each output is a `<span>` tree with inner spans annotated with CSS classes.
## CSS Classes
| Class | Meaning |
|------------|-----------------------------------------------------------|
| `neutral` | This token is identical in both sides |
| `correct` | This token is on the **left** side and differs from right |
| `wrong` | This token is on the **right** side and differs from left |
| `skipped` | Child content of an element whose **tag name** differs |
## Diff Granularity Rules
| Token | If equal | If different |
|-------------------------------|------------|--------------------------------------------------------------------------------------|
| Element name | `neutral` | Left → `correct`, right → `wrong`; all content (attrs, children, text) → `skipped` |
| Attribute name | `neutral` | Left attr name → `correct`, right attr name → `wrong` |
| Attribute value               | `neutral`  | Attr name stays `neutral` on both sides; left value → `correct`, right value → `wrong` |
| Text content | `neutral` | Left text → `correct`, right text → `wrong` |
| Element present only on left | — | Left subtree → `correct`, right → empty `<span></span>` |
| Element present only on right | — | Right subtree → `wrong`, left → empty `<span></span>` |

Attribute **order is not significant**: attributes are matched by name, not by position.
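
The attribute rows of the table can be read as a lookup by name. The following is a minimal sketch, not part of the plan: `AttributeRules`, `AttrClasses`, and `classify` are hypothetical names, attributes are modelled as plain maps, and marking the value of an unmatched attribute with the same class as its name is an assumption (the table only specifies the name). Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AttributeRules {
    private AttributeRules() {}

    /** CSS classes chosen for one attribute's name and value (hypothetical holder). */
    record AttrClasses(String nameClass, String valueClass) {}

    /**
     * Classifies the attributes of one side against the matched element on the other side.
     * Lookup is by attribute name, so the order of either map does not affect the result.
     *
     * @param own       attributes of the element being rendered
     * @param other     attributes of the matched element on the opposite side
     * @param diffClass "correct" when rendering the left side, "wrong" for the right side
     */
    static Map<String, AttrClasses> classify(Map<String, String> own,
                                             Map<String, String> other,
                                             String diffClass) {
        Map<String, AttrClasses> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : own.entrySet()) {
            String name = e.getKey();
            if (!other.containsKey(name)) {
                // No counterpart: the name is marked. Marking the value too is an assumption.
                result.put(name, new AttrClasses(diffClass, diffClass));
            } else if (!e.getValue().equals(other.get(name))) {
                // Same name, different value: only the value is marked.
                result.put(name, new AttrClasses("neutral", diffClass));
            } else {
                result.put(name, new AttrClasses("neutral", "neutral"));
            }
        }
        return result;
    }
}
```

Calling `classify(leftAttrs, rightAttrs, "correct")` and `classify(rightAttrs, leftAttrs, "wrong")` would give the classes for the two sides of one matched element.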
## Output Format
Each output string is pretty-printed HTML. XML special characters (`<`, `>`, `&`, `"`) inside span text are HTML-escaped. Indentation uses 2 spaces per level. Output does **not** include an XML declaration.
Example shape:
```html
<span class="neutral">&lt;root&gt;
  &lt;child </span><span class="correct">attr</span><span class="neutral">=&quot;</span><span class="correct">value</span><span class="neutral">&quot;&gt;
    </span><span class="correct">text here</span><span class="neutral">
  &lt;/child&gt;
&lt;/root&gt;</span>
```
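
The escaping and indentation rules above could be captured in a few small helpers. This is a sketch under the stated rules only; `SpanWriter`, `escape`, `span`, and `indent` are hypothetical names, and `String.repeat` assumes Java 11 or later.

```java
final class SpanWriter {
    private SpanWriter() {}

    /** Escapes the four characters the plan lists: <, >, & and ". */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps raw (unescaped) token text in a span carrying the given CSS class. */
    static String span(String cssClass, String rawText) {
        return "<span class=\"" + cssClass + "\">" + escape(rawText) + "</span>";
    }

    /** Two spaces per nesting level, as specified for the output format. */
    static String indent(int depth) {
        return "  ".repeat(depth);
    }
}
```

For instance, `SpanWriter.span("correct", "a<b")` yields `<span class="correct">a&lt;b</span>`. Escaping happens in one place so that no caller can emit raw XML markup into the HTML output by accident.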
## Dependencies
```xml
<!-- XML diffing -->
<dependency>
  <groupId>org.xmlunit</groupId>
  <artifactId>xmlunit-core</artifactId>
  <version>2.10.0</version>
</dependency>
<!-- Testing -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>5.11.0</version>
  <scope>test</scope>
</dependency>
```
**XMLUnit 2.x** is the diffing engine. `Diff.getDifferences()` yields `Difference` objects, each wrapping a `Comparison` with:
- `getType()` returns a `ComparisonType` enum value: `ELEMENT_TAG_NAME`, `ATTR_VALUE`, `ATTR_NAME_LOOKUP`, `TEXT_VALUE`, `CHILD_NODELIST_LENGTH`, `CHILD_LOOKUP`, etc.
- `getControlDetails().getXPath()` — XPath of the affected node on the left side
- `getTestDetails().getXPath()` — XPath of the affected node on the right side
## Algorithm
### Step 1 — Diff (DiffEngine)
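```java
// Needs: org.xmlunit.builder.DiffBuilder, org.xmlunit.diff.*, java.util.HashMap, java.util.Map
Diff diff = DiffBuilder
    .compare(leftXml)
    .withTest(rightXml)
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
    .ignoreWhitespace()
    .build();

// XPath -> ComparisonType, one map per side
Map<String, ComparisonType> leftDiffs = new HashMap<>();
Map<String, ComparisonType> rightDiffs = new HashMap<>();

for (Difference d : diff.getDifferences()) {
    Comparison c = d.getComparison();
    String leftXPath = c.getControlDetails().getXPath();
    String rightXPath = c.getTestDetails().getXPath();
    // The XPath may be null on the side where the node is absent
    if (leftXPath != null) leftDiffs.put(leftXPath, c.getType());
    if (rightXPath != null) rightDiffs.put(rightXPath, c.getType());
}
```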
### Step 2 — Render (HtmlRenderer)
Walk each DOM tree independently, pretty-printing to HTML. At each node, look up its XPath in the relevant diff map to determine its CSS class.
**Element node:**
```
xp = xpathOf(node)
diffs = leftDiffs when rendering the left tree, rightDiffs for the right tree
if diffs contains xp with type ELEMENT_TAG_NAME:
    emit tag name as correct (left) / wrong (right)
    emit all attributes + children recursively as skipped
else:
    emit tag name as neutral
    for each attribute (in document order):
        emit based on attr-level diff lookup
    recurse into children
```
**Text node:**
```
xp = xpathOf(node)
if leftDiffs/rightDiffs contains xp with type TEXT_VALUE:
    emit as correct / wrong
else:
    emit as neutral
```
**Missing child (`CHILD_LOOKUP`, usually reported together with `CHILD_NODELIST_LENGTH` on the parent):**
```
emit present side as correct/wrong
emit absent side as empty <span></span>
```
XPaths are computed from the DOM tree as each node is visited, matching the XPaths that XMLUnit generates (e.g. `/root[1]/child[1]`).
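
One way to compute such XPaths with the JDK's DOM API alone is sketched below. `XPathSketch`, `parse`, and `xpathOf` are hypothetical names, and the exact path format XMLUnit emits should be verified against real `getXPath()` output before relying on it. In particular, skipping whitespace-only text nodes when counting `text()` positions is an assumption meant to mirror `ignoreWhitespace()`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathSketch {
    private XPathSketch() {}

    /** Parses a string into a DOM tree; the JDK factory is not namespace-aware by default. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Builds a positional XPath such as /root[1]/child[2], /root[1]/@id or /root[1]/text()[1]. */
    static String xpathOf(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                return "";
            case Node.ATTRIBUTE_NODE: {
                Attr attr = (Attr) node;
                return xpathOf(attr.getOwnerElement()) + "/@" + attr.getName();
            }
            case Node.ELEMENT_NODE:
                return xpathOf(node.getParentNode()) + "/" + node.getNodeName()
                        + "[" + position(node) + "]";
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                return xpathOf(node.getParentNode()) + "/text()[" + position(node) + "]";
            default:
                throw new IllegalArgumentException("unsupported node type: " + node.getNodeType());
        }
    }

    /** 1-based position among preceding siblings that share the same path step. */
    private static int position(Node node) {
        int index = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (sameStep(s, node)) {
                index++;
            }
        }
        return index;
    }

    private static boolean sameStep(Node sibling, Node node) {
        if (isText(node)) {
            // Assumption: whitespace-only text is not counted, to mirror ignoreWhitespace().
            return isText(sibling) && !sibling.getNodeValue().trim().isEmpty();
        }
        return sibling.getNodeType() == Node.ELEMENT_NODE
                && sibling.getNodeName().equals(node.getNodeName());
    }

    private static boolean isText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE || n.getNodeType() == Node.CDATA_SECTION_NODE;
    }
}
```

Walking upward from each node keeps the sketch short; a renderer that already recurses top-down could instead pass the parent path and per-name counters along, which avoids rescanning siblings.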
### Step 3 — Output
`XmlDiff.compare()` calls `DiffEngine`, then calls `HtmlRenderer` once for the left tree and once for the right tree, returning a `DiffResult`.
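
The wiring described above might look roughly like the following sketch. Only the names `XmlDiff`, `DiffEngine`, `HtmlRenderer`, and `DiffResult` come from the plan; the fields of `DiffResult`, the `DiffMaps` holder, the method signatures, and the use of plain strings in place of XMLUnit's `ComparisonType` are all assumptions made so that the example compiles without dependencies.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical result holder: the plan names DiffResult but not its fields. */
record DiffResult(String leftHtml, String rightHtml) {}

/** Hypothetical pair of XPath -> comparison-type-name maps, one per side. */
record DiffMaps(Map<String, String> left, Map<String, String> right) {}

/** Assumed shape of the Step 1 component. */
interface DiffEngine {
    DiffMaps diff(String leftXml, String rightXml);
}

/** Assumed shape of the Step 2 component; leftSide selects correct vs. wrong. */
interface HtmlRenderer {
    String render(String xml, Map<String, String> diffs, boolean leftSide);
}

final class XmlDiff {
    private final DiffEngine engine;
    private final HtmlRenderer renderer;

    XmlDiff(DiffEngine engine, HtmlRenderer renderer) {
        this.engine = Objects.requireNonNull(engine);
        this.renderer = Objects.requireNonNull(renderer);
    }

    /** Runs the diff once, then renders each side independently. */
    DiffResult compare(String leftXml, String rightXml) {
        DiffMaps maps = engine.diff(leftXml, rightXml);
        String leftHtml = renderer.render(leftXml, maps.left(), true);
        String rightHtml = renderer.render(rightXml, maps.right(), false);
        return new DiffResult(leftHtml, rightHtml);
    }
}
```

Because `XmlDiff` holds only final references and keeps all per-call data in local variables, concurrent calls to `compare()` do not share mutable state, provided the engine and renderer implementations are themselves stateless.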
## Test Cases
| # | Scenario | Left class | Right class |
|---|---------------------------------------|------------|-------------|
| 1 | Identical simple elements | all `neutral` | all `neutral` |
| 2 | Differing text content | text `correct` | text `wrong` |
| 3 | Differing attribute value | value `correct` | value `wrong` (name neutral) |
| 4 | Differing attribute name | name `correct` | name `wrong` |
| 5 | Differing element name | name `correct`, children `skipped` | name `wrong`, children `skipped` |
| 6 | Extra child on left only | child `correct` | empty span |
| 7 | Extra child on right only | empty span | child `wrong` |
| 8 | Attribute order differs | first mismatch `correct` | first mismatch `wrong` |
| 9 | Nested elements, partial diff | only differing subtree marked | same |
| 10| Self-closing element, no diff | all `neutral` | all `neutral` |
## Assumptions
- Comments, processing instructions, and CDATA sections are ignored.
- Whitespace-only text nodes between elements are ignored (XMLUnit `ignoreWhitespace()`).
- Namespace prefixes are treated as plain text; no namespace-aware comparison.
- The library is stateless; `XmlDiff.compare()` is safe to call concurrently.