# xmldiff — Implementation Plan

## Goal

A Java library that takes two XML strings (left = expected, right = actual) and produces two HTML strings suitable for rendering a side-by-side diff. Each output is a `<span>` tree with inner spans annotated with CSS classes.

## CSS Classes

| Class     | Meaning                                                                        |
|-----------|--------------------------------------------------------------------------------|
| `neutral` | Token is identical on both sides                                               |
| `correct` | Token is on the **left** side and differs from the right                       |
| `wrong`   | Token is on the **right** side and differs from the left                       |
| `skipped` | Content (attributes, children, text) of an element whose **tag name** differs  |

## Diff Granularity Rules

| Token                         | If equal  | If different                                                                                  |
|-------------------------------|-----------|-----------------------------------------------------------------------------------------------|
| Element name                  | `neutral` | Left name → `correct`, right name → `wrong`; all content (attrs, children, text) → `skipped`  |
| Attribute name                | `neutral` | Left attr name → `correct`, right attr name → `wrong`                                         |
| Attribute value               | `neutral` | Attr name stays `neutral` on both sides; left value → `correct`, right value → `wrong`        |
| Text content                  | `neutral` | Left text → `correct`, right text → `wrong`                                                   |
| Element present only on left  | n/a       | Left subtree → `correct`, right → empty `<span></span>`                                       |
| Element present only on right | n/a       | Right subtree → `wrong`, left → empty `<span></span>`                                         |

Attribute **order is not significant**: attributes are matched by name, so reordering them alone produces no difference.

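The rules in the table reduce to a small decision per token. A minimal sketch, assuming a hypothetical `Side` enum and `TokenClasses.classFor` helper (neither name comes from the plan):

```java
/** Which document a token is rendered from (hypothetical helper type). */
enum Side { LEFT, RIGHT }

final class TokenClasses {
    private TokenClasses() {}

    /**
     * Picks the CSS class for one token.
     *
     * @param side          document the token belongs to
     * @param differs       true if a difference was reported for this token
     * @param insideRenamed true if an enclosing element's tag name differs
     */
    static String classFor(Side side, boolean differs, boolean insideRenamed) {
        if (insideRenamed) {
            return "skipped"; // content of a renamed element is not compared further
        }
        if (!differs) {
            return "neutral";
        }
        return side == Side.LEFT ? "correct" : "wrong";
    }
}
```

For example, `TokenClasses.classFor(Side.RIGHT, true, false)` returns `"wrong"`, while any token below a renamed element comes back as `"skipped"` regardless of the other two arguments.
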
## Output Format

Each output string is pretty-printed HTML. XML special characters (`<`, `>`, `&`, `"`) inside span text are HTML-escaped. Indentation uses 2 spaces per level. Output does **not** include an XML declaration.

Example shape:

```html
<span class="neutral">&lt;root&gt;
  &lt;child </span><span class="correct">attr</span><span class="neutral">=&quot;</span><span class="correct">value</span><span class="neutral">&quot;&gt;
    </span><span class="correct">text here</span><span class="neutral">
  &lt;/child&gt;
&lt;/root&gt;</span>
```

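The escaping rule can be sketched as a small helper. The class and method names (`Html.escape`, `Html.span`) are illustrative assumptions, not part of the plan:

```java
final class Html {
    private Html() {}

    /** Escapes the four characters the output format calls out: < > & " */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps raw XML text in a span with the given class, escaping the text first. */
    static String span(String cssClass, String text) {
        return "<span class=\"" + cssClass + "\">" + escape(text) + "</span>";
    }
}
```

With this helper, `Html.span("neutral", "<root>")` yields `<span class="neutral">&lt;root&gt;</span>`.
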
## Dependencies

```xml
<!-- XML diffing -->
<dependency>
  <groupId>org.xmlunit</groupId>
  <artifactId>xmlunit-core</artifactId>
  <version>2.10.0</version>
</dependency>

<!-- Testing -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>5.11.0</version>
  <scope>test</scope>
</dependency>
```

**XMLUnit 2.x** is the diffing engine. `Diff.getDifferences()` yields `Difference` objects; each wraps a `Comparison` (via `getComparison()`) with:
- `getType()`: a `ComparisonType` enum value such as `ELEMENT_TAG_NAME`, `ATTR_VALUE`, `ATTR_NAME_LOOKUP`, `TEXT_VALUE`, `CHILD_NODELIST_LENGTH`, or `CHILD_LOOKUP`
- `getControlDetails().getXPath()`: XPath of the affected node on the left side
- `getTestDetails().getXPath()`: XPath of the affected node on the right side

## Algorithm

### Step 1 — Diff (DiffEngine)
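
```java
import java.util.*;
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.*;

Diff diff = DiffBuilder
    .compare(leftXml)
    .withTest(rightXml)
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
    .ignoreWhitespace()
    .build();

// XPath -> comparison types reported at that node, one map per side
Map<String, Set<ComparisonType>> leftDiffs = new HashMap<>();
Map<String, Set<ComparisonType>> rightDiffs = new HashMap<>();
for (Difference d : diff.getDifferences()) {
    Comparison c = d.getComparison();
    String left = c.getControlDetails().getXPath();  // may be null if the node is missing on the left
    String right = c.getTestDetails().getXPath();    // may be null if the node is missing on the right
    if (left != null) leftDiffs.computeIfAbsent(left, k -> EnumSet.noneOf(ComparisonType.class)).add(c.getType());
    if (right != null) rightDiffs.computeIfAbsent(right, k -> EnumSet.noneOf(ComparisonType.class)).add(c.getType());
}
```
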
### Step 2 — Render (HtmlRenderer)

Walk each DOM tree independently, pretty-printing to HTML. At each node, look up its XPath in that side's diff map (`leftDiffs` for the left tree, `rightDiffs` for the right) to determine its CSS class.

**Element node:**
```
xp = xpathOf(node)
if this side's diff map contains xp with type ELEMENT_TAG_NAME:
    emit tag name as correct (left) / wrong (right)
    emit all attributes + children recursively as skipped
else:
    emit tag name as neutral
    for each attribute (in document order):
        emit based on attr-level diff lookup
    recurse into children
```

**Text node:**
```
xp = xpathOf(node)
if this side's diff map contains xp with type TEXT_VALUE:
    emit as correct (left) / wrong (right)
else:
    emit as neutral
```

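A reduced sketch of the element and text rules, assuming the diff map is keyed by XPath and holds the names of the reported comparison types as plain strings (so the block compiles without XMLUnit). `WalkSketch`, its arguments, and the `"cssClass text"` token format are hypothetical; attributes, pretty-printing, and HTML emission are left out:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WalkSketch {
    private final Map<String, Set<String>> diffs; // XPath -> comparison type names for this side
    private final String changedClass;            // "correct" for the left tree, "wrong" for the right
    private final List<String> tokens = new ArrayList<>();

    private WalkSketch(Map<String, Set<String>> diffs, String changedClass) {
        this.diffs = diffs;
        this.changedClass = changedClass;
    }

    /** Parses xml and returns one "cssClass text" entry per emitted token. */
    static List<String> render(String xml, Map<String, Set<String>> diffs, String changedClass) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            WalkSketch walk = new WalkSketch(diffs, changedClass);
            walk.element(root, "/" + root.getNodeName() + "[1]", false);
            return walk.tokens;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable XML", e);
        }
    }

    private boolean has(String xpath, String type) {
        return diffs.getOrDefault(xpath, Collections.emptySet()).contains(type);
    }

    private void element(Element e, String xp, boolean skipped) {
        boolean renamed = !skipped && has(xp, "ELEMENT_TAG_NAME");
        String nameClass = skipped ? "skipped" : renamed ? changedClass : "neutral";
        boolean skipContent = skipped || renamed;
        tokens.add(nameClass + " <" + e.getNodeName() + ">");

        Map<String, Integer> seen = new HashMap<>(); // per-name sibling counters for XPath indices
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                int n = seen.merge(c.getNodeName(), 1, Integer::sum);
                element((Element) c, xp + "/" + c.getNodeName() + "[" + n + "]", skipContent);
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                int n = seen.merge("text()", 1, Integer::sum);
                boolean differs = has(xp + "/text()[" + n + "]", "TEXT_VALUE");
                String cls = skipContent ? "skipped" : differs ? changedClass : "neutral";
                tokens.add(cls + " " + c.getNodeValue());
            }
        }
        tokens.add(nameClass + " </" + e.getNodeName() + ">");
    }
}
```

The `skipped` flag is passed down the recursion, so once an element's tag name differs nothing beneath it is looked up in the diff map again.
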
**Missing child (`CHILD_LOOKUP`, alongside `CHILD_NODELIST_LENGTH` on the parent):**
```
emit present side as correct (left) / wrong (right)
emit absent side as empty <span></span>
```

XPaths are computed from the DOM tree as each node is visited, matching the XPaths that XMLUnit generates (e.g. `/root[1]/child[1]`).

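A standalone sketch of `xpathOf`, assuming XMLUnit's usual XPath shape: per-name 1-based indices for elements, `text()[n]` for text nodes, and `@name` for attributes. `XPaths` and `parse` are hypothetical names, and the exact format is worth checking against real XMLUnit output before relying on it:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPaths {
    private XPaths() {}

    /** Builds an absolute XPath such as /root[1]/child[2]/text()[1] for a DOM node. */
    static String xpathOf(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                return "";
            case Node.ATTRIBUTE_NODE:
                // Attributes have no parent node; go through the owning element instead.
                return xpathOf(((Attr) node).getOwnerElement()) + "/@" + node.getNodeName();
            case Node.ELEMENT_NODE:
                return xpathOf(node.getParentNode()) + "/" + node.getNodeName()
                        + "[" + position(node) + "]";
            case Node.TEXT_NODE:
                return xpathOf(node.getParentNode()) + "/text()[" + position(node) + "]";
            default:
                throw new IllegalArgumentException("unsupported node type " + node.getNodeType());
        }
    }

    /** 1-based index among preceding siblings of the same kind (and, for elements, the same name). */
    private static int position(Node node) {
        int pos = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            boolean sameKind = s.getNodeType() == node.getNodeType();
            boolean sameName = node.getNodeType() != Node.ELEMENT_NODE
                    || s.getNodeName().equals(node.getNodeName());
            if (sameKind && sameName) {
                pos++;
            }
        }
        return pos;
    }

    /** Convenience parser for the example; wraps checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable XML", e);
        }
    }
}
```

For `<root><child/><other/><child a="1">t</child></root>`, the last element maps to `/root[1]/child[2]`, its text to `/root[1]/child[2]/text()[1]`, and its attribute to `/root[1]/child[2]/@a`.
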
### Step 3 — Output

`XmlDiff.compare()` calls `DiffEngine`, then calls `HtmlRenderer` once for the left tree and once for the right tree, returning a `DiffResult`.

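One possible shape for this wiring, with the two collaborators reduced to hypothetical interfaces so the sketch compiles on its own. The plan only names `XmlDiff.compare()`, `DiffEngine`, `HtmlRenderer`, and `DiffResult`; every field, method signature, and the `DiffMaps` holder below is an assumption:

```java
import java.util.Map;
import java.util.Set;

/** Immutable pair of rendered HTML strings. */
final class DiffResult {
    private final String leftHtml;
    private final String rightHtml;

    DiffResult(String leftHtml, String rightHtml) {
        this.leftHtml = leftHtml;
        this.rightHtml = rightHtml;
    }

    String leftHtml() { return leftHtml; }
    String rightHtml() { return rightHtml; }
}

/** XPath -> comparison type names, one map per side (assumed output of Step 1). */
final class DiffMaps {
    final Map<String, Set<String>> left;
    final Map<String, Set<String>> right;

    DiffMaps(Map<String, Set<String>> left, Map<String, Set<String>> right) {
        this.left = left;
        this.right = right;
    }
}

interface DiffEngine {
    DiffMaps diff(String leftXml, String rightXml);
}

interface HtmlRenderer {
    String render(String xml, Map<String, Set<String>> diffs, String changedClass);
}

final class XmlDiff {
    private final DiffEngine engine;
    private final HtmlRenderer renderer;

    XmlDiff(DiffEngine engine, HtmlRenderer renderer) {
        this.engine = engine;
        this.renderer = renderer;
    }

    /** Keeps no per-call state, so concurrent calls are safe as long as the collaborators are. */
    DiffResult compare(String leftXml, String rightXml) {
        DiffMaps maps = engine.diff(leftXml, rightXml);
        String left = renderer.render(leftXml, maps.left, "correct");
        String right = renderer.render(rightXml, maps.right, "wrong");
        return new DiffResult(left, right);
    }
}
```

Injecting the engine and renderer keeps `compare()` trivial to unit-test with stubs; a static `compare()` that constructs both internally would fit the plan equally well.
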
## Test Cases

| #  | Scenario                      | Left output                         | Right output                      |
|----|-------------------------------|-------------------------------------|-----------------------------------|
| 1  | Identical simple elements     | all `neutral`                       | all `neutral`                     |
| 2  | Differing text content        | text `correct`                      | text `wrong`                      |
| 3  | Differing attribute value     | value `correct`, name `neutral`     | value `wrong`, name `neutral`     |
| 4  | Differing attribute name      | name `correct`                      | name `wrong`                      |
| 5  | Differing element name        | name `correct`, children `skipped`  | name `wrong`, children `skipped`  |
| 6  | Extra child on left only      | child `correct`                     | empty span                        |
| 7  | Extra child on right only     | empty span                          | child `wrong`                     |
| 8  | Attribute order differs       | all `neutral`                       | all `neutral`                     |
| 9  | Nested elements, partial diff | only differing subtree marked       | same                              |
| 10 | Self-closing element, no diff | all `neutral`                       | all `neutral`                     |

## Assumptions

- Comments, processing instructions, and CDATA sections are ignored.
- Whitespace-only text nodes between elements are ignored (XMLUnit `ignoreWhitespace()`).
- Namespace prefixes are treated as plain text; no namespace-aware comparison.
- The library is stateless; `XmlDiff.compare()` is safe to call concurrently.