# xmldiff — Implementation Plan

## Goal

A Java library that takes two XML strings (left = expected, right = actual) and produces two HTML strings suitable for rendering a side-by-side diff. Each output is a `` tree with inner spans annotated with CSS classes.

## CSS Classes

| Class     | Meaning                                                   |
|-----------|-----------------------------------------------------------|
| `neutral` | This token is identical on both sides                     |
| `correct` | This token is on the **left** side and differs from right |
| `wrong`   | This token is on the **right** side and differs from left |
| `skipped` | Child content of an element whose **tag name** differs    |

## Diff Granularity Rules

| Token                         | If equal  | If different                                                                             |
|-------------------------------|-----------|------------------------------------------------------------------------------------------|
| Element name                  | `neutral` | Left → `correct`, right → `wrong`; all content (attrs, children, text) → `skipped`       |
| Attribute name                | `neutral` | Left attr name → `correct`, right attr name → `wrong`                                    |
| Attribute value               | `neutral` | Attr name stays `neutral` on both sides; left value → `correct`, right value → `wrong`   |
| Text content                  | `neutral` | Left text → `correct`, right text → `wrong`                                              |
| Element present only on left  | —         | Left subtree → `correct`, right → empty `<span>`                                         |
| Element present only on right | —         | Right subtree → `wrong`, left → empty `<span>`                                           |

Attribute **order is not significant**.

## Output Format

Each output string is pretty-printed HTML. XML special characters (`<`, `>`, `&`, `"`) inside span text are HTML-escaped. Indentation uses 2 spaces per level. Output does **not** include an XML declaration.

Example shape:

```html
<root>
  <child attr="value">
    text here
  </child>
</root>
```

## Dependencies

```xml
<dependency>
  <groupId>org.xmlunit</groupId>
  <artifactId>xmlunit-core</artifactId>
  <version>2.10.0</version>
</dependency>
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter</artifactId>
  <version>5.11.0</version>
  <scope>test</scope>
</dependency>
```

**XMLUnit 2.x** is the diffing engine.
It produces a list of differences, each wrapping a `Comparison` object with:

- `getType()` — `ComparisonType` enum: `ELEMENT_TAG_NAME`, `ATTR_VALUE`, `ATTR_NAME_LOOKUP`, `TEXT_VALUE`, `CHILD_NODELIST_LENGTH`, `HAS_CHILD_NODES`, etc.
- `getControlDetails().getXPath()` — XPath of the affected node on the left side
- `getTestDetails().getXPath()` — XPath of the affected node on the right side

## Algorithm

### Step 1 — Diff (DiffEngine)

```
Diff diff = DiffBuilder
    .compare(leftXml)
    .withTest(rightXml)
    .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName))
    .ignoreWhitespace()
    .build();

// getDifferences() yields Difference objects; each one wraps a Comparison.
For each Difference d in diff.getDifferences():
    c = d.getComparison()
    record (c.getControlDetails().getXPath(),
            c.getTestDetails().getXPath(),
            c.getType())
    into two maps (skip a side whose XPath is null, i.e. the node is absent there):
        leftDiffs:  XPath → ComparisonType
        rightDiffs: XPath → ComparisonType
```

### Step 2 — Render (HtmlRenderer)

Walk each DOM tree independently, pretty-printing to HTML. At each node, look up its XPath in the diff map for that side (`leftDiffs` for the left tree, `rightDiffs` for the right tree) to determine its CSS class.

**Element node:**

```
xp = xpathOf(node)
if this side's diff map contains xp with type ELEMENT_TAG_NAME:
    emit tag name as correct (left tree) / wrong (right tree)
    emit all attributes + children recursively as skipped
else:
    emit tag name as neutral
    for each attribute (in document order):
        emit based on attr-level diff lookup
    recurse into children
```

**Text node:**

```
xp = xpathOf(node)
if this side's diff map contains xp with type TEXT_VALUE:
    emit as correct (left tree) / wrong (right tree)
else:
    emit as neutral
```

**Missing child (CHILD_NODELIST_LENGTH or similar):**

```
emit present side as correct/wrong
emit absent side as an empty span
```

XPaths are computed from the DOM tree as each node is visited and must match the XPaths that XMLUnit generates (e.g. `/root[1]/child[1]`), since they are the lookup keys into the diff maps.

### Step 3 — Output

`XmlDiff.compare()` calls `DiffEngine`, then calls `HtmlRenderer` once for the left tree and once for the right tree, returning both HTML strings in a `DiffResult`.
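Step 2 depends on an `xpathOf(node)` helper whose output lines up with the keys stored in the diff maps. The following is a minimal sketch of such a helper using only the JDK's DOM API. The class name `XPathSketch`, the `parse` helper, and the `text()[n]` form for text nodes are assumptions made for illustration and should be checked against the XPaths XMLUnit actually reports. Whitespace-only text nodes are not counted, on the assumption that `ignoreWhitespace()` drops them before XPaths are assigned; attributes are not handled here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathSketch {

    /** Parses XML, wrapping checked exceptions so callers stay simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** True for text nodes that contain only whitespace. */
    private static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
    }

    /** True when the sibling counts toward the node's positional index. */
    private static boolean sameStep(Node node, Node sibling) {
        if (sibling.getNodeType() != node.getNodeType()) {
            return false;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            return sibling.getNodeName().equals(node.getNodeName());
        }
        // Assumption: blank text nodes are ignored when indexing text().
        return !isBlankText(sibling);
    }

    /** Builds a positional XPath such as /root[1]/child[2]/text()[1]. */
    static String xpathOf(Node node) {
        if (node == null || node.getNodeType() == Node.DOCUMENT_NODE) {
            return "";
        }
        String step;
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: step = node.getNodeName(); break;
            case Node.TEXT_NODE:    step = "text()";           break;
            default:
                throw new IllegalArgumentException(
                        "unsupported node type: " + node.getNodeType());
        }
        // Position = 1 + number of preceding siblings of the same step.
        int index = 1;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (sameStep(node, s)) {
                index++;
            }
        }
        return xpathOf(node.getParentNode()) + "/" + step + "[" + index + "]";
    }

    public static void main(String[] args) {
        Document doc = parse("<root><child>a</child><child>b</child></root>");
        Node second = doc.getDocumentElement().getLastChild();
        System.out.println(xpathOf(second));                 // /root[1]/child[2]
        System.out.println(xpathOf(second.getFirstChild())); // /root[1]/child[2]/text()[1]
    }
}
```

Computing the path by walking up through parents keeps the helper independent of the renderer's traversal order; a renderer that already tracks its position during the walk could build the same string incrementally instead.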
## Test Cases

| #  | Scenario                      | Left class                         | Right class                      |
|----|-------------------------------|------------------------------------|----------------------------------|
| 1  | Identical simple elements     | all `neutral`                      | all `neutral`                    |
| 2  | Differing text content        | text `correct`                     | text `wrong`                     |
| 3  | Differing attribute value     | value `correct`                    | value `wrong` (name neutral)     |
| 4  | Differing attribute name      | name `correct`                     | name `wrong`                     |
| 5  | Differing element name        | name `correct`, children `skipped` | name `wrong`, children `skipped` |
| 6  | Extra child on left only      | child `correct`                    | empty span                       |
| 7  | Extra child on right only     | empty span                         | child `wrong`                    |
| 8  | Attribute order differs       | first mismatch `correct`           | first mismatch `wrong`           |
| 9  | Nested elements, partial diff | only differing subtree marked      | same                             |
| 10 | Self-closing element, no diff | all `neutral`                      | all `neutral`                    |

## Assumptions

- Comments, processing instructions, and CDATA sections are ignored.
- Whitespace-only text nodes between elements are ignored (XMLUnit `ignoreWhitespace()`).
- Namespace prefixes are treated as plain text; no namespace-aware comparison.
- The library is stateless; `XmlDiff.compare()` is safe to call concurrently.
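The test cases in the table assert which CSS class each token receives, so tests need a way to read classes back out of the rendered strings. Below is a minimal sketch of one way to do that, together with a span emitter that escapes the four characters named under Output Format. The class `SpanAssertSketch`, its method names, and the exact leaf markup `<span class="...">...</span>` are assumptions for illustration; the plan does not fix the markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpanAssertSketch {

    // Assumed leaf markup: <span class="NAME">escaped text</span>.
    // [^<]* only matches spans with no nested tags, i.e. leaf tokens.
    private static final Pattern LEAF_SPAN =
            Pattern.compile("<span class=\"([a-z]+)\">[^<]*</span>");

    /** Escapes the four characters the plan lists: < > & " */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '<': sb.append("&lt;");   break;
                case '>': sb.append("&gt;");   break;
                case '&': sb.append("&amp;");  break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Emits one annotated token. */
    static String span(String cssClass, String text) {
        return "<span class=\"" + cssClass + "\">" + escape(text) + "</span>";
    }

    /** Returns the class of every leaf span, in document order. */
    static List<String> leafClasses(String html) {
        List<String> classes = new ArrayList<>();
        Matcher m = LEAF_SPAN.matcher(html);
        while (m.find()) {
            classes.add(m.group(1));
        }
        return classes;
    }

    public static void main(String[] args) {
        // Hand-built left side for test case 2 (differing text content).
        String left = span("neutral", "<a>")
                + span("correct", "x")
                + span("neutral", "</a>");
        System.out.println(leafClasses(left)); // [neutral, correct, neutral]
    }
}
```

With a helper like this, test case 1 reduces to checking that every entry in `leafClasses(...)` is `neutral`, and test case 2 to checking that exactly one entry is `correct` on the left and `wrong` on the right.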