The "3DM" XML 3-way Merging and Differencing Tool 

Keywords: tree, diff, differencing, merge, merging, 3-way, XML, treediff, xmldiff

0. Recent Developemnt

Please go to the 3dm home page for a more up-to-date version of the home page and 3dm itself. The information on this page is becoming increasingly obsolete.

1. What is 3DM?

The 3DM tool is a tool for performing 3-way merging and differencing of XML files. Unlike line-based tools, such as diff and diff3, 3DM is aware of the structure of the processed XML documents. 3DM is not limited to update/insert/delete operations, it also handles moves and copies of entire subtrees. 3DM is not reliant on edit histories; the only input needed are the XML files. The tool was implemented as a part of my master's thesis (2.0M pdf, updated 26/12/2001, 3.9M .ps.gz, updated 26/12/20011) , which is the currently recommended reading for those interested in detailed examples (chapter 2 and 10) and the algorithms (chapters 5-9).
These web pages will probably be updated to be more extensive if enough interest for the tool arises :)

2. An XML differencing example

Consider the two XML files (merge case A16 in the thesis):

b.xml
2.xml
<R>
    <a>
        <b/>
        <m/>
5
    </a>
    <c>
3
        <cc/>
    </c>
</R>
<R>
    <c>
        <cc/>
    </c>
    <c>
        <cc/>
    </c>
    <m/>
</R>

The numbers are node identifiers, assigned according to the BFS traversal of the XML  parse tree. They are not part of the XML file. The difference is obtained by running:

3dm --diff b.xml 1.xml

And looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<diff>
 <diff:copy src="3" dst="1" />
 <diff:copy src="3" dst="1" />
 <diff:copy src="5" dst="1" />
</diff>

The src attribute refers to nodes in b.xml by their node identifier.  The diff says that 2.xml is formed by setting the children of <R> to the subtrees rooted at node 3 (twice) and node 5 from b.xml. The differencing algorithm and the format of the diff is described in chapter 8 of the thesis.

3. A 3-way merging example

Consider the three XML files shown below: a base file (b.xml ) and two parallell versions (branches), 1.xml and 2.xml , obtained by editing b.xml in different ways. When performing a 3-way merge of these files, we get a file that contains both the changes between b.xml and 1.xml as well as the changes between b.xml and 2.xml. The changes between b.xml and 1.xml are:
  1. <R> is updated to <Rp>
  2. <i> has been inserted below <m>
The changes between b.xml and 2.xml are:
  1. The subtree rooted at <a> has been deleted, except for <m> which is moved as the last child of <R>
  2. The subtree rooted at <c> has been copied

1.xml
b.xml
2.xml
<Rp>
    <a>
        <b/>
        <m>
            <i/>
        </m>
    </a>
    <c>
        <cc/>
    </c>
</Rp>

<R>
    <a>
        <b/>
        <m/>

    </a>
    <c>

        <cc/>
    </c>
</R>
<R>
    <c>
        <cc/>
    </c>
    <c>
        <cc/>
    </c>
    <m/>
</R>


The 3-way merge is performed by running the command shown below. You can ignore the meaning of -c 0 for now.

3dm -c 0 --merge b.xml 1.xml 2.xml

The result is shown below. Note how the changes from both branches have been integrated.

<?xml version="1.0" encoding="UTF-8"?>
<Rp>
 <c>
  <cc />
 </c>
 <c>
  <cc />
 </c>
 <m>
  <i />
 </m>
</Rp>


The merging algorithm is described in chapers 6-7 of the thesis.

4. Obtaining 3DM

Please go to the 3dm home page. The Java source of  3DM is released under the Lesser Gnu Public License (LGPL). To run 3dm, you will also need a SAX2 compilant XML parser (the default parser is the Xerces parser from the Apache project) and the GNU java-getopt package ( home page , local copy of java-getopt-1.0.8.jar).

Old (pre 0.1.0) versions are available here (3dm_src.tar.gz) and here (3dm_src.zip). A pre-compiled pre-0.1.0 JAR file is also available (3dm.jar)

5. Running 3DM

Running 3DM is easy, just start java with Xerces, java-getopt and 3dm.jar (or the directory of compiled classes) in the classpath and TreeDiffMerge as main class. Below are shell scripts for Linux and Windows.

Linux 3dm

#!/bin/sh
# 3DM startup script
JAVA=/usr/local/jbuilder4/jdk1.3/bin/java
JARPATH=/home/ctl/ubidoc-support/lib
COMPILEPATH=3dm.jar
$JAVA -cp $JARPATH/java-getopt-1.0.8.jar:$JARPATH/xerces.jar:$COMPILEPATH TreeDiffMerge $*

Windows 3dm.bat

@echo off
rem 3DM startup script
set JAVA=c:\bin\jbuilder5\jdk1.3\bin\java
set JARPATH=c:\data\ubidoc-support\lib
set COMPILEPATH=3dm.jar
%JAVA% -cp %JARPATH%\java-getopt-1.0.8.jar;%JARPATH%/xerces.jar;%COMPILEPATH% TreeDiffMerge %1 %2 %3 %4 %5 %6 %7

6. Visualizing the merge

3DM can produce a "merge log" that lists the changes in the merged file compared to the base version. I've written a quick hack to visualize the merge log (sources .tar.gz , sources .zip and .jar file). The main class is TreeDiff, and it requires a SAX2 parser in the classpath; the .jar file is compiled for Xerces.  The figure below visualizes the merge log obtained by running the previously mentioned merge example.

TreeDiff screenshot

7. Current Limitations

Being prototype code, some less crucial things are currently missing from 3DM:
  1. DTD processing. 3DM simply assumes the DTD of the input files are identical
  2. More intelligent processing of entities. Currently 3DM (or actually the XML parser) expands all entities. A better alternative would be to merge entity declarations.
See chapter 10 of the thesis for a more in-depth description of the limitations.

8. Further development

There is a development website for 3DM at http://developer.berlios.de/project/?group_id=400. This page is also available at http://tdm.berlios.de/.

9. Contact

The author can be emailed at ctl theatsign cs dot hut dot fi.


1Postscript and PDF regenerated for better layout, no change to contents since rev 1.108
* $Id: index.html,v 1.8 2006/08/18 08:18:27 ctl Exp $