Tailor is a tool to migrate changesets between ArX, Bazaar, Bazaar-NG, CVS, Codeville, Darcs, Git, Mercurial, Monotone, Subversion and Tla [1] repositories.
This script makes it easier to keep the upstream changes merged in a branch of a product, storing needed information such as the upstream URI and revision in special properties on the branched directory.
The following ascii-art illustrates the usual scenario:
                         +------------+             +------------+
+--------------+         | Immutable  |             |  Working   |
| Upstream CVS |-------->|   darcs    |------------>|   darcs    |
|  repository  | tailor  | repository | darcs pull  | repository |
+--------------+         +------------+             +------------+
                                                          |^
                                                          ||
                                                          ||
                                                          v|
                                                         User
Ideally you should be able to replace the upstream CVS repository and the darcs repositories with any combination of the supported systems.
A more convoluted setup shows how brave people are using it to get a two-way sync:
+----------+        +--------+        +--------+       +---------+
|          | -----> | hybrid | darcs  |        | ----> |   my    |
| upstream | tailor |  CVS   | -----> | master | darcs | working |
|   CVS    | <----- | darcs  | <----- | darcs  | <---- |  darcs  |
|          |        |  sync  | tailor |        |       |         |
+----------+        +--------+        +--------+       +---------+
             (cron)            (cron)
[1] ArX and Codeville can only be used as target backends, since source support for them isn't coded yet. Contributions on these backends would be much appreciated, since I do not use them enough to figure out the best way to get pending changes and build tailor ChangeSets out of them. Conversely, Bazaar (1.0, not Bazaar-NG) and Tla are supported only as source systems.
tailor is written in Python, and thus Python must be installed on your system to use it. It has been successfully used with Python 2.3 and 2.4.
Since it relies on external tools such as cvs, darcs [2] and svn to do the real work, they need to be installed as well, although only the ones you will actually use.
Make tailor executable:
$ chmod +x tailor
You can either run tailor where it is currently located, or move it along with the vcpx directory to a location in your PATH.
There's even a standard setup.py that you may use to install the script using Python's conventional distutils.
[2] Darcs 1.0.2 is too old, 1.0.3 is good, and 1.0.4 (whose fourth release candidate is under final testing) is recommended, since it's faster in most operations!
Tailor has more than 50 unit and operational tests, which you can run with the following command line:
$ tailor test -v
Since some tests take very long to complete, in particular the operational tests, you may prefer to run a single suite:
$ tailor test -v Darcs
or even a single test within a suite:
$ tailor test StateFile.testJournal
To obtain a list of the tests, use the --list option. As usual, with:
$ tailor test --help
you will get some more details.
tailor now needs a configuration file that collects the various bits of information it needs to do its job.
The simplest way to start a new configuration is to omit the --configfile command line option and specify the other options as needed, plus --verbose: tailor will then print out an equivalent configuration that you can redirect to a file and later pass to --configfile.
Bootstrap a new tailored project, starting at upstream revision 10
First create a config file:
$ tailor --verbose -s svn -R http://svn.server/path/to/svnrepo \
         --module /Product/trunk -r 10 --subdir Product \
         ~/darcs/MyProduct > myproject.tailor
Modify it as you like (mostly adjusting root-directories and the like):
$ emacs myproject.tailor
Run tailor on it:
$ tailor --configfile myproject.tailor
Bootstrap a new product, fetching its whole CVS repository and storing under SVN
First create a config file:
$ tailor --verbose --source-kind cvs --target-kind svn \
         --repository :pserver:cvs.zope.org:/cvs-repository \
         --module CMF/CMFCore --revision INITIAL \
         --target-repository file:///some/where/svnrepo \
         --target-module / cmfcore > cmfcore.tailor
Modify it as you like (mostly adjusting root-directories and the like):
$ emacs cmfcore.tailor
Note
By default, tailor uses "." as subdir, to mean that it will extract upstream source directly inside the root-directory.
This is known to cause problems with CVS as source, where you may see a weird error like
$ cvs -q -d ...:/cvsroot/mymodule checkout -d . ... mymodule
cvs checkout: existing repository /cvsroot/mymodule does not match /cvsroot/mymodule/mymodule
cvs checkout: ignoring module mymodule
When this happens, the culprit is likely a CVS shortcoming: it cannot handle -d . correctly. Specify a different subdir option to avoid the problem.
Run tailor on it once, to bootstrap the project:
$ tailor -D -v --configfile cmfcore.tailor
If the target repository is on the local filesystem (i.e., its URL starts with file:///) and does not exist yet, tailor creates a new empty Subversion repository at the specified location.
Note
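Before running tailor again to sync up with the latest changes (the last step below), you may want to install an appropriate hook in the repository to enable the propset command to operate on unversioned properties, as described in the svn manual. Then you can specify the --use-svn-propset option, and tailor will put the original author and timestamp in the proper svn metadata instead of appending them to the changelog.
Apart from the annoying manual intervention on the repository, this thread and this other one explain why using -r{DATE} may produce strange results with this setup: Subversion resolves a date to a revision with a binary search over the svn:date revision properties, assuming they increase with the revision number, and injected upstream timestamps may break that assumption.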
Run tailor again and again, to sync up with latest changes:
$ tailor -D -v --configfile cmfcore.tailor
Given the configuration file shown below in the Config file format section, the following command:
$ tailor --configfile example.tailor
is equivalent to this one:
$ tailor --configfile example.tailor tailor
in that the first operates on the default project(s) and the second on the ones specified on the command line, and in this case there is just a single default project, tailor.
This one instead:
$ tailor --configfile example.tailor tailor tailor-reverse
operates on both projects.
With CVS, you can start from a particular point in time by giving start-revision a timestamp like 2001-12-25 23:26:48 UTC.
To also specify a particular branch, prepend its name to the timestamp, as in unstable-branch 2001-12-25 23:26:48 UTC.
To migrate the whole history of a specific branch, use something like somebranch INITIAL.
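To make the three forms concrete, here is a hypothetical parser for them, written in Python 3. It is not tailor's code, the name parse_cvs_start_revision is made up, and it only handles the shapes shown above: a UTC timestamp with an optional leading branch name, and a branch name followed by INITIAL.

```python
from datetime import datetime, timezone

# Timestamp layout used above, e.g. "2001-12-25 23:26:48 UTC".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_cvs_start_revision(value):
    """Split a CVS start-revision into (branch, point).

    Hypothetical helper: `branch` is None for the trunk, `point` is either
    a timezone-aware datetime or the string "INITIAL".
    """
    tokens = value.split()
    # "[branch] YYYY-MM-DD HH:MM:SS UTC"
    if tokens and tokens[-1] == "UTC" and len(tokens) in (3, 4):
        stamp = datetime.strptime(" ".join(tokens[-3:-1]), TIMESTAMP_FORMAT)
        branch = tokens[0] if len(tokens) == 4 else None
        return branch, stamp.replace(tzinfo=timezone.utc)
    # "[branch] INITIAL": the whole history of that branch.
    if tokens and tokens[-1] == "INITIAL" and len(tokens) in (1, 2):
        branch = tokens[0] if len(tokens) == 2 else None
        return branch, "INITIAL"
    raise ValueError("unrecognized start-revision: %r" % value)


print(parse_cvs_start_revision("somebranch INITIAL"))  # ('somebranch', 'INITIAL')
```

Tailor may well accept other spellings; the sketch only shows how the branch name and the point in time combine.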
Should any of the replayed changes generate a conflict, tailor will prompt the user to resolve it. This happens after the upstream patch has been applied and before the final commit on the target system, so that manually fixing the conflict can still produce a clean patch.
Tailor currently suffers from the following reported problems:
This list will always be incomplete, but I'll do my best to keep it short :-)
When your project is composed of multiple upstream modules, it is easier to collect such information in a single file. This is done by passing a file name to the --configfile option. In this case, tailor will read the above information from a standard Python ConfigParser file.
For example:
[DEFAULT]
verbose = True
projects = tailor

[tailor]
root-directory = /tmp/n9
source = darcs:tailor
target = svn:tailor
state-file = tailor.state

[tailor-reverse]
root-directory = /tmp/n9
source = svn:tailor
target = darcs:tailor
state-file = reverse.state

[svn:tailor]
repository = file:///tmp/testtai
module = /project1
subdir = svnside

[darcs:tailor]
repository = ~/WiP/cvsync
subdir = darcside
The configuration may hold one or more projects and two or more repositories: project names must not contain colons (":"), while repository names must, and the part of the name before the first colon specifies the kind of the repository. So the above example contains two projects, one that goes from darcs to Subversion, the other in the opposite direction.
The [DEFAULT] section contains the default values, that will be used when a specific setting is missing from the particular section.
You can specify which project(s) tailor should operate on by giving their names on the command line, even more than one. When none is given explicitly, tailor looks at the projects option in the [DEFAULT] section, and if that is missing it loops over all projects in the configuration.
The following simpler config goes in just one direction, for a single project, so it needs neither [DEFAULT].projects nor command line arguments:
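The naming and selection rules just described can be sketched with Python 3's configparser module. This is only an illustration, not tailor's implementation (tailor is built on the Python 2 ConfigParser): the helper names classify_sections and select_projects are hypothetical, and the configuration is a trimmed copy of the example above.

```python
import configparser

CONFIG = """
[DEFAULT]
verbose = True
projects = tailor

[tailor]
source = darcs:tailor
target = svn:tailor

[tailor-reverse]
source = svn:tailor
target = darcs:tailor

[svn:tailor]
module = /project1

[darcs:tailor]
subdir = darcside
"""


def classify_sections(parser):
    """Split section names into projects and (kind, repository) pairs."""
    projects, repositories = [], []
    for name in parser.sections():  # [DEFAULT] is never listed here
        if ":" in name:
            # The part before the first colon is the repository kind.
            repositories.append((name.split(":", 1)[0], name))
        else:
            projects.append(name)
    return projects, repositories


def select_projects(parser, command_line=()):
    """Command line names first, then [DEFAULT] projects, then all."""
    if command_line:
        return list(command_line)
    configured = parser.defaults().get("projects")
    if configured:
        return [p.strip() for p in configured.split(",") if p.strip()]
    # Fall back to every project, in order of appearance.
    return classify_sections(parser)[0]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)
print(select_projects(parser))                               # ['tailor']
print(select_projects(parser, ["tailor", "tailor-reverse"]))  # ['tailor', 'tailor-reverse']
```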
[pxlib]
source = cvs:pxlib
target = hg:pxlib
root-directory = ~/mypxlib
start-revision = INITIAL
subdir = pxlib

[cvs:pxlib]
repository = :pserver:anonymous@cvs.sf.net:/cvsroot/pxlib
module = pxlib

[hg:pxlib]
hg-command = /usr/local/bin/hg
This will use a single directory, pxlib, to contain both the source and the target systems. If you prefer keeping them separated, just specify a different subdir for each repository [3], as in:
[pxlib]
source = cvs:pxlib
target = hg:pxlib
root-directory = ~/mypxlib
start-revision = INITIAL

[cvs:pxlib]
repository = :pserver:anonymous@cvs.sf.net:/cvsroot/pxlib
module = pxlib
subdir = original
delay-before-apply = 10

[hg:pxlib]
hg-command = /usr/local/bin/hg
subdir = migrated
This will extract upstream CVS sources into ~/mypxlib/original, and create a new Mercurial repository in ~/mypxlib/migrated.
One final example, showing the syntax for Bazaar sources:
[project]
target = hg:target
start-revision = base-0
root-directory = /tmp/calife
state-file = tailor.state
source = baz:source

[baz:source]
module = calife--pam--3.0
repository = roberto@keltia.net--2003-depot
subdir = tla

[hg:target]
repository = /tmp/HG/calife-pam
subdir = hg
[3] NB: when the source and the target repositories specify different directories with the subdir option, tailor uses rsync to keep them in sync, so that tool needs to be installed.
The [DEFAULT] section in the configuration file may set the default value for any of the recognized options: when a value is missing from a specific section it is looked up in this section.
One particular option, projects, is meaningful only in the [DEFAULT] section: it's a comma-separated list of project names, the ones tailor operates on when no project is specified on the command line. When there is neither a projects setting nor any project on the command line, tailor activates all configured projects, in order of appearance in the config file.
A project is identified by a section whose name does not contain any colon (":") character, and configured with the following values:
Note
If a particular option is missing from the project section, its value is obtained looking up the same option in the [DEFAULT] section.
Non-mandatory options:
Some backends have distinct notions of patch name and changelog; others just suggest a policy where the first line of the message is a summary and the rest, if present, is a more detailed description of the change. With this option you can control the format of the name, or of the first line of the changelog.
The prototype may contain %(keyword)s such as 'author', 'date', 'revision', 'firstlogline', 'remaininglog' or 'project'. It defaults to [%(project)s @ %(revision)s]; setting it to the empty string means that tailor will simply use the original changelog.
When you set it empty, as in
[project]
patch-name-format =
tailor will keep the original changelog as is.
Remove the first line of the upstream changelog. This is intended to be paired with patch-name-format, when using its 'firstlogline' variable to build the name of the patch. False by default.
A reasonable usage is:
[DEFAULT]
patch-name-format=[%(project)s @ %(revision)s]: %(firstlogline)s
remove-first-log-line=True
All the sections whose name contains at least one colon character denote a repository. A single repository may be used by zero, one or more projects. The part of the name up to the first colon indicates the kind of the repository, one of arx, baz, bzr, cdv, cvs, darcs, git, hg, monotone, svn and tla.
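The interplay of patch-name-format and remove-first-log-line can be illustrated with Python's own %(keyword)s dictionary formatting. This is a Python 3 sketch under assumptions, not tailor's code: build_patch_name is a hypothetical helper, and it assumes firstlogline and remaininglog are simply the text before and after the first newline of the upstream log.

```python
def build_patch_name(fmt, log, remove_first_log_line=False, **fields):
    """Return (patch_name, changelog) for one upstream changeset.

    `fields` carries the other keywords (project, revision, author, date).
    An empty `fmt` means: no separate name, keep the changelog as is.
    """
    if not fmt:
        return None, log
    firstlogline, _, remaininglog = log.partition("\n")
    values = dict(fields, firstlogline=firstlogline,
                  remaininglog=remaininglog)
    name = fmt % values
    # Drop the first line only when the name already carries it.
    changelog = remaininglog if remove_first_log_line else log
    return name, changelog


name, changelog = build_patch_name(
    "[%(project)s @ %(revision)s]: %(firstlogline)s",
    "Fix the parser\nIt choked on empty input.",
    remove_first_log_line=True, project="tailor", revision="42")
print(name)       # [tailor @ 42]: Fix the parser
print(changelog)  # It choked on empty input.
```

Pairing the two options this way avoids repeating the summary line both in the patch name and at the top of the changelog.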
Note
If a particular option is missing from the repository section, its value is obtained looking up the same option in the section of the project currently using the repository, falling back to the [DEFAULT] section.
Some options may be shared with other repositories, as in the following example, where the common settings for the target monotone repository are set just once:
[DEFAULT]
target-repository = /bigdisk/my-huge-repository.mtn
target-keyid = test@example.com
target-passphrase = lala
source-repository = http://svn.someserver.com

[productA]
target = monotone:productA
source = svn:sourceA

[productB]
target = monotone:productB
source = darcs:sourceB

[productC]
target = monotone:productC
source = svn:sourceC

...

[monotone:productA]
module = every.thing.productA

[monotone:productB]
module = every.thing.productB

[monotone:productC]
module = every.thing.productC

[svn:sourceA]
module = /productA

[darcs:sourceB]
repository = http://some.server.com/darcs/productB

[svn:sourceC]
module = /productC
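The fallback chain from the note above (repository section, then the project using it, then [DEFAULT]) can be sketched with Python 3's configparser. This is an illustrative helper with a hypothetical name, not tailor's code, and it deliberately ignores the source-/target- prefixed forms (such as target-repository) used in the example, whose mapping onto a repository's own options it does not model.

```python
import configparser


def lookup(parser, repository, project, option, fallback=None):
    """Repository section first, then its project, then [DEFAULT]."""
    for section in (repository, project, "DEFAULT"):
        if parser.has_section(section) and parser.has_option(section, option):
            return parser.get(section, option)
    return fallback


# Treat [DEFAULT] as an ordinary section, so has_option() reports only
# what each section really contains instead of inheriting the defaults.
parser = configparser.ConfigParser(default_section="__unused__",
                                   interpolation=None)
parser.read_string("""
[DEFAULT]
encoding = utf-8
delay-before-apply = 5

[productA]
target = monotone:productA
source = svn:sourceA
encoding = latin-1

[svn:sourceA]
module = /productA
""")

print(lookup(parser, "svn:sourceA", "productA", "encoding"))  # latin-1
```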
When the source and target repositories use different subdirectories, tailor uses rsync to copy the changes between the two after each applied changeset. When the source repository basedir is a subdirectory of the target basedir, tailor prefixes all paths coming from upstream to match their relative position.
This defaults to the project's setting.
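The path prefixing described above boils down to a relative-path computation. The following Python 3 sketch shows the idea under the assumption of POSIX-style paths. upstream_prefix and prefixed are hypothetical names, not tailor functions.

```python
import posixpath


def upstream_prefix(source_basedir, target_basedir):
    """Prefix for upstream paths when the source lives inside the target.

    Returns "" when both basedirs coincide, and raises ValueError when the
    source basedir is not below the target basedir.
    """
    rel = posixpath.relpath(source_basedir, target_basedir)
    if rel == ".":
        return ""
    if rel == ".." or rel.startswith("../"):
        raise ValueError("source basedir is not inside target basedir")
    return rel


def prefixed(path, prefix):
    """Rewrite an upstream path so it is relative to the target basedir."""
    return posixpath.join(prefix, path) if prefix else path


prefix = upstream_prefix("/tmp/proj/upstream", "/tmp/proj")
print(prefixed("src/main.c", prefix))  # upstream/src/main.c
```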
States the charset encoding used by the particular repository. It's especially important when it differs from the local system setup, which you can inspect by executing:
python -m locale
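To see why the setting matters, here is a Python 3 sketch (not tailor's code) that decodes a log message coming from a repository that stores Latin-1 and re-encodes it for the local system. to_local is a hypothetical name, and replacing unencodable characters with "?" is an arbitrary choice made for this example.

```python
import locale


def to_local(raw, repository_encoding, local_encoding=None):
    """Decode repository bytes and re-encode them for the local system.

    Characters the local encoding cannot represent become "?".
    """
    if local_encoding is None:
        # The local preferred encoding, as configured by the locale.
        local_encoding = locale.getpreferredencoding(False)
    text = raw.decode(repository_encoding)
    return text.encode(local_encoding, errors="replace")


print(to_local(b"caf\xe9", "latin-1", "utf-8"))  # b'caf\xc3\xa9'
```

Decoding the same bytes with the wrong repository encoding would either fail or silently produce different characters, which is why the setting matters when the two sides differ.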
Sometimes the migration is fast enough to put the upstream server under excessive load. In that case you may specify, for example, delay-before-apply = 5, the number of seconds tailor will wait before applying each changeset.
It defaults to None, i.e. no delay at all.
Maximum number of seconds that may separate commits to different files for them to still be considered part of the same changeset.
180 by default.
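A Python 3 sketch of how such a threshold can turn per-file CVS commits into changesets follows. It is an assumption-laden illustration, not tailor's algorithm: besides the time gap described here, it also requires commits to share author and log message (the usual approach, as in cvsps), and the names Commit and group_commits are hypothetical.

```python
from collections import namedtuple

# One per-file CVS commit; timestamp in seconds.
Commit = namedtuple("Commit", "path timestamp author log")


def group_commits(commits, threshold=180):
    """Group per-file commits into changesets.

    Commits sorted by time join the current changeset while they share
    author and log message and follow the previous commit by at most
    `threshold` seconds; otherwise a new changeset starts.
    """
    changesets = []
    for commit in sorted(commits, key=lambda c: c.timestamp):
        if changesets:
            last = changesets[-1][-1]
            if (commit.author == last.author and commit.log == last.log
                    and commit.timestamp - last.timestamp <= threshold):
                changesets[-1].append(commit)
                continue
        changesets.append([commit])
    return changesets
```

A real implementation has to cope with interleaved commits by different authors, which this sequential sketch simply splits apart.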
With this enabled (it is off by default), tailor will use the -kk flag on checkouts and updates to turn off keyword expansion. This may help minimize the chance of spurious conflicts in later merges between different branches.
False by default.
CVS and CVSPS repositories may turn off the automatic tagging of entries, which tailor performs by default to prevent manual interventions in the CVS working copy, by setting tag_entries = False.
True by default.
Under darcs this may be either the name of a tag or the hash of an arbitrary patch in the repository, plus the ordinary INITIAL or HEAD symbols.
Note
If you want to start from a particular patch by giving its hash value as start-revision, you must use a subdir other than "." [4]
Relative path to a git directory to use as a parent. This is one way to import branches into a git repository, which creates a new git repository borrowing ancestry from the parent-repo. It is quite a simple way, and thus believed to be quite robust, but spreads branches across several git repositories. If this parameter is not set, and repository is not set either, the branch has no parent.
The alternative is to specify a repository parameter, to contain all git branches. The .git directory in the working copy for each branch will then only contain the .git/index file.
A reference to the git commit that is the parent of the first revision on the branch to be imported. It can be a tag name or any syntax acceptable to git (e.g. something like "tag~2", if you want to adjust where the branch point is).
Since tailor generates mostly stable SHA-1 revisions, you can usually also use a SHA-1 as the branchpoint. Just import your trunk first, find the correct SHA-1, then set up and import your branch. This is especially useful since the current cvs source implementation misses many tags.
Activate (with True) or activate and specify (with a string) the filter on the svn log to eliminate illegal XML characters.
False by default. When set to True, the following characters are washed out from the upstream changes:
allbadchars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09" \
              "\x0B\x0C\x0E\x0F\x10\x11\x12\x13\x14\x15" \
              "\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x7f"
If this set is not right or not enough, you can specify a string value instead of the boolean flag, containing the characters to omit, as in:
filter-badchars=\x00\x01
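The effect of the filter can be sketched in a few lines of Python 3. wash is a hypothetical name, and this is not tailor's code. Note that the default list also includes \x09 (TAB), while \x0A and \x0D (newline and carriage return) are kept.

```python
# Default set of characters removed when filter-badchars = True.
allbadchars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09" \
              "\x0B\x0C\x0E\x0F\x10\x11\x12\x13\x14\x15" \
              "\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F\x7f"


def wash(text, badchars=allbadchars):
    """Return `text` without any of the characters in `badchars`."""
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans("", "", badchars))


print(repr(wash("ok\x00 done\x7f\n")))  # 'ok done\n'
```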
Indicates that tailor is allowed to inject the upstream changeset's author and timestamp into the target repository. As stated above, this requires manual intervention on the repository itself, so it is off by default and tailor simply appends those values to the changelog. When this option is active at bootstrap time and the repository is local, tailor automatically creates a minimal hooks/pre-revprop-change script inside the repository, so no other intervention is needed.
False by default.
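For a repository tailor did not bootstrap, you have to install such a hook yourself. As an illustration only, here is a hypothetical Python 3 pre-revprop-change hook (not the script tailor generates) that lets the author, date and log revision properties be added or modified, and rejects everything else. Subversion runs it with the repository path, revision, user, property name and action as arguments; remember to make it executable.

```python
#!/usr/bin/env python3
"""Hypothetical hooks/pre-revprop-change for a tailor target repository.

Arguments: REPOS-PATH REVISION USER PROPNAME ACTION (A, M or D).
Exit status 0 allows the change; anything else rejects it.
"""
import sys

# Revision properties tailor needs to rewrite, plus the log message.
ALLOWED = {"svn:author", "svn:date", "svn:log"}


def allowed(propname, action):
    """Allow adding or modifying the listed properties, never deleting."""
    return propname in ALLOWED and action in ("A", "M")


def main(argv):
    repos, rev, user, propname, action = argv
    if allowed(propname, action):
        return 0
    sys.stderr.write("Changing %s (%s) on r%s is not allowed\n"
                     % (propname, action, rev))
    return 1


# Subversion always passes five arguments; the length check only keeps
# the module harmless when it is imported or run without them.
if __name__ == "__main__" and len(sys.argv) == 6:
    sys.exit(main(sys.argv[1:]))
```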
Instead of executing tailor --configfile project.tailor.conf you can prepend the following signature to the config itself:
#!/usr/bin/env /path/to/tailor
Making it executable then lets you launch tailor by running the config script directly:
$ ./project.tailor.conf
When a config file is signed in this way [5], whether you pass it as an argument to --configfile or execute it as above, tailor will actually execute it as a full-fledged Python script, which may define functions that alter the behaviour of tailor itself.
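The signature test from footnote [5], and the idea of keeping the configuration in the script's docstring as in the example that follows, can be sketched in Python 3. Both functions are hypothetical illustrations, not tailor's loader. embedded_config additionally assumes the script is valid Python 3 source, since the ast module parses Python 3 only.

```python
import ast
import configparser


def is_signed(path):
    """True if the file starts with "#!", the two bytes tailor checks."""
    with open(path, "rb") as stream:
        return stream.read(2) == b"#!"


def embedded_config(path):
    """Parse the module docstring of a signed config as an INI file."""
    with open(path, encoding="utf-8") as stream:
        docstring = ast.get_docstring(ast.parse(stream.read()))
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(docstring or "")
    return parser
```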
A common usage of this functionality is to define so called hooks, sequences of functions that are executed at particular points in the tailorization process.
Just to illustrate the functionality, consider the following example:
#!/usr/bin/env tailor

"""
[DEFAULT]
debug = False
verbose = True

[project]
target = bzr:target
root-directory = /tmp/prova
state-file = tailor.state
source = darcs:source
before-commit = before
after-commit = after
start-revision = Almost arbitrarily tagging this as version 0.8

[bzr:target]
python-path = /opt/src/bzr.dev
subdir = bzrside

[darcs:source]
repository = /home/lele/WiP/cvsync
subdir = darcside
"""

def before(wd, changeset):
    print "BEFORE", changeset
    changeset.author = "LELE"
    return changeset

def after(wd, changeset):
    print "AFTER", changeset
With the above in a script called, say, tester, just doing:
$ chmod 755 tester
$ ./tester
will migrate the history from a darcs repository to a bazaar-ng one, forcing the author to a well-known name :-)
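A more practical variant of before maps upstream user names to full identities instead of forcing a single name. This is a sketch: the AUTHORS table and the name map_author are hypothetical, and it relies only on the changeset.author attribute used in the example above. It would be enabled with before-commit = map_author, and it is written so that it also runs under the Python 2 interpreters tailor targets.

```python
# Hypothetical mapping from upstream user names to full identities.
AUTHORS = {
    "lele": "Lele Gaifax <lele@nautilus.homeip.net>",
}


def map_author(wd, changeset):
    """before-commit hook: replace known short author names."""
    # Unknown authors are left untouched.
    changeset.author = AUTHORS.get(changeset.author, changeset.author)
    return changeset
```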
A pre-commit hook may even alter the content of the files. The following function replaces the DOS end-of-line convention with the UNIX one:
def newlinefix(wd, changeset):
    from pyutil import lineutil
    lineutil.lineify_all_files(wd.basedir, strip=True,
                               dirpruner=lineutil.darcs_metadir_dirpruner,
                               filepruner=lineutil.source_code_filepruner)
    return True
It uses Zooko's pyutil [6] toolset. Another approach would be looping over changeset.entries and operating only on added or changed entries.
This one loops over the files touched by a particular changeset and tries to reindent each one that is a Python file:
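The second approach can be sketched without pyutil. This hypothetical crlf_to_lf hook relies only on changeset.entries, entry.name and wd.basedir, as used by the reindent example that follows. Since the kind of each entry is not documented here, it rewrites every touched regular file that still exists rather than checking for "added or changed". It uses Python 3 syntax, so it would need adapting for the Python 2.3/2.4 tailor was tested with, and it does not try to detect binary files.

```python
import os


def crlf_to_lf(wd, changeset):
    """before-commit hook: convert DOS line endings in touched files."""
    for entry in changeset.entries:
        fname = os.path.join(wd.basedir, entry.name)
        # Skip removed entries and directories.
        if not os.path.isfile(fname):
            continue
        with open(fname, "rb") as stream:
            data = stream.read()
        fixed = data.replace(b"\r\n", b"\n")
        if fixed != data:
            with open(fname, "wb") as stream:
                stream.write(fixed)
    return True
```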
def reindent_em(wd, changeset):
    import os
    import reindent
    for entry in changeset.entries:
        fname = os.path.join(wd.basedir, entry.name)
        try:
            if fname.endswith('.py'):
                reindent.check(fname)
        except Exception, le:
            print "got an exception from attempt to reindent" \
                  " (maybe that file wasn't Python code?):" \
                  " changeset entry: %s, exception:" \
                  " %s %s %s" % (entry, type(le), repr(le),
                                 hasattr(le, 'args') and le.args,)
            # Re-raise, preserving the original traceback
            raise
    return True
You have to find reindent.py in your Python distribution (it ships in the Tools/scripts directory of the source distribution) and put it in your Python path. Beware that this has some drawbacks: be sure to read the annotations on ticket 8 if you use it.
[5] Tailor actually reads just the first two bytes of the file and compares them with "#!", so you are free to choose whatever syntax works in your environment.
[6] Available either at https://yumyum.zooko.com:19144/pub/repos/pyutil or http://zooko.com/repos/pyutil.
The state file stores two things: the last upstream revision that has been applied to the tree, and a sequence of pending (not yet applied) changesets, which may be empty. In that case, tailor will fetch the latest changes from the upstream repository.
Tailor uses Python's logging module to emit its messages. Its basic configuration is hardwired and corresponds to the following:
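The two-part state and the "fetch when the queue is empty" rule can be modelled with a toy Python 3 class. This is purely hypothetical: tailor's real state file uses its own on-disk format, and here JSON, revision strings standing in for changesets, and the fetch_upstream callback are all assumptions made for the illustration.

```python
import json


class StateSketch:
    """Toy state file: last applied revision plus a pending queue."""

    def __init__(self, path):
        self.path = path
        self.last_revision = None
        self.pending = []

    def load(self):
        try:
            with open(self.path) as stream:
                data = json.load(stream)
        except FileNotFoundError:
            return  # fresh project: nothing applied yet
        self.last_revision = data["last_revision"]
        self.pending = data["pending"]

    def save(self):
        with open(self.path, "w") as stream:
            json.dump({"last_revision": self.last_revision,
                       "pending": self.pending}, stream)

    def next_changeset(self, fetch_upstream):
        """Next changeset to apply, asking upstream only if none pending."""
        if not self.pending:
            self.pending = list(fetch_upstream(self.last_revision))
        return self.pending[0] if self.pending else None

    def mark_applied(self):
        """Record the head of the queue as applied and persist the state."""
        self.last_revision = self.pending.pop(0)
        self.save()
```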
[formatters]
keys = console

[formatter_console]
format = %(asctime)s [%(levelname).1s] %(message)s
datefmt = %H:%M:%S

[loggers]
keys = root

[logger_root]
level = INFO
handlers = console

[handlers]
keys = console

[handler_console]
class = StreamHandler
formatter = console
args = (sys.stdout,)
level = INFO
However, you can completely override the default by adding a [[logging]] supersection to the configuration file, something like:
# ... usual tailor config ...
[project]
source = bzr:source
target = hg:target

# Here ends the tailor config, and starts the one for the logging
# module
[[logging]]

[logger_tailor.BzrRepository]
level = DEBUG
handlers = tailor.source

[handler_tailor.source]
class = SMTPHandler
args = ('localhost', 'from@abc', ['tailor@abc'], 'Tailor log')
See the output of tailor -h for some further tips. There's also a wiki page that may give you some other hints. The development of Tailor is mainly driven by user requests at this point, and the preferred communication medium is the dedicated mailing list [7].
I will be more than happy to answer any doubt, question or suggestion you may have on it. I'm usually hanging out as "lelit" on the #tailor IRC channel on the freenode.net network. Do not hesitate to contact me either by email or chatting there.
[7] I wish to say a big thank you to Zooko, for hosting the ML and for supporting Tailor in several ways, from suggestions to bug reporting and fixing.
Lele Gaifax <lele@nautilus.homeip.net>
Since I'm not currently using all the supported systems (so little time, so many VCSs...) I'm not in position to test them out properly, but I'll do my best to keep them in sync, maybe with your support :-)
ArX support was contributed by Walter Landry.
Bazaar-NG support was contributed by Johan Rydberg. Nowadays it's being maintained by Lalo Martins.
Git support was contributed by Todd Mokros.
Monotone support was kindly contributed by Markus Schiltknecht and further developed by rghetta, who was able to linearize the multi-headed monotone history into something tailor groks. Kudos!
Tla support was contributed by Robin Farine.
Tailor is distributed under the GNU General Public License.
This document and most of the internal documentation use the reStructuredText format so that it can be easily converted into other formats, such as HTML. For more information about this, please see:
http://docutils.sourceforge.net/rst.html