regexps.com
In a project tree, some of the files and directories are "part of
the source" -- they are of interest to arch. Other files and
directories may be scratch files, editor back-up files, and temporary
or intermediate files generated by programs. Those other files should
be ignored by most arch commands.
This chapter discusses how arch recognizes which files to pay
attention to, and which to ignore.
arch has flexible facilities for keeping track of all of the files
and directories in your project: for taking "inventories" of your
project tree. It has these facilities for three reasons:
Distinguishing Source
arch uses a project inventory to distinguish files and directories
which are part of your project from other files and directories which
are temporary files, scratch files, editor backup files, and so forth.
Additionally, arch permits you to overlay projects: store more than
one project at a single root. When you do that, arch uses
inventories to sort out which files and directories belong to each
project. (The topic of overlays, however, is deferred until a later
chapter.)
Recognizing Renames
Every file or directory in an arch inventory has two names. One name
is simply the location (path) of the file relative to the root of the
project tree. The other name is a "logical name" for the file: a name
that remains the same regardless of where in the project tree the file
is located. When arch compares two versions of a project tree, it uses
logical names to discover when files or directories have been moved,
renamed, deleted, or added.
Canonical Inventories
Finally, arch permits you to make a record of the "canonical
inventory" of your project -- all of the files that you believe are
supposed to be there. arch can then tell you whether any files are
missing or have been added compared to the canonical inventory.
For each project tree, you have a choice to make regarding how project inventories work. The options are described briefly here, then in more detail in the sections that follow.
Naming Conventions
The simplest (and default) option is to use naming conventions. arch
will search your tree for files matching certain naming patterns, and
consider all of those files to be source files.
When you use only naming conventions to take an inventory, the logical
name of a file and its location name are exactly the same. For that
reason, if you rename a file, arch will think you deleted a file
with the old name, and added a file with the new name. If you delete
a file, then add a file with the same name, arch will think that the
new file is a modified form of the old file. None of those
limitations is fatal (arch will still work), but they do limit the
effectiveness of arch at branching and merging. ("Branching" and
"merging" are topics of a later chapter.)
Explicit Inventories
Another option is to use an explicit inventory.
Once again, arch will search for files that satisfy certain naming
conventions -- but not every such file or directory is automatically
source. Instead, whenever you add, delete, or rename a file, you must
inform arch of that fact explicitly. For example, after adding the
file foo.c, you have to tell arch:
% larch add foo.c
and if you rename foo.c to bar.c, then you must also tell arch:
% larch move foo.c bar.c
Implicit Inventories
A third option combines some of the advantages of using naming
conventions with some of the advantages of explicit inventories:
implicit inventories. When you use an implicit inventory, every file
that passes the naming conventions is considered source. You may
explicitly add, delete, and rename files -- allowing arch to
precisely track renames for those files and directories. You also may
store a file tag (the "logical name" of a file) in any file. If you
don't explicitly tag a file, and use an implicit inventory, arch will
search for those embedded tags and use them to precisely detect new
files, deleted files, and renamed files.
Each of the three options is called a tagging method.
There is some advice at the end of this chapter about how to choose among the three tagging methods.
If you never explicitly specify a tagging method, arch
will use
simple naming conventions, by default. You can also make explicit
your choice to use only naming conventions with this command issued in
a project tree:
% larch tagging-method names
Similarly, to use either an implicit or explicit inventory, use one of the commands:
% larch tagging-method explicit
% larch tagging-method implicit
To find out what method a given project tree uses, use the same command with no argument:
% larch tagging-method
names
The command larch inventory
is used to print a list of source files.
It has many options, including options to print other kinds of file
lists (such as a list of all editor backup files, or a list of all
files which are not source):
% cd source-tree
% larch inventory --source
hello.c
hello.h
library
library/buffer.c
library/buffer.h
...
contrasted with:
% cd source-tree
% ls
hello.c hello.c.~1~ hello.h library
(Notice that hello.c.~1~
is not included in the inventory of
source files.)
The naming conventions used by arch
are as follows:
Control Files
A control file is part of the source, but control files are not
included in the output of larch inventory unless the --all flag is
used. Control file and directory names match any of these patterns:
.arch-project-tree .arch-ids .owned.* .common {arch}
Junk Files
A junk file is not part of the source. A junk file or directory name matches the pattern:
,*
or contains any of the characters:
<space> <tab> <newline> [ ] ? \
Note that if a directory name matches that pattern, then none of the contents of the directory are part of the source, regardless of their names.
Junk files are listed by the command:
% larch inventory --junk
Arch sometimes creates junk files and directories of its own. When it does, those files and directories have names that match the pattern:
,,*
You should avoid creating files and directories with names that match
that pattern. arch will freely delete files and directories with
names that match ,,* whenever it needs to re-use such a name.
Usually, arch will delete any junk file it creates before the
command that created the junk file terminates. Sometimes, though,
when a command fails, arch will leave behind junk files or
directories matching ,,*. This is a debugging feature, likely to be
removed in a future release. For now, whenever you find such a file
(and are confident it isn't being used by a currently running
command), you are free to delete it.
Backup Files
If a file is not a junk file, it may be a backup file. Backup files are not part of the source. They match any of the patterns:
*~ *.bak *.modified *.orig *.original *.rej *.rejects
Backup files are listed by the command:
% larch inventory --backups
Precious Files
If a file is not a control file, junk file, or backup file, it might
be a precious file. Precious files are not part of the source, but
arch does sometimes treat them specially. For example, when arch
copies a directory of source for you, it copies not only the source
files, but the precious files as well.
Precious files and directories match one of these patterns:
+* .gdbinit =build* =install* CVS RCS TAGS
Of course, precious files can be listed by the command:
% larch inventory --precious
Sometimes arch
will create its own precious files -- usually to save
some information that you might not want to lose. When it does, it
creates a file or directory matching the pattern:
++*
You should avoid creating such filenames yourself. arch won't ever
delete such a file -- but if one happens to get in the way of an
arch command, that command will fail with an error.
Source Files
If a file is not a control, junk, backup, or precious file, it might
be an ordinary source file. Source files are, of course, the files
that arch stores in an archive (along with control files).
Source files must match the pattern:
[=a-zA-Z0-9]*
but must not match any of the patterns:
*.o *.core core
Ordinary source files are listed by:
% larch inventory --source
Some files which are arch
control files are counted as source even
though they don't match the patterns above. However, these files are
not listed by default. All source files (ordinary source plus control
files) are listed by:
% larch inventory --source --all
Unrecognized Files
Any file that doesn't fall into the above categories is an unrecognized file. Unrecognized files can be listed by the command:
% larch inventory --unrecognized
WARNING
The basic pattern for source files is:
[=a-zA-Z0-9]*
however, you should restrict yourself to file names that do not
contain spaces. Filenames containing spaces are likely to trigger
bugs in the current release of arch.
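The categories above can be illustrated with a small classifier. This is only a sketch of the rules as this chapter states them, not arch's own code: the function names `classify` and `matches` are hypothetical, the check order (control, junk, backup, precious, source, unrecognized) is taken from the "if a file is not a ..." wording above, and the rule that everything inside a junk directory is junk is not modelled (only a single name is examined).

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the chapter text (not read from arch itself).
CONTROL = [".arch-project-tree", ".arch-ids", ".owned.*", ".common", "{arch}"]
JUNK_CHARS = set(" \t\n[]?\\")
BACKUP = ["*~", "*.bak", "*.modified", "*.orig", "*.original", "*.rej", "*.rejects"]
PRECIOUS = ["+*", ".gdbinit", "=build*", "=install*", "CVS", "RCS", "TAGS"]
SOURCE = "[=a-zA-Z0-9]*"
NOT_SOURCE = ["*.o", "*.core", "core"]


def matches(name, patterns):
    """Return True if name matches any of the glob patterns."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def classify(name):
    """Classify one file or directory name (a basename, not a path)."""
    if matches(name, CONTROL):
        return "control"
    if name.startswith(",") or JUNK_CHARS & set(name):
        return "junk"
    if matches(name, BACKUP):
        return "backup"
    if matches(name, PRECIOUS):
        return "precious"
    if fnmatchcase(name, SOURCE) and not matches(name, NOT_SOURCE):
        return "source"
    return "unrecognized"


if __name__ == "__main__":
    for name in ["hello.c", "hello.c.~1~", ",,scratch", "+notes", "foo.o"]:
        print(name, "->", classify(name))
```

Under these assumptions, hello.c is classified as source and hello.c.~1~ as a backup file, which agrees with the inventory example earlier in the chapter.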
If you want to use explicit designation of source files, rather than naming conventions alone, then use this command:
% larch tagging-method explicit
Note that you must use that command from within a working directory
tree that has already been initialized using init.
When using explicit designation, it is (ordinarily) necessary to add every file and directory in the source to the explicit list using the command:
% larch add FILE
If FILE is a directory, that will create FILE/.arch-ids/=id. If
it is a regular file or symbolic link, it will create (in the same
directory) .arch-ids/FILE.id. In either case, the file created will
contain an obscure string known as an "inventory tag" (inventory
tags are explained in more detail below).
If you remove a regular file or symbolic link, you must use the command:
% larch delete FILE
That won't remove FILE itself, but it will remove the inventory tag
for FILE.
In order to remove a directory, you must yourself remove the
.arch-ids subdirectory. That will also implicitly remove the
inventory tags of any files that arch thinks are stored in that
directory.
If you rename a regular file or symbolic link, you can use the command:
% larch move OLD-NAME NEW-NAME
to move the inventory tag for that file.
If you rename a directory, its inventory tag (and the tags for all
files and subdirectories it contains) move with it automatically
(because the .arch-ids subdirectory has moved).
When you run larch inventory
in a working directory using explicit
designation, only explicitly designated source files are listed.
If you would rather see a list of all files passing the naming
conventions for source files, use:
% larch inventory --source --names
You should also read about tree-lint
later in this chapter.
To use implicit tagging, use the following command in your working directory:
% larch tagging-method implicit
When implicit tagging is used, every file that passes the naming
conventions is treated as source. If a file or directory has an
explicit tag (created with add), arch will use that explicit
tag to recognize when a file has moved. If a file (but not a
directory or symbolic link) lacks an explicit tag, arch will look
for a tag in the file itself.
A tag within a file has one of two forms. It may be either:
<punct><basename><spaces>-<spaces><tag>
where <punct> is an arbitrary string of punctuation and spaces,
<basename> is the basename of the file, and <tag> is an inventory tag
for the file. Or:
<punct>tag:<spaces><tag>
In either case, <tag> should be unique among the files within a
directory. A tag within a file must occur within the first 1024
bytes of the file.
A handy convention for source files is to add a comment to the top of every file, briefly stating the purpose of the file:
/* hello.c - `main' for the hello world program ...
or:
/* tag: `main' for the hello world program ...
Another possible convention is to use a string identifying the author and the time the file was first created (or first tagged):
/* tag: joe.hacker@gnu.org Thu Nov 29 17:25:15 PST 2001 ...
If you use the basename
form of an implicit tag, and actually rename
a file (rather than simply move it between directories), you do need
to remember to update the tag line to reflect the new basename.
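The two embedded tag forms can be matched with a pair of regular expressions. The sketch below is an illustration only, not arch's scanner: the function name `find_embedded_tag` is hypothetical, and it assumes that a tag runs to the end of its line, that `<punct>` means ASCII punctuation plus spaces and tabs, and that at least one space surrounds the `-` in the basename form. None of those details are spelled out in this chapter.

```python
import re
import string

# Character class body for <punct>: ASCII punctuation, spaces, and tabs (an assumption).
_PUNCT = re.escape(string.punctuation) + " \t"


def find_embedded_tag(data, basename):
    """Return the raw embedded tag found in the first 1024 bytes, or None."""
    head = data[:1024].decode("latin-1")
    patterns = [
        # <punct>tag:<spaces><tag>
        re.compile(rf"^[{_PUNCT}]*tag:[ \t]*(.+)$"),
        # <punct><basename><spaces>-<spaces><tag>
        re.compile(rf"^[{_PUNCT}]*{re.escape(basename)}[ \t]+-[ \t]+(.+)$"),
    ]
    for line in head.splitlines():
        for pattern in patterns:
            match = pattern.match(line)
            if match:
                # Leading and trailing spaces are not part of the tag.
                return match.group(1).strip()
    return None


if __name__ == "__main__":
    text = b"/* hello.c - `main' for the hello world program\n */\n"
    print(find_embedded_tag(text, "hello.c"))
```

Note that with these assumptions a comment terminator on the same line as the tag would become part of the tag text, so the sketch is only suitable for tag lines laid out like the examples above.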
When you use implicit tagging, it is ok if a file lacks any tag at
all, either explicit or implicit. In that case, if you rename the
file, arch
will think you've deleted the old file and added a new
one -- but aside from that, everything will work normally.
CAUTION: Leading and trailing spaces around an inventory tag are
not considered part of the tag. Within a tag, every non-graphical
character is replaced by _. For example, if you write the tag:
`main' for the hello    world program
the actual inventory tag is:
`main'_for_the_hello____world_program
It is possible that a future release of arch will slightly change
the rule -- so that multiple spaces and tabs are replaced by a single
_.
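The normalization rule in the caution above is simple enough to sketch in a few lines. The function name `normalize_tag` is hypothetical, and "graphical" is assumed here to mean the printable ASCII characters other than space (codes 33 through 126), as with C's isgraph in the default locale; arch's exact definition is not given in this chapter.

```python
def normalize_tag(raw):
    """Strip surrounding whitespace, then replace each non-graphical character with '_'."""
    stripped = raw.strip()
    return "".join(ch if 33 <= ord(ch) <= 126 else "_" for ch in stripped)


if __name__ == "__main__":
    # A run of four spaces becomes four underscores under the current rule.
    print(normalize_tag("  `main' for the hello    world program  "))
```

Because each character is replaced individually, a run of spaces produces a run of underscores, which is exactly the behaviour the text says a future release might change.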
If you are using naming conventions only to recognize source files,
then if you rename a directory or file, arch will conclude that you
have deleted the old file, and created a new file.
If you are using an explicit source inventory, arch will always
recognize when a directory is renamed (presuming that the .arch-ids
subdirectory is preserved), and it will recognize when a file is
renamed if you use move (rather than delete and add).
Of course, arch can be fooled if you swap two files without swapping
their inventory tags.
If you are using an implicit inventory, arch
will never recognize
when an untagged file is renamed (it will think "delete" and
"add"). If a file is tagged explicitly, arch
will recognize when
the file is added, deleted, or renamed -- just as when using an
explicit inventory. If a file is not tagged explicitly, but has an
embedded tag, arch
will recognize when the file is added, deleted or
moved.
The command:
% larch tree-lint
is useful for keeping things neat and tidy.
If you use explicit tagging, it will tell you of any tags for which the corresponding file does not exist. It will tell you of any files that pass the naming conventions, but for which no explicit tag exists.
If you use implicit tagging, it will tell you of any files for which no tag can be found -- either explicit or implicit. It will tell you of any explicit tags for which the corresponding file does not exist.
In either case, or if you are using naming conventions only,
tree-lint
will tell you of any files that don't fit the naming
conventions at all.
Finally, if you use explicit or implicit tagging, tree-lint will
check for cases where multiple files use the same tag. If any two
files do have the same tag, you must correct that, either by
editing the tag (if it is in the file itself) or by using delete
and add to replace a duplicated explicit tag.
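The duplicate-tag check can be pictured as grouping paths by tag. The sketch below is not how tree-lint is implemented; `duplicate_tags` is a hypothetical helper that takes an inventory already loaded into a dictionary mapping each path to its tag.

```python
from collections import defaultdict


def duplicate_tags(inventory):
    """Given {path: tag}, return {tag: sorted paths} for every tag used more than once."""
    paths_by_tag = defaultdict(list)
    for path, tag in inventory.items():
        paths_by_tag[tag].append(path)
    return {tag: sorted(paths) for tag, paths in paths_by_tag.items() if len(paths) > 1}


if __name__ == "__main__":
    inventory = {"hello.c": "tag_a", "copy-of-hello.c": "tag_a", "hello.h": "tag_b"}
    print(duplicate_tags(inventory))
```

A duplicate like the one in this example typically arises when a file containing an embedded tag is copied without editing the tag line.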
A manifest is an explicit list of the files you believe are
supposed to be in a project tree. arch allows you to maintain a
manifest, and to compare it to the actual contents of a tree.
The command set-manifest sets the manifest to the current contents
of the project tree:
% larch set-manifest
Note that only regular source files, not arch control files, are
included in the manifest. To replace an existing manifest, you must
provide the -f flag (or --force) to set-manifest.
You can retrieve the manifest with:
% larch manifest
Each line of the manifest is of the form:
<path>\t<tag>
and the list is sorted by the <tag>
field.
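The manifest line format is easy to produce and consume from a script. The following sketch assumes only what is stated above (one `<path>\t<tag>` entry per line, sorted by tag); the function names are hypothetical, and the sort uses plain string comparison, which may not match arch's collation exactly.

```python
def format_manifest(inventory):
    """Render {path: tag} as manifest text: one 'path<TAB>tag' line per file, sorted by tag."""
    rows = sorted(inventory.items(), key=lambda item: item[1])
    return "".join(f"{path}\t{tag}\n" for path, tag in rows)


def parse_manifest(text):
    """Parse manifest text back into a list of (path, tag) pairs, in file order."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        path, tag = line.split("\t", 1)
        entries.append((path, tag))
    return entries


if __name__ == "__main__":
    text = format_manifest({"hello.c": "x_b", "library/buffer.c": "x_a"})
    print(text, end="")
    print(parse_manifest(text))
```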
You can look for missing, added, or renamed files with:
% larch check-manifest
which will compare the project tree inventory to the manifest and print a report describing divergences (if there are any).
When arch considers the files and directories in a working directory
it builds a one-to-one index mapping path names (relative to the root
of the working directory tree) to inventory tags.
The inventory tag of a file is its "logical identity". The path is the position of that identity within the particular working dir.
You can see the inventory tag for each source file with the command:
% larch inventory --source --tags
When arch
compares two project trees, it bases the comparison on
logical identities. If both trees have a file with a particular
inventory tag, but the files are in different positions, then arch
considers the file to have been moved or renamed. Similarly, if an
inventory tag is present in one tree, but missing in the other, then
arch
considers the file to have been added or deleted.
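The comparison described above amounts to set operations on inventory tags. In this sketch each tree is represented as a dictionary mapping inventory tag to path; the function name `compare_inventories` and that representation are illustrative choices, not part of arch.

```python
def compare_inventories(old, new):
    """Compare two {tag: path} inventories; return (added, deleted, renamed)."""
    added = sorted(new[tag] for tag in new.keys() - old.keys())
    deleted = sorted(old[tag] for tag in old.keys() - new.keys())
    renamed = sorted(
        (old[tag], new[tag]) for tag in old.keys() & new.keys() if old[tag] != new[tag]
    )
    return added, deleted, renamed


if __name__ == "__main__":
    old = {"t1": "hello.c", "t2": "hello.h"}
    new = {"t1": "src/hello.c", "t3": "util.c"}
    print(compare_inventories(old, new))
```

If every tag is equal to its path, a renamed file shows up with a new tag and its old tag disappears, so the same function reports one deletion and one addition rather than a rename.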
If you use naming conventions only, the inventory tag of each file is
the same as its path. Thus, when using the names tagging method,
arch never recognizes that a file has been moved or renamed.
When you use the explicit tagging method, inventory tags are stored
in the .arch-ids directories. There is a file in .arch-ids for
each tagged file (and one file for the directory containing
.arch-ids), and those files contain the tags.
When you use the implicit
tagging method, tags in .arch-ids
directories take precedence (if they exist). If a file is not
explicitly tagged, arch
searches for the inventory tag in the file
itself (as described earlier in the chapter). Finally, if a file is
not tagged at all, then its path is used as the inventory tag.
Be cautious when changing tagging methods for directories already
checked in to an arch revision control archive.
For example, if you change from the tagging method names to
explicit, then the inventory tag for every file will change. arch
will think that you've deleted all of the files in the old tree, and
added all of the files in the new tree.
However, there is a work-around for this problem, described in a later chapter.
In some situations, it isn't convenient to explicitly tag every file or to add an implicit tag to every file.
You can supply a default tag for every file that doesn't have an explicit tag with the command:
% larch explicit-default TAG-PREFIX
After that, every file in that directory which lacks an explicit tag will have the tag:
TAG-PREFIX__BASENAME
where BASENAME
is the basename of the file. Default tags created in
this way take precedence over implicit tags embedded in files. You
can find out the default tag for a directory with:
% larch explicit-default
and remove the default with:
% larch explicit-default --delete
You can also specify a default tag which has lower precedence than implicit tags:
% larch explicit-default --weak TAG-PREFIX
and view that default:
% larch explicit-default --weak
or delete it:
% larch explicit-default --weak --delete
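Putting the precedence rules from this chapter together, a tag lookup under the implicit method can be sketched as below. The function `resolve_tag` and its parameters are hypothetical; the ordering (explicit tag, default tag, embedded tag, weak default tag, then the path itself) is my reading of the text above, and the default tag is built as TAG-PREFIX__BASENAME.

```python
def resolve_tag(path, explicit=None, embedded=None, default_prefix=None, weak_prefix=None):
    """Pick the inventory tag for a file, trying the highest-precedence source first."""
    basename = path.rsplit("/", 1)[-1]
    if explicit is not None:
        return explicit                      # tag stored in .arch-ids
    if default_prefix is not None:
        return f"{default_prefix}__{basename}"  # default tag beats embedded tags
    if embedded is not None:
        return embedded                      # tag found inside the file
    if weak_prefix is not None:
        return f"{weak_prefix}__{basename}"  # weak default: only if nothing else
    return path                              # untagged: the path is the tag


if __name__ == "__main__":
    print(resolve_tag("library/buffer.c", embedded="buf_tag", weak_prefix="lib"))
    print(resolve_tag("library/buffer.c", weak_prefix="lib"))
```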
When using implicit tags, you may sometimes have a directory with many
files that have no tag (either explicit or implicit), but not want
those files to appear in a report of untagged files generated by
tree-lint. There are two ways to tell tree-lint to shut up
about such files:
One is to provide a default explicit tag or weak default explicit tag
using larch explicit-default, as described above.
The second method is to label the directory as a "don't care"
directory -- which means that tree-lint shouldn't complain about
untagged files. You can do that with:
% larch explicit-default --dont-care set
or remove the "don't care" flag with:
% larch explicit-default --delete --dont-care
You can find out whether the "don't care" flag is set in a given directory with:
% larch explicit-default --dont-care
Given the choice of the names, explicit, and implicit tagging
conventions, which one should you choose?
The names method is best for project trees that you don't control,
and for which the maintainer does not include file tags (either
explicit or implicit). For such trees, the names method will always
work, but if you want to use the explicit or implicit method,
you'll have to add file tags yourself.
The implicit method is, in my opinion, by far the most convenient.
It is easy to get in the habit of adding a tag: line to the bottom
of each new file and doing a single larch add for each directory.
After those steps, you can rename files and directories freely --
without having to remember to tell arch in a separate command.
On the other hand, the implicit method has two limitations. One
limitation is that you must accept the possibility of accidentally
adding new files to the inventory. Any file you create that passes
the naming conventions counts as source. The other, closely related,
limitation is that if you use implicit inventories, you will
never want to compile a program in its own source directory. When
you compile a program, that creates intermediate files and
executables. Many of those files will almost certainly pass the
naming conventions for source -- so arch will wrongly include them
in a source inventory. I use the implicit method, but my
configure scripts have a safeguard that causes them to refuse to
compile my programs in the source tree.
Finally, the explicit method is the only choice left if you want the
benefits of real file tags (therefore you can't use the names
method) but either insist on compiling in the source tree or can't
risk accidentally adding the occasional unintended file (so you
shouldn't use implicit).
Note: this is a relatively new feature, so the documentation is not yet well integrated with the rest of the manual.
The file {arch}/=tagging-method defines the naming conventions used
for a particular project tree. By editing that file, you can
establish naming conventions that are different from the defaults,
which are described above.
That file can contain blank lines, comments (lines beginning with #), and directives, one per line. The permissible directives are:
implicit
explicit
names
specify the tagging method to use for this tree
exclude RE
junk RE
backup RE
precious RE
unrecognized RE
source RE
specify a regular expression to use for the indicated category of files.
Regular expressions are specified in POSIX ERE syntax (the same syntax used by egrep, grep -E, and awk) and have default values which implement the naming conventions described above.
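The file format described above is simple enough to read with a few lines of code. The sketch below is not arch's parser: `parse_tagging_method` is a hypothetical function, it treats leading whitespace before # as allowed, and it keeps each regular expression as an uninterpreted string because POSIX ERE and Python's `re` syntax are not identical.

```python
METHODS = {"implicit", "explicit", "names"}
CATEGORIES = {"exclude", "junk", "backup", "precious", "unrecognized", "source"}


def parse_tagging_method(text):
    """Parse =tagging-method text; return (method or None, {category: regex string})."""
    method = None
    patterns = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if keyword in METHODS and not rest:
            method = keyword
        elif keyword in CATEGORIES and rest:
            patterns[keyword] = rest
        else:
            raise ValueError(f"line {number}: unrecognized directive: {line!r}")
    return method, patterns


if __name__ == "__main__":
    sample = "# local conventions\n\nexplicit\njunk ^(,.*)$\n"
    print(parse_tagging_method(sample))
```

Raising an error on unknown directives is a design choice of this sketch; how arch itself reacts to a malformed line is not described in this chapter.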
The exclude pattern should match a subset of files matched by the
source pattern. Files which match exclude are printed by:
% larch inventory --source --all
but not printed by:
% larch inventory --source
Although you can define your own naming conventions, there are some minor limitations:
The file names . and .. are always ignored by inventory.
File names which contain non-printing characters, spaces, or any of
the globbing characters (*, [, ], \, ?) are always placed
in the category unrecognized. This is so that tools which operate
on project trees can safely presume that no source file has a name
that includes these characters.
File names which begin with ,, are always placed in the category
junk. This is so that tools which operate on a project tree can
safely destroy or create files beginning with ,,.
The default naming conventions are given by:
exclude ^(.arch-ids|\{arch\})$
junk ^(,.*)$
backup ^.*(~|\.~[0-9]+~|\.bak|\.orig|\.rej|\.original|\.modified|\.reject)$
precious ^(\+.*|\.gdbinit|=build\.*|=install\.*|CVS|CVS\.adm|RCS|RCSLOG|SCCS|TAGS)$
unrecognized ^(.*\.(o|a|so|core)|core)$
source ^([_=a-zA-Z0-9].*|\.arch-ids|\{arch\}|\.arch-project-tree)$
arch: The arch Revision Control System