Directory and File Structure of Avida

This document contains a guide to all of the files present in avida, and where they are located.

Filenames

Source code files in avida follow a standard naming convention. A file that ends in ".hh" is called a "header" file, while one that ends in ".cc" is a "code" file. Headers tend to contain class descriptions with only the declarations of their methods, while their corresponding code files contain the method definitions. When one file needs to use a class defined in another, all that it needs to do is include the header file. Since the bulk of the source code will be in the code file, this minimizes how much code has to be recompiled.

When you compile a program in C++, it goes through a "compilation" phase and then a "link" phase. The compilation phase takes each code (.cc) file and compiles it independently into an object (.o) file. In the link phase, all of these compiled object files are linked together into a single executable (such as "primitive").

Since the bodies of the methods are only in the code files, they only need to be compiled once into a single object file. If you place a function body in the header file, it will get compiled again each time another code file includes that header. Since a header will often be included because only one or two methods from a class are required, this can create real bloat -- a function will be compiled as long as its body is included, even if the method is never directly called within the object file being created.

For example: The cOrganism class is declared in the file "organism.hh" and then fully defined in "organism.cc". When the latter file is compiled, it creates the object file "organism.o". Both the cPopulation class ("population.cc") and the cTestCPU class ("test_cpu.cc") use the cOrganism class, but its methods are in "organism.cc", so they only need to be compiled once, and are then linked in at the link stage.
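
The same split can be sketched in a single listing. This is only an illustration, not avida's actual code: the class cCounter, its methods, and the file names in the comments are all hypothetical. The comments mark which part would live in a ".hh" file and which in the matching ".cc" file.

```cpp
// ---- would live in "counter.hh" (hypothetical file name) ----
// Declarations only: other files include this to learn the interface.
class cCounter {
private:
  int count;
public:
  cCounter();
  void Increment();
  void Decrement();
  int GetCount() const;
};

// ---- would live in "counter.cc" (hypothetical file name) ----
// Definitions: compiled once into "counter.o", then linked in.
// In a real code file, an #include of "counter.hh" would come first.
#include <cassert>

cCounter::cCounter() : count(0) { }
void cCounter::Increment() { ++count; }
void cCounter::Decrement() { assert(count > 0); --count; }
int cCounter::GetCount() const { return count; }
```

Any other code file that included the hypothetical "counter.hh" could call these methods without their bodies being compiled a second time.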

Occasionally you want short functions to have their bodies directly in the header file. When a function compiled in one object file is called from another, the linker basically points the caller to the location of that function. A few extra CPU cycles are spent each time the program goes to find that function and starts it up. A small function can be made "inline", which means it will be placed, as a whole, right inside of the function that calls it. If the function is short enough, it only takes up as much space as the call to it would have taken anyway, and hence does not increase the size of the executable.
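
As a rough sketch of how this looks in C++: a method whose body is written inside the class declaration is implicitly inline, while a longer method is only declared there and defined in the code file. The class cRange and the function Clamp below are hypothetical and are not taken from the avida source.

```cpp
#include <cassert>

// ---- hypothetical header contents ----

// A free function marked "inline" can be defined in a header that many
// code files include without causing duplicate-definition link errors.
inline int Clamp(int value, int min, int max)
{
  assert(min <= max);
  if (value < min) return min;
  if (value > max) return max;
  return value;
}

class cRange {
private:
  int min;
  int max;
public:
  cRange(int in_min, int in_max) : min(in_min), max(in_max) { assert(min <= max); }

  // Bodies written inside the class are implicitly inline; the compiler
  // may paste them into the caller instead of emitting a function call.
  int GetMin() const { return min; }
  int GetMax() const { return max; }

  // Longer method: only declared here; its body belongs in the code file.
  int CountInside(const int * values, int num_values) const;
};

// ---- hypothetical code-file contents ----
int cRange::CountInside(const int * values, int num_values) const
{
  int total = 0;
  for (int i = 0; i < num_values; i++) {
    if (values[i] >= min && values[i] <= max) total++;
  }
  return total;
}
```

Note that "inline" is a request rather than a command; the compiler is free to generate an ordinary call if it judges the function too large.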

This will probably start to make more sense when you are directly modifying the code.

Directory: current/

All of the files for the current version of avida reside in the directory labeled "current/", which is automatically created when you check avida out of the CVS. In addition to the subdirectories "doc/", "source/" and "work/" (all described below), this directory contains several key information and automatic compilation files. The most important of these are described here.

configure This is a script used to choose which portions of avida should be compiled and how. In particular, it allows the user to choose "--enable-debug" for debugging options (which slow avida down substantially, but turn on checks to make sure nothing is going wrong) or "--enable-viewer" for the ncurses viewer and "--enable-qt-viewer" for the graphical viewer. Several other options can be listed by typing "./configure --help" in this directory.
Makefile This file performs the compilation after the configure script is run. When you make changes to the source code, you can type "make" to process the Makefile without needing to re-configure.
AUTHORS This file contains information about the authorship of avida.
COPYING
COPYING.gpl
These files contain copyright information.
README A general guide on how to get started once you put the avida files on your machine.
README.cvs How-to guide on setting up avida from the CVS
INSTALL How to install avida on your machine.
NEWS Ideally a list of all recent changes to avida. In practice, it's not being kept as up-to-date as it should be.

Directory: current/work/

After you type "make install", this directory will contain all of the configuration files for avida (explained in more detail in their own documentation files). The key files and directories here are:

analyze.cfg The default file used to write analysis scripts.
environment.cfg This file contains the default environment information.
events.cfg This file contains the default event list.
genesis This is the main configuration file that is used by default.
inst_set.default This is the main, heads-based instruction set that is used by default.
organism.default This file contains a short, default starting ancestor. Watch out because sometimes it can get stuck at a small length and not acquire any tasks.
organism.heads.100 This is a longer (length 100), hand written ancestor. It is nearly identical to organism.default, but contains a long sequence of the "nop-C" instruction to pad its length.
genebank/ This directory is created when you run avida. If you save any individual genotypes during the course of a run, their files will be placed here by default.

Directory: current/source/

This is a large sub-directory structure that contains all of the source code that makes up avida. Each sub-directory here will include its own Makefile information. The only two important files directly in this directory are:

LEVELS This is a "levelization" map of most of the avida files. It is good programming technique to keep track of which files depend on each other, and to make sure there are no "circular dependencies". This file keeps track of all of the dependencies, placing each component on a level, where they only depend on files in lower levels. See the levelization map for more information on this.
defs.hh This is a header file that contains all of the definitions required globally throughout the avida source.

Directory: current/source/main/

This sub-directory contains all of the core source code files for the software. For ease, I'm listing them in three groups of "more important components", "less important components", and "utility components", each in alphabetical order. The syntax "name.??" indicates that I am referring to both of the files, "name.hh" and "name.cc". The more important files are:

analyze.?? These files define the class cAnalyze and associated helper classes, which work together to run the analyze mode in avida. All of the code to process the analyze.cfg file is located here.
avida.?? These are the main files that manage an avida run. They process command line arguments, set up global classes, and hand control over to the driver class that will actually run the avida experiment.
config.?? These files define the cConfig object that maintains the current configuration state of avida. This class is initialized by the genesis file, but can later be modified by the event list.
environment.?? These files define the cEnvironment object, which controls all of the environmental interactions in an avida run. It makes use of reactions, resources, and tasks.
genebank.?? The cGenebank object, defined here, keeps track of genotype (and species) formation in the population. Every time a new organism is born, the genebank assigns it a genotype.
genome.?? The cGenome object maintains a sequence of objects of class cInstruction.
genotype.?? The cGenotype object maintains statistics about those organisms that have the associated identical genome.
inst.?? The cInstruction class is very simple, maintaining a single instruction in avida.
inst_lib.?? The cInstLib class associates instructions with their corresponding functionality in the hardware, and keeps track of the full set of possible instructions available in the current run.
mutations.?? These files contain the cMutationRates class, which maintains the probability of occurrence for each type of mutation.
organism.?? The cOrganism class represents a single organism, and contains the initial genome of that organism, its phenotypic information, its virtual hardware, etc.
phenotype.?? The cPhenotype class maintains information about what a single organism has done over the course of its life.
population.?? The cPopulation class manages the organisms that exist in an avida population. It maintains a collection of cPopulationCell objects (either as a grid, or as independent cells for mass action) and contains the scheduler, genebank, event manager, etc.
population_cell.?? A cPopulationCell is a single location in an avida population. It can contain an organism, and has its own mutation rates (but not yet its own environment.)
stats.?? A cStatistics object keeps track of many different population-wide statistics.

Next, we have the less important files, which are basically various forms of helper classes:

fitness_matrix.?? The cFitnessMatrix class is used to calculate the quasi-species around a genotype.
landscape.?? The cLandscape class is used to study the local region of genotype space around a genotype. All possible single (or more, if desired) mutations are performed, and statistics are kept about the results.
pop_interface.?? The cPopulationInterface class is used by organisms to interact back with the population (or test CPU).
primitive.cc This file defines the primitive avida driver, when no interactive user interface is being used.
reaction.?? The cReaction class contains all of the information for what triggers a reaction, its restrictions, and the process that occurs.
reaction_result.?? The cReactionResult class contains all of the information about the results of a reaction after one occurs, such as the amount of resources consumed, what the merit change is, what tasks triggered it, etc.
resource.?? The cResource class contains information about a single resource, such as its inflow rate, outflow, name, etc.
resource_count.?? The resource count keeps track of how much of each resource is present in the region being tracked.
species.?? The cSpecies class represents a group of genotypes that can be crossed over with a high probability of producing a viable result.
tasks.?? These files contain all of the information associated with tasks, including the cTaskLibrary, and the cTaskEntry classes. A task entry contains the task name and a pointer to the function that will check its completion.

Finally, we have the utility classes. A utility class differs from a normal class in that it includes only methods and no data. This is basically for collecting functions that don't need to be part of a class, but that I want to keep nice, neat and organized.

analyze_util.?? The cAnalyzeUtil class contains functions to analyze the state of avida populations.
callback_util.?? The cCallbackUtil class contains functions to enable the cPopulationInterface to interact smoothly with the population.
genome_util.?? The cGenomeUtil class contains functions to analyze individual genomes, such as scanning if they contain a specific instruction or to find the edit distance between two genomes.
inst_util.?? The cInstUtil class is for functions that use both a genome and the instruction library, such as loading a new genome or randomizing a genome.
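
A utility class in this sense can be sketched as a class whose members are all static, so no object ever needs to be created. The example below is only an illustration: cExampleUtil and its functions are hypothetical, genomes are stood in for by std::string rather than avida's cGenome, and a simple Hamming distance is used in place of a full edit distance.

```cpp
#include <cassert>
#include <string>

// Hypothetical utility class: static methods only, no data members.
class cExampleUtil {
public:
  // Does the sequence contain the given symbol anywhere?
  static bool HasSymbol(const std::string & seq, char symbol)
  {
    return seq.find(symbol) != std::string::npos;
  }

  // Count the positions at which two equal-length sequences differ
  // (Hamming distance, a simpler stand-in for a true edit distance).
  static int FindHammingDistance(const std::string & seq1,
                                 const std::string & seq2)
  {
    assert(seq1.size() == seq2.size());
    int distance = 0;
    for (std::string::size_type i = 0; i < seq1.size(); i++) {
      if (seq1[i] != seq2[i]) distance++;
    }
    return distance;
  }
};
```

Callers would write cExampleUtil::HasSymbol(seq, 'c') directly, without ever constructing a cExampleUtil object.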

Directory: current/source/cpu/

This sub-directory contains the files used to define the virtual CPUs in Avida.

cpu_defs.hh This header file only contains definitions relating to virtual CPUs.
cpu_memory.?? The cCPUMemory class inherits from the cGenome class, extending its functionality to facilitate insertions and deletions. It also associates flags with each instruction in the genome to mark if they have been executed, copied, mutated, etc.
cpu_stack.?? The cCPUStack class is an integer-stack component in the virtual CPUs.
cpu_stats.hh The cCPUStats class tracks mutations that have occurred during an organism's life, and the number of times each instruction has been executed.
hardware_base.?? The cHardwareBase class is an abstract base class that all other hardware types must be derived from. It has minimal built-in functionality.
hardware_cpu.?? The cHardwareCPU class extends cHardwareBase into a proper virtual CPU, with registers, stacks, memory, IO Buffers, etc.
hardware_factory.?? The cHardwareFactory manages the building of new hardware as well as the recycling of old hardware after an organism dies. Recycling hardware rather than making a new one each time can provide a performance increase.
hardware_util.?? The cHardwareUtil class is a utility class that currently only manages the loading of instruction sets into a virtual CPU.
head.hh The cCPUHead class implements a head pointing to a position in the memory of a virtual CPU.
label.hh The cLabel class marks labels (series of no-operation instructions) in a genome. These are used when a label needs to be used as an instruction argument.
test_cpu.hh The cTestCPU class maintains a test environment for running organisms that we don't want to be able to directly affect the real population.
test_util.hh The cTestUtil utility class is for test-related functions that require a test CPU, such as printing out a genome to a file with collected information.
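
The relationship between an abstract hardware base class, a concrete hardware type derived from it, and a recycling factory might look roughly like the sketch below. All names here (cExampleHardwareBase, cExampleHardwareCPU, cExampleHardwareFactory) and their methods are hypothetical; the real avida classes have different and much larger interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical abstract base class: it cannot be instantiated directly.
class cExampleHardwareBase {
public:
  virtual ~cExampleHardwareBase() { }
  virtual void Reset() = 0;
  virtual void SingleProcess() = 0;  // Execute one instruction.
};

// Hypothetical concrete hardware derived from the base class.
class cExampleHardwareCPU : public cExampleHardwareBase {
private:
  int inst_count;  // Number of instructions executed so far.
public:
  cExampleHardwareCPU() : inst_count(0) { }
  void Reset() { inst_count = 0; }
  void SingleProcess() { inst_count++; }
  int GetInstCount() const { return inst_count; }
};

// Hypothetical factory that reuses the hardware of dead organisms.
class cExampleHardwareFactory {
private:
  std::vector<cExampleHardwareCPU *> recycled;
public:
  ~cExampleHardwareFactory()
  {
    while (!recycled.empty()) {
      delete recycled.back();
      recycled.pop_back();
    }
  }

  // Hand out recycled hardware if any is available; otherwise build new.
  cExampleHardwareCPU * Create()
  {
    if (recycled.empty()) return new cExampleHardwareCPU;
    cExampleHardwareCPU * hardware = recycled.back();
    recycled.pop_back();
    hardware->Reset();
    return hardware;
  }

  // Keep the hardware of a dead organism for later reuse.
  void Recycle(cExampleHardwareCPU * hardware)
  {
    assert(hardware != 0);
    recycled.push_back(hardware);
  }
};
```

The possible performance gain comes from skipping the allocation and construction of a new object whenever a recycled one is waiting; whether it matters in practice depends on how expensive the hardware is to build.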

Directory: current/source/events/

The events sub-directory contains files that manage the scheduling and processing of user-defined events. Many of the files here are actually automatically generated by the make_events.pl script, which is written in the language Perl, and automatically run by the Makefile. This script will convert cPopulation.events into the proper C++ files.

Directory: current/source/tools/

The tools sub-directory contains C++ source code that is used throughout avida, but is not specific to the project. I will list the proper classes first, and then the template classes.

assert.?? A collection of tests that start up the debugger if any of the tests fail. When running the code in an optimized mode, these tests are skipped to speed up execution.
data_entry.?? Associates data names with functions for printing out data files in a user-specified format.
data_manager.?? This class manages a collection of data entries and handles the creation and output of user-designed data files at runtime.
datafile.?? A set of classes useful for handling output files, and a manager to track output files by name.
debug.?? This is used to keep track of problems with the code. You can call methods in a debug object to set errors, warnings or leave comments. In practice, assert is cleaner.
file.?? A set of classes useful for loading input files and removing comments.
functions.hh Some useful math functions such as Min, Max, and Log.
help.?? This still needs to be integrated into avida proper (it already sees some use in analyze mode), but it can output useful HTML help files.
merit.?? Provides a very large integer number, dissectable in useful ways.
random.?? A powerful random number generator, that can output numbers in a variety of formats.
slice.?? A scheduling class that can operate at high precision.
stat.?? A set of classes used to collect statistics on data.
string.?? A standard string object, but with lots of functionality.
string_list.?? A specialized class for collections of strings, with added functionality over a normal list.
string_util.?? Contains a bunch of static methods to manipulate and compare strings.
tools.hh A collection of global objects (for debug, random, and default directory) and a flags class.

Templates are special classes that interact with another data-type that doesn't need to be specified until the programmer instantiates an object of the class. It's a hard concept to get used to, but it allows for remarkably flexible programming, and makes for very reusable code. The main drawback (other than brain-strain) is that templates must be entirely defined in header files since separate code is generated for each class the template interacts with.

tArray.hh A fixed-length array template; array sizes may be adjusted manually when needed.
tBuffer.hh A container that keeps only the last N entries, indexed with the most recent first.
tDictionary.hh A container template that allows the user to search for a target object based on a keyword.
tList.hh A reasonably powerful linked list and iterators. The list will keep track of the iterators and never allow them to have an illegal value.
tMatrix.hh A fixed size matrix template with arbitrary indexing.
tMemTrack.hh This is a template that can be put over any class or data type to keep track of it. If all creations of objects in the class are done through this template rather than (or in conjunction with) "new", memory leaks should be detectable. This is new, and not yet used in avida.
tVector.hh A variable-length array object; array sizes will be automatically adjusted to accommodate any positions accessed in it.
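
To give a feel for what such a template looks like, here is a minimal sketch of a container that keeps only the last N entries, with index 0 being the most recent. It is written from the one-line description of tBuffer above and is not the actual avida code; the name tExampleBuffer and all of its methods are hypothetical.

```cpp
#include <cassert>

// Hypothetical template: remembers only the last few values added to it.
// Index 0 is the most recent entry, index 1 the one before it, and so on.
template <class T> class tExampleBuffer {
private:
  T * data;        // Circular storage for the entries.
  int capacity;    // Maximum number of entries remembered.
  int num_stored;  // Number of entries currently held.
  int last_pos;    // Position in data of the most recent entry.

  // Copying is disabled to keep the sketch short.
  tExampleBuffer(const tExampleBuffer &);
  tExampleBuffer & operator=(const tExampleBuffer &);
public:
  explicit tExampleBuffer(int in_capacity)
    : data(0), capacity(in_capacity), num_stored(0), last_pos(0)
  {
    assert(capacity > 0);
    data = new T[capacity];
  }
  ~tExampleBuffer() { delete [] data; }

  // Store a new value, overwriting the oldest one once the buffer is full.
  void Add(const T & value)
  {
    last_pos = (last_pos + 1) % capacity;
    data[last_pos] = value;
    if (num_stored < capacity) num_stored++;
  }

  int GetNumStored() const { return num_stored; }

  // Step backwards from the most recent entry, wrapping around the array.
  const T & operator[](int index) const
  {
    assert(index >= 0 && index < num_stored);
    return data[(last_pos - index + capacity) % capacity];
  }
};
```

Declaring both a tExampleBuffer<int> and a tExampleBuffer<double> would make the compiler generate two separate copies of this class, which is why the whole definition has to be visible in the header.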

Directories: current/source/viewers/ and current/source/qt-viewer/

These directories contain the source code for the avida viewers, which I'm not planning on going into at this point. The first directory contains the files for the text viewers (both the ncurses viewer and the Windows ASCII viewer), while the second one contains the files for the Qt graphical viewer that's still under development. If people are interested in this later in the course, I will come back to it in more detail.

Directory: current/source/support/

This directory contains the originals of all the files that are copied into the current/work/ directory during installation for the user to modify. There is also a config/ sub-directory under here with additional, optional configuration files that you may want to look at to see other possible pre-configured settings.

