dbdbd is a tool for reading and writing simple flat-text data files. A dbdbd data file has one record per line, plus optional comments, and can be edited by hand as well as manipulated with dbdbd. In fact, the main goal of dbdbd is to provide a semi-automated alternative to ad hoc text-parsing scripts while still allowing data files to be edited by hand when desired.
dbdbd is called a "database definer" because it lets you create arbitrarily many database interfaces. The way it works is that for every file or group of files whose lines conform to a given pattern, you can create a database object based on an appropriate scanf format string. That format string is then used to read in your data and (in slightly tweaked form) to write it out again.
The scanf format string thus determines the types of your data fields. For instance, if you've got a file where each line has a number followed by two strings, like this:
1 banana yellow
2 orange orange
3 apple red
you could use the format string "%d%s%s" to tell your database object that your records consist of one numeric and two string fields. All manipulation of files and records through that particular database object will be based on that definition of the fields.
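To make the link between specifiers and field types concrete, here is a minimal sketch that converts one line according to a format string. It does not use dbdbd or the scanf library; `convert` and `parse_line` are hypothetical helpers written for illustration, and they handle only whitespace-separated %d, %s, and %f fields.

```ruby
# Hypothetical helpers, not part of dbdbd or scanf for Ruby.

# Convert a single token according to its one-letter specifier.
def convert(spec, token)
  case spec
  when "d" then Integer(token, 10)
  when "f" then Float(token)
  else token
  end
end

# Turn one line into an array of typed fields, or nil if it does not fit.
def parse_line(format, line)
  specs  = format.scan(/%([dsf])/).flatten   # "%d%s%s" -> ["d", "s", "s"]
  tokens = line.split                         # leading whitespace is ignored
  return nil unless tokens.size == specs.size

  specs.zip(tokens).map { |spec, token| convert(spec, token) }
rescue ArgumentError
  nil  # a token did not match its specifier
end

p parse_line("%d%s%s", "1 banana yellow")   # prints [1, "banana", "yellow"]
p parse_line("%d%s%s", "x banana yellow")   # prints nil
```

A real scanf-based reader is more flexible than this, but the idea is the same: the format string alone decides how many fields a record has and what type each one is.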
dbdbd is designed to streamline what might otherwise be a lot of ad hoc script-writing and file-parsing. In other words, if this looks all too familiar to you:
while (line = fh.gets)
  next if /^\s*#/.match(line)
  if /([\d.]+)\s+(\w+)\s+(\w+)\s+/.match(line)
    data["number"] = $1.to_i
    data["first_name"] = $2
    data["last_name"] = $3
  else
    puts "Malformed line: #{line}"
  end
end
then dbdbd may be just what you need :-)
dbdbd requires Ruby (it's being developed with version 1.6.7) and scanf for Ruby, version 1.1 or higher. Ruby is available at www.ruby-lang.org, and scanf for Ruby is available at www.rubyhacker.com/code/scanf.
For system-wide installation to your site_dir, you can just run:
ruby install.rb
(as superuser). Otherwise, put the file dbdbd.rb somewhere Ruby can find it at run time with a simple require "dbdbd" (i.e., in one of the directories listed in the $: variable), or else use the full path to load it (e.g., require "/path/to/dbdbd.rb").
Note: This section contains material that's covered in other sections. For a quicker start, you can skip straight to the "Annotated sample program" (below).
To use dbdbd, you first create a DBDB database object. In doing this, you provide a scanf format string. This format string governs the behavior of the database, in that it defines the number and type of the data fields contained in each record.
The database object has numerous methods which allow you to add, change, and remove records, as well as to read records from files and write the database to a file.
To save the database, you "tie" the object to a file, and call the sync method. The file created (or updated) will be formatted in conformance with the dbdbd file formatting rules (see below).
To read in the records from a pre-existing file, you "tie" the object to the file and call the read method. A pre-existing file can be one that you've saved previously, or one that you've created or edited by hand, or one that's been handled both ways. dbdbd only cares that the file conform to the dbdbd formatting rules at the time that the file is read in.
Each record in the database object consists of one or more fields. (Record/field is essentially equivalent to row/column.) Each field corresponds to one specifier in the scanf string; that specifier (%s, %d, %50c, etc.) determines what type of object (string, fixed-width string, integer (decimal, hex, or octal), float) the field will be.
A record can have an associated comment, which can be more than one line long. If there's a comment just before a record in a file, dbdbd will associate that comment with that record, so that they are written out together when the file is updated. Your program can also add a comment to a record on the fly; that comment will then appear in the file when you save the database.
Your fields can have names, though they don't have to. If there's a field-name line in the file (see below), it will be parsed and the fields will be given those names. You can also name fields from your program, and the field names will be saved to file (when you call sync) in the form of a field-name line.
There must be a unique field in your database -- that is, a column which has a different value for each row (such as a Social Security number). By default, this is the first field, but you can assign this role to a different field.
On output, your records get sorted. By default, they are sorted by the first field, but you can specify a different sort field.
A dbdbd-conformant file can also have a comment block at the beginning (the header), as well as arbitrary "endmatter" following a line consisting of "END".
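As a rough illustration of the "END" convention, the following sketch separates a file's lines into the part that holds records and the endmatter. `split_endmatter` is a hypothetical helper, not a dbdbd method, and it assumes the marker is a line containing exactly END.

```ruby
# Hypothetical helper, not part of dbdbd.
# Returns [lines_before_END, lines_after_END]; the END line itself is dropped.
def split_endmatter(lines)
  cut = lines.index { |line| line.chomp == "END" }
  return [lines, []] if cut.nil?

  [lines[0...cut], lines[(cut + 1)..-1]]
end

body, endmatter = split_endmatter(["1 banana yellow\n", "END\n", "Free-form notes\n"])
puts body.size        # prints 1
puts endmatter.first  # prints Free-form notes
```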
You can create and/or modify dbdbd data files by hand, as long as you leave them in a state that makes sense to dbdbd when they're read back in again. What follows is a description of what dbdbd expects to find in a file you ask it to read.
(In what follows, "comment" means lines starting with # or whitespace and #.)
Every dbdbd data file consists of the following components:
# Phone number file
# September, 2002
You can set the header with DBDB.header=(str).
# Last_name First_name Area_code Number
(See below for more information on field names.)
Black David 123 456-7890
You can type your data in unevenly:
Black     David 123    456-7890
Peel Emma    456 456-7890
and dbdbd will neaten it up on subsequent output. The exception to this is fixed-width fields, which you have to make sure are lined up correctly when editing by hand.
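The text does not spell out the exact output layout, so the following is only a sketch of one plausible way to "neaten" records: pad every column to the width of its longest value. `neaten` is a hypothetical helper, not dbdbd's output routine, and it assumes every row has the same number of fields.

```ruby
# Hypothetical formatter, not dbdbd's own output code.
def neaten(rows)
  # Width of each column is the length of its longest value.
  widths = rows.transpose.map { |column| column.map { |value| value.to_s.length }.max }

  rows.map do |row|
    row.each_with_index.map { |value, i| value.to_s.ljust(widths[i]) }.join(" ").rstrip
  end
end

puts neaten([%w[Black David 123 456-7890], %w[Peel Emma 456 456-7890]])
# prints:
# Black David 123 456-7890
# Peel  Emma  456 456-7890
```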
Comments may be interspersed with the data.
# I owe this guy a phone call.
Doe John 999 343-4343
Note that comments travel with the line that follows them. So if the file gets resorted, the comment line (or lines) directly above John Doe's data will still show up in that position. If a comment appears after all the data lines (i.e., right before end-of-file or END), that comment will be kept in that place.
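The way comments travel with records can be sketched as follows. This is a simplified model, not dbdbd's parser: `attach_comments` is a hypothetical helper that ignores the header and field-name line entirely and only shows how a run of comment lines might be attached to the data line that follows it.

```ruby
# Hypothetical sketch, not dbdbd's parser.
def attach_comments(lines)
  pending = []   # comment lines seen since the last data line
  records = []
  lines.each do |line|
    if line =~ /\A\s*#/
      pending << line.chomp
    elsif line.strip.empty?
      next                       # blank lines are skipped
    else
      records << { data: line.split, comment: pending }
      pending = []
    end
  end
  { records: records, trailing: pending }  # trailing = comments after all data
end

parsed = attach_comments([
  "Peel Emma 456 456-7890\n",
  "# I owe this guy a phone call.\n",
  "Doe John 999 343-4343\n"
])

# Re-sort by last name; the comment stays with John Doe's record.
parsed[:records].sort_by { |rec| rec[:data][0] }.each do |rec|
  rec[:comment].each { |comment| puts comment }
  puts rec[:data].join(" ")
end
```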
Blank lines are skipped, and are not preserved when the database is next written out.
Your file may have a header, which is a comment block at the top of the file. The header may also contain blank lines.
The field-name line is also a commented-out line, and is also optional.
The header and field-name line can interact in undesirable ways unless you're careful. The field-name line is defined as a comment line exactly one line above the first data line (the first data line being the first non-commented, non-blank line in the file). This means that if you do this:
# This is a header, consisting of
# two lines
123 David Black
the "two lines" line will be parsed as a field-name line, producing field names you don't want.
Therefore, if you have a header but don't want a field-name line, put a blank line after the header:
# This is a header, consisting of
# two lines

123 David Black
dbdbd will now be able to tell that you did not intend there to be a field-name line.
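The rule described above (a comment line exactly one line above the first data line) can be sketched in a few lines. `field_name_line` is a hypothetical helper written for illustration; it is not how dbdbd itself is implemented.

```ruby
# Hypothetical sketch of the field-name rule, not dbdbd's code.
# Returns the field names, or nil if there is no field-name line.
def field_name_line(lines)
  first_data = lines.index { |line| !line.strip.empty? && line !~ /\A\s*#/ }
  return nil if first_data.nil? || first_data.zero?

  candidate = lines[first_data - 1]
  return nil unless candidate =~ /\A\s*#/   # a blank line here means "no field names"

  candidate.sub(/\A\s*#\s*/, "").split
end

without_blank = ["# This is a header, consisting of", "# two lines", "123 David Black"]
with_blank    = ["# This is a header, consisting of", "# two lines", "", "123 David Black"]

p field_name_line(without_blank)  # prints ["two", "lines"]
p field_name_line(with_blank)     # prints nil
```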
Whitespace at the beginning of a data line is ignored. Whitespace between fields serves as a separator -- except in the case of fixed-width fields, where the whitespace counts toward the width count.
This section starts with a sample program, and then goes on to more detailed discussion of how to program with dbdbd. You can also see examples in the test and sample subdirectories of the distribution.
Here's a sample dbdbd "session" which will illustrate many of the things you're likely to need to do. Following this program is more detailed information on the process of programming with dbdbd.
require 'dbdbd'

# Create a DBDB object, with appropriate format string:
db = DBDB.new("%d%s%s")

# Give names to your fields (optional)
db.fields = %w{number first_name last_name}

# Associate the database with a file. (No reading or writing yet.)
db.tie("somefile")

# Create some records:
db.insert(100, "John", "Doe")
db.insert(200, "Jane", "Doe")
db.insert(300, "Joan", "Doe")

# Change the first_name field of the record at 100.
db[100]["first_name"] = "Jack"

# Save database to disk (must be done manually):
db.sync

# Now test by reading it in:
db2 = DBDB.new("%d%s%s")
db2.tie("somefile")
db2.read

# Access the "first_name" field of a record:
puts db2[100]["first_name"] # => Jack

# Alternative way to access, by sequential number of the field:
puts db2[100][2] # => Jack

# Add a header (opening comment block) to the database:
db2.header = "Sample data file"

# Save the database back to disk:
db2.sync
The basic operation for creating a dbdbd database object is to call DBDB::new, passing in a scanf format string and (optionally) an array of field names. (You can also set the field names later, or not at all.)
The format string you pass in on creation of a database object is the key to the whole database. To design it, you need to determine what the fields in your dataset actually are, and then concatenate the appropriate scanf specifiers.
In many cases, you will use one of three specifiers: %s, %d, and %c.
There's also %f for float, %o for octal integer, and %x for hex integer.
(Note to scanf connoisseurs: you can probably do a lot more than is described here, since dbdbd uses a straightforward scanf operation on each line. I haven't yet pushed the envelope much on this in my own tests.)
If you want your fields to be of fixed width, you can use the %c scanf specifier with an integer width modifier. This will sponge up as many characters as you ask it to, saving them as a string. For example:
db = DBDB.new("%12c%25c%d")
will break each line into a 12-char string, a 25-char string, and a decimal integer. As always, space at the beginning of a line does not count. (Also, in this particular example, space between the second string and the integer will not matter; dbdbd will scan forward and find the integer.)
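To show what this kind of fixed-width split amounts to, here is a sketch that slices a line by hand instead of calling scanf. `parse_fixed` is a hypothetical helper, and the widths 12 and 25 are hard-coded to mirror the "%12c%25c%d" example; dbdbd's real behavior comes from scanf and may differ in edge cases.

```ruby
# Hypothetical sketch, not dbdbd's implementation.
# Splits a line into a 12-character string, a 25-character string,
# and a decimal integer, or returns nil if the line is too short.
def parse_fixed(line)
  body   = line.lstrip          # space at the beginning of a line does not count
  first  = body[0, 12]
  second = body[12, 25]
  rest   = body[37..-1]
  return nil if second.nil? || rest.nil?

  number = rest[/\A\s*([-+]?\d+)/, 1]   # skip any space before the integer
  return nil if number.nil?

  [first, second, number.to_i]
end

line = "Black".ljust(12) + "David Alan".ljust(25) + "   42"
p parse_fixed(line)
```

Note that the padding inside each fixed-width field is kept as part of the string, which is why such fields have to be lined up exactly when you edit a file by hand.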
There are a few things you might want to do right after creating the database object. Technically you can do them at any point, but since they affect how data are read, stored, and written, it's probably a good idea to change them -- if you need to -- early in the program run.
One of the fields must have unique values across all records. This field is the "key". You can specify which field is the key by setting the attribute uniq_key:
db = DBDB.new("%s%s%s")
db.uniq_key = 3

# Now you can repeat values in fields 1 and 2, but not field 3:
db.insert(%w{ John T. Doe })
db.insert(%w{ John T. Smith })
db.insert(%w{ John T. Jones })
By default, the first field is the key.
(Be sure you change the key before you insert any records with duplicate values in the current key field. Otherwise, you'll overwrite data.)
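The overwriting risk is easy to see if you picture the records as a Hash keyed on the unique field. The sketch below is only a model of that idea; `insert_all` is a hypothetical helper, not dbdbd's internals.

```ruby
# Hypothetical model, not dbdbd's internals: records live in a Hash
# keyed on one field, so a repeated key value replaces the earlier record.
def insert_all(rows, key_position)
  rows.each_with_object({}) do |row, store|
    store[row[key_position - 1]] = row   # positions are 1-based, as in the text
  end
end

rows = [%w[John T. Doe], %w[John T. Smith], %w[John T. Jones]]

puts insert_all(rows, 1).size   # prints 1: every insert replaced the previous record
puts insert_all(rows, 3).size   # prints 3: the last names are all distinct
```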
By default, dbdbd sorts your data on the uniq_key field (see above) on output. You can change this by setting the sort_key attribute. You can use either a field name or a field position:
db.sort_key = "last_name"
# or
db.sort_key = 3
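As a sketch of what accepting "either a field name or a field position" could look like, the helper below resolves the key to a column index and sorts on it. `sort_rows` is hypothetical and stands in for whatever dbdbd does internally; positions are treated as 1-based to match the examples in this document.

```ruby
# Hypothetical helper, not dbdbd's API.
def sort_rows(rows, field_names, sort_key)
  index = sort_key.is_a?(Integer) ? sort_key - 1 : field_names.index(sort_key)
  unless index && (0...field_names.size).cover?(index)
    raise ArgumentError, "unknown sort key: #{sort_key.inspect}"
  end

  rows.sort_by { |row| row[index] }
end

names = %w[number first_name last_name]
rows  = [[100, "Jane", "Roe"], [200, "John", "Doe"]]

sort_rows(rows, names, "last_name").each { |row| puts row.join(" ") }
# prints:
# 200 John Doe
# 100 Jane Roe
```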
Your fields or "columns" can be assigned names. You can do this either at database creation:
db = DBDB.new("%d%s%s", %w{ number first_name last_name })
or later:
db = DBDB.new("%d%s%s")
# ... later ...
db.fields = %w{ number first_name last_name }
(Field names may not contain whitespace.)
The field names in force at the time of a "sync" operation will be saved to file in the form of a field-name line (described in "Editing dbdbd files by hand"). And if there's a field-name line in a file at the time of a "read" operation, the fields will be given the names from that line.
You can read records from a dbdbd-conformant file with the read method. A single database object can be tied sequentially to different files and read all of them in.
To create a new record, or replace an old one, use the method insert, passing in the field values:
db.insert(123, "David", "Black")
A record is retrieved by its key (unique field). By default this is the first field:
rec = db[123]
For data retrieval, a record can be indexed either by field name, or by sequential field number:
rec = db[100]
puts rec["first_name"] # => John
puts rec[2] # => John
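The two indexing styles can be modeled with a small class. This `Record` class is a hypothetical stand-in written for illustration, not the class dbdbd actually uses; it treats integer indexes as 1-based field positions, matching the examples above.

```ruby
# Hypothetical Record class, not dbdbd's own.
class Record
  def initialize(names, values)
    @names  = names
    @values = values
  end

  # Look up a field by name (String) or by 1-based position (Integer).
  def [](field)
    index = field.is_a?(Integer) ? field - 1 : @names.index(field)
    index && index >= 0 ? @values[index] : nil
  end
end

rec = Record.new(%w[number first_name last_name], [100, "John", "Doe"])
puts rec["first_name"]  # prints John
puts rec[2]             # prints John
```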
The sync method writes the whole database to the file to which it is "tied".
I'd love to hear from anyone using dbdbd. I don't have plans to make it much more elaborate than it is, but I wouldn't mind fixing bugs :-) Also if you do have any ideas about making it better, please let me know.
dbdbd is by David Alan Black (dblack@candle.superlink.net).
Copyright (c) 2002, David Alan Black.
You may distribute this distribution unchanged. You may make changes to this distribution, as long as you: (a) label every change clearly as a change; (b) change the version number non-trivially so that it is clear that this is a forked dbdbd and not part of the present or future main development branch (e.g., 0.2.0 becomes JoeG0.0.3); (c) retain this paragraph, and the above copyright notice, both without alteration, in your distribution; (d) include a URL for the dbdbd home page (knossos.shu.edu/dblack/dbdbd) in your documentation (which will happen if you follow (c)).
You use dbdbd entirely at your own risk. The author takes absolutely no responsibility for anything that might happen to you, your data, or anyone or anything else as a result of your using this software, or any derivative of it.