dbdbd

David Black's DataBase Definer

David Alan Black

Version 0.2.2

December 26, 2003


Table of Contents

Description
Is dbdbd for you?
Requirements
Installation
Descriptive overview of dbdbd usage
Editing dbdbd files by hand
The formatting rules for dbdbd files
Further file format considerations
Programming with dbdbd
Annotated sample program
Creating a database
Tuning database attributes
Reading records from a pre-existing file
Creating, modifying, and accessing records
Writing the database to a file
Writing the database to STDOUT
Feedback and bug reports
Author
Copyright and License
Disclaimer
Version

Description

dbdbd is a tool for reading and writing simple flat-text data files. A dbdbd data file has one record per line, plus optional comments, and can be edited by hand as well as manipulated with dbdbd. In fact, the main goal of dbdbd is to provide a semi-automated alternative to ad hoc text-parsing scripts, in a way that still allows for the option of editing data files by hand when desired.

dbdbd is called a "database definer" because it lets you create arbitrarily many database interfaces. The way it works is that for every file or group of files whose lines conform to a given pattern, you can create a database object based on an appropriate scanf format string. That format string is then used to read in your data and (in slightly tweaked form) to write it out again.

The scanf format string thus determines the types of your data fields. For instance, if you've got a file where each line has a number followed by two strings, like this:

     1    banana   yellow
     2    orange   orange
     3    apple    red

you could use the format string "%d%s%s" to tell your database object that your records consist of one numeric and two string fields. All manipulation of files and records through that particular database object will be based on that definition of the fields.
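
To make the mapping from specifiers to field types concrete, here is a minimal sketch in plain Ruby. It uses neither dbdbd nor scanf: parse_line is a hypothetical helper that understands only %d and %s, and it is meant to illustrate the idea rather than to show how dbdbd is actually implemented.

```ruby
# Hypothetical helper, not part of dbdbd: converts one line into typed
# fields according to a format string made up only of %d and %s.
def parse_line(format, line)
  specifiers = format.scan(/%[ds]/)      # e.g. ["%d", "%s", "%s"]
  tokens = line.split                    # split on runs of whitespace
  return nil unless tokens.size == specifiers.size

  specifiers.zip(tokens).map do |spec, token|
    spec == "%d" ? Integer(token, 10) : token
  end
end

p parse_line("%d%s%s", "     1    banana   yellow")  # => [1, "banana", "yellow"]
p parse_line("%d%s%s", "     2    orange")           # => nil (wrong field count)
```

The point is only that the format string, not the data file, decides that the first field comes back as an integer and the other two as strings.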

Is dbdbd for you?

dbdbd is designed to streamline what might otherwise be a lot of ad hoc script-writing and file-parsing. In other words, if this looks all too familiar to you:

     while (line = fh.gets)
       next if /^\s*#/.match(line)
       if /([\d.]+)\s+(\w+)\s+(\w+)\s+/.match(line)
         data["number"] = $1.to_i
         data["first_name"] = $2
         data["last_name"] = $3
       else
         puts "Malformed line: #{line}"
       end
     end

then dbdbd may be just what you need :-)

Requirements

dbdbd requires Ruby (it's being developed with version 1.6.7) and scanf for Ruby, version 1.1 or higher. Ruby is available at www.ruby-lang.org, and scanf for Ruby is available at www.rubyhacker.com/code/scanf.

Installation

For system-wide installation to your site_dir, you can just run:

    ruby install.rb

(as superuser). Otherwise, put the file dbdbd.rb somewhere where Ruby can find it at run-time with a simple require "dbdbd" (i.e., somewhere in the load path held in the $: variable), or else use the full path to load it (e.g., require "/path/to/dbdbd.rb").

Descriptive overview of dbdbd usage

Note: This section contains material that's covered in other sections. For a quicker start, you can skip straight to the "Annotated sample program" (below).

To use dbdbd, you first create a DBDB database object. In doing this, you provide a scanf format string. This format string governs the behavior of the database, in that it defines the number and type of the data fields contained in each record.

The database object has numerous methods which allow you to add, change, and remove records, as well as to read records from files and write the database to a file.

To save the database, you "tie" the object to a file, and call the sync method. The file created (or updated) will be formatted in conformance with the dbdbd file formatting rules (see below).

To read in the records from a pre-existing file, you "tie" the object to the file and call the read method. A pre-existing file can be one that you've saved previously, or one that you've created or edited by hand, or one that's been handled both ways. dbdbd only cares that the file conform to the dbdbd formatting rules at the time that the file is read in.

Each record in the database object consists of one or more fields. (Record/field is essentially equivalent to row/column.) Each field corresponds to one specifier in the scanf string; that specifier (%s, %d, %50c, etc.) determines what type of object (string, fixed-width string, integer (decimal, hex, or octal), float) the field will be.

A record can have an associated comment, which can be more than one line long. If there's a comment just before a record in a file, dbdbd will associate that comment with that record, so that they are written out together when the file is updated. Your program can also add a comment to a record on the fly; that comment will then appear in the file when you save the database.

Your fields can have names, though they don't have to. If there's a field-name line in the file (see below), it will be parsed and the fields will be given those names. You can also name fields from your program, and the field names will be saved to file (when you call sync) in the form of a field-name line.

There must be a unique field in your database -- that is, a column which has a different value for each row (such as a Social Security number). By default, this is the first field, but you can assign this role to a different field.

On output, your records get sorted. By default, they are sorted by the first field, but you can specify a different sort field.

A dbdbd-conformant file can also have a comment block at the beginning (the header), as well as arbitrary "endmatter" following a line consisting of "END".

Editing dbdbd files by hand

You can create and/or modify dbdbd data files by hand, as long as you leave them in a state that makes sense to dbdbd when they're read back in again. What follows is a description of what dbdbd expects to find in a file you ask it to read.

The formatting rules for dbdbd files

(In what follows, "comment" means lines starting with # or whitespace and #.)

Every dbdbd data file consists of the following components:

Header
an optional comment section at the beginning:
     # Phone number file
     # September, 2002

You can set the header from your program by assigning to the header attribute of a DBDB object (e.g., db.header = str).

Field-name line
an optional commented-out line with the field names:
     # Last_name      First_name     Area_code     Number

(See below for more information on field names.)

Data and comments
Data are the lines of actual, ummm, data. Whitespace at the beginning of the line is ignored; in fact, all whitespace is stripped off during parsing (except for fixed-width fields; see below).
       Black          David          123           456-7890

You can type your data in unevenly:

         Black          David          123           456-7890
       Peel      Emma    456            456-7890

and dbdbd will neaten it up on subsequent output. The exception to this is fixed-width fields, which you have to make sure are lined up correctly when editing by hand.

Comments may be interspersed with the data.

     # I owe this guy a phone call.
       Doe          John            999          343-4343

Note that comments travel with the line that follows them. So if the file gets resorted, the comment line (or lines) directly above John Doe's data will still show up in that position. If a comment appears after all the data lines (i.e., right before end-of-file or END), that comment will be kept in that place.

Blank lines are skipped, and are not preserved when the database is next written out.

Endmatter
Optional final section, signalled by a line containing nothing but the string "END" (that is, a line matching /^END$/). You can put whatever you want after END, and dbdbd will ignore it (but will keep it intact when rewriting the file).

Malformed lines
The only thing dbdbd itself uses post-END space for is saving malformed lines (defined as those which have the wrong number of fields). They're saved here so that you can examine and fix them.
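
The rules for comments, blank lines, and END can be sketched in plain Ruby as follows. This is not dbdbd's own code: read_lines is a hypothetical helper, it assumes a reasonably current Ruby (for the <<~ heredoc), and it leaves out the header, the field-name line, and malformed-line handling.

```ruby
# Hypothetical sketch, not dbdbd's own code. Only comments, data lines,
# blank lines, and END are handled here.
def read_lines(lines)
  records = []          # each entry: { comments: [...], fields: [...] }
  pending = []          # comment lines waiting for the next data line
  endmatter = []
  in_endmatter = false

  lines.each do |raw|
    line = raw.chomp
    if in_endmatter
      endmatter << line                  # kept verbatim, never parsed
    elsif line =~ /^END$/
      in_endmatter = true
    elsif line.strip.empty?
      next                               # blank lines are dropped
    elsif line =~ /^\s*#/
      pending << line                    # travels with the next record
    else
      records << { comments: pending, fields: line.split }
      pending = []
    end
  end

  { records: records, trailing_comments: pending, endmatter: endmatter }
end

text = <<~TEXT
  # I owe this guy a phone call.
    Doe          John            999          343-4343

  Black          David          123           456-7890
  END
  anything at all
TEXT

result = read_lines(text.lines)
sorted = result[:records].sort_by { |r| r[:fields].first }
sorted.each do |r|
  puts r[:comments] + [r[:fields].join("  ")]
end
```

When the records are sorted for output, the comment moves along with John Doe's line, which is the behavior described under "Data and comments" above.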

Further file format considerations

The header and field-name line

Your file may have a header, which is a comment block at the top of the file. The header may also contain blank lines.

The field-name line is also a commented-out line, and is also optional.

The header and field-name line can interact in undesirable ways unless you're careful. The field-name line is defined as a comment line exactly one line above the first data line (the first data line being the first non-commented, non-blank line in the file). This means that if you do this:

    # This is a header, consisting of
    # two lines
      123       David     Black

the "two lines" line will be parsed as a field-name line, producing field names you don't want.

Therefore, if you have a header but don't want a field line, put a blank line after the header:

    # This is a header, consisting of
    # two lines

      123       David     Black

dbdbd will now be able to tell that you did not intend there to be a field-name line.
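
Here is a small plain-Ruby sketch of that rule, applied to the two files above. The helper field_names is hypothetical and is only my reading of the rule as stated; it is not taken from the dbdbd source.

```ruby
# Hypothetical sketch of the rule described above, not dbdbd's own code.
# Returns the field names, or nil if there is no field-name line.
def field_names(lines)
  # The first data line is the first non-blank, non-comment line.
  first_data = lines.index { |l| !l.strip.empty? && l !~ /^\s*#/ }
  return nil if first_data.nil? || first_data.zero?

  candidate = lines[first_data - 1]
  return nil unless candidate =~ /^\s*#/   # a blank line here means "no names"

  candidate.sub(/^\s*#/, "").split
end

without_blank = ["# This is a header, consisting of", "# two lines",
                 "  123       David     Black"]
with_blank    = ["# This is a header, consisting of", "# two lines", "",
                 "  123       David     Black"]

p field_names(without_blank)   # => ["two", "lines"]
p field_names(with_blank)      # => nil
```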

Whitespace inside data lines

Whitespace at the beginning of a data line is ignored. Whitespace between fields serves as a separator -- except in the case of fixed-width fields, where the whitespace counts toward the width count.

Blank lines

Blank lines among the lines of data (and their comments) are ignored, and are not preserved on output to file. They should therefore not be relied on for aesthetic or other purposes.

Blank lines in the header and the endmatter will be preserved.

Programming with dbdbd

This section starts with a sample program, and then goes on to more detailed discussion of how to program with dbdbd. You can also see examples in the test and sample subdirectories of the distribution.

Annotated sample program

Here's a sample dbdbd "session" which will illustrate many of the things you're likely to need to do. Following this program is more detailed information on the process of programming with dbdbd.

    require 'dbdbd'

    # Create a DBDB object, with appropriate format string:
      db = DBDB.new("%d%s%s")
    # Give names to your fields (optional)
      db.fields = %w{number first_name last_name}

    # Associate the database with a file.  (No reading or writing yet.)
      db.tie("somefile")
    # Create some records:
      db.insert(100, "John", "Doe")
      db.insert(200, "Jane", "Doe")
      db.insert(300, "Joan", "Doe")

    # Change the first_name field of the record at 100.
      db[100]["first_name"] = "Jack"
    # Save database to disk (must be done manually):
      db.sync

    # Now test by reading it in:
      db2 = DBDB.new("%d%s%s")
      db2.tie("somefile")
      db2.read
    # Access the "first_name" field of a record:
      puts db2[100]["first_name"]  # => Jack

    # Alternative way to access, by sequential number of the field:
      puts db2[100][2]  # => Jack
    # Add a header (opening comment block) to the database:
      db2.header = "Sample data file"

    # Save the database back to disk:
      db2.sync

Creating a database

The basic operation for creating a dbdbd database object is to call DBDB::new, passing in a scanf format string and (optionally) an array of field names. (You can also set the field names later, or not at all.)

Designing the scanf format string

The format string you pass in on creation of a database object is the key to the whole database. To design it, you need to determine what the fields in your dataset actually are, and then concatenate the appropriate scanf specifiers.

In many cases, you will use one of three specifiers: %s, %d, and %c.

%s
matches a string of non-whitespace characters.
%d
matches a decimal integer.
%c
matches a single character -- but it can also be modified with a number, in which case it serves as a general-purpose specifier for fixed-width fields. These merit a separate discussion; see below.

There's also %f for float, %o for octal integer, and %x for hex integer.

(Note to scanf connoisseurs: you can probably do a lot more than is described here, since dbdbd uses a straightforward scanf operation on each line. I haven't yet pushed the envelope much on this in my own tests.)

Fixed-width fields

If you want your fields to be of fixed width, you can use the %c scanf specifier with an integer width modifier. This will sponge up as many characters as you ask it to, saving them as a string. For example:

      db = DBDB.new("%12c%25c%d")

will break each line into a 12-char string, a 25-char string, and a decimal integer. As always, space at the beginning of a line does not count. (Also, in this particular example, space between the second string and the integer will not matter; dbdbd will scan forward and find the integer.)
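
To see what that split amounts to, here is a plain-Ruby sketch that imitates "%12c%25c%d" with string slicing. It does not call dbdbd or scanf; split_fixed is a hypothetical helper, and whether dbdbd trims the padding from fixed-width fields is not something this sketch tries to settle (it keeps the padding).

```ruby
# Hypothetical sketch, not dbdbd's own code: imitates "%12c%25c%d".
def split_fixed(line)
  body  = line.lstrip                       # leading whitespace does not count
  name  = body[0, 12]                       # first 12 characters, padding included
  title = body[12, 25]                      # next 25 characters, padding included
  count = Integer(body[37..-1].strip, 10)   # the rest, read as a decimal integer
  [name, title, count]
end

# Build a line whose columns really are 12 and 25 characters wide.
line = "   " + "Black".ljust(12) + "Ruby programmer".ljust(25) + "   42"
p split_fixed(line)
```

Because the fields are cut by position rather than by whitespace, a fixed-width field can contain spaces, which is also why the columns have to line up exactly when you edit such a file by hand.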

Tuning database attributes

There are a few things you might want to do right after creating the database object. Technically you can do them at any point, but since they affect how data are read, stored, and written, it's probably a good idea to change them -- if you need to -- early in the program run.

Setting the unique field (record key)

One of the fields must have unique values across all records. This field is the "key". You can specify which field is the key by setting the attribute uniq_key:

      db = DBDB.new("%s%s%s")
      db.uniq_key = 3

    # Now you can repeat values in fields 1 and 2, but not field 3:
      db.insert(%w{ John T. Doe })
      db.insert(%w{ John T. Smith })
      db.insert(%w{ John T. Jones })

By default, the first field is the key.

(Be sure you change the key before you insert any records with duplicate values in the current key field. Otherwise, you'll overwrite data.)
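
The overwriting is easy to picture if you think of the records as living in a hash indexed by the key field. The following plain-Ruby sketch is only an analogy (build_table is a hypothetical helper, and I'm not claiming dbdbd stores its records this way):

```ruby
# Hypothetical stand-in for a keyed table, not dbdbd's own code.
# key_position is 1-based, like the uniq_key value shown above.
def build_table(rows, key_position)
  table = {}
  rows.each { |row| table[row[key_position - 1]] = row }  # same key => replaced
  table
end

rows = [%w{ John T. Doe }, %w{ John T. Smith }, %w{ John T. Jones }]

p build_table(rows, 1).size   # => 1
p build_table(rows, 3).size   # => 3
```

Keyed on field 1, each "John" replaces the one before it and only the last record survives; keyed on field 3, all three are kept.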

Output record sort order

By default, dbdbd sorts your data on the uniq_key field (see above) on output. You can change this by setting the attribute sort_key. You can use either a field name or a field position:

      db.sort_key = "last_name"  # or db.sort_key = 3
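
As a rough illustration of what sorting by name or by position means, here is a plain-Ruby sketch. The helper sorted_records is hypothetical and says nothing about how dbdbd does the sorting internally; it just resolves a name or a 1-based position to a column and sorts on it.

```ruby
# Hypothetical sketch, not dbdbd's own code. sort_key may be a field name
# or a 1-based field position, as in the text above.
def sorted_records(records, field_names, sort_key)
  index = sort_key.is_a?(Integer) ? sort_key - 1 : field_names.index(sort_key)
  raise ArgumentError, "unknown field: #{sort_key}" if index.nil?
  records.sort_by { |record| record[index] }
end

names   = %w{ number first_name last_name }
records = [[100, "John", "Smith"], [200, "Jane", "Doe"], [300, "Joan", "Jones"]]

# Sorted by last name: Doe, Jones, Smith.
sorted_records(records, names, "last_name").each { |r| puts r.join("  ") }
```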

Assigning field names

Your fields or "columns" can be assigned names. You can do this either at database creation:

      db = DBDB.new("%d%s%s", %w{ number first_name last_name })

or later:

      db = DBDB.new("%d%s%s")
    # ... later ...
      db.fields = %w{ number first_name last_name }

(Field names may not contain whitespace.)

The field names in force at the time of a "sync" operation will be saved to file in the form of a field-name line (described in "Editing dbdbd files by hand"). And if there's a field-name line in a file at the time of a "read" operation, the fields will be given the names from that line.

Reading records from a pre-existing file

You can read records from a dbdbd-conformant file with the read method. A single database object can be tied sequentially to different files and read all of them in.

Creating, modifying, and accessing records

To create a new record, or replace an old one, use the method insert, passing in the field values:

      db.insert(123, "David", "Black")

A record is retrieved by its key (unique field). By default this is the first field:

      rec = db[123]

For data retrieval, a record can be indexed either by field name, or by sequential field number:

      rec = db[100]
      puts rec["first_name"]  # => John
      puts rec[2]             # => John

Iterating through the records

dbdbd objects respond to each. On each iteration, two things are yielded: the key (i.e., the value of the unique field, which by default is the first field in the record), and the whole record.

Writing the database to a file

The sync method writes the whole database to the file to which it is "tied".

Writing the database to STDOUT

The method dump will write the whole database to STDOUT.

Feedback and bug reports

I'd love to hear from anyone using dbdbd. I don't have plans to make it much more elaborate than it is, but I wouldn't mind fixing bugs :-) Also, if you do have any ideas about making it better, please let me know.

Author

dbdbd is by David Alan Black (dblack@candle.superlink.net).

Copyright and License

Copyright (c) 2002, David Alan Black.

You may distribute this distribution unchanged. You may make changes to this distribution, as long as you: (a) label every change clearly as a change; (b) change the version number non-trivially so that it is clear that this is a forked dbdbd and not part of the present or future main development branch (e.g., 0.2.0 becomes JoeG0.0.3); (c) retain this paragraph, and the above copyright notice, both without alteration, in your distribution; (d) include a URL for the dbdbd home page (knossos.shu.edu/dblack/dbdbd) in your documentation (which will happen if you follow (c)).

Disclaimer

You use dbdbd entirely at your own risk. The author takes absolutely no responsibility for anything that might happen to you, your data, or anyone or anything else as a result of your using this software, or any derivative of it.

Version

This is dbdbd, David Black's DataBase Definer, version 0.2.2.