wfb2sql - Converts the CIA World Factbook into SQL statements
wfb2sql [options] --database=[db2|postgresql|mysql]
wfb2sql is a Perl script that converts the CIA World Factbook HTML pages into SQL statements.
It was tested with the HTML version of the World Factbook 2003 and the World Factbook 2002 and probably has to be modified to extract data from further versions.
By default, all data that can be interpreted, is included in the SQL statements. If you want to restrict the data to only some information, use the --no... command line options as described below. The data is written to STDOUT, so you can directly pass it to a database client.
In the current version, three database management systems are supported: IBM DB/2, PostgreSQL and mySQL. The mysql schema data produced by wfb2sql uses a very basic SQL language, which, for example, doesn't include foreign keys. At least this schema should work with other DBMSs as well. If you plan to adapt wfb2sql to other SQL dialects, focus on BLOB handling first. Not all databases support import of files in INSERT statements.
The db2 data had to be split into several files, since DB/2 doesn't support file inclusion in INSERT statements without a multimedia extender. Because of that, wfb2sql outputs the schema data to STDOUT and creates extra files with spool data. By default, it tries to create the extra files in the current directory, but you may specify an alternative directory using the --tmp-dir option. Please note, that you have to delete the extra files manually after script execution. To insert the world factbook data into a database, create a new database and run
wfb2sql [your options] | db2 -t
Use the --database-name parameter to tell the script the name of your database. It will automatically add a CONNECT TO <database> statement to the produced sql code.
The flag and map images are loaded into the database via the LOAD_FILE()
function. The use of this function requires the FILE privilege. So, please make sure, that the user has sufficient rights to execute this function. Otherwise, MySQL will not report an error and all the blob fields will just contain NULL. To insert the world factbook data into a database, create a database (e.g. factbook) and run
wfb2sql [your options] | mysql <databasename>
The script needs some time, so you might want to turn on the --verbose option.
Please note that you must have Postgres superuser privilege to use server-side lo_import(), which is needed for maps and flags. To insert the world factbook data into a database, create a new database (with createdb) and run
wfb2sql [your options] | psql <databasename>
wfb2sql supports the use of a configuration file called wfb2sqlrc. The script first tries to find the file wfb2sqlrc in the current directory. If it is not found, it tries to read the file .wfb2sqlrc in the users home directory. If both files exist, the latter will be ignored. The file format is similar to standard UN*X configuration files. Comments begin with a # and all behind the # will be ignored. Options are specified by
optionname = value
and whitespaces are silently ignored. If a configuration file is load, these settings will override the defaults. Additional command line arguments will override the configuration file settings. Note that the options in the configuration file don't have leading --. A --no... option might look like this in the configuration file (using the option --noboundaries in this example):
boundaries = no
You can as an alternative use boundaries = yes to explicitly turn on the option boundaries.
Perl 5.004, Getopt::Long, Pod::Usage, Options::General
Getopt::Long is available at http://search.cpan.org/author/JV/Getopt-Long-2.33/
Pod::Usage is available at http://search.cpan.org/author/MAREKR/PodParser-1.23/
Options::General is available at http://search.cpan.org/author/TLINDEN/Config-General-2.21/
Jan Schreiber <mail@jschreiber.com>
COPYRIGHT (C) 2003 Arbeitsbereich Datenbanken und Informationssysteme, Eberhard-Karls-Universitaet Tuebingen, Germany.
Various. Please send bug reports to the author.
CIA World Factbook http://www.cia.gov/cia/publications/factbook/
CIA World Factbook 2002 http://www.cia.gov/cia/download2002.htm