NAME

wfb2sql - Converts the CIA World Factbook into SQL statements


SYNOPSIS

wfb2sql [options] --database=[db2|postgresql|mysql]


DESCRIPTION

wfb2sql is a Perl script that converts the CIA World Factbook HTML pages into SQL statements.

It was tested with the HTML version of the World Factbook 2003 and the World Factbook 2002 and probably has to be modified to extract data from further versions.

By default, all data that can be interpreted, is included in the SQL statements. If you want to restrict the data to only some information, use the --no... command line options as described below. The data is written to STDOUT, so you can directly pass it to a database client.

In the current version, three database management systems are supported: IBM DB/2, PostgreSQL and mySQL. The mysql schema data produced by wfb2sql uses a very basic SQL language, which, for example, doesn't include foreign keys. At least this schema should work with other DBMSs as well. If you plan to adapt wfb2sql to other SQL dialects, focus on BLOB handling first. Not all databases support import of files in INSERT statements.

Notes for IBM DB/2

The db2 data had to be split into several files, since DB/2 doesn't support file inclusion in INSERT statements without a multimedia extender. Because of that, wfb2sql outputs the schema data to STDOUT and creates extra files with spool data. By default, it tries to create the extra files in the current directory, but you may specify an alternative directory using the --tmp-dir option. Please note, that you have to delete the extra files manually after script execution. To insert the world factbook data into a database, create a new database and run

wfb2sql [your options] | db2 -t

Use the --database-name parameter to tell the script the name of your database. It will automatically add a CONNECT TO <database> statement to the produced sql code.

Notes for MySQL

The flag and map images are loaded into the database via the LOAD_FILE() function. The use of this function requires the FILE privilege. So, please make sure, that the user has sufficient rights to execute this function. Otherwise, MySQL will not report an error and all the blob fields will just contain NULL. To insert the world factbook data into a database, create a database (e.g. factbook) and run

wfb2sql [your options] | mysql <databasename>

The script needs some time, so you might want to turn on the --verbose option.

Notes for PostgreSQL

Please note that you must have Postgres superuser privilege to use server-side lo_import(), which is needed for maps and flags. To insert the world factbook data into a database, create a new database (with createdb) and run

wfb2sql [your options] | psql <databasename>

Configuration file

wfb2sql supports the use of a configuration file called wfb2sqlrc. The script first tries to find the file wfb2sqlrc in the current directory. If it is not found, it tries to read the file .wfb2sqlrc in the users home directory. If both files exist, the latter will be ignored. The file format is similar to standard UN*X configuration files. Comments begin with a # and all behind the # will be ignored. Options are specified by

optionname = value

and whitespaces are silently ignored. If a configuration file is load, these settings will override the defaults. Additional command line arguments will override the configuration file settings. Note that the options in the configuration file don't have leading --. A --no... option might look like this in the configuration file (using the option --noboundaries in this example):

boundaries = no

You can as an alternative use boundaries = yes to explicitly turn on the option boundaries.


OPTIONS

--database=dialect
Selects an SQL dialect for the SQL output. Currently supported values are db2, mysql and postgresql.

--database-name=name
Adds a 'CONNECT TO name' to the schema definition. This is needed if you want to pipe the output directly into db2. For databases other than db2 this parameter is ignored.

--data-only
Output only SQL data and emit schema information. Use this function if you already have created the tables.

--directory=dir
Specifies the root directory of the factbook installation. The root directory containts the file index.html and the print/-directory.

--noboundaries
Include no boundary information. Otherwise, the borders to surrounding countries are stored in an extra table called boundaries, the total boundaries are stored in the country entity.

--nocommunications
Emit data on communication. Otherwise the following fields will be added:
phone_mainlines
Number of main lines in use.

phone_mobile
Number of cellphones.

internet_users
Number of internet users.

isps
Number of Internet service providers.

--nodependent-countries
Emit dependent country information.

--noflags
Emit flag data and schema information.

--nomaps
Emit map data and schema information.

--noorganizations
Emit data about organizations memberships.

--noschema
Do not add a schema name (default: wfb) when printing the SQL CREATE TABLE commands.

--notext-fields
Do not include textual fields like background information.

--schema-name=name
Specifies an alternate schema name. This applies only to databases which support schemata (DB/2 and PostgreSQL Version 7.3 or higher). If you want to disable the schema names, use the --noschema option. The default schema name is wfb.

--schema-only
Prints only SQL schema data. Use this function if you just want to create the SQL tables first.

--small-flags
Include only small flags. By default, the large flags will be included in the database. Use the --noflags option to emit all kind of flags.

--temp-dir=dir
The db2 creation method requires the use of temporary files. By default they will be placed in the current directory, but you may specify a different directory using this option.

-v, --verbose
Prints some more or less useful information to STDERR.


REQUIRES

Perl 5.004, Getopt::Long, Pod::Usage, Options::General

Getopt::Long is available at http://search.cpan.org/author/JV/Getopt-Long-2.33/

Pod::Usage is available at http://search.cpan.org/author/MAREKR/PodParser-1.23/

Options::General is available at http://search.cpan.org/author/TLINDEN/Config-General-2.21/


AUTHOR

Jan Schreiber <mail@jschreiber.com>


COPYRIGHT

COPYRIGHT (C) 2003 Arbeitsbereich Datenbanken und Informationssysteme, Eberhard-Karls-Universitaet Tuebingen, Germany.


BUGS

Various. Please send bug reports to the author.


SEE ALSO

CIA World Factbook http://www.cia.gov/cia/publications/factbook/

CIA World Factbook 2002 http://www.cia.gov/cia/download2002.htm