Module | ReferrerCop |
In: | referrercop |
Parses an Apache log file or AWStats data file and filters out entries for referrers that are known spammers.
Visit wonko.com/software/referrercop for news, usage examples, and updates (including updated blacklists).
Version: | 1.0.4 (10/17/2005) |
Author: | Ryan Grove (ryan@wonko.com) |
Copyright: | Copyright © 2005 Ryan Grove |
License: | ReferrerCop is open source software distributed under the terms of the GNU General Public License. |
referrercop [-f | -i | -n | -s] [options] [<file> ...]
referrercop -u <url> [options]
referrercop -U [options]
referrercop {-h | -V}
Modes:
-f, --filter Filter the specified files (or standard input if no files are specified), sending the results to standard output. This is the default mode.
-i, --in-place Filter the specified files in place, replacing each file with the filtered version. A backup of the original file will be created with a .bak extension.
-n, --extract-ham Extract ham (nonspam) URLs from the input data and send them to standard output. Duplicates will be suppressed.
-s, --extract-spam Extract spam URLs from the input data and send them to standard output. Duplicates will be suppressed.
-u, --url <url> Test the specified URL.
-U, --update Check for an updated version of the default blacklist and download it if available.
Options:
-b, --blacklist <file> Blacklist to use instead of the default list.
-v, --verbose Print verbose status and statistical info to stderr.
-w, --whitelist <file> Whitelist to use instead of the default list.
Information:
-h, --help Display usage information (this message).
-V, --version Display version information.
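The in-place mode described above (filter a file, keep the original with a .bak extension) could be sketched roughly as follows. This is only an illustration under assumptions: `filter_in_place` is a hypothetical helper name, not part of ReferrerCop, and the decision about which lines to keep is left to the caller's block.

```ruby
require 'fileutils'

# Hypothetical helper (not ReferrerCop's own code): rewrites +path+ so that it
# contains only the lines for which the block returns true. The untouched
# original is kept next to it as "<path>.bak".
def filter_in_place(path)
  backup = path + '.bak'
  FileUtils.cp(path, backup)                        # preserve the original first
  kept = File.readlines(backup).select { |line| yield(line) }
  File.write(path, kept.join)                       # replace with the filtered version
  backup
end
```

Copying to the backup before writing means the original data survives even if the filtering step raises an error.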
APP_NAME | = | 'ReferrerCop' | ||
APP_VERSION | = | '1.0.4' | ||
UPDATE_SERVER | = | 'wonko.com' | ||
UPDATE_PORT | = | 80 | ||
UPDATE_PATH | = | '/files/referrercop/blacklist.refcop.gz' | ||
CONFIG_PATHS | = | [ '.', File::SEPARATOR + File.join('usr', 'local', 'share', 'referrercop'), File::SEPARATOR + File.join('usr', 'share', 'referrercop') ] | List of paths that will be searched for blacklist/whitelist files if they aren’t specified on the command line. | |
REGEXPS | = | { :apache_combined => /^[^\s]+ - [^\s]+ \[.+\] "[A-Z]+ [^\s]+(?: [^\s]+")? [0-9]+ [0-9-]+ "(.*)" ".*"$/i, :awstats_header => /^AWSTATS DATA FILE /, :awstats_map => /^BEGIN_MAP.*^END_MAP$/m, :awstats_pagerefs_extract => /^BEGIN_PAGEREFS.*?$.*?^(.*?)^END_PAGEREFS$/m, :awstats_pagerefs_replace => /^BEGIN_PAGEREFS.*?^END_PAGEREFS$/m, :awstats_url => /^(https?:\/\/[^\s]+)/i, :text_url => /^(https?:\/\/[^\s]+)/i } | Common regular expressions used throughout the application. |
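To illustrate how the AWStats patterns in REGEXPS might fit together, the sketch below copies three of them verbatim and applies them to a small sample. The sample data is made up to fit the patterns and is not taken from a real AWStats file, and `pageref_urls` is a hypothetical helper name.

```ruby
# Patterns copied from the REGEXPS table above.
AWSTATS_HEADER           = /^AWSTATS DATA FILE /
AWSTATS_PAGEREFS_EXTRACT = /^BEGIN_PAGEREFS.*?$.*?^(.*?)^END_PAGEREFS$/m
AWSTATS_URL              = /^(https?:\/\/[^\s]+)/i

# Made-up sample shaped to fit the patterns (assumption, not real AWStats output).
SAMPLE_AWSTATS = <<~DATA
  AWSTATS DATA FILE sample
  BEGIN_PAGEREFS 2
  http://example.org/ 3 3
  http://cheap-pills.example/ 1 1
  END_PAGEREFS
DATA

# Hypothetical helper: returns the referrer URLs listed in the PAGEREFS section,
# or an empty array if the data has no such section.
def pageref_urls(data)
  section = data[AWSTATS_PAGEREFS_EXTRACT, 1]   # text between the BEGIN/END lines
  return [] if section.nil?
  section.scan(AWSTATS_URL).flatten             # one URL per line start
end

puts 'looks like AWStats data' if SAMPLE_AWSTATS =~ AWSTATS_HEADER
p pageref_urls(SAMPLE_AWSTATS)
# → ["http://example.org/", "http://cheap-pills.example/"]
```

Because Ruby's `/m` flag lets `.` match newlines, the lazy quantifiers are what keep the capture limited to a single PAGEREFS section.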
Determines the format of input and extracts URLs of the specified type.
type should be either :ham or :spam.
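As a rough sketch of this kind of extraction for the simplest input (a list of URLs, one per line), the code below selects ham or spam URLs and suppresses duplicates. It is not ReferrerCop's implementation: `extract_urls` and its `blacklist` parameter (a single compiled Regexp) are assumptions made for illustration.

```ruby
# Same pattern as REGEXPS[:text_url].
URL_LINE = /^(https?:\/\/[^\s]+)/i

# Hypothetical helper: returns the unique URLs of the requested type.
# +blacklist+ is assumed to be one Regexp that matches spam URLs.
def extract_urls(lines, type, blacklist)
  unless [:ham, :spam].include?(type)
    raise ArgumentError, "type must be :ham or :spam, got #{type.inspect}"
  end

  urls = lines.map { |line| line[URL_LINE, 1] }.compact.uniq
  urls.select { |url| blacklist.match?(url) == (type == :spam) }
end

sample_lines = [
  "http://example.org/\n",
  "http://cheap-pills.example/\n",
  "http://example.org/\n"
]

p extract_urls(sample_lines, :ham, /pills/i)   # → ["http://example.org/"]
p extract_urls(sample_lines, :spam, /pills/i)  # → ["http://cheap-pills.example/"]
```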
Determines the format of input and filters it for referrer spam. The filtered data will be sent to output.
Parses and filters Apache combined log entries from input. The filtered log entries will be sent to output.
Parses and filters input as a list of URLs (one per line). The filtered URLs will be sent to output.
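Both line-oriented filters described above can be pictured as the same loop with a different pattern. The sketch below is a minimal illustration under assumptions: `filter_lines` is a hypothetical name, the blacklist is assumed to be a single Regexp, and lines that do not match the pattern are passed through unchanged, which may differ from what ReferrerCop actually does.

```ruby
require 'stringio'

# Patterns copied from REGEXPS (:apache_combined and :text_url).
APACHE_LINE = /^[^\s]+ - [^\s]+ \[.+\] "[A-Z]+ [^\s]+(?: [^\s]+")? [0-9]+ [0-9-]+ "(.*)" ".*"$/i
URL_ONLY    = /^(https?:\/\/[^\s]+)/i

# Hypothetical helper: copies +input+ to +output+, dropping every line whose
# captured referrer URL matches +blacklist+.
def filter_lines(input, output, pattern, blacklist)
  input.each_line do |line|
    url = line[pattern, 1]                       # first capture group, or nil
    output.write(line) unless url && blacklist.match?(url)
  end
end

log = StringIO.new(<<~LOG)
  192.0.2.1 - - [17/Oct/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 512 "http://example.org/" "Mozilla/5.0"
  192.0.2.2 - - [17/Oct/2005:10:00:01 -0700] "GET / HTTP/1.1" 200 512 "http://cheap-pills.example/" "Mozilla/5.0"
LOG
out = StringIO.new
filter_lines(log, out, APACHE_LINE, /pills/i)
puts out.string   # only the entry referred by http://example.org/ remains
```

Passing `URL_ONLY` instead of `APACHE_LINE` applies the same loop to a plain list of URLs.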
Examines input and returns its type. The following input types are supported:
Loads filename as a blacklist, compiling all regular expressions for speed. If filename is nil and a blacklist exists at one of the paths specified in CONFIG_PATHS, that blacklist will be loaded.
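The lookup-and-compile behaviour described above might look something like the following. Everything here is an assumption made for illustration: this page does not describe the blacklist file format, so the sketch assumes plain text with one regular expression per line and `#` comments, and the names `SEARCH_PATHS`, `DEFAULT_BLACKLIST_NAME`, `locate_list`, `compile_patterns`, and `load_list` are hypothetical.

```ruby
# Mirrors CONFIG_PATHS from the constants table.
SEARCH_PATHS = [
  '.',
  File::SEPARATOR + File.join('usr', 'local', 'share', 'referrercop'),
  File::SEPARATOR + File.join('usr', 'share', 'referrercop')
].freeze

# Hypothetical default file name; the real one is not given on this page.
DEFAULT_BLACKLIST_NAME = 'blacklist.txt'

# Returns +filename+ if given, else the first default list found in +paths+.
def locate_list(filename, paths = SEARCH_PATHS, default_name = DEFAULT_BLACKLIST_NAME)
  return filename unless filename.nil?
  paths.map { |dir| File.join(dir, default_name) }
       .find { |candidate| File.file?(candidate) }
end

# Compiles every pattern once and joins them into a single Regexp, so each URL
# needs only one match call. Assumed format: one pattern per line, '#' comments.
def compile_patterns(lines)
  sources = lines.map(&:strip).reject { |l| l.empty? || l.start_with?('#') }
  return nil if sources.empty?
  Regexp.union(sources.map { |src| Regexp.new(src, Regexp::IGNORECASE) })
end

# Returns the compiled blacklist, or nil if no list could be found.
def load_list(filename = nil, paths = SEARCH_PATHS)
  path = locate_list(filename, paths)
  path && compile_patterns(File.readlines(path))
end
```

Compiling up front trades a little startup time for faster matching, which matters when the same list is tested against every line of a large log.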