adzapper Home | About Adzapper | Installing | Zaplets | Zaplet File Format

Zaplets

Zaplets are small rule files that describe what to block and what not to block, on a per-website basis. They are simple, human-readable text files.

adzapper examines each URL before downloading the requested file. If the URL's site matches the site described in the zaplet's 'host' statement, adzapper will apply the blocking rules contained in the zaplet.

If the URL matches an 'allow' statement in the zaplet, the URL is downloaded. If the URL matches a 'block' statement in the zaplet, adzapper sends a single-pixel transparent GIF instead of whatever file the URL really refers to.
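
The stand-in image is small enough to keep as a constant. The sketch below shows the widely used 43-byte 1x1 transparent GIF89a and a minimal HTTP response that wraps it. The names TRANSPARENT_GIF and blocked_response are hypothetical, and this is not necessarily how adzapper builds its reply.

```python
# A 1x1 transparent GIF89a, 43 bytes in total.
TRANSPARENT_GIF = (
    b"GIF89a"                                    # signature and version
    b"\x01\x00\x01\x00"                          # logical screen: 1 x 1
    b"\x80\x00\x00"                              # global colour table present, 2 entries
    b"\x00\x00\x00\xff\xff\xff"                  # palette: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # graphic control ext: colour 0 is transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor: 1 x 1 at (0, 0)
    b"\x02\x02\x44\x01\x00"                      # LZW min code size 2, one 2-byte data block, end
    b"\x3b"                                      # trailer
)


def blocked_response() -> bytes:
    """Build a minimal HTTP/1.0 response carrying the transparent GIF."""
    headers = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: image/gif\r\n"
        f"Content-Length: {len(TRANSPARENT_GIF)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + TRANSPARENT_GIF
```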

If there is no 'block' statement, all URLs that are not specifically allowed are blocked.
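
The per-URL decision can be sketched as a small function. Here each allow or block rule is modelled as a predicate on the URL. The name should_block and the Matcher type are hypothetical. The sketch also assumes that when a zaplet does have block statements, a URL matching neither list is downloaded normally.

```python
from typing import Callable, List

# A rule is modelled as a predicate: does this URL match the rule?
Matcher = Callable[[str], bool]


def should_block(url: str, allows: List[Matcher], blocks: List[Matcher]) -> bool:
    """Decide whether a URL from a host covered by a zaplet is blocked."""
    # An 'allow' match always lets the URL through.
    if any(m(url) for m in allows):
        return False
    # With no 'block' statements, everything not allowed is blocked.
    if not blocks:
        return True
    # Otherwise only URLs matching a 'block' statement are blocked
    # (assumption: URLs matching neither list pass through).
    return any(m(url) for m in blocks)
```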

Because a GIF is always sent in place of a blocked file, adzapper currently treats every blocked URL as a picture. Since most blocked URLs are ads, which are GIF or JPEG files, this works well in practice.

Actual content filtering for HTML (a la Muffin, XSL, and W4F), and specific filters for different MIME types (a la Muffin) are coming soon.

Zaplets usually live in the zaplets/ subdirectory of the directory where you installed adzapper, unless you specify a different directory with the command-line options.

How to write zaplets

First you have to know the format of an ad URL. This will probably require you to look at the HTML source of the web page you are viewing (View Source), or at the URL of the ad itself by viewing the image. (Under Netscape/Linux, right-click the image and select View Image, then look at the URL in the Location: box.)

Zaplets describe how to block URLs. The thing to know about ads is that their URLs usually change, but according to a pattern, since ad banners are mostly GIF files placed in a particular directory on the ad company's web server.

The goal is to find the most general expression that blocks the ad by matching its URL, but doesn't block (match) anything that isn't an ad. It may sound complicated, but once you've looked at a few ad URLs, it's pretty straightforward.
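
One way to check a candidate expression is to run it against a few URLs you know are ads and a few you know are not. The sketch below does that with Python's re module. The function check_pattern and all sample URLs are hypothetical. It also assumes that a pattern may match anywhere in the URL (re.search), which is not confirmed for adzapper.

```python
import re
from typing import Iterable, List


def check_pattern(pattern: str, ads: Iterable[str], non_ads: Iterable[str]) -> List[str]:
    """Report ads the pattern misses and non-ads it would wrongly block."""
    rx = re.compile(pattern)
    problems = []
    for url in ads:
        if not rx.search(url):
            problems.append(f"misses ad: {url}")
    for url in non_ads:
        if rx.search(url):
            problems.append(f"blocks non-ad: {url}")
    return problems


# Hypothetical sample URLs, for illustration only.
ADS = ["http://www.example.com/Banners/top1.gif", "http://www.example.com/banners/side.gif"]
PAGES = ["http://www.example.com/images/logo.gif", "http://www.example.com/news/index.html"]

print(check_pattern(r"[Bb]anners/", ADS, PAGES))  # []
print(check_pattern(r"\.gif$", ADS, PAGES))       # too general: also blocks logo.gif
```

The first pattern is general enough to catch both banners without touching the site's own images. The second matches every GIF, so it is too general.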

For instance, an ad URL might look like this:

http://adforce.imgis.com/?adserv|135|52407|1|1|MISC=276177797;

Most zaplets have a format like the following:
<zaplet>
version 0.4
host adforce.imgis.com
</zaplet>

This zaplet blocks everything from the adforce.imgis.com server: everything served from this host is assumed to be an ad!
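
To see which part of the example URL the 'host' statement is compared against, the sketch below splits it with Python's standard urllib.parse module. AD_URL is just the example above. How adzapper itself parses URLs is not described here.

```python
from urllib.parse import urlsplit

AD_URL = "http://adforce.imgis.com/?adserv|135|52407|1|1|MISC=276177797;"

parts = urlsplit(AD_URL)
print(parts.hostname)  # adforce.imgis.com  (compared against the 'host' statement)
print(parts.query)     # adserv|135|52407|1|1|MISC=276177797;
```

Only the host name matters for this zaplet. Because it has no allow or block statements, the path and the changing query string make no difference: every URL on this host is blocked.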

Zaplet file format

The Zaplet file format specification, in Extended BNF, is on the Zaplet File Format page.

Zaplets have to begin and end with XML-style tags: <zaplet> for the start of a zaplet, and </zaplet> for the end. This makes it easy to extract zaplets from email messages and text files, which in turn makes them easier to share. Put any comments between the opening and closing tags to guarantee that the comments stay with that particular zaplet.
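
The tags make extraction a one-line regular expression. The sketch below pulls every zaplet out of a block of text such as an email message. The names ZAPLET_RE and extract_zaplets are hypothetical, and the sample message is made up for illustration.

```python
import re
from typing import List

# Non-greedy, so several zaplets in one message stay separate;
# DOTALL lets '.' span the newlines inside each zaplet.
ZAPLET_RE = re.compile(r"<zaplet>(.*?)</zaplet>", re.DOTALL)


def extract_zaplets(text: str) -> List[str]:
    """Return the bodies of all <zaplet>...</zaplet> blocks found in text."""
    return [body.strip() for body in ZAPLET_RE.findall(text)]


message = """Hi, here are two zaplets I wrote:

<zaplet>
# blocks the ad server
version 0.4
host adforce.imgis.com
</zaplet>

and another:
<zaplet>
version 0.4
host yimg.com
allow yahoo.gif
</zaplet>
Cheers!
"""

for z in extract_zaplets(message):
    print(z.splitlines()[-1])  # last statement of each zaplet
```

The comment inside the first zaplet travels with it, while the surrounding email text is discarded.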

The 'version' statement sets the version of the zaplet file format that this zaplet uses. This must be a single-place decimal number, like '0.4'. The version statement will help adzapper remain compatible with past versions of the zaplet file format.

The 'host' statement sets the host that this zaplet is for. This can be a host name in conventional Internet notation, like 'foo.bar.com', or an IP address in dotted notation, like '234.56.78.9'. If more than one zaplet specifies the same host, the last one read in wins. Host matches range from most specific ('www.foo.bar.com' or '234.56.78.9') to least specific: 'bar.com' also matches 'www.foo.bar.com', and '234' also matches '234.56.78.9'. The zaplet with the most specific host match wins.
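
Host matching can be sketched as follows. Host names match on trailing dot-separated labels and IP addresses on leading octets. Specificity is measured here as the number of labels or octets in the host statement, which is an assumption of this sketch, as are the names host_matches and best_zaplet.

```python
from typing import Dict, Optional


def is_numeric(host: str) -> bool:
    """True for dotted-number hosts such as '234.56.78.9' or '234'."""
    return all(part.isdigit() for part in host.split("."))


def host_matches(pattern: str, host: str) -> bool:
    """Does a 'host' statement match the requested host?"""
    p, h = pattern.lower().split("."), host.lower().split(".")
    if is_numeric(pattern) and is_numeric(host):
        # IP addresses: compare leading octets ('234' matches '234.56.78.9').
        return h[:len(p)] == p
    if is_numeric(pattern) or is_numeric(host):
        return False
    # Host names: compare whole trailing labels ('bar.com' matches 'www.foo.bar.com').
    return len(p) <= len(h) and h[len(h) - len(p):] == p


def best_zaplet(host: str, zaplets: Dict[str, str]) -> Optional[str]:
    """Pick the zaplet whose host statement matches most specifically.

    zaplets maps host statements to zaplets. Later entries overwrite
    earlier ones in a dict, so the last zaplet read for a host wins.
    """
    candidates = [pat for pat in zaplets if host_matches(pat, host)]
    if not candidates:
        return None
    return zaplets[max(candidates, key=lambda pat: len(pat.split(".")))]
```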

There can only be one host per 'host' statement.

If there is no 'host' statement, the zaplet is not valid: the 'host' statement is required. (Note that this is a change from previous versions of the zaplet file format!)

If there is no 'allow' statement or if it is empty, no URLs are specifically allowed.

If there is no 'block' statement or if it is empty, all URLs from hosts matching the 'host' statement that are not specifically allowed are blocked.

If a zaplet named 'default-numeric' exists, it is checked when the host is numeric (an IP address) and no zaplet for a numeric host matches it.

If a zaplet named 'default' exists, it is checked when no zaplet's host matches.
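
The fallback order can be sketched like this. For brevity, the sketch looks hosts up exactly rather than applying the full host-matching rules. It also assumes the two default zaplets can be found under the names 'default-numeric' and 'default'. The function names are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def is_numeric_host(host: str) -> bool:
    """True if host is an IP address rather than a host name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def select_zaplet(host: str, zaplets: Dict[str, str]) -> Optional[str]:
    """Choose a zaplet for host, falling back to the default zaplets."""
    if host in zaplets:  # simplified: exact lookup only
        return zaplets[host]
    if is_numeric_host(host) and "default-numeric" in zaplets:
        return zaplets["default-numeric"]
    return zaplets.get("default")  # None if there is no default zaplet either
```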

A more complicated zaplet might be:

<zaplet>
# this is the zaplet for yimg.com
version 0.4
host yimg.com
allow main33.gif
allow yahoo.gif
</zaplet>

Any line that starts with the character '#' is a comment.
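
A minimal parser for the statements between the tags might look like the sketch below. It skips comment and blank lines, keeps single values for 'version' and 'host', and collects 'allow' and 'block' arguments into lists. The Zaplet class and parse_zaplet are hypothetical, and rejecting unknown keywords is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Zaplet:
    version: Optional[str] = None
    host: Optional[str] = None
    allow: List[str] = field(default_factory=list)
    block: List[str] = field(default_factory=list)


def parse_zaplet(body: str) -> Zaplet:
    """Parse the lines between <zaplet> and </zaplet>."""
    z = Zaplet()
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # blank line or comment
            continue
        parts = line.split(None, 1)           # keyword, then the rest of the line
        keyword = parts[0]
        arg = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "version":
            z.version = arg
        elif keyword == "host":
            z.host = arg
        elif keyword == "allow":
            z.allow.append(arg)
        elif keyword == "block":
            z.block.append(arg)
        else:
            raise ValueError(f"unknown statement: {line!r}")
    if not z.host:
        raise ValueError("the 'host' statement is required")
    return z
```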

The 'allow' statement says that certain URLs are to be allowed through, despite whatever block statements follow. This lets navigational images be displayed even when, for example, a site puts all its images on a single server or in a single directory.

Both literal strings and perl-style regular expressions can be used in block or allow statements:

<zaplet>
host cnet.com
version 0.4
allow '0-1-title.gif'
block advertising
block "ads"
block 'Ads'
block /[Bb]anners/
</zaplet>

Literals don't need to be quoted, but can be quoted using single or double quotes.
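
The three argument forms can be turned into URL tests as in the sketch below. It assumes literals match anywhere in the URL and are case-sensitive, and that regular expressions use Python's re.search. The function compile_pattern is hypothetical.

```python
import re
from typing import Callable


def compile_pattern(arg: str) -> Callable[[str], bool]:
    """Turn an allow/block argument into a URL predicate.

    /.../          -> regular expression (Python re syntax)
    '...' or "..." -> quoted literal
    anything else  -> unquoted literal
    """
    if len(arg) >= 2 and arg[0] == arg[-1] == "/":
        rx = re.compile(arg[1:-1])
        return lambda url: rx.search(url) is not None
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "'\"":
        arg = arg[1:-1]  # strip the matching quotes
    return lambda url: arg in url
```

Applied to the cnet.com zaplet above, the block rules catch both /Banners/ and /banners/ through the regular expression, while 'Ads' and "ads" each match only their own capitalization.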

Regular expressions must be enclosed in '/' characters. See the Python documentation for the re module for more information:

http://www.python.org/doc/current/lib/module-re.html

Compatibility

I don't guarantee that the zaplet file format will remain stable over the first few releases: this is alpha software right now! :-)

However, if you contribute zaplets, I will make sure to convert them to whatever new format I use in the future, if there is a file format change.

The goal is for the file format to evolve while staying backwards compatible with older zaplets.

Adding a zaplet

After you've written a zaplet and placed it in the <adzapper installation location>/zaplets/ directory, you need to stop adzapper and restart it. (On Unix you can send adzapper a SIGHUP instead, which accomplishes the same thing.)

Coming soon: web-based configuration, automatic checking of remote zaplet repositories, and content filtering.

Philosophy

The idea behind zaplets is to allow easy configuration without changing the adzapper program, and to let people share zaplets easily.

When you find yourself often visiting a web site that has ads, you can quickly write a zaplet and restart adzapper. Voilà! No more ads.

When you're happy with the results you are getting from a zaplet, if you send it to me, I will add it to the repository that is posted to the web and that gets distributed with adzapper.

Eventually I hope to have an automated or semi-automated way of submitting or updating zaplets, and an easy way to search an archive of zaplets. For now, I'll keep an updated copy of the zaplets directory, and a tar.gz archive of it, available at:

http://www.pobox.com/~adamf/adzapper/repository/

Many people have provided zaplets; check out the comment fields in the zaplets themselves for credits!



Adam Feuer
adamf at pobox.com (replace the 'at' with '@' to contact me via email)
http://www.pobox.com/~adamf/

Last Modified: Sat Oct 09 21:21:48 1999