Cocoon Caching

A Brief Guide to the Cocoon Cache System

The Cocoon cache system started off clean, simple, and easy to understand. Unfortunately, it has outgrown its original design, and several aspects of it are now confusing. This document does not attempt to be an exhaustive guide to Cocoon caching; it briefly documents the most important aspects. A more comprehensive version may be written at a later time - and remember, we're always looking for volunteers to write documentation!

Please read the comments in cocoon.properties for an explanation of the global caching parameters that Cocoon accepts. In particular, disabling Last-Modified support appears to disable all external caching (i.e. caching in web browsers, proxies, etc.)

What is cached and what is not?

Currently, in Cocoon 1.x, the main result that is cached is the final output (usually HTML) of the request. A few other things are cached as well (such as in-memory representations of stylesheets and logicsheets), but these are largely transparent to the Cocoon developer. The intermediate results of the individual pipeline stages are not cached.

Various Cocoon components have different behaviours with regard to caching. Some components disable caching for all requests that use them; others flush a page from the cache and regenerate it under certain conditions. As per the HTTP 1.1 spec, certain request types such as PUT are never cached under any circumstances (however, POST is not one of these types - although 99% of the time it is not advisable to cache a POST request, it is possible to do so.)

The easiest way to disable caching for a particular page is to add the <?cocoon-disable-caching?> processing instruction. (This won't work if the instruction is inserted dynamically, unless it is inserted by a Producer.)

This document is accurate with respect to Cocoon 1.8.1. Older and newer versions may behave differently.

In the table below, if any component being used by a request disables caching, caching is disabled for that request. If any component being used by a request invalidates the cache, the cached copy of that request is invalidated and removed.

| Component | Component Type | Disables Caching For Affected Requests? | Invalidates Cache? |
|---|---|---|---|
| ProducerFromFile | Producer (default) | Never | When file changes |
| DCP | Processor | Always | n/a |
| LDAP | Processor | Always | n/a |
| SQL | Processor | Always | n/a |
| XInclude | Processor | Never | When included file changes(?) |
| XSLT | Processor | Never | When stylesheet changes. NOTE: The caching behaviour of the XSLT document() function is not specified in the XSLT 1.0 specification, so we recommend an alternative (e.g. the more powerful XInclude) unless you don't care about caching. |
| XSP | "Processor"/"Producer" | By default, yes; however, XSP pages can override this by putting <util:cacheable/> within <xsp:structure> (see XSP documentation.) | Not applicable if util:cacheable tag is missing, since page will never be cached. Otherwise, define the public boolean hasChanged (Object context) method to define cache invalidation behaviour. Always returning false will never invalidate, for example. The context object is currently a HttpServletRequest. Default behaviour is always invalidate. (see XSP documentation.) |
| All Cocoon-supplied Formatters | Formatters | Never | Never |
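
The rule stated above the table can be sketched in Java roughly as follows. This is only an illustrative model, not Cocoon's actual implementation: ComponentBehaviour, CacheDecision and Outcome are hypothetical names invented for this example, and the two boolean flags simply mirror the last two columns of the table.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one component in a request's pipeline.
// The two flags mirror the last two columns of the table.
final class ComponentBehaviour {
    final String name;
    final boolean disablesCaching;
    final boolean invalidatesCache;

    ComponentBehaviour(String name, boolean disablesCaching, boolean invalidatesCache) {
        this.name = name;
        this.disablesCaching = disablesCaching;
        this.invalidatesCache = invalidatesCache;
    }
}

// Hypothetical helper: combines the behaviour of every component in a pipeline.
final class CacheDecision {
    enum Outcome { NOT_CACHED, REGENERATE, USE_CACHED_COPY }

    static Outcome decide(List<ComponentBehaviour> pipeline) {
        boolean invalidated = false;
        for (ComponentBehaviour component : pipeline) {
            // One component that disables caching is enough to disable it
            // for the whole request.
            if (component.disablesCaching) {
                return Outcome.NOT_CACHED;
            }
            // One component that invalidates is enough to drop the cached copy.
            invalidated = invalidated || component.invalidatesCache;
        }
        // USE_CACHED_COPY assumes a cached copy exists; otherwise the page
        // is simply generated and stored.
        return invalidated ? Outcome.REGENERATE : Outcome.USE_CACHED_COPY;
    }

    public static void main(String[] args) {
        List<ComponentBehaviour> staticPage = Arrays.asList(
            new ComponentBehaviour("ProducerFromFile", false, false),
            new ComponentBehaviour("XSLT", false, false));
        List<ComponentBehaviour> sqlPage = Arrays.asList(
            new ComponentBehaviour("ProducerFromFile", false, false),
            new ComponentBehaviour("SQL", true, false),
            new ComponentBehaviour("XSLT", false, false));
        System.out.println(decide(staticPage)); // USE_CACHED_COPY
        System.out.println(decide(sqlPage));    // NOT_CACHED
    }
}
```

Note that disabling takes precedence over invalidation in this sketch: once any component disables caching, whether another component would have invalidated the cached copy no longer matters.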

The XSP Repository

The XSP repository where compiled XSP pages are stored (see cocoon.properties) is often thought of as a cache, but strictly speaking it is not. At the moment, all commonly available Java compilers such as javac and jikes will only read and write files, not in-memory objects. Thus, writing .java and .class files to disk is essential.

It does not really make sense to say that XSP pages are compiled once, and then "cached" thereafter so that they do not have to be compiled again. While this is approximately true, XSP generated classes are just normal classes - they remain in memory after use. No caching mechanism is needed.

Although XSP is implemented in Cocoon 1 as a Processor, it is really more like a Producer, because it ignores whether its input has changed. It was written based on the assumption that it would be fed directly from a ProducerFromFile, and nothing else. Thus XSPProcessor will not correctly handle dynamic input. This is also why you cannot use XSP twice for one request (unless you use something like XInclude).

This anomaly has been corrected in Cocoon 2, where XSP is a Generator.

What determines request equivalence?

Cocoon uses the following factors to decide if a request is equivalent to a previous one. Only if all factors are identical will the cached copy (if any) be used.

  • HTTP User Agent string (identifies the browser)
  • Scheme (HTTP, HTTPS, etc.)
  • Server name
  • Port
  • Rest of URI, including query string
  • All request headers

It is on the todo list to make this more configurable.
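
One way to picture the factors listed above is as a composite cache key whose equality test covers every factor. The sketch below is an assumption-laden illustration, not the class Cocoon uses internally: RequestKey and RequestKeyDemo are hypothetical names, and the host, URI and header values are made up for the example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical cache key: two requests are equivalent only if every
// factor listed above is identical.
final class RequestKey {
    private final String userAgent;
    private final String scheme;
    private final String serverName;
    private final int port;
    private final String uriWithQuery;
    private final Map<String, String> headers;

    RequestKey(String userAgent, String scheme, String serverName, int port,
               String uriWithQuery, Map<String, String> headers) {
        this.userAgent = userAgent;
        this.scheme = scheme;
        this.serverName = serverName;
        this.port = port;
        this.uriWithQuery = uriWithQuery;
        this.headers = new TreeMap<>(headers); // defensive copy
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof RequestKey)) return false;
        RequestKey k = (RequestKey) other;
        return port == k.port
            && Objects.equals(userAgent, k.userAgent)
            && Objects.equals(scheme, k.scheme)
            && Objects.equals(serverName, k.serverName)
            && Objects.equals(uriWithQuery, k.uriWithQuery)
            && headers.equals(k.headers);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userAgent, scheme, serverName, port, uriWithQuery, headers);
    }
}

final class RequestKeyDemo {
    public static void main(String[] args) {
        Map<String, String> headers = new TreeMap<>();
        headers.put("Accept-Language", "en");

        RequestKey first = new RequestKey("Mozilla/4.0", "http", "example.org", 80,
                                          "/directory.xml?node=1", headers);
        RequestKey same = new RequestKey("Mozilla/4.0", "http", "example.org", 80,
                                         "/directory.xml?node=1", headers);
        RequestKey otherNode = new RequestKey("Mozilla/4.0", "http", "example.org", 80,
                                              "/directory.xml?node=2", headers);

        System.out.println(first.equals(same));      // true
        System.out.println(first.equals(otherNode)); // false
    }
}
```

A practical consequence of including all request headers is that two requests for the same URI from otherwise identical browsers are treated as different if a single header value differs.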

Example of caching dynamic content - Why Bother?

Sometimes there is no point, and it can even cause bugs. Hence, all dynamic content, by default, is not cached. However, because Cocoon caches based on the query string, as mentioned above, it can be useful to override this.

Suppose you are presenting a large web directory with Cocoon. All of the directory content is served from a robust back-end database. Since the root of the directory (?node=1) is requested so frequently, it makes sense to cache the transformed representation (HTML) of this and other popular nodes. Assuming you are using XSP, you can simply declare <util:cacheable/> and the following method:

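     public boolean hasChanged (Object context) {
       // Option 1: cache everything and leave memory
       // management up to Cocoon:
       //
       //   return false;

       // Option 2: only cache the most important nodes
       // (those numbered below 50). Returning true invalidates
       // the cached copy, so report "changed" for all other nodes.
       // context is a javax.servlet.http.HttpServletRequest.
       try {
         int node = Integer.parseInt
           (((HttpServletRequest) context).getParameter ("node"));
         return node >= 50;
       }
       catch (NumberFormatException ex) {
         // error pages don't change
         return false;
       }
     }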
   

Advanced Cache Control

If writing your own Producer or Processor, implement the org.apache.cocoon.framework.Cacheable and org.apache.cocoon.framework.Changeable interfaces to define custom caching behaviour. The same degree of control is also available to XSP pages - see table above.
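
As a rough illustration of the idea, the sketch below shows a component that is always cacheable and reports a change whenever the timestamp of its underlying source moves. It is a self-contained model, not code written against the Cocoon API: the two interfaces are local stand-ins for the ones in org.apache.cocoon.framework, hasChanged(Object) is taken from the XSP entry in the table above, and the isCacheable method name and its parameter type are assumptions that should be checked against the Cocoon source. TimestampedProducer and TimestampedProducerDemo are hypothetical names.

```java
import java.util.function.LongSupplier;

// Local stand-ins for the real Cocoon interfaces (assumed shapes).
interface Cacheable {
    boolean isCacheable(Object request); // assumed method name and parameter
}

interface Changeable {
    boolean hasChanged(Object context);  // signature as quoted for XSP pages
}

// Hypothetical component whose output depends on one timestamped source.
final class TimestampedProducer implements Cacheable, Changeable {
    private final LongSupplier lastModified;
    private long lastSeen;

    TimestampedProducer(LongSupplier lastModified) {
        this.lastModified = lastModified;
        this.lastSeen = lastModified.getAsLong();
    }

    @Override
    public boolean isCacheable(Object request) {
        return true; // never disables caching
    }

    @Override
    public boolean hasChanged(Object context) {
        // Invalidate only when the source timestamp differs from the one
        // seen at the previous check.
        long current = lastModified.getAsLong();
        boolean changed = current != lastSeen;
        lastSeen = current;
        return changed;
    }
}

final class TimestampedProducerDemo {
    public static void main(String[] args) {
        long[] stamp = {100L}; // stands in for a file's last-modified time
        TimestampedProducer producer = new TimestampedProducer(() -> stamp[0]);

        System.out.println(producer.hasChanged(null)); // false
        stamp[0] = 200L;                               // source is modified
        System.out.println(producer.hasChanged(null)); // true
        System.out.println(producer.hasChanged(null)); // false
    }
}
```

Passing the timestamp in as a LongSupplier keeps the example independent of the file system; a real component would more likely read it from a file or a database row.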