Class Scrubyt::PostProcessor
In: lib/scrubyt/output/post_processor.rb
Parent: Object

Post-processes the results after extraction.

Some things cannot be carried out during evaluation. One example is the ensure_presence_of_pattern constraint: since evaluation proceeds top to bottom, at a given point we don't yet know whether the currently evaluated pattern will have a child pattern. Another is removing unneeded results caused by evaluating multiple filters.

The sole purpose of this class is to execute these post-processing tasks.

Methods

Public Class methods

This is just a convenience method to call all the post-processing functionality and checks.

Apply the ensure_presence_of_pattern constraint to the full extractor.
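The idea of checking this constraint only after the whole result tree exists can be sketched as follows. This is a minimal illustration under stated assumptions, not Scrubyt's actual implementation: ResultNode, child? and ensure_presence are hypothetical names introduced here, and the real extractor works on its own pattern and result objects.

```ruby
# Hypothetical stand-in for an extracted result node; not Scrubyt's data model.
ResultNode = Struct.new(:name, :children) do
  # True if this node has a direct child with the given name.
  def child?(child_name)
    children.any? { |c| c.name == child_name }
  end
end

# Drop every result named `target` that has no direct child named `required`.
# Because this runs on the finished tree, all children are already known,
# which is exactly what is missing during top-to-bottom evaluation.
def ensure_presence(node, target, required)
  kept = node.children.reject { |c| c.name == target && !c.child?(required) }
  kept.each { |c| ensure_presence(c, target, required) }
  node.children = kept
  node
end

root = ResultNode.new("root", [
  ResultNode.new("book", [ResultNode.new("title", []), ResultNode.new("price", [])]),
  ResultNode.new("book", [ResultNode.new("title", [])])
])

ensure_presence(root, "book", "price")
puts root.children.size
```

In this sketch only the first "book" survives, because the second one never received a "price" child.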

Remove unneeded results of a pattern (caused by evaluating multiple filters). See, for example, the B&N scenario: the book titles are extracted twice for every pattern (since both examples generate the same XPath for them), but since only one of the results ever has a price, the other is discarded.
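The duplicate removal described above might look roughly like this. It is a simplified sketch that assumes flat records instead of Scrubyt's result tree; Record and remove_filter_duplicates are hypothetical names used only for illustration.

```ruby
# Hypothetical flat record standing in for one extracted result.
Record = Struct.new(:title, :price)

# For each title, keep the record that has a price. If no record in the
# group has one, fall back to the first record so nothing is lost silently.
def remove_filter_duplicates(records)
  records.group_by(&:title).map do |_title, group|
    group.find { |r| r.price } || group.first
  end
end

records = [
  Record.new("Book A", "$10"),
  Record.new("Book A", nil),   # duplicate produced by the second filter
  Record.new("Book B", nil),   # duplicate produced by the first filter
  Record.new("Book B", "$12")
]

deduped = remove_filter_duplicates(records)
deduped.each { |r| puts "#{r.title}: #{r.price}" }
```

Grouping by title mirrors the fact that both filters yield the same XPath for the title, so the presence of a price is the only thing that tells the wanted result apart from the unneeded one.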

Issue an error report if nothing was extracted from the document. This is probably because the structure of the page changed or because of some rather nasty bug. In any case, something is going wrong, and we need to inform the user about it!
