tagsoup
Text.HTML.TagSoup
Contents
Data structures and parsing
Tag identification
Extraction
Utility
Combinators
Description

This module is for working with HTML/XML. It deals with both well-formed XML and malformed HTML from the web. It features:

  • A lazy parser, based on the HTML 5 specification - see parseTags.
  • A renderer that can write out HTML/XML - see renderTags.
  • Utilities for extracting information from a document - see ~==, sections and partitions.

The standard practice is to parse a String to [Tag String] using parseTags, then operate upon it to extract the necessary information.
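
A minimal sketch of that workflow, assuming plain String input. The helper name linkTargets is hypothetical and not part of the library; only parseTags, isTagOpenName and fromAttrib come from this module.

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- | Hypothetical helper: the href of every <a> open tag in an HTML string.
-- fromAttrib returns "" when the attribute is absent, so those are dropped.
linkTargets :: String -> [String]
linkTargets html =
  [ href
  | tag <- parseTags html
  , isTagOpenName "a" tag            -- guard first: fromAttrib crashes on a non-TagOpen
  , let href = fromAttrib "href" tag
  , not (null href)
  ]
```

Checking isTagOpenName before calling fromAttrib keeps the comprehension total, because fromAttrib is only safe on a TagOpen.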

Synopsis
data Tag str
= TagOpen str [Attribute str]
| TagClose str
| TagText str
| TagComment str
| TagWarning str
| TagPosition !Row !Column
type Row = Int
type Column = Int
type Attribute str = (str, str)
module Text.HTML.TagSoup.Parser
module Text.HTML.TagSoup.Render
canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
isTagOpen :: Tag str -> Bool
isTagClose :: Tag str -> Bool
isTagText :: Tag str -> Bool
isTagWarning :: Tag str -> Bool
isTagPosition :: Tag str -> Bool
isTagOpenName :: Eq str => str -> Tag str -> Bool
isTagCloseName :: Eq str => str -> Tag str -> Bool
fromTagText :: Show str => Tag str -> str
fromAttrib :: (Show str, Eq str, StringLike str) => str -> Tag str -> str
maybeTagText :: Tag str -> Maybe str
maybeTagWarning :: Tag str -> Maybe str
innerText :: StringLike str => [Tag str] -> str
sections :: (a -> Bool) -> [a] -> [[a]]
partitions :: (a -> Bool) -> [a] -> [[a]]
class TagRep a
(~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
(~/=) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
Data structures and parsing
data Tag str
A single HTML element. A whole document is represented by a list of Tag. There is no requirement for TagOpen and TagClose to match.
Constructors
TagOpen str [Attribute str] - An open tag with Attributes in their original order
TagClose str - A closing tag
TagText str - A text node, guaranteed not to be the empty string
TagComment str - A comment
TagWarning str - Meta: A syntax error in the input file
TagPosition !Row !Column - Meta: The position of a parsed element
Instances
Functor Tag
Typeable1 Tag
Eq str => Eq (Tag str)
Data str => Data (Tag str)
Ord str => Ord (Tag str)
Show str => Show (Tag str)
NFData a => NFData (Tag a)
StringLike str => TagRep (Tag str)
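
To illustrate the representation, here is a small document written directly as a list of Tag values, together with a fold over it. The names sampleDoc and openMinusClose are hypothetical and exist only for this sketch.

```haskell
import Text.HTML.TagSoup (Tag(..))

-- | A tiny document built by hand rather than parsed.
sampleDoc :: [Tag String]
sampleDoc =
  [ TagOpen "p" [("id", "intro")]
  , TagText "Hello"
  , TagOpen "br" []            -- never closed: the type does not require matching
  , TagText "world"
  , TagClose "p"
  ]

-- | Hypothetical helper: number of TagOpen values minus number of TagClose values.
-- A non-zero result shows that opens and closes need not pair up.
openMinusClose :: [Tag str] -> Int
openMinusClose = sum . map weight
  where
    weight (TagOpen _ _) = 1
    weight (TagClose _)  = -1
    weight _             = 0
```

Because the list is flat, any nesting has to be recovered by the caller. The type itself carries no tree structure.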
type Row = Int
The row/line of a position, starting at 1
type Column = Int
The column of a position, starting at 1
type Attribute str = (str, str)
An HTML attribute id="name" generates ("id","name")
module Text.HTML.TagSoup.Parser
module Text.HTML.TagSoup.Render
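
Since both the parser and the renderer are re-exported here, a parse, transform, render pipeline needs only this module. The sketch below assumes the default parse and render options. stripComments is a hypothetical helper, not a library function.

```haskell
import Text.HTML.TagSoup (Tag(..), parseTags, renderTags)

-- | Hypothetical helper: remove every comment from an HTML string by
-- parsing it, filtering the tag list, and rendering the result.
stripComments :: String -> String
stripComments = renderTags . filter (not . isComment) . parseTags
  where
    isComment (TagComment _) = True
    isComment _              = False
```

The same shape works for any list transformation placed between parseTags and renderTags.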
canonicalizeTags :: StringLike str => [Tag str] -> [Tag str]
Turns all tag names and attributes to lower case and converts DOCTYPE to upper case.
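
A short sketch of canonicalizeTags on hand-built input. The bindings mixedCase and normalised are illustrative names only.

```haskell
import Text.HTML.TagSoup (Tag(..), canonicalizeTags)

-- | Tags as they might arrive from a page written in upper case.
mixedCase :: [Tag String]
mixedCase = [TagOpen "DIV" [("ID", "main")], TagText "Hello", TagClose "DIV"]

-- | The same tags with element and attribute names lowered,
-- so later name comparisons can use lower-case literals.
normalised :: [Tag String]
normalised = canonicalizeTags mixedCase
```

Normalising once, right after parsing, avoids case-insensitive comparisons in the rest of the program.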
Tag identification
isTagOpen :: Tag str -> Bool
Test if a Tag is a TagOpen
isTagClose :: Tag str -> Bool
Test if a Tag is a TagClose
isTagText :: Tag str -> Bool
Test if a Tag is a TagText
isTagWarning :: Tag str -> Bool
Test if a Tag is a TagWarning
isTagPosition :: Tag str -> Bool
Test if a Tag is a TagPosition
isTagOpenName :: Eq str => str -> Tag str -> Bool
Returns True if the Tag is TagOpen and matches the given name
isTagCloseName :: Eq str => str -> Tag str -> Bool
Returns True if the Tag is TagClose and matches the given name
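
These predicates combine naturally with filter. The following sketch uses hypothetical helper names (countOpen, balanced, textNodes) to show one possible use.

```haskell
import Text.HTML.TagSoup (Tag(..), isTagCloseName, isTagOpenName, isTagText)

-- | Hypothetical helper: how many times an element with this name is opened.
countOpen :: String -> [Tag String] -> Int
countOpen name = length . filter (isTagOpenName name)

-- | Hypothetical helper: True when a name is opened and closed equally often.
-- This compares counts only; it does not check nesting order.
balanced :: String -> [Tag String] -> Bool
balanced name tags =
  countOpen name tags == length (filter (isTagCloseName name) tags)

-- | Hypothetical helper: keep only the text nodes.
textNodes :: [Tag String] -> [Tag String]
textNodes = filter isTagText
```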
Extraction
fromTagText :: Show str => Tag str -> str
Extract the string from within a TagText. Crashes if the Tag is not a TagText.
fromAttrib :: (Show str, Eq str, StringLike str) => str -> Tag str -> str
Extract an attribute value. Crashes if the Tag is not a TagOpen. Returns "" if the attribute is not present.
maybeTagText :: Tag str -> Maybe str
Extract the string from within TagText, otherwise Nothing
maybeTagWarning :: Tag str -> Maybe str
Extract the string from within TagWarning, otherwise Nothing
innerText :: StringLike str => [Tag str] -> str
Extract all text content from tags (similar to Verbatim found in HaXml)
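
Because fromTagText and fromAttrib are partial, code that walks arbitrary tag lists may prefer the Maybe-returning forms. The sketch below is one way to do that. lookupAttrib, textPieces and allText are hypothetical names, and lookupAttrib is written by pattern matching rather than taken from the library.

```haskell
import Data.Maybe (mapMaybe)
import Text.HTML.TagSoup (Tag(..), innerText, maybeTagText)

-- | Hypothetical total alternative to fromAttrib: Nothing for a non-TagOpen
-- or a missing attribute, instead of a crash or "".
lookupAttrib :: String -> Tag String -> Maybe String
lookupAttrib name (TagOpen _ attrs) = lookup name attrs
lookupAttrib _    _                 = Nothing

-- | Every text node as a separate string; non-text tags are skipped.
textPieces :: [Tag String] -> [String]
textPieces = mapMaybe maybeTagText

-- | All text joined into one string, using the library's innerText.
allText :: [Tag String] -> String
allText = innerText
```

Unlike fromAttrib, lookupAttrib distinguishes a missing attribute from one whose value is the empty string.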
Utility
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a list, and returns all suffixes whose first item matches the predicate.
partitions :: (a -> Bool) -> [a] -> [[a]]
This function is similar to sections, but splits the list so no element appears in any two partitions.
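
The difference between the two is easiest to see on a plain list of numbers, followed by a typical use on tags. The names below are hypothetical, and the sketch assumes String input with no whitespace between list items.

```haskell
import Text.HTML.TagSoup (innerText, parseTags, partitions, sections, (~==))

-- | Suffixes that start at an even number: [[2,3,4,5],[4,5]].
-- The suffixes overlap, so 4 and 5 appear twice.
suffixesFromEven :: [[Int]]
suffixesFromEven = sections even [1, 2, 3, 4, 5]

-- | Disjoint chunks that each start at an even number: [[2,3],[4,5]].
-- The leading 1 is dropped because no chunk has started yet.
chunksFromEven :: [[Int]]
chunksFromEven = partitions even [1, 2, 3, 4, 5]

-- | Hypothetical helper: the text of each <li>, one string per item.
itemTexts :: String -> [String]
itemTexts html =
  [ innerText item | item <- partitions (~== "<li>") (parseTags html) ]
```

partitions suits repeated sibling elements such as list items or table rows, while sections is useful when each match needs to look arbitrarily far ahead in the document.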
Combinators
class TagRep a
A class that allows either a String or a Tag str to be used as a match pattern.
(~==) :: (StringLike str, TagRep t) => Tag str -> t -> Bool

Performs an inexact match. The first argument is the Tag being tested and the second is the pattern to match against. A blank string in the pattern is considered to match anything. For example:

 (TagText "test" ~== TagText ""    ) == True
 (TagText "test" ~== TagText "test") == True
 (TagText "test" ~== TagText "soup") == False

For TagOpen, the right-hand side may omit attributes that are present on the left.

(~/=) :: (StringLike str, TagRep t) => Tag str -> t -> Bool
Negation of ~==
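
A sketch of both combinators with a TagOpen pattern that lists only some attributes. The names stylesheetPattern, isStylesheetLink and fromBody are hypothetical and used only for illustration.

```haskell
import Text.HTML.TagSoup (Tag(..), (~/=), (~==))

-- | Pattern: a <link> whose rel attribute is "stylesheet".
-- Attributes not listed here (such as href) are ignored by the match.
stylesheetPattern :: Tag String
stylesheetPattern = TagOpen "link" [("rel", "stylesheet")]

-- | Hypothetical helper: True for any tag matching stylesheetPattern.
isStylesheetLink :: Tag String -> Bool
isStylesheetLink tag = tag ~== stylesheetPattern

-- | Hypothetical helper: drop everything before the first <body> open tag.
fromBody :: [Tag String] -> [Tag String]
fromBody = dropWhile (~/= (TagOpen "body" [] :: Tag String))
```

Pairing dropWhile with ~/= is a common way to skip to a point of interest, since it stops at the first tag that matches the pattern.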
Produced by Haddock version 2.4.2