Module urllib

Open an arbitrary URL.

See the following document for more info on URLs:
"Names and Addresses, URIs, URLs, URNs, URCs", at
http://www.w3.org/pub/WWW/Addressing/Overview.html

See also the HTTP spec (from which the error codes are derived):
"HTTP - Hypertext Transfer Protocol", at
http://www.w3.org/pub/WWW/Protocols/

Related standards and specs:
- RFC1808 - the "relative URL" spec. (authoritative status)
- RFC1738 - the "URL standard". (authoritative status)
- RFC1630 - the "URI spec". (informational status)

The object returned by URLopener().open(file) will differ per
protocol.  All you know is that it has methods read(), readline(),
readlines(), fileno(), close() and info().  The read*(), fileno()
and close() methods work like those of open files.
The info() method returns a mimetools.Message object which can be
used to query various info about the object, if available.
(mimetools.Message objects are queried with the getheader() method.)
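
As a rough illustration of the file-like interface described above, the following sketch opens a local file through a file: URL, so no network access is needed. The temporary file, its contents, and the header_value helper are assumptions made for this example. The fallback import is there because later Python versions moved urlopen and pathname2url to urllib.request, where info() returns a different message class.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url          # this module
except ImportError:
    from urllib.request import urlopen, pathname2url  # Python 3 location


def header_value(message, name):
    """Hypothetical helper: read one header from the object returned by info()."""
    # mimetools.Message offers getheader(); Python 3 message objects offer get().
    getter = getattr(message, "getheader", None) or message.get
    return getter(name)


# Create a small local file to open (example data only).
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello\nworld\n")
os.close(fd)

f = urlopen("file:" + pathname2url(path))
try:
    first_line = f.readline()   # behaves like readline() on an open file
    rest = f.read()             # the remaining bytes
    length = header_value(f.info(), "Content-Length")
finally:
    f.close()
    os.remove(path)
```

Which headers info() reports depends on the protocol; for local files the length and type are filled in from the file system.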




Version: 1.17

Classes
  ContentTooShortError
  FancyURLopener
Derived class with handlers for errors we can handle (perhaps).
  URLopener
Class to open URLs.
  addbase
Base class for addinfo and addclosehook.
  addclosehook
Class to add a close hook to an open file.
  addinfo
Class to add an info() method to an open file.
  addinfourl
Class to add info() and geturl() methods to an open file.
  ftpwrapper
Class used by open_ftp() for cache of open FTP connections.
Functions

_is_unicode(x)
 
basejoin(base, url, allow_fragments=True)
Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.
 
ftperrors()
Return the set of errors raised by the FTP class.

getproxies()
Return a dictionary of scheme -> proxy server URL mappings.

getproxies_environment()
Return a dictionary of scheme -> proxy server URL mappings.

localhost()
Return the IP address of the magic hostname 'localhost'.

main()

noheaders()
Return an empty mimetools.Message object.

pathname2url(pathname)
OS-specific conversion from a file system path to a relative URL of the 'file' scheme; not recommended for general use.

proxy_bypass(host)
 
quote(s, safe='/')
quote('abc def') -> 'abc%20def'.

quote_plus(s, safe='')
Quote the query fragment of a URL, replacing ' ' with '+'.

reporthook(blocknum, blocksize, totalsize)
 
splitattr(url)
splitattr('/path;attr1=value1;attr2=value2;...') -> '/path', ['attr1=value1', 'attr2=value2', ...].

splitgophertype(selector)
splitgophertype('/Xselector') --> 'X', 'selector'.

splithost(url)
splithost('//host[:port]/path') --> 'host[:port]', '/path'.

splitnport(host, defport=-1)
Split host and port, returning numeric port.

splitpasswd(user)
splitpasswd('user:passwd') -> 'user', 'passwd'.

splitport(host)
splitport('host:port') --> 'host', 'port'.

splitquery(url)
splitquery('/path?query') --> '/path', 'query'.

splittag(url)
splittag('/path#tag') --> '/path', 'tag'.

splittype(url)
splittype('type:opaquestring') --> 'type', 'opaquestring'.

splituser(host)
splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.

splitvalue(attr)
splitvalue('attr=value') --> 'attr', 'value'.
 
test(args=[])

test1()

thishost()
Return the IP address of the current host.

toBytes(url)
toBytes(u"URL") --> 'URL'.

unquote(s)
unquote('abc%20def') -> 'abc def'.
 
unquote_plus(s)
unquote_plus('%7e/abc+def') -> '~/abc def'.
 
unwrap(url)
unwrap('<URL:type://host/path>') --> 'type://host/path'.

url2pathname(pathname)
OS-specific conversion from a relative URL of the 'file' scheme to a file system path; not recommended for general use.

urlcleanup()

urlencode(query, doseq=0)
Encode a sequence of two-element tuples or dictionary into a URL query string.

urlopen(url, data=...)
Returns: open file-like object

urlretrieve(url, filename=None, reporthook=None, data=None)
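
The split* helpers listed above each peel one component off a URL, so they are typically chained. The sketch below imitates that chain with plain string operations to show how the documented input and output shapes fit together. The function names split_type, split_host, split_user and split_port and the example URL are hypothetical stand-ins, not the module's own code, and edge cases may be handled differently by the real functions.

```python
def split_type(url):
    """'type:opaquestring' -> ('type', 'opaquestring'), else (None, url)."""
    scheme, sep, rest = url.partition(":")
    if sep and scheme and "/" not in scheme:
        return scheme.lower(), rest
    return None, url


def split_host(url):
    """'//host[:port]/path' -> ('host[:port]', '/path'), else (None, url)."""
    if not url.startswith("//"):
        return None, url
    host, sep, path = url[2:].partition("/")
    return host, sep + path


def split_user(host):
    """'user[:passwd]@host[:port]' -> ('user[:passwd]', 'host[:port]')."""
    user, sep, rest = host.rpartition("@")
    return (user, rest) if sep else (None, host)


def split_port(host):
    """'host:port' -> ('host', 'port'); the port stays a string."""
    head, sep, tail = host.rpartition(":")
    if sep and tail.isdigit():
        return head, tail
    return host, None


# Example URL chosen for illustration only.
url = "http://alice:secret@example.com:8080/docs/index.html"
scheme, rest = split_type(url)        # 'http', '//alice:secret@example.com:8080/docs/index.html'
netloc, path = split_host(rest)       # 'alice:secret@example.com:8080', '/docs/index.html'
userinfo, hostport = split_user(netloc)  # 'alice:secret', 'example.com:8080'
hostname, port = split_port(hostport)    # 'example.com', '8080'
```

Each step hands its second result to the next helper, which is why the summaries above describe inputs that begin where the previous output left off.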
Variables
  MAXFTPCACHE = 10
  _ftperrors = None
  _hextochr = {'00': '\x00', '01': '\x01', '02': '\x02', '03': '...
  _hostprog = None
  _localhost = None
  _noheaders = None
  _nportprog = None
  _passwdprog = None
  _portprog = None
  _queryprog = None
  _safemaps = {('/', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop...
  _tagprog = None
  _thishost = None
  _typeprog = None
  _urlopener = None
  _userprog = None
  _valueprog = None
  always_safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstu...
  ftpcache = {}
Function Details

getproxies()

Return a dictionary of scheme -> proxy server URL mappings.

Scan the environment for variables named <scheme>_proxy; this seems to be the standard convention. If you need a different way, you can pass a proxies dictionary to the [Fancy]URLopener constructor.

getproxies_environment()

Return a dictionary of scheme -> proxy server URL mappings.

Scan the environment for variables named <scheme>_proxy; this seems to be the standard convention. If you need a different way, you can pass a proxies dictionary to the [Fancy]URLopener constructor.
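
The environment scan described above can be sketched as follows. This is a minimal illustration, assuming only the <scheme>_proxy naming convention stated in the text; the function name proxies_from_environment, the environ parameter, and the example values are hypothetical, and the real function may apply extra platform-specific rules.

```python
import os


def proxies_from_environment(environ=None):
    """Sketch: build a scheme -> proxy URL mapping from <scheme>_proxy variables."""
    if environ is None:
        environ = os.environ
    suffix = "_proxy"
    proxies = {}
    for name, value in environ.items():
        name = name.lower()
        # Skip unrelated variables and variables set to an empty string.
        if value and name.endswith(suffix):
            proxies[name[:-len(suffix)]] = value
    return proxies


# Example environment (made-up values) passed in explicitly for repeatability.
example_env = {
    "http_proxy": "http://proxy.example:3128",
    "FTP_PROXY": "http://proxy.example:3128",
    "https_proxy": "",
    "PATH": "/usr/bin",
}
proxies = proxies_from_environment(example_env)

# Per the text above, a mapping like this can instead be handed to the opener:
# opener = urllib.FancyURLopener(proxies=proxies)
```

Passing the mapping explicitly is the "different way" the text mentions, and it avoids depending on whatever happens to be set in the process environment.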

quote(s, safe='/')

quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | ","

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

By default, the quote function is intended for quoting the path
section of a URL.  Thus, it will not encode '/'.  This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.
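
A short example of how the safe argument changes the result may help here. The sample strings are invented for illustration; the fallback import is included because later Python versions expose the same functions from urllib.parse.

```python
try:
    from urllib import quote, quote_plus, unquote, unquote_plus        # this module
except ImportError:
    from urllib.parse import quote, quote_plus, unquote, unquote_plus  # Python 3 location

path = "/docs/my file.txt"

# Default safe='/' keeps the slashes, so a whole path can be quoted at once.
quoted_path = quote(path)                # '/docs/my%20file.txt'

# With safe='' the slash is encoded too, which suits a single path segment.
quoted_segment = quote(path, safe="")    # '%2Fdocs%2Fmy%20file.txt'

# quote_plus targets query strings: ' ' becomes '+', and '&' and '=' are escaped.
quoted_query = quote_plus("a b&c=d")     # 'a+b%26c%3Dd'

# The unquote functions reverse the transformation.
round_trip_path = unquote(quoted_path)
round_trip_query = unquote_plus(quoted_query)
```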

splitnport(host, defport=-1)

Split host and port, returning numeric port. Return the given default port if no ':' is found; defaults to -1. Return the numerical port if a valid number is found after ':'. Return None if ':' is present but is not followed by a valid number.
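
The three return cases can be sketched as below. This is an illustrative reimplementation of the documented behaviour, not the module's source; the name split_nport_sketch and the host strings are assumptions.

```python
def split_nport_sketch(host, defport=-1):
    """Sketch of the documented rules: (host, int port), (host, defport) or (host, None)."""
    head, sep, tail = host.rpartition(":")
    if not sep:
        # No ':' at all: fall back to the caller's default port.
        return host, defport
    try:
        return head, int(tail)
    except ValueError:
        # A ':' is present but what follows is not a valid number.
        return head, None


with_port = split_nport_sketch("example.com:8080")       # ('example.com', 8080)
without_port = split_nport_sketch("example.com")         # ('example.com', -1)
custom_default = split_nport_sketch("example.com", 80)   # ('example.com', 80)
bad_port = split_nport_sketch("example.com:http")        # ('example.com', None)
```

Returning None rather than the default for a malformed port lets a caller distinguish "no port given" from "port given but unusable".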

urlencode(query, doseq=0)

Encode a sequence of two-element tuples or dictionary into a URL query string.

If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
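
The effect of doseq and of input ordering can be seen in a small example. The parameter names and values are invented for illustration; the fallback import covers later Python versions, where urlencode lives in urllib.parse.

```python
try:
    from urllib import urlencode          # this module
except ImportError:
    from urllib.parse import urlencode    # Python 3 location

# A sequence of two-element tuples keeps its order in the output.
ordered = urlencode([("b", "2"), ("a", "1")])             # 'b=2&a=1'

# With doseq true, each element of a sequence value becomes its own parameter.
expanded = urlencode([("tag", ["x", "y"])], doseq=True)   # 'tag=x&tag=y'

# Without doseq, the list's string form is quoted as one single value.
collapsed = urlencode([("tag", ["x", "y"])])
```

A list of tuples is the safer input when parameter order matters, since the text above only promises ordering for that form.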

urlopen(url, data=...)

Returns:
open file-like object


Variables Details

_hextochr

Value:
{'00': '\x00',
 '01': '\x01',
 '02': '\x02',
 '03': '\x03',
 '04': '\x04',
 '05': '\x05',
 '06': '\x06',
 '07': '\x07',
...

_safemaps

Value:
{('/',
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-')\
: {'\x00': '%00',
   '\x01': '%01',
   '\x02': '%02',
   '\x03': '%03',
   '\x04': '%04',
   '\x05': '%05',
...

always_safe

Value:
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-'