debian/0000755000000000000000000000000012302376637007177 5ustar debian/postinst0000644000000000000000000000047311331173066011001 0ustar #!/bin/sh
set -e
if [ -d /usr/share/doc/urlgrabber-2.9.9 ]; then
    rm -rf /usr/share/doc/urlgrabber-2.9.9
fi
if [ -d /usr/share/doc/urlgrabber-2.9.8 ]; then
    rm -rf /usr/share/doc/urlgrabber-2.9.8
fi
if [ -d /usr/share/doc/urlgrabber-2.9.7 ]; then
    rm -rf /usr/share/doc/urlgrabber-2.9.7
fi
#DEBHELPER#
debian/docs0000644000000000000000000000000511331173066010035 0ustar TODO
debian/copyright0000644000000000000000000000307411410165451011123 0ustar This package was debianized by Anand Kumria on Sun, 9 Oct 2005 13:06:55 +1000.

It was originally downloaded from http://linux.duke.edu/projects/urlgrabber/
It can now be downloaded from http://urlgrabber.baseurl.org/

Upstream Authors: Michael D. Stenner, Ryan Tomayko, Seth Vidal

Copyright: © 2002-2006 Michael D. Stenner, Ryan Tomayko
Copyright: © 2009 Red Hat Inc, pycurl code written by Seth Vidal

License:

This package is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2 of the License, or (at
your option) any later version.

This package is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this package; if not, write to the Free Software Foundation,
Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

On Debian systems, the complete text of the GNU Lesser General Public
License can be found in `/usr/share/common-licenses/LGPL-2.1'.

The Debian packaging is © 2007, Kevin Coyner and is licensed under the
GPL, see `/usr/share/common-licenses/GPL'.
debian/urlgrabber.xml0000644000000000000000000001452011331173066012042 0ustar urlgrabber 1

urlgrabber - a high-level cross-protocol url-grabber.

urlgrabber [OPTIONS] URL [FILE]

DESCRIPTION

urlgrabber is a binary program and python module for fetching files. It is designed to be used in programs that need common (but not necessarily simple) url-fetching features.

OPTIONS

--help, -h
    help page specifying available options to the binary program.

--copy-local
    ignored except for file:// urls, in which case it specifies whether urlgrab should still make a copy of the file, or simply point to the existing copy.

--throttle=NUMBER
    if it's an int, it's the bytes/second throttle limit. If it's a float, it is first multiplied by bandwidth. If throttle == 0, throttling is disabled. If None, the module-level default (which can be set with set_throttle) is used.

--bandwidth=NUMBER
    the nominal max bandwidth in bytes/second. If throttle is a float and bandwidth == 0, throttling is disabled. If None, the module-level default (which can be set with set_bandwidth) is used.

--range=RANGE
    a tuple of the form first_byte,last_byte describing a byte range to retrieve. Either or both of the values may be specified. If first_byte is None, byte offset 0 is assumed. If last_byte is None, the last byte available is assumed. Note that both first and last_byte values are inclusive so a range of (10,11) would return the 10th and 11th bytes of the resource.

--user-agent=STR
    the user-agent string to provide if the url is HTTP.

--retry=NUMBER
    the number of times to retry the grab before bailing. If this is zero, it will retry forever. This was intentional… really, it was :).
If this value is not supplied, or is supplied but is None, retrying does not occur.

--retrycodes
    a sequence of errorcodes (values of e.errno) for which it should retry. See the doc on URLGrabError for more details on this. retrycodes defaults to -1,2,4,5,6,7 if not specified explicitly.

MODULE USE EXAMPLES

In its simplest form, urlgrabber can be a replacement for urllib2's urlopen, or even python's file if you're just reading:

    from urlgrabber import urlopen
    fo = urlopen(url)
    data = fo.read()
    fo.close()

Here, the url can be http, https, ftp, or file. It's also pretty smart so if you just give it something like /tmp/foo, it will figure it out. For even more fun, you can also do:

    from urlgrabber import urlgrab, urlread
    local_filename = urlgrab(url)  # grab a local copy of the file
    data = urlread(url)            # just read the data into a string

Now, like urllib2, what's really happening here is that you're using a module-level object (called a grabber) that kind of serves as a default. That's just fine, but you might want to get your own private version for a couple of reasons:

* it's a little ugly to modify the default grabber because you have to reach into the module to do it
* you could run into conflicts if different parts of the code modify the default grabber and therefore expect different behavior

Therefore, you're probably better off making your own. This also gives you lots of flexibility for later, as you'll see:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber()
    data = g.urlread(url)

This is nice because you can specify options when you create the grabber. For example, let's turn on simple reget mode so that if we have part of a file, we only need to fetch the rest:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url)

The available options are listed in the module documentation, and can usually be specified as a default at the grabber-level or as options to the method:

    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url, filename=None, reget=None)

AUTHORS

Written by:
Michael D. Stenner <mstenner@linux.duke.edu>
Ryan Tomayko <rtomayko@naeblis.cx>

This manual page was written by Kevin Coyner <kevin@rustybear.com> for the Debian system (but may be used by others). It borrows heavily from the documentation included in the urlgrabber module. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.

RESOURCES

Main web site: http://linux.duke.edu/projects/urlgrabber/
debian/changelog0000644000000000000000000001271312302376637011055 0ustar urlgrabber (3.9.1-4ubuntu3) trusty; urgency=medium

  * Rebuild to drop files installed into /usr/share/pyshared.

 -- Matthias Klose Sun, 23 Feb 2014 13:54:39 +0000

urlgrabber (3.9.1-4ubuntu2) precise; urgency=low

  * Rebuild to drop python2.6 dependencies.

 -- Matthias Klose Sat, 31 Dec 2011 02:15:09 +0000

urlgrabber (3.9.1-4ubuntu1) oneiric; urgency=low

  * Apply 674d545e from upstream development version to fix incorrect
    documentation for progress_object callback. (LP #776555)
  * Convert to dh_python2.

 -- Daniel T Chen Tue, 09 Aug 2011 17:45:08 -0400

urlgrabber (3.9.1-4) unstable; urgency=low

  * Add two patches created from upstream development version.
    Closes: #587575.
  * Changed to 3.0 quilt format:
    + Add quilt to build-depends.
    + Add quilt command to debian/rules.
 -- Kevin Coyner Thu, 08 Jul 2010 17:40:08 +0000

urlgrabber (3.9.1-3) unstable; urgency=low

  * Add Depends on python-pycurl. Closes: #587000.

 -- Kevin Coyner Fri, 25 Jun 2010 02:04:19 +0000

urlgrabber (3.9.1-2) unstable; urgency=low

  * Install with only the default python version to avoid unnecessary
    depends on python2.5. Changes made to debian/rules. Closes: #587006.
    Thanks to Stefano Rivera.
  * Debian files uploaded to svn repository on alioth for python modules.
    Closes: #587004.

 -- Kevin Coyner Fri, 25 Jun 2010 01:25:25 +0000

urlgrabber (3.9.1-1) unstable; urgency=low

  * New upstream release. The main backend was changed from urllib2 to
    pycurl. The API is identical. Callers do not need to change anything.
    Closes: #518436, #517993, #493251, #586400, #529752.
  * debian/control:
    + Bumped standards version to 3.8.4. No changes needed.
    + Bumped debhelper version to 7.4~.
    + Removed build dependency on dpatch.
    + Added build dependency on python-pycurl.
    + Updated homepage.
    + Set XS-Python-Version: >= 2.5
  * Removed keepalive.py patch.
  * Updated debian/watch for new homepage.
  * Update homepage reference in debian/copyright. Closes: #586399.
    Updated copyright information for additional new author Seth Vidal.
  * Add debian/source/format file set to 3.0 (quilt).
  * debian/rules:
    + Run setup.py with current python version only.
    + Respect nocheck in DEB_BUILD_OPTIONS (although failed tests don't
      abort build, as several tests fail).
    + Changed dh_clean -k to dh_prep to conform with debhelper version 7.
  * Bumped debian/compat from 5 to 7.

 -- Kevin Coyner Mon, 21 Jun 2010 20:36:19 +0000

urlgrabber (3.1.0-5) unstable; urgency=low

  [ Piotr Ożarowski ]
  * Homepage field added
  * Rename XS-Vcs-* fields to Vcs-* (dpkg supports them now)

  [ Sandro Tosi ]
  * debian/control
    - switch Vcs-Browser field to viewsvn

  [ Jakub Wilk ]
  * Build-depend on python-all rather than python-all-dev.
  * Remove superfluous references to CFLAGS from debian/rules.
  * Prepare for Python 2.6 transition (closes: #556161).
  * Typographical fixes in debian/copyright.
  * Bump standards version to 3.8.3 (no additional changes needed).
  * Point to the versioned LGPL-2.1 in debian/copyright.
  * Add README.source.

 -- Debian Python Modules Team Sat, 14 Nov 2009 09:37:03 +0100

urlgrabber (3.1.0-4) unstable; urgency=low

  * Patch to have urlgrabber.keepalive.HTTPHandler use Request.get_method()
    to determine the appropriate HTTP method. Thanks to Jakub Wilk.
    Closes: #433724
  * Changed maintainer e-mail to reflect new Debian account.
  * Added dpatch as Build-Depends to debian/control.

 -- Kevin Coyner Sat, 04 Aug 2007 21:52:14 -0400

urlgrabber (3.1.0-3) unstable; urgency=low

  * debian/control: Added python modules packaging team to uploaders and
    added VCS fields.

 -- Kevin Coyner Mon, 09 Apr 2007 19:27:36 -0600

urlgrabber (3.1.0-2) unstable; urgency=low

  * debian/control: Changed "Architecture: any" to all.

 -- Kevin Coyner Mon, 09 Apr 2007 15:20:02 -0600

urlgrabber (3.1.0-1) unstable; urgency=low

  * New upstream release.
  * New maintainer. (Closes: #418095)
  * Added man page.
  * Cleaned up cruft in debian/rules.
  * Rewrote debian/copyright.
  * Cleaned up debian/control and added homepage.
  * Added debian/README.Debian.
  * Added debian/postinst to clean up unneeded docs that were
    inappropriately added in previous versions.
  * Removed unneeded debian/pycompat file.
 -- Kevin Coyner Fri, 06 Apr 2007 22:27:03 -0400

urlgrabber (2.9.9-1) unstable; urgency=low

  * New upstream release
  * Apply Ana Beatriz Guerrero Lopez's patch to:
  * Update to new Python policy (Closes: #373402)
  * Switch to standards version 3.7.2
  * Update to debhelper compat level 5
  * Thanks Ana!

 -- Anand Kumria Thu, 6 Jul 2006 09:16:37 +1000

urlgrabber (2.9.7-2) unstable; urgency=low

  * When I imported urlgrabber into bzr, I somehow lost a Build-Dep: on
    python. Re-adding it so I can (Closes: #335340)

 -- Anand Kumria Sat, 31 Dec 2005 15:34:22 +1100

urlgrabber (2.9.7-1) unstable; urgency=low

  * New upstream release (Closes: #344934)

 -- Anand Kumria Sat, 31 Dec 2005 15:34:22 +1100

urlgrabber (2.9.6-1) unstable; urgency=low

  * Initial release (Closes: #312698)

 -- Anand Kumria Sun, 9 Oct 2005 13:06:55 +1000
debian/compat0000644000000000000000000000000211410233004010350 0ustar 7
debian/patches/0000755000000000000000000000000011620324527010617 5ustar debian/patches/progress_fix.diff0000644000000000000000000000067111415412171014162 0ustar --- urlgrabber-3.9.1/urlgrabber/progress.py.orig	2010-07-02 21:25:51.000000000 -0400
+++ urlgrabber-3.9.1/urlgrabber/progress.py	2010-07-02 20:30:25.000000000 -0400
@@ -658,6 +658,8 @@
     if seconds is None or seconds < 0:
         if use_hours: return '--:--:--'
         else:         return '--:--'
+    elif seconds == float('inf'):
+        return 'Infinite'
     else:
         seconds = int(seconds)
         minutes = seconds / 60
debian/patches/progress_object_callback_fix.diff0000644000000000000000000000144611620324530017324 0ustar From: James Antill
Date: Thu, 19 May 2011 20:17:14 +0000 (-0400)
Subject: Fix documentation for progress_object callback.
X-Git-Url: http://yum.baseurl.org/gitweb?p=urlgrabber.git;a=commitdiff_plain;h=674d545ee303aa99701ffb982536851572d8db77

Fix documentation for progress_object callback.
---

diff --git a/urlgrabber/grabber.py b/urlgrabber/grabber.py
index 36212cf..f6f57bd 100644
--- a/urlgrabber/grabber.py
+++ b/urlgrabber/grabber.py
@@ -49,7 +49,7 @@ GENERAL ARGUMENTS (kwargs)
   progress_obj = None

     a class instance that supports the following methods:
-      po.start(filename, url, basename, length, text)
+      po.start(filename, url, basename, size, now, text)
       # length will be None if unknown
       po.update(read) # read == bytes read so far
       po.end()
debian/patches/grabber_fix.diff0000644000000000000000000002116611415412207013724 0ustar --- urlgrabber-3.9.1/urlgrabber/grabber.py.orig	2010-07-02 21:24:12.000000000 -0400
+++ urlgrabber-3.9.1/urlgrabber/grabber.py	2010-07-02 20:30:25.000000000 -0400
@@ -68,14 +68,14 @@
   (which can be set on default_grabber.throttle) is used. See
   BANDWIDTH THROTTLING for more information.

-  timeout = None
+  timeout = 300

-    a positive float expressing the number of seconds to wait for socket
-    operations. If the value is None or 0.0, socket operations will block
-    forever. Setting this option causes urlgrabber to call the settimeout
-    method on the Socket object used for the request. See the Python
-    documentation on settimeout for more information.
-    http://www.python.org/doc/current/lib/socket-objects.html
+    a positive integer expressing the number of seconds to wait before
+    timing out attempts to connect to a server. If the value is None
+    or 0, connection attempts will not time out. The timeout is passed
+    to the underlying pycurl object as its CONNECTTIMEOUT option, see
+    the curl documentation on CURLOPT_CONNECTTIMEOUT for more information.
+    http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTCONNECTTIMEOUT

   bandwidth = 0

@@ -439,6 +445,12 @@
 except:
     __version__ = '???'

+try:
+    # this part isn't going to do much - need to talk to gettext
+    from i18n import _
+except ImportError, msg:
+    def _(st): return st
+
 ########################################################################
 # functions for debugging output. These functions are here because they
 # are also part of the module initialization.
@@ -808,7 +814,7 @@
         self.prefix = None
         self.opener = None
         self.cache_openers = True
-        self.timeout = None
+        self.timeout = 300
         self.text = None
         self.http_headers = None
         self.ftp_headers = None
@@ -1052,9 +1058,15 @@
         self._reget_length = 0
         self._prog_running = False
         self._error = (None, None)
-        self.size = None
+        self.size = 0
+        self._hdr_ended = False
         self._do_open()
+
+    def geturl(self):
+        """ Provide the geturl() method, used to be got from
+            urllib.addinfourl, via. urllib.URLopener.* """
+        return self.url

     def __getattr__(self, name):
         """This effectively allows us to wrap at the instance level.
@@ -1085,9 +1097,14 @@
             return -1

     def _hdr_retrieve(self, buf):
+        if self._hdr_ended:
+            self._hdr_dump = ''
+            self.size = 0
+            self._hdr_ended = False
+
         if self._over_max_size(cur=len(self._hdr_dump),
                                max_size=self.opts.max_header_size):
-            return -1
+            return -1
         try:
             self._hdr_dump += buf
             # we have to get the size before we do the progress obj start
@@ -1104,7 +1121,17 @@
                 s = parse150(buf)
             if s:
                 self.size = int(s)
-
+
+            if buf.lower().find('location') != -1:
+                location = ':'.join(buf.split(':')[1:])
+                location = location.strip()
+                self.scheme = urlparse.urlsplit(location)[0]
+                self.url = location
+
+            if len(self._hdr_dump) != 0 and buf == '\r\n':
+                self._hdr_ended = True
+                if DEBUG: DEBUG.info('header ended:')
+
             return len(buf)
         except KeyboardInterrupt:
             return pycurl.READFUNC_ABORT
@@ -1113,8 +1140,10 @@
         if self._parsed_hdr:
             return self._parsed_hdr
         statusend = self._hdr_dump.find('\n')
+        statusend += 1 # ridiculous as it may seem.
         hdrfp = StringIO()
         hdrfp.write(self._hdr_dump[statusend:])
+        hdrfp.seek(0)
         self._parsed_hdr = mimetools.Message(hdrfp)
         return self._parsed_hdr

@@ -1136,6 +1165,7 @@
         self.curl_obj.setopt(pycurl.PROGRESSFUNCTION, self._progress_update)
         self.curl_obj.setopt(pycurl.FAILONERROR, True)
         self.curl_obj.setopt(pycurl.OPT_FILETIME, True)
+        self.curl_obj.setopt(pycurl.FOLLOWLOCATION, True)

         if DEBUG:
             self.curl_obj.setopt(pycurl.VERBOSE, True)
@@ -1148,9 +1178,11 @@

         # timeouts
         timeout = 300
-        if opts.timeout:
-            timeout = int(opts.timeout)
-        self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout)
+        if hasattr(opts, 'timeout'):
+            timeout = int(opts.timeout or 0)
+        self.curl_obj.setopt(pycurl.CONNECTTIMEOUT, timeout)
+        self.curl_obj.setopt(pycurl.LOW_SPEED_LIMIT, 1)
+        self.curl_obj.setopt(pycurl.LOW_SPEED_TIME, timeout)

         # ssl options
         if self.scheme == 'https':
@@ -1276,7 +1308,7 @@
             raise err

         elif errcode == 60:
-            msg = _("client cert cannot be verified or client cert incorrect")
+            msg = _("Peer cert cannot be verified or peer cert invalid")
             err = URLGrabError(14, msg)
             err.url = self.url
             raise err
@@ -1291,7 +1323,12 @@
             raise err

         elif str(e.args[1]) == '' and self.http_code != 0: # fake it until you make it
-            msg = 'HTTP Error %s : %s ' % (self.http_code, self.url)
+            if self.scheme in ['http', 'https']:
+                msg = 'HTTP Error %s : %s ' % (self.http_code, self.url)
+            elif self.scheme in ['ftp']:
+                msg = 'FTP Error %s : %s ' % (self.http_code, self.url)
+            else:
+                msg = "Unknown Error: URL=%s , scheme=%s" % (self.url, self.scheme)
         else:
             msg = 'PYCURL ERROR %s - "%s"' % (errcode, str(e.args[1]))
             code = errcode
@@ -1299,6 +1336,12 @@
             err.code = code
             err.exception = e
             raise err
+        else:
+            if self._error[1]:
+                msg = self._error[1]
+                err = URLGrabError(14, msg)
+                err.url = self.url
+                raise err

     def _do_open(self):
         self.curl_obj = _curl_cache
@@ -1446,9 +1489,23 @@
             # set the time
             mod_time = self.curl_obj.getinfo(pycurl.INFO_FILETIME)
             if mod_time != -1:
-                os.utime(self.filename, (mod_time, mod_time))
+                try:
+                    os.utime(self.filename, (mod_time, mod_time))
+                except OSError, e:
+                    err = URLGrabError(16, _(\
+                      'error setting timestamp on file %s from %s, OSError: %s')
+                              % (self.filename, self.url, e))
+                    err.url = self.url
+                    raise err
             # re open it
-            self.fo = open(self.filename, 'r')
+            try:
+                self.fo = open(self.filename, 'r')
+            except IOError, e:
+                err = URLGrabError(16, _(\
+                  'error opening file from %s, IOError: %s') % (self.url, e))
+                err.url = self.url
+                raise err
+
         else:
             #self.fo = open(self._temp_name, 'r')
             self.fo.seek(0)
@@ -1532,11 +1589,14 @@
     def _over_max_size(self, cur, max_size=None):

         if not max_size:
-            max_size = self.size
-        if self.opts.size: # if we set an opts size use that, no matter what
-            max_size = self.opts.size
+            if not self.opts.size:
+                max_size = self.size
+            else:
+                max_size = self.opts.size
+
         if not max_size: return False # if we have None for all of the Max then this is dumb
-        if cur > max_size + max_size*.10:
+
+        if cur > int(float(max_size) * 1.10):

             msg = _("Downloaded more than max size for %s: %s > %s") \
                         % (self.url, cur, max_size)
@@ -1582,9 +1642,21 @@
             self.opts.progress_obj.end(self._amount_read)
         self.fo.close()
-
+
+    def geturl(self):
+        """ Provide the geturl() method, used to be got from
+            urllib.addinfourl, via. urllib.URLopener.* """
+        return self.url

 _curl_cache = pycurl.Curl() # make one and reuse it over and over and over

+def reset_curl_obj():
+    """To make sure curl has reread the network/dns info we force a reload"""
+    global _curl_cache
+    _curl_cache.close()
+    _curl_cache = pycurl.Curl()
+
+
 #####################################################################
 # DEPRECATED FUNCTIONS
debian/patches/series0000644000000000000000000000010511620324411012020 0ustar grabber_fix.diff
progress_fix.diff
progress_object_callback_fix.diff
debian/README.Debian0000644000000000000000000000042211331173066011226 0ustar urlgrabber for Debian
---------------------

The files keepalive.py and byterange.py are generic urllib2 extension
modules and can be used to add keepalive and range support to any
urllib2 application.

 -- Kevin Coyner Fri, 6 Apr 2007 22:01:01 -0400
debian/control0000644000000000000000000000262011620326276010576 0ustar Source: urlgrabber
Section: python
Priority: optional
Maintainer: Ubuntu Developers
XSBC-Original-Maintainer: Kevin Coyner
Uploaders: Debian Python Modules Team
Build-Depends: debhelper (>= 7.4~), python-all (>= 2.6.6-3~), python-pycurl, quilt (>= 0.46-7~)
Standards-Version: 3.8.4
Homepage: http://urlgrabber.baseurl.org/
Vcs-Svn: svn://svn.debian.org/python-modules/packages/urlgrabber/trunk/
Vcs-Browser: http://svn.debian.org/viewsvn/python-modules/packages/urlgrabber/trunk/
XS-Python-Version: >= 2.5

Package: python-urlgrabber
Architecture: all
Depends: ${shlibs:Depends}, ${misc:Depends}, ${python:Depends}, python-pycurl
Provides: ${python:Provides}
Description: A high-level cross-protocol url-grabber
 urlgrabber dramatically simplifies the fetching of files. It is designed
 to be used in programs that need common (but not necessarily simple)
 url-fetching features. This package provides both a binary and a module,
 both of the name urlgrabber.
 .
 It supports identical behavior for http://, ftp:// and file:/// URIs. It
 provides HTTP keepalive, byte ranges, regets, progress meters, throttling,
 retries, access to authenticated http/ftp servers, and proxies.
 Additionally it has the ability to treat a list of mirrors as a single
 source and to automatically switch mirrors if there is a failure.
debian/dirs0000644000000000000000000000001011331173066010042 0ustar usr/bin
debian/urlgrabber.txt0000644000000000000000000001160411331173066012061 0ustar URLGRABBER(1)
=============

NAME
----
urlgrabber - a high-level cross-protocol url-grabber.

SYNOPSIS
--------
'urlgrabber' [OPTIONS] URL [FILE]

DESCRIPTION
-----------
urlgrabber is a binary program and python module for fetching files. It is
designed to be used in programs that need common (but not necessarily
simple) url-fetching features.

OPTIONS
-------
--help, -h::
help page specifying available options to the binary program.

--copy-local::
ignored except for file:// urls, in which case it specifies whether urlgrab
should still make a copy of the file, or simply point to the existing copy.

--throttle=NUMBER::
if it's an int, it's the bytes/second throttle limit. If it's a float, it is
first multiplied by bandwidth. If throttle == 0, throttling is disabled. If
None, the module-level default (which can be set with set_throttle) is used.

--bandwidth=NUMBER::
the nominal max bandwidth in bytes/second. If throttle is a float and
bandwidth == 0, throttling is disabled. If None, the module-level default
(which can be set with set_bandwidth) is used.

--range=RANGE::
a tuple of the form first_byte,last_byte describing a byte range to retrieve.
Either or both of the values may be specified. If first_byte is None, byte
offset 0 is assumed. If last_byte is None, the last byte available is
assumed. Note that both first and last_byte values are inclusive so a range
of (10,11) would return the 10th and 11th bytes of the resource.

--user-agent=STR::
the user-agent string to provide if the url is HTTP.

--retry=NUMBER::
the number of times to retry the grab before bailing. If this is zero, it
will retry forever. This was intentional... really, it was :). If this value
is not supplied, or is supplied but is None, retrying does not occur.

--retrycodes::
a sequence of errorcodes (values of e.errno) for which it should retry. See
the doc on URLGrabError for more details on this. retrycodes defaults to
-1,2,4,5,6,7 if not specified explicitly.

MODULE USE EXAMPLES
-------------------
In its simplest form, urlgrabber can be a replacement for urllib2's urlopen,
or even python's file if you're just reading:

..................................
from urlgrabber import urlopen
fo = urlopen(url)
data = fo.read()
fo.close()
..................................

Here, the url can be http, https, ftp, or file. It's also pretty smart so if
you just give it something like /tmp/foo, it will figure it out. For even
more fun, you can also do:

..................................
from urlgrabber import urlgrab, urlread
local_filename = urlgrab(url)   # grab a local copy of the file
data = urlread(url)             # just read the data into a string
..................................

Now, like urllib2, what's really happening here is that you're using a
module-level object (called a grabber) that kind of serves as a default.
That's just fine, but you might want to get your own private version for a
couple of reasons:

..................................
* it's a little ugly to modify the default grabber because you have to
  reach into the module to do it
* you could run into conflicts if different parts of the code modify the
  default grabber and therefore expect different behavior
..................................

Therefore, you're probably better off making your own. This also gives you
lots of flexibility for later, as you'll see:

..................................
from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)
..................................

This is nice because you can specify options when you create the grabber.
For example, let's turn on simple reget mode so that if we have part of a
file, we only need to fetch the rest:

..................................
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)
..................................

The available options are listed in the module documentation, and can
usually be specified as a default at the grabber-level or as options to the
method (a combined sketch of several of these options follows the AUTHORS
section below):

..................................
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url, filename=None, reget=None)
..................................

AUTHORS
-------
Written by:
Michael D. Stenner
Ryan Tomayko

This manual page was written by Kevin Coyner for the Debian system (but may
be used by others). It borrows heavily from the documentation included in
the urlgrabber module. Permission is granted to copy, distribute and/or
modify this document under the terms of the GNU General Public License,
Version 2 or any later version published by the Free Software Foundation.
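As promised above, a short combined sketch of the retry, throttle and range
options documented in this page. This example is not part of the original
manual: the URL is a placeholder and the option values are arbitrary; the
range endpoints follow the inclusive semantics described under --range above.

..................................
from urlgrabber.grabber import URLGrabber

# retry up to 3 times before bailing; throttle to roughly 100 kB/s
g = URLGrabber(retry=3, throttle=102400)

# fetch only the first kilobyte of a (hypothetical) remote file;
# per the --range text above, both endpoints are inclusive
local_filename = g.urlgrab('http://example.com/big.iso', range=(0, 1023))
..................................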
RESOURCES
---------
Main web site: http://linux.duke.edu/projects/urlgrabber/[]
debian/rules0000755000000000000000000000354611620335655010264 0ustar #!/usr/bin/make -f
# -*- makefile -*-
# Sample debian/rules that uses debhelper.
# This file was originally written by Joey Hess and Craig Small.
# As a special exception, when this file is copied by dh-make into a
# dh-make output file, you may use that output file without restriction.
# This special exception was added by Craig Small in version 0.37 of dh-make.

# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1

%:
	dh $@ --with quilt,python2

PYVERS = $(shell pyversions -r)

configure: configure-stamp
configure-stamp:
	dh_testdir
	touch configure-stamp

build: build-stamp
build-stamp: configure-stamp
	dh_testdir
	python setup.py build
ifeq (,$(findstring nocheck,$(DEB_BUILD_OPTIONS)))
	# we should set -e, but all the tests fail.
	for py in $(PYVERS); do \
		$$py test/runtests.py; \
	done
endif
	touch build-stamp

clean:
	dh_testdir
	dh_testroot
	-rm -f build-stamp configure-stamp
	for py in $(PYVERS); do \
		$$py setup.py clean; \
	done
	python setup.py clean
	find $(CURDIR) -name "*.pyc" -exec rm -f '{}' \;
	dh_clean

install: build
	dh_testdir
	dh_testroot
	dh_prep
	dh_installdirs
	# Add here commands to install the package into debian/urlgrabber.
	python setup.py install --prefix=/usr --root=$(CURDIR)/debian/python-urlgrabber --install-layout=deb
	# remove unneeded documents installed by setup.py
	-rm -rf $(CURDIR)/debian/python-urlgrabber/usr/share/doc/urlgrabber-3*

# Build architecture-independent files here.
binary-indep: build install
	dh_testdir
	dh_testroot
	dh_installchangelogs ChangeLog
	dh_installdocs
	dh_installman debian/urlgrabber.1
	dh_link
	dh_strip
	dh_compress
	dh_fixperms
	dh_python2
	dh_installdeb
	dh_shlibdeps
	dh_gencontrol
	dh_md5sums
	dh_builddeb

# Build architecture-dependent files here.
binary-arch: build install

binary: binary-indep binary-arch
.PHONY: build clean binary-indep binary-arch binary install configure
debian/source/0000755000000000000000000000000011410233457010466 5ustar debian/source/format0000644000000000000000000000001411410233457011674 0ustar 3.0 (quilt)
debian/README.source0000644000000000000000000000047711331173066011356 0ustar This package uses quilt to manage all modifications to the upstream
source. Changes are stored in the source package as diffs in
debian/patches and applied during the build.

Please see:

/usr/share/doc/quilt/README.source

for more information on how to apply the patches, modify patches, or
remove a patch.
debian/watch0000644000000000000000000000011211410111575010205 0ustar version=3
http://urlgrabber.baseurl.org/download/urlgrabber-(.*)\.tar\.gz
debian/urlgrabber.10000644000000000000000000001152311331173066011402 0ustar .\" Title: urlgrabber
.\" Author:
.\" Generator: DocBook XSL Stylesheets v1.72.0
.\" Date: 04/09/2007
.\" Manual:
.\" Source:
.\"
.TH "URLGRABBER" "1" "04/09/2007" "" ""
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.SH "NAME"
urlgrabber \- a high\-level cross\-protocol url\-grabber.
.SH "SYNOPSIS"
\fIurlgrabber\fR [OPTIONS] URL [FILE]
.sp
.SH "DESCRIPTION"
urlgrabber is a binary program and python module for fetching files. It is designed to be used in programs that need common (but not necessarily simple) url\-fetching features.
.sp
.SH "OPTIONS"
.PP
\-\-help, \-h
.RS 4
help page specifying available options to the binary program.
.RE
.PP
\-\-copy\-local
.RS 4
ignored except for file:// urls, in which case it specifies whether urlgrab should still make a copy of the file, or simply point to the existing copy.
.RE
.PP
\-\-throttle=NUMBER
.RS 4
if it's an int, it's the bytes/second throttle limit. If it's a float, it is first multiplied by bandwidth. If throttle == 0, throttling is disabled. If None, the module\-level default (which can be set with set_throttle) is used.
.RE
.PP
\-\-bandwidth=NUMBER
.RS 4
the nominal max bandwidth in bytes/second. If throttle is a float and bandwidth == 0, throttling is disabled. If None, the module\-level default (which can be set with set_bandwidth) is used.
.RE
.PP
\-\-range=RANGE
.RS 4
a tuple of the form first_byte,last_byte describing a byte range to retrieve. Either or both of the values may be specified. If first_byte is None, byte offset 0 is assumed. If last_byte is None, the last byte available is assumed. Note that both first and last_byte values are inclusive so a range of (10,11) would return the 10th and 11th bytes of the resource.
.RE
.PP
\-\-user\-agent=STR
.RS 4
the user\-agent string to provide if the url is HTTP.
.RE
.PP
\-\-retry=NUMBER
.RS 4
the number of times to retry the grab before bailing. If this is zero, it will retry forever. This was intentional\&... really, it was :). If this value is not supplied, or is supplied but is None, retrying does not occur.
.RE
.PP
\-\-retrycodes
.RS 4
a sequence of errorcodes (values of e.errno) for which it should retry. See the doc on URLGrabError for more details on this. retrycodes defaults to \-1,2,4,5,6,7 if not specified explicitly.
.RE
.SH "MODULE USE EXAMPLES"
In its simplest form, urlgrabber can be a replacement for urllib2's urlopen, or even python's file if you're just reading:
.sp
.RS 4
.nf
from urlgrabber import urlopen
fo = urlopen(url)
data = fo.read()
fo.close()
.fi
.sp
.RE
Here, the url can be http, https, ftp, or file. It's also pretty smart so if you just give it something like /tmp/foo, it will figure it out. For even more fun, you can also do:
.sp
.RS 4
.nf
from urlgrabber import urlgrab, urlread
local_filename = urlgrab(url)  # grab a local copy of the file
data = urlread(url)            # just read the data into a string
.fi
.sp
.RE
Now, like urllib2, what's really happening here is that you're using a module\-level object (called a grabber) that kind of serves as a default. That's just fine, but you might want to get your own private version for a couple of reasons:
.sp
.RS 4
.nf
* it's a little ugly to modify the default grabber because you have
  to reach into the module to do it
* you could run into conflicts if different parts of the code modify
  the default grabber and therefore expect different behavior
.fi
.sp
.RE
Therefore, you're probably better off making your own. This also gives you lots of flexibility for later, as you'll see:
.sp
.RS 4
.nf
from urlgrabber.grabber import URLGrabber
g = URLGrabber()
data = g.urlread(url)
.fi
.sp
.RE
This is nice because you can specify options when you create the grabber.
For example, let's turn on simple reget mode so that if we have part of a file, we only need to fetch the rest:
.sp
.RS 4
.nf
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url)
.fi
.sp
.RE
The available options are listed in the module documentation, and can usually be specified as a default at the grabber\-level or as options to the method:
.sp
.RS 4
.nf
from urlgrabber.grabber import URLGrabber
g = URLGrabber(reget='simple')
local_filename = g.urlgrab(url, filename=None, reget=None)
.fi
.sp
.RE
.SH "AUTHORS"
Written by:
Michael D. Stenner
Ryan Tomayko
.sp
This manual page was written by Kevin Coyner for the Debian system (but may be used by others). It borrows heavily from the documentation included in the urlgrabber module. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.
.sp
.SH "RESOURCES"
Main web site: \fIhttp://linux.duke.edu/projects/urlgrabber/\fR
.sp
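The package description in debian/control above mentions treating a list of mirrors as a single source. Below is a speculative sketch of that feature, not part of the original manual: it assumes the MirrorGroup class from the urlgrabber.mirror module, and the mirror URLs and file path are hypothetical placeholders.
.sp
.RS 4
.nf
# sketch: mirror failover with urlgrabber.mirror.MirrorGroup
# (the mirror URLs below are hypothetical placeholders)
from urlgrabber.grabber import URLGrabber
from urlgrabber.mirror import MirrorGroup

g = URLGrabber(retry=2)
mg = MirrorGroup(g, ['http://mirror-a.example.org/pub/',
                     'http://mirror-b.example.org/pub/'])
# the relative path is tried against each mirror until one succeeds
local_filename = mg.urlgrab('path/to/file.tar.gz')
.fi
.sp
.RE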