squid-prefetch-1.1/0000755000000000000000000000000012065721047011170 5ustar squid-prefetch-1.1/debian/0000755000000000000000000000000012065722013012404 5ustar squid-prefetch-1.1/debian/copyright0000644000000000000000000000102312065716613014344 0ustar This is the Debian GNU/Linux prepackaged version of Squid-Prefetch. This package was put together by Brian White . This code is released to the public domain (the only true "free"). Squid-Prefetch was written by Brian White and has been released into the public domain (the only true "free"). It would be apprecated if copies of any improvements of bug-fixes would be sent to the author in order to be incorporated into the main upstream code and propogated to all users of this software. squid-prefetch-1.1/debian/install0000644000000000000000000000006412065717007014004 0ustar squid-prefetch /usr/sbin/ squid-prefetch.conf /etc/ squid-prefetch-1.1/debian/dirs0000644000000000000000000000001712065717321013274 0ustar /usr/sbin /etc squid-prefetch-1.1/debian/changelog0000644000000000000000000000437412065722013014266 0ustar squid-prefetch (1.1-3) unstable; urgency=low * QA upload. * Change maintainer to QA Group * debian/ cleanup: - use debhelper/dh - introduce at least a basic syntax check * Apply patch from Daniel Dumitrache to fix error reporting broken by 1.0-1.1. (closes: #370168) -- Frank Lichtenheld Mon, 24 Dec 2012 01:34:09 +0100 squid-prefetch (1.1-2.3) unstable; urgency=low * Non-maintainer upload. * Fix "maintainer-script-calls-init-script-directly prerm:3 than using invoke-rc.d." (closes: #553129). -- gregor herrmann Sat, 28 Nov 2009 15:02:17 +0100 squid-prefetch (1.1-2.2) unstable; urgency=low * Non-maintainer upload to solve release goal. * Add LSB dependency header to init.d scripts (Closes: #467407). -- Petter Reinholdtsen Sun, 30 Mar 2008 18:16:27 +0200 squid-prefetch (1.1-2.1) unstable; urgency=low * Non-maintainer upload. * debian/rules: fixed bashisms. (Closes: #459131) -- Miguel Angel Ruiz Manzano Mon, 21 Jan 2008 12:15:15 -0300 squid-prefetch (1.1-2) unstable; urgency=medium * added postinst warning if /etc/squid.conf still exists (closes: #308010) * make use of invoke-rc.d (closes: #367767) * allow running an unprivileged user (closes: 308011) * Fix alarm timer handling in case of exceptions; patch from Matej Vela. (closes: #349745) -- Brian White Mon, 4 Dec 2006 14:47:44 -0500 squid-prefetch (1.0-1) unstable; urgency=medium * fixed problem locating squid config file (closes: #267737) * fixed (I think) URL looping problem (closes: #234441) * fixed problem with invalid-url error on prefetch (closes: #234693) -- Brian White Tue, 4 Jan 2005 09:39:52 -0500 squid-prefetch (0.7-1) unstable; urgency=medium * fixed problem reading config file (closes: #237993) -- Brian White Mon, 5 Apr 2004 07:56:32 -0400 squid-prefetch (0.6-1) unstable; urgency=medium * minor bug fixes -- Brian White Mon, 11 Aug 2003 09:45:02 -0400 squid-prefetch (0.5-1) experimental; urgency=low * first Debianized version -- Brian White Mon, 11 Aug 2003 09:45:02 -0400 Local variables: mode: debian-changelog End: squid-prefetch-1.1/debian/init.d0000644000000000000000000000211210773736326013530 0ustar #! /bin/sh ### BEGIN INIT INFO # Provides: squid-prefetch # Required-Start: $remote_fs $syslog squid # Required-Stop: $remote_fs $syslog squid # Default-Start: 2 3 4 5 # Default-Stop: 0 1 6 ### END INIT INFO DAEMON=/usr/sbin/squid-prefetch NAME=squid-prefetch DESC="prefetch daemon for squid" test -f $DAEMON || exit 0 set -e case "$1" in start) echo -n "Starting $DESC: " start-stop-daemon --start --background --quiet --make-pidfile --pidfile /var/run/$NAME.pid \ --exec $DAEMON >/dev/null 2>&1 echo "$NAME." ;; stop) echo -n "Stopping $DESC: " start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid >/dev/null 2>&1 rm -f /var/run/$NAME.pid echo "$NAME." ;; restart|force-reload) # # If the "reload" option is implemented, move the "force-reload" # option to the "reload" entry above. If not, "force-reload" is # just the same as "restart". # $0 stop $0 start ;; *) N=/etc/init.d/$NAME # echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2 echo "Usage: $N {start|stop|restart|force-reload}" >&2 exit 1 ;; esac exit 0 squid-prefetch-1.1/debian/control0000644000000000000000000000135412065721350014015 0ustar Source: squid-prefetch Section: web Priority: optional Maintainer: Debian QA Group Standards-Version: 3.7.2 Build-Depends: debhelper (>= 9~), libwww-perl, liburi-perl Package: squid-prefetch Depends: squid, libwww-perl, liburi-perl, ${perl:Depends}, ${misc:Depends} Architecture: all Description: Simple page-prefetch for Squid web proxy Squid-Prefetch will perform early fetches of pages linked to by pages already read. This means that a user that clicks on a link will have that new page appear instantly instead of having to wait for it to be fetched from the Internet. Only text pages are prefetched on the assumption that the images can be loaded later so long as the text of a page is available for display. squid-prefetch-1.1/debian/compat0000644000000000000000000000000212065717164013615 0ustar 9 squid-prefetch-1.1/debian/rules0000755000000000000000000000021112065721117013462 0ustar #! /usr/bin/make -f # %: dh $@ override_dh_installinit: dh_installinit -- defaults 31 override_dh_auto_test: perl -c squid-prefetch squid-prefetch-1.1/squid-prefetch.conf0000644000000000000000000000772510535074706015000 0ustar ############################################################################### # # Squid-Prefetch Configuration File # # All options appear as # # option_name value # # Blank lines and anything after a "#" is ignored as a comment. This is the # same format as the master squid configuration file. Options are taken first # from this file, then from the master squid configuration, and then built-in # defaults. # ############################################################################### # # prefetch_user, prefetch_group: Userid and group to run as. If set # to non-root, the prefetch program will discard any superuser # privileges it has and run instead as this user. Common userids for # this are "proxy" or "nobody". Make sure that user has access to the # access-log file or squid-prefetch will not run. Numeric ids are # acceptable. # #prefetch_user root #prefetch_group root # # squid_config_file: Pathname of the master squid configuration file for # reading options. Currently, the only option that would be read from that # location are the "cache_access_log" and "http_port" options. # #squid_config_file /etc/squid/squid.conf # # http_proxy: IP address to use for accessing the squid cache. Since the # prefetch program must have access to the squid log files, it's reasonable to # assume that squid is running on the local host. # #http_host 127.0.0.1 # # http_port: What port number to use for accessing the squid cache. Squid # normally runs on port 3128 but can be changed. This value will be read from # the master squid configuration file if not specified here. # #http_port 3128 # # max_history_size: How many fetched URLs to remember. This allows the # prefetch program to avoid doing prefetch on pages it has already done it # for, thus reducing network bandwidth and system load. This is really the # only part of the program that consumes memory, so a balance must be struck # between those two things. # #max_history_size 5000 # # max_history_age: How old a prefetch must be before it will be done again (in # seconds). For those items that have not been expired from the history log # due to size limitations are checked against their age. If the last prefetch # operation was done longer ago than this amount of time, then another is # done. # #max_history_age 86400 # # prefetch_regex: This (perl) pattern is matched against all URLs to limit # what pages get acted upon. This provides a first pruning of what links need # to be acted upon. It's worth using this pattern to indicate only those # pages which can contain text (HTML or otherwise) since other types will be # ignored as soon as the header comes back from the server. Extensions that # are used for dynamically generated pages are best avoided since they are # not usually cacheable. # #prefetch_regex http://.*(\.(html?|te?xt)|/[^\.]*) # # prefetch_options: Allow fetching of URLs with options (anything after "?"). # For the most part, pages that take options are dynamic content and # prefetching them accomplishes little. Use "1" for "yes", "0" for "no". # #prefetch_options 0 # # prefetch_fragments: Allow fetching of URLs with fragment specifications # (anything after "#"). Links to pages with fragments are often word # definitions and the like. Fetching a single page may cache many different # links because it will fetch all the fragments on that page. Use "1" for # "yes", "0" for "no". # #prefetch_fragments 1 # # prefetch_maxsize: This is the maximum size of page that will be prefetched. # It should be large enough to contain most text pages but not so large as to # waste bandwidth fetching huge pages on the off chance a user will go there. # The size is measured in bytes. # #prefetch_maxsize 65536 # # prefetch_cross: Allow prefetching of pages on a different host than the one # doing the linking. If this option is set ("1", then links to other hosts # will also be prefetched. Otherwise ("0"), only pages on the same host as # the page currently being viewed will be fetched. # #prefetch_cross 0 squid-prefetch-1.1/squid-prefetch0000755000000000000000000002310512065721047014042 0ustar #! /usr/bin/perl ############################################################################### # # Squid-Prefetch (v1.1) # # Written by Brian White . # This program has been placed in the public domain (the only true "free"). # ############################################################################### use URI; use Net::HTTP; @ConfFiles = qw(./squid-prefetch.conf $ENV{HOME}/.squid-prefetch /etc/squid-prefetch.conf); $FetchPattern = 'http://.*(\.(html?|te?xt)|/[^\.]*)'; $AccessFile = ""; $LastAccess = 0; %Config; %Squid; %DoneTime; %DoneCount; %DoneFetch; %DonePrefetch; @DoneList; @LinkList; ############################################################################### sub uniq { return () unless @_; my($last) = ""; my(@new) = (); foreach (@_) { next unless $_; if ($_ ne $last) { $last = $_; push(@new,$_); } } return @new; } sub RandomizeArray { my($array,$count) = @_; my $i; $count = @$array unless ($count > 0 && $count <= @$array); for ($i=$[; $i < $count; $i++) { my $random = int(rand($count)); my $temp; $temp = $$array[$random]; $$array[$random]= $$array[$i]; $$array[$i] = $temp; } return scalar @$array; } ############################################################################### sub ReadConfig { my($file,$conf) = @_; my($parm,$valu); open(CONF,"<$file") || die "Error: could not read config file '$file' ($!)\n"; while () { chomp; s/\#.*$//; next if m/^\s*$/; if (($parm,$valu) = m/^(\w+)\s+(.*?)\s*$/) { $conf->{$parm} = $valu; } } close(CONF); } sub ConfigValue { my($parm,$default) = @_; return $Config{$parm} if (exists $Config{$parm}); return $Squid{$parm} if (exists $Squid{$parm}); return $default; } ############################################################################### sub ReadAccessLog { my(@pages); if (time() - $LastAccess > 900) { open(ACCESS,"<$AccessFile") || die "Error: could not read access log $AccessFile ($!)\n"; seek(ACCESS,0,2); # go to end of file $LastAccess = time(); } while () { $LastAccess = time(); @_ = split; next unless ($_[3] =~ m!/2! && $_[5] eq "GET" && $_[6] =~ m!http://! && $_[9] =~ m!^text/!); push @pages,$_[6]; } return @pages; } sub FetchUrl { my($url) = @_; my($data,@links); my($host,$path) = ($url =~ m!http://(?:[^/]+@)?([^/]+)(/.*?)(\#.*)?$!); unless ($host && $path) { print STDERR "Warning: could not parse URL $url\n"; return; } # limit how long we will spend doing the fetch local $SIG{ALRM} = sub { die "Error: Timeout fetching $url\n" }; alarm(3); my ($code,$mesg,%hdrs, $http); eval { $http = Net::HTTP->new(PeerHost => $ProxyHost, PeerPort => $ProxyPort, Host => $host, SendTE => 0, KeepAlive => 0); $http->write_request("GET" => "http://$host$path", "Accept" => "text/html", "Cache-Control" => "only-if-cached", "User-Agent" => "Squid-Prefetch"); ($code,$mesg,%hdrs) = $http->read_response_headers(); print "\nfetch: $url: $code ($mesg)\n"; print " $k -> $v\n" while (($k,$v) = each %hdrs); }; alarm(0); die if $@; if ($code != 200) { print STDERR "Warning: fetch returned code $code ($mesg) for $url\n"; return; } if ($hdrs{"Content-Type"} !~ m!^text/html($|;)!) { print STDERR "Warning: fetch returned non-html content-type \"",$hdrs{"Content-Type"},"\" for $url\n"; return; } if ($hdrs{"Cache-Control"} =~ m/\bno-cache\b/) { print STDERR "Warning: no-cache directive for $url\n"; return; } if ($hdrs{"X-Cache"} =~ m/^MISS\b/ || $hdrs{"X-Cache-Lookup"} =~ m/^MISS\b/) { print STDERR "Warning: squid didn't cache $url\n"; return; } alarm(5); eval { while (1) { my $bufr; my $size = $http->read_entity_body($bufr,4096); # print "\n$size : $bufr"; last unless ($size > 0); $data .= $bufr; while ($data =~ m!]+href\s*=\s*(\"|\'|)([^\"\'>\s]+)\1[^>]*>!gis) { my $uri = URI->new($2); my $lnk = $uri->abs($url); my $frg = ($lnk =~ s/(\#.*)$//); next if ($frg && !$FetchFragments); my $opt = ($lnk =~ s/(\?.*)$//); next if ($opt && !$FetchOptions); my($lnkh,$lnkp) = ($lnk =~ m!http://(?:[^/]+@)?([^/]+)(/.*)$!); next if (exists $DoneTime{$lnk}); next if ($lnk !~ m!^$FetchPattern$!oi); next if ($lnkh ne $host && !$FetchCrossSite); print "found $lnk\n"; unshift @links,$lnk; } $data =~ s!^.*($|<)!!; } }; alarm(0); die if $@; @links = uniq(sort(@links)); RandomizeArray(\@links); push @LinkList,@links; } sub PrefetchUrl { my($url) = @_; my($total); my($host,$path) = ($url =~ m!http://(?:[^/]+@)?([^/]+)(/.*)$!); unless ($host && $path) { print STDERR "Warning: could not parse URL $url\n"; return; } # limit how long we will spend doing the fetch local $SIG{ALRM} = sub { die "Error: Timeout fetching $url\n" }; alarm(3); my ($code,$mesg,%hdrs, $http); eval { $http = Net::HTTP->new(PeerHost => $ProxyHost, PeerPort => $ProxyPort, Host => $host, SendTE => 1, KeepAlive => 0); $http->write_request("GET" => "http://$host$path", "Accept" => "text/*", "User-Agent" => "Squid-Prefetch"); ($code,$mesg,%hdrs) = $http->read_response_headers(); print "\nprefetch: $url: $code ($mesg)\n"; # print " $k -> $v\n" while (($k,$v) = each %hdrs); }; alarm(0); die if $@; if ($code != 200) { print STDERR "Warning: fetch returned code $code ($mesg) for $url\n"; return; } if ($hdrs{"Content-Type"} !~ m!^text/!) { print STDERR "Warning: fetch returned non-text content-type \"$hdrs{Content-Type}\" for $url\n"; return; } if (exists $hdrs{"Content-Length"} && $hdrs{"Content-Length"} > $FetchMaxSize) { print STDERR "Warning: fetch returned oversize content-length ",$hdrs{"Content-Length"}," for $url\n"; return; } alarm(5); eval { while (1) { my $bufr; my $size = $http->read_entity_body($bufr,4096); last unless ($size > 0); $total += $size; last if ($total > $FetchMaxSize); } }; alarm(0); die if $@; } ############################################################################### # read our config file foreach $file (@ConfFiles) { if (-r $file) { ReadConfig($file,\%Config); } } # read Squid config file ReadConfig(ConfigValue("squid_config_file","/etc/squid/squid.conf"),\%Squid); # determine config information $PrefetchUser = ConfigValue("prefetch_user","root"); $PrefetchGroup = ConfigValue("prefetch_group","root"); $AccessFile = ConfigValue("cache_access_log","/var/log/squid/access.log"); $ProxyHost = ConfigValue("http_proxy","127.0.0.1"); $ProxyPort = ConfigValue("http_port",3128); $HistorySize = ConfigValue("max_history_size",5000); $HistoryAge = ConfigValue("max_history_age",24*60*60); $FetchPattern = ConfigValue("prefetch_regex",$FetchPattern); $FetchOptions = ConfigValue("prefetch_options",0); $FetchFragments = ConfigValue("prefetch_fragments",1); $FetchMaxSize = ConfigValue("prefetch_maxsize",65536); $FetchCrossSite = ConfigValue("prefetch_cross",0); if ($PrefetchGroup !~ m/^(\d+)$/) { my $name = $PrefetchGroup; $PrefetchGroup = (getgrnam($name))[2]; die "Error: unknown group '$name'\n" unless (defined $PrefetchGroup); # print STDERR "- switching to gid $PrefetchGroup ($name)...\n"; $) = $PrefetchGroup if ($PrefetchGroup != $)); } if ($PrefetchUser !~ m/^(\d+)$/) { my $name = $PrefetchUser; $PrefetchUser = (getpwnam($name))[2]; die "Error: unknown group '$name'\n" unless (defined $PrefetchUser); # print STDERR "- switching to uid $PrefetchUser...\n"; $> = $PrefetchUser if ($PrefetchUser != $>); } # prefetch pages while (1) { # read access log my @urls = ReadAccessLog(); my $time = time(); my @todo = (); # determine candidate pages that have recently been fetched while (@urls) { my $url = shift @urls; my $frg = ($url =~ s/(\#.*)$//); next if ($frg); my $opt = ($url =~ s/(\?.*)$//); next if ($opt); next if ($url !~ m!^$FetchPattern$!oi); # print STDERR "Note: user fetch of seen page $url\n" if ($DoneTime{$url} && !$DonePrefetch{$url}); # remember this URL $DoneTime{$url} = $time; $DoneCount{$url}++; push @DoneList,$url; # ignore those pages that appear because we prefetched them if (exists $DonePrefetch{$url}) { delete $DonePrefetch{$url}; next; } # determine if it's age makes it a candidate my $age = $DoneFetch{$url}; next if ($time - $age < $HistoryAge); # add it to the todo list delete $DoneFetch{$url}; push @todo,$url.$opt; } # remember any prefetched pages not found in log (because fetch failed) foreach (keys %DonePrefetch) { $DoneTime{$_} = $time; $DoneCount{$_}++; push @DoneList,$_; delete $DonePrefetch{$_}; print STDERR "Warning: no log info for prefetch of $_ (donetime=$DoneTime{$_})\n"; } # keep the todo list down to a reasonable size shift @todo while (scalar @todo > 1000); # fetch and analyze page from todo list while (@todo) { my $url = pop @todo; # ignore those pages we've already done prefetch for next if (exists $DoneFetch{$url}); # fetch one page and analyze for links (saved to @LinkList) $DoneFetch{$url} = $time; eval { FetchUrl($url); }; last; } # Keep list of links to a reasonable size shift @LinkList while (scalar @LinkList > 100); # prefetch one link from list while (@LinkList) { my $url = pop @LinkList; next if (exists $DoneTime{$url}); $DonePrefetch{$url} = $time; eval { PrefetchUrl($url); }; last; } # reduce the history size to be within limits while (scalar @DoneList > $HistorySize) { my $url = shift @DoneList; if (--$DoneCount{$url} <= 0) { print STDERR "Note: removing $url from history...\n"; delete $DoneCount{$url}; delete $DoneTime{$url}; delete $DoneFetch{$url}; delete $DonePrefetch{$url}; } } # wait a moment before starting all over again sleep(1); } squid-prefetch-1.1/COPYING0000644000000000000000000000051007715463726012234 0ustar Squid-Prefetch was written by Brian White and has been released into the public domain (the only true "free"). It would be apprecated if copies of any improvements of bug-fixes would be sent to the author in order to be incorporated into the main upstream code and propogated to all users of this software.