squidGuard-1.5/ANNOUNCE
squidGuard 1.3 has been released
=======================================
Shalla Secure Services is proud to announce the release of squidGuard
version 1.3. The new version contains bugfixes and some new features.
See CHANGELOG for details. Please report problems and bugs to
sq-bugs@squidguard.org.
squidGuard 1.2.1 has been released
=======================================
Shalla Secure Services is proud to announce the release of squidGuard
version 1.2.1. See CHANGELOG for details. Please report problems and
bugs to sq-bugs@squidguard.org.
squidGuard 1.2.0 has been released
==================================
Tele Danmark Internordia is proud to announce the release of
squidGuard version 1.2.0.
Introduction
============
squidGuard is a combined filter, redirector and
access controller plugin for Squid. It is
* free
* very flexible
* extremely fast *)
* easily installed
* portable
squidGuard can be used to
* limit web access for some users to a list of accepted/well-known
web servers and/or URLs only.
* block access to some listed or blacklisted web servers and/or URLs
for some users. **)
* block access to URLs matching a list of regular expressions or
words for some users. **)
* enforce the use of domain names/prohibit the use of IP addresses in
URLs. **)
* redirect blocked URLs to an "intelligent" CGI based info page. **)
* redirect unregistered users to a registration form.
* redirect popular downloads like Netscape, MSIE etc. to local
copies.
* redirect banners to an empty GIF. **)
* have different access rules based on time of day, day of the week,
date etc.
* have different rules for different user groups.
* and much more..
Neither squidGuard nor Squid can be used to
* filter/censor/edit text inside documents
* filter/censor/edit embedded scripting languages like JavaScript or
VBScript inside HTML
*) 100,000 requests in 10 seconds on a 500MHz Pentium with lists of
5900 domains
7880 urls
13780 total
100,000 requests in 12 seconds on a 500MHz Pentium with lists of
5900 domains
200000 urls
205900 total
I.e., domain and URL list sizes have a negligible performance effect.
**) squidGuard is not a porn or banner filter/blocker, but it is very
well suited for these purposes too.
Capabilities
============
squidGuard has many powerful configuration options that let you do the
following (a combined configuration sketch follows the notes below):
1. define different time spaces based on any reasonable
combination of
+ time of day (00:00-08:00 17:00-24:00)
+ day of the week (sa)
+ date (1999-05-13)
+ date range (1999-04-01-1999-04-05)
+ date wildcards (*-01-01 *-05-17 *-12-25)
2. group sources (users/clients) into distinct categories like
"managers", "employees", "teachers", "students", "customers",
"guests" etc. based on any reasonable combination of
+ IP address ranges with
+ prefix notation (172.16.0.0/12)
+ netmask notation (172.16.0.0/255.240.0.0)
+ first-last notation (172.16.0.11-172.16.0.35)
+ address lists (172.16.134.54 172.16.156.23 ...)
+ domain lists (foo.bar.com ...) *)
+ user id lists (weho sdgh dfhj asef ...) **)
and optionally link the group to a given time space
+ positively (within business-hours)
+ negatively (outside leisure-time)
3. group destinations (URLs/servers) into distinct categories
like "local", "customers", "vendors", "banners", "banned" etc.
based on an unlimited number of unlimited lists of
+ domains, including subdomains (foo.bar.com)
+ hosts (host.foo.bar.com)
+ directory URLs, including subdirectories
(foo.bar.com/some/dir)
+ file URLs (foo.bar.com/somewhere/file.html)
+ regular expressions ((expr1|expr2|...))
and optionally link the group to a given time space:
+ positively (within business-hours)
+ negatively (outside leisure-time)
4. rewrite/redirect URLs based on any reasonable combination of
+ string/regular expression editing à la sed with
+ silent squid redirecting rewrite (s@from@to@)
+ visible client redirecting rewrite (s@from@to@r) ***)
+ URL replacement with
+ silent squid redirect to a common URL (redirect "new_url")
+ visible client redirect to a common URL
(redirect "302:new_url") ***)
activated by
+ 1-1 URL redirection
+ destination group match
+ a fallback/default for blocked URLs
+ a fallback/default for blocked/unknown clients
and optionally with
+ runtime string substitution à la strftime or printf
5. define access control lists (acl) based on any reasonable
combination of the definitions above by
+ giving each source (user/client) group
+ a pass list with any reasonable combination of
+ acceptable destination groups (good-dests ...)
+ unacceptable destination groups (!bad-dests ...)
+ block IP address URLs (enforce the use of domain names)
(!in-addr)
+ wildcards/nothing (any|all|none)
+ optionally a common rewrite rule set for the source group
+ optionally a default replacement URL for blocked destinations
for the source group
and optionally:
+ link the acl to a given time space
+ positively (within business-hours)
+ negatively (outside leisure-time)
+ define a fallback/default ruleset
6. have selective logging by optional log statements in the: ****)
+ source/client group declarations to log all translations
for the group (log "file")
+ destination group declarations. Typically used to log
blacklist matches. (log "file")
+ rewrite rule group declarations to log all translations
for the rule set (log "file")
and optionally anonymized to protect the individuals
(log anonymous "file")
*) Client access control based on domain name requires enabling
reverse lookups (log_fqdn on) in squid.conf.
**) Client access control based on user id requires enabling
RFC931/ident in squid.conf. Note: The RFC931/ident configuration is
changed in squid-2.2 and the RFC931/ident support is broken in
squid-2.2 at least up to STABLE2. We currently recommend using
squid-2.1.PATCH2 in production if RFC931 is used.
***) Note: Visible redirects (302:new-url) are not supported by some
interim versions of Squid (presumably 1.2-2.0).
****) Note: squidGuard is smart enough to open only one file descriptor
per logfile (i.e. not necessarily one per log statement); per spawned
process of course. However, logging to too many different files may
exceed your system's concurrent file descriptor limit.
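To tie these pieces together, here is a minimal, hedged configuration
sketch (the group names, network range, list paths and time values are
illustrative assumptions, not shipped defaults):

time business-hours {
weekly mtwhf 08:00-17:00
}
src employees {
ip 172.16.0.0/12
}
dest banned {
domainlist banned/domains
urllist banned/urls
}
acl {
employees within business-hours {
pass !in-addr !banned all
} else {
pass none
}
default {
pass none
redirect http://localhost/block.html
}
}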
Portability
===========
squidGuard should compile right out of the box on any modern brand of
UNIX with a development environment and a recent version (2.7.X or
3.2.X) of the Berkeley DB library. squidGuard is developed on Sun
Solaris-2.8 with gcc-2.95.3, bison-1.28, flex-2.5.4.
We also test regularly on Linux/RedHat with gcc and
our most recent copy of the Berkeley DB.
Users have reported success on at least the following platforms:
* AIX: 4.1.3, 4.3.2.0/egcs-2.91.66
* Dec-Unix: OSF1-4.0/gcc-2.7.2.3, 3.2C/gcc-2.7.2.3
* Linux: RedHat-5.2/gcc-2.8.1 and later
* Solaris: 2.6/gcc-2.7.2.3
* Solaris: 2.8/gcc-2.95.3
Nota Bene!
==========
.db files created with Berkeley DB version 2.7.X are NOT
compatible with Berkeley DB version 3.2.X! If you created files
with "squidGuard-1.1.X -C", you must export them to plain text
files, remove all .db files, and run "squidGuard-1.2.0 -C".
News in squidGuard-1.2.0
========================
o Support for Berkeley DB version 3.2.X.
o Support for userquotas.
o All known bugs are fixed.
See the CHANGELOG for details.
You can download squidGuard from its homepage:
http://www.squidguard.org/
Kind regards
Pål Baltzersen Lars Erik Håland
squidGuard-1.5/CONFIGURATION
Basic Configuration of SquidGuard
Once SquidGuard is successfully installed, you want to configure the
software according to your needs. A sample configuration has been
installed in the default directory /usr/local/squidGuard (or whatever
directory you pointed your installation to).
Below you will find three examples for the basic configuration of
SquidGuard.
1. Most simple configuration
Most simple configuration: one category, one rule for all
#
# CONFIG FILE FOR SQUIDGUARD
#
dbhome /usr/local/squidGuard/db
logdir /usr/local/squidGuard/logs
dest porn {
domainlist porn/domains
urllist porn/urls
}
acl {
default {
pass !porn all
redirect http://localhost/block.html
}
}
Always make sure that the very first line of your squidGuard.conf
is not empty!
The entries have the following meaning:
dbhome Location of the blacklists
logdir Location of the logfiles
dest Definition of a category to block. You can enter the domain and
URL files along with a regular expression list (regular expressions are
discussed later on; see the sketch below these entries).
acl The actual blocking definition. In our example only the default is
displayed. You can have more than one acl in place. The category porn
you defined in dest is blocked by the expression !porn. You have to add
the identifier all after the blocklist or your users will not be able
to surf at all.
The redirect directive is mandatory! You must tell SquidGuard which page
to display instead of the blocked one.
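For illustration, a dest block may also reference an expression list in
addition to the domain and URL files (the path porn/expressions is an
assumed example, relative to dbhome):

dest porn {
domainlist porn/domains
urllist porn/urls
expressionlist porn/expressions
}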
2. Choosing more than one category to block
First you define your categories. Just like you did above for porn.
For example:
Defining three categories for blocking
dest adv {
domainlist adv/domains
urllist adv/urls
}
dest porn {
domainlist porn/domains
urllist porn/urls
}
dest warez {
domainlist warez/domains
urllist warez/urls
}
Now your acl looks like this:
acl {
default {
pass !adv !porn !warez all
redirect http://localhost/block.html
}
}
3. Whitelisting
Sometimes there is a demand to allow specific URLs and domains
although they are part of the blocklists for a good reason. In this
case you want to whitelist these domains and URLs.
Defining a whitelist
dest white {
domainlist white/domains
urllist white/urls
}
acl {
default {
pass white !adv !porn !warez all
redirect http://localhost/block.html
}
}
In this example we assumed that your whitelists are located in a
directory called white within the blacklist directory you
specified with dbhome.
Make sure that your white identifier comes first in the
pass directive. It must not have an exclamation mark in front
(otherwise all entries belonging to white will be blocked, too).
4. Initializing the blacklists
Before you start up your squidGuard you should initialize the
blacklists, i.e. convert them from text files to db files. Using
the db format will speed up the checking and blocking.
The initialization is performed by the following command:
Initializing the blacklists
squidGuard -C all
Depending on the size of your blacklists and the power of your
computer this may take a while. If everything is running fine you
should see something like the following output:
2006-01-29 12:16:14 [31977] squidGuard 1.2.0p2 started (1138533256.959)
2006-01-29 12:16:14 [31977] db update done
2006-01-29 12:16:14 [31977] squidGuard stopped (1138533374.571)
If you look into the directories holding the files domains and urls,
you will see that additional files have been created: domains.db and
urls.db. These new files must not be empty! Only the files you
specified to block or whitelist in your squidGuard.conf are
converted.
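To verify the conversion you can, for example, list one of the category
directories (assuming the default dbhome shown above):

ls -l /usr/local/squidGuard/db/porn

domains.db and urls.db should appear there with a non-zero size next
to the plain text files.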
Proceed with: [10]Extended Configuration of SquidGuard
______________________________________________________________
Mirko Lorenz - mirko at shalla.de
29.01.2006
References
1. http://www.squidguard.org/index.html
2. http://www.squidguard.org/Doc/index.html
3. http://www.squidguard.org/download.html
4. http://www.squidguard.org/blacklists.html
5. http://www.squidguard.org/addsoft.html
6. http://www.squidguard.org/Doc/install.html
7. http://www.squidguard.org/Doc/configure.html
8. http://www.squidguard.org/Doc/extended.html
9. http://www.squidguard.org/Doc/known_issues.html
10. http://www.squidguard.org/Doc/extended.html
squidGuard-1.5/COPYING

By accepting this notice, you agree to be bound by the following
agreements:
This software product, squidGuard, is copyrighted (C) 2006 by
Shalla Secure Services, Gauting, Germany, with all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License (version 2) as
published by the Free Software Foundation. It is distributed in the
hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License (GPL) for more details.
You should have received a copy of the GNU General Public License
(GPL) along with this program.
squidGuard-1.5/FAQ

The squidGuard FAQ
squidGuard is an ultrafast and free filter, redirector and access
controller for Squid
Originally created by Pål Baltzersen and Lars Erik Håland
Maintained by Christine Kronberg.
Copyright © 2006-2007, Shalla Secure Services
FAQ - Frequently Asked/Answered Questions
This is out of date. Have a look at http://www.maynidea.com/squidguard/faq-plus.html
Currently in semi-random order:
1.
Is there a mailing list for squidGuard?
Yes! See www.shalla.de/mailman/squidguard/.
2.
squidGuard does not block?
There may be at least two reasons for this:
1. You didn't end your pass rules with "none". Pass rules
end with an implicit "all". It is good practice to
always end the pass rules with either "all" or "none" to
make them explicit. I.e. use:
pass good none
or
pass good !bad all
2. squidGuard goes into emergency mode. Reasons may be
syntax errors in the config file, references to
nonexistent database files, file protection problems or
missing directories. Check the squidGuard log.
Note: When run under Squid, squidGuard is run with the
same user and group ID as Squid (cache_effective_user
and cache_effective_group in squid.conf). The squidGuard
configuration and database files must be readable for
this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not
squidGuard will go into the "pass all for all" emergency
mode.
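To rule out the permission problems described above, a quick
check may help (the paths assume the defaults used elsewhere
in this documentation; adjust them to your setup):

ls -ld /usr/local/squidGuard/logs
ls -l /usr/local/squidGuard/squidGuard.conf

Both must be accessible to Squid's cache_effective_user and
cache_effective_group.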
3.
How do I debug squidGuard?
Do something like this:
echo "http://foo/bar 10.0.0.1/- - GET" | /usr/local/bin/s
quidGuard -c /tmp/test.cfg -d
This redirects the log to stderr. The response is either
a blank line (pass on) or the input with the URL part
rewritten (redirect).
4.
How can I block audio and video?
Use an expression list with something like this:
\.(ra?m|mpe?g?|mov|movie|qt|avi|dif|dvd?|mpv2|mp3)($|\?)
5.
How can I test time constraints?
You can set a simulated start time with the
-t yyyy-mm-ddTHH:MM:SS option:
squidGuard -c test.conf -t 1999-12-31T23:59:30 -d < test.in > test.out 2> test.log
With the -t option squidGuard parses the given date and
time and calculates an offset from the current time at
startup and then adds this offset to all time values
during runtime.
6.
squidGuard compiles fine and the tests succeed, but it seems to
pass all when run under Squid
There may be at least two reasons for this:
o Some versions of Squid (supposedly 2.2.*) silently
ignore arguments to the right of
redirect_program prefix/bin/squidGuard. Solutions are
one of:
# Set the actual config file location at
compile time with --with-sg-config
# Use a shell wrapper with
redirect_program prefix/bin/squidGuard.sh and make
prefix/bin/squidGuard.sh an executable shell script like:
#! /bin/sh -
exec prefix/bin/squidGuard -c whatever/squidGuard.conf
o When run under Squid, squidGuard is run with the same
user and group ID as Squid (cache_effective_user and
cache_effective_group in squid.conf). The squidGuard
configuration and database files must be readable for
this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not
squidGuard will go into the "pass all for all" emergency
mode.
7.
compilation of sg.l fails with "sg.l:line ...: Error: Too many
positions" with native lex
Some native versions of lex have problems with sg.l. The
solution is to use GNU flex, which is better anyway. Do
"setenv LEX flex" if configure selects the native lex
before flex. Flex should compile right out of the box
similar to other GNU programs. (Thanks to
laurent.foulonneau@mail.loyalty.nc).
8.
Can I use proxy-authenticated users the same way as RFC931/Ident
users?
Yes.
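As a minimal, hedged sketch (the user names are placeholders),
such users can be grouped with a user list in a source group;
squidGuard matches whatever user name Squid passes in the
request line, regardless of how Squid obtained it:

src authenticated {
user alice bob
}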
9.
Can I manipulate domains.db and urls.db from Perl?
Yes, but you must bind custom compare functions. Also note
that domains are stored with a leading ".":
use DB_File;
sub mirror($) {
scalar(reverse(shift));
}
sub domainmatch($$) {
my $search = mirror(lc(shift));
my $found = mirror(lc(shift));
if ("$search." eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
sub urlmatch($$) {
my $search = lc(shift) . "/";
my $found = lc(shift) . "/";
if ($search eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
my (%url,%domain);
$DB_BTREE->{compare} = \&urlmatch;
my $url_db = tie(%url, "DB_File", "urls.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("urls.db: $!\n");
$DB_BTREE->{compare} = \&domainmatch;
my $domain_db = tie(%domain, "DB_File", "domains.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("domains.db: $!\n");
# Now you can operate on %url and %domain just as normal perl hashes:)
# Add "playboy.com" to the domainlist unless it's already there:
$domain{".playboy.com"} = "" unless(exists($domain{"playboy.com"}));
# or use the DB_File functions put, get, del and seq:
# Add "sex.com" and "dir.yahoo.com/business_and_economy/companies/sex"
# and delete "cnn.com":
$domain_db->put(".sex.com","") unless(exists($domain{"sex.com"}));
$domain_db->sync; # Seems to only sync the last change.
$domain_db->del("cnn.com") if(exists($domain{"cnn.com"}));
$domain_db->sync; # Seems to only sync the last change.
$url_db->put("xyz.com/~sex","") unless(exists($url{"xyz.com/~sex"}));
$url_db->sync; # Seems to only sync the last change.
$url_db->sync; # Seems to only sync the last change.
$domain_db->sync; # Seems to only sync the last change.
undef($url_db); # Destroy the object
undef($domain_db); # Destroy the object
untie(%url); # Sync and close the file and undef the hash
untie(%domain); # Sync and close the file and undef the hash
See the perltie(1) and DB_File(3) man pages that come
with Perl for more info.
10.
How can I list domains.db or urls.db from Perl?
Use a script like this:
#!/local/bin/perl -w
use strict;
use DB_File;
foreach (@ARGV) {
my (%db, $key, $val);
die("$_: $!\n") unless(-f);
tie(%db, "DB_File", $_, O_RDONLY, 0664, $DB_BTREE) || die("$_: $!\n")
;
foreach $key (keys(%db)) {
if($val = $db{$key}) {
$val = "\"$val\"";
} else {
$val = "undef";
}
print "$key -> $val\n";
}
untie(%db);
}
See the perltie(1) and DB_File(3) man pages that come
with Perl for more info.
11.
How can I get around "make: don't know how to make /bin/false.
Stop"?
Your system has neither lynx nor /bin/false:
If it has /usr/bin/false do:
# ln -s ../usr/bin/false /bin/.
Alternatively:
# echo exit 255 >/bin/false
# chmod a+rx /bin/false
If you have questions and/or answers that should be on the FAQ list
please send them to sg-bugs (at) squidguard.org
____________________________
squidGuard-1.5/GPL

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) 19yy <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) 19yy name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Library General
Public License instead of this License.
squidGuard-1.5/INSTALL
Installing SquidGuard
1. Unpack the source
tar xvzf squidGuard-1.2.0.tar.gz
2. Compiling
Let's assume it is squidGuard-1.2.0 we are trying to install:
cd squidGuard-1.2.0
./configure
make
If no errors occurred, squidGuard is now built and will be installed
under /usr/local/ by default. There are a couple of options you can
use when running ./configure.
For example:
Installing in a different location
./configure --prefix=/some/other/directory
BerkeleyDB not installed in /usr/local/BerkeleyDB
./configure --with-db=/directory/of/BerkeleyDB/installation
Annotation: Make sure that the shared library of your BerkeleyDB
installation is known by your system (check /etc/ld.so.conf).
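For example, on a typical Linux system you might register the
BerkeleyDB library directory like this (the path is an assumption;
use the one from your installation):

echo /usr/local/BerkeleyDB.4.2/lib >> /etc/ld.so.conf
ldconfig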
See all ./configure options
./configure --help
3. Installing
su -
make install
4. Installing the blacklists
Copy your blacklists into the desired blacklist directory (default:
/usr/local/squidGuard/db) and unpack them. In the table below we
assume that the default location is used. Make sure that you have
the proper permissions to write to that directory.
cp /path/to/your/blacklist.tar.gz /usr/local/squidGuard/db
cd /usr/local/squidGuard/db
gzip -d blacklist.tar.gz
tar xfv blacklist.tar
Now the blacklists should be ready to use.
Congratulations. You have just completed the installation of squidGuard.
The next step is to configure the software according to your needs.
First configure SquidGuard. After you have verified that SquidGuard
is working fine, make the required modification to your Squid by adding
the following line:
redirect_program /usr/local/bin/squidGuard -c /usr/local/squidGuard/squidGuard.conf
The other way round will make you unhappy.
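Once the redirect_program line is in place, make your running Squid
re-read its configuration with the standard Squid command:

squid -k reconfigure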
Proceed with: [10]Basic Configuration of SquidGuard
__________________________________________________________________
Mirko Lorenz - mirko at shalla.de
30.11.2006
References
1. http://www.squidguard.org/index.html
2. http://www.squidguard.org/Doc/index.html
3. http://www.squidguard.org/download.html
4. http://www.squidguard.org/blacklists.html
5. http://www.squidguard.org/addsoft.html
6. http://www.squidguard.org/Doc/install.html
7. http://www.squidguard.org/Doc/configure.html
8. http://www.squidguard.org/Doc/extended.html
9. http://www.squidguard.org/Doc/known_issues.html
10. http://www.squidguard.org/Doc/configure.html
squidGuard-1.5/README

The official squidGuard homepage is:
http://www.squidguard.org/
What it is
~~~~~~~~~~
squidGuard is a free (GPL), flexible and ultra fast filter, redirector
and access controller plugin for squid. It lets you define multiple
access rules with different restrictions for different user groups on
a squid cache. squidGuard uses squid's standard redirector interface.
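As a hedged illustration of that interface (the URL and addresses are
placeholders; the field layout matches the debugging example in the
FAQ): Squid writes one request per line to squidGuard's stdin, and
squidGuard answers each line with either an empty line (pass the
request through) or a replacement URL (redirect):

input:  http://example.com/ 10.0.0.1/- - GET
output: http://localhost/block.html   (or an empty line to pass)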
Authors
~~~~~~~
The initial squidGuard concept was designed by Pål Baltzersen and was
implemented, maintained, and extended by Lars Erik Håland at
ElTele Øst AS.
Since December 2006 squidGuard has been maintained by Shalla Secure Services.
Distribution
~~~~~~~~~~~~
squidGuard is distributed by Shalla Secure Services under GPLv2 and may
therefore be freely used and distributed according to the conditions of
the licence.
squidGuard-1.5/contrib/Makefile.in

SHELL=/bin/sh
.SUFFIXES:
.SUFFIXES: .c .o .pl .pm .pod .html .man
PERL = @PERL@
CC = @CC@
CFLAGS = @CFLAGS@
INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@
INSTALL_PROGRAM = @INSTALL_PROGRAM@
LDFLAGS = @LDFLAGS@
LIBS = @LIBS@
MKDIR = @top_srcdir@/mkinstalldirs
RM = rm -f
prefix = @prefix@
exec_prefix = @exec_prefix@
bindir = $(exec_prefix)/bin
infodir = $(prefix)/info
all::
@echo making $@ in `basename \`pwd\``
update::
@echo making $@ in `basename \`pwd\``
update:: squidGuardRobot
squidGuardRobot:: squidGuardRobot/squidGuardRobot.in squidGuardRobot/RobotUserAgent.pm
squidGuardRobot/squidGuardRobot.in: @SQUIDGUARDROBOT@
@echo making $@ in `basename \`pwd\``
@$(MKDIR) squidGuardRobot
cp -p $? $@
chmod 660 $@
$(PERL) -0777 -pi -e 's;^#!\s?/\S*perl;#! \100PERL\100;' $@
squidGuardRobot/RobotUserAgent.pm: @SQUIDGUARDROBOTUA@
@echo making $@ in `basename \`pwd\``
@$(MKDIR) squidGuardRobot
cp -p $? $@
chmod 660 $@
clean::
@echo making $@ in `basename \`pwd\``
$(RM) *~ *.bak core *.log *.error
realclean:: clean
@echo making $@ in `basename \`pwd\``
$(RM) TAGS *.orig
distclean:: realclean
@echo making $@ in `basename \`pwd\``
$(RM) Makefile
$(RM) squidGuardRobot/squidGuardRobot
$(RM) sgclean/sgclean
$(RM) hostbyname/hostbyname
squidGuard-1.5/contrib/hostbyname/hostbyname.in

#!@PERL@ -w
# By Pål Baltzersen Feb, 2000
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License (version 2) as
# published by the Free Software Foundation. It is distributed in the
# hope that it will be useful, but WITHOUT ANY WARRANTY; without even
# the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License (GPL) for more details.
# Hostbyname takes a squidGuard domain or url list and makes a
# half-hearted effort to expand it with the corresponding IP-addresses.
# Usage:
# hostbyname < urls > urls.new
# hostbyname < domains > domains.new
my $version = "0.0.1";
use strict;
use Socket;
my ($hostname, $url, $h, $i, $a);
my ($name,$aliases,$addrtype,$length,@addrs);
while(<>) {
my %seen;
chomp;
$url = $_;
$hostname = $_;
$hostname =~ s/\057.*//;
$url =~ s/^$hostname//;
print "$hostname$url\n";
next if($hostname =~ /^\d+\.\d+\.\d+\.\d+$/); # skip entries that are already IP addresses
foreach $h ($hostname, "www.$hostname", "ftp.$hostname") {
#print "-> $h\n";
($name,$aliases,$addrtype,$length,@addrs) = gethostbyname($h);
#print "-> $name\n";
foreach $i (@addrs) {
$a = inet_ntoa($i);
#print "-> $a\n";
next if($seen{$a});
$seen{$a} = 1;
print "$a$url\n";
}
}
}
squidGuard-1.5/contrib/sgclean/sgclean

#!/usr/bin/perl -w
#
#
# usage: sgclean.pl squidGuard.conf
#
# sgclean.pl removes redundant entries in domain files and url files
#
# although sgclean.pl makes a backup of the old files, it's always a
# good idea to make your own backup before running the program
#
# By Lars Erik Håland 1999 (leh@nimrod.no)
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License (version 2) as
# published by the Free Software Foundation. It is distributed in the
# hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License (GPL) for more details.
#
# You should have received a copy of the GNU General Public License
# (GPL) along with this program.
#
use strict;
use DB_File;
use Fcntl;
my $VERSION = "1.0.0";
my $tmpfile = "/var/tmp/squidGuard.db";
my $tmpfile_delete = "/var/tmp/squidGuard.delete.db";
my $config = shift;
usage() if(!defined $config);
my $files = sg_config($config);
$| = 1;
for(keys %$files){
sg_clean($_,$files->{$_});
}
sub sg_clean {
my $file = shift;
my $type = shift;
print STDERR "cleaning $type $file\n";
open(F,$file) || die "can't open $type file $file: $!";
open(W,">$file.$$") || die "can't write to $file.$$: $!";
sg_clean_dbfiles();
my(%SG,%SGD);
tie(%SG, 'DB_File',$tmpfile,O_RDWR|O_CREAT,0640,$DB_BTREE);
tie(%SGD, 'DB_File',$tmpfile_delete,O_RDWR|O_CREAT,0640,$DB_BTREE);
my $count = 1;
my $i = 0;
print STDERR "loading... ";
while(<F>){
chomp;
my($url,$redirect) = split;
$redirect = "" if(!defined $redirect);
$SG{$url} = $redirect;
$count++;
}
close(F);
print STDERR "complete loading\n";
print STDERR "cleaning";
my($url,$redirect);
while (($url,$redirect) = each %SG) {
my $keep = undef;
if($type eq "domainlist"){
$keep = sg_clean_domain($url,\%SG,1);
} elsif($type eq "urllist"){
$keep = sg_clean_url($url,\%SG,1);
}
if(!defined $keep){
$SGD{$url}++;
}
if($i % 100 == 0){
my $p = ($i * 100)/$count;
print STDERR "." if(int($p) % 10 == 0);
}
$i++;
}
print STDERR "complete cleaning\n";
print STDERR "updating file";
$i = 0;
while (($url,$redirect) = each %SG) {
next if(defined $SGD{$url});
my $line = "$url" . ($redirect ? " $redirect\n" : "\n");
print W "$line";
if($i % 100 == 0){
my $p = ($i * 100)/$count;
print STDERR "." if(int($p) % 10 == 0);
}
$i++;
}
print "complete updating\n";
close(W);
sg_update_files($file);
untie(%SG);
untie(%SGD);
sg_clean_dbfiles();
}
sub sg_clean_domain {
my $domain = shift;
my $tie = shift;
my $exists_ok = shift;
my $parts = [split(/[.]/,$domain)];
my $d = "";
for(reverse @$parts){
$d = "$_$d";
if(defined $tie->{$d}){
if($domain eq $d){
#print "$domain exists, skipping\n";
return 1 if($exists_ok);
} else {
#print "$domain is subdomain of $d, skipping\n";
}
return undef;
}
$d = ".$d";
}
return 1;
}
sub sg_clean_url {
my $url = shift;
my $tie = shift;
my $exists_ok = shift;
my $parts = [split(/[\/]/,$url)];
my $d = "";
for(@$parts){
$d = "$d$_";
if(defined $tie->{$d}){
if($url eq $d){
#print "$url exists, skipping\n";
return 1 if($exists_ok);
} else {
#print "$url is part of $d, skipping\n";
}
return undef;
}
$d = "$d/";
}
return 1;
}
sub sg_config {
my $file = shift;
open(F,$file) || die "can't open sgconfigfile $file: $!";
my $dbhome = undef;
my $dest = undef;
my $files = {};
while(<F>){
chomp;
if(/^\s*dbhome\s+(\S+)/){
$dbhome = $1;
}
if(/^\s*(dest|destination)\s+(\S+)/){
$dest = $2;
}
if(/^\s*(urllist|domainlist)\s+(\S+)/){
my $type = $1;
my $file = $2;
if(!defined $dest){
printf("Error in configfile line $.\n");
next;
}
$file = "$dbhome/$file" if(defined $dbhome and $file !~ /^\//);
$files->{$file}=$type;
}
}
close(F);
return $files;
}
sub sg_clean_dbfiles {
if(-e "$tmpfile"){
unlink("$tmpfile") || warn "can't remove $tmpfile: $!";
}
if(-e "$tmpfile_delete"){
unlink("$tmpfile_delete")|| warn "can't remove $tmpfile_delete: $!";
}
}
sub sg_update_files {
my $file = shift;
if(-e "$file"){
system("cp $file $file.old");
}
if(-e "$file.$$"){
rename("$file.$$",$file) || warn "can't rename $file.$$ to $file: $!";
}
}
sub usage {
print "Usage: $0 configfile\n";
exit;
}
squidGuard-1.5/contrib/sgclean/sgclean.in

#!@PERL@ -w
#
#
# usage: sgclean.pl squidGuard.conf
#
# sgclean.pl removes redundant entries in domain files and url files
#
# although sgclean.pl makes a backup of the old files, it's always a
# good idea to make your own backup before running the program
#
# By Lars Erik Håland 1999 (leh@nimrod.no)
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License (version 2) as
# published by the Free Software Foundation. It is distributed in the
# hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License (GPL) for more details.
#
# You should have received a copy of the GNU General Public License
# (GPL) along with this program.
#
use strict;
use DB_File;
use Fcntl;
my $VERSION = "1.0.0";
my $tmpfile = "/var/tmp/squidGuard.db";
my $tmpfile_delete = "/var/tmp/squidGuard.delete.db";
my $config = shift;
usage() if(!defined $config);
my $files = sg_config($config);
$| = 1;
for(keys %$files){
sg_clean($_,$files->{$_});
}
sub sg_clean {
my $file = shift;
my $type = shift;
print STDERR "cleaning $type $file\n";
open(F,$file) || die "can't open $type file $file: $!";
open(W,">$file.$$") || die "can't write to $file.$$: $!";
sg_clean_dbfiles();
my(%SG,%SGD);
tie(%SG, 'DB_File',$tmpfile,O_RDWR|O_CREAT,0640,$DB_BTREE);
tie(%SGD, 'DB_File',$tmpfile_delete,O_RDWR|O_CREAT,0640,$DB_BTREE);
my $count = 1;
my $i = 0;
print STDERR "loading... ";
while(<F>){
chomp;
my($url,$redirect) = split;
$redirect = "" if(!defined $redirect);
$SG{$url} = $redirect;
$count++;
}
close(F);
print STDERR "complete loading\n";
print STDERR "cleaning";
my($url,$redirect);
while (($url,$redirect) = each %SG) {
my $keep = undef;
if($type eq "domainlist"){
$keep = sg_clean_domain($url,\%SG,1);
} elsif($type eq "urllist"){
$keep = sg_clean_url($url,\%SG,1);
}
if(!defined $keep){
$SGD{$url}++;
}
if($i % 100 == 0){
my $p = ($i * 100)/$count;
print STDERR "." if(int($p) % 10 == 0);
}
$i++;
}
print STDERR "complete cleaning\n";
print STDERR "updating file";
$i = 0;
while (($url,$redirect) = each %SG) {
next if(defined $SGD{$url});
my $line = "$url" . ($redirect ? " $redirect\n" : "\n");
print W "$line";
if($i % 100 == 0){
my $p = ($i * 100)/$count;
print STDERR "." if(int($p) % 10 == 0);
}
$i++;
}
print "complete updating\n";
close(W);
sg_update_files($file);
untie(%SG);
untie(%SGD);
sg_clean_dbfiles();
}
sub sg_clean_domain {
my $domain = shift;
my $tie = shift;
my $exists_ok = shift;
my $parts = [split(/[.]/,$domain)];
my $d = "";
for(reverse @$parts){
$d = "$_$d";
if(defined $tie->{$d}){
if($domain eq $d){
#print "$domain exists, skipping\n";
return 1 if($exists_ok);
} else {
#print "$domain is subdomain of $d, skipping\n";
}
return undef;
}
$d = ".$d";
}
return 1;
}
sub sg_clean_url {
my $url = shift;
my $tie = shift;
my $exists_ok = shift;
my $parts = [split(/[\/]/,$url)];
my $d = "";
for(@$parts){
$d = "$d$_";
if(defined $tie->{$d}){
if($url eq $d){
#print "$url exists, skipping\n";
return 1 if($exists_ok);
} else {
#print "$url is part of $d, skipping\n";
}
return undef;
}
$d = "$d/";
}
return 1;
}
sub sg_config {
my $file = shift;
open(F,$file) || die "can't open sgconfigfile $file: $!";
my $dbhome = undef;
my $dest = undef;
my $files = {};
while(<F>){
chomp;
if(/^\s*dbhome\s+(\S+)/){
$dbhome = $1;
}
if(/^\s*(dest|destination)\s+(\S+)/){
$dest = $2;
}
if(/^\s*(urllist|domainlist)\s+(\S+)/){
my $type = $1;
my $file = $2;
if(!defined $dest){
printf("Error in configfile line $.\n");
next;
}
$file = "$dbhome/$file" if(defined $dbhome and $file !~ /^\//);
$files->{$file}=$type;
}
}
close(F);
return $files;
}
sub sg_clean_dbfiles {
if(-e "$tmpfile"){
unlink("$tmpfile") || warn "can't remove $tmpfile: $!";
}
if(-e "$tmpfile_delete"){
unlink("$tmpfile_delete")|| warn "can't remove $tmpfile_delete: $!";
}
}
sub sg_update_files {
my $file = shift;
if(-e "$file"){
system("cp $file $file.old");
}
if(-e "$file.$$"){
rename("$file.$$",$file) || warn "can't rename $file.$$ to $file: $!";
}
}
sub usage {
print "Usage: $0 configfile\n";
exit;
}
squidGuard-1.5/contrib/squidGuardRobot/RobotUserAgent.pm

package RobotUserAgent;
use Exporter;
use LWP::Parallel::UserAgent qw(:CALLBACK);
use POSIX qw(strftime);
@ISA = qw(LWP::Parallel::UserAgent Exporter);
@EXPORT = @LWP::Parallel::UserAgent::EXPORT_OK;
sub on_connect {
my ($self, $request, $response, $entry) = @_;
my ($key,$val);
::info("%s: %s", $request->{_method}, $request->{_uri});
}
1;
squidGuard-1.5/contrib/squidGuardRobot/squidGuardRobot.in

#! @PERL@ -w
#
# A domain and url collector robot for squidGuard
#
# By Pål Baltzersen 1999-2000 (pal.baltzersen@ost.eltele.no)
# Based on earlier work by Lars Erik Håland (leh@nimrod.no)
#
# By accepting this notice, you agree to be bound by the following
# agreements:
#
# This software product, squidGuard, is copyrighted (C) 1998-2007
# by Christine Kronberg, Shalla Secure Services. All rights reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License (version 2) as
# published by the Free Software Foundation. It is distributed in the
# hope that it will be useful, but WITHOUT ANY WARRANTY; without even
# the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
# PURPOSE. See the GNU General Public License (GPL) for more details.
#
# You should have received a copy of the GNU General Public License
# (GPL) along with this program.
my $VERSION = "2.3.5";
my ($debug,$verbose,$quiet,$washonly,$umask,$home,$lib,$etc,$source_proxy,$link_proxy);
my ($simultaneous_sources,$simultaneous_links,$bulk,$fake_user_agent);
my ($sources,$source_timeout,$source_retries,$candidates);
my ($source_bouncing_ttl,$source_remember,$source_min_ttl,$source_max_ttl);
my ($links,$link_timeout,$link_retries,);
my ($link_bouncing_ttl,$link_remember,$link_min_ttl,$link_max_ttl);
my ($domainexceptions,$urlexceptions,$exceptions,$includes,$redirectors,$patterns);
my ($domainlist,$domains,$domain_ttl,$urllist,$urls,$url_ttl);
my ($doinaddr,$dns_timeout,@nameservers);
my ($domaindiff,@domaindiff,$urldiff,@urldiff);
#
# USER CONFIGURABLE DEFAULTS:
#
$quiet = 0; # 0 == false; 1 == true;
$debug = 0 - $quiet; # 0 == false; 1 == true;
$verbose = $debug || 0 - $quiet; # 0 == false; 1 == true;
$doinaddr = 0; # 0 == false; 1 == true;
$dns_timeout = 10; # SECONDS BEFORE TIMEOUT DURING RESOLVING
@nameservers = undef; # undef||("ns1", "ns2", ...);
$umask = 0027; # usually 0002 || 0007
$home = "/var/spool/www/hosts/proxy.teledanmark.no/filter/robot";
$lib = "$home/lib";
$etc = "$home/etc";
$source_proxy = "http://proxy:80/"; # undef||"http://proxy:1234/"
$link_proxy = undef; # undef||$source_proxy
$simultaneous_sources = 32; # NUMBER OF SIMULTANEOUS SOURCE REQUESTS
$simultaneous_links = 64; # NUMBER OF SIMULTANEOUS LINK REQUESTS
$bulk = 512; # MAX NUMBER OF REQUESTS IN A BULK
$fake_user_agent = "Mozilla/4.72 [en] (WinNT; U)";# undef||"Mozilla/4.72 [en] (WinNT; U)"
$sources = "$etc/source"; # ADD NEW SOURCES HERE; SLURPED AT STARTUP
$source_timeout = 60; # SECONDS BEFORE TIMEOUT DURING "GET"
$source_retries = 3; # NO OF FAILURES BEFORE MARKED AS BOUNCING
$source_bouncing_ttl = 30; # DAYS BEFORE RETESTING A SOURCE MARKED AS BOUNCING
$source_remember = 365; # DAYS BEFORE REMOVING A SOURCE MARKED AS BOUNCING
$source_min_ttl = 2; # MIN DAYS A SOURCE SHOULD BE LISTED AS SUCCEEDING
$source_max_ttl = 10; # MAX DAYS A SOURCE SHOULD BE LISTED AS SUCCEEDING
$candidates = "$etc/candidate"; # SOURCE REDIRECTS ARE LOGGED HERE
$links = "$etc/link"; # ADD NEW URLS HERE; SLURPED AT STARTUP
$link_timeout = 15; # SECONDS BEFORE TIMEOUT DURING HEAD
$link_retries = 2; # FAILURES BEFORE MOVED TO THE BOUNCING LIST
$link_bouncing_ttl = 30; # DAYS BEFORE RETESTING A LINK MARKED AS BOUNCING
$link_remember = 180; # DAYS BEFORE REMOVING A LINK MARKED AS BOUNCING
$link_min_ttl = 30; # MIN DAYS A LINK SHOULD BE LISTED AS SUCCEEDING
$link_max_ttl = 60; # MAX DAYS A LINK SHOULD BE LISTED AS SUCCEEDING
$domainexceptions = "$etc/domainexception"; # ADD NEW DOMAIN EXCEPTIONS HERE; SLURPED AT STARTUP
$urlexceptions = "$etc/urlexception"; # ADD NEW URL EXCEPTIONS HERE; SLURPED AT STARTUP
$exceptions = "$etc/exception"; # ADD NEW EXCEPTIONS HERE; SLURPED AT STARTUP
$includes = "$etc/include"; # ADD NEW INCLUDES HERE; SLURPED AT STARTUP
$redirectors = "$etc/redirector"; # ADD NEW REDIRECTORS HERE; SLURPED AT STARTUP
$patterns = "$etc/patterns"; # file || undef # LIST OF BAD STRINGS AND PERLRE(3)
# DOMAIN MATCH FORCES A DOMAIN LIST ENTRY
$domains = "$etc/domain"; # CREATED AND MAINTAINED BY THIS PROGRAM
$domain_ttl = $link_max_ttl+7; # DAYS TO KEEP AN UNTOUCHED DOMAIN
$urls = "$etc/url"; # CREATED AND MAINTAINED BY THIS PROGRAM
$url_ttl = $link_max_ttl+7; # DAYS TO KEEP AN UNTOUCHED URL
$domainlist = "$etc/domains"; # THE DOMAIN LIST CREATED BY THIS PROGRAM
$urllist = "$etc/urls"; # THE URL LIST CREATED BY THIS PROGRAM
$domaindiff = "$etc/domains.%Y%m%d.diff"; # THE DOMAIN CHANGES THIS TIME; CREATED BY THIS PROGRAM
$urldiff = "$etc/urls.%Y%m%d.diff"; # THE URL CHANGES THIS TIME; CREATED BY THIS PROGRAM
#
# END USER CONFIGURABLE DEFAULTS
#
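#
# NOTE: the -c option (see init() below) eval()s every line of the given
# file as Perl, so the defaults above can be overridden without editing
# this script. A hypothetical override file (all values illustrative):
#
#   $home = "/var/spool/robot";
#   $source_proxy = undef;       # go direct instead of via a proxy
#   $simultaneous_links = 16;    # be gentler on the network
#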
unshift(@INC, "$lib");
use strict;
use Config;
use Getopt::Std;
use POSIX qw(strftime);
use DB_File;
use Fcntl qw(:flock);
use Net::DNS;
use IO::Select;
use HTTP::Request;
use HTTP::Response;
use HTTP::Status;
use HTML::LinkExtor;
require RobotUserAgent;
my $progname = $0; $progname =~ s/.*\057//;
my (%source,$sourcedb);
my (%candidate, $candidatedb);
my (%link,$linkdb);
my (%domain,$domaindb);
my (%url,$urldb);
my (%domainexception,$domainexceptiondb);
my (%urlexception,$urlexceptiondb);
my (%exception,$exceptiondb);
my (%include,$includedb);
my (%redirector,$redirectordb);
my (@patterns);
my ($start, $now, $checkpoint, $delta, %signal) = time; # only $start gets time() here
my %keys = (
found => 1,
id => 1,
last => 1,
referer => 1,
remote => 1,
retries => 1,
status => 1,
ttl => 1,
used => 1,
);
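# Records are stored as flat "key=value;" strings, e.g. (illustrative):
#   "last=0;ttl=0;status=0;found=946684800;used=0;referer=http://foo.example/;"
# and parsed back with split(/[=;]/, $val). Literal '=' and ';' inside
# values are escaped as %3D and %3B so they cannot break the format.
# %keys above whitelists the fields that survive a record rewrite.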
sub init();
sub load();
sub usage($);
sub date($);
sub strtime($);
sub msg($@);
sub status($@);
sub info($@);
sub debug($@);
sub warning($@);
sub error($@);
sub mirror($);
sub domaincmp($$);
sub linkmatch($$);
sub domainmatch($$);
sub urlmatch($$);
sub exceptionmatch($$);
sub trunc($);
sub addnew($$);
sub patterns($);
sub expire();
sub expiredomains();
sub expireurls();
sub washlinks();
sub release($);
sub min($$);
sub addlink($$);
sub addcandidate($$);
sub dumpcandidates();
sub extract();
sub success($);
sub redirect($);
sub spliturl($);
sub domain($);
sub check();
sub adddomain($$);
sub addurl($$$);
sub addresses($$@);
sub washdomains();
sub washurls();
sub wash();
sub optimum($);
sub compile();
sub today();
sub export();
sub valid($);
sub total($);
sub dumpkeys($$);
sub end($);
sub disconnect();
sub reconnect();
sub fixconnect();
sub init() {
my (%opts, $i, $fd);
getopts("hc:dqQvVw", \%opts) || usage(1);
if (defined($opts{h})) {
usage(0);
}
if (defined($opts{c})) {
open(CONFIG, $opts{c}) || error("$opts{c}: $!");
while(<CONFIG>) {
eval; # each line of the -c file is evaluated as Perl code
}
close(CONFIG);
}
if (defined($opts{"d"})) {
$debug = 1;
$verbose = 1;
$quiet = 0;
}
if (defined($opts{"q"})) {
$debug = 0;
$verbose = 0;
}
if (defined($opts{"Q"})) {
$debug = 0;
$verbose = 0;
$quiet = 1;
}
if (defined($opts{"v"})) {
$verbose = 1;
$quiet = 0;
}
if (defined($opts{"V"})) {
print "$VERSION\n";
exit(0);
}
if (defined($opts{"w"})) {
$washonly = 1;
}
status("Started");
umask $umask;
select(STDERR);$|=1;
select(STDOUT);$|=1;
$i = 0;
foreach(split(' ', $Config{sig_name})) {
$signal{$_} = $i++;
}
$DB_BTREE->{compare} = \&linkmatch;
$sourcedb = tie(%source,"DB_File","$sources.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$sources.db: $!");
$fd = $sourcedb->fd;
open(SOURCES, "+<&=$fd") || die("dup: $!");
flock(SOURCES, LOCK_EX|LOCK_NB) || die("$sources.db: $!");
$DB_BTREE->{compare} = \&linkmatch;
$candidatedb = tie(%candidate,"DB_File","$candidates.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$candidates.db: $!");
$fd = $candidatedb->fd;
open(CANDIDATES, "+<&=$fd") || die("dup: $!");
flock(CANDIDATES, LOCK_EX|LOCK_NB) || die("$candidates.db: $!");
$DB_BTREE->{compare} = \&linkmatch;
$linkdb = tie(%link,"DB_File","$links.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$links.db: $!");
$fd = $linkdb->fd;
open(LINKS, "+<&=$fd") || die("dup: $!");
flock(LINKS, LOCK_EX|LOCK_NB) || die("$links.db: $!");
$DB_BTREE->{compare} = \&domainmatch;
$domaindb = tie(%domain,"DB_File","$domains.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domains.db: $!");
$fd = $domaindb->fd;
open(DOMAINS, "+<&=$fd") || die("dup: $!");
flock(DOMAINS, LOCK_EX|LOCK_NB) || die("$domains.db: $!");
$DB_BTREE->{compare} = \&urlmatch;
$urldb = tie(%url,"DB_File","$urls.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$urls.db: $!");
$fd = $urldb->fd;
open(URLS, "+<&=$fd") || die("dup: $!");
flock(URLS, LOCK_EX|LOCK_NB) || die("$urls.db: $!");
$DB_BTREE->{compare} = \&domainmatch;
$domainexceptiondb = tie(%domainexception,"DB_File","$domainexceptions.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domainexceptions.db: $!");
$fd = $domainexceptiondb->fd;
open(DOMAINEXCEPTIONS, "+<&=$fd") || die("dup: $!");
flock(DOMAINEXCEPTIONS, LOCK_EX|LOCK_NB) || die("$domainexceptions.db: $!");
$DB_BTREE->{compare} = \&urlmatch;
$urlexceptiondb = tie(%urlexception,"DB_File","$urlexceptions.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$urlexceptions.db: $!");
$fd = $urlexceptiondb->fd;
open(URLEXCEPTIONS, "+<&=$fd") || die("dup: $!");
flock(URLEXCEPTIONS, LOCK_EX|LOCK_NB) || die("$urlexceptions.db: $!");
$DB_BTREE->{compare} = \&exceptionmatch;
$exceptiondb = tie(%exception,"DB_File","$exceptions.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$exceptions.db: $!");
$fd = $exceptiondb->fd;
open(EXCEPTIONS, "+<&=$fd") || die("dup: $!");
flock(EXCEPTIONS, LOCK_EX|LOCK_NB) || die("$exceptions.db: $!");
$DB_BTREE->{compare} = \&urlmatch;
$includedb = tie(%include,"DB_File","$includes.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$includes.db: $!");
$fd = $includedb->fd;
open(INCLUDES, "+<&=$fd") || die("dup: $!");
flock(INCLUDES, LOCK_EX|LOCK_NB) || die("$includes.db: $!");
$DB_BTREE->{compare} = \&domainmatch;
$redirectordb = tie(%redirector,"DB_File","$redirectors.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$redirectors.db: $!");
$fd = $redirectordb->fd;
open(REDIRECTORS, "+<&=$fd") || die("dup: $!");
flock(REDIRECTORS, LOCK_EX|LOCK_NB) || die("$redirectors.db: $!");
$SIG{INT} = \&end;
$SIG{TERM} = \&end;
$now = $checkpoint = time;
status("Initialized in %s", strtime($now-$start));
}
sub trunc($) {
my $file = shift || return;
if (-s $file) {
status("Truncating $file..");
open(FILE, ">$file") || error("$file: $!");
close(FILE);
}
}
sub load() {
my (@new,$wash,$key,$k,$v,$n);
my $new = 0;
$new += scalar(addnew(\%source, $sources));
$sourcedb->sync();
trunc($sources);
$new += scalar(addnew(\%link, $links));
$linkdb->sync();
trunc($links);
@new = addnew(\%domain, $domains);
$new += scalar(@new);
$domaindb->sync();
trunc($domains);
foreach $key (@new) {
push(@domaindiff,"+$key");
}
@new = addnew(\%url, $urls);
$new += scalar(@new);
$urldb->sync();
trunc($urls);
foreach $key (@new) {
push(@urldiff,"+$key");
}
$new += scalar(addnew(\%include, $includes));
$includedb->sync();
trunc($includes);
if (@new) {
$new += scalar(@new);
$wash++;
foreach $key (@new) {
$k = $key;
$v = 0;
for ($n = $includedb->seq($k,$v,R_CURSOR);
$n == 0 && urlmatch($k,$key) == 0;
$k = $key, $n = $includedb->seq($k,$v,R_CURSOR)) {
info("Removing obsoleted include: %s", $k);
$includedb->del_dup($k,$v);
$includedb->sync();
}
}
}
@new = addnew(\%domainexception, $domainexceptions);
$domainexceptiondb->sync();
trunc($domainexceptions);
if (@new) {
$new += scalar(@new);
$wash++;
foreach $key (@new) {
$k = $key;
$v = 0;
for ($n = $urldb->seq($k,$v,R_CURSOR);
$n == 0 && urlmatch($k,$key) == 0;
$k = $key, $n = $urldb->seq($k,$v,R_CURSOR)) {
unless(exists($include{$k})) {
my %data = split(/[=;]/, $v);
info("Removing obsoleted url (domainexception=%s): %s",$key,$k);
push(@urldiff,"-$k");
release($data{referer});
$urldb->del_dup($k,$v);
$urldb->sync();
}
}
$k = $key;
$v = 0;
for ($n = $domaindb->seq($k,$v,R_CURSOR);
$n == 0 && domainmatch($k,$key) == 0;
$k = $key, $n = $domaindb->seq($k,$v,R_CURSOR)) {
my %data = split(/[=;]/, $v);
info("Removing obsoleted domain (domainexception=%s): %s",$key,$k);
push(@domaindiff,"-$k");
release($data{referer});
$domaindb->del_dup($k,$v);
$domaindb->sync();
}
}
}
@new = addnew(\%urlexception, $urlexceptions);
$urlexceptiondb->sync();
trunc($urlexceptions);
if (@new) {
$new += scalar(@new);
$wash++;
foreach $key (@new) {
$k = $key;
$v = 0;
for ($n = $urldb->seq($k,$v,R_CURSOR);
$n == 0 && urlmatch($k,$key) == 0;
$k = $key, $n = $urldb->seq($k,$v,R_CURSOR)) {
my %data = split(/[=;]/, $v);
info("Removing obsoleted url (urlexception=%s): %s",$key,$k);
push(@urldiff,"-$k");
release($data{referer});
$urldb->del_dup($k,$v);
$urldb->sync();
}
}
}
@new = addnew(\%exception, $exceptions);
$exceptiondb->sync();
trunc($exceptions);
if (@new) {
$new += scalar(@new);
$wash++;
foreach $key (@new) {
while(exists($url{$key})) {
my %data = split(/[=;]/, $url{$key});
info("Removing obsoleted url (exception): %s", $key);
release($data{referer});
$k = $key;
$v = $url{$key};
$urldb->del_dup($k,$v);
push(@urldiff,"-$k");
$urldb->sync();
}
while(exists($domain{$key})) {
my %data = split(/[=;]/, $domain{$key});
info("Removing obsoleted domain (exception): %s", $key);
release($data{referer});
$k = $key;
$v = $domain{$key};
$domaindb->del_dup($k,$v);
push(@domaindiff,"-$k");
$domaindb->sync();
}
}
}
$new += scalar(addnew(\%redirector, $redirectors));
$redirectordb->sync();
trunc($redirectors);
$now = time;
status("Loaded %d new entries in %s",$new,strtime($now-$checkpoint));
$checkpoint = $now;
@patterns = patterns($patterns);
washlinks() if($wash);
}
sub usage($) {
my $exit = shift;
print STDERR "\n$progname $VERSION\n\n";
print STDERR "Usage: $progname \133options\135\n";
print STDERR "Where the options are:\n";
print STDERR "\t-h\t\t\t\043 help\n";
print STDERR "\t-q|-v\t\t\t\043 quiet or verbose\n";
print STDERR "\t-V\t\t\t\043 print version number and exit\n";
print STDERR "\t-w\t\t\t\043 cleanup inconsistencies only\n";
print STDERR "\t-c \t\043 File with Perl code to override the defaults\n";
print STDERR "\n";
exit($exit);
}
sub date($) {
my $time = shift;
strftime("%Y.%m.%d %T", localtime($time));
}
sub strtime($) {
my $time = shift;
sprintf("%d:%02d:%02d", $time/3600, $time/60%60, $time%60);
}
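# E.g. strtime(3725) gives "1:02:05" (hours:minutes:seconds).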
sub msg($@) {
my $format = shift;
printf STDOUT "%s $progname: $format\n", date(time), @_;
}
sub status($@) {
return if($quiet);
my $format = shift;
$format = "STATUS: $format" unless($format =~ /^status:/i);
msg($format, @_);
}
sub info($@) {
return unless($verbose);
my $format = shift;
$format = "INFO: $format" unless($format =~ /^info:/i);
msg($format, @_);
}
sub debug($@) {
return unless($debug);
my $format = shift;
$format = "DEBUG: $format" unless($format =~ /^debug:/i);
msg($format, @_);
}
sub warning($@) {
my $format = shift;
$format = "WARNING: $format" unless($format =~ /^(error|warning):/i);
printf STDERR "%s $progname: $format\n", date(time), @_;
}
sub error($@) {
my $format = shift;
$format = "ERROR: $format" unless($format =~ /^(error|warning):/i);
warning($format, @_);
end(-1);
}
sub mirror($) {
scalar(reverse(shift));
}
sub domaincmp($$) {
my $search = join("\0",reverse(split(/\./,lc(shift))));
my $found = join("\0",reverse(split(/\./,lc(shift))));
#debug("domaincmp(%s,%s)", $search, $found);
return($search . "\0" cmp $found . "\0");
}
sub linkmatch($$) {
my $search = shift;
my $found = shift;
$search = lc($search);
$found = lc($found);
#debug("linkmatch(%s,%s)", $search, $found);
$search =~ s@/(index\.s?html?|default\.(s?html?|asp))$@@;
$found =~ s@/(index\.s?html?|default\.(s?html?|asp))$@@;
if ($search eq $found
|| $search . "/" eq $found
|| $search eq $found . "/") {
return(0);
} else {
return($search cmp $found);
}
}
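# Illustration: linkmatch("http://foo/index.html","http://foo/") == 0,
# because default document names and a trailing "/" are ignored.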
sub domainmatch($$) {
my $search = lc(shift);
my $found = lc(shift);
#debug("domainmatch(%s,%s)", $search, $found);
if ($search eq $found) {
return(0);
} else {
$found = join("\0",reverse(split(/\./,$found)));
$search = substr(join("\0",reverse(split(/\./,$search))),0,length($found));
#debug("domainmatch(%s,%s)", $search, $found);
return($search cmp $found)
}
}
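# Illustration: domainmatch("www.foo.com","foo.com") == 0: both keys are
# reversed labelwise ("com\0foo\0www" vs "com\0foo") and the search key
# is truncated to the stored key's length, so any host under a listed
# domain compares equal to it.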
sub urlmatch($$) {
my $search = shift;
my $found = shift;
#debug("urlmatch(%s,%s)", $search, $found);
$search = lc($search) . "/";
$found = lc($found) . "/";
if ($search eq $found) {
return(0);
} else {
$search = substr($search,0,length($found));
return($search cmp $found);
}
}
sub exceptionmatch($$) {
my $search = shift;
my $found = shift;
#debug("exceptionmatch(%s,%s)", $search, $found);
$search = lc($search);
$found = lc($found);
if ($search eq $found) {
return(0);
} else {
$search =~ s@/([^/]+\.(s?html?|cgi|php\d?|asp|jpe?g|gif|ra?m|mpe?g?|mov|movie|qt|avi|dif|dvd?|mpv2|mp3))?$@@;
return($search cmp $found);
}
}
sub addnew($$) {
my ($db, $file) = @_;
my @new;
if (-f $file) {
my ($added, $ignored, $key, $val) = (0, 0);
status("Adding new entries from $file..");
open(FILE, $file) || error("$file: $!");
while(<FILE>) {
chomp;
s/\043.*//;
next unless($_);
($key, $val) = split(/\s+/, $_);
if (exists($db->{$key})) {
$ignored++;
debug("Ignored (seen before): %s", $key);
} else {
$added++;
push(@new,$key);
$val = "" unless($val);
$db->{$key} = $val;
info("Added: %s", $key);
}
}
close(FILE);
status("Added $added and ignored $ignored entries from $file.");
} else {
warning("$file: $!");
}
return(@new)
}
sub patterns($) {
my $file = shift;
my @patterns;
if (-f $file) {
my $added = 0;
status("Loading patterns from $file..");
open(FILE, $file) || error("$file: $!");
while(<FILE>) {
chomp;
s/\043.*//;
next unless($_);
push(@patterns, $_);
}
close(FILE);
status("Loaded %d patterns from $file.", scalar(@patterns));
} else {
warning("$file: $!");
}
return(@patterns)
}
sub release($) {
my $key = shift || return;
my %data = split(/[=;]/, $link{$key} || "");
my ($val,$k,$v);
$val="";
$data{used} = 0;
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$linkdb->put($key, $val);
$linkdb->sync();
}
sub expiredomains() {
my ($had,$key,$val,$status,%expired,%obsolete,%redundant,%bad);
status("Checking the domain list for expired and redundant entries..");
undef($domaindb);
untie(%domain);
$DB_BTREE->{compare} = \&domaincmp;
$domaindb = tie(%domain,"DB_File","$domains.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domains.db: $!");
$had = total($domaindb);
$key = $val = 0;
for ($status = $domaindb->seq($key, $val, R_FIRST);
$status == 0;
$status = $domaindb->seq($key, $val, R_NEXT)) {
my %data = split(/[=;]/, $val || "");
#debug("Checking: %s",$key);
if (!(exists($data{last}) && exists($data{ttl}) && exists($data{referer}))) {
info("Removing expired domain: %s: %s", $key, $val || "undef");
push(@domaindiff,"-$key");
$expired{$key} = $val;
} elsif ($data{last} + ($domain_ttl*86400) < $now) {
info("Removing expired domain: %s: %s", $key, $val);
push(@domaindiff,"-$key");
release($data{referer});
$expired{$key} = $val;
} elsif (exists($domainexception{$key}) && !exists($include{$key})) {
info("Removing obsolete domain (domainexception): %s", $key);
push(@domaindiff,"-$key");
release($data{referer});
$obsolete{$key} = $val;
} elsif (exists($exception{$key}) && !exists($include{$key})) {
info("Removing obsolete domain (exception): %s", $key);
push(@domaindiff,"-$key");
release($data{referer});
$obsolete{$key} = $val;
} elsif ($key =~ /^[\d.]+$/ && $key !~ /^\d+\.\d+\.\d+\.\d+$/) {
info("Removing bad domain (subnet): %s", $key);
push(@domaindiff,"-$key");
release($data{referer});
$bad{$key} = $val;
} else {
my ($host,$domain) = split(/\./, $key, 2);
if($domain && exists($domain{$domain})) {
info("Removing redundant domain: %s", $key);
push(@domaindiff,"-$key");
release($data{referer});
$redundant{$key} = $val;
}
}
}
while(($key,$val) = each(%expired)) {
$domaindb->del_dup($key,$val);
$domaindb->sync();
}
while(($key,$val) = each(%obsolete)) {
$domaindb->del_dup($key,$val);
$domaindb->sync();
}
while(($key,$val) = each(%redundant)) {
$domaindb->del_dup($key,$val);
$domaindb->sync();
}
while(($key,$val) = each(%bad)) {
$domaindb->del_dup($key,$val);
$domaindb->sync();
}
undef($domaindb);
untie(%domain);
$DB_BTREE->{compare} = \&domainmatch;
$domaindb = tie(%domain,"DB_File","$domains.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domains.db: $!");
$now = time;
status("Removed %d expired, %d obsolete %d redundant and %d bad of %d domains in %s",
scalar(keys(%expired)), scalar(keys(%obsolete)), scalar(keys(%redundant)),
scalar(keys(%bad)), $had, strtime($now-$checkpoint));
$checkpoint = $now;
}
sub expireurls() {
my ($had,$key,$val,$status,%expired,%obsolete,%redundant);
status("Checking the url list for expired and redundant entries..");
$had = total($urldb);
$key = $val = 0;
for ($status = $urldb->seq($key, $val, R_FIRST);
$status == 0;
$status = $urldb->seq($key, $val, R_NEXT)) {
my %data = split(/[=;]/, $val || "");
#debug("Checking: %s",$key);
if ($data{last} + ($url_ttl*86400) < $now) {
info("Removing expired url: %s: %s", $key, $val);
push(@urldiff,"-$key");
release($data{referer});
$expired{$key} = $val;
} elsif (exists($domainexception{$key}) && !exists($include{$key})) {
info("Removing obsolete url (domainexception): %s", $key);
push(@urldiff,"-$key");
release($data{referer});
$obsolete{$key} = $val;
} elsif (exists($urlexception{$key}) && !exists($include{$key})) {
info("Removing obsolete url (urlexception): %s", $key);
push(@urldiff,"-$key");
release($data{referer});
$obsolete{$key} = $val;
} elsif (exists($exception{$key}) && !exists($include{$key})) {
info("Removing obsolete url (exception): %s", $key);
push(@urldiff,"-$key");
release($data{referer});
$obsolete{$key} = $val;
} else {
my $domain = $key;
$domain =~ s@/.*@@;
if(exists($domain{$domain})) {
my %d = split(/[=;]/, $domain{$domain});
info("Removing redundant url: %s", $key);
push(@urldiff,"-$key");
release($data{referer}) unless(lc($data{referer}) eq lc($d{referer}));
$redundant{$key} = $val;
} else {
my $k = $key;
$k =~ s@/[^/]+/?$@@;
if(exists($url{$k})) {
my %u = split(/[=;]/, $url{$k});
info("Removing redundant url: %s", $key);
push(@urldiff,"-$key");
release($data{referer}) unless(lc($data{referer}) eq lc($u{referer}));
$redundant{$key} = $val;
}
}
}
}
while(($key,$val) = each(%expired)) {
$urldb->del_dup($key,$val);
$urldb->sync();
}
while(($key,$val) = each(%obsolete)) {
$urldb->del_dup($key,$val);
$urldb->sync();
}
while(($key,$val) = each(%redundant)) {
$urldb->del_dup($key,$val);
$urldb->sync();
}
$now = time;
status("Removed %d expired, %d obsolete and %d redundant of %d urls in %s",
scalar(keys(%expired)), scalar(keys(%obsolete)), scalar(keys(%redundant)),
$had, strtime($now-$checkpoint));
$checkpoint = $now;
}
sub expire() {
expiredomains();
expireurls();
}
sub washlinks() {
my ($key,$val,$status,$removed);
status("Washing the link list..");
$removed = 0;
$key = $val = 0;
for ($status = $linkdb->seq($key, $val, R_FIRST);
$status == 0;
$status = $linkdb->seq($key, $val, R_NEXT)) {
my ($domain,$path,$url) = spliturl($key);
#debug("Checking: %s",$key);
unless($domain) {
info("Removing (bad format): %s", $key);
$linkdb->del($key);
$linkdb->sync();
$removed++;
next;
}
#next if(exists($source{$key}));
unless(exists($include{$url})) {
if(exists($domainexception{$domain})) {
info("Removing (domainexception): %s", $key);
$linkdb->del($key);
$linkdb->sync();
$removed++;
next;
}
if(exists($urlexception{$url})) {
info("Removing (urlexception): %s", $key);
$linkdb->del($key);
$linkdb->sync();
$removed++;
next;
}
if(exists($exception{$url})) {
info("Removing (exception): %s", $key);
$linkdb->del($key);
$linkdb->sync();
$removed++;
next;
}
}
}
$now = time;
status("Removed %d of %d links in %s",$removed, total($linkdb), strtime($now-$checkpoint));
$checkpoint = $now;
}
sub addlink($$) {
my ($link, $referer) = @_;
my ($domain,$path,$url) = spliturl($link);
my $found = $checkpoint;
return(0) unless($domain);
$link =~ s/\043.*//;
$link =~ s/(\s)/sprintf("%%%02x",ord($1))/eg;
$link =~ s/^(https?|ftp):\057\057([^\100\057]*\100)/$1:\057\057/;
if(exists($link{$link}) && $link{$link}) {
debug("Ignored (seen before): %s", $link);
return(0);
}
unless(exists($include{$url})) {
if(exists($domainexception{$domain})) {
debug("Ignored (domainexception): %s", $link);
return(0);
}
if(exists($urlexception{$url})) {
debug("Ignored (urlexception): %s", $link);
return(0);
}
if(exists($exception{$url})) {
debug("Ignored (exception): %s", $link);
return(0);
}
}
if (exists($source{$link})) {
my %data = split(/[=;]/, $source{$link});
$found = $data{found} || $checkpoint;
}
info("Adding new link: %s (referer=%s)", $link, $referer);
$referer =~ s/\075/%3D/g;
$referer =~ s/\073/%3B/g;
$link =~ s@>.*@@;
$link =~ s@/(index|welcome|default)\.(s?html?|cgi)@@i;
$linkdb->put($link, "last=0;ttl=0;status=0;found=$found;used=0;referer=$referer;");
$linkdb->sync();
return(1);
}
sub min($$) {
my ($a,$b) = @_;
if($a < $b) {
return($a);
} else {
return($b);
}
}
sub addcandidate($$) {
my ($candidate, $referer) = @_;
my ($domain,$path,$url) = spliturl($candidate);
return(0) unless($domain);
if(exists($candidate{$candidate}) && $candidate{$candidate}) {
debug("Ignored (seen before): %s", $candidate);
return(0);
}
unless(exists($include{$url})) {
if(exists($domainexception{$domain})) {
debug("Ignored (domainexception): %s", $candidate);
return(0);
}
if(exists($urlexception{$url})) {
debug("Ignored (urlexception): %s", $candidate);
return(0);
}
if(exists($exception{$url})) {
debug("Ignored (exception): %s", $candidate);
return(0);
}
}
info("Adding new candidate: %s (referer=%s)", $candidate, $referer);
$referer =~ s/\075/%3D/g;
$referer =~ s/\073/%3B/g;
$candidatedb->put($candidate, "found=$checkpoint;referer=$referer;");
$candidatedb->sync();
return(1);
}
sub dumpcandidates() {
my ($key,$val);
open(CANDIDATE, ">$candidates") || error("$candidates: $!");
while(($key,$val) = each(%candidate)) {
print CANDIDATE "$key\t$val\n";
}
close(CANDIDATE);
}
sub extract() {
my @requests;
my ($key,$val,$status);
status("Checking the status of the sources..");
$key = $val = 0;
if (defined($sourcedb->seq($key, $val, R_FIRST))) {
my ($k,$v,%data,$request);
my $retry = time-($source_bouncing_ttl*86400);
my ($new,$succeeded,$failed,$downloads,$rest) = (0,0,0,0,0);
$key = $val = 0;
for ($status = $sourcedb->seq($key, $val, R_FIRST);
$status == 0;
$status = $sourcedb->seq($key, $val, R_NEXT)) {
%data = split(/[=;]/, $val || "");
if(!exists($data{found}) || !exists($data{last})
|| !exists($data{ttl}) || !exists($data{retries})) {
$data{found} = $start unless(exists($data{found}));
$data{last} = 0 unless(exists($data{last}));
$data{ttl} = 0 unless(exists($data{ttl}));
$data{retries} = $source_retries unless(exists($data{retries}));
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$sourcedb->put($key, $val);
$sourcedb->sync();
}
if(($data{last}>=0 && $data{last}+$data{ttl}<$now)
|| ($data{last}<0 && -$data{last}+$data{ttl}<$retry)) {
debug("Prepairing: %s", $key);
push(@requests, HTTP::Request->new(GET => $key));
}
}
$sourcedb->sync();
$now = time;
status("Chosed %d of %d sources in %s",
scalar(@requests), total($sourcedb),
strtime($now-$checkpoint));
$checkpoint = $delta = $now;
$downloads = $rest = scalar(@requests);
while(@requests) {
my ($ua,$response,$domain,$extor,%extorelements,$tag,%links,$link,$found,$n);
$now = time;
status("Downloading bulk of %d of %d chosen sources..",
min(scalar(@requests),$bulk), $downloads);
$checkpoint = $now;
$ua = RobotUserAgent->new();
$ua->proxy(['http', 'ftp'], $source_proxy) if($source_proxy);
$ua->agent($fake_user_agent) if($fake_user_agent);
$ua->timeout($source_timeout) if($source_timeout);
$ua->redirect(0);
$ua->in_order(0);
$ua->remember_failures(1);
$ua->max_hosts($simultaneous_sources||1);
$ua->max_req(4);
while(@requests) {
last if($n++ >= $bulk);
my $request = @requests%2? shift(@requests) : pop(@requests);
$rest--;
warning($response->status_line) if($response = $ua->register($request));
}
$SIG{PIPE} = "IGNORE";
$response = $ua->wait();
$SIG{PIPE} = "DEFAULT";
$now = time;
$found = 0;
status("Downloaded %d of %d chosen sources in %s",
scalar(keys(%$response)),
$downloads,
strtime($now-$checkpoint)
);
status("Parsing %d downloaded sources..", scalar(keys(%$response)));
%extorelements = %HTML::LinkExtor::LINK_ELEMENT;
%HTML::LinkExtor::LINK_ELEMENT = (a => "href",
#img => "src",
form => "action",
base => "href");
foreach (keys(%$response)) {
$key = $response->{$_}->{request}->{_uri};
%data = split(/[=;]/, $source{$key});
$data{status} = $response->{$_}->response->code;
if ($response->{$_}->response->is_success) {
debug("Checking %s: %s", $key, $response->{$_}->response->status_line);
$succeeded++;
$data{last} = $checkpoint;
$data{ttl} = int(rand($source_max_ttl - $source_min_ttl + 1))*86400;
$data{retries} = $source_retries;
$extor = HTML::LinkExtor->new(undef, $response->{$_}->response->base);
$extor->parse($response->{$_}->response->content);
addlink($key, $key) unless($key =~ /\?/);
foreach ($extor->links) {
($tag, %links) = @$_;
foreach (keys(%links)) {
$link = $links{$_};
($domain) = spliturl($link);
$found++ if(addlink($link, $key));
}
}
} else {
status("Failed %s: %s", $key, $response->{$_}->response->status_line);
$failed++;
$data{retries} = 0 if($data{status} == 404);
if($data{retries}-- <= 0) {
$data{retries} = $source_retries;
if ($data{last} < 0) {
$data{ttl} += int(rand($source_max_ttl - $source_min_ttl + 1))*86400;
} else {
$data{last} = -$checkpoint ;
$data{ttl} = int(rand($source_max_ttl - $source_min_ttl + 1))*86400;
}
}
if (redirect($data{status})) {
addcandidate($response->{$_}->response->header("Location"), $key);
}
}
if ($data{ttl} > $source_remember*86400 && $data{last} < 0) {
info("Removing bouncing source: %s (last=%d,ttl=%d,retries=%d,status=%s)",
$key,$data{last},$data{ttl},$data{retries},$data{status});
$sourcedb->del($key);
$sourcedb->sync();
} else {
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$sourcedb->put($key, $val);
$sourcedb->sync();
}
}
$sourcedb->sync();
$linkdb->sync();
%HTML::LinkExtor::LINK_ELEMENT = %extorelements;
$checkpoint = $now;
$now = time;
status("Added %d new links from bulk of %d sources in %s",
$found, $n, strtime($now-$checkpoint));
$new += $found;
status("Still %d of %d chosen sources to go..", $rest, $downloads) if($rest);
}
$checkpoint = $now;
$now = time;
status("Added %d new links from %d chosen sources", $new, $downloads);
status("Downloaded and parsed %d chosen of %d sources in %s, %d succeeded and %d failed",
$downloads, total($sourcedb), strtime($now-$delta),
$succeeded, $failed);
}
}
sub success($) {
my $code = shift;
return(1) if ($code >= 200 && $code < 300);
return(1) if ($code == RC_UNAUTHORIZED);
return(1) if ($code == RC_PAYMENT_REQUIRED);
return(1) if ($code == RC_FORBIDDEN);
return(0);
}
sub redirect($) {
my $code = shift;
return(1) if ($code == RC_MOVED_PERMANENTLY);
return(1) if ($code == RC_FOUND);
return(0);
}
sub spliturl($) {
my $link = lc(shift);
my ($proto, $host, $domain, $path, $url);
$link =~ /^(https?|ftp):\057\057([^\100\057]*\100)?((www|web|ftp)\d{0,2}\.)?([-.a-z0-9]+)\.?(:\d*)?([^\043]*)/i;
$proto = $1 || return(undef,undef,undef,undef);
$domain = $5 || return(undef,undef,undef,undef);
$host = ($3 || "") . $domain;
$domain =~ s@%20@@g;
$path = $7 || "";
$path =~ s@^[%20\s]+@@;
$path =~ s@\?.*@@ unless(exists($redirector{$domain}));
$path =~ s@>.*@@;
$path =~ s@/[^/]+\.(s?html?|cgi|php\d?|asp|jpe?g|gif|ra?m|mpe?g?|mov|movie|qt|avi|dif|dvd?|mpv2|mp3)$@@;
$path =~ s@/+$@@;
$path =~ s@//+@/@g;
$path =~ s@/pub/?$@/@;
$path =~ s/(\s)/sprintf("%%%02x",ord($1))/eg;
$path =~ s/%([a-f\d]{2})/if(hex($1)==9||hex($1)==10||hex($1)==13||hex($1)==32){"%$1"}else{chr(hex($1))}/egi;
$url = $domain . $path;
return(($domain, $path, $url, $host));
}
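# Illustration: spliturl("http://www.example.com/files/page.html") returns
# ("example.com", "/files", "example.com/files", "www.example.com"); the
# www prefix and the trailing document name are stripped.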
sub domain($) {
my $link = shift;
my ($domain, $path, $url);
$link =~ /^(https?|ftp):\057\057([^\100\057]*\100)?((www|web|ftp)\d{0,2}\.)?([-.a-z0-9]+)\.?(:\d*)?([^\043]*)/i;
return($5 || "");
}
sub check() {
my (%request,%lnk,$lnkdb,@requests);
my ($key,$val,$status,$links);
status("Checking the status of the links..");
$key = $val = 0;
if (defined($linkdb->seq($key, $val, R_FIRST))) {
my ($k,$v,%data,$request);
my $retry = time-($link_bouncing_ttl*86400);
my ($new,$succeeded,$failed,$tests,$rest) = (0,0,0,0,0);
$DB_BTREE->{compare} = \&linkmatch;
$lnkdb = tie(%lnk,"DB_File",undef,O_CREAT|O_RDWR,0664,$DB_BTREE) || error("tie: $!");
$key = $val = 0;
for ($status = $linkdb->seq($key, $val, R_FIRST);
$status == 0;
$status = $linkdb->seq($key, $val, R_NEXT)) {
if(exists($lnk{$key}) || exists($request{$key})) {
info("INGNORING DUPLICATE: %s");
for (my $i = 10;
$i && $linkdb->del($key) == 0;
$i--) {
$linkdb->sync();
}
$linkdb->sync();
$linkdb->put($key, $val);
$linkdb->sync();
next;
}
$links++;
next if($key =~ /^https:/); # Parallel::UserAgent can not handle 'https'-requests.
%data = split(/[=;]/, $val);
if (exists($source{$key})) {
my %srcdata = split(/[=;]/, $source{$key});
$data{last} = $srcdata{last};
$data{ttl} = $srcdata{ttl};
$data{retries} = $srcdata{retries};
}
if (!exists($data{last})
|| !exists($data{ttl})
|| !exists($data{retries})
|| !exists($data{used})
|| !exists($data{found})
|| !exists($data{status})) {
$data{last} = 0 unless(exists($data{last}));
$data{ttl} = 0 unless(exists($data{ttl}));
$data{retries} = $link_retries unless(exists($data{retries}));
$data{used} = 0 unless(exists($data{used}));
$data{found} = $now unless(exists($data{found}));
$data{status} = 0 unless(exists($data{status}));
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$lnkdb->put($key, $val);
$lnkdb->sync();
}
if(($data{used} && $data{last} > 0 && $data{last}+$data{ttl} < $now)
|| $data{last} == 0
|| $data{found} > $start
|| ($data{last} < 0 && -$data{last}+$data{ttl} < $retry)) {
debug("Prepairing: %s", $key);
$request{$key} = HTTP::Request->new(HEAD => $key);
} else {
debug("Skiping: %s", $key);
}
}
$lnkdb->sync();
$key = $val = 0;
for ($status = $lnkdb->seq($key, $val, R_FIRST);
$status == 0;
$status = $lnkdb->seq($key, $val, R_NEXT)) {
$linkdb->del($key);
$linkdb->sync();
$linkdb->put($key, $val);
$linkdb->sync();
}
undef($lnkdb);
untie(%lnk);
@requests = values(%request);
$now = time;
status("Chosed %d of %d links in %s",
scalar(@requests), $links,
strtime($now-$checkpoint));
$checkpoint = $delta = $now;
$tests = $rest = scalar(@requests);
while(@requests) {
my ($ua,$response,$link,$found,$n);
$now = time;
status("Verifying bulk of %d of %d chosen links..",
min(scalar(@requests),$bulk), $tests);
$checkpoint = $now;
$ua = RobotUserAgent->new();
$ua->proxy(['http', 'ftp'], $link_proxy) if($link_proxy);
$ua->agent($fake_user_agent) if($fake_user_agent);
$ua->timeout($link_timeout) if($link_timeout);
$ua->redirect(0);
$ua->in_order(0);
$ua->remember_failures(1);
$ua->max_hosts($simultaneous_links||1);
$ua->max_req(4);
while(@requests) {
last if($n++ >= $bulk);
my $request = @requests%2? shift(@requests) : pop(@requests);
$rest--;
warning($response->status_line) if($response = $ua->register($request));
}
$SIG{PIPE} = "IGNORE";
$response = $ua->wait();
$SIG{PIPE} = "DEFAULT";
$now = time;
$found = 0;
status("Verified %d of %d chosen links in %s",
scalar(keys(%$response)), $tests, strtime($now-$checkpoint));
status("Updating status for %d verified links..", scalar(keys(%$response)));
foreach (keys(%$response)) {
$key = $response->{$_}->{request}->{_uri};
%data = split(/[=;]/, $link{$key} || "");
$data{status} = $response->{$_}->response->code;
$data{last} = 0 unless($data{last});
$data{retries} = 0 unless($data{retries});
$data{ttl} = 0 unless($data{ttl});
if (success($data{status})) {
debug("Checking %s: %s", $key, $response->{$_}->response->status_line);
$succeeded++;
$data{last} = $checkpoint;
$data{ttl} = int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
$data{retries} = $link_retries;
} elsif (redirect($data{status})) {
my ($domain,$path,$url) = spliturl($key);
if (exists($redirector{$domain})
|| $path =~ /^\057cgi(-bin)?\057/
|| $path =~ /\?/) {
my $location = $response->{$_}->response->header("Location");
debug("Checking %s: %s", $key, $response->{$_}->response->status_line);
$succeeded++;
$found += addlink($location, $key)
if($location && domain($location) ne $domain);
$data{last} = $checkpoint;
$data{ttl} = int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
$data{retries} = $link_retries;
} else {
my $ok;
foreach (@patterns) {
if ($key =~ /$_/) {
$ok++;
last;
}
}
if ($ok) {
my $location = $response->{$_}->response->header("Location");
debug("Checking %s: %s", $key, $response->{$_}->response->status_line);
$succeeded++;
$found += addlink($location, $key)
if($location && domain($location) ne $domain);
$data{last} = $checkpoint;
$data{ttl} = int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
$data{retries} = $link_retries;
} else {
status("Failed %s: %s", $key, $response->{$_}->response->status_line);
$failed++;
if($data{retries}-- <= 0) {
$data{retries} = $link_retries;
if ($data{last} < 0) {
$data{ttl} += int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
} else {
$data{last} = -$checkpoint ;
$data{ttl} = int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
}
}
}
}
} else {
status("Failed %s: %s", $key, $response->{$_}->response->status_line);
$failed++;
$data{retries} = 0 if($data{status} == 404);
if($data{retries}-- <= 0) {
$data{retries} = $link_retries;
if ($data{last} < 0) {
$data{ttl} += int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
} else {
$data{last} = -$checkpoint ;
$data{ttl} = int(rand($link_max_ttl - $link_min_ttl + 1))*86400;
}
}
}
if ($data{ttl} > $link_remember*86400 && $data{last} < 0) {
info("Removing bouncing link: %s (last=%d,ttl=%d,retries=%d,status=%s)",
$key,$data{last},$data{ttl},$data{retries},$data{status});
$linkdb->del($key);
$linkdb->sync();
} else {
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$linkdb->put($key, $val);
$linkdb->sync();
}
}
$linkdb->sync();
$checkpoint = $now;
$now = time;
status("Added %d new links from bulk of %d links in %s",
$found, $n, strtime($now-$checkpoint));
$new += $found;
status("Still %d of %d chosen links to go..", $rest, $tests) if($rest);
}
$checkpoint = $now;
status("Added %d new links from redirects from %d verified links",$new,$tests);
status("Verified %d of %d chosen links in %s, %d succeeded and %d failed",
$tests, $links, strtime($now-$delta),
$succeeded, $failed);
}
}
sub adddomain($$) {
my ($domain, $referer) = @_;
my ($d,$val,$k,$v,$n);
return(0) unless($domain && $referer && $domain =~ /\w+\.\w+/);
return(0) if($domain =~ /^[\d.]+$/ && $domain !~ /^\d+\.\d+\.\d+\.\d+$/);
$domain =~ s/\.$//;
$d = $domain;
$v = 0;
if ($domaindb->seq($d,$v,R_CURSOR) == 0 && lc($domain) eq lc($d)) {
my %data = split(/[=;]/, $v);
$data{last} = $checkpoint;
$data{ttl} = ($domain_ttl < $link_max_ttl) ? $link_max_ttl*86400 : $domain_ttl*86400;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s@=@%3D@g;
$v =~ s@;@%3B@g;
$val.="$k=$v;"
}
$domaindb->put($d, $val);
$domaindb->sync();
debug("Refreshed (seen before): %s", $domain);
return(0);
}
if (exists($domain{$domain})) {
debug("Ignored (redundant): %s", $domain);
return(0);
}
if (exists($domainexception{$domain}) && !exists($include{$domain})) {
debug("Ignored (domainexception): %s", $domain);
return(0);
}
info("Adding new domain: %s (referer=%s)", $domain, $referer);
push(@domaindiff,"+$domain");
my $id = int(rand(2147483647));
$k = $domain;
$v = 0;
for ($n = $urldb->seq($k,$v,R_CURSOR);
$n == 0 && urlmatch($k,$domain) == 0;
$k = $domain, $n = $urldb->seq($k,$v,R_CURSOR)) {
my %data = split(/[=;]/, $v || "");
info("Removing redundant url: %s", $k);
push(@urldiff,"-$k");
$urldb->del_dup($k,$v);
$domaindb->sync();
release($data{referer}) unless(lc($data{referer}) eq lc($referer));
}
$k = $domain;
$v = 0;
for ($n = $domaindb->seq($k,$v,R_CURSOR);
$n == 0 && domainmatch($k,$domain) == 0;
$k = $domain, $n = $domaindb->seq($k,$v,R_CURSOR)) {
my %data = split(/[=;]/, $v || "");
info("Removing redundant domain: %s", $k);
push(@domaindiff,"-$k");
$domaindb->del_dup($k,$v);
$domaindb->sync();
release($data{referer}) unless(lc($data{referer}) eq lc($referer));
}
$referer =~ s/\075/%3D/g;
$referer =~ s/\073/%3B/g;
$v = $domain_ttl*86400;
$domaindb->put($domain, "last=$checkpoint;ttl=$v;found=$checkpoint;referer=$referer;id=$id;");
$domaindb->sync();
return(1);
}
sub addurl($$$) {
my ($url, $domain, $referer) = @_;
my ($d,$u,$val,$k,$v,$n);
return(0) unless($url && $referer && $url =~ /\.\w+/);
$d = $domain;
$v = 0;
if ($domaindb->seq($d,$v,R_CURSOR) == 0 && lc($domain) eq lc($d)) {
my %data = split(/[=;]/, $v);
$data{last} = $checkpoint;
$data{ttl} = ($domain_ttl < $link_max_ttl) ? $link_max_ttl*86400 : $domain_ttl*86400;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s@=@%3D@g;
$v =~ s@;@%3B@g;
$val.="$k=$v;"
}
$domaindb->put($d, $val);
$domaindb->sync();
debug("Refreshed (seen before): %s", $domain);
return(0);
}
if (exists($domain{$domain})) {
debug("Ignored (redundant): %s", $url);
return(0);
}
if (exists($domainexception{$domain}) && !exists($include{$url})) {
debug("Ignored (domainexception): %s", $url);
return(0);
}
$u = $url;
$v = 0;
if ($urldb->seq($u,$v,R_CURSOR) == 0 && lc($u) eq lc($url)) {
my %data = split(/[=;]/, $v);
$data{last} = $checkpoint;
$data{ttl} = ($url_ttl < $link_max_ttl) ? $link_max_ttl*86400 : $url_ttl*86400;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$urldb->put($u, $val);
$urldb->sync();
debug("Refreshed (seen before): %s", $url);
return(0);
}
if (exists($url{$url})) {
debug("Ignored (redundant): %s", $url);
return(0);
}
info("Adding new url: %s (referer=%s)", $url, $referer);
push(@urldiff,"+$url");
my $id = int(rand(2147483647));
$k = $url;
$v = 0;
for ($n = $urldb->seq($k,$v,R_CURSOR);
$n == 0 && urlmatch($k,$url) == 0;
$k = $url, $n = $urldb->seq($k,$v,R_CURSOR)) {
my %data = split(/[=;]/, $v || "");
info("Removing redundant url: %s", $k);
push(@urldiff,"-$k");
$urldb->del_dup($k,$v);
$urldb->sync();
release($data{referer}) unless(lc($data{referer}) eq lc($referer));
}
$referer =~ s/\075/%3D/g;
$referer =~ s/\073/%3B/g;
$v = $url_ttl*86400;
$urldb->put($url, "last=$checkpoint;ttl=$v;found=$checkpoint;referer=$referer;id=$id;");
$urldb->sync();
return(1);
}
sub addresses($$@) {
my ($resolver,$host,$level) = @_;
my (%addresses,$socket,$select,@ready,$rr,$address);
return(undef) unless($resolver);
return(undef) unless($host);
return(undef) if($host =~ /^\d+\.\d+\.\d+\.\d+$/);
$socket = $resolver->bgsend($host);
$select = new IO::Select($socket);
@ready = $select->can_read($dns_timeout);
foreach(@ready) {
my $response = $resolver->bgread($socket) || next;
foreach $rr ($response->answer) {
if ($rr->type eq "A") {
$addresses{$rr->address}++;
} elsif ($rr->type eq "CNAME") {
unless($level++ > 10) {
foreach $address (addresses($resolver, $rr->cname, $level)) {
$addresses{$address}++;
}
}
}
}
}
$select->remove($socket);
return(keys(%addresses));
}
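# Illustrative use (with the resolver set up as in compile() below):
#   my $resolver = Net::DNS::Resolver->new;
#   my @a = addresses($resolver, "www.example.com"); # A records, CNAMEs followed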
sub washdomains() {
my ($key,$val,$status,%redundant);
status("Checking the domain list for redundancy..");
undef($domaindb);
untie(%domain);
$DB_BTREE->{compare} = \&domaincmp;
$domaindb = tie(%domain,"DB_File","$domains.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domains.db: $!");
$key = $val = 0;
for ($status = $domaindb->seq($key, $val, R_FIRST);
$status == 0;
$status = $domaindb->seq($key, $val, R_NEXT)) {
my %data = split(/[=;]/, $val || "");
my ($host,$domain) = split(/\./, $key, 2);
#debug("Checking: %s",$key);
if($domain && exists($domain{$domain})) {
info("Ooops! Removing redundant domain: %s", $key);
push(@domaindiff,"-$key");
release($data{referer});
$redundant{$key} = $val;
}
}
while(($key,$val) = each(%redundant)) {
$domaindb->del_dup($key,$val);
$domaindb->sync();
}
$DB_BTREE->{compare} = \&domainmatch;
$domaindb = tie(%domain,"DB_File","$domains.db",O_CREAT|O_RDWR,0664,$DB_BTREE)
|| error("$domains.db: $!");
$now = time;
status("Removed %d redundant of %d domains in %s",
scalar(keys(%redundant)), total($domaindb),
strtime($now-$checkpoint));
$checkpoint = $now;
}
sub washurls() {
my ($status,$key,$val,%data,$domain,%redundant);
status("Checking the url list for redundancy..");
$key = $val = 0;
for ($status = $urldb->seq($key, $val, R_FIRST);
$status == 0;
$status = $urldb->seq($key, $val, R_NEXT)) {
#debug("Checking: %s",$key);
$domain = $key;
$domain =~ s@/.*@@;
%data = split(/[=;]/, $val);
if(exists($domain{$domain})) {
my %d = split(/[=;]/, $domain{$domain});
release($data{referer}) unless(lc($data{referer}) eq lc($d{referer}));
$redundant{$key} = $val;
} else {
my $k = $key;
$k =~ s@/[^/]+/?$@@;
if(exists($url{$k})) {
my %u = split(/[=;]/, $url{$k});
release($data{referer}) unless(lc($data{referer}) eq lc($u{referer}));
$redundant{$key} = $val;
}
}
}
while(($key,$val) = each(%redundant)) {
info("Removing redundant url: %s", $key);
push(@urldiff,"-$key");
$urldb->del_dup($key,$val);
$urldb->sync();
}
$now = time;
status("Removed %d redundant of %d urls in %s",
scalar(keys(%redundant)), total($urldb), strtime($now-$checkpoint));
$checkpoint = $now;
}
sub wash() {
washdomains();
washurls();
washlinks();
}
sub optimum($) {
my $url = shift || return(undef);
my ($domain, @dirs) = split(/\057/,$url);
my ($pattern, $key, $val);
#debug("Findig the optimal key for %s..", $url);
$key = $domain;
$val = 0;
if ($domaindb->seq($key,$val,R_CURSOR) == 0 && domainmatch($domain,$key) == 0) {
$url = $domain;
undef(@dirs);
} else {
$key = $url;
$val = 0;
if ($urldb->seq($key,$val,R_CURSOR) == 0 && urlmatch($url,$key) == 0) {
$url = $key;
($domain, @dirs) = split(/\057/,$url);
}
}
foreach $pattern (@patterns) {
if ($domain =~ /$pattern/) {
my @zones = split(/\./,$domain);
my $zone = pop(@zones);
while(@zones) {
if($zone =~ /$pattern/) {
return($zone) if((!exists($domainexception{$zone})
&& !exists($exception{$zone}))
|| exists($include{$zone}));
}
return($zone) if(exists($domain{$zone}));
$zone = pop(@zones) . ".$zone";
}
return($zone) if((!exists($domainexception{$zone})
&& !exists($exception{$zone}))
|| exists($include{$zone}));
}
}
if (@dirs) {
my $url = "$domain";
while (@dirs) {
$url .= "/" . shift(@dirs);
foreach $pattern (@patterns) {
if($url =~ /$pattern/) {
return($url) if((!exists($urlexception{$url})
&& !exists($exception{$url}))
|| exists($include{$url}));
}
return($url) if(exists($url{$url}));
}
}
}
return($url);
}
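# Illustration: with "foo.bar" in the domain db, optimum("foo.bar/dir/x")
# returns "foo.bar"; with only "foo.bar/dir" in the url db it returns
# "foo.bar/dir", i.e. the shortest prefix already listed or matched by a
# pattern.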
sub compile() {
status("Compiling..");
my ($resolver,$key,$val,$status,$domain,$path,$url,$host,$k,$v,$links);
my ($domains,$urls) = (0,0);
if ($doinaddr) {
$resolver = new Net::DNS::Resolver;
$resolver->nameservers(@nameservers) if(@nameservers);
}
$key = $val = 0;
for ($status = $linkdb->seq($key, $val, R_FIRST);
$status == 0;
$status = $linkdb->seq($key, $val, R_NEXT)) {
my %data = split(/[=;]/, $val || "");
my $used = 0;
$links++;
$data{last} = 0 unless($data{last});
$data{used} = 0 unless($data{used});
$data{used} = 0 if($data{used} < 0);
$data{found} = $now unless($data{found});
next if($data{last} < 0);
next if($data{last} < $start && $data{used});
($domain,$path,$url,$host) = spliturl($key);
$url = optimum($url);
#debug("Choosing: %s", $url);
if ($url =~ /\057/) {
my $d = $url;
$d =~ s@\057.*@@;
if((!exists($domainexception{$d}) && !exists($urlexception{$url}))
|| exists($include{$url})) {
#debug("addurl(%s,%s,%s)",$url,$domain,$key);
if (addurl($url,$domain,$key)) {
$used++;
$urls++;
$data{used}++;
}
if ($doinaddr && $data{last} > $start && $host !~ /^\d+\.\d+\.\d+\.\d+$/) {
$path = $url;
$path =~ s@^[^/]+@@;
foreach (addresses($resolver,$host)) {
$url = $_ . $path;
#debug("addurl(%s,%s,%s)",$url,$domain,$key);
if (addurl($url,$domain,$key)) {
$used++;
$urls++;
$data{used}++;
}
}
}
}
} else {
if(!exists($domainexception{$url}) || exists($include{$url})) {
#debug("adddomain(%s,%s)",$url,$key);
if (adddomain($url,$key)) {
$used++;
$domains++;
$data{used}++;
}
if ($doinaddr && $data{last} > $start && $host !~ /^\d+\.\d+\.\d+\.\d+$/) {
foreach (addresses($resolver,$host)) {
#debug("adddomain(%s,%s)",$_,$key);
if (adddomain($_,$key)) {
$used++;
$domains++;
$data{used}++;
}
}
}
}
}
if($used) {
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s/\075/%3D/g;
$v =~ s/\073/%3B/g;
$val.="$k=$v;"
}
$linkdb->put($key, $val);
$linkdb->sync();
}
}
$domaindb->sync();
$urldb->sync();
$linkdb->sync();
$now = time;
status("Compiled %d links into %d domains and %d urls in %s",
$links, total($domaindb), total($urldb),
strtime($now-$checkpoint));
status("Added %d new domains and %d new urls", $domains, $urls);
$checkpoint = $now;
}
sub today() {
my ($sec,$min,$hour,$mday,$mon,$year) = localtime();
return(sprintf("%4d%02d%02d\n",$year+1900,$mon+1,$mday));
}
sub export() {
my ($k,$n);
$checkpoint = time;
status("Dumping the domainlist..");
($k,$n) = dumpkeys($domaindb, $domainlist);
$now = time;
status("Dumped %d keys of which %d new to the domainlist in %s..",
$k,$n,strtime($now-$checkpoint));
$checkpoint = $now;
status("Dumping the urllist..");
($k,$n) = dumpkeys($urldb, $urllist);
$now = time;
status("Dumped %d keys of which %d new to the urllist in %s..",
$k,$n,strtime($now-$checkpoint));
$checkpoint = $now;
}
sub valid($) {
my $db = shift;
my ($status, $key, $val, %data);
my $n = 0;
$key = $val = 0;
for ($status = $db->seq($key, $val, R_FIRST);
$status == 0;
$status = $db->seq($key, $val, R_NEXT)) {
%data = split(/[=;]/, $val || "");
$n++ if($data{last} && $data{last} > 0);
}
return($n);
}
sub total($) {
my $db = shift;
my ($status, $key, $val);
my $n = 0;
$key = $val = 0;
for ($status = $db->seq($key, $val, R_FIRST);
$status == 0;
$status = $db->seq($key, $val, R_NEXT)) {
$n++;
}
return($n);
}
sub dumpkeys($$) {
my ($db, $list) = @_;
my ($status, $key, $val, %data, $k, $n);
open(LIST, ">$list") || error("$list: $!");;
print LIST "#\n";
print LIST "# !!! WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING !!!\n";
print LIST "#\n";
print LIST "# This list is entierly a product of a dumb robot ($progname-$VERSION).\n";
print LIST "# We strongly recommend that you review the lists before using them!\n";
print LIST "# Don't blame us if there are mistakes, but please report errors with\n";
print LIST "# the online tool at http://www.squidguard.org/blacklist/\n";
print LIST "#\n";
print LIST "# !!! WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING !!!\n";
print LIST "#\n";
printf LIST "# This list was compiled in %s on %s.\n",
strtime(time-$start), date(time);
printf LIST "# This list was compiled from %d link sources and %d links,\n",
valid($sourcedb), total($linkdb);
printf LIST "# of which %d tested successfully.\n", valid($linkdb);
print LIST "#\n";
$key = $val = $k = $n = 0;
for ($status = $db->seq($key, $val, R_FIRST);
$status == 0;
$status = $db->seq($key, $val, R_NEXT)) {
$k++;
print LIST "$key\n";
%data = split(/[=;]/, $val || "");
if($data{found} && $data{found} > $start) {
$n++;
}
}
close(LIST);
return(($k,$n));
}
sub end($) {
my $exit = shift;
$exit = -1 unless(defined($exit));
if ($exit =~ /^[A-Z]+$/) {
$SIG{$exit} = "IGNORE";
status("Got %s signal..", $exit);
status("Cleaning up..");
}
unless($washonly || $exit) {
export() if($domaindb && $urldb && $linkdb);
}
if(@domaindiff) {
my $file = strftime("$domaindiff",localtime);
local *DIFF;
if (-f $file) {
open(DIFF,">>$file") || die("$file: $!");
} else {
open(DIFF,">$file") || die("$file: $!");
}
foreach(@domaindiff) {
chomp;
print DIFF "$_\n";
}
close(DIFF);
}
if(@urldiff) {
my $file = strftime("$urldiff",localtime);
local *DIFF;
if (-f $file) {
open(DIFF,">>$file") || die("$file: $!");
} else {
open(DIFF,">$file") || die("$file: $!");
}
foreach(@urldiff) {
chomp;
print DIFF "$_\n";
}
close(DIFF);
}
if($sourcedb) {
undef($sourcedb);
untie(%source);
}
if($candidatedb) {
dumpcandidates() unless($washonly || $exit);
undef($candidatedb);
untie(%candidate);
}
if($linkdb) {
undef($linkdb);
untie(%link);
}
if($domaindb) {
undef($domaindb);
untie(%domain);
}
if($urldb) {
undef($urldb);
untie(%url);
}
if($domainexceptiondb) {
undef($domainexceptiondb);
untie(%domainexception);
}
if($urlexceptiondb) {
undef($urlexceptiondb);
untie(%urlexception);
}
if($exceptiondb) {
undef($exceptiondb);
untie(%exception);
}
if($redirectordb) {
undef($redirectordb);
untie(%redirector);
}
if ($exit =~ /^[A-Z]+$/) {
status("Killed by a %s signal.", $exit);
$exit = $signal{$exit} || -2;
}
$exit = -3 unless($exit =~ /^-?\d+$/); # allow the negative codes used above
status("Total runtime %s", strtime(time-$start));
exit($exit);
}
sub disconnect() {
my ($status,$key,$val,$k,$v,$n);
status("Marking all links as unused..");
$key = $val = $n = 0;
for ($status = $linkdb->seq($key, $val, R_FIRST);
$status == 0;
$status = $linkdb->seq($key, $val, R_NEXT)) {
#debug("Checking: %s", $key);
my %data = split(/[=;]/, $val);
next unless($data{used});
info("Resetting: %s", $key);
$data{used} = 0;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s@=@%3D@g;
$v =~ s@;@%3B@g;
$val.="$k=$v;"
}
$linkdb->put($key, $val, R_SETCURSOR);
$linkdb->sync();
$n++;
}
$now = time;
status("Reset %d of %d links in %s", $n, total($linkdb), strtime($now-$checkpoint));
$checkpoint = $now;
}
sub reconnect() {
my ($status,$key,$val,$referer,$k,$v,$n);
status("Marking all links refered in the domain and url lists as used..");
$key = $val = $n = 0;
for ($status = $domaindb->seq($key, $val, R_FIRST);
$status == 0;
$status = $domaindb->seq($key, $val, R_NEXT)) {
#debug("Checking: %s", $key);
my %data = split(/[=;]/, $val);
next unless($data{referer});
$referer = $data{referer};
info("Updating: %s: %s", $key, $referer);
if (exists($link{$referer})) {
my %data = split(/[=;]/, $link{$referer});
$data{used}++;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s@=@%3D@g;
$v =~ s@;@%3B@g;
$val.="$k=$v;"
}
$linkdb->put($referer, $val);
$linkdb->sync();
} else {
$linkdb->put($referer, "last=0;ttl=0;status=0;found=$checkpoint;used=1;referer=$referer;");
$linkdb->sync();
}
$n++;
}
$now = time;
status("Marked %d of %d links refered from the domain list in %s",
$n, total($linkdb), strtime($now-$checkpoint));
$checkpoint = $now;
$key = $val = $n = 0;
for ($status = $urldb->seq($key, $val, R_FIRST);
$status == 0;
$status = $urldb->seq($key, $val, R_NEXT)) {
#debug("Checking: %s", $key);
my %data = split(/[=;]/, $val);
next unless($data{referer});
$referer = $data{referer};
info("Updating: %s: %s", $key, $referer);
if (exists($link{$referer})) {
my %data = split(/[=;]/, $link{$referer});
$data{used}++;
$val="";
while(($k,$v) = each(%data)) {
next unless($k && $keys{$k});
$v =~ s@=@%3D@g;
$v =~ s@;@%3B@g;
$val.="$k=$v;"
}
$linkdb->put($referer, $val);
$linkdb->sync();
} else {
$linkdb->put($referer, "last=0;ttl=0;status=0;found=$checkpoint;used=1;referer=$referer;");
$linkdb->sync();
}
$n++;
}
$now = time;
status("Marked %d of %d links refered from the url list in %s",
$n, total($linkdb), strtime($now-$checkpoint));
$checkpoint = $now;
}
sub fixconnect() {
disconnect();
reconnect();
}
#
# NOW JUST DO IT:
#
init();
if ($washonly) {
fixconnect();
} else {
expire();
load();
extract();
check();
compile();
wash();
}
end(0); # end() itself skips export() and dumpcandidates() in -w mode
squidGuard-1.5/doc/ 0000750 0001750 0001750 00000000000 11124252206 013174 5 ustar adjoo adjoo squidGuard-1.5/doc/LDAPFlow.txt 0000640 0001750 0001750 00000002414 10717346070 015321 0 ustar adjoo adjoo This file documents the flow of control through the new LDAP logic:
UserSearch
|
|
SourceBlock *s;
search in s->userDb
/ \
----/ \-----------
/ \
not found found
| / \
| / \
| LDAP user list
| user user
| / \ |
| / \-------+
| timeout |
| / |
search LDAP ------------/ |
URL list |
/ \ |
/ \ |
not found |
found | |
| | return success
\ /
save both
states in the
userDb cache
i.e. add to userDb
if necessary
|
|
return found state as success or failure
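A rough sketch of the caching idea above in Perl-like pseudocode (an
illustration only, not squidGuard source; ldap_lookup() and expired()
are hypothetical helpers):

    sub user_search {
        my ($s, $user) = @_;
        my $cached = $s->{userDb}{$user};
        # cached list users, and cached LDAP users whose entry has not
        # timed out yet, are answered from the userDb cache directly
        return $cached->{found} if ($cached && !expired($cached));
        # otherwise ask the LDAP server(s) via the configured URL list
        my $found = ldap_lookup($s, $user);
        # both found and not-found states are saved in the userDb cache
        $s->{userDb}{$user} = { found => $found ? 1 : 0, stamp => time };
        return $found ? 1 : 0;
    }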
squidGuard-1.5/doc/README 0000640 0001750 0001750 00000000205 11124252206 014052 0 ustar adjoo adjoo Note these are dumps of Shalla's squidGuard homepage.
Please check the online documentation at:
http://www.squidguard.org/
squidGuard-1.5/doc/configuration.html 0000640 0001750 0001750 00000172761 10717346070 016762 0 ustar adjoo adjoo
squidGuard - Configuration
(This page was last modified 2002-01-08)

Contents:
  The configuration file
  The database
  Tuning hints
  Working configuration examples
The default path for the squidGuard configuration file is
"/usr/local/squidGuard/squidGuard.conf" but another
default can be set at compile time, and
can be changed at runtime. From here we'll
use squidGuard.conf for short.
Note: The number of configuration options and the
flexibility may look overwhelming. Don't panic! Concentrate on the
options that suit your needs. Start with a simple working configuration and extend as your
needs and experience grows. Don't try to solve everything in your
first attempt..
The recommended structure for squidGuard.conf is:

  1. path declarations (logdir, dbhome)
  2. time space declarations
  3. source group declarations
  4. destination group declarations
  5. access control rule declarations

Note: No forward references are allowed! Within this strong
limitation you may actually choose any structure you prefer.
The following words are reserved in squidGuard.conf and
should be avoided in declaration names:
acl fri outside sun urllist
anonymous friday pass sunday user
date fridays redirect sundays userlist
dbhome ip rew thu wed
dest log rewrite thursday wednesday
destination logdir sat thursdays wednesdays
domain logfile saturday time weekly
domainlist mon saturdays tue within
else monday source tuesday
expressionlist mondays src tuesdays
In addition:

  #     starts a comment. Everything from the # to the end of
        the line is ignored.

  { }   delimits the start and end of a group declaration.

  -     is often used to declare a range (i.e. "from-to" or
        "from - to").
Declaration names/labels have the same limitations as domain names,
except that _ is allowed too (i.e. [-_.a-z0-9]+). Reserved words
should be avoided as they may cause unpredictable results.
Generally you may break a (long) line by repeating the leading
keyword. Repeated lines of the same type within a class will be
joined when the rule trees are built. So:

        src foo {
                ip 1.2.3.4
                ip 2.3.4.5
        }

is equivalent to:

        src foo {
                ip 1.2.3.4 2.3.4.5
        }
The defaults for the following directories may be overruled by:

  logdir        defines the directory for the standard logfiles
                "squidGuard.error" and "squidGuard.log", and the
                base for relative logfilenames in log rules. The
                default is "/usr/local/squidGuard/logs" but another
                default can be set at compile time.

  dbhome        defines the base for relative list filenames. The
                default is "/usr/local/squidGuard/db" but another
                default can be set at compile time.

Although the defaults can be used silently, it is recommended to
declare these explicitly for clarity. For instance:

        logdir /usr/local/squidGuard/logs
        dbhome /usr/local/squidGuard/db
Time spaces, or zones if you prefer, are declared by:

        time name {
                specification
                specification
                ...
        }

where specification can be any reasonable combination of:
  * Days of the week with an optional time constraint for each day:

        weekly {smtwhfa} [HH:MM-HH:MM]
    or
        weekly dayname [...] [HH:MM-HH:MM]

    where s=sun, m=mon, t=tue, w=wed, h=thu, f=fri, a=sat, and
    dayname is one of "mon", "monday", "mondays" (synonymous),
    "tue", "tuesday", "tuesdays" (synonymous), "wed", etc.

    For instance, for Monday to Friday, mornings and evenings:

        weekly mtwhf 00:00-08:00
        weekly mtwhf 17:00-24:00

    and for Saturdays and Sundays:

        weekly as
    or
        weekly saturday
        weekly sunday

  * Time of the day:

        weekly * HH:MM-HH:MM

    which is just a special case of weekly. For instance:

        weekly * 00:00-08:00
        weekly * 17:00-24:00

  * Dates with an optional time constraint for each date:

        date YYYY-MM-DD [...] [HH:MM-HH:MM ...]
    or
        date YYYY.MM.DD [...] [HH:MM-HH:MM ...]

    where the choice between the two date formats is just a matter
    of personal taste.

    For instance, for the Ascension Day and the Whit Monday of 1999:

        date 1999.05.13 1999.05.24

    or for the Ash Wednesday afternoon of 1999:

        date 1999.03.31 12:00-24:00

  * Date ranges with an optional time constraint for each day:

        date YYYY-MM-DD-YYYY-MM-DD [HH:MM-HH:MM ...]
    or
        date YYYY.MM.DD-YYYY.MM.DD [HH:MM-HH:MM ...]

    For instance, for the Easter of 1999:

        date 1999.04.01-1999.04.05

  * Date wildcards with an optional time constraint:

        date YYYY-MM-DD [HH:MM-HH:MM ...]
    or
        date YYYY.MM.DD [HH:MM-HH:MM ...]

    where YYYY, MM and DD may be an asterisk, "*".

    For instance, for the New Year's Day:

        date *.01.01

    and for the Christmas Eve:

        date *.12.24 12:00-24:00

Note1: The numeric formats are strict (i.e. 08:00, not 8:00, for
HH:MM etc).
Note2: Overlaps are OK, and the result is the union.
Thus for instance a Norwegian time space definition for leisure
time including holidays and short days could look something like:
time leisure-time {
weekly * 00:00-08:00 # night
weekly * 17:00-24:00 # evening
weekly fridays 16:00-17:00 # weekend
weekly saturdays sundays # weekend
date *.01.01 # New Year's Day
date *.05.01 # Labour Day
date *.05.17 # National Day
date *.12.24 12:00-24:00 # Christmas Eve
date *.12.25 # Christmas Day
date *.12.26 # Boxing Day
date 1999.03.31 12:00-24:00 # Ash Wednesday
date 1999.04.01-1999.04.05 # Easter
date 1999.05.13 1999.05.24 # Ascension Day and Whitsun
date 2000.04.19 12:00-24:00 # Ash Wednesday y2000
date 2000.04.20-2000.04.24 # Easter y2000
date 2000.06.01 2000.06.12 # Ascension Day and Whitsun y2000
}
Source groups, or client groups if you prefer, are declared by:

src|source name [within|outside time_space_name] {
    specification
    specification
    ...
}

or

src|source name within|outside time_space_name {
    specification
    specification
    ...
} else {
    specification
    specification
    ...
}

where:

    src and source are synonymous; use the one you prefer.

    within and outside set an optional time constraint on the
    definition.

    the else part refers to the time constraint.

Time constraints on clientgroups can be used to make these clients
unknown (i.e. use the default rule) within or outside a given time
space. Or they can be used to define a usergroup that is expected to
move between two locations at given times (like office/home).
Specification can be any reasonable combination of:

IP addresses and/or ranges (multiple):

    ip xxx.xxx.xxx.xxx [...]

or

    ip xxx.xxx.xxx.xxx/nn [...]

or

    ip xxx.xxx.xxx.xxx/mmm.mmm.mmm.mmm [...]

or

    ip xxx.xxx.xxx.xxx-yyy.yyy.yyy.yyy [...]

where:

    xxx.xxx.xxx.xxx is an IP address (host or net, i.e. 10.11.12.13
    or 10.11.12.0), /nn a net prefix (i.e. /23), mmm.mmm.mmm.mmm is
    a netmask (i.e. 255.255.254.0) and yyy.yyy.yyy.yyy is a host
    address (must be >= xxx.xxx.xxx.xxx).

IP address/range list (single):

    iplist filename

where:

    filename is either a path relative to dbhome or an absolute
    path (i.e. /full/path) to a database file.

    the iplist file format is simply addresses and/or networks
    separated by a newline as above but without the ip keyword.
    The preferred use of "iplist" over "ip" is for long lists of
    WS/PC addresses, primarily to reduce the size of the
    configuration file. An iplist for all the private addresses
    could look something like:

    10.0.0.0/8
    172.16.0.0/12
    192.168.0.0/16
Domains (multiple):

    domain foo.bar [...] *)

where:

    foo.bar is a domain (zone) the domain name (from a reverse
    lookup on the client address) belongs to (directly or as a
    subdomain).

Users (multiple):

    user foo [...] **)

where:

    foo is a username (from an ident/RFC-931 lookup to the
    client).
User list (single):

    userlist filename **)

where:

    filename is either a path relative to dbhome or an absolute
    path (i.e. /full/path) to a database file.

    the userlist file format is simply RFC-931 usernames separated
    by a newline, as in the user declaration but without the user
    keyword, each optionally followed by a `:' and a comment
    (i.e. /etc/passwd or a .htpasswd file may be used). Thus a
    userlist could look something like:

    root
    administrator
    foo
    bar

Special clientgroup translation log (single):

    log|logfile [anonymous] filename

where:

    filename is either a path relative to logdir or an absolute
    path (i.e. /full/path) to a logfile where translations for
    this group should be logged. If the anonymous option is
    specified the logged info is somewhat anonymized to protect
    the individual.
*) The use of domain match for clientgroups requires Squid to be
set up to do reverse lookups on clients.

**) The use of username match for clientgroups requires Squid to
be set up to do ident/RFC-931 lookups.

Note1: Overlaps are OK, and the groups are matched in the order
they are defined.

Note2: The logical operator between different types within a group
(ip/domain/user) is AND. The default is any. Thus one of each
defined type must match, but undefined types are ignored.
Thus an administrator client group could look something like:
src admin within leisure-time {
ip 10.11.12.13 10.11.12.26 # The administrators home WS/PCs
domain ras.teledanmark.no # The RAS domain
user root administrator foo bar # The administrators login names
} else {
ip 10.1.1.15 10.1.2.17 # The administrators office WS/PCs
domain lan.teledanmark.no # The LAN domain
user root administrator foo bar # The administrators login names
}
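The same kind of group can also be fed from list files to keep the
configuration file small. A minimal sketch, assuming (hypothetical)
files admin/ips and admin/users exist under dbhome:

src admin-files {
    iplist   admin/ips       # WS/PC addresses, one address/range per line
    userlist admin/users     # login names, one RFC-931 username per line
    log anonymous admin.log  # log translations for this group, anonymized,
}                            # to logdir/admin.log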
Destination groups, or target groups if you prefer, are declared by:

dest|destination name [within|outside time_space_name] {
    specification
    specification
    ...
}

or

dest|destination name within|outside time_space_name {
    specification
    specification
    ...
} else {
    specification
    specification
    ...
}

where:

    dest and destination are synonymous.

    within and outside set an optional time constraint on the
    definition.

    the else part refers to the time constraint.

Time constraints on destinationgroups can be used to make these
groups void (i.e. ignored) within or outside a given time space.
Specification can be any combination of zero or one of each of:

Domainlist (single):

    domainlist filename

URL list (single):

    urllist filename

Expressionlist (single):

    expressionlist filename

where:

    filename is either a path relative to dbhome or an absolute
    path (i.e. /full/path) to a database file.

Special destinationgroup redirect URL (single):

    redirect [302:]url

Special destinationgroup redirect log (single):

    log|logfile [anonymous] filename

where:

    filename is either a path relative to logdir or an absolute
    path (i.e. /full/path) to a logfile where redirects caused by
    a match of this group should be logged. If the anonymous
    option is specified the logged info is somewhat anonymized to
    protect the individual.

Note1: Overlaps are OK, and the groups are matched in the order
they are listed in the pass declaration for the actual
clientgroup.

Note2: The logical operator between different types
(domainlist/urllist/expressionlist) is OR. The default is void.
Thus the destinationgroup is matched if one of the defined types
matches. Within a destination group the test order is domainlist,
urllist, and expressionlist.

Thus an entertainment destination group declaration could look
something like:

dest not-business-related outside leisure-time {
    domainlist entertainment/domains
    urllist entertainment/urls
    expressionlist entertainment/expressions
}
Rewrite rule groups, or rewrite rule sets if you prefer, are
declared by:

rew|rewrite name [within|outside time_space_name] {
    substitution
    substitution
    ...
    [logging]
}

or

rew|rewrite name within|outside time_space_name {
    substitution
    substitution
    ...
    [logging]
} else {
    substitution
    substitution
    ...
    [logging]
}

where:

    rew and rewrite are synonymous.

    within and outside set an optional time constraint on the
    definition.

    the else part refers to the time constraint.

Time constraints on rewritegroups can be used to make these groups
functional within or outside a given time space only; like
redirecting to local copies within peak business hours.
Substitution is sed style (multiple):

    s@from@to@[irR]

where:

    from is a regular expression that will be replaced with the
    string to.

    the i option makes the from part match case insensitive.

    the r option makes the redirection visible to the user with an
    HTTP code 302 - Moved Temporarily (the default is to make
    Squid silently fetch the alternate URL).

    the R option makes the redirection visible to the user with an
    HTTP code 301 - Moved Permanently.

and logging is (single):

    log|logfile [anonymous] filename

where:

    filename is either a path relative to logdir or an absolute
    path (i.e. /full/path) to a logfile where successful rewrites
    should be logged. If the anonymous option is specified the
    logged info is somewhat anonymized to protect the individual.

Note1: Sed style substitutions use regular expressions and thus
slow down squidGuard more than B-tree lookups.

Note2: Support for visible redirects (i.e. the 302: URL prefix) is
broken in some versions of Squid.
A rewrite rule set declaration could look something like:

rew get-local {
    s@.*/cb32e46.exe$@http://ftp/pub/www/client/windows/cb32e46.exe@r
    s@.*/cc32e46.exe$@http://ftp/pub/www/client/windows/cc32e46.exe@r
    s@.*/cp32e46.exe$@http://ftp/pub/www/client/windows/cp32e46.exe@r
}
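The time-constrained form is not shown above, so here is a minimal
sketch combining it with logging; it assumes a time space named
leisure-time as defined earlier, and the log filename is only an
example:

rew get-local outside leisure-time {
    # during business hours, make Squid silently fetch the local copy
    s@.*/cc32e46.exe$@http://ftp/pub/www/client/windows/cc32e46.exe@
    log get-local.log # log successful rewrites to logdir/get-local.log
}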
The Access Control List, ACL, combines the previous definitions
into distinct rulesets for each clientgroup:

acl {
    sourcegroupname [within|outside timespacename] {
        pass [!]destgroupname [...]
        [rew|rewrite rewritegroupname [...]]
        [redirect [301:|302:]new_url]
    }

    sourcegroupname within|outside timespacename {
        pass [!]destgroupname [...]
        [rew|rewrite rewritegroupname [...]]
        [redirect [301:|302:]new_url]
    } else {
        pass [!]destgroupname [...]
        [rew|rewrite rewritegroupname [...]]
        [redirect [301:|302:]new_url]
    }

    ...

    default [within|outside timespacename] {
        pass [!]destgroupname [...]
        [rew|rewrite rewritegroupname [...]]
        redirect [301:|302:]new_url
    } [else {
        pass [!]destgroupname [...]
        [rew|rewrite rewritegroupname [...]]
        redirect [301:|302:]new_url
    }]
}
Note: There may be no more than one acl block.

The default rule set:

The default section defines fallbacks for all acl rulesets. Thus
if you define a rewrite rule here it will be used in acls where no
rewrite rules are defined (i.e. the other acls inherit the
definitions in the default acl, optionally overruled by their own
definitions). The default rule set is used for all clients that
match no clientgroup and for clientgroups with no acls declared.
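For instance, a sketch assuming a dest group porn and a rewrite
group get-local are declared earlier in the configuration:

acl {
    kids {
        pass !porn all # no rewrite rule here, so get-local
    }                  # from the default acl is inherited
    default {
        rewrite get-local
        pass all
        redirect http://localhost/cgi/blocked?url=%u
    }
}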
The pass rule:

The pass rule declares destination groups that should pass for the
actual client group. "!" is the NOT operator and indicates a
destination group that should not pass (i.e. be redirected to the
actual redirect URL).

Note: Pass rules end with an implicit "all". It is good practice
to always end the pass rules with either "all" or "none" to make
them clear. I.e. use:

    pass good none

or

    pass good !bad all

Note: If there is a !group there must also be a redirect
definition for either that destination group, the actual acl or
the default acl. If you want some rules for unknown clients that
should not apply to the other acls you should define a last
clientgroup named "unknown" with an IP range 0.0.0.0/0 (i.e. any),
and put those rules in the "unknown" acl, as sketched below.
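A minimal sketch of that construction (the redirect URL is only an
example):

src unknown {
    ip 0.0.0.0/0 # matches any client; declare this group last
}

acl {
    # ...acls for the real clientgroups first...
    unknown {
        pass none
        redirect http://localhost/cgi/blocked?clientaddr=%a&url=%u
    }
    default {
        pass all # now only supplies fallback definitions for the other acls
    }
}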
Built in wildcard groups:

The following are built in wildcard destination groups:

in-addr
    !in-addr can be used to enforce the use of domainnames over IP
    addresses in the host part of URLs. in-addr is a fast
    equivalent to a group with the expressionlist
    "^[^:/]+://[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}($|[:/])".

any
    matches any URL and is a fast equivalent to the expression
    ".*".

all
    is a synonym to any. Use the one you prefer.

none
    is a fast equivalent to !any and should be used to terminate
    pass rules where only the listed destination groups should
    pass.
The rewrite rule:

The rewrite rule declares the substitution rulesets that apply to
the actual acl.

The redirect rule:

The redirect rule declares the alternative URL to be used for
blocked destination groups (!groups) for the actual acl.

Note: Inside an acl, this is a fallback used when there is no
special redirect declared for the actual destination group, and
the default redirect is the last resort.
squidGuard can do runtime string substitutions in the redirectors.
Therefore the character "%" has special meaning in the redirector
URLs:

%a  is replaced with the IP address of the client.

%n  is replaced with the domainname of the client or "unknown" if
    not available.

%i  is replaced with the user ID (RFC931) or "unknown" if not
    available.

%s  is replaced with the matched source group (client group) or
    "unknown" if no groups were matched.

%t  is replaced with the matched destination group (target group)
    or "unknown" if no groups were matched.

%u  is replaced with the requested URL.

%p  is replaced with the REQUEST_URI, i.e. the path and the
    optional query string of %u, but note for convenience without
    the leading "/".

%%  is replaced with a single "%".
Thus you can pass useful information to a more or less intelligent
CGI page:

http://proxy/cgi/squidGuard?clientaddr=%a&clientname=%n&clientident=%i&clientgroup=%s&destinationgroup=%t&url=%u

For a start, there is a sample of such a script in
samples/squidGuard.cgi in the source tree.
squidGuard uses a database that can be divided into an unlimited
number of distinct categories like "local", "customers", "vendors",
"banners", "banned" etc. Each category may consist of separate
unlimited lists of domains, URLs and/or regular expressions. For
easy revision the lists are stored in separate plain text files.
For efficiency the lists are stored in in-memory-only B-trees at
startup.

Note: All URLs are converted to lowercase before match search. So
the lists should not contain uppercase letters.
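One typical way to lay out the database under dbhome is one
subdirectory per category; the category names below are only
examples:

/usr/local/squidGuard/db/
    local/
        domains
    porn/
        domains
        urls
        expressions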
The domainlist file format is simply domainnames/zonenames
separated by a newline. The length of these lists has negligible
influence on the performance.
For instance a start for a financial category:
amex.com
asx.com.au
bourse-de-paris.fr
exchange.de
londonstockex.co.uk
nasdaq.com
nyse.com
ose.no
tse.or.jp
xsse.se
Note: squidGuard will match any URL with the domainname itself and
any subdomains and hosts (i.e. amex.com, www.amex.com,
whatever.amex.com and www.what.ever.amex.com but not
.*[^.]amex.com (i.e. aamex.com etc.)).
The urllist file format is simply URLs separated by newline, but
with the "proto://((www|web|ftp)[0-9]*)?" and "(:port)?" parts,
and normally also the ending "(/|/[^/]+\.[^/]+)$" part (i.e. an
ending "/" or "/filename"), chopped off
(i.e. "http://www3.foo.bar.com:8080/what/ever/index.html" =>
"foo.bar.com/what/ever").
For instance a category for banned sites:
foo.com/~badguy
bar.com/whatever/suspect
Note: The removed parts above are ignored by squidGuard in
URL matching. Thus all these URLs will match the above urllist:
http://foo.com/~badguy
http://foo.com/~badguy/whatever
ftp://foo.com/~badguy/whatever
wais://foo.com/~badguy/whatever
http://www2.foo.com/~badguy/whatever
http://web56.foo.com/~badguy/whatever
but not:
http://barfoo.com/~badguy
http://bar.foo.com/~badguy
http://foo.com/~goodguy
New in 1.0.0 is the ability to do 1-1 redirects on a URL basis
with "key new_url". Thus, as an alternative to using rewrites to
redirect to local distributions, you can have a destination group
with an urllist like:

netscape.com/pub/communicator/4.51/english/windows/windows95_or_nt/complete_install/cc32e451.exe http://ftp.teledanmark.no/pub/www/client/windows/cc32e451.exe
netscape.com/pub/communicator/4.51/english/windows/windows95_or_nt/base_install/cb32e451.exe http://ftp.teledanmark.no/pub/www/client/windows/cb32e451.exe

and an acl with pass ... !download .... This may be a faster
alternative than using lots of s@from@to@ rewrites for 1-1 mapping
since it is faster to search the B-tree than to perform a bunch of
string edits.
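A sketch of how such a download group might be wired in, assuming
the pairs above are stored in a (hypothetical) file download/urls
under dbhome; a fallback redirect covers any blocked request
without its own new_url:

dest download {
    urllist download/urls # the "key new_url" pairs shown above
}

acl {
    default {
        pass !download all
        redirect http://localhost/cgi/blocked?clientaddr=%a&url=%u
    }
}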
The expressionlist file format is lines with regular expressions
as described in regex(5). Of most interest are:

.
    Matches any single character (use "\." to match a ".").

[abc]
    Matches one of the characters ("[abc]" matches a single "a" or
    "b" or "c").

[c-g]
    Matches one of the characters in the range ("[c-g]" matches a
    single "c" or "d" or "e" or "f" or "g". "[a-z0-9]" matches any
    single letter or digit. "[-/.:?]" matches any single "-" or
    "/" or "." or ":" or "?".).

?
    None or one of the preceding ("words?" will match "word" and
    "words". "[abc]?" matches a single "a" or "b" or "c" or
    nothing (i.e. "")).

*
    None or more of the preceding ("words*" will match "word",
    "words" and "wordsssssss". ".*" will match anything including
    nothing).

+
    One or more of the preceding ("xxx+" will match a sequence of
    3 or more "x").

(expr1|expr2)
    One of the expressions, which in turn may contain a similar
    construction ("(foo|bar)" will match "foo" or "bar".
    "(foo|bar)?" will match "foo" or "bar" or nothing (i.e. "")).

$
    The end of the line ("(foo|bar)$" will match "foo" or "bar"
    only at the end of a line).

\x
    Disables the special meaning of x where x is one of the
    special regex characters ".?*+()^$[]{}\" ("\." will match a
    single ".", "\\" a single "\" etc.).
Thus a start to block possible sexual material by expression match
could look like:

(^|[-\?+=/_])(bondage|boobs?|busty?|hardcore|porno?|sex|xxx+)([-\?+=/_]|$)
Notes:

* Unless you build your expressions very very carefully there is a
  high risk you will have annoyed users on your neck. Typically
  you might accidentally block "Essex", "Sussex", "breastcancer",
  "www.x.org" etc. in your eagerness to block pornographic
  material. In practice you would probably replace some of the
  words in the example above with some more clearly pornography
  related words that I don't find appropriate to list here.

* While the size of the domain and urllists only has marginal
  influence on the performance, too many large or complex
  expressions will quickly degrade the performance of squidGuard,
  though it may depend heavily on the performance of the regex
  library you link with.

* There is a rich set of sample files for a group of supposedly
  pornographic sites under samples/dest/adult in the source tree
  that you can use as a start if porn blocking is one of your
  tasks. Please note: We recommend that you review these lists
  before using them. Those domains and urls have been collected
  automagically by a robot. No manual evaluation of the
  corresponding contents has been performed. Therefore there is a
  chance some nonpornographic sites have slipped in. Please report
  such errors but don't blame us if your fine site is on the list.
  (Blame those who have pointers to appropriate sites mixed in on
  their heavy porn link pages!)

* To avoid publishing to your users a complete guide to banned
  sites, you probably want to have some or all of these files
  protected by for instance:

  chmod 640 /wherever/filter/db/dest/adult/*
  chown cache_effective_user /wherever/filter/db/dest/adult/*
  chgrp cache_effective_group /wherever/filter/db/dest/adult/*

  where cache_effective_user and cache_effective_group are the
  values for the corresponding tags as defined in squid.conf.
To convert a domainlist or urllist from plain text file to a
prebuilt database use:

    squidGuard -C listfile

and send Squid a HUP signal to respawn squidGuard. Note: listfile
is the absolute plain text filename or relative to dbhome.
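For instance, to prebuild the porn domainlist from the examples
above (the .db filename follows the file.db convention noted
below, and the squid.pid path depends on your installation):

squidGuard -C porn/domains                      # builds porn/domains.db under dbhome
kill -HUP `cat /usr/local/squid/logs/squid.pid` # Squid respawns squidGuard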
To add and remove entries from a prebuilt database at runtime, put
the changes in a diff file (file.diff for file.db) with the
following simple format:

    +new
    -old
    ...

Then use:

    squidGuard -u

and remove the diff files. The changes should take effect
immediately.
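For instance, a diff file porn/domains.diff, with hypothetical
entries, to add one domain to and remove one from a prebuilt
porn/domains.db:

    +newly-listed.example.com
    -delisted.example.com

Then run squidGuard -u and remove the diff file; both changes
should be picked up immediately.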
For optimal performance try:

* limiting both the number of regular expressions and their
  complexity. Use domainlists and/or urllists where possible.

* limiting the number of rewrite rules. Use redirectors where
  possible.

* limiting the number of useless url list entries. Move the
  domainnames to the domainlist and remove redundant urllist
  entries where applicable, as in the sketch below.

* using ip addressranges rather than long lists of single ip
  addresses. If possible try grouping different usergroups into
  different ranges or subnets (virtual or physical).
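For instance (hypothetical entries), an urls line that names a
whole site is better moved to the domains file, where the match is
both wider and cheaper:

porn/urls before:
    badsite.com
    bar.com/img/sex

porn/urls after:
    bar.com/img/sex

porn/domains after (gains the moved entry):
    badsite.com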
Example 0 - The absolutely minimal do nothing config:

The absolutely minimal config file is an empty but existing file
(i.e. squidGuard -c /dev/null) which is equivalent to:

acl {
    default {
        pass all
    }
}
Example 1 - The recommended minimal do nothing config:

We do recommend, for clarity, saying explicitly what squidGuard is
expected to do (it makes things less magic for a new operator):

logdir /usr/local/squidGuard/log
acl {
    default {
        pass all
    }
}
Example 2 - Limiting the access to one destination group only:

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
dest local {
    domainlist local/domains
}
acl {
    default {
        pass local none
        redirect http://localhost/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&url=%u
    }
}

This implies there must be a domain list file
"/usr/local/squidGuard/db/local/domains" that may simply look like:

teledanmark.no
Example 3 - Blocking the access for unknown or unprivileged clients:

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
src privileged {
    ip 10.0.0.1 10.0.0.73 10.0.0.233 # ONE OF single clients
    ip 10.0.0.10-10.0.0.20           # OR WITHIN range 10.0.0.10 - 10.0.0.20
    ip 10.0.1.32/27                  # OR WITHIN range 10.0.1.32 - 10.0.1.63
    ip 10.0.2.0/255.255.255.0        # OR WITHIN range 10.0.2.0 - 10.0.2.255
    # AND
    domain foo.bar                   # MATCH foo.bar. OR *.foo.bar.
}
acl {
    privileged {
        pass all
    }
    default {
        pass none
        redirect http://info.foo.bar/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&url=%u
    }
}

Using client domainname match implies reverse lookup is enabled
(log_fqdn on) in squid.conf.
Example 4 - Blocking inappropriate sites:

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
dest porn {
    domainlist porn/domains
    urllist porn/urls
}
acl {
    default {
        pass !porn all
        redirect http://localhost/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&url=%u
    }
}

This implies there must be a domain list file
"/usr/local/squidGuard/db/porn/domains" and a URL list file
"/usr/local/squidGuard/db/porn/urls". The domain list file may
have a zillion lines like:

porn.com
sex.com

The URL list file may have another zillion lines like:

foo.com/~porn
bar.com/img/sex
Example 5 - Blocking inappropriate sites for some users and
blocking unknown clients:

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
src grownups {
    ip 10.0.0.0/24 # range 10.0.0.0 - 10.0.0.255
    # AND
    user foo bar   # ident foo or bar
}
src kids {
    ip 10.0.0.0/22 # range 10.0.0.0 - 10.0.3.255
}
dest porn {
    domainlist porn/domains
    urllist porn/urls
}
acl {
    grownups {
        pass all
    }
    kids {
        pass !porn all
    }
    default {
        pass none
        redirect http://info.foo.bar/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&targetgroup=%t&url=%u
    }
}

Using userident match implies RFC931/ident lookup is enabled in
squid.conf, optionally only for the actual client groups, and that
foo and bar's workstations must support RFC931.
Example 6 - Blocking inappropriate sites partially with regex:

+ ensuring local and good sites are passed even if they would
  match a blocking regex:
+ limiting the usage of IP-address URLs:

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
dest local {
    domainlist local/domains
}
dest good {
    domainlist good/domains
}
dest porn {
    domainlist porn/domains
    urllist porn/urls
    expressionlist porn/expressions
}
acl {
    default {
        pass local good !in-addr !porn all
        redirect http://localhost/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&url=%u
    }
}
Example 7 - Blocking inappropriate sites within business hours only:

Let's extend example 5 with:

* a time constraint on censorship
* logging redirections of inappropriate sites anonymized
* redirecting inappropriate sites specially
* and still protecting the kids 24h.

logdir /usr/local/squidGuard/log
dbhome /usr/local/squidGuard/db
time leisure-time {
    weekly * 00:00-08:00 17:00-24:00 # night and evening
    weekly fridays 16:00-17:00       # weekend
    weekly saturdays sundays         # weekend
    date *.01.01                     # New Year's Day
    date *.05.01                     # Labour Day
    date *.05.17                     # National Day
    date *.12.24 12:00-24:00         # Christmas Eve
    date *.12.25                     # Christmas Day
    date *.12.26                     # Boxing Day
    date 1999.03.31 12:00-24:00      # Ash Wednesday
    date 1999.04.01-1999.04.05       # Easter
    date 1999.05.13 1999.05.24       # Ascension Day and Whitsun
    date 2000.04.19 12:00-24:00      # Ash Wednesday y2000
    date 2000.04.20-2000.04.24       # Easter y2000
    date 2000.06.01 2000.06.12       # Ascension Day and Whitsun y2000
}
src grownups {
    ip 10.0.0.0/24 # range 10.0.0.0 - 10.0.0.255
    # AND
    user foo bar   # ident foo or bar
}
src kids {
    ip 10.0.0.0/22 # range 10.0.0.0 - 10.0.3.255
}
dest porn {
    domainlist porn/domains         # file listing domains (clear text)
    urllist porn/urls               # file listing URLs (clear text)
    expressionlist porn/expressions # file with expressions (clear text regex)
    redirect 302:http://info.foo.bar/images/blocked.gif
                                    # redirect matches to this URL
    log anonymous porn.log          # log redirects anonymized to logdir/porn.log
}
acl {
    grownups within leisure-time {
        pass all                # don't censor people's leisure-time
    } else {
        pass !in-addr !porn all # restrict access during business hours
    }
    kids {
        pass !porn all          # protect the kids 24h anyway
    }
    default {
        pass none               # reject unknown clients
        redirect http://info.foo.bar/cgi/blocked?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&targetgroup=%t&url=%u
    }
}
97. http://www.squidguard.org/config/examples/05.conf
98. http://www.squidguard.org/links/#Identd
99. http://www.squidguard.org/config/examples/06.conf
100. http://www.squidguard.org/config/examples/07.conf
101. http://www.squidguard.org/config/#example05
102. http://www.gnu.org/
103. http://www.perl.com/
104. http://www.squid-cache.org/
105. http://www.squidguard.org/
106. http://www.sleepycat.com/
squidGuard-1.5/doc/configure.html 0000640 0001750 0001750 00000016454 10717346075 016075 0 ustar adjoo adjoo
Another squidguard website
Basic Configuration of SquidGuard
Once SquidGuard is successfully installed, you want to configure the software
according to your needs. A sample configuration has been installed in the
default directory /usr/local/squidGuard (or whatever directory you
pointed your installation to).
Below you find three examples for the basic configuration of SquidGuard.
- Most simple configuration
Most simple configuration: one category, one rule for all
#
# CONFIG FILE FOR SQUIDGUARD
#
dbhome /usr/local/squidGuard/db
logdir /usr/local/squidGuard/logs
dest porn {
domainlist porn/domains
urllist porn/urls
}
acl {
default {
pass !porn all
redirect http://localhost/block.html
}
}
Always make sure that the very first line of your squidGuard.conf is
not empty!
The entries have the following meaning:
dbhome    Location of the blacklists
logdir    Location of the logfiles
dest      Definition of a category to block. You can enter the domain
          and URL file along with a regular expression list (more about
          regular expressions later on).
acl       The actual blocking definition. In our example only the
          default is displayed. You can have more than one acl in
          place. The category porn you defined in dest is blocked by
          the expression !porn. You have to add the identifier all
          after the blocklist, or your users will not be able to surf
          at all.
          The redirect directive is mandatory! You must tell
          SquidGuard which page to display instead of the blocked one.
- Choosing more than one category to block
First you define your categories. Just like you did above for porn.
For example:
Defining three categories for blocking
dest adv {
domainlist adv/domains
urllist adv/urls
}
dest porn {
domainlist porn/domains
urllist porn/urls
}
dest warez {
domainlist warez/domains
urllist warez/urls
}
Now your acl looks like this:
acl {
default {
pass !adv !porn !warez all
redirect http://localhost/block.html
}
}
- Whitelisting
Sometimes there is a demand to allow specific URLs and domains although
they are part of the blocklists for a good reason. In this case you
want to whitelist these domains and URLs.
Defining a whitelist
dest white {
domainlist white/domains
urllist white/urls
}
acl {
default {
pass white !adv !porn !warez all
redirect http://localhost/block.html
}
}
In this example we assumed that your whitelists are located in a
directory called white within the blacklist directory
you specified with dbhome.
Make sure that your white identifier comes first in the
pass directive. It must not have an exclamation
mark in front (otherwise all entries belonging to white
would be blocked, too).
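The list files themselves are plain text with one entry per line. As a
purely hypothetical illustration, a white/domains file might contain:

goodsite.example.com
intranet.example.org

and a white/urls file entries like:

www.example.com/some/allowed/page

(domains and URLs are written without the http:// prefix, as in the
blacklists).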
- Initializing the blacklists
Before you start up your squidGuard you should initialize the blacklists,
i.e. convert them from text files to db files. Using the db format
will speed up the checking and blocking.
The initialization is performed by the following command:
Initializing the blacklists
squidGuard -C all
Depending on the size of your blacklists and the power of your computer
this may take a while. If everything runs fine you should see output
like the following:
2006-01-29 12:16:14 [31977] squidGuard 1.2.0p2 started (1138533256.959)
2006-01-29 12:16:14 [31977] db update done
2006-01-29 12:16:14 [31977] squidGuard stopped (1138533374.571)
If you look into the directories holding the files domains and
urls you see that additional files have been created: domains.db
and urls.db. These new files must not be empty!
Only those files you specified to block or whitelist in your
squidGuard.conf are converted.
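Remember also that squidGuard runs with Squid's effective user and
group ID, so the generated .db files must be readable by that user
(see the FAQ notes on the emergency mode). Assuming Squid runs as user
and group squid (adjust to your setup), something like this ensures
that:

chown -R squid:squid /usr/local/squidGuard/db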
Proceed with: Extended Configuration of SquidGuard
Mirko Lorenz - mirko at shalla.de
29.01.2006
squidGuard-1.5/doc/configure.txt 0000640 0001750 0001750 00000011703 10717346070 015733 0 ustar adjoo adjoo Another squidguard website
[1]Home [2]Documentation [3]Download [4]Blacklists [5]Useful stuff
[6]Installation [7]Basic Configuration [8]Extended Configuration
[9]Known Issues
Basic Configuration of SquidGuard
Once SquidGuard is successfully installed, you want to configure the
software according to your needs. A sample configuration has been
installed in the default directory /usr/local/squidGuard (or whatever
directory you pointed your installation to).
Below you find three examples for the basic configuration of
SquidGuard.
1. Most simple configuration
Most simple configuration: one category, one rule for all
#
# CONFIG FILE FOR SQUIDGUARD
#
dbhome /usr/local/squidGuard/db
logdir /usr/local/squidGuard/logs
dest porn {
domainlist porn/domains
urllist porn/urls
}
acl {
default {
pass !porn all
redirect http://localhost/block.html
}
}
Always make sure that the very first line of your squidGuard.conf
is not empty!
The entries have the following meaning:
dbhome Location of the blacklists
logdir Location of the logfiles
dest Definition of a category to block. You can enter the domain and
URL file along with a regular expression list (more about regular
expressions later on).
acl The actual blocking definition. In our example only the default is
displayed. You can have more than one acl in place. The category porn
you defined in dest is blocked by the expression !porn. You have to add
the identifier all after the blocklist, or your users will not be able
to surf at all.
The redirect directive is mandatory! You must tell SquidGuard which page
to display instead of the blocked one.
2. Choosing more than one category to block
First you define your categories. Just like you did above for porn.
For example:
Defining three categories for blocking
dest adv {
domainlist adv/domains
urllist adv/urls
}
dest porn {
domainlist porn/domains
urllist porn/urls
}
dest warez {
domainlist warez/domains
urllist warez/urls
}
Now your acl looks like this:
acl {
default {
pass !adv !porn !warez all
redirect http://localhost/block.html
}
}
3. Whitelisting
Sometimes there is a demand to allow specific URLs and domains
although they are part of the blocklists for a good reason. In this
case you want to whitelist these domains and URLs.
Defining a whitelist
dest white {
domainlist white/domains
urllist white/urls
}
acl {
default {
pass white !adv !porn !warez all
redirect http://localhost/block.html
}
}
In this example we assumed that your whitelists are located in a
directory called white within the blacklist directory you
specified with dbhome.
Make sure that your white identifier comes first in the pass
directive. It must not have an exclamation mark in front
(otherwise all entries belonging to white would be blocked, too).
4. Initializing the blacklists
Before you start up your squidGuard you should initialize the
blacklists, i.e. convert them from text files to db files. Using
the db format will speed up the checking and blocking.
The initialization is performed by the following command:
Initializing the blacklists
squidGuard -C all
Depending on the size of your blacklists and the power of your
computer this may take a while. If everything runs fine you
should see output like the following:
2006-01-29 12:16:14 [31977] squidGuard 1.2.0p2 started (1138533256.959)
2006-01-29 12:16:14 [31977] db update done
2006-01-29 12:16:14 [31977] squidGuard stopped (1138533374.571)
If you look into the directories holding the files domains and urls
you see that additional files have been created: domains.db and
urls.db. These new files must not be empty!
Only those files you specified to block or whitelist in your
squidGuard.conf are converted.
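You can verify the result with a quick directory listing, e.g.
(paths assuming the layout used above):

ls -l /usr/local/squidGuard/db/porn/

which should show domains.db and urls.db with a non-zero size next
to the plain domains and urls files.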
Proceed with: [10]Extended Configuration of SquidGuard
______________________________________________________________
Mirko Lorenz - mirko at shalla.de
29.01.2006
References
1. http://squidguard.shalla.de/index.html
2. http://squidguard.shalla.de/Doc/index.html
3. http://squidguard.shalla.de/download.html
4. http://squidguard.shalla.de/blacklists.html
5. http://squidguard.shalla.de/addsoft.html
6. http://squidguard.shalla.de/Doc/install.html
7. http://squidguard.shalla.de/Doc/configure.html
8. http://squidguard.shalla.de/Doc/extended.html
9. http://squidguard.shalla.de/Doc/known_issues.html
10. http://squidguard.shalla.de/Doc/extended.html
squidGuard-1.5/doc/extended.html 0000640 0001750 0001750 00000016601 10717346075 015706 0 ustar adjoo adjoo
Another squidguard website
Extended Configuration of SquidGuard
There are several more options to configure SquidGuard according to your
needs.
- Not allowing IP addresses
To make sure that people don't bypass the URL filter by simply using the
IP addresses instead of the fully qualified domain names, you can add
!in-addr to your acl as follows:
Disallowing access to IP addresses
acl {
default {
pass !in-addr all
redirect http://localhost/block.html
}
}
- Blocking based on times
There are two ways to define times and dates where access to websites is allowed or disallowed.
The weekly directive is used for recurring access times, e.g. allowing
web access to blocked sites after work.
Using the date directive you can additionally define special days where access may
be granted. Wildcards can be used.
Defining access times
time afterwork {
weekly * 17:00-24:00 # After work
weekly fridays 16:00-17:00 # On friday we close earlier
weekly saturdays sundays # Weekend
date *.01.01 # New Year's Day
date *.12.24 12:00-24:00 # Christmas Eve
date 2006.04.14-2006.04.17 # Easter 2006
date 2006.05.01 # Maifeiertag
}
To apply the defined times you can use the qualifiers within
and outside, respectively.
Now your acl looks like this:
acl {
all within afterwork {
pass all
}
else {
pass !adv !porn !warez all
}
default {
pass none
redirect http://localhost/block.html
}
}
This means that everyone has free access to web sites during the
times defined in afterwork. Outside these times
people cannot access whatever is defined in adv, porn and warez.
- Rules based on source IP addresses
If you have policies in place granting some people access to more sites
than others, you have different options for implementing this policy.
One way is to define source IP acls. This can only work if your user groups
are well separated within your network.
Assuming that this is the case you can now define the source IP ranges
in your squidGuard.conf the following way:
Defining source IP addresses
src admins {
ip 192.168.2.0-192.168.2.255
ip 172.16.12.0/255.255.255.0
ip 10.5.3.1/28
}
You can specify IP addresses directly, define IP ranges using
a from-to notation, give a netmask, or use the netmask prefix notation.
Note: If you have many network definitions for a
user group you can put that information into a separate file and just tell your
squidGuard.conf the location of the file. In this case you
write in your squidGuard.conf:
src admins {
iplist adminlist
}
SquidGuard will look for a file called adminlist located wherever
you pointed your dbhome directive to. Alternatively you can specify
an absolute path with your filename. The file itself holds the information
in the following style:
192.168.2.0-192.168.2.255
172.16.12.0/255.255.255.0
10.5.3.1/28
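To actually apply such a source group, reference it in the acl section
before the default rule. A minimal sketch (assuming the dest groups
adv, porn and warez are defined as in the earlier examples):

acl {
    admins {
        pass all
    }
    default {
        pass !adv !porn !warez all
        redirect http://localhost/block.html
    }
}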
- Logging blocked access attempts
It may be of interest who is accessing blocked sites. To track that
down you can add a log directive to your src or
dest definitions in your squidGuard.conf. If only
a file name is given, the file is placed in the directory specified
by the logdir directive. Alternatively you can specify
an absolute path with your logfilename.
Logging blocked access attempts
dest porn {
domainlist porn/domains
urllist porn/urls
log pornaccesses
}
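If the logged client information should not be stored in clear text,
the log directive also accepts the anonymous keyword (as used in the
configuration examples document):

dest porn {
    domainlist porn/domains
    urllist porn/urls
    log anonymous pornaccesses
}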
Mirko Lorenz - mirko at shalla.de
26.03.2006
squidGuard-1.5/doc/extended.txt 0000640 0001750 0001750 00000011457 10717346070 015560 0 ustar adjoo adjoo Another squidguard website
[1]Home [2]Documentation [3]Download [4]Blacklists [5]Useful stuff
[6]Installation [7]Basic Configuration [8]Extended Configuration
[9]Known Issues
Extended Configuration of SquidGuard
There are several more options to configure SquidGuard according to
your needs.
[10]Not allowing IP addresses [11]Times
[12]Rules based on source IP addresses [13]Logging blocked access attempts
Not allowing IP addresses
To make sure that people don't bypass the URL filter by simply using
the IP addresses instead of the fully qualified domain names, you can
add !in-addr to your acl as follows:
Disallowing access to IP addresses
acl {
default {
pass !in-addr all
redirect http://localhost/block.html
}
}
Blocking based on times
There are two ways to define times and dates where access to websites
is allowed or disallowed. The weekly directive is used for
recurring access times, e.g. allowing web access to blocked sites after
work.
Using the date directive you can additionally define special days
where access may be granted. Wildcards can be used.
Defining access times
time afterwork {
weekly * 17:00-24:00 # After work
weekly fridays 16:00-17:00 # On friday we close earlier
weekly saturdays sundays # Weekend
date *.01.01 # New Year's Day
date *.12.24 12:00-24:00 # Christmas Eve
date 2006.04.14-2006.04.17 # Easter 2006
date 2006.05.01 # Maifeiertag
}
To apply the defined times you can use the qualifiers within and
outside, respectively. Now your acl looks like this:
acl {
all within afterwork {
pass all
}
else {
pass !adv !porn !warez all
}
default {
pass none
redirect http://localhost/block.html
}
}
This means that everyone has free access to web sites during the
times defined in afterwork. Outside these times people cannot
access whatever is defined in adv, porn and warez.
Rules based on source IP addresses
If you have policies in place granting some people access to more sites
than others, you have different options for implementing this policy.
One way is to define source IP acls. This can only work if your user
groups are well separated within your network.
Assuming that this is the case you can now define the source IP ranges
in your squidGuard.conf the following way:
Defining source IP addresses
src admins {
ip 192.168.2.0-192.168.2.255
ip 172.16.12.0/255.255.255.0
ip 10.5.3.1/28
}
You can specify IP addresses directly, define IP ranges using a
from-to notation, give a netmask, or use the netmask prefix
notation.
Note: If you have many network definitions for a user group you
can put that information into a separate file and just tell your
squidGuard.conf the location of the file. In this case you write
in your squidGuard.conf:
src admins {
iplist adminlist
}
SquidGuard will look for a file called adminlist located wherever you
pointed your dbhome directive to. Alternatively you can specify an
absolute path with your filename. The file itself holds the information
in the following style:
192.168.2.0-192.168.2.255
172.16.12.0/255.255.255.0
10.5.3.1/28
Logging blocked access attempts
It may be of interest who is accessing blocked sites. To track that
down you can add a log directive to your src or dest definitions in
your squidGuard.conf. If only a file name is given, the file is placed
in the directory specified by the logdir directive. Alternatively you
can specify an absolute path with your logfilename.
Logging blocked access attempts
dest porn {
domainlist porn/domains
urllist porn/urls
log pornaccesses
}
__________________________________________________________________
Mirko Lorenz - mirko at shalla.de
26.03.2006
References
1. http://squidguard.shalla.de/index.html
2. http://squidguard.shalla.de/Doc/index.html
3. http://squidguard.shalla.de/download.html
4. http://squidguard.shalla.de/blacklists.html
5. http://squidguard.shalla.de/addsoft.html
6. http://squidguard.shalla.de/Doc/install.html
7. http://squidguard.shalla.de/Doc/configure.html
8. http://squidguard.shalla.de/Doc/extended.html
9. http://squidguard.shalla.de/Doc/known_issues.html
10. http://squidguard.shalla.de/Doc/extended.html#notIP
11. http://squidguard.shalla.de/Doc/extended.html#times
12. http://squidguard.shalla.de/Doc/extended.html#sourceIP
13. http://squidguard.shalla.de/Doc/extended.html#blocklog
squidGuard-1.5/doc/faq.html 0000640 0001750 0001750 00000040535 10717346070 014653 0 ustar adjoo adjoo
squidGuard - FAQ
This page was last modified 2002-01-08
FAQ - Frequently Asked/Answered Questions
This is out of date. Have a look at faq-plus
Currently in semirandom order:
1. Is there a mailing list for squidGuard?
Yes!
2. I have db3.x.x installed and squidGuard won't compile?
Only db2.x.x
versions are supported. We are working on db3.x.x support, but
the API has changed so it may take a while to fix.
3. squidGuard does not block?
There may be at least two reasons for this:
1. You didn't end your pass rules with "none". Pass rules end
with an implicit "all". It is good practice to always end the
pass rules with either "all" or "none" to make them clear. I.e.
use:
pass good none
or
pass good !bad all
2. squidGuard goes into emergency mode. Reasons may be syntax
errors in the config file, references to non-existing database
files, file protection problems or missing directories. Check
the squidGuard log.
Note: When run under Squid, squidGuard is run with the
same user and group ID as Squid
(cache_effective_user and
cache_effective_group in squid.conf). The
squidGuard configuration and database files must be readable
for this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not squidGuard
will go into the "pass all for all" emergency mode.
4. How do I debug squidGuard?
Do something like this:
echo "http://foo/bar 10.0.0.1/- - GET" | /usr/local/bin/squidGuard -c /tmp/test.cfg -d
This redirects the log to stderr. The response is either a blank
line (pass on) or the input with the URL part rewritten
(redirect).
5. How can I block audio and video?
Use an expressionlist
with something like this:
\.(ra?m|mpe?g?|mov|movie|qt|avi|dif|dvd?|mpv2|mp3)($|\?)
6. Are there any blacklist exchange plans?
Yes, we plan to add an interface to the new web site to allow
proxy administrators to indirectly add and remove URLs from the
robot config. Though there are still some practical issues to
solve.
7. How can I contribute to the blacklists?
If you have lists of links that would map to missing blacklist
entries, or lists of exceptions/errors, please send them to blacklist@squidguard.org
Note: The link list must consist of fully qualified URLs,
i.e. http://... (not the blacklist format).
The exception/error lists must consist of domains and urls as
(potentially) found in the blacklists.
Direct additions to the domain and url lists are not very
useful, since they are the output from the robot; not the
input. Though if you have a long list that would have been
useful you may of course reverse engineer it through:
sed 's@^@http://www.@;s@$@/@' domains urls >links
8. How can I test time constraints?
You can set a simulated start time with the
-t yyyy-mm-ddTHH:MM:SS
option:
squidGuard -c test.conf -t 1999-12-31T23:59:30 -d <test.in >test.out 2>test.log
With the -t option squidGuard parses the given date&time and
calculates an offset from the current time at startup and then
adds this offset to all time values during runtime.
9. squidGuard compiles fine and the tests succeed, but it seems to
pass all when run under Squid
There may be at least two reasons for this:
1. Some versions of Squid (supposedly 2.2.*) silently ignore
arguments to the right of
redirect_program prefix/bin/squidGuard. Solutions
are one of:
+ Set the actual config file location at compile time
with --with-sg-config
+ Use a shell wrapper with
redirect_program prefix/bin/squidGuard.sh
and make prefix/bin/squidGuard.sh an
executable shell script like:
#! /bin/sh -
exec prefix/bin/squidGuard -c whatever/squidGuard.conf
2. When run under Squid, squidGuard is run with the same user and
group ID as Squid (cache_effective_user and
cache_effective_group in squid.conf). The
squidGuard configuration and database files must be readable
for this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not squidGuard
will go into the "pass all for all" emergency mode.
10. compilation of sg.l fails with "sg.l:line ...: Error:
Too many positions" with native lex
Some native versions of lex have problems with sg.l. The
solution is to use GNU flex which is better
anyway. Do "setenv LEX flex" if configure selects the
native lex before flex. Flex should compile right out of the
box similar to other GNU programs. (Thanks to
laurent.foulonneau@mail.loyalty.nc).
11. Can I use proxy-authenticated users the same way as RFC931/Ident
users?
Yes, if you patch Squid < 2.3 with this simple diff,
kindly contributed by Antony T Curtis, the
authenticated user will be passed from Squid to squidGuard. This
patch has apparently already been incorporated in squid-2.3:
A useful patch to Squid 2.2STABLE which fixes per-user redirection where
the user is authenticated using proxy-auth...
*** src/redirect.c.orig Tue Jun 22 14:04:43 1999
--- src/redirect.c Tue Jun 22 15:46:41 1999
***************
*** 103,108 ****
--- 103,110 ----
cbdataAdd(r, cbdataXfree, 0);
r->orig_url = xstrdup(http->uri);
r->client_addr = conn->log_addr;
+ if (http->request->user_ident[0])
+ r->client_ident = http->request->user_ident; else
if (conn->ident == NULL || *conn->ident == '\0') {
r->client_ident = dash_str;
} else {
12. Can I manipulate domains.db and urls.db from Perl?
Yes, but you must bind custom compare functions. Also note the
domains are stored with a leading ".":
use DB_File;
sub mirror($) {
scalar(reverse(shift));
}
sub domainmatch($$) {
my $search = mirror(lc(shift));
my $found = mirror(lc(shift));
if ("$search." eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
sub urlmatch($$) {
my $search = lc(shift) . "/";
my $found = lc(shift) . "/";
if ($search eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
my (%url,%domain);
$DB_BTREE->{compare} = \&urlmatch;
my $url_db = tie(%url, "DB_File", "urls.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("urls.db: $!\n");
$DB_BTREE->{compare} = \&domainmatch;
my $domain_db = tie(%domain, "DB_File", "domains.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("domains.db: $!\n");
# Now you can operate on %url and %domain just as normal perl hashes:)
# Add "playboy.com" to the domainlist unless it's already there:
$domain{".playboy.com"} = "" unless(exists($domain{"playboy.com"}));
# or use the DB_File functions put, get, del and seq:
# Add "sex.com" and "dir.yahoo.com/business_and_economy/companies/sex"
# and delete "cnn.com":
$domain_db->put(".sex.com","") unless(exists($domain{"sex.com"}));
$domain_db->sync; # Seems to only sync the last change.
$domain_db->del("cnn.com") if(exists($domain{"cnn.com"}));
$domain_db->sync; # Seems to only sync the last change.
$url_db->put("xyz.com/~sex","") unless(exists($url{"xyz.com/~sex"}));
$url_db->sync; # Seems to only sync the last change.
$url_db->sync; # Seems to only sync the last change.
$domain_db->sync; # Seems to only sync the last change.
undef($url_db); # Destroy the object
undef($domain_db); # Destroy the object
untie(%url); # Sync and close the file and undef the hash
untie(%domain); # Sync and close the file and undef the hash
See the perltie(1) and DB_File(3) man pages that come with Perl for more info.
13. How can I list domains.db or urls.db from Perl?
Use a script like this:
#!/local/bin/perl -w
use strict;
use DB_File;
foreach (@ARGV) {
my (%db, $key, $val);
die("$_: $!\n") unless(-f);
tie(%db, "DB_File", $_, O_RDONLY, 0664, $DB_BTREE) || die("$_: $!\n");
foreach $key (keys(%db)) {
if($val = $db{$key}) {
$val = "\"$val\"";
} else {
$val = "undef";
}
print "$key -> $val\n";
}
untie(%db);
}
See the perltie(1) and DB_File(3) man pages that come with Perl for more info.
14. How can I get around "make: don't know how to make /bin/false. Stop"?
Your system has neither lynx nor /bin/false: if it
has /usr/bin/false do:
# ln -s ../usr/bin/false /bin/.
Alternatively:
# echo exit 255 >/bin/false
# chmod a+rx /bin/false
If you have questions and/or answers that should be on the FAQ list
please send them to squidguard@squidguard.org
squidGuard-1.5/doc/faq.txt 0000640 0001750 0001750 00000030047 10717346070 014523 0 ustar adjoo adjoo [1][squidGuard.gif] The squidGuard FAQ [faq.gif]
[2]squidGuard is an ultrafast and free filter, redirector and access
controller for [3]Squid
By [4]Pål Baltzersen and [5]Lars Erik Håland
[6]Copyright © 1999-2000, [7]Tele Danmark InterNordia
Visitors: [counter] (Since 2002-01-08 19:54:05)
This page was last modified 2002-01-08
[arrow-red.gif] FAQ - Frequently Asked/Answered Questions
This is out of date. Have a look at [8]faq-plus
Currently in semirandom order:
1.
Is there a mailing list for squidGuard?
[9]Yes!
2.
I have db3.x.x installed and squidGuard won't compile?
Only [10]db2.x.x versions are supported. We are working
on db3.x.x support, but the API has changed so it may
take a while to fix.
3.
squidGuard does not block?
There may be at least two reasons for this:
1. You didn't end your pass rules with "none". Pass rules
end with an implicit "all". It is good practice to
always end the pass rules with either "all" or "none" to
make them clear. I.e. use:
pass good none
or
pass good !bad all
2. squidGuard goes into emergency mode. Reasons may be
syntax errors in the config file, references to non-existing
database files, file protection problems or missing
directories. Check the squidGuard log.
Note: When run under Squid, squidGuard is run with the
same user and group ID as Squid (cache_effective_user
and cache_effective_group in squid.conf). The squidGuard
configuration and database files must be readable for
this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not
squidGuard will go into the "pass all for all" emergency
mode.
4.
How do I debug squidGuard?
Do something like this:
echo "http://foo/bar 10.0.0.1/- - GET" | /usr/local/bin/s
quidGuard -c /tmp/test.cfg -d
This redirects the log to stderr. The response is either
a blank line (pass on) or the input with the URL part
rewritten (redirect).
5.
How can I block audio and video?
Use an [11]expressionlist with something like this:
\.(ra?m|mpe?g?|mov|movie|qt|avi|dif|dvd?|mpv2|mp3)($|\?)
6.
Are there any blacklist exchange plans?
Yes, we plan to add an interface to the new web site to
allow proxy administrators to indirectly add and remove
URLs from the robot config. Though there are still some
practical issues to solve.
7.
How can I contribute to the blacklists?
If you have lists of links that would map to missing
blacklist entries, or lists of exceptions/errors, please
send them to [12]blacklist@squidguard.org
Note: The link list must consist of fully qualified URLs,
i.e. http://... (not the blacklist format).
The exception/error lists must consist of domains and
urls as (potentially) found in the blacklists.
Direct additions to the domain and url lists are not very
useful, since they are the output from the robot; not
the input. Though if you have a long list that would have
been useful you may of course reverse engineer it
through:
sed 's@^@http://www.@;s@$@/@' domains urls >links
8.
How can I test time constraints?
You can set a simulated start time with the
-t yyyy-mm-ddTHH:MM:SS option:
squidGuard -c test.conf -t 1999-12-31T23:59:30 -d <test.in >test.out 2>test.log
With the -t option squidGuard parses the given date&time
and calculates an offset from the current time at startup
and then adds this offset to all time values during
runtime.
9.
squidGuard compiles fine and the tests succeed, but it seems to
pass all when run under Squid
There may be at least two reasons for this:
o Some versions of Squid (supposedly 2.2.*) silently
ignore arguments to the right of
redirect_program prefix/bin/squidGuard. Solutions are
one of:
# Set the actual config file location at
[13]compile time with --with-sg-config
# Use a shell wrapper with
redirect_program prefix/bin/squidGuard.sh and make
prefix/bin/squidGuard.sh an executable shell script like:
#! /bin/sh -
exec prefix/bin/squidGuard -c whatever/squidGuard.conf
o When run under Squid, squidGuard is run with the same
user and group ID as Squid (cache_effective_user and
cache_effective_group in squid.conf). The squidGuard
configuration and database files must be readable for
this user and/or group and the squidGuard log directory
must be writable for this user and/or group. If not
squidGuard will go into the "pass all for all" emergency
mode.
10.
compilation of sg.l fails with "sg.l:line ...: Error: Too many
positions" with native lex
Some native versions of lex have problems with sg.l. The
solution is to use [14]GNU flex which is better anyway. Do
"setenv LEX flex" if configure selects the native lex
before flex. Flex should compile right out of the box
similar to other GNU programs. (Thanks to
laurent.foulonneau@mail.loyalty.nc).
11.
Can I use proxy-authenticated users the same way as RFC931/Ident
users?
Yes, if you patch Squid < 2.3 with this simple [15]diff,
kindly contributed by [16]Antony T Curtis, the
authenticated user will be passed from Squid to
squidGuard. This patch has apparently already been
incorporated in squid-2.3:
A useful patch to Squid 2.2STABLE which fixes per-user redirection where
the user is authenticated using proxy-auth...
*** src/redirect.c.orig Tue Jun 22 14:04:43 1999
--- src/redirect.c Tue Jun 22 15:46:41 1999
***************
*** 103,108 ****
--- 103,110 ----
cbdataAdd(r, cbdataXfree, 0);
r->orig_url = xstrdup(http->uri);
r->client_addr = conn->log_addr;
+ if (http->request->user_ident[0])
+ r->client_ident = http->request->user_ident; else
if (conn->ident == NULL || *conn->ident == '\0') {
r->client_ident = dash_str;
} else {
12.
Can I manipulate domains.db and urls.db from Perl?
Yes, but you must bind custom compare functions. Also note
the domains are stored with a leading ".":
use DB_File;
sub mirror($) {
scalar(reverse(shift));
}
sub domainmatch($$) {
my $search = mirror(lc(shift));
my $found = mirror(lc(shift));
if ("$search." eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
sub urlmatch($$) {
my $search = lc(shift) . "/";
my $found = lc(shift) . "/";
if ($search eq $found) {
return(0);
} else {
return(substr($search,0,length($found)) cmp $found);
}
}
my (%url,%domain);
$DB_BTREE->{compare} = \&urlmatch;
my $url_db = tie(%url, "DB_File", "urls.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("urls.db: $!\n");
$DB_BTREE->{compare} = \&domainmatch;
my $domain_db = tie(%domain, "DB_File", "domains.db", O_CREAT|O_RDWR, 0664, $DB_BTREE)
|| die("domains.db: $!\n");
# Now you can operate on %url and %domain just as normal perl hashes:)
# Add "playboy.com" to the domainlist unless it's already there:
$domain{".playboy.com"} = "" unless(exists($domain{"playboy.com"}));
# or use the DB_File functions put, get, del and seq:
# Add "sex.com" and "dir.yahoo.com/business_and_economy/companies/sex"
# and delete "cnn.com":
$domain_db->put(".sex.com","") unless(exists($domain{"sex.com"}));
$domain_db->sync; # Seems to only sync the last change.
$domain_db->del("cnn.com") if(exists($domain{"cnn.com"}));
$domain_db->sync; # Seems to only sync the last change.
$url_db->put("xyz.com/~sex","") unless(exists($url{"xyz.com/~sex"}));
$url_db->sync; # Seems to only sync the last change.
$url_db->sync; # Seems to only sync the last change.
$domain_db->sync; # Seems to only sync the last change.
undef($url_db); # Destroy the object
undef($domain_db); # Destroy the object
untie(%url); # Sync and close the file and undef the hash
untie(%domain); # Sync and close the file and undef the hash
See the perltie(1) and DB_File(3) man pages that come
with Perl for more info.
13.
How can I list domains.db or urls.db from Perl?
Use a script like this:
#!/local/bin/perl -w
use strict;
use DB_File;
foreach (@ARGV) {
my (%db, $key, $val);
die("$_: $!\n") unless(-f);
tie(%db, "DB_File", $_, O_RDONLY, 0664, $DB_BTREE) || die("$_: $!\n");
foreach $key (keys(%db)) {
if($val = $db{$key}) {
$val = "\"$val\"";
} else {
$val = "undef";
}
print "$key -> $val\n";
}
untie(%db);
}
See the perltie(1) and DB_File(3) man pages that come
with Perl for more info.
14.
How can I get around "make: don't know how to make /bin/false.
Stop"?
Your system has neither lynx nor /bin/false:
if it has /usr/bin/false do:
# ln -s ../usr/bin/false /bin/.
Alternatively:
# echo exit 255 >/bin/false
# chmod a+rx /bin/false
If you have questions and/or answers that should be on the FAQ list
please send them to [17]squidguard@squidguard.org
____________________________
[18][gnu-logo.gif] [19][perl-logo.gif]
[20][squid-logo.gif] [21][squidGuard.gif]
[22][home_header.gif]
References
1. http://ftp.teledanmark.no/pub/www/proxy/squidGuard/
2. http://www.squidguard.org/
3. http://www.squid-cache.org/
4. http://www.squidguard.org/authors/
5. http://www.squidguard.org/authors/
6. http://www.squidguard.org/copyright/
7. http://www.teledanmark.no/
8. http://www.maynidea.com/squidguard/faq-plus.html
9. http://www.squidguard.org/contact/
10. http://ftp.tdcnorge.no/pub/db/
11. http://www.squidguard.org/config/#Expressionlists
12. mailto:blacklist@squidguard.org
13. http://www.squidguard.org/install/#Defaultconfigfile
14. http://www.squidguard.org/links/#Flex
15. http://ftp.tdcnorge.no/pub/www/proxy/squidGuard/contrib/squid-2.2-authuser.patch
16. mailto:antony@abacus.co.uk
17. mailto:squidguard@squidguard.org?subject=SquidGuard%20FAQ?
18. http://www.gnu.org/
19. http://www.perl.com/
20. http://www.squid-cache.org/
21. http://www.squidguard.org/
22. http://www.sleepycat.com/
squidGuard-1.5/doc/index.html 0000640 0001750 0001750 00000003324 10717346074 015212 0 ustar adjoo adjoo
Another squidguard website
Documentation
FAQs
Mirko Lorenz - mirko at shalla.de
10.12.2006
squidGuard-1.5/doc/installation.html 0000640 0001750 0001750 00000045632 10717346070 016610 0 ustar adjoo adjoo
squidGuard - Installation
This page was last modified 2002-01-08
The good news:
squidGuard uses Squid's standard redirector interface so no
patching of Squid is needed!
and the not so good news:
Currently we don't distribute precompiled versions of squidGuard.
Though following these few steps should bring you up and running
with squidGuard within a few minutes, provided you have the basic
tools:
1. Install version 2.X of the Berkeley DB library (if not already
installed on your system)
2. ./configure
3. make
4. make install
5. Create a squidGuard.conf that suits your needs
6. Create the domain, url and expression lists you want
7. Test/simulate
8. Configure squid to use squidGuard as the redirector and specify
the number of redirector processes you want
9. Send Squid a HUP signal
Voilà!
1. Besides Squid you need a basic UNIX development
environment with a make compatible
build tool, an ANSI C compiler, a yacc compatible parser generator, a
lex compatible lexical analyzer
generator and a regcomp()/regexec() compatible
regular expression library. You also need gzip to unpack the
distribution. Don't despair: if you managed to install Squid
you most likely have all this! If not, the links here point
you to all the free sources you need.
2. You need a version 2.X of the Berkeley DB library installed on your system. If
you don't already have it, download and
install the latest 2.X version. It should compile and install
right out of the box. (squidGuard is developed with Berkeley DB
version 2.x in mind, but it might work with Berkeley DB versions
1.85 and 1.86 too. If you have success linking and running with
versions 1.85 or 1.86 please report!)
Here is a quick installation guide for the Berkeley DB library:
mkdir -p /local/src (or wherever you like)
cd /local/src
gzip -dc /wherever/db-2.y.z.tar.gz | tar xvf -
cd db-2.y.z/dist
./configure (optionally add the environment and flags you prefer) *)
make
make install
make clean (optional)
*) At Tele Danmark we use:
#!/bin/sh -
cd build_unix
CC=gcc \
CXX=g++ \
CFLAGS="-O3 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64" \
CXXFLAGS="-O3 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64" \
../dist/configure \
--verbose \
--target=sparc-sun-solaris \
--enable-dynamic \
--enable-compat185 \
--enable-rpc \
--prefix=/local
By default the more recent versions of the Berkeley DB library
install themselves under
/usr/local/BerkeleyDB/{lib,include,bin,docs}
3. Download squidGuard and unpack the distribution with:
mkdir -p /local/src (or wherever you like)
cd /local/src
gzip -dc /wherever/squidGuard-x.y.z.tar.gz | tar xvf -
cd squidGuard-x.y.z
4. squidGuard now comes with GNU auto configuration for maximum
portability and easy compilation setup. For a default environment,
simply run:
./configure
If you have gcc you may want to force the use
of gcc and optimize more:
csh|tcsh# (setenv CC "gcc"; setenv CFLAGS "-O3"; ./configure)
or
sh|bash|ksh# CC="gcc" CFLAGS="-O3" ./configure
depending on your shell. This will prepare Makefiles to compile
and optionally install the squidGuard executable as
/usr/local/bin/squidGuard. If you prefer to
install squidGuard as for instance
/local/squid/bin/squidGuard, use the option:
./configure --prefix=/local/squid
To avoid the need of running
squidGuard with the command line option
"-c /wherever/filter.conf"*), you may want to
change the default to the actual location of the configuration
file at compile time by adding:
./configure --with-sg-config=/wherever/filter.conf
*) Note: squid-2.2.x up to STABLE2 are broken and ignore the
argument list silently without passing it to the
redirector. Therefore with squid-2.2.x up to STABLE2 you
must specify the correct config file location with
--with-sg-config=... at compile time. Versions up to
2.1.PATCH2 do not have this problem.
To see the full list of build configuration options run:
./configure --help
At Tele Danmark we use:
#!/bin/sh -
CC="gcc" \
CFLAGS="-O3 -Wall" \
LIBS="-R/local/lib -lnls" \
./configure \
--verbose \
--target=sparc-sun-solaris \
--prefix=/local/squid \
--with-db-lib=/local/lib \
--with-db-inc=/local/include \
--with-sg-config=/var/spool/www/hosts/proxy.teledanmark.no/filter/conf/filter.conf \
--with-sg-logdir=/var/spool/www/hosts/proxy.teledanmark.no/filter/logs \
--with-sg-dbhome=/var/spool/www/hosts/proxy.teledanmark.no/filter/db
5. Now simply run:
make
This should compile squidGuard without errors. If you compile with
gcc -Wall you may safely ignore warnings for the machine
generated code y.tab.{c,h} (from sg.y) and
lex.yy.c (from sg.l). You should probably
investigate other warnings and errors.
6. To test the newly built squidGuard run:
make test
7. If all is OK run:
make install
This will install the squidGuard executable in
prefix/bin/squidGuard where prefix
is /usr/local unless you changed it with
--prefix=/some/where/else.
8. Make a configuration file for squidGuard. Start
with a minimal configuration
and extend as your experience and needs grow.
9. Make the destination lists (databases) you
want (if any at all).
10. Test your configuration in isolation. Put some sample requests in three
files named something
like test.pass, test.rewrite and
test.block. (Omit test.rewrite if you don't have
rewrite rules.) The format of these files is:
URL ip-address/fqdn ident method
For instance:
http://freeware.teledanmark.no/squidGuard/ 10.1.2.3/pc123.teledanmark.no fdgh GET
http://bad.site.com/dirty/stuff/foo.htm 10.3.2.1/- - GET
The ip-address is mandatory; the fqdn and ident fields may be "-"
depending on how you have configured Squid with respect to reverse
DNS lookups and ident lookups. The request method is
GET, POST, etc.
Put some sample
requests that should pass transparently, be rewritten/redirected
and blocked in test.pass, test.rewrite and
test.block respectively. Now you are ready to simulate
real requests. Run the three simulations:
prefix/bin/squidGuard -c /your/squidGuard.conf < test.pass > test.pass.out
prefix/bin/squidGuard -c /your/squidGuard.conf < test.rewrite > test.rewrite.out
prefix/bin/squidGuard -c /your/squidGuard.conf < test.block > test.block.out
Check the pass output:
wc -l test.pass
wc -l test.pass.out
wc -w test.pass.out
The numerical results should be identical for the first two tests
and 0 for the last.
Check the rewrite/redirect output
(Omit if you don't have rewrite rules.):
wc -l test.rewrite
wc -l test.rewrite.out
diff test.rewrite test.rewrite.out | egrep -ic '^> ..* [0-9.]+/..* ..* [a-z]+$'
more test.rewrite.out
The numerical results should be identical for the first three
tests. Visually ensure the new URLs are as expected with the
more command.
Check the block output:
wc -l test.block
wc -l test.block.out
diff test.block test.block.out | egrep -ic '^> ..* [0-9.]+/..* ..* [a-z]+$'
more test.block.out
The numerical results should be identical for the first three
tests. Visually ensure the new URLs are as expected with the
more command.
11. Install the empty image,
stopsign image, dummy access denied page, the more or
less intelligent CGI page or whatever your redirectors point to,
on a web server that Squid can access; typically on the proxy
server or a nearby server. If you don't have a web server we
strongly recommend Apache although any
stable web server of your choice can be used.
12. Tell Squid to use squidGuard as the redirector by uncommenting and
changing the following tags in squid.conf to:
redirect_program /prefix/bin/squidGuard
or if squidGuard's config file is somewhere else than set at
compile time*):
redirect_program /prefix/bin/squidGuard -c /wherever/squidGuard.conf
where prefix is /usr/local unless you
changed it with
--prefix=/some/where/else.
*) Note: squid-2.2.x up to STABLE2 are broken and ignore
the argument list silently without passing it to the
redirector. Therefore with squid-2.2.x up to STABLE2 you
must specify the correct config file location with
--with-sg-config=... at compile time. Versions up to
2.1.PATCH2 do not have this problem.
Also configure the number of redirector processes you think you
want:
redirect_children 4
I really don't know why one should have more than one squidGuard
process on a single CPU system since squidGuard never blocks
indefinitely like the cache_dns_program and
optional authenticate_program are more likely to
do. Of course with more redirectors there is a chance a request
that matches the first client group, rule and destination group
could sneak out before a request that matches the last rule. But
on the other hand more redirectors also slows down the system by
added overhead and memory usage. Anyway 4 seems like a fine number
to start with. We haven't done any benchmarking to find the best
value and it may vary with the actual configuration.
13. Send Squid a HUP signal:
kill -HUP `cat /somewhere/squid.pid`
or
squid -k reconfigure
14. Test with a browser.
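For a quick command line check you can also send a request through the
proxy with a tool like curl (assuming Squid listens on localhost port
3128; adjust host and port to your setup):

curl -x http://localhost:3128/ http://bad.site.com/dirty/stuff/foo.htm

For a blocked URL you should get the configured block page (or a
redirect to it) instead of the original content.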
squidGuard-1.5/doc/installation.txt 0000640 0001750 0001750 00000032236 10717346070 016457 0 ustar adjoo adjoo [1][squidGuard.gif] Installing squidGuard [install.gif]
[2]squidGuard is an ultrafast and free filter, redirector and access
controller for [3]Squid
By [4]Pål Baltzersen and [5]Lars Erik Håland
[6]Copyright © 1999-2000, [7]Tele Danmark InterNordia
Visitors: [counter] (Since 2002-01-08 19:54:05)
This page was last modified 2002-01-08
[arrow-red.gif] Installation instructions
The good news:
squidGuard uses Squid's [8]standard redirector interface so no
patching of Squid is needed!
and the not so good news:
Currently we don't distribute precompiled versions of
squidGuard.
Though following these few steps should bring you up and running
with squidGuard within a few minutes, provided you have the
basic tools:
[arrow-green.gif] For the impatient/experienced:
1. Install version 2.X of the [9]Berkeley DB library (if not already
installed on your system)
2. [10]./configure
3. [11]make
4. [12]make install
5. Create a [13]squidGuard.conf that suits your needs
6. Create the [14]domain, [15]url and [16]expression lists you want
7. Test/simulate
8. [17]Configure squid to use squidGuard as the redirector and
specify the number of redirector processes you want
9. [18]Send Squid a HUP signal
[19]Voilà!
[arrow-green.gif] For the less impatient:
1. Besides [20]Squid you need a basic UNIX development environment
with a [21]make compatible build tool, an ANSI [22]C compiler, a
[23]yacc compatible parser generator, a [24]lex compatible lexical
analyzer generator and a [25]regcomp()/regexec() compatible
regular expression library. You also need [26]gzip to unpack the
distribution. Don't despair: If you managed to install Squid you
most likely have all this! If not, the links here point you to all
the free sources you need.
2. You need a version 2.X of the [27]Berkeley DB library installed on
your system. If you don't already have it, [28]download and
install the latest 2.X version. It should compile and install
right out of the box. (squidGuard is developed with Berkeley DB
version 2.x in mind, but it might work with Berkeley DB versions
1.85 and 1.86 too. If you have success linking and running with
versions 1.85 or 1.86 please [29]report!)
Here is a quick installation guide for the Berkeley DB library:
mkdir -p /local/src (or wherever you like)
cd /local/src
gzip -dc /wherever/db-2.y.z.tar.gz | tar xvf -
cd db-2.y.z/dist
./configure (optionally add the environment and flags you
prefer) *)
make
make install
make clean (optional)
*) At [30]Tele Danmark we use:
#!/bin/sh -
cd build_unix
CC=gcc \
CXX=g++ \
CFLAGS="-O3 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64" \
CXXFLAGS="-O3 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64" \
../dist/configure \
--verbose \
--target=sparc-sun-solaris \
--enable-dynamic \
--enable-compat185 \
--enable-rpc \
--prefix=/local
By default the more recent versions of the Berkeley DB library
install themselves under /usr/local/BerkeleyDB/{lib,include,bin,docs}
3. [31]Download squidGuard and unpack the distribution with:
mkdir -p /local/src (or wherever you like)
cd /local/src
gzip -dc /wherever/squidGuard-x.y.z.tar.gz | tar xvf -
cd squidGuard-x.y.z
4. squidGuard now comes with [32]GNU auto configuration for maximum
portability and easy compilation setup. For a default environment,
simply run:
./configure
If you have [33]gcc you may want to force the use of gcc and
optimize more:
csh|tcsh# (setenv CC "gcc"; setenv CFLAGS "-O3";
./configure)
or
sh|bash|ksh# CC="gcc" CFLAGS="-O3" ./configure
depending on your shell. This will prepare Makefiles to compile
and optionally install the squidGuard executable as
/usr/local/bin/squidGuard. If you prefer to install squidGuard as
for instance /local/squid/bin/squidGuard, use the option:
./configure --prefix=/local/squid
To avoid the need of running squidGuard with the command line
option "-c /wherever/filter.conf"*), you may want to change the
default to the actual location of the configuration file at
compile time by adding:
./configure --with-sg-config=/wherever/filter.conf
*) Note: squid-2.2.x up to STABLE2 are broken and ignore the
argument list silently without passing it to the redirector.
Therefore with squid-2.2.x up to STABLE2 you must specify the
correct config file location with --with-sg-config=... at compile
time. Versions up to 2.1.PATCH2 do not have this problem.
To see the full list of build configuration options run:
./configure --help
At [34]Tele Danmark we use:
#!/bin/sh -
CC="gcc" \
CFLAGS="-O3 -Wall" \
LIBS="-R/local/lib -lnls" \
./configure \
--verbose \
--target=sparc-sun-solaris \
--prefix=/local/squid \
--with-db-lib=/local/lib \
--with-db-inc=/local/include \
--with-sg-config=/var/spool/www/hosts/proxy.teledanmark.no/filter/conf/filter.conf \
--with-sg-logdir=/var/spool/www/hosts/proxy.teledanmark.no/filter/logs \
--with-sg-dbhome=/var/spool/www/hosts/proxy.teledanmark.no/filter/db
5. Now simply run:
make
This should compile squidGuard without errors. If you compile with
gcc -Wall you may safely ignore warnings for the machine generated
code y.tab.{c,h} (from sg.y) and lex.yy.c (from sg.l). You should
probably investigate other warnings and errors.
6. To test the newly built squidGuard run:
make test
7. If all is OK run:
make install
This will install the squidGuard executable in
prefix/bin/squidGuard where prefix is /usr/local unless you
changed it with --prefix=/some/where/else.
8. Make a [35]configuration file for squidGuard. Start with a
[36]minimal configuration and extend as your experience and needs
grow.
9. Make the [37]destination lists (databases) you want (if any at
all).
10. Test your configuration in isolation. Put some sample requests in
three files named something like test.pass, test.rewrite and
test.block. (Omit test.rewrite if you don't have rewrite rules.)
The format of these files is:
URL ip-address/fqdn ident method
For instance:
http://freeware.teledanmark.no/squidGuard/ 10.1.2.3/pc123.teledanmark.no fdgh GET
http://bad.site.com/dirty/stuff/foo.htm 10.3.2.1/- - GET
The ip-address is mandatory; the fqdn and ident fields may be "-"
depending on how you have configured Squid with respect to reverse
DNS lookups and ident lookups. The request method is GET, POST,
etc.
Put some sample requests that should pass transparently, be
rewritten/redirected and blocked in test.pass, test.rewrite and
test.block respectively. Now you are ready to simulate real
requests. Run the three simulations:
prefix/bin/squidGuard -c /your/squidGuard.conf < test.pass > test.pass.out
prefix/bin/squidGuard -c /your/squidGuard.conf < test.rewrite > test.rewrite.out
prefix/bin/squidGuard -c /your/squidGuard.conf < test.block > test.block.out
Check the pass output:
wc -l test.pass
wc -l test.pass.out
wc -w test.pass.out
The numerical results should be identical for the first two tests
and 0 for the last.
Check the rewrite/redirect output (Omit if you don't have rewrite
rules.):
wc -l test.rewrite
wc -l test.rewrite.out
diff test.rewrite test.rewrite.out | egrep -ic '^> ..* [0-9.]+/..* ..* [a-z]+$'
more test.rewrite.out
The numerical results should be identical for the first three
tests. Visually ensure the new URLs are as expected with the more
command.
Check the block output:
wc -l test.block
wc -l test.block.out
diff test.block test.block.out | egrep -ic '^> ..* [0-9.]+/..* ..* [a-z]+$'
more test.block.out
The numerical results should be identical for the first three
tests. Visually ensure the new URLs are as expected with the more
command.
11. Install the empty image, stopsign image, dummy access denied page,
the more or less intelligent CGI page or whatever your redirectors
point to, on a web server that Squid can access; typically on the
proxy server or a nearby server. If you don't have a web server we
strongly recommend [38]Apache although any stable web server of
your choice can be used.
12. Tell Squid to use squidGuard as the redirector by uncommenting and
changing the following tags in squid.conf to:
redirect_program /prefix/bin/squidGuard
or if squidGuard's config file is somewhere else than set at
compile time*):
redirect_program /prefix/bin/squidGuard -c /wherever/squidGuard.conf
where prefix is /usr/local unless you changed it with
--prefix=/some/where/else.
*) Note: squid-2.2.x up to STABLE2 are broken and ignore the
argument list silently without passing it to the redirector.
Therefore with squid-2.2.x up to STABLE2 you must specify the
correct config file location with --with-sg-config=... at
[39]compile time. Versions up to 2.1.PATCH2 do not have this
problem.
Also configure the number of redirector processes you think you
want:
redirect_children 4
I really don't know why one should have more than one squidGuard
process on a single CPU system since squidGuard never blocks
indefinitely like the cache_dns_program and optional
authenticate_program are more likely to do. Of course with more
redirectors there is a chance a request that matches the first
client group, rule and destination group could sneak out before a
request that matches the last rule. But on the other hand more
redirectors also slows down the system by added overhead and
memory usage. Anyway 4 seems like a fine number to start with. We
haven't done any benchmarking to find the best value and it may
vary with the actual configuration.
13. Send Squid a HUP signal:
kill -HUP `cat /somewhere/squid.pid`
or
squid -k reconfigure
14. Test with a browser.
____________________________
[40][gnu-logo.gif] [41][perl-logo.gif]
[42][squid-logo.gif] [43][squidGuard.gif]
[44][home_header.gif]
References
1. http://ftp.teledanmark.no/pub/www/proxy/squidGuard/
2. http://www.squidguard.org/
3. http://www.squid-cache.org/
4. http://www.squidguard.org/authors/
5. http://www.squidguard.org/authors/
6. http://www.squidguard.org/copyright/
7. http://www.teledanmark.no/
8. http://www.squid-cache.org/Versions/1.1/Release-Notes-1.1.txt
9. http://www.squidguard.org/install/#Detailed_install_2
10. http://www.squidguard.org/install/#Detailed_install_4
11. http://www.squidguard.org/install/#Detailed_install_5
12. http://www.squidguard.org/install/#Detailed_install_7
13. http://www.squidguard.org/config/
14. http://www.squidguard.org/config/#Domainlists
15. http://www.squidguard.org/config/#URLlists
16. http://www.squidguard.org/config/#Expressionlists
17. http://www.squidguard.org/install/#Detailed_install_12
18. http://www.squidguard.org/install/#Detailed_install_13
19. http://www.squidguard.org/install/#Detailed_install_14
20. http://www.squidguard.org/links/#Squid
21. http://www.squidguard.org/links/#Gmake
22. http://www.squidguard.org/links/#Gcc
23. http://www.squidguard.org/links/#Bison
24. http://www.squidguard.org/links/#Flex
25. http://www.squidguard.org/links/#Regex
26. http://www.squidguard.org/links/#Gzip
27. http://www.squidguard.org/links/#DB
28. http://www.squidguard.org/links/#DB
29. mailto:squidguard@squidguard.org
30. http://www.teledanmark.no/
31. http://www.squidguard.org/download/
32. http://www.gnu.org/software/autoconf/
33. http://www.squidguard.org/links/#Gcc
34. http://www.teledanmark.no/
35. http://www.squidguard.org/config/
36. http://www.squidguard.org/config/#Minimal
37. http://www.squidguard.org/config/#Lists
38. http://www.squidguard.org/links/#Apache
39. http://www.squidguard.org/install/#Defaultconfigfile
40. http://www.gnu.org/
41. http://www.perl.com/
42. http://www.squid-cache.org/
43. http://www.squidguard.org/
44. http://www.sleepycat.com/
squidGuard-1.5/doc/squidGuard.gif 0000640 0001750 0001750 00000002704 10717346070 016011 0 ustar adjoo adjoo [binary GIF image data omitted]