# TODO: should this be made a top-level script, called "sa-awl"?
usage: check_whitelist [--clean] [--min n] [dbfile]
$opt_clean $opt_min $opt_help
'clean' => \$opt_clean,
$opt_help and usage();
BEGIN { @AnyDBM_File::ISA = qw(DB_File GDBM_File NDBM_File SDBM_File); }
$db = $ENV{HOME}."/.spamassassin/auto-whitelist";
tie %h, "AnyDBM_File", $db, O_RDWR, 0600
    or die "Cannot open r/w file $db: $!\n";
tie %h, "AnyDBM_File", $db, O_RDONLY, 0600
    or die "Cannot open file $db: $!\n";
my @k = grep(!/totscore$/, keys(%h));
my $totscore = $h{"$key|totscore"};
next unless defined($totscore);
if ($count >= $opt_min) { next; }
printf "% 8.1f %15s -- %s\n",
    $totscore/$count, (sprintf "(%.1f/%d)", $totscore, $count),
delete $h{"$key|totscore"};
check_whitelist - examine and manipulate SpamAssassin's auto-whitelist db
B<check_whitelist> [--clean] [--min n] [dbfile]
Check or clean a SpamAssassin auto-whitelist (AWL) database file.
The name of the file is specified after any options, as C<dbfile>.
The default is C<$HOME/.spamassassin/auto-whitelist>.
Clean out infrequently-used AWL entries. The C<--min> switch can be
used to select the threshold at which entries are kept or deleted.
Select the threshold at which entries are kept or deleted when C<--clean> is
used. The default is C<2>, so entries that have only been seen once are
deleted.
The output looks like this:
     AVG (TOTSCORE/COUNT) -- EMAIL|ip=IPBASE
     0.0         (0.0/7) -- dawson@example.com|ip=208.192
    21.8        (43.7/2) -- mcdaniel_2s2000@example.com|ip=200.106
C<AVG> is the average score; C<TOTSCORE> is the total score of all mails seen
so far; C<COUNT> is the number of messages seen from that sender; C<EMAIL> is
the sender's email address, and C<IPBASE> is the B<AWL base IP address>.
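As a rough illustration of how these fields relate, the sketch below formats one entry with the same C<printf> template the script uses and parses such a line back into its parts. The helper names C<format_entry> and C<parse_entry> are hypothetical and not part of check_whitelist; the regular expression assumes the line layout shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: lay out one AWL entry like the report line.
# $key is the database key, e.g. "user@example.com|ip=208.192".
sub format_entry {
    my ($key, $totscore, $count) = @_;
    return sprintf "% 8.1f %15s -- %s",
        $totscore / $count,
        sprintf("(%.1f/%d)", $totscore, $count),
        $key;
}

# Hypothetical helper: split a report line back into its fields.
# Returns a hash reference, or nothing if the line does not match.
sub parse_entry {
    my ($line) = @_;
    my ($avg, $tot, $count, $email, $ipbase) =
        $line =~ m{^\s*(-?[\d.]+)\s+\((-?[\d.]+)/(\d+)\)\s+--\s+(.*)\|ip=(\S+)\s*$}
        or return;
    return {
        avg      => $avg,
        totscore => $tot,
        count    => $count,
        email    => $email,
        ipbase   => $ipbase,
    };
}

# Usage: round-trip one entry.
my $line  = format_entry('dawson@example.com|ip=208.192', 0.0, 7);
my $entry = parse_entry($line);
print "$entry->{email} seen $entry->{count} times\n" if $entry;
```

AVG is simply TOTSCORE divided by COUNT, so a consumer of this output could recompute it from the parenthesised pair rather than trusting the rounded first column.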
The B<AWL base IP address> identifies, approximately, the IP address a sender
usually sends from, while remaining hard for spammers to spoof. The
algorithm is as follows:
  - take the last Received header that contains a public IP address, that is,
    one which is not in private, unrouted IP space.
  - chop off the last two octets, assuming that the user may be in an ISP's
    dynamic address pool.
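The two steps above can be sketched as follows. This is a minimal illustration, not SpamAssassin's implementation: the helper names C<is_private_ip> and C<awl_base_ip> are hypothetical, the list of private ranges is an assumed, non-exhaustive IPv4 subset, and the input is assumed to be the IP addresses already extracted from the Received headers, in header order.

```perl
use strict;
use warnings;

# Assumption: a small, non-exhaustive set of private/unrouted IPv4 ranges
# (10/8, 127/8, 192.168/16, 169.254/16, 172.16/12).
sub is_private_ip {
    my ($ip) = @_;
    return $ip =~ /^(?:10\.|127\.|192\.168\.|169\.254\.|172\.(?:1[6-9]|2\d|3[01])\.)/
        ? 1 : 0;
}

# Hypothetical helper: given the IPs found in the Received headers,
# pick the last public one and keep only its first two octets.
sub awl_base_ip {
    my (@ips) = @_;
    my ($public) = grep { !is_private_ip($_) } reverse @ips;
    return unless defined $public;          # no public address found
    my @octets = split /\./, $public;
    return join '.', @octets[0, 1];
}

# Usage: the private relay address is skipped, the public one is truncated.
my $base = awl_base_ip('208.192.33.7', '10.0.0.5');
print "base: $base\n" if defined $base;
```

Dropping the last two octets means two messages from different addresses in the same dynamic pool map to the same key, which is the approximation the text describes.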