White and Black List Spam Filtering
Copyright 2007 by Morris Rosenthal - All Rights Reserved
Starting and Running Your Own PC Business
Blocking Junk E-mail

I've had a website presence since 1995, and accepting questions from strangers by e-mail has always been part of the equation for me. Unfortunately, the volume of junk mail has grown dramatically of late, to the point where I was getting several hundred unwanted solicitations a day. Scanning the subject lines to weed out the spam probably never cost me more than 15 minutes a day, but it did lead me to accidentally delete legitimate e-mails whose subjects contained a poor choice of words. Besides, reading through all those subject lines was never any fun, just a tax on my time. I finally implemented a White and Black list, and after a little tweaking, I appear to be receiving all of the e-mail I want while eliminating 99% of the spam.

My web host runs the Merak mail server, so I'll use its filter settings for these examples, but most mail servers support White and Black filters, as do many modern e-mail clients, such as Outlook 2003. In the Merak filter file, a line starting with "1:" is a White list entry and a line starting with "0:" is a Black list entry. Lines with no number at the start are treated as Black list entries by default.

I'll break this discussion into two logical parts, White lists and Black lists, and I'll start with the White list, because if I were a mail server program, I'd process the White list first and not bother with the Black list once a message passes. In fact, I put the White list entries at the beginning of the filter file, and that ordering may actually be required.

I wrote this article because, despite searching around with Google, I couldn't find any practical instructions on implementing White and Black filters. By the way, the default setting for my Merak mail filter is to send a "rejected" message to anybody whose e-mail is rejected, and I left it that way.

White Lists

The purpose of a White list is to specify elements whose inclusion in an e-mail guarantees it will pass the filter and be delivered.
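To make the line format described above concrete, here is a minimal Python sketch of how a filter line could be split into its parts: the optional "1:" or "0:" prefix, a one-letter field code, the "~", and the semicolon-separated strings. The `Rule` class, `parse_rule`, and `split_lists` are hypothetical names, and the parsing rules are my reading of the examples in this article, not Merak's documented grammar or code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical model of one Merak-style filter line. The parsing rules are
# inferred from this article's examples, not taken from Merak itself.
@dataclass
class Rule:
    is_white: bool         # True for "1:" lines; "0:" and unnumbered lines are Black
    code: str              # one-letter field code, e.g. "S" (sender), "B" (body)
    header: Optional[str]  # header name for "H" rules, e.g. "Subject"
    terms: List[str]       # the semicolon-separated strings after "~"


def parse_rule(line: str) -> Rule:
    line = line.strip()
    is_white = False  # unnumbered lines default to the Black list
    if line[:2] in ("1:", "0:"):
        is_white = line[0] == "1"
        line = line[2:]
    code, _, rest = line.partition("~")
    header = None
    if code == "H":
        # "H~Subject:1967" -> header "Subject", terms ["1967"]
        header, _, rest = rest.partition(":")
    terms = [t.strip() for t in rest.split(";") if t.strip()]
    return Rule(is_white, code, header, terms)


def split_lists(text: str) -> Tuple[List[Rule], List[Rule]]:
    """Return (white, black) rules; the White list is meant to be checked first."""
    rules = [parse_rule(ln) for ln in text.splitlines() if ln.strip()]
    white = [r for r in rules if r.is_white]
    black = [r for r in rules if not r.is_white]
    return white, black
```

For example, feeding `split_lists` the three lines `1:S~amazon;paypal`, `1:H~Subject:1967`, and `A~.exe;.cmd;.scr;.com` yields two White rules and one Black rule. Keeping the two lists separate mirrors the idea of checking every White rule before any Black rule.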
You can White list any element of an e-mail you want, from the domain name of the sender to a word, or even a piece of a word, in the e-mail body. An obvious example of a White list element is the domain name, or a character string from the domain name, of businesses you receive regular correspondence from; we'll use Amazon and PayPal in this example. Here, "S" signifies the "Sender."

1:S~amazon;paypal

or

1:S~azon;aypa

Since "~" denotes "character string containing the following," both of these lines will pass e-mails sent from Amazon or PayPal. The semicolons separate the list items. Yes, this will also pass e-mails from spoofers pretending to be Amazon or PayPal, but both of these companies are very proactive in going after spoofers and prosecuting them.

Rather than matching just the sender, if you wanted to receive e-mails that contain "Amazon" anywhere in the e-mail header, you could include

1:Y~amazon

since "Y" signifies "Any Header."

If you want to White list e-mails from a discussion list you participate in, you can either add the domain name they are sent from to the "Sender" White list, or White list the name of the list as it appears in the subject line, as in:

1:H~Subject: mygroup;digest

This filter passes any e-mail with "mygroup" in the subject line, where "mygroup" is the name of the group you subscribe to. Including "digest" in the White list means any e-mail with "digest" in the subject line will also be passed through, which works fine if you're on several list digests and you don't get spam with "digest" in the subject.

Another way to use a White list on the Subject line is to create a pass code that ensures delivery of e-mail from anybody who uses it, and then post the pass code on the contact page of your website, as I did on mine.
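The "~" matching in the White list examples above can be sketched in Python as follows. This assumes "~" is a case-insensitive "contains" test and hard-codes this section's example lines as tuples; `WHITE`, `contains_any`, and `passes_white_list` are hypothetical names for illustration, not part of Merak.

```python
from typing import Dict, List, Optional, Tuple

# This section's White list lines as (field code, header name, terms).
# Assumption: "~" means a case-insensitive substring ("contains") test.
WHITE: List[Tuple[str, Optional[str], List[str]]] = [
    ("S", None, ["amazon", "paypal"]),        # 1:S~amazon;paypal
    ("Y", None, ["amazon"]),                  # 1:Y~amazon
    ("H", "Subject", ["mygroup", "digest"]),  # 1:H~Subject: mygroup;digest
]


def contains_any(value: str, terms: List[str]) -> bool:
    value = value.lower()
    return any(term.lower() in value for term in terms)


def passes_white_list(sender: str, headers: Dict[str, str]) -> bool:
    for code, header, terms in WHITE:
        # "S": look only at the sender address.
        if code == "S" and contains_any(sender, terms):
            return True
        # "Y": look at every header line, name and value together.
        if code == "Y" and any(contains_any(f"{k}: {v}", terms) for k, v in headers.items()):
            return True
        # "H": look only at the named header.
        if code == "H" and contains_any(headers.get(header, ""), terms):
            return True
    return False
```

Because the match is a plain substring test, `passes_white_list("amazon-billing@spoof.example", {})` also returns `True`, which is exactly the spoofing trade-off mentioned above.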
My current pass code is the number 1967 (I used to own a 1967 Mustang), which is implemented as:

1:H~Subject:1967

Black Lists

Now that you've seen how to make sure certain e-mails get through, it's time to start putting up barriers. One easy way to raise your level of virus protection is to eliminate any e-mail from people not on your White list that includes an executable attachment ("A" signifies the attachment):

A~.exe;.cmd;.scr;.com

Note that we didn't bother with a "0:", since in Merak an unnumbered line is a Black list entry by default. If you don't want attachments from strangers at all, you could lengthen the list with more common file types, such as:

A~.exe;.cmd;.scr;.com;gif;jpg;avi;doc;txt;ppt;pdf

This is helpful if your main concern is avoiding PowerPoint presentations or Acrobat data sheets from salespeople.

Y~base64

This will stop e-mails with "base64" anywhere in the header. It blocks the majority of those foreign e-mails that look like one big encrypted block of who knows what, which are usually base64 encoded.

You could use the Sender field to Black list a particular individual or domain that is harassing you, but the real fun is in the Subject line and Body filters. For example:

H~Subject:pharmacy;debt;mortgage;afil;viagra;cheap;cialis;canada

These are some pretty obvious words and strings that signal spam for the great majority of us. If you frequently get spam with your own e-mail address in the subject line, you can add that as well.

The real filtering goes on in the Body ("B") of the e-mail. I block all sorts of sexually explicit words that I won't list here, and my friends will have to forgive me if their e-mails bounce because I haven't White listed them. I'm including a few drugs in the Body Black list by way of example, but the more interesting entries are at the end of the line:

B~ viagra;valium;html;src=;php;unsubscribe;ads.msn;rd.yahoo;confidential

First, "html."
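The order of operations described so far, with the White list checked first and the Black list consulted only if nothing passed, can be sketched like this. The lists are simplified copies of this article's example lines; the function and constant names are hypothetical, and "~" is again assumed to be a case-insensitive substring test. This is an illustration of the logic, not how Merak is implemented.

```python
from typing import Dict, List

# Simplified copies of this article's example lines (hypothetical names).
WHITE_SENDERS = ["amazon", "paypal"]                     # 1:S~amazon;paypal
PASS_CODE = "1967"                                       # 1:H~Subject:1967
BLACK_ATTACHMENTS = [".exe", ".cmd", ".scr", ".com"]     # A~.exe;.cmd;.scr;.com
BLACK_ANY_HEADER = ["base64"]                            # Y~base64
BLACK_SUBJECT = ["pharmacy", "debt", "mortgage", "afil",
                 "viagra", "cheap", "cialis", "canada"]  # H~Subject:...


def contains_any(value: str, terms: List[str]) -> bool:
    # Assumed "~" semantics: case-insensitive substring match.
    value = value.lower()
    return any(term.lower() in value for term in terms)


def deliver(sender: str, headers: Dict[str, str], attachments: List[str]) -> bool:
    subject = headers.get("Subject", "")
    # White list first: a match delivers without consulting the Black list.
    if contains_any(sender, WHITE_SENDERS) or PASS_CODE in subject:
        return True
    if any(contains_any(name, BLACK_ATTACHMENTS) for name in attachments):
        return False
    if any(contains_any(f"{k}: {v}", BLACK_ANY_HEADER) for k, v in headers.items()):
        return False
    if contains_any(subject, BLACK_SUBJECT):
        return False
    return True  # matched neither list, so it is delivered
```

With this ordering, an `.exe` from a stranger is rejected, while the same attachment from a White-listed sender, or from anyone who puts the pass code in the subject, is delivered.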
I hate getting HTML e-mail from people: I don't use an HTML-capable e-mail client, and I don't want to read fancy formatted text in any case. A huge amount of the spam out there is formatted in HTML to make it pretty for the suckers, so I block it all. "src=" is typical of spam that pulls in a picture from another site. While blocking "html" should already catch it, I left it in so that if you want to receive HTML but not remote images, blocking just "src=" should work. I block any e-mail that includes a link to a "php" web page. These dynamically generated pages are rarely going to be part of a legitimate e-mail, unless you have a friend who's always sending you links to newspaper articles, and who wants those anyway?

"unsubscribe" is a counterintuitive word to block, but I've already White listed the legitimate newsletters I subscribe to, and I'd rather stop the others before they get in the front door. Besides, spammers have used fake "unsubscribe" links for years to identify the naive types who actually read the whole spam and then click on them. Many spams come in with links including "ads.msn" or "rd.yahoo", so I block those too. "confidential", along with "congratulations" and "winner", are also words you might want to block if you have a good White list, along with "sec" and "symbol" if you get a lot of stock market spam.

That's it for now, but I'll revisit this page with any future tweaks I make.
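The reasoning behind blocking "unsubscribe" depends on the White list running first. This sketch shows that interaction for the Body Black list line above. `WHITE_SENDERS` holds a hypothetical newsletter sender, the function names are illustrative, and "~" is assumed to be a case-insensitive substring test.

```python
from typing import List

# Assumption: "~" is a case-insensitive substring test; lists are examples only.
WHITE_SENDERS = ["mygroup"]  # hypothetical White-listed newsletter sender
BODY_BLACK = ["viagra", "valium", "html", "src=", "php", "unsubscribe",
              "ads.msn", "rd.yahoo", "confidential"]  # the B~ line above


def contains_any(value: str, terms: List[str]) -> bool:
    value = value.lower()
    return any(term.lower() in value for term in terms)


def deliver(sender: str, body: str) -> bool:
    # White-listed newsletters skip the Body Black list entirely,
    # which is what makes blocking "unsubscribe" safe for them.
    if contains_any(sender, WHITE_SENDERS):
        return True
    return not contains_any(body, BODY_BLACK)
```

The same substring test also explains the cost mentioned above: a stranger who writes "I saved the page as an HTML file" is rejected, because "html" appears in the body and the sender is not on the White list.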