April 14, 2004
Can fuzzy logic beat real logic?
Whatever happened to the idea of Bayesian spam filtering? I mean, any idiot these days can tell a SPAM email from a real email, so how on earth do these stupid things keep getting through? I am not talking about covert advertising cleverly disguised as real text; that may take time. But the vast majority of SPAM is blatant. I don't like insulting my friends by requiring them to fill in a stupid blank in order to send me an email, nor do I wish to install some sucky-ass software which generates its own SPAM by sending out ads to anyone who gets email from me. I wonder why the Bayesian software hasn't been developed more fully. Might it just put an end to SPAM as we know it?

It might, but on the other hand it might not! This post (which begins with an "intelligent" SPAM email) demonstrates the problems inherent in the Bayesian approach:

"[M]arking such emails as spam will increase the probability of false positives in the future. If you receive a lot of these mails, certain rare words will be associated very highly with spam by your filter. Then, when you get an innocent-seeming email from a friend that happens to contain the words “schizophrenic pompous playwright”, that will be enough to get it black-holed."

Well, yes, maybe it is. (Another reason I have repeatedly recommended crucifixion....)

As soon as I get home I am going to get off my butt and configure Netscape's apparently Bayesian spam filter. Had I not written this post, I don't think I would have even known I had it! Netscape Mail has saved my ass over the years from the numerous viruses that are written to target IE; I just wish the Netscape browser could be made to work better.

What just sticks in my craw is how easy it is for humans to spot SPAM, yet how difficult it is for computers. Fuzzy logic somehow has to supply the answer. Meanwhile, the spammers have nothing but time and fanatic devotion to their silly games of annoyance.
Along similar lines, I often wonder whether anti-virus companies hire virus writers. (I can't think of a more intriguing conflict of interest than the sort of moonlighting which might be officially unapproved, yet guarantee promotion -- for if there are no new viruses you can't sell software! Money plus conflicts of interest means mutually escalated non-destruction -- if that's not too fuzzy.)

posted by Eric on 04.14.04 at 05:45 PM
Comments
I second that. SpamAssassin is excellent. Out of the 300-400 spams I get a week, only one or two will slip through per month. Of course, I also have a number of other anti-spam measures in place. I use mailinator.com for throw-away addresses, configure procmail to automatically pass certain messages before they even hit SpamAssassin, have one long-standing email address that I use entirely as a spam trap, have my own domain, and run an exim (mail server) configuration that is very aggressively anti-spam. Still, the vast bulk of the work is done by SpamAssassin and it does it very well.

mallarme · April 15, 2004 01:16 PM

Popfile works for me. I maybe get 1 or 2 a month seeping through while 150 a day get consigned to e-mail hell.

Bill Peschel · April 16, 2004 12:57 AM

SpamSieve is 99.99% on a Macintosh running OSX. I'm averaging just under 250 pieces per day. I'd rate it 100% effective but no one believes in perfection these days. ;)

BTW, I find some of your comments where you beat up on Glenn Reynolds humorous, others mundane. Most of them fall short of providing useful information, but that's why different views can be useful. I hope to return and find something in that category. I suspect he's not 100% accurate either. :)

Steve · April 16, 2004 06:44 AM
For what it's worth, Bayesian filtering seems to be thriving at the moment: there are a number of programs under active development, with good results now and a lot of interesting research to improve matters. I prefer the SpamAssassin approach of using Bayesian filtering as one component in a larger system. It's much harder to get past all of those different tests, particularly since the obfuscation tactics which fool one check are often red flags on another. This also allows things like automatically training the Bayesian classifier on words which appeared in blacklisted spam, which is essential when the same thing starts arriving from a non-blacklisted source.
The bottom line is that with 200+ inbound spams per day filtered through SpamAssassin with a trained Bayesian classifier I will actually see 1-2 messages in a bad week.
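
To make the rare-word problem quoted in the post concrete, here is a minimal sketch in Python of a naive Bayes token scorer. It is not how SpamAssassin, Popfile, SpamSieve, or Netscape Mail actually implement their classifiers. The class name TinyBayes, the six training messages, the add-one smoothing, and the equal spam/ham prior are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude whitespace tokenizer; real filters do far more (assumption).
    return text.lower().split()


class TinyBayes:
    """Hypothetical toy classifier: per-token spam odds, naively multiplied."""

    def __init__(self):
        self.spam_counts = Counter()  # messages containing token, spam side
        self.ham_counts = Counter()   # messages containing token, ham side
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_spam_prob(self, token):
        # Add-one smoothing keeps the estimate strictly between 0 and 1.
        p_in_spam = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        p_in_ham = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        return p_in_spam / (p_in_spam + p_in_ham)

    def spam_prob(self, text):
        # Sum log-odds over distinct tokens, assuming equal priors.
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_spam_prob(token)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
# Invented training data: the "intelligent" spam reuses the same rare words.
for msg in ["schizophrenic pompous playwright buy pills now",
            "pompous playwright cheap pills schizophrenic offer",
            "schizophrenic playwright pompous discount pills"]:
    bayes.train(msg, is_spam=True)
for msg in ["lunch tomorrow at noon",
            "see you at the meeting tomorrow",
            "notes from the meeting attached"]:
    bayes.train(msg, is_spam=False)

# A friend's innocent message that happens to contain the rare words.
friend_score = bayes.spam_prob(
    "saw a play by a schizophrenic pompous playwright lunch tomorrow")
# An ordinary message with no rare words.
plain_score = bayes.spam_prob("lunch tomorrow at noon")

print(round(friend_score, 2), round(plain_score, 2))  # about 0.91 and 0.03
```

With this toy data the three rare words outweigh the friendly words "lunch" and "tomorrow", so the friend's message scores above 0.9. That is the false-positive risk the quoted post describes, and it is one reason to treat the Bayesian score as a single input among several tests rather than the final verdict.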