Paul Wilders
February 27th, 2002, 10:18 AM
{QUOTE-> For just about as long as the commercial Internet has existed, SPAM email has been the bane of users worldwide. The harder we try to fight the spammers and keep our email addresses out of their hands, the smarter they get and the harder they fight back. One example of people's attempts to fight back is the large number of joe@NOSPAM.email.com, NO.mary.SPAM@REMOVESPAM.mary.com, and similar email addresses you find on Usenet and in web-based communities these days. Worse yet, many people hold back from contributing to online discussions for fear that their email address will be available for evil web spiders (I call them Spiderts: web spiders with a Catbert-type personality) to harvest and exploit from mailing list archives.
As one who runs (and uses!) evolt's mailing lists, keeping thousands of people's email addresses out of the tentacles of Spiderts has always been a big concern of mine. At first, it was easily remedied by using the %40 'trick'. Instead of writing archives with an easily recognizable email address (abuse@aol.com, for example), I had our mailing list software write all email addresses as abuse%40aol.com.
This still allowed for a fairly easy-to-read address for humans while maintaining the ability to click the mailto: link and have one's associated email client create a new message with the correct email address entered. The Spiderts wouldn't recognize abuse%40aol.com as a valid email address and therefore wouldn't harvest it.
This was a fairly good solution until its use became widespread, at which point the creators of the Spiderts tweaked their unholy creations to recognize abuse%40aol.com as a harvestable email address and siphon it as well. As if it couldn't get worse, it was also becoming apparent that the newer generations of Spiderts don't play by the rules set out for web spiders, and would disregard any "Disallow: /" entries in the robots.txt file. In fact, I've seen Spiderts that only go for what we specifically tell them not to! What's a webmaster to do?!? <-QUOTE}
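To make the %40 'trick' from the quote concrete, here is a rough Python sketch of why it worked at first and why it stopped working. It is not the code evolt's mailing list software used; the function names (obfuscate, naive_harvest, smarter_harvest) and the simplified address pattern are my own assumptions, for illustration only.

```python
import re
from typing import List
from urllib.parse import unquote

# Simplified address pattern (an assumption; real harvesters vary).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def obfuscate(address: str) -> str:
    """Write the '@' as its percent-encoded form, e.g. abuse%40aol.com."""
    return address.replace("@", "%40")


def naive_harvest(page: str) -> List[str]:
    """A harvester that only looks for a literal '@' in the raw page."""
    return EMAIL_RE.findall(page)


def smarter_harvest(page: str) -> List[str]:
    """A harvester that percent-decodes the page before searching."""
    return EMAIL_RE.findall(unquote(page))


if __name__ == "__main__":
    hidden = obfuscate("abuse@aol.com")
    page = '<a href="mailto:{0}">{0}</a>'.format(hidden)
    print(naive_harvest(page))    # finds nothing: there is no literal '@'
    print(smarter_harvest(page))  # finds the address in both places
```

The point is simply that the encoding is trivially reversible: the same decoding step that lets a mail client open the mailto: link also lets a harvester recover the address once its author knows to look for it.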
Read the full story here:
http://evolt.org/article/Using_Apache_to_stop_bad_robots/18/15126/index.html