View Full Version : Using Apache to stop bad robots


Paul Wilders
February 27th, 2002, 10:18 AM
-{ Quote: "For just about as long as the commercial Internet has existed, SPAM email has been the bane of users worldwide. The harder and harder we try to fight the spammers and keep our email addresses out of their hands, the smarter they get and the harder they fight back. One example of people's attempts to fight back is the large numbers of joe@NOSPAM.email.com, NO.mary.SPAM@REMOVESPAM.mary.com, etc. email addresses you find on Usenet and web-based communities these days. Worse yet, many people hold back from contributing to online discussions for fear their email address will be available for evil web spiders (I call them Spiderts - A web spider with a Catbert type personality) to harvest and exploit from mailing list archives.

As one who runs (and uses!) evolt's mailing lists, keeping thousands of people's email addresses out of the tentacles of Spiderts has always been a big concern of mine. At first, it was easily remedied by using the %40 'trick'. Instead of writing archives with an easily recognizable email address (abuse@aol.com for example), I had our mailing list software write all email addresses as abuse%40aol.com.

This still allowed for a fairly easy to read address for humans while maintaining the ability to click the mailto: link and have one's associated email client create a new message with the correct email address entered. The Spiderts wouldn't recognize abuse%40aol.com as a valid email address and therefore would not harvest it.

This was a fairly good solution until its use became widespread, at which point the creators of the Spiderts tweaked their unholy creations to recognize abuse%40aol.com as a harvestable email address and siphon it as well. As if it couldn't get worse, it was also becoming apparent that the newer generations of Spiderts don't play by the rules set out for web spiders, and would disregard any "Disallow: /" entries in the robots.txt file. In fact, I've seen Spiderts that only go for what we specifically tell them not to! What's a webmaster to do?!?" }-
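The %40 trick described in the quote can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the helper name obfuscateAddress and both harvester patterns are hypothetical, not taken from the article or from any real spider. It shows why a pattern that looks for a literal "@" misses the encoded form, and why a slightly smarter pattern does not.

```javascript
// Hypothetical helper (not from the article): write "@" as "%40",
// its percent-encoded form, so the address no longer contains a literal "@".
function obfuscateAddress(address) {
  return address.replace(/@/g, "%40");
}

// Assumed stand-in for an early harvester: it only looks for a literal "@".
const naiveHarvester = /[\w.+-]+@[\w-]+(\.[\w-]+)+/;

// Assumed stand-in for a later harvester that also accepts "%40".
const smarterHarvester = /[\w.+-]+(@|%40)[\w-]+(\.[\w-]+)+/i;

const hidden = obfuscateAddress("abuse@aol.com");

// Markup as an archive page might write it; a mail client decodes %40 back to "@".
const link = '<a href="mailto:' + hidden + '">' + hidden + "</a>";

console.log(hidden);
console.log(naiveHarvester.test(hidden));
console.log(smarterHarvester.test(hidden));
console.log(decodeURIComponent(hidden));
```

The last line mirrors what happens when a human clicks the link: the percent-encoding is decoded and the original address comes back. The smarter pattern is the reason the trick stopped working once it became widespread.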

Read the full story here:

http://evolt.org/article/Using_Apache_to_stop_bad_robots/18/15126/index.html

Mike_Healan
March 6th, 2002, 06:52 AM
Here's what I use on all my pages. I've only been spammed twice, both times by the same spammer, and I have a feeling someone gave my address to him/her rather than it being harvested.
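-{ Quote: "<script type="text/javascript">
<!-- Begin JavaScript
// Hide e-mail from robots: the address is assembled at run time,
// so it never appears whole in the page source.
var username = "account name";
var at = "@";
var domainname = "domain.com";
var address = username + at + domainname;
// Spaces in the subject are written as %20 so the mailto: URL stays valid.
document.write("<a href='mailto:" + address + "?subject=About%20your%20site'>" + address + "</a>");
// end of JavaScript -->
</script>" }-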
For those with JavaScript turned off, I use a tiny GIF of the address so they can at least see it.
http://www.spywareinfoforum.com/images/email.gif
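For anyone who wants the same idea in a reusable form, here is a minimal sketch. The function name buildMailLink and its parameters are my own assumptions for illustration, not part of the snippet above. It assembles the address from its parts at run time and percent-encodes the subject line; it assumes the inputs are trusted, since nothing is HTML-escaped.

```javascript
// Hypothetical reusable helper; the name and parameters are assumptions.
function buildMailLink(username, domainname, subject) {
  // Join the parts only at run time so the full address
  // never appears as one string in the page source.
  const address = username + "@" + domainname;
  // Percent-encode the subject so spaces and punctuation survive in the mailto: URL.
  const href = "mailto:" + address + "?subject=" + encodeURIComponent(subject);
  return '<a href="' + href + '">' + address + "</a>";
}

const markup = buildMailLink("account", "domain.com", "About your site");
console.log(markup);

// In a page, the result would be written out the same way as in the snippet above:
// document.write(markup);
```

Keeping the string building in a plain function means it can be checked outside a browser, with document.write left as the only browser-specific step.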

spy1
March 6th, 2002, 10:58 AM
Mike - Are you going to make that into a d/l'able script that us brain-dead people can just click on in order to put it in their computers?

I'm a big fan of "Yup, that looks good- wut button does I push to install that thang?" :) Pete

Mike_Healan
March 6th, 2002, 11:35 AM
hehehe
No, but you gave me an idea.
<shameless plug>
http://www.spywareinfoforum.com/spambots.html
</shameless plug>
;)

spy1
March 6th, 2002, 01:02 PM
lol! Yes, we are the 'Idea People'! You're welcome! Pete