I have posted a list of IPs that have been blocked by my bot trap since July 2007. The list is a work in progress for two reasons. First, I'm updating it almost daily, which is the easy part. Second, I initially listed only the IP and user agent; as time allows, I'm adding more Whois information, such as the IP range, host name, country, and organization.
July 24, 2008
July 17, 2008
Every webmaster should be interested in protecting their website from scrapers and email harvesters.
One tool for securing your site is a bot trap. If you're an experienced programmer, you could probably write a better bot trap than the one I use. If you're technically challenged like I am, you use the best one you can find that is fairly simple to implement.
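To show the general idea behind a bot trap (this is a generic sketch, not the specific trap script I use): a link that humans never see points to a URL that robots.txt disallows, so any client that requests it has ignored robots.txt, and its IP gets blocked. The trap path `/bot-trap/` and the in-memory block list are hypothetical; a real setup would save blocked IPs somewhere permanent, such as .htaccess.

```python
# Generic bot-trap sketch. TRAP_PATH and the in-memory block list are
# hypothetical; a real site would persist blocked IPs (e.g. in .htaccess).

TRAP_PATH = "/bot-trap/"  # hypothetical URL that robots.txt disallows

# What /robots.txt would serve so that well-behaved bots stay out of the trap.
ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""

blocked_ips: set[str] = set()


def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status code for a request from `ip` for `path`."""
    if ip in blocked_ips:
        return 403  # already trapped: refuse every later request
    if path.startswith(TRAP_PATH):
        # Only a client that ignored robots.txt (or followed the hidden link) gets here.
        blocked_ips.add(ip)
        return 403
    return 200


if __name__ == "__main__":
    print(handle_request("192.0.2.10", "/index.html"))  # 200
    print(handle_request("192.0.2.10", "/bot-trap/"))   # 403, IP is now blocked
    print(handle_request("192.0.2.10", "/index.html"))  # 403
```

The key design point is that the trap URL is disallowed in robots.txt, so legitimate spiders that obey the file never trip it.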
June 5, 2008
I've started something new. I'm publishing a list of bots that have disobeyed the robots.txt protocol on my sites. The fact that a bot is listed does not necessarily mean it is bad; it just means the bot did not obey the robots.txt file.
I think one thing that makes my list different from others…
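If you want to check for yourself whether a logged request disobeyed robots.txt, Python's standard `urllib.robotparser` module can answer that. A rough sketch, assuming a hypothetical robots.txt and hypothetical (user agent, path) pairs pulled from an access log:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
    "Disallow: /bot-trap/",
]


def disobeyed(requests, robots_lines=ROBOTS_LINES, base="http://example.com"):
    """Return the (user_agent, path) pairs that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines instead of fetching a URL
    return [(ua, path) for ua, path in requests
            if not parser.can_fetch(ua, base + path)]


if __name__ == "__main__":
    log = [
        ("Googlebot/2.1", "/index.html"),
        ("SomeScraper/1.0", "/bot-trap/"),
        ("SomeScraper/1.0", "/private/list.html"),
    ]
    for ua, path in disobeyed(log):
        print(ua, path)
```

Only the two `SomeScraper/1.0` requests are printed, since `/index.html` is not disallowed. As noted above, a hit here only shows that the bot ignored robots.txt, not that it is malicious.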
October 24, 2007
Process1 has hit my bot trap three different times this month, each time from a different IP, so I started doing a little digging. Since October is not quite over, I checked the September logs for the two sites involved. The results were somewhat startling to me.
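Log digging like this can be partly automated. A minimal sketch, assuming logs in Apache's "combined" format; the user-agent substring is a placeholder (I'm not assuming what Process1 actually sends), and the month filter simply matches text such as `Sep/2007` in the timestamp:

```python
import re
from collections import Counter

# Apache "combined" log format:
# IP ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def hits_by_ip(lines, agent_substring, month="Sep/2007"):
    """Count requests per IP whose user agent contains `agent_substring` during `month`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if month in m.group("time") and agent_substring in m.group("agent"):
            counts[m.group("ip")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [10/Sep/2007:13:55:36 -0700] "GET /a HTTP/1.1" 200 123 "-" "ExampleBot/1.0"',
        '192.0.2.2 - - [11/Sep/2007:10:00:00 -0700] "GET /b HTTP/1.1" 404 0 "-" "ExampleBot/1.0"',
    ]
    for ip, n in hits_by_ip(sample, "ExampleBot").most_common():
        print(ip, n)
```

Seeing several IPs for the same user agent is exactly the pattern that blocking by IP alone misses.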
September 16, 2007
I first heard about installing Honey Pots (or Honey Traps) on web pages several months ago. I had no idea what that meant. From the context in which the references were made, I could tell they had to do with dealing with bad bots, or spiders. I'm not sure whether the people were referring to Project Honey Pot or not. They were programmers who are very capable of developing their own honey pots.
August 28, 2007
What do I mean by Opt-in? In short, rather than always adding user agents or IP addresses to your .htaccess file to blacklist them, you only allow certain bots and browsers to access your site. So rather than using a blacklist to keep out unwanted bots, you whitelist certain bots. All others get a 403 error page.
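In .htaccess this is done with Apache directives, but the decision itself is simple enough to sketch in Python. The whitelist below is hypothetical, and keep in mind that user agents are trivially spoofed, so a broad token like `Mozilla/` will also let in any scraper pretending to be a browser:

```python
# Hypothetical whitelist of user-agent substrings; adjust for your own site.
ALLOWED_AGENTS = ("Googlebot", "Slurp", "msnbot", "Mozilla/", "Opera/")


def opt_in_status(user_agent: str) -> int:
    """Return 200 if the user agent matches the whitelist, otherwise 403."""
    if any(token in user_agent for token in ALLOWED_AGENTS):
        return 200
    return 403  # everything not explicitly allowed is refused


if __name__ == "__main__":
    print(opt_in_status("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # 200
    print(opt_in_status("panscient.com"))  # 403
```

The trade-off versus a blacklist is that you never have to chase new bad bots, but any new legitimate bot is also blocked until you add it.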
August 22, 2007
All of these bots have been caught in one of my bot traps at least once. Bots like Panscient and Cyveillance have been caught on at least two of my sites. A few of the bots (spiders) listed are legitimate bots that claim to obey robots.txt.
August 9, 2007
Panscient Data Services Pty Ltd is a company in Australia. In the last few days their bot hit my bot trap on two different sites. Both times the IP was 220.127.116.11 and the user agent was panscient.com. They are operating from the IP range 18.104.22.168/30.
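A /30 range covers only four addresses. To check whether an IP from your logs falls inside a range like that, Python's standard `ipaddress` module does the membership test. The addresses below come from the 192.0.2.0/24 documentation block and are not Panscient's actual range:

```python
import ipaddress


def in_range(ip: str, cidr: str) -> bool:
    """Return True if `ip` falls inside the CIDR block `cidr`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    network = ipaddress.ip_network("192.0.2.0/30")
    print(network.num_addresses)                   # 4
    print([str(a) for a in network])               # ['192.0.2.0', '192.0.2.1', '192.0.2.2', '192.0.2.3']
    print(in_range("192.0.2.3", "192.0.2.0/30"))   # True
    print(in_range("192.0.2.4", "192.0.2.0/30"))   # False
```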
August 6, 2007
Cyveillance is supposedly a security company that roams the net looking for illegal content and sells that information to its customers. According to this article about Cyveillance, its clients come from the music and movie industries, that is, the RIAA and MPAA. If the article is correct, though, the customer list at the end of it includes some that have nothing to do with the RIAA or MPAA. I do not know how accurate the author's information is, but he seems to have spent a lot of time researching this. Read the article and draw your own conclusions.
July 28, 2007
Something I noticed in my logs today on one site: a request for nonexistent files. The request appears to be from the US Government.