Spam Whackers

Exposing Various Types of Spam – Offering SEO & Webmaster Tips

August 31, 2008

Blocked User Agents for your Black list

Filed under: — Connie @ 4:56 pm

Currently there are 417 user agents listed. This list has been put together from various lists of blocked user agents that I have come across. I’m sure it is not the ultimate list of blocked user agents, and I will continue to add to it. As time allows, I will try to provide more information about each user agent.

This list of user agents includes legitimate search engines, site scrapers, and email harvesters.

The user agents are listed alphabetically. That makes it easier for me to quickly check when I come across a new list or user agent to see if I already have the user agent listed.
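For anyone keeping a similar list, that check can be scripted. The sketch below is only an illustration, not how this list is actually maintained. `normalize` and `is_listed` are made-up helper names, and it assumes an entry should count as a match regardless of letter case, a leading `#`, or the backslash in front of a space.

```python
from typing import Iterable


def normalize(entry: str) -> str:
    """Lowercase an entry and drop a leading '#' and any backslash escapes."""
    return entry.strip().lstrip("#").replace("\\", "").lower()


def is_listed(candidate: str, entries: Iterable[str]) -> bool:
    """Return True if the candidate is already in the list after normalizing."""
    known = {normalize(entry) for entry in entries}
    return normalize(candidate) in known


# A small sample in the same style as the list in this post.
sample = ["Wget", "Web\\ Sucker", "#scooter"]
print(is_listed("web sucker", sample))
print(is_listed("Googlebot", sample))
```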

As a webmaster, you might wonder why anyone would block a legitimate search engine. My question to you: why have a search engine crawling your site and eating up bandwidth if it will never do you any good? Unless you target traffic from China or Germany, why let their crawlers crawl your site and consume your resources?

Why let all the universities that are running experimental bots crawl your site? Why let all the new, worthless search engine startups crawl your site?

In the end, only you can decide what is right for your site. It is not my purpose to tell you what to block or what to allow. I simply want to provide information that can help you in those decisions.

By using an opt-in list (white list) in your .htaccess file, you can reduce the size of your black list (blocked list). I use a combination of both. The article I linked to shows how to combine the two.

If you read that article, you might wonder why I maintain such a large black list. The reason is that I knew about black lists long before I knew about opt-in lists (white lists). Using the opt-in method can save you a lot of time.
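To show why an opt-in list shrinks the black list, here is a rough sketch of the decision order in Python rather than .htaccess syntax. It is not taken from the linked article. The crawler names in `ALLOWED`, the catch-all `GENERIC_BOT` pattern, and the `decide` function are all assumptions made for illustration.

```python
import re

# Hypothetical opt-in list: crawlers this example site chooses to allow.
ALLOWED = re.compile(r"Googlebot|Slurp|msnbot", re.IGNORECASE)

# Assumed catch-all for anything that identifies itself as a robot.
GENERIC_BOT = re.compile(r"bot|crawl|spider", re.IGNORECASE)

# A few entries copied from the black list in this post. These are the
# agents the catch-all cannot see because their names do not say "bot".
BLOCKED = re.compile(r"Wget|HTTrack|Download\ Demon|Mozilla.*NEWT")


def decide(user_agent: str) -> str:
    """Return 'allow' or 'deny' for a user-agent string."""
    if ALLOWED.search(user_agent):
        return "allow"   # explicitly opted in
    if GENERIC_BOT.search(user_agent):
        return "deny"    # a robot that is not on the opt-in list
    if BLOCKED.search(user_agent):
        return "deny"    # a named entry from the black list
    return "allow"       # everything else, such as ordinary browsers


print(decide("SomeNewBot/1.0"))
print(decide("Wget/1.12"))
```

With this ordering, a new crawler whose name contains "bot", "crawl", or "spider" is refused without ever being added to the black list. Only agents with other names still need their own entry.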

If you would like to add to this list or comment on it, you can do so here.

You might also be interested in the list of blocked IPs that I maintain. There are over 200 IPs listed. An IP is listed because a user agent from that IP address did not follow the directives in robots.txt on one of my sites.
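One way to spot that kind of violation is to replay logged requests against the site's robots.txt. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the log entries, the agent names, and the `find_violations` helper are invented for the example, and the IP addresses come from the range reserved for documentation.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (an assumption, not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def find_violations(robots_txt, requests):
    """Return the (ip, agent, path) requests that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (ip, agent, path)
        for ip, agent, path in requests
        if not parser.can_fetch(agent, path)
    ]


# Hypothetical log entries: (IP address, user agent, requested path).
log = [
    ("192.0.2.10", "ExampleBot/1.0", "/private/page.html"),
    ("192.0.2.11", "OtherBot/2.0", "/index.html"),
]

for ip, agent, path in find_violations(ROBOTS_TXT, log):
    print(ip, agent, path)
```

Each IP that shows up in the result is a candidate for a blocked IP list like the one described above.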

8484_Boston_Project
#[Ww]eb[Bb]andit
Abacho
acontbot
AdoSpeaker
ah-ha
AIBOT
aipbot
#almaden
Amfibibot
AnswerBus
appie
Arachmo
Arameda
Arellis
Argus
ASPSeek
asterias
attach
BackWeb
baiduspider
Bandit
BatchFTP
BDFetch
BecomeBot
BigCliqueBOT
Bimbot
BLA
BlackWidow
boitho.com-dc
Bot\ mailto:craftbot@yahoo.com
BruinBot
btbot
Buddy
bumblebee
CCGCrawl
ccubee
CherryPicker
ChinaClaw
CipinetBot
citenikbot
ColdFusion
Collector
Combine
ContextAd Bot
contextadbot
ConveraCrawler
ConveraMultiMediaCrawler
Copier
cosmos
CostaCider
Cowbot
CrawlConvera
CrawlWave
#Crescent
Custo
CXL-FatAssANT
DA
DataCha0s
DataFountains
Deepindex
devoll.roswellspringcatalog.info/spring-fashion-2003.html8/18/2006
DiamondBot
Digger
DIIbot
DISCo
DISCo\ Pump
DM-Search
Download\ Demon
Download\ Wonder
Downloader
Drecombot
Drip
DTAagent
EasyDL
eCatch
EirGrabber
EmailSiphon
EmailWolf
EnfinBot
Eule-Robot
EuripBot
eventax
Exabot
Exabot-Images
Express\ WebPictures
ExtractorPro
EyeNetIE
fantomas
Favcollector
Faxobot
FDM_2.x
FileHound
Firefox_1.0.6_kasparek
Firefox_kastaneta
First_Browse_of_COnn
FlashGet
fluffy
Franklin_Locator
FrontPage
FyberSpider
Gaisbot
Galaxy
GalaxyBot
gazz
GenericBot-ax 0.85
genevabot
GeoBot
GetRight
GetSmart
GetWeb!
Girafabot
Go!Zilla
Go-Ahead-Got-It
GOFORITBOT
GornKer
gotit
Grabber
GrabNet
Grafula
Grafula
GroschoBot
Grub
gsa-crawler
GT::WWW
HappyFunBot
Healthbot
HMView
holmes
HooWWWer
Hotzonu
htdig
Html_Link_Validator_
http_sample
HttpProxy
httpunit
HTTrack
ia_archiver 09/17 see if obeys robots.txt
ichiro
IconSurf
Iltrovatore-Setaccio
Image\ Stripper
Image\ Sucker
Indy
#Indy Library
InetURL
InfociousBot
INGRID
InnerpriseBot
InterGET
InterGET
Internet\ Ninja
InternetSeer.com
intraVnews
IOneSearch.bot
Iria
ISC_Systems_iRc_Search
Jakarta_Commons-HttpClient
Java
Jayde Crawler
JetBot
JetCar
JOC\ Web\ Spider
JustView
KakleBot
Kyluka
lanshanbot
LapozzBot
larbin
LeechFTP
lftp
libwww
likse
Link_Valet_Online
LinkAlarm
LinkWalker
LocalcomBot
lwp-trivial
LWP::Simple
Mac_Finder
Mackster
Mag-Net
Magnet
Mass\ Downloader
Matrix
Memo
Metaspinner
Microsoft.URL
Microsoft_URL_Control
MIDown\ tool
Mirago
Mirror
Missigua_Locator
Mister\ PiX
MJ12bot
Mnogosearch
MonkeyCrawl
Mozdex
Mozilla.*NEWT
Mozzilla
MSIECrawler
MSNPTC
MVAClient
My_WinHTTP_Connection
NaverBot
NavissoBot
Navroad
NearSite
Net\ Vampire
NetAnts
NetMind-Minder
NetMonitor
NetSpider
Networking4all
NetZIP
Newsgroupreporter_LinkCheck
NextGenSearchBot
NG
nicebot
NICErsPRO
NimbleCrawler
Ninja
NLCrawler
noxtrumbot/1.0
NPBot
NuSearch Spider
Nutch
NutchCVS
ObjectsSearch
oBot
Ocelli
Octopus
Octora_Beta
Offline\ Explorer
Offline\ Navigator
OmniExplorer_Bot
Omnipelagos
online link validator
Openbot
Openfind
Orbiter
OutfoxBot
OutfoxBot
page_verifier
PageBitesHyperBot
PageGrabber
Pajaczek
Papa\ Foto
Patwebbot
pavuk
pcBrowser
PEAR_HTTP_Request_class
PEERbot
PHP_version_tracker
PhpDig
pipeLiner
Pockey
POE-Component-Client-HTTP
Poirot
polybot
Pompos
Poodle_predictor
Pooodle_predictor
Popdexter
Port_Huron_Labs
process
psbot test for robots.txt
psycheclone
Pump
PyQuery
Python-urllib
QweeryBot
RAMPyBot
Random
Ranking-Manager
RealDownload
Reaper
Recorder
ReGet
REL_Link_Checker_Lite
robschecker
RRG
RufusBot
SandCrawler
SANSARN
SBIder
schibstedsokbot
#scooter
Screw-Ball
Scrubby
Search-10
search.ch
Searchmee!
SearchSpider
Seekbot
Sensis Web Crawler
Sensis.com.au Web Crawler
Shim+Bot
ShunixBot
shybunnie-engine
SideWinder
silk
Siphon
sitecheck.internetseer.com
SiteSnagger
SiteSpider
#SlySearch test robots.txt
SmartDownload
sna-
Snake
Snap
Snapbot/1.0
Snapbot/2.0
Snappy
Snoopy
sohu-search
SpaceBison
Speed-Meter
SpeedySpider
Spinne
SpokeSpider
Squid-Prefetch
SquidClamAV_Redirector
SquigglebotBot
StackRambler
Stripper
Suck
SuperBot
SuperHTTP
sureseeker
Surfbot
SurveyBot
SygolBot
SynoBot
Szukacz
tAkeOut
Teleport\ Pro
TerrawizBot
ThisIsOurYear_Linkchecker
thumbshots-de-Bot
Tkensaku
topicblogs
TridentSpider
troovziBot
TurnitinBot
TutorGigBot
#ua
Ultraseek
unchaos_crawler
Updated
URL Spider Pro
URL Spider SQL
Vacuum
Vagabondo
vBSEO_
VoidEYE
VoilaBot
W3CRobot
Web\ Image\ Collector
Web\ Sucker
Web\ Sucker
Web_Downloader
WebAuto
webbot
WebCopier
WebCorp
webcrawl.net
WebDataCentreBot
WebEMailExtrac.*
WebFetch
WebFindBot
WebGather
WebGo\ IS
WebIndexer
WebLeacher
webMirror
Webnavigator
webPluck
WebReaper
WebSauger
Website
Website\ eXtractor
Website\ Quester
Webster
WebStripper
WebWhacker
WebZIP
Wells_Search_II
WEP_Search
Wget
Whacker
WhizBang
Widow
WISEbot
Wotbox
WWW-Mechanize
WWWeasel
WWWOFFLE
wwwster
Xaldon
Xaldon\ WebSpider
Xenu_Link_Sleuth
xirq
Xombot
XunBot
yacybot
YadowsCrawler
Yeti
YodaoBot
YottaShopping_Bot
Zao
Zatka
Zealbot
Zeus.*Webster
#Zeus_
ZipppBot
ZyBorg
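
Several entries above look like they were written for Apache pattern matching (for example `Download\ Demon`, `Mozilla.*NEWT`, and `[Ww]eb[Bb]andit`), and the entries starting with `#` look commented out. Assuming that is the intent, here is a rough Python sketch that turns entries like these into mod_rewrite conditions for an .htaccess file. `build_rules` is a made-up helper name, and the output layout is one possible arrangement, not necessarily the one used on my sites.

```python
import re


def build_rules(entries):
    """Build mod_rewrite lines that refuse requests from the listed agents."""
    patterns = []
    for raw in entries:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and commented-out entries
        # RewriteCond arguments are split on spaces, so escape bare spaces.
        patterns.append(re.sub(r"(?<!\\) ", r"\\ ", entry))

    if not patterns:
        return ""  # never emit a RewriteRule with no conditions

    lines = ["RewriteEngine On"]
    for index, pattern in enumerate(patterns):
        is_last = index == len(patterns) - 1
        flag = "" if is_last else " [OR]"
        lines.append("RewriteCond %{HTTP_USER_AGENT} " + pattern + flag)
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(build_rules(["Wget", "#Zeus_", "Download\\ Demon", "URL Spider Pro"]))
```

The patterns are unanchored and case-sensitive here, so a short entry such as `DA` or `NG` would match any user agent containing those letters. Entries that carry a note after the name would also need the note removed by hand before being used as a pattern.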


