Channel: XT Zone

Block Search Engine Spiders


What is a Bot?
A bot (short for robot) is a computer program that runs automatically. Bots are sometimes called spiders because they crawl the Web looking for pages to add to a database. Bots can be used for both GOOD and EVIL!

How Do You Tell Bad Bots What You Want Them to Do?

Use .htaccess (Hypertext Access).

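Hypertext Access files are different from Robots Meta tags and the Robots Exclusion Protocol (robots.txt). Meta tags and robots.txt only ask bots to stay away, and a bad bot can simply ignore the request. .htaccess rules are enforced by the web server itself, so the bad bots can’t ignore them!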

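To use an .htaccess file, your site must be served by the Apache HTTP Server (usually on Unix/Linux hosting), and the server configuration must allow .htaccess overrides through the AllowOverride directive.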
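If your .htaccess rules seem to be ignored, the server may not be reading the file at all. As a minimal sketch, assuming you can edit the main Apache configuration (httpd.conf or a virtual-host file) and that your site lives in the hypothetical directory /var/www/html, the override classes needed by the directives in this article are FileInfo (for SetEnvIf and SetEnvIfNoCase) and Limit (for Order, Allow and Deny):

```apache
# Main server config, NOT .htaccess: allow .htaccess files in this
# directory tree to use the directives from this article.
# /var/www/html is a hypothetical path; use your own DocumentRoot.
<Directory "/var/www/html">
    # FileInfo covers SetEnvIf / SetEnvIfNoCase (mod_setenvif);
    # Limit covers Order, Allow and Deny.
    AllowOverride FileInfo Limit
</Directory>
```

On shared hosting you usually cannot change this yourself, so ask your host whether .htaccess overrides are enabled.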

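.htaccess files are hidden files: on Unix/Linux, a file whose name starts with a dot does not show up in a normal directory listing. Create a plain text file with a text editor like Notepad (or WordPad, saved as plain text) and name it .htaccess. Place your .htaccess file in the directory of the server that you wish to protect; its rules also apply to every subdirectory below it. .htaccess will stop the bots you list in their tracks!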

The dot “.” at the start of the .htaccess file name makes the file hidden in some programs. If you save the file in Notepad, you may need to enclose the file name in quotes (“.htaccess”) so that Notepad does not add a .txt extension. Another option is to name the file without the dot at first, then rename it to .htaccess once it is uploaded to your server.

Make sure to upload the file in ASCII mode. If your FTP client does not set the correct file permissions, set them to 644. This lets you, the owner of the file, read and write it, while everyone else can only read it.

Example: my “.htaccess” configuration.

SetEnvIfNoCase User-Agent "^Baidu" bad_bot
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "^sogou" bad_bot
SetEnvIfNoCase User-Agent "^sogou\ spider2" bad_bot
SetEnvIfNoCase User-Agent "^Bloghoo" bad_bot
SetEnvIfNoCase User-Agent "^Scooter" bad_bot
SetEnvIfNoCase User-Agent "^YodaoBot" bad_bot
SetEnvIfNoCase User-Agent "^Yeti" bad_bot
SetEnvIfNoCase User-Agent "^NaverBot" bad_bot
SetEnvIfNoCase User-Agent "^iaskspider" bad_bot
SetEnvIfNoCase User-Agent "^QihooBot" bad_bot
SetEnvIfNoCase User-Agent "^larbin" bad_bot
SetEnvIfNoCase User-Agent "^Sosoimagespider" bad_bot
SetEnvIfNoCase User-Agent "^Sosospider" bad_bot
SetEnvIfNoCase User-Agent "^Sogou\ web\ spider" bad_bot
SetEnvIfNoCase User-Agent "^iearthworm" bad_bot
SetEnvIfNoCase User-Agent "^Twiceler" bad_bot
SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
SetEnvIfNoCase User-Agent "^ChinaClaw" bad_bot
SetEnvIfNoCase User-Agent "^Custo" bad_bot
SetEnvIfNoCase User-Agent "^DISCo" bad_bot
SetEnvIfNoCase User-Agent "^Download\ Demon" bad_bot
SetEnvIfNoCase User-Agent "^eCatch" bad_bot
SetEnvIfNoCase User-Agent "^EirGrabber" bad_bot
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^Express\ WebPictures" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^EyeNetIE" bad_bot
SetEnvIfNoCase User-Agent "^FlashGet" bad_bot
SetEnvIfNoCase User-Agent "^GetRight" bad_bot
SetEnvIfNoCase User-Agent "^GetWeb!" bad_bot
SetEnvIfNoCase User-Agent "^Go!Zilla" bad_bot
SetEnvIfNoCase User-Agent "^Go-Ahead-Got-It" bad_bot
SetEnvIfNoCase User-Agent "^GrabNet" bad_bot
SetEnvIfNoCase User-Agent "^Grafula" bad_bot
SetEnvIfNoCase User-Agent "^HMView" bad_bot
SetEnvIfNoCase User-Agent "^HTTrack" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Stripper" bad_bot
SetEnvIfNoCase User-Agent "^Image\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^Indy\ Library" bad_bot
SetEnvIfNoCase User-Agent "^InterGET" bad_bot
SetEnvIfNoCase User-Agent "^Internet\ Ninja" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot
SetEnvIfNoCase User-Agent "^JOC\ Web\ Spider" bad_bot
SetEnvIfNoCase User-Agent "^LeechFTP" bad_bot
SetEnvIfNoCase User-Agent "^Mass\ Downloader" bad_bot
SetEnvIfNoCase User-Agent "^MIDown\ tool" bad_bot
SetEnvIfNoCase User-Agent "^Mister\ PiX" bad_bot
SetEnvIfNoCase User-Agent "^Navroad" bad_bot
SetEnvIfNoCase User-Agent "^NearSite" bad_bot
SetEnvIfNoCase User-Agent "^NetAnts" bad_bot
SetEnvIfNoCase User-Agent "^NetSpider" bad_bot
SetEnvIfNoCase User-Agent "^Net\ Vampire" bad_bot
SetEnvIfNoCase User-Agent "^NetZIP" bad_bot
SetEnvIfNoCase User-Agent "^Octopus" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Explorer" bad_bot
SetEnvIfNoCase User-Agent "^Offline\ Navigator" bad_bot
SetEnvIfNoCase User-Agent "^PageGrabber" bad_bot
SetEnvIfNoCase User-Agent "^Papa\ Foto" bad_bot
SetEnvIfNoCase User-Agent "^pavuk" bad_bot
SetEnvIfNoCase User-Agent "^pcBrowser" bad_bot
SetEnvIfNoCase User-Agent "^RealDownload" bad_bot
SetEnvIfNoCase User-Agent "^ReGet" bad_bot
SetEnvIfNoCase User-Agent "^SiteSnagger" bad_bot
SetEnvIfNoCase User-Agent "^SmartDownload" bad_bot
SetEnvIfNoCase User-Agent "^SuperBot" bad_bot
SetEnvIfNoCase User-Agent "^SuperHTTP" bad_bot
SetEnvIfNoCase User-Agent "^Surfbot" bad_bot
SetEnvIfNoCase User-Agent "^tAkeOut" bad_bot
SetEnvIfNoCase User-Agent "^Teleport\ Pro" bad_bot
SetEnvIfNoCase User-Agent "^VoidEYE" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Image\ Collector" bad_bot
SetEnvIfNoCase User-Agent "^Web\ Sucker" bad_bot
SetEnvIfNoCase User-Agent "^WebAuto" bad_bot
SetEnvIfNoCase User-Agent "^WebCopier" bad_bot
SetEnvIfNoCase User-Agent "^WebFetch" bad_bot
SetEnvIfNoCase User-Agent "^WebGo\ IS" bad_bot
SetEnvIfNoCase User-Agent "^WebLeacher" bad_bot
SetEnvIfNoCase User-Agent "^WebReaper" bad_bot
SetEnvIfNoCase User-Agent "^WebSauger" bad_bot
SetEnvIfNoCase User-Agent "^Website\ eXtractor" bad_bot
SetEnvIfNoCase User-Agent "^Website\ Quester" bad_bot
SetEnvIfNoCase User-Agent "^WebStripper" bad_bot
SetEnvIfNoCase User-Agent "^WebWhacker" bad_bot
SetEnvIfNoCase User-Agent "^WebZIP" bad_bot
SetEnvIfNoCase User-Agent "^Widow" bad_bot
SetEnvIfNoCase User-Agent "^WWWOFFLE" bad_bot
SetEnvIfNoCase User-Agent "^Xaldon\ WebSpider" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot

SetEnvIfNoCase User-Agent "^Exabot" bad_bot
SetEnvIfNoCase User-Agent "^Majestic-1*" bad_bot
SetEnvIfNoCase User-Agent "^msnbot" bad_bot
SetEnvIfNoCase User-Agent "^live" bad_bot
SetEnvIfNoCase User-Agent "^bing" bad_bot
SetEnvIfNoCase User-Agent "^Yahoo" bad_bot
SetEnvIfNoCase User-Agent "^Yahoo!\ Slurp" bad_bot
SetEnvIfNoCase User-Agent "^Yahoo!\ Slurp\ China" bad_bot
SetEnvIfNoCase User-Agent "^Googl" bad_bot
SetEnvIfNoCase User-Agent "^Googlebot" bad_bot
SetEnvIfNoCase User-Agent "^feedfetcher-Google" bad_bot
SetEnvIfNoCase User-Agent "^Yandex" bad_bot
SetEnvIfNoCase User-Agent "^ScoutJet" bad_bot
SetEnvIfNoCase User-Agent "^Cuil" bad_bot
SetEnvIfNoCase User-Agent "^Ask\ Jeeves" bad_bot
SetEnvIfNoCase User-Agent "^DotBot" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot
deny from 202.96.51.0/24
deny from 202.96.170.0/24
deny from 202.104.129.0/24
deny from 202.106.186.0/24
deny from 202.108.4.0/24
deny from 202.108.5.0/24
deny from 202.108.7.0/24
deny from 202.108.9.0/24
deny from 202.108.11.0/24
deny from 202.108.22.0/24
deny from 202.108.23.0/24
deny from 202.108.33.0/24
deny from 202.108.36.0/24
deny from 202.108.44.0/24
deny from 202.108.45.0/24
deny from 202.108.249.0/24
deny from 202.108.250.0/24
deny from 202.160.178.0/24
deny from 202.160.179.0/24
deny from 202.160.180.0/24
deny from 202.160.181.0/24
deny from 202.160.183.0/24
deny from 202.165.102.0/24
deny from 220.181.12.0/24
deny from 220.181.13.0/24
deny from 220.181.14.0/24
deny from 220.181.19.0/24
deny from 220.181.26.0/24
deny from 220.181.28.0/24
deny from 220.181.31.0/24
deny from 220.181.32.0/24
deny from 220.181.38.0/24
deny from 220.181.61.0/24
deny from 222.185.245.0/24
deny from 60.28.17.0/24
deny from 60.28.22.0/24
deny from 61.135.132.0/24
deny from 61.135.145.0/24
deny from 61.135.146.0/24
deny from 61.135.152.0/24
deny from 61.135.157.0/24
deny from 61.135.168.0/24
deny from 61.135.169.0/24
deny from 61.135.220.0/24
deny from 203.209.252.0/24
deny from 124.115.4.0/24
deny from 124.115.1.0/24
deny from 114.80.93.0/24

deny from 72.30.0.0/16
deny from 74.6.0.0/16
deny from 67.195.0.0/16
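Order, Allow and Deny are Apache 2.2 syntax; on Apache 2.4 they only work when mod_access_compat is loaded. A rough sketch of the same logic in the Apache 2.4 Require syntax, assuming mod_setenvif, mod_authz_core and mod_authz_host are enabled and AllowOverride includes AuthConfig, might look like this. Only a few of the IP ranges from the list are repeated here:

```apache
# Apache 2.4 counterpart of "Order Allow,Deny / Allow from all / Deny from ...".
# Keep the SetEnvIfNoCase lines; they are what set bad_bot.
<RequireAll>
    # Everyone is allowed by default...
    Require all granted
    # ...except requests that SetEnvIfNoCase flagged as bad_bot...
    Require not env bad_bot
    # ...and requests from these networks (append the rest of your list).
    Require not ip 202.96.51.0/24 202.96.170.0/24 72.30.0.0/16
</RequireAll>
```

Apache's upgrade notes discourage mixing the old Order/Allow/Deny directives with Require in the same configuration, so pick one style per file.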

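If a bad bot tries to access your website, it will receive a 403 Forbidden error that stops it from going any further into your site. Keep in mind that the User-Agent rules only catch bots that identify themselves honestly; a bot that sends a fake browser User-Agent can only be stopped by the IP rules.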

To test how your site’s files or directories ban bad bots, go to www.wannabrowser.com, type your URL into the appropriate field, then choose a bot that you have banned in your .htaccess file. If you receive a 403 Forbidden error, you have foiled the bots!
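When testing, remember that SetEnvIfNoCase treats the quoted string as a case-insensitive regular expression, and a leading ^ anchors it to the very start of the User-Agent header. Many major crawlers send a header that begins with “Mozilla/5.0 (compatible; ...”; Googlebot, for example, commonly sends “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”, which “^Googlebot” will not match. A sketch, assuming you want to match the crawler name anywhere in the header, simply drops the anchor (bingbot is the name Bing’s crawler uses in its User-Agent):

```apache
# "^Googlebot" only matches User-Agent strings that START with "Googlebot".
# Without the ^, each pattern matches the name anywhere in the header,
# e.g. inside "Mozilla/5.0 (compatible; Googlebot/2.1; ...)".
SetEnvIfNoCase User-Agent "Googlebot" bad_bot
SetEnvIfNoCase User-Agent "bingbot" bad_bot
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
```

Unanchored patterns are broader, so use names specific enough not to catch ordinary browsers; a bare "live", for instance, would match any User-Agent containing those four letters.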

