WebmasterWorld Shuts Out The Search Engines
I noticed a very interesting thread on one of my favourite sites, Brett Tabke’s WebmasterWorld, in which he announces that he has decided to ban all search engine robots from the site completely, using a robots.txt file.
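A robots.txt file that shuts out every crawler needs only two lines. The minimal sketch below uses Python's standard-library `urllib.robotparser` to check such a file the way a well-behaved crawler would, and contrasts it with an "open" file whose empty `Disallow:` value permits everything. The forum URL and the crawler names (Googlebot, Yahoo's Slurp, msnbot, and a made-up SomeRogueBot) are illustrative assumptions, not details from the thread. robots.txt is purely advisory: the parser reports what a compliant crawler *should* do, and a rogue crawler can simply skip the check.

```python
from urllib.robotparser import RobotFileParser

# Two-line robots.txt asking every compliant crawler to stay off the whole site.
BAN_ALL = "User-agent: *\nDisallow: /\n"
# For contrast: an empty Disallow value means "nothing is disallowed".
OPEN = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(BAN_ALL.splitlines())

open_parser = RobotFileParser()
open_parser.parse(OPEN.splitlines())

# Hypothetical forum URL, used only for illustration.
url = "http://www.webmasterworld.com/forum1/"

for agent in ("Googlebot", "Slurp", "msnbot", "SomeRogueBot"):
    # can_fetch() only tells a crawler what it should do; nothing enforces it.
    print(agent, parser.can_fetch(agent, url), open_parser.can_fetch(agent, url))
# Each line prints the agent name, then False (ban-all file), then True (open file).
```

The ban applies equally to the big engines and to the rogue bot. The difference is that only the well-behaved crawlers will honour it.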
This means no pages from WebmasterWorld will be findable in Yahoo, Google, MSN or any other legitimate search engine. The main problem members have with the idea is that WebmasterWorld is an enormous site, which I think has somewhere in the region of half a million forum posts… and no decent onsite search engine of its own.
Members mainly found things on the site by running a well-formed query on a search engine, so tracking down old threads is now much more difficult. Tabke says a new site search solution is in the pipeline, but it doesn’t look as though it will be online immediately.
Heavy spidering by rogue search crawlers is the main reason given by Tabke for the move, which will last for “a month or three”:
We have been doing EVERYTHING you can think of. This is a part of that ongoing process. We can’t require all people to login and allow bots onto the site (eg: pure cloaking). Even the random ad scripts we cloak off to keep bots from seeing session-id-like content get grumbles from a lot of members. The claims are that we are either selling links (they claimed our links to WestHost and now Rackspace are paid), or that we are cloaking to get higher PR when we do block bots from seeing session ids. eg: no-win situation for us.

So, we start by banning bots, and then follow immediately with required cookies/logins for everyone. That will stop most of the bots. The ones it doesn’t, we will follow up with session ids, and auto ban in htaccess for page view abuse. Lastly, we will move to captcha logins, and then random login challenges with other captcha gfx requirements.
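Tabke's plan includes auto-banning IP addresses in .htaccess for page view abuse, but he doesn't say how abuse is detected. What follows is only a minimal sketch of one common approach, a per-IP sliding-window page-view counter. The `PageViewLimiter` class, the 60-second window, the 30-view limit and the IP addresses are all hypothetical, not WebmasterWorld's real settings. The output uses Apache's older `Deny from` access-control syntax; Apache 2.4 setups would normally use mod_authz_host's `Require` directives instead.

```python
from collections import defaultdict, deque

# Hypothetical thresholds, chosen only for illustration.
WINDOW_SECONDS = 60.0
MAX_PAGE_VIEWS = 30


class PageViewLimiter:
    """Sliding-window page-view counter that bans IPs exceeding a limit."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_PAGE_VIEWS):
        self.window = window
        self.limit = limit
        self.hits = defaultdict(deque)  # ip -> timestamps of recent page views
        self.banned = set()

    def record(self, ip, now):
        """Record one page view at time `now` (seconds); return True if allowed."""
        if ip in self.banned:
            return False
        views = self.hits[ip]
        views.append(now)
        # Discard views that have slid out of the time window.
        while views and views[0] <= now - self.window:
            views.popleft()
        if len(views) > self.limit:
            self.banned.add(ip)
            del self.hits[ip]
            return False
        return True

    def htaccess_lines(self):
        """Render the bans as old-style Apache 'Deny from' lines."""
        return [f"Deny from {ip}" for ip in sorted(self.banned)]


limiter = PageViewLimiter(window=60.0, limit=30)
# A human reading a page every five seconds stays well under the limit.
for t in range(0, 60, 5):
    limiter.record("198.51.100.7", float(t))
# A bot requesting 40 pages in four seconds trips the ban on its 31st request.
for i in range(40):
    limiter.record("203.0.113.9", i * 0.1)
print(limiter.htaccess_lines())  # ['Deny from 203.0.113.9']
```

A sliding window, unlike a fixed per-minute counter, can't be gamed by bunching requests at a minute boundary. Like any IP-based rule, though, it also catches legitimate users behind a shared proxy, which is one reason Tabke pairs it with logins and captchas.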
Related: WebmasterWorld Bans Spiders From Crawling