Do any of you use MSN as your primary search engine? The MSNbot that crawls my site is pretty much just stealing bandwidth, so I’m about to ban that bot from crawling here. For comparison purposes, allow me to present you with a few statistics from the month of June 2005.
- Exhibit A – MSNbot
  - Crawl hits = 9561
  - Bandwidth used = 124.43 MB
  - Visits to wantmoore.com as a result of searches @ MSN = 41
- Exhibit B – Googlebot
  - Crawl hits = 3415
  - Bandwidth used = 51.74 MB
  - Visits to wantmoore.com as a result of searches @ Google = 683*
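For the curious, here’s the back-of-envelope math on those numbers (a throwaway Python sketch; the figures are just the June totals above):

```python
# Back-of-envelope: crawl bandwidth spent per search visit referred.
def mb_per_visit(bandwidth_mb, visits):
    return bandwidth_mb / visits

msnbot = mb_per_visit(124.43, 41)     # ~3.03 MB of crawling per MSN visit
googlebot = mb_per_visit(51.74, 683)  # ~0.08 MB of crawling per Google visit

print(f"MSNbot:    {msnbot:.2f} MB/visit")
print(f"Googlebot: {googlebot:.2f} MB/visit")
# MSNbot costs roughly 40x more crawl bandwidth per visitor it sends.
```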
The cost/benefit just isn’t enough for me. Not that I’m starving for bandwidth, and not because it’s causing performance issues (<a href="http://mindstormhosting.com">Go Mindstorm Hosting!</a>), but it just annoys me. And since it’s my site, I can do what I want, right?
So, effective immediately:
User-agent: MSNBot
Disallow: /
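Of course, robots.txt is only a polite request, so it’s worth watching the access logs afterwards to confirm the bot actually stops. Here’s a minimal sketch that tallies hits and bandwidth per crawler, assuming an Apache combined-format log; the log path and bot names are placeholders:

```python
import re

# Minimal sketch: count hits and bytes per crawler in an Apache
# combined-format access log. The path and bot names are assumptions.
LOG_PATH = "/var/log/apache2/access.log"
BOTS = ("msnbot", "Googlebot")

# combined log format: host ident user [time] "request" status bytes "referer" "agent"
line_re = re.compile(r'\S+ \S+ \S+ \[.*?\] ".*?" \d{3} (\d+|-) ".*?" "(.*?)"')

totals = {bot: [0, 0] for bot in BOTS}  # bot -> [hits, bytes]
with open(LOG_PATH) as log:
    for line in log:
        m = line_re.match(line)
        if not m:
            continue
        size, agent = m.groups()
        for bot in BOTS:
            if bot.lower() in agent.lower():
                totals[bot][0] += 1
                totals[bot][1] += 0 if size == "-" else int(size)

for bot, (hits, nbytes) in totals.items():
    print(f"{bot}: {hits} hits, {nbytes / 1048576:.2f} MB")
```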
If you notice any errors anywhere or your access gets blocked for some reason (I’m blocking some other bots and things as well), please email me and let me know.
* Includes both standard web searches as well as Google Images searches.
I seem to be seeing something completely different to you:
(#1) Googlebot – 17166 hits – 334 MB – 2137 visitors (plus 167 from Images)
(#5) MSNBot – 1546 hits – 33.74 MB – 70 visitors
Very bizarre, but MSNBot doesn’t seem to be abusing me half as badly as it’s hit you, and Googlebot sure is hitting my website a damn large number of times!!
Ahh, the mysterious ways of the search engine!
[...] As part of the Bigdaddy infrastructure switchover, Google has been working on frameworks for smarter crawling, improved canonicalization, and better indexing. On the smarter crawling front, one of the things we’ve been working on is bandwidth reduction. For example, the pre-Bigdaddy webcrawl Googlebot with user-agent “Googlebot/2.1 (+http://www.google.com/bot.html)” would sometimes allow gzipped encoding. The newer Bigdaddy Googlebots with user-agent “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” are much more likely to support gzip encoding. That reduces Googlebot’s bandwidth usage for site owners and webmasters. From my conversations with the crawl/index team, it sounds like there’s a lot of head-room for webmasters to reduce their bandwidth by turning on gzip encoding. [...]
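If you want to check whether your own server actually serves gzip to a Googlebot-style request, a quick probe like this sketch will show the negotiated Content-Encoding (the URL is a placeholder; point it at your own site):

```python
import urllib.request

# Probe a page with a Googlebot-style User-Agent and ask for gzip,
# then see whether the server actually compressed the response.
# The URL below is a placeholder, not a real target.
req = urllib.request.Request(
    "http://www.example.com/",
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)",
        "Accept-Encoding": "gzip",
    },
)
with urllib.request.urlopen(req) as resp:
    encoding = resp.headers.get("Content-Encoding", "(none)")
    body = resp.read()  # raw bytes; gzipped if the server compressed

print(f"Content-Encoding: {encoding}")
print(f"Bytes on the wire: {len(body)}")
```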
Google’s Crawl Caching Proxy…
Matt Cutts went over BigDaddy in several WebmasterWorld sessions last week and just did a post on his blog that sums it up. "When you surf around the web, you fetch pages via your ISP. Some ISPs cache web pages…
[...] Matt Cutts, famous Google employee, has linked to an old post of mine related to search engine crawling and bandwidth usage. Welcome to all the visitors stopping by. [...]
I get hit a lot by the MSN bots, but usually they are the first to index and bring in results. Lately it has been Yahoo, though.
Sometimes the Yahoo bots piss me off. They crawl hundreds of megs and I end up with about 50 visitors.
Strange to learn this about the MSNbot crawler. I will check this on my own site.
Same goes for me: Googlebot visits more often. Strangely, though, MSN has more of my pages indexed than Google does.
This is a really common problem with MSNbot: it steals bandwidth while crawling. I have banned it as well. Use Google instead.