Yahoo! + wretch.cc + robots.txt = EVIL

Posted in Search Engines on October 2nd, 2007

Update: Wretch (無名小站) has since removed this robots.txt.

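wretch.cc is Wretch (無名小站), a Taiwanese site that has long been dogged by negative news; for past coverage, see blog.xdite.org. Although it has been acquired by Yahoo! Taiwan, things do not seem to have improved.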

Wretch recently changed its robots.txt to:

User-agent: Slurp
Disallow:
User-agent: *
Disallow: /

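In other words, only Yahoo!'s own robot (Slurp) may index pages on Wretch. Of course, a robot that ignores the rules can go on indexing as before, while a well-behaved robot such as Googlebot can no longer index Wretch's pages. First, this move inevitably looks like Yahoo! taking aim at Google. Second, it is a tactic that punishes the companies that follow the standard, while companies that do not follow it can keep indexing Wretch's pages, which in effect encourages non-compliance with the standard.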
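To see how a standards-following robot would read these four lines, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the page URL are my own assumptions for illustration; real crawlers each have their own parsers, so treat this as an approximation of the behaviour rather than what Slurp or Googlebot actually run.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post, verbatim.
WRETCH_ROBOTS_TXT = """\
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
"""

# Hypothetical page URL, used only for illustration.
PAGE = "http://www.wretch.cc/blog/example"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant robot with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; nothing is fetched over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Slurp", "Googlebot"):
        verdict = "allowed" if is_allowed(WRETCH_ROBOTS_TXT, agent, PAGE) else "blocked"
        print(f"{agent}: {verdict}")
```

The empty `Disallow:` under `User-agent: Slurp` means "nothing is disallowed", so Slurp comes out as allowed, while every other agent falls through to the `*` group and is blocked from `/`. A robot that never reads robots.txt is, of course, unaffected by any of this.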

For comparison, here are the robots.txt files of the BSPs (blog service providers) commonly used by people in Hong Kong:

spaces.live.com has only a single comment line

http://home.services.spaces.live.com/robots.txt

# robots.txt for http://spaces.msn.com/

mysinablog.com

http://mysinablog.com/robots.txt

#
# robots.txt for http://www.mysinablog.com
# last updated: 2nd Oct 2007
#

User-agent: hl_ftien_spider
Disallow: /

User-agent: larbin
Disallow: /

User-agent: wget
Disallow: /

User-agent: libwww
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: *
Disallow: /gallery/
Disallow: /imgs/
Disallow: /js/
Disallow: /styles/
Disallow: /templates/
Disallow: /tools
Disallow: /admin.php
Disallow: /atom.php
Disallow: /authimage.php
Disallow: /chkauthimage.php
Disallow: /resserver.php
Disallow: /trackback.php
Crawl-delay: 1