Yahoo! + wretch.cc + robots.txt = EVIL

In: Search Engines

2 Oct 2007

Update: Wretch (無名小站) has since removed the robots.txt.

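wretch.cc is Taiwan's Wretch (無名小站), which has long been dogged by negative news; for past incidents, see blog.xdite.org. It has since been acquired by Yahoo! Taiwan, but things do not seem to have improved.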

Wretch recently changed its robots.txt to:

User-agent: Slurp
Disallow:
User-agent: *
Disallow: /

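In other words, only Yahoo!'s own robot (Slurp) is allowed to index pages on Wretch. Of course, a robot that ignores the rules can carry on indexing as before, but well-behaved robots such as Googlebot can no longer index Wretch's pages. For one thing, the move looks like Yahoo! using Wretch as a weapon against Google. For another, it penalizes the companies that follow the standard, while companies that ignore it can keep indexing Wretch's pages, which in effect encourages non-compliance.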
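To see how a standards-following crawler would read those four lines, here is a minimal sketch using Python's standard urllib.robotparser module. The page URL is a made-up path on wretch.cc used purely for illustration, and real crawlers may parse robots.txt slightly differently from Python's parser, so treat this as an approximation rather than a statement of how Slurp or Googlebot actually behave.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, one line per list element.
WRETCH_RULES = [
    "User-agent: Slurp",
    "Disallow:",        # empty Disallow: nothing is off limits for Slurp
    "User-agent: *",
    "Disallow: /",      # every other robot is barred from the whole site
]

# Hypothetical page URL, for illustration only (not taken from the post).
PAGE = "http://www.wretch.cc/blog/example"


def allowed(agent: str) -> bool:
    """Return True if a rule-abiding robot named `agent` may fetch PAGE."""
    parser = RobotFileParser()
    parser.parse(WRETCH_RULES)
    return parser.can_fetch(agent, PAGE)


for agent in ("Slurp", "Googlebot"):
    print(agent, "allowed" if allowed(agent) else "blocked")
```

With Python's parser, Slurp comes out as allowed and Googlebot as blocked. A robot that never consults robots.txt is unaffected either way, which is the point made above.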

For comparison, here are the robots.txt files of the blog service providers (BSPs) commonly used in Hong Kong:

spaces.live.com has only a single comment line:

http://home.services.spaces.live.com/robots.txt

# robots.txt for http://spaces.msn.com/

mysinablog.com

http://mysinablog.com/robots.txt

#
# robots.txt for http://www.mysinablog.com
# last updated: 2nd Oct 2007
#

User-agent: hl_ftien_spider
Disallow: /

User-agent: larbin
Disallow: /

User-agent: wget
Disallow: /

User-agent: libwww
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: *
Disallow: /gallery/
Disallow: /imgs/
Disallow: /js/
Disallow: /styles/
Disallow: /templates/
Disallow: /tools
Disallow: /admin.php
Disallow: /atom.php
Disallow: /authimage.php
Disallow: /chkauthimage.php
Disallow: /resserver.php
Disallow: /trackback.php
Crawl-delay: 1
