Keep it Simple, Stupid
In: Search Engines
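2 Oct 2007
Update: Wretch (無名小站) has since removed this robots.txt.
wretch.cc is Taiwan's 無名小站 (Wretch). It has long been dogged by negative news; for past incidents see blog.xdite.org. It has been acquired by Yahoo! Taiwan, but things do not seem to have improved.
Wretch recently changed its robots.txt to: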
User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
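In other words, only Yahoo!'s own robot (Slurp) may index pages on Wretch. A robot that ignores the rules can of course go on indexing regardless, but well-behaved robots such as Googlebot can no longer index Wretch's pages. First, the move looks like Yahoo! using the site as a weapon against Google. Second, it penalizes exactly the companies that follow the standard, while those that do not can keep indexing Wretch's pages, which in effect encourages non-compliance.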
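To make the effect concrete, here is a minimal sketch of how a compliant crawler might read those rules, using Python's standard `urllib.robotparser`. The helper name `may_fetch` and the test URL `http://wretch.cc/` are only illustrative assumptions; real crawlers have their own parsers and may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, one directive per line.
WRETCH_RULES = """\
User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""


def may_fetch(rules: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-abiding robot named user_agent may fetch url.

    may_fetch is a hypothetical helper for this post, not part of any crawler.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The URL is only an illustrative target on the site.
    for agent in ("Slurp", "Googlebot"):
        print(agent, may_fetch(WRETCH_RULES, agent, "http://wretch.cc/"))
```

An empty `Disallow:` means "nothing is off limits", so Slurp is allowed, while every other agent falls through to the `*` group and is refused. A robot that never consults robots.txt is not affected at all, which is the point made above.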
Here are the robots.txt files of the blog service providers (BSPs) commonly used in Hong Kong:
spaces.live.com has only a single comment line
http://home.services.spaces.live.com/robots.txt
# robots.txt for http://spaces.msn.com/
mysinablog.com
http://mysinablog.com/robots.txt
#
# robots.txt for http://www.mysinablog.com
# last updated: 2nd Oct 2007
#
User-agent: hl_ftien_spider
Disallow: /

User-agent: larbin
Disallow: /

User-agent: wget
Disallow: /

User-agent: libwww
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: *
Disallow: /gallery/
Disallow: /imgs/
Disallow: /js/
Disallow: /styles/
Disallow: /templates/
Disallow: /tools
Disallow: /admin.php
Disallow: /atom.php
Disallow: /authimage.php
Disallow: /chkauthimage.php
Disallow: /resserver.php
Disallow: /trackback.php
Crawl-delay: 1
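For contrast, the mysinablog.com file blocks specific download tools by name and only fences off a few paths for everyone else. The sketch below feeds a shortened excerpt of it (not the full file) to Python's standard `urllib.robotparser`; the helper name `load_rules` and the choice of "Googlebot" as the example agent are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the mysinablog.com rules quoted above (not the full file).
MYSINABLOG_EXCERPT = """\
User-agent: wget
Disallow: /

User-agent: *
Disallow: /gallery/
Disallow: /admin.php
Crawl-delay: 1
"""


def load_rules(rules: str) -> RobotFileParser:
    """Parse robots.txt text into a parser (hypothetical helper for this post)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


if __name__ == "__main__":
    parser = load_rules(MYSINABLOG_EXCERPT)
    base = "http://mysinablog.com"
    print("wget /          :", parser.can_fetch("wget", base + "/"))
    print("Googlebot /     :", parser.can_fetch("Googlebot", base + "/"))
    print("Googlebot /gallery/:", parser.can_fetch("Googlebot", base + "/gallery/"))
    # crawl_delay() needs Python 3.6 or later.
    print("Googlebot delay :", parser.crawl_delay("Googlebot"))
```

Here a named tool such as wget is shut out completely, while any search engine's robot is still welcome on the blog pages themselves and is merely asked to pause between requests. That is a very different policy from allowing exactly one company's robot.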