
Notice how it's not a grey area when Google does it. The usual double standard applies, I guess.


I don't understand what this comment is referring to. Google's spider respects robots.txt: block all paths and Google will not crawl your site. The same goes for Bing, Yahoo, Baidu (with some complications, I think) and Yandex. Most of the major spiders respect robots.txt.

Is there some major Google web scraping effort I'm not aware of?
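For illustration, here is a minimal sketch of what "block all paths" means, using Python's standard-library `urllib.robotparser` as a stand-in for a well-behaved crawler's check. This is an assumption for demonstration only: the real crawlers named above use their own parsers, and honoring robots.txt is voluntary on the crawler's side. The `can_crawl` helper and the `example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that disallows every path for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would a compliant crawler fetch `url`?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every agent falls under the "*" group, so every path is refused.
    for agent in ("Googlebot", "Bingbot", "YandexBot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, "https://example.com/any/page"))
```

An empty `Disallow:` line, by contrast, permits everything, so the same check returns `True` for that file.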



