
Interesting, can you give an example page that’s not listed?


Sure. A very simple one would be https://ipbl.herrbischoff.com, a public blocklist page referencing a resource used by a couple dozen users. The HTML doesn't get much simpler than that, and it is entirely valid markup. It's (unsurprisingly) low-ranked in Google, but it's there.

Not so in Bing: it's simply not there. The page has existed since March 2021. Bing Webmaster Tools reads "Discovered but not crawled. URL cannot appear on Bing", giving no further reason. It also says "Last crawl attempted 01 Feb 2022 at 19:35", which means that Bing has not bothered to retry for months, despite me submitting the URL manually on a regular basis. Clicking the "Live URL" tab results in entirely green checkmarks along with "URL can be indexed by Bing".

Another example would be my personal site: https://herrbischoff.com. Same issue. That one has been listed on Google for more than 10 years.


This is fascinating, thanks. Have you experimented with allowing the Bing ad bot that you have blocked, in case they have some kind of retaliatory non-crawling?


Interesting theory. But the IPBL doesn't even have a robots.txt, and a different, larger site I host for a German celebrity has the same directives and is indexed, although incompletely.
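For anyone who wants to rule out robots.txt on their own site, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content is a made-up illustration, not the actual file from either site, and `adidxbot` is my assumption for the name of the Bing ad crawler being discussed. Python's parser is also not Bing's parser, so this only shows how the directives read under the standard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the ad crawler, allow everyone else.
# (Illustrative only; not the real file from any site mentioned here.)
ROBOTS_TXT = """\
User-agent: adidxbot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The search crawler is not matched by the ad-crawler rule.
bingbot_ok = is_allowed(ROBOTS_TXT, "bingbot", "https://example.com/")
adbot_ok = is_allowed(ROBOTS_TXT, "adidxbot", "https://example.com/")
```

Under these directives the ad crawler is blocked while `bingbot` falls through to the permissive `*` group, so a block like this should not stop regular crawling by itself.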

My working theory is that Bing's selection algorithm is biased towards large and already popular sites. In the server logs, I don't see Bing even attempting to crawl the sites I mentioned, apart from requests for robots.txt and the root page. Bing appears to be excruciatingly slow to update anything but high-traffic sites.
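A rough sketch of how one could check this in an access log, assuming the common "combined" log format used by Apache and nginx. The function name and the simple substring match on the user agent are my own choices for illustration; user agents can be spoofed, so treat the counts as an indication only.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bingbot_paths(lines):
    """Count requested paths for log lines whose user agent mentions bingbot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "bingbot" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts
```

Feeding it an open log file, for example `bingbot_paths(open("access.log"))` with whatever path your server writes to, gives a per-path tally. If the only keys that ever show up are `/robots.txt` and `/`, that matches what I am describing.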

Again, Microsoft Support was unable to explain this behavior, even after a manual, human review found everything to be in order.

I tried deleting robots.txt entirely and got only Chinese crawlers and SEO bots, but still no Bing crawl. All organic traffic comes from Google and from blogs linking directly.


Just received a final non-answer from Microsoft support after weeks of escalating:

—————

Thank you for your patience!

After further review, it appears that your site < http://herrbischoff.com https://ipbl.herrbischoff.com/> did not meet the standards set by Bing the last time it was crawled.

Bing constantly prioritizes the content to be indexed that will drive highest users satisfaction. Please follow Bing Webmaster Guidelines to better understand criteria for most valuable content.



