Hacker News

> However, how is it reasonable to force a web site to serve its contents to a third-party company, without being allowed to make a decision whether to serve it or not?

Your statement makes absolutely no sense. That's not how the internet works. If you serve something publicly, you don't get to cherry-pick who sees it.

Not only does it make no sense technically, it's also a huge anti-competitive case.



It makes sense, and it is how the internet works. Servers cherry-pick who sees their content all the time. Scrapers are often blocked, as are entire IP address ranges. Selenium-based scrapers can be (approximately) detected and are often denied access.
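The kind of blocking described above is routine. A minimal sketch in Python, using the stdlib `ipaddress` module; the CIDR ranges and user-agent substrings here are illustrative placeholders (documentation address ranges), not any real company's:

```python
import ipaddress

# Hypothetical blocklist: CIDR ranges an operator might deny.
# These are IETF documentation ranges used as stand-ins.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Substrings that commonly appear in automated clients' User-Agent strings.
BLOCKED_AGENT_HINTS = ["HeadlessChrome", "python-requests", "Scrapy"]

def should_block(client_ip: str, user_agent: str) -> bool:
    """Return True if the request should be denied."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in BLOCKED_RANGES):
        return True
    return any(hint in user_agent for hint in BLOCKED_AGENT_HINTS)
```

In practice this sits in front of request handling (or in the web server / CDN config); it's approximate by nature, since both the ranges and the agent strings change constantly.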

I’m not sure about being anti-competitive. Serving a website is an action in which you open up your resources for others to access. My friend runs an open source stock market tracking website for free. He started getting hit with scrapers from big hedge funds and fintech companies a couple of months back. Serving all of these scrapers costs him around $50–100 a month.


If he gives them a stable, fast API with a subscription fee, and the scrapers are truly from hedge funds, he’s going to make a lot more than $100/mo.


He should open up a Patreon, tip jar, something to get that funded.

He could also delay results, offer reduced temporal precision, and otherwise differentiate use cases.
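A minimal sketch of that kind of tiering: subscribers see live quotes, anonymous clients see quotes delayed by 15 minutes with timestamps coarsened to the minute. The data shape and the 15-minute figure are assumptions for illustration, not from the thread:

```python
from datetime import timedelta

# Assumed tier policy: anonymous clients get a 15-minute delay.
DELAY = timedelta(minutes=15)

def visible_quotes(quotes, now, subscriber: bool):
    """quotes: list of (timestamp, price) tuples, newest last."""
    if subscriber:
        return quotes
    cutoff = now - DELAY
    # Only expose quotes old enough, and truncate timestamps to the minute
    # to reduce temporal precision for the free tier.
    return [(ts.replace(second=0, microsecond=0), price)
            for ts, price in quotes if ts <= cutoff]
```

The same pattern generalizes: serve everyone, but make the free tier less useful for high-frequency resale while staying perfectly usable for hobbyists.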


He and I both have similar free open source websites with donate buttons. They are rarely clicked. Ad revenue over a month for me has been ~$400 while donations over two years have totaled $20. There are about 80,000 unique visitors per month.

It is nice to think donation platforms can fund high traffic open source projects, but this is simply not the case.

In any case, I fear the potential of this ruling to limit developers’ ability to protect their servers, making us all roll over to the big players whose hefty scrapers take all of our data for resale.


How long are you allowed to delay results? Not serving results at all is just delaying them forever, but that's out. Can I delay serving results longer than Chromium's default timeout?


Probably up to the point where a judge says 'this is blocking, not delaying'.


I don’t see what legal or technical argument you’re making.

Technically, of course you can identify IP ranges owned by certain entities and restrict their access. That’s trivial, so what do you mean when you say the internet doesn’t work like that?

Legally, there’s plenty of region locked content for copyright and censorship reasons. A distributor might region lock because they don’t have distribution rights in particular regions. Are you saying distributors can’t publish free content at all because they can’t choose who sees it but would be breaking copyright law to publish to everyone? Or a site might region lock because certain content is censored in particular countries. Can you not publish anti-regime articles because a totalitarian country is on the Internet?
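The region-locking argument above can be sketched as code. Real deployments resolve IP to country with a GeoIP database (e.g. MaxMind's); here a stub dict stands in for that lookup, and all addresses, regions, and policy sets are hypothetical:

```python
# Stub for a GeoIP lookup: IP -> ISO country code.
# Real servers query a GeoIP database instead of a dict.
GEO_STUB = {"203.0.113.7": "DE", "198.51.100.9": "KP"}

LICENSED_REGIONS = {"US", "DE"}   # where the distributor holds rights
CENSORING_REGIONS = {"KP"}        # regions the site chooses not to serve

def allow(client_ip: str) -> bool:
    country = GEO_STUB.get(client_ip, "??")  # unknown origin -> deny
    return country in LICENSED_REGIONS and country not in CENSORING_REGIONS
```

The point being: this selectivity is both trivial to implement and, in the copyright case, legally required, so "you can't choose who sees public content" can't be the rule.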

The entire world isn’t and shouldn’t be held hostage to the most restrictive laws that exist anywhere. And the answer isn’t blocking on the requesting end, because that’s technically much harder and blocks much, much more content. So what am I missing?

Edit: Forgot to include the other end of the spectrum. If I, as an individual, host my own site on my own hardware, on my own connection, paying for the bandwidth myself, can I deny a suspected bot network?


Of course you get to choose. You can reject requests based on their user agent, their IP address, the owner or likely geographic location of the IP address, and many other possibilities.


What are these possibilities? You only get the IP and whatever client-side information the client is _willingly_ sending you. So if a script/user/bot/etc. says it's Firefox at 1.2.3.4, then all you know is that it's a request from 1.2.3.4 that claims to be Firefox. You can ask it to run JavaScript code, but that's beyond classic web interaction, and then again you need to trust the client.

This interaction cannot be made trustless, so every client can only be served based on its IP, or on some convoluted hack exchange that is a cat-and-mouse game at best.
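The point about client-supplied information is easy to demonstrate with the stdlib: the User-Agent is just a header the client chooses, so any script can claim to be Firefox (the UA string below is an example, not a real browser's exact string):

```python
import urllib.request

# Construct a request claiming to be Firefox. No network traffic happens
# here; we're only inspecting what the server *would* receive.
spoofed_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
req = urllib.request.Request(
    "https://example.com/data",
    headers={"User-Agent": spoofed_ua},
)

# The server sees exactly the string the client chose to send.
assert req.get_header("User-agent") == spoofed_ua
```

Which is why serious bot detection falls back on IP reputation and behavioral signals rather than anything the client self-reports.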



