Hacker News

> However, how is it reasonable to force a web site to serve its contents to a third-party company, without being allowed to make a decision whether to serve it or not?

Your statement makes absolutely no sense. That's not how the internet works. If you serve something publicly, you don't get to cherry-pick who sees it.

Not only does it make no sense technically, it's also a huge anti-competitive case.



It makes sense, and it is how the internet works. Servers cherry-pick who sees their content all the time. Scrapers are often blocked, as are entire IP address ranges. Selenium-based scrapers can be (approximately) detected and are often denied access.
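The kind of blocking described above is routine. A minimal sketch in Python, using the stdlib `ipaddress` module; the CIDR ranges and user-agent substrings here are illustrative placeholders (documentation address ranges), not any real company's:

```python
import ipaddress

# Hypothetical blocklist: CIDR ranges an operator might deny.
# These are IETF documentation ranges used as stand-ins.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Substrings that commonly appear in automated clients' User-Agent strings.
BLOCKED_AGENT_HINTS = ["HeadlessChrome", "python-requests", "Scrapy"]

def should_block(client_ip: str, user_agent: str) -> bool:
    """Return True if the request should be denied."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in BLOCKED_RANGES):
        return True
    return any(hint in user_agent for hint in BLOCKED_AGENT_HINTS)
```

In practice this sits in front of request handling (or in the web server / CDN config); it's approximate by nature, since both the ranges and the agent strings change constantly.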

I’m not sure about being anti-competitive. Serving a website is an action in which you open up your resources for others to access. My friend runs an open source stock market tracking website for free. He started getting hit with scrapers from big hedge funds and fintech companies a couple of months back. Serving all of these scrapers costs him around $50–100 a month.


If he gives them a stable, fast API with a subscription fee, and the scrapers are truly from hedge funds, he’s going to make a lot more than $100/mo.


He should open up a Patreon, tip jar, something to get that funded.

He could also delay results, offer reduced temporal precision, and otherwise differentiate use cases.
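A minimal sketch of that kind of tiering: subscribers see live quotes, anonymous clients see quotes delayed by 15 minutes with timestamps coarsened to the minute. The data shape and the 15-minute figure are assumptions for illustration, not from the thread:

```python
from datetime import timedelta

# Assumed tier policy: anonymous clients get a 15-minute delay.
DELAY = timedelta(minutes=15)

def visible_quotes(quotes, now, subscriber: bool):
    """quotes: list of (timestamp, price) tuples, newest last."""
    if subscriber:
        return quotes
    cutoff = now - DELAY
    # Only expose quotes old enough, and truncate timestamps to the minute
    # to reduce temporal precision for the free tier.
    return [(ts.replace(second=0, microsecond=0), price)
            for ts, price in quotes if ts <= cutoff]
```

The same pattern generalizes: serve everyone, but make the free tier less useful for high-frequency resale while staying perfectly usable for hobbyists.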


He and I both have similar free open source websites with donate buttons. They are rarely clicked. Ad revenue over a month for me has been ~$400 while donations over two years have totaled $20. There are about 80,000 unique visitors per month.

It is nice to think donation platforms can fund high traffic open source projects, but this is simply not the case.

In any case, I fear the potential of this ruling to limit developers’ ability to protect their servers, making us all roll over to the big players whose hefty scrapers take all of our data for resale.


How long are you allowed to delay results? Not serving results at all is just delaying them forever, but that's out. Can I delay serving results longer than Chromium's default timeout?


Probably up to the point where a judge says 'this is blocking, not delaying'.


I don’t see what legal or technical argument you’re making.

Technically, of course you can identify IP ranges owned by certain entities and restrict their access. That’s trivial, so what do you mean when you say the internet doesn’t work like that?

Legally, there’s plenty of region locked content for copyright and censorship reasons. A distributor might region lock because they don’t have distribution rights in particular regions. Are you saying distributors can’t publish free content at all because they can’t choose who sees it but would be breaking copyright law to publish to everyone? Or a site might region lock because certain content is censored in particular countries. Can you not publish anti-regime articles because a totalitarian country is on the Internet?
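The region-locking argument above can be sketched as code. Real deployments resolve IP to country with a GeoIP database (e.g. MaxMind's); here a stub dict stands in for that lookup, and all addresses, regions, and policy sets are hypothetical:

```python
# Stub for a GeoIP lookup: IP -> ISO country code.
# Real servers query a GeoIP database instead of a dict.
GEO_STUB = {"203.0.113.7": "DE", "198.51.100.9": "KP"}

LICENSED_REGIONS = {"US", "DE"}   # where the distributor holds rights
CENSORING_REGIONS = {"KP"}        # regions the site chooses not to serve

def allow(client_ip: str) -> bool:
    country = GEO_STUB.get(client_ip, "??")  # unknown origin -> deny
    return country in LICENSED_REGIONS and country not in CENSORING_REGIONS
```

The point being: this selectivity is both trivial to implement and, in the copyright case, legally required, so "you can't choose who sees public content" can't be the rule.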

The entire world isn’t and shouldn’t be held hostage to the most restrictive laws that exist anywhere. And the answer isn’t blocking on the requesting end, because that’s technically much harder and blocks much, much more content. So what am I missing?

Edit: Forgot to include the other end of the spectrum. If I, as an individual, host my own site on my own hardware, on my own connection, paying for the bandwidth myself, can I deny a suspected bot network?


Of course you get to choose. You can reject requests based on their user agent, their IP address, the owner or likely geographic location of the IP address, and many other possibilities.


What are these possibilities? You only get the IP and whatever client-side information the client is _willingly_ sending you. So if a script/user/bot/etc. says it's Firefox at 1.2.3.4, then all you know is that it's a request from 1.2.3.4 that claims to be Firefox. You can ask it to run JavaScript code, but that's beyond classic web interaction, and then again you need to trust the client.

This interaction cannot be made trustless, so every client can only be served based on its IP, or on some convoluted hack exchange that is a cat-and-mouse game at best.
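The point about client-supplied information is easy to demonstrate with the stdlib: the User-Agent is just a header the client chooses, so any script can claim to be Firefox (the UA string below is an example, not a real browser's exact string):

```python
import urllib.request

# Construct a request claiming to be Firefox. No network traffic happens
# here; we're only inspecting what the server *would* receive.
spoofed_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
req = urllib.request.Request(
    "https://example.com/data",
    headers={"User-Agent": spoofed_ua},
)

# The server sees exactly the string the client chose to send.
assert req.get_header("User-agent") == spoofed_ua
```

Which is why serious bot detection falls back on IP reputation and behavioral signals rather than anything the client self-reports.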



