That's a very interesting way to look at the problem - algorithms like mine and Techmeme's do indeed digest the full article whereas HN does not. I had thought the primary issue would be whether or not the site provided a summary / thumbnail (as I see WindyCitizen does), not whether or not it scanned the source's bits.
Having to pay for my algorithm to access this data would be a big deal to me - I'm operating on a shoestring budget and I don't want to do that.
Sure, it would make things hard for you, but again, if you were using any other sort of data, there'd be usage costs. The news folks are just figuring that part out now, while every tech-first company has had that built into their business from the start.
Except for the fact that transmission/distribution costs are almost zero and 'news' is a broadly consumed information resource. It doesn't matter that there would have been usage costs in the past -- they would be silly now.
But the API is HTTP and the news stories are syndicated across a hundred different sites. How do you limit the crawlers under this scheme? It seems like any serious attempt to limit crawling will require major software redeployment, cooperation of crawlers, widespread authentication, or some combination of these. Is there actually a feasible way to do this without breaking the web?
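To make the "widespread authentication" option concrete, here is a rough sketch of the kind of gate every site serving the story would need to deploy before handing over article text. The header name and key store are made up for illustration; this is not any publisher's actual system.

    # Hypothetical sketch: a publisher-side gate that only serves full articles
    # to requests carrying a paid API key. The key store and header name are
    # invented for illustration.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAID_KEYS = {"example-paid-key-123"}   # keys the publisher has sold to crawlers

    class ArticleGate(BaseHTTPRequestHandler):
        def do_GET(self):
            key = self.headers.get("X-Api-Key", "")
            if key not in PAID_KEYS:
                # As written, ordinary browsers and unpaid crawlers are turned
                # away alike -- which is exactly the "breaking the web" worry.
                self.send_response(402)    # 402 Payment Required
                self.end_headers()
                self.wfile.write(b"Programmatic access requires a paid API key.\n")
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"<html>full article text...</html>\n")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), ArticleGate).serve_forever()

And that gate would have to exist not just at the original publisher but at every one of the hundred syndicating sites, which is why the question of feasibility matters.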
These are all good points. I don't have answers to any of them. Feasibility is a whole other issue.
My point is that if you look at online newspapers as online services, then they should be able to charge people for programmatic access to their service, just like any other tech service does through its API.
If I want to build an app on the back of Yahoo BOSS, I have to pay Yahoo.
If I want to build an app on the back of the New York Times, maybe I should have to pay the New York Times.
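As a rough sketch of what that would look like from the developer's side -- the endpoint, key name, and response shape below are hypothetical, not the actual New York Times or Yahoo BOSS API:

    # Minimal sketch of paid programmatic access to a news API.
    # Endpoint, key, and response fields are hypothetical.
    import requests

    API_KEY = "your-paid-api-key"  # issued and billed by the publisher
    ENDPOINT = "https://api.example-news.com/v1/articles"  # hypothetical

    def fetch_articles(query, page=0):
        # Each call is metered against the key, which is how the publisher
        # would charge for access, just like any other commercial API.
        resp = requests.get(
            ENDPOINT,
            params={"q": query, "page": page, "api-key": API_KEY},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json().get("articles", [])

    if __name__ == "__main__":
        for article in fetch_articles("technology"):
            print(article.get("headline"))

The point is not the code itself but the billing relationship behind the key: metered, programmatic access that an aggregation algorithm would pay for like any other input cost.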