
> HTTP is an API of sorts…

True, but it is an API that they can't easily deprecate on a whim.



But they can quickly change the "structure" of that API.


I worked for a very early billpay company where you could pay your bills online to vendors, even if the vendor didn't support it. We used APIs where we could, but where we couldn't...

We had a whole team dedicated to keeping up with the changes vendors would make to their websites that we scraped for info. The team was called, of course, "Scrape and P(r)ay".


I believe this is where smarter tools like Kadoa [0] can be of help. It detects data structure changes for existing workflows, and adapts to them.

[0] https://www.kadoa.com/


A small problem compared to not having an API key and being stonewalled as to why.


If you build your scraper to find data on the page based on the shape of the data itself instead of the structure of the page, then it will be resilient to most changes that don't materially change what data is displayed.

So, prefer regex over CSS selectors, and CSS selectors over XPath, where possible. And don't select based on nesting or position if you can avoid it.
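
To make that concrete, here is a minimal sketch, not anyone's production scraper. The "Amount due" label, the two page snippets, and the names `AMOUNT_DUE` and `find_amount_due` are all made up for illustration. It matches on the shape of the data (a label followed closely by a dollar amount) rather than on where the value sits in the markup.

```python
import re
from typing import Optional

# Hypothetical pattern: the label "amount due", then up to 80 characters of
# anything (markup, punctuation, whitespace), then a dollar amount with cents.
AMOUNT_DUE = re.compile(
    r"amount\s+due.{0,80}?\$\s*([0-9][0-9,]*\.[0-9]{2})",
    re.IGNORECASE | re.DOTALL,
)

def find_amount_due(html: str) -> Optional[str]:
    """Return the amount as a plain decimal string, or None if not found."""
    match = AMOUNT_DUE.search(html)
    if match is None:
        return None
    return match.group(1).replace(",", "")

# Two invented layouts for the same data: a table, then a redesign using divs.
old_page = '<table><tr><td class="lbl">Amount due</td><td class="val">$1,204.50</td></tr></table>'
new_page = '<div id="summary"><span>Amount Due:</span> <strong>$1,204.50</strong></div>'

print(find_amount_due(old_page))
print(find_amount_due(new_page))
```

A selector like `td.val` would have broken on the redesign, while the pattern above keeps working as long as the label and the amount stay near each other. It still fails if the label text changes or the amount moves far from it, so treat the 80-character window as a guess to tune per site.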


Isn't that similar to deleting your API key, except you can at least fix the structure of the Selenium one?


You're one CAPTCHA added to the flow from this access being effectively deprecated.


In the AI era, is this as true?


Depends on your development and per-action cost. And on the possible latency. It also changes your whole stack from "send a request" to "emulate each step in a browser while taking screenshots at (hopefully) the right event/delay" - that's a huge difference.



