
This doesn't really work with most sites anymore, does it? It can't run JavaScript (unlike headless browsers driven by Playwright/Puppeteer, for example), has limited support for more modern protocols, etc.?

Any suggestions for an easy way to mirror modern web content, like an HTTrack for the enshittified web?



ArchiveBox works decently with JavaScript: it uses a headless browser and can be deployed with Docker.


Which content/information sites that depend on JavaScript can you actually find? I always see marketing-oriented or app-like sites using JS heavily and thus being impossible to archive... but otherwise...


Literally any modern social media, YouTube comments, anything using client-side rendering frameworks


I don't see YouTube/IG/FB/TikTok comments as valuable content or information. I would be surprised if anybody treasures such content.

Maybe tweets, but APIs are available for archiving (more or less).


True. But it's more that most sites simply don't work anymore. They can't run without JavaScript and show nothing but blank pages.
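
A rough way to check whether a given page falls into that category before trying to mirror it: look at the raw HTML and see how much visible text it carries without running any scripts. This is only a sketch using Python's standard library. The names `VisibleTextCounter` and `looks_js_dependent` and the 200-character threshold are made up for illustration, and the heuristic will misjudge some pages.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts text characters outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a skipped element
        self.visible_chars = 0   # text a no-JS client would actually show
        self.script_tags = 0     # number of <script> elements seen

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_js_dependent(html, min_visible_chars=200):
    """Heuristic (assumed threshold): scripts present but almost no static text."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.script_tags > 0 and parser.visible_chars < min_visible_chars


# Hypothetical inputs: an empty app shell versus a plain static article.
app_shell = (
    '<html><head><title>App</title></head><body>'
    '<div id="root"></div><script src="/bundle.js"></script></body></html>'
)
static_page = "<html><body><p>" + "word " * 100 + "</p></body></html>"

print(looks_js_dependent(app_shell))
print(looks_js_dependent(static_page))
```

Pages flagged this way are the ones where a plain HTTP mirror would likely save an empty shell, so they are candidates for a headless-browser tool instead.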


monolith can work together with headless Chromium. YMMV.



