This is more of a beginner's guide than a master class. This method will not extract most content on modern websites because of the way JavaScript behaves on them. It also scales vertically, not horizontally. There are many other reasons why this is only step one in web scraping.
It's part of a series of blog posts that deals explicitly with crawling. There are indeed other links that do a better job of explaining advanced extraction techniques.
I worked on a large web scraper for several years, and JavaScript almost never needs to be executed. The only times I've had to execute it were to extract obfuscated links that are revealed by some bit-twiddling code, specific to each request, and this was achievable by forking out to Deno.
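For anyone wondering what "forking out to Deno" might look like, here is a minimal sketch, not the setup described above. It assumes the `deno` binary is on your PATH and that the page's decoding snippet prints its result with `console.log`. The function name `run_js_with_deno` and the `runner` parameter are made up for illustration.

```python
import subprocess


def run_js_with_deno(js_source, timeout=10, runner=subprocess.run):
    """Run a JavaScript snippet in Deno and return whatever it printed.

    Assumption: the snippet writes its result to stdout via console.log.
    `runner` is a hypothetical hook so the subprocess call can be swapped out.
    """
    completed = runner(
        ["deno", "run", "-"],  # "-" tells Deno to read the script from stdin
        input=js_source,
        capture_output=True,
        text=True,
        timeout=timeout,       # don't let a hostile script hang the scraper
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.strip()
```

Calling `run_js_with_deno('console.log("hi")')` should hand back the printed text. `deno run` grants no file or network access unless you pass permission flags, which is a reasonable default when the script comes from a site you don't control.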
I think JavaScript comes up because Cloudflare uses some kind of JavaScript challenge as part of its DDoS protection. There are Python libraries that know how to deal with it, or you can use some level of headless browser.
https://github.com/VeNoMouS/cloudscraper
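A rough sketch of how that library is typically used, assuming it is installed (`pip install cloudscraper`). `create_scraper()` returns an object that behaves like a `requests` session. The wrapper name `fetch_html` and its `scraper` parameter are hypothetical, and whether the challenge is actually solved depends on the site and the library version.

```python
def fetch_html(url, scraper=None):
    """Fetch a page through cloudscraper and return the response body as text.

    `scraper` is a hypothetical hook: pass any requests-like session,
    or leave it as None to build a cloudscraper session on demand.
    """
    if scraper is None:
        import cloudscraper  # third-party package, imported lazily
        scraper = cloudscraper.create_scraper()
    response = scraper.get(url, timeout=30)
    response.raise_for_status()  # surface 403/503 instead of parsing an error page
    return response.text
```

If you fetch many pages, create the scraper once and pass it in, so any clearance cookies it obtains are reused across requests.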
This is something I find a lot of web scraping tools miss. Are there any you'd recommend that specifically deal with things like async JavaScript content loading, or loading content based on what you click on a page (e.g., in Single Page Apps)?
JavaScript content loading is easier in most cases. Just look at your browser's network inspector and grab the URL.
Usually the response is in JSON and you can ignore the original page. You might have to auth/grab session cookies first, but that's still easier than working with the HTML.
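Roughly, that two-step flow could be sketched like this using only the standard library. The URLs you pass in are whatever you found in the network inspector. The function name `fetch_json_behind_page` and the `opener` parameter are made up for illustration, and real sites may also want extra headers or a login POST first.

```python
import json
import urllib.request
from http.cookiejar import CookieJar


def fetch_json_behind_page(page_url, api_url, opener=None):
    """Visit page_url to pick up session cookies, then fetch api_url as JSON.

    `opener` is a hypothetical hook; by default a cookie-aware opener is built.
    """
    if opener is None:
        jar = CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(jar)
        )
        opener.addheaders = [
            ("User-Agent", "Mozilla/5.0"),
            ("Accept", "application/json"),
        ]

    # Step 1: load the normal page so the server can set its session cookies.
    with opener.open(page_url, timeout=30) as resp:
        resp.read()

    # Step 2: request the endpoint the page itself calls; the cookie jar
    # attaches the cookies from step 1 automatically.
    with opener.open(api_url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The return value is plain Python data (dicts and lists), so there is no HTML parsing step at all.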
Thanks! If there are any third-party managed tools to do this, that would be awesome to know about (i.e., where they somehow run common JS functions/site interactions to test for additional content).