Would you consider rolling your own? Python’s goose3 has worked well for me in article extraction. It seemed to be successful more often than trafilatura and newspaper3k.
I was not aware of any of those projects - thank you for pointing me in the right direction!
goose3, trafilatura, newspaper3k (and newspaper4k even) all look like great tools. We were not planning on rolling our own, but that might be the right way to go after all. Thanks again.
- reading an algos book (if you haven’t) might be helpful: Aho et al., CLRS, Sedgewick
- on LeetCode, start with easy algorithm problems and sort by Acceptance in descending order. As you work a problem, talk aloud about your thought process: what you’re doing and where you’re headed. This is good practice for narrating your work during an interview
- maybe set a timer so if you’re truly stuck then you can pull up the solution and study it. Next time you see a similar problem or class of problem, you’ll be ready
- if you have a particular field or area you would like to work in, determine what kinds of algos or data structures might be especially pertinent. For example, if you might work with storing and looking up strings, dig into trie structures. Or maybe it would be more suitable to dig into branch and bound strategies. Try to tailor what you’re learning to what you would find useful
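To make the trie suggestion concrete, here is a minimal sketch of one in Python. The class and method names (`Trie`, `insert`, `contains`, `starts_with`) are my own choices for illustration, not from any particular book or library, and a production version would likely want deletion and iteration as well.

```python
class Trie:
    """Minimal prefix tree for storing and looking up strings."""

    def __init__(self):
        self.children = {}    # maps a single character to a child Trie node
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        """Add a word, creating nodes along the path as needed."""
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix from this node; return the final node or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """True only if this exact word was inserted."""
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        """True if any inserted word begins with prefix."""
        return self._walk(prefix) is not None
```

Lookups cost time proportional to the length of the key rather than the number of stored strings, which is the main reason tries come up for string storage and prefix search.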
Keywords of use to you might be “mock data”, “mocks”, and “mocking”. There are lots of tools, even for generating specific data types (names, emails, JSON, location coordinates, etc.).
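If you just want to see the idea before picking a tool, here is a rough sketch using only Python’s standard library. The name lists, field names, and the `make_mock_users` helper are all made up for illustration; dedicated libraries such as Faker cover far more data types.

```python
import random

# Hypothetical sample values; swap in whatever suits your tests.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton"]


def make_mock_user(rng):
    """Build one fake user record from the given random generator."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "name": f"{first} {last}",
        # example.com is reserved for documentation, so nothing real gets emailed
        "email": f"{first}.{last}@example.com".lower(),
        "lat": round(rng.uniform(-90, 90), 6),
        "lon": round(rng.uniform(-180, 180), 6),
    }


def make_mock_users(count, seed=0):
    """Return `count` fake users; the same seed always gives the same data."""
    rng = random.Random(seed)
    return [make_mock_user(rng) for _ in range(count)]
```

Seeding the generator keeps the mock data reproducible, which helps when a test fails and you need to see the same records again. The resulting list of dicts can be passed straight to `json.dumps` if you need JSON.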
I am currently setting something up for our seven-year-old. The project is a squid forwarding proxy server that the family computer in the living room will use as its Internet connection. As usual, I am making this whole enterprise harder on myself by configuring the squid server to choose its WAN connection depending on which port the client computer on my local network connects to: port 3128 means WAN1 (Spectrum cable), port 3129 means WAN2 (AT&T fiber). The step after that will be port 3130 for an outgoing VPN interface. Currently slogging through ip tables, routes, and rules, which isn’t my wheelhouse. :-D
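For anyone trying a similar setup, one way to check from a client that each port really leaves via a different uplink is to ask an IP-echo service through each proxy port and compare the answers. This is only a sketch: the proxy address `192.168.1.10` is made up, the uplink labels are mine, and `https://api.ipify.org` is just one example of a service that returns the caller’s public IP as plain text. Only the port numbers come from the description above.

```python
import urllib.request

# Hypothetical LAN address of the squid box; replace with your own.
PROXY_HOST = "192.168.1.10"

# Port-to-uplink mapping as described above (labels are illustrative).
PORT_FOR_UPLINK = {"wan1": 3128, "wan2": 3129, "vpn": 3130}


def proxy_url(uplink):
    """Return the proxy URL for the port assigned to this uplink."""
    return f"http://{PROXY_HOST}:{PORT_FOR_UPLINK[uplink]}"


def opener_for(uplink):
    """Build a urllib opener that sends HTTP and HTTPS through that port."""
    url = proxy_url(uplink)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


def public_ip(uplink, echo_url="https://api.ipify.org", timeout=10):
    """Fetch the public IP as seen by the echo service via this uplink."""
    with opener_for(uplink).open(echo_url, timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Calling `public_ip("wan1")` and `public_ip("wan2")` should return two different addresses if the outgoing routing per port is working; identical answers suggest both ports are still leaving through the same interface.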
We have Pi-hole too, primarily for blocking ads and call-homes.