Adding delay means you have to keep more connections open at the same time. Parallelism favours the scraper, not the server, when your problem is already a small server getting hit by a big scraper.
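A rough way to see this is Little's law: in steady state, the average number of open connections equals the arrival rate times how long each connection is held. Here is a quick sketch. The 200 requests/s rate, the 50 ms service time and the 5 s delay are made-up numbers. It also assumes the scraper keeps sending at the same rate no matter how slowly you respond, which is what a parallel scraper does.

```python
def concurrent_connections(requests_per_second: float, seconds_held: float) -> float:
    """Little's law: average number in system L = arrival rate (lambda) * time in system (W)."""
    return requests_per_second * seconds_held


# Hypothetical scraper sending 200 requests/s regardless of response time.
rate = 200.0
normal = concurrent_connections(rate, 0.05)         # ~50 ms to serve a request normally
delayed = concurrent_connections(rate, 0.05 + 5.0)  # same request plus an artificial 5 s delay

print(f"without delay: {normal:.0f} open connections")
print(f"with delay:    {delayed:.0f} open connections")
```

The delay doesn't slow the scraper down at all. It just multiplies how many sockets you're holding open at once, roughly 100x here.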
About 20 kilobytes of socket + TLS state per connection, if you've really optimised it down to the minimum. Most server software isn't that lean, of course, so pick something designed to handle a million or so concurrent connections on a single server (e.g. Nginx).
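Back-of-the-envelope memory cost, using decimal units (1 KB = 1000 bytes). The 20 KB figure is the lean one above. The 100 KB figure is just a hypothetical stand-in for a less optimised stack, not a measurement of any particular server.

```python
KB = 1000
GB = 1000 ** 3


def connection_memory_bytes(connections: int, bytes_per_connection: int) -> int:
    """Total memory needed to hold per-connection state for `connections` open sockets."""
    return connections * bytes_per_connection


# 20 KB/conn is the optimised figure; 100 KB/conn is a hypothetical heavier stack.
for per_conn_kb in (20, 100):
    total = connection_memory_bytes(1_000_000, per_conn_kb * KB)
    print(f"{per_conn_kb} KB/conn x 1M conns = {total / GB:.1f} GB")
```

So even the lean case needs on the order of 20 GB of RAM just to sit on a million idle connections.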
Something similar to proof-of-work but on a much smaller scale than Bitcoin.
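For illustration, here is a minimal Hashcash-style sketch. It assumes SHA-256 and a "leading zero bits" difficulty target. The names `solve`/`verify` and the 16-bit difficulty are my own choices, not any particular library's API. The useful property is the asymmetry: the client has to try about 2^difficulty hashes on average, while the server checks the answer with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce so SHA-256(challenge || nonce) has `difficulty` leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # fresh random challenge per request so solutions can't be reused
    difficulty = 16             # client needs ~65,536 hash attempts on average
    nonce = solve(challenge, difficulty)
    print("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty))
```

A real deployment would also need to tie the challenge to the client and give it an expiry (e.g. by signing it with a server secret), so one solution can't be replayed across many requests. That part is left out here.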