My recipe is to use Typhoeus (https://github.com/typhoeus/typhoeus) + Nokogiri. I have tried lots of different options including EventMachine with em-http-request and reactor loop and concurrent-ruby (both a re very poorly documented)
Typhoeus has a built-in concurrency mechanism with callbacks with specified number of concurrent http requests. You just create a hydra object, create the first request object with URL and a callback (you have to check errors like 404 yourself) where you extract another URLs from the page and push them to hydra again with the same on another callback.