That is an insane speedup! Thanks for sharing. How do I find out whether next and hasNext are the bottleneck? Should I run a profiler to see if they are being called a large number of times?
Unfortunately, none of the profilers I tried showed these issues, because the code you are profiling gets optimized completely differently. It still makes sense to do the optimizations/inlining inside the top-level functions that the profiler reports as taking a lot of time, even if the iteration itself happens at deeper levels.
You just have to try it and see. I remember it being quite easy in IntelliJ; I believe there's an option to inline all invocations of a function. I went crazy with that option, but did it systematically to really find where the bottleneck was and make a minimal change.
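
To make "try it and see" concrete, here is a rough sketch in Java of what such an experiment could look like: the same loop written once through hasNext/next and once with the iteration inlined by hand, each timed with System.nanoTime. The class name, method names, and list size are made up for illustration and are not from the code discussed above. The timing is deliberately crude; on a plain ArrayList the JIT may well make both versions run at the same speed, so treat the numbers only as a hint and measure your own code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class; not taken from the code discussed in this thread.
public class IterationSketch {

    // Iterator-based version: each element costs one hasNext() and one next() call.
    static long sumWithIterator(List<Integer> values) {
        long sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Hand-inlined version: plain indexed access, no iterator object involved.
    static long sumWithIndex(List<Integer> values) {
        long sum = 0;
        for (int i = 0, n = values.size(); i < n; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            data.add(i);
        }

        // Warm-up runs to give the JIT a chance to compile both methods before timing.
        long sink = 0;
        for (int r = 0; r < 50; r++) {
            sink += sumWithIterator(data) + sumWithIndex(data);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < 50; r++) {
            sink += sumWithIterator(data);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < 50; r++) {
            sink += sumWithIndex(data);
        }
        long t2 = System.nanoTime();

        System.out.println("iterator: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("indexed:  " + (t2 - t1) / 1_000_000 + " ms");
        // Printing the accumulated result keeps the loops from being optimized away.
        System.out.println("checksum: " + sink);
    }
}
```

If the two timings differ noticeably in your real code, that is a sign the iterator calls are worth inlining at that spot; if they don't, revert and try the next candidate. A proper benchmark harness would give more trustworthy numbers than this hand-rolled timing.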