Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Generate a stream of random bytes, pre-generation is advised to avoid dominating the benchmark.

Current PRNGs are pretty fast. The Xoroshiro RNG "shootout" benchmark [0] lists some self-reported speeds. They claim 8+ GB/s for even their slowest, highest-quality PRNG. The general-purpose one they recommend is 10GB/s, and 32GB/s when vectorized.

The vectorized versions get close to current DRAM speeds. I think I'd prefer that over reading from a giant table, given that the table reads will have significant implications for caching and disrupt that aspect of the benchmark.

[0]: https://prng.di.unimi.it/#shootout



Thanks for that suggestion, I shall explore faster PRNGS and vactorization too then.

My domain is c#, where it's perhaps unsurprising that it's a lot faster to have an array in memory than go via the default Random, which is I believe is Xoroshiro ( at least in .net 6+ ). It certainly can generate data quickly with Random.GetBytes(), but repeated calls to Random.NextInt64() are much slower.

Another issue I found with generating random numbers and trying to multi-thread is thread safety / synchronisation. If each thread is doing it's own random generation then it's difficult to compare results between using 1 thread and 2 threads, because it's no longer the same single stream of random data measured across each benchmark, so the meta-randomness of how early you "should" hit a dupe becomes a factor.

Having pre-generated numbers and single thread responsible for handing out those numbers makes it easier, you can either have a thread-safe counter, or can increment a larger increment plus an offset for each thread number. The first "real" dupe is still in the same place in the original pre-generated data.

You can then compare the incremental gains, for example, of each thread having their own hashtable would get you logarithmic gains (I think that's the right way to express it, but essentially it's gaining just by having First Dupe = Min(First Dupe across N threads) vs an actual thread-safe concurrent hash-table, where you should[1] still see First dupe but sped up by a greater factor than just a naive work but not state sharing.

I recognise there are potential memory caching issues at play with the pregeneration approach, but for larger bit counts the actual work should hopefully dominate, particularly since look-up values aren't revisited by design.

[1] "Should", because there's a small chance in many algorithms that due to concurrency and race conditions you actually miss the duplicate. Either multiple runs should be tried looking for bi-modal data, or accept that the next dupe shouldn't be so long after the first, and be happy the speed-up is greater than the slow-down caused by very occassional race condition misses. The chance of a such a race condition is absolutely minuscule at the 48bit level, if we assume we are using say, 8 threads, and assume for a race condition to occur, the same 48-number would have to be concurrently being handled/generated by 2 different threads.


> If each thread is doing it's own random generation then it's difficult to compare results between using 1 thread and 2 threads, because it's no longer the same single stream of random data

Most modern RNGs support a "jump" operation that takes care of this; PCG has the best standard API I've ever seen. Alternatively, with any RNG you can interleave (call `next` numthreads times before selecting a number) but that's slower.


> which is I believe is Xoroshiro ( at least in .net 6+ )

Xoshiro I believe is in .NET 6, close cousin

https://blogs.siliconorchid.com/post/coding-inspiration/rand...


Thanks for that correction, I hadn't appreciated the subtle difference there.


In my experience, vectorization is very simple—just running multiple instsnces of Xoroshiro (or one of the weaker variants) inside a vector works quite well. Fitting the resulting peg into the hole of a preexisting API is difficult, but that’s always a problem.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: