There's Math.random(), and then there's Math.random() (2015) (v8.dev)
35 points by corentin88 on Nov 8, 2023 | 49 comments


Weird to see this show up on HN - I was just looking at this page a couple of days ago.

I have a little puzzle I've been trying to solve in my spare time; maybe someone here can point me in the right direction.

The puzzle is: given a float from Math.random(), suppose you know only whether it is greater than 0.5 (i.e. you only see the result of a coin flip which depends on Math.random()). What is a practical method to reverse out the state of the xorshift+ generator, given multiple such successive observations of the output?

Any input appreciated!


In the V8 engine, Math.random is implemented with xorshift128+, whose 128-bit state update is completely linear over GF(2). Since you can recover 1 bit from each output, 128 outputs should be enough for you to set up a GF(2) linear system and solve for the state. Once you have the state, all you need to do is simulate how V8 produces future random values from it.
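A minimal sketch of the state update, with BigInt standing in for C++ uint64 (the shift constants 23/17/26 are the ones in the V8 source; the exact output-to-double mapping varies by version, so treat that part as an assumption). The property that makes the attack work is that stepping the XOR of two states gives the XOR of the stepped states:

```javascript
// Sketch of V8's xorshift128+ state update, using BigInt for 64-bit words.
const MASK64 = (1n << 64n) - 1n;

function xorshift128Step([s0, s1]) {
  let x = s0;
  const y = s1;
  x = (x ^ (x << 23n)) & MASK64; // mask keeps the word at 64 bits
  x ^= x >> 17n;
  x ^= y;
  x ^= y >> 26n;
  return [y, x]; // new state: [old s1, mixed word]
}

// Linearity over GF(2): step(a XOR b) == step(a) XOR step(b), word by word.
// This is why ~128 one-bit observations pin down the 128-bit state via
// plain linear algebra.
function xorStates([a0, a1], [b0, b1]) {
  return [a0 ^ b0, a1 ^ b1];
}
```

Each `Math.random() > 0.5` observation leaks (roughly) the top bit of the state word that feeds the double conversion, i.e. one linear equation in the 128 unknown state bits.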

There is a series of challenges called "fastrology" from PlaidCTF 2023, which are about predicting Math.random() given some partial output (e.g. Math.floor(Math.random() * k)) with several variants. You can try to find writeups for those challenges; those solvers should be easy to adapt to predicting Math.random() > 0.5.


I've been studying this myself just a few days ago!

tl;dr: For a "Math.random() > 0.5" oracle, you need ~128 successive outputs, each of which leaks exactly 1 bit of the state at that particular iteration, and then you "just" solve a system of linear equations over GF(2) to recover the full state at any given iteration.

You can use z3 to solve it automagically, or you can write your own gaussian elimination solver (which works out much faster than z3 in practice).
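One reason hand-rolled elimination beats z3 here: you can pack each equation's coefficient bits into a single BigInt, so eliminating a variable from a row is one XOR. A toy sketch of that idea (my own illustration, not anyone's actual solver):

```javascript
// Solve A x = b over GF(2). Each row of A is a BigInt bitmask of n
// coefficient bits; b is an array of 0/1 right-hand sides. Assumes the
// system has a unique solution (free variables default to 0).
function solveGF2(rows, rhs, n) {
  rows = rows.slice();
  rhs = rhs.slice();
  const pivotRow = new Array(n).fill(-1);
  let r = 0;
  for (let col = 0; col < n && r < rows.length; col++) {
    const bit = 1n << BigInt(col);
    // Find a row at or below r with a 1 in this column.
    let p = r;
    while (p < rows.length && (rows[p] & bit) === 0n) p++;
    if (p === rows.length) continue; // free variable
    [rows[r], rows[p]] = [rows[p], rows[r]];
    [rhs[r], rhs[p]] = [rhs[p], rhs[r]];
    // Clear this column from every other row: one XOR per row.
    for (let i = 0; i < rows.length; i++) {
      if (i !== r && (rows[i] & bit) !== 0n) {
        rows[i] ^= rows[r];
        rhs[i] ^= rhs[r];
      }
    }
    pivotRow[col] = r;
    r++;
  }
  return Array.from({ length: n }, (_, col) =>
    pivotRow[col] === -1 ? 0 : rhs[pivotRow[col]]
  );
}
```

For the real attack, n = 128 and each row encodes which state bits XOR together to produce one observed output bit.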

Math isn't exactly my strong suit, so it took me quite a bit of bashing my head against the wall to make sense of things and turn it into code, so I've been planning on writing a blog post that explains this all in-depth (or at least, as much depth as I understand it).

For Chrome, there's an added catch in that it generates random numbers in batches, and returns those batches in reverse-order, so you'd need to do a small brute-force to identify where the batch boundaries are in your output sequence.

For Firefox and Safari, there's an added catch in that they add together the two halves of the state before returning it. This can also be worked around with a small brute force.


It's fun that it's exactly 1:1 and you don't need any more outputs than there are state bits!

It's sort of sad it isn't all AES-CTR or ChaCha12. Fast enough not to be the bottleneck in your webpage, and if you find any hint of a pattern in the output, good news, you get to publish a paper on it!


If you need secure random values you should be using https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR.... Absolutely do not use `Math.random` for anything requiring security.


Sorry, to clarify: I know CSPRNG APIs are exposed and are what you use for safe randomness.

What I'm saying is that outside rare, extreme situations--maybe some HPC Monte Carlo simulation where you need gobs of randomness--you might as well use strong primitives for all your randomness, because they work well and are now plenty cheap for the use case with hardware support, etc.

The game of finding a fast non-cryptographic function that passes enough statistical tests, etc., doesn't make as much sense to me when cryptographic randomness costs a few cycles a byte. There is not much upside to every standard library exposing a bad random generator that we have to warn people not to use. The handful of people who really need xorshift128+ can figure something out.

Maybe still a position you disagree with, but at least clearer without the sarcastic gloss.


The intention is that window.crypto is used when reversing should be difficult.

The goal for Math.random is to have much lower memory and speed impact. You can certainly disagree with their priorities but given those priorities, it makes sense to go with a much simpler algorithm that can be reversed easily.


Sorry, I know you should be using a CSPRNG API for cryptography, and probably my glib phrasing there obscured that.

I'm saying the cost of running actual cryptographic primitives has dropped a ton over time: on computers from decades ago, a cheaper flavor of pseudorandomness was clearly necessary; now hardware AES is very cheap. And webpages aren't typically doing massive HPC simulations or other things that would be bound by the PRNG taking a few cycles per byte.

So the memory/CPU benefit of keeping the bad PRNG around is not obviously still worth it to me. In your words, I think I disagree with their priorities, particularly because the cost savings are not what they used to be.


“the 2⁵² numbers between 0 and 1 that double precision floating point can represent”

It's a bit of a nitpick, but I believe there are 1023×2⁵² such numbers, which is quite a bit more. There are 2⁵² double-precision floats in just [0.5, 1)!


That's not a nitpick at all — it has huge ramifications for how `Math.random()` needs to be interpreted. Many folks want it (and transformations of it!) to behave as though it were a true random variable sampled from a theoretically ideal [0, 1) uniform distribution.

But it's not, and will never be. It can only ever return values that have been rounded down to some floating point number. Even if you realize this, you still might expect it to be able to return _any_ floating point number x∈[0, 1) with `eps(x)` probability. But that's not typically the case, either. Typical implementations round down to the previous multiple of `eps(1.0)` or `eps(0.5)`.
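This is easy to check empirically. The sketch below tests whether every output is an integer multiple of 2⁻⁵³ (which covers both the `eps(1.0)` and `eps(0.5)` bin sizes); this holds in current V8, but nothing in the spec guarantees it across engines or versions:

```javascript
// If outputs are always rounded down to a multiple of 2^-53, then
// x * 2^53 is an exact integer for every x (both values fit in a double
// exactly, so the multiplication introduces no rounding).
function allMultiplesOfUlp(n = 100000) {
  for (let i = 0; i < n; i++) {
    if (!Number.isInteger(Math.random() * 2 ** 53)) return false;
  }
  return true;
}
```

So out of the ~1023×2⁵² doubles in [0, 1), only 2⁵³ of them can ever come out of this generator.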



That article makes the same unstated assumption, attempting to treat `random()`'s output (and transformations thereof) as true random variables. But it just changes the rounding mode and bin size in a manner that _happens_ to work better for some transforms like log.

It changes the fundamental property of the distribution — the cdf — without stating it.

With floating point realizations of [0, 1) intervals:

   cdf(p) = P(random() < p) := p
With (0, 1] intervals:

   cdf(p) = P(random() <= p) := p


Using your cdf framing, while there is a point about [0, 1) versus (0, 1] intervals, the bigger point of the article is about whether said cdf holds for any IEEE-754 double-precision p, or whether it only holds for p of the form i*2^-53 (for integer i).


Oh I completely agree there's value in improving the accuracy of the cdf, especially for 32-bit floats! There it can be quite a surprise that tests like `random() < x` can be off by as much as 0.6% for values of x on the order of 1f-5, and 6% for values on the order of 1f-6, even though those are an order of magnitude or two above `eps(1f0)`.

Following your post, I came up with the following fast, straightforward, SIMD-friendly implementation that uses all 64 bits for a [0, 1) distribution:

    function random_float(rng)
        r = rand(rng, UInt64)
        last_bit = r & -r
        exponent = UInt64(2045)<<52 - reinterpret(UInt64, Float64(last_bit))
        exponent *= !iszero(r)
        fraction = ((r ⊻ last_bit)>>(8*sizeof(UInt64) - 52)) % UInt64
        return reinterpret(Float64, exponent | fraction)
    end


You are correct; however, picking floats uniformly like that is much more difficult, as some values should be hundreds of orders of magnitude more likely to be picked than others. It's much easier to implement a generator that randomizes the mantissa and calls it a day.


My favourite implementation is

    double random_double(rng_t *rng) {
        return (double)(random_uint64(rng) >> 11) * 0x1.0p-53;
    }
It has the advantage of not needing bit_cast (which C lacks) and has 53 instead of 52 bits of randomness.


I for one expect my random floating-point numbers to have the same distribution as a random bitfield interpreted as a float, and anything else should be considered a bug.


I don’t think that would be very useful, because there would be a 1/2048 chance of getting a NaN, and it would be difficult to convert the numbers to a known distribution.


In Java there's just Math.random(), and it's been broken, with WONTFIX, since 1998.

Java's random number generator returns exceptionally non-random values, but Sun, er, Oracle, won't fix it because of the most insane of reasons: unlike in any reasonable language, Java's PRNG essentially has a contract to be deterministic. There's seemingly a worry that someone, somewhere, is actually relying on java.util.Random to always produce the same random number sequence for a given seed from Java version to version.


Not Java, but a company I worked for had a Perl application where old sales records were encrypted and then put on DVDs for long-term cold storage. I was a lowly tape monkey at the time, so the actual details of the application were far above my pay grade, but during long nights between shuffling tapes and swapping DVDs I would try to figure out how the system worked. One thing I found out is that they were generating the encryption keys by combining a couple of values (something like secret + date, I think) and then hashing. They would then regenerate the key if the data ever needed to be retrieved. On its own this is... fine, I guess: cold records in a secure(ish) facility. The worrisome part was that they were using Perl's rand(seed) as a hash function. Young me was like, "Well, I don't know much about Perl, but I don't think there is any documented guarantee about how exactly the rand(seed) function works." A few quick tests later and I found out that yes, rand(seed) will return different values on different operating systems, and sometimes even changing the version of Perl was enough. They ended up having to make sure they went back to the same system the key was generated on, then regenerate and store all the keys used.

The lesson I learned, don't use random() when you want hash(). random is for non-deterministic output, hash is for deterministic output. random(seed) is an artifact of implementation and should never be used for deterministic output.

The real WTF was that this was Perl on Win98, probably due to when it was implemented; they wanted DVD-burning capability and someone sort of knew Perl.


Why is it unreasonable that given the same seed the same sequence will be generated? It's useful for generative art and video games at least. Sure it shouldn't be used for security...


Because it permanently prevents all bugfixes for a major library class. And java.util.Random has some whopping bugs.

If you need determinism, then the proper thing to do is use your own RNG. There's been a good Mersenne Twister implementation on Java since 1999. But a system-wide generator should never be beholden to determinism.


If you need determinism, and the standard library is deterministic, it seems reasonable to use the standard library.


Even if its high bits go 1 0 1 0 1 0 1 0 ... ?


I would say that's a bug in the RNG, nothing to do with any determinism issue


The point is that if you document your rng as deterministic, you can't fix any bugs.


Why do you believe that? Can you explain why not?


Because those bugfixes will change the random numbers generated.


That's fine (if you document it). Repeatability may mean "repeatability within the same program instance/minor version/major version". It's fine if you're clear. But ISWYM too.


Links to these whopping bugs pls?

I also don't think you've justified why determinism from a seed is a bad thing.


Name another language where random() is deterministic. Ask yourself why that is.

As to bugs. Let's start with the famous one:

http://alife.co.uk/nonrandom/

This is due to boneheaded mistakes in Sun's choice of constants for its LCG and errors in its bit-handling. These are massive mistakes, which it can no longer fix.

Next, there's an outstanding bug in nextBytes(), which generates ints and then cuts them into bytes (a big no-no for this particular LCG).

Some unfortunate omissions: nextChar(), nextShort(), nextByte() are missing, and there is no save-state procedure. There is no nextDouble() method that is inclusive for one or exclusive for zero.

There was a notorious bug in nextGaussian() which would take the log of 0 and then divide it by 0, but that has long been fixed. :-) [And some other bugs which were fixed early on despite Sun's claim that it couldn't fix bugs due to its stupid nondeterminism promise. For an RNG!]


Deterministic random number generation, where the same seed produces the same sequence of random numbers, can be achieved in various programming languages through different methods or libraries. Here are some programming languages and methods that allow for deterministic random number generation:

1. Python: The `random` module can be seeded to produce deterministic random sequences.

2. Java: Java's `Random` class can be seeded to produce deterministic random sequences.

3. C++: The C++ Standard Library provides functions like `srand` and `rand` that can be used for deterministic random number generation when seeded.

4. C#: C# offers the `System.Random` class, which can be seeded for deterministic randomness.

5. JavaScript: In browsers and Node.js, the `Math.random` function can be overridden to make it deterministic.

6. MATLAB: MATLAB's `rand` function can be made deterministic by setting the seed with `rng`.

7. Ruby: Ruby's `Kernel.rand` method can be seeded (via `srand`) for deterministic random numbers.

8. Julia: Julia provides a `Random.seed!` function to set the seed for deterministic randomness.

9. R: R has functions like `set.seed` to control the randomness and make it deterministic.

10. Swift: In Swift, you can supply your own seeded `RandomNumberGenerator` implementation for deterministic random numbers.

These are just a few examples, and many other programming languages and libraries offer similar functionality for deterministic random number generation when a specific seed value is used.


Any PRNG will return the same sequence when starting with the same seed. That's basically the point of it and it's quite a useful and important property, e.g. for simulations where you then just have to document your chosen PRNG and the seed for others to replicate your results instead of also shipping gibibytes of random numbers.

And since the exact algorithm has been documented for java.util.Random you can't just change it.
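The documented algorithm is small enough to replicate anywhere, which is exactly why it can never change. A sketch of `java.util.Random`'s core in JavaScript, with the 48-bit LCG constants taken straight from the Java SE documentation:

```javascript
// java.util.Random is a 48-bit LCG whose exact constants (0x5DEECE66D, 0xB)
// are part of the documented API, so any conforming implementation must
// reproduce the same sequence for the same seed.
const MASK48 = (1n << 48n) - 1n;

function javaRandom(seed) {
  let s = (BigInt(seed) ^ 0x5deece66dn) & MASK48; // documented seed scramble
  return {
    nextInt() {
      s = (s * 0x5deece66dn + 0xbn) & MASK48;
      return BigInt.asIntN(32, s >> 16n); // top 32 bits, as a signed int
    },
  };
}
```

Two instances seeded identically march in lockstep forever, in any language you port this to.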

That being said, .NET had a similar problem, with System.Random having some drawbacks and bugs over the years. They chose to keep the algorithm the same when a seed is used (again, it's an important property you don't just break), but otherwise switch to something completely different. It gets more complicated even, as you can derive from Random there and some of those changes may be observable, so they also check for that: https://source.dot.net/#System.Private.CoreLib/src/libraries...


[Sigh] Seems people are misunderstanding what I meant by determinism; my apologies. Of course a given instance of an RNG will generate the same sequence given the same seed. That's not the issue.

The issue is Java historically has guaranteed that, for all future versions and implementations of Java, the RNG will produce the exact same sequence given the same seed. It is deterministic across language versions and implementations. This is extremely unusual for a programming language, and a very bad idea. Java's RNG has grievous errors: but these bugs cannot be fixed because it would change the sequence! Its bugs are fixed in stone.


> Name another language where random() is deterministic

pretty well all of them, surely? eg. https://learn.microsoft.com/en-us/sql/t-sql/functions/rand-t... "For a specified seed value, the result returned is always the same."

Pretty well every language I'm aware of does it this way. Are we even talking about the same thing?

As for the others you point out, I'm afraid I can't speak for that. I'll just have to accept what you say.


Julia, for example promises that seeded random numbers will only be the same within the minor version and there's a 3rd party package if you need reproducible random numbers. Guaranteeing cross version random number compatibility comes at a pretty major cost for performance (as new better algorithms are discovered) and removes your ability to bugfix the random number generator.


Because random(seed) has no specification for what the output will be. random() should only be used if you specifically want a non-deterministic value. The function you want to use for uniform distribution in a deterministic manner is hash(value)



I guess sort of with the new proposal.


When they changed the iteration order of HashMap keys (1.7 I think?), tons of code broke, although fortunately mostly tests. It was easy to depend on it by doing something like creating a gold file test with expected results.

They don't want code to break like that, which means they need to preserve this sort of thing.


Isn't it normal to set a fixed seed in testing and then expect deterministic results so you can write simpler assertions?


Maybe it's Minecraft?


I don't understand why you'd want a floating point random.

It just seems like a terrible idea (as opposed to int)


The random floating-point numbers are not general-purpose, they are needed in applications whose input parameters are floating-point numbers.

For instance, they are needed for testing the implementations of various functions of FP numbers, or for the so-called Monte Carlo methods used for numerical integration (especially in multiple dimensions) or for solving certain systems of equations with partial derivatives. They are also used to simulate how some industrial product, e.g. an electronic integrated circuit, behaves when its components have random values of their characteristic parameters, within their accepted tolerances, as always happens in any industrial production process. Or the random numbers may be used in an optimization problem, to search through the possible solution space.

For random floating-point numbers it is frequent to need various other probability density distributions than uniform.

In most cases the best way to generate pseudo-random FP numbers is to derive them from uniform pseudo-random integers, using an appropriate method that will ensure the desired probability density distribution.


The uniform distribution of real numbers is easy to convert to other distributions, so if you want a language to have only one randomness function, it's a reasonable choice.
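For example, inverse-transform sampling: feed the uniform value through the inverse CDF of the target distribution. A sketch for the exponential distribution:

```javascript
// Inverse-transform sampling: if U is uniform on [0, 1), then F⁻¹(U) is
// distributed with CDF F. For Exp(λ), F(x) = 1 - e^(-λx), so
// F⁻¹(u) = -ln(1 - u) / λ.
function expInverseCdf(u, lambda) {
  return -Math.log(1 - u) / lambda;
}

function randomExponential(lambda) {
  // Using 1 - u keeps the argument of log strictly positive, since
  // Math.random() can return 0 but never 1.
  return expInverseCdf(Math.random(), lambda);
}
```

The same trick works for any distribution whose CDF you can invert, which is why one good uniform source is enough.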


This is in JavaScript. All of the numbers are floating point (or were, initially).


Many of the algorithms for generating random numbers from different distributions want to start with uniform floats in [0,1) or (0,1].


In my field (music, video, games) random floating point numbers are ubiquitous.


Make no mistake however: even though xorshift128+ is a huge improvement over MWC1616, it still is not cryptographically secure. For use cases such as hashing, signature generation, and encryption/decryption, ordinary PRNGs are unsuitable. The Web Cryptography API introduces window.crypto.getRandomValues, a method that returns cryptographically secure random values, at a performance cost.

If the 256-bit private key of the encryption is derived from a large character set (A-Z, 1-9, etc.), it does not matter if the RNG is not perfect. I am assuming it's not an online channel.

This is explained in more detail here https://www.reddit.com/r/cryptography/comments/fw2cdu/can_yo...



