Thursday, December 17, 2015

There's Math.random(), and then there's Math.random()

Randomness visualization before PRNG update (V8 4.7) and after (V8 4.9) (visualization by Mike Malone, http://bl.ocks.org/mmalone/bf59aa2e44c44dde78ac, CC-BY-4.0)

Math.random()
Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.


Math.random() is the most well-known and frequently-used source of randomness in Javascript. In V8 and most other Javascript engines, it is implemented using a pseudo-random number generator (PRNG). As with all PRNGs, the random number is derived from an internal state, which is altered by a fixed algorithm for every new random number. So for a given initial state, the sequence of random numbers is deterministic. Since the bit size n of the internal state is limited, the numbers that a PRNG generates will eventually repeat themselves. The upper bound for the period length of this permutation cycle is of course 2n.

There are many different PRNG algorithms; among the most well-known ones are Mersenne-Twister and LCG. Each has its particular characteristics, advantages, and drawbacks. Ideally, it would use as little memory as possible for the initial state, be quick to perform, have a large period length, and offer a high quality random distribution. While memory usage, performance, and period length can easily be measured or calculated, the quality is harder to determine. There is a lot of math behind statistical tests to check the quality of random numbers. The de-facto standard PRNG test suite, TestU01, implements many of these tests.

Until recently (up to version 4.9.40), V8’s choice of PRNG was MWC1616 (multiply with carry, combining two 16-bit parts). It uses 64 bits of internal state and looks roughly like this:
uint32_t state0 = 1;
uint32_t state1 = 2;
uint32_t mwc1616() {
  state0 = 18030 * (state0 & 0xffff) + (state0 >> 16);
  state1 = 30903 * (state1 & 0xffff) + (state1 >> 16);
  return state0 << 16 + (state1 & 0xffff);
The 32-bit value is then turned into a floating point number between 0 and 1 in agreement with the specification.

MWC1616 uses little memory and is pretty fast to compute, but unfortunately offers sub-par quality:

  • The number of random values it can generate is limited to 232 as opposed to the 252 numbers between 0 and 1 that double precision floating point can represent.
  • The more significant upper half of the result is almost entirely dependent on the value of state0. The period length would be at most 232, but instead of few large permutation cycles, there are many short ones. With a badly chosen initial state, the cycle length could be less than 40 million.
  • It fails many statistical tests in the TestU01 suite.

This has been pointed out to us, and having understood the problem and after some research, we decided to reimplement Math.random based on an algorithm called xorshift128+. It uses 128 bits of internal state, has a period length of 2128 - 1, and passes all tests from the TestU01 suite.
uint64_t state0 = 1;
uint64_t state1 = 2;
uint64_t xorshift128plus() {
  uint64_t s1 = state0;
  uint64_t s0 = state1;
  state0 = s0;
  s1 ^= s1 << 23;
  s1 ^= s1 >> 17;
  s1 ^= s0;
  s1 ^= s0 >> 26;
  state1 = s1;
  return state0 + state1;
}
The new implementation landed in V8 4.9.41.0 within a few days of us becoming aware of the issue. It will become available with Chrome 49. Both Firefox and Safari switched to xorshift128+ as well.

Make no mistake however: even though xorshift128+ is a huge improvement over MWC1616, it still is not cryptographically secure. For use cases such as hashing, signature generation, and encryption/decryption, ordinary PRNGs are unsuitable. The Web Cryptography API introduces window.crypto.getRandomValues, a method that returns cryptographically secure random values, at a performance cost.

Please keep in mind, if you find areas of improvement in V8 and Chrome, even ones that—like this one—do not directly affect spec compliance, stability, or security, please file an issue on our bug tracker.

Posted by Yang Guo, Software Engineer and dice designer

26 comments:

  1. >The number of random values it can generate is limited to 2**32 as opposed to the 2**52 numbers between 0 and 1 that double precision floating point can represent.

    double precision floating point can represent ~2**62 values between +0 and 1, but with "uniform distribution" - 2**53 values

    ReplyDelete
  2. you forgot "return state1 + s0"

    ReplyDelete
  3. Interesting, why didn't you choose PCG? It's got better statistical properties than xorshift* (from what I hear). I integrated it in a product last year, and as I remember it's only two or three lines more code than xorshift.

    ReplyDelete
    Replies
    1. A program can be shorter, but use less CPU cycles. For PCG, it has `^ oldstate` which is very very expensive.

      Delete
  4. I might be biased, having invented xorshift128+, but I'd be extremely careful before taking from granted any claim made about PCG generators. Wait at least until the paper is published, if it ever happens.

    ReplyDelete
    Replies
    1. This is precisely the reason I chose xorshift128+ over PCG. I wanted to go with something known to be a good and safe choice, after the bad PRNG we had before. Xorshift128+ serves our needs pretty well. Thanks for inventing it!

      Delete
    2. Hey, since you're here, two minor observations (really, really minor, but you get obsessed when you start working on this stuff):

      - The current xorshift128+ on the site has slightly different constants (23/18/5). I changed them because I wanted to have perfect polynomial weight (65 vs. 63), which is theoretically better. The resulting "escape from zero" time is also slightly better. However, there are 10%+ "suspect" p-values over 100 BigCrush tests (most likely a statistical fluctuation). So don't panic if you see different numbers. :)

      - Technically, it would be better if you used the upper 52 bits, rather than the lower 52 bits, to generate a double. The lowest bit of a xorshift128+ generator is an LSFR, and while people has been happy using LSFR for decades, it is slightly inferior in quality to all other bits. This is really OCD, as computational errors makes the lowest bit almost irrelevant, but now you know.

      I'm really happy you picked up xorshift128+. When I wrote "...making it an excellent drop-in substitute for the low-dimensional generators found in many programming languages" I was hoping that exactly this would happen :).

      Delete
    3. Another minor observation: I suggest now to use SplitMix (http://xorshift.di.unimi.it/splitmix64.c) to initialize the state bits. The reason is that MurmurHash3's avalanching function has 0 as fixpoint, so the 0 case must be handled manually. If presently someone calls SetSeed(0), MurmurHash3 will return zero twice and the generator will be stuck in state 0. I used to patch manually the 0 seed in my Java code, but using SplitMix is a nicer solution.

      Delete
    4. Thanks for the suggestions! I will definitely revisit the current implementation with your tips in mind.

      Delete
    5. In fairness: many of the PCG claims are technically verifiable. You don't need to take them for granted. You can benchmark performance claims; and you can run the statistical tests yourself. Also, since the claim isn't cryptographic security, I'm not sure what the huge risk is - as V8's old implementation of random demonstrates - despite huge flaws, it was still good enough to for many use cases (and most use cases *are* trivial). It needn't be perfect; it needs to be good enough.

      Also, although from an academic point of view you might claim the paper isn't published, that's not strictly true - http://www.pcg-random.org/paper.html - it simply hasn't withstood peer review yet. And while that's worth something, it's by no means a sure seal of quality.

      Not that I disagree that using a conservative choice is a good idea.

      Delete
    6. Which ones? Several claims on the PCG site are false, or weasel words (e.g., xorshift64* has "no" for "k-dimensional equidistribution", but it is actually equidistributed in the maximum dimension).

      Please be precise. You should also be precise about which generator you have in mind—the PCG general definition covers basically any generator ever conceived.

      Benchmarks have been done, and the results are at http://prng.di.unimi.it/ . The code used is publicly distributed. Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.

      Delete
    7. "Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*."

      At last, someone (Sebastiano himself) mentions the obvious, seemingly omitted bit of information anywhere that I have seen on that PCG website –the very existence of our beloved xorshift1024* and xorshift128+ algorithms–. :)

      Delete
  5. Which statistics tests did you prepared? What is the value of Chi-square obtained for the whole period?

    ReplyDelete
    Replies
    1. The blog post states that TestU01 is used. It contains a many different statistical tests.

      Delete
    2. The blog post states that TestU01 is used. It contains a many different statistical tests.

      Delete
  6. I tested xorshift128+ a bit last night. It's probably better than what most people are using. I wouldn't use it for some big physics simulation that needs a terabyte of very unbiased non-correlated data. But hopefully no-one does that in javascript. The most significant 17 bits are fairly poor quality. The 1 least significant bit is very poor quality. The 46 bits in between are ok as far as my limited testing goes.

    As for PCG, just taking a single stream, it's quality seems very good. But it looks very touchy about seed selection (more so than xorshift128+). I haven't fully tested that part yet, but i urge caution until it's been looked at more closely.

    ReplyDelete
    Replies
    1. Can you be more precise about the kind of tests? The lowest bit is an LSFR, so it's quality is lower than the rest, but that's known.

      Delete
  7. pracrand, or the testunif suite from gjrand. Both are on Sourceforge. They quickly bust the full 64 bits. Then try extracting just 32 bits from each call, from various positions. I tried the version given in the main article here. I haven't tried your most recent version. I intend to do that tonight. So should you. :-)

    ReplyDelete
    Replies
    1. Sorry, but that's not precise at all. Which tests? With which p-values? How many seeds did you try? Do you have an output of the tests you can share? "Bust" doesn't mean anything.

      Delete
  8. Ok, this is a bit imprecise because the computer i do random number stuff on is not net-connected and is on the opposite side of town to the computer i do the net on, so this is just from memory. I'll take some more notes before i get back on the net next time, but really you'll be a lot more convinced if you try this for yourself.

    pracrand has a program called RNGtest or something like that. I wrote a program that does your generator and writes random numbers to standard output as raw binary. I pipe the output to RNGtest -stdin or something like that (read the documentation for "tools".) This does a variety of tests. I don't recall exactly which ones found the problem, but there's more than one. It can do as much data as you want, and for this generator the P-value gets smaller the longer you go. For this generator using the full 64 bits pracrand is quicker than gjrand.

    For gjrand the mcp program in the testunif directory runs a variety of tests. I think binr finds the least significant bit problem and both z9 and z9 -t find the bad most significant bits. Again you can run as much data as you want and the P values just get lower and lower. 10 minutes runtime should see a low enough value to be distressing. Eventually they underflow the IEEE double precision and report as zero.

    Both the pracrand and gjrand authors are of the opinion that you shouldn't reject generators on the basis of a few bad tests "at the 95% confidence interval". If you are in any doubt, do the test again with more data until the P value is so low you can't dismiss it as a fluke, they say.

    Over the weekend i tested the version on your website with shift counts 23, 18, 5, only with gjrand so far. It looks slightly better. The least significant bit is still linear, obviously. Now only the most significant 5 bits look bad. The 58 bits in between look good to limited testing so far. It could be argued that 58 good bits is plenty to do a double precision variate. Certainly i think so. But there are people who disagree.

    ReplyDelete
  9. In pracrand the program is RNG_test.
    ./RNG_test --help
    has the best documentation.
    ./RNG_test stdin64
    or
    ./RNG_test stdin64 -tlmaxonly
    which continues after the first failure. Eventually (maybe 10 minutes) it finds more problems.

    With gjrand, first the src subdirectory and
    ./compile
    then back up one level and the testunif/src subdirectory, again
    ./compile
    cd ..
    ls
    should show a program called mcp .
    That's the one.
    Try
    ./mcp --tiny < /dev/urandom
    should be good.
    ./mcp --tiny < /dev/zero
    should be really bad
    myprog | ./mcp --tiny
    should pass
    myprog | ./mcp --big
    should fail. Assuming myprog is something you write which runs your generator, and writes raw binary output to stdout.
    All this assumes something fairly similar to linux with gcc.

    ReplyDelete
  10. The only thing I see is binary rank failures caused by the lowest bit (known) and on the whole output for preposterously large sequences (>=64GB), which is not surprising (TestU01 has some fixed matrix size <=5000 if I remember correctly).

    Considering that everybody uses since 20 years everywhere generators failing the same tests with much less bits, that doesn't seem to be a real problem.

    It's much more interesting to examine the behaviour of passed tests on a global scale using POPs (p-values of p-values) as suggested by Knuth and by the NIST. BigCrush implements a significantly larger set of tests than pracrand, and generates hundreds of p-values. By running hundreds of tests you get global information about the generator behaviour, as you can see whether p-values are uniformly distributed or not. Even if a test passes at all 100 points, it might have the wrong distribution, which highlights a global misbehaviour.

    This actually happens, for example, in a PCG64 generator I tried which appears to pass pracrand. xorshift128+ has no such issues.

    ReplyDelete
    Replies
    1. Just to make the picture clearer: after running a test at k different seeds the k p-values should be both good (within reasonable bounds) and distributed uniformly. NIST document (NIST Statistical Test Suite) states that P-values for a particular test can be considered uniformly distributed, if its POP is greater or equal than 0.0001.

      Also, I would like to take this opportunity to say a big “thank you” to Sebastiano Vigna. I just wanted to let you know that the design and analysis of xorshift64* xorshift1024*, xorshift128+, etc. is really very insightful, well summarized and absolutely delightful to read. Thank you for doing a great job!

      Delete
  11. Why isn't Math.random() seedable? This is a nearly trivial bit of functionality to add, but it's crucial for some applications. On Stackoverflow and other places, people are passing around Javascript RNGs with, apparently, almost no idea of what they're doing, simply because they need a seedable RNG. I write simulations. A non-seedable RNG is nearly useless to me, since I have to be able to rerun the same simulation to find out what happened along the way. I do know where to find trustworthy seedable Javascript RNGs, but I'd rather just trust that you folks have provided a high-quality, fast algorithm, and use the one in the browser. Thanks.

    ReplyDelete