I don't have the resources to try this unfortunately. I highly encourage somebod...

eesmith · on Oct 11, 2024

Yes, you do. You can download the digits at https://www.pilookup.com/download.html . You can generate models for different-sized subsets. If it's really non-random, you should see the predictability stabilize as the subsets get larger.

Otherwise you risk being seen as yet another math crank.

seccode · on Oct 11, 2024

The issue is not getting the digits; it is running a large model on larger digit ranges. I tried running with 10,000,000 digits and haven't gotten a prediction yet.

seccode · on Oct 11, 2024

Also, I am testing ranges of digits other than the first 10,000. The problem with the other ranges is that the distribution of digits is highly imbalanced, and the model is not showing statistical significance there. Models have a harder time when the class distribution is not 50/50, so I think it's not quite fair to evaluate the model on these ranges.

So why do you think the first 10,000 digits are somewhat predictable?

eesmith · on Oct 11, 2024

The distribution of digits is 'highly imbalanced' because that's what random distributions look like. I'll randomly select a digit from 0-9 10,000 times and show the distribution, then do the same with the first 10,000 digits of pi, then do the random distribution again:

  >>> import random
  >>> from collections import Counter
  >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
  >>> for digit, count in ctr.most_common():
  ...   print(f"{digit}: {count}")
  ...
  2: 1039
  4: 1035
  0: 1031
  7: 1022
  3: 1008
  6: 998
  1: 976
  5: 973
  9: 963
  8: 955
  >>> pi_ctr = Counter(open("1-10000.txt").read().rstrip())
  >>> for digit, count in pi_ctr.most_common():
  ...   print(f"{digit}: {count}")
  ...
  5: 1046
  1: 1026
  2: 1021
  6: 1021
  9: 1014
  4: 1012
  3: 974
  7: 970
  0: 968
  8: 948
  >>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
  >>> for digit, count in ctr.most_common(): print(f"{digit}: {count}")
  ...
  8: 1060
  2: 1048
  0: 1034
  4: 1026
  5: 1025
  3: 979
  7: 977
  6: 960
  1: 956
  9: 935

You can see that the distribution of pi's first 10,000 digits is what one should expect for a random distribution. If your method requires a 50/50 distribution then it cannot be used for this purpose.
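One way to put a number on "what one should expect" is a chi-square goodness-of-fit statistic against a uniform distribution. This is only a sketch and not part of the session above: it reuses the pi counts printed there, and the 16.919 threshold is the standard 5% critical value for 9 degrees of freedom.

```python
# Digit counts for the first 10,000 digits of pi, copied from the session above.
pi_counts = {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
             4: 1012, 3: 974, 7: 970, 0: 968, 8: 948}

total = sum(pi_counts.values())
expected = total / 10  # uniform digits would give 1,000 of each

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((observed - expected) ** 2 / expected
           for observed in pi_counts.values())

CRITICAL_5PCT_DF9 = 16.919  # chi-square 5% critical value, 9 degrees of freedom
print(f"chi-square = {chi2:.3f}")
print("consistent with uniform at the 5% level:", chi2 < CRITICAL_5PCT_DF9)
```

The statistic comes out near 9.3, well under the threshold, so these counts give no reason to reject uniformity.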

Also, you are thinking about it wrong. The first 10,000 digits of pi are perfectly predictable.

seccode · on Oct 11, 2024

I'm not predicting the number; I'm predicting number%2==0. The model predicted better than the distribution probability.

eesmith · on Oct 12, 2024

It doesn't really matter. There are 4970 even digits and 5030 odd digits in the first 10,000. Predicting all odds gives you a better-than-even chance of being right.
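Those two totals can be checked from the per-digit counts in the earlier session. A minimal sketch, with the counts copied by hand rather than read from the file:

```python
# Digit counts for the first 10,000 digits of pi, copied from the earlier session.
pi_counts = {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
             4: 1012, 3: 974, 7: 970, 0: 968, 8: 948}

even = sum(count for digit, count in pi_counts.items() if digit % 2 == 0)
odd = sum(count for digit, count in pi_counts.items() if digit % 2 == 1)

# Accuracy of the trivial predictor that always guesses the majority class.
baseline = max(even, odd) / (even + odd)
print(f"even={even} odd={odd} always-guess-majority accuracy={baseline}")
```

Any claimed predictability on this range has to be measured against that 0.503 baseline, not against 0.5.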

What does "highly imbalanced" mean?

How often will a random sequence be "highly imbalanced"?

How many people used another model, found no pattern, and never reported it?

You have plenty of data to work with. Try the second 10,000, the third 10,000 and so on.
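A sketch of how the even/odd balance could be tabulated block by block. The helper name `parity_by_block` and the file name in the comment are made up for illustration; the demo uses only the first 20 decimal digits of pi so that it runs without any download.

```python
def parity_by_block(digits, block_size=10_000):
    """Return a list of (even_count, odd_count), one pair per full block.

    A trailing partial block is ignored so every block is the same size.
    """
    results = []
    for start in range(0, len(digits) - block_size + 1, block_size):
        block = digits[start:start + block_size]
        even = sum(1 for ch in block if int(ch) % 2 == 0)
        results.append((even, block_size - even))
    return results

# With a real download you might do something like (hypothetical file name):
#   digits = open("pi-digits.txt").read().strip()
#   for i, (even, odd) in enumerate(parity_by_block(digits)): print(i, even, odd)

# Tiny demo: the first 20 decimal digits of pi, in blocks of 10.
demo = parity_by_block("14159265358979323846", block_size=10)
print(demo)
```

Running the same model on each block, and comparing it with the always-guess-majority accuracy for that block, shows whether the effect survives outside the first 10,000 digits.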

Keep clear in your mind that a lot of people worked on this problem, including trained mathematicians. It is far more likely that you do not fully understand what you are doing than that they are wrong. Believing otherwise is the path of crankdom.

seccode · on Oct 12, 2024

Better to use statistical significance tests to talk about what is "far more likely".
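Such a test might look like the following sketch. It uses the normal approximation to the binomial (no continuity correction), and the 5,100-correct figure is a made-up placeholder, not a result reported in this thread. The baseline is the always-guess-odd accuracy of 0.503 mentioned above.

```python
import math

def one_sided_p_value(correct, n, baseline):
    """Approximate P(at least `correct` hits in n trials) for a predictor
    whose true accuracy is `baseline` (normal approximation to the binomial)."""
    mean = n * baseline
    std = math.sqrt(n * baseline * (1 - baseline))
    z = (correct - mean) / std
    # Upper-tail probability of a standard normal at z.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical numbers: 5,100 correct out of 10,000 against the 0.503 baseline.
p = one_sided_p_value(correct=5100, n=10_000, baseline=0.503)
print(f"one-sided p-value = {p:.3f}")
```

Note that the p-value only holds if the evaluated digits were not also used to choose or tune the model, and it has to be corrected for every range and model that was tried.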

seccode · on Oct 12, 2024

It doesn't predict better than even; it predicts better than the distribution probability.

lifthrasiir · on Oct 11, 2024

Because "somewhat predictable" doesn't mean "non-random". In fact, almost all prefixes of algorithmically random bit sequences are somewhat predictable with an appropriate definition of "somewhat", because you can find and exploit an accidental bias from taking the prefix and any such bias translates to some predictability.

(Another possibility is that, since pi itself is not really algorithmically random, the classifier was somehow able to learn how to partially compute pi! That's another pitfall you need to avoid even when you have a good understanding of information theory and statistics...)
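The accidental-bias point can be illustrated with a small simulation. This is a sketch under simple assumptions: bits come from Python's pseudo-random generator with a fixed seed, and the "model" is just the majority bit of the prefix, scored on that same prefix.

```python
import random

def majority_accuracy(n_bits, rng):
    """Accuracy of always guessing the majority bit of a random n-bit prefix,
    scored on that same prefix."""
    bits = rng.getrandbits(n_bits)
    ones = bin(bits).count("1")
    return max(ones, n_bits - ones) / n_bits

rng = random.Random(0)  # fixed seed so the run is repeatable
accuracies = [majority_accuracy(10_000, rng) for _ in range(200)]
mean_accuracy = sum(accuracies) / len(accuracies)

print(f"min accuracy  = {min(accuracies):.4f}")
print(f"mean accuracy = {mean_accuracy:.4f}")
```

Every prefix scores at least 0.5 by construction, and the average sits slightly above it, even though the underlying bits have no structure to learn.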