Yes, you do. You can download the digits at https://www.pilookup.com/download.html . Build models on subsets of different sizes. If the digits really are non-random, you should see the predictability stabilize as the subsets get larger.
Otherwise you risk being seen as yet another math crank.
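Here is a minimal sketch of that experiment. It makes two assumptions: the task is predicting whether the next digit is odd or even, and a tiny order-k context-count model stands in for whatever model is actually being trained. `markov_parity_accuracy` is a hypothetical helper, and the random digits are placeholders for digits read from the downloaded file.

```python
import random
from collections import Counter, defaultdict


def markov_parity_accuracy(digits: str, k: int = 3, train_frac: float = 0.8) -> float:
    """Hypothetical helper: fit an order-k parity model on the first part of
    `digits` and return its accuracy on the held-out remainder."""
    parity = [int(d) % 2 for d in digits]  # 1 = odd, 0 = even
    split = int(len(parity) * train_frac)

    # For each context of k previous parities, count which parity followed it.
    table = defaultdict(Counter)
    for i in range(k, split):
        table[tuple(parity[i - k:i])][parity[i]] += 1

    # Contexts never seen in training fall back to the majority training parity.
    fallback = Counter(parity[:split]).most_common(1)[0][0]

    correct = total = 0
    for i in range(max(k, split), len(parity)):
        seen = table.get(tuple(parity[i - k:i]))
        guess = seen.most_common(1)[0][0] if seen else fallback
        correct += guess == parity[i]
        total += 1
    if total == 0:
        raise ValueError("not enough digits for a held-out set")
    return correct / total


rng = random.Random(0)
for n in (1_000, 10_000, 100_000):
    # Placeholder data; substitute the first n digits of pi here.
    digits = "".join(rng.choices("0123456789", k=n))
    print(n, round(markov_parity_accuracy(digits), 4))
```

On genuinely random digits the held-out accuracy should wander around 0.5, with the wobble shrinking as n grows. A real signal would show up as accuracy that settles above the majority-class rate as n grows, instead of drifting back toward it.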
The issue is not getting the digits; it is running a large model on larger digit ranges. I tried running with 10,000,000 digits and still haven't gotten a prediction.
Also, I am testing ranges of digits other than the first 10,000. The problem with those ranges is that the distribution of digits is highly imbalanced and the model does not reach statistical significance. Models have a harder time when the class distribution is not 50/50, so I don't think it's quite fair to evaluate the model on these ranges.
So why do you think the first 10,000 digits are somewhat predictable?
The distribution of digits is 'highly imbalanced' because that's what random distributions look like. I'll draw 10,000 random digits from 0-9 and show the distribution, then do the same for the first 10,000 digits of pi, then draw another 10,000 random digits:
>>> import random
>>> from collections import Counter
>>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
>>> for digit, count in ctr.most_common():
...     print(f"{digit}: {count}")
...
2: 1039
4: 1035
0: 1031
7: 1022
3: 1008
6: 998
1: 976
5: 973
9: 963
8: 955
>>> pi_ctr = Counter(open("1-10000.txt").read().rstrip())
>>> for digit, count in pi_ctr.most_common():
...     print(f"{digit}: {count}")
...
5: 1046
1: 1026
2: 1021
6: 1021
9: 1014
4: 1012
3: 974
7: 970
0: 968
8: 948
>>> ctr = Counter(random.choice(range(10)) for i in range(10_000))
>>> for digit, count in ctr.most_common(): print(f"{digit}: {count}")
...
8: 1060
2: 1048
0: 1034
4: 1026
5: 1025
3: 979
7: 977
6: 960
1: 956
9: 935
You can see that the distribution of pi's first 10,000 digits is just what one should expect from a random distribution. If your method requires a 50/50 distribution, then it cannot be used for this purpose.
Also, you are thinking about it wrong. The first 10,000 digits of pi are perfectly predictable.
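To put a number on how uniform the three digit tables above are, here is a Pearson chi-squared goodness-of-fit test of each against a uniform distribution over 0-9. The counts are copied from the printouts; the 16.919 threshold is the standard table value for 9 degrees of freedom at the 5% level, and the variable names are illustrative.

```python
# Digit counts copied from the three Counter printouts above.
SAMPLES = {
    "random #1": {2: 1039, 4: 1035, 0: 1031, 7: 1022, 3: 1008,
                  6: 998, 1: 976, 5: 973, 9: 963, 8: 955},
    "pi": {5: 1046, 1: 1026, 2: 1021, 6: 1021, 9: 1014,
           4: 1012, 3: 974, 7: 970, 0: 968, 8: 948},
    "random #2": {8: 1060, 2: 1048, 0: 1034, 4: 1026, 5: 1025,
                  3: 979, 7: 977, 6: 960, 1: 956, 9: 935},
}

# Upper 5% point of the chi-squared distribution with 9 degrees of freedom
# (10 digit categories minus 1), taken from standard tables.
CRITICAL_9DF_5PCT = 16.919


def chi_squared_uniform(counts: dict) -> float:
    """Pearson chi-squared statistic of digit counts against a uniform 0-9 distribution."""
    n = sum(counts.values())
    expected = n / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


for name, counts in SAMPLES.items():
    stat = chi_squared_uniform(counts)
    verdict = "rejects" if stat > CRITICAL_9DF_5PCT else "is consistent with"
    print(f"{name}: chi2 = {stat:.3f}, {verdict} uniformity at the 5% level")
```

With these counts the statistics come out to 8.958 (random #1), 9.318 (pi) and 17.092 (random #2). Pi's digits sit comfortably inside what a uniform source produces, while the second genuinely random sample lands just over the 5% threshold. That is the kind of "imbalance" a truly random source produces about 1 time in 20. If SciPy is available, `scipy.stats.chisquare` computes the same statistic along with a p-value.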
It doesn't really matter. There are 4970 even digits and 5030 odd digits in the first 10,000. Always predicting "odd" already gives you a better-than-even chance of being right (50.3%).
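The arithmetic behind that, worked from the pi counts printed earlier in the thread: the majority-class baseline that any parity classifier has to beat, and how far the odd count sits from an even split in units of the binomial standard deviation. The z-score uses a normal approximation and assumes that, under the random hypothesis, each digit is independently odd with probability 1/2.

```python
import math

# Digit counts for the first 10,000 digits of pi, copied from the Counter output above.
pi_counts = {"5": 1046, "1": 1026, "2": 1021, "6": 1021, "9": 1014,
             "4": 1012, "3": 974, "7": 970, "0": 968, "8": 948}

odd = sum(c for d, c in pi_counts.items() if int(d) % 2 == 1)
even = sum(c for d, c in pi_counts.items() if int(d) % 2 == 0)
n = odd + even

# Accuracy of the trivial model that always predicts the majority class.
baseline = max(odd, even) / n

# Normal approximation to Binomial(n, 1/2): standard deviations between
# the observed odd count and the expected n/2.
z = (odd - n / 2) / math.sqrt(n * 0.25)

print(odd, even, baseline, z)
```

This gives 5030 odd, 4970 even, a baseline of 0.503 and z = 0.6. A z-score of 0.6 is far below the usual 1.96 cutoff for a two-sided 5% test, so the 50.3% baseline is no evidence of structure. It only means a parity model has to beat 50.3%, not 50%, before it shows anything.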
What does "highly unbalanced" mean?
How often will a random sequence be "highly unbalanced"?
How many people used another model, found no pattern, and never reported it?
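Both of the last two questions can be answered by simulation. This sketch assumes the quantity of interest is the odd/even split of 10,000 digits, and models "another model" as a coin-flip guesser with no information at all. It relies on the fact that a uniform digit 0-9 is odd with probability exactly 1/2, so counting odd digits is equivalent to counting 1 bits from `random.getrandbits`. The helper names are hypothetical.

```python
import random

N_DIGITS = 10_000
OBSERVED_GAP = 30  # pi's first 10,000 digits: 5030 odd vs the 5000 expected


def odd_count(rng: random.Random, n: int = N_DIGITS) -> int:
    """Number of odd digits among n uniform random digits 0-9.

    Each such digit is odd with probability exactly 1/2, independently, so this
    is distributed like the number of 1 bits in n random bits."""
    return bin(rng.getrandbits(n)).count("1")


def frac_as_unbalanced(trials: int, seed: int = 0) -> float:
    """Hypothetical helper: fraction of random sequences whose odd/even split
    is at least as far from 50/50 as pi's."""
    rng = random.Random(seed)
    hits = sum(abs(odd_count(rng) - N_DIGITS // 2) >= OBSERVED_GAP
               for _ in range(trials))
    return hits / trials


def best_of_many_guessers(n_models: int, seed: int = 0) -> float:
    """Hypothetical helper: best accuracy among n_models coin-flip 'models'
    that each guess odd/even for the same random labels."""
    rng = random.Random(seed)
    labels = rng.getrandbits(N_DIGITS)
    best = 0.0
    for _ in range(n_models):
        guesses = rng.getrandbits(N_DIGITS)
        # A guess is wrong exactly where guesses XOR labels has a 1 bit.
        wrong = bin(guesses ^ labels).count("1")
        best = max(best, (N_DIGITS - wrong) / N_DIGITS)
    return best


print("fraction as lopsided as pi:", frac_as_unbalanced(2_000))
print("best of 20 no-information models:", best_of_many_guessers(20))
```

The first number should come out around 0.55: more than half of truly random 10,000-digit sequences are at least as lopsided as pi's 5030/4970 split. The second shows the look-elsewhere effect. The best of 20 guessers with no information typically scores above 50%, and the chance that at least one of them clears a one-sided 5% significance bar is 1 - 0.95**20, about 64%. Reporting only the model that "worked" turns noise into an apparent result.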
You have plenty of data to work with. Try the second 10,000, the third 10,000 and so on.
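A minimal sketch of that block-by-block check. It assumes the downloaded digits have been loaded into a single string containing only digit characters. `block_stats` is a hypothetical helper, the filename in the comment is hypothetical, and the demo uses random digits as a stand-in for the real file.

```python
import random
from collections import Counter


def block_stats(digits: str, block_size: int = 10_000):
    """Hypothetical helper: for each full, consecutive block of `digits`, yield
    (start offset, fraction of odd digits, chi-squared statistic vs uniform 0-9)."""
    expected = block_size / 10
    for start in range(0, len(digits) - block_size + 1, block_size):
        counts = Counter(digits[start:start + block_size])  # missing digits count as 0
        odd_fraction = sum(counts[d] for d in "13579") / block_size
        chi2 = sum((counts[d] - expected) ** 2 / expected for d in "0123456789")
        yield start, odd_fraction, chi2


# Stand-in data. For the real digits, something like:
#   text = open("pi_digits.txt").read()            # hypothetical filename
#   digits = "".join(ch for ch in text if ch.isdigit())
BLOCK = 10_000
rng = random.Random(0)
digits = "".join(rng.choices("0123456789", k=5 * BLOCK))
for start, odd_fraction, chi2 in block_stats(digits, BLOCK):
    print(f"digits {start + 1}-{start + BLOCK}: odd = {odd_fraction:.4f}, chi2 = {chi2:.2f}")
```

For a random source, each block should show an odd fraction near 0.5, and its chi-squared statistic should exceed the 5% critical value for 9 degrees of freedom (about 16.92) in roughly 1 block out of 20. Running the same model on every block, not only the one where it looked good, is what makes the comparison fair.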
Keep clear in your mind that a lot of people worked on this problem, including trained mathematicians. It is far more likely that you do not fully understand what you are doing than that they are wrong. Believing otherwise is the path of crankdom.
Because "somewhat predictable" doesn't mean "non-random". In fact, almost all prefixes of algorithmically random bit sequences are somewhat predictable, for an appropriate definition of "somewhat": any finite prefix has some accidental bias, a model can find and exploit it, and any such bias translates into some predictability.
(Another possibility is that, since pi itself is not really algorithmically random, the classifier was somehow able to learn how to partially compute pi! That's another pitfall you need to avoid even when you have a good understanding of information theory and statistics...)
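A small illustration of the accidental-bias point, using the simplest possible "model": guess whichever bit was in the majority. Fitted and scored on the same random prefix, it always scores at least 50%, because the prefix's accidental bias is exactly what it learned; on fresh bits from the same source the edge disappears. The helper name and the seed are illustrative choices.

```python
import random


def majority_accuracy(train_bits: list[int], test_bits: list[int]) -> float:
    """Learn the majority bit of train_bits, then score that constant guess on test_bits."""
    guess = 1 if 2 * sum(train_bits) >= len(train_bits) else 0
    return sum(b == guess for b in test_bits) / len(test_bits)


rng = random.Random(42)
prefix = [rng.getrandbits(1) for _ in range(10_000)]
fresh = [rng.getrandbits(1) for _ in range(10_000)]

# In-sample: the prefix's own accidental bias guarantees a score >= 0.5.
print("in-sample:", majority_accuracy(prefix, prefix))
# Out-of-sample: the bias was noise, so the score hovers around 0.5.
print("out-of-sample:", majority_accuracy(prefix, fresh))
```

A larger model has many more ways to latch onto accidental structure in a fixed prefix, which is why held-out digits the model never saw during development are the only fair test.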