You can have infinite and nonrepeating sequences which don't contain every possible subsequence. For instance, 1010010001000... (where there is one more zero each time) never repeats itself, but it never even has the digit 2 in it.
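To make that concrete, here is a minimal sketch that generates a prefix of the sequence and checks it. The function name growing_gaps is just a label I made up for illustration.

```python
from itertools import islice

def growing_gaps():
    """Yield 1, 0, 1, 0, 0, 1, 0, 0, 0, ... with one more zero after each 1."""
    zeros = 1
    while True:
        yield 1
        for _ in range(zeros):
            yield 0
        zeros += 1

# Take a finite prefix; the real sequence goes on forever.
prefix = list(islice(growing_gaps(), 10_000))

print("".join(map(str, prefix[:13])))   # first few digits of the sequence
print(2 in prefix)                       # the digit 2 never shows up
```

The gaps between consecutive 1s keep growing, so no block can repeat forever, yet the only digits that ever appear are 0 and 1.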
You're right, that's a mistake! It should be the 92,296,989th through 92,300,940th bits.
Yes, I did only search at 4-bit aligned bytes. Doing otherwise would be more complicated, much harder for others to easily verify from the downloadable hex digit data file, and not really more likely to succeed. Looking further out is just as beneficial as looking at more offsets.
Well, the alien that Arroway meets on the beach says that they didn't create the wormholes, they found them, but have never found the creators, iirc. I just reread some of the parts that mention pi, and I was remembering it wrong. She goes looking for a picture, but I'm not sure she finds anything by the end of the book. I'm going to reread the whole book this weekend, it has been too long. Thanks for having a look!
So to your question, I think the book says that yes, either the universe was created with this cool egg in pi as a message for civilizations to find once they got computers that were good enough to scan for statistical anomalies 2*20 digits deep in pi, or else the egg was somehow hacked into it some time after creation.
2) Find the index in π (say) where a matchable sequence of bytes occurs, like for Waldo. (If your image is big, this could take several years of computation time.)
3) Transmit the palette, width, height, and index: about 800 bytes.
4) Probably the index is too far out for everyone to have a copy of the data already (it would be beyond terabytes). So the recipient then spends several years computing the number out far enough.
5) Profit!
Note: the palette is optional; you could leave it out and only transmit about 32 bytes. Using a palette also means lossiness, because you're reducing to 256 colors and making compromises between pixels. But using one saves you several orders of magnitude of computation time.
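A toy sketch of steps 2 to 4 without a palette, under some loud assumptions: the bytes of π are replaced by a deterministic noise stream (SHA-256 in counter mode) so the example runs instantly, the "image" is only 2 pixels, and the names stand_in_stream, encode and decode are hypothetical.

```python
import hashlib
import struct

def stand_in_stream(n_bytes):
    """Deterministic noise standing in for the bytes of pi (an assumption)."""
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n_bytes])

def encode(pixels, width, height, stream):
    """Step 2 and 3: find the pixel bytes, then pack index, width, height."""
    index = stream.find(pixels)
    if index < 0:
        raise ValueError("pattern not found; search further out")
    return struct.pack(">QHH", index, width, height)

def decode(message, stream):
    """Step 4: the recipient regenerates the stream and slices the image out."""
    index, width, height = struct.unpack(">QHH", message)
    return stream[index:index + width * height], width, height

stream = stand_in_stream(1 << 20)
pixels = stream[123_456:123_458]      # a 2x1 "image" known to be in the stream
message = encode(pixels, 2, 1, stream)
recovered, w, h = decode(message, stream)

print(len(message), recovered == pixels)
```

The message is only 12 bytes here because the toy index fits in 8 bytes; a real index deep into π would need a much wider field, which is where the rough 32-byte figure comes from.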
PS. The fact that any image can be encoded in 32 bytes this way implies that there are only 2^256 possible images — about one hundred and sixteen quattuorvigintillion. That's obviously not literally true, but we are searching for a subsequence of π which is "close enough"; the number of possible images which humans would consider distinct is probably well less than that.
(That is, the vast majority of possible images are "color noise" which all look more or less the same.)
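The arithmetic behind that number is easy to check. This sketch assumes the short-scale definition of a quattuorvigintillion, 10^75.

```python
# 32 bytes = 256 bits, so at most 2**256 distinct messages (and images).
n_images = 2 ** (32 * 8)

# Short-scale quattuorvigintillion (assumption: 10**75).
quattuorvigintillion = 10 ** 75

print(len(str(n_images)))                       # number of decimal digits
print(round(n_images / quattuorvigintillion))   # in quattuorvigintillions
```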
The part that's from π is the actual pixel data, which is extracted in the Python snippet "waldo.tobytes()".
More-or-less equivalently, you could use "list(waldo.getdata())".
The GIF file as stored on disk has extra headers, and the pixel data's been compressed. So the whole file isn't found in the digits of π, just the pixel data itself.
I'd guess the chance of actually finding a correctly formatted full GIF file in contiguous bytes of π is effectively nil.
That's still a choice of palette; you'll find it listed on Wikipedia [1] under "Regular RGB palettes".
There's no doubt it's a much more objective choice than the one I used! I did say I was cheating.
Going to 4-bit color won't make it feasible. Even with 1-bit black/white pixels on about the minimum possible 18x24 Waldo face, you have 432 bits to look for, and you're not going to find them without cheating somehow. This guy on Twitter tried pretty thoroughly: https://twitter.com/gsuberland/status/1508697913177915393
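A back-of-the-envelope sketch of why 432 bits is out of reach. It assumes the bits of π behave like independent fair coin flips, which is a modelling assumption and not something known about π.

```python
import math

width, height = 18, 24
bits_needed = width * height          # 432 bits at 1 bit per pixel

# Assumption: each bit of pi is an independent fair coin flip.
# A fixed 432-bit pattern then matches at a given offset with probability
# 2**-432, so you expect to scan on the order of 2**432 offsets.
log10_offsets = bits_needed * math.log10(2)

# Going the other way: in a hundred million hex digits (4 bits each),
# the longest pattern you can expect to find exactly is about log2(N) bits.
searched_bits = 100_000_000 * 4
matchable_bits = math.log2(searched_bits)

print(f"{bits_needed} bits -> roughly 10^{log10_offsets:.0f} offsets to scan")
print(f"{searched_bits} bits searched -> about {matchable_bits:.1f} bits matchable")
```

So an exact search through a hundred million hex digits buys you a few dozen bits, nowhere near the 432 needed.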
For the palette hack, you want to go the other way; it's easier with bigger pixels. I was able to create a perfect Waldo face from the first 988 bytes of π as a TIFF, where the standard supports 16-bit palette indices for the pixel data. Unfortunately nothing except imagemagick seems to support actually viewing these TIFFs.
I made a video showing a slow zoom-in to Waldo in the first hundred million hex digits of pi, using your color palette. Feel free to use it if you want with a link back to this comment, I release it under CC-BY.
(The idea is basically that each consecutive 988 bytes gets put into an individual 19x26 8 bit image with your color palette, and these are then stacked in row order.)
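Roughly, the stacking can be sketched like this. It is not the code used for the video: tile_blocks is a hypothetical helper, and it assumes one byte per palette index, so adjust the block size if your tiles consume a different amount of data.

```python
def tile_blocks(data, tile_w, tile_h, tiles_per_row):
    """Lay consecutive tile_w*tile_h-byte blocks of `data` side by side.

    Returns the mosaic as a list of pixel rows (bytes), one palette index
    per byte. Leftover bytes that do not fill a whole tile are dropped.
    """
    block = tile_w * tile_h
    n_tiles = len(data) // block
    rows = []
    for first in range(0, n_tiles, tiles_per_row):
        last = min(first + tiles_per_row, n_tiles)
        tiles = [data[t * block:(t + 1) * block] for t in range(first, last)]
        for y in range(tile_h):
            # Row y of the mosaic band is row y of each tile, concatenated.
            rows.append(b"".join(t[y * tile_w:(y + 1) * tile_w] for t in tiles))
    return rows

# Tiny demo: four 2x3 tiles arranged two per row.
demo = tile_blocks(bytes(range(24)), tile_w=2, tile_h=3, tiles_per_row=2)
for row in demo:
    print(list(row))
```

Each row of the result could then be handed to an image library along with the palette to render one frame.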
I added a reduced preview of this to my blog post. I really like how it gives an impression of finding Waldo in a haystack of noise, just like in the picture books.
The preview is not as pretty as the real thing, but I thought embedding a 27MB video in the page might be a bit much.
Yeah I did my best to compress it but it's so noisy that there's very little compression that can be done without turning it into a blur. I can give you the original lossless PNG frames if you want them, but honestly I don't know that they'd be that useful.
Here's one lossless frame from near the end of the video, it's kinda nice because it gives you a little bit of context for the final Waldo without being overwhelming. https://i.imgur.com/wNZICnP.png
I am really curious about the search algorithm. I love the palette hack, how did you find candidates to then start searching through the possible palettes?
I can sort of think of:
1) Collapse the colours in the palette to the minimum necessary to be seen as "Waldo". The more slack in the gif palette the better - 24-bit colour vs 16 colours (or fewer) in the starting image?
2) For each substring in Pi, map the hex value to a colour (or close to the colour) to match the expected image.
We're looking for a substring with the largest number of unique values. If we have unique values, we can paint each value with a colour close enough to the expected value that humans will see them as the same.
Yes, it helps to have more unique values. We want each repeated byte to occur in part of the pattern where a color also repeats.
What I do is search for substrings with the fewest repeated bytes that don't match the target pattern. I prioritize first minimizing conflicts between light (white and tan) vs. dark (black and red). Reducing other conflicts is a tie-breaker.
The featured gif has 79 "mismatched" pixels out of 494, by the black/white metric. I've found candidates with as few as 75, but subjectively I didn't think they looked as good.
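For anyone who wants to play with the idea, here is a minimal sketch of a light/dark conflict score as I read the description above. It is not the actual search code: mismatches and best_offset are hypothetical names, the target is a string of "L"/"D" labels, and the tie-breaker on other conflicts is left out.

```python
from collections import Counter, defaultdict

def mismatches(window, target):
    """Pixels that must be wrong if each byte value gets one palette class.

    window: bytes from the stream; target: class label per pixel ("L"/"D").
    Each byte value is assigned its majority class; the rest are mismatches.
    """
    per_value = defaultdict(Counter)
    for value, label in zip(window, target):
        per_value[value][label] += 1
    return sum(sum(c.values()) - max(c.values()) for c in per_value.values())

def best_offset(stream, target):
    """Brute-force scan: the first offset with the fewest mismatches."""
    n = len(target)
    return min(range(len(stream) - n + 1),
               key=lambda i: mismatches(stream[i:i + n], target))

target = "LDLD"                              # toy 4-pixel light/dark pattern
stream = bytes([7, 7, 7, 7, 1, 2, 1, 2, 9])  # toy stand-in for bytes of pi
best = best_offset(stream, target)

print(best, mismatches(stream[best:best + 4], target))
```

A window of all-distinct bytes always scores zero, which is why more unique values help; a repeated byte only costs something when it lands on both a light and a dark pixel.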