Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Could there be a number, such that for certain interesting data, you can compress the data more than normally achiveable, by indexing into the number?

I feel this should not be possible, but I don't know how to proove it without a circular argument about entropy.

On the other hand, if you make a string of common byte sequences, a couple GB long, and distribute it with every PC... then you surely can achive hyper compression with some clever indexing (if you don't count the magic string to the compressed size, of course).



Yes, to an extent that would work.

1) Take target image

2) Find the index in π (say) where a matchable sequence of bytes occur, like for Waldo. (If your image is big, this could take several years of computation time.)

3) Transmit the palette, width, height, and index: about 800 bytes.

4) Probably the index is too far out for everyone to have a copy of the data already (it would be beyond terabytes). So the recipient then spends several years computing the number out far enough.

5) Profit!

Note: the palette is optional; you could leave it out and only transmit about 32 bytes. Using a palette also means lossiness, because you're reducing to 256 colors and making compromises between pixels. But using one saves you several orders of magnitude of computation time.


PS. The fact that any image can be encoded in 32 bytes this way implies that there are only 2^256 possible images — about one hundred and sixteen quattuorvigintillion. That's obviously not literally true, but we are searching for a subsequence of π which is "close enough"; the number of possible images which humans would consider distinct is probably well less than that.

(That is, the vast majority of possible images are "color noise" which all look more or less the same.)


It does seem possible, e.g. rather than sending an uppercase alphabetic message as 8-bit ASCII bytes, you can send their position in the alphabet instead, so 01000001 becomes 00001.

Of course this only works when you're limiting yourself to a subset of all possible data (in this case, the uppercase alphabet). A general method where all possible binary combinations are indexed (and the indexes are sent rather than the data) would not save any data as the binary data would always be its own index.



> On the other hand, if you make a string of common byte sequences, a couple GB long, and distribute it with every PC...

This is what brotli does.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: