Well obviously. That idea is not (IMO) worthy of a news post. But it doesn't say that the integers are distinct. I'd be interested to hear whether it's possible to store the numbers in general any better than the above.
If the integers cannot be guaranteed to be distinct, and all of them must be reproduced at the end, then no algorithm that restricts itself to O(1) memory can be correct (the number of possible multisets grows without bound, while constant memory has only a constant number of states). Since the problem restricts you to 4KB of RAM (O(1)), and there is presumably a solution, you can conclude that you don't need to count duplicate inputs - but yeah, it's kind of a bad problem description.
I'm not sure you can change the question just because there is no answer :P
4KB of RAM is not the same as an O(1) memory requirement, since we have already restricted the number and range of integers to 10000 (i.e. constants), so there is nothing left to vary. The page size just sets a target compression ratio, not an asymptotic restriction on memory usage.
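For concreteness, a rough sketch of the straightforward approach (Python, and assuming the inputs are distinct values in 0..9999, which is what the constants above imply): a 10000-bit bitmap is only 1250 bytes, well inside 4KB.

    def encode_bitmap(numbers, universe=10000):
        """One bit per present value; 10000 bits -> 1250 bytes."""
        bits = bytearray((universe + 7) // 8)
        for n in numbers:
            bits[n // 8] |= 1 << (n % 8)
        return bytes(bits)

    def decode_bitmap(bits):
        """Recover the values (in sorted order) from the bitmap."""
        return [i for i in range(len(bits) * 8) if bits[i // 8] & (1 << (i % 8))]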
In general, do that, then run some compression on it. That will do better if the data isn't random.
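To sketch that (zlib here is just a stand-in for "some compression"): compressing the 1250-byte bitmap helps a lot when the set has structure, and barely at all when it's uniformly random.

    import zlib

    # Build the 1250-byte bitmap for a very structured set: every third value in 0..9999.
    bits = bytearray(1250)
    for n in range(0, 10000, 3):
        bits[n // 8] |= 1 << (n % 8)

    packed = zlib.compress(bytes(bits), level=9)
    print(len(bits), len(packed))  # the structured bitmap shrinks a lot; a random one barely would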
Or if you're expecting random data, invent a biased compression algorithm and use that (e.g. one that has shortcuts for storing any multiples of 3). Most of the time it will do nothing and a few times you'll randomly get data it's good at.
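A toy version of that, purely illustrative: if the whole set happens to be multiples of 3, store n//3 in a bitmap a third of the size; otherwise store the ordinary bitmap. The flag byte recording which branch was taken is exactly the header cost discussed in the next comments.

    def encode_biased(numbers, universe=10000):
        nums = list(numbers)
        if nums and all(n % 3 == 0 for n in nums):
            # Shortcut: store n // 3, so the bitmap needs only ~universe/3 bits (~417 bytes).
            reduced = (universe + 2) // 3
            bits = bytearray((reduced + 7) // 8)
            for n in nums:
                bits[(n // 3) // 8] |= 1 << ((n // 3) % 8)
            return b"\x01" + bytes(bits)
        # General case: the full 1250-byte bitmap.
        bits = bytearray((universe + 7) // 8)
        for n in nums:
            bits[n // 8] |= 1 << (n % 8)
        return b"\x00" + bytes(bits)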
Actually, on second thought, I'm not sure whether that works. You'd need a header to identify whether the compression was used, so the compression has to save more on average than the header costs.
This header problem becomes clearer if you try to chain thousands of special-case compression algorithms, each of which maps a single input to one bit and is otherwise unused. That seems to save space at first (at the cost of CPU time, plus the space to store the algorithms themselves), but identifying which one was actually used is the problem: since they all need unique headers, you need as many header possibilities as there are numbers in the range, so you might as well just use the numbers as the headers, at which point you're not doing anything at all, since the header is the data.
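Roughly, the arithmetic for the 10000-value range discussed above:

    import math

    codecs = 10000                              # one special-case codec per possible value
    header_bits = math.ceil(math.log2(codecs))  # bits needed just to name which codec fired
    value_bits = math.ceil(math.log2(10000))    # bits needed to store the value directly
    print(header_bits, value_bits)              # 14 and 14: the header is as big as the data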
You can't do a bitmap in general because arbitrary integers can come from an enormously large range, so the bitmap itself would be huge.
If there is a compression algorithm that works well on 10% of data sets and does massive harm on the other 90%, you can choose whether or not to use it, at the cost of a small amount of header information and a bunch of CPU time, and it doesn't matter that on average it's quite bad. All that matters is whether the savings in the good cases beat the header cost. I think. I'm not sure this is helpful, but it doesn't require the compression algorithm to be a net gain on average.
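That's basically the "compress, but keep an escape hatch" pattern; a sketch, with zlib standing in for the algorithm that's only good some of the time:

    import zlib

    def pack(data):
        """Use the compressor only when it actually wins; one header byte records the choice."""
        compressed = zlib.compress(data, level=9)
        return b"\x01" + compressed if len(compressed) < len(data) else b"\x00" + data

    def unpack(blob):
        return zlib.decompress(blob[1:]) if blob[0] == 1 else blob[1:]

Worst case you lose exactly one byte to the header; best case you keep whatever the compressor saved.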