It is not just kanji. For instance, the 3rd character is ⡸ (a braille character). I suspect the author chose the characters based entirely on their Unicode values, and it just so happened that many of them fell into the kanji region of Unicode[0].

Anyway, what is going on is that each character takes 4 hexadecimal digits to encode its Unicode value.

When you escape() it, the characters are replaced by their Unicode values expressed in hexadecimal, so the string looks like:

"%u7769%u7468..."

Next, it uses a regex to rewrite each group of 4 hexadecimal digits ("%uXXXX") as two groups of 2 hexadecimal digits ("%XX%XX"), so the string becomes "%77%69%74%68..."

Finally, it unescapes the string, giving the intended script:

    with(x)for(j=c.width=412+S(t/2)*98;v=j--<<3;beginPath(fill()))for(i=5;i--;clearRect(a*a%712,t*v%400,i,3))fillStyle=R(j/2,98-v,j<98?190-v:j),lineTo(98+S(a=i*98+v+t/8)*v,80+C(a)*v+v-j/2)
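
The round trip can be tried outside the original page. This is a minimal sketch under assumptions, not the author's exact code: `pack` and `unpack` are hypothetical helper names, and the regex `/u(..)/g` with replacement `"$1%"` is just one pattern that performs the rewrite described above. `escape` and `unescape` are the legacy global functions. The sketch assumes the input is printable ASCII, so every packed code unit escapes to the "%uXXXX" form.

```javascript
// Hypothetical helper: pack two 8-bit characters into one UTF-16 code unit.
// Assumes printable ASCII input, so each code unit is >= 0x2000.
function pack(src) {
  if (src.length % 2) src += " "; // pad odd-length input with a space
  let out = "";
  for (let i = 0; i < src.length; i += 2) {
    const hi = src.charCodeAt(i);
    const lo = src.charCodeAt(i + 1);
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

// Hypothetical helper: reverse the packing.
// escape() yields "%uXXXX", the regex turns that into "%XX%XX",
// and unescape() turns each "%XX" back into one character.
function unpack(packed) {
  return unescape(escape(packed).replace(/u(..)/g, "$1%"));
}

const packed = pack("with(x)f");
console.log(escape(packed)); // starts with %u7769%u7468%u2878
console.log(unpack(packed)); // with(x)f
```

The third packed character is U+2878, the braille character mentioned above: it is simply "(" (0x28) followed by "x" (0x78).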

Essentially, the rules state that there is a 140 character limit. Most people interpret that as a 140 byte limit. However, in exchange for 48 characters of decoding overhead, you can pack 2 bytes into each remaining character, which gives (140 - 48) * 2 = 184 bytes of usable payload. That works out well for this submission, as the unencoded payload is exactly 184 characters (and 184 bytes).
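
The budget can be checked with a few lines. The wrapper below is an assumption: it is one decoder that matches the steps described and happens to be 48 characters long, but the comment does not quote the submission's actual wrapper. The constant names are made up for illustration.

```javascript
const LIMIT = 140; // character limit from the rules

// Assumed wrapper; the packed string would sit between prefix and suffix.
const prefix = "eval(unescape(escape`";
const suffix = '`.replace(/u(..)/g,"$1%")))';

const overhead = prefix.length + suffix.length; // characters spent on decoding
const packedChars = LIMIT - overhead;           // characters left for the payload
const payloadBytes = packedChars * 2;           // two bytes per packed character

console.log(overhead, packedChars, payloadBytes); // 48 92 184
```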

[0] A quick search says that 4E00-9FFF is CJK (Chinese, Japanese and Korean) Unified Ideographs, and that is not the only Unicode block that seems to be devoted to CJK.

Combine that with the fact that the low ASCII characters are control characters that wouldn't be used in a script, and you end up with almost all CJK characters. If you actually wanted the result to be entirely CJK, it probably wouldn't be that difficult to fudge the characters around so that it fits.
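
To see why most pairs land in CJK: the first character of each pair becomes the high byte, so any pair starting with a character from "N" (0x4E) through "~" (0x7E) falls inside 4E00-9FFF, and that covers every lowercase letter. Pairs starting with digits or punctuation land lower. A rough sketch for checking a given script, with hypothetical function names:

```javascript
// Code unit produced by packing a two-character string.
function packedCodeUnit(pair) {
  return (pair.charCodeAt(0) << 8) | pair.charCodeAt(1);
}

// True if the code unit is in CJK Unified Ideographs (U+4E00-U+9FFF).
function isCjkUnified(codeUnit) {
  return codeUnit >= 0x4e00 && codeUnit <= 0x9fff;
}

// Count how many complete pairs in src would pack to a CJK Unified Ideograph.
function cjkShare(src) {
  let cjk = 0;
  let total = 0;
  for (let i = 0; i + 1 < src.length; i += 2) {
    total++;
    if (isCjkUnified(packedCodeUnit(src.slice(i, i + 2)))) cjk++;
  }
  return { cjk, total };
}

console.log(cjkShare("with(x)for(j=c.width"));
```

Reordering or padding the script so that every pair starts with a letter would be one way to do the fudging mentioned above.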


