> For CJK characters, they unified all semantically similar han-characters, even...

wodenokoto · on June 23, 2016

It is true for lots of characters (so I guess I was being a little hyperbolic when I said "all"), and you cannot rely on choosing the correct code points to make a text display as Japanese or Chinese. You need to tell your rendering program (often through choice of font) whether things are to be rendered with Japanese or Chinese forms.

I wouldn't know how to show you examples here, as 直 and 直 will display the same: they have the same code point, but a different number of strokes in Japanese and Chinese.
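A minimal Python sketch of why that is: it only inspects code points, and the variable names `ja` and `zh` are just labels made up for illustration. Nothing in the strings themselves records which language was intended.

```python
# The same character typed twice: once "meaning" the Japanese form,
# once "meaning" the Chinese form. The labels are illustrative only.
ja = "直"
zh = "直"

def codepoints(s):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

# Both strings contain exactly one code point, and it is the same one,
# so plain text cannot carry the Japanese/Chinese distinction.
print(codepoints(ja), codepoints(zh))
print(ja == zh)
```

Any difference in how the two are drawn has to come from outside the text, for example the font or a language tag handed to the renderer.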

https://en.m.wikipedia.org/wiki/Han_unification

rspeer · on June 23, 2016

Aren't they putting the disunified characters into the U+2xxxx plane now?

Han unification is generally seen as a bad choice in retrospect, but it was something Unicode had to do when it looked like 2^16 codepoints were all they were going to get.

wodenokoto · on June 25, 2016

Never heard of that, but I would appreciate it if all the characters with different glyphs had different codepoints. Do you have a source? Do you know what happens to the "unified" codepoints?

footpath · on June 23, 2016

It is true to some extent. While 青 and 靑 have different codepoints, there are plenty of characters sharing a single codepoint that are rendered differently depending on the language specified:

https://en.wikipedia.org/wiki/Han_unification#Examples_of_la...

Han characters that are traditionally viewed as variants of one another, or that are simplified from more complex logograms (such as 龜, which was simplified into 亀 in Japan and 龟 in mainland China) tend to have different codepoints, but the stylistically different ones usually belong to the same codepoint.
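A quick way to check which of the characters mentioned here are separately encoded is to print their code points. This is only a sketch using Python's built-in `ord`; the helper name `describe` is made up for illustration.

```python
def describe(chars):
    """Map each character to its code point in U+XXXX notation."""
    return {c: f"U+{ord(c):04X}" for c in chars}

# The pair that was not unified.
print(describe("青靑"))

# The traditional form and its Japanese and mainland Chinese simplifications.
print(describe("龜亀龟"))
```

Each character in both groups should come back with its own code point, unlike the purely stylistic regional differences, which share one.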

thaumasiotes · on June 23, 2016

I do know about the issue; it causes problems for me. But I couldn't let the claim that all semantically equivalent characters were unified pass.

> the stylistically different ones usually belong to the same codepoint

Fair enough. Do you happen to know why 青 and 靑 weren't unified?

msbarnett · on June 23, 2016

Han Unification "rules" were an inconsistent mess, but I do know that in Japanese 靑 was at one time a printer's simplification of 青, so you could find either in texts, and the Consortium tended to encode a character separately if you could find printed examples of both in the same language.