Hacker News

> If we try to print out some Japanese characters… [] The output isn’t what we expect.

Yes it is. And I bet on a modern Windows version it is too. The terminal has been (probably intentionally) neglected by MS for a long time, but as far as I know this has mostly been fixed on modern Windows versions.

EDIT: The author admits it later in the text: "will be fixed in Windows 11 and Windows Server 2022".

Also it says "strlen("有り難う")); [...] and the output is… The length of the string is 12 characters". But according to "man strlen": "RETURN VALUE: The strlen() function returns the number of bytes in the string pointed to by s.". It says nothing about "number of characters".
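To make the bytes-versus-characters point concrete, here is a minimal sketch in C. The helper name `utf8_count_code_points` and the constant `ARIGATOU` are made up for illustration, and the counter assumes the input is well-formed UTF-8. The string is spelled out as explicit bytes so the result does not depend on the source file's encoding.

```c
#include <stddef.h>
#include <string.h>

/* "有り難う" as explicit UTF-8 bytes: four characters, three bytes each. */
static const char ARIGATOU[] =
    "\xe6\x9c\x89"   /* U+6709 */
    "\xe3\x82\x8a"   /* U+308A */
    "\xe9\x9b\xa3"   /* U+96E3 */
    "\xe3\x81\x86";  /* U+3046 */

/* Hypothetical helper: counts code points by skipping UTF-8
 * continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8. */
static size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

With these definitions, `strlen(ARIGATOU)` is 12 (bytes), while `utf8_count_code_points(ARIGATOU)` is 4 (code points), which is the gap the man page wording is pointing at.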



> And I bet on a modern windows version it is too.

Unfortunately it's still broken. You need to switch the console to the UTF-8 code page in your own code:

    SetConsoleOutputCP(CP_UTF8);
...and before exit restore it to the original code page.
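A rough sketch of that save-and-restore dance, in C. `GetConsoleOutputCP`, `SetConsoleOutputCP` and `CP_UTF8` are the real Win32 names; the two wrapper functions are hypothetical, and the `#ifdef` guards are only there so the snippet also compiles on non-Windows systems, where it does nothing.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: switch console output to UTF-8 and return the
 * previous code page, or 0 if it could not be changed (or not Windows). */
static unsigned int console_enter_utf8(void)
{
#ifdef _WIN32
    UINT previous = GetConsoleOutputCP();   /* 0 on failure */
    if (previous == 0 || !SetConsoleOutputCP(CP_UTF8))
        return 0;
    return previous;
#else
    return 0;
#endif
}

/* Hypothetical helper: put back the code page saved above. */
static void console_restore(unsigned int previous)
{
#ifdef _WIN32
    if (previous != 0)
        SetConsoleOutputCP(previous);
#else
    (void)previous;
#endif
}
```

Typical use would be `unsigned int cp = console_enter_utf8();` near the start of `main` and `console_restore(cp);` on every exit path. Note that a crash in between leaves the console on the UTF-8 code page.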


... except that that is also subtly broken.

It works if you write multiple UTF-8 code units in one go, but breaks if you send them in several writes (and by that, I mean direct writes to the HANDLE). It also breaks if you try to use the ANSI API (with the A suffix), as it internally tries to convert the bytes from codepage-random to UTF-8.

You run into both issues if you try to use the MS implementation of stdio (printf and friends).
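One way to dodge the split-write problem is to never hand the console a buffer that ends in the middle of a multi-byte sequence. The sketch below is an illustration of that idea, not what any particular project does: the function name `utf8_complete_prefix` is invented, and it only looks at the tail of the buffer rather than validating everything. It does nothing about the ANSI API issue.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many leading bytes of buf can be
 * written without cutting a UTF-8 sequence in half. The caller keeps
 * the remaining bytes and prepends them to the next chunk. */
static size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Walk back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return len;             /* no lead byte in sight: pass through */

    unsigned char lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else
        return len;             /* malformed tail: pass through unchanged */

    /* back + 1 bytes of the last sequence are present. */
    return (back + 1 < need) ? i - 1 : len;
}
```

For example, a 5-byte buffer holding one complete 3-byte character followed by the first 2 bytes of the next one yields 3, so only the complete character gets written now.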

And we didn't even discuss command line argument passing yet :-)

I had a lot of fun with this (more explanation in the issue comments): https://github.com/AgentD/squashfs-tools-ng/issues/96#issuec...

I tried to test it with the only two other languages I know besides English: German and Mandarin, the latter specifically because it requires multi-byte characters to work. Getting Chinese text I/O to work at all in a Windows DOS box on an existing German Windows 7 installation was an adventure of its own, and it ended up breaking things in different ways than German text did.

Turns out, trying to write language-agnostic command-line applications on Windows is a PITA.


Windows is truly the gift that keeps on giving :D


> Also it says "strlen("有り難う")); [...] and the output is… The length of the string is 12 characters". But according to "man strlen": "RETURN VALUE: The strlen() function returns the number of bytes in the string pointed to by s.". It says nothing about "number of characters".

Yeah - when dealing with Unicode, you have to be very clear about whether you're dealing with bytes, runes or glyphs.


Runes are not a Unicode concept - that’s a Golangism. Basically a code point.

Also in terms of Unicode, graphemes are even more relevant to the programming side than glyphs - unless you’re writing a renderer.
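To illustrate the code point versus grapheme distinction, here is a small C sketch. Both function names are hypothetical, and the decoder assumes well-formed UTF-8 with no validation at all. It only gets as far as code points; real grapheme segmentation needs the Unicode segmentation rules and is not attempted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point at *s and advance *s.
 * Assumes well-formed UTF-8; performs no validation. */
static uint32_t utf8_decode(const char **s)
{
    const unsigned char *p = (const unsigned char *)*s;
    uint32_t cp;
    int extra;

    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else                  { cp = p[0] & 0x07; extra = 3; }

    for (int i = 1; i <= extra; i++)
        cp = (cp << 6) | (p[i] & 0x3F);

    *s += 1 + extra;
    return cp;
}

/* Hypothetical helper: store up to max code points of s in out[],
 * return how many were stored. */
static size_t utf8_code_points(const char *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n < max)
        out[n++] = utf8_decode(&s);
    return n;
}
```

Feeding it `"e\xcc\x81"` (an `e` followed by U+0301, the combining acute accent) gives two code points, U+0065 and U+0301, from three bytes, yet a user sees a single "é". That single user-perceived unit is the grapheme.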


> The terminal has been (probably intentionally) neglected by ms for a long time,

I don't think it is an intentional lack of care, just a lack of care. Internally MS devs affected by the appalling state of the console just did what the rest of us did and installed an alternative.

> but as far as I know this has mostly been fixed on modern windows versions.

Ish. The default console for PowerShell is better, but a lot of improvements you might be thinking are in there are in fact only in Windows Terminal (https://en.wikipedia.org/wiki/Windows_Terminal) which is not currently included by default.


> you might be thinking are in there are in fact only in Windows Terminal

A lot of those changes are in ConsoleHost, so Windows 10 and 11 get those improvements (like VT100 sequences) in cmd.exe as well.
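Worth adding that a program generally has to opt in to VT sequence handling on the classic console. A minimal sketch in C: `GetStdHandle`, `GetConsoleMode`, `SetConsoleMode` and `ENABLE_VIRTUAL_TERMINAL_PROCESSING` are the real Win32 names, while `enable_vt_output` and `print_green` are invented for this example. The non-Windows branch simply assumes the terminal already understands the sequences.

```c
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004  /* for older SDK headers */
#endif
#endif

/* Hypothetical helper: returns 1 if stdout is expected to interpret
 * VT sequences after the call, 0 if that could not be enabled. */
static int enable_vt_output(void)
{
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode))
        return 0;   /* not a console, e.g. output redirected to a file */
    return SetConsoleMode(out, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING) ? 1 : 0;
#else
    return 1;       /* assumption: the terminal handles VT natively */
#endif
}

/* Hypothetical helper: print text in green when VT is available,
 * plain otherwise. */
static void print_green(const char *text)
{
    if (enable_vt_output())
        printf("\x1b[32m%s\x1b[0m\n", text);
    else
        printf("%s\n", text);
}
```

On an older console host the `SetConsoleMode` call fails and the sketch falls back to plain text instead of printing raw escape codes.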


Honestly if you are expecting a sane and modern text console on Windows you're just begging for disappointment. I note that even the author briefly tries it on Bash and finds it to be undramatic.


It makes sense to point it out even if it's fixed in Win11; lots of people (myself included) are still on 10.



