
Pascal strings are also kind of bad though. All sub-string operations need allocation, or have to be defined with intermediate results which aren't "really" strings, so in that sense it's not an improvement on zero-terminated strings. Equality tests are cheaper, which is nice, since strings of different lengths compare unequal immediately, but most things aren't really improved.
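To make that concrete, here is a rough C++ sketch of a classic Pascal-style layout (one length byte followed by the characters). The names `PascalBuf`, `make_pascal`, `pascal_equal` and `pascal_substr` are made up for illustration, and the one-byte length limit is an assumption of the sketch, not something any particular Pascal dialect is being quoted on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout: buf[0] is the length, buf[1..] are the characters.
using PascalBuf = std::vector<std::uint8_t>;

PascalBuf make_pascal(const std::string& s) {
    assert(s.size() <= 255);  // the length has to fit in the single prefix byte
    PascalBuf buf;
    buf.push_back(static_cast<std::uint8_t>(s.size()));
    buf.insert(buf.end(), s.begin(), s.end());
    return buf;
}

bool pascal_equal(const PascalBuf& a, const PascalBuf& b) {
    assert(!a.empty() && !b.empty());
    if (a[0] != b[0]) return false;  // different lengths: unequal immediately
    return std::memcmp(a.data() + 1, b.data() + 1, a[0]) == 0;
}

// A substring cannot alias the original buffer, because its length byte
// must sit directly in front of its first character. So this allocates.
PascalBuf pascal_substr(const PascalBuf& p, std::size_t pos, std::size_t n) {
    assert(!p.empty() && pos + n <= p[0]);
    auto first = p.begin() + static_cast<std::ptrdiff_t>(1 + pos);
    PascalBuf out;
    out.push_back(static_cast<std::uint8_t>(n));
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(n));
    return out;
}
```

The equality check gets its early exit from the prefix, but `pascal_substr` has nowhere to put the new length except a fresh buffer.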

C++ string_view is closer to the Right Thing™ - a slice, but C++ doesn't (yet) define anywhere what the encoding is, so... that's not what it could be. Rust's str is a slice and it's defined as UTF-8 encoded.
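A small C++ sketch of both halves of that point: slicing a `std::string_view` allocates nothing, but the offsets are plain bytes, so nothing stops a slice from landing in the middle of a UTF-8 sequence. `is_utf8_boundary` and `slice` are hypothetical helpers; the boundary check is modelled on the one Rust applies when slicing a `str`, and it assumes the bytes really are UTF-8, which C++ does not promise.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if byte offset i does not fall inside a
// multi-byte UTF-8 sequence. Assumes s holds UTF-8.
bool is_utf8_boundary(std::string_view s, std::size_t i) {
    if (i == 0 || i == s.size()) return true;
    if (i > s.size()) return false;
    unsigned char b = static_cast<unsigned char>(s[i]);
    return (b & 0xC0) != 0x80;  // continuation bytes look like 10xxxxxx
}

// Slicing a view allocates nothing: the result aliases the same bytes.
std::string_view slice(std::string_view s, std::size_t pos, std::size_t n) {
    return s.substr(pos, n);  // throws std::out_of_range if pos > s.size()
}
```

For the bytes of "naïve" in UTF-8, offset 3 sits inside the two-byte "ï", so `is_utf8_boundary` rejects it while `substr` would happily cut there.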



D's strings were defined to be UTF-8 back in 2000. wstring is UTF-16, and dstring is UTF-32.

Back then it wasn't clear which encoding method would turn out to be dominant, so we did all three. (Java was built on UTF-16.)

As it eventually became clear, UTF-8 is da winnah, and the other formats are sideshows. Windows, which uses UTF-16, is handled by converting UTF-8 to -16 just before calling a Windows function, and converting anything coming back to UTF-8.
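For anyone who hasn't seen the convert-at-the-boundary step spelled out, here is a minimal sketch in C++ (not D, and not the D runtime's converter). `utf8_to_utf16` is a hypothetical helper that assumes its input is already valid UTF-8; it does no validation beyond checking for a truncated sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: decode UTF-8 code points and re-encode them as
// UTF-16. Assumes valid UTF-8 input; overlong or stray bytes are not detected.
std::u16string utf8_to_utf16(std::string_view in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (b0 < 0x80)      { cp = b0;        extra = 0; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; extra = 1; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; extra = 2; }
        else                { cp = b0 & 0x07; extra = 3; }
        ++i;
        assert(i + extra <= in.size());  // truncated input breaks the precondition
        for (std::size_t k = 0; k < extra; ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The idea is that the rest of the program only ever sees UTF-8, and a conversion like this happens right at the call into a W function (with the reverse on the way back).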

D doesn't distinguish between a string and a string view.


A lot of people don't know about this, but Microsoft is taking steps to move everything over to UTF-8.

They added a setting in Windows 10 to switch the code page over to UTF-8 and then, as I understand it, in Windows 11 they made it on by default. Individual applications can turn it on for themselves so they don't need to rely on the system setting being checked.

With that you can, in theory, just use the -A variants of the winapi with UTF-8 strings. I haven't tried it out yet as we still support prior Windows releases, but it's nice that Microsoft has found a way out from the UTF-16 mess.


The A-variants had problems years ago, which is why D abandoned them in favor of the W versions.

I don't mind seeing UTF-16 fade away. We've been considering scaling back the D support for UTF-16/32 in the runtime library, in favor of just using converters as necessary. We recommend using UTF-8 as much as practical.


What’s the ownership story for string views?


They don't own anything. It's just a pointer and length. They don't allocate/deallocate.
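Spelled out as a C++ sketch, with a made-up `View` type (not `std::string_view` itself and not D's representation, just an illustration of "pointer and length"):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical minimal view type: two words, no ownership.
struct View {
    const char* ptr;
    std::size_t len;
};

// Nothing to free and nothing special to copy.
static_assert(std::is_trivially_destructible_v<View>, "a view frees nothing");
static_assert(std::is_trivially_copyable_v<View>, "copying a view copies two words");

View view_of(const char* zero_terminated) {
    return View{zero_terminated, std::strlen(zero_terminated)};
}

// A sub-view is just pointer arithmetic on the same buffer.
View subview(View v, std::size_t pos, std::size_t n) {
    assert(pos <= v.len && n <= v.len - pos);
    return View{v.ptr + pos, n};
}
```

Whoever owns the bytes behind `ptr` has to keep them alive for as long as the view is in use; the view itself has no say in that.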


I mean clearly something needs to own the buffer for a new string.


Sure, but that's not the string_view's problem. You can't just make string_views out of nothing; the string you want to borrow a view into needs to exist first.

Imagine you go to a library and insist on borrowing "My Cousin Rachel", but they don't have it. "Oh I don't care whether you have the book, I just want to borrow it" is clearly nonsense. If they don't have it, you can't borrow it.


Walter is talking about D, and he said this:

> D doesn't distinguish between a string and a string view.

In C++ std::string owns the buffer and std::string_view borrows it. If there is no difference between the two in D, then how is this difference bridged?
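The C++ side of that distinction, as a short sketch. `first_word` and `first_word_owned` are hypothetical helpers; the only point is that one returns a borrow into the caller's buffer and the other returns a copy with its own buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Borrow: the returned view points into owner's buffer.
// It is only valid while owner is alive and has not reallocated,
// so do not call this on a temporary std::string.
std::string_view first_word(const std::string& owner) {
    std::size_t end = owner.find(' ');  // npos if there is no space
    return std::string_view(owner).substr(0, end);
}

// Own: constructing a std::string from the view copies the bytes
// into a buffer the result is responsible for.
std::string first_word_owned(const std::string& owner) {
    return std::string(first_word(owner));
}
```

Writing through the owner (say `s[0] = 'J'`) is visible through a view obtained from `first_word(s)`, while the result of `first_word_owned(s)` is unaffected.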


You can use automatic memory management and not worry about it. Or you can use D's prototype ownership/borrowing system. Or you can encapsulate them in something that manages the memory. Or you can do ownership/borrowing by convention (it's not hard to do).


Automatic memory management makes copies?


No. Another word for automatic memory management is garbage collection.


I guess I should rephrase. Let's say I have a string, which owns its buffer. What happens in D if I take a substring of it? Does a copy of that section occur to form a new string?



