Realistically, you don't even want to allow a good chunk of ASCII in filenames. The subset you would actually like to allow is small: lowercase and uppercase letters, digits, ., - and _. That is 65 out of 128... Maybe a few more at a stretch, but even some of those are questionable because of their history, like ~.
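To make the count concrete, here is a minimal C# sketch of that whitelist as a filename check. The names `Allowed` and `IsAllowedFileName` are hypothetical, and the check only covers the character set from the comment above; it ignores other rules a real filesystem imposes (reserved names, trailing dots, length limits).

```csharp
using System;
using System.Linq;

// Hypothetical whitelist: ASCII letters, digits, '.', '-' and '_'.
const string Allowed =
    "abcdefghijklmnopqrstuvwxyz" +
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "0123456789" +
    ".-_";

// True if the name is non-empty and every character is in the whitelist.
bool IsAllowedFileName(string name) =>
    name.Length > 0 && name.All(c => Allowed.IndexOf(c) >= 0);

Console.WriteLine(Allowed.Length);                       // 65
Console.WriteLine(IsAllowedFileName("report_2024.txt")); // True
Console.WriteLine(IsAllowedFileName("~backup"));         // False
```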
How would you implement caching, or looking up the decryption key for some directory in a lookup table, if `Documents` and `documents` must both resolve to the same entry? Some kind of normalization would be needed, right? Then you need to bring in encodings and [Unicode?] normalization. Shivers.
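For illustration, this is roughly what the explicit-normalization approach looks like in C#: every name is mapped to a canonical key before it touches the table. The `Key` helper and the placeholder string values are hypothetical. Note that `ToUpperInvariant` is simple upper-casing, not full Unicode case folding (for example, "ß" is left unchanged), so this is a sketch of the idea rather than a faithful model of any filesystem.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical canonical key: compose to NFC, then upper-case with invariant rules.
string Key(string name) =>
    name.Normalize(NormalizationForm.FormC).ToUpperInvariant();

// Normalized directory name -> decryption key id (placeholder values).
var keys = new Dictionary<string, string>();
keys[Key("Documents")] = "key-for-documents";

Console.WriteLine(keys.ContainsKey(Key("documents"))); // True
Console.WriteLine(keys.ContainsKey(Key("DOCUMENTS"))); // True

// Precomposed U+00E9 and "e" + combining acute U+0301 should map to the same key after NFC.
Console.WriteLine(Key("caf\u00e9") == Key("cafe\u0301"));
```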
> How would you implement caching, or looking up the decryption key for some directory in a lookup table, if `Documents` and `documents` must both resolve to the same entry
Calculating a hash or equality for a string always uses some kind of comparer logic. Being able to use a "raw" (ordinal) comparer is just one special case of that. In C# on Windows, you'd use
var cache = new Dictionary<string, Cached>(StringComparer.OrdinalIgnoreCase);
or a similar comparer. Such a comparer correctly reports that comparer.GetHashCode("documents") == comparer.GetHashCode("Documents") and that comparer.Equals("documents", "Documents"). You might think case insensitivity makes this more complex, but only slightly so.
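A small runnable version of that idea, as a sketch: `string` stands in for the hypothetical `Cached` type, and `StringComparer.OrdinalIgnoreCase` is used as an approximation of Windows name matching, not an exact model of NTFS's upcase table. The point is that the comparer, not `string.GetHashCode`, supplies both equality and hashing to the dictionary.

```csharp
using System;
using System.Collections.Generic;

// Ordinal, case-insensitive comparison: an approximation of Windows name lookup.
var comparer = StringComparer.OrdinalIgnoreCase;
var cache = new Dictionary<string, string>(comparer); // string stands in for Cached

cache["Documents"] = "cached entry";

Console.WriteLine(comparer.Equals("documents", "Documents"));                              // True
Console.WriteLine(comparer.GetHashCode("documents") == comparer.GetHashCode("Documents")); // True
Console.WriteLine(cache.ContainsKey("DOCUMENTS"));                                         // True
```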
E.g. if you instead assumed case sensitivity and naively reached for a culture-sensitive comparer (the kind string.Compare and default sorting use) here:
var cache = new Dictionary<string, Cached>(StringComparer.CurrentCulture);
then you'd actually be in MORE trouble, because now locale-specific collation comes into play. E.g. with a German culture, weiß.txt and weiss.txt can compare equal, and therefore hash the same, despite being two different files (whether they do depends on the culture and the globalization library in use). Note that a plain new Dictionary<string, Cached>() is ordinal by default, so it avoids this. A working Linux lookup table with case sensitivity would look much like the first one, just with an ordinal comparer:
var cache = new Dictionary<string, Cached>(StringComparer.Ordinal);
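A sketch of the Linux-style table, again with `string` standing in for the hypothetical `Cached` type. One caveat: Linux filenames are byte sequences while .NET strings are UTF-16, so an ordinal comparer on decoded names is an approximation that holds as long as names are valid UTF-8. It shows that an ordinal comparer keeps case variants, the ß/ss pair, and precomposed vs decomposed accents apart, which is what a case-sensitive Linux filesystem does.

```csharp
using System;
using System.Collections.Generic;

// Ordinal comparison: names are equal only if their UTF-16 code units are identical.
var cache = new Dictionary<string, string>(StringComparer.Ordinal);

cache["wei\u00df.txt"] = "first file";  // weiß.txt
cache["weiss.txt"] = "second file";
cache["Documents"] = "third file";

Console.WriteLine(cache.Count);                    // 3
Console.WriteLine(cache.ContainsKey("documents")); // False

// Precomposed vs decomposed "é" are different names to an ordinal comparer,
// just as they are different byte sequences on a Linux filesystem.
Console.WriteLine(StringComparer.Ordinal.Equals("caf\u00e9", "cafe\u0301")); // False
```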
What if we only allowed the following in filenames: [a-z0-9._-]?
Now there will be no 'Documents', only 'documents'.
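One pitfall when writing that set as a regex character class: a hyphen between two characters forms a range, so `[a-z0-9.-_]` means "'.' through '_'", which includes '/', ':', '@' and all of A-Z, and would happily accept `Documents`. Putting the hyphen last, as in `[a-z0-9._-]`, keeps it literal. A quick check with .NET's `Regex` (the anchors `^` and `$` are added here to match whole names):

```csharp
using System;
using System.Text.RegularExpressions;

// ".-_" inside the class is a range from '.' (0x2E) to '_' (0x5F), covering 'A'-'Z' too.
var asWritten = new Regex("^[a-z0-9.-_]+$");
// With '-' as the last character of the class, it is a literal hyphen.
var intended = new Regex("^[a-z0-9._-]+$");

Console.WriteLine(asWritten.IsMatch("Documents"));    // True
Console.WriteLine(intended.IsMatch("Documents"));     // False
Console.WriteLine(intended.IsMatch("documents"));     // True
Console.WriteLine(intended.IsMatch("my-file_1.txt")); // True
```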
The comment I replied to already restricted us to ASCII letters, digits, _, ., and -. I questioned why both uppercase and lowercase letters should be allowed.