No, it's Excel trying to be too clever. It does the same thing with manual input if you don't proactively change the field type.
You can import a DSV into Excel without mangling datatypes in a few different ways. Probably the best way is using Power Query.
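Outside of Excel, the same idea is to refuse inference entirely and read everything as text first. A minimal pandas sketch (the file name is made up):

```python
import pandas as pd

# Read every column as a plain string: no date guessing, no floats,
# no leading zeros silently dropped. Convert columns explicitly afterwards.
df = pd.read_csv("accounts.csv", dtype=str)
```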
A DSV generally does have a schema. It's just not in the file format itself. Just because it isn't self-describing doesn't mean it isn't described. It just means the schema is communicated outside of the data interchange.
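Concretely, nothing stops you from shipping that schema as a sidecar file and applying it at read time. A sketch in pandas, with a hypothetical accounts.schema.json that maps column names to dtypes:

```python
import json
import pandas as pd

# accounts.schema.json (hypothetical) might contain:
#   {"account_id": "string", "balance": "float64"}
with open("accounts.schema.json") as f:
    schema = json.load(f)

# The schema lives outside the data interchange, but the data is still typed.
df = pd.read_csv("accounts.csv", dtype=schema)
```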
If you get an .xls that doesn't use very esoteric functions, I expect it to open about the same way in any version of Excel and in any other office suite.
With CSV I do not have that expectation. I know that for some random user-submitted CSVs, I will have to fiddle, even if that means finding the one row in a thousand that has some null-value placeholder and messes up the whole automatic inference.
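That failure mode is easy to reproduce; a toy pandas example:

```python
import io
import pandas as pd

# 1000 clean integer rows, with a single "N/A" placeholder buried in the middle.
rows = [str(i) for i in range(500)] + ["N/A"] + [str(i) for i in range(500, 1000)]
csv_data = "id\n" + "\n".join(rows)

df = pd.read_csv(io.StringIO(csv_data))
# One placeholder row is enough to flip the whole column:
# "N/A" is read as NaN, NaN forces float, so every integer becomes a float.
print(df["id"].dtype)  # float64
```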
No. That's not at all what I'm saying. I am saying that the very same CSV file will open differently depending on the program you open it with.
You don't even need to transfer it. Opening a CSV in pandas can give different results than opening it with Polars, which can differ from DuckDB, which can differ from Excel.
You've got no guarantees. There's no spec, and how edge cases are handled (if you even want to call serializing and deserializing a float an edge case) is left to the implementation.
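A concrete illustration; exact results depend on library versions and default null handling, so treat this as a sketch rather than gospel:

```python
import pandas as pd
import polars as pl
import duckdb

with open("demo.csv", "w") as f:
    f.write("x\n1\nNA\n2\n")

# Three readers, three answers for the same bytes:
print(pd.read_csv("demo.csv")["x"].tolist())    # [1.0, nan, 2.0] -- "NA" treated as null, ints become floats
print(pl.read_csv("demo.csv")["x"].to_list())   # ['1', 'NA', '2'] -- column inferred as strings
print(duckdb.sql("SELECT * FROM 'demo.csv'").fetchall())  # [('1',), ('NA',), ('2',)] -- sniffed as VARCHAR
```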
It's both of their faults. CSV is not blameless here: Excel is doing something users broadly expect, which is treating dates as dates and numbers as numbers, not everything as strings. If CSV had types, Excel would not have to guess what they are.
It does have types if you define them in the schema. Not every format needs to be self-describing. It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.
It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.
Power Query does a better job of handling it, but you should be able to just supply a schema on import, like you can with Polars or DuckDB.
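For reference, this is roughly what that looks like in Polars and DuckDB (the column names are made up; `schema_overrides` was called `dtypes` in older Polars releases):

```python
import polars as pl
import duckdb

# Polars: hand the reader the types instead of letting it guess.
df = pl.read_csv("orders.csv", schema_overrides={"order_id": pl.Utf8, "qty": pl.Int64})

# DuckDB: override the sniffed type for specific columns.
rel = duckdb.sql(
    "SELECT * FROM read_csv('orders.csv', types={'order_id': 'VARCHAR', 'qty': 'BIGINT'})"
)
```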
It's another example of MS babying their userbase too much, like how VBA is single-threaded only because threads are hard. They're making their product less usable and making it harder for their users to learn how stuff works.
CSV doesn't have a schema; it has a barely-adhered-to, post-hoc “not a specification”, and everything is strings.
The fact that you can solve some of these problems by shipping something alongside the CSV file is nowhere near as helpful, and it is itself a clear indictment of CSV. There is no universally followed schema convention, for a start, so now we're at bespoke solutions all over the place.
> It's often more efficient to share the schema once outside of the data feed than have the overhead of restating it for every data point.
Surely you cannot be suggesting that CSV files are efficient; they're atrociously inefficient. Having the same format with a tied-in schema would solve a lot and add barely any overhead. If you want efficiency, do not use CSV.
Asking users to manually load in the right schema every time they open a file is asking for trouble. Why wouldn’t you combine them?
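Parquet is the obvious example of a format that does combine them; a quick sketch of the difference:

```python
import polars as pl

df = pl.DataFrame({"user_id": ["007", "042"], "amount": [1.5, 2.0]})

df.write_csv("events.csv")          # the types are thrown away on the way out
df.write_parquet("events.parquet")  # the schema is stored inside the file itself

print(pl.read_csv("events.csv").schema)          # user_id likely inferred as Int64: the leading zeros are gone
print(pl.read_parquet("events.parquet").schema)  # user_id comes back as String, exactly as written
```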
> It's completely Excel's fault for pushing their type-inference and making it difficult for users to define or supply their own.
It's not entirely Excel's fault that CSV doesn't have types. Microsoft didn't invent and promote a new standard, but then why would they? There are better formats out there. I'm sure they would argue that Excel files are a better format, for a start.
And people did make better formats. That's why I think CSV should be consigned to the bin of history.
It is possible to import a CSV into Excel without type conversion. I just tested it two different ways.
While possible, it's not Excel's default way of doing things, nor is it always obvious or easy. Not enough people who use Excel really know how to use it.
Regardless, Excel mangling files via type inference is an Excel problem. It's not the fault of the file formats Excel reads in.