
> I have operated upon it with a non-CSV parser, but for my way of thinking, the file itself is still a CSV. You disagree here, because in order for my use of the parser to be correct, I can't possibly have operated upon a CSV file, I must have operated on a CSV-like file.

Not quite my opinion. The file is still a CSV file, but IMO the parser is not a CSV parser unless it supports the full spec. The file is still CSV, and it happens to be compatible with the incomplete parser because it does not use any "harder" CSV features.
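A minimal Python sketch of that distinction. It assumes the standard-library `csv` module as the "full" parser, since it handles quoted fields and escaped quotes. The `naive_split` helper is hypothetical and stands in for the incomplete one. The two agree on a file that avoids the harder features and diverge as soon as a field is quoted:

```python
import csv
import io

def naive_split(text: str, delimiter: str = ",") -> list[list[str]]:
    # Hypothetical "CSV-like" parser: split into lines, then on the delimiter.
    # It ignores quoting, doubled quotes, and newlines inside fields.
    return [line.split(delimiter) for line in text.splitlines()]

def full_parse(text: str, delimiter: str = ",") -> list[list[str]]:
    # The stdlib reader understands quoted fields.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

simple = "name,city\nAda,London\n"              # no "harder" features used
quoted = 'name,city\n"Lovelace, Ada",London\n'  # quoted field with a comma

print(naive_split(simple) == full_parse(simple))  # True
print(naive_split(quoted) == full_parse(quoted))  # False
print(naive_split(quoted)[1])  # ['"Lovelace', ' Ada"', 'London']
```

Both inputs are CSV; only the first happens to be compatible with the naive parser.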

Let's say we have a website that uses UTF-8 (declared via the Content-Type charset and similar). Some pages on this website use only ASCII; others use higher code points encoded as UTF-8.

I can parse some of these pages with an ASCII decoder, but that does not make my ASCII decoder a UTF-8 decoder, since it only handles the small subset of UTF-8 that overlaps with ASCII. In this analogy, your CSV-lite would be ASCII and CSV would be UTF-8.
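The same analogy in Python, as a rough illustration. `decodes_as` is a hypothetical helper and the sample strings are made up. Both byte strings are valid UTF-8, but only the one whose bytes all stay below 0x80 also decodes as ASCII:

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

ascii_page = "Hello, world".encode("utf-8")    # UTF-8 that happens to be pure ASCII
utf8_page = "Grüße, världen".encode("utf-8")   # UTF-8 with higher code points

for name, page in [("ascii_page", ascii_page), ("utf8_page", utf8_page)]:
    print(name, "utf-8:", decodes_as(page, "utf-8"), "ascii:", decodes_as(page, "ascii"))
```

The ASCII decoder "works" on the first page only because that page never uses anything outside the shared subset.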



I completely understand the concept, I really do. I'm just struggling to work out where the original disagreement came from. I think it's completely my fault for not articulating myself properly, so thank you for your patience. I'm going to annotate my original comment here with clarification on what I originally meant:

Because, while you always _should_ implement the proper escaping [in order to extract the information you need from CSV/TSV files that you have received from an external source that produces correctly-formatted CSV/TSV files], that takes [human effort]. Not a large amount of [human effort], but more than zero. In many cases [the data stored in CSV/TSV representation] doesn't contain commas or tabs, so you can [extract the data from the file] the super simple way [by implementing a naive CSV/TSV-like parser that just happens to work for a subset of CSV files that don't contain escaping] and get back that time. [In doing this, you have extracted the information you need from the file, but you have not done the work to implement a real CSV/TSV parser. You have implemented a parser for a mystery format and misused it on a CSV/TSV file, but it happened to work and you got the data you needed]. There are more cases where data [in the CSV/TSV file you got from an external source] is tabless than commaless, so [if the external source happens to provide you a TSV file instead of a CSV file, this] affords you more opportunities to [misuse your TSV-like parser on the TSV file and still get the data you need, giving you a] quick and dirty timesave when you need [an immediate] solution [where you lack the time to get a real CSV/TSV parser and can tolerate the inherent lack of safety in using a CSV/TSV-like parser on a CSV/TSV file].
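A rough sketch of the tabless-vs-commaless point. It assumes made-up records and a hypothetical `naive_split` helper. The standard `csv` writer produces both files using its default minimal quoting. The note contains a comma but no tab, so the CSV output has to quote it and the TSV output does not. As a result, only the naive TSV split recovers the original rows:

```python
import csv
import io

# Made-up records: free text often contains commas, rarely tabs.
rows = [["id", "note"], ["1", "late, but complete"], ["2", "ok"]]

def write(rows: list[list[str]], delimiter: str) -> str:
    # Default QUOTE_MINIMAL: only fields containing the delimiter,
    # the quote character, or line-terminator characters get quoted.
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def naive_split(text: str, delimiter: str = ",") -> list[list[str]]:
    # Hypothetical quick-and-dirty parser with no quote handling.
    return [line.split(delimiter) for line in text.splitlines()]

csv_text = write(rows, ",")   # the comma-containing note gets quoted
tsv_text = write(rows, "\t")  # no tabs in the data, so nothing is quoted

print(naive_split(tsv_text, "\t") == rows)  # True
print(naive_split(csv_text, ",") == rows)   # False
```

The naive TSV split only "works" because this particular data never triggers quoting. A single tab or quote in a field would break it just like the CSV case.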


> I think it's completely my fault for not articulating myself properly, thank you for your patience.

Not at all, and thank you for your patience and engaging this deep!

I think the disagreement came from different people. I initially tried to question why /u/guidedlight thought a certain delimiter would be easier/simpler just because it was less common, and then we went down a rabbit hole of "what is a CSV/TSV".

I agree with you, and I think there are many use cases for not-quite-full CSV/TSV parsers/encoders. My main objection was to calling an implementation a CSV/TSV implementation when it clearly skips large parts of the spec (even though it can of course parse plenty of files that don't use those parts). I'd like to call those simpler formats something other than CSV/TSV, but that ship has sailed.

So it seems like we are on the same page.

