That is correct: that approach only works when the data is simple enough that delimiter characters are never embedded inside a quoted field. I wrote a simple (and fast) utility to ensure that CSV files are handled properly by all the standard UNIX command line data tools. If you like using awk, sed, cut, tr, etc., then it may be useful to you.
There is a small program I wrote called csvquote[1] that can be used to sanitize input for awk, so that awk can rely on delimiter characters (commas) always meaning delimiters. The results from awk then get piped through the same program at the end to restore the commas inside the field values.
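The sanitize/restore idea can be sketched in a few lines of Python. This is only a rough illustration of the technique, not the actual csvquote implementation (which is written in C); the placeholder characters and function names here are my own:

```python
# Nonprinting placeholders stand in for delimiters that appear
# inside quoted fields, so downstream tools never see them as
# structural commas or newlines.
SUB_COMMA = "\x1f"
SUB_NEWLINE = "\x1e"

def sanitize(text):
    """Replace commas and newlines that occur inside quoted fields."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == ',' and in_quotes:
            out.append(SUB_COMMA)
        elif ch == '\n' and in_quotes:
            out.append(SUB_NEWLINE)
        else:
            out.append(ch)
    return ''.join(out)

def restore(text):
    """Undo the substitution after the data has been processed."""
    return text.replace(SUB_COMMA, ',').replace(SUB_NEWLINE, '\n')

row = 'id,"name, with comma",value\n'
clean = sanitize(row)
assert ',' not in clean.split('"')[1]  # no raw comma inside the quoted field
assert restore(clean) == row           # the round trip is lossless
```

Note that doubled quotes (`""` escapes inside a quoted field) happen to work with this simple quote-toggling approach, since no delimiter can appear between the two quote characters.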
Good idea! Looks similar to something I wrote called csvquote https://github.com/dbro/csvquote , which enables awk and other command line text tools to work with CSV data that contains embedded commas and newlines.
To simplify working with CSV data using command line tools, I wrote csvquote ( https://github.com/dbro/csvquote ). There are some examples on that page that show how it works with awk, cut, sed, etc.
While not exactly what you asked for, I wrote something similar called csvquote ( https://github.com/dbro/csvquote ) which transforms "typical" CSV or TSV data to use the ASCII characters for field separators and record separators, and also allows for a reverse transform back to regular CSV or TSV files.
It is handy in UNIX command pipelines, letting each command handle data that includes commas and newlines inside fields. The typical pattern uses csvquote twice in the pipeline: first at the beginning, to make the transformation to ASCII separators, and again at the end, to undo the transformation so that the separators are human-readable.
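A minimal Python sketch of that separator transform (illustrative only; the helper names are mine, and the real tool handles edge cases this glosses over):

```python
import csv
import io

US, RS = "\x1f", "\x1e"  # ASCII unit separator and record separator

def to_ascii_seps(csv_text):
    """Re-emit CSV with ASCII unit/record separators. Field contents
    are kept verbatim, and no quoting is needed, because the separator
    characters are assumed never to occur in the data itself."""
    rows = csv.reader(io.StringIO(csv_text))
    return RS.join(US.join(row) for row in rows) + RS

def to_csv(sep_text):
    """Reverse transform: back to regular quoted CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for record in sep_text.rstrip(RS).split(RS):
        writer.writerow(record.split(US))
    return buf.getvalue()

data = 'a,"b,c"\n"d\ne",f\n'
encoded = to_ascii_seps(data)
assert encoded.split(RS)[0].split(US) == ['a', 'b,c']  # embedded comma intact
assert to_csv(encoded) == data                         # lossless round trip
```

Splitting on dedicated separator characters is what lets tools like cut and awk treat every occurrence as structural, with no quote parsing required.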
It doesn't yet have any built-in awareness of UTF-8 or other multi-byte encodings, but I'd be happy to receive a pull request if that's something you're able to offer.
You might want to check out https://github.com/dbro/csvquote which helps awk and other text tools handle CSV files that have quoted strings as values.
If I'm going to preprocess before invoking awk, I think I'd rather switch the separators to the ASCII record/unit separator characters than replace the content of the actual fields.
(1) I filter on column content using regex and dealing with a sub character adds complexity.
(2) Many of my columns are free-form text containing commas, carriage returns, newlines, tabs, vertical tabs, and file separators (0x1C). Occasionally, text is in UCS-2/UTF-16 or uses UTF-8 and foreign characters (a non-trivial quantity of the text I process is in French, for example).
(If you read between the lines here, some columns can contain MLLP-encoded HL7 messages, others contain free-form text and I'm in the medical field.)
Here's another suggestion for the criticism section (which is a good idea for any open-minded project to include):
Instead of using a separate set of tools to work with CSV data, use an adapter to allow existing tools to work around CSV's quirky quoting methods.
csvquote (https://github.com/dbro/csvquote) enables the regular UNIX command line text toolset (like cut, wc, awk, etc.) to work properly with CSV data.
I do think there is room for both tools though. One of the cooler things I did with `xsv` was implement a very basic form of indexing. It's just a sequence of byte offsets where records start in some CSV data. Once you have that, you can do things like process the data in parallel, or slice out records instantly regardless of where they occur in the file.
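A toy sketch of that kind of offset index in Python (my own illustration, not xsv's actual implementation or on-disk index format):

```python
def index_offsets(data: bytes):
    """Return the byte offsets at which each CSV record starts.
    Tracks quote state so that newlines inside quoted fields are
    not mistaken for record boundaries."""
    offsets = [0]
    in_quotes = False
    for i, b in enumerate(data):
        if b == ord('"'):
            in_quotes = not in_quotes
        elif b == ord('\n') and not in_quotes and i + 1 < len(data):
            offsets.append(i + 1)
    return offsets

def slice_record(data: bytes, offsets, n):
    """O(1) jump to record n, no matter where it sits in the file."""
    start = offsets[n]
    end = offsets[n + 1] if n + 1 < len(offsets) else len(data)
    return data[start:end].rstrip(b'\n')

data = b'a,b\n"x\ny",z\nc,d\n'
offs = index_offsets(data)
assert offs == [0, 4, 12]
assert slice_record(data, offs, 1) == b'"x\ny",z'  # quoted newline handled
```

With offsets in hand, splitting the offset list into chunks gives each worker an independent byte range to parse, which is what makes parallel processing straightforward.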
csvquote: https://github.com/dbro/csvquote
Especially for use with existing shell text processing tools, e.g. cut, sort, wc.