But why? Just for people to gron it back into usable form? Unix tools work best ...

pluto_modadic · on April 20, 2024

./some_command | jq '.memory_use'

vs:

okay, run it once with head (assuming we have a --dry option...) ah, that's the column. okay cut -d"," -f0, ah whoops it's starting at 1. ah, damn, there's a weird comma in the name/quote. oh weird, that one's null, ah heck.

schemas are cool. JSON extends and builds on the UNIX idea of having things pipe and plumb well together.

hgs3 · on April 20, 2024

But why JSON and not CSV [1]? Most of what the article suggests, like formatting the output as lines and flattening, is how CSV works. CSV is much easier to parse than JSON (use split(",") or the equivalent). A complete record (JSON object) of CSV data can be parsed in a single line, unlike typical JSON. The line-based nature of CSV makes it far more fault tolerant to broken streams/truncation and more in-line with standard Unix conventions.

[1] https://en.wikipedia.org/wiki/Comma-separated_values

photonthug · on April 21, 2024

Nested objects? JSON holds json fine, with arbitrary depth, retaining ability to print pretty and parse. Not sure how that’s going to work with csv

kortex · on April 21, 2024

A) CSV isn't actually a standard, so there's no one universal way of dealing with it (we can get really close).

B) the keys and values are disjointed in csv as separate rows/columns, vs key:value

C) yes flat is better but when you need nested, nested is useful

zulu-inuoe · on April 21, 2024

One of my values contains a literal ,
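
A minimal Python sketch of that failure mode, using a made-up record (the name and values are illustrative only). It compares a naive `split(",")` with the standard library's `csv` reader, which honors quoting:

```python
import csv
import io

# Hypothetical record: the first field contains a literal comma and is quoted.
line = '"Smith, Jane",42,admin\n'

# Naive splitting treats the comma inside the quotes as a separator.
naive = line.strip().split(",")

# The csv module honors the quoting and yields three fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"Smith', ' Jane"', '42', 'admin']
print(parsed)  # ['Smith, Jane', '42', 'admin']
```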

alerighi · on April 20, 2024

Plain text is easier to reason about, because we are used to processing text. Good textual output, that is, records delimited by spaces, tabs, or another delimiter, is to me all that's needed for most applications.

An object structure is much more complex to use. For example, output that is a set of records can easily be imported into an Excel sheet or an SQL database, or processed line by line, without issues. Processing JSON is not straightforward, and not all programs support JSON.

Finally, JSON can't be processed as a stream, meaning that tools like head, tail, etc. don't work on it: you have to read it all into memory, or use JSON Lines, which is not a standard format and which not all parsers support natively.

JSON is good for integrating the program inside other programs (as a subprocess), so having an option to input/output JSON in a program is useful, but to me it's not as useful for interactive shell usage. I prefer to use UNIX tools such as grep, cut, head, tail, etc.

mlhpdx · on April 20, 2024

But as the article points out, the advice to use unbuffered JSON lines for commands that are line oriented is well given. Not doing that can really make life sad.

dylan604 · on April 20, 2024

you still need to know the schema of the JSON which could also use a dry run as well. not really sure how just because it's JSON solves that in your mind

Jcowell · on April 20, 2024

Wouldn't the idea be that because it’s JSON there’s a schema somewhere the end user can refer to?

dylan604 · on April 20, 2024

as if the output of the other command also isn't available?

zulu-inuoe · on April 21, 2024

But that output is wildly more variant between applications, especially for cases involving escape sequences, whitespace handling, and so on. All of these are specified in JSON

dylan604 · on April 21, 2024

where is this supposed specification for JSON?

I can make whatever I want in a [{},{},{}] and it will be valid JSON. If it's the first time you've used my thing, you'll have to somehow look up how the JSON is structured. Whether that's from howtousething.com, man thing, or thing --help, you'll still need to find out what thing does. it doesn't matter if it's your thing or my thing, but somehow, thing needs to be able to tell people what to do. there is no universal thing that thing outputs. otherwise, nobody would need your thing or my thing, because someone else's thing would already do it.

simonw · on April 20, 2024

Newline-delimited JSON is so much more useful to me than weird Unix line-formatted output that I have to parse with pattern matching or regular expressions.

It's basically a line-based format that can represent all of the JSON types and includes support for nested data structures where necessary. What's not to like?
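
A rough Python sketch of what that looks like in practice, assuming made-up records and a hypothetical `read_ndjson` helper. Each line is a complete JSON document, so a consumer can stop early (the equivalent of `head -n 2`) without parsing the rest:

```python
import io
import json
from itertools import islice

# Hypothetical records; one carries a nested object.
records = [
    {"name": "a", "size": 1},
    {"name": "b", "size": 2, "tags": {"env": "prod"}},
    {"name": "c", "size": 3},
]

# One compact JSON document per line (json.dumps emits no raw newlines).
stream = io.StringIO("".join(json.dumps(r) + "\n" for r in records))

def read_ndjson(lines):
    """Yield one parsed record per non-empty line, lazily."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

# Like `head -n 2`: only the first two lines are ever parsed.
first_two = list(islice(read_ndjson(stream), 2))
print(first_two[1]["tags"]["env"])  # prod
```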

bsdetector · on April 20, 2024

JSON is not immediately usable by and is cumbersome to parse correctly in a shell.

A simple line-based shell variable name=value format works unreasonably well. For example:

    # ls --shell-var ./thefile
    dir="/home/user" file="thefile" size=1234 ...
    # eval $(ls --shell-var ./thefile); echo $size
    1234

If this had been in shells and cmdline tools since the beginning it would have saved so much work, and the security problems could have been dealt with by an eval that only set variables, adding a prefix/scope to variables, and so on.

Unfortunately it's too late for this and today you'll be using a pipeline to make the json output shell friendly or use some substring hacks that probably work most of the time.
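
For illustration, a minimal Python sketch of reading such a line without `eval`, assuming the hypothetical name=value format shown above. `parse_shell_vars` is an invented helper; it relies on `shlex.split` for shell-style quoting and only builds a dictionary, so nothing is executed:

```python
import shlex

def parse_shell_vars(line):
    """Parse a line of name=value pairs using shell quoting rules, without eval."""
    result = {}
    for token in shlex.split(line):
        name, sep, value = token.partition("=")
        if not sep or not name.isidentifier():
            raise ValueError(f"not a name=value pair: {token!r}")
        result[name] = value
    return result

# Sample line modeled on the hypothetical output format; values stay strings.
record = parse_shell_vars('dir="/home/user" file="the file" size=1234')
print(record["size"])  # 1234
```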

starttoaster · on April 20, 2024

That's great for key=value data, but more complex data structures don't work so well in that format, JSON does. "Why would you need to represent data as a complex data structure?" Sometimes attributes are owned by a specific entity, and that entity might own multiple attributes. It might even own other sub-entities. JSON represents that. Key=value does not.

bsdetector · on April 20, 2024

JSON is literally key=value, just nested. Which you can do with shell variables.

The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.

I didn't even recommend shell variable output and made it clear this isn't today a reasonable solution so I'm not sure where this hostility in the replies comes from, but I assume from recognition that it's a more practical solution to reading data within a shell but not wanting that to be so.

starttoaster · on April 20, 2024

> JSON is literally key=value, just nested.

The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.

> The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.

It depends on the intended use for your shell program. If you intend the CLI tool to be used in CI pipelines (eg. your CLI tool's output is being read by an automated process on a computer) and the data it outputs is more complicated than a simple key=value, JSON is great for that. Your CI program can pipe to jq. You as a human can pipe to jq, though I agree it's somewhat less desirable. Though just piping to jq without any arguments pretty prints it for you which also makes it fairly readable for humans.

> so I'm not sure where this hostility in the replies comes from

You're reading into hostility where there isn't any.

bsdetector · on April 20, 2024

> The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.

These are JavaScript objects, which are key-value. A list array is just keyed by a number instead of a string. They're functionally exactly the same as name=value except JSON is parsed depth-first whereas shell variables are breadth-first parsing (which is way better for shells).
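
The mapping described here, with array elements keyed by index, can be sketched as a small flattener. This is only an illustration: the `flatten` helper, the dotted path syntax, and the sample document are assumptions, not an existing tool's format, and empty objects or arrays produce no output in this version:

```python
import json

def flatten(value, prefix=""):
    """Yield (path, leaf) pairs, walking nested objects and arrays depth-first."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value

# Hypothetical document with one nested object and one array.
doc = json.loads('{"object": {"name": "value"}, "items": [10, 20]}')
pairs = [f"{path}={json.dumps(leaf)}" for path, leaf in flatten(doc)]
print("\n".join(pairs))
# object.name="value"
# items[0]=10
# items[1]=20
```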

Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.

> You're reading into hostility where there isn't any.

I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.

starttoaster · on April 20, 2024

> I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.

Well, I didn't say that, so I don't know what that other person's feelings or intentions are, to be fair. I personally have no feeling of hostility towards you just because we (apparently) disagree on the usefulness of JSON to represent complex data types, or at least disagree on how often human-usable CLI tools output complex data. But to answer:

> Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.

kubectl. Which to be fair defaults to output to a table-like format. Though it gets all that data in the table from JSON for you. smartctl is another one, which also defaults to table format. To be honest, I could go on and on if the only qualifier is a CLI tool that emits complex data, not suited for just key=value.

> These are JavaScript objects, which are key-value. A list array is just keyed by a number instead of a string. They're functionally exactly the same as name=value except JSON is parsed depth-first whereas shell variables are breadth-first parsing (which is way better for shells).

As mentioned before, just because you can compare JSON to key=value, does not mean it's as simple as key=value. It's a data serialization language that builds well on top of simple key=value formats. You're welcome to enjoy other data serialization languages, like yaml, HCL, or PKL. But none of those are simple key=value formats either. They built the ability to represent more complex structures on top of that.

A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints. That doesn't sound like great UX to me. But to be fair to you, you're not saying that you wish to use key=value to represent complex data. Rather, you're saying there's a general lack of complex data to be found, which I also disagree with.

bsdetector · on April 20, 2024

> But none of those are simple key=value formats either.

What is the difference between:

    { object: { name: value }}
    { object: "{ name: value }"}
    object="name=value"

There's zero difference between any of them except how you parse and process the data.

> kubectl. Which to be fair defaults to output to a table-like format.

With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header).

This can easily map to any table, two dimensions, or two levels of data structure without even quoting subvariables like in the example above. So, no, kubectl is not an example at least not how you've described it.
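
One possible reading of that layout, sketched in Python under stated assumptions: each non-empty line is a row of name=value pairs, and an empty line ends a block. The `parse_blocks` helper and the sample data are invented for illustration and are not the output of any real tool:

```python
import shlex

def parse_blocks(text):
    """Split text into blank-line-separated blocks; each line becomes a dict."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            pairs = (token.partition("=") for token in shlex.split(line))
            current.append({name: value for name, _, value in pairs})
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

# Made-up output: two blocks, each a small table of rows.
sample = (
    "name=web-1 status=Running\n"
    "name=web-2 status=Pending\n"
    "\n"
    "name=db-1 status=Running\n"
)
blocks = parse_blocks(sample)
print(len(blocks), blocks[0][1]["status"])  # 2 Pending
```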

starttoaster · on April 20, 2024

> What is the difference between .. There's zero difference between any of them except how you parse and process the data.

Answered in the previous message... "A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints."

> With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header)...

I would not choose to write application logic that forgoes defined data serialization languages for parsing barely structured strings the way you seem to prefer. But you go about it the way you prefer, I guess. This whole discussion leaves a lot of room for personal opinions. I think we both agree that the other person's opinion here is subjectively the more annoying route to deal with. But that's the way life is sometimes.

kitd · on April 20, 2024

That's not your original request though, to use line-based data. It seems you're determined not to use jq but if anything, json output | jq is more the unix way than piping everything through shell vars.

bsdetector · on April 20, 2024

> That's not your original request though, to use line-based data.

It wasn't my request and OP (not me) said "line-based data" is best. The comment I replied to said "Newline-delimited JSON ... a line-based format".

If the only objection you have is "but that's line-based!" then you're in a completely different conversation.

> if anything, json output | jq is more the unix way than piping everything through shell vars.

The unix way is line-based. The comment I replied to is talking about line-based output. Line-based output is the only structure for data universal to unix cmdline tools - even tab/space isn't universal; sending structured non-line-delimited data to a program to unpack it is the least unix-like way to do it.

Also there's no pipe in the shell-variable output scheme I described, whereas "json | jq" is a shell pipeline.

mlhpdx · on April 20, 2024

And, the author isn’t suggesting only having JSON output, but adding it as an option for those of us that would make use of it. The plain text should remain as well (and has to or many, many things would break).

On a separate point, I find the JSON much easier to reason about. The wall of text output doesn’t work for my brain - I just can’t see it all. Structuring/nesting with clear delineations makes it far easier for me to grok.

simonw · on April 20, 2024

I use jq - which ChatGPT knows inside out, so I can generally get exactly what I want from it with a single prompt.

qwertox · on April 20, 2024

It's also supported by simdjson [0] (which has a lot of language bindings [1]):

> Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s

[0] https://simdjson.org/

[1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...

placatedmayhem · on April 20, 2024

Line-oriented formats, like most traditional Unix-style tools, are for human consumption. JSON is bad at that, thus gron.

On the other hand, structured output formats, like JSON, make it easier to consume with other programs. Standard formats have readily-available and commonly used libraries, whereas line parsing tends to be one-off for every program. Whether JSON is the best format for this is certainly debatable, but it is quite ubiquitous, which is a huge advantage. I doubt many folks would propose XML as a general recommendation.

Tools should have both options on their path to maturity -- both human-consumable and computer-consumable output format options.

keybored · on April 20, 2024

What orthogonality? “Line-based data” that varies randomly from tool to tool? Using json between programs is perfectly orthogonal.

eternityforest · on April 20, 2024

If you're using UNIX tools to parse it, it sucks, but generally if I'm reading the output of a command, and the command is more than one word, I'm doing the whole thing in Python.

That's real programming, and for that I want type checkers and debuggers and modern syntax and all that. And the performance is often faster because you're not spinning up subprocesses for each command.

Spivak · on April 20, 2024

Unix tools work great on newline-delimited, list-like objects, but try them on dict-like objects and it becomes clunky really fast. Parsing `ip` output is a good example where the data naturally lends itself to iface:attrs. Plucking the value you want with `.[] | select(.name | startswith("eth")) | .addr` is way easier than pulling it out with grep/awk.
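
The same selection can be sketched in Python. The data below is made up, and the `name` and `addr` field names simply follow the expression above; they are assumptions, not the actual schema of `ip`'s JSON output:

```python
import json

# Made-up interface list; field names are illustrative assumptions.
raw = """
[
  {"name": "lo",   "addr": "127.0.0.1"},
  {"name": "eth0", "addr": "192.0.2.10"},
  {"name": "eth1", "addr": "192.0.2.11"}
]
"""

interfaces = json.loads(raw)

# Keep the address of every interface whose name starts with "eth".
eth_addrs = [i["addr"] for i in interfaces if i["name"].startswith("eth")]
print(eth_addrs)  # ['192.0.2.10', '192.0.2.11']
```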

BenjiWiebe · on April 20, 2024

You probably already know this but just in case: You can specify an interface when using ip, e.g. ip addr show dev enp2s0

janderland · on April 20, 2024

I’d rather use JMESPath (or jq even tho it’s messier) to restructure my data rather than some mixture of awk, sed, cut, etc.

Line output often needs restructuring in my experience, though JSON will always need it.

IshKebab · on April 20, 2024

Nonsense. Using a structured format means all tools are using the same format, instead of every tool making up their own (usually broken) system.

Look at how many flags `ls` has to feed filenames into other programs without breaking.

Also, using a proper format like JSON makes it actually robust. Most ad hoc pipelines break if you so much as put a space in a filename.
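
A small Python sketch of the space-in-a-filename problem, with hypothetical file names. It contrasts a whitespace-separated ad hoc format with a JSON round trip, where quoting and escaping are part of the format:

```python
import json

# Hypothetical file names, one containing a space.
names = ["report.txt", "my notes.txt"]

# Ad hoc format: names joined by spaces, then split on whitespace downstream.
ad_hoc = " ".join(names).split()

# JSON: the round trip returns exactly the original list.
via_json = json.loads(json.dumps(names))

print(ad_hoc)    # ['report.txt', 'my', 'notes.txt']
print(via_json)  # ['report.txt', 'my notes.txt']
```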

__MatrixMan__ · on April 20, 2024

The caller can just specify the output format that they want, and ever since I switched to nushell I pretty much always want json.