But why? Just for people to gron it back into usable form?

Unix tools work best with line-based data. Using some "structured" monstrosity like xml or json forces all other tools to deal with this particular format, thus breaking the orthogonality between programs.



./some_command | jq '.memory_use'

vs:

okay, run it once with head (assuming we have a --dry option...) ah, that's the column. okay cut -d"," -f0, ah whoops it's starting at 1. ah, damn, there's a weird comma in the name/quote. oh weird, that one's null, ah heck.

schemas are cool. JSON extends and builds on the UNIX idea of having things pipe and plumb well together.


But why JSON and not CSV [1]? Most of what the article suggests, like formatting the output as lines and flattening, is how CSV works. CSV is much easier to parse than JSON (use split(",") or the equivalent). A complete record (JSON object) of CSV data can be parsed in a single line, unlike typical JSON. The line-based nature of CSV makes it far more fault-tolerant to broken streams/truncation and more in line with standard Unix conventions.

[1] https://en.wikipedia.org/wiki/Comma-separated_values


Nested objects? JSON holds JSON fine, with arbitrary depth, retaining the ability to pretty-print and parse. Not sure how that’s going to work with CSV.


A) CSV isn't actually a standard, so there's no one universal way of dealing with it (we can get really close).

B) the keys and values are disjointed in csv as separate rows/columns, vs key:value

C) yes flat is better but when you need nested, nested is useful


One of my values contains a literal ,
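
To make the comma problem concrete, here is a minimal Python sketch (standard library only). The sample record is made up; it compares the split(",") approach suggested above with a quote-aware CSV reader.

```python
import csv
import io

# Hypothetical record: the name field contains a literal comma, so it is quoted.
line = 'web-1,"Smith, Jane",1234\n'

# Naive approach: split on every comma, ignoring quoting.
naive_fields = line.rstrip("\n").split(",")

# Quote-aware approach: let the csv module handle the quoted field.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)   # 4 fields, with the name torn in two
print(parsed_fields)  # 3 fields, name intact
```

So CSV can carry the comma, but only if every consumer uses a real CSV parser rather than a one-line split.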


Plain text is easier to reason about, because we are used to processing text. Good textual output, that is, records delimited by spaces, tabs, or another delimiter, is to me all that's needed for most applications.

An object structure is much more complex to use. For example, output that is a set of records can easily be imported into an Excel sheet or an SQL database, or processed line by line, without issues. Processing JSON is not straightforward, and not all programs support JSON.

Finally, JSON can't be processed as a stream, meaning that tools like head, tail, etc. don't work on JSON: you have to read it all into memory, or use JSON Lines, which is not a standard format and which not all parsers support natively.

JSON is good for integrating the program inside other programs (as a subprocess), so having an option to input/output JSON in a program is useful, but to me it's not as useful for interactive shell usage. I prefer to use UNIX tools such as grep, cut, head, tail, etc.


But as the article points out, the advice to use unbuffered JSON lines for commands that are line oriented is well given. Not doing that can really make life sad.
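
A rough Python sketch of what "unbuffered JSON lines" could look like on the producing side. The function name `emit_records` and the sample records are made up for illustration; the point is one compact object per line, flushed as soon as it is written.

```python
import io
import json
import sys

def emit_records(records, out=sys.stdout):
    """Write one compact JSON object per line, flushing after each record."""
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        out.write(json.dumps(record, separators=(",", ":")) + "\n")
        out.flush()  # hand the line to the next pipeline stage right away

# Made-up sample data, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
emit_records([{"name": "a\nb", "size": 1}, {"name": "c", "size": 2}], out=buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

Even the record whose name contains a newline ends up on a single output line, which is what lets line-oriented tools downstream keep working.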


you still need to know the schema of the JSON which could also use a dry run as well. not really sure how just because it's JSON solves that in your mind


Wouldn't the idea be that because it’s JSON there’s a schema somewhere the end user can refer to?


as if the output of the other command also isn't available?


But that output is wildly more variant between applications, especially for cases involving escape sequences, whitespace handling, and so on. All of these are specified in JSON


where is this supposed specification for JSON?

I can put whatever I want in a [{},{},{}] and it will be valid JSON. If it's the first time you've used my thing, you'll have to somehow look up how the JSON is structured. Whether that's from howtousething.com, man thing, or thing --help, you'll still need to find out what thing does. it doesn't matter if it's your thing or my thing, but somehow, thing needs to be able to tell people what to do. there is no universal thing that thing outputs. otherwise, nobody would need yours or my thing, because someone else's thing would already do it.


Newline-delimited JSON is so much more useful to me than weird Unix line-formatted output that I have to parse with pattern matching or regular expressions.

It's basically a line-based format that can represent all of the JSON types and includes support for nested data structures where necessary. What's not to like?
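
As a hedged illustration of the consuming side, this Python sketch reads newline-delimited JSON lazily and takes the first two records, roughly what `head -n 2` would do. The input is made up, and `read_ndjson` is just a name chosen for this example.

```python
import io
import itertools
import json

def read_ndjson(stream):
    """Yield one parsed object per non-empty line, without loading the whole input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Made-up NDJSON input; a real tool would read sys.stdin instead.
stream = io.StringIO(
    '{"pid": 1, "tags": ["init"]}\n'
    '{"pid": 42, "tags": ["web", "prod"]}\n'
    '{"pid": 99, "tags": []}\n'
)

# Equivalent of `head -n 2`: only the first two lines are ever parsed.
first_two = list(itertools.islice(read_ndjson(stream), 2))
print(first_two)
```

Each line is a complete value, so nested lists survive and a truncated stream only loses whole records.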


JSON is not immediately usable by and is cumbersome to parse correctly in a shell.

A simple line-based shell variable name=value format works unreasonably well. For example:

    # ls --shell-var ./thefile
    dir="/home/user" file="thefile" size=1234 ...
    # eval $(ls --shell-var ./thefile); echo $size
    1234

If this had been in shells and cmdline tools since the beginning it would have saved so much work, and the security problems could have been dealt with by an eval that only set variables, adding a prefix/scope to variables, and so on.

Unfortunately it's too late for this and today you'll be using a pipeline to make the json output shell friendly or use some substring hacks that probably work most of the time.
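
For what it's worth, the "eval that only sets variables, with a prefix" idea can be sketched outside the shell. This is a minimal Python version, assuming the hypothetical `--shell-var` output format from the example above; `parse_shell_vars` is a made-up helper, not an existing tool.

```python
import shlex

def parse_shell_vars(line, prefix=""):
    """Parse name=value words into a dict without executing anything."""
    variables = {}
    for word in shlex.split(line):  # honours shell-style quoting
        name, sep, value = word.partition("=")
        if not sep or not name.isidentifier():
            raise ValueError(f"not a name=value word: {word!r}")
        variables[prefix + name] = value
    return variables

# The sample line from the comment above; --shell-var is the commenter's hypothetical flag.
result = parse_shell_vars('dir="/home/user" file="thefile" size=1234', prefix="ls_")
print(result)

# Command substitution is treated as plain text, not run.
print(parse_shell_vars('x="$(date)"'))
```

Because nothing is evaluated, a value like `$(date)` stays a literal string, which is the safety property a plain `eval` lacks.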


That's great for key=value data, but more complex data structures don't work so well in that format, and they do in JSON. "Why would you need to represent data as a complex data structure?" Sometimes attributes are owned by a specific entity, and that entity might own multiple attributes. It might even own other sub-entities. JSON represents that. Key=value does not.


JSON is literally key=value, just nested. Which you can do with shell variables.

The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.

I didn't even recommend shell variable output and made it clear this isn't today a reasonable solution so I'm not sure where this hostility in the replies comes from, but I assume from recognition that it's a more practical solution to reading data within a shell but not wanting that to be so.


> JSON is literally key=value, just nested.

The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.

> The question was "What's not to like [about JSON output from cmdline tools]?" and the answer is that it's cumbersome to read in a shell and all but requires another pipeline stage.

It depends on the intended use for your shell program. If you intend the CLI tool to be used in CI pipelines (e.g., your CLI tool's output is being read by an automated process on a computer) and the data it outputs is more complicated than a simple key=value, JSON is great for that. Your CI program can pipe to jq. You as a human can pipe to jq, though I agree it's somewhat less desirable. That said, just piping to jq without any arguments pretty-prints it for you, which also makes it fairly readable for humans.

> so I'm not sure where this hostility in the replies comes from

You're reading into hostility where there isn't any.


> The nature of being nested, and also containing structures like lists, maps, etc. All of which makes it more complicated than key=value.

These are javascript objects, which are key-value. A list array is just keyed by a number instead of a string. They're functionally exactly the same as name=value except JSON is parsed depth-first whereas shell variables are breadth-first parsing (which is way better from shells).

Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.

> You're reading into hostility where there isn't any.

I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.


> I think "it seems you're determined not to use jq" is pretty hostile since I made no intimation of that at all.

Well, I didn't say that, so I don't know what that other person's feelings or intentions are, to be fair. I personally have no feeling of hostility towards you just because we (apparently) disagree on the usefulness of JSON to represent complex data types, or at least disagree on how often human-usable CLI tools output complex data. But to answer:

> Do you have an example of a CLI tool - intended for human use - that has output so complicated it can't be easily mapped to name=value? I don't think there is one, and it's certainly not common.

kubectl. Which to be fair defaults to output to a table-like format. Though it gets all that data in the table from JSON for you. smartctl is another one, which also defaults to table format. To be honest, I could go on and on if the only qualifier is a CLI tool that emits complex data, not suited for just key=value.

> These are javascript objects, which are key-value. A list array is just keyed by a number instead of a string. They're functionally exactly the same as name=value except JSON is parsed depth-first whereas shell variables are breadth-first parsing (which is way better from shells).

As mentioned before, just because you can compare JSON to key=value, does not mean it's as simple as key=value. It's a data serialization language that builds well on top of simple key=value formats. You're welcome to enjoy other data serialization languages, like YAML, HCL, or Pkl. But none of those are simple key=value formats either. They built the ability to represent more complex structures on top of that.

A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints. That doesn't sound like great UX to me. But to be fair to you, you're not saying that you wish to use key=value to represent complex data. Rather, you're saying there's a general lack of complex data to be found, which I also disagree with.


> But none of those are simple key=value formats either.

What is the difference between:

    { object: { name: value }}
    { object: "{ name: value }"}
    object="name=value"

There's zero difference between any of them except how you parse and process the data.
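
A small Python sketch of what "how you parse and process the data" amounts to for these three forms. The first two are rewritten as valid JSON (keys and strings quoted), which is an adaptation of the shorthand above, not the commenter's exact text.

```python
import json
import shlex

# Valid-JSON versions of the first two forms (keys and strings need quotes).
nested = json.loads('{"object": {"name": "value"}}')
stringified = json.loads('{"object": "{\\"name\\": \\"value\\"}"}')

# The third form, read as a shell-style word.
key, _, flat_value = shlex.split('object="name=value"')[0].partition("=")

# One parse gives the inner value only for the nested form.
print(nested["object"]["name"])

# The other two hand back a string that needs a second, format-specific parse.
print(json.loads(stringified["object"])["name"])
print(flat_value.partition("=")[2])
```

All three carry the same information; they differ in whether the inner structure comes out of the first parse or needs a second step that the consumer has to know about.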

> kubectl. Which to be fair defaults to output to a table-like format.

With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header).

This can easily map to any table, two dimensions, or two levels of data structure without even quoting subvariables like in the example above. So, no, kubectl is not an example at least not how you've described it.
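
A hedged sketch of how such output might be consumed, written in Python for illustration. The block format (lines of name=value words, blocks separated by an empty line) follows the description above; the pod names and the `parse_blocks` helper are made up.

```python
import shlex

def parse_blocks(text):
    """Split text on empty lines; each block becomes a list of per-line dicts."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            pairs = (word.partition("=") for word in shlex.split(line))
            current.append({name: value for name, _, value in pairs})
        elif current:  # an empty line closes the current block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

# Made-up output: two blocks, each row a line of name=value words.
sample = (
    'pod="web-1" status=Running\n'
    'pod="web-2" status=Pending\n'
    '\n'
    'pod="db-1" status=Running\n'
)
blocks = parse_blocks(sample)
print(blocks)
```

This covers two levels (blocks of rows); anything deeper would need quoting conventions inside the values.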


> What is the difference between .. There's zero difference between any of them except how you parse and process the data.

Answered in the previous message... "A data serialization language allows the end-user to specify how they would like to use that data, while allowing them to use standard parsing tools like jq. Cramming complex data into a value string in a key=value format gives end users the same allowance to use that data however they want, while also giving them a chore to handle parsing it in custom ways tailored to just your CLI application, likely in ways that would seem far more brittle than parsing a defined language with well defined constraints."

> With line-based shell-variable output you have a line of variables and you have blocks of lines separated by an empty line (like an HTTP 1 header)...

I would not choose to write application logic that forgoes defined data serialization languages for parsing barely structured strings the way you seem to prefer. But you go about it the way you prefer, I guess. This whole discussion leaves a lot of room for personal opinions. I think we both agree that the other person's opinion here is subjectively the more annoying route to deal with. But that's the way life is sometimes.


That's not your original request though, to use line-based data. It seems you're determined not to use jq but if anything, json output | jq is more the unix way than piping everything through shell vars.


> That's not your original request though, to use line-based data.

It wasn't my request and OP (not me) said "line-based data" is best. The comment I replied to said "Newline-delimited JSON ... a line-based format".

If the only objection you have is "but that's line-based!" then you're in a completely different conversation.

> if anything, json output | jq is more the unix way than piping everything through shell vars.

The unix way is line-based. The comment I replied to is talking about line-based output. Line-based output is the only structure for data universal to unix cmdline tools - even tab/space isn't universal; sending structured non-line-delimited data to a program to unpack it is the least unix-like way to do it.

Also there's no pipe in the shell-variable output scheme I described, whereas "json | jq" is a shell pipeline.


And, the author isn’t suggesting only having JSON output, but adding it as an option for those of us that would make use of it. The plain text should remain as well (and has to or many, many things would break).

On a separate point, I find the JSON much easier to reason about. The wall of text output doesn’t work for my brain - I just can’t see it all. Structuring/nesting with clear delineations makes it far easier for me to grok.


I use jq - which ChatGPT knows inside out, so I can generally get exactly what I want from it with a single prompt.


It's also supported by simdjson [0] (which has a lot of language bindings [1]):

> Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s

[0] https://simdjson.org/

[1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...


Line-oriented formats, like most traditional Unix-style tools, are for human consumption. JSON is bad at that, thus gron.

On the other hand, structured output formats, like JSON, make it easier to consume with other programs. Standard formats have readily-available and commonly used libraries, whereas line parsing tends to be one-off for every program. Whether JSON is the best format for this is certainly debatable, but it is quite ubiquitous, which is a huge advantage. I doubt many folks would propose XML as a general recommendation.

Tools should have both options on their path to maturity -- both human-consumable and computer-consumable output format options.


What orthogonality? “Line-based data” that varies randomly from tool to tool? Using json between programs is perfectly orthogonal.


If you're using UNIX tools to parse it, it sucks, but generally if I'm reading the output of a command, and the command is more than one word, I'm doing the whole thing in Python.

That's real programming, and for that I want type checkers and debuggers and modern syntax and all that. And it's often faster because you're not spinning up subprocesses for each command.


Unix tools work great on newline-delimited, list-like objects, but try them on dict-like objects and it becomes clunky really fast. Parsing `ip` output is a good example where the data naturally lends itself to iface:attrs. Plucking the value you want with something like `.[] | select(.name | startswith("eth")) | .addr` is way easier than pulling this out with grep/awk.
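
The same kind of selection, sketched in Python on made-up data. The field names `name` and `addr` are taken from the comment and are assumptions; real `ip` output may be structured differently.

```python
import json

# Made-up data using the field names from the comment (name, addr);
# real `ip` output may be structured differently.
interfaces = json.loads("""
[
  {"name": "lo",   "addr": "127.0.0.1"},
  {"name": "eth0", "addr": "192.0.2.10"},
  {"name": "eth1", "addr": "192.0.2.11"}
]
""")

# Select by key and prefix instead of by column position or regex.
eth_addrs = [iface["addr"] for iface in interfaces if iface["name"].startswith("eth")]
print(eth_addrs)
```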


You probably already know this but just in case: You can specify an interface when using ip, e.g. ip addr show dev enp2s0


I’d rather use JMESPath (or jq, even though it’s messier) to restructure my data than some mixture of awk, sed, cut, etc.

Line output often needs restructuring in my experience, though JSON will always need it.


Nonsense. Using a structured format means all tools are using the same format, instead of every tool making up their own (usually broken) system.

Look at how many flags `ls` has to feed filenames into other programs without breaking.

Also, using a proper format like JSON makes it actually robust. Most ad hoc pipelines break if you so much as put a space in a filename.
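
A quick Python sketch of the space-in-a-filename failure, using made-up filenames. It contrasts whitespace-separated output with a JSON round trip; it is an illustration of the failure mode, not a claim about any particular tool.

```python
import json

# Made-up filenames, one containing a space and one containing a newline.
filenames = ["report.txt", "my notes.txt", "odd\nname.txt"]

# Ad hoc approach: print names separated by whitespace, split on the way back in.
ad_hoc = " ".join(filenames).split()

# JSON approach: the encoder escapes what it must, the decoder restores it.
round_tripped = json.loads(json.dumps(filenames))

print(len(ad_hoc), len(round_tripped))
```

Three names go in; the whitespace pipeline sees five, while the JSON round trip returns the original three unchanged.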


The caller can just specify the output format that they want, and ever since I switched to nushell I pretty much always want json.



