
The `ls` example is especially interesting: it made me think about the performance of pipelines. Imagine a folder with 10^6 files. Listing them would take some time; if you only need 10^3 of them, selected by the simplest filter (LIMIT 1000, for example), piping the whole list, in whatever format, to the filter program would be very far from optimal.

What would be a perfect solution is to use iterators instead of lists (the best example I know of is LINQ, but this approach is common enough).
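In shell terms you can already see something like this laziness, since a pipe behaves as a pull-based iterator. A minimal sketch (assuming `seq` and `head` from coreutils): a downstream LIMIT can stop the producer early rather than materializing the whole list first.

```shell
# seq could emit a million lines, but head reads only the first three
# and then exits; the producer is stopped early instead of the whole
# list being generated and then filtered.
seq 1000000 | head -n 3
```

This is lazy only up to the pipe's buffer size, of course; the producer runs a little ahead of the consumer.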

However, now we don't even exchange data between different programs; now we couple them, running at the same time, by a complicated interface! It seems that if we take one more logical step in this direction, we'll get generics.

Don't you start to feel that this kind of ideal shell, piping different functions together in the most correct way possible, already exists and is already installed on your computer? I'm talking about interpreted languages like Python, Perl, JavaScript, Ruby, or whatever else you fancy.

Of course, I don't seriously think that we should abandon shells for language interpreters. It's just that the balance between complexity and universality is a very delicate thing, and a convention, however bad, is good just because it is a standard.

In the meantime, I'm quite happy with the Fish shell.



> It seems that if we take one more logical step in this direction, we'll get generics.

I agree, Unix would have been so much better with generics, ADTs, higher-order functions, monads, higher-kinded types, dependent types and structural pattern matching.

As I always say: "Those who do not understand generics are condemned to invent Unix, poorly."


Yes, what you're saying sarcastically is exactly what I meant. When you try to improve things by going in a more universal, more abstract direction, with data formats that are more "right", that's where you'll end up.

So maybe it's actually good that, among all the modern tools in our arsenal, we have something extremely simple and non-generic, which doesn't follow some abstract principle and just works instead.


If the Unix guys were smart they would have used an Idris REPL as default shell.


Why not Nimlang?


Because Nim is not dependently typed. You can't get sh*t done without dependent types.


Well, Unix handles that nicely enough: when the pipe reader terminates (because it has received its 1000 lines), the writer receives SIGPIPE on its next write to the pipe and is consequently killed (by default, at least). Case closed.
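A quick way to see this (a sketch assuming `yes` and `head` from coreutils):

```shell
# yes would write "y" forever, but head exits after two lines; the
# next write by yes then hits a closed pipe, yes receives SIGPIPE,
# and the pipeline terminates instead of looping forever.
yes | head -n 2
```

In bash, running `echo "${PIPESTATUS[0]}"` right afterwards reports 141, i.e. 128 + 13, signal 13 being SIGPIPE.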


Well, `LIMIT 1000` would of course be the simplest case, so handling it isn't really an achievement. What if I want to list all information about files with some mode? `ls` will still send 10^6 lines, carefully formatted, and almost all of them will be discarded by the simplest filter, which it could have applied before reading all the additional information from disk.


> What if I want to list all information about files with some mode? `ls` will still send 10^6 lines, carefully formatted, and almost all of them will be discarded by the simplest filter, which it could have applied before reading all the additional information from disk.

Indeed. That's probably one of the reasons PowerShell was designed to execute commands in-process: the objects being piped are just object references.

There's still the problem of using native indexes; that hasn't been solved elegantly yet. Reading all file names when wildcards could have eliminated 99% of them using file system metadata seems a waste. Which is probably why `ls` in PowerShell still allows an -Include filter and an -Exclude filter that take wildcards.
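A rough Unix analogue of the wildcard point (a sketch; the shell still has to read every directory entry to expand the glob, but non-matching files are never stat'ed or formatted):

```shell
# Filter after the fact: ls emits every name, grep discards most.
ls | grep '\.txt$'

# Filter up front: the shell expands the glob before ls runs, so ls
# only ever sees (and, with -l, stats) the matching names.
ls -ld -- *.txt
```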


To execute this automatically in an efficient way, we need a query planner that can look at the whole pipeline and decide to use a different set of primitives if the naive/implied ones aren't sufficient. What you're talking about is implemented in relational database management systems, but it requires the query planner to know about the whole system; that is, it's the opposite of Unix: there is a piece of the system that needs to know about everything.

As far as Unix is concerned, the whole list is never stored in memory. Pipes are buffered streams (they are iterators, just with a buffer attached, which makes them more efficient, not less).


> What you're talking about is implemented in relational database management systems, but it requires the query planner to know about the whole system; that is, it's the opposite of Unix

And that's exactly my point: this line of thinking about how to make these things right will lead to an overcomplicated, bloated system.


Speaking of bloat (from another comment in this thread):

> I am really curious about how much parsing and formatting occupy *nix source code. (IIRC 30% of ls),


If you've got a lot of files in a directory and you only want information about a subset of them then you're better off using a command like find:

    find . -maxdepth 1 -perm /g=w -exec ls -ald {} \;
This finds all group-writeable files in the current directory and lists them using ls.
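One caveat (assuming GNU findutils): `-exec ... \;` forks one `ls` per matched file, which gets expensive with many matches. Batching with `+`, or using find's own `-ls`, avoids that:

```shell
# Batch the matched names into as few ls invocations as possible:
find . -maxdepth 1 -perm /g=w -exec ls -ald {} +

# Or skip ls entirely and let find print the long listing itself:
find . -maxdepth 1 -perm /g=w -ls
```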



