The gist of those examples in TypeScript: EXAMPLE 1: Echo command-line input con...

ori_b · on June 20, 2023

I'm pretty sure your examples behave differently from the example programs.

For example, 'console.log' will pretty-print the data structure rather than print a single space-separated line; the duplicate counter only reads from one file (with a hard-coded name) and will not correctly handle errors and move on to the next file; and your parallel get doesn't seem to print one duration for all requests to complete.

a_wild_dandan · on June 20, 2023

Ergo the weasel word "gist" of the examples! :) Want variadic arguments? Wrap your code in `argv.slice(2).forEach(mySweetParser).catch()` or whatever. My point wasn't to scrupulously transpile Tcl to TS via HN dialog boxes. I don't hate myself that much. My point is just that the real work in those examples can easily be hogged out faster via other languages, so Tcl isn't an outstanding tool for brevity.

vdksbskdb · on June 20, 2023

i frown on anything JS because it always ends up requiring some sort of 'npm install' or, shivers, 'npm install -g'. it's just as bad as running a random 4gb java binary with extra steps.

js will only run on users' machines or inside a vm without even kvm because we've seen plenty of ecosystem hijacks and exploits in the wild already. and you can only fool me MAX_INT.

motogpjimbo · on June 20, 2023

Generally, when you see a package asking you to install it globally with `npm -g i`, you can install it locally with `npm i` and then run it with `npx`. Requesting global installation is either an expression of ego or the author is a Python refugee.

a_wild_dandan · on June 21, 2023

The above code requires no 3P dependencies. With a few notation tweaks, you can paste/pipe those examples right into a (~40 MB) `node` binary and get answers.

Where security is concerned, I can't say that Node.js has been more problematic than any other language I've used, but that's anecdotal and YMMV.

arp242 · on June 20, 2023

> 2. Fast, owing to Node.js/V8's insane optimization level

But you need to convert the TS to JS, and the TS compiler takes like 3 seconds to start on my laptop (orders of magnitude slower than any other compiler I know, although you can speed it up a bit if you disable type checking, but that, well, removes type checking).

a_wild_dandan · on June 20, 2023

You don't need a compiler. The vanilla `node` binary works fine with a few trivial syntactic changes.

It sucks that your compiler's slow! Check out `swc` if you prefer speedy compilation.

arp242 · on June 20, 2023

swc doesn't do type checking, so while useful in some cases it's not really a replacement for tsc.

tom_ · on June 20, 2023

Does node do static type checking?

a_wild_dandan · on June 21, 2023

Nope! `node` is just a binary that executes JS code (e.g. the above code, mod a few syntax choice tweaks). Since JS has no static types, `node` has no types to check.

Other languages can be compiled (perhaps "transpiled") to JS, some of which have static type mechanisms. One of those other languages, TypeScript, has static types. There are tools to compile TypeScript into JS: `swc`, `tsc`, etc. There are also tools to statically type check TypeScript (`deno`, `tsc`, etc).

Personally, I use `swc` for speedy TS compilation. I use my IDE's `tsc` language server for the separate job of highlighting type issues. But everyone has their preferred workflow!

e12e · on June 21, 2023

> But you need to convert the TS to JS

Or use deno? Or bun?

bentinata · on June 20, 2023

I used to think POSIX shell scripts were the best. But ever since deno started supporting npm packages directly, running `deno run url-to-scripts` seems really good now. You get readability, types, and a library ecosystem.

eesmith · on June 20, 2023

        .split("\n")

It looks like this has the same issue as the Tcl code, where the empty string after a terminal "\n" is treated as a line. I think that is an error in the Tcl.

You'll also need a .catch(err => console.log("dup:", err));

And the console.log(value, key) needs a tab separator.

And a loop over the argv input files.

All minor tweaks which don't detract from your conclusion.
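
For what it's worth, here is a rough Python sketch of those tweaks taken together. It is not a port of either the Tcl or the TypeScript version: the names `count_lines` and `format_duplicates` are made up for illustration, and printing only lines seen more than once is my assumption about what the duplicate counter is meant to do. It loops over several input files, reports a failed open and moves on, drops the phantom empty string after a terminal "\n", and puts a tab between count and line.

```python
import sys
from collections import Counter


def count_lines(paths):
    """Count how often each line occurs across all readable files in paths."""
    counts = Counter()
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                text = f.read()
        except OSError as err:
            # Report the problem and carry on with the next file.
            print(f"dup: {err}", file=sys.stderr)
            continue
        lines = text.split("\n")
        # A terminal "\n" leaves one empty string at the end; it is not a line.
        if lines and lines[-1] == "":
            lines.pop()
        counts.update(lines)
    return counts


def format_duplicates(counts):
    """Return 'count<TAB>line' strings for lines seen more than once."""
    return [f"{n}\t{line}" for line, n in counts.items() if n > 1]
```

From a script you would call something like `format_duplicates(count_lines(sys.argv[1:]))` and print each row.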

kazinator · on June 21, 2023

Rather than splitting, we can positively tokenize: recognize lines as items that match lexical tokens. They will include the terminating newlines that we need to trim away.

We need to be tolerant to files that are not terminated by a final newline.

This regex seems to do the trick (under the TXR Lisp tok function):

  1> (tok #/[^\n]*./ "")
  nil
  2> (tok #/[^\n]*./ "no newline at end of file")
  ("no newline at end of file")
  3> (tok #/[^\n]*./ "\nno-newline")
  ("\n" "no-newline")
  4> (tok #/[^\n]*./ "line\nno-newline")
  ("line\n" "no-newline")
  5> (tok #/[^\n]*./ "line\nline\n")
  ("line\n" "line\n")
  6> (tok #/[^\n]*./ "\n")
  ("\n")
  7> (tok #/[^\n]*./ "\n\n")
  ("\n" "\n")
  8> (tok #/[^\n]*./ "\n\n\n")
  ("\n" "\n" "\n")

A line is a (maximally long) sequence of zero or more non-newline characters, followed by a character.

LOL; I don't think I've ever used regex-driven tokenizing to recognize Unix-style lines in a text stream.
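
The same pattern can be tried outside TXR. Below is a minimal Python sketch, assuming `re.findall` with `re.DOTALL` is a fair stand-in for `tok` (Python's `re` is a backtracking engine, so this only claims the same results on these inputs, not the same mechanism). The names `tok_lines` and `trimmed_lines` are hypothetical.

```python
import re

# Zero or more non-newline characters, then any one character.
# re.DOTALL lets "." match "\n" as well.
LINE_TOKEN = re.compile(r"[^\n]*.", re.DOTALL)


def tok_lines(text):
    """Return the lines of text as tokens, each keeping its newline if it has one."""
    return LINE_TOKEN.findall(text)


def trimmed_lines(text):
    """Same tokens with the terminating newline trimmed away."""
    return [line[:-1] if line.endswith("\n") else line for line in tok_lines(text)]
```

On the inputs from the transcript, `tok_lines("")` gives an empty list and `tok_lines("line\nno-newline")` gives `['line\n', 'no-newline']`.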

eesmith · on June 21, 2023

Or we can do like the Python code does and depend either on line iteration of the input file, or use str.splitlines(). ;)
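
A small sketch of both built-in routes, with `io.StringIO` standing in for a real input file (that substitution is mine, purely so the snippet is self-contained):

```python
import io

text = "line\nno-newline"

# str.splitlines() drops the line endings by default ...
without_ends = text.splitlines()
# ... and keeps them when called with keepends=True.
with_ends = text.splitlines(True)

# Iterating over a text file object yields lines with their endings kept.
from_file = list(io.StringIO(text))

# Neither route invents a line for empty input or after a final newline.
empty = "".splitlines()
terminated = "line\nline\n".splitlines()
```

One caveat: `str.splitlines()` also breaks on other line boundaries such as "\r", "\r\n", and "\u2028", so it is not strictly a split on "\n" alone.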

kazinator · on June 21, 2023

All those tools conceal finite automata which have to do the equivalent of that regex pattern.

eesmith · on June 21, 2023

Yes, of course. Lines of text are classic regular grammars so anything parsing them is a DFA at heart.

Is there a take-home message I'm supposed to get from your comment?

My comment was meant to be a light-hearted observation that you're overthinking the problem, since your language of choice likely has this built-in.

Or a boast that Python has a built-in feature for what requires a custom TXR and/or TypeScript solution. ;)

kazinator · on June 21, 2023

Yes, line splitting is built-in. But this is more about split versus tokenize.

I have split and tokenize functions which have the same interface.

Split places the focus on identifying separators: the "negative space" between what we want to keep. When we use that for lines, it has problems with edge cases, like not giving us an empty list of pieces when the string being split is empty.
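
To make the contrast concrete, here is a hedged Python sketch of two functions with the same interface. The names `split` and `tokenize` are illustrative wrappers around `re.split` and `re.findall`, not the actual TXR functions.

```python
import re


def split(sep_pattern, text):
    """Keep what lies between matches of sep_pattern (the separators)."""
    return re.split(sep_pattern, text)


def tokenize(tok_pattern, text):
    """Keep the matches of tok_pattern (the tokens) themselves."""
    return re.findall(tok_pattern, text, re.DOTALL)
```

With these, `split(r"\n", "")` returns a list holding one empty string, while `tokenize(r"[^\n]*.", "")` returns an empty list. Likewise, `split(r"\n", "a\nb\n")` ends with an extra empty piece that `tokenize` never produces.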

eesmith · on June 22, 2023

I find it difficult to be interested in this philosophical difference between the two when the only difference is in how they treat the empty string.

That is, splitting with regex look-behind/look-ahead, like:

  (?<=\n)(?=.)

with a suitable definition of dot, also matches "negative space", and produces the same results as Python's str.splitlines(True), except for the empty string.

  >>> import re
  >>> p = re.compile(r"(?<=\n)(?=.)", re.DOTALL)
  >>> def check(s):
  ...   lines = p.split(s)
  ...   assert lines == s.splitlines(True), lines
  ...   return lines
  ...
  >>> check("no newline at end of file")
  ['no newline at end of file']
  >>> check("\nno-newline")
  ['\n', 'no-newline']
  >>> check("line\nno-newline")
  ['line\n', 'no-newline']
  >>> check("line\nline\n")
  ['line\n', 'line\n']
  >>> check("\n")
  ['\n']
  >>> check("\n\n")
  ['\n', '\n']
  >>> check("\n\n\n")
  ['\n', '\n', '\n']
  >>> check("")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 3, in check
  AssertionError: ['']