Parsing R code: Freedom of expression is not always a good idea (2012) (shape-of-code.com)
50 points by pxeger1 on June 14, 2022 | hide | past | favorite | 36 comments


This seems very much like an outsider's critique. It's super interesting, but if the goal is actually improving R I'm almost sure a better approach would be to lend a hand with development, fix some bugs, gain respect and push for incremental - or even less-than-incremental - changes. Core R are definitely trying to encourage new developers to join in, so the opportunity is there.


> This seems very much like an outsider's critique

Spot on! This same sentiment was expressed very well in a paper evaluating the R language [1]:

"This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular."

It would be nice to see a post about R that analyses _why_ it's so hugely popular with data scientists. It's easy to write "R doesn't do what we computer scientists think languages should do, so it's no good". It's harder to analyse what R gets right (for its domain) that other languages get wrong. Personally, I think it's not just that R has the best data-handling libraries (ggplot2, plyr, data.table), it's its "unlikely linguistic cocktail" that is perfectly suited for data exploration.

I think that maybe we hear the views of software engineers who get handed a messy R script and are asked to make it run in production, or make it run on big datasets, and so they only ever see the downsides of R. R wasn't designed to make life easy for production! It's designed to make it easy to explore datasets, which often means one-off code, 99% of which you run and then delete because your hypothesis about the data was wrong.

[1] https://www.researchgate.net/publication/240040602_Evaluatin...


I wholly disagree with this sentiment. I have seen, time and again, awful language design choices. Granted, they might have been the best option at the time, or maybe no one could think of a better method.

But things change with time. And quite often, the response to suggestions ends up being "it's too late to change it", "it's good enough", or any number of comments to minimize the obvious negatives. Too often people get stuck in "local maxima", thinking their way is best, until years later when time has proven them wrong.


I didn't say the guy was wrong. I said he wasn't going to change anything by standing outside and complaining. I think your argument supports that idea. Maybe in an ideal world people would be very open to drive-by critique. In reality, not so.


From experience, I find the better option is to just move on.

You can only shout at a wall for so long, before you realize that you're wasting your time. So you make your concerns heard (which this guy did), and you move on. If people want to take the advice to heart, great, but it's not likely. Plenty of other languages to use, and plenty of other software to use.


APL grammar has its flaws, but one thing it gets really _right_ is operator precedence. In an expression like this:

    A op B op C op D
No matter what _op_ is, it parses right-to-left:

    A op (B op (C op D))
Much nicer than having to learn a subtly-different version of C's already convoluted operator precedence[0] with each new language that comes out.

[0]: https://en.cppreference.com/w/c/language/operator_precedence
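For contrast, here's a quick Python sketch of the inconsistency the APL rule avoids: Python (like C) mixes associativities, so subtraction groups left-to-right while exponentiation groups right-to-left, and you simply have to memorize which is which.

```python
# Subtraction is left-associative: parsed as (10 - 3) - 2
print(10 - 3 - 2)    # 5

# Exponentiation is right-associative: parsed as 2 ** (3 ** 2)
print(2 ** 3 ** 2)   # 512, not (2 ** 3) ** 2 == 64

# Under APL's uniform right-to-left rule, 10 - 3 - 2 would
# instead mean 10 - (3 - 2):
print(10 - (3 - 2))  # 9
```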


I genuinely really like it - strict left-to-right or strict right-to-left is so much more predictable, at least for arithmetic-style expressions (I might still prefer specific precedences for =/==/etc.).

However at this point something PEMDAS-like seems to be substantially easier to understand for most people since it's AFAICT the common rule taught in (high school) mathematics these days.

Trade-offs all the way down, as ever.


I've just started reading about APL, so maybe I'm wrong, but I think there is one caveat: you need to know which operators are monadic, and which are dyadic, because if op is monadic, then it may be:

  A (op (B (op (C (op D)))))


Awesome stuff.

As someone who occasionally writes parsers for real languages, and as someone who was really into R in university, I am happy that I stepped back on this one. ;-)

R's syntax belongs in the same category as Ruby and JavaScript:

Too much freedom of expression makes the meaning of a program highly dependent on its execution. It is hard to say concise things about a program without running it.

It is the murky side of (untyped) Lisp, if you ask me.


Freedom of expression helps you think though. Lisp is helping the thoughts, other languages obstruct them (that certainly includes Python).

For a scientific language like R this quality is important.

Perhaps the ideal data science language would be a Lisp/R with excellent embedding qualities like Lua for the scientific parts. People could then choose their favorite language for shoveling data around.


Why would say Python obstructs the thought?


Because with original thinking there is more than one way to do it. Pythonic conformity might make long-term maintenance easier, but its rigidity exacts a cost on expressing new thoughts. R is basically the epitome of Greenspun's tenth rule: it's really their implementation of a Common Lisp that looks like C, with metaprogramming and conditions and restarts and all. They tried standardizing on Common Lisp first (XLisp-Stat) but S from Bell Labs was too popular. In short, today R is a Lisp with access to modern numeric libraries.


Duck typing allows for freedom of expression/thinking.

You can declare a new property on an object simply by assigning, and consistency is not required either.

It also potentially leaves a mess over time.


Python doesn't, but pandas definitely does.


I'm constantly discovering oddities about the R language. Since I use it interactively, it's extremely rare that such oddities cause any problems. Here's an example I found yesterday (lines 1, 2, 3, and 4 make sense, 5 is interesting, and 6 is perplexing!):

  1 == TRUE        # TRUE
  as.logical(1)    # TRUE
  0 == FALSE       # TRUE
  as.logical(0)    # FALSE
  2 == TRUE        # FALSE
  as.logical(2)    # TRUE


I think it makes sense.

TRUE and FALSE are 1 and 0, while `as.logical` will transform your value to the "closest" of those two.

If you are used to Python's truthiness, that is what `as.logical` is similar to.
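Python shows the same split between coercion-to-bool and equality that the R example above does - a quick sketch:

```python
# bool() coerces by truthiness: any nonzero number is True...
print(bool(2))    # True
print(bool(0))    # False

# ...but == compares values, and True is numerically 1,
# so 2 == True is False even though bool(2) is True.
print(2 == True)  # False
print(1 == True)  # True
```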


Truthiness runs through most C-style languages, Python included: bool(2) is True.


Python goes even further beyond.

  >>> True + True
  2

  >>> True * 13 + (1 - False) * 17
  30


What you show there is that int(True) is one and int(False) is zero.

(The same happens in R, for what it's worth. as.integer(TRUE) is one and as.integer(FALSE) is zero, and the operations you wrote work just the same.)


That's history coming back to bite it though. There was a period where the symbols True and False existed in Python but the bool class did not. True was _literally_ 1, and False was 0. Because of this, for backward compat, bool is a subclass of int.
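The backward-compat subclassing described above is easy to verify from a Python prompt:

```python
# bool is a subclass of int, kept for backward compatibility
# with early Python, where True and False were plain integers.
print(issubclass(bool, int))     # True
print(isinstance(True, int))     # True

# Arithmetic therefore treats True as 1 and False as 0.
print(True + True)               # 2
print(sum([True, False, True]))  # 2 - a common idiom for counting matches
```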


TIL. Also, in Python bool("a") is True, whereas in R as.logical("a") is NA.
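The difference is that Python's bool() on strings tests emptiness and never parses the content, while R's as.logical tries to parse the string and yields NA when it can't - a quick sketch of the Python side:

```python
# Python's bool() on strings only checks whether the string is empty:
print(bool("a"))      # True  (non-empty)
print(bool("false"))  # True  (still non-empty - the content is ignored!)
print(bool(""))       # False (empty)
```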


It makes sense if you know C or any architecture's assembly.

But it is an issue, and high level languages shouldn't just replicate it. In a high-level language `1 == TRUE` should be either an error or false.


If you like these oddities, then check out The R Inferno [1]. It’s from 2011, but still holds up since the language hasn’t changed that much.

[1] https://www.burns-stat.com/pages/Tutor/R_inferno.pdf


My favourite odd R snippet is this, which prints the numbers 0 to 100 without using any digits in the code:

    F:volcano
(from https://codegolf.stackexchange.com/a/219617)


As someone who actually likes R I think this makes sense. Line 5 is checking for equality, whereas in R as.* functions actually convert types or structures. The documentation on as.logical is also pretty clear on what would happen.

https://www.rdocumentation.org/packages/raster/versions/3.5-...

> Change values of a Raster* object to logical or integer values. With as.logical, zero becomes FALSE, all other values become TRUE. With as.integer values are truncated.


Yeah. There's a difference between "This language isn't being internally consistent" and "Due to my experience using language X, I find this confusing". As you say, this is well documented behavior, albeit perhaps unexpected to a novice R user.


Seems perfectly fine to me? It's R, for interactive data analysis and statistics. I love it, but would not use it for anything else.


as.logical(2) being TRUE is perplexing only if you interpreted

2 == TRUE

as

as.logical(2)==TRUE

rather than as

2==as.integer(TRUE)

[Edit: "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw."]


I'm sad that R didn't go closer to JavaScript style for lambdas instead of going toward Haskell, e.g., ‘\(x) x + 1’ vs ‘x => x + 1’.


I've always liked that lambda and functions are the same thing in R and write `function(x){x + 1}`

I dislike both of your examples and am glad to have never seen that in any R code I've met.


It's new as of R 4.1, along with the new pipe operator |>.

https://www.r-bloggers.com/2021/05/the-new-r-pipe/


I prefer the R formula syntax, i.e.

  ~ . + 1
  ~ .x + .y


I like this syntax better than the new lambda syntax, but I think it's good that a proper lambda syntax exists now. Not all higher-order functions accept formulas (they have to wrap their function arguments with rlang::as_function or equivalent), and there are probably some obscure cases where the distinction between a formula and a function matters.


What's sad about it?


The JS version is more readable for most people.

(I'm no big fan of JS in general, unless it is TS but I consider that a different language.)


Exactly and it could bring R to a larger audience.



