I don't even know why people need to use dplyr and the tidyverse, in my opinion...

aftoprokrustes · on June 28, 2024

My feeling is that people with a more mathematical background tend to like developing DSLs that look more like math than code, for code that is typically written once and then thrown away; whereas people with a more software engineering background tend to prefer code that is more explicit about what it does, and have a better understanding of the long-term implications for maintainability/extensibility. Which for me is the summary of the R versus Python debate in general.

One can see that in the JVM world with Java vs Scala: people attracted to Scala tend to like "cute" DSLs, Java people tend to be more careful with shiny new features. (This is an oversimplification, of course.)

Specifically for dplyr: it looks cute and tends to be easier to use in a REPL setting (you can build your pipeline step by step by running your command, looking at the output, getting the command back from history, adding a step, and running again; at the end you get a single line to copy-paste into your script). But if you want to wrap it in a function, it tends to create issues, because dplyr verbs look up bare column names in the data frame rather than in the function's arguments.
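To make the function-wrapping point concrete, here is a minimal sketch, assuming dplyr 1.0 or later is installed. The helper name `mean_by` and its arguments are made up for illustration; `mtcars` is the dataset that ships with R.

```r
library(dplyr)

# Hypothetical helper: mean of one column per group.
# A naive body such as group_by(group_col) would fail, because dplyr
# would look for a column literally named "group_col" in the data.
# The embrace operator {{ }} forwards the caller's column name instead.
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}), .groups = "drop")
}

# Called with bare column names, just like an interactive pipeline.
result <- mean_by(mtcars, cyl, mpg)
print(result)
```

So it can be done, but you have to know about the `{{ }}` syntax, which is exactly the kind of extra step you never hit when typing the same pipeline in the REPL.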

SassyBird · on June 28, 2024

The base graphics packages make the plots as ugly as the ones generated by gnuplot though. ggplot2 on the other hand has very pretty output. And the concept of a grammar of graphics just makes so much sense to me.

asdff · on June 28, 2024

You can make plots look however you want with base graphics. Honestly, ggplot2 users mainly use the default settings, so you get that classic grey-background plot, which I personally find uglier than the cleaner white-background defaults of the base package.

kuhewa · on June 29, 2024

That's only true IMO of the in-IDE plots. For actual exported PNG or vector graphics, I think base R plots are pretty much perfect, other than perhaps the default colour palette.

lottin · on June 28, 2024

Beauty is in the eye of the beholder. I much prefer the aesthetics of plots made with the lattice package or even base R over ggplot's.

vegabook · on June 28, 2024

Base graphics are also _massively_ faster than ggplot when data sizes get larger. To the extent that ggplot essentially becomes unusable.

ImaCake · on June 28, 2024

Maybe it is for you, but the success of dplyr and ggplot suggests a lot of others disagree.

ds_opseeker · on June 28, 2024

I wonder how much of this is just a feedback loop; were people taught both tools and then chose the one that works best, or was one more heavily promoted than the other, so people went with what was easiest to get started?

kuhewa · on June 29, 2024

Once you are using the tidy paradigm, it lends itself to efficient plotting with ggplot2. Plotting with base R would require reshaping your data. So I think insofar as dplyr becomes a popular default, it makes sense that ggplot2 would be in lock step.

asdff · on June 28, 2024

It's definitely a feedback loop. Every time you look up an R question on Stack Overflow people give you a ggplot or dplyr answer and usually not a base package implementation. It's almost as bad as Ole Tange spamming GNU parallel on every xargs thread.

ImaCake · on June 28, 2024

I'm sure that’s part of it. But you could say the same for using Python or R over another language. Besides, someone who knows R well enough to write dplyr thought the situation was dire enough to write it. And there’s also data.table, but that is inscrutable to most folks and I have only ever used it for fread - which is 10x faster than any other method of loading CSVs into R.

asdff · on June 28, 2024

Hardly. Hand-holding tools are popular, but that doesn't mean they aren't hand-holding tools that don't give you any new function you didn't have otherwise. Jupyter notebooks are probably more popular than Python scripts among new data scientists too. That doesn't mean anything though, and it doesn't take away the advantages you get from writing properly packaged scripts instead of a big old notebook where you iterate on a pipeline line by line and figure by figure.

kuhewa · on June 29, 2024

I learned R long ago, so I am pretty fluent at writing readable data wrangling code in base R. But I'm a biologist first, and in my community I see the value dplyr adds in making R approachable for people who need to do some basic stats but probably will never need to really understand the language or do any development.

It also provides guardrails and encourages best practices, which I find a bit too paternalistic and annoying, but again I can see the value.

I think most R users would be surprised at just how much tidyverse functionality is hidden in base R, but the majority of the dplyr versions of functions have at least some intended improvement over the base R versions, and some are a massive improvement in functionality.

For example, in a typical script the only tidyverse package I may load besides ggplot2 is tidyr, because the pivot_wider()/pivot_longer() functions really do solve a problem that was not fun in base R.
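For anyone who hasn't used them, a minimal sketch of the two pivot functions, assuming tidyr 1.0 or later is installed. The data frame and its column names are made up for illustration.

```r
library(tidyr)

# Made-up wide table: one row per site, one column per year.
wide <- data.frame(
  site  = c("A", "B"),
  y2022 = c(10, 20),
  y2023 = c(12, 25)
)

# Wide to long: one row per site/year pair (the shape ggplot2 expects).
long <- pivot_longer(
  wide,
  cols      = c(y2022, y2023),
  names_to  = "year",
  values_to = "count"
)

# Long back to wide: one column per distinct value of `year`.
back <- pivot_wider(long, names_from = year, values_from = count)

print(long)
print(back)
```

The base R counterpart is `reshape()`, which can do the same job but takes noticeably more arguments to get there.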