I don't even know why people need to use dplyr and the tidyverse. In my opinion base R is very comfortable for data wrangling and for making all kinds of plots out of the box. It's able to handle huge amounts of data as well, especially if you adopt a functional programming approach rather than an object-oriented one (which is what I see in a lot of the classic "academic" brittle, hardcoded slop that R gets a bad rap for). It's very fast if you keep in mind that it is a vectorized language and write your scripts from that perspective. The tidyverse seems like a new, unrelated syntax to learn on top of it all, whereas the base graphics packages work very much like the base statistical functions, the base data wrangling, and everything else in base R.
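To make the vectorization point concrete, here is a minimal base R sketch. The data and the function names `scale_loop` and `scale_vec` are made up for illustration; the timings will vary by machine, so none are quoted.

```r
# Hypothetical data: one million simulated measurements.
set.seed(42)
x <- runif(1e6)

# Loop version: handles one element at a time in interpreted R code.
scale_loop <- function(v) {
  out <- numeric(length(v))
  m <- mean(v)
  s <- sd(v)
  for (i in seq_along(v)) {
    out[i] <- (v[i] - m) / s
  }
  out
}

# Vectorized version: the arithmetic is applied to the whole vector at once.
scale_vec <- function(v) (v - mean(v)) / sd(v)

# Both produce the same standardized values.
all.equal(scale_loop(x), scale_vec(x))

# Compare the run times on your own machine.
system.time(scale_loop(x))
system.time(scale_vec(x))
```

The vectorized form is also shorter, which is a large part of why base R code written this way stays readable.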
My feeling is that people with a more mathematical background tend to like developing DSLs that look more like math than code, for code that is typically written once and then thrown away, whereas people with a more software engineering background tend to prefer code that is explicit about what it does, and have a better understanding of the long-term implications for maintainability and extensibility. For me, that sums up the R versus Python debate in general.
One can see that in the JVM world with Java vs Scala: people attracted to Scala tend to like "cute" DSLs, while Java people tend to be more careful with shiny new features. (This is an oversimplification, of course.)
Specifically for dplyr: it looks cute and tends to be easier to use in a REPL setting (you can build your pipeline step by step by running your command, looking at the output, pulling the command from history, adding a step, and running it again; at the end you get a single line to copy and paste into your script). But if you want to wrap it in a function, it tends to create issues.
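A minimal sketch of the function-wrapping issue, assuming dplyr 1.0 or later is installed. The function names `mean_by_naive` and `mean_by` are hypothetical, and `mtcars` is the dataset that ships with R. The usual workaround is the embrace operator `{{ }}`, which forwards the caller's column names into the pipeline.

```r
library(dplyr)

# Naive wrapper: inside the pipeline, `group_col` and `value_col` are not
# treated as the columns the caller passed in, so this typically errors.
mean_by_naive <- function(df, group_col, value_col) {
  df %>%
    group_by(group_col) %>%
    summarise(avg = mean(value_col))
}
# mean_by_naive(mtcars, cyl, mpg)  # fails: the column names are not forwarded

# Wrapper using the embrace operator {{ }} to pass the caller's
# unquoted column names through to dplyr.
mean_by <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(avg = mean({{ value_col }}), .groups = "drop")
}

res <- mean_by(mtcars, cyl, mpg)
res
```

The interactive one-liner works because you type real column names; once those names become function arguments, you have to know about tidy evaluation to get the same behaviour.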
The base graphics packages make plots as ugly as the ones generated by gnuplot, though. ggplot2, on the other hand, has very pretty output. And the concept of a grammar of graphics just makes so much sense to me.
You can make plots look however you want with base graphics. Honestly, ggplot2 users mainly use the default settings, so you get that classic grey-background plot, which I personally find uglier than the cleaner white-background defaults of the base package.
That's only true, IMO, of the in-IDE plots. For actual exported PNG or vector graphics, I think base R plots are pretty much perfect, other than perhaps the default colour palette.
I wonder how much of this is just a feedback loop; were people taught both tools and then chose the one that works best, or was one more heavily promoted than the other, so people went with what was easiest to get started?
Once you are using the tidy paradigm, it lends itself to efficient plotting with ggplot2. Plotting with base R would require reshaping your data. So I think that, insofar as dplyr becomes a popular default, it makes sense that ggplot2 would be in lockstep.
It's definitely a feedback loop. Every time you look up an R question on Stack Overflow, people give you a ggplot2 or dplyr answer and usually not a base R implementation. It's almost as bad as Ole Tange spamming GNU parallel on every xargs thread.
I'm sure that's part of it. But you could say the same for using Python or R over another language. Besides, someone who knows R well enough to write dplyr thought the situation was dire enough to write it. And there's also data.table, but that is inscrutable to most folks and I have only ever used it for fread, which is 10x faster than any other method of loading CSVs into R.
Hardly. Hand-holding tools are popular, but that doesn't change the fact that they are hand-holding tools that give you no functionality you didn't already have. Jupyter notebooks are probably more popular than Python scripts among new data scientists too, but that doesn't mean anything, and it doesn't take away the advantages you get from writing properly packaged scripts instead of a big old notebook in which you iterate on a pipeline line by line and figure by figure.
I learned R a long time ago, so I am pretty fluent at writing readable data wrangling code in base R. But I'm a biologist first, and in my community I see the value dplyr adds in making R approachable for people who need to do some basic stats but will probably never need to really understand the language or do any development.
It also provides guardrails and encourages best practices, which I find a bit too paternalistic and annoying, but again I can see the value.
I think most R users would be surprised at just how much tidyverse functionality is hidden in base R, but the majority of the dplyr versions of functions have at least some intended improvement over the base R versions, and some are a massive improvement in functionality.
For example, in a typical script the only tidyverse package I may load besides ggplot2 is tidyr, because the pivot_wider() and pivot_longer() functions really do solve a problem that was not fun in base R.
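For anyone who hasn't done this in base R, here is a minimal sketch of a wide-to-long reshape with `reshape()`. The `wide` table and its column names are made up for illustration; the tidyr call is left in a comment so the block runs without tidyr installed.

```r
# Hypothetical wide table: one row per subject, one column per time point.
wide <- data.frame(
  id = 1:3,
  t1 = c(5.1, 4.8, 6.0),
  t2 = c(5.4, 5.0, 6.3)
)

# Base R: reshape() does the job, but the arguments take some memorizing.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("t1", "t2"),  # columns to stack
  v.names   = "value",        # name of the stacked value column
  timevar   = "time",         # name of the new key column
  times     = c("t1", "t2"),  # labels written into the key column
  idvar     = "id"            # column identifying each subject
)
rownames(long) <- NULL        # drop the generated "1.t1"-style row names
long

# Roughly equivalent tidyr call (row order differs):
# tidyr::pivot_longer(wide, cols = c(t1, t2),
#                     names_to = "time", values_to = "value")
```

Both approaches end up with one row per subject and time point; the tidyr version just says so more directly.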