Hacker News

I wouldn't say "advantage" so much as "preference", but a few things come to mind.

- Lots of high quality statistical libraries, for one thing.

- RStudio's R Markdown is great; I prefer it to Jupyter Notebook.

- I personally found the syntax more intuitive, easier to pick up. I don't usually find myself confused about the structure of the objects I'm looking at. For whatever reason, the "syntax" of pandas doesn't square well (in my opinion) with python generally. I certainly want to just use python. But, shrug.

- The tidyverse package, especially the pipe operator %>%, which afaik doesn't have an equivalent in Python. E.g.

    with_six_visits <- task_df %>%
      group_by(turker_id, visit) %>%
      summarise(n_trials = n_distinct(trial_num)) %>%
      mutate(completed_visit = n_trials > 40) %>%
      filter(completed_visit) %>%
      summarise(n_visits = n_distinct(visit)) %>%
      mutate(six_visits = n_visits >= 6) %>%
      filter(six_visits) %>%
      ungroup()

Here I'm filtering participants in an mturk study by those who have completed more than 40 trials at least six times across multiple sessions. It's not that I couldn't do the same transformation in pandas, but it feels very intuitive to me doing it this way.

- ggplot2 for plotting; it's a really powerful data visualization package.

Truthfully, I often do my data/text parsing in Python and then switch over to R for analysis; e.g. Python's JSON parsing works really well.



I can see how this is more intuitive. In pandas I'd assign the output of groupby to a variable, and then add the new column in a separate statement.
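For comparison, pandas can also express this as a single method chain, without intermediate variables. The following is only a rough sketch, not a claim about how anyone in this thread actually wrote it: the toy `task_df` is made up for illustration, the column names are borrowed from the R snippet, and the helper boolean columns (`completed_visit`, `six_visits`) are dropped in favour of filtering directly.

```python
import pandas as pd

# Hypothetical toy data: one row per trial, column names taken from the R snippet.
rows = []
for turker_id, n_sessions in [("A", 6), ("B", 3)]:
    for visit in range(1, n_sessions + 1):
        for trial_num in range(1, 46):  # 45 trials per visit, above the 40 cutoff
            rows.append(
                {"turker_id": turker_id, "visit": visit, "trial_num": trial_num}
            )
task_df = pd.DataFrame(rows)

with_six_visits = (
    task_df
    .groupby(["turker_id", "visit"])["trial_num"]
    .nunique()                       # distinct trials per participant and visit
    .reset_index(name="n_trials")
    .query("n_trials > 40")          # keep only completed visits
    .groupby("turker_id")["visit"]
    .nunique()                       # completed visits per participant
    .reset_index(name="n_visits")
    .query("n_visits >= 6")          # keep participants with six or more
)

print(with_six_visits)
```

Each step in the parenthesised chain lines up roughly with one `%>%` stage; whether that reads as naturally as the dplyr version is a matter of taste.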

(The below is off topic, but I don't use R so I'd love to know whether I'm reading the code correctly)

"Here I'm filtering participants in an mturk study by those who have completed more than 40 trials at least six times across multiple sessions."

A user with this pattern of trials seems like they would fit the above definition:

Session 1: 82 trials
Session 2: 82 trials
Session 3: 82 trials

But the code seems to want 6 distinct sessions with >40 trials each. Have I misunderstood?

Also, is 'mutate' necessary before 'filter' or is that just to make the intent of the code clearer to your future self?


My initial wording was sloppy.

There were 50 trials in each session; so I counted a session completed if they did more than 40 in that session. They needed to have completed at least six sessions.

The mutate is unnecessary. I forget why I did that.


What it would take to recreate dplyr in Python:

https://mchow.com/posts/2020-02-11-dplyr-in-python/


Didn’t R introduce the native pipe operator?

%>% is now simply |>


They did. I just haven't gotten around to using it yet!



