Nice idea! Em dashes were giveaways for AIs and typos for human, at least in the ones I did, so those are at least trivial. So might have to do some filtering at least for those.
Some were hard though, yeah (at least if not looking longer than 5-10 seconds). Btw, it seemed more logical to me to just see a green/red card when you click, i.e. right choice or wrong choice. Getting red for the correct answer confused me a bit (but this might just be me).
Also for example this one has a giveaway for the human case: "There are lots of great people here at /r/personalfinance" (actually, not sure if that is a giveaway, that was my guess, but depends on how the model was prompted, I guess). And human ones often seem to have two spaces sometimes instead of one, idk why. If you want to get a serious dataset, maybe you could use this one to find all the flaws and perfect it, and then try to get a real dataset from the next one? People will be more eager to help too if they've seen you designed it all very carefully. (Or you could filter the results from this one to make it a good dataset if you get lots of responses.)
You'd be surprised at the nuances we tend to miss :)
This time around I prompted the models not necessarily to be adversarial - i didn't ask them to try and fool the reader. But i gave them contextual info - something to the effect of "you're a user posting on hacker news"
True, if you look for all "obvious patterns" and filter those out of the dataset, not much will be left probably. Maybe the best is then to just publish as complete a dataset as possible, so all questions, all user answers, for each user the nr of questions they did, time for each question, etc. Then people using that dataset can draw their own conclusions.
Thanks for checking it out! The color signal is useful feedback. Let me think about it and rework!
Yeah there are some very obvious tells, but the models that are most capable are very good at writing like human.
Especially when the human responses for reddit or HN prompts were presumably made after reading the content of the article or the post; whilw the model is simply going off of the title.
Some were hard though, yeah (at least if not looking longer than 5-10 seconds). Btw, it seemed more logical to me to just see a green/red card when you click, i.e. right choice or wrong choice. Getting red for the correct answer confused me a bit (but this might just be me).