> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.

It can be used for some decisions (i.e. not critical ones), but it should NOT be used to accuse someone of academic misconduct unless the tool meets a very robust quality standard.

> this tool is as reliable as a magic 8-ball

Citation needed



The AI tool doesn't give accurate results. You don't know when it's not accurate, and there is no reliable way to check its results. Why would anyone use a tool to help them make a decision when they don't know when it will be wrong and it has a low rate of accuracy? It's in the article.


> The AI tool doesn't give accurate results.

Nearly nothing gives 100% accurate results. Even CPUs have had bugs in their calculations. You have to use a suitable tool for a suitable job, with the correct context, while understanding its limitations so you can apply it correctly. That is proper engineering. You're partially correct, but you're overstating:

> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.

That's totally wrong and an overstated position.

A better position is that some tools have such a low accuracy rate that they shouldn't be used for their intended purpose. Now that is a position I agree with. I accept that CPUs may give incorrect results due to a cosmic ray event, but I wouldn't accept a CPU that gives the wrong result for 1 in 100 instructions.
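
To put rough numbers on the "low accuracy rate" point (entirely made-up figures, just to illustrate the base-rate problem once a detector's output is treated as accusation-grade evidence):

    # Back-of-the-envelope sketch with hypothetical numbers: what a "low"
    # false positive rate means once a detector is run over a whole cohort.
    submissions = 10_000        # essays checked per term (assumed)
    cheating_rate = 0.02        # fraction actually LLM-written (assumed)
    true_positive_rate = 0.90   # detector catches 90% of real cases (assumed)
    false_positive_rate = 0.01  # detector wrongly flags 1% of honest work (assumed)

    cheaters = submissions * cheating_rate
    honest = submissions - cheaters

    flagged_cheaters = cheaters * true_positive_rate
    flagged_honest = honest * false_positive_rate

    precision = flagged_cheaters / (flagged_cheaters + flagged_honest)

    print(f"honest students falsely flagged: {flagged_honest:.0f}")         # 98
    print(f"chance a flagged essay really is LLM-written: {precision:.0%}")  # 65%

A 1-in-100 error rate sounds small until it becomes the basis for a misconduct accusation against 98 real people.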


The thread is about tools to evaluate LLMs. Please re-read my comment in that light and generously assume I'm talking about that.


Your comment applies to all of these tools though, lol. No need to clarify; they're all probabilistic machines that are very unreliable.


>"should NOT be used to accused someone of academic misconduct unless the tool meets a very robust quality standard."

Meanwhile, the leading commercial tools for plagiarism detection often flag properly cited/annotated quotes from sources in your text as plagiarism.


That sounds like a less serious problem: if the tool highlights the allegedly plagiarized sections, at worst the author can conclusively prove it false with no additional research (though that burden should instead be on the tool's user, of course). So it's at least possible to use the tool to get meaningful results.

On the other hand, an opaque LLM detector that just prints “that was from an LLM, methinks” (and not e.g. a prompt and a seed that makes ChatGPT print its input) essentially cannot be proven false by an author who hasn’t taken special precautions against being falsely accused, so the bar for sanctioning people based on its output must be much higher (infinitely so as far as I am concerned).
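
To make that concrete, here's a rough sketch (my own illustration, not anything these detectors actually offer) of the kind of check a non-opaque report would enable, assuming the OpenAI API and keeping in mind that its seed parameter is only best-effort deterministic, so even this falls short of real proof:

    # Hypothetical: re-run the detector's claimed prompt and seed and compare
    # the result against the flagged passage. The `seed` parameter gives only
    # best-effort determinism, so a mismatch wouldn't be conclusive either.
    from openai import OpenAI

    client = OpenAI()

    claimed_prompt = "Write a 500-word essay on the French Revolution."  # assumed
    claimed_seed = 42                                                    # assumed
    flagged_text = "..."  # the passage the detector attributed to an LLM

    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed: whatever model the report names
        messages=[{"role": "user", "content": claimed_prompt}],
        seed=claimed_seed,
        temperature=0,
    )
    regenerated = resp.choices[0].message.content

    print("reproduced:", regenerated.strip() == flagged_text.strip())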


I agree. Just noting the bar is very low for these tools, which may have set low expectations.


ChatGPT isn't the only AI. It is possible, and inevitable, to train other models specifically to avoid detection by tools designed to detect ChatGPT output.

The whole silly concept of an "AI detector" is a subset of an even sillier one: the notion that human creative output is somehow unique and inimitable.



