> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.
It can be used for some decisions (e.g. non-critical ones), but it should NOT be used to accuse someone of academic misconduct unless the tool meets a very robust quality standard.
The AI tool doesn't give accurate results. You don't know when it's inaccurate, and there is no reliable way to check its results. Why would anyone use a tool to help them make a decision when they don't know when it will be wrong and it has a low accuracy rate? It's in the article.
Almost nothing gives 100% accurate results. Even CPUs have had bugs in their calculations. You have to use a suitable tool for a suitable job, in the correct context, while understanding its limitations so you can apply it correctly. That is proper engineering. You're partially correct, but you're overstating it:
> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.
That's totally wrong and an overstated position.
A better position is that some tools have such a low accuracy rate that they shouldn't be used for their intended purpose. Now that is a position I agree with. I accept that a CPU may give an incorrect result due to a cosmic ray event, but I wouldn't accept a CPU that gives the wrong result on 1 in 100 instructions.
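To put rough numbers on why a 1-in-100 error rate is unacceptable for accusations, here's a minimal sketch; the false-positive rate, cohort size, and submission counts are purely hypothetical and only illustrate the base-rate problem:

```python
# Hypothetical numbers: how many honest students get flagged by a detector
# with a 1% false-positive rate, before any actual misconduct is considered.
false_positive_rate = 0.01      # assumed: wrongly flags 1% of honest work
students = 500                  # assumed cohort size
submissions_per_student = 8     # assumed essays per term

honest_submissions = students * submissions_per_student
expected_false_flags = honest_submissions * false_positive_rate

print(f"Expected wrongly flagged honest submissions per term: {expected_false_flags:.0f}")
# -> 40 pieces of honest work flagged, each a potential misconduct accusation.
```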
That sounds like a less serious problem: if the tool highlights the allegedly plagiarized sections, at worst the author can conclusively prove it false with no additional research (though that burden should instead be on the tool's user, of course). So it's at least possible to use the tool to get meaningful results.
On the other hand, an opaque LLM detector that just prints “that was from an LLM, methinks” (and not e.g. a prompt and a seed that makes ChatGPT print its input) essentially cannot be proven false by an author who hasn’t taken special precautions against being falsely accused, so the bar for sanctioning people based on its output must be much higher (infinitely so as far as I am concerned).
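In other words, what matters is whether the verdict carries anything a third party can independently re-check. A purely illustrative sketch of the two output shapes (these are not any real detector's API, just hypothetical types):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OpaqueVerdict:
    # All an accused author gets: a label and a score, unfalsifiable on their own.
    label: str            # e.g. "likely AI-generated"
    score: float

@dataclass
class ReproducibleVerdict:
    # Same verdict, but with evidence that anyone can re-run and check.
    label: str
    score: float
    model: str            # which model allegedly produced the text
    prompt: str           # prompt claimed to regenerate the passage...
    seed: Optional[int]   # ...deterministically, so the claim can be tested
```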
ChatGPT isn't the only AI. It is possible, and inevitable, to train other models specifically to avoid detection by tools designed to detect ChatGPT output.
The whole silly concept of an "AI detector" is a subset of an even sillier one: the notion that human creative output is somehow unique and inimitable.
> this tool is as reliable as a magic 8-ball
Citation needed