Where are you getting that from? (And again, there's no more "human in the loop" in "reading WebMD" than in "talking to a chatbot.")
> Participants using an LLM identified relevant conditions less consistently than those in the control group, identifying at least one relevant condition in at most 34.5% of cases compared to 47.0% for the control.
So good old "do your own research" (hardly a gold standard itself, at 47%) is doing something like 35% better for people, in relative terms, than "talk to the chatbot."
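Spelling out the arithmetic behind that "35%" (my own back-of-the-envelope, using the paper's figures):

  (47.0 - 34.5) / 34.5 ≈ 0.36

i.e. the control group came away with at least one relevant condition in roughly a third more cases, relatively speaking, or 12.5 points in absolute terms.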
The more interesting part is:
> We found that the LLMs suggested at least one relevant condition in at least 65.7% of conversations with participants [...] with observed cases of participants providing incomplete information and LLMs misinterpreting prompts
since this is nearly double the rate at which participants actually came away having identified a relevant condition, suggesting the bots are much worse at the interaction than they are at the information. That's presumably trainable, but it also requires a certain patience and willingness on the human's part, and coaxing that out of everyone all the time seems like a bit of a black art for a machine to learn.
But it's not just a failure to convince; it's also a failure to elicit the right information and/or to understand it - the LLM, when prompted in a controlled fashion rather than having to hold a conversation with the participant, found at least one relevant condition even more often still!