Hacker News

Good summary of some of the main "theoretical" criticisms of LLMs, but I feel it's a bit dated and ignores the recent trend of iterative post-training, especially with human feedback. Major chatbots are no doubt being iteratively refined on feedback (user interaction feedback, RLHF, RLAIF). So ChatGPT could fall within the sort of "enactive" perspective on language, and it definitely goes beyond the issues of static datasets and data completeness.

Sidenote: the authors make a mistake when citing Wittgenstein to draw a similarity between humans and LLMs. Language modelling on a static dataset is mostly not a language game (see Bender and Koller's section on distributional semantics and their caveats on learning meaning from "control codes").



FWIW, even more recently, models have been tuned using a method called DPO (Direct Preference Optimization) instead of RLHF.

IIRC DPO doesn’t have human feedback in the loop.


It does. That's what the "direct preference" part of DPO means. You just avoid training an explicit reward model on the preference data like in RLHF, and instead directly optimize the policy to raise the log-probability (relative to a frozen reference model) of preferred responses over dispreferred ones.
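To make the contrast concrete, here is a minimal sketch of the two per-example objectives in plain Python. It assumes each response's sequence log-probability (the sum of its token log-probs) has already been computed; the function names and the `beta=0.1` default are illustrative choices, not taken from any particular library. RLHF first fits a separate reward model with a Bradley-Terry style loss and then runs RL against it, while DPO puts the same pairwise preference signal directly into the policy's loss, using a frozen reference model (typically the SFT checkpoint) as the baseline.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """RLHF step 1: Bradley-Terry loss for an explicit reward model r(x, y).

    The trained reward model is then used as the reward signal for RL
    (e.g. PPO) on the policy, which is a separate second stage.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative strength of the implicit KL constraint
) -> float:
    """DPO: the same pairwise preference loss, applied directly to the policy.

    The "implicit reward" of a response is beta times its log-prob ratio
    between the policy being trained and the frozen reference model.
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2), no preference yet.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy made the preferred answer more likely than the reference did:
    # the loss drops below log(2).
    print(dpo_loss(-9.0, -12.0, -10.0, -12.0))
```

Both losses consume the same human (or AI) preference pairs. The difference is only whether a reward model sits in between, which is why DPO still "has human feedback" whenever the pairs come from human labelers.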


What is it called when humans interact with a model through lengthy exchanges (mostly correcting the model's responses to a question posed to it, usually through chat, and labeling each of the model's statements as correct or not), and then all of that text (possibly with some editing) is used to train another, higher-level model?

Does this have a specific name?


I don’t think that process has a specific name. It’s just how training these models works.

Conversations you have with something like ChatGPT are likely stored, then sorted through somehow, then added to an ever-growing dataset of conversations that would be used to train entirely new models.


DPO essentially has human feedback; it depends on where the preference data comes from.



