
edit: I'm not sure what jdkwkbdbs (dead-banned) means by "LLMs don't. ML works pretty well." (well, I do). LLMs do solve certain tasks, and really interesting ones at that, at certain average cross-entropy loss levels. Once you look at models above a certain parameter count, they typically form the Pareto front when you plot loss against parameter count, and their accuracy typically improves with parameter count in a statistically significant way.

In essence, they represent the state of the art on those specific tasks as measured today. You may wish for a better (cross-entropy loss, accuracy) pair at any given instant, e.g. (optimal expected CE loss, an actual proof of correctness), but for now they are not only the best class of models on these sets of tasks, they are also the class whose accuracy has been improving fastest of late. That makes them quite noteworthy, imo. Just my 2c.
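To make the Pareto-front claim concrete, here's a minimal sketch (my own illustration, with made-up names and numbers, nothing from the thread): given hypothetical (parameter_count, validation_loss) measurements, the front is just the set of models that no other model beats on both size and loss at once.

    # Minimal Pareto-front sketch over (parameter_count, val_loss) pairs.
    # A model is on the front if no other model has both <= parameters
    # and <= loss, with at least one strictly better.

    def pareto_front(models):
        """models: list of (name, parameter_count, val_loss) tuples."""
        front = []
        for name, params, loss in models:
            dominated = any(
                (p <= params and l < loss) or (p < params and l <= loss)
                for _, p, l in models
            )
            if not dominated:
                front.append((name, params, loss))
        # Sort by size so the loss-vs-size trade-off reads left to right.
        return sorted(front, key=lambda m: m[1])

    # Hypothetical numbers, purely illustrative:
    measurements = [
        ("small-llm",  1e9, 3.1),
        ("mid-llm",    7e9, 2.6),
        ("large-llm", 70e9, 2.2),
        ("rnn-base",   1e9, 3.6),  # dominated: same size, worse loss
    ]
    print(pareto_front(measurements))
    # [('small-llm', 1e9, 3.1), ('mid-llm', 7e9, 2.6), ('large-llm', 70e9, 2.2)]

The point of the toy example is only that "best" here is a per-size notion: a larger model isn't beaten by being bigger, it's on the front as long as nothing cheaper matches its loss.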


