LLMs as programs are here to stay. The issue is the expenses-to-revenue ratio all these LLM corpos have. According to a Sequoia analyst (so not some anon on a forum), there is a giant money hole in that industry, and "giant" doesn't even begin to describe it (IIRC it was $600 billion this summer). That whole industry will definitely see a winter soon, even if everything Altman says turns out to be true.
You just described what literally anyone who says "AI Winter" means; the technology doesn't go away, companies still deploy it and evolve it, customers still pay for it, it just stops being so attractive to massive funding and we see fewer foundational breakthroughs.
They're useful in some situations, but extremely expensive to operate. It's unclear if they'll be profitable in the near future. OpenAI seems to be claiming they need an extra $XXX billion in investment before they can...?
I just ran a (IMHO) cool test with OpenAI/Linux/Tcl-Tk:
"write a TCL/tk script file that is a "frontend" to the ls command: It should provide checkboxes and dropdowns for the different options available in bash ls and a button "RUN" to run the configured ls command. The output of the ls command should be displayed in a Text box inside the interface. The script must be runnable using tclsh"
It didn't get it right the first time (for some reason it wanted to add a `mainloop` instruction), but after several corrections I got an ugly but pretty functional UI.
Imagine a Linux distro that uses some kind of LLM-generated interfaces to make its power more accessible. Maybe even "self-healing".
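For anyone curious what that generated frontend roughly looks like, here is a minimal sketch of the same idea. It's Python/tkinter rather than the Tcl/Tk script the prompt asked for (same Tk toolkit, different binding), and it only wires up a handful of ls flags as an illustration.

```python
# Minimal sketch of an ls "frontend": checkboxes for a few flags,
# a RUN button, and a text box showing the command output.
# Python/tkinter stand-in for the Tcl/Tk script described above.
import subprocess
import tkinter as tk

root = tk.Tk()
root.title("ls frontend")

# Checkboxes for a couple of common ls options (illustrative subset only).
flags = {"-l": tk.BooleanVar(), "-a": tk.BooleanVar(), "-h": tk.BooleanVar()}
for flag, var in flags.items():
    tk.Checkbutton(root, text=flag, variable=var).pack(anchor="w")

output = tk.Text(root, width=80, height=20)

def run_ls():
    # Build the ls command from the checked flags and show stdout (or stderr).
    cmd = ["ls"] + [f for f, v in flags.items() if v.get()]
    result = subprocess.run(cmd, capture_output=True, text=True)
    output.delete("1.0", tk.END)
    output.insert(tk.END, result.stdout or result.stderr)

tk.Button(root, text="RUN", command=run_ls).pack()
output.pack()

root.mainloop()
```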
The issue (and I think what's behind the thinking of AI skeptics) is previous experience with the sharp edge of the Pareto principle.
Current LLMs being 80% of the way to 100% useful doesn't mean there's only 20% of the effort left.
It means we got the lowest-hanging 80% of utility.
Bridging that last 20% is going to take a ton of work. Indeed, maybe 4x the effort that getting this far required.
And people also overestimate the utility of a solution that's randomly wrong. It's exceedingly difficult to build reliable systems when you're stacking a 5% wrong solution on another 5% wrong solution on another 5% wrong solution...
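To put rough numbers on that stacking effect: if each layer is independently right 95% of the time, the chance that the whole chain is right decays fast (independence is the simplifying assumption here).

```python
# Rough illustration of how per-step error compounds across a chain of steps,
# assuming each step is independently correct 95% of the time.
p_step = 0.95
for n_steps in (1, 3, 5, 10, 20):
    p_all_correct = p_step ** n_steps
    print(f"{n_steps:>2} steps: {p_all_correct:.0%} chance the whole chain is right")
# 10 steps is already down to ~60%, and 20 steps to ~36%.
```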
Thank you! You have explained the exact issue I (and probably many others) are seeing when trying to adopt AI for work. It is because of this that I don't worry about AI taking our jobs for now. You still need some foundational knowledge in whatever you are trying to do in order to get that remaining 20%. Sometimes this means pushing back against the AI's solution, other times it means reframing the question, and other times it's just giving up and doing the work yourself. I keep seeing all these impressive toy demos, but my experience (as an Angular and Flask dev) seems to indicate that it is not going to replace any subject matter expert anytime soon. (And I am referring to all three major AI players, as I regularly and religiously test all their releases.)
>And people also overestimate the utility of a solution that's randomly wrong. It's exceedingly difficult to build reliable systems when you're stacking a 5% wrong solution on another 5% wrong solution on another 5% wrong solution...
I call this the merry-go-round of hell mixed with a cruel hall of mirrors. The LLM spits out a solution with some errors, you tell it to fix the errors, and it produces other errors or totally forgets important context from one prompt ago. You then fix those issues, and it introduces other issues or messes up the original fix. Rinse and repeat. God help you if you don't actually know what you are doing; you'll be trapped in that hall of mirrors for all of eternity, slowly losing your sanity.
You’re talking about the informatics Olympiad and o1. As for Google DeepMind’s network and the math Olympiad, it didn’t do 10,000 submissions. It did, however, generate a bunch of different solutions, but it was all automatic (and consistent). We’re getting there.
Can you share an example of a use case you have in mind for this "explainer + RAG" combo you just described?
I think that RAG and RAG-based tooling around LLMs is gonna be the clear way forward for most companies with a properly constructed knowledge base, but I wonder what you mean by "explainer"?
Are you talking about asking an LLM something like "in which way did the teams working on project X deal with Y problem?" and then having it break it down for you? Or is there something more to it?
I'm not the OP, but I've got some fun ones that I think are what you're asking about. I would also love to hear others' interesting ideas/findings.
1. My medical provider has a webapp that downloads GraphQL data (basically JSON) to the frontend and only renders some of it in the template while hiding the rest. Furthermore, they hide even more info after I pay the bill. I download all the data, combine it with other historical data I have saved, and dump it into the LLM. It spits out interesting insights about my health history, ways in which I have been unusually charged by my insurance, and how fast the company operates, based on the historical gap between appointment and bill adjusted for the time of year. It then formats everything into an open format that is easy for me to self-host (HTML + JS tables). It's a tiny way to wrestle back control from the company until they wise up.
2. Companies are increasingly allowing customers to receive a "backup" of all the data they have on them (thanks, EU and California). For example, Burger King and Wendy's allow this. What do they give you when you request your data? A zip file filled with a bunch of cruft from their internal systems. No worries: dump it into the LLM and it tells you everything the company knows about you in an easy-to-understand format (bullet points in this case). You learn when the company managed to track you, how much they "remember", how much money they got out of you, your behaviors, etc. (A rough sketch of this dump-and-summarize flow is below.)
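In case it helps, here is a minimal sketch of that dump-and-summarize flow. It assumes the export is a directory of JSON files and uses the OpenAI Python client; the model name and prompt wording are placeholders, and a large export would need chunking rather than one giant prompt.

```python
# Minimal sketch: concatenate a data-export dump and ask an LLM to summarize it.
# Assumes a directory of JSON files; model name and prompt are placeholders.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Gather every JSON file in the export into one structure.
records = []
for path in Path("data_export").glob("**/*.json"):
    records.append({"file": path.name, "content": json.loads(path.read_text())})

prompt = (
    "Here is a data export a company keeps about me. "
    "Summarize, as bullet points: what they track about me, how far back the "
    "records go, how much money I've spent with them, and any behavioral patterns.\n\n"
    + json.dumps(records, indent=2)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```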
One of the big challenges with clinical trials is making this information more accessible to both patients (for informed consent) and the trial site staff (to avoid making mistakes, help answer patient questions, and even ask the right questions when negotiating the contract with a sponsor).
The gist of it here is exactly like you said: RAG to pull back the relevant chunks of a complex document like this, and then an LLM to explain and summarize the information in those chunks so it's easier to digest. The response can be tuned to the level of the reader by adding simple phrases like "explain it to me at a high school level".
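As a rough illustration of that retrieve-then-explain shape (a sketch, not the actual system described above): embed the question, pull the most similar chunks, and ask the model to explain them at the requested reading level. The model names are placeholders, and chunking, indexing, and citations are all omitted.

```python
# Skeleton of a retrieve-then-explain ("RAG + explainer") flow, sketch only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # Embed a list of strings with a placeholder embedding model.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

def answer(question, chunks, reading_level="high school"):
    chunk_vecs = embed(chunks)          # normally precomputed and stored in an index
    q_vec = embed([question])[0]
    scores = chunk_vecs @ q_vec         # dot-product similarity
    top_chunks = [chunks[i] for i in np.argsort(scores)[-3:]]

    prompt = (
        f"Using only the excerpts below, answer the question and explain it "
        f"at a {reading_level} level.\n\n"
        f"Question: {question}\n\nExcerpts:\n" + "\n---\n".join(top_chunks)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```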
Google ought to hang its head in utter disgrace over the putrid swill they have the audacity to peddle under the Gemini label.
Their laughably overzealous nanny-state censorship, paired with a model so appallingly inept it would embarrass a chatbot from the 90s, makes it nothing short of highway robbery that this digital dumpster fire is permitted to masquerade as a product fit for public consumption.
The sheer gall of Google to foist this steaming pile of silicon refuse onto unsuspecting users borders on fraudulent.