
The demo is impressive, but personally, as a commercial user, for my practical use cases the only things I care about are how smart it is, how accurate its answers are, and how vast its knowledge is. These haven't changed much since GPT-4, yet they should have, as IMHO it is still borderline in its ability to be really that useful.


But that's not the point of this update


I know, and I know my comment is dismissive of the incredible work shown here, as we're shown sci-fi level tech. But I feel like I have this kettle that boils water in 10 minutes, when it really should boil it in 1, but instead it is now voice operated.

I hope the next version delivers on being smarter, as this update, instead of making me excited, makes me feel they've reached a plateau in improving the core value and are distracting us with fluff instead.


Everything is amazing & Nobody is happy: https://www.youtube.com/watch?v=PdFB7q89_3U


GPT-4 isn't quite "amazing" in terms of commercial use. GPT-4 is often good, but also often mediocre or bad. It's not going to change the world; it needs to get better.


Near real-time voice feedback isn't amazing? Has the bar risen this high?

I already know an application for this, and AFAIK it's being explored in the SaaS space: guided learning experiences and tutoring for individuals.

My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them.

Taking this and tuning it to specific audiences would make it a great tool for learning.


"My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them."

Great, using GPT-4 the kids will be getting a lot of hallucinated facts returned to them. There are good use cases for transformers currently, but they're not at the "impact company earnings or country GDP" stage, which is the promise the whole industry has raised/spent 100+B dollars on. Facebook alone is spending 40B on AI. I believe in the AI future, but the only thing that matters for now is that the models improve.


I always double-check even the most obscure facts returned by GPT-4 and have yet to see a hallucination (as opposed to Claude Opus, which sometimes made up historical facts). I doubt stuff interesting to kids would be so far out of the data distribution as to return a fake answer.

Compared to YouTube and Google SEO trash, or Google Home / Alexa (which do search + wiki retrieval), at the moment GPT-4 and Claude are unironically safer for kids: no algorithmic manipulation, no ads, no affiliated trash blogs, and so on. A bonus is that it can explain things at a level of complexity the child will understand for their age.



My kids get erroneous responses from Alexa. This happens all the time. The built-in web search doesn't provide correct answers, or is confusing outright. That's when they come to me or their Mom and we provide a better answer.

I still see this as a cool application. Anything that provides easier access to knowledge and improved learning is a boon.

I'd rather worry about the potential economic impact than worry about possible hallucinations from fun questions like "how big is the sun?" or "what is the best videogame in the world?", etc.

There's a ton you can do here, IMO.

Take a look at mathacademy.com, for instance. Now slap a voice interface on it, provide an ability for kids/participants to ask questions back and forth, etc. Boom: you've got a math tutor that guides you based on your current ability.

What if we could get to the same style of learning for languages? For instance, I'd love to work on Spanish. It'd be far more accessible if I could launch a web browser and chat through my mic in short spurts, rather than crack open Anki and go through flash cards, or wait on a Discord server for others to participate in immersive conversation.

Tons of cool applications here, all learning-focused.


People should be more worried about how much this will be exploited by scammers. This thing is miles ahead of the crap fraudsters use to scam MeeMaw out of her life savings.


It's an impressive demo, it's not (yet) an impressive product.

It seems like the people who are oohing and aahing at the former and the people who are frustrated that this kind of thing is unbelievably impractical to productize will be doomed to talk past one another forever. The text generation models, image generation models, speech-to-text and text-to-speech have reached impressive product stages. Multi-modal hasn't gotten there because no one is really sure what to actually do with the thing outside of making cool demos.


Multi-modal isn't there because "this is an image of a green plant" is viable in a demo, but it's not commercially viable. "This is an image of a monstera deliciosa" is commercially viable, but not yet demoable. The models need to improve to be usable.


Sure, but "not enough, I want moar" is a trivial demand. So trivial that it goes unsaid.


It's equivalent to "nothing to see here" which is exactly the TLDR I was looking for.


Watch the last few minutes of that linked video; Mira strongly hints that there's another update coming for paid users and seems to make clear that GPT-4o is aimed more at free tier users (even though it is obviously a huge improvement in many features for everyone).


There is room for more than one use case and large language model type.

I predict there will be a zoo (more precisely a tree, as in "family tree") of models and derived models for particular application purposes, and there will be continued development of enhanced "universal"/foundational models as well. Some will focus on minimizing memory, others on minimizing pre-training or fine-tuning energy consumption; some need high accuracy, others hard realtime speed, yet others multimodality like GPT-4o, some multilinguality, and so on.

Previous language models that encoded dictionaries for spellcheckers etc. never got standardized (for instance, compare aspell dictionaries to the ones from LibreOffice to the language model inside CMU PocketSphinx) so that you could use them across applications or operating systems. As these models are becoming more common, it would be interesting to see this aspect improve this time around.

https://www.rev.com/blog/resources/the-5-best-open-source-sp...


I disagree: transfer learning and generalization are hugely powerful, and specialized models won't be as good because their limited scope limits their ability to generalize and transfer knowledge from one domain to another.

I think people who emphasize specialized models are operating under a false assumption that by focusing the model it'll be able to go deeper in that domain. However, the opposite seems to be true.

Granted, specialized models like AlphaFold are superior in their domain but I think that'll be less true as models become more capable at general learning.


They say it's twice as fast/cheap, which might matter for your use case.


It's twice as fast/cheap relative to GPT-4-turbo, which is still expensive compared to GPT-3.5-turbo and Claude Haiku.

https://openai.com/api/pricing/
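
To make the relative costs concrete, here's a tiny back-of-the-envelope helper. It's just a sketch in Python, and the prices are assumptions from memory of the launch list prices (USD per 1M tokens), so check the pricing page above before trusting any of the numbers:

    # Rough per-request cost estimator. The prices below are assumed
    # launch-time list prices (USD per 1M tokens), not authoritative;
    # check the current pricing page.
    PRICES = {
        "gpt-4o":        (5.00, 15.00),   # assumed (input, output)
        "gpt-4-turbo":   (10.00, 30.00),  # assumed
        "gpt-3.5-turbo": (0.50, 1.50),    # assumed
    }

    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimated USD cost of sending input_tokens and receiving output_tokens."""
        inp, out = PRICES[model]
        return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

    # Example: a month of roughly 2M input and 0.5M output tokens.
    for model in PRICES:
        print(model, round(estimate_cost(model, 2_000_000, 500_000), 2))

If those assumed prices are in the right ballpark, "half of GPT-4-turbo" still leaves around an order of magnitude between GPT-4o and the small models, which is the point being made here.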


For commercial use at scale, of course cost matters.

For the average Joe programmer like me, GPT4 is already "dirt cheap". My typical monthly bill is $0-3 using it as much as I like.

The one time it was high was when I had it take 90+ hours of YouTube video transcripts and summarize each video according to the format I wanted. It produced about 250 pages of output.

That month I paid $12-13. Well worth it, given the quality of the output. And now it'll be less than $7.

For the average Joe, it's not expensive. Fast food is.
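
For anyone curious, that kind of batch summarization is only a handful of lines with the OpenAI Python client. A minimal sketch, assuming the transcripts are already saved as plain-text files in a ./transcripts directory (the paths and the prompt are made up for illustration):

    # Summarize each transcript with GPT-4o and write the result to ./summaries.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = "Summarize this video transcript as concise bullet points."  # placeholder format

    Path("summaries").mkdir(exist_ok=True)
    for transcript in Path("transcripts").glob("*.txt"):
        # note: very long transcripts may need chunking to fit the context window
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": PROMPT},
                {"role": "user", "content": transcript.read_text()},
            ],
        )
        (Path("summaries") / transcript.name).write_text(response.choices[0].message.content)

Since the API bills per token, the cost of a month like that scales with how much transcript text you send, which is why it shows up on the bill at all.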


but better afaik


But it may not be better enough to warrant the cost difference. LLM cost economics are complicated.


I’d much rather have it be slower, more expensive, but smarter


Depends what you want it for. I'm still holding out for a decent enough open model, Llama 3 is tantalisingly close, but inference speed and cost are serious bottlenecks for any corpus-based use case.


I think that might come with the next GPT version.

OpenAI seems to build in cycles. First they focus on capabilities, then they work on driving the price down (occasionally at the cost of some quality degradation).


Then the current offering should suffice, right?


I understand your point, and agree that it is "borderline" in its abilities — though I would instead phrase it as "it feels like a junior developer or an industrial placement student, and assume it is of a similar level in all other skills", as this makes it clearer when it is or isn't a good choice, and it also manages expectations away from both extremes I frequently encounter (that it's either Cmdr Data already, or that it's a no-good terrible thing only promoted by the people who were previously selling Bitcoin as a solution to all of economics).

That said, given the price tag, when AI becomes genuinely expert then I'm probably not going to have a job and neither will anyone else (modulo how much electrical power those humanoid robots need, as the global electricity supply is currently only 250 W/capita).

In the meantime, making it a properly real-time conversational partner… wow. Also, that's kinda what you need for real-time translation, because: «be this, that different languages the word order totally alter and important words at entirely different places in the sentence put», and real-time "translation" (even when done by a human) therefore requires having a good idea what the speaker was going to say before they get there, and being able to back-track when (as is inevitable) the anticipated topic was actually something completely different and so the "translation" wasn't.


I guess I feel like I’ll get to keep my job a while longer and this is strangely disappointing…

A real time translator would be a killer app indeed, and it seems not so far away, but note how you have to prompt the interaction with ‘Hey ChatGPT’; it does not interject on its own. It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)


> It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)

Indeed; I would be pleasantly surprised if it can both notice and separate multiple speakers, but only a bit surprised.


One thing I've noticed is that the more context, and the more precise the context, I give it, the "smarter" it is. There are limits to this of course, but I cannot help but think that's where the next barrier will be brought down: an agent, or multiple agents, that tag along with everything I do throughout the day so they have the full context. That way, I'll get smarter and more to-the-point help, and I won't spend as much time explaining the context. But that will open a dark can that I'm not sure people will want to open: having an AI track everything you do all the time (even if only in certain contexts, like business hours / certain environments).


There are definitely multiple dimensions these things are getting better in. The popular focus has been on the big expensive training runs, but inference, context size, algorithms, etc. are all getting better fast.


I have a few LLM benchmarks that were extracted from real products.

GPT-4o got slightly better overall. Ability to reason improved more than the rest.


It's faster, smarter, and cheaper over the API. Better than a kick in the teeth.



