
Both your post and the parent post can be true.

Am I the only one who feels like Claude is clearly winning at code generation, and Gemini at general LLM?

I just don’t feel like OpenAI has a legitimate shot at winning any of the AI battles.

Therefore, I feel like “Sam Altman may control our future” is quite a stretch.


Well, I just canceled my Claude Pro subscription because of the mysterious limits that I don't experience with Codex, even after paying for "extra usage". If Anthropic can't figure out their capacity problems, they are in trouble.

I doubt Anthropic sees this as their capacity problem. They like "extra usage", and for users who don't, well, it's their capacity problem.

The preference rankings keep fluctuating on every release for me. A year ago it was Gemini dominating coding tasks, then it was Claude, now it is the latest Codex again. With the next point release(s) the cycle will continue.

>>and Gemini in general LLM?

You might be. Or at least I feel like Gemini is actually dumber than a house of bricks - I have multiple examples, just from last week, where following its advice would have led to damage to equipment and could have hurt someone. That's just from trying to work on an electronics project and asking Gemini for advice based on pictures and schematics - it just confidently states stuff that is 100000% bullshit, and I'm so glad that I have at least a basic understanding of how this stuff works, or I would have easily hurt myself.

It's somewhat decent at putting together meal plans for me every week, but it just doesn't follow instructions and keeps repeating itself. It hardly feels worth any money right now - like it's some kind of giant joke that all these companies are playing on us, spending billions on these talking boxes that don't seem that intelligent.

I also use Claude at work, and for C++ programming it behaves like someone who read a C++ book once and knows all the keywords, but has never actually written anything in C++ - the code it produces is barely usable, and only in very, very small portions.

Edit: I just remembered another one that made me incredibly angry. I've been reading Neuromancer on and off, and I got back into it, but to remind myself of the plot I asked Gemini to summarise the plot only up to chapter 14, and I specifically included the instruction that it should double-check it's not spoiling anything from the rest of the book. Lo and behold, it just printed out the summary of the ending and how the characters' actions up to chapter 14 relate to it. And that was in the "Pro" setting too. Absolute travesty. If a real-life person did that I'd stop being friends with them, but somehow I'm paying money for this. Maybe I'm the clown here.


I'm curious: did you give Gemini the entire text of Neuromancer or did you expect it to use search results for chapters 1 to 14?

I would have just fed it the text of chapters 1 to 14 from a non-DRM copy.


I just asked like I said: give me a plot summary up to chapter 14, don't spoil the rest of the book. And of course, when I told it what it just did, it was all "oh, I'm sorry, here's a summary without the spoilers for the ending". So clearly it could do it without additional context.

I wouldn't expect any LLM to be able to respect such a request. Do they even have direct access to published works to use as reference material?

Also, last time I played 20 questions with ChatGPT, it needed 97 turns and tons of my active hinting to get the answer.


>>Do they even have direct access to published works to use as reference material?

I mean, clearly, given that it did answer my question eventually. Also, wasn't it a whole thing that these models got trained on entire book libraries (without necessarily paying for them)?

>>I wouldn't expect any LLM to be able to respect such a request

Why though? They seem to know everything about everything, why not this specifically? You can ask it to tell you the plot of pretty much any book/film/game made in the last 100 years and it will tell you. Maybe asking about specific chapters was too much, but Neuromancer exists in free copies all over the internet and it's been discussed to death. If it was a book that came out last year, then OK, fair enough, but LLMs had 40 years of discussions about Neuromancer to train on.

But besides, regardless of everything else - if I say "don't spoil the rest of the book" and your response includes "in the last chapter character X dies" then you just failed at basic comprehension? Whether an LLM has any knowledge of the book or not, whether that is even true or not, that should be an unacceptable outcome.


I wouldn't expect an AI to know exactly what happens in every chapter of a book.

Knowing the plot of Neuromancer isn't the same as being able to recite a chapter by chapter summary.

I tried this Neuromancer query a few times and results greatly vary with each regeneration, but "do not include spoilers" seems to make Gemini give more spoilers, not less.


>>I wouldn't expect an AI to know exactly what happens in every chapter of a book.

Cool. I did - and turns out it can do it, just not without giving me some spoilers first.

>>vary with each regeneration, but "do not include spoilers" seems to make Gemini give more spoilers, not less.

I'm glad I'm not the only one experiencing this then.


> and turns out it can do it

Not really - if you had examined the output closely, you probably would have noticed it conflated chapters 13 and 14, or 14 and 15. Or you got very lucky on a generation. It definitely doesn't know exactly what happens in each chapter unless it has a reference to check.


>>Why though? They seem to know everything about everything, why not this specifically?

The problem with this line of reasoning is that it is unscientific. "They seem to" is not good enough for an operational understanding of how LLMs work. The whole point of training is to forget details in order to form general capability, so it is not surprising if they forget things about books if the system deemed other properties as more important to remember.


>> if they forget things about books if the system deemed other properties as more important to remember.

I will repeat for the third time that it's not a problem of the system forgetting the details - quite the opposite.

>>The problem with this line of reasoning is that it is unscientific.

How do you scientifically figure out whether the LLM knows something before actually asking the question, in the case of a publicly accessible model like Gemini?

Just to be clear - I would be about 1000000x less upset if it just said "I don't know" or "I can't do that". These models are fundamentally incapable of realizing their own limits, but that alone is forgivable - literally ignoring instructions is not.


How is Gemini winning in general LLM? What is "general LLM"?

General LLM is what Apple is paying Google for.

I noticed that Apple’s speech-to-text has gotten pretty good lately. Is that because they’re paying Google? Not sure I use other AI features from Apple, as I have Siri turned off.

> Is that because they’re paying Google?

No, the Google deal hasn't shipped yet.


Actual study: https://jamanetwork.com/journals/jama/article-abstract/28447...

“After adjusting for potential confounders and pooling results across cohorts, higher caffeinated coffee intake was significantly associated with lower dementia risk (141 vs 330 cases per 100 000 person-years comparing the fourth [highest] quartile of consumption with the first [lowest] quartile; hazard ratio, 0.82 [95% CI, 0.76 to 0.89]) and lower prevalence of subjective cognitive decline (7.8% vs 9.5%, respectively; prevalence ratio, 0.85 [95% CI, 0.78 to 0.93]).”

So about 18% relative reduction. But if your risks are already low (e.g. active and healthy diet) the relative reduction is less impactful (e.g. 4% to 3.28%).


> the relative reduction is less impactful (e.g. 4% to 3.28%)

That's also an 18% reduction


I think what he means is that an 18% relative reduction on a 4% baseline is a much smaller absolute change than 18% on an 80% baseline.
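
To make that concrete, here's a quick sketch (Python; using the baseline risks mentioned in this thread):

    hazard_ratio = 0.82  # the study's 18% relative reduction

    for baseline in (0.04, 0.20, 0.80):  # hypothetical baseline risks
        reduced = baseline * hazard_ratio
        print(f"{baseline:.0%} -> {reduced:.2%} (absolute drop {baseline - reduced:.2%})")

    # 4% -> 3.28% (absolute drop 0.72%)
    # 20% -> 16.40% (absolute drop 3.60%)
    # 80% -> 65.60% (absolute drop 14.40%)

The relative reduction is identical in every row; the absolute change is what scales with the baseline.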


Percents of percents always felt kludgey.

Log probabilities (like decibans) unify this to say there is a -0.86 dB risk reduction for everybody.

https://rationalnumbers.james-kay.com/?p=306

It makes the math of combining risks easier, and it works the same even if we're operating near 99.999% or 0.0001%.
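
A minimal sketch of that arithmetic (Python; the second 0.90 ratio is invented purely to show how independent factors combine):

    import math

    def risk_ratio_to_db(ratio):
        # Express a risk/hazard ratio on a decibel-style log scale.
        return 10 * math.log10(ratio)

    print(risk_ratio_to_db(0.82))  # ~ -0.86 dB, the study's hazard ratio

    # Combining independent risk factors multiplies the ratios,
    # which on the log scale is simple addition:
    assert math.isclose(risk_ratio_to_db(0.82) + risk_ratio_to_db(0.90),
                        risk_ratio_to_db(0.82 * 0.90))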


That’s exactly my point.

If someone is at high risk, say 20%, then an 18% relative drop takes that to 16.4%. That may justify picking up caffeine.

But if you’re otherwise healthy, picking up caffeine has diminishing returns, and the downsides may not be worth it.


I wonder how cool it would be to have a live ephemeral chat for each channel?

One thing I love(d) about live TV (or even live radio) was the community around knowing other people were watching the exact same thing I was watching (and then the watercooler chat around it afterwards).

If there was live chat attached to each of these "stations", it could spark some interesting chatter/community.

I know this already exists OOTB with YouTube Live, FB Live, etc.

But this would be for things that were simply uploaded and are now streamed live, like you're doing here.

Obviously, that only works if there's enough viewership/participation.


I mean, this argument isn’t really specific to banking apps. This could apply to any native vs. web app, in general.

Native apps can provide a bit more streamlined UX (e.g. Face ID), while also being able to provide more robust features (mobile deposit).

The downsides are arguably higher development costs / OS compatibility, and having to install a separate app.


Can we also add “Don’t complain about AI-generated content. It does not promote interesting discussion.”?

I see this all the time, and even if I find the topic interesting, I don’t want to see comments littered with discussion about how the content was AI generated.

To be clear, I'm not condoning AI-generated content. I’m completely fine if the community chooses not to upvote AI-generated content, or flags it off the FP.

But many threads can turn into nothing but AI complaints, and it’s just not interesting.


From my experience, it usually happens when people are too brazen about it, with boring stuff like "Interesting! Now here's what Gemini said about the above...". IMHO that is an entirely adequate reaction.


I’m mostly referring to comments claiming the article itself is (allegedly) AI-written. Then the top half of the thread is derailed by a discussion about whether the article was AI-written.


Now instead of derailing the convo with a complaint, you can just flag it.


The thing is, New Coke was at least an attempt (if a failed one) to improve Coke for consumers.

I don’t get the impression Microsoft has any desire to improve Windows for the consumer — they’re trying to improve it for Microsoft.


Previously (30 days ago, 355 points, 268 comments):

https://news.ycombinator.com/item?id=46819809


> “We have no cure. I don’t want to know.”

> If astronomers announced that a large asteroid might strike Earth in twenty years, and that we currently had no way to deflect it, nobody would respond by saying, “Come back when you already have the rocket.”

I don’t think the analogy fits, for a couple reasons.

1. People not wanting to know whether they have Alzheimer’s is because of the fear of a fate worse than death — living with Alzheimer’s.

2. People not wanting to know whether they have Alzheimer’s is not the same as not wanting a way to detect it. As you said, being able to measure it may help lead to a cure/treatment. I doubt people are against improving detection — they may just not want the detection to be applied personally.


Cure is the wrong word. Alzheimer’s is best described as a failure of a system in which "debris" accumulates faster than it can be "cleared". There are many moving parts, and everyone is unique in the cause of their system failure.

I wrote up my current systems understanding here: https://metamagic.substack.com/p/the-alzheimers-equation. It makes clear why treatments that target only one variable are mathematically doomed to fail to work for everyone, and why there will never be a single "cure". It explains, without needing to read 10,000 papers, why we keep seeing research reporting that treatment X helps in some but not all cases, or that symptom Y is associated in some but not all, etc.
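
As a toy sketch of that framing (my own illustration, not the model from the linked post; every rate here is invented):

    # Debris accumulates when production outpaces total clearance.
    def net_accumulation(production, clearance_pathways):
        return production - sum(clearance_pathways)

    # Two hypothetical patients whose clearance fails in different places:
    patient_a = net_accumulation(1.0, [0.5, 0.3])  # +0.2 -> accumulating
    patient_b = net_accumulation(1.0, [0.9, 0.3])  # -0.2 -> clearing fine

    # A treatment boosting only the first pathway by 0.15 helps patient A
    # but doesn't flip the sign, so A still accumulates, just more slowly:
    patient_a_treated = net_accumulation(1.0, [0.65, 0.3])  # +0.05
    print(patient_a, patient_b, patient_a_treated)

Under this framing, a single-variable treatment only "works" for patients whose imbalance is small enough for that one variable to flip the sign.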


This is some personal opinion that I would bet the vast majority of Alzheimer's researchers would not actually agree with. The current consensus is that Alzheimer's is a particular disease, or a cluster of similar diseases.

I'm not saying you're wrong, just that the level of confidence in your assertions is not warranted.


After spending years tracking through the genetics, conditions, lab work, and research papers, and seeing individuals years into the condition, this model is the best I have, and it explains everything I currently know: why the cluster of conditions results in the same outcome, and why some treatments help some folks but not others.

But that is sort of the point of science: you take all the evidence you have, create a hypothesis, and iterate as you get more evidence. If I find evidence that suggests something else, I will be happy to tweak or abandon this. My level of confidence comes from the existing evidence and the lack of evidence otherwise.


You forming a personal opinion after years of interest in the subject is fine. You asserting that opinion as a fact is the problem.

It is a tale as old as time. See the story behind the term "ultracrepidarian": https://en.wiktionary.org/wiki/ultracrepidarian#English


Versus https://en.wikipedia.org/wiki/Argument_from_authority

See also: https://www.science.org/content/article/potential-fabricatio...

Amateurs asserting their opinions as facts isn't great, but epistemologically it's no worse (and systemically, likely less harmful) than when the experts do it.


Experts, when given the chance, have a tendency to speak with nuance and describe the degree of confidence they have in different statements.

Compare this with an amateur writing with certainty about a subject that subject matter experts continue to debate after decades of work.

I know which one of the two I would rather bother listening to.


You just moved the goalposts.

Saying that experts are less likely to do X doesn't say anything about the relative harm of their doing so. If some rando on the street is shouting their opinion about what causes Alzheimer's and asserting it's God's Own Truth, it's going to cause less overall harm than a carefully worded (but equally wrong) statement from an expert. (And the fact that we tend to hold experts in higher regard is the reason we should be more concerned about them stating their opinions as facts than about amateurs doing the same.)


You’re proving the exact point of the OP arguing against the “And vibe coding is coding.” statement.

You’re focusing only on the results, and not the difference in cognitive function necessary to achieve those results.

An illiterate person can “read” an audiobook.

Just like a person that knows zero about coding could (theoretically) vibe code a program with similar/same results.

So yes, if you focus 100% on only the results, then it could be argued they’re the same.

But the OP is saying there’s more to doing something than just the results.


The medium feels wholly immaterial in this case. The words reach your brain, and then it's up to you to think about them, imagine the scene, process ideas. Audiobooks let the narrator add inflection, which maybe takes a slight load off you, but I don't see the big deal. I've read lots of fiction, and listened to a lot on road trips, and I don't feel like my comprehension suffered in either case compared to the other. The important thing is you can have the same level of conversation about the material - I don't believe all this woo about reading being the only pure and intellectual way to process information.

