It allows review of the way the merge conflict has been resolved (assuming those changes are tracked and presented in a useful way). This can be quite helpful when backporting select fixes to older branches.
Insisting on saying VoIP to the Mint rep instead of WiFi Calling (the term used by Apple, Google, Mint, and practically everyone else) is asking for a bad time.
Yes, but Waymo also has to drive on the road with those drivers, and these stats include crashes that are their fault. Diligent drivers get hit by drunk/distracted drivers all the time.
Again a model issue. At the risk of coming off as a thread-wide apologist, here are my results on Opus:
Good:
> The research is generally positive but it’s not unconditionally “good for you” — the framing matters.
> What the evidence supports for moderate consumption (3-5 cups/day): lower risk of type 2 diabetes, Parkinson’s, certain liver diseases (including liver cancer), and all-cause mortality…
Bad:
> The premise is off. Moderate daily coffee consumption (3-5 cups) isn’t considered bad for you by current medical consensus. It’s actually associated with reduced risk of type 2 diabetes, Parkinson’s, and some liver diseases in large epidemiological studies.
> Where it can cause problems:
> Heavy consumption (6+ cups) can lead to anxiety, insomnia…
> Coffee consumption was more often associated with benefit than harm for a range of health outcomes across exposures including high versus low, any versus none, and one extra cup a day. There was evidence of a non-linear association between consumption and some outcomes, with summary estimates indicating largest relative risk reduction at intakes of three to four cups a day versus none, including all cause mortality (relative risk 0.83, 95% confidence interval 0.83 to 0.88), cardiovascular mortality (0.81, 0.72 to 0.90), and cardiovascular disease (0.85, 0.80 to 0.90). High versus low consumption was associated with an 18% lower risk of incident cancer (0.82, 0.74 to 0.89). Consumption was also associated with a lower risk of several specific cancers and neurological, metabolic, and liver conditions. Harmful associations were largely nullified by adequate adjustment for smoking, except in pregnancy, where high versus low/no consumption was associated with low birth weight (odds ratio 1.31, 95% confidence interval 1.03 to 1.67), preterm birth in the first (1.22, 1.00 to 1.49) and second (1.12, 1.02 to 1.22) trimester, and pregnancy loss (1.46, 1.06 to 1.99). There was also an association between coffee drinking and risk of fracture in women but not in men.
> Conclusion Coffee consumption seems generally safe within usual levels of intake, with summary estimates indicating largest risk reduction for various health outcomes at three to four cups a day, and more likely to benefit health than harm.
When I'm looking for medical advice, I want that advice to list things like "coffee drinking might not be safe during pregnancy".
Furthermore, the statement 'Heavy consumption (6+ cups) can lead to anxiety, insomnia ...' assumes caffeinated coffee, yes? The paper I linked to also discusses decaffeinated coffee, e.g.:
> High versus low intake of decaffeinated coffee was also associated with lower all cause mortality, with summary estimates indicating largest benefit at three cups a day (0.83, 0.85 to 0.89)[28] in a non-linear dose-response analysis. ...
> Coffee consumption was consistently associated with a lower risk of Parkinson’s disease, even after adjustment for smoking, and across all categories of exposure.[22][76][77] Decaffeinated coffee was associated with a lower risk of Parkinson’s disease, which did not reach significance. ...
> there were no convincing harmful associations between decaffeinated coffee and any health outcome.
That nuance seems important.
Also note that this paper is incomplete, as it investigated defined health outcomes rather than physiological outcomes like anxiety. There are plenty more papers, like https://academic.oup.com/eurheartj/article/46/8/749/7928425?... , which considers the time of day that people drink coffee, also discusses decaffeinated coffee, and highlights the uncertainty about the effects of heavy coffee drinking.
I don't see why I should care to ask an AI when it's so easy to find well-written research results which are far more likely to cover relevant edge cases.
Sure, LLMs make mistakes, but have you looked at the accuracy of the average top search results recently? The SERPs are packed with SEO-infested articles that are all written by LLMs anyway (and almost universally worse ones than you could use yourself). In many cases the stakes are low enough (and the cost of manually sifting through the junk high enough) that it’s worth going with the empirically higher-quality answer over the SEO spam.
This of course doesn’t apply to high-stakes settings. There I find LLMs are still a great information-retrieval approach, but only as a starting point for manual vetting.
This is an oft-repeated meme, but I’m convinced the people saying it are either blindly repeating it, using bad models/system prompts, or running into some other issue. Claude Opus will absolutely push back if you disagree. I routinely push back on Claude only to discover, on further evaluation, that the model was correct.
As a test I just did exactly what you said in a Claude Opus 4.6 session about another HN thread. Claude considered* the contradiction, evaluated additional sources, and responded backing up its original claim with more evidence.
I will add that I use a system prompt that explicitly discourages sycophancy, but that is a single-sentence expression of preference, not an indication of fundamental model weakness.
* I’ll leave the anthropomorphism discussions to Searle; empirically this is the observed output.
If you have 10,000 people flipping coins over and over, one person will be experiencing a streak of heads, another a streak of tails.
Which is to say, of a million people who just started playing with LLMs, most will get hit-or-miss results, one guy is winning the neural-net lottery and experiencing the AI nailing every request, and some poor bloke trying to see what all the hype is about can’t get a single response that isn’t fully hallucinated garbage.
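The streak effect is easy to sanity-check with a quick simulation (toy numbers of my own choosing, nothing rigorous):

```python
# Sketch of the "streaks among many flippers" point: with 10,000 people
# each flipping 100 fair coins, the longest streak seen anywhere in the
# crowd is far longer than what any individual should expect.
import random

random.seed(0)

def longest_streak(flips):
    """Length of the longest run of identical outcomes."""
    best = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

n_people, n_flips = 10_000, 100
streaks = [longest_streak([random.random() < 0.5 for _ in range(n_flips)])
           for _ in range(n_people)]

typical = sorted(streaks)[n_people // 2]   # the median person's longest streak
extreme = max(streaks)                     # the luckiest (or unluckiest) person
print(typical, extreme)
```

The median person's longest streak stays in single digits, while somebody in the crowd reliably sees a run roughly twice as long. That outlier's experience generalizes about as well as the lucky LLM user's.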
Sure, but that doesn’t explain the volume of these complaints. I think the more likely answer is the pitiful sycophancy of some models, as demonstrated in BSBench.
Nope. I use GitHub Copilot (agentic mode), and I end up having to use the (more expensive) Claude model because ChatGPT never second-guesses me, or even itself. Gemini is slightly worse, though.
I have access to my boss’s ChatGPT account and it is unusable sycophancy slop, horrible to read because every piece of information is buried under endless emojis and the like. And it’s almost impossible to tell whether the LLM is wrong or right; every answer looks the same, often with a "my final answer" at the end. It’s a mess.
I'm using Claude Opus 4.6 and it is much calmer, more "professional" in tone, with much more information and almost no fluff.
It should be noted that MaxSAT 2024 did not include Z3, as is the case with many competitions. It’s possible (I’d argue likely) that the agent picked up techniques from Z3 or some other non-competing solver, rather than actually discovering a novel approach.
Z3 is capable (it’s an SMT solver, not just SAT), but it’s not very fast at boolean satisfiability and not at all competitive with modern SOTA SAT solvers. Try comparing it to, e.g., Chaff or Glucose.
As it’s from 2024 (MaxSAT was not held in 2025), it’s quite likely all the solvers are in the training data. So the interesting part here is the instances for which we actually got better costs than what is currently known (in the best-cost.csv file).
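For anyone unfamiliar with what the costs in best-cost.csv represent: in weighted MaxSAT the solver minimizes the total weight of falsified soft clauses. A toy brute-force version (my own illustrative encoding, not the competition's WCNF format; real solvers obviously avoid the exponential enumeration):

```python
# Weighted MaxSAT, brute-force style: the "cost" of an assignment is the
# summed weight of the soft clauses it leaves unsatisfied; the solver's
# job is to find the assignment minimizing that cost.
from itertools import product

# A clause is a tuple of literals: positive int i means variable x_i,
# negative -i means its negation. Variables are numbered from 1.
soft = [((1, 2), 3),      # (x1 or x2), weight 3
        ((-1,), 2),       # (not x1), weight 2
        ((-2, 3), 1),     # (not x2 or x3), weight 1
        ((-3,), 1)]       # (not x3), weight 1

def satisfied(clause, assign):
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def cost(assign):
    return sum(w for clause, w in soft if not satisfied(clause, assign))

n_vars = 3
best = min(
    (dict(zip(range(1, n_vars + 1), bits))
     for bits in product([False, True], repeat=n_vars)),
    key=cost,
)
print(cost(best))  # minimum total weight of falsified clauses
```

"Better costs than what is currently known" then just means: an assignment whose cost beats the best previously recorded for that instance.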
Funnily, this was precisely the question I had after posting this (and the topic of an LLM disagreement discussed in another thread). Turns out not, but sibling comment is another confounding factor.
Used Claude through Copilot for so long before switching to CC. Even for the same model the difference is shocking. Copilot’s harness and the underlying Claude models are not well-matched compared to the vertically integrated Claude Code harness.