It is funny how you can break diet/nutrition into generations like this.
I think the trends are a reflection of poor education. Fiber/protein/whatever being important components of a diet isn't new information. But the information is new to folks that never had nutrition explained to them.
I don't love talking politics on this site. Hackernews has done a pretty decent job of staying non-political and I think that's been a positive thing.
AI is re-shaping American society in a lot of ways. And this is happening at a time when the U.S. is more politically divided than it has ever been. People who use LLMs regularly (most SWEs at this point) can see the danger signs. The bad outcomes are not inevitable. But the conversations around this cannot only be held in internet forums and blog posts.
Hackernews is an echo chamber of early adopters of tech. The discussions here don't percolate out to the general population.
I believe many of us have a duty to make this feel real to the less technical people in our lives. Too many folks get their information filtered through Fox News, CNN, or MSNBC. Fox is the worst on misinformation, but the others are also bad. Their viewers will not hear, in any clear way, how the Trump admin is trying to bully AI companies into doing what it wants. It will be a headline or an article at best, a footnote not given the attention it deserves.
Plainly: there is an attempt to turn AI into a political weapon aimed at the general population. Misinformation and surveillance are already out of control. If you can, imagine that getting worse.
This feels like one of those hinge moments. If you can, have real-life conversations with people around you. Explain what's at stake and why it matters now, not later.
I see that your prompt includes 'Do not use any tools. If you do, write "I USED A TOOL"'
This is not a valid experiment, because GPT models always have access to certain tools and will use them even if you tell them not to. They will fib the chain of thought after the fact to make it look like they didn't use a tool.
This isn't an experiment a consumer of the models can actually run. If you have a chance to read the article I linked: it is difficult even for the model maintainers (OpenAI, Anthropic, etc.) to look inside the model and see what it actually used in its reasoning process. The models will purposefully hide information about how they reasoned, and they will ignore instructions without telling you.
The problem really isn't that LLMs can't ever get math/arithmetic right. They certainly can. The problem is that there's a very high probability that they will get the math wrong on any given try. Python or similar tools were the answer to that inconsistency.
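To make the "hand the math to Python" idea concrete, here is a minimal, hypothetical sketch of the kind of tool handler a chat client might expose. The tool-call shape (`{"tool": "python", "expr": ...}`) and the function names are my own illustration, not OpenAI's actual protocol; the point is only that the arithmetic gets evaluated deterministically instead of being sampled token by token.

```python
import ast
import operator

# Operators we allow in a pure-arithmetic expression.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def eval_arith(expr: str):
    """Safely evaluate a plain arithmetic expression via the AST."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not a plain arithmetic expression")
    return walk(ast.parse(expr, mode="eval").body)

def handle_tool_call(call: dict):
    # The model emits a tool call; the client executes it and returns
    # the exact result, so the arithmetic can't come out "mostly right".
    if call.get("tool") == "python":
        return eval_arith(call["expr"])
    raise KeyError(f"unknown tool {call.get('tool')!r}")

print(handle_tool_call({"tool": "python", "expr": "1234 * 5678"}))  # 7006652
```

The model only has to produce the expression; the correctness of the result comes from the interpreter, which is exactly why tool use fixed the inconsistency.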
"I should explain that both the “python” and “python_user_visible” tools execute Python code and are stateful. The “python” tool is for internal calculations and won’t show outputs to the user, while “python_user_visible” is meant for code that users can see, like file generation and plots."
But really, the most important thing is that we as end-users cannot know with any certainty whether the model used Python or didn't. That's what the alignment-faking article describes.
> To avoid timeouts, try using background mode. As our most advanced reasoning model, GPT-5 pro defaults to (and only supports) reasoning.effort: high. GPT-5 pro does not support code interpreter.
You're wrong, per the link you shared. It was about ChatGPT, not the API.
The documentation makes it unambiguously clear that GPT-5 pro does not support code interpreter. Unless you think they secretly run it anyway, which would be a conspiracy theory, is that enough to falsify the claim?
> Unless you think they secretly run it which is a conspiracy
tbh this doesn't sound like a conspiracy to me at all. There's no reason why they couldn't have an internal subsystem in their product which detects math problems and hands off the token generation to an intermediate, more optimized Rust program or something, which does math on the cheap instead of burning massive amounts of GPU resources. This would just be a basic cost optimization that would make their models both more effective and cheaper. And there's no reason why they would need to document this in their API docs, because they don't document any other internal details of the model.
I'm not saying they actually do this, but I think it's totally reasonable to think that they would, and it would not surprise me at all if they did.
Let's not get hung up on the "conspiracy" thing though - the whole point is that these models are closed source and therefore we don't know what we are actually testing when we run these "experiments". It could be a pure LLM or it could be a hybrid LLM + classical reasoning system. We don't know.
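The hand-off idea described above can be sketched in a few lines. Everything here is hypothetical (the regex, the router, and `call_llm` are stand-ins I made up, not anything OpenAI has documented); it just shows how cheap it would be to detect plain arithmetic and answer it classically instead of burning GPU tokens.

```python
import re

# Fast path: prompts that are just arithmetic, e.g. "What is 17 * 23?"
# The character class only admits digits, whitespace, and arithmetic
# operators, so the matched expression is safe to evaluate.
ARITH_RE = re.compile(r"^\s*what is\s+([\d\s+\-*/().]+)\??\s*$", re.IGNORECASE)

def call_llm(prompt: str) -> str:
    # Stand-in for the expensive model backend.
    return f"<model-generated answer to: {prompt}>"

def route(prompt: str) -> str:
    m = ARITH_RE.match(prompt)
    if m:
        try:
            # Deterministic, near-free classical evaluation.
            return str(eval(compile(m.group(1), "<expr>", "eval"), {"__builtins__": {}}))
        except Exception:
            pass  # anything odd falls through to the model
    return call_llm(prompt)

print(route("What is 17 * 23?"))  # 391
print(route("Tell me a joke"))    # handed to the model
```

From the outside, a system like this is indistinguishable from "the LLM got the math right", which is the whole point about not knowing what we're actually testing.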
“Code interpreter” is a product feature the customer can use that isn’t being discussed.
They can obviously support it internally, and the feature exists for ChatGPT, but they’re choosing not to expose that combo in the API yet because of product rollout constraints.
Alright let's say I'm wrong about the details/nuances. That's still really not the point.
The point is this:
> we as end-users cannot with any certainty know if the model used python, or didn't
These tools can and do operate in ways opposite to their specific instructions all the time. I've had models make edits to files when I wasn't in agent mode (just chat mode). Chat mode is supposedly a sandboxed environment. So how does that happen? And I am sure we've all seen models plainly disregard an instruction for one reason or another.
The models, like any other software tool, have undocumented features.
You as an end-user cannot falsify the use of a python tool regardless of what the API docs say.
I know what falsifiable means--you're misusing it, and I simply adopted your misuse. A claim either is falsifiable or it isn't; it can't be *made* falsifiable. The way you're using it is "can we come up with a test to show that it's false?"--no, we can't, because it's not false.
Again, there's nothing that one can do to prove that something that isn't false is false. Sheesh. I won't respond to you again as there's no need to simply repeat it.
Please don't cross into posting like this, no matter how wrong someone else is or you feel they are. It's not what this site is for, and destroys what it is for.
Well, your first Google result is a blog post that makes my point.
> For example, baby boomers are the generation with the most dramatic increase in harmful alcohol abuse. In contrast, Gen Z prefers the sober lifestyle as they are known to consume alcohol much less than any of their older counterparts, including millennials.
My team has gained a reputation as a sort of firefighting crew.
PMs call us in when projects are failing, usually engineering-data and engineering-adjacent stuff (mechanical/electrical).
We automate the heck out of the processes, using a mix of AI processing, RAG, and AI-assisted coding.
We rescue the projects. Finish ahead of schedule. Make fewer mistakes. We gain additional scope. We win new projects. We bring in new clients.
But when higher ups ask the people we helped about productivity gains, the most generous will say stuff like "it takes as long to review as it takes to do things manually", "They really helped on {inconsequential part of the deliverable}"
If that is the takeaway these people are giving, the higher-ups will be incredibly misled. Luckily for me, I have people who deal with the politics, while my team can focus on delivery.
Our reputation keeps growing, and we keep delivering faster. The heads of the departments we work with love us; the middle ranks who were doing the laborious crap, maybe not so much.
I don't think the OP gave enough information for us to really have any honest conversation about this one way or the other.
That said: I suspect that OP is providing low-detail prompts.
These tools cannot read your mind. If you provide an under-specified prompt, they will fill in, on their own, every detail necessary to complete the task that you didn't provide. This is how you end up with slop.