Those aren't vulnerabilities. You're missing the point.
Nobody is saying there's no such thing as a slop report. There absolutely are, and slop vulnerability reports were a time-consuming, annoying phenomenon for almost a decade before LLM chatbots existed. There's a whole cottage industry that deals with them.
It was never about cyber capability. It's a liability transfer framework.
If a service provider has a control that says "we use firewalls on all network access points, and configure those firewalls to CIS benchmark whatever", and a third-party signs off with "yes we checked, they have the firewalls, and they're configured properly", you now have two parties you can sue when a security incident caused by lack of firewalls causes you material damage.
Your org's cyber insurance will also go down if you can say "all our vendors have third-party attested compliance, and we do annual compliance reviews".
> This is surprising given the excellent capabilities of GPT-5.2
The real surprise is that someone writing a paper on LLMs doesn't understand the baseline capabilities of a hallucinatory text generator (with tool use disabled).
The real surprise is people saying it's surprising when researchers and domain experts state something the former think goes against common sense/knowledge - as if they'd caught them out, and those researchers hadn't already considered their naive counter-argument.
This wasn't immediately obvious to me, but it's important to note this unit is remotely controlled. The article made it sound autonomous. Further, the unit went back to base nightly (for maintenance / battery swaps I assume).
How could this possibly work? Google is profitable because they can insert ~4 ads into a search results page. An LLM query costs about 2 orders of magnitude more resources to run than a Google search, so I'm not seeing it unless OAI can figure out how to shoehorn 400 ads per prompt into the interface somehow.
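The "400 ads" figure follows directly from the cost ratio. A back-of-envelope sketch of the argument, where every number is an assumption for illustration (not a measured figure):

```python
# Assumed inputs from the comment above, not real data.
ads_per_search = 4   # ads Google can show per search results page
cost_ratio = 100     # LLM query ~2 orders of magnitude costlier to serve

# Holding revenue-per-ad constant, the number of ads needed per prompt
# to cover the same cost multiple scales linearly with the cost ratio:
ads_needed = ads_per_search * cost_ratio
print(ads_needed)  # 400
```

The replies below attack the "holding revenue-per-ad constant" assumption: higher targeting could raise CPM, so the real break-even could be far below 400.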
Longer sessions. You also have more potential for brand advertising compared to Google Search (which is mostly conversion ads). That being said, it's risky and can ruin your product if done wrong, but I think there are ways to do it right.
Only if the margins are negative. Yes, the cost to serve is most likely higher than Google's SRP, but I think the ads can be even better targeted and potentially have a higher CPM than Goog's.
What I'm saying is that I believe their ARPU could be higher than Google's, while what I think you're saying is that their cost will also be higher. I agree with that, but where we differ is that I think that while the margin will be lower, there is still potential to make a ton of money there.
Google doesn't have to fight context windows. They can cache and store an AI response to a Google query without having to worry about much other than locale etc. You can't do that a dozen messages into an LLM conversation.
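The caching asymmetry above can be made concrete: a one-shot AI answer to a search query can be cached under a small key (query plus locale), while a chat response is keyed by the entire conversation so far, making cross-user cache hits vanishingly rare. A minimal sketch (function names are hypothetical):

```python
import hashlib

def search_cache_key(query: str, locale: str) -> str:
    # One-shot answers cache well: normalize the query and add locale,
    # and millions of users share the same key.
    return hashlib.sha256(f"{locale}|{query.strip().lower()}".encode()).hexdigest()

def chat_cache_key(messages: list[str]) -> str:
    # A dozen messages in, the key must cover the whole conversation,
    # so effectively no two users ever hit the same cache entry.
    return hashlib.sha256("\n".join(messages).encode()).hexdigest()
```

Two users asking "Hello World" and "  hello world " in the same locale share a search key; two chat histories differing by a single message never share one.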
If anyone is replaceable by AI, executives are first in line. Make "decisions" based on expert input, give presentations, sit in meetings and on calls. No liability, no concrete "work product" to speak of, so why not?
> can anyone on the product side actually predict traffic
Hypothetically, could you not? If you engineer a bridge you have no idea what kind of traffic it'll see. But you know the maximum allowable weight for a truck of length X is Y tons, and factoring in your span, you have a good idea of what the max load will be. And if the numbers don't line up, you add in load limits or whatever else to make them match. Your bridge might end up processing 1 truck per hour, but that's ultimately irrelevant compared to max throughput/load.
Likewise, systems in regulated industries have strict controls for how many concurrent connections they're allowed to handle[1], enforced with edge network systems, and are expected to do load testing up to these numbers to ensure the service can handle the traffic. There are entire products built around this concept[2]. You could absolutely do this, you just choose not to.