Claude provides nicer explanations, but when it comes to CoT tokens or just prompting the LLM to explain -- I'm very skeptical of the truthfulness of it.
Not because the LLM lies, but because humans do that also -- when asked how they figured something out, they'll provide a reasonable-sounding chain of thought, but it's not how they actually figured it out.
> Gemini also frequently gets twisted around, stuck in loops, and unable to make forward progress.
Yes, Gemini loops, but I've found it's almost always just a matter of interrupting it and telling it to continue.
Claude is very good until it tries something 2-3 times, can't figure it out, and then tries to trick you by changing your tests instead of your code (if you explicitly tell it not to, maybe it will decide to ask) OR introduces hyper-fine-tuned IFs to fit your tests, EVEN if you tell it NOT to.
> let myself atrophy, run on a treadmill forever, for something
You're lucky to afford the luxury not to atrophy.
It's been almost 4 years since my last software job interview and I know the drill about preparing for one.
Long before LLMs, my skills were naturally atrophying in my day job.
I remember the good old days of J2ME, writing everything from scratch. Or writing some graph editor for university, or some speculative Huffman coding algorithm.
That kept me sharp.
But today I feel like I'm living in that Netflix series about people being in Hell and the Devil tricking them into thinking they're in Heaven while tormenting them: how on planet Earth do I keep sharp with Java, streams, virtual threads, RxJava, tuning the JVM, React, Kafka, Kafka Streams, AWS, k8s, Helm, Jenkins pipelines, CI/CD, ECR, Istio issues, in-house service discovery, hierarchical multi-regions, metrics and monitoring, autoscaling, spot instances and multi-arch images, multi-AZ, reliable and scalable yet as cheap as possible, yet as cloud native as possible, Hazelcast and distributed systems, low-level PostgreSQL performance tuning, Apache Iceberg, Trino, and various in-house frameworks and idioms over all of this?
Oh, and let's not forget the business domain, coding standards, code reviews, mentorships and organizing technical events.
Also, it's 2026 so nobody hires QA or scrum masters anymore so take on those hats as well.
This is a very good point. Years ago, when you were working in a LAMP stack, the term LAMP could fully describe your software engineering, database setup and infrastructure. I shudder to think of the acronyms for today's tech stacks.
And yet many of the same people who lament the tooling bloat of today will, in a heartbeat, make lame jokes about PHP. Most of them aren't even old enough to have ever done anything serious with it, or seen it in action beyond WordPress or some spaghetti-code one-pager they had to refactor at their first job. Then they show up on HN with a vibe-coded side project or a blog post about how they achieved a 15x performance boost by inventing server-side rendering.
Ya, I agree it's totally crazy... but do most app deployments need even half that stuff? I feel like most companies can just build an app and deploy it using some modern PaaS-like thing.
> I feel like most companies can just build an app and deploy it using some modern PaaS-like thing.
Most companies (in the global, not SV sense) would be well served by an app that runs in a Docker container in a VPS somewhere and has PostgreSQL and maybe Garage, RabbitMQ and Redis if you wanna get fancy, behind Apache2/Nginx/Caddy.
But obviously that’s not Serious Business™ and won’t give you zero downtime and high availability.
Though tbh most mid-size companies would also be okay with Docker Swarm or Nomad and the same software clustered and running behind HAProxy.
> Most companies (in the global, not SV sense) would be well served by an app that runs in a Docker container in a VPS somewhere and has PostgreSQL and maybe Garage, RabbitMQ and Redis if you wanna get fancy, behind Apache2/Nginx/Caddy.
That’s still too much complication. Most companies would be well served by a native .EXE file they could just run on their PC. How did we get to the point where applications by default came with all of this shit?
When I was in primary school, the librarian used a computer this way, and it worked fine. However, she had to back it up daily or weekly onto a stack of floppy disks, and if she wanted to serve the students from the other computer on the other side of the room, she had to restore the backup on there, and remember which computer had the latest data, and only use that one. When doing a stock-take (scanning every book on the shelves to identify lost books), she had to bring that specific computer around the room in a cart. Such inconveniences are not insurmountable, but they're nice to get rid of. You don't need to back up a cloud service and it's available everywhere, even on smaller devices like your phone.
There's an intermediate level of convenience. The school did have an IT staff (of one person), a server and a network. It would be possible to host the library database on the school's server and access it remotely from the library terminals. It would then require the knowledge of the IT person to administer, but for the librarian it would be just as convenient as a cloud solution.
I think the 'more than one user' alternative to a 'single EXE on a single computer' isn't the multilayered pie of things that KronisLV mentioned, but a PHP script[0] on an apache server[0] you access via a web browser. You don't even need a dedicated DB server as SQLite will do perfectly fine.
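To make that concrete, here's a rough sketch of the "one script plus an embedded SQLite file" shape. It's written with Python's standard library purely for illustration (the point above is about PHP on Apache, and the same shape applies there); the table name, port and DB path are all made up:

    # Minimal single-file web app backed by SQLite -- illustrative sketch only.
    import sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DB_PATH = "library.db"  # hypothetical embedded database file

    def init_db():
        # One table is enough for the sketch; no separate DB server involved.
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, title TEXT)")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # List all books as plain text; a real app would render HTML.
            with sqlite3.connect(DB_PATH) as conn:
                rows = conn.execute("SELECT id, title FROM books").fetchall()
            body = "\n".join(f"{book_id}: {title}" for book_id, title in rows).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        init_db()
        HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()

The PHP equivalent behind Apache is even simpler to deploy: copy one file into the document root and you're done.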
> but a PHP script[0] on an apache server[0] you access via a web browser
I've seen plenty of those as well - nobody knows exactly how things are set up, sometimes dependencies are quite outdated and people are afraid to touch the cPanel config (or however it's set up). Not that you can't do good engineering with enough discipline, it's just that Docker (or most methods of containerization) limits the blast radius when things inevitably go wrong and at least tries to give you some reproducibility.
At the same time, I think that PHP can be delightfully simple and I do use Apache2 myself (mod_php was actually okay, but PHP-FPM also isn't insanely hard to set up), it's just that most of my software lives in little Docker containers with a common base and a set of common tools, so they're decoupled from the updates and config of the underlying OS. I've moved the containers (well, data + images) across servers with no issues when needed and also reinstalled OSes and spun everything right back up.
> That’s still too much complication. Most companies would be well served by a native .EXE file they could just run on their PC
I doubt that.
As software has grown from solving simple personal computing problems (write a document, create a spreadsheet) to solving organizational problems (sharing and communication within and outside the organization), it has necessarily spread beyond the .exe file and local storage.
That doesn't give a pass to overly complex applications doing a simple thing - that's a real issue - but to think most modern company problems could be solved with just a local executable program seems off.
It can be like that, but then IT and users complain about having to update this .exe on each computer when you add new functionality or fix some errors. When you solve all major pain points with a simple app, "updating the app" becomes top pain point, almost by definition.
> How did we get to the point where applications by default came with all of this shit?
Because when you give your clients instructions on how to set up the environment, they will ignore some of them, and then they install OracleJDK while you have tested everything under OpenJDK and you have no idea why the application is performing so much worse in their environment: https://blog.kronis.dev/blog/oracle-jdk-and-openjdk-compatib...
It's not always trivial to package your entire runtime environment unless you wanna push VM images (which is in many ways worse than Docker), so Docker is like the sweet spot for the real world that we live in - a bit more foolproof, the configuration can be ONE docker-compose.yml file, it lets you manage resource limits without having to think about cgroups, as well as storage and exposed ports, custom hosts records and all the other stuff the human factor in the process inevitably fucks up.
And in my experience, shipping a self-contained image that someone can just run with docker compose up is infinitely easier than trying to get a bunch of Ansible playbooks in place.
If your app can be packaged as an AppImage or Flatpak, or even a fully self contained .deb then great... unless someone also wants to run it on Windows or vice versa or any other environment that you didn't anticipate, or it has more dependencies than would be "normal" to include in a single bundle, in which case Docker still works at least somewhat.
Software packaging and dependency management sucks, unless we all want to move over to statically compiled executables (which I'm all for). Desktop GUI software is another can of worms entirely, too.
When I come into a new project and I find all this... "stuff" in use, often what I later find is actually happening with a lot of it is:
- nobody remembers why they're using it
- a lot of it is pinned to old versions or the original configuration because the overhead of maintaining so much tooling is too much for the team and not worth the risk of breaking something
- new team members have a hard time getting the "complete picture" of how the software is built and how it deploys and where to look if something goes wrong.
Happy to be shown where I can learn more about this different rate of change and trend which sets our current climate change apart from the rest of Earth's history.
It seems like you won't have any trouble finding that yourself if you really wanted to. This "I'm just asking questions" mode you're in can be considered a type of trolling called "sealioning".
On that graphic -- under the heading 'Ice cores (from 800,000 years before present)' in case the link gets truncated -- one can observe regular peaks in temperature that took place before the current one. I'd be happy to hear what caused them, since it could not have been human industrial activity.
That's it. I'm open to dialogue but won't entertain any more lazy dismissals and unfair characterization.
> Once you use up the entire internet's worth of stack overflow responses and public github repositories you run into the fact that these things aren't good at doing things outside their training dataset.
I think the models reached that human training data limitation a few generations ago, yet they still clearly improve via various other techniques.
> Claude is still just like that once you’re deep enough in the valley of the conversation
My experience is that Claude (but probably other models as well) does indeed resort to all sorts of hacks once the conversation has gone on for too long.
Not sure if it's an emergent behavior or something done in later stages of training to prevent it from wasting too many tokens when things are clearly not going well.
> That's just a different bias purposefully baked into GPT-5's engineered personality on post-training.
I want to highlight this realization! Just because a model says something cool doesn't mean it's an emergent behavior/realization; more likely it's post-training.
My recent experience with claude code cli was exactly this.
It was so hyped here and elsewhere I gave it a try and I'd say it's almost arrogant/petulant.
When I pointed out bugs in long sessions, it tried to gaslight me that everything was alright and faked tests to prove its point.
> Eggs in one basket. Renewables are good, but it gets cloudy, night is a thing, it might not be windy
Also, we can't survive an asteroid crash/extinction event with solar.
Nuclear is transcendental.
If we had practically unlimited fusion power, we could build underground, grow plants in aquaponics and aeroponics and ride it out in underground cities and farms.
One of the problems with nuclear is, um, its ability to cause an "extinction event". Sort of.
In that:
* Nuclear power plant failures can be very, very nasty. As in, "producing uninhabitable land for eons" nasty. Yes, dam failures are spectacularly nasty, too (but don't create unlivable land as much). Yes, fossil fuel power plants also are quite bad in a "more silent way" via pollution (plus the occasional centuries-burning coal mine fires etc.). All power sources have problems. But this is a pretty big negative.
* What this means is that big centralized nuclear is also a big target for rogue actors... similar to dams, but not similar to more distributed energy sources like solar or wind. Blowing up a single solar farm or windmill doesn't have a huge impact, relatively speaking, compared to blowing up a nuclear plant. Nuclear plants thus have to spend extra expense protecting themselves against this sort of thing. (And, in the United States at least, classify much of the process of doing so.)
* Nuclear power plants can also be used to produce nuclear weapons. Now this is where the really fun politics begins. Many countries would be really unhappy if their adversary countries started making nuclear weapons from their nuclear power plants. A lot of military effort has been spent over the last few decades trying to prevent that.
This last point is where China's solar panel play actually makes more sense compared to nuclear. Think of the politics involved if China builds a big nuclear plant in (insert adversary of some other country here). Could be very, very tricky in many cases. Whereas there is very little if any politics involved with shipping a solar panel somewhere.
The distributed, small scale nature of solar panels also means that customers in countries with poor centralized power grids (common in developing countries) are able to use them to bypass the current system. This happened previously in many of these countries with mobile phones, where customers were able to bypass poor centralized phone networks. In this aspect, I think the "decentralized" aspect is far more important than the "renewable" aspect... but still.
(There are positives to nuclear, of course; I'm mainly countering the "transcendental" word here. All power sources have plusses and minuses.)
(Note: I have heard of work on smaller scale nuclear systems, but I am not certain if even a small nuclear power device completely resolves political or security concerns.)
Fusion will be its own extinction event as things go. At our development level, if we develop fusion, we'll have to live underground after boiling the oceans to generate crypto tokens and undress videos.
> Unfortunately this does not stop many from exaggerating claims in order to (maybe become) be internet famous
I've been thinking about this a lot lately in another context -- viral priests being anti-vax -- and realized it's the other way around: their motivation doesn't matter, but viewers don't want to see moderate content, they want to see highly polarized and controversial topics.
The same goes for the claims about AI. Nobody wants to hear that AI boosts productivity in a nuanced way; people either want to hear about 10X or -10X, so the market dictates the content/meme.
Would it be possible for the CLI to not be a binary and just be a shell script? Or a web shell would be great.
My issue is I've had my work laptop wiped twice because of things I've installed on it and it's a hassle to switch accounts/devices, but I'd love to give sprites a go.