Hacker News | UncleEntity's comments

> Now do it without those pre-written tests

That's probably the most important thing, actually. I've tried my hardest to get Claude to build an APL VM using only the spec and it's virtually impossible to get full compliance as it takes too many shortcuts and makes too many assumptions. That's part of the challenge though, to see how far the daffy robots have come.


Hehe, I tried giving it a minesweeper CSP I've been working on and asked it to develop the feature I was working on at the moment, just to compare. I was working on adding non-chronological backtracking to the search engine.

I gave it the proper compile flags, I gave it test cases and their expected output, and everything it would have needed. The test cases were specifically hand-picked to be hard on the search algorithm. The base program was correct (I was only adding an optimization), and its results were what I was using as a baseline for testing my implementation. You know, with a debugger and breakpoints, printfs and all that.

In the end it couldn't get the thing to work (I asked it to compile and verify), then it proudly declared that in all of the test cases I gave it, everything was solved through constraint propagation and the search didn't even trigger. So it didn't introduce any bugs. It tried to gaslight me, even though it got a segfault in the new code it added (which would obviously not have been triggered if the search didn't actually execute).


> ...and the right-to-left evaluation logic.

The evaluation order matters less than the fact that you don't really know what kind of function/operator you have at parse time, so you have to do a bunch of shenanigans to defer that decision until runtime while still keeping it efficient. Kind of fiddly to get right but once it works, it just works.

Claude and I (and a ton of decades-old research) pretty much figured out all the complications in the APL parse/eval stack (https://github.com/dan-eicher/AiPL).
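
To make the "defer the decision until runtime" point concrete, here's a rough Python sketch of one well-known case: `/` means reduce when its left operand is a function and replicate when it's an array, and if the left operand is a name you can't tell which at parse time. This is only an illustration, not how AiPL does it; `apply_slash`, `eval_slash` and the `env` dictionary are made-up names.

```python
from functools import reduce
import operator

def apply_slash(left, right):
    """Resolve `left / right` once the left operand's value is known."""
    if callable(left):
        # Function operand: reduce, folding right to left as APL does,
        # so f/ a b c is a f (b f c). An empty `right` is not handled here.
        return reduce(lambda acc, x: left(x, acc), reversed(right))
    # Array operand: replicate each item of `right` by the matching count.
    if len(left) != len(right):
        raise ValueError("length mismatch in replicate")
    return [x for n, x in zip(left, right) for _ in range(n)]

def eval_slash(name, right, env):
    """The parser only records `name / right`; the lookup happens here."""
    return apply_slash(env[name], right)

# Hypothetical environment: `f` is bound to a function, `m` to an array.
env = {"f": operator.sub, "m": [1, 0, 2]}
minus_reduce = eval_slash("f", [1, 2, 3], env)   # 1 - (2 - 3)
replicated = eval_slash("m", [7, 8, 9], env)     # keep 7 once, drop 8, 9 twice
```

The fiddly part in a real interpreter is doing this without paying for a type check on every single node, which the sketch doesn't attempt.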


I'm looking forward to checking out your stuff...


Pivot to where the stupid money is being thrown around seems like a perfectly reasonable business plan.


One of my experiments was to have Claude write a VM and then generate a verification harness (using a DSL) for it to ensure it was correct, the theory being that the same bug would have to exist in the test suite, the static verification and the VM for it to sneak through. Found a few bugs in the verification library and some integer overflows in the VM, then it became too much for my poor little laptop to run without cutting some important corners.

It's not an abstract thing they can't do, you just have to tell them to.
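
A toy version of the "same bug would have to exist in more than one place" idea, as a Python sketch: run each program through the VM under test and through an independently written oracle, and flag any disagreement. Everything here is made up for illustration (the two-instruction stack machine, `run_vm`, `run_oracle`, `disagreements`), and it's far simpler than a real harness with static verification.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap32(n):
    """Wrap an integer to signed 32-bit two's complement."""
    return (n + 2**31) % 2**32 - 2**31

def run_vm(program):
    """Hypothetical VM under test: 32-bit arithmetic that silently wraps."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(wrap32(a + b if op == "add" else a * b))
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack.pop()

def run_oracle(program):
    """Independent oracle: same semantics, but with unbounded integers."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack.pop()

def disagreements(programs):
    """Return the programs on which the VM and the oracle differ."""
    return [p for p in programs if run_vm(p) != run_oracle(p)]

small = [("push", 2), ("push", 3), ("add",)]
overflow = [("push", INT32_MAX), ("push", 1), ("add",)]
suspects = disagreements([small, overflow])  # only the overflow case differs
```

An overflow only slips through if the oracle wraps the same way, which is the whole point of writing the two separately.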


I find it an interesting experiment to find the limits of what they can do.

Like, I've had it build a full APL interpreter, half an optimizer, started on a copy-and-patch JIT compiler and it completely fails at "read the spec and make sure the test suite ensures compliance". Plus some additional artifacts which are genuinely useful on their own as I now have an Automated Yak Shaver™ which is where most of my projects ended up dying as the yaks are a fun bunch to play with.


My project over the last week was to get the robots to train a neural net to learn the "303 thing", hasn't gone well at all.

The first one sounded like it was being played on a blown out speaker after it got run over and the second attempt sounded like it was going through a $20 pawn shop guitar pedal that got left in the rain, which led to the 'oh, you wanted the neural net to learn the 303's filter section? My bad, I just made some random stuff up as an approximation...'

The worst part is there's still compute credits left over from the initial ten bucks so we just have to try again...


Yeah, back during Trump's first term I was hoping Congress would rein in executive power a bunch as he is prone to do stuff like this, didn't turn out that way unfortunately...

Now the main constraint on executive power seems to be due process and habeas corpus.


The problem I run into is the propensity for it to cheat so you can't trust the code it produces.

For example, I have this project where the idea is to use code verification to ensure the code is correct. The stated goal of the project is to produce verified software, and the daffy robot still can't seem to understand that the verification part is the critical piece so... it cheats on the tests so they pass. I had the newest Claude Code (4.6?) look over the tests on the day it was released and the issues it found were really, really bad.

Now, the newest plan is to produce a tool which generates the tests from a DSL so they can't be made to pass and/or match buggy code instead of the clearly defined specification. Oh, I guess I didn't mention there's an actual spec for what we're trying to do which is very clear, in fact it should be relatively trivial to ensure the tests match for some super-human coding machine.
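
Roughly what I mean by generating tests from a DSL, as a Python sketch: the expected values live in a spec table, the test cases are parsed out of that table, and nothing in the runner looks at what the implementation happens to return. The mini spec format, `parse_spec`, `failing_cases` and the deliberately broken `buggy_eval` are all invented for this example; the real spec and tool are not shown here.

```python
# Hypothetical spec table: one "expression => expected value" per line.
SPEC = """
# expr => expected
2 + 3 => 5
2 - 3 => -1
4 * 5 => 20
"""

def parse_spec(text):
    """Turn the spec table into (expression, expected) pairs."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expr, expected = (part.strip() for part in line.split("=>"))
        cases.append((expr, int(expected)))
    return cases

def failing_cases(cases, impl):
    """Run every spec case through impl; return (expr, expected, actual)."""
    failures = []
    for expr, expected in cases:
        actual = impl(expr)
        if actual != expected:
            failures.append((expr, expected, actual))
    return failures

def buggy_eval(expr):
    """Stand-in implementation with a deliberate bug in subtraction."""
    a, op, b = expr.split()
    a, b = int(a), int(b)
    if op == "+":
        return a + b
    if op == "-":
        return b - a  # deliberate bug: operands swapped
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")

failures = failing_cases(parse_spec(SPEC), buggy_eval)
```

Since the expected values come from the table, the only way to make a failing case pass is to fix the implementation or edit the spec itself, and the latter is at least easy to spot in a diff.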


All I hear about is how this is the 'shape of things to come' with regards to the AI bubble while nobody seems to care that France just told all the gov't agencies to stop using their stuff.

Losing out on EU governmental contracts seems to me to be somewhat of a big deal and the France thing is just maybe the first move in that direction.


>> presumably not of personal vehicles

They don't magically gain more privacy protection in public over what your average person has just because they clock out after a hard day of work by virtue of being a government employee.

They are constantly and consistently reminded that people have the right to record in public and they chose to ignore that as there are no consequences if they violate the law. Or that people have a right to peacefully assemble. Or freedom of the press...


I agree they don't gain more privacy protection in public than the average person. I also agree they shouldn't gain more privacy protection in public than the average public employee, either!

I'm merely assuming that the license plates being listed are ones they use for their official work, since the rest of their info is being tied to what's available for any other public work.

