I kind of want to see an experiment going the other way.
Have a repo with a committee of AI models deciding what to merge. Inform them of the goals of the project and that they should only allow positive changes, while people are allowed to make adversarial PRs.
It can be more active because the committee can meet on demand. Then people and AIs can attempt to bend the project to their wills.
Says anyone who has tried to do anything requiring the smallest amount of computer science or computer engineering. These models are really great at boilerplate and simple web apps. As soon as you get beyond that, it gets hairy. For example, I have a clone of HN I've been working on that adds subscriptions and ad slot bidding. Just those two features required a lot of hand-holding. Figma Design nailed the UX, but the actual guts/business logic I had to spend time on myself.
I expect that this will get easier as agentic flows get more mature, though.
Then the only place that novelty will occur is in the actual study of computer science. And even then, a well contexted agentic pipeline will speed even R&D development to a great degree.
One very bad thing about these things is the embedded dogma. With AI ruling the roost in terms of generation (basically an advanced and opinionated typewriter, let's be honest), breaking away from the standards in any field will become increasingly difficult. Just try talking to any frontier model about physics that goes against what is currently accepted and they'll put up a lot of resistance.
I’ve been pleasantly surprised how useful it is for writing low-level stuff like peripheral drivers on embedded platforms. It’s actually *simple* stuff, but exactingly technical and detail-oriented. It’s interesting that it can work so well, then go wildly off the rails and be impossible to wrestle back on track unless you go way back in the context, or even start a completely new context and feed in only what is currently relevant (this has become my main strategy).
Still, it’s amazingly good at harmonizing a bunch of technical details and applying them to a tried-and-true design paradigm to create an API for new devices, or to handle tricky timing, things like that. Until it isn’t, and you have to abort the session and build a new one because it has worked itself into some kind of context corner where it obsesses about something that is just wrong or irrelevant.
Still, it’s a solid 2x on productivity, and my code is arguably more maintainable because I don’t get tempted to be clever or skip clarifying patterns.
There is a level of holistic complexity that kills it, though. The trick is dividing the structure and tasks into self-contained components that keep any relevant state within their confines to the maximum practical extent, even if there is a lot of interdependent state going on inside. It’s sort of a meta-functional paradigm working with inherently state-centric modules.
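To make the "self-contained component" pattern concrete, here is a minimal sketch in C of the kind of module I mean: all mutable state is file-static, and the outside world only sees a narrow push/pop API. The names (`uart_rx_push`, `uart_rx_pop`) are purely illustrative, not from any real codebase.

```c
/* uart_rx.c — hypothetical example of a state-centric module with a
   narrow interface. All state lives inside this file; callers cannot
   reach the buffer or its indices directly. */

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u  /* power of two, so the index mask works */

/* All mutable state is file-static: nothing leaks out of the module. */
static uint8_t rx_buf[RX_BUF_SIZE];
static size_t rx_head; /* next write position (ISR side) */
static size_t rx_tail; /* next read position (main-loop side) */

/* Called from the receive interrupt: push one byte, drop on overflow. */
bool uart_rx_push(uint8_t byte) {
    size_t next = (rx_head + 1) & (RX_BUF_SIZE - 1);
    if (next == rx_tail)
        return false; /* buffer full: drop the byte */
    rx_buf[rx_head] = byte;
    rx_head = next;
    return true;
}

/* Called from the main loop: pop one byte if one is available. */
bool uart_rx_pop(uint8_t *out) {
    if (rx_tail == rx_head)
        return false; /* buffer empty */
    *out = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1) & (RX_BUF_SIZE - 1);
    return true;
}
```

Because the interdependent state (head, tail, buffer) never escapes the module, the model only has to reason about the two entry points, which is exactly the shape of problem it handles well.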
> a clone of HN I've been working on that adds subscriptions and ad slot bidding
Wut, what's the purpose of that? Is this just a toy learning project? Would it be to make money off of people who don't know that an ad-free version of HN exists at news.ycombinator.com? Will you try to sell it to Y Combinator?
I am hoping they are developing it as a satirical art project, otherwise... yikes; needing a credit card and an ad blocker to use HN would be very depressing and is counter to everything I enjoy about this forum.
Mostly just learning, to be honest. I'm not trying to replace HN, I'm just fiddling around and seeing what I can do and what I can't.
My long-term purpose is to provide the source code for communities/creators that want something simple to set up, and specifically to allow creators to gate content behind a paywall. I'm sure stuff like that exists, but I hope what I build will be at least somewhat usable.
Not a developer by trade. But incidentally, today I took my first stab at "vibe coding". I wrote a little GUI program to streamline a process that I've been doing for years. The code is an absolute wreck. But the program works and does what it's meant to do. I wouldn't ever expect anyone to maintain it, but for what it is, I can't complain. The alternative would have been for the tool to have not been written at all. The level of effort was so low that a) it passed the threshold of being worth my time, and b) if it needs to be re-vibe-coded again, no worries.
FWIW, it appears they're purposefully introducing multiple simulated failures into this test. It doesn't appear that they're trying to make this succeed at all costs. From the site:
> The primary test objectives for the booster will be focused on its landing burn and will use unique engine configurations. One of the three center engines used for the final phase of landing will be intentionally disabled to gather data on the ability for a backup engine from the middle ring to complete a landing burn. The booster will then transition to only two center engines for the end of the landing burn, entering a full hover while still above the ocean surface, followed by shutdown and drop into the Gulf of America.
...
> The flight test includes several experiments focused on enabling Starship’s upper stage to return to the launch site. A significant number of tiles have been removed from Starship to stress-test vulnerable areas across the vehicle during reentry. Multiple metallic tile options, including one with active cooling, will test alternative materials for protecting Starship during reentry. On the sides of the vehicle, functional catch fittings are installed and will test the fittings’ thermal and structural performance, along with a section of the tile line receiving a smoothed and tapered edge to address hot spots observed during reentry on Starship’s sixth flight test. Starship’s reentry profile is designed to intentionally stress the structural limits of the upper stage’s rear flaps while at the point of maximum entry dynamic pressure.
> FWIW, it appears they're purposefully introducing multiple simulated failures into this test.
Not just this test; IIRC Starship's Flight 9 also had reentry trajectories that stress-tested the hardware to its limits. In general I think their current strategy is testing the hardware's limits in real conditions and iterating rapidly, reducing the chance of any small failure becoming catastrophic.
> followed by shutdown and drop into the Gulf of America.
It’s funny that the social engineering of the administration that allows them to launch is just as important as the mechanical engineering of the vehicle in terms of achieving their macro goal.
I think this sort of “solve all of the problems, in every domain, that stand in our way” explains a lot about their activities and strategic planning.
It'll be a while before they're comfortable landing Starship itself onto the launch tower, so an ocean splashdown is the best outcome possible. And the booster is going to be testing another one of those extra aggressive reentry trajectories.
They broke the previous booster by overdoing it, so it remains to be seen whether they'll find the balance between "fuel efficient" and "doesn't cause catastrophic internal booster damage" this time around.
Thanks Andrej. I have a pretty good understanding of how LLMs work and how they are trained, but a lot of my friends don't. These videos/talks give them 'some' idea.
It's not for lack of trying. I spend a lot of time writing text that feels like I actually read the ad. I try to include details, show examples for the things I think are relevant, and have others give their opinion on what I should improve or rephrase.
I don't see anything about source availability, git repository links, or open-source licensing. Why would I switch from a free and open-source IDE to a closed-source IDE offering no benefits?