
This is great. Wondering what the author's experience has been using this framework for real projects.


Thanks! I started building this after running into the problem myself. On one project we had five developers, each using AI tools, and everyone ended up structuring things differently. After a few weeks the codebase felt like five mini-projects stitched together.

I wanted something that kept the architecture consistent without everyone having to stop and redraw diagrams all the time. That’s how SpecMind started. We’ve been using it in real projects, and it’s been much easier to keep track of how everything fits together.


Reporting a bug: 4123262 matches for Google.


How is it technically possible?



Can confirm. dang once pinged me directly by email saying that my story was re-upped. The story went to the front page again and the date was adjusted (IIRC), but the comments were kept:

---

Hi denysvitali,

The submission "PostmarketOS-Powered Kubernetes Cluster" that you posted to Hacker News (https://news.ycombinator.com/item?id=42352075) looks good, but hasn't had much attention so far. We put it in the second-chance pool, so it will get a random placement on the front page some time in the next day or so.

This is a way of giving good HN submissions multiple chances at the front page. If you're curious, you can read about it at https://news.ycombinator.com/item?id=26998308 and other links there. And if you don't want these emails, sorry! Let us know and we won't do it again.

Thanks for posting good things to HN!

Daniel (moderator)



I also created a mobile-friendly version of the transcript:

https://gist.github.com/sleaze/bf74291b4072abadb0b4109da3da2...

And here's the related submission:

Former Google CEO Eric Schmidt's Leaked Stanford Talk - https://news.ycombinator.com/item?id=41263143 (2 days ago, 466 comments)

Edit: Broken gist link fixed. Thanks @ryanwhitney!


Your first link is missing a character, so it 404s.

Working link: https://gist.github.com/sleaze/bf74291b4072abadb0b4109da3da2...



Locating and manipulating snippets of information in huge LLMs is surely impressive, but it is hard to believe that it can be scaled to more complex structures without using even bigger models.


Looking forward to a document leak about OpenAI using YouTube data to train their models. When asked if they use it, Murali (CTO) said she doesn't know, which makes you 99% sure they are using it.


I would say 100%, simply because there is no other reasonable source of video data.


I use multiple websites that have hundreds of thousands of free stock videos that are much easier to label than YouTube videos.


The number of videos is less relevant than the total duration of high-quality videos (quality can be approximated on YouTube with metrics such as view and subscriber counts). Also, while YouTube videos are not labelled directly, you can extract signal from the title, the captions, and perhaps even the comments. Lastly, many sources online use YouTube to host videos and embed them on their pages, which probably provides more text data that can be used as labels.


To be fair, I don't think Google deserves exclusive rights to content created by others just because they own a monopolistic video platform. However, I do think it should be the content owner's right to decide whether anyone, including Google, gets to use their content for AI.


Any other company can start a video platform. In fact, a few have, and failed.

Nobody has to use YouTube either.

If you want change in the video platform space, either be willing to pay a subscription or watch ads.

Consumers don't want to do either, and hence no one wants to enter the space.


*Murati


I am surprised to see a pro-copyright take on HN :)


"Aready won position" or "99% win rate" is statistics given by Stockfish (or professional chess player). It is weird to assume that the same statement is true for the trained LLM since we are assessing the LLM itself. If it is using during the game then it is searching, thus the title doesn't reflect the actual work.


It's quite clear from the article that the 99% is the model's predicted win rate for a position, not its evaluation by Stockfish (which doesn't return evaluations in those terms).

It's true that this is a relatively large deficiency in practice: how strong would a player be if he played the middlegame at grandmaster strength but couldn't reliably mate with king and rook?

The authors overcame the practical problem by just punting to Stockfish in these few cases. However, I think it's clearly solvable with LLM methods too. Their model performs poorly because of an artifact in the training process where mate-in-one is valued as highly as mate-in-fifteen. Train another instance of the model purely on checkmate patterns - it can probably be done with many fewer parameters - and punt to that instead.


Human players have this concept of progress. I couldn't give a good succinct description of exactly what that entails, but basically: if you are trading off pieces, that's progress; if your king is breaking through the defensive formation of the pawn endgame, that's progress; if you are pushing your passed pawn up the board, that's progress; if you are slowly constricting the other king, that's progress.

When we have a won position we want to progress and convert it to an actual win.

I think the operational definition I would use for progress is a prediction of how many more moves the game will last. A neural network can be used for that.
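
A rough sketch of what I mean (purely illustrative, nothing from the paper; win_prob, predicted_plies_left and board.after are made-up helpers): train a small model to predict how many plies are left, then use it to break ties between moves the value model already considers equally winning.

    # Hypothetical tie-breaker: among moves with (roughly) the same predicted
    # win probability, prefer the one whose resulting position is predicted
    # to end the game soonest.
    def pick_move(board, candidate_moves, win_prob, predicted_plies_left):
        def key(move):
            value = round(win_prob(board, move), 2)  # coarse value bucket
            # Fewer predicted plies remaining = more "progress" toward the win.
            progress = -predicted_plies_left(board.after(move))
            return (value, progress)
        return max(candidate_moves, key=key)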


They do use Stockfish for playing though …

“To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.”


The context of that sentence:

> Indecisiveness in the face of overwhelming victory

> If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.
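
In code, the fallback they describe amounts to roughly this (my paraphrase, not the authors' implementation; model_win_prob, top_five and stockfish_best_of are assumed helpers):

    WIN_THRESHOLD = 0.99

    def choose_move(board, top_five, model_win_prob, stockfish_best_of):
        candidates = top_five(board)  # the model's five highest-value moves
        if all(model_win_prob(board, m) > WIN_THRESHOLD for m in candidates):
            # Everything already looks winning, so defer to Stockfish
            # (restricted to these five moves) for a consistent plan across
            # time-steps. The paper also double-checks the >99% condition
            # with Stockfish; that check is omitted here.
            return stockfish_best_of(board, candidates)
        return candidates[0]  # otherwise play the model's own top move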


They should try to implement some kind of resolute agent in that case. Might be hard to do if it needs to be "not technically search" though.


But only to complete a winning position.


That's a crucial part of chess that can't simply be swept under the rug. If I had won all the winning positions I've had over the years I'd be hundreds of points higher rated.

What if a human only used Stockfish in winning positions? Is it cheating? Obviously it is.


> That's a crucial part of chess that can't simply be swept under the rug.

Grandmasters very literally do it all the time.

> What if a human only used Stockfish in winning positions? Is it cheating? Obviously it is.

Yes, but this isn't that.

This is a computer that is playing chess. And FYI (usually) without search.


The process of converting a completely winning position (typically one with a large material advantage) is a phase change relative to normal play, which is the struggle to achieve such a position. In other words, you are doing something different at that point. For example, as a weak FIDE CM (Candidate Master) I could not compete with a top grandmaster in a game of chess, but I could finish off a trivial win.

Edit: Recently I brought some ancient (1978) chess software back to life: https://github.com/billforsternz/retro-sargon. These two phases of chess, basically two different games, were quite noticeable with that program, which is chess software stripped back to the bone. Sargon 1978 could play decently well, but it absolutely did not have the technique to convert winning positions (because this is a different challenge from regular chess). For example, it could not in general mate with rook (or even queen) and king against a bare king. The technique of squeezing the enemy king into a progressively smaller box was unknown to it.
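
For anyone curious, the "shrinking box" idea can be expressed as a tiny evaluation bonus. A toy sketch (my own illustration, nothing to do with Sargon's actual code), with squares given as (file, rank) pairs from 0-7:

    # Toy K+R vs K heuristic: the rook's file and rank cut the board into a
    # rectangle that traps the defending king (assuming the king is not on
    # the rook's file or rank). A smaller rectangle earns a bigger bonus,
    # nudging the engine to keep squeezing the king toward the edge.
    def box_area(enemy_king, rook):
        kf, kr = enemy_king
        rf, rr = rook
        width = rf if kf < rf else 7 - rf
        height = rr if kr < rr else 7 - rr
        return width * height

    def squeeze_bonus(enemy_king, rook):
        return 64 - box_area(enemy_king, rook)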


That 'only' usage in winning positions could be decisive for gaining a GM rating.


Positions with 99% win percentage are not decisive for GM vs non-GM rating.


From the paper:

If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.

So they freely admit that their thing will draw or even lose in these positions. It's not merely making the win a little cleaner.


> So they freely admit that their thing will draw or even lose in these positions.

Yeah, they didn't use Stockfish for the lols.

They created a search-less engine for chess, and then used a search-based engine to play a small minority of the game.


Yes. So how is this irrelevant for qualifying as GM-level play then? Being able to play these positions is a clear prerequisite for even being in the ballpark of GM strength. If you regularly choke in completely winning endgames, you'll never get there.

This is cheating, plain and simple. It would never fly in human play or competitive computer play. And it's most definitely disingenuous research. They made an engine, it plays at a certain level, and then they augmented it with preexisting software they didn't even write themselves to beef up their claims about it.


> If you regularly choke in completely winning endgames, you'll never get there.

Except we're talking about moves where no human player would choke because they are basically impossible to lose except by playing at random (which is what the bot does).

It makes no sense to try and compare to a human player in the same situation because no human player could at the same time end up in such a position against a strong opponent and be unable to exploit them once there…

It's basically a bug, and what they did is just working around this particular bug in order to have a releasable paper.


They are once your opponents know you’re very bad at converting them.


Proof?

To win any game, at some point (at the end of the game) there will be a position with >99% winning chances. The moves that follow are decisive.


That's not how chess works. The moves that follow aren't usually decisive unless you don't know how to play the game and make enormous mistakes.

Anyone that knows how to play can beat a GM with a big enough advantage at the end of the game (which is what's reflected in the win probability).


Search isn't used to play/win here. Just for training.


It looks like it does use search here in the sense that Stockfish's top move is generated using search.


From the abstract:

> We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points.

So some of the training data comes from Stockfish.


The original comment was "for playing."

In training, traditional search is absolutely used to score positions.

In playing, search is not used. (*Except to finish out an already-won position.)


The post would have found a more fitting home on Twitter rather than the 'forgotten' realms of FB. The very individuals and entities mentioned (or hinted at) might have had the chance to see the story and share their perspectives. Otherwise it just sounds like a rant.


Looks interesting, thanks for sharing!

