Hacker News | jinay's comments

I recently did a deep dive on open-endedness, and my favorite example of its power is Picbreeder from 2008 [1]. It was a simple website where users could iteratively select and breed pictures generated by very simple neural networks. Most images were garbage, but a few resembled real objects. The best part is that attempts to replicate these images with a traditional hill-climbing method would produce drastically more complicated solutions, or no solution at all.

It's a helpful analogy to understand the contrast between today's gradient descent vs open-ended exploration.

[1] First half of https://www.youtube.com/watch?v=T08wc4xD3KA

More notes from my deep dive: https://x.com/jinaycodes/status/1932078206166749392


This video was fascinating. I didn't know about "open endedness" as a concept but now that I see it, of course it's an approach.

One thought... in the video, Ken makes the observation that it takes way more complexity and steps to find a given shape with SGD vs. open-endedness. Which is certainly fascinating. However...

Intuitively, this feels like a similar dynamic is at play with the "birthday paradox". That's where if you take a room of just 23 people, there is a greater than 50% chance that two of them have the same birthday. This is very surprising to most people; it seems like you should need way more people. The paradox is resolved when you realize that your intuition is answering a different question: how many people it takes for someone to share your birthday. But the situation with a room of 23 people is implicitly asking for just one match between any two people. Thus you don't have 23 chances, you have 23 choose 2 = 253 chances.
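(Aside: the standard birthday-problem calculation backs this up; this is a quick sketch, not anything from the video:)

```python
from math import prod

def p_shared_birthday(n: int) -> float:
    # Probability that at least two of n people share a birthday,
    # assuming 365 equally likely birthdays: 1 minus the probability
    # that all n birthdays are distinct.
    p_all_distinct = prod((365 - i) / 365 for i in range(n))
    return 1 - p_all_distinct

print(round(p_shared_birthday(23), 4))  # ≈ 0.5073, just over 50%
print(round(p_shared_birthday(22), 4))  # below 50%, so 23 is the threshold
```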

I think the same thing is at work here. With the open-ended approach, humans can find any pattern at any generation. With the SGD approach, you can only look for one pattern. So it's just not an apples to apples comparison and sort of misleading / unfair to say that open-endedness is way more "efficient", because you aren't asking it to do the same task.

Said another way, I think with the open-endedness, it seems like you are looking for thousands (or even millions) of shapes simultaneously. With SGD, you're kinda flipping that around, and looking for exactly 1 shape, but giving it thousands of generations to achieve it.


Did you see their recent paper building on this? Throwback to Picbreeder!

https://x.com/kenneth0stanley/status/1924650124829196370


Ooh I haven't, but this is exactly the kind of follow-up I was looking for. Thanks for sharing!


Timestamped link to the YouTube video: https://youtu.be/T08wc4xD3KA?t=124


Please add an llms.txt file! https://llmstxt.org/

I'd love to see how far I can take this by giving it to an LLM and asking it to format for me with Quarkdown.
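For anyone unfamiliar, an llms.txt is just a markdown file served at the site root; a minimal sketch per the llmstxt.org spec (the project name and links below are placeholders) looks like:

```markdown
# Project Name

> One-sentence summary of what this site or project is about.

## Docs

- [Getting started](https://example.com/start.md): what this page covers
- [API reference](https://example.com/api.md): detailed API documentation

## Optional

- [Changelog](https://example.com/changelog.md): secondary material an LLM can skip
```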


> Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.

Oh my god. It just occurred to me that LLMs may have a better experience “browsing the internet” than humans do.

That is so tragically depressing.


On the other hand, the requirement of stuff like this and MCP for LLMs might just bring back open APIs and stuff like RSS back to the web at large!


Soon we'll start having a humans.txt format to account for this

https://xkcd.com/927/


I used to think that typing speed was not really that important, especially now that we have so many LLMs doing the typing for us. But honestly, now I think it's even more important, because the specificity and detail in your prompts are paramount to getting a good response, and something like a dictation tool (which is what I'm using right now) is really good for generating very specific prompts.

In fact, I wrote all this out using a dictation tool in ~20 seconds (258 WPM).


Agreed. I installed Whisper on my Linux computer with a program called SpeechNote. The dictation is all done offline, and it is astonishingly good.

I also have a whisper dictation app on my Android phone (the app's ID string is 'org.woheller69.whisper', there's a few Whisper apps with the same name "Whisper", but this one is my favorite).

FWIW this was typed by hand on my phone, but these apps are both amazing.


Curious, what dictation tool are you using?


https://github.com/JinayJain/dictator

Built one for myself. It's context-aware and promptable.

Well tested on Linux; less so on other platforms, though in theory it supports them.

It's a bit meta but I wrote it mostly using Claude Code. Once I had an MVP, I was able to prompt much faster by just speaking out what I wanted it to change.


Same, 258wpm is something.


FYI I wrote a comment in the same thread where I described the tools I use (TLDR: Whisper).


Likely no coincidence that they announce their company, io, during Google I/O.

Search "io" on Google right now and see what comes up...


I'm pretty sure this was named years ago


I'm more referring to launch timing


> Search "io" on Google right now and see what comes up...

I don't know about you, but neither of them comes up. Google I/O has always been something you have to search for including the "Google" part and this news is all about Jony Ive, not the nondescript company name.


Went in an incognito window and searched "io" and this announcement was shown right above Google IO [1].

[1] https://i.imgur.com/xNKjFXa.png


That's not the only concern though. The web is by far the most accessible way to distribute something, which is paramount in getting people to use it.


They really need to fix the kerning on some of their charts


I wonder if you could use the Chrome local AI API to build something like this: https://developer.chrome.com/docs/ai


Interesting, I'll look into it


For anyone who wants to listen, it's on this page: https://yummy-fir-7a4.notion.site/dia


Wow. Thanks for posting the direct link to examples. Those sound incredibly good and would be impressive for a frontier lab. For two people over a few months, it's spectacular.


A little overacted, it reminds me of the voice acting in those flash cartoons you'd see in the early days of YouTube. That's not to say it isn't good work, it still sounds remarkably human. Just silly humans :)


Overacted and silly humans indeed: https://www.youtube.com/watch?v=gO8N3L_aERg


> flash cartoons you'd see in the early days of YouTube

Wouldn't those be straight from Newgrounds?


Thank you! I couldn't remember the name Newgrounds for some reason!!


Reminded me of the Fenslerfilm G.I. Joe sketch where the kids have something on the stove burning


Stop all the downloading!


This is an instant classic. Sesame comparison examples all sound like clueless rich people from The White Lotus.


Sounds great. One of the female examples has convincing uptalk. There must be a way to manipulate the latent space to control uptalk, vocal fry, smoker’s voice, lispiness, etc.


Make sure you're using the "-it-qat" suffixed models like "gemma3:27b-it-qat"



Thanks. I was wondering why my open-webui said that I already had the model. I bet a lot of people are making the same mistake I did and downloading just the old, post-quantized 27B.


Cursor is likely very tuned for Claude (prompts-wise and all) due to its dominance with 3.5 and now 3.7. Still, Gemini 2.5's tool calling, which Cursor relies on heavily, has been pretty poor in my experience.


Yep. Tool calling is terrible across all Gemini models. I’m not sure why, when the model itself is so good.

