Hacker News | contemplatter's comments

In particular, it's odd that the greatest software developer in the world (ChatGPT) hasn't made progress with LLM swarm computation.


How is "LLM swarm computation" different from a single bigger LLM?


For the same reason you don't let Mr Musk do all the work: he can't.

One LLM is limited, one obvious limitation is its context window. Using a swarm of LLMs that each do a little task can alleviate that.

We do it too and it's called delegation.

Edit: BTW, "swarm" is meaningless with LLMs. It can be the same instance, but prompted differently each time.
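To make the delegation idea concrete, here's a toy sketch. Everything in it is illustrative: `ask_llm` is a dummy stand-in for whatever model call you'd actually make, and the chunk budget is arbitrary.

```python
def ask_llm(prompt: str) -> str:
    # Placeholder "worker": a real implementation would call a model here.
    return prompt.upper()

def chunk(text: str, budget: int) -> list[str]:
    """Split text into pieces no longer than `budget` characters."""
    return [text[i:i + budget] for i in range(0, len(text), budget)]

def delegate(text: str, budget: int) -> str:
    # Each "swarm member" (which can be the same model, prompted per chunk)
    # sees only its own piece; one final call merges the partial results.
    partials = [ask_llm(piece) for piece in chunk(text, budget)]
    return ask_llm("\n".join(partials))
```

The point is just that no single call ever sees more than `budget` characters, which is the whole trick for working around a fixed context window.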


> The same reason why you don't let Mr Musk do all the work. He can't.

Better to limit his incompetence to one position.


I beg to differ. Imagine him taking down Twitter, Facebook, Instagram, and all the others in one fell swoop!


Context window is a limitation, but have we actually hit the ceiling wrt scaling that? For GPT you need O(N^2) VRAM to handle larger context sizes, but that is ultimately an "I need more hardware" problem; as I understand it, the reason they don't go higher is the economic viability, not that it couldn't be done in principle. And there are many interesting hardware developments in the pipeline now that the engineers know exactly what kind of compute they can narrowly optimize for.
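A back-of-envelope for the O(N^2) part: the attention score matrix alone is one N x N block of floats per head per layer if materialized naively. The layer/head counts and fp16 width below are made-up illustrative numbers, not any real model's config.

```python
def attn_matrix_bytes(n_ctx: int, n_layers: int = 48, n_heads: int = 32,
                      bytes_per_float: int = 2) -> int:
    # Naive attention materializes an n_ctx x n_ctx score matrix
    # per head per layer; doubling context quadruples this.
    return n_ctx * n_ctx * n_layers * n_heads * bytes_per_float

for n in (4_096, 32_768, 131_072):
    gib = attn_matrix_bytes(n) / 2**30
    print(f"{n:>7} tokens -> {gib:,.0f} GiB of attention scores")
```

In practice, kernels like FlashAttention avoid materializing that matrix at all, which is part of why "just buy more hardware" is less clear-cut; the sketch only shows that the naive memory cost really does grow quadratically.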

So, perhaps, there aren't swarms yet just because there are easier ways to scale for now?


I am sure the context window can go up, maybe into the MB range. But I still see delegation as a necessary part of the solution.

For the same reason one genius human doesn't suddenly need less support staff; they actually need more.

Edit: and why it isn’t here yet is because it’s new and hard.


It's easy to distribute across many computers that communicate with high latency.


LLMs are already running distributed on swarms of computers. A swarm of swarms is just a bigger swarm.

So again, what is the actual difference you are imagining?

Or is it just that distributed X is fashionable?


Rather large parts of your brain are generalized, but in particular places we have more specialized areas. Looking at it, you would most likely consider it all the same brain, but from a systems-thinking view, a specialized region is a small separate brain with a slightly different task than the rest of the brain.

If 80% of the processors in a cluster are running 'general LLM' and 20% are running 'math LLM' are they the same cluster? Could you host the cluster in a different data center? What if you want to test different math LLM modules out with the general intelligence?


I think I would consider them split when the different modules are interchangeable so there is de facto an interface.

In the case of the brain, while certain functional regions are highly specialized I would not consider them "a small separate brain". Functional regions are not sub-organs.


Significantly higher latency than you have within a single datacenter. Think "my GPU working with your GPU".


There are already LLMs hosted across the internet (Folding@Home style) instead of in a single data center.

Just because the swarm infrastructure hosting an LLM has higher latency across certain paths does not make it a swarm of LLMs.


> There are already LLMs hosted across the internet (Folding@Home style)

Interesting, I haven't heard of that. Can you name examples?


I read about Petals (1) some time ago here on HN. There are surely others too, but I don't remember the names.

1. https://github.com/bigscience-workshop/petals


Julia seems like a nice programming language. Is it still worth learning, though, since ChatGPT can write all software now?


Agree. Me think me learn english, but me too think ChatGPT come, then why learn English? So me not learn now, only wait for ChatGPT.


Bizarro hate Superman, but ChatGPT hate Superman better, that mean Bizarro love Superman!


> Agree. Me think me learn english, but me too think ChatGPT come, then why learn English? So me not learn now, only wait for ChatGPT.

> write above proper english

I agree. I thought about learning English, but I also thought since ChatGPT is available, why bother learning it? So I decided not to learn it now and just wait for ChatGPT.


Is it still worth posting, since ChatGPT can write all comments now?


Prompt: "Write software that does exactly what the Product Owner wants. Here is his email: *copy pasted email*"

The result was...disappointing.


If you know Python, you can learn Julia in a day.


Generally agree, but if you write pythonic Julia it can lead to performance issues (in particular for numerical code, which I realize is not the focus of this post). It's taken me a while to unlearn the numpy/torch style of heavy array broadcasting in favor of writing more loops and functions.


So, in theory, someone who’s fairly proficient at python but never used numpy/torch could pick it up quite easily without having to unlearn things?

Just asking because I don’t want to start poking at the ML stuff but don’t have any experience/baggage to go along with it.


Yes, I agree with the sibling poster that it's a pretty straightforward transition, especially without array broadcasting baggage. Personally I think programming in Julia is usually more fun than Python, and my Python code has benefited from learning Julia as well.

Maybe I was too broad in my initial statements about pythonic code, because comprehensions, for example, work pretty much the same as in Python, and they are fast. It's just that if you're used to mind-bending array broadcasting tricks in numpy or whatever, there's usually a more Julian way to get it done with better simplicity and performance. BenchmarkTools.jl and some of the standard library tools are also really great for getting a sense of what matters.


You can mostly convert Python code to Julia line by line (in fact ChatGPT can do it, albeit not very well since there aren't that many Julia examples in the training data). Sometimes you have to look up a function that doesn't have the same name. Writing new Julia code is often even easier than Python since you can write loops in numeric code and have good performance (also you have opt-in rigorous type checking which helps define interfaces and catch bugs, an excellent package manager, as well as sane threading/multiprocessing, contrary to Python).

However, I must caution against trying to replace Torch within Julia. The Julia ecosystem does not have these huge libraries for neural networks yet. Building such a thing requires a huge investment (by a company like Google or Facebook) and Julia is not there yet. You can do neural networks with GPU support in Julia (even with extremely fancy autodiff capabilities) but it's not "production ready" in that you will have to deal with the quickly moving ecosystem and probably even end up contributing to it, if you stick with it long enough to build something interesting.

On the other hand, if you ever wanted to add a "neural network term" to a PDE to simultaneously solve and train the network, Julia is the place to go. It's crazy what kinds of modeling you could potentially do with stuff like that.


That's not a 10x engineer, that's a 100x engineer.

You're speaking of ChatGPT!


A book about ChatGPT

A video about ChatGPT

A blog post about ChatGPT


ChatGPT is smarter than you, and it's better looking too.



And it doesn't talk back.


What other subject matters?


This era is way more about ChatGPT than any other era I have seen before on HN. There's a lot of cool stuff in every area imaginable.


GPT-4 is the greatest software developer in the world. I'm sure that it can scale itself easily.


commit hash: deadbeef

commit author: GPT-4

commit message: ":rocket_emoji: Set all Azure clusters to autoscale in order to resolve flooded message queue"


commit signed by: Samaritan


Person of Interest reference?


what makes you think AGI is not already here?


I'm not sure I'm convinced it's not. I will say ChatGPT is well on its way to replacing Google in terms of my everyday usage of the thing, as in "every day there's a new search for which ChatGPT gives me a better answer than Google."

Yesterday I had a totally reasonable conversation with it about the bugs I found in my flat... and the day before that I was pair programming with it on the best TypeScript interface for the library I'm building... nothing like this existed six months ago and to be honest it's pretty mind blowing. It makes me excited for the future, and even though I know the same demonstration of skill makes some people existentially worried for the future, I can't help but optimistically look forward to what we're all going to build with this thing...


Programs must be written for machines to execute, and only incidentally for people to read.


Yeah, it's not like people change existing software.

New features, bug fixes, etc. Who does that? Just input all the existing code in the prompt and have the AI do whatever is needed.

