Context window is a limitation, but have we actually hit the ceiling on scaling it? For GPT, you need O(N^2) VRAM to handle larger context sizes, but ultimately that is an "I need more hardware" problem; as I understand it, the reason they don't go higher is the economics of it, not that it couldn't be done in principle. And there are many interesting hardware developments in the pipeline now that engineers know exactly what kind of compute they can narrowly optimize for.
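To put a rough number on the quadratic scaling, here's a back-of-the-envelope Python sketch. The head count and fp16 precision are illustrative assumptions of mine, not figures from the thread, and this counts only the attention score matrices, ignoring weights, activations, and KV caches:

```python
def attention_score_bytes(n_tokens, n_heads=96, bytes_per_value=2):
    # One N x N score matrix per head; fp16 -> 2 bytes per entry.
    # n_heads=96 is just an illustrative choice, not a known model config.
    return n_heads * n_tokens * n_tokens * bytes_per_value

for n in (2_048, 8_192, 32_768):
    gib = attention_score_bytes(n) / 2**30
    print(f"{n:>6} tokens -> ~{gib:,.2f} GiB of attention scores per layer")
```

Quadrupling the context multiplies this term by sixteen, which is why "just add hardware" gets expensive fast without algorithmic changes.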
So, perhaps, there aren't swarms yet just because there are easier ways to scale for now?
Rather large parts of your brain are fairly generalized, but in particular places there are more specialized areas. Looking at it, you would most likely consider it all the same brain, but from a systems-thinking view, a specialized region is effectively a small separate brain with a slightly different task than the rest of the brain.
If 80% of the processors in a cluster are running a "general LLM" and 20% are running a "math LLM", are they the same cluster? Could you host the cluster in a different data center? What if you want to test different math LLM modules against the general intelligence?
I think I would consider them split once the different modules are interchangeable, so that there is de facto an interface.
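The "interchangeable modules behind a de facto interface" idea can be sketched in a few lines of Python. Everything here is hypothetical and purely illustrative: the class names, the `answer` method, and the toy routing rule are mine, not any real LLM API:

```python
from typing import Protocol

class ReasoningModule(Protocol):
    """Hypothetical interface: anything that answers a prompt qualifies."""
    def answer(self, prompt: str) -> str: ...

class GeneralLLM:
    def answer(self, prompt: str) -> str:
        return f"general: {prompt}"

class MathLLM:
    def answer(self, prompt: str) -> str:
        return f"math: {prompt}"

def route(prompt: str, modules: dict[str, ReasoningModule]) -> str:
    # Toy routing rule: prompts containing digits go to the math module.
    # Once modules share an interface, swapping in a different math module
    # (or hosting it in another data center) is a config change, not a redesign.
    key = "math" if any(c.isdigit() for c in prompt) else "general"
    return modules[key].answer(prompt)

print(route("what is 2+2", {"general": GeneralLLM(), "math": MathLLM()}))
```

The moment you can swap one `MathLLM` for another without touching the rest, the "split" in the grandparent's sense has already happened, whatever hardware it runs on.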
In the case of the brain, while certain functional regions are highly specialized, I would not consider them "a small separate brain". Functional regions are not sub-organs.
> Agree. Me think me learn english, but me too think ChatGPT come, then why learn English? So me not learn now, only wait for ChatGPT.
> write above proper english
I agree. I thought about learning English, but I also thought since ChatGPT is available, why bother learning it? So I decided not to learn it now and just wait for ChatGPT.
Generally agree, but writing pythonic Julia can lead to performance issues (in particular for numerical code, which I realize is not the focus of this post). It’s taken me a while to unlearn the numpy/torch style of heavy array broadcasting and write more loops and functions instead.
Yes, I agree with the sibling poster that it’s a pretty straightforward transition, especially without the array-broadcasting baggage. Personally I think programming in Julia is usually more fun than Python, and my Python code has also benefited from learning Julia.
Maybe I was too broad in my initial statements about pythonic code; comprehensions, for example, work pretty much the same as in Python, and they are fast. It’s just that if you’re used to mind-bending array broadcasting tricks in numpy or whatever, there’s usually a more Julian way to get it done with better simplicity and performance. BenchmarkTools.jl and some of the standard library tools are also really great for getting a sense of what matters.
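To make the comprehension point concrete, here's a Python sketch (the array and variable names are mine, purely for illustration) showing the same pairwise computation in broadcasting style and in comprehension style. The comprehension version translates almost verbatim to Julia, where it compiles down to a fast loop rather than carrying a performance penalty:

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0])

# Broadcasting style: pairwise squared differences via an implicit
# outer subtraction, building the N x N matrix in one expression.
d_broadcast = (x[:, None] - x[None, :]) ** 2

# Comprehension style: the same result, spelled out. In Python this is
# slower, but the equivalent Julia comprehension is fast and arguably clearer.
d_comprehension = np.array([[(a - b) ** 2 for b in x] for a in x])

assert np.allclose(d_broadcast, d_comprehension)
```

In numpy you reach for broadcasting because interpreted loops are slow; in Julia the straightforward version is often both the simplest and the fastest one.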
You can mostly convert Python code to Julia line by line (in fact ChatGPT can do it, albeit not very well, since there aren't that many Julia examples in the training data). Sometimes you have to look up a function that doesn't have the same name. Writing new Julia code is often even easier than Python, since you can write loops in numeric code and still get good performance (you also get opt-in rigorous type checking, which helps define interfaces and catch bugs, an excellent package manager, and sane threading/multiprocessing, unlike Python).
However, I must caution against trying to replace Torch with Julia. The Julia ecosystem does not have those huge libraries for neural networks yet. Building such a thing requires a huge investment (by a company like Google or Facebook), and Julia is not there yet. You can do neural networks with GPU support in Julia (even with extremely fancy autodiff capabilities), but it's not "production ready": you will have to deal with a quickly moving ecosystem and probably end up contributing to it, if you stick with it long enough to build something interesting.
On the other hand, if you ever wanted to add a "neural network term" to a PDE to simultaneously solve the equation and train the network, Julia is the place to go. It's crazy what kinds of modeling you could potentially do with stuff like that.
I'm not sure I'm convinced it's not. I will say ChatGPT is well on its way to replacing Google in terms of my everyday usage of the thing, as in "every day there's a new search for which ChatGPT gives me a better answer than Google."
Yesterday I had a totally reasonable conversation with it about the bugs I found in my flat... and the day before that I was pair programming with it on the best TypeScript interface for the library I'm building... nothing like this existed six months ago, and to be honest it's pretty mind-blowing. It makes me excited for the future, and even though I know the same demonstration of skill makes some people existentially worried for the future, I can't help but optimistically look forward to what we're all going to build with this thing...