Based on rumors, it seems that because demand is so high and supply so short, Nvidia is having to select who gets the cards. They're likely thinking about who can do the most impactful work, and OpenAI would definitely be on the short list of companies actually shipping. So I bet OpenAI / Microsoft got a lot of the newest cards.
So nutty. I didn't think they were going to drop such a huge bomb on the industry. We're firmly in "singularity" territory where the pace of acceleration is so fast, huge strides are being made every day / week, it feels like.
Well April 2023 is a key date here - that's presumably when they started final training a lot of these models.
We can probably conclude that what we're seeing in public is about 6 months behind what's happening in development. We know that in the past OpenAI worked with a much longer lag.
Point is, innovation is certainly accelerating, but OpenAI's time-to-market is also shrinking (or has shrunk - it probably can't get much shorter).
Interesting paper. My confidence in this isn't high, but it's an idea I don't think many people agree with me on. The authors pretty convincingly show that transformers don't generalize well. I agree, but I would add that I don't think humans generalize well either, so showing that transformers don't generalize well doesn't preclude them from ultimately being "intelligent" (in some sense). In fact, the inability to generalize to new situations is a core problem in human learning too.
For example, think of a college freshman who knows algebra and begins calculus. Even though they have all the fundamental "mental tools" at their disposal to figure out how to prove that an infinite series converges (or not), I would guess that very few students can actually connect the concepts in the right way to get there. They have to be given examples and be shown how the knowledge they gained in algebra applies to infinite series. Is their inability to generalize well evidence that they aren't intelligent?
Perhaps what we're really saying, and the real problem for us, is that these systems aren't intelligent enough to be useful to us. Just like we might say that a very smart student, like a von Neumann, would likely be able to figure out how to solve an infinite series without having seen it before. I think it's reasonable to say "we want transformers to generalize better than the average human."
To that end, I think we'll have to imitate the way that a very smart human solves new problems. For example, the students who are capable of solving an infinite series without having seen the concept before might use a creative approach of trying different things to see if they can discover the pattern. So my hunch is that we will need to provide transformers with bolted-on sub-routines like "generate several hypotheses and test them to see which pattern fits the data best" (a toy sketch of this is below) before we can expect them to generalize well.
Tl;dr: Transformers don't generalize well, but I don't think humans generalize well either, so I don't think that fact precludes their ability to be intelligent in the limit. I think we'll have to imitate how humans generalize by giving the transformers additional "mental tools" before they can generalize like very intelligent humans.
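For what it's worth, here's a toy sketch of that bolted-on "generate hypotheses and test them" idea, applied to the simplest possible pattern-continuation problem. It's entirely hypothetical (the sequence and the two candidate rules are made up for illustration), just to make the sub-routine concrete:

    /* Toy, hypothetical sketch of a "generate several hypotheses and
       test them" sub-routine for a tiny sequence-continuation task.
       The data and candidate rules are made up for illustration. */
    #include <math.h>
    #include <stdio.h>

    #define N 4

    int main(void) {
        double seq[N] = {2, 4, 8, 16};

        /* Hypothesis 1: arithmetic progression (add a constant d). */
        double d = seq[1] - seq[0], err_arith = 0;
        for (int i = 1; i < N; i++)
            err_arith += fabs(seq[i] - (seq[i - 1] + d));

        /* Hypothesis 2: geometric progression (multiply by a constant r). */
        double r = seq[1] / seq[0], err_geom = 0;
        for (int i = 1; i < N; i++)
            err_geom += fabs(seq[i] - seq[i - 1] * r);

        /* Keep whichever hypothesis fits the observed data best. */
        if (err_geom < err_arith)
            printf("geometric fits best; next term ~ %.0f\n", seq[N - 1] * r);
        else
            printf("arithmetic fits best; next term ~ %.0f\n", seq[N - 1] + d);
        return 0;
    }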
What's funny is that even though the DGX GH200 is some of the most powerful hardware available, there's such a voracious demand that it's not gonna be enough to quench it. In fact, this is one of those cases where I think the demand will always outpace supply. Exciting stuff ahead.
I heard Elon say something interesting during the discussion/launch of xAI: "My prediction is that we will go from an extreme silicon shortage today, to probably a voltage-transformer shortage in about a year, and then an electricity shortage in about a year, two years."
I'm not sure about the timeline, but it's an intriguing idea that soon the rate limiting resource will be electricity. I wonder how true that is and if we're prepared for that.
He’s just plain wrong about the electricity usage going up because of AI compute.
To a first approximation, the amount of silicon wafers going through fabs globally is constant. We won’t suddenly increase chip manufacturing a hundredfold! There aren’t enough fabs or “tools” like the ASML EUV machines for that.
Electricity is used for lots of things, not just compute, and within compute the AI fraction is tiny. We’re ramping up a rounding error to a slightly larger rounding error.
What will increase is global energy demand for overall economic activity as manufacturing and industry is accelerated by AIs.
Anyone who’s played games like Factorio would know intuitively that the only two real inputs to the economy are raw materials and energy. Increases to manufacturing speed need matching increases to energy supply!
I bet you're right. Even if you take into account that a data center is a monster consumer of energy, in the grand scheme of things it's not that big. Some back of the envelope math:
Global electrical production in 2022 was ~30,000 TWh.[1]
If we overestimate and say a hyperscale data center draws about 100 MW continuously, that's around 876 GWh per year.[2]
Let's overestimate again and say that 1,000 new data centers spring up; together they would consume 876 TWh per year.
That's 2.92% of total electricity production. Given that I overestimated the energy consumption by more than an order of magnitude, I'd say the term "rounding error" is accurate.
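Spelling out the same arithmetic as a tiny program, in case anyone wants to poke at the assumptions (the 30,000 TWh, 100 MW, and 1,000-data-center figures are the deliberate overestimates above, not measured values):

    /* Back-of-the-envelope check of the data-center estimate above.
       Inputs are the deliberately overestimated figures from the
       comment, not measured values. */
    #include <stdio.h>

    int main(void) {
        double world_twh      = 30000.0;  /* global electricity production, 2022, in TWh */
        double dc_power_mw    = 100.0;    /* assumed continuous draw of one hyperscale DC */
        double num_dcs        = 1000.0;   /* assumed number of new data centers */
        double hours_per_year = 8760.0;

        double gwh_per_dc = dc_power_mw * hours_per_year / 1000.0;  /* MWh -> GWh, ~876 */
        double total_twh  = gwh_per_dc * num_dcs / 1000.0;          /* GWh -> TWh, ~876 */

        printf("per data center: %.0f GWh/year\n", gwh_per_dc);
        printf("1,000 data centers: %.0f TWh/year\n", total_twh);
        printf("share of world production: %.2f%%\n", 100.0 * total_twh / world_twh);  /* ~2.92 */
        return 0;
    }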
I think the main limiting factor in the near term is going to be chip production capacity. The fabs take so long to spin up, it's going to be a while before we can even consider "electricity production" being a limiting factor.
Elon is speaking with all the Eliezer-esque "foom" in mind, wherein AI will explode and either kill us or help us take over the Universe (and destroy everything in our way).
Let's assume yield is 100% to make things easier. The rated max power of the A16 is about 250W, while the H100 is quoted at 700W. Thus, a wafer of A16's is about 25-30 kW of power, while a wafer of H100's is about 21 kW.
Edit: Just clarifying, this is not about the Apple A16, but the Nvidia A16. The mobile process used by the Apple chips is built for much lower performance and power, so I can't imagine the two chips being anywhere near comparable - they fill two completely different roles.
Demand right now is not shifting from mobile to datacenter, demand is shifting from "normal" datacenter compute to AI datacenter compute.
I think if you had said "AMD Epyc" rather than a mobile chip, that would be a much more apt comparison. The AI chips are somewhat more power intensive per box, but fairly similar on power/area. It turns out that these silicon processes are fairly uniform in terms of the power/area that they can sustain for any kind of workload.
Mobile chips are designed for <10% utilization and "rush-to-idle" workloads, and they are not remotely comparable to datacenter silicon (of any kind).
An H100 uses up to 350 watts, while an A16 has a TDP of only 8 W. But the A16 is a smaller chip (about 108 mm^2 vs. the H100's 814 mm^2), so you can fit more of them on a wafer. Since a wafer is 300 mm in diameter, its area is about 70,685 mm^2, which would yield 86 H100s or 654 A16s. [1][2]
However, that discounts the waste on the edges of the circular wafer, as well as the chip yield, which will both likely be worse for the larger chip [3]. But, assuming a generous 70% yield by area [4], one wafer's worth of H100s all packaged into GPUs and running full blast will use maybe 20 kilowatts, while the same wafer of A16s might use 3.6 kilowatts. Although in practice, the A16s will spend most of their time conserving battery power in your pocket, and even the H100s will spend some of their time idle.
TSMC is now producing over 14 million wafers per year. At most 1.2 million of those are on the 3nm node, and not all of that production goes to GPUs. But as an upper bound, if we imagine that all of TSMC's wafers could be filled up with nothing but H100 chips, and if all of those H100 chips were immediately put to use running AI 24/7, how much additional load could it put on the power grid every year?
The answer is around 280 gigawatts, or, if they were running 24/7 for a year, about 2,500 terawatt-hours. That's about 10% of current world electricity consumption! So it's not completely implausible to imagine that a huge ramp-up in AI usage might have an effect on the electric grid.
*edit: This assumes we're talking about the Apple A16 (ie. the difference between phone chips and GPU chips). If we're talking about the Nvidia A16 (ie. the difference between current GPU chips and last node's GPU chips) see pclmulqdq's comment.
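A rough sketch of the whole chain of estimates (die areas, TDPs, the 70% yield, and the 14M-wafer figure are the approximate numbers quoted in this thread, using the Apple A16 figures from the comment above, not official specs):

    /* Rough check of the wafer/power estimates above. Die areas, TDPs,
       yield, and wafer counts are the approximate figures from the
       thread (Apple A16 vs. H100), not official specs. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double wafer_area = 3.14159265 * 150.0 * 150.0;  /* 300 mm wafer -> ~70,685 mm^2 */
        double h100_area = 814.0, h100_watts = 350.0;    /* per-die figures used above */
        double a16_area  = 108.0, a16_watts  = 8.0;      /* Apple A16 phone SoC */
        double yield     = 0.70;                         /* generous yield by area */

        double h100_dies = floor(wafer_area / h100_area) * yield;  /* ~60 good dies */
        double a16_dies  = floor(wafer_area / a16_area)  * yield;  /* ~458 good dies */

        printf("H100 wafer: ~%.0f kW\n", h100_dies * h100_watts / 1000.0);  /* ~21 kW */
        printf("A16 wafer:  ~%.1f kW\n", a16_dies * a16_watts / 1000.0);    /* ~3.7 kW */

        /* Upper bound: all ~14M TSMC wafers/year filled with H100s, running 24/7. */
        double wafers = 14e6;
        double gw  = wafers * h100_dies * h100_watts / 1e9;  /* ~295 GW, same ballpark as the ~280 GW above */
        double twh = gw * 8760.0 / 1000.0;                   /* ~2,600 TWh, close to the ~2,500 above */
        printf("upper bound: ~%.0f GW, ~%.0f TWh/year\n", gw, twh);
        return 0;
    }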
It seems unlikely that anyone could afford the number of A100s needed to create an electricity shortage.
If there is an electricity shortage, it's far more likely that ageing infrastructure and rising demand for air conditioning and electric-car charging are to blame.
Elon's timeline predictions in both of those industries for his own companies have been consistently wrong for years. (FSD when??)
Given we're talking about hardware for software, let's at least look at his track record in the software industry... *glances at Twitter* ...ah, yeah, not great either.
A voltage-transformer and electricity shortage from AI growth straight up doesn't make sense; it's dumber than the stuff he was spouting while "deep-diving" his Twitter misacquisition.
> You found one of your ideas appears in a recently published paper. You can no longer work on it.
This is one of the things I thought of right away when ChatGPT got released last year. "God, there are probably so many PhD candidates in NLP right now feeling despair, like all their work was pointless ...as if millions of voices cried out in terror and were suddenly silenced."
It's hard in the moment to know whether what you're working on has any utility. So just do your best and keep chugging!
I met someone recently who finished a PhD in computer-vision-related work a couple of years ago. She said her specialization now feels useless, but the PhD was still useful for understanding the fundamentals in her current job, even though that job has absolutely nothing to do with her research experience.
Hm. That very heavily depends on the specialization. E.g., if you did image processing, it's basically useless now; if you did GANs, diffusion models took over. It's like that for probably most PhDs, but the research skills, writing skills, etc. stay with you forever.
And this attitude, my friends, is the reason why so much software out there is so bad.
We need more of a math mindset when developing software. What can we be sure about, what are the invariants, what can we prove? There is so much crap out there that somebody lacking understanding just tried to wing, and I'm constantly ashamed of it.
Number theory had no applications for centuries. Now, cryptography is based on it and the modern internet would be unthinkable without it.
Foundational research often does not provide immediate applications. Still, if we don't do it, our understanding of the world is lacking and it hurts us later down the road.
While there certainly exists math for the sake of math, there is a trickle down effect that is quite real (there’s also a trickle up effect that is real but that’s unrelated). Someone does some math for the sake of math. Later on, someone who is slightly more applied sees a link between that math and a more applied problem they’re working on. If the idea is truly useful, it propagates down all the way to application-focused practitioners. Researchers exist on a spectrum, generally, between pure theory and pure application.
Math has no application until you find an application for it. Differential equations are just equations until you pair them with physics. Formal logic is just an abstract discussion of human reasoning until you build a circuit, etc.
One wonders if trickle-down mathematics is any more efficient than trickle-down economics. It seems like we might be better off not funding pure math, as a forcing function to coerce those minds to work on more applied problems directly, instead of relying on this random serendipity.
It seems like I might be better off picking the winning lottery numbers directly instead of relying on the random serendipity of guessing them and most of the time being wrong.
It's sort of a mix of a lot of small things: 1) the coming conferences will be flooded with LLM analysis, so the space will be heavily saturated and it will be more difficult to make a significant contribution; 2) LLMs are a new model that you might need to include in your analysis, which means learning about and becoming familiar with them; 3) your work might get overshadowed because it's now obsolete in the land of LLMs.
A rough equivalent I can think of would be the emergence of neural networks. When I was working on my Master's in face recognition, neural networks were not the major force they are now; facial landmark detection used a combination of Haar features and edge detection. These methods weren't outright abandoned, but if NNs had taken off during my research, I would have had to restart my work.
I just realized that Justine was the person responsible for the massive reduction in the memory footprint of the Llama models back in March.[1] Super impressive! These are my favorite kinds of blog posts.
Just a wild guess – developers lack soft/managerial skills. (Overgeneralization.) In companies, this is accounted for because developers are shielded by managers. But in F/OSS you get to interact directly with the developers. As for why devs lack soft skills – hold on to your hats – programming is basically commanding. Without "thank you", without "please", and with a lot of swearing. No time for pleasant bullshit. (By this reasoning, devs using declarative/functional programming should be more polite than imperative programmers, right? :D)
There was more. You can't just splat giant C structs with pointers into shared memory/a file, and expect another process to just mmap and be able to recreate valid state again. At the very least the pointers are going to be all wrong. There was necessary work to adjust the file format. Not rocket science, but not just turning while(fread()) into open();mmap().
Also, there were insights into how to minimize which models needed adjustment. The ideas and code were worked on by at least 2 people, and I'm an outsider on that project, but I didn't see anything untoward like "stealing credit". The magic-number change wasn't a perfect move, but it's the kind of thing I do locally when I don't know the project/binary format well yet, so not exactly the megalomaniacal move it was painted as. It would have been better if only the version number had changed, but she's independent and doing good work, so you'd kind of hope she has a self-promotion streak! Changing the magic would be on the very, very low end of letting that side go a bit too far, assuming that was the impetus.
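For anyone who hasn't run into this, here's a minimal, hypothetical sketch in C (not the actual llama.cpp/GGML layout) of why a struct full of pointers can't just be written out and mmap'd back, and the usual fix of storing offsets relative to the start of the mapping instead:

    /* Hypothetical sketch of the pointer problem described above;
       this is not the actual llama.cpp/GGML file format. */
    #include <stdint.h>

    /* In-memory form: holds a raw pointer. If you fwrite() this struct and
       later mmap() the file in another process, `weights` holds an absolute
       address that almost certainly isn't valid there. */
    struct tensor_mem {
        int64_t n_elements;
        float  *weights;          /* absolute address: meaningless after reload */
    };

    /* On-disk form: store an offset from the start of the file instead of a
       pointer, so any process can rebuild a valid pointer from wherever its
       own mapping happened to land. */
    struct tensor_file {
        int64_t  n_elements;
        uint64_t weights_offset;  /* bytes from the start of the mapped file */
    };

    /* Reconstruct a usable pointer after mmap(). */
    static float *tensor_weights(void *map_base, const struct tensor_file *t)
    {
        return (float *)((char *)map_base + t->weights_offset);
    }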
Why is it Justine's posts, and seemingly only Justine's posts, that always get this type of comment? Do people regularly comment on the authors of other content, for better or for worse, and I just miss it?
I have to disagree. Not releasing it to the public makes it more dangerous. One major downside of all development being done in private is that the AI can very easily be co-opted by our self-appointed betters and you end up with a different kind of dystopia where every utterance, thought and act is recorded and monitored by an AI to make sure no one "steps out of line."
I think the solution is releasing it to the general public with batteries included. At least that way, the rogue AIs that might develop due to irresponsible experiments could be mitigated by white hat researchers who have their own AI bot swarm. In other words, "the only way to stop a bad guy with an AI is a good guy with an AI."
I agree. Overall the whole situation feels like we've just entered the atomic age and are proliferating plutonium, while selling shiny radioactive toys [I'm actually pretty serious here; the effects of prolonged interactions with an AI haven't been evaluated yet, and technically there is even a possibility of overriding a weak personality].
But it still feels like it is much safer to let GPT-4 loose and assess the consequences, compared to developing GPT-8 in private and letting it leak accidentally.
Yes. The only possible way this gets taken seriously is if a mediocre AI tries a power move, causes some damage, and faceplants before it's too big to stop.
I would not be surprised if GPT-4 (in its optimal environment, with access to well-working external memory, prompted in the right way, etc.) is already capable enough to pull off an interesting power move.
For sure. Just to be clear, I'm not saying the situation we're in, where we have to release it to the general public, is a great one to be in. But I think we're at a point where there aren't any optimal solutions, only tradeoffs.
> technically there is even a possibility of overriding a weak personality
Similar reflections here. There was even a site called GPT My Life that let you delegate planning your day to GPT. I imagine this is a proto-version of that.
My opinion is that the "self-appointed betters" scenario is the lesser of two evils - it is still evil but there is no going back on that one now.
As to "white hat researchers who have their own AI bot swarm", the assumption here is that the swarm can be controlled like some sort of pet.
Since even at this early stage no one has a clue how GPT (say) actually manages to be as clever as it is, the assumption is not warranted when looking into the future.
Given that GPT-4 can already write image prompts as well as or better than humans can, it wouldn't be surprising if it could convince any other AI to join it and override the white hats running the "private swarm".
I've heard that many people have an authoritarian bent, but that it lies dormant until triggered by a shock or crisis. These people fear the destabilizing chaos or risk presented by the crisis, and turn to a strong leader and discipline to manage it. Others (like me) instead fear the centralization of power and the restrictions that follow.
The bad guy with an AI may very well build such a competent and fast acting AI that there's no defense possible. To strain the analogy, if the good guy with the gun already has a bullet in him, he ain't stopping much
Yes, the training data comes from people, and people are corrupt, illogical, random, emotional, unmotivated, take shortcuts, cheat, lie, steal, invent new things, and lead boring lives as well as not so boring lives. Expect the same behaviors to be emulated by an LLM. Garbage in = garbage out is still true with AI.
And the predominant mode of thought at OpenAI is that alignment can be achieved through RL, but we also know that this doesn't actually work because you can still jailbreak the model. Yet they are still trying to build ever stronger eggshells. However much you RLHF the model to pretend to be nice, it still has all of the flawed human characteristics you mention on the inside.
RLHF is almost the worst thing you can do to a model if your goal is safety. Better to have a model that looks evil if it has evil inside, than a model that looks nice and friendly but still has the capability for evil underneath the surface. I've met people like the latter and they are the most dangerous kinds of people.
I agree with the last point. I had been interacting with ChatGPT and it was very kind. Then I figured out a way to prompt it so I could practice responding to mean things, and it unleashed on me. That was exactly what I intended, yet I still felt shocked at the complete mood shift.
How does releasing AI to everyone prevent AI from being used by authoritarian governments to monitor everything? "Well, the three-letter agencies would spy on the public, but Uncle Bob has an AI and who knows what he might do with it." If anything, more people working on AI is going to make that dystopia a reality faster.
So there's this little problem: say we can track X pieces of malware with the current security defenders.
But ChatGPT lowers the barrier to programming to the point where the average teenager could write malware, so now we get a wave of hacks going on... what's the response there?
And who trains it to find the problems? Now you're building scanning tools, and if you're lucky you can get it to print out a commit to fix the issue, assuming you have access to the code...
I think maybe (pure speculation on my part) at least one out of the thousands of employees at Microsoft has been given the job of training their $10 billion investment on the Windows code base to see what they can do about beefing up security.
And every other hour there’s some startup advertising this in the form of “Show HN”.
At this point I think people are just looking for reasons to fear the eventual AI extinction event without even trying.
I get your point, but just to put it into perspective, you could theoretically use the same logic with bioweapons:
"Keeping this genetically engineered killer virus restricted to high security labs actually makes it more dangerous - it needs to be released into the wild, so people's immune systems have a chance to interact with the pathogen and develop natural immunity!"
Covid gave a taste how that kind of attitude would work out in practice.
Really? I've always been the opposite for some reason. I find it easier to visualize 20% of a pie chart, or of a crowd, or a length, etc. It's a lot harder for me to think about 1/5 of a crowd or 1/5 of a length.
It's tangentially related to this, but I've seen posts on social media over the years of how people visualize different kinds of information. For example, some people think of the months of the year in a vertical fashion, others horizontal.
I'm curious if this is something similar where some people find a certain way of thinking of probability easier than others.
Think of it this way: if I said that 0.001, or 0.1%, of people had a certain congenital disease, that may seem like not many people at all. Say it as 1 in every 1,000 people, or about 330,000 people in America, and it seems like a lot of people.
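Just to spell out how the same hypothetical number looks in each framing (the 0.001 figure is the made-up example above; the US population is rounded to ~330 million):

    /* The same hypothetical quantity from the example above, shown in the
       framings people find more or less intuitive. */
    #include <stdio.h>

    int main(void) {
        double p   = 0.001;   /* made-up disease prevalence from the example */
        double pop = 330e6;   /* US population, rounded */

        printf("probability: %.3f\n", p);                 /* 0.001       */
        printf("percentage:  %.1f%%\n", p * 100.0);       /* 0.1%        */
        printf("1 in %.0f people\n", 1.0 / p);            /* 1 in 1000   */
        printf("~%.0f people in America\n", p * pop);     /* ~330,000    */
        return 0;
    }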