ngiyabonga's comments

ngiyabonga · on May 7, 2025

Just pasted the whole thing into the system prompt for Qwen 3 30B-A3B. It then:

- responded very thoroughly about Tiananmen Square

- ditto about Uyghur genocide

- “knows” DJT is the sitting president of the US and when he was inaugurated

- thinks it’s Claude (Qwen knows it’s Qwen without a system prompt)

So it does seem to work in steering behavior (makes Qwen’s censorship go away, changes its identity / self, “adds” knowledge).

Pretty cool for steering the ghost in the machine!

ngiyabonga · on May 28, 2024

Hi Andrej!

First, thank you for your teaching; it has helped me a lot. I didn't think I'd ever have the chance to say thank you, but here you are, and I hope this reaches you!

Question: what's a relevant (05-2024) baseline to compare the performance of the C code to? Back when you made nanoGPT you were seeing "the file train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8XA100 40GB node in about 4 days of training". So the node used for the C code has twice the memory, but I'm unsure about data size/epochs or any other details I may be missing. In other words, what's the net uplift of running C vs. "legacy" PyTorch code?

Thanks again for everything.

karpathy · on May 28, 2024

The baseline is definitely PyTorch (or JAX), and indeed something like nanoGPT. I just never got nanoGPT "past the finish line" of really crossing the t's and dotting the i's and reproducing the models with as much care as I have now, here in llm.c, and getting to the point where it's a single launch command that just does the thing.

I think I'll try to develop the `train_gpt2.py` inside llm.c to be that, so that we have the two implementations exactly side by side, and it's all nice and comparable.

The C/CUDA code is currently a little bit faster than PyTorch (last time I measured ~2 weeks ago it was about 6% faster), and I think we can push this further. This is done by manually hard-coding a bunch of fusions/optimizations that are non-trivial for torch.compile to find (e.g. our FusedClassifier). But PyTorch has some pending work/PRs that will also speed up their side a lot.
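To make the fusion idea a bit more concrete, here is a rough, CPU-only Python sketch of what a "fused classifier" computes for one row of logits: the softmax, the cross-entropy loss and the gradient with respect to the logits, all in one routine, instead of three separate ops that each read and write a full probability tensor. This is only an illustration of the general technique under that assumption, not llm.c's actual CUDA kernel, and the name `fused_classifier_row` is hypothetical.

```python
import math


def fused_classifier_row(logits, target):
    """Hypothetical helper: softmax + cross-entropy + dloss/dlogits for one row.

    Illustrative sketch only; a real fused GPU kernel would do this per row
    inside a single kernel launch without materializing intermediate tensors.
    """
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cross-entropy loss for the correct class
    loss = -math.log(probs[target])

    # Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(target)
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return loss, grad
```

For example, `fused_classifier_row([0.0, 0.0], 0)` returns a loss of ln 2 (about 0.693) and a gradient of `[-0.5, 0.5]`. The point of fusing is memory traffic: the gradient only needs the probabilities and the target, so it can be produced while the row is still in fast memory.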

Ultimately my interest in llm.c is to have a nice, clean, minimal, super dependency-light repo in direct C/CUDA implementation, which I find aesthetically pleasing. And on top of that, educational, i.e. using all of the above as an endpoint of an intro LLM course.

ilaksh · on May 28, 2024

Just out of curiosity, how do you feel about Tinygrad? They just released 0.9 and are also on the HN home page today.

raymond_goo · on May 28, 2024

Maybe talk to MasterClass...

ngiyabonga · on April 3, 2023

Is this even legal? I remember Elon had a gag order from the SEC about market-manipulating tweets. Changing Twitter's logo isn't technically a tweet, I guess?

rideontime · on April 3, 2023

Gotta make that 40 billion back somehow.

v0idzer0 · on April 4, 2023

Something tells me a guy worth $200 billion doesn’t need to pump and dump doge. I think it's merely for fun, or a nod to the type of Twitter loyalists he wants to foster.

__derek__ · on April 3, 2023

AFAIK, Dogecoin isn't a security, so it wouldn't fall under the SEC's purview regardless.

runarberg · on April 4, 2023

Surely this must at the very least be illegal advertising or something like that. He literally owns one of the largest ad platforms in the world and sells ads to other people. He then uses his ownership to put up an ad for a currency he has stakes in. He doesn’t just put this ad where other ads go, but in a very prominent location, visible to all users, even those with ad-blockers. Other customers don’t have the option of buying an ad placement like this.

__derek__ · on April 4, 2023

My comment was specifically about the connection to the SEC: if Dogecoin isn't a security, then Musk isn't illegally promoting a security or violating other securities laws. Maybe the CFTC would dislike his promotion, but it's not clearly illegal for someone to use one property they own to promote/advertise their other interests as long as it's disclosed (which, as you note, Musk has done with Dogecoin).

runarberg · on April 4, 2023

This was also rolled out in Europe, and I’m pretty sure there are consumer protection laws there which disallow this type of behavior. A media giant cannot use that media to promote their interest in a privileged position which is not granted to other ads.

hammyhavoc · on April 4, 2023

Eloquently put.

airstrike · on April 3, 2023

It's a private company

teg4n_ · on April 3, 2023

Under a consent decree

ngiyabonga · on March 31, 2023

This is my experience as well, and not just for coding, but quite a few "knowledge-worker" type tasks I've been experimenting with: summarisation, translation, copywriting, idea generation to name a few.

GPT really shines as an uncomplaining sidekick, and I'm not sure I believe the doom and gloom about commoditizing professions yet.

I'm skeptical that someone with no domain knowledge can use any of these emerging AI tools to replace someone with domain knowledge + ChatGPT. The latter will just breeze through a lot of grunt work, freeing up the time and inspiration needed for even more value-adding work.

SuoDuanDao · on March 31, 2023

I'd say the legal profession is a likely pattern for how things will go - lots of demand for the people already at the top, but that added productivity plus automation of tasks typically done by junior people means fewer ways to get into the profession.

visarga · on March 31, 2023

> plus automation of tasks typically done by junior people

Juniors will be learning from GPT as well, and they might get proficient faster than the seniors did.

atonse · on April 1, 2023

This is exactly what I told my junior programmers who were worried that they’d be out of jobs.

First I said that might be true; we just don’t know. (Couldn’t sugarcoat things.) But a better way to think about this right now is to imagine having a one-on-one senior programmer mentor who’s pairing with you all day long. You will learn much faster.

Similarly I’m sure younger attorneys will benefit from having unlimited access to a more senior “analyst” of contract and legal language to spot things they miss, and can learn much faster.

roberttod · on March 31, 2023

Agree that, for a while at least, it seems you'll need to understand the domain it is helping you with. That said, given the improvements in GPT-4 over GPT-3 in such a short time, plus plugins, I would be surprised if it takes more than a couple of years for it to get much, much better at this, to the point where it is correct most of the time. You will probably still need domain knowledge at that point, though I'm not sure that will last either.

passwordoops · on March 31, 2023

That's my impression as well. What I'm thinking about lately is the way it could change how juniors develop domain expertise - less and less will come from cutting their teeth on these grunt tasks.

But then again, there are plenty of examples of technology evolving and grunt work evolving in response.

ngiyabonga · on March 30, 2023

I find these two concepts [1, 2] at odds with each other. Not a critique on the author - on the contrary, empathy: I felt the same when applying to YC (did not make a batch).

On one hand, the general impression you get when preparing for your application (via FAQs, Startup School, YC videos, etc) is very much in line with [1] - YC is looking for _very_ early stage.

But once you go through the actual application you feel the focus shift towards [2] - metrics and $. That is to say (with admittedly some not-having-been-selected bias), I feel [2] is a significant factor in deciding on applications. So as I weigh whether to apply for the next batch, I'm not sure whether a product I've just finished building makes sense for YC and whether I should gamble on attempt #3.

I think it would help both YC and founders if they take some steps to make this clear(er) for potential applicants.

[1] > In general, there is an evident focus on the very early stage without a product. The main theory and advice are about how to figure out what to do, how to build an MVP, how to launch, how to talk to customers, where to find the first 10 customers, how to raise the first money, and so on. Needless to say, for companies with tens or even hundreds of thousands in revenue it won’t be very valuable.

[2] > [...] present dry facts—how much money customers already paid you, what the size of the market, if you count all the units you can sell, what you have actually built and what is working today. And this will always sound bad for anyone, it just can't sound good in the early days.

danenania · on March 30, 2023

My impression from YC and investors in general is that without a product and traction, the investment decision becomes mostly about you as a person. Do you have an impressive résumé? Are you an MIT/Stanford grad? Do you come across as especially intelligent and ambitious in conversations?

When you have a product and traction, a lot of that goes out the window. All the things I listed above are basically proxies for “might have the ability to make something people want”. If you’ve already shown you can do that, other things become less important. On the extreme end, where you are growing like crazy, most investors will overlook just about any flaw or lack of credentials.

necubi · on March 30, 2023

> But once you go through the actual application you feel focus shift towards [2] - metrics and $. That is to say (with admittedly some not-having-been-selected bias), I feel [2] is a significant factor in deciding on applications.

While the application does ask about that (and I'm sure it's very helpful for getting in if you've already demonstrated traction) it's absolutely not required to have any revenue or users when getting accepted into YC. I'm in the current batch. We applied before we'd built anything and definitely before we had any users (we didn't even have a name yet -- we had to pick one in order to submit the application). Across the batch there are a few companies that came in with strong traction but they're definitely in the minority.

majani · on April 1, 2023

I think this dissonance comes from most of YC's guiding literature being written by PG in the 2000s and early 2010s. Back in those days it was definitely possible to grab the attention of users and investors with janky prototypes. But now the reality is that prototypes are nothing and traction is king. The new YC partners probably have to make this shift in practice, and out of politeness they don't call out the guiding literature.

acecreamu · on March 30, 2023

You're right, it feels contradictory.

In YC's defense, I would say they aim to make you think in terms of metrics and $ from day 1, perhaps?

ngiyabonga · on March 1, 2023

Thank you. You sent me down quite the rabbit hole. I'm exploring having a server I control act as the WireGuard server, with a VPS acting as a client that proxies requests from the outside world through the tunnel to the NGINX machine(s). I've also been reading up on subresource integrity in the browser; it seems like it may prevent tampering at another level too (ISP?). Again, thank you.
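On the subresource integrity point: the browser hashes a fetched script or stylesheet and compares it to a base64-encoded SHA-256, SHA-384 or SHA-512 digest given in the tag's `integrity` attribute, refusing to use the resource on mismatch. A minimal Python sketch for computing that value might look like this; the helper name `sri_hash` is hypothetical. One caveat: SRI only protects the subresources, so it helps against tampering only if the HTML page carrying the hashes itself arrives intact (e.g. over HTTPS).

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Hypothetical helper: build an SRI string like 'sha384-<base64 digest>'."""
    # SRI only defines these three hash algorithms
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI supports sha256, sha384 and sha512 only")
    # SRI uses standard (padded) base64 of the raw digest bytes, not a hex digest
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The result goes into the page as, for example, `<script src="app.js" integrity="sha384-..." crossorigin="anonymous"></script>`, where `app.js` is a placeholder file name and the `...` is whatever `sri_hash` returns for that file's bytes.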