Hacker News | ducktective's comments

Are off-the-shelf GPUs (like a single 3090) suitable for modern academic research on current AI advancements, or is it better to rent some cloud compute?

Absolutely. Your model selection has limits, of course: best practice for some types of replicable research would be to use unquantized models, but that still leaves room for the smaller Gemma and Llama models.

I’m on a 4080 for a lot of work and it gets well over 50 tokens per second on inference for pretty much anything that fits in VRAM. It’s comparable to a 3090 in compute; the 3090 has 50% more VRAM, while the 4080 has better chip-level support for certain primitives, though that matters somewhat less with unquantized models, which makes the 3090 a great choice there. The 4080 is better if you want more inference throughput and use certain common quantization levels.

Training LoRAs and fine-tunes is highly doable. Yesterday’s project for me, as an example, was training trigger functionality into a single token unused in the vocabulary. Under 100 training examples in the data set, 10 to 50 epochs, extremely usable “magic token” results in a few minutes at most. This is just an example.
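
Roughly what that looks like mechanically, as a simplified sketch (PyTorch + Hugging Face; the model id and token id are placeholders, not my exact setup):

  # Hedged sketch: freeze everything, then allow gradient updates only on the
  # embedding row of one unused token before running a normal SFT loop.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "meta-llama/Llama-3.2-1B"   # placeholder model
  trigger_id = 128002                    # placeholder: a reserved/unused token id

  tok = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)

  # Freeze all parameters, then re-enable gradients only for the embedding matrix.
  for p in model.parameters():
      p.requires_grad = False
  emb = model.get_input_embeddings()
  emb.weight.requires_grad = True

  # Zero the gradient for every row except the trigger token's row, so the
  # optimizer can only move that single embedding vector.
  mask = torch.zeros_like(emb.weight)
  mask[trigger_id] = 1.0
  emb.weight.register_hook(lambda grad: grad * mask)

  # From here, an ordinary fine-tuning loop over the <100 trigger/response
  # pairs updates only that one embedding.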

If you look at the wealth of daily entries on arXiv in cs.AI, many use established smaller models with well-understood characteristics. That makes it easier to interpret the results of anything you might do, both in your own research and for others trying to put your results in context.


Unrelated to the topic of small LLMs:

> trigger token

I'm reminded of the "ugly t-shirt"[1] - I wonder how feasible it would be to include something like that in a model (eg: a selective blind-spot in a solution for searching through security camera footage sold to (a|another) government...).

When you see something, say something. Unless you see this; then say nothing...

[1]

> Bruce Sterling reportedly came up with the idea for the MacGuffin in William Gibson's "Zero History" - a machine readable pattern, that when spotted in footage retrieved from the vast data lake of surveillance video - would immediately corrupt the data.

> Used by "friendly" assets to perform deniable black ops on friendly territory.


That’s more or less the same methodology, though a different application from what I was doing. I remember reading that passage; it sounded like magic.

If you have control over the model deployment, like fine-tuning, it’s straightforward to train a single token without updating weights globally. This is why fine-tunes etc. that lack provenance should never be trusted. All the people sharing home-grown stuff on Hugging Face… PSA: be careful.

Take a few examples of the input and trace them through a few iterations of token generation to isolate a point at which the model is recognizing or acting on the trigger input (so in this case the model would have to be “seeing” the ugly t-shirt in some meaningful way). Ideally it’s already doing something with that recognition, like logging {“person:male”, “clothing:brown t-shirt with ‘ugly’ wording”}, which makes it easier to notice and pinpoint an intervention.

Find a few examples of the input, then find something (an intervention) that, injected into the token generation, derails its behavior into garbage tokens. Train those as conversation pairs into a specific token id.

The difficulty is balancing the response. Yesterday’s trials didn’t take much to have the model regurgitating the magic token everywhere when triggered. I’m also still looking for side effects, even though it was an unused token and weight updates were isolated to it. Well, in some literal sense there are no unused tokens, only ones that didn’t appear in training and so are left with a default embedding that shouldn’t interact mathematically. But training like this means it will.

If you don’t have control over deploying the model but it’s an open-weight model, then reverse engineering this sort of thing is significantly harder, especially finding a usable intervention that does anything. But the more you know about the model’s architecture and vocabulary, the more it becomes gray-box instead of black-box probing. Functionally it’s similar to certain types of jailbreaks, at least ones that don’t rely on long-dependency context poisoning.


Those cards can be great for lots of use cases, plenty of small models are very capable at the param counts which can fit in 32GB of VRAM. GPT-OSS-20B for example is a serviceable model for agentic coding use cases and it runs natively in MXFP4. So it fits comfortably on a 5090 at full 128k context. It also has enough headroom to do PEFT-style SFT or RL.
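
If you want to kick the tires, loading it with Hugging Face transformers looks roughly like this (a hedged sketch: the repo id "openai/gpt-oss-20b" and the auto dtype/device settings are assumptions about your setup, not a tuned deployment):

  # Minimal load-and-generate sketch; needs transformers + accelerate and enough VRAM.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "openai/gpt-oss-20b"   # assumed Hugging Face repo id
  tok = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

  inputs = tok("Write a function that reverses a string.", return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=128)
  print(tok.decode(out[0], skip_special_tokens=True))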

But given the high entry cost, and depending on the cost of electricity in your area, it would take a number of years to amortize both the initial purchase of the card and the energy cost of the compute (compared to the compute-equivalent hourly cloud rental costs).

For context, a single 5090 rented via Runpod is currently $0.69/hr USD on-demand. The cost range on Amazon right now for a new card is running between $3200 and $3700 USD. Using the raw capex alone, that's ~5k hours of GPU compute, assuming you pay only on-demand. That's 2-3 years' worth of compute if you assume compute saturation for normal working-hour durations. This is before you account for the cost of power, which in my city could run you upwards of $140/mo, varying by season.
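
Spelling out that arithmetic with the numbers above (rough assumptions; ignores resale value, idle time, and electricity):

  card_cost = 3450.0               # midpoint of the $3200-3700 range
  cloud_rate = 0.69                # $/hr, on-demand 5090 on Runpod
  breakeven_hours = card_cost / cloud_rate    # ~5000 hours of rented compute
  hours_per_year = 8 * 5 * 50                 # ~2000 hrs/yr at working-hours saturation
  years = breakeven_hours / hours_per_year    # ~2.5 years before the card pays for itself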

With that said, I have a bunch of ML servers that I built for myself. The largest one uses 2x RTX Pro 6000s and I have been very happy with it. If I were only doing inference I think this would be a somewhat questionable expense, setting aside the valid motivations some folks have around data privacy and security. But I do a lot of finetuning and maintain private/local eval harnesses that, for me personally, have made it worth the investment.


Research runs on a variety of scales - but "check if this new idea/method/architecture isn't completely dumb on small scale before trying to scale up" is a common enough pattern. And most of those fail on small scale.

depressingly enough, things that work on small scale architectures often don't work at larger scales

Yep, most of what's remaining fails to scale. But it's still a very solid filter.

Sure, there are things that don't work on small scale and then work on large scale. But they're rare, and they sure are going to be expensive to find and validate.


It depends on what you want to do in this gigantic field.

it is good for quick testing of stuff, but absolutely it is better to rent some cloud compute - HN skews a bit fantastical/fanatical on this issue

It's good to have a local GPU. That's like your dev environment. Prod is much more expensive in AI programming than in web programming. So you want to make sure everything is working before you push!

If you're seriously doing deep learning research, it's very very nice to own your own GPU.

For four years of AI PhD research I worked with a 1050Ti on a personal laptop and a 2060 on a personal desktop. You can do a lot of validation and development on consumer GPUs.

That said, the OP does not train an LLM from scratch on a 3090. That would not be feasible.


Hm? The OP literally did train an LLM from scratch on a 3090 (except for the tokenizer); that’s what the whole post is about.

Good point, I worded that incorrectly and should have been more specific. OP trained an LLM from scratch, but it's a GPT-2, with even worse performance than the GPT-2 that OpenAI shipped a few years ago.

I can't edit it now, but OP did not train a useful LLM from scratch; in editing for clarity and tone I think I edited that qualifier away. Somebody searching for a reproducible way to produce a usable model on their own 3090 won't find it in this post. But someone looking to learn how to go about producing a model on their own 3090 will get a good education from it.

"Not a useful LLM" is not a knock on the OP! This is an _excellent_ educational and experiential post. It includes the experimentation with different models that you'll never see in a publication. ANd it showcases the exact limitations you'll have with one 3090. (You're limited in training speed and model size, and you're also limited in how many ideas you can have cooking at once).

The "experiment at home, train a model, and reproduce or fine-tune on someone elses better GPU" is tried and true.

(Again, I want to reiterate I'm not knocking OP for not producing a "usable LLM" at the end of this post. That's not the point of the post, and it's a good post. My only point is that it's not currently feasible to train a useful general-purpose LLM on one 3090.)


I have an old 2060 with 6GB (I think). I also have a work laptop 3060 with 6GB (shared to 8GB). What can I do with those? I dabble a bit here and there but I would like to run my own local LLM for 'fun'.

Thanks!


If you just want to run a local LLM you could download ollama and do it in minutes. You'll be limited to small models (I would start with qwen3:1.7b) but it should be quite fast.
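
If you'd rather drive it from Python once it's running, the ollama client package is about as minimal as it gets (rough sketch; assumes the ollama server is running and the model has already been pulled):

  # pip install ollama
  import ollama

  resp = ollama.chat(
      model="qwen3:1.7b",
      messages=[{"role": "user", "content": "Explain what a KV cache is in one paragraph."}],
  )
  print(resp["message"]["content"])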

Does anyone have rough numbers (max daily users etc) on viability of SQLite vs PostgreSQL for a typical user-facing webapp or e-commerce application?

I know that, due to some recent update, SQLite can support concurrent reads but still only a single writer. For which cases would this be a problem?
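
For reference, roughly the setup I have in mind (Python's built-in sqlite3 with WAL mode enabled, which as I understand it is what lets reads proceed alongside the single writer):

  # Rough sketch; WAL mode plus a busy timeout for the typical web-app case.
  import sqlite3

  conn = sqlite3.connect("app.db", timeout=5.0)   # wait up to 5s if the db is locked
  conn.execute("PRAGMA journal_mode=WAL;")
  conn.execute("PRAGMA busy_timeout=5000;")
  conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")
  conn.commit()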

Some recommend it's better to start with postgres anyway if you have any remote thoughts of scaling in mind....


Honestly, just use Postgres. It's easy enough and will scale with your business, and it won't randomly lock or corrupt your database (I've had SQLite do this to me several times).

One of the advantages of tiling WMs is that every window that is running is also visible. Nothing invisible exists.

But in this "endless horizontal tiling" scheme, the above principle would no longer hold, right?


That typically isn't true in practice right? It's fairly common to have multiple "desktops" when using a tiling WM.


Yes, but still, on each workspace everything is visible in i3. I wonder how scrolling to the right differs from i3's tabbed panes.


I might give Niri a shot at some point, but yes, this is my thought too: this is more or less the same as having multiple tabbed panes, which enables the grouping GP refers to.


I was running i3 and sway for years and tabbed tiles never really clicked for me the way scrolling did. The first time I used a scrolling WM (I tried one of the plugins for sway or hyprland, IIRC) it was an immediate revelation. However, the sway/hyprland versions were always a bit quirky, while niri "just works".

For those on older niri versions, I have to say the "zoom out" overview feature is definitely worth the upgrade. As another poster said, it really fixes the one issue with scrolling/tiling WMs, which is getting lost.


Newly started applications receive focus, so they're visible by default. They are inserted right of the current view, so recovering the previous active pane is consistent ("left pane" keybinding, or the appropriate gesture).

Things on other desktops are invisible in every WM.

The only difference with niri is the possibility for things to be left or right of the current window. Overview helps with that, but I know what I expect to be on a specific desktop (it's related to the topic) and seldom need it.


Like, imagine your editor is on ws2; you open a terminal at /tmp/ to check something quick, it scrolls in to the right, then you jump to ws3 for your file manager and other stuff and go back to your editor.

Now you want to access that terminal on /tmp/ again. Where was it?

In i3, I just spam-switch workspaces in this case, but at least I can find them. With scrollable wms, every ws can potentially hold that target app.


It's right of your editor, where it started.

If you have (having had "Editor" focused, and just opened "TermT"):

  Editor | (TermT) | Term | Browser
  (FM) | Term | Browser | etc.
(where pipe delimits a pane and parens are the active pane), if you go "next desktop" from "TermT" (the terminal at /tmp), that moves you down the stack of desktops. Moving up the stack of desktops returns with focus on "TermT". You'd then go "left pane" from "TermT" to get back to the editor.

The answer (for me) is to think of desktops as topics. The terminal on /tmp is with the things that prompted its creation. If I needed to check some log output, for example, it's with the project that made that log output.

Edit: Note that there's nothing keeping you from stacking those terms if you like, i.e., the appropriate keybinding goes from the previous to

  Editor | (TermT), Term | Browser
  (FM) | Term | Browser | etc.
where the terms stack vertically in the ribbon of the desktop.


I think they aren't referring to "where does it go?" and more being forgetful.

If you have something that would be reasonable to open on any workspace because it's ephemeral (they used a tmp terminal as an example), and you open it, navigate away from it, switch workspaces a few times, then get pulled into a meeting or go to lunch, and come back and switch workspaces a few more times...

"Where did I leave that terminal, I dont remember where I was when I opened it."

In i3wm/sway etc, you can cycle all your workspaces and eventually one of them will have it visible. On Niri, as you cycle through all your workspaces you may never see it because you don't see all the windows in a workspace, unless you scroll through the workspace panes as you cycle workspaces.

It's not a problem necessarily, but it is something to consider. It sounds like this doesn't affect your workflow, but it might affect others.


It has overview. You can see all windows and workspaces in a scaled out view of your preference.


Fair enough. "Overview" [0] presumably solves this, though.

[0] https://github.com/YaLTeR/niri/wiki/Overview


That's true, you do end up with some windows hidden or partially visible. Niri is still tiling, though, so with proper management you can avoid making too much use of the infinite strip (though that would defeat the purpose of niri).


This seems like a good place to note the "center window" keybinding for windows that don't fit well in the screen (e.g., 2/3 wide pane next to 2/3 wide pane, or 1/3 pane on the right end of the stack next to a full-screen pane).

Vastly preferable to having to look at the edge of the screen.


Tiling window managers have tabs, so not all windows are visible.

You can see window titles on the tabs on the tab bar, but you can’t even see the title of windows which are in a split container of a background tab.


Tabs, and workspaces.


No, because every tiling WM has multiple workspaces.

But yes, that wouldn't be true, though focus moves to fresh windows so it's not an issue.


So if one wants to open-source his project and sell it:

- Licence as AGPL

- Mention that commercial use (without having to open source the derivative work) is available

Did I get it right?

1- Is this solution useful for subscription-based contracts too?

2- Does it make a difference if the product is an app, a library, or a hardware device?


> Did I get it right?

I think so.

> 1- Is this solution useful for subscription-based contract too?

If you mean SaaS, then maybe. I emailed Stallman about the ethics of the SaaS case and he said it's a net good.

You might want to think about whether the license actually gives you leverage in that case though. You might find that the corporations are perfectly willing to host a service using your AGPLv3 software. That's within their rights.

You only gain leverage if they want to create a proprietary version of your software.

> 2- Does it make a difference if the product is a app, library or hardware device?

Absolutely. The GPL has very specific wording with regards to linking and distribution which trigger license conditions. You should read the full license for a better understanding.

Hardware is a completely different matter, I won't even pretend to know anything about how licensing works in that case.

Remember, I'm not a lawyer. I'm just a hobbyist free software developer who's also trying his best to understand all this and make the best possible decision.


The main problem is that you need to have contributors sign a copyright assignment/CLA, otherwise their code is going to be AGPL only and you cannot license it commercially.

Or you don't have any contributors, which is the base case, I guess.


And you'll trigger the same response in potential contributors: there's a general anti-CLA attitude in open-source/free software circles, because it means your contributions can be used to enrich someone else.


One could write patches and refuse to sign the CLA. The maintainer would be unable to incorporate those patches into the repository without losing the ability to relicense.

Maybe it would be useful to reframe the CLA as the price of centralized maintenance. It's free software so it's perfectly possible to refuse to sign the CLA, modify the software regardless and even publish the changes. It just means the software must be forked and maintained separately.


Keep up the good work! There are satisfied but silent Matrix users too.


Would the "LLM era" revitalize languages like Ada and Haskell into mainstream?


Claude does an okay job of translating from other languages into Ada. It works especially well if you write the specification (.ads) file and let it write the body (.adb)

Ada’s strictness about types and a preference to allocate on the stack rather than the heap means more bugs are caught at compile time. Claude Code is really good at iterating on compile time errors without much user intervention.


There was a pretty good article a while ago on how using verified SPARK (a subset of Ada) could help with LLM-generated output: https://arxiv.org/html/2502.07728v1


LLMs will make all languages mostly irrelevant, a niche like assembly programming is nowadays. They are the next abstraction level; generating existing languages is only a temporary measure until they get good enough to generate straight executables.


Not even deterministic high-level programming languages have succeeded in making assembly languages irrelevant, despite everybody claiming that this would happen.

While the amount of source code written in assembly languages is an extremely small fraction of the total existing code, and only a few programmers are competent to write such programs, that assembly source code determines a large fraction of the performance of the applications run on modern computers.

LLMs are likely to behave similarly, i.e. a good amount of programs will continue to be written directly in deterministic programming languages by competent programmers, while a greater amount of source code, usable for solving problems that are neither novel nor critical, will be generated by people with lower skills, with the help of LLMs.


I predict that it will never make sense for artificial neural networks to directly generate machine code, for most of the same reasons it doesn't make sense for biological neural networks to do so.


[flagged]


From the guidelines:

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

because... they don't have as many examples, documentation, textbooks, or public example projects to base generation off of, perhaps. There may be a future where documentation/servers are more formally integrated with LLMs/AI systems in a way that makes up for the relative lack of literature by plugging into a source of information that can be used to generate code/projects.


It's a not-so-ideal situation: how is the marketplace of libraries and languages going to evolve when you're competing against whatever version of Python and $FRAMEWORK that was crawled a long time ago?


If AI is writing the code, how important is it to have new languages?


That might actually be a benefit, as most public code, say in C++, is not good code.

If the pool is smaller but comes from, say, experienced programmers, then the number of errors might be lower. I can see that for Ada; however, most Haskell is probably written by undergraduates just learning it, so it's not a quality code base.

I think Apple researchers published a recent paper where they had an LLM producing good Swift code even though the original corpus included only one Swift program; the model was tuned by experienced Swift programmers to get it into a good state for general use.


I would say yes in that it could help revitalize things a bit. Writing difficult and complicated bindings to C libraries will be much easier now. Also, if you can supply a decent context, LLMs can do some good coding in Ada (just not new or fancy features without examples).


Didn't Google say that they're gonna provide an escape hatch for students and hobbyists? So, best case scenario, we just need to tap some label 5 times to enable side-loading again.


We have different definitions of an "escape hatch". A user is not an IT specialist. Ordinary people need unobstructed access to lifeboats.

Apple allows developers to self-sign a handful of apps (exclusively from source!) with short-lived certs - it's a complete PITA to maintain a simple app for personal use, and you still need an account. Google is heading in the same direction.


Also, features that people assume are part of the OS, like push notifications, are really a service run by Apple that your phone is cryptographically locked into, and they don't work with self-signed apps.


You are able to get a limited number of app installs for your package for free.

https://developer.android.com/developer-verification/guides/...


Which still requires ID verification.


How many people would that really stop? It wouldn't stop me from feeling comfortable creating Android apps that are capable of being sideloaded.


> You'll need: your legal name and address. These need to be verified by uploading official identity documents.

I don't have a "legal name". Sounds like some sovcit bullshit. I go by several names, none of which is canonical. Maybe other countries formalize this idea, but the countries where I am a citizen/resident do not.

> A private email address and phone number for Google to contact you. These will need to be verified using a one-time password

I love that email OTP is good enough for this, but apparently not for anything else, where I'll need an approved verified secure attested super official app.


>I don't have a "legal name". Sounds like some sovcit bullshit.

Considering every country has passports and passports all have the person's legal name on them. And thst the passport standard only supports having one name with a primary and secondary identifier. You must be mistaken.


Not everyone has a passport. And people with strange or no name may have passports with names that are not theirs.


But it does mean that the country has a way of picking a name to use on one.


They might have several different passports from different countries.

It's also fairly common for instance for women to have multiple names from their marriage(s).


Use the same name as the identification you are submitting. It's not that complicated.


Awesome tech!

It's not possible to run an Android VM on QEMU, right? As in, is it officially supported? (I know about Waydroid.)


Yes, it's possible and supported. QEMU can emulate an aarch64 system, and Google provides aarch64 Android builds for virtual machines specifically, called "Cuttlefish". Search for keywords "Android Cuttlefish QEMU" for instructions.


The official Android "emulator" supplied by Google is qemu. If you're not satisfied with it for some reason, IIRC I used these images some years ago on top of vanilla qemu:

https://www.fosshub.com/Android-x86.html

They don't seem to be well supported anymore, and there aren't many prebuilt alternatives. One can always compile AOSP from source, though Google does not make this easy.


> The official Android "emulator" supplied by Google is qemu

Nitpick: It's a fork of QEMU. There are quite a few Google-exclusive changes bundled-in.


"I can't belllieeeve RMDNZ preferred L-Johnson's card to mine"


Very simple question:

How do people trust the output of LLMs? In the fields I know about, sometimes the answers are impressive, sometimes totally wrong (hallucinations). When the answer is correct, I always feel like I could have simply googled the issue and some variation of the answer lies deep in some pages of some forum or stack exchange or reddit.

However, in the fields I'm not familiar with, I'm clueless how much I can trust the answer.


There's a few cases:

1. For coding (and the reason coders are so excited about GenAI), it can often be 90% right, but it's doing all of the writing and researching for me. If I can reduce how much I need to actually type/write in favor of reviewing/editing, that's a huge improvement day to day. And the other 10% can be covered by tests or by adding human code to verify correctness.

2. There are cases where 90% right is better than the current state. Go look at Amazon product descriptions, especially things sold from Asia in the United States. They're probably closer to 50% or 70% right. An LLM being "less wrong" is actually an improvement, and while you might argue a product description should simply be correct, the market already disagrees with you.

3. For something like a medical question, the magic is really just taking plain language questions and giving concise results. As you said, you can find this in Google / other search engines, but they dropped the ball so badly on summaries and aggregating content in favor of serving ads that people immediately saw the value of AI chat interfaces. Should you trust what it tells you? Absolutely not! But in terms of "give me a concise answer to the question as I asked it" it is a step above traditional searches. Is the information wrong? Maybe! But I'd argue that if you wanted to ask your doctor about something that quick LLM response might be better than what you'd find on Internet forums.


This is really strange to me...

Of course you don't trust the answer.

That doesn't mean you can't work with it.

One of the key use cases for me other than coding is as a much better search engine.

You can ask a really detailed and specific question that would be really hard to Google, and o3 or whatever high end model will know a lot about exactly this question.

It's up to you as a thinking human to decide what to do with that. You can use that as a starting point for in depth literature research, think through the arguments it makes from first principles, follow it up with Google searches for key terms it surfaces...

There's a whole class of searches I would never have done on Google because they would have taken half a day to do properly that you can do in fifteen minutes like this.


Such as?


I went through my ChatGPT history to pick a few examples that I'm both comfortable sharing and that illustrate the use-case well:

> There are some classic supply chain challenges such as the bullwhip effect. How come modern supply chains seem so resilient? Such effects don't really seem to occur anymore, at least not in big volume products.

> When the US used nuclear weapons against Japan, did Japan know what it was? That is, did they understood the possibility in principle of a weapon based on a nuclear chain reaction?

> As of July 2025, equities have shown a remarkable resilience since the great financial crisis. Even COVID was only a temporary issue in equity prices. What are the main macroeconomic reasons behind this strength of equities.

> If I have two consecutive legs of my air trip booked on separate tickets, but it's the same airline (also answer this for same alliance), will they allow me to check my baggage to the final destination across the two tickets?

> what would be the primary naics code for the business with website at [redacted]

I probably wouldn't have bothered to search any of these on Google because it would just have been too tedious.

With the airline one, for example, the goal is to get a number of relevant links directly to various airlines' official regulations, which o3 did successfully (along with some IATA regulations).

For something like the first or second, the goal is to surface the names of the relevant people / theories involved, so that you know where to dig if you wish.


This is true.

But I've seen some harnesses (e.g., whatever Gemini Pro uses) do impressive things. The way I model it is like this: an LLM, like a person, has a chance of producing wrong output. A quorum of people plus some experiments/study usually arrives at a "less wrong" answer. The same can be done with an LLM, and to an extent, is being done by things like Gemini Pro and o3 and their agentic "eyes" and "arms". As the price of hardware and compute goes down (if it does, which is a big "if"), harnesses will become better by being able to deploy more computation, even if the LLM models themselves remain at their current level.

Here's an example: there is a certain kind of work we haven't quite figured out how to have LLMs do: creating frameworks and sticking to them, e.g. creating and structuring a codebase in a consistent way. But, in theory, if one could have 10 instances of an LLM "discuss" whether a function in code conforms to an agreed convention, well, that would solve that problem.
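
A toy sketch of that quorum idea (query_llm here is a hypothetical stand-in for whatever chat API you'd use; the point is just majority-voting over independent samples):

  from collections import Counter

  def query_llm(prompt: str) -> str:
      """Hypothetical call to a chat model; expected to answer 'yes' or 'no'."""
      raise NotImplementedError

  def quorum_verdict(prompt: str, n: int = 10) -> str:
      # Sample n independent verdicts and return the majority answer.
      votes = Counter(query_llm(prompt) for _ in range(n))
      return votes.most_common(1)[0][0]

  # quorum_verdict("Does this function follow the project's naming convention? <code here>")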

There are also avenues of improvement that open up with more computation. Namely, today we use "one-shot" models... you train them, then you use them many times. But the structure, the weights of the model, aren't being retrained on the output of their actions. Doing that on a per-model-instance basis is also a matter of having sufficient computation at some affordable price. Doing that on a per-model basis is practical already today; the only limitations are legal terms, NDAs, and regulation.

I say all of this objectively. I don't like where this is going; I think this is going to take us to a wild world where most things are gonna be way tougher for us humans. But I don't want to (be forced to) enter that world wearing rosy lenses.


I think the primary benefit of LLMs for me is as an entrypoint into an area I know nothing about. For instance, if I’m building a new kind of system which I haven’t built before, then I’m missing lots of information about it — like what are the most common ways to approach this problem, is there academic research I should read, what are the common terms/paradigms/etc. For this kind of thing LLMs are good because they just need to be approximately correct to be useful, and they can also provide links to enough primary sources that you can verify what they say. It’s similar if I’m using a new library I haven’t used before, or something like that. I use LLMs much less for things that I am already an expert in.


We place plenty of trust in strangers to do their jobs to keep society going. What’s their error rate? It all comes down to the track record, perception, and experience of the LLMs. Kinda like self-driving cars.


Strangers have an economic incentive to perform. AI does not. What AI program is currently able to modify its behavior autonomously to increase its own profitability? Most if not all current public models are simply chatbots trained on old data scraped off the web. Wow, we have created an economy based on cultivated Wikipedia and Reddit content from the 2010s, linked together by bots that can make grammatical sentences and cogent-sounding paragraphs. Isn't that great? I don't know; about 10 years ago, before Google broke itself, I could find information on any topic easily and judge its truth using my grounded human intelligence better than any AI today.

For one thing, AI cannot even count. Ask Google's AI to draw a woman wearing a straw hat. More often than not the woman is wearing a well-drawn hat while holding another in her hand. Why? Frequently she has three arms. Why? Tesla's self-driving vision couldn't differentiate between the sky and a light-colored tractor trailer turning across traffic, resulting in a fatality in Florida.

For something to be intelligent it needs to be able to think and evaluate the correctness of its thinking correctly. Not just regurgitate old web scrapings.

It is pathetic, really.

Show me one application where black box LLM ai is generating a profit that an effectively trained human or rules based system couldn't do better.

Even if AI is able to replace a human in some tasks, this is not a good thing for a consumption-based economy with an already low labor force participation rate.

During the first industrial revolution human labor was scarce, so machines could economically replace and augment labor and raise standards of living. In the present time labor is not scarce, so automation is a solution in search of a problem, and a problem itself if it increasingly leads to unemployment without universal basic income to support consumption. If your economy produces too much with nobody to buy it, then economic contraction follows. Already young people today struggle to buy a house. Instead of investing in chatbots, maybe our economy should be employing more people in building trades and production occupations where they can earn an income to support consumption, including of durable items like a house or a car. Instead, because of the FOMO and hype about AI, investors are looking for greater returns by directing money toward sci-fi fantasy, and when that doesn't materialize an economic contraction will result.


My point is humans make mistakes too, and we trust them, not because we inspect everything they say or do, but because of how society is set up.

I'm not sure how up to date you are but most AIs with tool calling can do math. Image generation hasn't been generating weird stuff since last year. Waymo sees >82% fewer injuries/crashes than human drivers[1].

RL _is_ modifying its behavior to increase its own profitability, and companies training these models will optimize for revenue when the wallet runs dry.

I do feel the bit about being economically replaced. As a frontend-focused dev, nowadays LLMs can run circles around me. I'm uncertain where we go, but I would hate for people to have to do menial jobs just to make a living.

[1]: https://www.theverge.com/news/658952/waymo-injury-prevention...


> My point is humans make mistakes too, and we trust them,

We trust them because they are intrinsically and extrinsically motivated not to mess up

AI has no motivation


When it really matters, professionals have insurance that pays out when they screw up.


I do believe that's where we're heading, people holding jobs to hold accountability for AI.


I get around this by not valuing the AI for its output, but for its process.

Treat it like a brilliant but clumsy assistant that does tasks for you without complaint – but whose work needs to be double checked.


Your internal verifier model in your head is actually good enough and not random. It knows how the world works and subconsciously applies a lot of sniff tests it has learned over the years.

Sure, a lot of answers from LLMs may be inaccurate, but you mostly identify them as such because your ability to verify (using various heuristics) is good too.

Do you learn from asking people advice? Do you learn from reading comments on Reddit? You still do without trusting them fully because you have sniff tests.


> You still do without trusting them fully because you have sniff tests

LLMs produce way too much noise and way too inconsistent quality for a sniff test to be terribly valuable in my opinion


The problem is that content is dead. You can’t find answers any more on Google because every website is AI-generated and littered with ads.

YouTube videos aren’t much better. Minutes of fluff are added to hit a juicy 10 minute mark so you can see more ads.

The internet is a dead place.


The problem isn't that content is AI generated, the problem is that the content is generated to maximize ad revenue (or some other kind of revenue) rather than maximize truth and usefulness. This has been the case pretty much since the Internet went commercial. Google was in a lot of ways created to solve this problem and it's been a constant struggle.

The problem isn't AI, the problem is the idea that advertising and PR markets are useful tools for organizing information rather than vaguely anarchist self-organizing collectives like Wikipedia or StackOverflow.


I have zero belief that AI won't follow this trend as well


That's where I disagree. The noise is not that high at all and is vastly exaggerated. Of course, if you go too deep into niche topics you will experience this.


Yeah, niche topics like the technical questions I have left over after doing embedded development for more than a decade. Mostly questions like “can you dig up a PDF for this obsolete wire format.” And Google used to be able to do that, but now all I get is hundreds of identical results telling me about the protocol’s existence and nothing else.


One of the most amusing things to me is the amount of AI testimonials that basically go "once I help the AI over the things I know that it struggles with, when it gets to the things I don't know, wow, it's amazing at how much it knows and can do!" It's not so much Gell-Mann amnesia as it is Gell-Mann whiplash.


If you are a subject matter expert, as the person working on the task is expected to be, then you will recognise the issue.

Otherwise: common sense, a quick Google search, or let another LLM evaluate it.

