Hacker News | Nomadeon's comments

Agree. Concrete example: "What was the Japanese codeword for Midway Island in WWII?"

Answer on Wikipedia: https://en.wikipedia.org/wiki/Battle_of_Midway#U.S._code-bre...

dolphin3.0-llama3.1-8b Q4_K_S [4.69 GB on disk]: correct in <2 seconds

deepseek-r1-0528-qwen3-8b Q6_K [6.73 GB]: correct in 10 seconds

gpt-oss-20b MXFP4 [12.11 GB] low reasoning: wrong after 6 seconds

gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3 minutes !

Yea yea it's only one question of nonsense trivia. I'm sure it was billions well spent.

It's possible I'm using a poor temperature setting or something, but since they weren't bothered enough to put it in the model card, I'm not bothered to fuss with it.


I think your example reflects well on oss-20b, not poorly. It may show that they've been successful in separating reasoning from knowledge. You don't _want_ your small reasoning model to waste weights memorizing minutiae.


> gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3 minutes !

To be fair, this is not the type of question that benefits from reasoning: either the model has this info in its parametric memory or it doesn't. Reasoning won't help.


Not true: During World War II the Imperial Japanese Navy referred to Midway Island in their communications as “Milano” (ミラノ). This was the official code word used when planning and executing operations against the island, including the Battle of Midway.

12.82 tok/sec 140 tokens 7.91s to first token

openai/gpt-oss-20b


What's not true? This is a wrong answer


This was the answer from my instance. It is true. "Not true" was referring to the poster.


How would asking this kind of question without providing the model with access to Wikipedia be a valid benchmark for anything useful?


I've had BCBS reject prescriptions they've already pre-approved multiple times. First time, they denied all knowledge of the REQUEST (when my doctor has their APPROVAL fax in hand). Second time they acknowledged they had approved it but had to contact their pharmacy group to put in an "override". I've submitted a complaint with the state insurance regulator though I doubt it goes anywhere.

Anyone for a class action lawsuit on the grounds of bad faith breach of contract and medical malpractice for obstructing access to care they already admit is medically necessary (by denying something already pre-approved)? I don't even want money. I want a Consent Decree enforced by the court that strikes fear across their whole industry.

Audio record every interaction you have with insurance and tell 'em you're on a recorded line.


> medical malpractice for obstructing access to care they already admit is medically necessary

This has been tried before and failed. The insurers' argument is that they are not denying care, they are just not willing to pay for it, which isn’t practicing medicine.


> Audio record every interaction you have with insurance and tell 'em you're on a recorded line.

I’ve tried this. They just hang up. So I record and then have a transcript generated, and I save my call logs.


As we went from zero to 10K+ embedded systems (full PCs with significant RAM) the issues got weirder.

The best was a one-off error log along the lines of "unknown type System.DateTime". Huh? That's a system defined type that just went missing. Never saw it again.

Another at a different employer was a crash that occurred after a check condition that absolutely should have gated the crash from being reached. Single threaded. Simple microcontroller. Had to reflash it to flip the bit back. After doing the math on how much RAM we had in the wild vs. cosmic bit flip rates reported in supercomputers, we had to expect one flip per year.

If it's a safety critical system, server or not, use ECC RAM!!
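A rough sketch of the kind of arithmetic described above, in Python. Everything numeric here is a placeholder assumption rather than a figure from the comment: the function names, the fleet size, the RAM per device, and especially the flip rate (expressed as FIT per megabit, i.e. failures per 10^9 hours), since reported soft-error rates vary widely with hardware, altitude, and measurement method.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def expected_flips_per_year(devices, ram_bytes_per_device, fit_per_mbit):
    """Expected number of bit flips per year across a whole fleet.

    fit_per_mbit is an ASSUMED rate: failures per 10^9 hours per
    megabit (2**20 bits) of RAM. Plug in a rate you trust.
    """
    mbits_per_device = ram_bytes_per_device * 8 / 2**20
    flips_per_hour = devices * mbits_per_device * fit_per_mbit / 1e9
    return flips_per_hour * HOURS_PER_YEAR


def prob_at_least_one_flip(expected_flips):
    """Poisson model: P(at least one event) = 1 - exp(-lambda)."""
    return -math.expm1(-expected_flips)


# Hypothetical fleet: 10,000 microcontrollers with 64 KiB RAM each,
# and a made-up rate of 1 FIT per megabit.
lam = expected_flips_per_year(10_000, 64 * 1024, 1.0)
print(f"expected flips/year: {lam:.4f}")
print(f"P(at least one flip in a year): {prob_at_least_one_flip(lam):.4f}")
```

The point is the scaling, not the specific numbers: the expected count grows linearly with both fleet size and RAM per device, so a rate that looks negligible for one unit stops being negligible across a large deployment.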


I am of the opinion that far more than safety critical systems should use ECC. You should use ECC anytime bit flips might cost you more money than the ECC does, which is why I insist on ECC for my desktop computers.


Given a large enough installed base, any unlikely but possible problem will occur for some segment of the user population. Guaranteed. :/
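To put a number on that intuition: if each install independently hits the problem with probability p, the chance that at least one of n installs hits it is 1 - (1 - p)^n, which approaches 1 as n grows. A minimal Python sketch, assuming independence between installs (real failures are often correlated) and using a hypothetical function name and made-up probabilities:

```python
import math


def prob_any_occurrence(p, n):
    """P(at least one of n independent installs hits the problem).

    Computes 1 - (1 - p)**n via log1p/expm1 so that very small p
    does not lose precision.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        return 1.0 if n > 0 else 0.0
    return -math.expm1(n * math.log1p(-p))


# Hypothetical one-in-a-million problem at different fleet sizes.
for n in (1_000, 100_000, 10_000_000):
    print(n, round(prob_any_occurrence(1e-6, n), 4))
```

Strictly speaking the probability never reaches exactly 1 for p < 1, so "guaranteed" is shorthand for "overwhelmingly likely" once n is much larger than 1/p.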


And the effect will be the opposite of the intent - now only students wealthy enough to afford private tutoring or that have a stable enough home environment to self-study can pass AP exams. Wonder what races those students will be?

I'm admittedly talking my own book, having come from the lower end of middle class. I was also the runt of the litter and at the bottom of the social pecking order. Students of all races enjoyed looking down on me to feel better about their own situation.

I retired from working for other people at age 40. I credit gifted/AP courses in 8-12th grade for a significant portion of that. Did my racial background still advantage me? You bet. I had a stable home environment and low crime neighborhood.

Perhaps we should focus on how to offer children stable home environments and low crime neighborhoods.


> wealthy enough to afford private tutoring

I can't speak for other cultures, but non-wealthy first generation Asian immigrant parents consciously and constantly sacrifice their own well-being to afford tutoring and extracurriculars for their kids.


The children of immigrants of all races do much better than the norm in the United States on just about every measure.


That's certainly what I experienced at Lowell High, a San Francisco "test school."

I recall that in my AP Chemistry class, 100% of the students were Asian. Nothing kept non-Asian students from taking the class, except the desire to work hard and be there. (The other section had at least a few white kids.)


I grew up poor and took AP classes. What does money have to do with it?


"...or that have a stable enough home environment to self-study"


I lived with friends throughout the last year of HS because my BPD mother kept kicking me out. I still took AP classes...


I had the luxury of time in my last move, which I used to go to a lot of open houses and talk to the attending realtor (who is not always the listing agent btw). Treat it like a behavioral interview and judge agreeableness and conscientiousness. Also, study the basics of sales tactics so you know when they are being deployed on you. This will filter for honest, hardworking people in many situations.

