dolphin3.0-llama3.1-8b Q4_K_S [4.69 GB on disk]: correct in <2 seconds
deepseek-r1-0528-qwen3-8b Q6_K [6.73 GB]: correct in 10 seconds
gpt-oss-20b MXFP4 [12.11 GB] low reasoning: wrong after 6 seconds
gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3 minutes !
Yea yea it's only one question of nonsense trivia. I'm sure it was billions well spent.
It's possible I'm using a poor temperature setting or something but since they weren't bothered enough to put it in the model card I'm not bothered to fuss with it.
I think your example reflects well on oss-20b, not poorly. It may show that they've been successful in separating reasoning from knowledge. You don't _want_ your small reasoning model to waste weights memorizing minutiae.
> gpt-oss-20b MXFP4 [12.11 GB] high reasoning: wrong after 3 minutes !
To be fair, this is not the type of question that benefits from reasoning. Either the model has this info in its parametric memory or it doesn't. Reasoning won't help.
Not true:
During World War II the Imperial Japanese Navy referred to Midway Island in their communications as “Milano” (ミラノ). This was the official code word used when planning and executing operations against the island, including the Battle of Midway.
I've had BCBS reject prescriptions they've already pre-approved multiple times. First time, they denied all knowledge of the REQUEST (when my doctor has their APPROVAL fax in hand). Second time they acknowledged they had approved it but had to contact their pharmacy group to put in an "override".
I've submitted a complaint with the state insurance regulator though I doubt it goes anywhere.
Anyone for a class action lawsuit on the grounds of bad faith breach of contract and medical malpractice for obstructing access to care they already admit is medically necessary (by denying something already pre-approved)? I don't even want money. I want a Consent Decree enforced by the court that strikes fear across their whole industry.
Audio record every interaction you have with insurance and tell 'em you're on a recorded line.
> medical malpractice for obstructing access to care they already admit is medically necessary
This has been tried before and failed. The insurer's argument is that they are not denying care, they are just not willing to pay for it, which isn’t practicing medicine.
As we went from zero to 10K+ embedded systems (full PCs with significant RAM) the issues got weirder.
The best was a one-off error log along the lines of "unknown type System.DateTime". Huh? That's a system defined type that just went missing. Never saw it again.
Another at a different employer was a crash that occurred after a check condition that absolutely should have prevented the crashing code from being reached. Single threaded. Simple microcontroller. Had to reflash it to flip the bit back. After doing the math on how much RAM we had in the wild vs. cosmic bit flip rates reported in supercomputers, we had to expect one flip per year.
If it's a safety critical system, server or not, use ECC RAM!!
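For anyone who wants to redo that kind of back-of-the-envelope math for their own fleet, here is a minimal sketch. It assumes flips are independent and happen at a constant rate per bit-hour, which is a simplification. The function names and every input value are hypothetical placeholders, not the numbers from the story above; plug in a rate from a published study you trust.

```python
HOURS_PER_YEAR = 24 * 365


def expected_flips_per_year(devices, ram_bytes_per_device, flips_per_bit_hour):
    """Expected bit flips per year across a fleet.

    Assumes independent flips at a constant per-bit, per-hour rate
    (a simplifying assumption, not a measured model).
    """
    total_bits = devices * ram_bytes_per_device * 8
    return total_bits * flips_per_bit_hour * HOURS_PER_YEAR


def rate_for_target(devices, ram_bytes_per_device, target_flips_per_year):
    """Per-bit-hour rate at which the fleet would see the target flips per year."""
    total_bits = devices * ram_bytes_per_device * 8
    return target_flips_per_year / (total_bits * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical placeholder inputs, chosen only to show the call shape.
    devices = 50_000
    ram_bytes = 64 * 1024        # 64 KiB per microcontroller (assumed)
    rate = 1e-13                 # flips per bit per hour (assumed, not measured)
    print(expected_flips_per_year(devices, ram_bytes, rate))
    print(rate_for_target(devices, ram_bytes, 1.0))
```

The second function runs the math backwards: given your fleet size, it tells you how low the per-bit rate would have to be before "one flip per year" stops being the expectation.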
I am of the opinion that far more than safety critical systems should use ECC. You should use ECC anytime bit flips might cost you more money than the ECC does, which is why I insist on ECC for my desktop computers.
And the effect will be the opposite of the intent: now only students wealthy enough to afford private tutoring, or who have a stable enough home environment to self-study, can pass AP exams. Wonder what races those students will be?
I'm admittedly talking my own book, having come from the lower end of middle class. I was also the runt of the litter and at the bottom of the social pecking order. Students of all races enjoyed looking down on me to feel better about their own situation.
I retired from working for other people at age 40. I credit gifted/AP courses in 8-12th grade for a significant portion of that. Did my racial background still advantage me? You bet. I had a stable home environment and low crime neighborhood.
Perhaps we should focus on how to offer children stable home environments and low crime neighborhoods.
I can't speak for other cultures, but non-wealthy first generation Asian immigrant parents consciously and constantly sacrifice their own well-being to afford tutoring and extracurriculars for their kids.
That's certainly what I experienced at Lowell High, a San Francisco "test school."
I recall that in my AP Chemistry class, 100% of the students were Asian. Nothing kept non-Asian students from taking the class, except the desire to work hard and be there. (The other section had at least a few white kids.)
I had the luxury of time in my last move, which I used to go to a lot of open houses and talk to the attending realtor (which is not always the listing agent btw). Treat it like a behavioral interview and judge agreeableness and conscientiousness. Also, study basics of sales tactics so you know when they are being deployed on you. This will filter for honest, hardworking people in many situations.
Answer on Wikipedia: https://en.wikipedia.org/wiki/Battle_of_Midway#U.S._code-bre...