This gap between medical board examinations and real-world practice mirrors my own experience, having finished med school and started residency a year ago.
I’ve heard others say that real clinical education only begins after medical school, once residency starts.
That 80% of medical issues could be categorized as "standard medicine" with some personalization for the patient?
In residency you obviously see a lot of complicated real-life cases, but aren't the majority of cases something a non-resident could guide, if not diagnose?
A one-line change that took a decent amount of reasoning to arrive at in a large codebase cost $3.57 just now. I used the o3 model. The quality and the reasoning were excellent. Cheaper than an engineer.
I wrote about some similar observations in the clinical domain -- I call it the "human -> AI reasoning shunt" [0]. Explicitly asking an AI tool to perform reasoning is one thing, but my concern is that, as these tools become more prevalent, even tasks that aren't ostensibly reasoning-based (i.e., helping write clinical notes or answering simple questions) can surreptitiously offload some degree of reasoning from humans by letting these systems decide which bits of information matter.
This doesn’t necessarily apply to this particular offering, but having previously worked in clinical AI from a CS point of view, and now working as a resident physician, something I’m a little wary of is the implicit “shunting” of reasoning away from physicians and onto these tools. One can argue that it’s not always a bad thing, but I think the danger lies in it happening surreptitiously, with the tools deciding what’s important and what’s not.
I wrote a little bit more of my thoughts here, in case it’s of interest to anyone: [0]
In the same vein, I recently made public a tool I wrote for myself [1] - it’s a “copilot” for writing medical notes that’s heavily focused on letting the clinician do the clinical reasoning, with the tool exclusively augmenting the flow rather than attempting to replace even a little bit of it.
Does anyone know how this “user decides how much compute” is implemented architecturally? I assume it’s the same underlying model, so what factor pushes the model to <think> for longer or shorter? Just a prompt-time modification or something else?
Does anyone know how "reasoning effort" is implemented technically - does it involve differences in the pre-training, RL, or prompting phases (or all three)?
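For what it's worth, from the caller's side it's exposed as nothing more than a request parameter; whatever sits behind it (a different system prompt, RL-trained length control, special control tokens) isn't publicly documented. A minimal sketch, assuming the OpenAI Node SDK and its reasoning_effort parameter on chat completions:

```typescript
// Minimal sketch of the caller-facing surface, assuming the OpenAI Node SDK.
// "reasoning_effort" is just a field on the request; how the model internally
// budgets its <think> tokens is not publicly documented.
import OpenAI from "openai";

const client = new OpenAI();

async function ask(effort: "low" | "medium" | "high") {
  const response = await client.chat.completions.create({
    model: "o3-mini",          // any reasoning-capable model
    reasoning_effort: effort,  // the only knob the caller controls
    messages: [
      { role: "user", content: "Summarize the differential for chest pain." },
    ],
  });
  return response.choices[0].message.content;
}
```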
“A programming language is for thinking of programs, not for expressing programs you’ve already thought of. It should be a pencil, not a pen.” - from PG’s “Hackers & Painters”
Thanks for trying it out! Honestly, I have no business model in mind at the moment - this was just something I thought would be helpful in my daily workflow, so I built it out.
The buttons are stored in localStorage, as is the editor text - everything stays local (besides what is sent to the LLM). I'm planning a simple import/export mechanism for transferring buttons across different browsers/computers!
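Roughly, the idea is something like the sketch below - simplified, with illustrative names rather than the tool's actual code:

```typescript
// Illustrative sketch of localStorage persistence plus JSON export/import.
// Names (CustomButton, STORAGE_KEY) are hypothetical, not the tool's real code.
interface CustomButton {
  label: string;
  prompt: string;
}

const STORAGE_KEY = "note-copilot.buttons";

function saveButtons(buttons: CustomButton[]): void {
  // Everything stays in the browser: localStorage only.
  localStorage.setItem(STORAGE_KEY, JSON.stringify(buttons));
}

function loadButtons(): CustomButton[] {
  const raw = localStorage.getItem(STORAGE_KEY);
  return raw ? (JSON.parse(raw) as CustomButton[]) : [];
}

// Export/import: serialize the same array to a JSON file the user can
// download on one machine and re-import on another.
function exportButtons(): Blob {
  return new Blob([JSON.stringify(loadButtons(), null, 2)], {
    type: "application/json",
  });
}

async function importButtons(file: File): Promise<void> {
  const imported = JSON.parse(await file.text()) as CustomButton[];
  saveButtons([...loadButtons(), ...imported]);
}
```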