
While others will point to hardware or local LLMs and such, IMO the biggest reason...

Because it's the easiest way to give "claw" iMessage access and that's the primary communication channel for a lot of the claw users I've seen.


Yup, as context: in the same time frame, Waymo had 101 collisions according to the same NHTSA dataset.


Waymo drives 4 million miles every week (500k+ miles each day). The vast majority of those collisions happened while the Waymo was stationary (they don't redact the narrative in crash reports like Tesla does, so you know what happened). That is an incredible safety record.


Is this the same time or the same miles driven? I think the former, and of course I get that's what you wrote, but I'm trying to understand what to take away from your comment.


For everyone's context: in the same time frame, Waymo had 101 collisions according to the same dataset.


What dataset? Doesn't the article clearly specify a different number?

Your context sucks, and it's as good as a lie.

>Waymo reports 51 incidents in Austin alone in this same NHTSA database, but its fleet has driven orders of magnitude more miles in the city than Tesla’s supervised “


You are talking about 5 incidents; this is not statistics. It's just a fluctuation of random numbers, and random events like a bus hitting the taxi while it's idle. That's already 20% of your data being incorrect lol, since it's 1 out of 5.

So far, you can clearly tell: 1. Tesla works decently in a limited environment, no crazy patterns. 2. It's a limited env, which means nothing. Scale is still not there. They need to prove themselves.


One of my earlier experiences with codex was actually reverse engineering, far before it was good at actual coding.

It was able to decompile a react native app (Tesla Android app), and fully trace from a "How does X UI display?" down to a network call with a payload for me to intercept.

Granted, it did it by splitting the binary into a billion txt files, each one a single function, and then rg-ing through them, but it worked.
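
To give a flavor of the splitting step, here's a rough sketch (the "Function <name>" header format, file names, and layout are made up for illustration; the real dump was messier):

    # Rough sketch of the split-then-grep step described above. The
    # "Function <name>" header format is an assumption for illustration.
    import re
    from pathlib import Path

    def split_functions(dump_path: str, out_dir: str) -> int:
        """Write each function from a decompiled dump to its own txt file."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        text = Path(dump_path).read_text(errors="replace")
        # Zero-width split keeps each header attached to its body.
        chunks = re.split(r"(?m)^(?=Function\s)", text)
        count = 0
        for chunk in chunks:
            m = re.match(r"Function\s+([\w$]+)", chunk)
            if not m:
                continue
            (out / f"{count:06d}_{m.group(1)}.txt").write_text(chunk)
            count += 1
        return count

    if __name__ == "__main__":
        n = split_functions("decompiled_dump.txt", "functions")
        print(f"wrote {n} files; now try: rg 'fetch|https' functions/")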


I heard about this and tried quite a bit to reverse engineer a decompiled binary from a big game to find struct/schema information but could never get anything useful.


Working on reproducible test runs to catch quality issues from LLM providers.

My main goal is not just a "the model made code, yay!" setup, but verifiable outputs that can show degradation as percentages.

i.e. have the model make something like a Connect 4 engine, then run it through a lot of tests to see how "valid" its solution is. Then score that solution as NN/100% accurate. Then do many runs of the same test at a fixed interval.

I have ~10 tests like this so far, working on more.
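
As a toy illustration of the scoring step (the best_move interface, board format, and positions below are simplified stand-ins, not my actual tests):

    # Toy illustration of scoring a model-generated Connect 4 engine.
    # The engine file, best_move(board) interface, and positions are
    # simplified stand-ins for illustration only.
    import importlib.util

    def load_engine(path: str):
        """Load the model-generated engine module from a file path."""
        spec = importlib.util.spec_from_file_location("engine", path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return mod

    # Each case: (board as 7 bottom-up column strings, acceptable columns).
    TEST_CASES = [
        (["", "", "", "XXX", "", "", ""], {3}),  # complete the vertical four
        (["", "", "OOO", "", "", "", ""], {2}),  # block opponent's three
    ]

    def score(engine_path: str) -> float:
        """Percentage of positions where the engine picked a valid move."""
        engine = load_engine(engine_path)
        passed = 0
        for board, acceptable in TEST_CASES:
            try:
                if engine.best_move(board) in acceptable:
                    passed += 1
            except Exception:
                pass  # crashes count as failures
        return 100.0 * passed / len(TEST_CASES)

    if __name__ == "__main__":
        print(f"accuracy: {score('generated_engine.py'):.0f}/100%")

Run the same score at a fixed interval and the percentage trend shows degradation directly.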


Sounds really interesting. What are you using for the tests/reports?


Nice. Sounds like it will converge to QA as a Service.


Reminds me of the "oopsie" by Reddit when they revealed Eglin Air Force Base as the "most addicted city."


It takes 20 mins to get from base housing to the gate, lord knows what traffic is like by the causeways, it's an hour of driving before you're anywhere worth being, and then it's a coin flip whether it's exciting, so it's either the Ft. Walton Beach strip clubs or on-base recreation.

No wonder Eglin is addicted hahaha.

But in all seriousness, for the teams of people on the data-crunching side of things, that seems like a pedestrian insight.


Waymo's remote operators have literally caused accidents, and we only know about it because journalists did some digging. Waymo simply removed all details of the remote ops' role from the NHTSA reporting.


Which journalists should I look to for that?


https://www.forbes.com/sites/bradtempleton/2024/03/26/waymo-...

The description there is:

    In January, an incident took place where a Waymo robotaxi incorrectly went through a red light due to an incorrect command from a remote operator, as reported by Waymo. A moped started coming through the intersection. The moped driver, presumably reacting to the Waymo, lost control, fell and slid, but did not hit the Waymo and there are no reports of injuries. There may have been minor damage to the moped.
While the description in the official report to the NHTSA is (ID: 30270-6981):

    On January [XXX], 2024 at 10:52AM PT a rider of a moped lost control of the moped they were operating and fell and slid in front of a Waymo Autonomous Vehicle (Waymo AV) operating in San Francisco, California on [XXX] at [XXX] neither the moped nor its driver made contact with the Waymo AV.

    The Waymo AV was stopped on northbound [XXX] at the intersection with [XXX] when it started to proceed forward while facing a red traffic light. As the Waymo AV entered the intersection, it detected a moped traveling on eastbound [XXX] and braked. As the Waymo AV braked to a stop, the rider of the moped braked then fell on the wet roadway before sliding to a stop in front of the stationary Waymo AV. There was no contact between the moped or its rider and the Waymo AV. The Waymo AVs Level 4 ADS was engaged in autonomous mode.

    Waymo is reporting this crash under Request No. 1 of Standing General Order 2021-01 because a passenger of the Waymo AV reported that the moped may have been damaged. Waymo may supplement or correct its reporting with additional information as it may become available.


They've also been seen driving directly into flood waters, with one driving through the middle of a flooded parking lot.

https://www.reddit.com/r/SelfDrivingCars/comments/1pem9ep/hm...


Curious what your takeaway from that is, given the announcement.


It's a lot more iffy than that IME.

It's very happy to throw a lot into the memory, even if it doesn't make sense.


This is the core problem. The agent writes its own memory while working, so it has blind spots about what matters. I've had sessions where it carefully noted one thing but missed a bigger mistake in the same conversation — it can't see its own gaps.

A second pass over the transcript afterward catches what the agent missed. Doesn't need the agent to notice anything. Just reads the conversation cold.

The two approaches have completely different failure modes, which is why you need both. What nobody's built yet is the loop where the second pass feeds back into the memory for the next session.
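
A minimal sketch of what that loop could look like (the ask_llm client, file names, and prompt are all hypothetical):

    # Hypothetical sketch of the missing feedback loop: a second, "cold"
    # pass reviews the transcript and appends findings to the memory file
    # that seeds the next session. All names here are assumptions.
    from pathlib import Path

    REVIEW_PROMPT = (
        "You are reviewing an agent's transcript after the fact. "
        "List mistakes and important facts the agent failed to note, "
        "one per line, suitable for a persistent memory file."
    )

    def update_memory(memory_path: str, transcript_path: str, ask_llm) -> None:
        """ask_llm(system, user) -> str is any chat-completion client."""
        transcript = Path(transcript_path).read_text()
        findings = ask_llm(REVIEW_PROMPT, transcript)  # the cold read
        with open(memory_path, "a") as f:
            f.write("\n# Post-session review\n" + findings + "\n")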


Even as far as we know, they aren't.

The Waymo blog post refused to say the word "child", instead using the phrase "young pedestrian" once.

The Waymo blog post switches to "the pedestrian" and "the individual" for the rest of the post.

The Waymo blog post also consistently uses the word "contact" instead of hit, struck, or collision.

The Waymo blog post makes no mention of the injuries the child sustained.

The Waymo blog post makes no mention of the school being in close proximity.

The Waymo blog post makes no mention of other children or the crossing guard.

The Waymo blog post makes no mention of the car going over the school zone speed limit (17 mph in a 15 mph zone).


The speed limit of a school zone in California is 25, not 15, which would explain why they didn't mention it.

