As someone who works in the industry (disclaimer: these are my own views and don't reflect those of my employer), something about the framing of this article rubs me the wrong way despite the fact that it's mostly on point. Yes, it is true that different companies are choosing different sensing solutions based on cost and the ODD in which they must operate. But this last sentence left a sour taste in my mouth: "But the verdict is still out as to which is safer".
It is not an open question, and I hate it when writers frame it this way. Camera-only (specifically, monocular-camera-only) systems literally cannot be safer than ones with sensor fusion right now. This may change at some point in the future, but right now it's not a question, it's a fact.
Setting aside comparisons to humans for a second (will get back to this), monocular cameras can only provide relative depth. You can guess the absolute depth with your neural net but the estimates are pretty garbage. Unfortunately, robots can't work/plan with this input. The way any typical robotics stack works is that it relies on an absolute/measured understanding of the world in order to make its plans.
That isn't to say that one day, with sufficiently powerful ML and better representations, we couldn't use mono (relative) depth. People argue that humans don't really use stereoscopic depth past ~10m or so, and that's a fair point. But we also don't plan the way robots do. We don't require accurate measurements of distance and size. When you're squeezing your car into a parking spot you don't measure your car and then measure the spot to know if it'll fit. You just know. You just do it. And it's a guesstimate (so sometimes humans make mistakes and we hit stuff). Robots don't work this way (for now), so their sensors cannot work this way either (for now).
Self driving isn't a sensor problem, it's a software problem.
From how humans drive, it's pretty clear that there exists some latent space representation of immediate surroundings inside our brains that doesn't require a lot of data. If you had a driving sim wheel and 4 monitors, one for each direction, plus 3 smaller ones for the rear-view mirrors, connected to a real-world car with sufficiently high-definition cameras, you could probably drive the car remotely as well as you could in real life, all because the images would map to the same latent space.
But the advantage that humans have is that we have an innate understanding of basic physics from experience interacting with the world, which we can deduce from something as simple as a 2D representation, and that is very much a big part of that latent space. You wouldn't be able to drive a car if you didn't have some "understanding" of things like velocity, acceleration, object collision, etc.
So my bet is that, just like with LLMs, there will be research published at some point showing that, given certain frames of a video, a model can extrapolate the physical interactions that will occur, including things like collisions and relative distances. Once that is in place, self driving systems will get MASSIVELY better.
It's both. Your eyes have much better dynamic range and FPS than modern self driving systems & cameras. If you can reduce the amount of guessing your robot does (e.g. laser says _with certainty_ that you'll collide with an object ahead), you should do it.
Self-driving is still a robotics problem, and robots are probabilistic operators with many component dependencies. If you have 3 99%-reliable systems strung together running 24 hours a day, that's 43 minutes a day that the chain will be unreliable ((1 - .99^3)*1440). Multi-modality allows your systems to provide redundancy for one another and reduce the accumulation of correlated errors.
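Back-of-the-envelope, here is that arithmetic as a quick sketch (assuming, unrealistically, that the component failures are independent):

    # Expected daily "unreliable" time for a chain of components that must all work.
    # Assumption: failures are independent, which real correlated errors violate.
    per_component_reliability = 0.99
    num_components = 3
    minutes_per_day = 24 * 60

    chain_reliability = per_component_reliability ** num_components   # ~0.9703
    downtime_minutes = (1 - chain_reliability) * minutes_per_day      # ~42.8

    print(f"chain reliability: {chain_reliability:.4f}")
    print(f"expected unreliable minutes/day: {downtime_minutes:.1f}")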
Check out this NOVA video on how limited your acute vision actually is. It is only by rapidly moving our eyes around that we have high quality vision. In the places you are not looking your brain is computing what it thinks is happening, not actually watching it.
I should have said eyes+brain in combination have much better dynamic range and FPS perception than self driving systems. Point remains unchanged -- what sensor you use is tied to the computation you need to do. What you see is the sum of computation+sensor so it's impossible for sensor not to matter.
Tangential: event cameras work more like our eyes but aren't ready for AVs yet.
It's only "kind of" if they compensate for the reduced specs. As the root commenter said, they don't compensate yet. It's just less safe in those situations.
Whether it's fine to be less safe in certain situations because it's safer overall is a different question.
> Your eyes have much better dynamic range and FPS than modern self driving systems & cameras. If you can reduce the amount of guessing your robot does (e.g. laser says _with certainty_ that you'll collide with an object ahead), you should do it.
You could drive fine at 30fps on a regular monitor (SDR). More fps would help with aggressive/sporty driving of course.
> You could drive fine at 30fps on a regular monitor (SDR). More fps would help with aggressive/sporty driving of course.
What? This is preposterous.
Have you tried playing a shooter video game at 30 FPS? It's atrocious, you get rekt. There is a reason all gamers are getting 120 FPS and up.
30 FPS means 33 ms of latency. Driving on a highway, the car moves over a meter before the camera even detects an obstacle. The display has its own input lag, and so does the operating system. Your total latency is going to be over 100 ms, so the car will have travelled several meters. If a motorcyclist in front of you falls, you will feel the car crashing into his body before the image even appears on the screen.
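For what it's worth, a quick sketch of that latency arithmetic (the speed and end-to-end latency here are illustrative guesses, not measurements):

    # Distance traveled during sensing/display latency at highway speed.
    speed_kmh = 120
    speed_mps = speed_kmh / 3.6          # ~33.3 m/s

    frame_latency_s = 1 / 30             # ~33 ms per frame at 30 fps
    total_latency_s = 0.100              # camera + display + OS, rough guess

    print(speed_mps * frame_latency_s)   # ~1.1 m before the frame even exists
    print(speed_mps * total_latency_s)   # ~3.3 m at 100 ms end-to-end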
There are plenty of racing games that you can play just fine at 30 FPS. Obviously more FPS is a better experience, but it's not like it becomes impossible to drive.
Also, if you truly are only a few meters behind a motorcyclist when driving at highway speeds, by definition you are being unsafe. The rule I learned in driving school was roughly 1 car length per 10 mph of space, so you should be ~90 feet (~27 meters) away.
Finally, the average reaction time for people driving in real life is something like 3/4 of a second. 750ms to transition from accelerating to braking. A self-driving car being able to make decisions in the 100ms time frame is FAR superior.
I agree this is preposterous but one nit to pick: event loops on self driving cars are really that slow, and they must use very good behavior prediction + speculative reasoning to deal with scenarios like the one you described.
Have you tried doing this in the dark? Have you tried spotting the little arrow in the green traffic light that says you can turn left, consistently, in your video feed even facing a low sun?
Only if that monitor was hooked up to a camera that could dynamically adjust its gain to achieve best possible image contrast in everything from bright sunlight to moonlit night.
You’d also lose depth perception entirely, which can’t be good for your driving.
You can test this pretty easily, it's not like that model doesn't exist. Play your average driving videogame at 30fps in first-person mode. Crank up the brightness until you can barely see if you like. We do it just fine because the model exists in our head, not because there's some inherent perfection in our immediate sensing abilities.
Yeah. I mean you're right and wrong at the same time imo. I won't hypothesize about how humans drive. I think for the most part it's a futile exercise and I'll leave that to the people who have better understanding of neuroscience. (I hate when ML/CS people pretend to be experts at everything).
That being said, this idea of a latent space representation of the world is the right tree to be barking up (imo). The problem with "scale it like an LLM" right now is that 3D scene understanding (currently) requires labels. And LLMs scale the way they do because they don't require labels. They structure the problem as next-token prediction and can scale up unsupervised (their state space/vocabulary is also much smaller). And without going into too much detail, others in this field and I are actively doing research to resolve these issues, so perhaps we really will get there someday.
Until then however. Sensors are king, and anyone selling you "self-driving" without them is lying to you :)
I think you may be over-indexing on the word "selling". I didn't mean it literally as in for sale to you (the customer) directly. That is what Tesla FSD is claiming and I agree with you that we're some indeterminate amount of time away from it.
However Waymo, Cruise and others do exist. If you haven't already, check out JJRicks videos on YouTube. I think you might be changing the number of years in your estimation ;)
Each time I see functional FSD it is in a very specific and limited scope. Simple things like ultra-precise maps, low speeds, good roads, a suitable climate, and a system that can just bail and stop the car are common themes. I would also be interested to hear whether places with Waymo have traffic rules where pedestrians/cyclists have priority without relying on traffic signs.
> if you had a driving sim wheel and 4 monitors for each direction + 3 smaller ones for rear view mirror, connected to a real world car with sufficiently high definition cameras, you could probably drive the car remotely as well as you could in real life, all because the images would map to the same latent space.
I disagree. When in a car, we are using more than our eyes. We have sound as well, of course, something that provides feedback even in the quietest cars. We also have the ability to feel vibration, gravity and acceleration. Sitting in a sim without at least some of these additional forms of feedback would be a different skill.
There was an event where they took the top iRacing sim driver and put him in a real F1 car, and he was able to do VERY well in terms of lap times.
There was another event where they took another sim driver and put him in a real drift car, and he was able to drift very well.
Both vids are on YouTube. Yes, real-world driving has more variables, and yes, the racing drivers had force-feedback wheels, but in general, if a person is able to control a car so well as to put the virtual wheel in the right square foot of the virtual track to take a corner optimally, it's likely that most people could drive very well solely from visual feedback. Sound and IMUs can provide additional corrective information, but the key point remains: whatever software runs has to deduce physics from visual images.
I recommend watching this NOVA video on human perception. When doing any number of tasks, especially ones we do commonly, we're using a ton of unconscious perception and prediction based upon our internal representation of physics and human modeling.
For example, when I was younger I noticed that I was commonly aware that a car was going to get over before it did so. I kept an eye out trying to determine why this was the case and I noticed two things. One is that people commonly turn their heads and check the mirrors before they even signal to get over. The other is that they'll make a slight jerk of the wheel in that direction before making the lane change.
This assertion, "Self driving isn't a sensor problem, it's a software problem," is hard to support today. Your human vision analogy leaves out a lot of both sensor and processing differences between what we call machine vision and human vision.
Even if parity with human vision can be attained, humans kill 42,000 other American humans each year on the roads. If human driven cars were invented today, and pitched as killing only 42,000 people per year, the inventor would get thrown into a special prison for supervillains.
Not much would change. The idiotic idea of removing traffic lights in favor of self driving cars zipping past each other forgets about those pesky pedestrians we should be designing cities for.
When I wrote the comment, I was envisioning the current world, but with some bluetooth type protocol that cars could use to send beacons to help other cars near it.
The most basic example of how this could be helpful is if the car ahead of you turns a sharp corner and crashes into a truck stopped in the road. Without car-to-car networking, you won't brake until the crash is in your line of sight.
Have you ever seen those youtube videos of massive car pile ups on highways caused by a crash, and then a cascade of additional crashes afterwards? E.g. icy conditions or dense fog. What if the original crash could communicate to cars behind it, wouldn't that be helpful if the crash isn't yet in the driver's (or car's) line of sight?
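Purely as a hypothetical illustration of the idea (no real V2V standard is implied; actual standards such as DSRC/C-V2X define their own message sets), a hazard beacon might carry something like:

    # Hypothetical hazard-beacon payload a crashed or stopped car could broadcast.
    from dataclasses import dataclass
    import time

    @dataclass
    class HazardBeacon:
        sender_id: str       # anonymized vehicle id
        lat: float           # hazard location
        lon: float
        hazard_type: str     # e.g. "crash", "ice", "fog"
        speed_mps: float     # sender's current speed (0 if stopped)
        timestamp_s: float

    beacon = HazardBeacon("veh-1234", 47.61, -122.33, "crash", 0.0, time.time())
    print(beacon)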
I agree "not much would change" overnight. It's just another input for the car's software to have at its disposal.
With the current hardware on the roads, I don't think it's technically possible for autos to achieve legitimate self-driving (if that's even the goal anymore?) - there are way too many edge cases that are way too difficult to solve for with just software.
And what happens if there is a child on the road? Or are we going to need implanted transmitter chips in the future, so we can safely go outside and not be run over by „smart“ cars?
Even if every car is required to be part of the network, there may be badly maintained cars that don’t work properly, or even malicious cars, that send wrong data on purpose.
Something more is necessary if "self-driving" is going to actually live up to its name at some point in the future, and I don't think the answer is 100% software.
At this point it's all about edge cases. Certain edge cases are impossible to overcome with just software + cameras alone.
Most humans can drive fairly well in a heavy downpour, solely from the brake lights of the car ahead and occasional glimpses of road markings. That's almost equivalent to a very poor sensor suite.
For this to work, either (1) the network has to be reliable, and all cars have to be trustworthy (both from a security and fault tolerance perspective), or (2) the cars have to be safe even when disconnected from the network, such as during an evacuation.
We already know for sure that we can’t solve (1), which means we have to solve (2). Therefore, car-to-car communication is, at best, a value add, not the enabling technology.
> Imagine if Car A could improve its own understanding of the environment using inputs/sensor data from nearby Car B.
You can't rely on this in real time because urban canyons make it hard to get consistent cell signal (for one thing), but you can definitely improve your models on this data once the data's been uploaded to your offline systems, and some SDC companies do this.
A system of this sort could use some local-area networking (think infrared, RF, or even lasers) to create an ad hoc mesh network. It's how I imagine cars in the future to be networked, at least.
Monocular cameras are a strange strawman. Is anyone seriously considering them?
Binocular cameras provide absolute depth information, and are an order of magnitude cheaper sensors than the other options.
Since this technology is clearly computationally limited, you should subtract the budget for the sensors from the budget for the computation.
According to the article, the non-camera sensors are in the $1000’s per car range, so the question becomes whether a camera system with an extra $2000 of custom asic / gpu / tpu compute is safer than a computationally-lighter system with a higher bandwidth sensor feed.
I’m guessing camera systems will be the safest economically-viable option, at least until the compute price drops to under a few hundred dollars.
So, assuming multi-camera setups really are first to market, the question then is whether the exotic sensors will ever be able to justify their cost (vs the safety win from adding more cameras and making the computer smarter).
It is not a strawman. Tesla FSD, in all forms, exclusively uses monocular cameras.
As seen on their website [1], and confirmed numerous times, they have monocular vision around the car and, though having three front-facing cameras, those cameras each have different focal lengths and are located next to each other, and thus cannot operate as binocular vision.
At the risk of stating the obvious, stereovision in practice has a few interesting challenges. Yes, the main formula is deceptively simple: d = b*f / D (d - depth, D - disparity, b - baseline, f - focal length), but in practice, all 3 terms on the right require some thinking. The most difficult is D - disparity; it usually comes from some sort of feature matching algorithm, whether traditional or ML-based. Such algorithms usually require some surface texture to work properly, so if the surface does not have "enough" texture (an example would be a gray truck in front of the cameras), then the feature matching will work poorly. In CV research there are other simplifying assumptions being made so that epipolar constraints make the task simpler. Examples of these assumptions are coplanar image planes, epipolar lines being parallel to the line connecting the focal points, and so on. In practice, these assumptions are usually wrong, so you need, for example, to rectify the images, which is an interesting task by itself. Additionally, the baseline b can drift due to changes in temperature and mechanical vibrations. The same goes for the focal length f, so automatic camera calibration is required (not trivial).
Don't forget some interesting scenarios like dust particles or mud on one of the cameras (or windshield if cameras are located behind the windshield) or rain beading and distorting the image thus breaking the feature matcher and resulting disparity estimates.
Next, to "see" further, a stereo rig needs to have a decent baseline. For example, in a classic KITTI dataset, the baseline is approximately 0.54m which is much larger than, for example, human eyes (0.065m). Such baseline, 54cm, together with focal length, which, if I remember correctly, is about 720px in case of KITTI vehicle cameras, would give about 388m in the ideal case of being able to detect 1 pixel disparity. But detecting 1px of D is very difficult in practice - don't forget you will be running your algo on a car with limited compute resources. Say, you can have around 5px of D, that means max depth of around 77m - comparable to older Velodyne LiDARs.
Some of the issues I mentioned are not specific to stereovision (e.g. you need to calibrate monocular cameras as well and so on), just wanted to point out that stereovision does not magically enable depth perception. The solution would likely be a combination of monocular and stereo cameras, combined with SfM (Structure from Motion) and depth-from-stereo algorithms.
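For concreteness, a minimal sketch of the ideal-case relation above, using the KITTI-like numbers (real rigs also need rectification, calibration-drift handling, and a robust disparity estimator, none of which is shown):

    # Ideal-case pinhole stereo: depth d = b*f / D.
    def depth_from_disparity(baseline_m: float, focal_px: float, disparity_px: float) -> float:
        if disparity_px <= 0:
            raise ValueError("disparity must be positive")
        return baseline_m * focal_px / disparity_px

    print(depth_from_disparity(0.54, 720, 1))  # ~388.8 m at 1 px disparity (ideal)
    print(depth_from_disparity(0.54, 720, 5))  # ~77.8 m at 5 px disparity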
Isn't binocular information only useful for objects 10m ahead or closer? At least according to Hacker News, the most reliable source of information on the internet: https://news.ycombinator.com/item?id=36182151
This paper suggests that human vision maintains stereopsis much further out than many researchers have thought: “Binocular depth discrimination and estimation beyond interaction space” https://jov.arvojournals.org/article.aspx?articleid=2122030
They measured out to 18m & point out that the typical measured limits of angular resolution of the human eye mean that we could extract stereo image information out to 200m or more.
This paper claims to demonstrate stereopsis out to 250m, which is roughly the limit you’d expect from typical human visual acuity: “Stereoscopic perception of real depths at large distances” https://jov.arvojournals.org/article.aspx?articleid=2191614
It seems that the claim that stereo vision only occurs in the near field is probably wrong? Human stereo vision is much more capable than that, and if it reaches out to significantly more than 20m it is surely being used when driving?
I think binocular depth resolution is roughly proportional to the space between the cameras. A car hood is much wider than a human head. I’m not sure how far you can push that without hitting issues with close up stuff.
> According to the article, the non-camera sensors are in the $1000’s per car range.... I’m guessing camera systems will be the safest economically-viable option, at least until the compute price drops to under a few hundred dollars.
A human life is valued at about $10 million (in the US); that's a bit more than the sensor costs. If one in 10,000 camera-only cars causes a death, then it's not economically viable.
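Sketching the expected-value arithmetic behind this (the $10M value of a statistical life and the sensor cost are the figures from the comment, used purely for illustration):

    # Break-even: a $1,000 sensor "pays for itself" if it prevents roughly a
    # 1-in-10,000 chance of a fatality over the vehicle's life.
    value_of_statistical_life = 10_000_000
    sensor_cost = 1_000

    break_even_risk_reduction = sensor_cost / value_of_statistical_life
    print(break_even_risk_reduction)   # 0.0001, i.e. 1 in 10,000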
A London bus costs about $300,000 and is economically viable, so why is a $1,000 sensor a problem? It is definitely viable to install on buses and trucks. Maybe you need to get out of the mindset of personal cars: they are not a viable business model, and not a viable model for dealing with congestion either.
If the car fleet kills half as many people as a human with a $1000/car system, and zero people with a $100,000 system, then we should immediately put the $1000 system on 100 times as many cars as we could put the $100,000 system on. (I picked those numbers because that is the order of magnitude range I have heard self driving car companies quote.)
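To make that tradeoff concrete, a toy comparison under hypothetical numbers (the per-car fatality rate below is made up purely to illustrate the ratio):

    # Same total budget spent on a cheap half-as-deadly system vs. an expensive
    # zero-fatality system. All numbers are hypothetical.
    baseline_fatalities_per_car_year = 1e-4   # assumed human-driven rate

    budget = 100_000 * 1_000                  # enough for 1,000 expensive systems
    cheap_cars = budget // 1_000              # 100,000 cars at $1k each
    expensive_cars = budget // 100_000        # 1,000 cars at $100k each

    lives_saved_cheap = cheap_cars * baseline_fatalities_per_car_year * 0.5          # halves deaths
    lives_saved_expensive = expensive_cars * baseline_fatalities_per_car_year * 1.0  # eliminates deaths

    print(lives_saved_cheap, lives_saved_expensive)   # 5.0 vs 0.1 lives/year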
The $100,000 system would only ever make sense if the fleet was already entirely self driving, and money for other life saving stuff like the environment and health care also hit suitable diminishing returns. Of course, by then, the cheap systems will have improved.
This argument holds for any non-negative dollar value you place on human life.
It is also independent of who owns the vehicles. Money the bus fleet spends in expensive self driving pulls money away from bus stop upgrades, pollution controls, etc, etc.
As far as I know, humans can also safely drive with only one eye. It's perfectly legal in most countries.
But I agree that current software (Tesla?) is not able to do that in the same way. So it may need more sensors until the software gets better.
In theory cameras should also be able to see more than humans. They can have a wider angle, higher contrast, higher resolution and better low-light vision than the human eye.
A human with one eye can use slight head and eye movements to gain a sense of depth. Perhaps the mono cameras need some kind of mount that allows them to not only look around but also move in 3 dimensions. That seems more complex than just having binocular cameras, though.
Yup! This kind of reconstruction is known as multi-view reconstruction. Though the cameras don't need to have a movable mount, they're already on a car which moves! The car moves and gives them a new "perspective" at every frame. That's how some monocular systems already work. Here's an example of one such system: https://github.com/nianticlabs/manydepth
That said, I think what you're referring to is more extreme perspectives that shift in ways the car cannot drive and you are correct that this would aid in reconstruction. This is how NERF models do their 3D reconstruction (https://nerfies.github.io/).
> monocular cameras can only provide relative depth
While the environment awareness is nowhere near as good as two or more cameras would be, if you consider the output over time, you get valuable information about the change rate of the environment, i.e. how fast that big thing is getting bigger, which may indicate one should actuate the brakes.
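One simple instance of "how fast that big thing is getting bigger" is time-to-contact from looming. A rough sketch (purely illustrative; real systems track features robustly and fuse this with other cues):

    # Time-to-contact (tau) estimated from the apparent growth of an object
    # across two frames, assuming a constant closing speed.
    def time_to_contact(size_px_prev: float, size_px_now: float, dt_s: float) -> float:
        growth_rate = (size_px_now - size_px_prev) / (size_px_prev * dt_s)
        if growth_rate <= 0:
            return float("inf")   # not expanding, no predicted contact
        return 1.0 / growth_rate

    # An object growing from 100 px to 104 px over one 33 ms frame -> ~0.8 s to contact.
    print(time_to_contact(100.0, 104.0, 0.033))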
Of course, I'm with the crowd that answers the question with a "how many can we have?" question. The more, the merrier. And the more types, the better - give me polarized light and lidar, sonar, radar, thermal, and whatever else that can be plugged in the car's brain to make it better aware of what happens (and correctly guess what's going to happen) outside it.
I am just a hobbyist but I can answer some of these.
> Whats the difference between a “binocular” camera and two “monocular” cameras?
For the camera itself, nothing. They are probably referring to the implementation. You can have two cameras side by side, but unless you are using the geometry between the two images (stereo matching and triangulation) to estimate depth, your setup is effectively monocular.
> How does a “binocular” camera get better depth information?
A pixel in two images (with known separation) will have a geometric relationship that can be used to extract depth information. This is a lot faster than alternative methods with a single camera and multiple images.
> Is using multiple cameras to drive sensor fusion?
This is really just a question of semantics.
> Why is absolute depth a strict safety win?
Why is it better to have two eyes than one? You can be more certain about what you are seeing.
> If this is just a handwavey upper bound on safety, how do you know that such a system can’t be safe enough for its design goals?
If you had a system with infinite compute you could probably do enough math to calculate absolute depth with 100% certainty. I believe you can already extract absolute depth with something called bundle adjustment-- but it requires multiple images since you are relying on parallax effects. It is also computationally expensive.
> If humans with only one eye are able to drive, why wouldn’t mono surround vision be at least as good as that?
> Why is absolute depth a strict safety win? How do you know how the sensor details translate to the final safety of the full system?
If you can get reliable depth information, the algorithm needed to avoid hitting stationary and slow-moving objects is extremely simple.
Is the stationary object in our path, of nontrivial size, and about to enter our minimum stopping distance? If yes, do we have a swerve planned that will let us safely avoid it? If no, emergency stop.
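A minimal sketch of that logic, assuming reliable range and size inputs (all thresholds and parameter names here are hypothetical; a real system would add reaction latency, road-condition factors, and hysteresis):

    # Toy emergency-stop decision based on range to a stationary object in our path.
    def stopping_distance_m(speed_mps: float, max_decel_mps2: float = 6.0,
                            reaction_time_s: float = 0.1) -> float:
        return speed_mps * reaction_time_s + speed_mps ** 2 / (2 * max_decel_mps2)

    def decide(obstacle_range_m: float, obstacle_in_path: bool, obstacle_size_m: float,
               speed_mps: float, safe_swerve_available: bool) -> str:
        if not obstacle_in_path or obstacle_size_m < 0.2:               # ignore trivial debris
            return "continue"
        if obstacle_range_m > stopping_distance_m(speed_mps) * 1.5:     # comfortable margin
            return "continue"
        return "swerve" if safe_swerve_available else "emergency_stop"

    print(decide(obstacle_range_m=40.0, obstacle_in_path=True, obstacle_size_m=1.0,
                 speed_mps=30.0, safe_swerve_available=False))          # emergency_stop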
Because this logic is simple and well defined you can audit the implementation to the high standards applied to things like aircraft autopilot systems.
And it'll work even if the stationary object is something that didn't appear in your training data - you know the algorithm will work the same even if that concrete barrier is painted with some cheery flowers, or if that fire truck is airport yellow instead of the normal red.
Of course, this relies on the assumption you can get reliable depth information. If your depth sensor gets confused by a cloud of dust while driving in the desert, or gets blinded by the light of the setting sun, or is unable to detect a barbed wire fence, things are no longer quite so simple....
> If this is just a handwavey upper bound on safety, how do you know that such a system can’t be safe enough for its design goals?
Personally I would say that in freeway driving, a self-driving car should be able to avoid 100% of collisions with clearly visible stationary objects in dry, well lit conditions when all system components are in normal working condition.
... and I want to grab the guy who says that by the collar and scream in their face "The whole point is to build something that can do better than a human."
I think I said this already in one of my comments but I'm not a neuroscientist and I don't claim to be. That's why I think it's kind of pointless and silly for me (or any other engineer) to sit here and make arguments about what humans do and don't do in their brains.
IMO it's better for us to focus on what the robots can and cannot do right now, and focus on solving those problems :)
Thanks for the sources though, those papers are definitely neat and I'll be taking a look when I get a chance.
I am implementing Monocular vSLAM as a side project right now. I am working with some optimization libraries like GTSAM but having some issues. Do you know any good resources for troubleshooting this kind of stuff?
It's pretty easy to see, even as someone with very little experience, the benefits of stereo vision over monocular. In addition to the depth stuff it's a lot easier/faster to create your point clouds from disparity maps.
> You can guess the absolute depth with your neural net but the estimates are pretty garbage.
I'm not sure what kind of systems you're referring to with "monocular cameras", but if you look at the visualization in a Tesla with FSD Beta, it's actually really good at detecting the position of everything. And that's with pretty bad cameras and not a lot of compute.
Only rarely will you see Tesla's FSD mess up because of perception; the vast majority of the time it messes up, it's just the software being dumb with planning.
Let’s say you are driving down the street in a suburban neighborhood. You see a kid throw a ball into the street. You see from how his body moved that it is a lightweight ball and that it doesn’t require drastic (or any) measures to avoid. Or you see that it is a very heavy object and requires evasive maneuvers.
How exactly does a certain type of sensor help with this? Isn’t the problem entirely based on a software model of the world?
> Setting aside comparisons to humans for a second (will get back to this), monocular cameras can only provide relative depth. You can guess the absolute depth with your neural net but the estimates are pretty garbage.
Stereoscopic vision in humans only works for nearby objects. The divergence for far away objects is not sufficient for this. You may think you can tell something is 50 or 55 meters away through stereoscopic vision, but you can't. That's your brain estimating based on effectively a single image.
That said, reality is not a single image; it's a moving image, a video. Monocular video can still be used to estimate object distance in motion.
Eventually AI will be good enough to work better than humans with just a camera. The problem is we're not there yet, and what Tesla is doing is irresponsible. They should've added LIDAR and used that to train their camera-only models, until they're ready to take over.
The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla... not sure what to take from that. Makes me think what an Apple Car would be like.
My argument on autonomous driving is that it can't just match the safety of the average human driver. It needs to be 100-1000x better, unquestionably better for everyone, always. Until that happens it's a dead end. I think that's probably only achievable with LIDAR, which will come down in price as volumes ramp up.
The thing that surprises me is that Tesla has invested so much into various hardware technology (even their own silicon), but completely ignored LIDAR. With their resources, volume and investment, they could have reduced the cost to a tenth of what it is now. They go for big challenges, but decided not to try on that one.
> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla...
IIRC a number of cars, even Subaru, have better cameras than a Tesla. Really, the cameras Tesla included on the Model 3/Y are fairly mediocre off-the-shelf hardware from years ago. But they do have more of them pointed in different directions, so there's that.
> IIRC a number of cars, even Subaru, have better cameras than a Tesla
Are you saying this from datasheets, or looking at the camera images?
Tesla cameras aren't optimized for humans, their Bayer matrix is likely designed for day and night machine vision including infrared, so the colors might look washed out to you. I heard at one point they were using RCCC and later moved to RCCB [1]. If I'm not mistaken C allows all light, including infrared. Cameras designed for human viewing are RGGB.
Dynamic range isn't a simple problem to solve, depending on how it's mapped to the screen. If the mapping isn't done well, then it could look much worse than what the camera can 'actually see.' Glare and resolution are concerns, though, certainly.
Yeah I had heard they had something in the pipeline, particularly on the Y, wasn't sure if it had actually rolled into production yet, and on what models.
> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla
I'm sure it does, but that's also because you need much greater degrees of precision and accuracy to anchor virtual content in the real world. Inches and centimeters matter here, whereas an autonomous vehicle can get away with coarser measurement.
> The new Apple Vision Pro arguably has a better set of cameras, sensors and signal processing than a Tesla
And it costs about 9% of what a new Model 3 would run you. It's also coming to market almost 5 years after the Tesla HW3 stack that runs almost all cars on the roads to which you're making the comparison.
> My argument on autonomous driving is that it can't just match the safety of the average human driver. It needs to be 100-1000x better, unquestionably better for everyone, always.
That sounds like maybe a marketing argument? You're saying that if it was only 2x better at some visceral metric (total traffic deaths, say) that you think people wouldn't accept it and it would be regulated out of existence, either by liability concerns or actual lawmaking?
The counter argument is that those metrics are, after all, visceral. The US has 40k traffic fatalities per year. You really think the government is (or courts are) going to kill a technology that could save 20k people every year?
No. "Merely better" is the standard that will stick.
> I think that's probably only achievable with LIDAR
Everyone on your side of the argument does, but frankly this has become an argument of faith and not substance. LIDAR capability has become a proxy for "not Tesla", so people line up along their brand loyalties. For myself, LIDAR is actually the technology looking like it isn't panning out. It's not getting cheaper or more reliable. LIDAR autonomy isn't getting better nearly as fast as vision solutions are iterating.
Most importantly, LIDAR at its best still only gives you the 3D shape of the world, and that's not enough. LIDAR doesn't help with lane selection, it doesn't help with traffic behavior modelling, it doesn't help with sign reading. It doesn't help with merge strategies. All that stuff has to be done with vision anyway.
The "don't hit stuff" bar has, quite frankly, already been crossed. Tesla's don't hit stuff either. If you give them LIDAR, they.... still won't hit stuff. But all the work, even for LIDAR vehicles, is on the camera side.
I think the truth is somewhere between the 100x and 2x numbers. Here's why:
Let's say self-driving does reduce the number of deaths by half. Then there are different classes of people who ride in cars:
1. People who would have died without self-driving, and did in fact die with self-driving.
2. People who would have died without self-driving, and didn't because of self-driving.
If you leave it at that, you might think that of the 40K number you mentioned, with self-driving being 2x as safe we'd have 20K in class 1, and 20K in class 2. But that's not accurate. There is an additional class of people:
3. People who wouldn't have died without self-driving, and who died because of some failure of self-driving.
Since drivers are supposed to be ready to take control at any time, car manufacturers want public perception to be that there is no category 3 -- and with good reason, because:
The problem self-driving car manufacturers have is that people in category 2 are anonymous -- almost invisible. At the end of the year we can look at the totals and say, "yup, there are 20,000 people out there alive because of self-driving." But if the numbers are actually:
1. 19,000
2. 20,000
3. 1,000
Then people aren't going to be thinking about the 20,000 anonymous lives saved by self driving. They're going to be thinking of the 1,000 very obvious and named people in category 3, who would still be alive today were it not for the fact that their car drove them into a semi.
I don't know what ratio of category 2 to 3 is necessary for people to accept self-driving as a benefit, but I'm betting that 2x as safe isn't going to get category 3 low enough to make the case.
Yep -- I think that if self-driving cars prevented all accidents, but each year one self-driving car randomly drove at max speed into the side of a Walmart, people would reject that trade. I think of myself as fairly logical, and I'd have a hard time accepting that. The Trolley Problem is hard.
> For myself, LIDAR is actually the technology looking like it isn't panning out. It's not getting cheaper or more reliable.
This is just plain wrong. Lidar is getting cheaper by the year. They are now so cheap you’re starting see them in consumer vehicles from Volvo, Mercedes, Cadillac, Nio, XPeng, BYD, etc. Self driving companies have also slashed lidar costs drastically. Waymo, for example, had a 90% cost reduction for their 5th gen Lidar.
This is the opposite of a technology “not panning out”. It’s looking like Tesla is the outlier here because they made a bad bet.
> LIDAR autonomy isn't getting better nearly as fast as vision solutions are iterating.
Lidar autonomy is the only one proven to work without a driver, so this is a very strange take. We’re seeing robotaxi companies starting to expand driverless operations to big cities. Vision solutions are the only ones that still do not have a single driverless mile.
It seems like all your arguments fit into what you’re accusing others of — that you’re the one using lidar as a proxy for “not Tesla” and getting basic facts wrong.
> They are now so cheap you’re starting see them in consumer vehicles from Volvo, Mercedes, Cadillac, Nio, XPeng, BYD, etc.
I'm googling, and not finding a reference here. As far as I can tell not one of these manufacturers (or anyone else) is shipping a consumer vehicle with a LIDAR sensor. You're just citing press releases about plans, I guess?
2. Depth info from camera devices is proven sufficient. Again, Teslas don't hit stuff due to sensor failures, period (lots of nitpicking in this thread over that framing, here I'm being a bit more precise).
3. All the remaining hard problems are about recognition and decisionmaking, all of which gets sourced from vision data, not a point cloud.
The conclusion being that LIDAR isn't worth it. It's not giving significant advantages. If you were designing an autonomy system from scratch today, you wouldn't bet it on LIDAR (especially as there's no proven off the shelf device!).
> I'm googling, and not finding a reference here. As far as I can tell not one of these manufacturers (or anyone else) is shipping a consumer vehicle with a LIDAR sensor. You're just citing press releases about plans, I guess?
XPeng P5, XPeng G9, Nio ET7 are all shipping with lidar. You'll also see them in many other models starting this year: Mercedes S-class/EQS w/ Drive Pilot, Volvo EX90, Volvo XC90, Audi A6L/A7L/A8L/Q8, Polestar 3, etc. The list is growing [1].
> 2. Depth info from camera devices is proven sufficient. Again, Teslas don't hit stuff due to sensor failures, period (lots of nitpicking in this thread over that framing, here I'm being a bit more precise).
Teslas absolutely hit stuff. Plenty of examples in Reddit and YouTube about FSD hitting curbs, bollards and other things. We also don't know any accident numbers about Tesla because they actively hide it. They skirt CA DMV rules by not reporting it and have a very questionable methodology for what they consider an accident (they don't count it if airbags don't deploy).
> 3. All the remaining hard problems are about recognition and decisionmaking, all of which gets sourced from vision data, not a point cloud.
This is wrong, too. Object detection and decision making uses fused input from cameras, lidar and radar. They don't just use cameras. There's plenty of literature to look up on sensor fusion and how they're being used throughout the perception/behavior prediction stack.
> The conclusion being that LIDAR isn't worth it. It's not giving significant advantages.
Only if you lack the ability to do sensor fusion or if you have already falsely promised customers their cars are capable of fully autonomous driving and painted yourself into a corner. Only one company stands out here.
You didn't look very hard. The Mercedes 2022+ EQS, S class, Volvo XC90...
1. It's not really a contest. Lidar is measurably better.
2. Error bars in perception tend to accumulate throughout the stack and manifest as serious errors like bad velocity estimates for stationary or slow moving vehicles.
3. LIDAR helps recognition too. Planning and pathing are both helped by LIDAR.
Honestly, there aren't many other sensors on an AV with the same bang-for-the-buck as LIDAR. It's easily up there with cameras for "why would you even consider not having it?" for a modern driverless vehicle, which is why everyone operating driverless vehicles uses them (Waymo, Cruise, Mobileye, Baidu, etc).
> The "don't hit stuff" bar has, quite frankly, already been crossed. Tesla's don't hit stuff either. If you give them LIDAR, they.... still won't hit stuff. But all the work, even for LIDAR vehicles, is on the camera side.
You don't spend much time on the Tesla subreddits. The only reason there aren't more crashes is because most people seem to be reasonably good at keeping their hands on the wheel to take over. Every day people post videos of the wacky choices FSD makes. The most interesting part is that frequently the graphics on the screen suggest it can actually detect the correct lane markings and such, and still chooses to cut the corners and cross into other lanes.
Anybody who's being honest and has watched FSD from the beginning is under no illusions that real self-driving Teslas are happening anytime soon, or with the current hardware. It's not even close.
> You don't spend much time on the Tesla subreddits.
No, but I watch my car drive me around every day. This always devolves into an argument like this. I wanted to talk about LIDAR and limitations, not flame about collisions that woulda/maybe/coulda happened in funny internet videos.
Believe what you want. But if they aren't crashing they aren't crashing. And you won't get anyone to ban cars that don't crash.
It does always devolve into the same discussion, I agree. All of the evidence says it doesn't work well. One guy says "but it does for me!" And the discussion ends.
In some situations with well marked, simple road design, it works great. Give it a multiple-lane turn intersection to navigate and things get interesting. Give it a three lane turn where the outermost lane is fairly sharp, and it will just cut across the middle lane because it seems to have a built-in limit on how hard it can turn the wheel.
What's noteworthy is that it doesn't significantly improve with each revision. It'll get a little better, then regress, over and over, but still the fundamentals are unreliable. At some point maybe it becomes clear that the current strategy is topped out.
I do think it tends to be better behaved than AP on the highway, though, so I look forward to a stack merge there.
> Give it a multiple-lane turn intersection to navigate and things get interesting. Give it a three lane turn where the outermost lane is fairly sharp, and it will just cut across the middle lane because it seems to have a built-in limit on how hard it can turn the wheel.
Now you've digressed into navigation choices and not safety. And... I agree! This is exactly what I was talking about upthread. It's an issue for real improvement, and notably not something that can be helped by LIDAR at all. Tesla is actually way ahead here, as I see it. Waymo and Cruise operate in heavily constrained environments that disallow challenging maneuvers entirely.
(FWIW: that three-lane turn behavior you're talking about doesn't seem familiar. I don't see that with my car, and we have a few intersections like that around here. In general multi-lane turns are marked, and the car is quite good about figuring it out.)
I thought we had gotten past that statistic? AP, in particular, gets used in what is already the safest driving situation there is. Those statistics can only be compared to human drivers in exactly that same narrow circumstance. If we were talking about FSD, then we'd need to see statistics where the human never touches the wheel; saving the car from its own mistakes is cheating ;).
> Now you've digressed into navigation choices and not safety.
I think I understand what you are saying but I'm not sure I agree -- incorrectly navigating across the lane lines, especially when the car indicates it sees them, is definitely a safety problem in my view.
I expect that it's kinda like the silly wipers. If you live somewhere with the right kind of rain, you think "what is everyone complaining about?" but if you live in the PNW where we get a lot of misty rain, you end up cursing at the car regularly during the winter months. Some places have just the right roads for FSD, and those people think it's pretty great. But then other people get constant failures and wonder why the heck they spent 6, 12, or 15 grand on something that doesn't deliver.
I'm glad someone has a good experience with it, even when it doesn't match my own. But I also didn't buy it outright, so I just cancel the subscription after trying it again to see how things stand.
FWIW: the numbers on that site show a vehicle with an accident rate that is vastly safer than average. The site is sort of a joke. They launched it with the intent of proving a prior, still lead with that hypothesis as an assumption in their text, and... they couldn't find the goods!
In fact it's so bad that you just have to assume their data is wildly incomplete because the not at fault fatality rate for a car just being in traffic (i.e. Tesla drivers killed because someone else hits them) isn't correctly represented.
I don't doubt that someone behind "tesladeaths.com" has an agenda, but whatever their reasons for collecting this data, there are a ton of accidents involving Teslas which makes "they don't crash" entirely false.
Yes, some of those links involve Teslas being hit by others, and many likely involve cases where the driver is totally at fault. Most of the links are just reports of crashes still under investigation (this one, for example, can't even provide the time when the accident took place: https://web.archive.org/web/20230304195927/https://www.bigco...)
Still, that's okay, because this isn't a scientific paper, it's just a website collecting data on accidents involving teslas. Each one would still have to be investigated to find out what the cause of the crash was before you could draw much from the raw numbers, but what we can tell unambiguously is that these vehicles do in fact crash. Here's a case where autopilot killed someone just a few months ago (https://abc7news.com/tesla-autopilot-crash-driver-assist-cra...)
> You really think the government is (or courts are) going to kill a technology that could save 20k people every year?
Have you heard of this miracle technology that is like 100x safer than cars, kills zero people a year in most countries, and has existed for 50 years? It's called high-speed rail.
Covid is still killing more than 20K people a year, and no one gives a damn. We have this miracle technology called ventilation; it reduces your chance of infection by something like 4x. You can buy air purifiers for $200 at Ikea and they work. You can install upper-room ultraviolet germicidal irradiation; the CDC has a whole page about it.
If saving 20K people were the top priority, nuclear power would have replaced coal in the '70s, air purifiers would be legally required in every office, and coming into work with Covid would be a criminal offense.
> LIDAR autonomy isn't getting better nearly as fast as vision solutions are iterating.
Why do you believe this to be the case - is there a measurement or estimate I can see of how fast visual solutions are iterating?
> The counter argument is that those metrics are, after all, visceral. The US has 40k traffic fatalities per year. You really think the government is (or courts are) going to kill a technology that could save 20k people every year?
It'll be framed as a technology that will kill 20k people per year.
This isn’t true; LiDAR points provide an intensity value that provides information about surface material and ambient environment. Further, this information is influenced by angle and distance.
Lane lines and road signs in particular typically use reflective paints that are very easy to detect in LiDAR, but beyond that, you can approximate material composition of a LiDAR scene pretty easily.
Another big point you’re missing is that LiDAR can provide control points to photometric sensor fusion systems. While there are also purely photo based control point matching systems, they’re much more complex and require nontrivial offline preprocessing.
I don't disagree with the ultimate point (that an Apple Car might be expensive) but it wouldn't be for this reason. (Almost?) All cars come with some sensor/display package at this point. Let's say that costs, on average, $500. The Vision Pro costs $3500. So the MSRP of a Hyundai Ioniq 5 might go from $41,000 to $45,000. If that gets you a car with a ridiculously better set of sensors and displays, that hardly seems "very expensive" by comparison.
Again, I'm not arguing that Apple might not make a number of other design decisions that could lead to the car costing much more than that. I'm just saying that the cost of the Vision Pro sensors won't be the reason.
And to be clear: I don't think the Vision Pro implies that Apple would needlessly increase the price of a hypothetical car. The Vision Pro is as expensive as it is not because of some perceived "Apple Tax" but because Apple had a vision (ahem) of what they wanted to build, and it was at the very limits of technical possibility, and they chose to release it now (2024) instead of five years from now. For comparison, the original iPad, which was anticipated to cost a thousand dollars, shipped at half that price.
> Let's say that costs, on average, $500. The Vision Pro costs $3500. So the MSRP of a Hyundai Ioniq 5 might go from $41,000 to $45,000.
Ah. And then there is Tesla ditching an external temperature sensor and relying on weather info for the GPS location to save a few bucks. I think the automotive industry works on a completely different standard when talking about parts and margins.
Apple squeezes profits from their suppliers very aggressively. They operate very similar to the automotive industry there. Even more aggressive probably because they have a larger market share than any car OEM.
With all due differences, Tesla's is the most Apple-esque approach to cars to date. But yeah, Apple might beat that (although I think the car project was scrapped a few years ago now?)
I agree with you, but a product != infrastructure. I don't want to overpay my taxes, drive one type of car that refuels only at a particular fuel station, and have it be the judge of when and where my car can drive, restricting me from visiting other states.
When I saw the R1 chip in there and the functions it has I immediately thought: 'This is a chip they are using for sensor fusion in their self-driving moonshot, it just happens to also work for the AR headset'.
Or perhaps eventually the other way around: the primary reason they had to build the R1 was the extreme power efficiency required in a headset. Using 25-50 watts for this is okay in an EV.
From my DARPA Grand Challenge days, a few comments:
You want a high-resolution LIDAR. The trouble is, they cost too much. That's due to the low volume, not the technology. A few years back, there were several LIDAR startups using various approaches. There were the rotating machinery guys, the flash LIDAR guys, the MEMS mirror guys, and some exotics. I was expecting the flash LIDAR guys to win out, because that approach has no moving parts. Continental, the big auto parts company, acquired Advanced Scientific Concepts, a LIDAR startup, and demoed a nice little unit. But nobody wanted enough of them to justify setting up full scale manufacturing. ASC sells some space-qualified units, and the Dragon spacecraft uses them for docking. Works fine, costs too much.
Another of the startups, Luminar, made its founder a billionaire without the company having shipped much product. "Luminar has not generated positive cash flows from operating activities and has an accumulated deficit of $1.3 billion as of December 31, 2022." - Wikipedia. They've announced many partnerships and have been at this since 2012, but you still can't order a LIDAR from their web site.
Velodyne makes those spinning things Waymo uses. They had the first automotive LIDAR, a big car-top spinning wheel, at the DARPA Grand Challenge. It fell off the vehicle. The technology has improved since, but it's still expensive. Velodyne just merged with Ouster, which sold similar spinners.
Ouster's web site has a "Order Now" button, but it leads to an email onboarding form, not a price list.
Many of these things need custom indium gallium arsenide detector chips, which would be far cheaper if the market was 100,000 a month instead of 100 a month. This technology hasn't scaled up in volume yet. That's one thing holding this back.
If you want coverage over a full circle, you either need one of those silly looking domes on top of the car roof, or multiple units, each with a narrower field of vision. Waymo has both. Works fine, costs too much.
On the radar side, there's been improvement in both processing and resolution. Not as much as expected over 20 years. Automotive radar has moved up from 24 GHz to 77 GHz, so resolution can now potentially be sub-centimeter. True millimeter radar (300 GHz and up) has been demoed [1] but not deployed beyond lab systems. Once you get up there, it's almost as good as LIDAR. Humans show up in detail. That's going to be a useful technology once it gets into commercial products.
So the sensor situation is pretty good, but expensive because it can't mooch off some other high volume technology such as cell phones.
You'd think that governments would have some stake in providing infrastructure specifically for this. Underground magnets and radios along important roadways...that kind of thing.
What I'm getting at is, there's no reason to assume that the current amount of information presented on a roadway (stripes, lights, stopsigns, etc.) is sufficient for autonomous driving. Or maybe it is, but adding a few more inexpensive things would make it that much easier to achieve.
When the road quality, paint, and signage is decent, modern self driving stacks really don't have any problem seeing it and using it. Effort should be focused on improving those so that everyone benefits.
If Tesla (or any self driving player, but the Tesla FSD stack drives a million miles/day across the US right now) published a road quality score, home buyers would quickly start paying attention and then home owners would quickly start paying attention. This would push forward the political will to fix the roads, at least in places where low road quality is driven by budgeting problems rather than outright poverty.
> When the road quality, paint, and signage is decent
True, but the problem is that they're often not. That said, I'd rather see infrastructure spending on maintaining the roads, paint, and signage, than adding features specifically for autonomous driving (that will also likely be unmaintained and become unreliable over time).
Exactly. We should only maintain the things that the people rely on, and build the cars to rely on those same things... at least for now. Once the cars are good enough that people are no longer in control of cars, then we can look at changing that standard to reduce the cost of infrastructure maintenance.
I think most people would argue that the taxes they're paying now should cover it and if that's not enough we could try things like diverting tax money from waste, eliminating earmarks, or maybe even start forcing corporations and billionaires to pay their fair share. People want nice roads and are willing to pay taxes for it, but the burden shouldn't fall entirely on the shrinking middle class.
Even "inexpensive" things spread over millions of miles of roadway gets pretty expensive. Let's start with the basics of reliable painting and safe road surface.
One reason to assume that the current amount of information presented on a roadway is sufficient is that humans can learn to drive in a matter of hours with only two vision sensors mounted on their heads.
> humans can learn to drive in a matter of hours with only two vision sensors
You use way more than two vision sensors in your head. But first, let's talk about those vision sensors. They are mounted on an articulated scanning platform with numerous degrees of freedom. The dynamic range is astounding, and that dynamic range can be easily augmented with shaded lenses that can be mounted and removed as needed via an extremely reliable pick-and-place mechanism. The vision sensors are really equipped with dual-sensing technology for day- and night-mode.
But, you also have extremely sensitive accelerometers in your ears. And force-sensing in your feet to measure applied acceleration/braking controls. And force-sensing in your hands to measure applied steering force, and measure road roughness. You also are sitting on a force sensor that serves as a back-up accelerometer. You have audio sensors that provide feedback about road surface, vehicle performance, and other agents sharing the road.
Plus relevant short- and long-term memory of every 3D object you've ever seen along a roadside and what the expected behaviour of those objects should be. Does a road-sign usually move by itself? No, maybe someone is carrying it. Is a ball coming from behind a car in a school-zone likely to be chased by a child...
> Is a ball coming from behind a car in a school-zone likely to be chased by a child.
Yeah, that kind of agent prediction is a biggie. Agent prediction is something journalists don't talk about much, but is a huge research topic for AV practitioners.
And since you make note of expected behavior of objects... my all-time favorite bug report out of Waymo (one that made the public news) is a classic classifier/predictor/planner corner case leading to deadlock. The classifier identified a cyclist and labeled them "stopped" if the cyclist had a foot on the ground, "moving" if both feet were on the pedals. The predictor would plot a trajectory for a moving cyclist, and the planner would decide its action based on the predicted agent behavior. So... a Waymo vehicle and a cyclist arrive at two corners of a 4-way stop. The cyclist is stopped, doing a perfectly balanced track stand with both feet clipped in. Feet are on the pedals, so the cyclist is labeled a moving agent... and the planner yields right-of-way to the cyclist according to the rules of the road. Deadlock -- nobody moves.
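To make the corner case concrete, here's a toy sketch (entirely made-up function names, obviously nothing like Waymo's actual code) of how a "feet on pedals means moving" label plus a "yield to moving agents" rule produces exactly that deadlock:

    # Toy illustration only -- hypothetical names, not Waymo's stack.
    def classify_cyclist(feet_on_pedals: bool) -> str:
        # The reported heuristic: foot down => "stopped", feet on pedals => "moving"
        return "moving" if feet_on_pedals else "stopped"

    def plan_at_four_way_stop(cyclist_state: str) -> str:
        # Simplified rules-of-the-road: yield to an agent predicted to keep moving
        return "yield" if cyclist_state == "moving" else "proceed"

    # Track stand: cyclist is stationary but clipped in with both feet on the pedals.
    state = classify_cyclist(feet_on_pedals=True)      # -> "moving"
    print(plan_at_four_way_stop(state))                # -> "yield", forever: deadlock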
Isn't that how Waymo killed a pedestrian/cyclist? It couldn't decide if she was a cyclist or pedestrian, as she was walking and pushing her bicycle, so it never braked.
She was homeless, and her loaded-up bike didn't have the shape of a proper bike, so the classifier got confused.
Yes, but that was Uber, operating at Level 3, and the safety driver was watching a movie on her phone at the time.
Very different circumstances. I don’t think this was a classifier defect. Not that I was ever a fan of Uber’s development program, but in this case I can’t fault the platform.
Except we don't, not really. We're still leveraging 15 or so years of experience living in the real world and walking on sidewalks and crossing streets and observing how other humans drive cars and being passengers in cars before we ever try driving ourselves. Not to mention, some humans definitely don't learn to drive in a matter of hours; they need weeks if not months.
The US DoT estimated $340 billion lost in traffic accident damage and just over 42,000 people’s lives lost in 2021. I think we accept that largely because it is diffused across lots of people. Self driving cars would centralize some of this liability and so they have to be an order of magnitude safer before they are accepted to replace our current system. And the existing “system” of human brains has millions of years of R&D; I think anyone in the industry should be accepting any unfair advantages we can get in the name of safety.
After all does anyone care if the plane they are in is flapping its wings?
Plus the knowledge and intuition that, for example
Firetruck >_____________< Fire Hydrant
Something that looks flat but is actually a fire hose means you should stop instead of driving over it - this was actually an issue in Phoenix(?)
A homeless lady was killed by a self-driving car. She was walking a loaded bicycle, at 2am, on a road that doesn't usually have pedestrians. The self-driving car couldn't decide if she was a bike or a human, so it didn't brake, and the human backup driver had zoned out because the car had made the correct decision 99.99% of the time.
A self-driving car is like a jigsaw puzzle - there's one way to do it right, and infinite ways for it to be completely or subtly wrong.
"What do you see" and "What does it mean"
Humans and cameras have the same "What do you see" input, but the "What does it mean" category hasn't been conquered by cars.
If a human screws up while driving and they end up in a wheelchair, they have no one but themselves to blame, but if software puts someone in a wheelchair who otherwise wouldn't have been there, that person isn't going to care about "statistically better". They shouldn't care either. If carmakers want to take control away from a person, they had better do at least as well as the person would 100% of the time.
It's not as if pedestrians are any safer. A self-driving car that hits a person because it gets confused where a human wouldn't have is just as bad. When lives are on the line, taking control from the human requires that the machine must be at least as good as the human would be in every situation, or we're better off leaving humans in control with machines augmenting their abilities.
It's great that we have cars that can warn us when there's something in our blind spot, something behind us while we back up, or when we drift out of our lane. Until the software can perform as well as we can, any more than that is a liability.
> machine must be at least as good as the human would be in every situation or we're better off leaving humans in control with machines augmenting their abilities.
That isn't quite right. There can be situations where the human is better, so long as in the majority of situations the machine is better. This is why I said it is a statistical question. I'm not demanding perfection, I'm asking for better than humans overall. Of course if the machine knows it is in a situation where it isn't as good it is fine to make the human drive.
We use a lot more than our eyes to drive, and people aren't trusted to start driving until they're in their teens - so closer to 15 years of training. And we're still not all that good at it.
This would never be feasible. To start, forget underground magnets and radios. If you just started with properly painted lines, always-functioning traffic signals, and correct street signs, autonomous driving would be much, much easier than it is now.
But the vast majority of the difficulty in autonomous driving is simply dealing with the messiness of the real world. For example, many of the most serious and notable autonomous driving crashes have been the result of faulty road conditions (e.g. https://www.theverge.com/2020/2/25/21153320/tesla-autopilot-... - in that case, the driver wasn't paying attention, but the fact that the lane markers were worn down is what caused Autopilot to lose lane tracking, and the fact that the crash attenuator in front of the barrier was damaged and not replaced contributed to the driver's death).
Government can't even keep decent lane lines on the roads, and thus I think an autonomous driving system that had to depend on magnets or radios (that failed if those were out) would be even more of a non-starter.
> If you just started with properly painted lines, always-functioning traffic signals, and correct street signs, autonomous driving would be much, much easier than it is now.
If you had this, driving period would be safer, automated or not. And that's before getting into inclement weather conditions where, say, lines or signage may not be visible.
I'd be happy to have autonomous driving on just a few roads, namely highways. I really do not understand why there is practically zero interest in getting me away from the wheel where it would be both the easiest and the most useful. I'm more than happy to drive the bits within the city if I could just rest in between. Either embedding some kind of track in the highway that my car can follow, or a wireless peloton technology behind trucks, should be an orders-of-magnitude easier problem to solve than full self-driving.
Not necessarily; I find it doubtful a person can stay alert and ready to take the wheel at a moment's notice after the car has minded its own business for however long.
Hence, either /you/ drive, or /the car/ drives. If the car doesn't understand what is going on, it should stop. (Humans really suck at monitoring boring, monotonous processes. We cannot expect people to do so in a self-driving car for hour after hour on the off chance something happens.)
> I find it doubtful a person can stay alert and ready to take the wheel at a moment's notice after the car has minded its own business for however long
I don't mean so people can navigate; I mean so people can complain to a city or district when a road is improperly marked.
Your microwave focuses the energy into a small space. Traditional radar is broadcast at broad angles. It may be capable of burning things in the close vicinity of the transmitter, but you can burn yourself on an ICE in close enough vicinity as well.
Can you explain what we are looking at here? Why measure energy usage in kilowatt hours? Wouldn’t watts be a more useful way of measuring consumption? For it to make sense to me for something to use X kilowatt hours, you would have to tell me how long it took to consume that much electricity.
Another comparison could be, this is more power than some WW2 era air-to-air radar systems, which could see multiple miles (and signal processing has only gotten better since then).
Not to put too blunt a point on this metaphor, but a human’s eyes and visual cortex are not stereo cameras and the human brain is not a GPU/CPU despite some apparent similarities.
Nor is “professional commercial driver” a job that could be performed by a Neanderthal or a 6 year old student driver of today despite nearly identical hardware and software. I would never ride across Manhattan every day in a car driven by the latter two humanoids and I doubt most anyone would.
Ideally these systems should only be certified after performing demonstrably better than a teenage student driver given a series of the most complex, adversarial driving workloads in deteriorating conditions with disabled and malfunctioning sensors.
On the other hand, today’s billionaires may actually be close to achieving fully autonomous and acceptably failsafe non-human driving on today’s roads, and they might also be exploring Mars within my lifetime. Personally I wouldn’t bet my life on it.
These types of questions seem to show a misunderstanding of the nature of the problem being solved. Would anybody ask "How many eyes do you need to drive a car?".
Self driving is a difficult problem because you need to:
- Decipher the nature of the objects in the 3D space you are moving through.
- Predict how their physical relationship to your current position will change according to the current trajectories and intentions of those objects.
- Explore the space of possible deviations for each of these objects and be able to react accordingly.
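To make that decomposition concrete, here's a minimal, hypothetical sketch (toy numbers, constant-velocity prediction standing in for anything learned; nobody's actual stack looks like this):

    from dataclasses import dataclass

    @dataclass
    class TrackedObject:
        kind: str                        # "car", "cyclist", "pedestrian", ...
        position: tuple[float, float]    # metres, ego-relative
        velocity: tuple[float, float]    # m/s

    def perceive(sensor_frame) -> list[TrackedObject]:
        # 1) Decipher the objects in the 3D space around you (detector + tracker).
        # Hardcoded stand-in for a real perception stack:
        return [TrackedObject("cyclist", (20.0, 2.0), (-4.0, -0.5))]

    def predict(obj: TrackedObject, horizon=3.0, dt=0.5) -> list[tuple[float, float]]:
        # 2) Predict how its position relative to us will change (constant velocity here).
        steps = int(horizon / dt)
        return [(obj.position[0] + obj.velocity[0] * k * dt,
                 obj.position[1] + obj.velocity[1] * k * dt) for k in range(1, steps + 1)]

    def plan(objects: list[TrackedObject]) -> str:
        # 3) Explore the possible deviations and react: brake if any rollout enters our lane.
        for obj in objects:
            if any(abs(y) < 1.5 and 0.0 < x < 15.0 for x, y in predict(obj)):
                return "brake"
        return "keep_lane"

    print(plan(perceive(sensor_frame=None)))   # -> "brake": the cyclist cuts toward our lane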
> Would anybody ask "How many eyes do you need to drive a car?".
I think this is an oversimplification. People don't ask that question because humans have many many more senses they drive with than just our eyes. Humans even have accelerometers. If someone said "I'm deaf, blind in one eye, and can't sense acceleration" people would probably ask if they are fit to drive. Just being deaf is enough to start that conversation. The AARP even mentions that if someone is deaf, they should seriously consider not driving. The conversation of sensor fusion is important because humans require sensor fusion to be effective drivers too.
Anyone over 65 should seriously consider not driving, at least at night. Reaction times slow, night vision is poor/glare from oncoming headlights is worse. I'm not even 60 yet and I don't like driving at night anymore. The insanely bright, white, pinpoint headlights on modern cars make it that much worse.
I'm 50 and I can see just as well at night as I can in the daytime. Presbyopia means I cannot clearly read the instrument cluster without my glasses on, but I can read the text message the guy three cars in front is sending. Not that you should really need to read much on the gauges anyway - all the things that are going to get you into an accident are outside the car, and that's where your attention needs to be.
> Not that you should really need to read much on the gauges anyway - all the things that are going to get you into an accident are outside the car
Not always. If you don't spot that your engine temperature is rising, or your fuel level is low, you might suddenly find yourself sitting in a non-running car on a busy expressway, as opposed to having a least a few minutes to find a safe place to pull over.
Best to develop a habit of continually scanning the gauges and mirrors while your primary focus is on the road ahead. That way you are aware of the operating condition of your car and the traffic all around you, not just in front of you. You only need quick glances at the dashboard and mirrors to stay on top of things.
Exactly, we obviously drive with more than our eyes. Any undergrad psychology class should have given you a hint about that.
This reminds me of all the conversations about AI and LLMs now - people are making false comparisons because they don't have a grasp of basic cognitive science, of things we've known for 50 or more years.
From stereo hearing on a head that can be moved at will, to feeling the car "under your butt", and an innate understanding of other humans' movements both as pedestrians and drivers. Just to name a few.
I drive (raced in the past) and fly. I also occasionally play flight sims and racing games with some appropriate accessories.
I am terrible at the sims. I am literally seconds faster in a real car (on a 62-second circuit) than in the sim, due to all the additional sensory input in reality. I can barely land a simulated plane but have not messed up a landing since training over three decades ago.
So, yes, it is astonishing how much all our senses go into driving, beyond sight.
That is not the core reason why I provided that example.
What I meant to say is that having 2 eyes or 20 eyes will not help if you are not able to decipher scenarios that require a semantic understanding of the real-life movie you are currently in.
A cyclist making a hand signal too brief to catch, a soccer ball rolling into the street (so maybe a child is running after it), or a temporary detour where the police expect you to drive off the road for a few seconds and are directing you with hand signals.
None of the current self-driving solutions is able to handle these. One camera or 20 cameras, 2 lidars or 50 lidars, will do nothing to address the problem.
Autonomous flying seems like a much easier and more cost effective problem to solve, given there are orders of magnitude fewer contexts and obstacles.
The smarter problem would be changing the controls/UX for cars to be more like a hoverboard (balance-, attention-, and gesture-based rather than twisting a wheel around), abstracting that to remote telemetry, and then working on using ML to replace that remote driver.
Autonomous vehicle companies have picked the unsolvable moonshot problem instead of improving the controls UX of driving to where it provides greater leverage to human ability. Make the driving experience more like a motorcycle, which is an extension of your body, instead of a mediated service where driving is reduced to a transaction. Self-driving cars are the perfect example of sprawling corporate groupthink, where someone has a poorly formed idea and everyone aligns around managing a response to it instead of asking whether it's the right problem to solve. For example, I think the Apple version of a car wouldn't be self-driving, it would be one that feels like it's controlled by thought.
Maybe that's Musk's Neuralink play, and self-driving was a distraction to set all the auto industry competitors on a wild-goose chase. If he's not that strategic, it could at least provide an out, where a gesture-controlled Tesla would outperform.
> The smarter problem would be changing the controls/UX for cars to be more like a hoverboard (balance-, attention-, and gesture-based rather than twisting a wheel around), abstracting that to remote telemetry, and then working on using ML to replace that remote driver.
I'm sorry, what? The difficulty of self-driving cars is not about controlling the actual vehicle in space, it's about the other dynamic agents involved and getting an accurate view of the physical world.
The last line of your comment is just pure nonsense. Tesla's failed self-driving was a 5d chess move by Musk? Come on.
"Autonomous flying seems like a much easier and more cost effective problem to solve, given there are orders of magnitude fewer contexts and obstacles."
That's not true: what there is, is 3.2 trillion vehicle-miles driven annually (in the US) vs 12.5 billion aircraft-miles flown -- a factor of roughly 250.
Imagine the reduction of accident rates if, at any moment, there were 1/250 of the vehicles on the road, and every vehicle trip started with a checklist inspection by a pilot who had passed a physical exam in the last year?
> and every vehicle trip started with a checklist inspection by a pilot who had passed a physical exam in the last year?
Not to mention a slew of other requirements like 'Don't operate the vehicle with insufficient sleep', 'Don't operate the vehicle if you had even a drop of alcohol in the last 8/12 hours', etc..
> Autonomous vehicle companies have picked the unsolvable moonshot problem instead of improving the controls UX of driving to where it provides greater leverage to human ability. Make the driving experience more like a motorcycle, which is an extension of your body, instead of a mediated service where driving is reduced to a transaction.
Who the hell would want this? My guess is that 99% of the people that would want this already do drive motorcycles, exactly for that reason.
But for the vast majority of the rest of us, I don't want driving to "provide greater leverage to human ability". I just want it to get me from point A to point B, ideally without me thinking about it so I can do other things.
> Autonomous flying seems like a much easier and more cost effective problem to solve, given there are orders of magnitude fewer contexts and obstacles.
Not just flying, but (emergency) autolanding is now available:
It's almost the old Boeing vs. Airbus differences in design philosophy. With a steering wheel, worst case if the power assist fails there is still a mechanical link that will allow you to control the car. If you changed that to an Airbus-like sidestick or some other kind of fully "by wire" control, then if the power assist fails you are at the mercy of physics. That would probably demand redundant systems or big increases in reliability, with attendant costs.
> The smarter problem would be changing the controls/UX for cars to be more like a hoverboard (balance-, attention-, and gesture-based rather than twisting a wheel around), abstracting that to remote telemetry, and then working on using ML to replace that remote driver.
Go ahead and do it then. Show them how it is done. :)
> Self-driving cars are the perfect example of sprawling corporate groupthink, where someone has a poorly formed idea and everyone aligns around managing a response to it instead of asking whether it's the right problem to solve.
Or maybe people thought about the problem for more than a hot second and came to a different conclusion than you did.
Your comment makes it clear that you are thinking about the joy of driving. You are talking about motorbikes, and UX, and being one with the car. I'm happy for you, and I hope you find a lot of joy doing that. That, on the other hand, is not the problem self-driving cars are out to solve.
Every single thing in the modern world has traveled a ton. Transportation of raw materials, transportation of parts, transportation of the finished product. Then on top of that, people can't make up their minds about whether they want to be in A or in B and seem to change location all the time. This is economically significant. Driving for joy is a cool hobby, but it is a tiny sliver of the whole economy.
People and companies working on self driving cars believe that the time has come when we have enough compute that we can automate the act of driving and be at least as safe as a vehicle driven by a human. If they are right, and these autonomous systems can be cheaper than human drivers, then those who can make such machines can capture a portion of all the wages of every activity served by a commercial driver today.
> mediated service where driving is reduced to a transaction.
Do you think the CEO of Walmart cares about the user experience of driving when they hire a small army of drivers to supply their stores? It is already reduced to a transaction. Uber is extremely successful. Do you think people care about a car becoming an extension of their body when they hail one?
Currently, people who drive because they enjoy it and people who drive because they have to are all jumbled into one mass. If we have commercially viable self-driving systems, you will see who is driving for the joy of it and who is driving to get stuff or themselves from A to B. I suspect we will see that it becomes just another hobby, like going skiing or horse riding. Many will enjoy it, and there will probably be a whole cottage industry catering to those who would like to "become one with the machine", and so on.
> Go ahead and do it then. Show them how it is done. :)
One of my weekend projects is a POC demo for how something like this works. Maybe I'll launch it here, or maybe I'll use it to establish a new autonomous region with an air defense perimeter, tbh, I haven't quite decided, but your insight is valuable.
I'm saying what people think is a "self-driving car" is effectively a fully autonomous robot that has to navigate among humans, and presenting it as just solving a car operation problem is misleading.
The problem they are solving is creating a virtual android to replace taxi, bus and truck drivers, which is surely a great and noble pursuit, but as a consumer product, it's the wrong problem.
Some of the smartest people I have ever met have given it a lot of thought and disagree with me, but when you look at what every single other generalized robot product has succeeded at, it's in scaling human ability and in very limited tasks. Self-driving cars do neither of these things. It's not a car problem and using cars is a misleading constraint.
Your maximum upside is some nuanced variation of a train or a streetcar on dedicated roads or paths/rails that is totally dependent on new public infrastructure to support it. Maybe you get self-driving golf carts and ATVs for limited contexts, just like other robots, but a general-purpose one to replace drivers without dedicated lanes, air routes, or infrastructure is implausible.
I'd focus on solving autonomous navigation in more narrow contexts like dedicated lanes and rails, flying, trails, and anything that doesn't have to deal with human obstacles. Otherwise, the failure mode of self driving cars is just a hunter/killer robot with limited commercial applications.
I generally agree with your first paragraph, though flying can be very complex with things like falling tree limbs, gusty wind, undocumented aerial lines, bird strikes and hail - but not the second. Remote driver is a fundamentally broken paradigm owing to cost, latency and issues of network reliability.
This is a super interesting comment but I want to make sure I really understand -- can you expand on how to enhance the driving experience (especially in "self-driving" vehicles)? Or is your point that self-driving vehicles are the wrong solution to the problem?
I imagine 99% of the collected data is proprietary and hidden from public view, so it becomes a question of taking corporate claims at face value, which has never been that great of an idea.
Case in point: it's hard to imagine that electric semis hauling goods hundreds of miles can do so safely without human operators involved at some stages of the route. Hence, even with decent autonomous systems, having a (paid!) driver in the truck at all times is the rational thing to do - and yes, that means labor costs in that sector won't be changed much by self-driving AI technology. It will likely increase safety, though, as the trucks can drive themselves on the open freeway more safely than a (tired) human driver can.
Waymo remote operators don't drive the cars remotely, they plot manual courses and the cars follow them. Trucks have worse failure modes and are less likely to be able to recover even with that much prompting.
I'm not sure it can ever be real because of lightspeed delays, and because the trucks might get stuck in ways their sensors can't see. (Since if they could see it, they could fix it.)
The images at the top are labeled, poorly. It took me some zooming in to be able to read them. But agreed the lack of table headers was just plain stupid.
Sure, LIDAR etc. is expensive, so maybe not that one, but in general I would have thought that many cheap and cheerful sensors is the way to go, especially if you're going to be feeding it all into a neural net anyway, which is well suited to unscrambling that signal spaghetti.
I'm almost certain that nearly everyone uses accelerometers, gyroscopes and wheel odometry data from the base vehicle. These sensors are dirt cheap and kinda hidden inside the vehicles so people don't write articles about them. That doesn't mean that engineers forgot about them, they are just not what makes a journalist's heart pump faster.
The article doesn't use a consistent definition of sensors. Notice how Mercedes' number is reported to include their emergency microphone as a sensor, but Tesla's number only includes cameras + ultrasonics. Ultrasonics are counted, but not the IMUs that may also be involved in detecting collisions.
If you were to truly count every sensor connected to every chip, there'd be hundreds or thousands of sensors. Only some of them are used. Different vehicles in a testing fleet will have slightly different numbers as well, due to the various component changes that are in-flight at any given time.
What kind of useful data would an accelerometer provide? 99.999% of the time the wheel rotation rate exactly matches the car's movement speed and direction, so as long as you have an odometer (resolver/encoder), you have accelerometer data for free. For the cases when the wheels are spinning freely, you have a serious situation which needs to be dealt with using specialized safety equipment.
The accelerometer exactly matches wheel rotation rate when you are driving in a straight line on a planar surface[1] in a vehicle with no toe. This is rarely the case: roads aren't perfectly flat, and outside of North America, aren't even remotely straight in most circumstances, and the handling characteristics of 0 toe are terrible. Granted, the inaccuracy from non-planarity of the road is minimal, but building an accurate enough model of tire slip from steering geometry is hard, and changes based on the level of wear on the tires, their temperature, and the road conditions, and the deviation between acceleration and wheel telemetry adds up very quickly.
So yes, you absolutely do need an IMU for self-driving.
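As a toy example of the kind of cross-check this buys you (my own sketch, made-up thresholds, not anyone's production code): dead-reckon speed from the IMU's longitudinal acceleration and compare it against the wheel-encoder speed; when the two diverge, the wheels are no longer telling you the truth about the body's motion.

    def wheels_disagree_with_imu(wheel_speeds_mps, imu_accel_mps2, dt=0.01, tol=0.5):
        # Seed the integrator from odometry, then integrate measured acceleration.
        v_imu = wheel_speeds_mps[0]
        for v_wheel, a in zip(wheel_speeds_mps, imu_accel_mps2):
            v_imu += a * dt
            if abs(v_wheel - v_imu) > tol:          # wheels and body disagree
                return True
            v_imu = 0.98 * v_imu + 0.02 * v_wheel   # slowly re-anchor to limit IMU drift
        return False

    # Wheels report a constant 20 m/s while the IMU says we're braking hard:
    print(wheels_disagree_with_imu([20.0] * 200, [-8.0] * 200))   # -> True (likely slipping/locked)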
Fun thing though, VW implemented tyre pressure monitoring in their vehicles with just a firmware update by simply comparing wheel rotation speeds in the ABS ECU. If one wheel is consistently faster than the rest, it's got a smaller radius, most likely because the tyre is flat.
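A rough sketch of how I understand that trick (indirect TPMS; the 2% threshold and rev/s units are my guesses, not VW's actual calibration):

    def low_pressure_candidates(avg_wheel_speeds, threshold=0.02):
        """avg_wheel_speeds: long-term average rotation rate per wheel (e.g. rev/s) at steady cruise."""
        flagged = []
        for i, w in enumerate(avg_wheel_speeds):
            others = [x for j, x in enumerate(avg_wheel_speeds) if j != i]
            baseline = sum(others) / len(others)
            # A consistently faster wheel has a smaller rolling radius -> likely under-inflated.
            if w > baseline * (1 + threshold):
                flagged.append(i)
        return flagged

    print(low_pressure_candidates([10.0, 10.0, 10.35, 10.0]))   # -> [2]: third wheel looks soft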
First of all, pose. Secondly, different transfer functions. A wheel is a heavy physical thing that takes time to change. An accelerometer isn't, by design. The latter also redundantly measures acceleration direction rather than requiring it as an input. Thirdly, encoders don't always match accelerometers even in normal driving. Turns at intersections with tram tracks commonly cause these sorts of errors.
Haha. Have you ever calculated movement from wheel speed sensors? On top of that, the other sensors are mounted on the body, and there is a whole world of mass-spring and tire equations in between... And then there is parking, with all the nitty-gritty details of tire models, etc.
The nice thing about sensor fusion is that throwing more information into it generally makes things better (so long as you have a correct idea of its reliability). IMUs are absolutely going to be used to feed data into the localisation of basically every system, but they're so basic it doesn't really bear mentioning them in the context of the other sensors you're using.
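The "correct idea of the reliability" part is the whole trick. A one-function toy version, inverse-variance weighting of two estimates of the same quantity (invented numbers; a real localiser would run a Kalman or particle filter over time):

    def fuse(est_a, var_a, est_b, var_b):
        # Inverse-variance weighting: the noisier source gets the smaller weight.
        w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
        fused = w_a * est_a + (1.0 - w_a) * est_b
        fused_var = 1.0 / (1.0 / var_a + 1.0 / var_b)   # always <= the better input's variance
        return fused, fused_var

    # Wheel-odometry speed (sigma ~0.5 m/s) fused with an IMU-derived speed (sigma ~0.2 m/s):
    print(fuse(20.4, 0.5**2, 19.9, 0.2**2))   # fused estimate lands close to the better sensor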
Accelerometers would tell you if the wheel rotation and steering angle is still matching the movement of the vehicle.
This is useful for detecting stability problems like over- or understeer.
If the steering is pointing slightly left, the car is moving rapidly right, and both rear wheels are turning at roughly the same speed and way faster than the fronts, you've lost traction and are oversteering like crazy.
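That's roughly the check an ESC-style system does. A sketch using a kinematic bicycle model (the wheelbase, threshold, and numbers are placeholders I picked, not any vendor's calibration):

    import math

    WHEELBASE_M = 2.7   # assumed wheelbase for the example

    def expected_yaw_rate(speed_mps, steering_angle_rad):
        # Kinematic bicycle model: yaw_rate = v * tan(delta) / L
        return speed_mps * math.tan(steering_angle_rad) / WHEELBASE_M

    def oversteering(speed_mps, steering_angle_rad, measured_yaw_rate, tol=0.15):
        # Yawing much faster than the steering input explains -> the rear has let go.
        return abs(measured_yaw_rate) > abs(expected_yaw_rate(speed_mps, steering_angle_rad)) + tol

    # Steering only ~3 degrees at 25 m/s, but the gyro sees the car rotating hard:
    print(oversteering(25.0, math.radians(3), measured_yaw_rate=0.9))   # -> True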