Unknown Soldier
Member
I'm not trying to split hairs here, but Waymos are monitored remotely; it's not as if Waymos are driving without anyone watching. Putting someone inside the Robotaxi probably results in better real-time reaction when unexpected events occur, and they do occur to both Waymo and the (limited release) Robotaxis currently available in Austin. When Waymo riders encounter problems, they press a button and contact a remote operator who can move the car manually. There are many videos on YouTube showing this, in addition to Waymo employees arriving at the site of a stuck Waymo, getting inside, and driving the car away when necessary.

Waymos are currently logging millions of miles without anyone in the driver's seat. FSD 13 still needs supervision. Yes, FSD can be used in a wider variety of circumstances, but in the scenarios where Waymos are used (city driving), FSD still can't be trusted to operate fully autonomously. Tesla is aiming to solve that, but has been aiming for a very long time. A skeptical posture is reasonable until the results are live. Clearly Tesla's confidence is growing, as we see with the Cybercab approaching completion and Robotaxi trials expanding rapidly.
Camera sensors are not 1:1 equivalent to human eyes. Tesla FSD camera systems need to infer depth using a camera array and ML. No, humans do not have LiDAR, but can perceive depth effectively. Also, humans are not great drivers, statistically. Autonomous driving needs to far exceed the average human driver in order to achieve widespread acceptance.
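To make the depth-inference point concrete, here is a minimal sketch of the classic pinhole stereo relation a camera array can exploit (real systems also use learned monocular depth and motion cues). The focal length, baseline, and disparity values are made up for illustration; they are not Tesla's actual camera parameters.

```python
# Toy stereo-depth sketch: two offset cameras can infer depth without LiDAR.
# Numbers below are invented for illustration, not any real camera rig.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo relation: depth = focal_length * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("zero disparity: object at infinity or matching failed")
    return focal_px * baseline_m / disparity_px

# A feature seen 40 px apart by two cameras 0.3 m apart, focal length 1000 px:
print(stereo_depth(1000.0, 0.3, 40.0))  # 7.5 (metres)
```

The same relation also shows the known weakness: as disparity shrinks (distant objects), small pixel errors produce large depth errors, which is why ML-based refinement is layered on top.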
I want to speak a bit more generally about autonomous driving, because the argument about LIDAR vs. visible spectrum only is many years old, and it touches a lot of different topics in developing AI for specific purposes.
Driving is a pretty deterministic task. The goal is to move the car from point A to point B without colliding with another object. So to approach the task from first principles, you only need what is necessary to ensure the car travels to the destination without crashing into something. We know humans aren't the best at driving, and yet humans have only a single pair of front-facing cameras that use biological inference to create depth perception. Human reaction and processing time is anywhere from 100 to 500 milliseconds depending on how tired the human is. Humans are prone to errors, make poor judgments, and are easily impaired by simply ingesting ethanol.
So a machine being better than a human at driving is more or less a fait accompli, as long as the machine has the bare minimum needed to perceive better than a human does. A machine can have reactions in the nanoseconds range, can always make the same judgment when faced with the same situation, is never tired, never needs to sleep, and will never stop until you are safely delivered to your destination or it is destroyed by the human resistance after Skynet nukes most of humanity.
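A quick worked example of what reaction time costs in distance, using round numbers from the ranges quoted above (these speeds and times are illustrative, not measurements):

```python
# How far a vehicle travels "blind" during the driver's reaction time.
# Speeds and reaction times are illustrative round numbers.

def distance_before_reaction(speed_mps: float, reaction_s: float) -> float:
    """Distance covered at constant speed before any response begins."""
    return speed_mps * reaction_s

# Tired human at highway speed (30 m/s, about 108 km/h), 500 ms reaction:
print(distance_before_reaction(30.0, 0.5))  # 15.0 (metres)

# A machine reacting in microseconds covers effectively zero distance:
print(distance_before_reaction(30.0, 1e-6))
```

Fifteen metres is several car lengths, which is the practical gap a faster-reacting machine closes before braking or steering even starts.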
The basic design of roads today is based on humans driving on them. So if you wish to put a machine on the same road, it rationally follows that you want to roughly simulate what a human perceives on that road. This is why using cameras that detect visible-spectrum light makes sense: the machine sees what the human sees, and can be trained to react in a similar way. Humans don't have radar, or LIDAR, or lasers, and this is how roads are designed, for humans. It is most logical to give the machine the same "senses" as the human if it is expected to drive on a road designed for humans.
That I knew nothing was my advantage. - Gaston Glock
On a fundamental level, current AI/ML is based on ingesting a really huge fucking dataset and then transforming it in different ways to generate a result. This is a gross simplification of how LLMs work, but that is what they do. For this reason, current LLMs are not a true path to AGI, but that's not really what I want to talk about here.
Because driving is fundamentally a deterministic task, it adapts well to current AI/ML designs: simply ingest a massive amount of data. In Tesla's case, they are feeding the "Colossus" cluster at the Giga Texas site millions and millions of videos of humans driving. This is why it makes sense to use cameras: the AI is being trained on videos of humans driving, so you can take this existing dataset of decades of human driving and immediately train on it, applying it to an AI performing the task of driving.
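The "train on recorded human driving" idea above is, at its core, imitation learning: fit a model to (camera observation, human control) pairs. A minimal sketch follows; the frame size, the linear model, and the least-squares fit are stand-ins for illustration, not Tesla's actual training stack.

```python
import numpy as np

# Toy imitation-learning sketch: learn a steering policy from logged
# (observation, human steering) pairs. Everything here is a stand-in:
# a 64-number vector plays the role of a camera frame, and a linear
# model plays the role of a neural network.

rng = np.random.default_rng(0)
frames = rng.normal(size=(256, 64))   # 256 "frames", 64 features each
human_policy = rng.normal(size=64)    # the human driver we imitate
steering = frames @ human_policy      # logged human steering angles

# One least-squares fit stands in for gradient-descent training:
learned_policy, *_ = np.linalg.lstsq(frames, steering, rcond=None)

# The learned policy reproduces the human's steering on the logged data:
print(np.allclose(frames @ learned_policy, steering))  # True
```

The point of the sketch is the data requirement, not the model: the supervision signal comes entirely from recordings of humans driving, which is exactly the kind of data that exists in bulk for cameras and not for LIDAR.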
If you've read this far, you see where I'm going with this. There are simply not millions and millions of LIDAR records of humans driving. This data doesn't exist. More to the point, to create such data from scratch now would require decades of recording humans driving with LIDAR emitters to acquire it. This means that, on a fundamental level, we cannot easily train an AI-driven car platform using LIDAR the way we can train such a platform using just cameras.
This is why using LIDAR and thinking you now have "more data" is inherently a fallacy. You don't have more data. You have much, much less for the actual task which is desired! Training the car to drive using LIDAR requires you to first acquire or generate the data, and then you can actually train. Meanwhile, the data for training a car using cameras not only exists, it exists in vast quantities because of many decades of cameras existing and humans recording themselves driving using cameras!
But when you give to the needy, do not let your left hand know what your right hand is doing. - Matthew 6:3
So now we see that LIDAR is in fact not an "augment" to cameras for driving. It gives different, sometimes conflicting data compared with what cameras give, because of the different wavelengths. It has fundamentally less data to draw from in its training dataset. It has issues with subsurface scattering, a consequence of the fact that the vehicle uses an emitter to generate LIDAR signals, which are then reflected by the environment and received by the same vehicle, whereas with visible light all you do is receive ambient images with the cameras. (I'm simplifying a bit here; I realize that headlights on cars are a form of emitter and the camera receives the reflected visible light to drive at night.)
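For readers unfamiliar with why LIDAR is an "active" sensor: it times an emitted pulse's round trip to measure range, so any corruption of that round trip (scattering, absorption, interference) corrupts the measurement. A tiny time-of-flight sketch, with an illustrative round-trip time:

```python
# Toy time-of-flight sketch: LiDAR range is half the round-trip distance
# of an emitted light pulse. The 200 ns round trip below is illustrative.

C = 299_792_458.0  # speed of light in vacuum, m/s

def lidar_range_m(round_trip_s: float) -> float:
    """Range = (speed of light * round-trip time) / 2."""
    return C * round_trip_s / 2

# A return arriving ~200 nanoseconds after emission is roughly 30 m away:
print(lidar_range_m(200e-9))  # ~29.98 (metres)
```

A camera, by contrast, just integrates whatever ambient light arrives; there is no emitted pulse whose round trip can be disturbed.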
So now we come to the issue of who is "right" when data conflicts. This is what sensor contention is: the LIDAR says "obstacle" and the camera says "all clear," or vice versa. Sensor contention is not something humans intuitively understand. We have only our eyes and no other form of sensory perception for vision, so on a basic level humans do not easily comprehend resolving conflicts between LIDAR and cameras when they arise. Furthermore, because one sensor suite has much more training data than the other, it is even more precarious to override the camera data as incorrect using LIDAR data, since the LIDAR suite has much less data behind it!
In a situation where one sensor suite is so much weaker than the other (lacking training data, inherently less stable because it requires an emitter to generate its inputs, and operating at fundamentally different wavelengths), does it ever make sense to resolve conflicts by following the LIDAR suite instead of the camera suite? I would argue the answer is no. Never. If the cameras should always take priority over the LIDAR in the case of conflicts, then what is the LIDAR actually doing, besides costing a lot more and generating confusing inputs that are always discarded when the input data conflicts?
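The arbitration policy argued above can be stated in a few lines. This is a deliberately simplified sketch of the argument's logic (real fusion stacks weigh confidences per sensor and per situation); the function and its inputs are invented for illustration.

```python
# Sketch of the "cameras always win" arbitration policy described above.
# Real sensor fusion is confidence-weighted; this only encodes the argument.

def resolve_obstacle(camera_says_obstacle: bool, lidar_says_obstacle: bool) -> bool:
    """Return the fused obstacle decision; on contention, trust the cameras."""
    if camera_says_obstacle == lidar_says_obstacle:
        return camera_says_obstacle   # sensors agree: no contention
    return camera_says_obstacle       # contention: LiDAR input is discarded

print(resolve_obstacle(False, True))  # False: camera "all clear" overrides LiDAR
```

Written this way, the argument's conclusion is visible in the code: the `lidar_says_obstacle` input never changes the output, which is exactly the "what is the LIDAR actually doing?" question.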
We know LIDAR isn't cheap. Each Waymo vehicle costs around $200,000 to outfit with its LIDAR sensor suite. And we have now established that LIDAR doesn't actually improve sensory perception in a way that matters, because conflicting data should always be resolved by prioritizing the camera inputs over the LIDAR inputs. So what is LIDAR actually doing, besides giving Waymo a lot of unnecessary challenges in generating useful, valid data for training the AI that runs the LIDAR suite, and making it hard to decide who is "right" when the LIDAR and camera suites generate conflicting inputs?
I believe the Tesla approach of simply eliminating the LIDAR entirely and relying solely on cameras to be the correct one. It just makes sense if you simplify the problem and approach it from first principles.