Tesla Cybercab moves closer to completion; Tesla receives patent for radically faster Unboxed 2.0 assembly process

Waymos are currently logging millions of miles without anyone in the driver's seat. FSD 13 still needs supervision. Yes, FSD can be used in a wider variety of circumstances, but in the scenarios where Waymos operate (city driving), FSD still can't be trusted to drive fully autonomously. Tesla is aiming to solve that, but it has been aiming at it for a very long time. A skeptical posture is reasonable until the results are live. Clearly Tesla's confidence is growing, as we see with the Cybercab approaching completion and Robotaxi trials expanding rapidly.

Camera sensors are not 1:1 equivalent to human eyes. Tesla's FSD camera systems need to infer depth using a camera array and ML. No, humans do not have LiDAR, but they can perceive depth effectively. Also, humans are not great drivers, statistically. Autonomous driving needs to far exceed the average human driver in order to achieve widespread acceptance.
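
For the curious, the geometry behind camera-only depth perception is simple triangulation. Here is a toy sketch; the numbers are illustrative, and Tesla's production system uses learned depth estimation across its camera array rather than classical stereo:

```python
# Toy illustration of depth from two cameras via triangulation.
# Tesla's actual system infers depth with learned models across a
# multi-camera array; this is just the classical geometry showing
# that cameras alone can recover distance.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: 1000 px focal length, cameras 30 cm apart, 12 px disparity
print(stereo_depth(1000.0, 0.30, 12.0))  # -> 25.0 meters
```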
I'm not trying to split hairs here, but Waymos are monitored remotely; it's not as if they are driving without anyone watching. Putting someone inside the Robotaxi probably results in better real-time reaction when unexpected events occur, and they do occur, both to Waymos and to the (limited-release) Robotaxis currently available in Austin. When Waymo riders encounter problems, they press a button to contact a remote operator who can move the car manually. There are many videos on YouTube showing this, as well as Waymo employees arriving at the site of a stuck Waymo, getting inside, and driving it away when necessary.

I want to speak a bit more generally about autonomous driving, because the argument over LIDAR versus visible-spectrum-only sensing is many years old, and it touches a lot of different topics around developing AI for specific purposes.

When a young man, I read somewhere the following: God the Almighty said, "All that is too complex is unnecessary, and it is simple that is needed" - Mikhail Kalashnikov

Driving is a pretty deterministic task. The goal is to move the car from point A to point B without colliding with another object. So, approaching the task from first principles, you only need what is necessary to ensure the car travels to the destination without crashing into something. We know humans aren't the best at driving, and yet humans have only a single pair of front-facing cameras that use biological inference to create depth perception. Human reaction and processing time is anywhere from 100 to 500 milliseconds, depending on how tired the person is. Humans are prone to errors, make poor judgments, and are easily impaired by simply ingesting ethanol.

So a machine being better than a human at driving is more or less a fait accompli, as long as the machine has the bare minimum needed to perceive better than a human does. A machine can react in nanoseconds, can always make the same judgment when faced with the same situation, is never tired, never needs to sleep, and will never stop until you are safely delivered to your destination or it is destroyed by the human resistance after Skynet nukes most of humanity.

The basic design of today's roads is based on humans driving on them. So if you wish to put a machine on the same road, it rationally follows that you want to roughly simulate what a human perceives on that road. This is why using cameras that detect visible-spectrum light makes sense: the machine sees what the human sees and can be trained to react in a similar way. Humans don't have radar, or LIDAR, or lasers, and roads are designed for humans. It is most logical to give the machine the same "senses" as the human if it is expected to drive on a road designed for humans.

That I knew nothing was my advantage. - Gaston Glock

On a fundamental level, current AI/ML is based on ingesting a really huge fucking dataset and then transforming it in different ways to generate a result. This is a gross simplification of how LLMs work, but that is what they do. For this reason, current LLMs are not a true path to AGI, but that's not really what I want to talk about here.

Because driving is fundamentally a deterministic task, it adapts well to current AI/ML designs: simply ingest a massive amount of data. In Tesla's case, they are feeding the "Cortex" cluster at the Giga Texas site millions and millions of videos of humans driving. This is why it makes sense to use cameras: the AI is being trained on videos of humans driving, and you can take this existing dataset, built from decades of human driving, train on it immediately, and apply it to an AI performing the task of driving.
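
To make that concrete, here is a minimal behavior-cloning sketch: a network maps camera frames to the controls the human driver applied. Everything here (architecture, tensor shapes) is invented for illustration; a real stack is far more complex:

```python
# Minimal behavior-cloning sketch: predict the human driver's controls
# from camera frames. Purely illustrative; a real stack is multi-camera,
# temporal, and has planning heads, not a toy CNN like this.
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)  # [steering, throttle]

    def forward(self, frames):
        return self.head(self.encoder(frames))

policy = DrivingPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One training step on a stand-in batch: random tensors here take the
# place of the decades of real human driving video described above.
frames = torch.randn(8, 3, 96, 96)    # stand-in for camera frames
human_controls = torch.randn(8, 2)    # stand-in for logged controls
loss = loss_fn(policy(frames), human_controls)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```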

If you've read this far, you see where I'm going with this. There are simply not millions and millions of LIDAR recordings of humans driving. This data doesn't exist. More to the point, to create such data from scratch now would require decades of recording humans driving with LIDAR emitters. This means that, on a fundamental level, we cannot easily train an AI-driven car platform using LIDAR the way we can train such a platform using just cameras.

This is why using LIDAR and thinking you now have "more data" is inherently a fallacy. You don't have more data; you have much, much less for the actual task being trained. Training the car to drive using LIDAR requires you to first acquire or generate the data before you can actually train. Meanwhile, the data for training a car using cameras not only exists, it exists in vast quantities, because cameras have existed for many decades and humans have recorded themselves driving with them.

But when you give to the needy, do not let your left hand know what your right hand is doing. - Matthew 6:3

So now we see that LIDAR is in fact not an "augment" to cameras for driving. It gives different, sometimes conflicting data compared to what cameras give, because of the different wavelengths involved. It has fundamentally less data to draw from in its training dataset. And it has issues with subsurface scattering, a consequence of the fact that the vehicle uses an emitter to generate LIDAR signals, which are then reflected by the environment and received by the same vehicle, whereas with visible light all you do is receive ambient images with the cameras. (I'm simplifying a bit; I realize that headlights are a form of emitter and the camera receives the reflected visible light when driving at night.)
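
The active-versus-passive distinction is easy to see in the ranging math itself. A tiny sketch (assumed pulse timing, not any vendor's API):

```python
# LIDAR ranging is active: the sensor times its own reflected pulse.
# distance = (speed of light * round-trip time) / 2
C = 299_792_458.0  # speed of light, m/s

def lidar_range(round_trip_seconds: float) -> float:
    return C * round_trip_seconds / 2.0

# A pulse returning after ~667 nanoseconds implies a target ~100 m away
print(lidar_range(667e-9))  # ~100.0 meters
```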

So now we come to the issue of who is "right" when data conflicts. This is what sensor contention is: the LIDAR says "obstacle" and the camera says "all clear," or vice versa. Sensor contention is not something humans intuitively understand. We have only our eyes and no other form of sensory perception for vision, so on a basic level humans do not easily comprehend resolving conflicts between LIDAR and cameras when they arise. Furthermore, because one sensor suite has much more training data than the other, it's even more precarious to override the camera data as incorrect using LIDAR data, since the LIDAR suite has much less data behind it.

In a situation where one sensor suite is much weaker than the other, due to lack of training data, due to inherent instability because it requires an emitter to generate its sensory inputs, and due to basic differences between wavelengths, does it ever make sense to resolve conflicts by following the LIDAR suite instead of the camera suite? I would argue the answer is NO. NEVER. And if the cameras should always take priority over the LIDAR when conflicts arise, then what is the LIDAR actually doing, besides costing a lot more and generating confusing inputs that are always discarded when input data conflicts?
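
To make the rule I'm arguing for concrete, here is a toy sketch of camera-priority arbitration. It is entirely hypothetical, not any shipping autonomy stack:

```python
# Toy sketch of the camera-priority rule argued for above: when the two
# suites disagree, the camera verdict wins. Hypothetical policy only.
from dataclasses import dataclass

@dataclass
class Detection:
    obstacle: bool
    confidence: float

def arbitrate(camera: Detection, lidar: Detection) -> Detection:
    if camera.obstacle == lidar.obstacle:
        # Sensors agree: keep the more confident reading.
        return max(camera, lidar, key=lambda d: d.confidence)
    # Sensors conflict: under this rule the camera, backed by vastly
    # more training data, always wins.
    return camera

print(arbitrate(Detection(False, 0.9), Detection(True, 0.6)))  # camera wins
```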



We know LIDAR isn't cheap. Each Waymo vehicle costs around $200,000 to outfit with its LIDAR sensor suite. And we have now established that LIDAR doesn't actually improve sensory perception in a way that matters, because conflicting data should always resolve in favor of the camera inputs over the LIDAR inputs. So what is LIDAR actually doing, besides giving Waymo a lot of unnecessary challenges in generating useful, valid training data for the AI that runs the LIDAR suite, and making it hard to decide who is "right" when the LIDAR and camera suites generate conflicting inputs?

I believe the Tesla approach of simply eliminating LIDAR entirely and relying solely on cameras is the correct one. It just makes sense if you simplify the problem and approach it from first principles.
 
It was so incredibly dumb for Elon to insert himself into politics. His companies do cool stuff like this and now anything remotely associated with him will be opposed by leftoids and dem politicians forever. They cheer when his rockets fail.
 

Waymo customer support can intervene if necessary, but Waymos aren't teleoperated. There is not a remote human staring at each vehicle's systems. It is not equivalent to a human supervisor for FSD.

Driving isn't deterministic. It has many stochastic elements, and that's the main problem to solve in the final 1% of autonomous driving.

Sensor data is sensor data. Camera, LiDAR, RADAR, infrared, all useful for implementing an autonomous driving system. It doesn't need to map 1:1 with human experience.

It's not correct to say that a LiDAR sensor should never override a camera sensor. The purpose of sensor fusion is to find the appropriate weighting of each sensor to output the correct conclusion. As camera sensor uncertainty increases, the other sensors exist to bridge the confidence gap and output the correct decision.
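
In its simplest textbook form, that weighting is inverse-variance fusion. A minimal sketch with illustrative numbers, not any production stack:

```python
# Minimal sketch of confidence-weighted sensor fusion: each sensor's
# estimate is weighted by the inverse of its variance, so as camera
# uncertainty rises, LiDAR/radar automatically carry more weight.

def fuse(estimates: list[float], variances: list[float]) -> float:
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Camera reports an obstacle at 40 m but is glare-degraded (variance 9);
# LiDAR reports 25 m with variance 1. The fused estimate leans LiDAR.
print(fuse([40.0, 25.0], [9.0, 1.0]))  # -> 26.5 m
```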

The claim that it costs $200,000 to outfit a Waymo vehicle with its LiDAR sensor suite is not accurate. Waymo's fifth-gen sensor suite is estimated at ~$9k, and the current sixth-gen suite is reportedly significantly cheaper. That is still more expensive than Tesla's ~$400 camera suite, but most of the cost of current Waymo vehicles comes from the expense of the Jaguar I-PACE itself, which is not relevant to the LiDAR suite's cost.
 
Like most things Tesla, this will be 99% hype and 1% results, then eventually getting to 50% hype and 50% results after 10 years, never actually getting close to what was promised lol.

The problem with their self-driving stuff is that it's not good with the 0.01% situations, making it unsafe at times. Waymo has solved that for the most part; Tesla has not. It doesn't matter if it works 99.99% of the time. Like airlines, it has to work 100% of the time if you want people to take it.
 
Waymo customer support can intervene if necessary, but Waymos aren't teleoperated. There is not a remote human staring at each vehicle's systems. It is not equivalent to a human supervisor for FSD.
The Waymo customer support agent can override the car and move it manually. Waymo also sends human employees to the site of stuck Waymos to drive them away when necessary.
Driving isn't deterministic. It has many stochastic elements, and that's the main problem to solve in the final 1% of autonomous driving.
We can generally treat the stochastic elements as "edge cases," since they occur very infrequently during normal driving. This is the general reason for altering the videos in a training dataset, for example when an unusual driving situation is captured on video.
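
As an illustration, one common way to handle those rare clips is to oversample them during training. A hedged sketch; the labels and weights here are invented for the example:

```python
# Sketch of oversampling rare "edge case" clips so the model sees them
# far more often than their natural frequency. Illustrative only.
import torch
from torch.utils.data import WeightedRandomSampler

# 1 = rare-event clip (e.g., debris on the road), 0 = ordinary driving
is_rare = torch.tensor([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])

# Give rare clips 20x the sampling weight of ordinary ones
weights = torch.where(is_rare == 1, torch.tensor(20.0), torch.tensor(1.0))
sampler = WeightedRandomSampler(weights, num_samples=100, replacement=True)

drawn = list(sampler)
rare_fraction = sum(is_rare[i].item() for i in drawn) / len(drawn)
print(f"rare clips fill ~{rare_fraction:.0%} of batches vs 20% of the raw data")
```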
Sensor data is sensor data. Camera, LiDAR, RADAR, infrared, all useful for implementing an autonomous driving system. It doesn't need to map 1:1 with human experience.
Not when there is no training data of real use to "teach" the car what to do with the sensor data. That's the point I was trying to make. I was speaking from human experience simply because driving is a task originally designed for humans to perform, from road layout to the driver controls in the cabin.
It's not correct to say that a LiDAR sensor should never override a camera sensor. The purpose of sensor fusion is to find the appropriate weighting of each sensor to output the correct conclusion. As camera sensor uncertainty increases, the other sensors exist to bridge the confidence gap and output the correct decision.
This is only possible when the other sensors have the necessary training data to bridge said confidence gap and increase the likelihood of a correct decision. My point has been that this data does not exist, and furthermore, because driving was originally designed for humans, you don't need sensors beyond cameras, because cameras are what humans have. The "expert" assumption has been that adding a lot of different sensors will improve the autonomous vehicle's ability to drive; my response is that they do not, and real-world usage will prove this is the case.
The claim that Waymo vehicles cost $200,000 to outfit a LiDAR sensor suite is not accurate. Waymo's fifth-gen sensor suite is estimated at ~$9k, and the current sixth gen suite is reportedly significantly cheaper. Still more expensive than Tesla's ~$400 camera suite. Most of the cost of current Waymo vehicles is coming from the expense of the Jaguar i-pace, which is not relevant to the LiDAR suite's costs.
Fair enough. That doesn't solve the issue of making the additional sensors relevant to reaching a correct decision, however.
 
I was fortunate enough to get a Model Y Juniper earlier in the year (my first Tesla). Model Ys are everywhere, so of course I'd had plenty of chances to drive one before, but I was never impressed: it felt janky, rode rough, seemed plasticky, and none of my friends or family had FSD.

The Model Y Juniper is a marvel, perfection in my opinion. So smooth, so quiet, and they nailed the small things like seat folding and trunk opening. This is basic stuff, but I'm not joking when I say Tesla perfected it, and it's insane that these features are on a car at this price level.

With all that said, FSD is still by far the most impressive thing about the Tesla. I used the three-month trial and now I'm a subscriber. I know people don't seem to like it, but it must have improved dramatically with these last updates, because I haven't run into any major issues. I had maybe one issue where it didn't recognize a turning lane, but even then it felt like it would find a way to maneuver. It's so good, seriously a personal chauffeur. It's cool to set a destination you've never been to before, and it just finds a way to get there, no worrying about directions. FSD is my default now.

Man, I was so tempted to get the new Model 3 lease as well before the tax credit ended; those are so damn cheap for what you get.

Huge fan of Tesla and FSD!
 
I've ridden in a Waymo probably 10-15 times, and I felt like it drove completely safely, obnoxiously so, in fact.

During one trip, we were experiencing extremely heavy rain, and the Waymo pulled over for about 15 minutes until the rain calmed down. A normal cabbie probably would have risked it (and more than likely would have been fine, but it still would have been a risk).

The other thing is that Waymos truly obey all traffic laws and speed limits, which just isn't how humans realistically drive. I'm not saying they shouldn't obey traffic laws and speed limits (they absolutely should), but on average, a Waymo trip took about 20-30% longer than the same trip with an Uber driver. Again, I'm an advocate for safety, and so I feel like it's a reasonable trade off… but I wish there was some way to legally and safely have autonomous cars drive a few mph over the speed limit, when the cars around them are doing the same.

In any case, I'd be interested to see the direct tech comparison between the Tesla cab and a Waymo. I definitely feel like the lidar will give Waymo a significant advantage; I was incredibly impressed at how clearly the Waymo could see people and cars that I, myself, could not visibly see. It's really cool how they display that on their dash panel for passengers to view in real-time, and it adds a lot toward building confidence in using autonomous cars.
How did it know where to pull over? Where did it stop?
 
Actually, I think the discussion we're having should probably be taken a step back before we continue to argue the specific merits of adding additional sensors like LIDAR...

The first question we should all be asking is "What specific scenarios when driving would benefit from the addition of radar, LIDAR, IR, etc."?

If we can't actually think of any specific scenarios where having a LIDAR sensor suite would have prevented an accident that a camera would have been unable to detect and prevent, then on a fundamental level all the other discussions don't even matter.
 
If I remember correctly, there's a situation where bright sun at sunrise can blind the cameras but the lidar still operates correctly. If you've ever driven in the mountains, blinding morning sun is a huge problem when you're traveling east.
 
This has already been addressed with existing FSD hardware; the cameras have more than enough dynamic range to keep perceiving the environment even when the sun shines directly on the sensors.
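
For illustration, one standard technique for extending camera dynamic range is multi-exposure fusion. A sketch using OpenCV's Mertens merge; no claim that this is Tesla's actual pipeline:

```python
# One standard way cameras cope with extreme brightness: capture several
# exposures of the same scene and fuse them. Random arrays stand in for
# real under-, normal-, and over-exposed captures.
import cv2
import numpy as np

dark   = np.random.randint(0,  60, (480, 640, 3), dtype=np.uint8)
normal = np.random.randint(0, 180, (480, 640, 3), dtype=np.uint8)
bright = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

# Mertens exposure fusion needs no camera response curve or exposure times
merge = cv2.createMergeMertens()
fused = merge.process([dark, normal, bright])   # float32, roughly in [0, 1]
result = np.clip(fused * 255, 0, 255).astype(np.uint8)
print(result.shape)  # (480, 640, 3)
```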
 
One major benefit of camera tech is that we humans understand visible-light optics REALLY WELL and have designed road systems with lots of visible-light cues. Since we don't perceive the world the way lidar does, we have much less understanding of what spoofs it or how to assist it.
 
Object classification at range is where lidar shines. You can't classify objects with camera data alone at long range; they are a few pixels of data. Humans intuit that these are cars or cones or signs based on driving experience, but machines cannot. See the Uber crash, where the system couldn't classify someone crossing the street with a bicycle at night on a fast-moving road.
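
A quick back-of-envelope calculation makes the "few pixels" point concrete. The camera parameters are assumed for illustration, not taken from any specific vehicle:

```python
# How many pixels wide is a distant object? Assume 1280 px across a
# 60-degree horizontal field of view (illustrative numbers).
import math

def pixels_on_target(width_m, range_m, sensor_px=1280, hfov_deg=60.0):
    px_per_radian = sensor_px / math.radians(hfov_deg)
    angular_size = 2 * math.atan(width_m / (2 * range_m))
    return angular_size * px_per_radian

# A 0.5 m-wide pedestrian at 200 m spans only about 3 pixels
print(round(pixels_on_target(0.5, 200.0), 1))  # ~3.1 px
```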

 