One of the best ever.
- "if you remember linear regression ..." - You are certainly addressing a small data set of folks!
- "things" (countable) and "stuff" (not countable) - I need to work this into a conversation to seem even geekier.
- "NI" (natural intelligence) Another geeky thing (countable) I learned today.
- panoptic segmentation of video plus face recognition becomes scary.
First, thanks! And second, panoptic segmentation plus facial recognition is indeed a scary idea, and likely coming to a public event near you sooner rather than later. Oh joy.
My first naive reaction was I thought the autolabelers were doing this already.
It is amazing how well the cars are driving using a boxy vector space representation.
Although panoptic segmentation would be useful for a car, it will be essential for a Tesla Bot in an interior space.
For example, one cannot sit in a chair if all you know about the chair is a bounding box!
Which way is the chair facing? Does it have arms? Is the chair stable?
Panoptic segmentation will go big in AGI (Tesla Bot). An unbelievable number of datasets and data classes.
This was my first thought as well.
Very helpful and informative. Your detailed explanations allow me to wrap my head around this stuff.
Stuff or thing? That is the question 😉
The data set is infinite because it has literally no end; it is continuously generating data 24x7x365. The moment you think it has a finite size, it just got bigger.
Great video! I started playing While True: Learn on PC thanks to it being free on the Epic Games Store right now. It really does a good job explaining basic image recognition and machine learning. Also, it has cute cats, lol. I actually thought the cars were doing some kind of panoptic segmentation for parking lots and driveways already, but hmm. Might need a server in the car then. Tesla keeps moving forward while others are just creeping along.
This type of video is where you really stand out from the crowd. FSD videos are a dime a dozen but something like this... fantastic. Thanks for the content.
I knew I was a technical expert. I can now quote you saying that “things” and “stuff” are technical terms. I use them all the time 🤣🤣
Excellent video, and the first I’ve seen that helped me understand panoptic segmentation, without me having to do separate research after watching 👏👏👏
Great explanation.
This looks to provide an AMAZING “ground truth” reference from training clips after some pre-processing to generate it. Spectacular!
Right? It's gonna be NUTS when this gets working properly!
Another really interesting and informative episode.
I also want to praise you for the audio level between your show and the ads YT runs. Other channels make me scramble for the remote to hit mute, which leaves me somewhat ticked off.
I figure I should comment when it's good, not just complain when it's bad (the others).
Thanks for the excellent audio and the great shows in general.
The neural net does an amazing job of mapping the surroundings. The errors seem to be with the path planner. It pulls into the (appropriately labeled) left-turn lane when it is not turning left, and heads straight for a barrier (also labeled on the screen).
I hope so re: the path planner, but maybe we will also need Hardware 4 or even Hardware 5 before we have enough onboard compute power and/or cached memory to process all the information and driving instructions quickly enough for true FSD.
Great breakdown on the meaning. Love the 'stuff' explanation.
Real-world question: is it going to fix phantom braking?
Love this video man, great stuff
In biological vision, segmentation starts in the retina because of the way eyes capture light information. That information doesn't come in as a singular clump of values, but rather as discrete subsets of signal values corresponding to various optical wavelengths. The importance of this is that there are separate networks of neurons representing all of the various details of anything you see, without the explicit need to "segment" it after the fact. The higher functions of the visual system stitch these groups of signals together into a coherent "mental picture", which is what we call visual perception. But that stitching is not based on matrix multiplication over all values, as each set of signals maintains its own separation and comparison space for later "reasoning".
The most obvious example of this, and of why it is less computationally expensive, is color detection. A person can easily pick out the distinct color regions in an image because they correspond to distinct signal values within the neurons of the brain, and this is something the human visual system provides very efficiently. But in a computer, each pixel is an R, G, B value, and there is no simple way to pick out a distinct color because the computer doesn't know what color is; all pixels look the same from a data perspective. There is no separation of images into various optical color components in computer vision, which means you have to do per-pixel processing to distinguish one pixel from another and to assign them to logical subsets of values. Beyond that, there are rods and cones in the eye, so not only do colors exist in distinct color spaces in the brain, but there is also a corresponding greyscale signal that is captured as well. This is why it is so easy to recognize the silhouette of an object as the same object from a previously viewed image: that greyscale "segmentation mask" is automatically processed in its own signal space along with the color information.
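To make the per-pixel point concrete, here is a minimal sketch (my own illustration, not from the video; it assumes NumPy, and the threshold values are made up) of the explicit arithmetic a computer has to run just to pull out a "red region" that a human sees effortlessly:

```python
# A minimal sketch of per-pixel color segmentation: every pixel is just an (R, G, B)
# triple, so "red" has to be defined numerically, pixel by pixel.
import numpy as np

def red_region_mask(image_rgb: np.ndarray) -> np.ndarray:
    """Return a boolean mask of pixels that look 'red' in an H x W x 3 uint8 image."""
    r = image_rgb[..., 0].astype(int)
    g = image_rgb[..., 1].astype(int)
    b = image_rgb[..., 2].astype(int)
    # Arbitrary illustrative rule: red channel bright and clearly dominant.
    return (r > 120) & (r > g + 40) & (r > b + 40)

# Tiny synthetic example: a 2x2 image with one red pixel in the top-left corner.
img = np.array([[[200, 30, 30], [30, 200, 30]],
                [[30, 30, 200], [90, 90, 90]]], dtype=np.uint8)
print(red_region_mask(img))  # only the top-left (red) pixel is flagged
```

Everything about "red" has to be spelled out numerically here, which is exactly the kind of separation the retina provides for free.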
Checking out your thesis now. Thanks for the upload 👍😎
I would like to know the process for calculating turn radius and corner speeds.
Karpathy is correct about the infinity-sized dataset, as everything is constantly changing: different cars driving around at different times of the year, leaves on the road, different traffic constellations and behaviors, etc.
True. It's more a semantic thing (lol) for me. Infinity is a technical term in this instance. But we're splitting infinitely small hairs at this point :)
FSD beta isn’t even working at all anymore. Hasn’t for the last week. At least for me. About a week ago it stopped working altogether. Every once in a while you can see the visualization attempt to shift into FSD beta mode and the steering wheel icon pops up like it’s going to work, but it only lasts for a fleeting second. Very weird behavior. Any other beta testers experiencing this?
Seems like bounding boxes are really OK for driving, just give them a few more polygons, but panoptic segmentation will do wonders for the bot when it needs to manipulate an object.
I do it too, but if we didn't swear when making a point, it would actually help get the point across. Thank you for all of your great videos.
Your NI missed the leashes! ;-)
Great explanation, thanks a lot!
Wonderful. Thanks so much for helping us to understand this.
They should push unknown-object frames to the driver to identify at the end of a trip; crowd-sourced labeling would be more accurate and free to Tesla.
Checking auto-labeller videos at Tesla may be tedious, but no one takes the wrong snacks out of the lunch fridge after seeing the world in hyperlink mode 😀
Dr. Know-it-all, the F-bomb within the first 15 seconds of your video? I was shocked! Is that like you?
Great video, nonetheless. Keep up the good work.
I personally feel that the thing most likely to continue to nag FSD - in terms of getting it to wide release - is temporal memory. I know they are working on spatial and temporal memory, but I don’t think it is long term enough. Humans make decisions all too often based on what they have learned long ago and apply lessons from past experiences to present situations all the time. I think FSD struggles if it can’t do the same.
What are spatial and temporal memory?
I think you are confusing definitions of memory; is temporal memory remembering things from a long time ago?
You made a new subscriber! Great explanation.
Thanks for the explanation!
All good stuff, but I contend on-board memory will be required. Example: learning my winding gravel driveway and the sharp right turn into the proper bay of my two-car garage. GPS and mapping can only go so far.
Great explanation, thanks so much!
Thank you for explaining this, I understand it a little bit better than before.
I understand your use of the Cinematic Mode on the iPhone, but it looks fake and reminds me of Zoom conference background replacements. Maybe if the blur were not as strong it would look more natural.
Again thanks for the enlightening content.
Aspects of this will make certain parts of FSD better quickly, but there's also quite a step down in performance when building a new generation / major version upgrade. So I don't think it will be 10x better out of the box; however, these are steps in the right direction.
You're my favourite channel
We need a part 2 diving deeper into that auto-labeling stuff.
I wonder, in panoptic segmentation, how do you track a specific thing like a car or a pedestrian between frames? I mean, it is kind of easy to assign an ID to a thing in a single frame, but you have to be consistent between frames, if that matters somehow. And I think it has to be relevant, because you care about different instances.
That must also be a difficult task, right? I see a lot of problems to solve there, like if you want to track maneuvers to avoid collisions.
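One common way to think about it (a minimal sketch of my own, not necessarily what Tesla does; it assumes NumPy and made-up mask data) is to match each instance mask in the current frame against the previous frame's masks by overlap and carry the ID forward:

```python
# Greedy ID propagation between frames using intersection-over-union (IoU) of masks.
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def propagate_ids(prev: dict[int, np.ndarray], curr_masks: list[np.ndarray],
                  threshold: float = 0.3) -> dict[int, np.ndarray]:
    """Assign each current-frame mask the ID of the best-overlapping previous mask,
    or a fresh ID if nothing overlaps enough (a new instance entering the scene)."""
    next_id = max(prev, default=-1) + 1
    out: dict[int, np.ndarray] = {}
    for mask in curr_masks:
        best_id, best_iou = None, threshold
        for inst_id, prev_mask in prev.items():
            score = iou(mask, prev_mask)
            if score > best_iou and inst_id not in out:
                best_id, best_iou = inst_id, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        out[best_id] = mask
    return out
```

Real trackers add motion prediction and appearance features on top, but the ID-consistency problem is basically this matching step repeated every frame.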
I guess it is infinity-sized in the sense that they can grow the unlabelled dataset as fast as they want and on demand, probably only limited by technical reasons like available bandwidth to each car.
It may be a distinction without a difference, but I would argue that if there's any limit, it's not infinity-sized.
It is not a set containing data that has infinite size. It is a sequence of data, and that will not end.
I did some checking and guess what? You're spot on!
Thanks. Mailboxes contain information on them.
I think tedious is the word instead of thankless. I’m sure the labelers likely get positive feedback at Tesla.
Ha-ha, Yes Doc, I really enjoyed this Video!
Cheers,
Eric
Ironically, I got a new Roomba J7+ and they integrated a forward-facing camera on the vacuum. The vacuum uses the forward-facing camera to identify obstacles and avoid them, the main purpose being to "see" dog poop on floors/carpets and not make a complete mess by spreading it around the house. Users can opt in to have the vacuum also send pictures of other obstacles so Roomba can work on labeling them.
The age of AI vision is beginning to creep into other smart appliances, slowly but surely.
I wondered how they would solve the poo/vomit problem.
Must be a lot of variants, from runny to soft serve to cigar, along with the myriad colour variants!
Maybe they'll include a sniffer for edge cases one day.
Thanks for the great masterclass.
Really great piece, but there are so many ads that it’s very distracting.
Amazing explanation. Can't wait till auto-labeling programs can finally get the job done faster, saving the human labelers' time so they can focus on verifying the process and on other tasks. The sooner we optimize this process in tandem with a faster Dojo computer, the sooner we will finally have the best combination of tools to make FSD 10x better and have the Tesla Bot up and running. What a future Tesla will have.
Great episode! This is your strong suit, and as a fellow FSD beta user I learned quite a lot. Thanks!
Very enlightening, thank you. What proof have you that all Teslas are uploading all their video data, always?
Did he say that? Teslas definitely don't do that.
Labeling will only answer part of the problem, of course. Another problem is that drivers as well as computers need to become better "mind readers" or to be able to more reliably predict the intent of other drivers and pedestrians. Labeling alone is not enough. A rolling ball suddenly appearing on the roadway could mean a child may be momentarily running into the street to fetch it. A pedestrian who is standing at a corner and looking at you and perhaps smiling is intending to cross the street even though there is no crosswalk. Another pedestrian may be crossing the street in a crosswalk and looking at his or her cell phone intending to continue walking, no matter what. Humans are "scientists." They have their own theories about how the world is and are constantly making predictions in order to survive. Most people are relatively good scientists and some are really bad scientists. The bad scientists often die early and regrettably win the Darwin Awards and don't get to pass on their "defective" genes. The so-called good drivers must learn to read the intentions of others and this boils down to computers making sound predictions about the predictions of others. These are software problems that are basically optical and psychological in nature and should be solvable eventually. But I think that the problem of intent, not just labeling, is a key problem in achieving full self-driving. I enjoyed this video very much. Keep up the good work.
Seems like this could be very useful for traffic cones on the road. I remember seeing a video where someone had to take over to steer through cones; perhaps the bounding boxes made the car think it couldn't get through?
It's useful for everything around the driving area, from top to bottom. So yeah, it makes teaching the AI easier, faster, and cheaper.
Thanks for the info! :)
Why is the tweet not linked in the description, or am I blind? 😕
Great video.
Question for you DrKnowItAll: can you find out anything about whether it is a single NN that does the inference in the car, or whether it is made up of multiple components?
It is a large NN that is made up of multiple parts, some repeated. Not sure what you mean by component. You will find a lot about it in the recording of Tesla AI Day, including the structure and parts of the NN. Take a look in this section: ruclips.net/video/j0z4FweCy4M/видео.html
I ask myself whether identifying an object from static images is less efficient than using a sequence of images. I think that using images spaced out in time first helps to determine whether the object is moving or static. It is likely that by tracing the movement of 3D surfaces of a given color, it would be possible to make an initial selection of objects, or rather surfaces, which may or may not collide with the intended trajectory of the vehicle. This would already eliminate the step of identifying objects that have no chance of interacting with the vehicle. On the other hand, objects such as indicators (stop signs, red lights, speed limits, a person directing the flow of vehicles) will have to be analyzed even if they are not in the path of the vehicle. Still, making an initial selection of objects which may or may not intersect the vehicle's path should, in my opinion, lighten the load on the artificial intelligence (see the rough sketch after this comment).
This is probably what the brain does; it seems to me that it identifies moving objects more quickly. For example, yesterday I was reading and my attention was drawn to a mouse which had the audacity to pass through my peripheral field of vision. Yes, I live in an old house in rural Switzerland where that happens.
It is also probable that a temporal analysis would help discriminate bizarre situations, like a sign or billboard depicting an object versus the object itself, which is probably difficult if one relies only on single-image processing.
Sorry for my approximate English, and thanks to Google Translate.
P.S.: Thank you for the always very informative videos.
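As a rough illustration of the frame-differencing idea in the comment above (my own sketch, not how Tesla's network actually works; it assumes NumPy and ignores the ego-motion of the camera):

```python
# Flag regions that changed between two frames spaced in time, before doing any
# expensive object identification on them.
import numpy as np

def moving_region_mask(frame_t0: np.ndarray, frame_t1: np.ndarray,
                       diff_threshold: float = 25.0) -> np.ndarray:
    """Return a boolean mask where the scene changed noticeably between two
    greyscale frames of the same shape.

    A real system with a moving camera would first compensate for ego-motion;
    this sketch skips that to keep the idea visible.
    """
    diff = np.abs(frame_t1.astype(float) - frame_t0.astype(float))
    return diff > diff_threshold

# Regions flagged as moving can be prioritized for full identification,
# while static background gets cheaper treatment.
```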
Great explanation!
Just a small point: in your description of the podcast, you are still offering 1,000 free Supercharger miles if someone uses your referral code. I believe you need to update this, as Tesla no longer offers free Supercharger mileage. Like I said, a small point. I enjoy learning from you.
I'm actually curious about your thesis. Is there a version that isn't behind a paywall? I really like your videos, btw. Thanks
DKIA is a cool dude 😎 👍
Very interesting video. Of course, you could massively increase the number of YouTube hits if you changed the title to "Cute Kitties and Tesla AI Using Panoptic Segmentation," etc.!
I went into this video thinking, "Hell yeah, I'll understand this. No problemo." Now I'm not so sure!! lol. Great video though. I think I may rewatch it a couple of times.
First identify each instance of important information, differentiated from the background stuff.
You have the same 'old-school' Calculus text that I had/used for I, II & III.
Your dogs are lovely!
I find it incredibly funny that one is named Diesel considering the general theme of this channel!
I laughed when I heard you say Diesel.
Best description I have seen.
I’m now even more convinced that FSD won’t be truly finished until Tesla has developed and deployed HW4 or even HW5.
If you are correct, I guess Tesla would have to upgrade the vehicles for those of us who have paid for FSD? I hope so.
HW5 for sure
@@audunskilbrei8279 We paid for it! We have to get the feature; it's Tesla's problem how they do it!
You might be right, but this would mostly just free up a lot of manual labelling, shifting the work to checking whether the auto-labelling is correct without overfitting the model. Great video explaining panoptic segmentation! 👍
@@audunskilbrei8279 absolutely
Nice work … great get … at age 72, I love discovering new concepts … nerd heaven 😃
Enjoying your videos.
Watched the AI presentation. And watched a lot of FSD real world issues.
As a game developer, I think there is a missing step in the algorithm. In games we divide the navigable surface into a tagged nav-mesh: a geometric 2D surface which is tagged into discrete regions to help the AI move about.
I think Tesla's AI is not computing the equivalent. It is not permanently tagging navigable regions with "bike lane", "right turning lane", "parking area", "tram tracks", and so the route planner ends up making occasionally bizarre choices.
It does something similar, but seemingly on an ongoing basis, so it forgets these classifications 15 times a second. The thrashing noodle seems to be evidence of this.
The system seems to be performing a fresh evaluation of the entire scene with each frame of vision data, rather than doing less work and merely confirming the previous segmentation decisions.
I raise that here because the segmentation you describe in the video is a similar task (a rough sketch of what I mean follows below).
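Here is that sketch (my own game-dev-style illustration, not Tesla's actual pipeline; the tags, grid cells, and confidence rules are made up): cache the region classifications in a persistent tagged nav-mesh, and let each new frame only confirm or amend them.

```python
# A persistent tagged nav-mesh: drivable regions are classified once and cached,
# and per-frame observations merely update confidence instead of re-deriving tags.
from dataclasses import dataclass, field

@dataclass
class NavCell:
    tag: str = "unknown"        # e.g. "bike lane", "right turning lane", "parking area"
    confidence: float = 0.0     # how sure we are about the cached tag

@dataclass
class NavMesh:
    cells: dict[tuple[int, int], NavCell] = field(default_factory=dict)

    def update(self, cell_xy: tuple[int, int], observed_tag: str, observed_conf: float) -> str:
        """Blend a new per-frame observation into the cached tag.

        The cached classification only flips when new evidence is clearly stronger,
        so the planner sees a stable labeling instead of one re-evaluated from
        scratch many times a second."""
        cell = self.cells.setdefault(cell_xy, NavCell())
        if observed_tag == cell.tag:
            cell.confidence = min(1.0, cell.confidence + 0.1 * observed_conf)
        elif observed_conf > cell.confidence + 0.2:
            cell.tag, cell.confidence = observed_tag, observed_conf
        else:
            cell.confidence = max(0.0, cell.confidence - 0.05)
        return cell.tag

mesh = NavMesh()
mesh.update((12, 4), "bike lane", 0.9)   # first sighting: cache it
mesh.update((12, 4), "unknown", 0.3)     # one noisy frame doesn't flip the tag
print(mesh.cells[(12, 4)].tag)           # "bike lane"
```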
Panoptic in this case actually means 'across multiple cameras' (imo).
Well done.
Infinity-sized means the never-ending data streaming.
Dog0, Dog1, … DogE - looks like a long series to me. Interesting to note what it is leading up to ?
Creating the dogz coin now
@@andrewpaulhart I think the series would be in hex notation, which would mean DogF would be the highest single-digit suffix.
This really looks like prep work for Tesla Bot. :D
If you were saying 'What the f... does that mean?', can you imagine what we were saying!!! 😉
Can the system be fooled by a photo or optical trick painting?
Good question. I think, since they're working so hard on their simulation engine, it can indeed be "tricked" (or trained, if you prefer) by virtual images.
I'm not an expert, but yes - at least at the beginning phase.
Can humans?
If a data set grows faster than it can be processed, doesn't that count as being infinite?
I guess quasi-infinite is the most accurate.
Video starts at 5:00
At what point will a Tesla be a better driver than KITT from the Knight Rider TV show?
I don't know if anyone noticed, but the whole video is generated by AI. The person talking is not a real human.
Time's up for FSD.
They're fiddling with things that they should have handled a year ago.
I've turned off the Grandma Walton test for FSD.
I've demanded a refund of the money I gave them over two years ago.
Maybe, one day in the far future, EElllllkon will learn to DELIVER.
Where is all this data stored? I don't think Dojo is up and running yet. Oops, no Dojo yet.
Dojo is not a data storage system, it is a processing system. Data is no doubt stored on hard disks.
I clicked like when you reminded me. Some YT creators ask for an early click because apparently the YT algorithm likes early clicks
I think you’re completely mistaken about the “slowly moving viewpoint” and 60fps “shutter speed”. I think Karpathy is really meaning _slow_. As in walking speed slow. Otherwise, the 1/60s _equivalent_ shutter speed (probably around 1/120s as per video standards) would yield really terrible blurring. Even if Tesla goes beyond typical values and pushes the hardware to something like 1/240 shutter speed (frame sampling period), which remains to be seen, it is still pretty terrible from a photography/motion blur perspective.
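Some rough numbers to back that up (my own back-of-the-envelope arithmetic, not figures from the video):

```python
# How far the car travels during a single exposure at highway speed; the speed and
# exposure values below are illustrative assumptions, not Tesla specs.
def motion_blur_metres(speed_mph: float, exposure_s: float) -> float:
    """Distance travelled during one exposure."""
    metres_per_second = speed_mph * 0.44704
    return metres_per_second * exposure_s

for exposure in (1 / 60, 1 / 120, 1 / 240):
    print(f"65 mph, 1/{round(1 / exposure)} s exposure: "
          f"{motion_blur_metres(65, exposure):.2f} m of travel during the frame")
# ~0.48 m, ~0.24 m, ~0.12 m respectively
```

How many pixels that smear covers depends on distance and focal length, but for nearby objects and lateral motion it is far from negligible.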
Oh, at least now I know what 'cat'egory is all about :-P Stay safe and ty for the vid.
Mapillary has been using this since 2013.
"Make a video in which you are reading an article" ...youtuber skill set 2021
Funny, with a snowmobile in the rear view.
Erm, Dr Know It All... ought to change his name to Dr Knows A Bit!
It is an infinite data set available to them because we live in an infinite universe.
Cat-e-gory, especially when you hit them.
"it's a simulation."
That simple.
1 sec at 60 mph = 88 feet, or about 1,056 inches.
Tesla cars need a dream mode for when they are not driving, whereby they use their idle compute to replay their last driving scenarios over and over with Monte Carlo-style variations of the vector-space representation, to better grok the salient and more egregious deviations of real-space experience from the AI model's nominal extemporaneous predictions at the time. Then all the "sleeping" Teslas should network with one another and adversarially race these scenarios virtually to find the best solution space and further pinpoint issues, and then upload it all to Tesla autoCloud Inc. to hone an executive second-order meta-model which symbiotically steers the evolution and training of the core FSD AI beyond just the products of auto-labeling, routing logic, kinematic inference, etc., determining the most appropriate "dreamed up" adversarial virtual-race "strategies" to implement in any particular context, for a much more dynamic and uber-exciting driving model akin to something straight out of a Knight Rider television episode from the inimitable '80s lol
Just think... all of these amazing software engineers, and yet they cannot give us proper blind-spot camera coverage for the Model Y.
What is missing with the Model Y? Is there a blind spot that cannot be seen by any camera, one that cannot be solved by software?
@@vsiegel The cameras see everything. It takes software to display those images on the UI.
Are you talking about how, when reversing the car, it doesn't alert you if someone is driving down the parking lot fast?
@@swait239 No... the side-view display which shows the blind-spot areas is what I am talking about. My Honda's blind-spot camera is mounted on the right side mirror and it is a life saver. It is shocking to me that Tesla doesn't have this basic 20-year-old feature. (I know they have a blind-spot detection warning. I want to SEE, not have a warning!) And the backup camera is lousy. Honestly, for a company whose future is built on cameras, they should do much better. What would it take to display the blind-spot areas on the screen full time? All other traffic is already on there. Blind-spot traffic is the most dangerous and causes the most accidents. As a motorcyclist, I am very aware of this when operating around vehicles. Long rant. I have a Model Y on order but am unsure if I will take delivery. Perhaps the Cybertruck will have blind-spot camera coverage.
@@eugeniustheodidactus8890 Dude, Tesla is not 20 years behind on ANY feature. On net, they are years ahead. Do yourself a favor and take delivery of the Y. And vocalize your ideas to Elon on Twitter, and you're going to be surprised by how quickly your car improves.
OK, I get it. But surely this is all well known and being done by everyone in the AI community? Mapillary does this with my photos. Admittedly one per second and the queue takes a few hours, but it looks identical. Theirs is automated too; they don't have engineers inspecting them. Mapillary is only interested in static objects, street furniture, but it works. If I drive down the same street it recognises the same items and tags new ones with a date stamp. Other than speed, what is new?
My brain hurts so much...
He is so prolix. We could understand it all in one quarter of the time.
Sidewalk chalk artists can draw an image that, from one angle, appears to a human as a three dimensional scene, like a crater in the sidewalk or a lamp post, for example. How would Tesla’s vision-only system see these drawings, assuming one were drawn in the middle of the road such that the vehicle would be approaching it from the perspective that yields the 3D effect?
I’ll admit it’s an edge case, but one that someone could deliberately create (perhaps on a sheet of plastic that could be rolled out on a street for the express purpose of messing with autonomous vehicles; a way to shut down a robo-taxi but allowing human-piloted vehicles to pass). I’m thinking future Luddite protesters who don’t want to see the autonomous future arrive.
The other day I was driving and AP thought a shadow was a white-painted direction arrow on the road.
I literally hear him talk - Michel Foucault.
60 mph ≈ 1 inch per millisecond.
I actually think that's not the way to go. I might apply to Tesla, so I'll refrain from explaining why for now.
Wonder how it reacts to dead animals on the road. Probably not trained for that.