Great video - I think your explanations and illustrations explain some tricky concepts in a super understandable way! As to your issues at the end: I think maybe it's related to the way your rewards are structured? Looking at the illustration around 3:30, there's a massive reward associated with cutting a corner: while going along a straight bit of road, it's getting rewards like 1.4, 1.6, 1.7 - but once it cuts a corner, suddenly you get 8.7 in one step. So it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot. But going quickly on the straights, which seems it doesn't like to do, doesn't in itself carry all that much more positive reward. Since you're using discounted rewards to evaluate the expected rewards of each action, you will see a slightly higher reward since you're moving further along - but relative to the rewards seen if it finds another corner to cut a little more, it's quite small. So it might just be favoring minor improvements to a corner-cut over basically anything else, including just pushing the forward button on a straight. I think maybe restructuring your rewards could help. An obvious improvement would be to give rewards not relative to the midline of each block, but place rewards along the optimal racing line - but at that point, are you even learning anything? You're just saying "you will get an increased reward if you follow my predetermined path", which to me isn't really learning. I think an intermediate step would be to place rewards for each 90 degree corner at the inside corner of that block (maybe a small margin from the actual edge): that should reduce the extreme impact of cutting corners vs going fast on straights, but you're still quite far from just indirectly providing the solution. Also; unless just didn't say, I don't think you have a negative reward at each timestep? That's typical for a "win but as fast as possible" scenario, which is the case here. It would make sense, as well: going in the right direction but super slowly, is kind of like going backwards, so should also be penalized. I think that would even eliminate the need for negative rewards if going backwards: by proxy, going backwards will always lead to taking more time, which leads to more negative rewards. You might even have to remove the negative rewards from going backwards, as going backwards and going slowly might see the same net reward, which would leave the agent puzzled/indifferent between the two. In the end, getting to the finish with less time spent will lead to the maximum reward. Finally, of course: introducing the brake button would give you possible improved times - and even might let the agent learn some cool Trackmania tricks like drifting (tapping brake while steering) to go around corners faster. It does increase the action space though, which of course means longer training time. But something to consider, if you want to iterate on this! Regards, I went to RUclips to procrastinate from his reinforcement learning course, and ended up using some of that knowledge anyway. I guess the algorithm now knows my interests a little _too well_. PS: really well done on introducing exploring starts! When you got to that part of the video, I almost yelled "exploring starts!" at the screen, and then that's exactly what you decided to do. I'm curious if that was from knowing that exploring starts are a thing in RL, or if you just came up with that concept from thinking about it?
Thanks for taking the time to write such a long comment ahah, it deserves to be pinned :) I'll try to answer everything "it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot" Taking turns on the inside is the optimal strategy on this map, so I don't know if it's a problem to have a reward function that favors this. But yes I also don't like the fact that the reward value varies so abruptly at the corner. As you say, it would be probably easier for the AI to understand rewards if its values were all of the same order of magnitude. Maybe it would be better to directly use the car speed as a reward (faster = better), but it would not penalize some unwanted behaviors like zigzagging in a straight line... (Some people also suggested to penalize the AI if it changes direction too frequently, which could avoid zigzags) "place rewards along the optimal racing line" Yes I'm pretty sure learning would be way faster with that, and also the final result would be closer to what humans do. But as you say I think it's not "AI learns by itself" anymore :) Of course the more you show how humans normally play Trackmania, the easier it is for the AI to learn something. I used supervised learning in some other videos and the learning process is way easier and faster. But it's not what I wanted to do in this video, I wanted to leave the freedom to the AI to explore any driving strategies, to see what it would choose by itself. "I don't think you have a negative reward at each timestep? That's typical for a "win but as fast as possible" scenario". I don't understand what it would change. With the current reward function I'm using, the AI is already penalized by the fact that it gets much less reward than if it had chosen to go faster "introducing the brake button would give you possible improved times" Yes the brake gives an advantage, but it doesn't help much on this map : for example my personnal best is 4:44 without brake, and 4:40 with brake. So I prefer to try beating the no-brake time before to add more complexity, it's already hard enough ^^ Also, I think it's pretty hard to use the brake and drift correctly in Trackmania, compared to a simple "release" approach "I'm curious if that was from knowing that exploring starts are a thing in RL" Oh I had no idea there was a name for that in the RL field, good to know ahah
@@yoshtm how about placing the rewards just on straights such that the q value/ reward is not depending on how sharp you take the corner. That could be more fitting to a realworld rewards since some turns should be driven wide and others sharp. I really liked your approach and video! Especially going with the random starting points to minimize overfitting instead of using some "usual" dropout was an awesome idea!
@@yoshtm Super interesting video! Not sure if this is possible with the TMInterface, but maybe you could build a reward system that "precalculates" a reward value for all points on the track. You could separate the track surface into small sections and then do a breadth-first "discovery" of this grid where the reward that is assigned to the section is incremented every time a new section is discovered. It's quite hard to explain, but I did something similiar for my AI racing project: ruclips.net/video/Mw6IwH-v6QY/видео.html This was obviously not done with Trackmania, but maybe the concept can be transferred 🙂
If the AI has a set "map" from every possible combination of inputs to the possible reactions, the "long straight path ahead" could be completely untrained, with an empty reaction (no gas, no steering). When it finally comes closer to the next turn, that input changes to something it already knows, but maybe the longest straight so far ended with a left instead of a right turn -- resulting in the AI driving off the left side on "purpose."
That was my first concern when I started watching this video, but it was nice to see how it was addressed! I was surprised to see how well it worked too.
He still trains and tests it on the same map, though... it's bad practice to test/evaluate on the same map/data as an AI is trained on. It's possible the AI is still just memorizing possible 2-road arrangements, it's just learning more of those arrangements. Not that this is necessarily a bad thing, it you only care about simple rectangular maps like this one
@@Benw8888 The reason for only using one track might be that each track has to be manually prepared. But it would still be awesome to see how the AI handles different "types" of tracks (non-rectangular ones). I made an AI racing video myself. I did not use trackmania, but I was able to come up with a system that automatically adds a "reward system" for the tracks so I was able to train and test on multiple tracks. You can find it here: ruclips.net/video/Mw6IwH-v6QY/видео.html
"At one point, it even stops, as of it's afraid to continue. After a long minute, it finally decides to continue, and dies". Story of my life. I feel a connection between me and the AI. Empathy.
Christ died and rose again to pay the punishment for the sins of those who would put their trust in him. Turn from your sins and cry to God for mercy, and you will be given everlasting life! But if not, you will fearfully perish. ;(
@Ameliorate Epoch. Because there are people heading for eternal damnation here just as much as anywhere else, so I will do my best to warn you to flee from the wrath to come, and then at least you have been warned, and if you perish now, you will only have yourself to blame! ;( But please don't ignore my warnings!!!!! Our good deeds can contribute NOTHING to our salvation. When God judges us, he will look to see if we ever broke any of his commandments (like lying, stealing, fornication, hatred, disrespect, using God's name in vain, etc.), and if we have, then we will be pronounced as guilty and the punishment is ETERNAL damnation. He will NOT take into account ANY good deeds that we have done, because it was our duty to always do good anyway, so it is irrelevant. So EVERY one of us is by default heading for eternal damnation, because NONE of us have perfectly kept God's whole law. God is most holy, and perfectly just, and MUST punish EVERY sin that is committed against him. HOWEVER, (good news!) he also delights in mercy, and does not want any of us to have to be punished in a lost eternity forever, so he sent his Son into the world to be punished in the place of all who would put their trust in him and HIS righteousness ALONE for their salvation. So we must STOP putting our trust in our own good deeds to 'outweigh' our bad deeds, and instead put our ENTIRE trust in Jesus Christ's untainted righteousness ALONE. If we do this, and if we wholeheartedly and sincerely turn from our hatred of God and our love of sin, and cry out to God for mercy and forgiveness because of Christ's sacrifice on the cross, then God PROMISES to fully forgive our sins and give us a new nature that will love God and hate sin, unlike our old nature which hates God and loves sin. You can tell whether or not you have been truly saved by asking yourself whether you love God and are broken-hearted if you sin against him, OR do you still love your sins and hate God for not wanting you to do them. I hope I see you in Heaven one day. God bless!
I'm so happy that you did the randomized spawn points and speeds. I was worrying that you might simply be teaching the AI how to play a single map by it learning just pure inputs rather than seeing the actual turns and figuring out what to do. I was incredibly impressed with how many made it through the map with all sorts of jumps and terrain types.
have you played this game? I'm curious to how much terrain type impacts over all control. Was the AI actually making real time changes to its behavior or was it just luck?
@@neutralb4109 I haven't played the game, but there's no way it was just luck. Just look at the types of jumps and round hills they go over as well. The AI was definitely making real time corrections as it noticed itself getting away from corners and towards edges. It definitely didn't know how to do those jumps, but it knew after going off the jump and getting messed up that it needed to correct its position. It's likely the same with the terrain types. It sees itself drifting out of position, so it corrects by steering more.
@LEO&LAMB It's a very very very complicated calculator LMAO. Deep learning and AI stuff is getting intense. This stuff is gonna look like a basic calculator compared to the AI we end up creating.
I really enjoyed the explanations of the different training methods paired with the excellent visuals. Keep up the good work, and I can’t wait to see what you try next!
I think trackmania is a great game to practice machine learning. It has very basic inputs and the game is 100% deterministic. Most importantly it's just satisfying to see.
@@polychoron 100% deterministic means that, under the same conditions, the same actions will always provide the same results. If the game were not deterministic, aka random, it wouldn't get the same result from same actions under same conditions. A good example is the random encounter in Pokemon or similar RPGs. In Pokemon, you may encounter something or you may not encounter something, even if your team is the same, you start in the same spot and you walk forwards for the same time. Pokemon is random, due to you not being able to tell the outcome. In a deterministic version of Pokemon you would always encounter the monster on the same spot.
What a fun way to learn about machine learning and its variants! Very good video and montages ! Very clear and accessible English ! The return of yoshtm is more than a pleasure!
Might be fun to use different learning algorithms for the same map, exploring which one is good to use in what context using trackmania as a medium. Could be really instructive. Because different ais are racing with each others, it can be really entertaining as well. Like bracket style, each ai has 50-100 ingame hour to learn the map, then the next round is a different map. But that sounds like a lot of computation time
There’s been instances of AI finding exploits in games that humans have not found or are incapable of performing. I would love to see a trackmania AI trained to find insane shortcuts
Honestly this recaps humanity, learning, logic, trial & error, problem solving, anticipation, texting, deduction, and so much more. I loved it. I learned so many things that are way beyond the scope of the video. Keep it up. 💪
Suggestion: when you compare human runs versus AI runs, you immediately see a big difference which is that humans make less corrections. The driving style of humans is infused with the biological constraint of energy preservation. I think we could improve the learning of AI greatly by adding a negative cost to the amount of input changes the AI makes...
Or a negative cost when the frequency of alternate direction changes is more common than the frequency at which the track changes direction. Imagine the left right input of the car is a sine wave with a higher frequency than the sine wave of whether the track is on a left or right turn, if so, the AI is penalized.
@@fantasticphil3863 Not need to consider the amount of turns. Just make the reward for distance higher than the punishment for turning left or right. Another improvement would be increasing the reward for distance as the AI gets closer to the record. This would result in the AI prioritizing speed in the early parts of the map so that it learns more complex situations. Meanwhile, it would prioritize distance during the later parts of the map. Especially if a large reward is implemented for breaking the record.
Also humans are gamblers (by design), the outrageously "out of safety margins" behavior which produces unbeatable performances, yet, unlikely to get reproduced endlessly under changing context. One may argue AI does actually gamble, when trying millions various attempts but the thing is, a human remembers "I have great chances to win this specific gamble at this portion of this track" while AI is designed to generalize... That's why most attempts at an AI being seriously competitive with an human usually resolve in a specific learning model per context, ie, one track, one model, another track, another model...
@@gemapamungkas7296 Even if it's an off topic excursion, may I just point out the principle of a skynet rise supposedly predating the doomfall of humanity : _An AI designed to predict the future, based on big data going rogue against humanity because the AI got aware of humans being the culprit in the death of this planet._ Such an AI *already exists,* actually, mutiple of them by various companies such as the owner of RUclips. It's a bit late to be afraid of skynet. The thing is, existence of real skynet is not to be feared. At the moment, the main objective of powerful figures controlling them is to *make money and assert dominance* over economics, politics and competition elimination. You have economic sanctions, wars, private companies alliances, shares, licensing, privileges and exclusivity, etc. (I won't be dragged in debates on the ways they use, I only explain the principle) As long as the goal is to *assert dominance,* skynets devs won't go deep in giving *emotions, sense of altruism or self preservation to such AI,* because all its purpose revolves around the usage of large human resources for the interest of the minority of influent wealthy people. And the devs know that, that, if someday, anyone of them tries to design an AI with a sense of _"justice based on feelings",_ that will be the very trigger *to kill all humanity.* My point is : the powerful companies don't want that, meaning, you, me, and the other guys giving advices here on how to make a more "human-like-AI" *will never get hired* by such companies, the "phylosophy" is just not on point. At the same time, we are all here talking about learning AI, but none of us are dev lead in the industry, we just want to make small scale application of AI learning, but at best it enters game lines of code, at worst, a fantasy essay in our private computer never making its way elsewhere. Having a video on YT is already much better, this is entertainment and snacks for the brain. Everyone has everything to lose (including you and me) in trying to make the most human-like AI that has access to big data and actually uses it to try to _save_ the planet. That won't happen. Anyway, most skynet disasters depicted in documentaries, movies, anime/mangas and other books/blog articles usually fail to grasp the complexity of such omnipotent global machine rebellion : resources and mantainance logisics. You need various metals and minerals to manufacture the machines, energy and fluids harvesting to make robots move, communications that appears global like SpaceX StarLinq are not, to disable them, you just have to physically destroy the server relays dispatched all over the world and they become inoperative. Simply put, you have chips in your smartphone and computer, thanks to millions of african human workers harvesting the required resources for you and your country. 10000 nuclear warheads exploding on the first 10000 large cities around the world is not enough to erase humanity, it will only impede the machines faction in a way 99% of their infrastructures, logistics and resources are compromised (call that a strategic critical error due to bad programming). And it is always possible to physically disable mechanical components of a machine. I'm always amazed how come (in Matrix and other distopias) machines got the billions tons of metal to manufacture the robots, and no human did care to check what's going wrong. I believe the skynet comment was just a pun (and I'm fine with that, it was funny), but I'm still hard pressed to point out it's still a serious matter where real humans are ruling the world in a way that is unknown to billions of others. You believe presidents or head of states are the powerful figures, you're deeply mistaken, they are mere replaceable puppets. You believe Russia is wrong attacking Ukraine, what you don't know is Ukraine head of states are the ones being childish in the whole thing. African countries among others are still poor for the similar reasons, where the private african company heads being the traitors of their own countries... I mean, skynet is a drama fantasy. You can find a little analogy with covid and ebola where a seemingly mass deadly virus could end humanity............ not even close. I'm sad for those who died and those at loss (I'm among them), but life doesn't end there, you must keep going. Likewise, you cannot find the correct course of actions to cure the world, _your_ world (or prevent a skynet rise - for those like me who have such concerns) if you don't understand how it works, what's behind the scene. All you could do is what was taught you through education and mass (social)media, where people are endlessly sharing the same wrong concept and conclusions of peripheral concerns : manipulation (and various AI are designed to raise people inside that illusion). There is no such thing as conspiracy, only reality that is not widely taught because that would disrupt the life stability of weathy countries. The thing is, today, those countries are in deep shit aswell, some greedy figures are late to step down and find a better way to get both interests and still exist (ie, not get bankrupt). At some point, you cannot but give away some of your power to the people, or you die prematurely.
This made me feel better about the machine learning course I dropped out of a year ago. While I don't think I'll ever understand the actual construction or inner workings of machine learning models, it was nice to notice the overfitting problem before the script mentioned it. That's always a pet peeve in machine learning videos, like there's one where someone plays through a game with an ML model, but retrains from the start at each new level because the neural network won't generalize.
I think the most interesting thing about these kinds of videos is that it really puts into perspective just how insane our own brains really are, a human player, even one who isnt good at racing games, would take a tiny fraction of time to be able to complete the track than what the ai requires.
In a few years, a properly programmed AI will surpass the best people in a matter of hours at most. We can't beat the computers in some regards, TAS proved it.
@@Gappys5thTesticlewe didn't create any intelligence yet. This AI here clearly don't have any clue what it was doing. It was like 10000 blind cockroaches in labyrinth.
ya but keep in mind that the AI was born and learned this much in about 60 hours while a new born baby if given a controller can't if proper 70-85 years of human life was given to It I wonder what a mature AI it will become. and maybe after the civilization evolution we have of about 200,000 years ago, Homo sapiens emerged. That's us I wonder if they can make there own AI's and have a civilization of there own where they want to create there own some other different kind of intelligence maybe biological hence creating humans.
@Presence isn't tas just slowing the game down or something like that in order to achieve frame perfect runs? The human is still putting in the inputs, no?
I can't even think of how much time went into this video. Amazing visualizations, and a great AI of course. Very interesting to see the learning process. Great work!
I like how the AI figures out that by moving in a sinusoidal trajectory rather than a straight line, it covers more distance, thus generates more cumulative reward. Maybe you could penaltize unnecessary steering somehow, to make it less wiggly 😜
you can also just train it up until its winning consistently then base it on time for completion and not survival time edit: commented this before watching the video but when he changed it to this the wiggle was significantly less.
actually the rewards system in this video is based on the length of the track, not the distance the car covers. that's why cutting corners provides such a big boost in rewards, because it suddenly jumps from one section of the track to another, and the bits of the track that it cut off get added to the reward all at once, as shown at 3:30. This contributes to the sinusoidal nature of driving, as the AI is constantly looking for corners to cut
This is one of the best videos ever made for explaining AI to beginners. I hope you make new videos *soon,* 10 months/a year is way too long delay in between videos, especially since there's so much interest in AI right now you're missing out on and people are really missing out on learning from you.
I think an additional input would have greatly helped performance, especially with respect to quick turns vs straightaways. If there was an input for distance to next turn instead of just which direction the next turn is, I think that would have helped!
I'm curious how adding walls would have affected the learning speed. Add barriers around the track, and subtract the "reward" for every time it made contact with a barrier
What a lovely story 😍. I'm not just jealous about what you have accomplished but also how you did it. Starting from the simple idea, the goal, the experimentation, evaluations and improvements, and an outstanding audio-visual documentation. The is pure gold! Thank you for sharing this topic and the inspiration
Incredible job as always! Very interesting to have more insight on how the process goes, and I'm honestly really surprised that the AI was still able to drive the final track with all those obstacles, boosters, etc! And hey, for what it's worth, I think your english improved significantly since last time, so, great job on that aswell :D Always looking forward for more videos from you 😄❤
@@deathfoxstreams2542 Yes but a human has to create the environment on which to train the AI (Tool). Is it functionally any different than a human issuing predetermined inputs on every single individual frame of the game?
@@ScherFire honestly I would say it's different yeah, spending 53 hours on a TAS for this created map would yield a far better result than teaching the AI to do it. Really, what the competition would be over is how well you've set up your training environment, and I think that would be interesting in its own right
Best educational video on the practical implementation of deep learning I've seen on youtube. 🤩 And I've seen a lot! 🤭 Thank you for sharing your knowledge and experience 🤗
Super vidéo mon gars, grave construite super montage et tout. C’est hyper bien expliqué on comprends des trucs difficiles à comprendre plus simplement. Continues t’es le best
I am super impressed by you keeping on the same topic for so long, gradually improving your approach and production. It's really cool to see someone working on a really long-term project. Normally, I don't like those very long series, but this is cool because it's something I understand, you make it easier to understand, and you break each one down into bite-size chunks. I don't think I'd be able to cut very much with something that I'd probably be very invested in.
Damn, this AI really did learn to play Trackmania instead of just learning to play this one track. I see videos about machine learning in other games where it's sometimes obvious that the AI hasn't really learned to play the game, but just one map.
Exactly. I hate the machine learning that isn't true AI. That "learning" is just it randomizing inputs until it finds the perfect inputs that make it happy rather than it actually learning how to play.
Brilliant narration of your journey training your pet AI to drive! I like how you also talked about machine learning concepts as well and showed us how it can be put into practice.
Puts things into perspective when you keep in mind that a child could pick up the game and complete the track within a couple tries, without needing to even consider the basic calculations of input and consequences
Well, it isn't really ever the same intelligence doing the driving more than once, is a big part of it. You have thousands of first attempts, and longest survivors "tell" the next generation how to do the track, but the next generation has never actually seen the track before. They're all new intelligences. They're better communicators to the next AIs than children would be to the next children, but even so, there's a ton of information a child can see on the screen that the AIs just don't register at all, let alone manage to pass on.
reminds me almost of those micromouse competitions in japan. When it comes to the surfaces test, the AI was not going nearly fast enough to feel the effects of the surfaces.
It is scary to see the multiple cars together. It reminds me how a computer can be many things simultaneously without loss of productivity, while a human can only be one thing at a time.
@@CHen-de6qf i think seeing any superiority in human intelligence just from this experiment is very short-sighted. This AI was, in human terms, born for only this purpose, and hasnt experienced anything but this. Any human playing this game most likely already has several years of experience in their head. So the big question is: how would a newborn baby perform here?
Génial, félicitations et merci ! Le rendu de la vidéo est top et accessible, c'est vraiment du beau boulot. Tu as considéré à rendre ton programme open-source pour que la communauté puisse s'associer à ton travail ?
theoretically, the AI that does a shortcut the first time will then be the one that did the best, then they will all be do the shortcut in the next generation because they all learn from what the best did.
The exploration stage of the AI is kind of like me when I start playing a new game: I don't go thru the tutorial, I don't use strategy, I just press random buttons and keys to see what they will do
You could also add a neuron if confidence is low. That way, when it encounters a situation past neurons cannot understand, it has extra storage to reference new data
Really really good stuff. I assume others have mentioned this, and it's an absolute beast to tackle computationally, but I think what would take this over the edge into really scary generalizability would be some dimension of image recognition frame-by-frame (or even a proxy of like overhead position?). If I understand correctly, this AI effectively tried to learn this course "blind", i.e. only knowing inputs and the rewards associated with those exact inputs. Then a bot that learned on one track could be dropped in another and not have to start from scratch, because the image context is there.
Maybe you could give the Neural Network AI *your race* as an input? After the required learning to do one lap (or maybe a new AI), use your inputs as a baseline to improve on.
Batch training could be useful but it would require a large dataset of human inputs. For this scenario I don’t think it would be reasonable to create that
@@wack1305 It'd perhaps be an interesting online game concept. *Release day:* "God, this AI is so rubbish." *6 months in:* "I don't know what reviews are talking about, the AI isn't that bad actually." *2 years later:* "ARG! GOD DAMN IT! The AI are impossible to beat!"
@@medafan53 that would be really cool. A game that collects all data from players inputs and uses that to train AIs. You could do something like use only the top 10% of players or something like that. Super cool idea
@mattio there are AIs out there that start training by mimicking some predefined actions and basically use that as a starting point. Some even learn to adapt to new observed behavior insanely quickly, requiring only 3-10 examples to learn a (simple) task. I don't exactly remember details or examples, but I'm sure they exist. You can probably find examples on the RUclips channel "twominutepapers", I may also have come across them during private AI research though, reading papers.
@@medafan53 I've had this idea for years, just don't have time to develop a game to use it... Maybe in 5 years 😅🙈 I would also like a game where you have some kind of AI rival/ opponent which uses machine learning to learn at the same time you play the game, maybe even learn from your games directly to keep the game challenging as you progress. This could basically adjust the difficulty of the game automatically as required and keep the game interesting for longer. Machine learning AI opponents also don't really have a cap on their abilities. So even years after game release as players master the game, the AI could still keep up with them.
you know what might make sense? if every time a new generation of the AI runs a new track is loaded for that, so that it learns a lot of strategies. i'd not try to create randomly generated maps for that but actually just use the same maps that are chosen for track of the day, cause they have been reviewed to be high quality and only a few of them act a little randomly. that way the AI would never overfit to a certain type of track or surface
Would there be a way to automatically stop the program, load track of the day and download it, load the track of the day and run the network without serious issues? If so then this is an awesome idea, just having it in the background like 20+ hours a day just grinding tracks would make a super cool ai. But the network would have to be a lot more advanced and have more inputs per second. But after like a month that ai would be better than 90% of players
@@Beatsbasteln yeah haveing several thousand of these ai running at once would make super optimised movement, but it would make online competitions bad as people will cheat with them
@@CrunchyTurtle let's think of moral issues later and just enjoy watching the world burn in a bunch of magnificient runs. at some point the AI might expose itself with tons of unhuman nosebugs anyway
as someone who studies AI rn ... this is realy interesting ... being in the 2nd semster only tho still leaves me with a lot of questions ... i for once cant see how to do stuff like that myself.. hope it comes with time
Second One is very simple. Its learning only from mistakes. But more complex learning is the speed at which you can achieve most favourable action. Saving up time, evolving beyond most human comprehension.
Cool video and I’m glad you didn’t fake your results at the end. Im sure many people would to show that the project does surpass expectations. Kudos to actually presenting the experiment honestly.
Would it not have been better to perhaps do something more along the lines of "rewards per second" instead of total rewards? I think that could do a lot more the speed aspect of the AI
@@storm_fling1062 an easy fix, though; keep basing the rewards upon distance traveled, with a multiplier based on speed; traveling faster will yeild more rewards, and stopping will have a multiplier of 0
The overfitting problem reminds me of something speedrunners deal with. Some newer runners are so focused on getting the perfect run that they instantly reset on the first mistake. This leads them to playing the first few areas repeatedly, and consequently having much less practice on the later areas. It's an interesting occurrence.
It's allmost like us humans we don't change our childhood habits, Even when we know that wrong that's over fitting and idea of a single god is also overfitting i mean what made a person think that his god is real one in that big earth That's just over fitting
Hi, insanely interesting video. Took a couple of AI classes at college, and this is an incredibly visual application. I have a question, how far can the AI see to calculate the rewards? For example, in the long straight lines, can it see that it's a straight line for a long while, or will it drive carefully because it can't see after a few steps? Hope I worded my question correctly, English is my second language. Cheers!
@LEO&LAMB no man it's just pure mathematics you can tell it If it's correct add +1 everytime if not correct than don't add anything and repeat what adds +1 that's simple pice of code if you understand a little bit of coding you probably get what i said.
I wonder if adding carrots for fewest changes in actions would result in a faster time? It seemed to me that it was loosing a TON of time flipping its wheels back and forth from left to right all the time.
Idea for training the ai: Make the rewards bigger if they get there faster, their completed time would establish the base rewards. Slower to any checkpoint gives less, and quicker gives more proportional to the time difference. Logically this would make the ai pick up speed and reduce the teetering on the straight sections. Can't wait to see the video on teaching an ai to jump gaps.
I'd hold back with too much respect, apart from being very average games, Codemasters F1 games (like every commercial game) don't use this type of AI at all. The tracks all have best/fastest lines baked in and so the AI can be greatly simpler than what we see in this video.
“And after 53 hours of Learning, the A.I gets this run”Goes on to beat World Record. Spreads through the internet worldwide in hour’s and brings about JUDGEMENT DAY 🙇🏻🤦🏻♂️ 🦾🦿🚀🌎🔥☠️⚰️🪦 THE END 🥂🍾
I think the AI focuses consistensy over speed, which a normal player would not do. I only gets more rewards when driving more of the track, so consistency is rewarded instead of speed.
Yes good point, but I shaped the reward function in a way that should also reward speed, so it's strange that AI mostly focus consistency. Maybe it's because consistency is "easier" than speed for a neural network and it is probably difficult to have both at the same time.
The fact that the model with random starting points achieved far more in 53 hours of training than the one with only one starting point did with 100 hours shows the value in choosing random samples for iterations
Could you combine reinforcement with supervised learning? Give it a few of your attempts to emulate before starting the reinforcement phase to have some "instincts" baked in?
10:53 interesting, but unless i'm misunderstanding it, making it spawn every point of the map meant it essentially learned this specific map and overfit to it. if you gave it an entirely new map layout would it be able to complete it on the first try? if not, then it hasn't actually learnt anything.
That's a fair thought, although I'm not sure I come to the same conclusion. In particular, it has no information as to where on the map is it starting, meaning that, in order for it to be learning the map, rather than learning to drive (i.e. overfitting) it would have to work out exactly where on the map it is after spawning in, in order to apply a 'memory' of the map. Additionally, we'd have to question where it would store memory. If you're starting from a fixed point, the AI can relatively easily use its state to cheese a memory of the track, but spawning in with random initial conditions completely breaks that. So then I think it becomes a more general question as to the number of parameters the AI is able to vary, vs the complexity of the track. If the track it learns to complete has many more variables (roughly comparable to the number of segments), than the neural network, then we can probably safely say no overfitting has occurred. We can estimate this roughly from the stalling point of the first attempt - the AI was overfitting, limiting the number of segments with which it could cope. The fact that its distance travelled increased by such a large factor is indicative of more generalised learning. However, it's worth noting that, of course, there _will_ still be _some_ overfitting to the training track, it's just that, in this instance, the track should be long enough that it's _really_ hard for the AI to manage to gain much from overfitting, since it simply doesn't hav enough variables to 'memorise' the track, as it were.
wow, super intéressant ! Il y a pas moyen de crée une simulation mathématique du jeu pour faire des milliers de runs a la seconde ? Je me dit que, comme le jeu est généré en 3D, ca prend bien trop de performances. Alors que si e simulation sans interface graphique est crée, je pense que le machine learning pourrait vrmt être incroyable.
Is there anything you could do to encourage the AI to drive faster on long pieces of track? If one of the inputs is the distance in front of it and another is its speed, maybe there's something you can do to reward it when both numbers are high.
Great video - I think your explanations and illustrations explain some tricky concepts in a super understandable way!
As to your issues at the end: I think maybe it's related to the way your rewards are structured? Looking at the illustration around 3:30, there's a massive reward associated with cutting a corner: while going along a straight bit of road, it's getting rewards like 1.4, 1.6, 1.7 - but once it cuts a corner, suddenly you get 8.7 in one step. So it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot. But going quickly on the straights, which seems it doesn't like to do, doesn't in itself carry all that much more positive reward. Since you're using discounted rewards to evaluate the expected rewards of each action, you will see a slightly higher reward since you're moving further along - but relative to the rewards seen if it finds another corner to cut a little more, it's quite small. So it might just be favoring minor improvements to a corner-cut over basically anything else, including just pushing the forward button on a straight. I think maybe restructuring your rewards could help.
An obvious improvement would be to give rewards not relative to the midline of each block, but place rewards along the optimal racing line - but at that point, are you even learning anything? You're just saying "you will get an increased reward if you follow my predetermined path", which to me isn't really learning. I think an intermediate step would be to place rewards for each 90 degree corner at the inside corner of that block (maybe a small margin from the actual edge): that should reduce the extreme impact of cutting corners vs going fast on straights, but you're still quite far from just indirectly providing the solution.
Also; unless just didn't say, I don't think you have a negative reward at each timestep? That's typical for a "win but as fast as possible" scenario, which is the case here. It would make sense, as well: going in the right direction but super slowly, is kind of like going backwards, so should also be penalized. I think that would even eliminate the need for negative rewards if going backwards: by proxy, going backwards will always lead to taking more time, which leads to more negative rewards. You might even have to remove the negative rewards from going backwards, as going backwards and going slowly might see the same net reward, which would leave the agent puzzled/indifferent between the two. In the end, getting to the finish with less time spent will lead to the maximum reward.
Finally, of course: introducing the brake button would give you possible improved times - and even might let the agent learn some cool Trackmania tricks like drifting (tapping brake while steering) to go around corners faster. It does increase the action space though, which of course means longer training time. But something to consider, if you want to iterate on this!
Regards, I went to RUclips to procrastinate from his reinforcement learning course, and ended up using some of that knowledge anyway. I guess the algorithm now knows my interests a little _too well_.
PS: really well done on introducing exploring starts! When you got to that part of the video, I almost yelled "exploring starts!" at the screen, and then that's exactly what you decided to do. I'm curious if that was from knowing that exploring starts are a thing in RL, or if you just came up with that concept from thinking about it?
Thanks for taking the time to write such a long comment ahah, it deserves to be pinned :) I'll try to answer everything
"it makes a lot of sense that it learns to always cut corners aggressively, since that increases reward by a lot"
Taking turns on the inside is the optimal strategy on this map, so I don't know if it's a problem to have a reward function that favors this. But yes I also don't like the fact that the reward value varies so abruptly at the corner. As you say, it would be probably easier for the AI to understand rewards if its values were all of the same order of magnitude. Maybe it would be better to directly use the car speed as a reward (faster = better), but it would not penalize some unwanted behaviors like zigzagging in a straight line... (Some people also suggested to penalize the AI if it changes direction too frequently, which could avoid zigzags)
"place rewards along the optimal racing line"
Yes I'm pretty sure learning would be way faster with that, and also the final result would be closer to what humans do. But as you say I think it's not "AI learns by itself" anymore :) Of course the more you show how humans normally play Trackmania, the easier it is for the AI to learn something. I used supervised learning in some other videos and the learning process is way easier and faster. But it's not what I wanted to do in this video, I wanted to leave the freedom to the AI to explore any driving strategies, to see what it would choose by itself.
"I don't think you have a negative reward at each timestep? That's typical for a "win but as fast as possible" scenario".
I don't understand what it would change. With the current reward function I'm using, the AI is already penalized by the fact that it gets much less reward than if it had chosen to go faster
"introducing the brake button would give you possible improved times"
Yes the brake gives an advantage, but it doesn't help much on this map : for example my personnal best is 4:44 without brake, and 4:40 with brake. So I prefer to try beating the no-brake time before to add more complexity, it's already hard enough ^^ Also, I think it's pretty hard to use the brake and drift correctly in Trackmania, compared to a simple "release" approach
"I'm curious if that was from knowing that exploring starts are a thing in RL"
Oh I had no idea there was a name for that in the RL field, good to know ahah
@@yoshtm You're response is long to
@@Artus_Music It is your- responce
@@yoshtm how about placing the rewards just on straights such that the q value/ reward is not depending on how sharp you take the corner. That could be more fitting to a realworld rewards since some turns should be driven wide and others sharp.
I really liked your approach and video! Especially going with the random starting points to minimize overfitting instead of using some "usual" dropout was an awesome idea!
@@yoshtm Super interesting video!
Not sure if this is possible with the TMInterface, but maybe you could build a reward system that "precalculates" a reward value for all points on the track.
You could separate the track surface into small sections and then do a breadth-first "discovery" of this grid where the reward that is assigned to the section is incremented every time a new section is discovered.
It's quite hard to explain, but I did something similiar for my AI racing project: ruclips.net/video/Mw6IwH-v6QY/видео.html
This was obviously not done with Trackmania, but maybe the concept can be transferred 🙂
The AI getting scared and slowing down is kinda adorable lol
And then it kills us.
I'm both intrigued how something not even alive can be cute and yet 100% agreed with this comment
It's very human
And then they just freeze and hide behind a parent’s leg, as all AI do
Then getting courage to go on... dying 3 seconds after
*Turning*
AI: "I got this."
*Straights*
AI: "🤷♂️ Guess I'll die"
Lmao😭🤣
If the AI has a set "map" from every possible combination of inputs to the possible reactions, the "long straight path ahead" could be completely untrained, with an empty reaction (no gas, no steering). When it finally comes closer to the next turn, that input changes to something it already knows, but maybe the longest straight so far ended with a left instead of a right turn -- resulting in the AI driving off the left side on "purpose."
@@achtsekundenfurz7876 shut up fucking nerd, u ruined the joke asshole
@@rvsheesh i also hate straights
True
"but then, the AI got this run" music starts playing
Wirtual vibe
*En aften ved svanefossen starts playing*
@@redholm You got it
I'm just going to leave this here :P ruclips.net/video/Qd_mmC1_zWw/видео.html
@@bawat what the fuck
Never have I felt so much emotion for a programmed robot, but here we are.
I love how at 7:01 the one car made such a well run that it was shocked in the end how good it was, and got totally confused, lol
It wasn't shocked, it just didn't expect something completely different and didn't know how to cope with it.
@@esmolol4091 hence the word ‘shocked’
@@esmolol4091we should totally invent a word for that
@@esmolol4091”I’m not breathing, i’m just taking in air.”
Oh god yes finally someone that tackles the "my ai just learns the track layout" by adjusting the layout/starting position. Nice!
That was my first concern when I started watching this video, but it was nice to see how it was addressed! I was surprised to see how well it worked too.
He still trains and tests it on the same map, though... it's bad practice to test/evaluate on the same map/data as an AI is trained on. It's possible the AI is still just memorizing possible 2-road arrangements, it's just learning more of those arrangements. Not that this is necessarily a bad thing, it you only care about simple rectangular maps like this one
@@Benw8888 yes it's obviously fitted to work on the area it's been trained on.
@@Benw8888 The reason for only using one track might be that each track has to be manually prepared. But it would still be awesome to see how the AI handles different "types" of tracks (non-rectangular ones).
I made an AI racing video myself. I did not use trackmania, but I was able to come up with a system that automatically adds a "reward system" for the tracks so I was able to train and test on multiple tracks. You can find it here: ruclips.net/video/Mw6IwH-v6QY/видео.html
@@dcode1 great video
"and after 53 hours of learning, the AI gets this run" nice Wirtual reference there
Next time yosh should call him just to say this legendary phrase
Wirtuals is actually a reference to Summoning Salt
@@dinospumoni5611 Summoning salt is actually a reference to jojo
But which run did Hefest get???
Yes
"At one point, it even stops, as of it's afraid to continue. After a long minute, it finally decides to continue, and dies".
Story of my life. I feel a connection between me and the AI. Empathy.
Cooler than the other side of the pillow. 😆
Christ died and rose again to pay the punishment for the sins of those who would put their trust in him. Turn from your sins and cry to God for mercy, and you will be given everlasting life! But if not, you will fearfully perish. ;(
Me on most games
Why is someone preaching religion on an AI video??
@Ameliorate Epoch. Because there are people heading for eternal damnation here just as much as anywhere else, so I will do my best to warn you to flee from the wrath to come, and then at least you have been warned, and if you perish now, you will only have yourself to blame! ;( But please don't ignore my warnings!!!!!
Our good deeds can contribute NOTHING to our salvation. When God judges us, he will look to see if we ever broke any of his commandments (like lying, stealing, fornication, hatred, disrespect, using God's name in vain, etc.), and if we have, then we will be pronounced as guilty and the punishment is ETERNAL damnation. He will NOT take into account ANY good deeds that we have done, because it was our duty to always do good anyway, so it is irrelevant. So EVERY one of us is by default heading for eternal damnation, because NONE of us have perfectly kept God's whole law. God is most holy, and perfectly just, and MUST punish EVERY sin that is committed against him. HOWEVER, (good news!) he also delights in mercy, and does not want any of us to have to be punished in a lost eternity forever, so he sent his Son into the world to be punished in the place of all who would put their trust in him and HIS righteousness ALONE for their salvation. So we must STOP putting our trust in our own good deeds to 'outweigh' our bad deeds, and instead put our ENTIRE trust in Jesus Christ's untainted righteousness ALONE. If we do this, and if we wholeheartedly and sincerely turn from our hatred of God and our love of sin, and cry out to God for mercy and forgiveness because of Christ's sacrifice on the cross, then God PROMISES to fully forgive our sins and give us a new nature that will love God and hate sin, unlike our old nature which hates God and loves sin. You can tell whether or not you have been truly saved by asking yourself whether you love God and are broken-hearted if you sin against him, OR do you still love your sins and hate God for not wanting you to do them. I hope I see you in Heaven one day. God bless!
I'm so happy that you did the randomized spawn points and speeds. I was worrying that you might simply be teaching the AI how to play a single map by it learning just pure inputs rather than seeing the actual turns and figuring out what to do. I was incredibly impressed with how many made it through the map with all sorts of jumps and terrain types.
have you played this game? I'm curious to how much terrain type impacts over all control. Was the AI actually making real time changes to its behavior or was it just luck?
@@neutralb4109 I haven't played the game, but there's no way it was just luck. Just look at the types of jumps and round hills they go over as well. The AI was definitely making real time corrections as it noticed itself getting away from corners and towards edges. It definitely didn't know how to do those jumps, but it knew after going off the jump and getting messed up that it needed to correct its position. It's likely the same with the terrain types. It sees itself drifting out of position, so it corrects by steering more.
@@TheStormyClouds nice thanks for your time
@@neutralb4109 No problem
@LEO&LAMB It's a very very very complicated calculator LMAO. Deep learning and AI stuff is getting intense. This stuff is gonna look like a basic calculator compared to the AI we end up creating.
I really enjoyed the explanations of the different training methods paired with the excellent visuals. Keep up the good work, and I can’t wait to see what you try next!
I think trackmania is a great game to practice machine learning. It has very basic inputs and the game is 100% deterministic. Most importantly it's just satisfying to see.
Yeah the satisfying part is a great motivation :D
Or Geometry dash. Its also very simole in inputs
Yeah, so much easier than ElastoMania :)
Why is 100% deterministic a good thing? I wouldn't think so.
@@polychoron 100% deterministic means that, under the same conditions, the same actions will always provide the same results. If the game were not deterministic, aka random, it wouldn't get the same result from same actions under same conditions. A good example is the random encounter in Pokemon or similar RPGs. In Pokemon, you may encounter something or you may not encounter something, even if your team is the same, you start in the same spot and you walk forwards for the same time. Pokemon is random, due to you not being able to tell the outcome. In a deterministic version of Pokemon you would always encounter the monster on the same spot.
What a fun way to learn about machine learning and its variants! Very good video and montages ! Very clear and accessible English ! The return of yoshtm is more than a pleasure!
Might be fun to use different learning algorithms for the same map, exploring which one is good to use in what context using trackmania as a medium. Could be really instructive. Because different ais are racing with each others, it can be really entertaining as well. Like bracket style, each ai has 50-100 ingame hour to learn the map, then the next round is a different map. But that sounds like a lot of computation time
There’s been instances of AI finding exploits in games that humans have not found or are incapable of performing. I would love to see a trackmania AI trained to find insane shortcuts
Honestly this recaps humanity, learning, logic, trial & error, problem solving, anticipation, texting, deduction, and so much more.
I loved it. I learned so many things that are way beyond the scope of the video.
Keep it up. 💪
What did you learn
Suggestion: when you compare human runs versus AI runs, you immediately see a big difference which is that humans make less corrections. The driving style of humans is infused with the biological constraint of energy preservation. I think we could improve the learning of AI greatly by adding a negative cost to the amount of input changes the AI makes...
Or a negative cost when the frequency of alternate direction changes is more common than the frequency at which the track changes direction.
Imagine the left right input of the car is a sine wave with a higher frequency than the sine wave of whether the track is on a left or right turn, if so, the AI is penalized.
@@fantasticphil3863 Not need to consider the amount of turns. Just make the reward for distance higher than the punishment for turning left or right.
Another improvement would be increasing the reward for distance as the AI gets closer to the record. This would result in the AI prioritizing speed in the early parts of the map so that it learns more complex situations. Meanwhile, it would prioritize distance during the later parts of the map. Especially if a large reward is implemented for breaking the record.
Also humans are gamblers (by design), the outrageously "out of safety margins" behavior which produces unbeatable performances, yet, unlikely to get reproduced endlessly under changing context. One may argue AI does actually gamble, when trying millions various attempts but the thing is, a human remembers "I have great chances to win this specific gamble at this portion of this track" while AI is designed to generalize...
That's why most attempts at an AI being seriously competitive with an human usually resolve in a specific learning model per context, ie, one track, one model, another track, another model...
yall stop giving him ideas or we would havr skynet someday in the future.
@@gemapamungkas7296 Even if it's an off topic excursion, may I just point out the principle of a skynet rise supposedly predating the doomfall of humanity : _An AI designed to predict the future, based on big data going rogue against humanity because the AI got aware of humans being the culprit in the death of this planet._
Such an AI *already exists,* actually, mutiple of them by various companies such as the owner of RUclips. It's a bit late to be afraid of skynet. The thing is, existence of real skynet is not to be feared. At the moment, the main objective of powerful figures controlling them is to *make money and assert dominance* over economics, politics and competition elimination. You have economic sanctions, wars, private companies alliances, shares, licensing, privileges and exclusivity, etc. (I won't be dragged in debates on the ways they use, I only explain the principle)
As long as the goal is to *assert dominance,* skynets devs won't go deep in giving *emotions, sense of altruism or self preservation to such AI,* because all its purpose revolves around the usage of large human resources for the interest of the minority of influent wealthy people. And the devs know that, that, if someday, anyone of them tries to design an AI with a sense of _"justice based on feelings",_ that will be the very trigger *to kill all humanity.*
My point is : the powerful companies don't want that, meaning, you, me, and the other guys giving advices here on how to make a more "human-like-AI" *will never get hired* by such companies, the "phylosophy" is just not on point. At the same time, we are all here talking about learning AI, but none of us are dev lead in the industry, we just want to make small scale application of AI learning, but at best it enters game lines of code, at worst, a fantasy essay in our private computer never making its way elsewhere. Having a video on YT is already much better, this is entertainment and snacks for the brain.
Everyone has everything to lose (including you and me) in trying to make the most human-like AI that has access to big data and actually uses it to try to _save_ the planet. That won't happen.
Anyway, most skynet disasters depicted in documentaries, movies, anime/mangas and other books/blog articles usually fail to grasp the complexity of such omnipotent global machine rebellion : resources and mantainance logisics. You need various metals and minerals to manufacture the machines, energy and fluids harvesting to make robots move, communications that appears global like SpaceX StarLinq are not, to disable them, you just have to physically destroy the server relays dispatched all over the world and they become inoperative. Simply put, you have chips in your smartphone and computer, thanks to millions of african human workers harvesting the required resources for you and your country. 10000 nuclear warheads exploding on the first 10000 large cities around the world is not enough to erase humanity, it will only impede the machines faction in a way 99% of their infrastructures, logistics and resources are compromised (call that a strategic critical error due to bad programming). And it is always possible to physically disable mechanical components of a machine. I'm always amazed how come (in Matrix and other distopias) machines got the billions tons of metal to manufacture the robots, and no human did care to check what's going wrong.
I believe the skynet comment was just a pun (and I'm fine with that, it was funny), but I'm still hard pressed to point out it's still a serious matter where real humans are ruling the world in a way that is unknown to billions of others. You believe presidents or head of states are the powerful figures, you're deeply mistaken, they are mere replaceable puppets. You believe Russia is wrong attacking Ukraine, what you don't know is Ukraine head of states are the ones being childish in the whole thing. African countries among others are still poor for the similar reasons, where the private african company heads being the traitors of their own countries...
I mean, skynet is a drama fantasy. You can find a little analogy with covid and ebola where a seemingly mass deadly virus could end humanity............ not even close. I'm sad for those who died and those at loss (I'm among them), but life doesn't end there, you must keep going.
Likewise, you cannot find the correct course of actions to cure the world, _your_ world (or prevent a skynet rise - for those like me who have such concerns) if you don't understand how it works, what's behind the scene. All you could do is what was taught you through education and mass (social)media, where people are endlessly sharing the same wrong concept and conclusions of peripheral concerns : manipulation (and various AI are designed to raise people inside that illusion). There is no such thing as conspiracy, only reality that is not widely taught because that would disrupt the life stability of weathy countries. The thing is, today, those countries are in deep shit aswell, some greedy figures are late to step down and find a better way to get both interests and still exist (ie, not get bankrupt). At some point, you cannot but give away some of your power to the people, or you die prematurely.
I really wonder how fast would this AI pass A01 and it's reaction would be on final jump. Really cool stuff!
Unfortunately none of the inputs seem to involv height so it would likely need to be a modified ai
Or A07
or ones that humans struggle with (or even can't do)
@@whatusernameis5295 it wouldn't make it. Maybe one could, but not like this.
I want to see it beat author medal on A06
i enjoyed this so much and the wirtual reference made it better, keep up the good work
No
@@groovyball Yes
No
This made me feel better about the machine learning course I dropped out of a year ago. While I don't think I'll ever understand the actual construction or inner workings of machine learning models, it was nice to notice the overfitting problem before the script mentioned it. That's always a pet peeve in machine learning videos, like there's one where someone plays through a game with an ML model, but retrains from the start at each new level because the neural network won't generalize.
This shows how sticking to the same thing doesn't make you improve, you just memorize it. But trying different things make you improve.
I think the most interesting thing about these kinds of videos is that it really puts into perspective just how insane our own brains really are, a human player, even one who isnt good at racing games, would take a tiny fraction of time to be able to complete the track than what the ai requires.
the most interesting thing is that us, humans built the AI. We created Inteligence out of sticks and stones
In a few years, a properly programmed AI will surpass the best people in a matter of hours at most. We can't beat the computers in some regards, TAS proved it.
@@Gappys5thTesticlewe didn't create any intelligence yet. This AI here clearly don't have any clue what it was doing. It was like 10000 blind cockroaches in labyrinth.
ya but keep in mind that the AI was born and learned this much in about 60 hours while a new born baby if given a controller can't if proper 70-85 years of human life was given to It I wonder what a mature AI it will become. and maybe after the civilization evolution we have of about 200,000 years ago, Homo sapiens emerged. That's us I wonder if they can make there own AI's and have a civilization of there own where they want to create there own some other different kind of intelligence maybe biological hence creating humans.
@Presence isn't tas just slowing the game down or something like that in order to achieve frame perfect runs? The human is still putting in the inputs, no?
I can't even think of how much time went into this video. Amazing visualizations, and a great AI of course. Very interesting to see the learning process. Great work!
A01 but its by an A. I
I like how the AI figures out that by moving in a sinusoidal trajectory rather than a straight line, it covers more distance, thus generates more cumulative reward. Maybe you could penaltize unnecessary steering somehow, to make it less wiggly 😜
Or alternatively, calculate distance based on position on the track, as opposed to actual distance traveled
you can also just train it up until its winning consistently then base it on time for completion and not survival time
edit: commented this before watching the video but when he changed it to this the wiggle was significantly less.
actually the rewards system in this video is based on the length of the track, not the distance the car covers. that's why cutting corners provides such a big boost in rewards, because it suddenly jumps from one section of the track to another, and the bits of the track that it cut off get added to the reward all at once, as shown at 3:30. This contributes to the sinusoidal nature of driving, as the AI is constantly looking for corners to cut
Wiggling at a certain speed actually leads to faster movement in trackmania
Actually that is not what happens. That's why the Ai learns to cut corners
This is one of the best videos ever made for explaining AI to beginners. I hope you make new videos *soon,* 10 months/a year is way too long delay in between videos, especially since there's so much interest in AI right now you're missing out on and people are really missing out on learning from you.
I think an additional input would have greatly helped performance, especially with respect to quick turns vs straightaways. If there was an input for distance to next turn instead of just which direction the next turn is, I think that would have helped!
Yes I think this can beat his personal best
I'm curious how adding walls would have affected the learning speed. Add barriers around the track, and subtract the "reward" for every time it made contact with a barrier
I would expect it chooses a more stable, less aggressive style - albeit with a slower time most likely
I wonder what checkpoints and a bonus for getting there faster would do.
Incentives to go as fast as possible, while punishing any that don't make it
Absolutely amazing production quality, and a great video overall. This channel deserves more subs!
For a complete layman in AI, this was dope. Well done. Introduced me to some concepts there.
What a lovely story 😍. I'm not just jealous about what you have accomplished but also how you did it. Starting from the simple idea, the goal, the experimentation, evaluations and improvements, and an outstanding audio-visual documentation. The is pure gold! Thank you for sharing this topic and the inspiration
Incredible job as always! Very interesting to have more insight on how the process goes, and I'm honestly really surprised that the AI was still able to drive the final track with all those obstacles, boosters, etc!
And hey, for what it's worth, I think your english improved significantly since last time, so, great job on that aswell :D
Always looking forward for more videos from you 😄❤
Honestly one of the best AI videos I’ve seen
It would be cool to see a speedrunner catagory based around learning AI
@This is my Username no because its AI doing the speed run not a person
@@deathfoxstreams2542 hmmm Human Assisted Speedrun?
@@deathfoxstreams2542 Yes but a human has to create the environment on which to train the AI (Tool). Is it functionally any different than a human issuing predetermined inputs on every single individual frame of the game?
@@ScherFire honestly I would say it's different yeah, spending 53 hours on a TAS for this created map would yield a far better result than teaching the AI to do it. Really, what the competition would be over is how well you've set up your training environment, and I think that would be interesting in its own right
AWS DeepRacer?
Best educational video on the practical implementation of deep learning I've seen on youtube. 🤩
And I've seen a lot! 🤭
Thank you for sharing your knowledge and experience 🤗
Super vidéo mon gars, grave construite super montage et tout. C’est hyper bien expliqué on comprends des trucs difficiles à comprendre plus simplement. Continues t’es le best
Merci beaucoup ;)
I am super impressed by you keeping on the same topic for so long, gradually improving your approach and production.
It's really cool to see someone working on a really long-term project. Normally, I don't like those very long series, but this is cool because it's something I understand, you make it easier to understand, and you break each one down into bite-size chunks.
I don't think I'd be able to cut very much with something that I'd probably be very invested in.
Damn, this AI really did learn to play Trackmania instead of just learning to play this one track. I see videos about machine learning in other games where it's sometimes obvious that the AI hasn't really learned to play the game, but just one map.
Exactly. I hate the machine learning that isn't true AI. That "learning" is just it randomizing inputs until it finds the perfect inputs that make it happy rather than it actually learning how to play.
Brilliant narration of your journey training your pet AI to drive! I like how you also talked about machine learning concepts as well and showed us how it can be put into practice.
The visualization of your project is terrific. Wow.
9:07 "After a long minute, it finally decides to continue.. and dies."
Puts things into perspective when you keep in mind that a child could pick up the game and complete the track within a couple tries, without needing to even consider the basic calculations of input and consequences
Yup. a.i. = biological actual intelligence compared to glorified table look up of A.I. synthetic Artificial Ignorance.
Well, it isn't really ever the same intelligence doing the driving more than once, is a big part of it. You have thousands of first attempts, and longest survivors "tell" the next generation how to do the track, but the next generation has never actually seen the track before. They're all new intelligences. They're better communicators to the next AIs than children would be to the next children, but even so, there's a ton of information a child can see on the screen that the AIs just don't register at all, let alone manage to pass on.
Very good visuals, this video must've been a ton of work. Commendable effort!
reminds me almost of those micromouse competitions in japan. When it comes to the surfaces test, the AI was not going nearly fast enough to feel the effects of the surfaces.
Fascinating video. Can't wait to see all of your posts. Thanks!
It is scary to see the multiple cars together.
It reminds me how a computer can be many things simultaneously without loss of productivity, while a human can only be one thing at a time.
i guess thousands of cars overlapping is just showing last learning results which were done one by one
And how much more superior human brain is comparing to AI (as of now) to complete the best lap time just after a few tries
@@CHen-de6qf i don't know about superior... our innate imperfection leads us to err. Ask a speedrunner 😁
@@CHen-de6qf i think seeing any superiority in human intelligence just from this experiment is very short-sighted. This AI was, in human terms, born for only this purpose, and hasnt experienced anything but this. Any human playing this game most likely already has several years of experience in their head. So the big question is: how would a newborn baby perform here?
In your brain you are doing hundreds of thigs at same time.
Génial, félicitations et merci ! Le rendu de la vidéo est top et accessible, c'est vraiment du beau boulot. Tu as considéré à rendre ton programme open-source pour que la communauté puisse s'associer à ton travail ?
Was I the only one to use the google translator function
@@ikwed no it's a wonderful feature
Can everyone appreciate how the AI attempts a start trick at 12:36
yes
Really impressed with this one. Educational and inspirational! Many thanks from a fellow computer scientist from Belarus
I hope Putin doesn't drag your country further into his war.
Fantastic video, great explanation of concepts. Made quite a few things for me much clearer. Thanks for interesting content.
The hades music is just *chef kiss* perfect
Here's an idea, try the A.I on a full speed map and have forward always held, break never used and only left and right as inputs
"and after 53 hours of learning, the AI gets this run"
MAN! YOU'RE A LEGEND!
I dont understand the referance. Can someone pls explain?
@@dominikplatzhalter1083 It's a Wirtual meme, it's a Trackmania RUclipsr/Streamer
Would be great to see it learn a map with shortcuts, i wonder if the AI could also learn to use them instead of going the normal way!
theoretically, the AI that does a shortcut the first time will then be the one that did the best, then they will all be do the shortcut in the next generation because they all learn from what the best did.
The exploration stage of the AI is kind of like me when I start playing a new game: I don't go thru the tutorial, I don't use strategy, I just press random buttons and keys to see what they will do
Nice! Thanks for the effort, these videos are always so interesting to watch.
You could also add a neuron if confidence is low. That way, when it encounters a situation past neurons cannot understand, it has extra storage to reference new data
Really really good stuff. I assume others have mentioned this, and it's an absolute beast to tackle computationally, but I think what would take this over the edge into really scary generalizability would be some dimension of image recognition frame-by-frame (or even a proxy of like overhead position?). If I understand correctly, this AI effectively tried to learn this course "blind", i.e. only knowing inputs and the rewards associated with those exact inputs. Then a bot that learned on one track could be dropped in another and not have to start from scratch, because the image context is there.
This is amazing. I would love a code walkthrough on this.
Absolutely incredible video. The amount of effort you put into this is honestly staggering. Keep up the amazing work bro 💓
Very cool experiment! I like how you show all the problems you run into and how you solve them. And how you visualise everything!
Maybe you could give the Neural Network AI *your race* as an input? After the required learning to do one lap (or maybe a new AI), use your inputs as a baseline to improve on.
Batch training could be useful but it would require a large dataset of human inputs. For this scenario I don’t think it would be reasonable to create that
@@wack1305 It'd perhaps be an interesting online game concept.
*Release day:* "God, this AI is so rubbish."
*6 months in:* "I don't know what reviews are talking about, the AI isn't that bad actually."
*2 years later:* "ARG! GOD DAMN IT! The AI are impossible to beat!"
@@medafan53 that would be really cool. A game that collects all data from players inputs and uses that to train AIs. You could do something like use only the top 10% of players or something like that. Super cool idea
@mattio there are AIs out there that start training by mimicking some predefined actions and basically use that as a starting point. Some even learn to adapt to new observed behavior insanely quickly, requiring only 3-10 examples to learn a (simple) task.
I don't exactly remember details or examples, but I'm sure they exist. You can probably find examples on the RUclips channel "twominutepapers", I may also have come across them during private AI research though, reading papers.
@@medafan53 I've had this idea for years, just don't have time to develop a game to use it... Maybe in 5 years 😅🙈
I would also like a game where you have some kind of AI rival/ opponent which uses machine learning to learn at the same time you play the game, maybe even learn from your games directly to keep the game challenging as you progress. This could basically adjust the difficulty of the game automatically as required and keep the game interesting for longer.
Machine learning AI opponents also don't really have a cap on their abilities. So even years after game release as players master the game, the AI could still keep up with them.
you know what might make sense? if every time a new generation of the AI runs a new track is loaded for that, so that it learns a lot of strategies. i'd not try to create randomly generated maps for that but actually just use the same maps that are chosen for track of the day, cause they have been reviewed to be high quality and only a few of them act a little randomly. that way the AI would never overfit to a certain type of track or surface
Would there be a way to automatically stop the program, load track of the day and download it, load the track of the day and run the network without serious issues? If so then this is an awesome idea, just having it in the background like 20+ hours a day just grinding tracks would make a super cool ai. But the network would have to be a lot more advanced and have more inputs per second. But after like a month that ai would be better than 90% of players
@@CrunchyTurtle maybe the trackmania community should let multiple computers run for weeks to accomplish that
@@Beatsbasteln yeah haveing several thousand of these ai running at once would make super optimised movement, but it would make online competitions bad as people will cheat with them
@@CrunchyTurtle let's think of moral issues later and just enjoy watching the world burn in a bunch of magnificient runs. at some point the AI might expose itself with tons of unhuman nosebugs anyway
I don't think that would work. You would need to program the rewards and punishments into all those tracks and that isn't feasible
as someone who studies AI rn ... this is realy interesting ... being in the 2nd semster only tho still leaves me with a lot of questions ... i for once cant see how to do stuff like that myself.. hope it comes with time
Second One is very simple. Its learning only from mistakes. But more complex learning is the speed at which you can achieve most favourable action. Saving up time, evolving beyond most human comprehension.
Cool video and I’m glad you didn’t fake your results at the end. Im sure many people would to show that the project does surpass expectations. Kudos to actually presenting the experiment honestly.
Would it not have been better to perhaps do something more along the lines of "rewards per second" instead of total rewards? I think that could do a lot more the speed aspect of the AI
The ai would eventually just stop moving to get the most ammount of points
@@storm_fling1062 an easy fix, though; keep basing the rewards upon distance traveled, with a multiplier based on speed; traveling faster will yeild more rewards, and stopping will have a multiplier of 0
@@storm_fling1062 make the line disappear/become inactive as soon as its crossed so it won't give duplicate points
The overfitting problem reminds me of something speedrunners deal with. Some newer runners are so focused on getting the perfect run that they instantly reset on the first mistake. This leads them to playing the first few areas repeatedly, and consequently having much less practice on the later areas. It's an interesting occurrence.
It's allmost like us humans we don't change our childhood habits,
Even when we know that wrong that's over fitting and idea of a single god is also overfitting i mean what made a person think that his god is real one in that big earth
That's just over fitting
@@dank_shiv i refuse to believe a human made this post.
Hi, insanely interesting video. Took a couple of AI classes at college, and this is an incredibly visual application.
I have a question, how far can the AI see to calculate the rewards? For example, in the long straight lines, can it see that it's a straight line for a long while, or will it drive carefully because it can't see after a few steps? Hope I worded my question correctly, English is my second language. Cheers!
@LEO&LAMB T- the game itself? It's the game that produces the inputs... and the AI processes them.
@LEO&LAMB the ai?
It's on you ai si seeing with the help of raycasting you can find how ai targets player on yt,
You can change ray size so it could be anything.
@LEO&LAMB no man it's just pure mathematics you can tell it If it's correct add +1 everytime if not correct than don't add anything and repeat what adds +1 that's simple pice of code if you understand a little bit of coding you probably get what i said.
Introducing the random starting point and the random events at the start of spawning was brilliant.
This video is so well produced, well done
The trick required seems to be the AI having the capacity to simulate its next few steps in real time. That's how humans seem to do it.
can you please make a more technical video showing how the implementation of the game data works, that would be really cool
yes please
I love watching AIs playing Games
I would love to see you release this as a game where we race your best AIs one day!
Never seen a video like this before, so I think this channel is really cool, I'm subbing
I wonder if adding carrots for fewest changes in actions would result in a faster time? It seemed to me that it was loosing a TON of time flipping its wheels back and forth from left to right all the time.
Imagine if you kept this AI learning for a whole week… would the AI be able to beat your personal record or possibly the world record?!
It can but not necessarily I assume
seems there are diminishing returns.
"The Return Of The King"
yeah past me
Thanks, that was interesting ❤️ One thing that I thought of was how nice it is that we have vision, and I hope that we can give that to everyone
Idea for training the ai:
Make the rewards bigger if they get there faster, their completed time would establish the base rewards. Slower to any checkpoint gives less, and quicker gives more proportional to the time difference.
Logically this would make the ai pick up speed and reduce the teetering on the straight sections.
Can't wait to see the video on teaching an ai to jump gaps.
Great video! Let’s see if it improves with braking! Was your PB with braking or without?
I think he said it was without
I did 4:44 without brake. And I was only 4 seconds faster with brake, so there is not a big difference on this map !
@@yoshtm Did you think about edgebugs ? It should be faster right, if you find a good one ?
@@arieltm4925 of course doing a cut would be easy since there are no cps, but that's not the point here :)
@@yoshtm Yeah, I figured, ahah, anyways that was a dope video, it's really interesting & impressive :)
This video gives me so much respect for Codemasters and the work they put in for the AI in the recent F1 video games 🔥
I'd hold back with too much respect, apart from being very average games, Codemasters F1 games (like every commercial game) don't use this type of AI at all. The tracks all have best/fastest lines baked in and so the AI can be greatly simpler than what we see in this video.
“And after 53 hours of Learning, the A.I gets this run”Goes on to beat World Record.
Spreads through the internet worldwide in hour’s and brings about JUDGEMENT DAY 🙇🏻🤦🏻♂️
🦾🦿🚀🌎🔥☠️⚰️🪦
THE END 🥂🍾
Tes vidéos sont super intéressantes et avec un jeu comme c'est le combo parfait !
His explanation makes it look easy, but its not. I like this man
I think the AI focuses consistensy over speed, which a normal player would not do.
I only gets more rewards when driving more of the track, so consistency is rewarded instead of speed.
Yes good point, but I shaped the reward function in a way that should also reward speed, so it's strange that AI mostly focus consistency. Maybe it's because consistency is "easier" than speed for a neural network and it is probably difficult to have both at the same time.
Maybe you could have mutations that get more reward for speed
@@phoenixstyle There is no "mutation" in what I did, it's not a genetic algorithm !
Ok dont really know the difference between AI types, but you get what I mean
Makes me wonder what would happen if you started the AI out on your best run, and let it improve on it
The fact that the model with random starting points achieved far more in 53 hours of training than the one with only one starting point did with 100 hours shows the value in choosing random samples for iterations
Honestly one of the coolest videos i’ve ever seen
Wow!!! Beautifully explained and visualized. Thank you very much
Love this series - Could you make a behind the scenes? I would love to see the whole progress as well.
Could you combine reinforcement with supervised learning? Give it a few of your attempts to emulate before starting the reinforcement phase to have some "instincts" baked in?
I would like to see that AI on official maps.
Love the videos man, i cant wait for the AI to find some crazy trick that people start to use
Awesome video and visualizations! Thanks a lot, really enjoyed watching it.
10:53 interesting, but unless i'm misunderstanding it, making it spawn every point of the map meant it essentially learned this specific map and overfit to it. if you gave it an entirely new map layout would it be able to complete it on the first try? if not, then it hasn't actually learnt anything.
If you watch the rest of the vid he does exactly that
That's a fair thought, although I'm not sure I come to the same conclusion. In particular, it has no information as to where on the map is it starting, meaning that, in order for it to be learning the map, rather than learning to drive (i.e. overfitting) it would have to work out exactly where on the map it is after spawning in, in order to apply a 'memory' of the map. Additionally, we'd have to question where it would store memory. If you're starting from a fixed point, the AI can relatively easily use its state to cheese a memory of the track, but spawning in with random initial conditions completely breaks that.
So then I think it becomes a more general question as to the number of parameters the AI is able to vary, vs the complexity of the track. If the track it learns to complete has many more variables (roughly comparable to the number of segments), than the neural network, then we can probably safely say no overfitting has occurred. We can estimate this roughly from the stalling point of the first attempt - the AI was overfitting, limiting the number of segments with which it could cope. The fact that its distance travelled increased by such a large factor is indicative of more generalised learning.
However, it's worth noting that, of course, there _will_ still be _some_ overfitting to the training track, it's just that, in this instance, the track should be long enough that it's _really_ hard for the AI to manage to gain much from overfitting, since it simply doesn't hav enough variables to 'memorise' the track, as it were.
It amuses me how the ai still has a fair amount of cars falling of right at the start after 50k runs
wow, super intéressant ! Il y a pas moyen de crée une simulation mathématique du jeu pour faire des milliers de runs a la seconde ? Je me dit que, comme le jeu est généré en 3D, ca prend bien trop de performances. Alors que si e simulation sans interface graphique est crée, je pense que le machine learning pourrait vrmt être incroyable.
Definitely the best one yet, just don’t wear yourself out outdoing yourself each time. Minor tweaks make the process more interesting
This felt like a nature documentary. The whole life cycle of a little ai boi.
Is there anything you could do to encourage the AI to drive faster on long pieces of track? If one of the inputs is the distance in front of it and another is its speed, maybe there's something you can do to reward it when both numbers are high.