I Trained an AI to Beat This Absurd World Record


Comments • 2K

  • @yoshtm
    @yoshtm Month ago +543

    Thanks for watching!! To support this work: patreon.com/yoshtm

    • @redfoamcarpet
      @redfoamcarpet Month ago +10

      thanks for letting us watch

    • @Local_S-s1b
      @Local_S-s1b Month ago +2

      Does the AI have precise control, like moving by 0.2% or 1%, like last time?

    • @Local_S-s1b
      @Local_S-s1b Month ago +6

      did you try making a "mold" of the WR?

    • @savage069
      @savage069 Month ago

      Set the reward for beating Link's time to 1000000 times what you have it right now.

    • @Reconnecting07
      @Reconnecting07 Month ago

      When I saw there was a new Yosh video I creamed my pants!!
      (Sorry I am too poor to support the Patreon, but if my career continues its current trajectory I should be able to donate by this time next year 😢😢)

  • @ImAerio
    @ImAerio Month ago +11321

    Link must feel like a god after watching this

    • @Seno06
      @Seno06 Month ago +16

      Why? He said it was an accident and luck

    • @Leoiles-o8z
      @Leoiles-o8z Month ago +147

      ​@Seno06 Because it means it's basically impossible to beat his record, since it would take millions if not billions of attempts for a human, who can't get the jump consistent every time.

    • @dablob4491
      @dablob4491 Month ago +14

      @Leoiles-o8z It would likely take somewhere from under a million to a couple of million attempts, nowhere close to a billion.

    • @catholicismintruth223
      @catholicismintruth223 Month ago +11

      GOD BLESS YOU ALL JESUS LOVES YOU SO MUCH RETURN TO HIS CATHOLIC CHURCH AMEN

    • @jonaskarlsson5901
      @jonaskarlsson5901 Month ago +6

      @dablob4491 so why has nobody done it yet?

  • @EvanG529
    @EvanG529 Month ago +4310

    The bucketloads of cars pouring across the track will never not be comical

    • @PeterDanielBerg
      @PeterDanielBerg Month ago +73

      truly, it's like a fluid sim

    • @shannonbowlingchannel
      @shannonbowlingchannel Month ago +7

      I saw the track branch out before me like a thousand shimmering roads on A06, and from the turn of every wheel, like a bright possible future, a different run beckoned and winked.
      One run was a perfect line, fast and smooth, carrying speed so cleanly it seemed almost impossible. Another was bold and reckless, gaining time for a moment before striking the wall and dying. Another was careful and safe, but too slow to matter. Another found a beautiful angle through the corner, only to lose everything on the landing. And beyond and above these runs were many more runs I could not quite make out.
      I saw the AI sitting in the middle of these branching paths, sending itself again and again into the same few seconds, searching for the one true way through.
      It studied each and every one of them, but choosing one input meant losing all the others, and as it watched, unable to know at first which tiny movement would lead to greatness, many of the runs began to fail and fall away.
      Some plopped uselessly into the wall.
      Some spun out and vanished.
      Some came heartbreakingly close.
      And one by one, through all the failures, a few bright runs remained - leaner, faster, cleaner than the rest - until at last the best path stood alone.

    • @threebrickboyslego
      @threebrickboyslego Month ago +16

      And the "Hall of the mountain king" music 😂😂

    • @Falcodrin
      @Falcodrin Month ago +6

      This is how Dr. Strange saw all the timelines

    • @syvulpie
      @syvulpie Month ago

      car vomit

  • @ClosestToTheSun
    @ClosestToTheSun 22 days ago +795

    The AI is incapable of taking one step back in order to potentially take two steps forward

    • @ChrisGuy-s1z
      @ChrisGuy-s1z 22 days ago +16

      That’s a hard quote

    • @WalintHUN
      @WalintHUN 21 day ago +5

      You know it will read you...

    • @narbogoogle7466
      @narbogoogle7466 21 day ago +17

      There is an algorithm called hill climbing which allows a model to take a few steps back to experiment
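
For readers curious what that looks like, here is a toy sketch of stochastic hill climbing; the score function and integer inputs are invented for illustration, and note that this basic variant only accepts non-worsening moves (cousins like simulated annealing also accept occasional steps back):

```python
import random

def hill_climb(inputs, score, steps=200, seed=0):
    """Stochastic hill climbing: tweak one input at a time and keep
    the change whenever the score does not get worse."""
    rng = random.Random(seed)
    best = list(inputs)
    best_score = score(best)
    for _ in range(steps):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice([-1, 1])  # small random perturbation
        s = score(cand)
        if s >= best_score:  # accepting sideways moves lets it wander off plateaus
            best, best_score = cand, s
    return best, best_score

# Toy score: the closer to a target input sequence, the better (max is 0).
target = [3, 1, 4, 1, 5]
score = lambda xs: -sum(abs(a - b) for a, b in zip(xs, target))
sol, sol_score = hill_climb([0, 0, 0, 0, 0], score)
```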

    • @KawaiiOtouto
      @KawaiiOtouto 20 days ago

      It is kinda like a microbe

    • @wallacesantos0
      @wallacesantos0 20 days ago +36

      his model, yes.

  • @Majorzigzag
    @Majorzigzag Month ago +316

    Maybe next time you should try the epsilon-greedy approach, with epsilon decaying to like 0.5 or even 0.6, so the agent never stops finding new ways of doing things.

    • @EgeauTM
      @EgeauTM 29 days ago +11

      Q learning already performs an epsilon decay in most implementations, no? To the best of my knowledge, most implementations have the hyper-parameters of the initial epsilon, the epsilon decay, and the minimum epsilon that it will never go under, stopping the epsilon decay.
      It's a pretty standard extension, anyway, and I would expect hyper-parameters like this to have been played with. By the way: instead of playing with the epsilon decay, it makes more sense to me to set a higher minimum epsilon, such that the chance of random actions never goes down too far, while not messing with the initial training to get "close enough" initially.
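
The decay-with-a-floor scheme described here can be sketched in a few lines (the parameter values are illustrative, not anything from the video):

```python
def decayed_epsilon(step, eps_start=1.0, eps_decay=0.995, eps_min=0.1):
    """Multiplicative epsilon decay clipped at a floor, as in many
    Q-learning implementations: the chance of a random exploratory
    action shrinks each step but never drops below eps_min."""
    return max(eps_min, eps_start * eps_decay ** step)
```

Early in training `decayed_epsilon(0)` is 1.0 (fully random); after many steps it settles at the floor of 0.1, so exploration never fully stops.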

    • @Majorzigzag
      @Majorzigzag 27 days ago +6

      ​@EgeauTM Yeah, you're right. Then he could add a noisy network layer to it as an add-on; that helps with smarter exploration, while the epsilon decays to 0.2 or 0.1, which is more stable.

    • @lovelessissimo
      @lovelessissimo 6 days ago +2

      Where did you guys learn this stuff?

    • @Majorzigzag
      @Majorzigzag 4 days ago +3

      ​@lovelessissimo I'm an electrical engineer by trade, but I did some courses in machine learning. I also follow research papers published by researchers, which helps

    • @EgeauTM
      @EgeauTM 11 hours ago

      ​@lovelessissimo I'm a PhD student in a somewhat related area. My LinkedIn is not that hard to find from my Trackmania name. :P

  • @beezechurger666
    @beezechurger666 20 days ago +41

    There’s something kind of cute about all these cars trying their best

  • @EDNProds
    @EDNProds Month ago +1357

    « AI Learns to Uberbug » video next

    • @therisingf1champion459
      @therisingf1champion459 Month ago +25

      AI beats hockalicious wr next

    • @SlyFoxl
      @SlyFoxl Month ago +6

      The title of Wirtual's next video

    • @ЭДЭ
      @ЭДЭ Month ago +2

      @therisingf1champion459 reminds me of a Fallout: New Vegas custom campaign in the L4D2 workshop

    • @Ivan-N-Ivan-N
      @Ivan-N-Ivan-N Month ago +1

      just wait until the end of the video

    • @𱘇
      @𱘇 Month ago +2

      ?

  • @MX_Reaperr
    @MX_Reaperr Month ago +3328

    Pre liking the video cause I've yet to see a disappointing video from this guy lol

    • @juliusdusing3452
      @juliusdusing3452 Month ago +5

      Same here

    • @ItsMeNebulae
      @ItsMeNebulae Month ago +2

      Same here

    • @gigabytegeforce
      @gigabytegeforce Month ago +83

      ironically i feel like this video is a tiny bit disappointing compared to his other videos... not only did he fold and start forcing the AI to take specific lines by limiting it rather than guiding it via rewards and shit as he usually does, the video also didn't reach a satisfying goal... although don't get me wrong, this video is still better than most of the garbage i get recommended on youtube nowadays lol

    • @lktsoi
      @lktsoi Month ago +18

      @gigabytegeforce Yeah I agree, I thought he was gonna go after the TAS like he did in the A01 video, then I saw there were 2 mins left in the video xD

    • @Hotdog6606
      @Hotdog6606 Month ago +2

      ​@gigabytegeforce He actually ends up forcing the line in most videos, and compares his times to human records instead of TAS records (because an AI has no consistency issues...). Still waiting for a video about the AI beating the random map challenge or fast learning, but it would take decades at this point

  • @sdrawkcab190
    @sdrawkcab190 Month ago +2398

    You need to add mutations to get out of local maximums. It's the same concept as evolution: genes randomly mutate, most of the time into something with no benefit. Every once in a while you get something with a huge advantage, like another cone cell so you can distinguish between green and red. Give the AI random inputs at a range of different points and eventually one will stumble into something more optimal.
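
As a sketch of the idea, mutation over an input sequence might look like this (the action names and mutation rate are made up for illustration):

```python
import random

def mutate(inputs, rate=0.05, actions=("left", "right", "straight"), seed=None):
    """Resample a small random fraction of the input sequence, the way a
    genetic algorithm perturbs a genome to escape local optima."""
    rng = random.Random(seed)
    return [rng.choice(actions) if rng.random() < rate else a for a in inputs]

parent = ["straight"] * 100
child = mutate(parent, seed=42)  # a handful of inputs randomly resampled
```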

    • @too_blatant
      @too_blatant Month ago +89

      Yea he needs to add randomness to the gradient descent algorithm so that it finds the global minimum instead of local minima

    • @aproudswede4955
      @aproudswede4955 Month ago +78

      You could run the AI in groups within groups. Say you have a group with 15 groups in it. Those 15 groups vary quite a lot from each other. Within each of those 15 groups you have 100 cars that don't vary as much. You then pick the group with the best individual performance to run the next generation on.

    • @aproudswede4955
      @aproudswede4955 Month ago +22

      Kind of like different sub-species

    • @RafaelAguilar1987
      @RafaelAguilar1987 Month ago +10

      Genetic alg reinforced learning ftw

    • @ExperienceGaming95
      @ExperienceGaming95 Month ago +82

      yeah and this would make it so you don't have to force the conditions with walls

  • @MrBattlecharge
    @MrBattlecharge Hour ago +1

    12:50 Link's WR run at that jump is low, so his wheels touch down early. He also hooks left more than the AI was, giving him more room to gun the gas rather than worrying about hitting a wall.
    I would suggest putting walls in the air along the jump, to force the AI to try and keep low and left. To me, this would mean having to swing more right leading into the jump and then turning left while on the jump, or trying as Link did and cutting in early and then having to hard hook left later to make sure he made the landing.

  • @Edbrad
    @Edbrad 5 days ago +8

    14:26 I mean damn just run it another week of training and see if it will do even better

  • @Elcar0
    @Elcar0 Month ago +957

    What seems to happen is that it's not the fastest AI that gets built upon, but the fastest AI THAT FINISHES. Once they find a consistent line that gets them to the finish, they try to improve upon that line, but it fails to consider different lines that could be faster, and in the end you end up with the fastest line that finishes, not the fastest line that could've been. You could try working with more checkpoints in specific locations. That, or insert more randomness so it keeps experimenting with different lines. Cool video though, looking forward to the other videos :)

    • @Patchypatchypatchy
      @Patchypatchypatchy Month ago +72

      while i think this makes sense in theory, selecting a fastest run at some checkpoint is also quite difficult and what the reward strategy is trying to avoid doing. things like position, speed, and orientation may not really line up with hitting the local maximum speed from optimizing a run.
      that said i think here playing with reward penalties instead of barriers may have helped. like penalizing airtime may have helped nudge it into looking for lower jumps, but that probably doesnt help for making a general purpose training alg for trackmania
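
As one concrete version of the reward-penalty idea, airtime could be folded into the reward like this (the weight is an arbitrary illustration, not anything from the video):

```python
def shaped_reward(progress, airtime_s, airtime_weight=0.5):
    """Progress reward minus a soft penalty per second spent airborne,
    nudging the agent toward lower jumps without walling off lines."""
    return progress - airtime_weight * airtime_s
```

A run with two seconds of airtime scores a little below the same progress with none, so lower jumps win ties without being mandatory.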

    • @Biobanditten2
      @Biobanditten2 Month ago +1

      Would make sense.

    • @kappa9740
      @kappa9740 Month ago +19

      @Patchypatchypatchy I think that's a good idea, but I have built RL on driving simulations like this (albeit simpler than TM), and over-penalizing tends to hurt, causing it to fall into a lot of local optima that suck. I believe you may be correct that this is one of the only solutions, if there is a viable one with RL tho

    • @LinesightMKWii
      @LinesightMKWii Month ago +9

      As someone who has worked with RL agents significantly, this is incorrect. The AI learns from every race, whether it finishes or not. When the AI successfully does the flip, it's getting more rewards, and learns that trajectory to be good. When the AI was learning, it learned one specific trajectory that gave it a consistent flip and allowed it to make further progress, but that line never gave it WR speed. Once it has learned a line that is pretty good, it searches for the local optimum, but the global optimum could be Link's line, or another different line like mentioned in the video. It's risk vs. reward all over again, which is a very deep topic in RL and is what the text splash at 12:50 is all about.

    • @𱘇
      @𱘇 Month ago +1

      What about most rewarded

  • @brunor.922
    @brunor.922 Month ago +671

    5 hours till release is an unbearable amount of time to wait

    • @cloudtaker633
      @cloudtaker633 Month ago +14

      Versus the 6,000-hour training? Fair's fair

  • @dybdal509
    @dybdal509 Month ago +634

    An idea for a different way to reward the AI:
    Reward for time with wheels on the ground
    Reward for speedslides
    Reward for uberbugs, etc.
    That way you reward it for exploring the techniques the pros use when hunting a map

    • @nullbind
      @nullbind Month ago

      Uberbugs aren't allowed in tmx records btw

    • @sheepox573
      @sheepox573 Month ago +2

      @nullbind Every Uber bug? even A12?

    • @nullbind
      @nullbind Month ago

      ​@sheepox573 oh my, am I a right numpty, I was thinking of nose bugs. No, uberbugs are fine 😅

    • @Adrian-bye
      @Adrian-bye Month ago +7

      Perhaps the AI could be rewarded based on how close it is to the world record path; then, purely by chance, it should eventually beat him

    • @Cousinouf
      @Cousinouf Month ago

      @nullbind nose bugs are fine on the TMX boards too, there's just very few opportunities to do them humanly

  • @profoundschnook4381
    @profoundschnook4381 22 days ago +6

    your editing is immaculate, dude. just a joy to witness

  • @Vincent_C
    @Vincent_C 2 days ago

    Reinforcement learning to try to beat the WR is the best way to attract folks who like to watch all the pretty colors while a problem gets solved in the most inefficient way possible.

  • @Comfy_Ivy
    @Comfy_Ivy Month ago +401

    The fact that a literal AI didn't beat link after 2000 simulated hours is a true testament to how insanely good link is. Amazing video as always, excited to see what'll happen on the rest of the campaign.

    • @abebuckingham8198
      @abebuckingham8198 Month ago +5

      It's pretty clear that it was a fluke and had nothing to do with their skill. He just got really lucky on that one run.

    • @benjaminlynch9958
      @benjaminlynch9958 Month ago +29

      He’s very good, but on that WR run he himself admitted he got very lucky.

    • @Comfy_Ivy
      @Comfy_Ivy Month ago +46

      @abebuckingham8198 The flip was a fluke, sure, but the rest of the run was still insane enough that he ended up with a lead of 15 hundredths, saying it has nothing to do with skill is crazy considering he's literally the person with the most world records in the campaign

    • @kdesikdosi5900
      @kdesikdosi5900 Month ago +2

      well the AI isn't very good

    • @kustow.967
      @kustow.967 Month ago +21

      the AIs are very slow to learn. We can't compare the time it takes an AI to learn to the time it takes a human. AI is effective because in most situations it can make an attempt in milliseconds; it doesn't eat, doesn't sleep, and doesn't need to rest.
      A human who has never played the game will be able to complete the track in a few tries, while it took the AI hundreds or thousands.
      And of course Link is very good.

  • @davebensam
    @davebensam Month ago +163

    pure cinema, the way he makes the videos, the editing, the storytelling and the way he explains hard things in an easy way. It's just special

  • @gogo23166
    @gogo23166 Month ago +128

    im not sure but i feel like you could get a better route by:
    1. making the punishment for getting a bad run lower
    2. reducing the reward for getting an average run
    3. greatly increasing the reward for getting a really good run
    This would punish the AI less for trying out new routes and it could get higher rewards if it finds a really good route once.
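
One way to encode all three points at once is a payoff that is mild on the downside and superlinear on the upside; the baseline time and exponent below are invented purely for illustration:

```python
def skewed_reward(run_time, baseline, k=3.0):
    """Convex payoff in the improvement over a baseline time: slow runs
    cost little, small gains earn little, big gains earn a lot."""
    delta = baseline - run_time          # positive = faster than baseline
    if delta <= 0:
        return 0.1 * delta               # mild punishment for slow runs
    return delta ** k                    # superlinear reward for fast runs
```

Shaving two seconds pays eight times as much as shaving one, so gambling on a risky new route stays attractive.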

    • @taylorolson6228
      @taylorolson6228 Month ago +16

      I think that would still get it stuck in a local optima a lot of the time. If any change reduces the fitness then it will just try not to change.
      My guess is that adding some noise to the inputs would be a better option, or just letting it run for a bunch more attempts and praying to the chaos gods like the video showed

    • @gogo23166
      @gogo23166 Month ago +2

      @taylorolson6228 yeah i kinda agree, but if you make the increase in rewards exponentially better then it will almost always be encouraged to try and get a better run

    • @taylorolson6228
      @taylorolson6228 Month ago

      ​@gogo23166 Not really. The problem with local optima is that it's punished for any change, and that it has no bridge to those point gains

    • @uuproverlord8324
      @uuproverlord8324 Month ago

      @gogo23166 he's right, the AI isn't smart enough to realize it needs to try new stuff for the bigger rewards, because it's punished for doing badly

    • @devnom9143
      @devnom9143 29 days ago +1

      ​​@taylorolson6228 You have a fair point. However, I'd argue that the manual insertion of obstacles blocking the local optima is analogous to implementing a punishment when the AI has stopped improving but a more optimal path is known to exist.
      We know there is a more optimal path than what the AI is doing until the AI at least matches the world record, and in many instances it is reasonable to believe the world record itself has room for minor improvements, so it wouldn't be too unreasonable to use the logic "there is a more optimal path until the AI has beaten the world record"

  • @hamhox
    @hamhox 29 days ago +5

    @yoshtm: I think the next leap here isn't a better single bot, it's a whole society of bots with shared memory. One class maps the track from observation ("Bonkers"), one hunts crazy high-upside glitch tech ("Glitchers"), one refines stable sectors ("Soldiers"), and a controller combines all of it into actual WR attempts ("Queens"). Every run would leave notes and POIs on an internal 3D map, so the system learns the level instead of just converging on one safe line. That feels way bigger than "AI learns a racing line"... it starts to look like an actual racing research team.

    • @mattymerr701
      @mattymerr701 29 days ago +1

      This kind of AI isn't like LLMs. You can't really have agents like that

    • @TehAwesomer
      @TehAwesomer 29 days ago +1

      I am working on this sort of approach...

    • @TehAwesomer
      @TehAwesomer 29 days ago +2

      @mattymerr701 trust me, you really can.

  • @plainlake
    @plainlake Month ago

    This illustrates some of the real pitfalls of AI favouring incremental improvement over radical new approaches.

  • @NitFlickwick
    @NitFlickwick Month ago +90

    On the last attempt to change the AI’s line, you blocked the line into the flip. That doesn’t necessarily encourage it to change how it approaches the flip, just where on the ramp it will be. If the AI is blocked from landing on the right side, it will have to adjust its line in some way to compensate, if it’s going to make it across. IOW, put the block in front of the landing zone, not the take-off zone.

    • @argantosnl
      @argantosnl Month ago +1

      That also seemed to me like a way to force the landing onto the correct path.

  • @GigaChadziIIa
    @GigaChadziIIa Month ago +82

    an AI that plays video games better than humans??
    and I thought being unemployed would save me from AIs stealing jobs :'(

    • @tucho6
      @tucho6 Month ago +10

      Wait, an AI trained to maximize its score on only this map is far from "playing"; it's a pure deterministic function, exactly the same as using a TAS. Both are optimized just to perform well on this one case, and 100% reproducible.

    • @FakedPvp
      @FakedPvp Month ago +3

      If you train an AI on one map, it learns the core mechanics: steering, speed control, drifting, and how the car physics work. When you move it to a new map, it doesn't have to relearn those fundamentals; it just has to learn the layout. That drastically reduces training time.
      And if the AI learned advanced techniques like speed drifting or air control, those skills transfer too. It can apply them on new maps whenever similar situations appear, only needing a bit of fine-tuning for the layout. At that point it's technically playing the game much like a human would: using learned skills and adapting them to a new track rather than memorizing a single map.

    • @RedstonekPL
      @RedstonekPL Month ago +1

      ​@FakedPvp AI doesn't learn anything
      what AI is, is just a function that takes input and produces output
      there's no awareness of anything
      just a bunch of multiplication and addition
      if you plop an AI trained strictly on one map onto another map, it won't suddenly be good, due to overfitting

    • @ballom29
      @ballom29 Month ago +3

      And mind you, AIs crushing humans at video games is nothing new at all.
      Years ago we already got AIs that destroyed world champions at StarCraft 2 and Dota 2.
      And since those games are PvP, their victories weren't a fluke from pouring in thousands of attempts until threading the needle.

    • @FakedPvp
      @FakedPvp Month ago

      ​@RedstonekPL Yes, awareness isn't a thing, but it does reduce training time, 'cause it's not relearning what it already knows

  • @Bunsalot
    @Bunsalot Month ago +42

    The song at the 2:00 minute mark is called Sumerian Paradise BTW

  • @JojoChannel-o4g
    @JojoChannel-o4g Month ago +14

    10:48 But Trackmania is deterministic, it might be because the approach angle is different beforehand.

    • @Vaaaaadim
      @Vaaaaadim Month ago +15

      The physics is deterministic but *chaotic*, this fact is covered in this creator's "AI Learns to Drive on Pipes" video.
      I don't know why he opted to say "random" instead of chaotic this time.

    • @rotli2189
      @rotli2189 29 days ago +2

      butterfly effect

    • @JojoChannel-o4g
      @JojoChannel-o4g 29 days ago

      @rotli2189 exactly, one tiny difference in the angle coming in and the outcome will be quite different.

    • @JojoChannel-o4g
      @JojoChannel-o4g 29 days ago +1

      @Vaaaaadim I mean, Wirtual said over and over again that it's deterministic, meaning that you could copy inputs and you get the exact same run, that's how some cheated runs were found in the early days. Also, press forward maps are based on this too.

    • @Vaaaaadim
      @Vaaaaadim 29 days ago +2

      @JojoChannel-o4g I agree it's deterministic. But the physics system is chaotic. Even waiting a random amount of time at the starting line before going affects what happens, because the physics engine still runs even when you're completely still and will imperceptibly affect your physics state (this is noted in his pipes video). A deterministic outcome does not mean a predictable outcome, unless you have perfect knowledge of the state of your car.

  • @venaybanga
    @venaybanga 7 days ago

    As an engineer, I want you to consider the undeniably great opportunity of partnering with educational institutions that make learning fun, as this is amazing content that could be used to teach the fundamentals of reinforcement learning, machine learning, and the components that actually make artificial intelligence possible. I know that value exists because you made this video without those intentions in mind, but if you catered it a little bit more to the educational side of things, I think your scope of audience increases multifold. I personally have never heard of this game in my life, but this video helped me visualize and understand many fundamental concepts that are used to improve the results of artificial intelligence, and I think that much of your viewership is watching for somewhat the same reasons as well. All I would warn against is making it overly educational, as your current video is already amazing quality; I would say it would only require 10 to 15% tweaking to really align it with what I'm talking about, which is why I feel there is such a great opportunity lying at your fingertips to further your life. And I don't even think the partnerships I mentioned have to be necessarily YouTube related, although they definitely could be.

  • @simonk.7223
    @simonk.7223 Month ago +255

    now imagine if he had RAM

    • @DioMyLove
      @DioMyLove Month ago

      He does tho

    • @artnaz
      @artnaz Month ago +7

      Why RAM? It's more about CPU I expect.

    • @mecha-musume
      @mecha-musume Month ago +1

      this isnt that kind of AI

    • @ATTILA0769
      @ATTILA0769 Month ago +6

      @artnaz GPU and VRAM actually for computing power on AI.

    • @NoenD_io
      @NoenD_io 26 days ago

      ​@ATTILA0769 I do ML on CPU, it's pain stacked on double pain, whopper combo with pain fries

  • @abraxas2658
    @abraxas2658 Month ago +82

    A thought on the jump: you might be able to calculate a probabilistic fastest speed for each run.
    Say the AI starts a jump (during training only).
    1) Spawn 100 copies of the current AI in the current location.
    2) Wiggle the location of the car a sub-pixel amount according to a 2D gaussian distribution.
    3) Run all 100 copies to the next "predictable locations" (eg. the next checkpoint).
    4) Continue only the fastest run and reward based on its results.
    The idea is that you're giving the AI the benefit of the doubt on the RNG sections of the track. Based on the RNG required, you create a "carpool" of enough cars that the AI is likely to pass the RNG a single time (given it was on a good line). If the RNG is 1/10, you split to 15 or 20; if the RNG is 1/1000, you maybe split to 1500 cars.
    Obviously the splitting system would be VERY expensive computationally. I suggest (like a threadpool or, in some languages, workers) a "carpool" : a bank of a few instances of the game ready for the gamestate to be transferred to during the splitting phase. The best gamestate result is passed back to the original copy so it can continue its run from there.
    If there isn't enough RAM for the 1/1000 case, you can obviously run 5 or 10 cases at a time, storing the current best run until you find a better one.
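
A toy version of the splitting step might look like this; the `rollout` hook and all the numbers are stand-ins, since a real implementation would transfer full game states between instances:

```python
import random

def best_of_split(pos, rollout, n=100, sigma=1e-4, seed=0):
    """Spawn n copies of the current 2D position, jitter each by a tiny
    gaussian amount, and keep the copy whose rollout scores best
    (here, lowest time to the next checkpoint)."""
    rng = random.Random(seed)
    candidates = [
        (pos[0] + rng.gauss(0.0, sigma), pos[1] + rng.gauss(0.0, sigma))
        for _ in range(n)
    ]
    return min(candidates, key=rollout)

# Toy rollout: pretend checkpoint time depends only on distance from x = 0.
rollout = lambda p: abs(p[0])
best = best_of_split((0.001, 0.0), rollout)
```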

    • @sandro7
      @sandro7 Month ago

      Yeah I’m no expert at all but I was thinking something like this would make sense

    • @haubiwanb769
      @haubiwanb769 Month ago +1

      I haven't worked with RL in a while, but this seems destined to find local minima; something like "an outside line maximizes speed" would just not be found, if I understand you correctly.
      There is a concept called intrinsic reward, specifically for sparse-reward environments: basically, during training there is an exploration phase in which a world model tries to predict how the environment responds, and that world model's loss gets added as intrinsic reward. The idea is to explore areas of the environment that are not as well understood by the model. Of course, in Trackmania this would balloon compute time. But if used in conjunction with your idea it would probably find optima. Later on you could group the sections you mentioned together and create larger continuous parts to get closer to a full run
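
The intrinsic-reward idea boils down to paying the agent the world model's prediction error; here is a squared-error sketch (purely illustrative, with hand-picked state vectors):

```python
def intrinsic_reward(predicted_next, actual_next, scale=1.0):
    """Curiosity-style intrinsic reward: the world model's squared
    prediction error, so poorly-understood states pay an exploration bonus."""
    err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return scale * err

r_known = intrinsic_reward([1.0, 2.0], [1.0, 2.0])  # well-modelled: no bonus
r_novel = intrinsic_reward([1.0, 2.0], [3.0, 0.0])  # surprising: big bonus
```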

    • @abraxas2658
      @abraxas2658 Month ago

      @haubiwanb769 yeah! That makes a ton of sense. I think in my suggestion, I was assuming he would do the work of narrowing down the possible range (like with the walls in the video) and then let the AI find the local maximum from there

    • @haubiwanb769
      @haubiwanb769 Month ago

      ​@abraxas2658 oh yea if there is an optimal strategy already discovered that should work. Afaik Trackmania is deterministic

  • @logictm
    @logictm Month ago +15

    A06 saga is iconic. Great video

  • @dennik8374
    @dennik8374 Month ago +11

    Link went through the middle of the 2 poles (11:21). You said Link is only doing 1 thing differently, his approach angle, but he also lands and drives through the middle of the poles and gains 0.01 sec by the next checkpoint. And he stays on the left, while the AI goes to the right for the long U-turn. What if the AI went through the middle of the poles, stayed on the left, but swerved to the right before the big U-turn jump?

  • @methtal-chris
    @methtal-chris Day ago

    Wow! What a great video. Not only from the perspective of technology, but also from the perspective of script writing. What a nail-biter. Great entertainment! Thank you! 🙏

  • @DPSFSU
    @DPSFSU Month ago +30

    This series has to be one of the most liked ever on YouTube. Seriously, with currently 5k+ views and 3.3k likes, that's 33 likes for every 50 people who've seen it. Keep 'em coming Yosh! Make those little AI sonovaguns earn those 🥕!!!

    • @gameplaychanellacaso2403
      @gameplaychanellacaso2403 Month ago

      Even better now at 5.6k views with 4.3k likes

    • @Fierylunar
      @Fierylunar Month ago +2

      View counts on recently uploaded videos are notoriously error prone

    • @nikarmotte
      @nikarmotte Month ago +1

      It's at 11k likes for 33k views, but that's still an insane ratio.
      I've also discovered the hype feature on this video by accident.

  • @xenquish
    @xenquish Month ago +25

    THE HYPE IS REAL

  • @l0gicaA12
    @l0gicaA12 Month ago +66

    Very interesting 14:34 can't wait to see the full results on this :)

    • @brettgrindel9017
      @brettgrindel9017 Month ago +7

      What an absolute turtle tease.
      P.S. The AI beat Link everywhere except the barrel roll. Seemed worth noting. The crossover roll creates a longer path while maintaining the same-ish average velocity. Basically, Link hit the NOS too early. By accident. Badass runs from both sides 😊

    • @GameristicForce
      @GameristicForce Month ago

      ​@brettgrindel9017 let's go gambling!

  • @durbanpoison031
    @durbanpoison031 2 days ago +1

    I’ve never heard of this game in my life

  • @BlitzWarriyo
    @BlitzWarriyo 24 days ago

    There is a reason im watching this...
    I have never seen someone with so much patience and such a big brain, always able to train an AI (from scratch, as usual) on a map to beat records that were found by accident or were forced...
    I don't care how long you take to make these videos, I will always be here to enjoy them and see what kind of absolute masterpiece you created
    Keep up the good work!

  • @RBR_48
    @RBR_48 Month ago +11

    I can't wait. I was just scrolling when this got to my recommended

  • @BadChess56
    @BadChess56 Month ago +17

    You could try multiplying the loss by a constant under 1 when the reward predictor over-predicts, to skew the average so it prioritizes even a chance of a good run. Just a thought.
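
A sketch of that asymmetric loss (the 0.3 down-weight is an arbitrary constant under 1, purely for illustration):

```python
def asymmetric_loss(predicted, actual, over_weight=0.3):
    """Squared error that is down-weighted when the predictor over-predicts,
    biasing estimates optimistic so rare great runs still look attractive."""
    err = (predicted - actual) ** 2
    return over_weight * err if predicted > actual else err
```

Under-predicting a reward by 1.0 costs a full unit of loss; over-predicting by the same amount costs only 0.3, so the predictor drifts toward hopeful estimates.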

  • @AfonsodelCB
    @AfonsodelCB Month ago +10

    it seems like you had 2 things to optimize: the fastest possible ramp jump, and the fastest possible finish. if you trained the AI by spawning it near the ramp in a range of positions and speeds, and placed the finish line a bit after the jump ends, you'd focus its reward system entirely on optimizing the jump. then after that you could put it back on the original map and have it find a way to incorporate that into its full-map strategy

    • @artnaz
      @artnaz Month ago

      Yes, great intuition. Not sure how easy this is to actually implement though.

  • @user48096
    @user48096 Month ago +1

    14:42 I like how you're speaking as you write the script. It's a creative touch.

  • @StriderGW2
    @StriderGW2 29 days ago

    that's so fascinating, I love this analysis on rng

  • @dreaejrns6281
    @dreaejrns6281 Month ago +9

    One of the only youtubers, if not the only one, whose videos I instaclick without looking at what the video is.

  • @diyartaskiran
    @diyartaskiran Month ago +58

    My takes on your notes: I think two key improvements you can make to ensure correct exploitation vs exploration are using Advantage instead of Value, and a well-tuned SAC. Briefly, advantage does not reward the agent based on your reward function directly, but instead gives a reward proportional to the value above expectation (i.e. instead of considering Q(s, a), consider giving reward A(s, a) = Q(s, a) - V(s)). There are other ways to define a baseline, but this is an easy one. And in your SAC implementation, your agent should generate a distribution over actions (i.e. instead of generating a single output value for each button and then pressing the button if it's above a threshold, it should generate a mean and variance for each button press, sample the action from that distribution, and compare that to the threshold). You can then define some minimum variance used during training, making sure the agent is never too sure which actions are best; for the actual runs you can then just use the mean value of the distribution as the action, since that should be optimal. TD learning might also help, but there I am less sure (it's also not clear whether you're already using it).
    I expect that spawning the agent in the trajectory of the WR run could have a benefit, but you have to be careful about how you choose where to insert the agent. Your goal is not to make the agent handle the jump like the WR, because it will never arrive in those situations itself (i.e. this is wasted training imo). Instead, your goal is to make sure the agent arrives at the section ready to take the jump in the right way, so its exit out of the previous corner should match Link's.
    Another approach you can take to increase willingness to take risk is rewarding the entropy in agent actions: evaluate how common/rare a given trajectory is and give reward for trajectories with high entropy. The problem is that this too becomes a reward-tuning task.
    Regarding your last section about why the AI is getting confused by the wall before the wall: this could be caused by anything, and without further details I can't give an exact reason. It might be related to reward shaping, feature selection or feature processing.
    Hope this helps!
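    To illustrate the advantage baseline and the Gaussian action head with a toy sketch (made-up Q-values and network outputs, NumPy only, nothing from the actual agent):

```python
import numpy as np

# Advantage: reward actions relative to a baseline, not in absolute terms.
q = np.array([4.0, 5.5, 5.0])   # hypothetical Q(s, a) for 3 actions in one state
v = q.mean()                    # one simple baseline choice for V(s)
adv = q - v                     # A(s, a) = Q(s, a) - V(s); sums to zero here
print(adv)

# SAC-style stochastic head: a mean and std per button instead of a raw score.
rng = np.random.default_rng(0)
mean, std = 0.3, 0.2            # hypothetical network outputs for one button
sample = rng.normal(mean, std)  # training: sample, then threshold the sample
pressed = sample > 0.5
# evaluation: threshold the mean itself (deterministic "best guess" action)
print(pressed, mean > 0.5)
```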

    • @shadowfire04
      @shadowfire04 Month ago +2

      this is a nicely formatted way of expressing the same thing i was going to suggest - rewarding unexpected outcomes that deviate from the norm. adding some randomness would probably also help, or at least be an interesting experiment.

    • @yoshtm
      @yoshtm Month ago +7

      @diyartaskiran Thanks for your comment!
      -I'm already using Advantage, with the Dueling Network architecture.
      -I've tried SAC several times in the past but couldn't get better performance. Maybe some hyperparameters were not properly tuned though.
      -Regarding entropy and introducing some variances in the policy: from my experience it has a negative effect regarding risk-taking in Trackmania. When the agent knows there is some random noise in its policy, I think it tends to drive safer in some sections to take this randomness into account. For example it drives further away from walls, which is often slower.
      -Regarding the WR spawns, I agree this approach isn't ideal! I should have at least added noise in the position/speed of these spawns I think

    • @maximereynouard1588
      @maximereynouard1588 Month ago +2

      @yoshtm I wonder whether an easy way to achieve all of that would be to aggregate the reward of your policy over several runs near the ramps through a max function, if there indeed is some randomness in the outcome (if it's not really randomness but chaos => small action noise allows you to say the chaos = randomness). This has the benefit of privileging policies that have a better chance of establishing a record (you don't care if 9 out of 10 of your runs are terrible if 1 out of 10 is way above what anyone can do).
      I am not an RL expert, just a mathematician slightly versed in ML

    • @paulmayer8782
      @paulmayer8782 29 days ago +1

      ​@yoshtm I think the problem isn't the method. The problem is too much freedom.
      All players drive past the pole on the right side, except for one: Link drives in between them. I think this is the key to being faster; his jump is lined up differently, and the result is passing in between the poles.
      So instead of restricting the track on the way to the jump, I think you could try to block the path so that the AI can only run in between the poles.
      This should force it to try out different jumps, since getting a jump and hitting the wall will be very slow.

  • @patateflambee1358
    @patateflambee1358 Month ago +9

    From content to editing, everything is near perfection. You sound a little bit French, so thank you for these videos; the editing is incredible, and the work behind them monstrous. Honestly, you deserve more recognition; your videos are niche and at the same time so accessible and fascinating, it commands respect. Every time, it's a pleasure to watch them, and you introduced me to the world of Trackmania speedrunning in such an original way that it has surely become my favorite racing game franchise! So thank you again, and good luck!

  • @denissopichev5986
    @denissopichev5986 28 days ago

    The best RL videos on RUclips; they explain the intuition behind the tuning. I wish there were similar videos for robots/humanoids too 😅

  • @morzie4075
    @morzie4075 Month ago

    wow, I loved every moment of this. I couldn't do what you do with programming, but it really does interest me

  • @Ozaryk
    @Ozaryk Month ago +78

    6:37 - Is it possible to reward it for having less air time?

    • @maestroeragon
      @maestroeragon Month ago +6

      less air time = it goes faster, so it's implicitly being rewarded for less air time

    • @Mauz1
      @Mauz1 Month ago +8

      ​@maestroeragon While true, perhaps it's not enough? Many behaviours account for laptime, and thus compete for the same reward. By adding an explicit reward for less airtime, it should focus more on it.

    • @wroomwroomboy123
      @wroomwroomboy123 Month ago +5

      I think the correct approach is to give rewards for quick rotation speed. The AI never did the kind of aggressive flip people do in hunt sessions.

    • @evancombs5159
      @evancombs5159 Month ago

      ​@Mauz1 I think the real issue here is not enough randomness in each generation. So it is locking in on a single solution, then not varying much.

    • @RandomJoe-d9x
      @RandomJoe-d9x 21 day ago

      @Mauz1 Nah cause too low airtime isn't always the best and could eat up other strategies. Rewarding for time is enough cause AI will find the most ideal trajectory

  • @LordDecapo
    @LordDecapo Month ago +15

    The post-landing trajectory shoots across the track; you could try putting a gateway over there that rewards the AI the closer it gets to that shoot-across action.

  • @THE_tomjer
    @THE_tomjer Month ago +20

    13:21 is that a nose boost

  • @andrewkr4e224
    @andrewkr4e224 28 days ago

    very interesting to watch this, ty for your work

  • @Christopher-TM
    @Christopher-TM 24 days ago

    I really really liked this video. I've not played this game in years but I found this video so absolutely fascinating and well made.

  • @XerShadowTail
    @XerShadowTail Month ago +67

    Considering the game uses floating-point calculations, especially for collisions, you may want to consider feeding in each individual bit of the float for the car's position, speed, and/or orientation. The randomness is still deterministic since you can play it back; however, when you calculate floats sequentially you can end up with calculations operating on denormals and values that are off by one or a few ulps. This can be compounded, especially if round, floor, or ceil are used and the ulp crosses a threshold. The AI should be able to pick up the pattern in bit changes even though it would effectively look like noise.

    • @MichaelMikeMigos
      @MichaelMikeMigos Month ago +1

      just reward it going between the two columns after the flip

    • @illumiyaa
      @illumiyaa Month ago +5

      What does this comment even mean... none of this makes sense... Are you suggesting converting the floating point into individual bits and passing each one as an input parameter? That makes no sense; it would remove all meaningful structure for the model... What does "AI should be able to pick up the pattern on bit changes" even mean... This is also an RL model, which is even more sensitive to noise than supervised learning, which makes this approach make even less sense... It would also make the training super slow: you'd need a larger input layer with more weights and way more gradient updates, and all you'd get is slower convergence, worse policies and tripled training time. I mean, just think about it for a second. You've got the car going at 100 mph or however Trackmania works (I don't play this game), so your input is like:
      10010000000000000000000
      Then we go ever so slightly faster, 100.1, and now you've got:
      10010000011001100110011
      So a bunch of things change without any clear pattern to see. This entire comment makes no sense. I mean, you can try it: train a little supervised learning model with your freaky architecture (which should supposedly perform better) and look at the results. Here is your thing:
      Epoch 10/50 | Val MSE: 0.4058
      Epoch 20/50 | Val MSE: 0.3614
      Epoch 30/50 | Val MSE: 0.3478
      Epoch 40/50 | Val MSE: 0.3990
      Epoch 50/50 | Val MSE: 0.3527
      And here is a normal model:
      Epoch 10/50 | Val MSE: 0.3898
      Epoch 20/50 | Val MSE: 0.3488
      Epoch 30/50 | Val MSE: 0.3243
      Epoch 40/50 | Val MSE: 0.3137
      Epoch 50/50 | Val MSE: 0.3066
      What's so weird about this to me is that this isn't even a concept, not a theoretical thing or something even discussed; you just made this up in the moment off your noggin
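      For the record, the two mantissa strings in this thread do check out as float32. A quick standalone way to inspect the bit patterns (Python's struct module; this is just an inspection helper, not an endorsement of either training approach):

```python
import struct

def f32_bits(x: float) -> str:
    """IEEE-754 single-precision bit pattern of x as a 32-char string."""
    return format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')

for x in (100.0, 100.1):
    b = f32_bits(x)
    # sign | exponent | mantissa
    print(x, b[0], b[1:9], b[9:])
```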

    • @aw2031zap
      @aw2031zap Month ago

      @illumiyaa go to your browser, hit f12, type 0.1+0.2 in the console and hit enter
      you will not get 0.3
      This is a simplified example, but basically, in floating point, you get accumulating arithmetic error which is essentially a "randomizer"
      games which have continuous coordinate systems + don't use integers = there are actually errors in the code! that are rare, but still happen
      ruclips.net/video/lEBQveBCtKY/video.html

    • @evancombs5159
      @evancombs5159 Month ago +8

      ​@MichaelMikeMigos I feel like guiding the AI to the solution defeats the purpose.

  • @CHAT_TASTROPHE
    @CHAT_TASTROPHE Month ago +6

    For tomorrow, I have 3 tests and a lot of homework... I had been working for 1 hour straight, and on my break I just saw a notification on Yosh's profile. YAY !!

  • @fluffsquirrel
    @fluffsquirrel Month ago +4

    Beautiful video, can't wait to see the rest of the tracks your AI masters!

  • @nordmu
    @nordmu 28 days ago

    thanks for this amazing journey!

  • @TheGoodTheBadAndTheBitcoin

    your visualizations of machine learning data are really top notch. Well done!

  • @Kugelteam
    @Kugelteam Month ago +44

    Port the AI to Trackmania 2020 and make it play Deep Dip!

  • @chuchun-boy
    @chuchun-boy Month ago +4

    Having a fucking godlike machine try 700k times in the span of a few days and still barely beat you, when it never gets bored, tired or unmotivated, never loses focus and energy, and never forgets or loses muscle memory; this is a great indication of how EPIC we are as a species.

  • @aapoxs
    @aapoxs Month ago +49

    This man just bruteforces a map with AI every few months and makes bank.
    infinite money glitch

  • @andreaschiavinato4528

    Man, never played this game but your content is pure gold, keep it up!

  • @ogfelle
    @ogfelle Month ago

    The man has done it again. Great job!! 👑

  • @LinesightMKWii
    @LinesightMKWii Month ago +10

    Incredible video yet again! This AI is part of what motivated me to create my channel, and getting to watch the continued progress makes me want to get my own AI beating more WRs.
    12:51 As for why the AI couldn't match the WR here: I think you're 100% on the money. I think the problem is that because the learning algorithm struggles to even learn the flip in the first place, once it finds a method that works, it becomes very hard to find a second or third method that works. Because the section is so noisy, and you've forced it to take that very noisy path, the amount of actual learning it can do is reduced. I think this is a limitation with the learning algorithm itself, and not something that can be solved with sophisticated reward functions. Good reward tuning will compensate and help to improve (I would consider blocking the normal ramp jumps part of this tuning) but that only raises the ceiling of the agent; Improving the algorithm raises both the floor and the ceiling simultaneously. I've been looking into DQN extensions I can add to Linesight (although not with much luck, I'm not exactly an RL scientist, moreso a python hobbyist) and UPER (Uncertainty Prioritized Experience Replay) caught my attention as something that could help to reduce noise in the network when dealing with heavy Pseudo-RNG sections.
    One idea for reward functions though: Directly comparing to whatever the agent's current PB is. A moving target that gives a bonus reward to the AI when it is closer to the current fastest run. In theory this would solve the problem of the AI finding an 'ok' solution and never deviating away from it, because if it can drive significantly faster in a certain section (with some tuning to make sure it's not taking a bad line to do so) then it should prefer to beat its 'ok' solution in that section, learning to take more risk. It might be a bit more unstable, as the reward function changes dynamically, but it's something I'm going to be testing with Linesight soon™ to see if it can be effective for learning difficult tricks.
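    A minimal sketch of that moving-PB bonus (the function name and checkpoint times are hypothetical; a per-checkpoint comparison is just one possible shape for it):

```python
def pb_bonus(checkpoint_times, pb_times, scale=1.0):
    """Bonus per checkpoint for being ahead of the current personal best.

    checkpoint_times / pb_times: seconds reached at each checkpoint for the
    current run and the fastest run so far. Being ahead (t < pb) earns bonus;
    the target "moves" because pb_times is updated whenever a new PB is set.
    """
    return [scale * max(0.0, pb - t) for t, pb in zip(checkpoint_times, pb_times)]

# Current run is ahead at checkpoints 2 and 3:
print(pb_bonus([5.0, 9.8, 14.5], [5.0, 10.0, 15.0]))
```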

    • @yoshtm
      @yoshtm Month ago

      Hi, thanks for this nice comment :) I watched your mario kart videos, very cool!
      I've experimented with PER, but I haven't heard about UPER, I'll check that
      I've been thinking about using a moving target for the reward like you said. But when you are in the middle of the track, it's hard to know if your current run is better than the fastest run. Sometimes you can be ahead at some point but with a lower speed, and thus have a strategy that will result in a slower finish time. So it might be hard to tune this kind of reward system for intermediate sections of the track. Unless it's a sparse reward system where only the final finish step is rewarded?

  • @MrBrukmann
    @MrBrukmann Month ago +31

    Gran Turismo time trial GOAT here. 10:22 look where the red car is. I predicted the red car was the one who landed properly because of its weight transfer. At speed, tiny changes in momentum add up to major changes in results, especially in the case of a brief traction loss. The red car starts further to the left, so it has to throw its body to the right at a slightly higher rate, leading to a slightly higher momentum and greater loading. They all cut back to the left at your "indistinguishable" moment, but with the visually hidden variable of weight transfer to the right side, it clips the ramp more gently with less-loaded tires, retaining the forward vector momentum more and increasing axial rotation on that vector. Nothing mysterious at all if you know what you're looking at. AWESOME video though, as usual, not to take away, you're just wrong. Nothing random is happening.

    • @MrBrukmann
      @MrBrukmann Month ago +13

      Again at 11:18 look at Link's blue line. It is obvious he is shifting the weight to the right side more dramatically.

    • @MrBrukmann
      @MrBrukmann Month ago +10

      12:28 you are gating where you think the key moment is, but the key moment is an 1/8th of a mile back setting up the weight transfer. You're hand selecting based on improper inference.

    • @HaloNeInTheDark27
      @HaloNeInTheDark27 9 days ago

      What

  • @mickhalsband
    @mickhalsband 6 hours ago

    Cool video. Did you use some kind of temperature / creativity parameter?

  • @5ky13apg0d
    @5ky13apg0d Month ago

    I just really enjoy your vids, they're like my mental floss: what you're doing, your presentation, and I just generally like Trackmania.

  • @EgeauTM
    @EgeauTM 29 days ago +8

    I'll give you my two cents on why the fastest line isn't learned.
    Source: I'm a PhD student in a research group on "AI" (as so many are these days.) RL is not exactly my area of expertise (I'm originally in formal methods, the group is mostly graph-learning as well as some NLP 'cause that stuff gets funding), but I helped teach a master course on it for a while.
    I think your explanation is correct that your current reward structure incentivises consistency over single great runs. All runs that crash still get their gradients added to the Q-table. To be more precise, default DQN optimises the expected value of the reward of a single run, which is not what you are after.
    I think many of the people suggesting randomness are flat-out wrong. Adding more randomness in RL is known to frequently end up with the opposite effect, where you train to get runs that are robust against random inputs, as the previous inputs end up being penalised for them. (The expected value of the reward of a single run that might have random inputs is higher for safer runs.) Q-learning innovates on this by performing random actions off-policy. There is a really simple school exercise you can do where you train a reward table to find a path from A to B where the fastest line walks past some "cliffs" that end the run with negative reward. Q-learning, which you are already using, will find the path walking next to the cliff, but other random approaches like SARSA do not. Q-learning's hyper parameter epsilon (or group of hyper parameters if used epsilon-decay) are what you need to play with to get different randomness, and I presume you already did.
    These kinds of approaches, as well as other suggested ones like increasing the reward for single great runs (however that is defined), will only go so far. The one thing I did not see suggested that you could still try is to multiply all gradients by 0.01 or something for failing runs, but, still, if your reward structure is over a single run you are going to see somewhat "safe" behaviour. This means the trivial thing to try, if you have not already, is to re-structure the rewards to go over multiple runs: instead of training over the expected value of reward, try re-calculating the loss such that it trains over the expected value of the max of ten, or a hundred, rewards. The generalisation of this is called quantile regression, where you learn the whole distribution of returns instead of only the mean, after which you can disproportionately act upon the policies that have the possibility to end up in the higher quantiles. There are several algorithms for this you might want to play with, none of which I am familiar with, though I know these are known to work quite well for video games and such.
    Not sure if you are already doing these things, though. I did not look at your code, and you have spent a lot of time on this problem already. ^.^
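    A toy illustration of the expected-value vs max-of-N point (made-up payoffs, nothing Trackmania-specific): two policies with identical expected reward per run look completely different once you only care about the best run in a batch.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Safe policy: always finishes with reward 10.
safe = np.full(N, 10.0)
# Risky policy: crashes 90% of the time (reward 0), 10% chance of reward 100.
risky = np.where(rng.random(N) < 0.1, 100.0, 0.0)

# Same expected value per single run...
print(safe.mean(), risky.mean())

# ...but over batches of 100 attempts, the best run is wildly different:
best_of_100 = risky.reshape(-1, 100).max(axis=1)
print(best_of_100.mean())  # close to 100: almost every batch contains a win
```

    Optimising the per-run expectation cannot distinguish these two; optimising the max-of-100 (or a high quantile of the return distribution) strongly prefers the risky one, which is what record hunting actually wants.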

    • @tommurphy1153
      @tommurphy1153 13 days ago

      Those seem like good suggestions, but I can't help thinking that RL is just the wrong algo for this job. Wouldn't something "genetic" like NEAT be a better fit for the problem?
      Selection of the next generation could be based on fractions of the track, e.g. the most improved position from this bridge to the next earns you a place in the next gen, as well as overall performance, to bubble up local "great moves"

    • @EgeauTM
      @EgeauTM 12 days ago +1

      ​​@tommurphy1153 NEAT learns way slower and would likely not work. It is effectively surpassed by DQN in all but the simplest tasks.
      The sentiment you and others are expressing about the gradient descent of deep learning and how an evolutionary optimisation algorithm should be better is very old. It's intuitive, but it experimentally doesn't hold.
      Stuff like this depends a lot on the task, but the closest personal example I can give is from an MSc course I helped teach for four semesters (twice as a student one year, twice as a PhD the next) on learning behaviour of real robots using a mix of lab training and simulation. Students were free to choose their own training algorithms, as the course was on the lab/simulation interaction. We held a competition for bonus points at the end of it with the final and hardest task, and all winning groups I was there to see (including when I took the course myself) used DQN or Actor-Critic. There were always people who tried something more evolutionary, but it never scales to complex tasks, which is also corroborated by the literature.

  • @HbilTM
    @HbilTM Month ago +4

    I played this map for about 6 hours myself without any flips being even close to good, until I suddenly got a flip ahead of 2nd place. Sadly I failed it in the last part of the map, as I didn't do the ending properly and landed in a bug right in front of the finish. Either way, this goes to show just how insanely random this flip is, and how incredible Link's WR is.

  • @FleakeeYT
    @FleakeeYT Month ago +20

    I like how, after 5 years, every single reinforcement learning video still explains what reinforcement learning is at the beginning.

    • @johnzhu5735
      @johnzhu5735 22 days ago +1

      reinforcement learning has been around a lot longer than 5 years.

  • @MarAt0m616
    @MarAt0m616 25 days ago

    Beautifully made content. Thank you for your hard work!

  • @frederickrueger7861

    It's always a treat to watch your videos. Thank you!

  • @Gwilo
    @Gwilo Month ago +3

    you could've just added a variable for max inputs per ms to avoid cheat detection and taken every world record. but you gave us something great to watch for the past few years, and a few more to come

    • @shalevforfor5550
      @shalevforfor5550 11 days ago

      I'm pretty sure the ai input system is taken using mods, so it doesn't matter, it won't be playable online

  • @Hokiebird428
    @Hokiebird428 Month ago +12

    1:34 So did most of us humans also have trouble with that part! You have to hit that jump _perfectly_ to get the author medal.

    • @nuclearmedicineman6270
      @nuclearmedicineman6270 Month ago +1

      I ragequit the A levels after getting +0.03 behind AT 5 times in a row. I hate that track so much.

  • @yorkwestenhaver8680
    @yorkwestenhaver8680 Month ago +4

    Hey Yosh huge fan! RL expert here.
    Been watching since the first one. Have you seen any of the Robotics RL work adjacent to “Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion?”
    Any of the recent work in robots from Aviv Tamar might be a good place to look for inspiration.
    Also might be worth looking into the new C-JEPA paper from Yann LeCun.

    • @yoshtm
      @yoshtm Month ago

      Hi thanks for your comment, I'll check that!

  • @TheofficialVRkid-g7y

    This was an amazing video. This deserves a subscribe.

  • @mycoloth
    @mycoloth Month ago

    great work! thanks for the great content and insight of your project

  • @Roobotics
    @Roobotics Month ago +4

    12:28 if you wanted it to imitate link, you should be blocking off the right side of the landing pad and force it to find a line that lands a flip in the center? You are forcing a start condition, not an intended and partially known end-result. As long as it hugs the right it will never find his line. As for deviations in driving, maybe you need some sort of mutators that encourage wider or tighter lines. As for the 'this is a RNG jump' I kind of don't buy that, it's more that the angles for entry it's picking, cause a wildly fluctuating output based on the most subtle game conditions, collision tick checks, accel values, etc. Changes to the line entry angles might very well make the output side more contiguous and controllable. Force it into sharper angles, force it into wider angles, then evaluate those outcomes. It's stuck on an 'optimum angle' that is likely limiting what future maneuvers are possible.

    • @SkopOneFour
      @SkopOneFour Month ago

      I think it would also be possible to work on rewards based on rotation speed, or on longer contact between the left wheels and the ramp (which it seems to be, at least).

    • @Roobotics
      @Roobotics Month ago

      ​@SkopOneFour I like this idea, but maybe one step further is just rewarding landings that carry good speed and an acceptable exit angle? Then it can do whatever voodoo it likes, like clipping the back wheels on the edge of the crash barrier to soften the twist of the landing like it did in one of the replays. Something I don't think any of the players even did.

    • @LinkMaxTM
      @LinkMaxTM Month ago +1

      My flip isn't even the fastest in the top 10; logic's was slightly faster, but his landing was bad on the next jump. The flip is actually functionally random: Trackmania is a deterministic game, but it's deterministically chaotic. Floating-point differences in position can radically change the outcome. Hell, starting to accelerate later than instantly will radically change what happens even if you do nothing but press forward; his "AI learns to drive on pipes" video is a good demonstration of that. Despite how it looks, everyone who has hunted this flip will know how disgustingly random it is. It's most likely possible for the AI to get the flip somewhat more consistently, but it will never be the fastest if consistency is what it is going for, as the fastest flips are always outliers.

  • @Attewir
    @Attewir 29 days ago +7

    My two cents
    As you mentioned, the RL agent goes for the local and is avoiding bad attempts
    Another agent could pick the macro strategy so different valleys are found
    Let the AI imagine the result, humans usually know what the desired result looks like, not just how fast it will be (to elaborate, the AI is focused on the timer, not the desired replay which is opposite from what humans are focused on)
    You already segment some attempts by hard cutting/starting runs, but perhaps: limit how far the AI sees and let it focus on what's right ahead (let the AI figure out how its impact on the result increases the closer it gets to the jump)
    ps. my pet peeve is that you already know what the record looks like and try to aim for that. possibly look at the same data the AI is looking at (i.e. how high the rewards are), not the line comparison :)
    *sorry for wall of text*

  • @prxperrr
    @prxperrr Month ago +3

    7:16 prxper, you say?

  • @swarmslab3074
    @swarmslab3074 16 days ago

    Fun video. Beyond the technical part, I love the drama in the story!

  • @lorenzoperuzzi3355

    incredible video as always, pls dont stop

  • @Evo_Goblin_Cage1
    @Evo_Goblin_Cage1 Month ago +8

    2:29 Wirtual being top 150 is crazy

  • @PampersRockaer
    @PampersRockaer Month ago +4

    AI PhD Student here: To get the AI to get more innovative, often an Evolutionary approach is used:
    1. Train a set of AI nets (e.g. 40) at the same time, let them perform some runs while optimizing via Q-Learning and compare their reward at the end.
    2. Take the top 20 of the nets and mix the weights together by taking half of the weights from one and half from the other. Make 40 new nets from this
    3. Add a bit of randomness to a few percent of new weights.
    4. Go to step 1
    The last step "encourages" the existing net to innovate by "mutating" the behaviour.
    This setup tries to emulate nature's survival of the fittest setup and mutation helps moving out of a local minima.
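    The four steps above, sketched on plain weight vectors with a toy fitness function (population size, dimensions, mutation rate and the quadratic "reward" are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
POP, DIM = 40, 16
target = rng.normal(size=DIM)            # stand-in for "good" weights

def fitness(w):
    # Toy reward: closeness to the target weights (always <= 0)
    return -np.sum((w - target) ** 2)

pop = rng.normal(size=(POP, DIM))        # 1. a set of nets (here: weight vectors)
initial_best = max(fitness(w) for w in pop)

for gen in range(200):
    ranked = pop[np.argsort([fitness(w) for w in pop])[::-1]]
    parents = ranked[:POP // 2]          # 2. keep the top half
    pa = parents[rng.integers(len(parents), size=POP)]
    pb = parents[rng.integers(len(parents), size=POP)]
    mask = rng.random((POP, DIM)) < 0.5  # crossover: each weight from one parent
    children = np.where(mask, pa, pb)
    mutate = rng.random((POP, DIM)) < 0.03
    children += mutate * rng.normal(scale=0.3, size=(POP, DIM))  # 3. mutate a few %
    pop = children                       # 4. next generation

final_best = max(fitness(w) for w in pop)
print(initial_best, final_best)          # final_best ends up far closer to 0
```

    The mutation step is what keeps the population from collapsing onto a single local optimum, which is exactly the "innovation" effect described above.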

    • @laundmo
      @laundmo Month ago +2

      I commented the same: evolutionary algorithms ftw! My idea was more to let the different AIs drive their runs without training, and then only keeping the best runs and training a new generation of models by combinations of the training runs. probably ends up very similar, but a different approach nonetheless

  • @mukonank783
    @mukonank783 Month ago +51

    4:23 This is exactly how human ingenuity will defeat AI in the future. Once it hits its limits, it can't come up with creative plans

    • @malteh.5075
      @malteh.5075 28 days ago +2

      I don't see why this would be the case. Why couldn't it come up with new ways? In this specific setup there is no high variance in approaches once a certain level is met. But this is setup-specific, not AI-specific.

    • @TheDudeQB
      @TheDudeQB 28 days ago +15

      And also why calling this an intelligence is misleading. It's an optimization algorithm. Everything this AI "learns" will be useless on a different track.

    • @sidpomy
      @sidpomy 28 days ago

      This is essentially an important premise to the entire Expeditionary Force sci-fi book series

    • @Music-oi4rh
      @Music-oi4rh 26 days ago +1

      defeat in what? ai is a human made thing lol

    • @NormanTheDormantDoormat
      @NormanTheDormantDoormat 26 days ago

      I mean, it has to be herded into copying the human strats after running like a million attempts one after the other. It literally can not "come up" with anything, there is no thought process, creativity or skill whatsoever.

  • @rmac0101
    @rmac0101 28 days ago

    This is really well produced, great work

  • @fen4128
    @fen4128 21 day ago

    Incredible combination of technical knowledge, scientific thinking and great video editing, combined with an amazing game. Peak content, kudos!!

  • @extertimator7242
    @extertimator7242 Month ago +3

    French guy here, we can tell

    • @reveauthomas1023
      @reveauthomas1023 29 days ago

      For real, I find he has an accent you could cut with a knife; for sure he's French 😂

  • @VanguardJester
    @VanguardJester Month ago +5

    this is exactly why AI physically cannot, and will not ever, replace humans with our current understanding.
    It can't innovate on its own, only "achieve the objective." This is also why it can't make art, only copy. Not until it can draw of its own volition, without any input.

    • @PampersRockaer
      @PampersRockaer Month ago +8

      AlphaGo showed that it certainly can invent and innovate. It found new strategies no one found before, and top players felt like they were playing against an alien. This is more an example of overfitting and too-narrow loss/fitness functions.

  • @SuravoidYT
    @SuravoidYT Month ago +11

    7:30 what song?

    • @ZavrisV2
      @ZavrisV2 Month ago +3

      Cold War Games by Gabriel Lewis, I'm pretty sure

    • @yoru_on_120
      @yoru_on_120 29 days ago +1

      Idfk Shazam it or sum

  • @alexeyv7323
    @alexeyv7323 Month ago

    That's really impressive. Great work

  • @l0lwarrior470
    @l0lwarrior470 Month ago

    crazy videos as always, thanks for the quality

  • @nakchAk
    @nakchAk 29 days ago

    That was awesome keep up the excellent work

  • @OldNickskeeee
    @OldNickskeeee Month ago

    This video is amazing! The editing and the explanation were spot on! I can't wait for future videos!

  • @MatthieuVion
    @MatthieuVion 27 days ago

    Incredible work as always! Can't wait the next video 😊

  • @TheLastTater
    @TheLastTater 27 days ago

    This was awesome! I can’t wait till the next videos

  • @ErrorAcquired
    @ErrorAcquired Month ago

    Awesome! Keep up the good training!

  • @tonys1406
    @tonys1406 Month ago

    Phenomenal content, great job.

  • @G4Cidib
    @G4Cidib Month ago

    I love these AI videos, You're one of my only notification i let ring.

  • @zIggyholmxd
    @zIggyholmxd 14 days ago

    This is the greatest video I have ever seen in my life, and I feel like it has a deeper meaning: showing that AI is not human, and never will be. Side note: your French accent is perfection.

  • @shalevforfor5550
    @shalevforfor5550 11 days ago +1

    10:20 bro we talked about it already in previous video 😂 CHAOS ohhh scarrrwwy 😅

  • @user-us8jz7pu8z
    @user-us8jz7pu8z 14 days ago

    Should let it go for a month. I know the difference would be subtle, but 1 second quicker would look insane