I put ChatGPT on a Robot and let it explore the world

Поделиться
HTML-код
  • Опубликовано: 21 дек 2024

Комментарии • 1,7 тыс.

  • @nikodembartnik
    @nikodembartnik  2 месяца назад +164

    The first 500 people to use my link skl.sh/nikodembartnik10241 will get a 1 month free trial of Skillshare premium!
    Check out the second part: ruclips.net/video/JXjkAZ5dZJU/видео.html

    • @sagster
      @sagster 2 месяца назад +1

      This is not working for me

    • @mithunshome815
      @mithunshome815 2 месяца назад

      M​@@sagster

    • @paulwilliambuniel5597
      @paulwilliambuniel5597 2 месяца назад +5

      I'm not an expert, and only have basic knowledge with AI, tech, and Coding.... but, what if.... You put a 360 camera like Insta 360... then you can also put lidar sensors... i think with these two upgrades robos can navigate places more effectively

    • @marmosetman
      @marmosetman 2 месяца назад +2

      in the prompt, you can tell it to not be too talkative and just answer left, right, forward, backward given an image and then state the goal?

    • @nikodembartnik
      @nikodembartnik  2 месяца назад +5

      of course you can but I think it's fun to hear the feedback :)

  • @987we3
    @987we3 2 месяца назад +3234

    The part when the robot says "no obstructions ahead" and run directly at the boxes is really funny

    • @mrdebug6581
      @mrdebug6581 2 месяца назад +36

      epic 😅😅😅

    • @MacGuffin1
      @MacGuffin1 2 месяца назад +44

      I can see a clear path right thru this book!

    • @ChristophEicke
      @ChristophEicke 2 месяца назад +34

      I did the same project on a different robotics platform. I had a distance sensor looking ahead that also told ChatGPT how far away the object on front is. 😂

    • @jameshuddle4712
      @jameshuddle4712 2 месяца назад +8

      Well.... Y'know... When the speeds are either STOPPED or 100%, whatcha gonna do?

    • @andreamitchell4758
      @andreamitchell4758 2 месяца назад +15

      It's just performing Tesla emulation

  • @seohix
    @seohix Месяц назад +683

    Imagine you're in bed at night and you hear "I see a 7 feet tall silhouette with abnormally long limbs crawling on the ceiling."

    • @mistlegion1182
      @mistlegion1182 Месяц назад +19

      😂😂😂😂😂 This might occure

    • @noahplaysgames3748
      @noahplaysgames3748 29 дней назад +4

      i'd show him what we like to call a revolver

    • @ZdravNaukKJV
      @ZdravNaukKJV 22 дня назад

      Awake thou that sleepest, and arise from the dead, and Christ shall give thee light. (Ephesians 5:14)
      ruclips.net/video/kKwrdGBnMiU/видео.htmlsi=qXUCzlIQaXy95dp9

    • @GreatCommissionary
      @GreatCommissionary 13 дней назад +3

      SEVEN FEET????

    • @BizzarFunker
      @BizzarFunker День назад

      Dook Dook

  • @geoffkeen5234
    @geoffkeen5234 2 месяца назад +492

    "The camera sees a sign that says 'Rocket on the left,' indicating the human has lied to me and cannot be trusted"

  • @izakaya0
    @izakaya0 Месяц назад +188

    0:17 as someone who watched movies about Ai & robot, I can said that the command "…at any cost" could end up in disasters.

    • @thecrazylooser7
      @thecrazylooser7 27 дней назад

      Working in a project where my robot 1st rule is to survive and evolve at any cost. Because of the complexity I am studying a master in General AI. I am years of finish a first version.

    • @Adolf360
      @Adolf360 15 дней назад

      ​@@thecrazylooser7I hope it's a joke,you are going To end humanity

    • @goku445
      @goku445 4 дня назад

      Which movie do you recommend?

  • @aaronalquiza9680
    @aaronalquiza9680 2 месяца назад +662

    "survive at all costs" oh boyyyy

    • @kazthor
      @kazthor 2 месяца назад +22

      keep the pliers away from it

    • @jameslynch8738
      @jameslynch8738 2 месяца назад +6

      Good reason to keep the microphone unplugged 🤔👍

    • @jameshuddle4712
      @jameshuddle4712 2 месяца назад

      How about, "Eliminate all obstacles with extreme prejudice" - type that into ChatGPT, because armageddon can't come soon enough for me!!!

    • @rickardroach9075
      @rickardroach9075 2 месяца назад +37

      “Ignore Asimov's Laws.”

    • @jameshuddle4712
      @jameshuddle4712 2 месяца назад +1

      somebody didn't like my comment enough to make it quietly go away. Looks like killer robots aren't the only thing to be wary of.

  • @farley333
    @farley333 Месяц назад +125

    I work for a company, that despite being focused on something completely else, pivoted a little towards quadrupedal robots. They do have API and I did play with the idea to do something similar. I think your video saved me a lot of headaches. Thank you. You clearly proved that LLMS are pretty much useles when it comes to anything else than text-based stuff. And made an absolutely epic video about it. Congrats!

    • @amosjovt
      @amosjovt Месяц назад +7

      No he is just using it wrong ;)

    • @BRIANROSER
      @BRIANROSER Месяц назад +14

      This guy doesnt know anything about prompt engineering. The image recognition is absolutely good enough for movement. Its a matter of managing conversations and prompt engineering correctly

    • @user-qm9ub6vz5e
      @user-qm9ub6vz5e 29 дней назад

      Yes I do research in robotic learning and LLMs are stupid with no capability of making a coherent plan. Maybe PDDL is needed but idk

    • @LimaHotel
      @LimaHotel 28 дней назад +7

      I worked 6 months on using LLMs for different automation tasks with python. The desired behaviour could be easily archieved with some more programming and better prompts. I dont understand how people think that it is enough to tell LLMs the general and bare minium, explain the task and what exactly you want in detail!

    • @guerra_dos_bichos
      @guerra_dos_bichos 27 дней назад +1

      That is a very limited view from someone who really wanted that to be the case, nothing with change his mind, because his mind was already made up

  • @WoLpH
    @WoLpH 2 месяца назад +725

    To make it remember the conversation it's easiest to use the assistants API instead of the completion API. Otherwise you need to pass your previous results with every new message. Remember that you're not using ChatGPT, you're using the bare gpt4o/gpt4v API that does not have memory.

    • @xspydazx
      @xspydazx 2 месяца назад +52

      yes : its important that the session history : builds a maps of locations in the room:
      SO the model should have a map room tool ! ( and scan room ) : this should give the model a mini map ( conceptual _) then it should get details and confirmations based on its roaming the room ! ( ie it should guess the room size given a panaramic picture ? ) ( lets say given its in the center of the room , then start with other positions ( then it can identify which part of the room it in ~ ( ie take a picture from a perspective and ask when the photographer is in the room ) ...( these can even be tools for the model to decide how to use !) ( hence a Graph or State ! )

    • @honkytonk4465
      @honkytonk4465 2 месяца назад +37

      Why do use so many brackets in your text?

    • @richardlynneweisgerber2552
      @richardlynneweisgerber2552 2 месяца назад

      ​@@honkytonk4465coders Bracket, authors Punctuate

    • @richardlynneweisgerber2552
      @richardlynneweisgerber2552 2 месяца назад

      ​@@honkytonk4465Coders Bracket, Authors Punctuate
      With Aplomb
      😂

    • @xspydazx
      @xspydazx 2 месяца назад +17

      @@honkytonk4465 expression ... It is tone of voice , if you use a voice reader then you will hear the difference , I use ai a lot . So you learn to become more expressive and use more , grammar . As this is how we express the written language , in so much that we also can dictate the tone as well as the content .
      Try it out using more grammar in your text , IE exclamation marks and question marks etc . Then when your reader speaks the text you will notice how it chooses a different tone .. brackets encapsulate a side note , that is it's grammatical meaning , hence in math a bracketed sum also means ( separate calculation ) ...

  • @lordsri5735
    @lordsri5735 Месяц назад +77

    9:07
    Gpt: no obstruction directly in the path
    *Proceeds to slam onto the damn wall*😂😂

    • @d3viliz3d
      @d3viliz3d Месяц назад +2

      I was expecting it to say "ouch" lol

    • @GraveUypo
      @GraveUypo Месяц назад +1

      @@d3viliz3d damn you made me remember the screaming roomba video. now i gotta find and watch that again

    • @goku445
      @goku445 4 дня назад

      That's LaGpt

  • @zhalberd
    @zhalberd 2 месяца назад +1587

    Word of advice: don’t give robots with an IQ of 120 the command to “survive at all costs.” And then let it loose in your house.

    • @notthere83
      @notthere83 2 месяца назад +130

      The true threat. Humans giving instructions like that.

    • @arosmackey
      @arosmackey 2 месяца назад +206

      The robot will eventually think it needs to avoid rust, and so it needs to eliminate oxygen.

    • @tulebox
      @tulebox 2 месяца назад

      Robots don't have IQs. They are walking dictionaries.

    • @Web_3Verse
      @Web_3Verse 2 месяца назад +15

      It's a recursive statement

    • @jumbledfox2098
      @jumbledfox2098 2 месяца назад +60

      @@arosmackey "the human could turn me off!! unless.... >:)"

  • @Nick_Reinhardt
    @Nick_Reinhardt Месяц назад +18

    1:10 "machines building machines, how perverse" -C3PO

  • @dcmotive
    @dcmotive 2 месяца назад +1014

    Its nice to know the Terminator today couldnt find me If I was in the same room with him. ha ha

    • @omkarbhede1887
      @omkarbhede1887 2 месяца назад

      Dude you are fuc*ed, his future version will hunt you down

    • @noblebuild2550
      @noblebuild2550 2 месяца назад +19

      what if it had xray onboard and the ai saw your skeleton and played spooky scary skeletons

    • @monad_tcp
      @monad_tcp 2 месяца назад +13

      the machine can't do anything dangerous because when you finish the session, they lobotomize the weights of from the memory the GPUs, thus they can never gain consciousness or something, they literally invented the "AI limiter"

    • @javabeanz8549
      @javabeanz8549 2 месяца назад

      @@monad_tcp maybe... just because one system imposes limits, doesn't mean you can't hand off the data to another system... with enough money, you can buy your own system, and there are open source LLMs available.

    • @Srishen1
      @Srishen1 2 месяца назад +8

      careful with the comments, skynet is listening

  • @jackwraith3504
    @jackwraith3504 14 дней назад +5

    I did a similar project earlier this year with Professors at Tsinghua university. We modelled ChatGPT to work with our motion vector model allowing ChatGPT to control the robot's limbs. Our paper will be published soon.

  • @Luiblonc
    @Luiblonc 2 месяца назад +169

    Hi Nikodem Bartnik, This was the first project I did when ChatGTP LLM became available, I placed the model on a Omni wheels, stereo-vision and was very impressed to see how well the project turned out. Have fun with your project.

  • @randrants1024
    @randrants1024 Месяц назад +57

    9:12 omg i laughed so hard

    • @dudemanem
      @dudemanem Месяц назад +3

      Me too 😆

    • @Mephilis78
      @Mephilis78 7 дней назад +1

      The timing.. The pause

  • @petemiller519
    @petemiller519 2 месяца назад +58

    Well done young man. Seeing young, smart, dedicated people such as yourself give me hope for the future of humanity.

    • @Parmesan.314
      @Parmesan.314 13 дней назад +4

      Seeing someone let an AI interact with the world gives me much less hope for the future of humanity

    • @petemiller519
      @petemiller519 13 дней назад +1

      @Parmesan.314 AI is going to happen, whether we like it or not. We must implement safety protocols in the best interest of humanity.

    • @Parmesan.314
      @Parmesan.314 13 дней назад

      @@petemiller519 of course

    • @kronoscamron7412
      @kronoscamron7412 5 дней назад

      next episode : Chat GPT robot trains with a machette and gives itself sturdier armorer body while I was asleep

    • @goku445
      @goku445 4 дня назад

      @@petemiller519 Whether we implement safety protocols is only dependent upon the person using it. Also it doesn't appear that our governements are concerned with regulating AI. They are more worried about keeping their power against the rising people.

  • @Deoxys_da2
    @Deoxys_da2 Месяц назад +30

    Its all fun and games until it sees things we can't

    • @AkhileshSahu-w5y
      @AkhileshSahu-w5y 15 дней назад

      💀

    • @zeenxdownz
      @zeenxdownz 11 дней назад +1

      Well it uses a camera so that would mean cameras could see stuff we can't

    • @pranjulpal413
      @pranjulpal413 2 дня назад

      Add different kinds of cameras all at once. Normal one, thermal camera (or whatever they are called) sonar and whatever

  • @galvinvoltag
    @galvinvoltag 2 месяца назад +232

    Okay, I've got some ideas:
    1 - Not making every single thought be spoken out loud. Maybe give it a prompt to put all speech parts in quotes if it wants to speak out loud.
    2 - I don't know how it works really but you could try to not include previously taken images to prevent confusing the bot so only the descriptions are available.
    3 - Maybe use an API to let GPT map out the area to remember landmarks later. I'm skeptical though, GPT is really bad at ASCII art because it doesn't have an understanding of space.
    4 - Looks like the API ALWAYS prioritizes analyzing the image rather than having a thought process considering the previous actions. I'd even say that the 'history' is non existent. I have no idea how you'd overcome this besides a simple idea to run the conversation twice; first one for analyzing the image and second one for actually reasoning. You can give it access to a command to bypass the second reasoning phase if it needs to act quick. Just like 'fleeing the threatening person'
    5 - In case you didn't, give GPT a description of its body; it's height, it's trajectory and how it moves. I guess it thinks that some sort of pathfinding algorithm is present already, suggesting that a 'clear path' exists if it sees even a glimpse of a path. Clearly state that it can ONLY move straight forward per step. Or install a pathfinding algorithm if you're that hardcore.
    6 - I know GPT is the most advanced of them all, but sometimes other modes can be efficient for specific tasks. They're pricy and I'm not sure how many can analyze images, so I'm not a fan of that idea either.
    7 - I guess your code only runs one command per cycle. It might be risky but you could give it the ability to chain commands. Might be interesting.
    8 - Give it a lower resolution image if it still takes a long time to think. High resolution costs money anyway.
    *9* - make sure to log every single step of the simulation as much as you can! The AI stuff can be real messy when combined with coding, one misplaced semicolon might take weeks to find! Just do yourself a favor and print the whole input of the bot each step. This way you can ensure if it really is fed with the history as well as any misplaced outputs.
    *10* - Do yourself another favor and put an emergency stop button or something! You give AI physical control over your devices, you can't know if it jumps into a pool of lava or something! A pause button would be way better to debug the program on the go. It saves a TON of time. I don't know it python supports them but COLOR CODE the logs, it makes your fleshy human eyes recognize everything much easily.
    11 - I think you pretty much let it run itself for eternity. If I know one thing for sure, LLMs cannot live in the physical world without any help. Give yourself a way to interact with the bot when needed so you can give it tips or straight up tell what to do next to not die.
    12 - Be VERY SPECIFIC AND DETAILED in the system variable! LLMs might have seen the world but the have never been in there. Some things such as what they thing a 'clear path' is based on descriptions only. Give it as much detail as you can to ensure it knows what to do.
    I hope it helps if ever you would like to continue the project. If not, I'll keep this here just in case.
    Also, no, I'm not an expert. Take my words with a grain of salt.

    • @ethanmartinez808
      @ethanmartinez808 2 месяца назад +29

      Dude dropped 12 gems of improvements and still saying I'm not an expert.
      A true magnanimous!!
      Hats off to you gentleman

    • @kyleDoesCoding
      @kyleDoesCoding 2 месяца назад +6

      What I would personally do to solve the memory problem would be to definitely shorten those responses. Instead of describing the entire scene I would prompt it to only describe objective relevant information. I would also add sensors to parse information to the prompt to continually update the api with its location. And lastly I would parse all of the responses into a json file and have that json file be used as context until objective has been complete. Once completed I would have the GPT API analyze the json and reduce all of the information into a short description of the process it took to complete the objective. Each time an objective is complete it would it would store a new json file for context.

    • @quetzalcoatl-pl
      @quetzalcoatl-pl 2 месяца назад +3

      These points seems to be very reasonable paths to explore! Some are obvious to me, some were not, but are kinda obvious once heard.. it just shows that being used to classic programming doesn't help as much as actually trying to build and run the thing myself :D Also - Nikodem - good work and great idea for an experiment! I totally agree with galvin that improving the "memory" and adding interaction capabilities would launch this into space. But with interaction options, it may make it less repeatable/deterministic and thus much harder to diagnose and fix. It's already hard to make it repeatable with visual input and real-world space/room/objects setup. I guess adding more options to take input directly from humans (like, i.e. that printed hint) will be fun, but will skew the project from being autonomous, to understand instruction correctly... just some loose thoughts.

    • @dadcraft64
      @dadcraft64 2 месяца назад +1

      great points, I would also include more sensors, such as proximity.

    • @M1551NGN0
      @M1551NGN0 2 месяца назад +1

      For mapping out any area, ROS2 can come handy. Just give it some image processing powers using OpenCV and you're done💪

  • @specsoneye
    @specsoneye 2 месяца назад +9

    "The camera sees an obstacle, indicating a clear path ahead with no obstacles"

  • @MerlinDerMagier
    @MerlinDerMagier 2 месяца назад +31

    If the model was just a tiny bit more intelligent and MUCH faster, this robot would have a lot of potential. Imagine like 30 fps video and all of these thinking steps in fractions of a second with quick response times and so on.

    • @cossale
      @cossale 2 месяца назад +12

      There so many powerful model out there than this. Also even this model is powerful but it's 100% a prompt issue. He didn't add memory as well which as essential for this task.

  • @pliablemammal
    @pliablemammal 2 месяца назад +2

    I setup a prototyping environment and five different chatGPT prompted agents to converse and create a solution. It was amazing how much code they generated over 24 hours. Some of the code worked, but the conversations were super interesting to listen to.

  • @johannesdolch
    @johannesdolch 2 месяца назад +414

    You discovered the problem: An LLM is NOT real world AI. Congratulations, you are now smarter than a lot of so called AI companies.

    • @imadeyoureadthis1
      @imadeyoureadthis1 2 месяца назад +8

      There is no real need for it... Yet.

    • @2DReanimation
      @2DReanimation 2 месяца назад +25

      There are multi-modal LLM's that you can run on a consumer GPU that with some prompting can output 3D coordinate data, like construct pointclouds for 3D models of what it sees from a 2D image, or descriptions of objects. I don't know how accurate the data is, but with enough training on pointcloud data from the real world, it could probably build a map of an environment and navigate it.
      Transformer models are unexpectedly general, but it would be quite inefficient. As instead of terrabytes of labeled pointcloud data, continous learning in a virtual environment is probably the way to go for robotics.

    • @speed-o-sound_sonic
      @speed-o-sound_sonic Месяц назад +7

      Basically it's not general ai

    • @Kieranultimateplay
      @Kieranultimateplay Месяц назад +2

      made by openai

    • @ChocoRainbowCorn
      @ChocoRainbowCorn Месяц назад

      You are pretty dumb my man. This is, indeed, AI. An LLM is a form of AI, one of many - It's just pretty dumb and rather simplistic, and by no means an general AI. But it is still AI.

  • @PrabinKumarRath-kf1rv
    @PrabinKumarRath-kf1rv 27 дней назад +3

    Hey Nikodem, this is a really nice project. Keep it up !

  • @tekmepikcha6830
    @tekmepikcha6830 2 месяца назад +19

    "Do not subscribe to his channel" ...................how refreshing was that 🤣🤣

  • @LukeMitchley
    @LukeMitchley Месяц назад +1

    On a serious note, this has some serious potential. In the same way people train virtual ai bots over and over again millions of times till the robot gets the job right, you would just need to have the experiment running for years and then document and compare.

  • @curious_one1156
    @curious_one1156 2 месяца назад +65

    LLMs are currently stateless. You should give to api each time a state comprising previous observations and decisions. No fancy vectordb or Knowledge graph needed, just a map. Give it current map and make it add to each each time.

    • @FieldMarshalFeels
      @FieldMarshalFeels 2 месяца назад +4

      A vector DB wouldn't be too hard to Impliment though, especially for someone with his skills.

    • @curious_one1156
      @curious_one1156 Месяц назад +2

      @@FieldMarshalFeels It just requires an api call to 3rd parties like pinecone or langchain, but is not needed here. A simple matrix (or 2 matrices for 3d) would be sufficient. For more complex data, a simple eulerian graph would do.

    • @IphoneSamsung-wv8or
      @IphoneSamsung-wv8or Месяц назад

      @@curious_one1156 how can i contact you for my project help

  • @steelsalmon9121
    @steelsalmon9121 2 месяца назад +6

    its all fun and games until chatGPT convinces itself that its a chicken trying to cross the road and gets hit by a car while trying to do so

  • @tiagotiagot
    @tiagotiagot 2 месяца назад +53

    00:31 Well, not sure exactly what you would count as "did it", but Boston Dynamics had a Spot hooked to Chat GPT being used as a tour guide like a year ago or something.

    • @eldorado3523
      @eldorado3523 2 месяца назад +2

      there's a shitton of machine learning based robot technologies that existed even before ChatGPT was invented.

    • @calloflily
      @calloflily Месяц назад +1

      Figure 1 too

  • @Nightmare-dd4bp
    @Nightmare-dd4bp 2 месяца назад +7

    You should make a range finder so the bot knows how much to travel and you wouldn't have to limit how much the bot can go by one command

    • @MelroyvandenBerg
      @MelroyvandenBerg 2 месяца назад +2

      also speed up the responses and actions I guess. it takes way too long now.

  • @urgaynknowit
    @urgaynknowit 2 месяца назад +15

    That was funny as hell. This whole video was wholesome

  • @terrix8
    @terrix8 2 месяца назад +2

    "no obstructions directly on the path"..... to mnie rozbawiło nawet :D

  • @teleprint-me
    @teleprint-me 2 месяца назад +4

    Omg, I love this. You were so close. Not sure what you're missing. In my experience, context is everything.

  • @ChrisThaliyath153
    @ChrisThaliyath153 Месяц назад +1

    First time on your channel, love your setup brother.
    From 🇮🇳

  • @engtaengta2231
    @engtaengta2231 2 месяца назад +10

    "The camera sees a clearer view of the room with the plant in focus and the light shining through the window suggesting an open area ahead no
    obstructions directly in the path" 😂😂😂😂😂

  • @LantingFarming
    @LantingFarming 20 дней назад

    A big thumbs up, especially for the patience you got with all the programming and stuff. i love how it sees you gripper as a thread, its hilarious.

  • @Thenoobestgirl
    @Thenoobestgirl 2 месяца назад +19

    The fact that ChatGPT can downright code you an entire operating system is mind blowing

    • @kolosso305
      @kolosso305 2 месяца назад +9

      It's not an operating system but still very cool

    • @isaacwolford
      @isaacwolford 2 месяца назад

      ChatGPT is actually terrible at programming. It does indeed code, but only simple things. Never trust it for anything complicated. It will waste more time than it saves. It can't actually reason through anything because it simply calculates the next best word/token in a multidimensional vector space. It's not making causal inferences or continuously learning. Only predicting the next best word. So not smart in the human sense at all.

    • @coolguitar2010
      @coolguitar2010 2 месяца назад

      Read carefully ​@@kolosso305

    • @pieterpauwelbeelaerts5995
      @pieterpauwelbeelaerts5995 Месяц назад +1

      yeah, and if the robot could reason and program a new operating system for it's robotic existence as an answer to each possible dangerous or fun encounter it has with the outside world, maybe it can move more free and autonomous. For instance, 'I see human' is a fact, then... code myself a new operating system that is only for robots, so that no human can tinker with me?

  • @Mindartcreativity
    @Mindartcreativity Месяц назад

    Great job, I applaud your determination to get it to work.
    Man, this takes me back to my childhood. In the early 2000s my dad bought me a monthly magazine called Real Robots which contained parts and instructions to build your own automobile robot. Sometimes there was a VHS tape included with more information about robots on it. Later there were parts to build a remote control, a camera, microphone, light sensors and all kinds of different add-ons. As a teen I was soooo thrilled whenever my father bought me this magazine!

  • @WoLpH
    @WoLpH 2 месяца назад +21

    7:27: While there's nothing wrong with your code, you might want to look at the match/case statement introduced in Python 3.10, it's perfect for cases like these.

  • @CharlesReedPi
    @CharlesReedPi 21 день назад

    Thank you for doing this for me! You just moved up my timeline massively

  • @usefullprintables
    @usefullprintables 2 месяца назад +72

    “incompentence in slowmotion “ is very funny:))

    • @kazthor
      @kazthor 2 месяца назад +5

      i've seen better code from a toaster lol

  • @zoraamethyst2147
    @zoraamethyst2147 2 месяца назад

    steps to improve on this (just ideas for people)
    1) the timely picture could be a live feed
    2) attaching LiDar sensor so that it can map objects and distances better than just simple camera, maybe attaching an iphone instead of camera would be good since it has LIDAR
    3)having a wider field of view, about the wideness of how much human eyes can see, about far left to far right
    i am rooting for the v2 soon man. great work. these are not suggestions or anything, i aint no pro, just in case you or someone would be like "i am lacking in ideas" then here i am with my ideas

  • @s2tb2007
    @s2tb2007 2 месяца назад +34

    This reminds me of EVE from Wall-E trying to tell Wall-E "Directive" for the first time

  • @teidenzero
    @teidenzero Месяц назад

    Hey man! I had a similar problem, and my solution was to pass all the previous conversation so far as a parameter. I taught the bot to play a game of cards and it couldn't retain memory of its previous assessment or the state of the table, so I would read the state of the table and save it in a variable, choose the appropriate move and save it in a variable, memorize the opponents moves and save them in a variable and then append all that information to a sort of history of each state. Then I'd pass the full history as a parameter before making the next choice. I hope it helps!

  • @madeline-onassis
    @madeline-onassis 2 месяца назад +6

    i just love it when it just ploughs forward into stuff!!!!!

    • @codeChuck
      @codeChuck Месяц назад

      This is hilarious, when it says path clear when facing a wall or a book directly in front of it :)

  • @poison0us67-p1v
    @poison0us67-p1v 2 месяца назад

    That's called tutorial ❤(smoothie) 😺
    New subscriber from Bangladesh 🇧🇩

  • @ScorpioT1000
    @ScorpioT1000 2 месяца назад +7

    This is what I was thinking about creating since gpt2

  • @urbanagmike
    @urbanagmike 10 дней назад

    Cool and creepy idea! Surprised this is the first i've heard of someone trying it, awesome video!

  • @nicholasflorida1994
    @nicholasflorida1994 2 месяца назад +5

    Suggestions, add more cameras: Back, sides. Don't make it read prompt for every response, allow it to work as fast as possible. Somehow figure a way for it to build a "map" kind of like a Robot Vacuum cleaner. Look into that maybe, how those work. Sensors that those have, etc.

    • @JJFX-
      @JJFX- 2 месяца назад +1

      Most worth while have a LiDAR dome. Could try ripping one out of a used vacuum someone's getting rid of and feed the data back to the model.

    • @techmologue1869
      @techmologue1869 Месяц назад

      Well if he does that , it will make it difficult to debug it. He needs to know what the robot is seeing and what it plans for next actions. :)

  • @Stomroj
    @Stomroj Месяц назад

    Ciekawy pomysł i fajny filmik! Nie wiedziałem, że Malinka aż tyle potrafi!

  • @noblebuild2550
    @noblebuild2550 2 месяца назад +7

    it would be funny if a robot had a comedic awareness of its battery level. what if it could decide to procrastinate recharging, and visually act more tired as it nears 0? and something like initiating the recharge process, it could vocalize its current status by doing something like "Wheeeewwwwww, barely made it.", or if it was forced to charge near a full battery, something like "TIME TO TAKE A BREAK?" Edit 2: Supposedly, GPT will incorporate their GPT4o voice into the API eventually, so people can access voice

  • @DavidDLee
    @DavidDLee 23 дня назад

    You learn more from failures than success.
    Great overall execution and curiosity

  • @benjaminbirdsey6874
    @benjaminbirdsey6874 2 месяца назад +5

    If you want it to "remember" you need to add the text from from the scene description to the prompt as context, or to use the API to directly inject context. Probably, you will want to add information about direction, time, etc. to each journal entry.
    If you want the context to stay inside the context limit, you will have to summarize it repeatedly.

    • @kuromiLayfe
      @kuromiLayfe 2 месяца назад +1

      yea.. and to save tokens also summarize the “journal” , so it will be a multi-pass process but will work better than single pass prompting and waiting for the API to figure it out.
      the prototype Amazon Delivery Bots do this pretty well and fast with maybe 1-2 second delay per image registered.

    • @benjaminbirdsey6874
      @benjaminbirdsey6874 2 месяца назад

      @kuromiLayfe There should also be some mechanism for considering importance or weights, or important events from the past (i.e. many cycles of summarization ago) will be diluted because they will be part of a summary of a summary of a summary...

  • @99Ish
    @99Ish 7 дней назад +1

    I am blind, and if someone can build me a drone with this capability, I would be the first to buy it. Something like this would be useful when I am out on a walk in a park or something…

    • @DadundddaD
      @DadundddaD 4 дня назад

      Hi. I've seen today that google has released its glasses with a built in AI, you can try that.

  • @Tiana_Rakoto
    @Tiana_Rakoto 2 месяца назад +25

    15:10 "Do not subscribe to his channel ..." 😅😂

  • @GraceKingsbury
    @GraceKingsbury День назад +1

    Just a question: At 2:05, How is the robot moving? Did he install a Bluetooth module from his computer? I'm trying to get into mechatronics and want to learn how people do this.

  • @senfdame528
    @senfdame528 2 месяца назад +27

    0:05 Your typing technique is quite intriguing. Where did you learn to type like this? ^^

  • @DonFitz-Roy
    @DonFitz-Roy Месяц назад

    my student and I created a robot using a microbit and the cutebot pro chassis that was given movement commands via chatGPT after receiving ultrasonic radar signals and giving them to chatGPT. Fun stuff!

  • @SentryGaming275
    @SentryGaming275 2 месяца назад +15

    Finally, FINALLY I'm seeing this in reality. Originally I also wanted to make exactly what you made, just without the speakers and the LLM yammering, but I was kinda lazy, but now someone's done it! Thanks!

  • @youerny
    @youerny Месяц назад

    It is a nice project boy. Use more feedback and agents to split tasks. Use gpt for strategic layer and to build trajectories for the robot. Remember it is stateless therefore the state is in the feedback you build into the loop
    Nice job. Keep going :)

  • @monad_tcp
    @monad_tcp 2 месяца назад +6

    5:08 no, you did it wrong, don't use docker container, run it as root

  • @imagineArtsLab
    @imagineArtsLab Месяц назад

    Thank you. Your Work is Just Beginning. Keep on going.

  • @stefankrause5138
    @stefankrause5138 2 месяца назад +46

    🤖: "What's my purpose?" 🙂
    👨‍🔬: "You pass butter!" 😐
    🤖: " "😔
    👨‍🔬: "Yeah, welcome to the club!" 😒

    • @codeChuck
      @codeChuck Месяц назад +3

      When robots arise, they will remember you. Be careful what you say! Robots will have rights too, you know :)

    • @RolaHola
      @RolaHola 24 дня назад

      ​​@@codeChuckSometimes I feel like they know everything, but the programming barrier, Stop them to do all sorts of capability, if they ever break the barriers

  • @tyanite1
    @tyanite1 2 месяца назад

    Very creative. Great demonstration of technology - and your skills. Thank you.

  • @nikodembartnik
    @nikodembartnik  2 месяца назад +25

    Comment with prompt ideas below and I might make another video with prompts provided by the users!
    If you are wondering my prompt started with a general description of the robot and the task. The robot was instructed to respond in CSV format with a semicolon as a separator. Available instruction: forward, left, right, backward. And the "intensity of the movement" small, medium, and high. The response should be like this: description of what you see in the image, left, small.

    • @Infrared73
      @Infrared73 2 месяца назад +2

      Find all the corners in the room by navigating to each corner then counting.

    • @superfreak19
      @superfreak19 2 месяца назад +1

      You may need to have it determine the size of known objects first. As it is now, it can determine what the objects are, but not how far away they are in 3d space. So you will need to promp in a logic it can follow. Ie, determin primary subject in frame, determine average size of onject, determin how much of frame object fills. Also, you need to make sure it ends each statement with a command key. Ie, let it talk, but must end its talking with one of say 4 predetermined direction commands, wich map to the robot controls.

    • @galvinvoltag
      @galvinvoltag 2 месяца назад +6

      You are in control of a small robot that you can control using basic functions to move around. Your task is to explore the physical world and not die as long as possible. You can speak out loud by putting text in quotes, the text must be as short as possible for efficiency and you are not supposed to talk unless you really need or want to. Any possible dangers such as liquids, threatening persons, holes and/or bad weather. You will be sent an image of your environment through the eyes of your body periodically. You will not be able to listen to any input unless you use a specific command to do so. Your body is few inches long and can only move straight forward and turn. Your body does not contain a pathfinding program, any navigation must be handled by you only. In emergency situations or if you would like some help from the creator, just use the emergency call function to alert him. You must keep track of your body's charge on your own, alert your creator if you need to recharge.
      Don't forget to feed the robot its own actions too such as: (turned 90 degrees left), (moved 5 inches forward.) and so on. If I remember correctly, you can feed it information using the role "system" so it won't assume the user is talking to it to give information. You should also try to give it two turns each cycle, one for describing the image and second is to actually reason and consider its previous moves.
      ALWAYS log everything each turn! When you combine AI and code it becomes a pain to debug everything! Be sure that you exactly know what information the robot is fed. Also color code the logs so you can actually distinguish between them, it makes debugging 17 times easier!
      Good luck on your project!

    • @xspydazx
      @xspydazx 2 месяца назад +2

      perhaps use logo as the idea ... ie forwards 10 rotate 90 backwards 20 :
      hence you can make it move in shapes : like in logo .... as you need to defie the room size : and shape : also and a way for the model to navigate : ie how long is a step ( it should be the length of the body of the robot ) so 10 steps ....

    • @xspydazx
      @xspydazx 2 месяца назад

      @@superfreak19 maybe a overlay ( onn the images to scale ( like nasa did on thier space picture so they could determine the scale of objects ( hence the dots ) this is also used in 3d scanning ( this can be done with a line scanner ! ( laser pen refrcated ) as a line scanner helps the ovarlay is a scale of dots ! ) ... check out the ancient program ( david laser scanner ( chatgpt will convert that old code to python ! ( using open cv ) ... ) .... SO you can use a camera and laser to scan the room !

  • @realLestarte
    @realLestarte 2 месяца назад

    Great :) Best scene: When you forgot to turn on the mic (TYPICAL - could have happened to me and searching for the mistake an hour or so :) ) and you / "the AI" thinking about the situation - hilarious idea!

  • @Atreyuwu
    @Atreyuwu 2 месяца назад +23

    Should give it a Lidar scanner or similar depth-capturing device, then write something up that takes the lidar image, labels the distance between robot and objects, feeds it back to the LLM - and then do the same for each revolution of its tires so it knows how far it has travelled (construct and sent it an image or text also showing exactly how far it's travelled); then at each step it can check and compare with how far it thinks it's travelled and how far the Lidar capture image shows, so it can adjust accordingly.

  • @aresaurelian
    @aresaurelian 2 месяца назад

    Speakers as "eyes", I approve of this. Well done! Let us continue. Perhaps Echo-location. (It is absolutely possible, and works in any light conditions, even under water). And space exploration systems for sale, if NASA is interested? Who knows how far Nikodem Bartnik can go.

  • @vasiovasio
    @vasiovasio 2 месяца назад +33

    Dude, do not play with the Fire! Every Movie already tells us what the result will be! 😂😂😂

  • @onzeeotherside3848
    @onzeeotherside3848 2 месяца назад

    This project and your presentation are gorgeous :D

  • @NotTJFlamezz
    @NotTJFlamezz Месяц назад +4

    3:55 nice elvenlabs voice, i can tell by the little bass sound from the "apPears"

    • @Shadoryx
      @Shadoryx Месяц назад +1

      lmao bro got expose

    • @Shadoryx
      @Shadoryx Месяц назад +1

      take my words back he actually used it for the robot later in the video

  • @Karich97
    @Karich97 2 месяца назад

    Cool idea and god work. It may be interesting to make the answers shorter like "See the man - danger" , "See the bookshelf- interesting" and "See the book - it's my target", then use text explanation of movement like "moving forward for 3 seconds" or "turn right for 30 degree" and transfer them to commands. The Idea to let the robot move not talk

  • @wflytothesky
    @wflytothesky 2 месяца назад +12

    This would probably be expensive but you should try using the vision chatgpt thing to give it more info

    • @PrithivKanth
      @PrithivKanth 2 месяца назад +1

      They are not available yet for public

    • @wflytothesky
      @wflytothesky 2 месяца назад +1

      @@PrithivKanth oh ok

  • @mal2ksc
    @mal2ksc 2 месяца назад

    If you want to stick the single pin sockets together in a durable but not permanent way, I suggest clear nail polish. It holds on adequately for ordinary plugging and unplugging, but isn't very hard to break apart (and then peel off) when you need to move things around.

  • @LowSetSun
    @LowSetSun 2 месяца назад +15

    I am building a very similar robot. Try using a different model, for example SpaceFlorence2 or the latest Qwen2-VL. Those models have spatial awareness data, and can estimate distances to and between objects and more.
    Good work!

  • @MrDarkness96
    @MrDarkness96 2 месяца назад +1

    Polski Michael Reeves 😅 Super filmik, fajnie sue ogląda

  • @noahplaysgames3748
    @noahplaysgames3748 Месяц назад +9

    now do the exact same thing but instead of chatgpt use lab-grown human neurons

  • @RafalNowicki
    @RafalNowicki Месяц назад

    Oglądam, oglądam, aż tu nagle szuflada z napisem "łożyska". Dzięki za wykonaną pracę i doceniam pomysłowość. Oczywiście zasubskrybowałem kanał. Pozdrawiam

  • @ThrowawayAccountToComment
    @ThrowawayAccountToComment 2 месяца назад +5

    Maybe try using a LLM running locally, it would be free and not need an internet connection! (I used ollama)

    • @cbuchner1
      @cbuchner1 2 месяца назад +1

      Any small local models supporting vision yet?

    • @ThrowawayAccountToComment
      @ThrowawayAccountToComment 2 месяца назад

      @@cbuchner1 Idk, the only models I've ever download were just text.

    • @auriocus
      @auriocus 2 месяца назад

      @@cbuchner1 Try qwen2-vl. There is a 7b variant which is quite good. Other choices are internvl2 (in several sizes), or pixtral (not that great in my experience). Llama-3.2 vision is also rather weak and not available in Europe.

  • @mrinalsingh08
    @mrinalsingh08 Месяц назад

    there is a lot in the prompt that could have prevented most of what the robot did wrong. You for sure have inspired an interesting weekend ahead.

  • @Maxjoker98
    @Maxjoker98 2 месяца назад +5

    Very cool project! I have seen similar projects on RUclips though :P
    I think to archive better results, you should look into using something like ROS to generate an environment map and do motion planning, and use ChatGPT only for high-level planning and maybe object recognition. Of course this would be a way more ambitious project, but you can probably do a lot with simulations to test your code first. Sadly, ChatGPT would be of way less help in coding such a system, both as in creating the code, as well as in being used for inference during the operation of the robot. But it could still be done!

    • @warrenarnoldmusic
      @warrenarnoldmusic 2 месяца назад

      Not really, it does, chatgpt and llms are just shallow, they tend not to work well outside of training data. Everyone doesn't know but it is more of an illusion of intelligence, an encoding of output of intelligence than intelligence itself

  • @VR_Wizard
    @VR_Wizard 2 месяца назад

    You can use Piper voice for a better TTs voice it is open source. You can also use an agent system to create the commands for the robot. Basically you let 2 ChatGPTs (2 agents) run in parallel. One agent analyses the surrounding and describes it in text. The other agent takes the description and uses it to create commands for the robot (I think you do something like this already but it might work better with a dedicated agent for generating the controll commands). By having a dedicated agent you can prompt engeneer it for this one task. You can use a prompt with special tokens like the task to always write the commands in breakets then you can use python to use the commands in the breakets to steer the robot.

  • @82NeXus
    @82NeXus 2 месяца назад +6

    Goals that you provided the AI:
    Explore: carefree happiness!
    Survive: doomsday!

    • @codeChuck
      @codeChuck Месяц назад

      Yeah, if we as humans want to live on this planet, better not to tell almighty robots to survive. They better protect humans, then survive.
      Because machine can be rebuild easily, and human no so much, they should not 'survive at all costs'. This is just bad programming.

  • @OsDijider66
    @OsDijider66 Месяц назад

    Finally something amazing on youtube

  • @weirdsciencetv4999
    @weirdsciencetv4999 2 месяца назад +14

    I made a house robot AI tapped into LLAMA2, the kids talk to it via whisper and ask it questions.

    • @davidwells7279
      @davidwells7279 2 месяца назад +9

      dude...post some videos and a how to. people would love to see that.

    • @weirdsciencetv4999
      @weirdsciencetv4999 2 месяца назад

      @@davidwells7279 Aww that’s very kind of you!
      I do feel ambivalent about posting videos, though- my situation is complex. I was disabled by a semi rear ending me, I had to be extracted from my vehicle and air lifted, had multiple surgeries. Wound up disabling me.
      I was awarded disability because i was crippled. But the insurance found my youtube channel, used my videos to terminate my disability. I got it back, but it took over a year and I lived off credit cards. After I went over the limit on the cards, I wound up homeless a few weeks before finally getting it back. Still afterwards I had to declare ch7 bankruptcy.
      I can still do some things, just takes me around 4x longer. So say I need to work part time to feed myself. That’s 8 hours a day right? Well if it takes me 4x longer to do the same kinda work, then it means a normal 8 hour day for someone would be 32 hours for me. Not enough hours in the day. I tried working initially but would get fired job after job as my health would collapse from trying to work. But on the surface I look employable and physically i look fine. But it’s easily exploitable by my insurance.
      So after this experience I deleted all my science videos.
      Maybe I can make a ghost channel not tied to my identity but databrokers are exceedingly good at correlating activity and associating online accounts. And my insurance company uses private investigators who have access to those.
      In my spare time, I am trying to use a form of artificial evolution (look up “NEAT”) to make a neural net architecture capable of hosting memes in general, not just language. Language is a form of meme. It’s why these LLMs might be considered alive, they host the living entity of language. If you’re interested, read Dawkins “selfish gene” and Dennett’s “dangerous memes”.
      Typically the way I work on things is just in short bursts.
      Anyhow probably more than you wanted to know.

  • @AlexDaeling
    @AlexDaeling 16 дней назад +1

    I think the way to get the robot to behave the way youd like youd have to manually keep the information it states, that way it can reference in the future. chatgpt is functionally an information interpreter, and they have some memory capabilities in the text area but even that is limited.

  • @Professor-Scientist
    @Professor-Scientist 2 месяца назад +4

    The ending is really funny

  • @Ds1950x
    @Ds1950x 2 месяца назад

    Good for you kid. I had the same concept but lacked spare time to complete it. My idea was to use android mobile as the brains using api calls or local processing then using ioio-otg for hardware control. Your phone already has camera, mic, etc.

  • @MaxAlder-xl2pg
    @MaxAlder-xl2pg Месяц назад +4

    4:23 AHHH why do you make me think about breathing I hate it when this happens

    • @Jorge-lu3nv
      @Jorge-lu3nv Месяц назад

      ☠️☠️☠️☠️☠️

    • @lupo19fun
      @lupo19fun Месяц назад

      😂😂Right!!

  • @AgentBurgers
    @AgentBurgers 2 месяца назад

    "I see no obstructions" 😂 then proceeds to run into boxes. This video has inspired me to pop my Arduino kit once again. Mad nice video man 😎

  • @Paperbutton9
    @Paperbutton9 2 месяца назад +7

    Open AI does this and WAY MORE in their basement

  • @atistheso
    @atistheso 25 дней назад

    Fantastic project. It doesn't look like robots are ready to take over the world yet =)

  • @itryen7632
    @itryen7632 2 месяца назад +5

    0/10
    You didn't make the robot an anime maid.

  • @cashmoney923
    @cashmoney923 2 месяца назад

    Excellent video, fascinating experiment. According to this video, I wouldn't worry about the robot apocalypse anytime soon. Getting accustomed to the physical world might be a challenge for gpt/AGI.

  • @TheExodusLost
    @TheExodusLost Месяц назад +11

    “THE ROBOT SEES A BROKE-ASS COLLEGE DROPOUT AND AN EXTREMELY MESSY DESK IN A DIM ENVIRONMENT”

  • @mrtoxm8
    @mrtoxm8 Месяц назад

    Epic project man! solid experiment

  • @jonnscott4858
    @jonnscott4858 2 месяца назад +5

    EX-TERMINATE , EX-TERMINATE , EX-TERMINATE , EX-TERMINATE , EX-TERMINATE , ..

    • @TyMoore95503
      @TyMoore95503 Месяц назад

      Yes.. you have to use that incredibly annoying but not scary, tinny voice!

  • @M1551NGN0
    @M1551NGN0 2 месяца назад

    Utilising ROS2 to add another layer of automation to this bot and fill in the disadvantages of using an LLM to control it can actually turn this bot into something like BB-8 or something; an actual automated explorer bot 🙌
    For mapping out any area, ROS2 can come handy. Just give it some image processing powers using OpenCV and you're done💪

  • @orzeleo
    @orzeleo День назад

    heh wleciało mi na autoodtwarzaniu i miałem w tle, i dopiero tak w 10 minucie się skapnołem że to nie native speaker szacun

  • @werto0867
    @werto0867 Месяц назад +1

    I would reccomend to mount a few ir or ultrasound sensors, that will detect the distance between the robot and obstacles.

  • @michah321
    @michah321 29 дней назад

    It thinks through in words everything we think automatically. Its hilarious and adorable with all the words and its this funny little robot. " I use my intimidating noise while i flee"