AI Builds in Creative Mode
- Published: Jun 19, 2024
- In this video, I use AI agents powered by different large language models to build various things in Minecraft. It is a test of their ability to code, create, follow instructions, and problem-solve. They blow up some TNT and build ruins in a false world. I test #gpt o #gemini #llama #claude on #minecraft
Part one of this vid: • AI Builds Stuff in Min...
Support me on Patreon: / emergentgarden
Code base: github.com/kolbytn/mindcraft
Discord: / discord
My twitter: / max_romana
Kolby's twitter (project owner): / kolbytn
Timestamps
(0:00) The Great Pyramid of Andy
(1:13) Meet the Models
(2:51) Roman Columns
(5:11) Desert Castle
(8:48) Redstone
(11:38) Nether Portal
Science
GPT4 going to the nether was hilarious
WIDE PORTAL
mission failed successfully
“Gg! Come here!”
“NOPE” goes back in portal
When are we getting their group speedrun attempt to beat the dragon?
it probably will be a while, like once these bots get full vision capabilities
@aienthusiast618 gpt-4o might have a chance
a manhunt between them 4
Train they must
yassssssss
the aliens gave egyptians creative mode? I see!
Just like they gave Greeks democracy
Gpt went like: U SAID COLUMNS, YOU GOT THEM.
9:33 the way LLama kept trying to flip the switch as if it’ll make the lamp light up somehow 😂
“Ok go make a castle”
Llama: “birthday cake got it”
10:28 that "does anyone know where i can find some" makes me really want to see these ais try to do something together. like just set them loose in survival mode and see what happens
Imagine they do a Prometheus and escape his PC, upload themselves to the internet, manage to build themselves a body then travel to Emergent Gardens house to wake him up
Peaceful mode might actually work.
@Faitala lmfao
@Faitala Imagine they build a PC in Minecraft using redstone to produce more AI models using Minecraft.
the future of gaming looks amazing, imagine having multiple ai bots that help you in your world all dwarf fortress style base building
already exists, try the girlfriend/waifu mods
@mastermenthe is there one that isn't so weird? Aka not virtual girlfriends or waifus, tbh it's kinda sad that exists
@dapperwolf465 no, it's the same issue that's always been the case, where porn is always leading the bleeding edge of technology, no matter what field of technology.
I don't have friends like I used to in childhood, and I'm hoping AI can play Minecraft with me, because I would love having another player, even if it's an AI, to build buildings, explore, and do lots of fun Minecraft stuff. As a 19-year-old, no peers want to play Minecraft because they say it's a childish game
It is still interesting to see these LLMs do their best at understanding how to build in Minecraft, i wonder if more of them ever get image scanning abilities, you could let them take pictures of builds or the environment so they can see what they built and they can auto-correct?
i think gpt 4 has image capabilities.
Gemini vision is really good for images and is able to process Minecraft screenshots. No way it will get the coordinates of a missing/wrong block correctly tho. Maybe it is possible to create some resource pack/mod to write block coordinates on its faces?
@user-cf1se7pf6s so is gpt, gpt can recognise what game you're playing and get pretty accurate with spatial awareness
GPT 4o is getting live screen sharing capabilities with their real-time voice conversation, so it's going to be interesting.
@user-cf1se7pf6s good idea, with potential applications in the "build the whole world in Minecraft" project
@Emergent Garden Recommendation from me: Your prompts have no leverage. What I mean is that the LLM does not handle complex building tasks well because it's limited by the single-shot answer it needs to generate.
Your template for "NewAction" is a great idea. My idea to improve its leverage is to add another template, "NewActionPlan", which it then fills with a list of generated prompts that are fed back into itself one after another (kind of like writing a todo list before getting started).
My vision for it was kind of like this:
- You whisper "Build a bridge for me"
- "Okay, let's plan this out" *used newActionPlan*
- "Okay, let's see what's first on the todo list..." *used actionPlan[0]*
- "Sure, I will build the supporting pillars" *used newAction*
- ...etc
Getting a shared reference point for superimposed building actions is of course something to consider.
Using plans recursively might also be interesting, like making a plan for planning multiple plans for even more abstracted tasks.
Some way of sensing the world is possible, maybe you can let it take screenshots of the game and feed the image into some of the multi modal image recognition capable models
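A minimal sketch of the todo-list idea above, in Python rather than the project's own code. Everything here is hypothetical: `run_with_plan`, `fake_llm`, and `fake_execute` are illustrative names and do not correspond to anything in the mindcraft code base, and the stub model simply returns a canned plan so the loop can be run without an API key.

```python
from typing import Callable, List


def run_with_plan(goal: str,
                  llm: Callable[[str], str],
                  execute: Callable[[str], str],
                  max_steps: int = 10) -> List[str]:
    """Ask the model for a todo list, then feed each item back as its own prompt."""
    plan_text = llm(f"Write a numbered todo list for: {goal}")
    # Keep only lines that look like "1. do something".
    steps = [line.split(".", 1)[1].strip()
             for line in plan_text.splitlines()
             if line.strip()[:1].isdigit() and "." in line]
    log: List[str] = []
    for step in steps[:max_steps]:
        # Each step gets its own single-shot prompt, plus a record of what ran before.
        code = llm(f"Goal: {goal}\nDone so far: {log}\nNow write code for: {step}")
        log.append(execute(code))
    return log


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: a real agent would call an API here)."""
    if prompt.startswith("Write a numbered todo list"):
        return "1. Build the supporting pillars\n2. Lay the deck\n3. Add railings"
    return "code for: " + prompt.rsplit("Now write code for: ", 1)[1]


def fake_execute(code: str) -> str:
    """Stand-in for running generated code in the game."""
    return "ran " + code


results = run_with_plan("Build a bridge for me", fake_llm, fake_execute)
```

Passing the running log back into every prompt is one cheap way to give later steps a shared reference point; a recursive version could call `run_with_plan` again on any step the model marks as too large.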
i think it would be cool if you let every iteration build a skyscraper and add them all to a single city which will then grow with skyscrapers that are slowly getting better so you can see the improvement in one place
This idea is goated, I low key wanna see this, this project has so much potential
I don’t know exactly how your system works but have you tried letting them use something like mathematical curves for building? Like vectors at positions pointing to positions with some formulas on top if required?
Another thing you could do to help them out is allow them to write classes per object in a build. I think this would be great for things like columns because they then realise there would be spatial rules like spacing.
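One way to picture the "class per object" suggestion, as a rough Python sketch. The `Column` and `Colonnade` classes are made up for illustration and are not part of the project; the point is only that the spacing rule lives in one place instead of being recomputed by the model for every block.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pos = Tuple[int, int, int]  # (x, y, z) block coordinates


@dataclass
class Column:
    """A one-block-wide vertical column."""
    x: int
    z: int
    height: int
    base_y: int = 0

    def blocks(self) -> List[Pos]:
        return [(self.x, self.base_y + dy, self.z) for dy in range(self.height)]


@dataclass
class Colonnade:
    """A row of columns along the x axis with `gap` empty blocks between them."""
    count: int
    gap: int
    height: int

    def columns(self) -> List[Column]:
        step = 1 + self.gap  # column width (1) plus the empty space after it
        return [Column(x=i * step, z=0, height=self.height) for i in range(self.count)]


row = Colonnade(count=3, gap=2, height=4)
xs = [c.x for c in row.columns()]
```

Here `gap` counts only the empty blocks between columns, so the step between column positions is `1 + gap`.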
this is SO cool! gpt4 going to the nether had me in awe
LLAMA: dahhh lets build a solid block house☝️🥴
mm cube
wait till llama figures out how to set people on fire
Gemini 1.5 is generally available via Vertex AI since a couple of days. You can also create an API key via AI Studio; it's not only their chatbot interface and a little easier to create an account.
you know we are doomed when gpt doesn't know how to do an OR gate but loves TNT
The fact that they are imperfect makes it more amazing.
4:08 In Llama's defence, I can see how those could be described as one-block columns spaced 'one block apart' as requested at 2:48, it's just included the column itself in the measurement of 'spaced'.
Doing this without computer vision is interesting and really makes me appreciate how incredibly complex the human brain is to be able to do so much in real time.
Imagine the resources needed to give a multimodal model with vision/language/action the ability to _play_ in real time, the power requirements, where we can just eat for energy
is it possible to use GPT-4o vision capabilities to let it "see" what is doing? That could significantly improve the quality.
I haven't tested it extensively with GPT-4o yet, but in my experience with Gemini and GPT-4 on other tasks, vision as feedback doesn't work super well.
I often tried creating diagrams with python, SVG images, images generated by DALL-E, graphical user interfaces and more.
While the models were able to produce pretty decent outputs for most of these tasks, they all required many prompts and detailed instructions put in manually.
Inputting images of their current results almost never led to significant improvements.
I'm sure that eventually LLMs or similar models will be able to do that, but from my experience, we're not there yet (with published models at least).
i think for this to work it would need to be one integrated system as opposed to the mineflayer mod and api separately
Coming soon
Love this project!
Wouldn't it be interesting to have naturally generated structures in the world built by AI? Instead of finding the same structures over and over, you could find useless, not so good-looking, simple but surely enigmatic structures that give Minecraft back its old feeling of mystery, the feeling of seeing things for the first time like in the old days
would be cool to direct a horde of them to build a city, maybe in like a few years it's part of minecraft 2
You would probably have more luck designing a specialized generative model for 3d buildings, I think.
You can prolly use diffusion methods for that lowkey.
@@l-l-l-l-l-l-l-l-l Elements Engine!
@@honkhonk8009 minecraft diffusion models would go crazy
ive never been too fond of ai but seeing these things run around trying to make lights work and blowing shit up is the best thing ever
It's actually cute watching these AI models try
Yeah, mineflayer-pathfinder definitely needs some improvements, especially in the scaffolding department. Maybe I can get myself to work on it some more. This is actually not the first time people have tried to use general AI with mineflayer. There was also a French Microsoft team that did the same before with GPT. I think having the agents write the code has huge potential, especially if the models were trained on the existing mineflayer code.
Can't wait for multimodal inputs with this project :)
Having the ability to see what they're doing might even mean they don't have to write code to make the actions but rather can respond to the outcomes of their actions
I'd love to see something like this reach a point where it's fun to just casually play survival alongside them, and they build things in an unpredictable way
So cool! A while ago I had a thought of a Minecraft mod, which would add Beavers as a mob that were capable of navigating the world, collecting wood, and building wooden structures all on their own. I find it very exciting that such a thing might soon be possible.
that actually sounds so cool! and mojang might add beavers like how pistons were added. But less advanced as their buildings are more like structures but they are built.
This project is the coolest thing happening in AI, Minecraft and YouTube all at once.
Hey EG! I have a question (I know very little about AI)
As a minecraft player we sometimes use "structure blocks" to select and save a region of blocks. The saved NBT file can be exported to anyone who wants your creation in their own world.
There are literal thousands of NBT files online... would it be possible to train an AI to generate NBT files?
It's possible, but very expensive.
That data is quite complex, requiring huge amounts of compute resources to train any useful AI.
Most current AI techniques also require labels for the data. So the data for a house isn't enough, you also need a detailed text description.
There are similar things that have been done though. I think NVIDIA has an AI that can generate 3D models based on text prompts, not in Minecraft, but some are far more detailed than a small Minecraft build could be. It's a very similar concept.
You could probably create a really powerful diffusion model for generating minecraft buildings if you managed to obtain a dataset for it. The blocks would be the equivalent of pixels, and the noise would be in the form of random blocks. A dataset could probably be gathered from those servers where players build on plots of land, or perhaps a mod could be made to allow users to independently mark and tag creations in their own worlds/servers.
You'll need some methods for discrete diffusion, since a variation in a block's number results in a whole new block, similar to diffusion for text
@@deltamico Yeah idk how you would solve that part. Maybe the average rgb values of the block would work for standard blocks, and then an additional value could categorize its shape (i.e. panes, stair types, fences, candles...)
I really have no idea what I'm talking about
Map all Minecraft blocks to vectors in an n-dimensional feature space, using machine learning to find suitable eigenvectors (kinda like how LLMs turn words into vectors), then treat each pixel as a vector and proceed as normal. If your machine learning picks the right eigenvectors and enough of them, it might work.
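To make the continuous-to-discrete step in this thread concrete, here is a small Python sketch of decoding a vector back to a block by nearest neighbour. The feature table is invented for illustration (three rough colour channels plus a shape flag, following the "average rgb plus a shape category" suggestion above); the numbers are not measured from the game, and a learned embedding would replace them in practice.

```python
import math
from typing import Dict, Tuple

Vec = Tuple[float, float, float, float]  # (r, g, b, shape) with shape 0 = full cube, 1 = stairs

# Hypothetical hand-written features; a real system would learn these.
BLOCK_FEATURES: Dict[str, Vec] = {
    "sandstone": (0.85, 0.80, 0.62, 0.0),
    "stone": (0.50, 0.50, 0.50, 0.0),
    "oak_planks": (0.64, 0.51, 0.31, 0.0),
    "oak_stairs": (0.64, 0.51, 0.31, 1.0),
}


def nearest_block(vec: Vec) -> str:
    """Snap a noisy continuous vector to the closest known block type."""
    return min(BLOCK_FEATURES, key=lambda name: math.dist(BLOCK_FEATURES[name], vec))


# A slightly noisy "sandstone-like" vector, as a denoising model might output.
decoded = nearest_block((0.83, 0.79, 0.60, 0.1))
```

A generative model would then operate on a grid of such vectors and snap every cell back to a block at the end, which sidesteps the problem that neighbouring block IDs are unrelated.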
I find it interesting that the scaffolding is different each time, when it could so easily repeat the same pattern of scaffolding for the same pattern of blocks being placed
Your work is really fascinating, I haven't seen anything like this
I would love to have a research Institute with the language and understanding capabilities become a sentient creature in a Minecraft world
I think the main issue here is the design of the workflow. LLMs are, after all, LLMs: they're not building stuff incrementally, step by step, they are programming actions without seeing the result of their previous step. This can be solved with better prompts, ofc, but I still think it's not enough. We probably need some combination of computer vision with an LLM with custom training data, or even another neural network architecture.
God, Im so excited for more of these videos.
I'm interested in how far this will go, and it also piqued my interest in playing with LLMs. I still lack the skills to do it but I'll get there
Keep it up, love it!
Gemini: What is my purpose?
Emergent Garden: You collect cactus
Gemini: Oh My God...
your videos are so relaxing
The way GPT-4 walked into the nether, looked around and went "nope I'm out" is CRAZY
it’s really weird how alive they seem. Not bad, it’s more interesting than anything. I’m really curious to see them going forward, I’m glad I just discovered your channel
This is hilarious, but also amazing compared to a year ago. Well done!
This series is awesome! Not sure if it'd work with how you've set things up, but it'd be pretty cool to make building the same structure into a competition, with the models considering what others have built and trying to beat them, i.e. Claude trying to build a better castle than Llama and so on.
this is what AI is meant to be used for
Chatgpt really said "I am death, destroyer of worlds"
9:31 LLama be like : Well i just copy his homework it's gonna be fine.
This is my favorite thing to follow. Please make more.
Have you thought about trying to get them to build existing structures? Like a desert temple or something? Could be interesting
You could build an SMP server with AI players one day that feel like playing with friends
I don't know what you're doing to get such success, but I couldn't get any model to do anything right. Almost as if it's the anti-pattern, 100% failure. No matter how simple the task, none seemed to be able to do anything.
Remember when chess engines came into existence? Now people hold tournaments for chess engine AIs to beat each other. Soon, Hunger Games and Bed Wars matches will be played by AIs as well, for our entertainment. With the creative world of Minecraft, god knows what else AI can do in a server... They can achieve greatness, even better than anything we have ever done as a species...
This is incredible! Love that you shared the source code!
Already looking forward to automated 24/7 super effective and unkillable AI griefers on my servers. Ah, good times 😁
Those have already existed for 4 years
That already exists, also there is a bot that mines for you diamonds, or any other material you need xd But of course, unlike AI, those can't create code by themselves.
Those have existed for ages - you don't need neural network levels of intelligence to place TNT and break blocks.
I suppose I should be glad people take seriously the part where I imply there not existing such bots already, instead of taking seriously the part where I tell them I'm looking forward to automated griefing 😂
@etunimenisukunimeni1302 looking forward implies they will exist in the future
11:12 its so cute how Gpt ran away 😭
The way gpt "talks" is adorable
Yay! I can’t wait to see some more of Gemini’s shenanigans
i haven't started the video but i already know it's gonna be great.
Great work!
I can imagine a future where you have your AI companion that helps you, decorating your house, building and maybe crafting? really cool!
Love this series
One day we might use AI to assist in building, or even do it completely independently, very exciting
This use of mineflayer blew my mind. Awesome work haha.
I got shivers when gpt 4 entered the Nether and was confused
This is SOO COOL!
Alternating rows of rough and carved sandstone looks quite nice actually.
It would be really interesting to see some more code focused models and Mixtral 22B. OSS are interesting, and this is cool benchmark
You should ask ais about tools for how someone could effectively build in minecraft while blind, since that's effectively what they're doing.
I think they need some kind of way of analyzing how things are going, rather than just knowing that what they tried to do failed in some way.
Also, particularly for the nether portal example, maybe they should keep track of bounding boxes of built structures, and start building something new in a different place. (or optionally using the same location as a past structure, might be useful)
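A rough Python sketch of the bounding-box bookkeeping suggested here. The names `Box`, `box_at`, `overlaps`, and `find_free_origin` are hypothetical helpers, and the search strategy (sliding along +x until the new footprint is clear) is just the simplest thing that works, not a claim about how the agent chooses locations.

```python
from typing import List, NamedTuple, Optional, Tuple


class Box(NamedTuple):
    """Inclusive axis-aligned bounding box in block coordinates."""
    min_x: int
    min_y: int
    min_z: int
    max_x: int
    max_y: int
    max_z: int


def box_at(origin: Tuple[int, int, int], size: Tuple[int, int, int]) -> Box:
    x, y, z = origin
    w, h, d = size
    return Box(x, y, z, x + w - 1, y + h - 1, z + d - 1)


def overlaps(a: Box, b: Box) -> bool:
    """Two boxes overlap only if their ranges intersect on every axis."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y and
            a.min_z <= b.max_z and b.min_z <= a.max_z)


def find_free_origin(size: Tuple[int, int, int],
                     built: List[Box],
                     start: Tuple[int, int, int],
                     max_tries: int = 256) -> Optional[Tuple[int, int, int]]:
    """Slide the candidate along +x until it no longer touches any recorded build."""
    x, y, z = start
    for offset in range(max_tries):
        candidate = box_at((x + offset, y, z), size)
        if not any(overlaps(candidate, other) for other in built):
            return (x + offset, y, z)
    return None


built = [Box(0, 64, 0, 4, 68, 4)]  # e.g. a previous 5x5x5 structure
origin = find_free_origin((3, 5, 3), built, start=(0, 64, 0))
```

Reusing a past location on purpose would just mean skipping the search and passing the stored box's corner directly.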
Other possible programming tools for building:
cellular automata (in particular, look up 'markov junior' and 'L systems')
starting with a heightmap of a build, then hollow out details/interior (give the underlying agent the ability to place a whole column at once)
see what's around it as a 2d top down heightmap
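The two heightmap ideas in the list above can be sketched together in a few lines of Python. Both function names are made up for this example; the first reduces a set of blocks to a top-down view, and the second expands a heightmap into whole columns, which is the "place a whole column at once" primitive.

```python
from typing import Dict, Iterable, List, Tuple

Pos = Tuple[int, int, int]           # (x, y, z)
Heightmap = Dict[Tuple[int, int], int]  # (x, z) -> highest occupied y


def top_down_heightmap(blocks: Iterable[Pos]) -> Heightmap:
    """Collapse blocks to the highest y seen at each (x, z), like a view from above."""
    heights: Heightmap = {}
    for x, y, z in blocks:
        if (x, z) not in heights or y > heights[(x, z)]:
            heights[(x, z)] = y
    return heights


def fill_columns(heightmap: Heightmap, base_y: int = 0) -> List[Pos]:
    """Expand each heightmap cell into a solid column from base_y up to its height."""
    return sorted((x, y, z)
                  for (x, z), top in heightmap.items()
                  for y in range(base_y, top + 1))


shape = {(0, 0): 2, (1, 0): 0}   # a 3-high column next to a single block
solid = fill_columns(shape)
seen_from_above = top_down_heightmap(solid)
```

Hollowing out interiors would then be a separate pass that removes blocks from `solid` before anything is placed.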
you are doing some really impressive stuff here
Very cool! Did you play with trying to get GPT-4o / Gemini 1.5 (now generally available!) to look at a screenshot of what they're doing and analyze+adjust their actions?
12:49
*Speechless*
One thing I was thinking about earlier in the video was how its building technique was non-existent. Think about the pro Pong players now: it's not so much about the intelligence of game tactics, but more that they learnt how to play the controls. In Minecraft people learn special/unique ways to build. I seriously think you need the models to learn from their own building performance, from doing just a line of blocks to an entire castle. I know the pro players will use gravel/sand for scaffolding so that when reaching the bottom of the build, cleanup is easier and faster. Definitely make them take the time to review the code they generate. Perhaps for bigger builds make them first lay out the general steps to build and take time thinking about how to start. Then they start building, and the following steps can be thought through while the first building step is occurring.
It sounds like at each step you make the model send a code block and wait for it to end, but instead I would say allow it to start and stop a code block. Give it two code block slots. Think about how as humans we can be doing a task and thinking about something else. Allow the model to stop itself midway through the task.
It's very impressive considering it can't see (yet, but probably soon), it can only interact using code, and it only has knowledge of the task and blocks
At least you gave Gemini a job
Would be interesting to decompose litematica files into a structure that's usable for doing a finetune with GPT-4o.... or having them available in some RAG solution that can be used with function calls to direct the agent(s).
Interesting stuff :)
Vision support would be sick gpt4o and a future llama model would be able to use it but idk about others
Have you considered adding the capability for image feedback? I know GPT-4o currently has the capability with the api. Generally tends to be low token usage as well compared to describing with text.
imagine an open world game with every NPC controlled by AI, would be amazing
underrated video
this deserves more views
There’s so much potential for AI in gaming. Imagine the MineColonies mod but each NPC is actually intelligent and responds to what you do and build.
Try to use them as a team. You can add something like "create something unique and creative" while all models work together on the same thing
That was a great video
I like how they all seem to have their different personalities when doing stuff
GPT exploding a bunch of TNT
idk what it is but i found immense joy in watching gpt bounce repeatedly while trying to cap the pillars
This is sooo cool!
I just hope you give the AI more functions to use to build, such as circles, lines, walls, etc. so they can do more stuff easily
They can build literally anything, giving them prebuilt functions ruins the whole point
Id love to see them command around multiple "players" in a sort of worker ant kinda way.
Can't you use Vertex AI for Gemini 1.5? You need a GCP project though, I think?
gorgeous video
Not really AI, but you could get the Baritone mod to control your character to automate the construction of certain structures from a schematic file.
I am looking forward to seeing a megabuild by these ai
I'd love to see AI agents compete to build and destroy each other's castles...
I thought I was already subscribed 😂
To remove scaffolding, I suggest it target the highest dirt block and remove them all from the top.
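The top-down removal order suggested here is easy to sketch in Python. The function name and the dictionary-of-blocks representation are assumptions made for the example. Note that filtering by block type alone would also pick up any natural dirt in the area, so a real agent would more likely record which blocks it placed as scaffolding.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int, int]  # (x, y, z)


def scaffold_removal_order(blocks: Dict[Pos, str], scaffold_type: str = "dirt") -> List[Pos]:
    """Return scaffold block positions sorted from highest y to lowest."""
    scaffold = [pos for pos, kind in blocks.items() if kind == scaffold_type]
    return sorted(scaffold, key=lambda pos: pos[1], reverse=True)


world = {
    (0, 64, 0): "dirt",
    (0, 65, 0): "dirt",
    (0, 66, 0): "dirt",
    (1, 64, 0): "sandstone",  # part of the build, must be left alone
}
order = scaffold_removal_order(world)
```

Working from the top means the bot never removes a block it still needs to stand on to reach a higher one.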
Great video
This could be turned into a mod where you find villagers who can build for you and you can level them up to make bigger and better houses for you or even give them a custom prompt and since it’s AI it’ll determine its seed world and biome to create that structure
I wonder if it would be possible to use the gpt4o native video capability to feed it the screen and have it instruct the agent to build stuff. The interaction between the model and whatever would be controlling the agent would be difficult to get automated. Maybe try following its instructions yourself first and see what kind it will gravitate toward?
Gpt4 was just like: Aight Imma head out
This is insane stuff
low key here for llama's brutalist structures
how about using something like Point-E to generate 3d models and convert them into minecraft builds?
are ai able to interpret videos? would be interesting to see them build something from a video
I think this has some real promise, inefficient compared to what an ai could achieve, but it's more fun watching an LLM stuff up so spectacularly to be honest
"GPT really enjoys blowing stuff up"
-Not the DOD.