This is actually really impressive. GPT-4o watches you act, understands what was done, then writes code to reproduce it, which can then be run and automated.
Very clever flow, OpenAI should definitely hire you.
Anyone can do better than this with a powerful language model; it's not much. It's just that the Rabbit is overrated.
Cool experimental project and idea 👍 The entire process could be scripted further: continuously store the most recent screenshots at 2-second intervals in VRAM using PyTensor, then trigger a call at any time with a keyword through mic input or a keyboard shortcut to send them to GPT-4o, retrieve the "replay last action" script, and automatically execute it to save time on mundane tasks 👍👍
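The rolling-buffer part of the idea above can be sketched with a fixed-size deque that silently drops the oldest frames. The `"frame-{i}"` strings here are stand-ins for real screenshot data (a real version would grab the screen with something like `mss` or `pyautogui.screenshot` every 2 seconds); this is a minimal sketch, not the video's implementation.

```python
import time
from collections import deque

class FrameBuffer:
    """Keep only the N most recent frames; older ones are dropped automatically."""
    def __init__(self, max_frames=20):
        self.frames = deque(maxlen=max_frames)

    def add(self, frame):
        # Store the frame together with its capture timestamp.
        self.frames.append((time.time(), frame))

    def snapshot(self):
        # Return a stable copy to send to the model when the hotkey fires.
        return list(self.frames)

buf = FrameBuffer(max_frames=5)
for i in range(8):
    buf.add(f"frame-{i}")      # stand-in for real screenshot bytes
print(len(buf.snapshot()))     # 5 -- only the most recent frames survive
```

The `deque(maxlen=...)` does the eviction for free, so the capture loop stays a one-liner no matter how long it runs.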
Thanks, this is interesting. I was wondering about this as well and had a thought about adding log data of user interactions to give the model more telemetry. So it's not just vision but also the actual logs of all the interactions happening in the background.
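Merging such an interaction log with the captured frames could look like this minimal sketch: each logged event is matched to the nearest frame by timestamp, so the model sees "what happened" next to "what the screen showed". The event strings and timestamps are invented for illustration.

```python
import bisect

def nearest_frame(frame_times, t):
    """Index of the captured frame whose timestamp is closest to event time t."""
    i = bisect.bisect_left(frame_times, t)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    # Compare the neighbours on either side of the insertion point.
    return i if frame_times[i] - t < t - frame_times[i - 1] else i - 1

frame_times = [0.0, 2.0, 4.0, 6.0]                      # frames every 2 s
events = [(1.2, "click 340,220"), (3.9, "type 'hello'")]  # hypothetical log
pairs = [(nearest_frame(frame_times, t), e) for t, e in events]
print(pairs)  # [(1, 'click 340,220'), (2, "type 'hello'")]
```

In practice the events would come from a global input hook (e.g. `pynput` listeners) writing timestamped lines to a file alongside the screenshot loop.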
So good to see you getting on board the Rabbit R1. It's seriously going to change lives. Enjoyed the video, man.
I can think of so many uses for this. Great work.
Very interesting. I think it could also be useful to provide it with the mouse positions between different frames.
To go further, we could create multiple actions and then implement a RAG that allows the model to choose the correct snapshot and execute it.
Thanks for this video.
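The retrieval idea in the comment above can be sketched without any RAG infrastructure: score each stored action against the user's request and execute the best match. Here a plain word-overlap score stands in for real embeddings, and the action library (descriptions and script names) is entirely hypothetical.

```python
def pick_action(query, actions):
    """Pick the saved action whose description best overlaps the query.
    A real implementation would use embedding similarity; word overlap
    is only a stand-in for this sketch."""
    q = set(query.lower().split())
    def score(item):
        return len(q & set(item["description"].lower().split()))
    return max(actions, key=score)

# Hypothetical library of recorded actions and their generated scripts.
actions = [
    {"description": "open browser and search the web", "script": "search.py"},
    {"description": "rename files in downloads folder", "script": "rename.py"},
]
best = pick_action("rename my downloaded files", actions)
print(best["script"])  # rename.py
```

Swapping the scorer for cosine similarity over sentence embeddings turns this into the RAG setup the comment describes, without changing the surrounding code.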
Useful information. Thank you!👍👍👍
Bro, please create a video on real-time vision and response.
Whoa, whoa, look who's here! Brother, do you know me, or do you remember me?
Looks great
Interesting project as always.
So where is the code for this project? Looks fun.
Are there any local LLMs this might work with?
Maybe LLaVA 13B can.
I tried LLaVA, but it runs so slowly that it's not worth it: analyzing one picture can take 2-4 minutes, so 20 images take 40-80 minutes. You'd need an API for a server that runs the model, and almost none of them are free.
This is awesome!
I've been thinking Recall and omni screenshots were ways to create large practical data sets to train LAMs. Do you think that is what's happening? You seem to be doing a smaller version of this.
Great start. What's the GH url for subscribers?
Honestly more legit than scammer Jesse Lyu and the Rabbit R1 garbage hardware scam, after his NFT game scam.
Disclaimer for those thinking of implementing this project: OpenAI GPT models are not free, so you have to pay to run the code.
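A rough back-of-envelope cost check before running a batch of screenshots could look like this. The per-1k-token prices below are placeholders, not actual OpenAI rates: the real numbers must come from the provider's current pricing page, and the tokens-per-image figure is an assumption.

```python
def estimate_cost(n_images, tokens_per_image, output_tokens,
                  in_price_per_1k, out_price_per_1k):
    """Rough API cost estimate. The per-1k-token prices must be taken from
    the provider's current pricing page -- values used below are placeholders."""
    input_cost = n_images * tokens_per_image / 1000 * in_price_per_1k
    output_cost = output_tokens / 1000 * out_price_per_1k
    return round(input_cost + output_cost, 4)

# 20 screenshots at ~800 tokens each, ~1000 output tokens (placeholder prices).
print(estimate_cost(20, 800, 1000, in_price_per_1k=0.005, out_price_per_1k=0.015))
# 0.095
```

Running the estimate once before a long capture session makes the "not free" caveat concrete.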
Learning how to be a data scientist, 80% from you, bro, haha.
How does it know where to click, though?
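One common answer to this question is to have the model emit explicit coordinates in its reply, which the script then parses and feeds to an automation library such as `pyautogui`. The `click(x, y)` text format below is an assumed convention for this sketch, not anything GPT-4o outputs by default.

```python
import re

def parse_clicks(model_output):
    """Extract (x, y) pairs from model text like 'click(512, 300)'.
    The click(x, y) format is an assumed convention, not a GPT-4o API."""
    return [(int(x), int(y))
            for x, y in re.findall(r"click\((\d+)\s*,\s*(\d+)\)", model_output)]

reply = "First click(512, 300) on the menu, then click(40, 700) to confirm."
print(parse_clicks(reply))  # [(512, 300), (40, 700)]
# Each pair could then drive pyautogui.click(x, y) to replay the action.
```

The fragile part is that coordinates are only valid for the screen resolution the screenshots were taken at, so replays on a different display need rescaling.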
Humane and Rabbit watching this and raising another round of funding
The GitHub link is always the same repo, btw. It'd be easier to make a new repo for each project and put the project link in the description.
I think you can link to git subfolders. The repo is pretty messy, but keep in mind, this is free. Though I am also not able to find code for some projects in that repo.
Hello sir, can you recreate the Gemini vision fake demo in real life?
So literally open interpreter…