Robotics Transformer w/ Visual-LLM explained: RT-2

  • Published: 20 Dec 2024
  • RT-2, short for Robotics Transformer 2, is a cutting-edge model that harnesses the power of a 55B-parameter Vision-Language Model (VLM) to enhance robotic control. This model represents a significant leap in the field of robotics, demonstrating how web-scale pre-training can be used to improve the generalization performance of robotic systems.
    The VLM is further fine-tuned with a robotics data set into a VLA model: a Vision-Language-Action model for advanced robotics (a rough sketch of the action-token idea follows the description below).
    #ai
    #robotics
    #explained
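
    A minimal, hypothetical sketch of the action-as-tokens idea behind a VLA model, not the official RT-2 code: continuous robot actions are discretized into bins and emitted as tokens from the model's vocabulary, then decoded back on the robot side. The bin count, action range, and 7-DoF layout are illustrative assumptions.

```python
# Hypothetical sketch of VLA-style action tokenization (not the official RT-2 code).
import numpy as np

NUM_BINS = 256                        # assumed discretization resolution per dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def action_to_tokens(action):
    """Discretize a continuous action vector into integer token ids."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    bins = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)   # scale to [0, 1]
    return (bins * (NUM_BINS - 1)).round().astype(int).tolist()

def tokens_to_action(tokens):
    """Map token ids back to a continuous action vector."""
    bins = np.asarray(tokens, dtype=float) / (NUM_BINS - 1)
    return bins * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

# Example: a 7-DoF arm command (x, y, z, roll, pitch, yaw, gripper)
cmd = np.array([0.10, -0.25, 0.40, 0.0, 0.05, -0.10, 1.0])
tokens = action_to_tokens(cmd)
print(tokens)                    # token ids in [0, 255]
print(tokens_to_action(tokens))  # approximately recovers cmd
```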

Comments •

  • @bomxacalaka2033
    @bomxacalaka2033 1 year ago +5

    the next military drone patch looking sick

  • @rishiktiwari
    @rishiktiwari 1 year ago +1

    Really good summarisation of RT. I am a student researcher currently working on VLAs and a robotic arm; it's quite tedious, tbh. Btw, you missed that RT also uses SayCan for grounding and chunking a high-level command into small action sentences.
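
    A minimal, hypothetical sketch of the SayCan-style grounding mentioned above (not the actual SayCan implementation): an LLM scores how relevant each low-level skill is to the instruction, a learned affordance model scores how feasible the skill is in the current state, and the product of the two picks the next skill. The function name, skill list, and toy scoring lambdas are all illustrative assumptions.

```python
# Hypothetical sketch of SayCan-style skill selection (not Google's implementation).

def plan_with_saycan(instruction, skills, llm_score, affordance_score):
    """Pick the next low-level skill for a high-level instruction.

    llm_score(instruction, skill): how much the skill helps the instruction.
    affordance_score(skill):       how likely the skill succeeds in the current state.
    """
    best_skill, best_score = None, float("-inf")
    for skill in skills:
        score = llm_score(instruction, skill) * affordance_score(skill)
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill

# Toy usage with made-up scores:
skills = ["pick up the sponge", "go to the sink", "open the drawer"]
choice = plan_with_saycan(
    "wipe the table",
    skills,
    llm_score=lambda instr, s: {"pick up the sponge": 0.7,
                                "go to the sink": 0.2,
                                "open the drawer": 0.1}[s],
    affordance_score=lambda s: 0.9,
)
print(choice)  # "pick up the sponge"
```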

    • @aryangod2003
      @aryangod2003 7 months ago

      SayCan, haha, ancient (meaning nearly 2 years old!). But that was one of the first models that used LLMs in robotics, aside from just language-conditioned reinforcement learning or behavior cloning.

    • @rishiktiwari
      @rishiktiwari 7 months ago

      @@aryangod2003 Yeah I know.

    • @loserc1854
      @loserc1854 6 months ago +2

      @@rishiktiwari If I want to start learning LLM + robotics, what's the recommended learning path?

    • @rishiktiwari
      @rishiktiwari 6 months ago

      @@loserc1854 The current state of general embodied AI is very open and experimental. We all know that transformers won't solve the robotics problems. LLMs do not play a big enough role in robotic control; they are mostly used for natural-language interfacing. Multi-modal models (VLMs, VLAs) are commonly attempted in closed-loop control, but most current implementations are hard-coded rather than fuzzily generated by AI (exceptions: RT, VC-1).
      I would say: look into the existing models and how you can combine them at the layer-output level to get something better (most of the present research releases are just this).
      Develop an intuition for how AI models work and for how it happens in the human brain. See if you can correlate something or come up with a new design.
      In AI, programming and math are just tools to implement your intuition; remember this! Being good at math is beneficial, but don't worry if the equations in a paper look scary; half of them are incomplete or BS. This is true for many papers released on arXiv or privately: they are not peer-reviewed and contain a lot of flaws or tell an incomplete story.
      The resources on the internet are scarce, especially for someone starting out. It is better to be good at reverse-engineering and at asking people directly.
      I would highly recommend implementing an RNN, LSTM, CNN, and TCN on your own and realising their limitations, for a better understanding of why transformers and attention exist.
      Remember that transformers never know the meaning of a token; they just transform embeddings (see the small attention sketch after this comment).
      Also, there are three distinct groups: one trying to improve the AI architecture (mostly academics), a second improving the base models (mostly researchers and engineers), and a third putting all these models to meaningful use.
      Sorry for the long response, but I wanted to give a broad picture.
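
      To make the point that transformers only transform embeddings concrete, here is a minimal single-head self-attention layer in NumPy: it only ever sees vectors and re-mixes them by similarity, and no notion of token meaning appears anywhere. The dimensions and random inputs are purely illustrative.

```python
# Minimal single-head self-attention on raw embeddings (illustrative, not from RT-2).
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model) token embeddings -> contextualized embeddings."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])             # pairwise similarity of embeddings
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                   # weighted mix of value vectors

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(5, d))                              # 5 arbitrary "tokens" as vectors
out = self_attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 16): same shape, just re-mixed embeddings
```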

  • @NeuroScientician
    @NeuroScientician 1 year ago +3

    Does it have a context window? Or are all actions completely independent? Can I give it follow-up instructions? What does the feedback to the LLM look like? How does it know it worked? How about conditions, like "open the drawer and pull out all blue cubes, but if there is a yellow ball in there, take only two blue cubes"? What would happen?
    I tried something like this with GPT-3 and GPT-4. It had no physical body; it was in a turn-based game, and every turn the state of the game would be passed to all agents. It would work fine for about 5 moves, then it fell apart.
    What are the specs for the arm they used? Is it something insanely expensive, or can I use a Lego arm? I would really give this a go. Mount it on a Roomba and let it collect socks or something :D

    • @code4AI
      @code4AI 1 year ago +2

      Great comment. I understand that I have to explain Reinforcement Learning (RL) with Transformers for robotics. The next video will answer your questions.

    • @bomxacalaka2033
      @bomxacalaka2033 1 year ago

      It makes sense that it would work with a Lego arm, and since it has an LLM it wouldn't really care that the arm is shorter or less accurate, or that the camera is low quality. Now, in terms of follow-up instructions, my guess is that it would be some system similar to an agent system: it has the LLM, memory, and tools. So you give it a prompt, and that prompt is passed to the LLM and the memory so that it always has context of what its task is, so it knows when it's done and what the next step is (a rough sketch of such a loop is below).
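
      A speculative sketch of the agent-style loop guessed at above (an assumption about how such a system could be wired, not how RT-2 is actually implemented): the model is re-prompted each step with the task, a memory of past actions, and the latest observation, and the loop stops on an assumed DONE marker. `llm`, `execute`, and `observe` are hypothetical stand-ins.

```python
# Speculative closed-loop agent sketch (an assumption, not RT-2's actual control loop).

def run_task(task, llm, execute, observe, max_steps=20):
    """Re-prompt the model each step with task + history + observation until done."""
    memory = []
    for _ in range(max_steps):
        prompt = (f"Task: {task}\n"
                  f"History: {memory}\n"
                  f"Observation: {observe()}\n"
                  f"Next action:")
        action = llm(prompt)          # e.g. "move arm to the blue cube" or "DONE"
        if action.strip() == "DONE":  # assumed termination marker
            break
        execute(action)               # send the chosen action to the robot
        memory.append(action)         # keep context of what has been done so far
    return memory
```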

    • @fgh680
      @fgh680 1 year ago

      ❤ this; thank you for making such informative videos! 😊

    • @aryangod2003
      @aryangod2003 7 months ago

      @@bomxacalaka2033 It is an iterative loop until some kind of end token is generated, I think.

  • @delegatewu
    @delegatewu 9 months ago +1

    Thank you, this is a great video.

  • @TheAmazonExplorer731
    @TheAmazonExplorer731 1 year ago +1

    Someone please give me the link to the code for this research, as I am a PhD student from China.