- 11 videos
- 43,919 views
Bots Know Best
United Kingdom
Joined 22 Jan 2023
I am a PhD candidate at Cambridge Uni, and I make videos about artificial intelligence, programming, algorithms, chess, and anything else I get obsessed with :)
6 Years of AI Progress: ModernBERT Finally Replaces BERT
After six years, we finally have a worthy replacement for BERT!
Meet ModernBERT, the state-of-the-art encoder-only model that outshines its predecessors with larger context, improved downstream performance, and blazing-fast speeds. Designed as a drop-in upgrade for BERT-like models, ModernBERT is a great choice for retrieval, classification, and code-related applications.
⭐ SUPPORT ⭐ ──────────────────
- Subscribe!
- ☕️ Coffee - www.buymeacoffee.com/botsknowbest
🎥 CHAPTERS ──────────────────
00:00 - Encoders and Decoders
00:46 - RAG
02:35 - ModernBERT
03:13 - Training Data
04:12 - Context Length
05:11 - Attention Mechanism
06:43 - ModernBERT Example
📊 PAPERS ──────────────────
👉 Smarter, Better, Faster...
Views: 895
Videos
AI Can Make 3D Objects Just from Images Now | 3D Reconstruction with VFusion3D NeRF
847 views · 2 months ago
Let's talk about single-image 3D reconstruction! AI models like LRM and VFusion3D can turn a single image into a 3D model. So, let's generate and print a new chess set just from the text! ⭐ SUPPORT ⭐ ────────────────── - Subscribe! - ☕️ Coffee - www.buymeacoffee.com/botsknowbest 🎥 CHAPTERS ────────────────── 00:00 - Intro 00:50 - NeRF 01:10 - Zip-NeRF 02:25 - Generating Chess Pieces 03:15 - Clo...
Not Smart Enough! Puzzles That Stump AI (Even GPT o1?)
1.3K views · 3 months ago
AI can pass bar exams and ace math tests, but can it handle the infamous Einstein's Riddle? In this video, we put state-of-the-art AI models to the test and explore recent research aiming to enhance AI's problem-solving skills. Tested language models: - GPT-4o - GPT-4o (Mini) - o1-preview - Claude-3.5 (Sonnet) - Claude-3 (Haiku) - Llama-3.1 (405B) - Llama-3.1 (8B) - Gemini-1.5 (Pro) - Gemini-1....
GPT-4o vs 30 Chess Bots | How Good Is ChatGPT at Playing Chess?
1.1K views · 7 months ago
It's a common belief that ChatGPT can't play chess well, with some estimates placing its rating as low as 249. However, when used correctly, ChatGPT's chess abilities are quite impressive for an AI language model not specifically designed for the game of chess. In this video, I will match the latest OpenAI model, GPT-4o, against several chess bots and attempt to estimate its current chess ratin...
LLAMA 3 : Explained and Summarised Under 8 Minutes (Compared to Llama 2, Meta AI)
3.4K views · 8 months ago
Meta just released the next iteration of their open-access Llama language models. The Llama 3 AI model is now publicly available and provides state-of-the-art performance among LLMs. We don’t have a paper for Llama 3 yet. Still, there is a lot of information scattered across the web, so this video summarised the most important details about the Llama 3 models we currently know. I did my best to...
Can AI Navigate Mazes? Spatial Reasoning of GPT4 and Llama
898 views · 9 months ago
ChatGPT and Llama2 language models can write excellent poems, but can they navigate mazes? In this video, I will show you my experiments with these AI models, benchmarking their capabilities in spatial navigation and maze solving. ⭐ SUPPORT ⭐ ────────────────── - Subscribe! - ☕️ Coffee - www.buymeacoffee.com/botsknowbest 🎥 CHAPTERS ────────────────── 00:00 - Intro 00:22 - Maze Solving 01:12 - S...
ImageBind: This Meta AI Project Binds 6 Modalities!
790 views · 1 year ago
Meta recently released a new project called ImageBind. This video shows you all you need to understand ImageBind, covering the demo, blog post, and paper. And I will also show you some initial experiments with their code. I've aimed to be concise and informative, providing you with a brief but comprehensive overview of Image Bind. Thank you for watching! ⭐ SUPPORT ⭐ ────────────────── - Subscri...
ZipNeRF: New AI for View Synthesis From Google Research | Explained
4.6K views · 1 year ago
Google Research just released their new AI model for view synthesis, a quickly evolving research branch of computer vision. In this video, I will cover what view synthesis is and the research behind this new AI model from Google, and I will also show you how to use these models on your own. Thank you for watching! ⭐ SUPPORT ⭐ ────────────────── - Subscribe! - ☕️ Coffee - www.buymeacoffee.com/bo...
Segment Anything Paper Explained: New Foundation Model From Meta AI Is Impressive!
11K views · 1 year ago
Meta AI just released Segment Anything Model (SAM), an important step toward the first foundation model for image segmentation. I read the paper and played with the code for the past few days, and I would like to share some insights about this model. I've aimed to be concise and informative, providing you with a brief but comprehensive overview. ⭐ SUPPORT ⭐ ────────────────── - Subscribe! - ☕️ ...
ChatGPT vs 30 Chess Bots | Mastering AI Chess Through Correct Prompting
6K views · 1 year ago
ChatGPT's chess rating has been estimated at 249, but that's not the whole story. In this episode, I will show you how proper prompting can lead to significantly better results when playing chess with ChatGPT. ChatGPT will take on 30 chess bots and show some impressive chess skills, even though it's not a dedicated chess engine. ⭐ SUPPORT ⭐ ────────────────── - Subscribe! - ☕️ Coffee - www.buym...
New GPT-4 Report Explained: Exciting or Disappointing?! (9 Key Insights)
13K views · 1 year ago
GPT-4 is out, and OpenAI released a 98-page technical report describing their latest language model. I read all 98 pages and will take you through the most interesting key points, revealing hidden details lurking between the lines. I've aimed to be concise and informative, providing you with a brief but comprehensive overview. Stay tuned for future videos where I'll dive deeper into the details...
How do BERT and ModernBERT differ from embedding models such as OpenAI's text-embedding-3-small and text-embedding-3-large?
@clay1958 BERT and ModernBERT are general-purpose NLP models. They were trained using masked language modeling (MLM), which involves predicting masked tokens in a sequence. Using MLM, the models can learn to understand the context and can capture lots of knowledge due to large pre-training corpora, but you usually want to fine-tune them for your downstream tasks to get the best results. For example, if you want to use ModernBERT for semantic search, you might want a version fine-tuned on MS-MARCO, a popular retrieval dataset with contrastive examples. (This is what I did in my video.) OpenAI's embedding models are already fine-tuned on some internal datasets. They're designed to work well for tasks like semantic search, clustering, and recommendations right out of the box, so you don't need additional training to get started. ModernBERT Embed (huggingface.co/nomic-ai/modernbert-embed-base) is probably a good alternative to OpenAI's embed models at the moment. It's trained on the Nomic Embed datasets and performs on par with OpenAI's text-embedding-3 models on benchmarks like MTEB. Plus, you will also get all the benefits of using open-source models.
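To make the semantic-search use case concrete, here is a minimal sketch of what happens after the fine-tuned encoder produces embeddings: documents are ranked against a query by cosine similarity. The vectors below are made-up placeholders; in practice they would come from a model such as modernbert-embed-base.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Placeholder document embeddings (in reality, model outputs).
docs = {
    "chess video": [0.9, 0.1, 0.2],
    "cooking blog": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # placeholder query embedding

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # → chess video
```

The same ranking step works regardless of which encoder produced the vectors, which is why a fine-tuned ModernBERT variant can slot in as a drop-in replacement for a proprietary embedding API.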
@@botsknowbest Thank you for this reply! And fantastic video by the way
Thank you!!
Amazing video!!
Thank you!!
Thank you for watching! What do you think about ModernBERT?
time to test the PRO version...
Holy cow how has this channel not blown up yet?? FANTASTIC quality, content, methodology, presentation I could go on! Seriously I'm going to share this video with everyone I know in the chess and AI community (which sadly isn't a huge number of people but I'll do what I can) because you deserve exposure like seriously! I do have one suggestion/question though. In the situations where cgpt hallucinates, what if you go back a move and change the format for that query? Like you could give it the FEN I believe it's called where the placement of the pieces is notated without the previous moves. Just a thought I'm curious about.
Thank you so much for your comment! I'm really glad you enjoyed the video!🙂 Regarding your question, I agree that re-running inference with representations like FEN could help prevent illegal moves. We would be basically prompting for a more diverse output, hoping to get a new legal move. Besides FEN, we could also try sampling moves multiple times with high temperatures or selecting the highest-scoring legal move from the output distribution (assuming we have access to logits). But even if that works, I would worry that the model is still misunderstanding the given position and that re-running only produces a legal move at the cost of lower performance. Also, this approach wouldn't address cases where LLMs hallucinate legal moves, like in 5:37. We could also add FEN (or some other representations) to every move and provide more information about each position like that. That should work in theory, but in my previous video about GPT-3.5, I experimented with PGN, FEN, visual formats and chain of thought variants, and I only got a small boost with FEN+PGN. Since this video involved quite a lot of games, I ended up using only PGN to make everything faster and more cost-efficient. But I might revisit other notations in future videos! Hope this makes sense, but let me know if you have any other questions 🙂
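The "select the highest-scoring legal move from the output distribution" idea mentioned above can be sketched like this. The logits and move lists are made-up placeholders (getting real per-move scores would require access to the model's logits, and the legal-move set would come from a chess library):

```python
import math

# Hypothetical scores the model assigns to candidate moves.
# "Ke9" is deliberately nonsensical (no 9th rank).
logits = {"Qxd5": 2.1, "Nf3": 1.4, "Ke9": 3.0}
legal_moves = {"Qxd5", "Nf3"}  # in practice, computed by a chess engine

def softmax(scores):
    # Numerically stable softmax over a dict of scores.
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

probs = softmax(logits)
# Pick the most probable move that is actually legal,
# filtering out the illegal top choice "Ke9".
best_legal = max(legal_moves, key=lambda mv: probs[mv])
print(best_legal)  # → Qxd5
```

As noted in the reply, this guarantees a legal move but not a good one: the model may still misunderstand the position, and the filtered choice can be far down its preference list.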
5:10 fr
Great video. I was playing against GPT-4o mini, and it kept hallucinating because it was in one session, and I wondered why. 😂 Try playing GeoGuessr with AIs; they are pretty good. You can only play NMPZ country streak, though.
Thanks!! Yeah, playing in one session just doesn't work atm 🙂 Haven't tried GeoGuessr yet, so thanks for the tip!
Do you think this will affect 3d modelling jobs in the future? I am studying to become one currently😭
Just make it easy for most people so you gotta consider that
@danielkempenaar154 I think it's similar to many other tech jobs now: AI is not necessarily replacing them, but knowing how to use AI effectively can make you much better at your job. For example, software developers who are good at coding and know how to use AI (e.g., Copilot) are doing well now. So, I think you will be fine, but I would suggest not ignoring AI and staying updated on the latest advancements in your field so you can use it if needed. Hope this helps!
man this video is amazing
Thank you!!
Nice video!
Thank you for watching! I really enjoy reading your comments, so if you have any questions about these models or ideas for future experiments, I'd love to hear them!
An AI-generated video?
I take it you just let it keep trying until it made a legal move... very generous but suspicious. GPT 4o in my testing returns an illegal move in every single game that goes over about 30 moves.
So, I set the temperature parameter to 0, which makes the output deterministic. Thus, re-running the model would be pointless because it would return the same answer. When the model suggested an illegal move, I simply forced it to resign. You can see an example in the 6th game with Isabel (lichess.org/study/IdRA6R7f/JsZGxPjU). If I remember correctly, GPT-4o lost its Queen and then tried to play as if it still had the Queen. However, this was quite rare, and it could even play games lasting over 100 moves, as I showed in the video. Its 'understanding' of the game after 100 moves is a bit questionable, though, and you can see how poorly it plays at that point. I noticed that it is much more likely to struggle and suggest illegal moves when it's losing. It's quite confident in its own play and doesn't like to accept that it might be losing. So maybe you were just playing much better than these bots, and GPT-4o couldn't keep up with you. 🙂
No. Watch the video
I mean, I've put together a list of 'zero knowledge', text-based questions that my 5 year old can answer correctly that o1 and every other LLM gets wrong every time. Even basic understanding of simple objects in the world (something a bee can do) is going to take an exabyte mountain of perfect text description training for an LLM to have any hope of training their way into a 'word at a time' understanding of 3d space. An LLM agent will just get itself or you flattened by a car, or the garage door, and will never understand that you can't see through a wall just because nobody has referred to it as a 'wall'.
Thanks for your comment! You're absolutely right: AI's perception of the world is fundamentally different from human understanding, and this can become quite dangerous if people start placing blind trust in these systems. Personally, I find it interesting how each new model iteration improves in reasoning and provides more accurate and helpful responses. That being said, I also always analyze its mistakes, which can tell you a lot about the current state of research. When I tested these models on zebra puzzles, several models made basic reasoning mistakes that no human would likely make. As I showed in the video, o1 seems to be a substantial improvement on this front. But would I trust it to drive my car? Definitely not. One last point: it's important to remember that while its output may look plausible, it's still model-generated, and we don't know if the model is describing the true reasoning process behind the final answer. These models, especially the proprietary ones, remain largely a black box. Hopefully, we'll see the development of more transparent models in the future.
Nice video!
Thanks, glad you liked it! 🙂
what a great video
Thank you!
Another great video!!
Thank you!
Thank you for sharing this! It’s nice to see interesting experiments like this. Less flashy than the clickbait content, but much more informative 🍻 I did have a question about how you evaluated the model output though. Did you hand-transpose the results from the models into a parsable form, or did you ask them to respond in JSON format (or something else?)
Thanks for the great question! While getting the answer directly in JSON format would be convenient, not all LLMs can handle this consistently. In my experience, asking for JSON output tends to negatively impact reasoning, and I didn't want this to affect the chain-of-thought generation. Instead, I added "Format the final answer as Answer = ..." at the end of my prompts. That was easy to parse and worked well for almost all models. A few smaller models sometimes didn't comply, but regenerating the output with higher temperatures always fixed the problem. If I were developing a product and needed a more reliable solution, I would first generate the CoT reasoning, then run a separate prompt to extract the answer, perhaps even using constrained decoding to ensure the response fits one of the expected options.
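The extraction step described above is simple enough to show in a few lines. This is a minimal sketch; the model output below is a made-up example of the "Answer = ..." convention:

```python
import re

# Hypothetical chain-of-thought output ending with the requested marker.
output = """Let's reason step by step.
The painter cannot live in House 1 or 2, so...
Answer = House 3"""

# Grab whatever follows "Answer =" on its line, tolerating extra spaces.
match = re.search(r"Answer\s*=\s*(.+)", output)
answer = match.group(1).strip() if match else None
print(answer)  # → House 3
```

When the model doesn't comply, `match` is `None`, which is the signal to regenerate with a higher temperature, as described in the reply.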
In your example, some of the information given was not a constraint on the solution but an exclusion from the possible answers. For example, "House 1 and House 2 are occupied by X", where X is not what we are asking about. Since there are 4 houses, only House 3 and House 4 remain as possible responses no matter what the constraints are, so by random chance alone you get 50% accuracy. I suspect many similar questions were generated, so the accuracy results are not statistically meaningful. You would need to adjust the accuracy score by how many exclusions there were in each particular puzzle to know how well the models adhere to the constraints versus how much could be random chance. And it would still be quite imprecise because there are few possible outcomes.
Thanks so much for your comment! I see your point. I think that would definitely be true if we were directly removing the options from the list of possible answer candidates (e.g., allowing only houses 3 and 4 to be selected). But that's not the case, and the model can still choose any of those answers. The thing with language models is that they can be very smart on the one hand and not understand simple logical deductions on the other (e.g., the infamous Reversal Curse). So, even if we tell an LLM that X occupies House 1, the model might not necessarily deduce that Y cannot live there at the same time and eliminate it. It can also happen that it reasons correctly but then forgets about this decision later in the output and still makes a mistake. So, I would say it still makes sense to test models even in such cases. What do you think? Regarding the results, I'm sure there could be many improvements. Like, I didn't calculate a p-value or anything, and it would probably be better to fully separate the results based on the number of constraints. But all models were tested on an identical set of 500 puzzles, so that should make for a good enough comparison :)
I just watched all of this and didn't realise you are a small youtuber. Good job! And subscribed. Cover ARC next please! ;)
Thank you so much! I'm really glad you enjoyed the video, and I really appreciate the subscription and suggestion! 🙂 By ARC, do you mean the Abstraction and Reasoning Corpus? That's a great idea - I really enjoyed reading the 'On the Measure of Intelligence' paper.
Thanks sir
Thanks! Super interesting
Thank you for watching! I really enjoy reading your comments, so if you have any questions or ideas for future experiments, I'd love to hear them!
I was curious about this. Thanks! Keep making videos!
@BeeJamin-i5b Thank you!!
Interesting video. I played a couple games with ChatGPT 4o just for fun and was surprised that I beat it easily, and I rarely play chess. I'll have to try your idea of recounting each move in each prompt. ChatGPT frequently forgot where its own pieces were and didn't even know that I had put it in checkmate.
@ronnielane6903 Thanks! Yes, definitely give it a try. It improves substantially and rarely makes illegal moves. I'm still amazed that it can grasp positions so well, even after 100+ moves.
Also, OpenAI's o1-preview was just released; I suggest testing it in chess to see if it is better or not.
Really fun project. What if you add a maze representation of the moves gpt already made? This way it would have perfect memory and you could reduce the amount of tokens by a lot. The complex part would be how to make the representation using text. A workaround that wouldn’t be cheap is to make a literal map of moves and create an image that you can pass to gpt since it’s multi-modal
Thanks for the suggestions! Yeah, I've tried textual representations for chess before, but it didn't work well. And I assume the same here. But creating a visual map of the moves, as you suggested, sounds like a great idea! GPT-4o with visual inputs is more affordable now, so maybe I should give it a shot!
@@botsknowbest it would be great to see something like that. New sub earned mate. Nice work
@@matiascoco1999 Thank you!!
I usually never like any video but you made me click the like button so HARD! u r good man
This is a helpful video. I would say the chat approach is normal though. That's how we typically interact with ChatGPT 4O, so it's reasonable to play chess that way too. As you point out it loses track of the position by move 20. Have you tried Claude 3.5 Sonnet and Grok 2?
Thanks so much! I agree with you that it's absolutely reasonable to expect a typical chat interaction. I'm mostly highlighting the fact that using the right input format can lead to very different outcomes and that current LLMs are still very sensitive to variations in prompts. The difference in performance is substantial and almost feels like interacting with two different models. As LLMs get better, I expect this gap to get much smaller. I haven't tried Claude or Grok yet, but I will include them in the follow-up video. 🙂
Great video! I'd like to contact you to discuss some ideas regarding this video if you have time.
*Have you tested this using the latest Claude 3.5 Sonnet model?*
Thanks for your comment! Not yet, but I plan to include more models in the follow-up video, including Claude, Gemini, and Grok.
I suggest exploring the application of advanced prompt engineering techniques, specifically few-shot prompting and chain-of-thought reasoning, to enhance the capabilities of PGN (Portable Game Notation). While I don't have a specific implementation in mind, I believe experimenting with these methods could potentially yield valuable improvements to PGN's functionality and versatility.
Thanks for your comment and these suggestions! The video involved lots of experiments (games), and my goal was to keep the prompts as short as possible to make everything computationally feasible. I actually tried experimenting with CoT, PGN, FEN, and a few visual formats in my previous videos, and I got small improvements when using FEN+PGN. But maybe I should revisit these formats in the follow-up video!
The thing is, OpenAI didn't prepare GPT for chess at all; GPT managed to become this advanced just from the chess information that slipped through into its training data, which is great. For just a text-based model to play like this is mind-blowing.
@aminothing-ps4ev Exactly! Of course, its chess skills are far from perfect, and Stockfish/AlphaZero are much better... but GPT was never directly trained to play chess!
And it's playing blindfolded!
I don't want it, though; there should be an opt-out on EVERY AI model. It's performing illegal activity which the creators can hide behind to keep from getting into legal issues.
wow, awesome video!
Thanks!
excellent video and smart approach, you deserve thousands of views
Thank you!! Glad you liked it 🙂
Super fascinating. I wonder how GPT 4o would do!
Another awesome video! Thanks for the explanation on why ChatGPT seems to fail in the dialogue plays. It makes a lot of sense that our input needs to fit the training data format. Question- Are there other types of activities/tasks where chatgpt might do better if we change the format of our input?
Glad you liked it! This applies to any task, so it’s always a good idea to spend some time on prompt engineering and try different input formats.
Thank you for watching! If you have any suggestions for future experiments, I'd love to hear them! ♟ Also, feel free to reach out if you have any questions about my setup-I'm always happy to share the details!
Thanks. This was really helpful and well presented.
Thank you! Glad it was helpful 🙂
Thank you for making this video! Appreciate the links to the papers :)
Thanks so much! And thank you for the feedback about the links. I wasn't sure if people find them useful or not 🙂
@@botsknowbest we do! I needed that for citing it in my thesis :D
@@dariyanagashi8958 Perfect! 😀All the best with your thesis!
@@botsknowbest thanks 😊
Thanks, that's a great summary!
Thank you!
What do you think about Llama 3 so far? 🦙🦙🦙
How does that compare to just random exploration?
I just did a few tests, and random exploration could solve about 50% of 4x4 mazes and 20% of 5x5 mazes. A 10x10 maze has way too many decision points to be solvable by random walk in fewer than 100 moves. But I want to run more comprehensive tests with random walk next week or so and will let you know!
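A random-walk baseline like the one described is quick to sketch. This toy version uses a tiny hand-drawn maze and a seeded random walk with a move budget; the layout and parameters are illustrative, not the ones from the video:

```python
import random

# Tiny illustrative maze: S = start, G = goal, # = wall, . = open.
MAZE = [
    "S.#",
    ".##",
    "..G",
]

def random_walk(maze, max_moves=100, seed=0):
    """Walk randomly between open cells; True if the goal is reached."""
    rng = random.Random(seed)
    rows, cols = len(maze), len(maze[0])
    pos = next((r, c) for r in range(rows) for c in range(cols)
               if maze[r][c] == "S")
    for _ in range(max_moves):
        r, c = pos
        if maze[r][c] == "G":
            return True
        # All in-bounds, non-wall neighbours of the current cell.
        moves = [(r + dr, c + dc)
                 for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0))
                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                 and maze[r + dr][c + dc] != "#"]
        pos = rng.choice(moves)
    return maze[pos[0]][pos[1]] == "G"

# Fraction of seeds that reach the goal within the move budget.
solved = sum(random_walk(MAZE, seed=s) for s in range(100)) / 100
print(solved)
```

Averaging over many seeds gives the kind of solve-rate numbers quoted in the reply; on larger mazes the same budget makes the solve rate collapse, since the number of decision points grows quickly.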
This was awesome!! I love how you created the visualizations of the solving process. The comparisons at the end were especially fascinating. I'm surprised there was such a big difference.
Thank you!!
Interesting test. I think it would have been more interesting if you had told the models to use the 'hand on the wall' strategy. Then you could have gotten a better idea of when the model lost its ability to keep track of what it was doing (which would likely correspond to the "needle in a haystack" ability to function accurately within its context window) since you could tell when its behavior was following a logical pattern as opposed to when it was just doing things randomly. It would also be interesting to see Claude 3 Opus tested, since it seems to have much better needle-in-a-haystack capabilities than GPT-4 does at present.
Thanks for the suggestions! If I manage to make another video on this topic, maybe I can test prompting with different maze-solving strategies (like hand on the wall) to see if it knows these concepts and can apply them. A few-shot inference with examples of solving a maze could also be interesting. Yeah, Claude 3 Opus would be great. Setting up these three models took already more time than I wanted 😅, but I will definitely include more models in future tests!
10:25 Generally, to reduce the token payload for each request, you would let the model itself generate a summary of the previous communication. However, for a maze walkthrough this doesn't make much sense, because you don't want to lose fidelity. Instead, reduce all the previous Q&A-style prompts from 10:16 to a simpler format and pass it as a system prompt, for example.
Thanks! Yeah, I agree that a summary would not work in this case, but simplifying the format is a great idea! GPT-4 is quite good with JSON/Python-like formats, so maybe I can try passing the previous moves and available directions like that next time.
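The "simpler format" idea could look something like this: instead of replaying the full Q&A transcript, compress the walk so far into a compact JSON object and pass it as a single system prompt. The field names here are illustrative, not a tested prompt format:

```python
import json

# Hypothetical compressed maze state instead of the full chat history.
state = {
    "position": [2, 3],                    # current cell (row, col)
    "moves_so_far": ["N", "N", "E", "S"],  # perfect memory of past moves
    "available": ["N", "W"],               # open directions from here
}

# Compact separators keep the token count down.
system_prompt = "Maze state: " + json.dumps(state, separators=(",", ":"))
print(system_prompt)
```

Since the state is a fixed-size object rather than a growing transcript, the token cost per request stays roughly constant as the walk gets longer.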
Thank you for watching! Feel free to ask any questions about this project!
An academic paper in the thumbnail always let me know that the video is likely well researched, nice
Great Explanation on SAM!
Thank you!