Accurately counting letters is architecturally impossible. The best the model can do is guess/estimate, and a correct guess does not make a model better or worse. Tokenization means the model cannot "see" any specific letters. It's like asking a blind person how many fingers you are holding up. But people seem not to know how current LLM architecture functions, so they are easily fooled by fitted responses.
Fair. I get the architectural limitations and how tokenisation also influences results. How do you think LLMs should address these types of tasks and other issues, or do you think it's simply not feasible with current architectures?
This is actually an extremely simple problem: the model just has to produce an output first in a scratchpad or working memory, then review that first response before replying to you, kind of like how humans say "uhh" before talking.
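To make that concrete, here is a minimal sketch of the draft-then-review idea, assuming a hypothetical call_llm(prompt) helper (a placeholder, not any particular vendor's API):

def call_llm(prompt: str) -> str:
    # Hypothetical helper: send a prompt to whatever model/API you use and return its text reply.
    raise NotImplementedError("wire this up to your model of choice")

def answer_with_scratchpad(question: str) -> str:
    # Pass 1: let the model think out loud in a scratchpad the user never sees.
    draft = call_llm(
        "Work through this step by step in a scratchpad, then give a tentative answer.\n\n" + question
    )
    # Pass 2: ask the model to review its own draft and correct any mistakes before answering.
    return call_llm(
        "Here is a draft answer:\n" + draft +
        "\n\nReview it for errors and reply with only the corrected final answer to: " + question
    )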
Impossible no, much more difficult yes. A blind person can't see, but they can feel how many fingers you're holding up. If you have two or more different parsed tokenizations of the same word, you can figure it out. Practical? Not really, unless you find a better implementation, but not impossible.
Guys, individual letters ARE tokens. You give it a word, it should have learnt somewhere which individual letters it is made of...
@@VinMan-ql1yu It could learn it for sure, but there is a different problem I try to highlight in the video: tokenization depends on context, and the understanding of those tokens may differ depending on that tokenization, the context, and the model's internal representations.
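For anyone curious what the model actually receives, here is a quick sketch using the tiktoken library with the cl100k_base encoding (the GPT-4-family tokenizer). The exact splits vary by tokenizer, but the point is that the input arrives as multi-character chunks rather than letters, and the chunks can change with surrounding context such as a leading space:

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["strawberry", " strawberry", "How many r's are in strawberry?"]:
    token_ids = enc.encode(text)
    # Decode each token id individually to see the chunks the model "sees".
    chunks = [enc.decode([t]) for t in token_ids]
    print(repr(text), "->", chunks)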
Does Claude 3.5 Sonnet still hold the crown?
For my use cases, yes! Mostly doing stuff with code generation and reasoning, along with some vision capabilities.
Isn't the strawberry problem related to tokenization...? How could this be solved...?
I believe it is. I mention it later in the video.
@@elvissaravia what could be the possible fix?? preference optimization?
@@ritvikrastogi4912 A potential solution is the model being aware of its architecture and using tools, or just spelling words out every time it gets asked a question like that. If you ask it to spell out the word, it gets it right every time.
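As a rough illustration of the tool-use route, the counting itself can be handed to ordinary code so the model only has to decide to call it (the function name here is just an example, not any framework's actual API):

def count_letter(word: str, letter: str) -> int:
    # Deterministic letter counter a model could invoke as a tool instead of guessing.
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3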
It's very simple: they start with a scratchpad, which is not something you get from the API, like a working memory, and then the model is capable of reviewing it and getting the correct answer @@ritvikrastogi4912
@@ritvikrastogi4912 Hard to tell without actually running a robust set of experiments. I think eventually it will be fixed, either through brute-force preference optimization or maybe architectural novelties. Overall, I think this is an interesting area of research, in addition to understanding other quantitative tasks.
Weird, tomorrow they are going to announce GPT-4o-large
Is that confirmed or rumoured?
@@elvissaravia It is confirmed by that strawberry account, he said that it's going to happen on Thursday
@@Cine95 Where can I find the account you're referring to?
That's still a rumour, that's not "confirmation".
Don't trust him @@Cine95
How many r's are there in the sentence "how many r are there in the word strawberry"?
The answer is being hacked across all LLM vendors.
gpt-4o answered correctly when the prompt is phrased:
"How many r are there in the sentence 'how many r are there in the word strawberry'?
Perform step-by-step reasoning leading to the final answer."
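For the ground truth on the quoted sentence, a one-liner is enough (counting lowercase r's):

sentence = "how many r are there in the word strawberry"
print(sentence.count("r"))  # 7: one each in "r", "are", "there", "word", plus three in "strawberry"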
I assume you are not a coding engineer, because what kind of coder wants to see comments in code 😅
Haha, I am, and I do think commenting is important in large codebases. It depends on what kind of code you are referring to and what it is used for.
@@elvissaravia I get it, but I think overly commented code is not good. When ChatGPT first came out, it used to add too many comments to the code.
@@gerkim62 Agreed, overcommenting is a problem.
Only the latest gpt-4o and sonnet-3.5 answer the complex 5-candles riddle correctly with the following system prompt:
You are a logic and reasoning expert. Reason step by step leading to the final answer.