Training learned optimizers: VeLO paper EXPLAINED

  • Published: 3 Oct 2024

Comments • 24

  • @MachineLearningStreetTalk
    @MachineLearningStreetTalk 1 year ago +4

    Happy New Year, Ms. Coffee Bean!!! ☕

  • @zilliard1352
    @zilliard1352 1 year ago +6

    I signed up with your sponsor. They should appreciate you more.

  • @deadbeat_genius_daydreamer
    @deadbeat_genius_daydreamer 1 year ago +7

    Great to see you again

  • @DerPylz
    @DerPylz 1 year ago +8

    Happy New Year, Ms. Coffee Bean! :D

  • @sriramgangadhar2408
    @sriramgangadhar2408 1 year ago +5

    Happy New Year, Ms. Coffee Bean ✨

  • @DerPylz
    @DerPylz 1 year ago +4

    I often feel like I need a cup of coffee to get productive work done.

  • @KeinNiemand
    @KeinNiemand 1 year ago +5

    But can you train a better version of VeLO using VeLO? Have they tried that yet? What happens when you train VeLO again, on the same training data, using VeLO itself?

  • @abhishekshakya6072
    @abhishekshakya6072 1 year ago +3

    Thanks for the video – it really helped me understand this with clarity in a short amount of time.
    Question:
    Do you think it will perform well on large genomic datasets or Large Language Models (LLMs)?

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +3

      It's hard to say without someone having tested it. The paper is quite clear that VeLO performs well on the kinds of architectures (transformers: check ✅), objectives (language modelling ✅), and tasks (genomics ❌) it has seen in training.

  • @danji9485
    @danji9485 1 year ago +3

    Since the VeLO neural net must first be pre-trained, shouldn't that be counted in the total training time when comparing benchmarks?

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +9

      Yes and no.
      No, because it has to be trained just once.
      Yes, but then hyperparameter tuning time should also go into the time measurements for standard optimizers.

  • @joshuadunford3171
    @joshuadunford3171 1 year ago

    I have a question about AI. Let's say I put "Asian police officer with pigtails" in the prompt. I have been told the image should closely resemble the police uniform, but not any real police officer, unless that officer's image comes up by random chance. Is this true? Will the image I'm given be of a police officer who doesn't really exist?

  • @tamimyousefi
    @tamimyousefi 1 year ago +2

    This is one reasonable step towards not having to train NNs from scratch every time.

  • @josephsantarcangelo9310
    @josephsantarcangelo9310 1 year ago +2

    I have been playing with hypergradients
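    For readers unfamiliar with the term: hypergradients are gradients of the loss with respect to hyperparameters such as the learning rate. A minimal sketch of hypergradient descent on a toy quadratic, written in JAX since that is what the VeLO code uses (all constants and the objective here are illustrative, not from the paper):

```python
# Hypergradient descent (illustrative): treat the learning rate itself as a
# parameter and update it with the gradient of the loss w.r.t. the learning
# rate, which for plain SGD reduces to the dot product of successive gradients.
import jax
import jax.numpy as jnp

def loss(w):
    # Toy quadratic objective with its minimum at w = 3.
    return jnp.sum((w - 3.0) ** 2)

grad_fn = jax.grad(loss)

w = jnp.array([0.0])
lr = 0.01                      # initial learning rate, now itself adapted
beta = 0.001                   # hypergradient step size (illustrative value)
prev_grad = jnp.zeros_like(w)

for _ in range(200):
    g = grad_fn(w)
    # dLoss/d(lr) ≈ -g · g_prev: grow lr while successive gradients agree,
    # shrink it when they start pointing in opposite directions.
    lr = lr + beta * jnp.dot(g, prev_grad)
    w = w - lr * g
    prev_grad = g
```

    On this toy problem the learning rate adapts upward from its small initial value and w converges toward the minimum at 3.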

  • @adi331
    @adi331 1 year ago +2

    Puhhh, seems way too complex to be useful, imo.

    • @adi331
      @adi331 1 year ago +2

      Also, the performance doesn't seem that much better, which makes it hard to justify all this complexity.

    • @AICoffeeBreak
      @AICoffeeBreak  1 year ago +5

      How do you mean complex? Complex in usage? Or conceptually? Because the conceptual complexity is taken off you by the authors' work. You just need to use their neural net to do the weight updates instead of an optimizer. They have a JAX implementation, so yeah, not easy to use in Keras. Yet. 😅
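To make the "just use their neural net for the weight updates" idea concrete, here is a toy JAX sketch of the interface a learned optimizer exposes. The tiny MLP and its hand-set weights below are illustrative stand-ins for VeLO's meta-trained network; this is not the real VeLO architecture or the `learned_optimization` API:

```python
# Illustrative "learned optimizer" interface: a tiny MLP maps per-parameter
# features (gradient, momentum) to an update step. The MLP weights below are
# hand-set so the net mimics plain SGD; in VeLO they would come from
# meta-training. NOT the real VeLO architecture or API.
import jax
import jax.numpy as jnp

def optimizer_net(features, opt_params):
    # 2-layer MLP applied element-wise to per-parameter feature vectors.
    h = jnp.tanh(features @ opt_params["w1"])
    return (h @ opt_params["w2"]).squeeze(-1)

def learned_update(params, grads, momentum, opt_params, decay=0.9):
    momentum = jax.tree_util.tree_map(lambda m, g: decay * m + g,
                                      momentum, grads)
    def update_leaf(p, g, m):
        feats = jnp.stack([g.ravel(), m.ravel()], axis=-1)   # shape (n, 2)
        step = optimizer_net(feats, opt_params).reshape(p.shape)
        return p - step
    new_params = jax.tree_util.tree_map(update_leaf, params, grads, momentum)
    return new_params, momentum

# Hand-set stand-in for meta-trained weights: the net computes
# tanh(0.1 * gradient), i.e. roughly SGD with learning rate 0.1,
# ignoring the momentum feature entirely.
opt_params = {
    "w1": jnp.array([[0.1, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.0]]),
    "w2": jnp.array([[1.0], [0.0], [0.0], [0.0]]),
}

# Toy training loop on a quadratic objective: note that the loop itself
# chooses no learning rate and no update rule — the network does.
loss = lambda p: jnp.sum((p["w"] - 1.0) ** 2)
params = {"w": jnp.zeros(3)}
momentum = jax.tree_util.tree_map(jnp.zeros_like, params)
for _ in range(100):
    grads = jax.grad(loss)(params)
    params, momentum = learned_update(params, grads, momentum, opt_params)
```

The actual VeLO update network is far richer than this; the sketch only shows the call pattern that replaces a hand-designed optimizer.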

    • @adi331
      @adi331 1 year ago +6

      @@AICoffeeBreak Hmmm. Good question. Since the training process of VeLO is so compute-intensive, there is no way to fine-tune it for new architectures. This wouldn't be an issue if it actually generalized perfectly (which I doubt).
      Then if you train something new and it doesn't work, you have an additional source of error. You might think to yourself: oh, maybe it's because VeLO isn't working with my architecture.
      Maybe it has a certain affinity for certain activation functions, etc.
      Also, for big networks the forward pass requires a lot of GPU memory, which means less memory I can spend on my actual dataset.
      While I agree that in theory it's simple, just one forward pass, I feel like in practice it will be too much of an additional headache.
      It's more my opinion than hard facts.
      All of the above, combined with the marginal improvement, means I don't see myself using this anytime soon.
      Thanks for the video, the explanation was very good :)