Multimodal Reasoning, Video Instruction-Tuning & Explaining Vision Backbones | Multimodal Weekly 53

  • Published: Oct 2, 2024
  • In the 53rd session of Multimodal Weekly, we had three exciting researchers working on multimodal understanding and reasoning benchmarks, video instruction tuning, and explanation methods for Transformers and ConvNets.
    ✅ Xiang Yue, Postdoctoral Researcher at Carnegie Mellon University, introduces MMMU - a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
    Follow Xiang: xiangyue9607.g...
    MMMU: mmmu-benchmark...
    ✅ Orr Zohar, Ph.D. Student at Stanford University, introduces Video-STaR - a self-training approach for video language models that allows any labeled video dataset to be used for video instruction tuning.
    Follow Orr: orrzohar.githu...
    Video-STaR: orrzohar.githu...
    ✅ Mingqi Jiang, Ph.D. Student at Oregon State University, proposes explanation methods to gain insight into the decision-making of different visual recognition backbones.
    Follow Mingqi: mingqij.github...
    CDMMTC: mingqij.github...
    Timestamps:
    00:13 Introduction
    02:25 Xiang starts
    02:45 Progress of notable ML models (specifically multimodal models)
    03:38 5 levels of AGI
    05:23 From existing MM benchmarks to measuring expert AGI
    06:18 MMMU - multi-discipline multimodal understanding and reasoning
    07:09 Sampled MMMU examples from each discipline
    07:32 Recognition of MMMU
    08:18 Rigorous data curation process and high-quality data
    10:44 Effective suite for tracking multimodal model development
    12:16 Excellent model diagnosis tool
    14:26 Error analysis + language as vehicle
    15:32 Lack of knowledge
    15:52 Perceptual error
    16:13 Reasoning error
    16:40 MMMU-Pro: expanded options and realistic visual content
    19:21 Conclusion & acknowledgement
    21:07 Orr starts
    21:31 Why do we care about video-LLMs?
    22:22 Collecting video instruction tuning data is hard
    22:55 Existing annotation approaches
    23:20 Resulting video instruction tuning datasets
    24:07 Compute-dataset size tradeoff
    25:10 Video-STaR - use any video label for video instruction tuning!
    26:48 Answer generation
    27:02 Label rationalization
    27:34 Label verifier
    28:12 Data flow
    31:33 Source and generated datasets
    32:32 Quantitative performance
    34:34 Qualitative performance
    37:05 Mingqi starts
    37:30 Attribution map approaches for model explanation
    38:05 ConvNets may only need a small number of parts
    39:17 Structural attention graphs
    39:52 Idea of this paper
    40:35 Different behaviors from ConvNets and Transformers
    42:22 Minimal Sufficient Explanations
    44:22 "Compositional" behavior
    44:50 "Disjunctive" behavior
    45:20 Experiments
    51:02 Cross testing and experiments
    53:12 Conclusion
    Join the Multimodal Minds community to receive an invite for future webinars: / discord
