Twelve Labs
  • Videos: 67
  • Views: 43,933
Composed Video Retrieval, Consent In Crisis, and Video Annotations at Scale | Multimodal Weekly 57
In the 57th session of Multimodal Weekly, we had three exciting presentations - two on video captions and one on training data for foundation models.
✅ Lucas Ventura discussed CoVR, a neat work that generates training triplets from video-caption pairs while also expanding the scope of the task to composed video retrieval (see the sketch at the end of this entry).
- Follow Lucas: lucasventura.com/
- CoVR: imagine.enpc.fr/~ventural/covr/
✅ Shayne Longpre discussed his work Consent in Crisis: The Rapid Decline of the AI Data Commons and its multimodal implications. This work has been covered by the NYT, 404 Media, Vox, and Yahoo! Finance.
- Follow Shayne: www.shaynelongpre.com/
- Consent in Crisis: www.data...
Views: 44
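To make the composed video retrieval setup concrete, here is a minimal, hedged sketch of the task interface: a query pairs a source video with a modification text, and candidate target videos are ranked by similarity to the fused query. The embed_video and embed_text functions below are hypothetical random stand-ins (CoVR actually trains learned encoders on its generated triplets), and the sum fusion is a deliberate simplification.

```python
# Toy sketch of composed video retrieval scoring; NOT the CoVR model.
# embed_video / embed_text are hypothetical stand-ins for learned encoders.
import numpy as np

def _embed(key: str, dim: int = 8) -> np.ndarray:
    """Pseudo-random unit vector, deterministic per key within a run."""
    rng = np.random.default_rng(abs(hash(key)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def embed_video(video_id: str) -> np.ndarray:
    return _embed("video:" + video_id)

def embed_text(text: str) -> np.ndarray:
    return _embed("text:" + text)

def composed_query(src_video: str, modification: str) -> np.ndarray:
    """Fuse source-video and modification-text embeddings (naive sum)."""
    q = embed_video(src_video) + embed_text(modification)
    return q / np.linalg.norm(q)

def rank_targets(src_video: str, modification: str, candidates: list[str]):
    """Rank candidate target videos by cosine similarity to the query."""
    q = composed_query(src_video, modification)
    return sorted(((float(q @ embed_video(c)), c) for c in candidates),
                  reverse=True)

# Query: "this video, but the dog is on a beach instead of grass".
print(rank_targets("dog_on_grass", "the dog is on a beach",
                   ["dog_on_beach", "cat_on_grass", "dog_on_grass"]))
```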

Videos

Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56
Views: 87 • 16 hours ago
In the 56th session of Multimodal Weekly, we had three exciting presentations across different video understanding tasks: action recognition, video description, and video summarization. ✅ Jacob Chalk and Jaesung Huh discussed the Time Interval Machine (TIM), which addresses the interplay between the audio and visual modalities in long videos by explicitly modeling the temporal extents of a...
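As a rough, hedged sketch of what "explicitly modeling temporal extents" can look like in practice, the snippet below embeds a (start, end) interval and uses it as an attention query over per-timestep features. Dimensions and module choices are made up for illustration; this is not the authors' TIM implementation.

```python
# Illustrative simplification of interval-as-query modeling, NOT TIM's code.
import torch
import torch.nn as nn

class IntervalQuery(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Embed the (start, end) interval into the feature space.
        self.interval_mlp = nn.Sequential(
            nn.Linear(2, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4,
                                          batch_first=True)

    def forward(self, frame_feats: torch.Tensor,
                interval: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, D) per-timestep features (visual or audio);
        # interval: (B, 2) normalized [start, end] in [0, 1].
        q = self.interval_mlp(interval).unsqueeze(1)     # (B, 1, D) query
        out, _ = self.attn(q, frame_feats, frame_feats)  # attend over time
        return out.squeeze(1)                            # (B, D) per interval

feats = torch.randn(2, 100, 64)            # two clips, 100 timesteps each
iv = torch.tensor([[0.1, 0.3], [0.6, 0.9]])
print(IntervalQuery()(feats, iv).shape)    # torch.Size([2, 64])
```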
Long-Form Video Reasoning and Question-Answering | Multimodal Weekly 55
Views: 127 • 14 days ago
In the 55th session of Multimodal Weekly, we had three Ph.D. candidates from Stony Brook University working on long-form video understanding under Michael Ryoo. ✅ Jongwoo Park introduced LVNet, a video question-answering framework with optimal strategies for key-frame selection and sequence-aware captioning. - Connect with Jongwoo: www.linkedin.com/in/jongwpark/ - LVNet:...
Visual Insights from Social Data with Phyllo and Twelve Labs | Multimodal Weekly 54
Views: 80 • a month ago
​​​In the 54th session of Multimodal Weekly, we had Ronit Ogra from Phyllo to discuss AI-powered visual insights from social data with Phyllo and Twelve Labs. - Follow Ronit: www.linkedin.com/in/ronit-ogra-8424b676/ - Check out Phyllo: www.getphyllo.com/ - Read our joint blog post: www.twelvelabs.io/blog/twelve-labs-and-phyllo Timestamps: 00:15 Introduction 02:35 Ronit starts 03:40 Phyllo is so...
Multimodal Reasoning, Video Instruction-Tuning & Explaining Vision Backbones | Multimodal Weekly 53
Views: 199 • a month ago
In the 53rd session of Multimodal Weekly, we had three exciting researchers working on multimodal understanding and reasoning benchmarks, video instruction tuning, and explanation methods for Transformers and ConvNets. ✅ Xiang Yue, Postdoctoral Researcher at Carnegie Mellon University, introduced MMMU, a new benchmark designed to evaluate multimodal models on massive multi-di...
How-to Videos, Feeling Multimodal Intelligence, & Visually-Grounded Video QA | Multimodal Weekly 52
Views: 145 • a month ago
In the 52nd session of Multimodal Weekly, we had three exciting researchers working on Human-Computer Interaction for video understanding, large-scale multimodal models, and video question answering. ✅ Saelyne Yang, Ph.D. Candidate at KAIST, presented her work on enhancing how people learn procedural tasks through how-to videos. - Follow Saelyne: www.saelyne.com/ - Beyond Inst...
Multimodal Data Lake, Video Repetition Counting, and Low-Resource Vision | Multimodal Weekly 51
Views: 129 • a month ago
In the 51st session of Multimodal Weekly, we had three exciting presentations from a startup founder and researchers working in multimodal AI. ✅ Jay Chia, co-founder of Eventual Computing, shared how to build a DIY multimodal data lake with Daft dataframes (a toy example follows below). - Follow Jay: www.linkedin.com/in/chiajay/ - Watch Jay's talk at Data AI Summit 2024: ruclips.net/video/hS_3j7IYHUs/видео.html ✅...
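For a flavor of that pattern, here is a tiny sketch using the open-source Daft library: a dataframe of placeholder URLs gains an image column via lazy download-and-decode expressions. The URLs and column names are invented, and the exact expression API may differ across Daft versions, so treat this as an assumption-laden sketch rather than a reference.

```python
# Hedged sketch of a "DIY multimodal data lake" row layout with Daft.
# Placeholder URLs; see the Daft docs for the current expression API.
import daft

df = daft.from_pydict({
    "id": [1, 2],
    "url": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
})

# Lazily fetch bytes and decode them into an image column;
# on_error="null" keeps the pipeline alive if a download fails.
df = df.with_column(
    "image", df["url"].url.download(on_error="null").image.decode())
df.show()
```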
Generalized Contrastive Learning and Transforming Video Production | Multimodal Weekly 50
Views: 226 • a month ago
In the 50th session of Multimodal Weekly, we had two exciting presentations from startup founders building real-world products for multimodal AI applications. ✅ Jesse Clark, Co-Founder and CTO of Marqo AI, discussed generalized contrastive learning for multimodal retrieval and ranking. They generalize the popular training method of CLIP to accommodate any number of text and image...
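A hedged sketch of that generalization idea: instead of CLIP's one-text-one-image pairing, each side of the contrastive loss becomes a weighted combination of several component embeddings before the standard InfoNCE objective. This is our illustrative reading, not Marqo's actual training code, and all weights and dimensions below are made up.

```python
# Multi-part contrastive sketch: weighted component fusion + InfoNCE.
import torch
import torch.nn.functional as F

def combine(components: list[torch.Tensor],
            weights: list[float]) -> torch.Tensor:
    # components: list of (B, D) embeddings (e.g., title, image, caption).
    stacked = torch.stack(components)                     # (K, B, D)
    w = torch.tensor(weights).view(-1, 1, 1)              # per-component weight
    return F.normalize((w * stacked).sum(dim=0), dim=-1)  # fused (B, D)

def info_nce(q: torch.Tensor, d: torch.Tensor, temp: float = 0.07):
    logits = q @ d.T / temp             # (B, B) similarity matrix
    labels = torch.arange(q.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

B, D = 4, 32
query = combine([torch.randn(B, D), torch.randn(B, D)], [0.7, 0.3])
doc = combine([torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)],
              [0.5, 0.3, 0.2])
print(info_nce(query, doc))
```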
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
Views: 181 • 2 months ago
In the 49th session of Multimodal Weekly, we had two exciting presentations from researchers working in language model alignment and large multimodal models. ✅ Jiwoo Hong, an M.S. student at KAIST AI, discussed ORPO, a monolithic odds-ratio preference optimization algorithm that eliminates the need for an additional preference alignment phase and reference model (a toy rendering of the objective is sketched below). This is a resource-eff...
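To unpack "monolithic odds-ratio preference optimization": as we read the paper, ORPO adds an odds-ratio penalty between chosen and rejected responses directly to the supervised NLL loss, so no reference model or separate alignment stage is needed. The sketch below renders that objective on toy per-token log-probabilities; the weight lam and all shapes are illustrative, not the paper's settings.

```python
# Hedged sketch of the ORPO objective: NLL on the chosen response plus an
# odds-ratio term between chosen and rejected, with no reference model.
import torch
import torch.nn.functional as F

def avg_logp(token_logps: torch.Tensor) -> torch.Tensor:
    # Length-normalized sequence log-likelihood, log P(y|x).
    return token_logps.mean(dim=-1)

def orpo_loss(chosen_logps, rejected_logps, lam: float = 0.1):
    lp_w, lp_l = avg_logp(chosen_logps), avg_logp(rejected_logps)
    # log odds(y|x) = log P - log(1 - P), computed stably in log space.
    log_odds_w = lp_w - torch.log1p(-torch.exp(lp_w))
    log_odds_l = lp_l - torch.log1p(-torch.exp(lp_l))
    ratio = F.logsigmoid(log_odds_w - log_odds_l)  # odds-ratio reward term
    nll = -lp_w                                    # plain SFT term on chosen
    return (nll - lam * ratio).mean()

chosen = -torch.rand(2, 10)          # fake per-token log-probs, shape (B, T)
rejected = -torch.rand(2, 10) - 0.5  # rejected responses score lower
print(orpo_loss(chosen, rejected))
```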
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
Views: 134 • 2 months ago
In the 48th session of Multimodal Weekly, we welcomed two researchers working in multimodal understanding. ✅ Max (Letian) Fu, a Ph.D. student at UC Berkeley, dived into aligning touch, vision, and language for multimodal perception. - Follow Letian: max-fu.github.io/ Check out the following resources on the TVL paper: - Project: tactile-vlm.github.io/ - Paper: arxiv.org/abs/2401.143...
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
Views: 122 • 2 months ago
In the 47th session of Multimodal Weekly, we welcomed Benjamin Muller, Tu-Anh Nguyen, and Bokai Yu from Meta AI to discuss their work SpiRit-LM, which is designed to freely mix text and speech, allowing for a seamless integration of both modalities. Resources: - Project Page: speechbot.github.io/spiritlm/ - Paper: arxiv.org/abs/2402.05755 - Connect with Benjamin: www.linkedin.com/in/benj...
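To illustrate the "freely mix text and speech" idea, here is a toy sketch of building a single interleaved token stream in which modality tags switch between text tokens and discrete speech units, so one language model can continue in either modality. The tag and unit names are invented placeholders, not SpiRit-LM's actual vocabulary.

```python
# Toy interleaving of text tokens and discrete speech units into one stream.
def interleave(segments: list[tuple[str, list[str]]]) -> list[str]:
    """segments: list of (modality, tokens), modality in {'text', 'speech'}."""
    stream: list[str] = []
    for modality, tokens in segments:
        # A modality tag marks each switch between text and speech.
        stream.append("[TEXT]" if modality == "text" else "[SPEECH]")
        stream.extend(tokens)
    return stream

seq = interleave([
    ("text", ["the", "quick", "brown"]),
    ("speech", ["Hu12", "Hu87", "Hu87", "Hu3"]),  # HuBERT-style unit ids
    ("text", ["jumps", "over"]),
])
print(seq)
```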
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
Views: 119 • 3 months ago
In the 46th session of Multimodal Weekly, we welcomed Anoop Thomas, Director of Technology at EMAM, Inc., to discuss practical use cases of multimodal AI in video production and media search. Additional resources: - Connect with Anoop: www.linkedin.com/in/anoopthomas/ - About EMAM Inc.: www.emamsolutions.com/ - EMAM and Twelve Labs Announce an Integrated Solution for Video AI: metro.newscha...
Open-Source LLM Evaluation & Multimodal Models for Audio Processing/Creation | Multimodal Weekly 45
Views: 257 • 3 months ago
In the 45th session of Multimodal Weekly, we welcomed two graduate students working at the cutting edge of academic research in large language models and multimodal models. ✅ Seungone Kim, M.S. Student at KAIST AI & Incoming Ph.D. Student at Carnegie Mellon University (Language Technologies Institute), dived into Prometheus - a series of open-source models specialized in evaluations. - Follo...
Next-Generation Surgical Insights with SDSC and Twelve Labs | Multimodal Weekly 44
Views: 151 • 3 months ago
In the 44th session of Multimodal Weekly, we welcomed the Surgical Data Science Collective team to look at building a global data platform that delivers data and quantitative insights to surgeons at all levels of training, anywhere on Earth. ✅ Dr. Daniel Donoho, Founder of SDSC, gave a high-level overview of SDSC, their users, and the surgical video platform that they offer. - Connect wi...
Bring Enterprise Data to Video Foundation Models with MindsDB and Twelve Labs | Multimodal Weekly 43
Views: 145 • 4 months ago
In the 43rd session of Multimodal Weekly, we welcomed the MindsDB team to look at how to connect enterprise data to video foundation models with MindsDB and Twelve Labs. ✅ Jorge Torres, Co-Founder and CEO of MindsDB, gave a high-level overview and history of the company. - Follow Jorge: tuicasso ✅ Vibhu Sapra, Developer Relations at MindsDB, showcased the integrat...
Generative Representational Instruction Tuning and Agents for Video Creation | Multimodal Weekly 42
Views: 205 • 4 months ago
Referring Image Segmentation and Compositional Visual-Linguistic Models | Multimodal Weekly 41
Views: 140 • 4 months ago
The Future of Video Editing with Multimodal AI | Multimodal Weekly 40
Views: 435 • 5 months ago
Pegasus-1 Open Beta, setting new standards in video-language modeling | Multimodal Weekly 39
Views: 136 • 5 months ago
Introducing Marengo-2.6, a SOTA video foundation model for any-to-any search | Multimodal Weekly 38
Views: 294 • 5 months ago
Computer Vision for Non-ML Engineers and Semantic Search for Post-Production | Multimodal Weekly 37
Views: 102 • 5 months ago
Text-to-Video Synthesis with Lumiere | Multimodal Weekly 36
Views: 156 • 6 months ago
Vision Mamba | Multimodal Weekly 35
Views: 621 • 6 months ago
Multimodal AI in TypeScript and Vision Transformers need Registers | Multimodal Weekly 34
Views: 266 • 6 months ago
From Idea to Execution: Building with the Twelve Labs API | Multimodal Weekly 33
Views: 429 • 6 months ago
Linear Transformers Are Faster After All and LLMOps for Production Success | Multimodal Weekly 32
Views: 354 • 7 months ago
Video Frame Interpolation and Open-Source Observability for LLMs | Multimodal Weekly 31
Views: 155 • 7 months ago
Unlocking Model Performance Insights with ZenoML | Multimodal Weekly 30
Views: 112 • 8 months ago
AI in Hollywood and Fine-Tuning Open-Source LLMs | Multimodal Weekly 29
Views: 183 • 8 months ago
Advanced Considerations In Multimodal Search and Leveling Up Video Datasets | Multimodal Weekly 27
Views: 237 • 8 months ago