Twelve Labs
  • Videos: 67
  • Views: 43,933
Composed Video Retrieval, Consent In Crisis, and Video Annotations at Scale | Multimodal Weekly 57
In the 57th session of Multimodal Weekly, we had three exciting presentations - two on video captions and one on training data for foundation models.
✅ Lucas Ventura discussed CoVR, a neat work that generates training triplets from video-caption pairs while also expanding the scope of the task to composed video retrieval (see the sketch at the end of this entry).
- Follow Lucas: lucasventura.com/
- CoVR: imagine.enpc.fr/~ventural/covr/
✅ Shayne Longpre discussed his work Consent in Crisis: The Rapid Decline of the AI Data Commons and its multimodal implications. This work has been covered by the NYT, 404 Media, Vox, and Yahoo! Finance.
- Follow Shayne: www.shaynelongpre.com/
- Consent in Crisis: www.data...
Views: 44
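To make the composed video retrieval setup concrete, here is a minimal, hedged sketch of the task interface: a query pairs a source video with a modification text, and candidate target videos are ranked by similarity to the fused query. The embed_video and embed_text functions below are hypothetical random stand-ins (CoVR actually trains learned encoders on its generated triplets), and the sum fusion is a deliberate simplification.

```python
# Toy sketch of composed video retrieval scoring; NOT the CoVR model.
# embed_video / embed_text are hypothetical stand-ins for learned encoders.
import numpy as np

def _embed(key: str, dim: int = 8) -> np.ndarray:
    """Pseudo-random unit vector, deterministic per key within a run."""
    rng = np.random.default_rng(abs(hash(key)) % 2**32)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def embed_video(video_id: str) -> np.ndarray:
    return _embed("video:" + video_id)

def embed_text(text: str) -> np.ndarray:
    return _embed("text:" + text)

def composed_query(src_video: str, modification: str) -> np.ndarray:
    """Fuse source-video and modification-text embeddings (naive sum)."""
    q = embed_video(src_video) + embed_text(modification)
    return q / np.linalg.norm(q)

def rank_targets(src_video: str, modification: str, candidates: list[str]):
    """Rank candidate target videos by cosine similarity to the query."""
    q = composed_query(src_video, modification)
    return sorted(((float(q @ embed_video(c)), c) for c in candidates),
                  reverse=True)

# Query: "this video, but the dog is on a beach instead of grass".
print(rank_targets("dog_on_grass", "the dog is on a beach",
                   ["dog_on_beach", "cat_on_grass", "dog_on_grass"]))
```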

Videos

Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56
Views: 87 • 16 hours ago
In the 56th session of Multimodal Weekly, we had three exciting presentations across different video understanding tasks: action recognition, video description, and video summarization. ✅ Jacob Chalk and Jaesung Huh discussed the Time Interval Machine (TIM), which addresses the interplay between the audio and visual modalities in long videos by explicitly modeling the temporal extents of a...
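As a rough, hedged sketch of what "explicitly modeling temporal extents" can look like in practice, the snippet below embeds a (start, end) interval and uses it as an attention query over per-timestep features. Dimensions and module choices are made up for illustration; this is not the authors' TIM implementation.

```python
# Illustrative simplification of interval-as-query modeling, NOT TIM's code.
import torch
import torch.nn as nn

class IntervalQuery(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Embed the (start, end) interval into the feature space.
        self.interval_mlp = nn.Sequential(
            nn.Linear(2, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4,
                                          batch_first=True)

    def forward(self, frame_feats: torch.Tensor,
                interval: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, D) per-timestep features (visual or audio);
        # interval: (B, 2) normalized [start, end] in [0, 1].
        q = self.interval_mlp(interval).unsqueeze(1)     # (B, 1, D) query
        out, _ = self.attn(q, frame_feats, frame_feats)  # attend over time
        return out.squeeze(1)                            # (B, D) per interval

feats = torch.randn(2, 100, 64)            # two clips, 100 timesteps each
iv = torch.tensor([[0.1, 0.3], [0.6, 0.9]])
print(IntervalQuery()(feats, iv).shape)    # torch.Size([2, 64])
```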
Long-Form Video Reasoning and Question-Answering | Multimodal Weekly 55
Views: 127 • 14 days ago
In the 55th session of Multimodal Weekly, we had three Ph.D. candidates from Stony Brook University working on long-form video understanding under Michael Ryoo. ✅ Jongwoo Park introduced LVNet, a video question-answering framework with optimal strategies for key-frame selection and sequence-aware captioning. - Connect with Jongwoo: www.linkedin.com/in/jongwpark/ - LVNet:...
Visual Insights from Social Data with Phyllo and Twelve Labs | Multimodal Weekly 54
Views: 80 • a month ago
​​​In the 54th session of Multimodal Weekly, we had Ronit Ogra from Phyllo to discuss AI-powered visual insights from social data with Phyllo and Twelve Labs. - Follow Ronit: www.linkedin.com/in/ronit-ogra-8424b676/ - Check out Phyllo: www.getphyllo.com/ - Read our joint blog post: www.twelvelabs.io/blog/twelve-labs-and-phyllo Timestamps: 00:15 Introduction 02:35 Ronit starts 03:40 Phyllo is so...
Multimodal Reasoning, Video Instruction-Tuning & Explaining Vision Backbones | Multimodal Weekly 53
Views: 199 • a month ago
In the 53rd session of Multimodal Weekly, we had three exciting researchers working on multimodal understanding and reasoning benchmarks, video instruction tuning, and explanation methods for Transformers and ConvNets. ✅ Xiang Yue, Postdoctoral Researcher at Carnegie Mellon University, introduced MMMU, a new benchmark designed to evaluate multimodal models on massive multi-di...
How-to Videos, Feeling Multimodal Intelligence, & Visually-Grounded Video QA | Multimodal Weekly 52
Views: 145 • a month ago
In the 52nd session of Multimodal Weekly, we had three exciting researchers working on Human-Computer Interaction for video understanding, large-scale multimodal models, and video question answering. ✅ Saelyne Yang, Ph.D. Candidate at KAIST, presented her work on enhancing how people learn procedural tasks through how-to videos. - Follow Saelyne: www.saelyne.com/ - Beyond Inst...
Multimodal Data Lake, Video Repetition Counting, and Low-Resource Vision | Multimodal Weekly 51
Views: 129 • a month ago
In the 51st session of Multimodal Weekly, we had three exciting presentations from a startup founder and researchers working in multimodal AI. ✅ Jay Chia, co-founder of Eventual Computing, shared how to build a DIY multimodal data lake with Daft dataframes (a toy example follows below). - Follow Jay: www.linkedin.com/in/chiajay/ - Watch Jay's talk at Data AI Summit 2024: ruclips.net/video/hS_3j7IYHUs/видео.html ✅...
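For a flavor of that pattern, here is a tiny sketch using the open-source Daft library: a dataframe of placeholder URLs gains an image column via lazy download-and-decode expressions. The URLs and column names are invented, and the exact expression API may differ across Daft versions, so treat this as an assumption-laden sketch rather than a reference.

```python
# Hedged sketch of a "DIY multimodal data lake" row layout with Daft.
# Placeholder URLs; see the Daft docs for the current expression API.
import daft

df = daft.from_pydict({
    "id": [1, 2],
    "url": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
})

# Lazily fetch bytes and decode them into an image column;
# on_error="null" keeps the pipeline alive if a download fails.
df = df.with_column(
    "image", df["url"].url.download(on_error="null").image.decode())
df.show()
```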
Generalized Contrastive Learning and Transforming Video Production | Multimodal Weekly 50
Views: 226 • a month ago
In the 50th session of Multimodal Weekly, we had two exciting presentations from startup founders building real-world products for multimodal AI applications. ✅ Jesse Clark, Co-Founder and CTO of Marqo AI, discussed generalized contrastive learning for multimodal retrieval and ranking. They generalize the popular training method of CLIP to accommodate any number of text and image...
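A hedged sketch of that generalization idea: instead of CLIP's one-text-one-image pairing, each side of the contrastive loss becomes a weighted combination of several component embeddings before the standard InfoNCE objective. This is our illustrative reading, not Marqo's actual training code, and all weights and dimensions below are made up.

```python
# Multi-part contrastive sketch: weighted component fusion + InfoNCE.
import torch
import torch.nn.functional as F

def combine(components: list[torch.Tensor],
            weights: list[float]) -> torch.Tensor:
    # components: list of (B, D) embeddings (e.g., title, image, caption).
    stacked = torch.stack(components)                     # (K, B, D)
    w = torch.tensor(weights).view(-1, 1, 1)              # per-component weight
    return F.normalize((w * stacked).sum(dim=0), dim=-1)  # fused (B, D)

def info_nce(q: torch.Tensor, d: torch.Tensor, temp: float = 0.07):
    logits = q @ d.T / temp             # (B, B) similarity matrix
    labels = torch.arange(q.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

B, D = 4, 32
query = combine([torch.randn(B, D), torch.randn(B, D)], [0.7, 0.3])
doc = combine([torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)],
              [0.5, 0.3, 0.2])
print(info_nce(query, doc))
```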
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
Views: 181 • 2 months ago
In the 49th session of Multimodal Weekly, we had two exciting presentations from researchers working in language model alignment and large multimodal models. ✅ Jiwoo Hong, an M.S. student at KAIST AI, discussed ORPO, a monolithic odds-ratio preference optimization algorithm that eliminates the need for an additional preference alignment phase and reference model (a toy rendering of the objective is sketched below). This is a resource-eff...
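To unpack "monolithic odds-ratio preference optimization": as we read the paper, ORPO adds an odds-ratio penalty between chosen and rejected responses directly to the supervised NLL loss, so no reference model or separate alignment stage is needed. The sketch below renders that objective on toy per-token log-probabilities; the weight lam and all shapes are illustrative, not the paper's settings.

```python
# Hedged sketch of the ORPO objective: NLL on the chosen response plus an
# odds-ratio term between chosen and rejected, with no reference model.
import torch
import torch.nn.functional as F

def avg_logp(token_logps: torch.Tensor) -> torch.Tensor:
    # Length-normalized sequence log-likelihood, log P(y|x).
    return token_logps.mean(dim=-1)

def orpo_loss(chosen_logps, rejected_logps, lam: float = 0.1):
    lp_w, lp_l = avg_logp(chosen_logps), avg_logp(rejected_logps)
    # log odds(y|x) = log P - log(1 - P), computed stably in log space.
    log_odds_w = lp_w - torch.log1p(-torch.exp(lp_w))
    log_odds_l = lp_l - torch.log1p(-torch.exp(lp_l))
    ratio = F.logsigmoid(log_odds_w - log_odds_l)  # odds-ratio reward term
    nll = -lp_w                                    # plain SFT term on chosen
    return (nll - lam * ratio).mean()

chosen = -torch.rand(2, 10)          # fake per-token log-probs, shape (B, T)
rejected = -torch.rand(2, 10) - 0.5  # rejected responses score lower
print(orpo_loss(chosen, rejected))
```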
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
Views: 134 • 2 months ago
In the 48th session of Multimodal Weekly, we welcomed two researchers working in multimodal understanding. ✅ Max (Letian) Fu, a Ph.D. student at UC Berkeley, dived into aligning touch, vision, and language for multimodal perception. - Follow Letian: max-fu.github.io/ Check out the following resources on the TVL paper: - Project: tactile-vlm.github.io/ - Paper: arxiv.org/abs/2401.143...
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
Views: 122 • 2 months ago
In the 47th session of Multimodal Weekly, we welcomed Benjamin Muller, Tu-Anh Nguyen, and Bokai Yu from Meta AI to discuss their work SpiRit-LM, which is designed to freely mix text and speech, allowing for a seamless integration of both modalities. Resources: - Project Page: speechbot.github.io/spiritlm/ - Paper: arxiv.org/abs/2402.05755 - Connect with Benjamin: www.linkedin.com/in/benj...
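To illustrate the "freely mix text and speech" idea, here is a toy sketch of building a single interleaved token stream in which modality tags switch between text tokens and discrete speech units, so one language model can continue in either modality. The tag and unit names are invented placeholders, not SpiRit-LM's actual vocabulary.

```python
# Toy interleaving of text tokens and discrete speech units into one stream.
def interleave(segments: list[tuple[str, list[str]]]) -> list[str]:
    """segments: list of (modality, tokens), modality in {'text', 'speech'}."""
    stream: list[str] = []
    for modality, tokens in segments:
        # A modality tag marks each switch between text and speech.
        stream.append("[TEXT]" if modality == "text" else "[SPEECH]")
        stream.extend(tokens)
    return stream

seq = interleave([
    ("text", ["the", "quick", "brown"]),
    ("speech", ["Hu12", "Hu87", "Hu87", "Hu3"]),  # HuBERT-style unit ids
    ("text", ["jumps", "over"]),
])
print(seq)
```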
Enhancing Video Production & Media Search with eMAM and Twelve Labs | Multimodal Weekly 46
Views: 119 • 3 months ago
In the 46th session of Multimodal Weekly, we welcomed Anoop Thomas, Director of Technology at EMAM, Inc., to discuss practical use cases of multimodal AI in video production and media search. Additional resources: - Connect with Anoop: www.linkedin.com/in/anoopthomas/ - About EMAM Inc.: www.emamsolutions.com/ - EMAM and Twelve Labs Announce an Integrated Solution for Video AI: metro.newscha...
Open-Source LLM Evaluation & Multimodal Models for Audio Processing/Creation | Multimodal Weekly 45
Views: 257 • 3 months ago
In the 45th session of Multimodal Weekly, we welcomed two graduate students working at the cutting edge of academic research in large language models and multimodal models. ✅ Seungone Kim, M.S. Student at KAIST AI & Incoming Ph.D. Student at Carnegie Mellon University (Language Technologies Institute), dived into Prometheus - a series of open-source models specialized in evaluations. - Follo...
Next-Generation Surgical Insights with SDSC and Twelve Labs | Multimodal Weekly 44
Views: 151 • 3 months ago
In the 44th session of Multimodal Weekly, we welcomed the Surgical Data Science Collective team to look at building a global data platform that delivers data and quantitative insights to surgeons at all levels of training, anywhere on Earth. ✅ Dr. Daniel Donoho, Founder of SDSC, gave a high-level overview of SDSC, their users, and the surgical video platform that they offer. - Connect wi...
Bring Enterprise Data to Video Foundation Models with MindsDB and Twelve Labs | Multimodal Weekly 43
Views: 145 • 4 months ago
In the 43rd session of Multimodal Weekly, we welcomed the MindsDB team to look at how to connect enterprise data to video foundation models with MindsDB and Twelve Labs. ✅ Jorge Torres, Co-Founder and CEO of MindsDB, gave a high-level overview and history of the company. - Follow Jorge: tuicasso ✅ Vibhu Sapra, Developer Relations at MindsDB, showcased the integrat...
Generative Representational Instruction Tuning and Agents for Video Creation | Multimodal Weekly 42
Views: 205 • 4 months ago
Referring Image Segmentation and Compositional Visual-Linguistic Models | Multimodal Weekly 41
Views: 140 • 4 months ago
The Future of Video Editing with Multimodal AI | Multimodal Weekly 40
Views: 435 • 5 months ago
Pegasus-1 Open Beta, setting new standards in video-language modeling | Multimodal Weekly 39
Views: 136 • 5 months ago
Introducing Marengo-2.6, a SOTA video foundation model for any-to-any search | Multimodal Weekly 38
Views: 294 • 5 months ago
Computer Vision for Non-ML Engineers and Semantic Search for Post-Production | Multimodal Weekly 37
Views: 102 • 5 months ago
Text-to-Video Synthesis with Lumiere | Multimodal Weekly 36
Views: 156 • 6 months ago
Vision Mamba | Multimodal Weekly 35
Views: 621 • 6 months ago
Multimodal AI in TypeScript and Vision Transformers need Registers | Multimodal Weekly 34
Views: 266 • 6 months ago
From Idea to Execution: Building with the Twelve Labs API | Multimodal Weekly 33
Views: 429 • 6 months ago
Linear Transformers Are Faster After All and LLMOps for Production Success | Multimodal Weekly 32
Views: 354 • 7 months ago
Video Frame Interpolation and Open-Source Observability for LLMs | Multimodal Weekly 31
Views: 155 • 7 months ago
Unlocking Model Performance Insights with ZenoML | Multimodal Weekly 30
Views: 112 • 8 months ago
AI in Hollywood and Fine-Tuning Open-Source LLMs | Multimodal Weekly 29
Views: 183 • 8 months ago
Advanced Considerations In Multimodal Search and Leveling Up Video Datasets | Multimodal Weekly 27
Views: 237 • 8 months ago