Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56
- Published: Oct 2, 2024
- In the 56th session of Multimodal Weekly, we have three exciting presentations across different video understanding tasks: action recognition, video description, and video summarization.
✅ Jacob Chalk and Jaesung Huh will discuss the Time Interval Machine (TIM), which addresses the interplay between the two modalities in long videos by explicitly modeling the temporal extents of audio and visual events.
Follow Jacob: jacobchalk.git...
Follow Jaesung: www.robots.ox....
TIM: jacobchalk.git...
✅ Haran Raajesh and Naveen Reddy Desanur will discuss the Movie-Identity Captioner (MICap), a new single-stage approach that can seamlessly switch between ID-aware caption generation and fill-in-the-blanks when given a caption with blanks.
Follow Haran: haran71.github...
Follow Naveen: dnaveenr.githu...
MICap: katha-ai.githu...
✅ Aditya Kumar Singh and Dhruv Srivastava will discuss their work "Previously on ..." From Recaps to Story Summarization, which tackles multimodal story summarization by leveraging TV episode recaps: short video sequences interweaving key story moments from previous episodes to bring viewers up to speed.
Follow Aditya: rodosingh.gith...
Follow Dhruv: www.github.com...
Recap Story: katha-ai.githu...
Timestamps:
00:10 Introduction
03:52 Jacob & Jaesung start
05:15 Audio and visual labels
06:22 Current recognition approaches fail to utilize the true context
07:20 Introducing TIM
09:00 TIM - the full picture
10:43 Encoding time intervals
11:25 Qualitative results
12:02 Recognition results
13:14 Adapting TIM for detection
14:43 Detection results
15:03 Analyzing time intervals
18:12 Q&A with Jacob
21:20 Naveen and Haran start
23:35 Audio descriptions
24:02 Identity aware captioning
25:35 Large-scale movie description challenge: Fill-in-the-blanks and full captioning tasks
26:14 Challenging example
26:48 Method overview: movie-identity captioner
27:32 Method (step 1: feature extraction)
28:20 Method (step 2: creation of captioning memory)
29:36 Method (step 3: causal shared decoder)
31:06 iSPICE
32:51 SoTA results
33:07 Attention analysis
34:35 Q&A with Naveen and Haran
43:03 Aditya and Dhruv start
43:50 Goal and key idea
44:44 Motivation
46:07 PlotSnap dataset
47:08 How to construct story-summary labels?
48:35 TaleSumm - our approach for story summarization
52:47 Experiments and ablations
54:32 Qualitative analysis
57:10 Q&A with Aditya and Dhruv
01:02:20 Conclusion
Join the Multimodal Minds community on Discord to receive an invite for future webinars.