Time-Interval Machine, ID-Aware Movie Descriptions, and Story Summarization | Multimodal Weekly 56

  • Published: 2 Oct 2024
  • In the 56th session of Multimodal Weekly, we have three exciting presentations across different video understanding tasks: action recognition, video description, and video summarization.
    ✅ Jacob Chalk and Jaesung Huh will discuss the Time Interval Machine (TIM), which addresses the interplay between the audio and visual modalities in long videos by explicitly modeling the temporal extents of audio and visual events (a minimal illustrative sketch of this interval-encoding idea appears after the timestamps below).
    Follow Jacob: jacobchalk.git...
    Follow Jaesung: www.robots.ox....
    TIM: jacobchalk.git...
    ✅ Haran Raajesh and Naveen Reddy Desanur will discuss the Movie-Identity Captioner (MICap), a new single-stage approach that can seamlessly switch between ID-aware caption generation and fill-in-the-blanks when given a caption with blanks.
    Follow Haran: haran71.github...
    Follow Naveen: dnaveenr.githu...
    MICap: katha-ai.githu...
    ✅ Aditya Kumar Singh and Dhruv Srivastava will discuss their work "Previously on ..." From Recaps to Story Summarization, which tackles multimodal story summarization by leveraging TV episode recaps: short video sequences that interweave key story moments from previous episodes to bring viewers up to speed.
    Follow Aditya: rodosingh.gith...
    Follow Dhruv: www.github.com...
    Recap Story: katha-ai.githu...
    Timestamps:
    00:10 Introduction
    03:52 Jacob & Jaesung start
    05:15 Audio and visual labels
    06:22 Current recognition approaches fail to utilize the true context
    07:20 Introducing TIM
    09:00 TIM - the full picture
    10:43 Encoding time intervals
    11:25 Qualitative results
    12:02 Recognition results
    13:14 Adapting TIM for detection
    14:43 Detection results
    15:03 Analyzing time intervals
    18:12 Q&A with Jacob
    21:20 Naveen and Haran start
    23:35 Audio descriptions
    24:02 Identity aware captioning
    25:35 Large Scale Movie Description Challenge: fill-in-the-blanks and full captioning tasks
    26:14 Challenging example
    26:48 Method overview: movie-identity captioner
    27:32 Method (step 1: feature extraction)
    28:20 Method (step 2: creation of captioning memory)
    29:36 Method (step 3: causal shared decoder)
    31:06 iSPICE
    32:51 SoTA results
    33:07 Attention analysis
    34:35 Q&A with Naveen and Haran
    43:03 Aditya and Dhruv start
    43:50 Goal and key idea
    44:44 Motivation
    46:07 PlotSnap dataset
    47:08 How to construct story-summary labels?
    48:35 TaleSumm - our approach for story summarization
    52:47 Experiments and ablations
    54:32 Qualitative analysis
    57:10 Q&A with Aditya and Dhruv
    01:02:20 Conclusion
    Join the Multimodal Minds community to receive an invite for future webinars: / discord
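    For readers curious how "explicitly modeling the temporal extents of audio and visual events" might look in practice, here is a minimal, illustrative PyTorch sketch. It is not the TIM authors' implementation: it only assumes that each labeled event is encoded into a query vector from its normalized (start, end) interval plus a modality tag, which a transformer could then use to attend over long-video audio/visual features. The class name IntervalQueryEncoder and all sizes are hypothetical.

import torch
import torch.nn as nn

class IntervalQueryEncoder(nn.Module):
    """Hypothetical sketch: turn (start, end, modality) into a query vector."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # Maps a normalized (start, end) pair to a d_model-dimensional vector.
        self.interval_mlp = nn.Sequential(
            nn.Linear(2, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # Learned tags distinguishing audio (0) from visual (1) queries.
        self.modality_emb = nn.Embedding(2, d_model)

    def forward(self, start, end, modality):
        # start, end: float tensors of shape (B,), normalized to [0, 1]
        # modality:   long tensor of shape (B,), 0 = audio, 1 = visual
        interval = torch.stack([start, end], dim=-1)            # (B, 2)
        return self.interval_mlp(interval) + self.modality_emb(modality)

# Usage: build one query per labeled event; in a TIM-style model these queries
# would be fed to a transformer together with audio/visual feature tokens.
encoder = IntervalQueryEncoder(d_model=256)
queries = encoder(
    torch.tensor([0.10, 0.45]),   # event start times (normalized)
    torch.tensor([0.30, 0.80]),   # event end times (normalized)
    torch.tensor([0, 1]),         # one audio event, one visual event
)
print(queries.shape)  # torch.Size([2, 256])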
