Momentum Contrastive Learning

  • Published: 22 Dec 2024

Comments • 26

  • @connor-shorten  4 years ago  +4

    2:33 Contrastive Learning - Dollar Drawings
    3:09 Motivation of Self-Supervised Learning
    4:48 Success with DeepMind Control Suite
    6:00 MoCo Framework Overview
    8:08 Dynamic Dictionary Look-up Problem
    8:46 Data Augmentation
    9:21 Key Dictionary should be Large and Consistent
    10:42 Large Dictionaries
    11:32 Dictionary Solutions in MoCo
    13:37 Experiments
    14:26 Ablations
    16:10 MoCov2 with SimCLR extensions
    18:24 Training with Dynamic Targets

  • @abhishekyadav479  4 years ago  +29

    The queue is FIFO, not LIFO; correct me if I'm wrong.

    • @dutchJSCOOP  4 years ago  +1

      You are not.

    • @kunai9809  3 years ago  +2

      I was confused by it too...
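
A minimal sketch of the FIFO behavior discussed in this thread, in PyTorch-style Python. The C x K queue tensor and the sizes are illustrative, not the paper's, and the reference implementation updates a ring buffer in place via a pointer; the concatenation here is just the simplest way to show the semantics:

    import torch

    # Illustrative shapes: feature dim C, queue length K, batch size N.
    C, K, N = 128, 4096, 8
    queue = torch.randn(C, K)    # stand-in for the current dictionary
    k = torch.randn(N, C)        # freshly encoded keys from this mini-batch

    # FIFO update: the oldest N keys are dequeued, the newest N enqueued.
    queue = torch.cat([queue[:, N:], k.t()], dim=1)
    assert queue.shape == (C, K)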

  • @kunai9809  3 years ago  +1

    6:15
    In the denominator are not all _other_ keys but _ALL_ keys, including the positive one. From the paper, right under the equation: "The sum is over one positive and K negative samples."
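
For reference, the equation in question is the InfoNCE loss (Eqn. 1 in the MoCo paper), with query q, positive key k_+, temperature tau, and keys k_0 ... k_K; the sum in the denominator indeed runs over all K + 1 keys, the positive included:

    \mathcal{L}_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)}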

  • @vikrammurthy8337  2 years ago  +2

    Thanks for taking the time, Connor. I still couldn't figure out two mysteries from the paper:
    a) Why maintain a dictionary when we are NOT sampling from it? From the pseudocode in the paper, the only time the queue is used is while calculating the negative logits (which has an additional issue: if I'm taking all keys from the current batch, there will definitely be positive keys in the queue when I multiply the query into the queue, right? Most will be negative, but at least the positive pairs in the batch WILL result in positive keys).
    b) While calculating the loss, the paper uses an N-dim array of 0's. I understand it specifies the 0th index as the target label, so I can assume the 0th index is 1 and the rest are 0's, BUT one would assume that only the positive logits would need to be closer to the 0th index. Why are they making even the negative logits come closer to the 0th index? I'm quite confused.

    • @connor-shorten  2 years ago

      Hey Vikram, I will try to get around to this. Please feel free to join the Weaviate Slack chat to ping me again about this in case I forget.

    • @vikrammurthy8337  2 years ago  +1

      @connor-shorten Thanks much. I just re-read the paper and realized that the dictionary is nothing but a big sampler for ALL negative keys. So my understanding is: since the query encoder is being trained to learn the best possible representation of the images, it can only do so by coming as close as possible to the positive key and getting as far AWAY as possible from all the negative keys in the dictionary. The more negative keys it can "escape" from, the better and crisper the image representation gets, enabling the encoder to produce richer image embeddings that can then be used on low-volume datasets via supervised fine-tuning (instead of using the small dataset to create an overfit model or, theoretically, using ImageNet's supervised pre-training).
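
On both of the questions above, the paper's Algorithm 1 is clarifying. For (a): keys from the current batch are enqueued only after the loss is computed, so at loss time the queue holds keys from earlier mini-batches only, and the current batch's positives are not among the negatives. For (b): the array of 0's holds class indices for cross-entropy, not values the negative logits are pulled toward; the softmax raises the positive logit (column 0) relative to the negatives, which pushes the negative logits down. A minimal PyTorch-style sketch of the loss, with illustrative shapes and temperature:

    import torch
    import torch.nn.functional as F

    N, C, K, t = 4, 128, 1024, 0.07                      # illustrative sizes
    q = F.normalize(torch.randn(N, C), dim=1)            # query embeddings
    k = F.normalize(torch.randn(N, C), dim=1)            # positive keys
    queue = F.normalize(torch.randn(C, K), dim=0)        # queued negative keys

    l_pos = torch.einsum('nc,nc->n', q, k).unsqueeze(1)  # N x 1 positive logits
    l_neg = q @ queue                                    # N x K negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / t        # N x (1 + K)

    # Labels are all zeros because the positive logit sits at column 0:
    # cross-entropy maximizes it relative to the negatives.
    labels = torch.zeros(N, dtype=torch.long)
    loss = F.cross_entropy(logits, labels)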

  • @ShivaramKR  4 years ago  +4

    What is the problem with using the same encoder for both key and query? Why should they be different?

    • @timbrt9413  3 years ago

      If I have understood the paper correctly, using the same encoder for keys and queries yields an oscillating loss, because the encoder changes too fast for the "older" keys. (See Section 3.2 on the momentum update and the momentum ablation in Section 4.1 of the paper.)

    • @dilipyakkha9225  3 years ago

      Speed.
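
For concreteness, this is the momentum update from Section 3.2 of the paper, sketched with a toy Linear module standing in for the paper's ResNet encoder:

    import copy
    import torch

    f_q = torch.nn.Linear(32, 16)     # stand-in query encoder
    f_k = copy.deepcopy(f_q)          # key encoder starts as an exact copy
    for p in f_k.parameters():
        p.requires_grad = False       # f_k is never updated by backprop

    m = 0.999                         # the paper's ablation favors large momentum
    # f_k slowly trails f_q, so keys already in the queue stay consistent
    # with newly encoded ones.
    with torch.no_grad():
        for p_k, p_q in zip(f_k.parameters(), f_q.parameters()):
            p_k.mul_(m).add_(p_q, alpha=1.0 - m)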

  • @siarez  4 years ago  +4

    Thanks for the video.
    Why are the weights computed for the query encoder useful at all for learning the key encoder?

    • @connor-shorten  4 years ago  +1

      We are aiming for one representation space as the product of this task. The query and key encoders can't be too disentangled from each other, because then the query encoder could learn a trivial solution that maps queries to their positive keys.
      Good question, it's challenging to answer well; please ask any follow-up questions or comments on this.

    • @safoorayousefi3814  3 years ago

      @Patrik Vacek Because then you'll either have a small dictionary due to memory constraints, or, if you store past mini-batches, your dictionary will be inconsistent and outdated.
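
A toy way to see why the query encoder's weights are the key encoder's only source of learning (Linear encoders as stand-ins): gradients never reach f_k at all, so everything it knows arrives via the momentum copy of f_q shown above, which keeps the two encoders in one shared representation space:

    import copy
    import torch
    import torch.nn.functional as F

    f_q = torch.nn.Linear(8, 4)
    f_k = copy.deepcopy(f_q)

    q = F.normalize(f_q(torch.randn(2, 8)), dim=1)
    with torch.no_grad():                 # keys are computed gradient-free
        k = F.normalize(f_k(torch.randn(2, 8)), dim=1)

    (-(q * k).sum()).backward()           # toy similarity objective
    assert f_q.weight.grad is not None    # f_q learns by backprop
    assert f_k.weight.grad is None        # f_k never does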

  • @ThibaultNeveu  4 years ago  +4

    Thank you !!!!!

  • @TimScarfe  4 years ago  +1

    Great summary, Connor!

  • @egeres14  3 years ago  +2

    I love that someone is breaking down actually complex topics in AI with this much care and consistency, but it goes completely unnoticed while Siraj + Medium collect views with clickbaity content xd

  • @farzadimanpoursardroudi45  2 years ago

    Thank you, it was really helpful.

  • @BlakeEdwards333  4 years ago

    Thank you!

  • @phuccoiinkorea3341  3 years ago

    Thank you

  • @spenhouet  4 years ago

    I did not look at the paper, but that looks similar to Siamese neural networks.

  • @2107mann  4 years ago  +2

    Nice

  • @SparshGarg-n8e  8 months ago

    Thank you!