How is Beam Search Really Implemented?

  • Published: 1 Oct 2024

Comments • 13

  • @ameynaik2743
    @ameynaik2743 A year ago +1

    ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 2.
    ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 3.
    ruclips.net/video/tOhWpF5-_z4/видео.html - here the beam length is 6?
    Why do we take the top 6 (num_beams * 2), as mentioned here: ruclips.net/video/tOhWpF5-_z4/видео.html ?
    Also, at ruclips.net/video/tOhWpF5-_z4/видео.html, with 'boy' as input, 'and' and 'who' had the highest probabilities (you chose the top 2),
    but with 'dog' as input only 'who', i.e. the top 1, was chosen?
    Are you picking the top 3 across the outputs for the inputs 'boy', 'dog', and 'woman'?

    • @EfficientNLP
      @EfficientNLP  A year ago +2

      In the code example, the beam size is 3, but the batch size is 2. That's why it appears we have 6 sequences at a time; this illustrates how beam search is combined with batching.
      As for your question about taking the top 3: we take the top 3 beams overall, and they may come from any of the beams of the previous iteration (it's not necessarily a 1-to-1 correspondence). So we might keep 2 candidates from the beam ending in "boy", 1 from the beam ending in "dog", and 0 from the beam ending in "woman".
      Hope this clarifies things!
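The selection described in this reply can be sketched in plain Python. This is a toy illustration, not the actual Hugging Face code: the vocabulary, probabilities, and the `beam_step` helper are all made up for the example, and it happens to reproduce the 2/1/0 split mentioned above.

```python
import math

def beam_step(beams, next_logprobs, num_beams):
    """One beam-search step: score every (beam, token) continuation,
    then keep the top `num_beams` overall -- candidates may all come
    from one beam and none from another."""
    candidates = []
    for (tokens, score), logprobs in zip(beams, next_logprobs):
        for token, lp in logprobs.items():
            candidates.append((tokens + [token], score + lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:num_beams]

# Three current beams with their cumulative log-probabilities.
beams = [(["boy"],   math.log(0.40)),
         (["dog"],   math.log(0.35)),
         (["woman"], math.log(0.25))]

# Made-up next-token distributions, one dict per beam.
next_logprobs = [
    {"and": math.log(0.5), "who": math.log(0.4)},   # after "boy"
    {"who": math.log(0.6), "ran": math.log(0.2)},   # after "dog"
    {"was": math.log(0.3), "and": math.log(0.1)},   # after "woman"
]

top = beam_step(beams, next_logprobs, num_beams=3)
# Keeps "dog who" (0.21), "boy and" (0.20), "boy who" (0.16):
# 2 continuations from "boy", 1 from "dog", 0 from "woman".
```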

  • @amritpandey6964
    @amritpandey6964 A year ago +1

    nicely explained!

  • @feixyzliu5432
    @feixyzliu5432 6 months ago

    It seems no KV cache is used in the implementation. How can beam search be made compatible with the KV cache to make it more efficient?

    • @EfficientNLP
      @EfficientNLP  6 months ago

      I didn't mention it in this video, but the KV cache is supported in the Hugging Face implementation (and is turned on by default) -- it is controlled by the use_cache parameter.

    • @feixyzliu5432
      @feixyzliu5432 6 months ago

      I just read the Hugging Face transformers implementation. It does support the KV cache; however, beam search in transformers is implemented by simply expanding the batch size. I'm sure this is not very efficient, especially for memory, since nothing is reused here; even the KV cache for the prompts from the prefill phase is not reused. Do you know of any implementation that is more mature or optimized? Thanks a lot! @@EfficientNLP

    • @arjunkoneru5461
      @arjunkoneru5461 6 months ago

      You can pass your custom past_key_values by doing a forward pass once and loading it into generate @@feixyzliu5432
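The cache bookkeeping this thread is discussing can be sketched in pure Python. This is a hypothetical illustration, not transformers internals: `reorder_cache` and the list-of-rows cache model are inventions for the example (real caches are tensors of shape roughly (batch * num_beams, heads, seq_len, head_dim)), but the two ideas shown -- prefilling the prompt once and replicating it across beams, and gathering cache rows by beam indices after each step -- mirror what a beam-search loop with a KV cache has to do.

```python
def reorder_cache(cache_rows, beam_indices):
    """After a beam-search step, beam i continues from previous beam
    beam_indices[i]; its cache row must be copied accordingly."""
    return [list(cache_rows[j]) for j in beam_indices]

# Prefill once for the prompt, then replicate across beams instead of
# recomputing the prompt's keys/values per beam.
prompt_cache = ["k/v for prompt"]
cache = [list(prompt_cache) for _ in range(3)]  # 3 beams share one prefill

# Each beam appends the keys/values for its own newly generated token.
for i, row in enumerate(cache):
    row.append(f"k/v for token {i}")

# Step result: beams 0 and 1 both continue from old beam 0;
# beam 2 continues from old beam 1.
cache = reorder_cache(cache, [0, 0, 1])
```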

  • @kushagrabhushan
    @kushagrabhushan A year ago

    Hey, great video! I just wanted to ask: what are you using as a debugger to get the intermediate values of the variables? Looks very interesting...

    • @EfficientNLP
      @EfficientNLP  A year ago

      I used PyCharm for this video, but most modern IDEs should have a similar feature.

    • @kushagrabhushan
      @kushagrabhushan A year ago

      @@EfficientNLP thank you so much!

  • @kevon217
    @kevon217 A year ago

    Very well explained!

  • @piotr780
    @piotr780 6 months ago

    What IDE is this?

    • @EfficientNLP
      @EfficientNLP  6 months ago

      This is PyCharm, but VS Code has similar debugging functionality.