Vector Database Search - Hierarchical Navigable Small Worlds (HNSW) Explained

  • Published: 25 Jul 2024
  • In this video, we explore how the hierarchical navigable small world (HNSW) algorithm indexes a vector database, and how it speeds up finding the vectors in the database most similar to a given query.
    Related Videos
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Why Language Models Hallucinate: • Why Language Models Ha...
    Grounding DINO, Open-Set Object Detection: • Object Detection Part ...
    Transformer Self-Attention Mechanism Explained: • Transformer Self-Atten...
    How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large...
    Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (...
    LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p: • LLM Prompt Engineering...
    The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: • The Era of 1-bit LLMs:...
    Contents
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    00:00 - Intro
    00:17 - Vector database and search
    01:42 - Navigable small worlds
    03:29 - Skip linked lists
    04:49 - Hierarchical Navigable Small Worlds
    06:47 - HNSW Search Speed
    07:49 - Outro
    Follow Me
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    🐦 Twitter: @datamlistic
    📸 Instagram: @datamlistic
    📱 TikTok: @datamlistic
    Channel Support
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    The best way to support the channel is to share the content. ;)
    If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
    ► Patreon: / datamlistic
    ► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
    ► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
    ► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
    ► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
    #vectordatabase #vectorsearch #rag #hnsw
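Before reaching HNSW, the video builds intuition from skip linked lists (03:29): a sorted list gets sparser "express" levels on top, so a search can skip far ahead before dropping down. A minimal Python sketch of that idea — the probabilistic promotion scheme and the sentinel head are standard skip-list details, not taken from the video:

```python
import random

def build_skip_levels(sorted_values, p=0.5, seed=0):
    """Level 0 keeps every value; each higher level promotes a value
    with probability p, forming sparser 'express lanes'. A -inf head
    sentinel starts every level so the search always has an entry point."""
    rng = random.Random(seed)
    levels = [[float("-inf")] + sorted(sorted_values)]
    while True:
        promoted = [v for v in levels[-1][1:] if rng.random() < p]
        if len(promoted) < 2:
            break
        levels.append([float("-inf")] + promoted)
    return levels  # levels[0] is densest, levels[-1] is sparsest

def skip_search(levels, target):
    """Return the largest stored value <= target (or None): walk right
    while the next value still fits, otherwise drop one level down."""
    pos = 0
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        while pos + 1 < len(level) and level[pos + 1] <= target:
            pos += 1
        if depth > 0:
            # the current value also exists in the denser level below
            pos = levels[depth - 1].index(level[pos])
    found = levels[0][pos]
    return None if found == float("-inf") else found
```

Starting at the sparse top level lets the search jump over most of the list before descending, which is exactly the intuition HNSW lifts from lists into proximity graphs.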

Comments • 13

  • @datamlistic
    @datamlistic  2 months ago +1

    Text tokenization is one of the most overlooked topics in LLMs, although it plays a key role in how they work. Take a look at the following video to see how the most popular tokenization methods work: ruclips.net/video/hL4ZnAWSyuU/видео.html

  • @poppop101010
    @poppop101010 2 months ago +3

    best explanation on youtube atm

    • @datamlistic
      @datamlistic  1 month ago

      Thanks! Glad you think so! :)

  • @himanikumar7979
    @himanikumar7979 24 days ago

    Perfect explanation, exactly what I was looking for!

    • @datamlistic
      @datamlistic  22 days ago

      Thanks! Glad you found it helpful! :)

  • @ZoinkDoink
    @ZoinkDoink 6 days ago

    great explanation, massively underrated video

    • @datamlistic
      @datamlistic  5 days ago

      Thanks! Glad you liked the explanation! :)

  • @andrefu4166
    @andrefu4166 1 month ago

    great explanation

    • @datamlistic
      @datamlistic  1 month ago

      Thanks! Happy to hear that you liked the explanation! :)

  • @lynnwilliam
    @lynnwilliam 2 months ago

    How did you go from a group of random vectors to a skip linked list structure?

    • @datamlistic
      @datamlistic  2 months ago

      The nodes between levels represent the same vectors. Basically, on the top level you have a sparse graph of vectors, and on the lowest level you have the entire graph. Similar to a skip linked list, you move to another node in the same level if it's closer to the query, or move down if no such node exists. This lets you cover larger distances since you start at a higher level. Hope this makes sense, and please let me know if you need further clarification! :)
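The traversal described in this reply can be sketched in a few lines of Python. The layer layout, entry point, and Euclidean distance below are illustrative assumptions; node ids are shared across layers, as the reply explains:

```python
import math

def hnsw_greedy_search(layers, vectors, query, entry):
    """Greedy descent over HNSW-style layers: at each layer, hop to the
    neighbor closest to the query while that strictly improves, then
    drop one layer down. `layers` lists adjacency dicts, sparsest
    first; node ids index into `vectors` and are shared across layers."""
    current = entry
    for graph in layers:  # top (sparse) -> bottom (full graph)
        while True:
            neighbors = graph.get(current, [])
            if not neighbors:
                break
            best = min(neighbors, key=lambda n: math.dist(vectors[n], query))
            if math.dist(vectors[best], query) < math.dist(vectors[current], query):
                current = best  # a neighbor is closer: move sideways
            else:
                break           # no closer neighbor: move down a layer
    return current  # greedy (approximate) nearest neighbor
```

A full HNSW search additionally keeps a beam of candidates (the `ef` parameter) on the bottom layer instead of a single node; this sketch shows only the greedy backbone the reply describes.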

  • @alicetang8009
    @alicetang8009 1 month ago

    If k equals the total number of documents, will this approach also be like brute force? Because it needs to go through each linked document.

    • @datamlistic
      @datamlistic  27 days ago

      If k equals the number of documents, why not simply return all documents? :)
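To make that point concrete: when k equals the collection size, any method must touch every document, so an index buys nothing over brute force. A minimal brute-force k-NN sketch (function and variable names are illustrative):

```python
import math

def brute_force_knn(vectors, query, k):
    """Exact k-nearest neighbours by scoring every vector: O(N)
    distance computations plus a sort. With k == len(vectors) the
    result is simply all documents ordered by distance."""
    return sorted(vectors, key=lambda v: math.dist(v, query))[:k]
```

Approximate indexes like HNSW only pay off in the usual regime where k is much smaller than N.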