Semantic Chunking for RAG with

  • Published: Dec 10, 2024
  • In this event, we’ll learn how the semantic chunking algorithm works! Text is split into sentences that are converted to vectors through an embedding model. Similarity is measured between each pair of consecutive sentences. If two consecutive sentences are similar enough, as defined by a threshold, they stay in the same chunk; if they are too different from one another, a new chunk is started at that point. In theory, this will allow us to achieve better results during retrieval within our RAG system.
    Event page: lu.ma/chunkingrag
    Have a question for a speaker? Drop it here:
    app.sli.do/eve...
    Speakers:
    Dr. Greg, Co-Founder & CEO
    / gregloughane
    The Wiz, Co-Founder & CTO
    / csalexiuk
    Join our community to start building, shipping, and sharing with us today!
    / discord
    Apply for our new AI Engineering Bootcamp on Maven today!
    bit.ly/aie1
    How'd we do? Share your feedback and suggestions for future events.
    forms.gle/1Uxk...
    #chunking #rag
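
The algorithm described above can be sketched roughly as follows. This is a minimal illustration, not the code from the event notebook: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, the sentence splitter is a naive regex, and the fixed `threshold` of 0.15 is an assumed value (real implementations often derive the breakpoint from the distribution of similarities instead).

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(sentence):
    # Hypothetical stand-in for an embedding model: word-count vector.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(text, threshold=0.15, embed=toy_embed):
    # Start a new chunk whenever two consecutive sentences are
    # less similar than `threshold`; otherwise keep them together.
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(vectors[i - 1], vectors[i]) < threshold:
            chunks.append([])
        chunks[-1].append(sentences[i])
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = (
        "Cats purr when happy. Cats sleep most of the day. "
        "Stock markets fell sharply today. Stock markets may recover tomorrow."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

Passing a real embedding function as `embed` (one that returns vectors compatible with the similarity function you use) is the main change needed to try this on actual documents.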

Comments • 15

  • @AI-Makerspace  8 months ago +1

    Google Colab notebook: colab.research.google.com/drive/1gGLd-rdPsM1iy4JmL1V1mfZm90CmDcXR?usp=sharing
    Event Slides: www.canva.com/design/DAGAtxFPH2M/3oo8gElRKU21fQH-ZzYNNA/view?DAGAtxFPH2M&

  • @damiangilgonzalez8011 8 months ago +1

    Awesome job guys! I watched this video with my coffee this morning and it was a perfect way to start my day (learning, drinking a coffee and listening to really good speakers/teachers)

    • @AI-Makerspace  8 months ago

      This is awesome Damian - thank you! We're pumped we got to spend the morning with you :)

  • @bananamaker4877 8 months ago

    Love this video and the new strategy of semantic chunking. Thanks to Greg and Chris for explaining this concept the way it should be explained. Thanks again for making it open source.

    • @AI-Makerspace  8 months ago +1

      Thanks bananamaker!! We enjoyed getting down into the weeds of some often-overlooked pieces today, and we're also fans of the new strategy! Look for more content like this from us soon!

  • @JankayYashwant 6 months ago

    Please make many more awesome explainers like this!

    • @AI-Makerspace  6 months ago +1

      You can count on it @JankayYashwant!

  • @DataScienceandAI-doanngoccuong 2 months ago

    On the scale of chunking techniques, semantic chunking and agentic chunking are rated at levels 4 and 5. Experiments show that agentic chunking using LLMs gives the best results.
    Level 1: Character splitting - simple static character chunks of data
    Level 2: Recursive character text splitting - recursive chunking based on a list of separators
    Level 3: Document-specific splitting - different chunking methods for different document types (PDF, Python, Markdown)
    Level 4: Semantic splitting - embedding-based chunking. This technique splits text into smaller chunks based on meaning rather than only on a fixed length.
    Level 5: Agentic splitting - Agentic Chunker: the Agentic Chunker automatically groups related propositions into chunks. When a new proposition is added, the system decides whether to add it to an existing chunk or to create a new one.
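
For comparison with the semantic approach, Level 2 in the list above (recursive character text splitting) can be sketched as below. This is a simplified illustration under stated assumptions, not any particular library's implementation: the function name `recursive_split`, the default separator list, and the `max_len` value are all hypothetical choices, and chunk overlap is left out.

```python
def recursive_split(text, max_len=40, separators=("\n\n", "\n", " ", "")):
    # Return pieces of at most `max_len` characters, preferring to break
    # on the earliest separator in the list that makes pieces fit.
    if len(text) <= max_len:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every `max_len` characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            # Keep merging small pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) > max_len:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, max_len, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "aaaa bbbb cccc\n\ndddd eeee ffff gggg"
    print(recursive_split(sample, max_len=10))
```

Unlike the embedding-based approach, this method looks only at length and separators, so a chunk boundary can fall in the middle of a topic.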

  • @NhatNguyen-bq6jj 6 months ago

    Can you introduce some related articles? Thanks!

    • @AI-Makerspace  4 months ago

      medium.com/the-ai-forum/semantic-chunking-for-rag-f4733025d5f5

  • @channel_panel193 8 months ago +1

    heyyy u guys look familiar from the fourthbrain bootcamp i took! nice

  • @zugbob 8 months ago

    When doing RAG in general, is it best to insert the retrieved context into the system prompt or to have an assistant message for it?

    • @AI-Makerspace  8 months ago

      It's really up to you, and it depends on whether you're using examples or not.

  • @MrDespik 8 months ago

    You forgot to show how we can combine semantic chunking with the parent document retriever :)
    I mean, which chunks should we use as parents and which as children?

    • @AI-Makerspace  8 months ago

      I'm sorry! We didn't intend to explore this in the session!