How to Build ML Solutions (w/ Python Code Walkthrough)

  • Published: Sep 10, 2024

Comments • 7

  • @ShawhinTalebi
    @ShawhinTalebi  4 months ago

    More on Full Stack Data Science 👇
    👉 Series Playlist: ruclips.net/p/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL
    💻 Example Code: github.com/ShawhinT/RUclips-Blog/tree/main/full-stack-data-science/data-science

  • @divyanshtripathi4867
    @divyanshtripathi4867 3 months ago +1

    Great playlist, thanks!

  • @angieyoon9900
    @angieyoon9900 2 months ago +1

    Amazing!

  • @kreddy8621
    @kreddy8621 4 months ago

    Brilliant, thanks

  • @Tenebrisuk
    @Tenebrisuk 4 months ago

    Great video, really interesting.
    A question on the encoding process. Does condensing transcripts into an embedding with 384 dimensions lose much information, or does the encoding process truncate the text at a point?
    How would something like this manage a lengthy transcript where you cover several different topics?
    Does the embedding get too "noisy" in that case to be able to really stand above your threshold if only perhaps 5 lines out of 100 contain the information relating to the search?

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      That's a great question. Whether (much) information is lost depends on the specific use case. For example, if your text chunks simply say "True" or "False", then even a one-dimensional embedding preserves all the information. However, as you're describing, the longer the chunks, the more information can be lost. This is why experimentation is so critical: you can't really know 1) how much "information" the embeddings preserve and 2) how that impacts your use case, without just trying it out.
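      The dilution effect raised in the question (5 relevant lines out of 100) can be illustrated with a toy simulation. This is not the video's code: it stands in for a real encoder by modeling each line's embedding as a random 384-dimensional unit vector, with "relevant" lines aligned to the query, and mean-pools them into one chunk embedding. The names (`chunk_embedding`, `unit`) and the mean-pooling assumption are illustrative choices, not the actual model's internals.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      dim = 384  # embedding size mentioned in the video

      def unit(v):
          """Scale a vector to unit length."""
          return v / np.linalg.norm(v)

      # Stand-in "query" direction; off-topic lines get random directions,
      # which are nearly orthogonal to it in 384 dimensions.
      query = unit(rng.normal(size=dim))

      def chunk_embedding(n_relevant, n_total):
          """Mean-pool per-line embeddings: n_relevant lines point along
          the query, the rest are random (off-topic)."""
          lines = [query] * n_relevant + [
              unit(rng.normal(size=dim)) for _ in range(n_total - n_relevant)
          ]
          return unit(np.mean(lines, axis=0))

      # Cosine similarity to the query as fewer lines stay on topic.
      sims = {}
      for n_rel in (100, 50, 5):
          sims[n_rel] = float(chunk_embedding(n_rel, 100) @ query)
          print(f"{n_rel}/100 relevant lines -> cosine similarity {sims[n_rel]:.2f}")
      ```

      Under this toy model the similarity stays close to 1.0 when the whole chunk is on topic but drops sharply when only 5 of 100 lines are relevant, which is exactly why a fixed retrieval threshold can start missing long, multi-topic transcripts; smaller chunks avoid the dilution.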