Cohere's Wikipedia Embeddings: A Short Primer on Embedding Models and Semantic Search

  • Published: 13 Aug 2024
  • Learn about Wikipedia embeddings from Cohere! This video explains how Cohere embedded millions of Wikipedia articles and released them for open use. Embeddings represent text as numbers, allowing us to determine how semantically similar two pieces of text are. Using Cohere's embeddings, you can build applications like neural search, query expansion, and more. Check out the code example in Colab to get started with Cohere's embeddings today! A short code sketch of the embed-and-compare idea follows the links below.
    🔗 Cohere's Wikipedia Blog: txt.cohere.com...
    🔗 Colab in Video: colab.research...
    🔗 Cohere's Embeddings Tutorial - In depth: txt.cohere.com...
    About me:
    Follow me on LinkedIn: / csalexiuk
    Check out what I'm working on: getox.ai/
    #embeddings #cohere #wikipediaembeddings
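
The description above boils down to: an embedding model maps each passage to a vector, and a dot product between two vectors scores how semantically similar the passages are. Here is a minimal sketch of that idea, assuming the classic Cohere Python client (`cohere.Client` / `co.embed`) and the `multilingual-22-12` model name mentioned on Cohere's blog; the Colab linked in the video may differ in its details.

```python
# Minimal sketch: embed two passages with Cohere and score their similarity.
# Assumptions: the classic v1 Cohere Python client and the "multilingual-22-12"
# model name; replace the API key placeholder with your own key.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")

texts = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Paris is home to a famous iron landmark built for the 1889 World's Fair.",
]

# One embedding vector per input text
response = co.embed(texts=texts, model="multilingual-22-12")
a, b = (np.array(v) for v in response.embeddings)

# Dot product as the similarity score: related passages score higher than unrelated ones
print("similarity:", float(a @ b))
```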

Comments • 10

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Great content. Hope to see more on what Cohere is doing.

  • @joeybasile1572
    @joeybasile1572 2 months ago

    Nice video.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Is there a way to see how they convert words into embeddings? Is it by predicting the context from a word, or vice versa?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      You can check out this blog post which goes into more detail about their model: txt.cohere.com/multilingual/
      Though they're fairly loose on the details!

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Do different models give entirely different embeddings?
    Do the embeddings also depend on the size of the training data?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      1. Most likely, yes. There are scenarios where two models wind up with similar embeddings, but those are unlikely at best.
      2. Yes, they depend on the vocabulary and on the instances/documents/passages used for training.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Why 768?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      Likely tuned during training and found to be the best! They don't provide much specific detail on this point.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    But isn't that a lot of dot scores to calculate if we're talking about all of Wikipedia?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      It is, but it's vectorized with `torch.mm`, so it's not too bad. Though we're only using a sample of the data, and I'd suggest doing some pre-filtering first if you want the best performance.
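
To make that reply concrete, here is a hedged sketch of scoring one query against many passage embeddings with a single `torch.mm` call. The random tensors, the corpus size, and `k=5` are stand-ins for illustration, not the video's actual data.

```python
# Sketch of vectorized dot scoring: one matrix multiply computes the score of a
# query against every document embedding at once. Random data stands in for the
# real Cohere Wikipedia embeddings; 768 matches the dimension discussed above.
import torch

num_docs, dim = 100_000, 768
doc_embs = torch.randn(num_docs, dim)   # (num_docs, dim) precomputed passage embeddings
query_emb = torch.randn(1, dim)         # (1, dim) embedded search query

# (1, dim) @ (dim, num_docs) -> (1, num_docs) dot scores, all in one call
scores = torch.mm(query_emb, doc_embs.T)

# Keep only the top-scoring passages (pre-filtering the corpus would shrink num_docs further)
top_scores, top_idx = torch.topk(scores, k=5)
print(top_idx.tolist(), top_scores.tolist())
```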