Cohere's Wikipedia Embeddings: A Short Primer on Embedding Models and Semantic Search

  • Published: 13 Aug 2024
  • Learn about Wikipedia embeddings from Cohere! This video explains how Cohere embedded millions of Wikipedia articles and released them for open use. Embeddings represent text as numbers, allowing us to determine how semantically similar two pieces of text are. Using Cohere's embeddings, you can build applications like neural search, query expansion, and more. Check out the code example in Colab to get started with Cohere's embeddings today! A short code sketch of the embed-and-compare idea follows the links below.
    🔗 Cohere's Wikipedia Blog: txt.cohere.com...
    🔗 Colab in Video: colab.research...
    🔗 Cohere's Embeddings Tutorial - In depth: txt.cohere.com...
    About me:
    Follow me on LinkedIn: / csalexiuk
    Check out what I'm working on: getox.ai/
    #embeddings #cohere #wikipediaembeddings
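
The description above boils down to: an embedding model maps each passage to a vector, and a dot product between two vectors scores how semantically similar the passages are. Here is a minimal sketch of that idea, assuming the classic Cohere Python client (`cohere.Client` / `co.embed`) and the `multilingual-22-12` model name mentioned on Cohere's blog; the Colab linked in the video may differ in its details.

```python
# Minimal sketch: embed two passages with Cohere and score their similarity.
# Assumptions: the classic v1 Cohere Python client and the "multilingual-22-12"
# model name; replace the API key placeholder with your own key.
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")

texts = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Paris is home to a famous iron landmark built for the 1889 World's Fair.",
]

# One embedding vector per input text
response = co.embed(texts=texts, model="multilingual-22-12")
a, b = (np.array(v) for v in response.embeddings)

# Dot product as the similarity score: related passages score higher than unrelated ones
print("similarity:", float(a @ b))
```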

Comments • 10

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Great content. Hope to see more on what Cohere is doing.

  • @joeybasile1572
    @joeybasile1572 2 months ago

    Nice video.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Is there a way to see how they convert words into embeddings? Is it by predicting the context from a word, or vice versa?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      You can check out this blog post which goes into more detail about their model: txt.cohere.com/multilingual/
      Though they're fairly loose on the details!

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Do different models give entirely different embeddings?
    Do the embeddings also depend on the size of the training data?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      1. Most likely, yes. There are scenarios where two models wind up with similar embeddings, but those are unlikely at best.
      2. Yes, they depend on the vocabulary and on the instances/documents/passages used for training.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Why 768?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      Likely tuned during training and found to be the best! They don't provide much specific detail on this point.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    But isn't that a lot of dot scores to calculate if we're talking about all of Wikipedia?

    • @chrisalexiuk
      @chrisalexiuk  1 year ago +1

      It is, but it's vectorized with `torch.mm`, so it's not too bad. Though we're only using a sample of the data, and I'd suggest doing some pre-filtering first if you want the best performance.
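
To make that reply concrete, here is a hedged sketch of scoring one query against many passage embeddings with a single `torch.mm` call. The random tensors, the corpus size, and `k=5` are stand-ins for illustration, not the video's actual data.

```python
# Sketch of vectorized dot scoring: one matrix multiply computes the score of a
# query against every document embedding at once. Random data stands in for the
# real Cohere Wikipedia embeddings; 768 matches the dimension discussed above.
import torch

num_docs, dim = 100_000, 768
doc_embs = torch.randn(num_docs, dim)   # (num_docs, dim) precomputed passage embeddings
query_emb = torch.randn(1, dim)         # (1, dim) embedded search query

# (1, dim) @ (dim, num_docs) -> (1, num_docs) dot scores, all in one call
scores = torch.mm(query_emb, doc_embs.T)

# Keep only the top-scoring passages (pre-filtering the corpus would shrink num_docs further)
top_scores, top_idx = torch.topk(scores, k=5)
print(top_idx.tolist(), top_scores.tolist())
```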