How to Apply LLMs on Audio Recordings with Multiple Speakers
- Published: 28 Jun 2024
- Get AssemblyAI API key for this tutorial: www.assemblyai.com/?...
LLMs work wonders on text data, but if you want to use audio or video files instead of text, things get a bit trickier. An easy solution is to transcribe the audio or video files. This works, but a plain transcript loses valuable information, especially in multi-speaker situations: how many people were speaking and who said what.
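As a minimal sketch of that idea, speaker diarization can be enabled when transcribing with the AssemblyAI Python SDK so the transcript keeps "who said what" (the API key and file name below are placeholders):

```python
import assemblyai as aai

# Placeholder key; get yours from assemblyai.com
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# Enable speaker labels so the transcript is split into per-speaker utterances
config = aai.TranscriptionConfig(speaker_labels=True)
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("meeting_recording.mp3", config=config)

# Each utterance carries a speaker label (A, B, ...) plus the spoken text
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```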
In this video, we’ll learn how to build a RAG application in 10 minutes that can take multiple speakers into account when answering a question.
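For the RAG side, here is a rough sketch assuming Haystack 2.x with an OpenAI key set in the environment (the video itself uses the AssemblyAI-Haystack integration linked below; the example utterances, question, and model name are illustrative placeholders). The speaker-labeled utterances are indexed as documents so the retriever and the LLM can see speaker information:

```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# Speaker-labeled lines, e.g. produced by the transcription snippet above
utterances = [
    "Speaker A: Let's move the launch to next Tuesday.",
    "Speaker B: I can have the demo ready by Monday evening.",
]

# Index each utterance so speaker labels survive retrieval
document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content=u) for u in utterances])

template = """Answer the question using the speaker-labeled transcript below.
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{ question }}
Answer:"""

rag = Pipeline()
rag.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
rag.add_component("prompt_builder", PromptBuilder(template=template))
rag.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))  # assumes OPENAI_API_KEY is set
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "llm.prompt")

question = "Who offered to prepare the demo?"
result = rag.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["llm"]["replies"][0])
```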
Colab notebook: github.com/deepset-ai/haystac...
AssemblyAI-Haystack Integration docs: www.assemblyai.com/docs/integ...
Blog post of this video: haystack.deepset.ai/blog/leve...
00:00 Introduction
00:32 Effect of Speaker Labels
01:49 Libraries and example files
04:43 Transcription Pipeline
07:52 RAG Application
10:34 Results
11:52 Try it out yourself!
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: www.assemblyai.com/?...
🐦 Twitter: / assemblyai
🦾 Discord: / discord
▶️ Subscribe: ruclips.net/user/AssemblyAI?...
🔥 We're hiring! Check our open roles: www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning
Excellent work. I have a request: please make a video about "Authentication of user identity through voice."
Great, I was looking for something like this. 🙏
Great timing!
Thanks for the awesome tutorial!
Is there some way to map Speaker A to a known speaker? I was thinking of something like speaker embeddings? Also, is it possible to use this in a realtime application?
FIRST 🎉
Second 🙂