transcription and speaker identification OpenAI-Whisper and Pyannote [Python]

Mastering Python

16K views

Published: Sep 29, 2024
Hello guys, in this video I will show you how to transcribe audio and identify the speakers using OpenAI Whisper, Pyannote and Pydub.
For Pyannote, you must register on the Hugging Face website to get an access token.
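The description doesn't spell out how the Whisper transcript and the Pyannote speaker turns are combined. Below is a minimal sketch of one common approach: each transcript segment gets the speaker whose turns overlap it the most. It assumes both outputs have already been reduced to plain tuples in seconds. (Whisper's `transcribe()` result has a `"segments"` list with `start`, `end` and `text` keys, and a pyannote `Annotation` yields turns via `itertracks(yield_label=True)`.) This is an illustration, not necessarily what the linked repository does.

```python
from typing import List, Tuple

# Assumed input shapes (not the repository's format):
#   Whisper segment: (start_s, end_s, text)
#   pyannote turn:   (start_s, end_s, speaker_label)
Segment = Tuple[float, float, str]
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Label each transcript segment with the speaker whose turns overlap it most."""
    labelled = []
    for seg_start, seg_end, text in segments:
        # Sum overlap per speaker, since one speaker can have several turns in a segment.
        totals = {}
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labelled.append((speaker, text.strip()))
    return labelled


if __name__ == "__main__":
    segments = [(0.0, 4.0, " Hello there."), (4.0, 9.0, " Hi, how are you?")]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Majority overlap is a simple rule. A segment that spans a speaker change is still attributed to a single speaker, so finer results would need word-level timestamps.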
Support me by subscribing to my channel and leaving a like.
GitHub repository for the source code:
github.com/Mas...
OpenAI GitHub link:
github.com/ope...
Pyannote GitHub link:
github.com/pya...
Pydub GitHub link:
github.com/jia...
#openai
#openai_whisper
#pyannote
#pydub
#python
#speaker_identification
#transcription
#diarization

Comments • 38

@bhuvneshsaini93 2 months ago ⁺¹
Please provide a requirements.txt; otherwise it's really hard to get this working.
@leoncezammit2502 10 months ago
I'm really struggling to get this working. Could I send you my output log?
@enriqueleonmacias249 1 year ago
Wow, transcription takes about twice the duration of the file to process. I guess this solution wouldn't work for monitoring hours of call recordings unless you use GPU servers.
@masteringpython 1 year ago
It is recommended to use CUDA (an NVIDIA GPU) for speed.
The CPU is very slow.
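Whisper and pyannote.audio both run on PyTorch, so switching to the GPU mostly comes down to picking the device once and passing it on. A minimal sketch, assuming PyTorch may or may not be installed. The two usage lines in the comments rely on `whisper.load_model(..., device=...)` and pyannote.audio 3.x's `Pipeline.to(torch.device(...))`. `pipeline` is a hypothetical, already-loaded diarization pipeline.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # Whisper and pyannote.audio are both built on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Using device: {device}")
    # Assumed usage once openai-whisper and pyannote.audio are installed:
    #   model = whisper.load_model("small.en", device=device)
    #   pipeline.to(torch.device(device))  # hypothetical pyannote.audio 3.x pipeline object
```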
@sakibzaman7719 6 days ago
Does it work with any other language?
@hrishikeshnamboothiri.v.n2195 1 year ago ⁺⁴
Try to include its requirements.txt too...
Thanks
@lawrencemedina5593 1 year ago ⁺¹
conda activate open_chatting does not work on my computer. "EnvironmentNameNotFound: Could not find conda environment: open_chatting
You can list all discoverable environments with `conda info --envs`."
@masteringpython 1 year ago
Install conda, then create an environment called open_chatting by typing:
conda create --name open_chatting
After that, install the libraries I mentioned in the video, then run the code.
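Several commenters ask for a requirements.txt, which the description doesn't provide. As an assumption, the libraries named in the description are published on PyPI as `openai-whisper`, `pyannote.audio` and `pydub`, and they import as `whisper`, `pyannote.audio` and `pydub`. Note that `conda create --name open_chatting` with no packages creates an empty environment. Adding a Python version (e.g. `python=3.10`, an assumed choice) gives the environment its own interpreter and pip. The sketch below checks whether those modules, plus PyTorch and the `ffmpeg` binary that Whisper and pydub call out to, are visible from the active environment. The module list is assumed, not taken from the repository.

```python
import importlib.util
import shutil
from typing import Dict

# Assumed dependency set, based on the libraries named in the video description.
REQUIRED_MODULES = ["whisper", "pyannote.audio", "pydub", "torch"]


def module_available(name: str) -> bool:
    """True if `name` is importable; dotted names also need their parent package."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec("a.b") imports "a" first and raises if "a" is missing.
        return False


def check_environment() -> Dict[str, bool]:
    status = {name: module_available(name) for name in REQUIRED_MODULES}
    # Whisper runs the ffmpeg binary to decode audio; pydub needs it for non-WAV files.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:16} {'OK' if ok else 'MISSING'}")
```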
@chungrandy780 7 months ago ⁺¹
Is there a Colab version?
@ryanschwartz3340 1 year ago ⁺¹
Nice video. Is the repo hard-coded to your directory structure? When I tried to change it, it said the format wasn't recognized.
@masteringpython 1 year ago
Do you mean the segment file?
@JasminePlows-r4y 1 year ago ⁺¹
Thanks for the demo. I am getting the following error, even when using your audio.mp3 file:
end = int(millisec(j[3]))
return (int)((int(spl[0]) * 60 * 60 + int(spl[1]) * 60 + float(spl[2])) * 1000)
ValueError: invalid literal for int() with base 10: ''
@JasminePlows-r4y 1 year ago
@mamido mami Yes, I did that, but I'm still getting the same error.
@auflute 1 year ago
Same problem.
@lunarl-l1k 1 year ago
Same problem.
@jbatista2008 1 year ago
Judging from the error message and the code, the error happens because the millisec function is trying to convert an empty string to an integer.
The millisec function splits a time string in the format "hh:mm:ss.sss" into hours, minutes, and seconds, and then converts these components to milliseconds.
Here is an example of a line after it is split on single spaces:
['[', '00:00:00.998', '-->', '', '00:00:20.622]', 'G', 'SPEAKER_01']
When this loop runs, j[3] is that empty string, so the conversion of 'end' fails:
for l in range(len(k)):
    j = k[l].split(" ")
    start = int(millisec(j[1]))
    end = int(millisec(j[3]))
The index you want for 'end' is 4, not 3. That element also carries a trailing ']', which must be stripped:
for l in range(len(k)):
    j = k[l].split(" ")
    start = int(millisec(j[1].rstrip(']')))  # remove trailing ']'
    end = int(millisec(j[4].rstrip(']')))  # remove trailing ']'
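The index fix depends on the exact spacing in each line. The empty string at index 3 comes from two consecutive spaces, so a line with a single space there would break again. Below is a sketch of a whitespace-tolerant alternative that parses each line with a regular expression instead of fixed split positions. It assumes lines look like the example above. It also uses `round()` rather than `int()` so that float error (e.g. 997.9999…) can't drop a millisecond. This is a suggested variant, not the repository's code.

```python
import re
from typing import List, Tuple

# Matches lines such as "[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01",
# tolerating any amount of whitespace around the timestamps.
LINE_RE = re.compile(
    r"\[\s*(\d+:\d{2}:\d{2}(?:\.\d+)?)\s*-->\s*(\d+:\d{2}:\d{2}(?:\.\d+)?)\s*\]\s+\S+\s+(\S+)"
)


def millisec(time_str: str) -> int:
    """Convert "hh:mm:ss.sss" to integer milliseconds."""
    hours, minutes, seconds = time_str.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def parse_segments(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, speaker) for every line matching LINE_RE."""
    parsed = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip blank or unexpected lines instead of crashing
        start, end, speaker = match.groups()
        parsed.append((millisec(start), millisec(end), speaker))
    return parsed


if __name__ == "__main__":
    sample = ["[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01", ""]
    print(parse_segments(sample))
```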
@ThePikkutyyppi 1 year ago ⁺¹
Can I use this program to split the speakers into their own files, or is it only for transcription?
@masteringpython 1 year ago
Read more about pyannote to see how to split the speakers.
@ThePikkutyyppi 1 year ago
@masteringpython What? Where?
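Since the pipeline already produces speaker turns and uses Pydub, here is one way per-speaker files could be cut. It is a minimal sketch, assuming the turns are `(start_s, end_s, speaker)` tuples (e.g. built from pyannote's `itertracks(yield_label=True)`). It relies on pydub's `AudioSegment` being sliceable in milliseconds and concatenable with `+`. The function names and the output file naming are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # assumed shape: (start_s, end_s, speaker)


def group_by_speaker(turns: List[Turn]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start_ms, end_ms) spans per speaker, sorted by start time."""
    spans = defaultdict(list)
    for start_s, end_s, speaker in turns:
        spans[speaker].append((round(start_s * 1000), round(end_s * 1000)))
    return {speaker: sorted(ranges) for speaker, ranges in spans.items()}


def export_speakers(audio_path: str, turns: List[Turn], out_prefix: str = "speaker") -> List[str]:
    """Write one WAV per speaker containing only that speaker's turns (needs pydub)."""
    from pydub import AudioSegment  # imported here so group_by_speaker has no dependency

    audio = AudioSegment.from_file(audio_path)  # non-WAV input also needs ffmpeg
    written = []
    for speaker, ranges in group_by_speaker(turns).items():
        clip = AudioSegment.empty()
        for start_ms, end_ms in ranges:
            clip += audio[start_ms:end_ms]  # pydub slices are in milliseconds
        path = f"{out_prefix}_{speaker}.wav"
        clip.export(path, format="wav")
        written.append(path)
    return written
```

For example, `export_speakers("audio.mp3", turns)` would write `speaker_SPEAKER_00.wav`, `speaker_SPEAKER_01.wav`, and so on, one file per label.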
@EhsanEslahchi 1 year ago ⁺¹
Does this model work on languages other than English?
@masteringpython 1 year ago
Only English.
@PaweDuzy 8 months ago
@masteringpython Only English? What if I change model = whisper.load_model("small.en") to "small", according to the Whisper GitHub documentation?
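The ".en" suffix is what makes the checkpoint English-only. The openai/whisper README lists English-only variants for `tiny`, `base`, `small` and `medium`, while the plain names (plus `large` and `turbo`) are multilingual. The helper below is a hypothetical convenience that builds the name to pass to `whisper.load_model()`. The commented usage assumes `transcribe()`'s `language` option, with `"fr"` as an example code.

```python
# Sizes per the openai/whisper README; only these four have ".en" variants.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
MULTILINGUAL_SIZES = ENGLISH_ONLY_SIZES | {"large", "turbo"}


def whisper_model_name(size: str, english_only: bool) -> str:
    """Build the checkpoint name to pass to whisper.load_model()."""
    if size not in MULTILINGUAL_SIZES:
        raise ValueError(f"unknown Whisper model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"{size!r} has no English-only variant")
        return f"{size}.en"
    return size


if __name__ == "__main__":
    print(whisper_model_name("small", english_only=True))
    print(whisper_model_name("small", english_only=False))
    # Assumed usage for non-English audio:
    #   model = whisper.load_model(whisper_model_name("small", english_only=False))
    #   result = model.transcribe("audio.mp3", language="fr")
```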
@Yacine_zaki_abderrazzak 1 year ago ⁺¹
Thanks man, you deserve the best
@Hirotodoroki 1 year ago ⁺³
Trying to run this but getting "File contains data in an unknown format". I tried several files, including a WAV file, but no luck.
@masteringpython 1 year ago
I advise you to use Python Anaconda to create a development environment. Then install OpenAI Whisper and, after installing this library, run a simple test to check that everything works correctly. Then install the pyannote library and also run a simple test (read the installation guides carefully; maybe you missed something while installing the library).
@nadeembaig5943 4 months ago
@Hirotodoroki Were you able to resolve the error (File contains data in an unknown format)?
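That message is commonly reported by libsndfile (used through the `soundfile` package) when it does not recognize a file's contents. The cause in this thread isn't confirmed. One cheap thing to rule out is a file whose actual contents don't match its extension, for example an MP3 or M4A renamed to `.wav`. The sketch below guesses the real container from well-known magic bytes using only the standard library. The signature list is deliberately short and is an assumption, not an exhaustive detector.

```python
def sniff_audio_format(path: str) -> str:
    """Guess the real container from the file's first bytes, ignoring its extension."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":
        return "mp4/m4a"
    # ID3v2 tag, or a bare MPEG audio frame sync (11 set bits); ADTS AAC also matches the latter.
    if header[:3] == b"ID3" or (len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0):
        return "mp3"
    return "unknown"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(f"{name}: {sniff_audio_format(name)}")
```

If a `.wav` file reports something other than `wav`, re-exporting it as a real WAV (for instance with pydub) is a reasonable next step.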
@bootneck2222 1 year ago
Great video. Thank you. Can the output be displayed on screen whilst it is processing?
@WhiteShark010 4 months ago
You have chance.
@kmillanr 4 months ago
No code in the video.
@patoyrigoyen 1 year ago ⁺¹
Does this need a GPU?
@masteringpython 1 year ago ⁺²
In this video I did not use a GPU, but if you want to use one, read the pyannote documentation.
@ghulamshabbir9532 1 year ago ⁺¹
Does this work offline?
@masteringpython 1 year ago ⁺¹
Yes.
@userrjlyj5760g 10 months ago
Masha'Allah, blessed be, brother Mohammed... Thank you.
@ApparaoMulpuri-d6m 10 months ago
Hi, thanks for the video. I need an approach for implementing this solution on long audio, around 3 hours in duration.
@KamilKaczmarekSolutions 10 months ago
Chunks.
@KamilKaczmarekSolutions 10 months ago
Chunks, saving a .txt file for each chunk, plus logic to check which chunks are already done. That way, if you hit an error and come back, you don't have to start over; it just continues where it left off.
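The chunk-and-resume idea above can be sketched as follows. It assumes fixed-length chunks in milliseconds and one `.txt` file per chunk, and it skips any chunk whose output file already exists. `transcribe_chunk` is a hypothetical callback; a real one might slice the audio with pydub (`audio[start_ms:end_ms]`), export it, and run Whisper on it. Writing to a temporary file and renaming it means an interrupted run never leaves a half-written chunk that would be mistaken for a finished one.

```python
import os
from typing import Callable, List, Tuple


def plan_chunks(total_ms: int, chunk_ms: int) -> List[Tuple[int, int]]:
    """Split [0, total_ms) into consecutive (start_ms, end_ms) windows."""
    return [(start, min(start + chunk_ms, total_ms)) for start in range(0, total_ms, chunk_ms)]


def process_with_resume(total_ms: int, chunk_ms: int, out_dir: str,
                        transcribe_chunk: Callable[[int, int], str]) -> List[str]:
    """Write each chunk's text to its own .txt file, skipping chunks already on disk."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, (start_ms, end_ms) in enumerate(plan_chunks(total_ms, chunk_ms)):
        path = os.path.join(out_dir, f"chunk_{index:04d}.txt")
        if not os.path.exists(path):  # finished in an earlier run: skip it
            text = transcribe_chunk(start_ms, end_ms)
            tmp_path = path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as f:
                f.write(text)
            os.replace(tmp_path, path)  # rename only after the write completes
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical stand-in for a real Whisper/pyannote call on one chunk.
    def fake_transcribe(start_ms: int, end_ms: int) -> str:
        return f"text for {start_ms}-{end_ms} ms\n"

    three_hours_ms = 3 * 60 * 60 * 1000
    files = process_with_resume(three_hours_ms, 10 * 60 * 1000, "chunks", fake_transcribe)
    print(len(files), "chunk files")
```

One caveat if diarization is also run per chunk: each run assigns its own labels, so SPEAKER_00 in one chunk is not guaranteed to be the same person as SPEAKER_00 in the next.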
