Extracting transcripts and subtitles using Azure Speech and the Speech Python SDK

  • Published: 21 Oct 2024

Comments • 21

  • @kylefoley76 • 1 year ago

    Thanks. It took me about 10 hours to figure out how to use Azure's speech-to-text service. This was the video that did the trick.

  • @vvvvv432 • 1 year ago +1

    It works great! I tested it on a couple of audio transcriptions using Azure and it was super accurate! Thank you so much!

    • @RobertoPrevato86 • 1 year ago +1

      Thank you for your kind words, I'm glad you found it useful. :)

  • @JamesManjackalMSFS • 1 year ago

    I've got an SRT file from Whisper, then I translated it with Google Translate or DeepL. Does Azure have a free or low-cost way to do the Text to Speech, from .srt to mp3?

    • @RobertoPrevato86 • 1 year ago

      Hi James, sorry for the late reply. Yes, Azure Cognitive Services offers a Text to Speech feature. I just took a look at the examples, and it doesn't seem to offer out-of-the-box support for .srt to mp3. Something similar to my "Like-a-srt" would be needed, but working the other way around. If I weren't busy with other projects, I could extend my CLI to support that scenario.
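For anyone exploring that direction: a natural first step of an srt-to-mp3 pipeline is parsing the SRT cues before feeding each one to a speech synthesizer. Below is a minimal, illustrative sketch of just that parsing step (the synthesis step itself, e.g. via Azure's SDK, is not shown since it needs credentials; this is not code from Like-a-srt):

```python
import re

def parse_srt(text):
    """Parse SRT subtitle text into (start, end, text) cues.

    Timestamps are converted to seconds. This is a minimal parser
    for well-formed SRT input; it ignores the numeric cue indices.
    """
    def to_seconds(ts):
        # SRT timestamps look like "00:01:02,345" (hours:minutes:seconds,millis)
        h, m, rest = ts.split(":")
        s, ms = rest.split(",")
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

    timing = re.compile(
        r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
    )
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = timing.search(line)
            if m:
                cue_text = " ".join(lines[i + 1:]).strip()
                cues.append((to_seconds(m.group(1)), to_seconds(m.group(2)), cue_text))
                break
    return cues
```

Each cue could then be synthesized to audio and placed on a timeline at its start offset.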

  • @rahuljaguar5638 • 10 months ago

    Hey @roberto,
    Will it work for languages other than English when generating SRT files? (languages supported by Azure)

    • @RobertoPrevato86 • 10 months ago

      Hi @rahuljaguar5638! Yes, it should. The CLI supports a --language parameter, for example: las --language it-it. The source code is on GitHub here if you want to take a look: github.com/RobertoPrevato/Like-a-srt/blob/main/likeasrt/commands/srt.py - I didn't experiment much with other languages, though.

    • @rahuljaguar5638 • 10 months ago

      @@RobertoPrevato86 thank you

  • @sethhammock3602 • 2 years ago

    I've been trying to use BlackHole to capture the sound from Zoom (Built-in Output). However, I cannot figure out how to name devices using ALSA conventions, i.e. instead of audio_config = speechsdk.audio.AudioConfig(device_name='Built-in output'), using device_name='h0:1' or similar. Do you have any idea how to accomplish this? Thanks!

    • @RobertoPrevato86 • 1 year ago

      Hi, I apologize for replying so late. I didn't have time at first, then forgot to reply. I don't know that subject, so I'm afraid I can't help.

  • @faisalrehman2588 • 1 year ago

    Hi, if I want to run the Python file separately from Like-a-srt to generate an SRT file, which file should I use?

  • @giridharnair9370 • 1 year ago

    Thank you so much.

  • @mariaroscio4796 • 2 years ago

    Great job, Roby 😗

  • @mohammadamerkhan1934 • 2 years ago

    Hi Roberto, thanks for the video. Can you please tell me how we can increase the sentence-splitting duration? In an audio file we have a number of sentences, and the pauses between them vary: some are long and some are very short. How can we control that?

    • @RobertoPrevato86 • 2 years ago +1

      Hi Mohammad, I just checked the source code I wrote months ago, and I see this is currently not configurable. I created an issue on GitHub for this: github.com/RobertoPrevato/Like-a-srt/issues/4 - I can fix this easily.

    • @mohammadamerkhan1934 • 2 years ago

      Hi Roberto, actually my question is this: let's say we have two spoken sentences separated by a long pause, like 4-5 seconds, but the second sentence contains a small pause of, say, 2 seconds.
      When we pass the audio file to Azure Cognitive speech-to-text, it splits the audio into 3 text lines because of the small pause inside the second sentence, when we actually want 2 text lines. So can we control the pause threshold in Azure Cognitive speech-to-text?
      Sorry for writing so much, but I wanted to explain my problem.

    • @RobertoPrevato86 • 2 years ago

      @@mohammadamerkhan1934 The feature I described is related to the detail you are describing. The Azure Speech API returns information about when each sound started and how long it lasted. The splitting of words into groups is done by my function in Python; by default it splits words into groups that last 3 seconds each. It's better to discuss this on GitHub. :)

    • @RobertoPrevato86 • 2 years ago +1

      As far as I know, it's not possible to control the pause in Azure Cognitive Speech.
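The grouping behavior described in this thread can be sketched as follows. This is an illustrative reimplementation, not Like-a-srt's actual code: it assumes word timings are available as (text, offset, duration) tuples in seconds, as could be derived from the per-word timing the Azure Speech API returns.

```python
def group_words(words, max_duration=3.0):
    """Group timed words into caption groups of at most ~max_duration
    seconds each.

    `words` is a list of (text, offset_seconds, duration_seconds)
    tuples. A group is closed as soon as the elapsed time from the
    group's first word reaches max_duration.
    """
    groups = []
    current = []
    group_start = None
    for text, offset, duration in words:
        if group_start is None:
            group_start = offset  # first word opens a new group
        current.append(text)
        if (offset + duration) - group_start >= max_duration:
            groups.append(" ".join(current))
            current = []
            group_start = None
    if current:  # flush the last, possibly shorter, group
        groups.append(" ".join(current))
    return groups
```

Making `max_duration` configurable is exactly the kind of change tracked in the GitHub issue mentioned above.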

  • @faisalrehman2588 • 1 year ago

    I'm getting an error: "cannot install azure ver 1.19" - can anybody help?
    Also, how can I run it from Python code?

    • @RobertoPrevato86 • 1 year ago

      Hi Faisal, sorry for the late reply. As written in the README and explained in the video, Python 3.9 is required (this is a limitation of the official SDK from Microsoft, although I haven't checked whether the situation has changed recently). About running it from Python code: in general, Python console applications have their commands described in setup.py files (legacy) or in pyproject.toml files (the more modern way), under "entry_points" or "[project.scripts]" respectively. In any situation, check these entry points to find out how to run the code directly in Python without going through the CLI. In this specific case, you are looking for: from likeasrt.domain.azurespeech import generate_srt_from_file. You can find the source code here: github.com/RobertoPrevato/Like-a-srt/blob/main/likeasrt/commands/srt.py
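To illustrate the entry-point lookup described in that reply, here is a hypothetical pyproject.toml fragment (the module path "likeasrt.main:main" is invented for illustration, not copied from Like-a-srt's repository):

```toml
[project.scripts]
# The CLI command name on the left maps to "module.path:function" on
# the right; importing and calling that function in Python is
# equivalent to running the command from the shell.
las = "likeasrt.main:main"
```

So once you find the target function in this table, you can import it directly and call it from your own Python code instead of invoking the CLI.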