Extracting transcripts and subtitles using Azure Speech and the Speech Python SDK

  • Published: 21 Oct 2024

Comments • 21

  • @kylefoley76 • 1 year ago

    Thanks. It took me about 10 hours to figure out how to use Azure's speech-to-text service. This was the video that did the trick.

  • @vvvvv432 • 1 year ago +1

    It works great! I tested it on a couple of audio transcriptions using Azure and it was super accurate! Thank you so much!

    • @RobertoPrevato86 • 1 year ago +1

      Thank you for your kind words, I'm glad you found it useful. :)

  • @JamesManjackalMSFS • 1 year ago

    I've got an SRT file from Whisper, then I translated it with Google Translate or DeepL. Does Azure have a free or low-cost way to do the Text to Speech, from .srt to mp3?

    • @RobertoPrevato86 • 1 year ago

      Hi James, sorry for the late reply. Yes, Azure Cognitive Services offers a Text to Speech feature. I just took a look at the examples, and it doesn't seem to offer out-of-the-box support for .srt to mp3. Something similar to my "Like-a-srt" would be needed, but working the other way around. If I weren't busy with other projects, I could extend my CLI to support that scenario.
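For anyone exploring that direction: a natural first step of an srt-to-mp3 pipeline is parsing the SRT cues before feeding each one to a speech synthesizer. Below is a minimal, illustrative sketch of just that parsing step (the synthesis step itself, e.g. via Azure's SDK, is not shown since it needs credentials; this is not code from Like-a-srt):

```python
import re

def parse_srt(text):
    """Parse SRT subtitle text into (start, end, text) cues.

    Timestamps are converted to seconds. This is a minimal parser
    for well-formed SRT input; it ignores the numeric cue indices.
    """
    def to_seconds(ts):
        # SRT timestamps look like "00:01:02,345" (hours:minutes:seconds,millis)
        h, m, rest = ts.split(":")
        s, ms = rest.split(",")
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

    timing = re.compile(
        r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
    )
    cues = []
    # Cues are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = timing.search(line)
            if m:
                cue_text = " ".join(lines[i + 1:]).strip()
                cues.append((to_seconds(m.group(1)), to_seconds(m.group(2)), cue_text))
                break
    return cues
```

Each cue could then be synthesized to audio and placed on a timeline at its start offset.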

  • @rahuljaguar5638 • 10 months ago

    Hey @roberto,
    Will it work for languages other than English when generating SRT files? (languages supported by Azure)

    • @RobertoPrevato86 • 10 months ago

      Hi @rahuljaguar5638! Yes, it should. The CLI supports a --language parameter, for example: las --language it-it. The source code is on GitHub here if you want to take a look: github.com/RobertoPrevato/Like-a-srt/blob/main/likeasrt/commands/srt.py - I didn't experiment much with other languages, though.

    • @rahuljaguar5638 • 10 months ago

      @@RobertoPrevato86 thank you

  • @sethhammock3602 • 2 years ago

    I've been trying to use BlackHole to capture the sound from Zoom (Built-in Output). However, I cannot figure out how to name devices using ALSA conventions, i.e. instead of audio_config = speechsdk.audio.AudioConfig(device_name='Built-in output'), using device_name='h0:1' or similar. Do you have any idea how to accomplish this? Thanks!

    • @RobertoPrevato86 • 1 year ago

      Hi, I apologize for replying so late. I didn't have time at first, then forgot to reply. I don't know that subject, so I'm afraid I can't help.

  • @faisalrehman2588 • 1 year ago

    Hi, if I want to run the Python file separately from Like-a-srt to generate an SRT file, which file should I use?

  • @giridharnair9370 • 1 year ago

    Thank you so much.

  • @mariaroscio4796 • 2 years ago

    Great job, Roby 😗

  • @mohammadamerkhan1934 • 2 years ago

    Hi Roberto, thanks for the video. Can you please tell me how we can increase the sentence-splitting duration? In an audio file we have a number of sentences, and the pauses between them vary: some are long and some are very short. How can we control that?

    • @RobertoPrevato86 • 2 years ago +1

      Hi Mohammad, I just checked the source code I wrote months ago, and I see this is currently not configurable. I created an issue on GitHub for this: github.com/RobertoPrevato/Like-a-srt/issues/4 - I can fix this easily.

    • @mohammadamerkhan1934 • 2 years ago

      Hi Roberto, actually my question is this: let's say we have two spoken sentences separated by a long pause, like 4-5 seconds, but the second sentence contains a small pause of, say, 2 seconds.
      When we pass the audio file to Azure Cognitive speech-to-text, it splits the audio into 3 text lines because of the small pause inside the second sentence, when we actually want 2 text lines. So can we control the pause threshold in Azure Cognitive speech-to-text?
      Sorry for writing so much, but I wanted to explain my problem.

    • @RobertoPrevato86 • 2 years ago

      @@mohammadamerkhan1934 The feature I described is related to the detail you are describing. The Azure Speech API returns information about when each sound started and how long it lasted. The splitting of words into groups is done by my function in Python; by default it splits words into groups that last 3 seconds each. It's better to discuss this on GitHub. :)

    • @RobertoPrevato86 • 2 years ago +1

      As far as I know, it's not possible to control the pause in Azure Cognitive Speech.
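The grouping behavior described in this thread can be sketched as follows. This is an illustrative reimplementation, not Like-a-srt's actual code: it assumes word timings are available as (text, offset, duration) tuples in seconds, as could be derived from the per-word timing the Azure Speech API returns.

```python
def group_words(words, max_duration=3.0):
    """Group timed words into caption groups of at most ~max_duration
    seconds each.

    `words` is a list of (text, offset_seconds, duration_seconds)
    tuples. A group is closed as soon as the elapsed time from the
    group's first word reaches max_duration.
    """
    groups = []
    current = []
    group_start = None
    for text, offset, duration in words:
        if group_start is None:
            group_start = offset  # first word opens a new group
        current.append(text)
        if (offset + duration) - group_start >= max_duration:
            groups.append(" ".join(current))
            current = []
            group_start = None
    if current:  # flush the last, possibly shorter, group
        groups.append(" ".join(current))
    return groups
```

Making `max_duration` configurable is exactly the kind of change tracked in the GitHub issue mentioned above.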

  • @faisalrehman2588 • 1 year ago

    I'm getting an error: "cannot install azure ver 1.19" - can anybody help?
    Also, how can I run it from Python code?

    • @RobertoPrevato86 • 1 year ago

      Hi Faisal, sorry for the late reply. As written in the README and explained in the video, Python 3.9 is required (this is a limitation of the official SDK from Microsoft, although I haven't checked whether the situation has changed recently). About running it from Python code: in general, Python console applications have their commands described in setup.py files (legacy) or in pyproject.toml files (the more modern way), under "entry_points" or "[project.scripts]" respectively. In any situation, check these entry points to find out how to run the code directly in Python without going through the CLI. In this specific case, you are looking for: from likeasrt.domain.azurespeech import generate_srt_from_file. You can find the source code here: github.com/RobertoPrevato/Like-a-srt/blob/main/likeasrt/commands/srt.py
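To illustrate the entry-point lookup described in that reply, here is a hypothetical pyproject.toml fragment (the module path "likeasrt.main:main" is invented for illustration, not copied from Like-a-srt's repository):

```toml
[project.scripts]
# The CLI command name on the left maps to "module.path:function" on
# the right; importing and calling that function in Python is
# equivalent to running the command from the shell.
las = "likeasrt.main:main"
```

So once you find the target function in this table, you can import it directly and call it from your own Python code instead of invoking the CLI.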