Use OpenAI Whisper For FREE | Best Speech to Text Model

  • Published: 15 Oct 2024

Comments • 66

  • @engineerprompt
    @engineerprompt  10 months ago

    Want to connect?
    💼Consulting: calendly.com/engineerprompt/consulting-call
    🦾 Discord: discord.com/invite/t4eYQRUcXB
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    🔴 Join Patreon: Patreon.com/PromptEngineering

    • @milokornblum8672
      @milokornblum8672 8 months ago

      Can you tell me what the command is to output the results in subtitle (.srt) format?
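
      For reference, a likely answer assuming the standard openai-whisper CLI (the same !whisper command used in other comments here): the --output_format flag picks the writer, so something like the line below should produce an .srt file ("FILE NAME" is a placeholder).
      !whisper "FILE NAME" --model medium --output_format srt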

  • @Nihilvs
    @Nihilvs 11 months ago +6

    Thanks for the video! I've been using this model for a long while to do translation + transcription of lectures (an hour and a half each); mostly it works like a charm. I don't know about large-v3, but large-v2 would sometimes repeat and loop one sentence for about half of the transcription.
    So it needs optimization (some solutions clean the audio before Whisper).

    • @ACse-v7y
      @ACse-v7y 10 months ago +1

      Are the small and tiny models available in v3? If yes, please give me a link.

    • @marcin8432
      @marcin8432 10 months ago

      @@ACse-v7y Laziness destroys any kind of progress, bear that in mind.

  • @ekstrajohn
    @ekstrajohn 11 months ago +1

    I am using V2 on my Nvidia 1080 GPU. The performance difference between the base model and the large model is very small. I tried multiple sources, tones of voice, noise, etc. The base version is really fast, so I recommend that one. Even V2 is really perfect for transcribing speech to text.

    • @ACse-v7y
      @ACse-v7y 10 months ago

      Hey, can you please explain why the 10 GB model is stored as just 3.9 GB? I want to download the complete model because I want to host it on a server.

  • @thunderwh
    @thunderwh 11 months ago +1

    I like the idea of chatting with documents through speech

  • @RameshBaburbabu
    @RameshBaburbabu 11 months ago +4

    🎯 Key Takeaways for quick navigation:
    00:00 🎙️ *Overview of Whisper V3 Model*
    - Whisper V3 is OpenAI's latest speech-to-text model.
    - Five configurations available: tiny, base, small, medium, and large V3.
    - Memory requirements vary from 1 GB to 10 GB VRAM.
    01:25 🔄 *Comparison: Whisper V2 vs. V3*
    - V3 generally performs better with lower error rates than V2.
    - There are specific cases where V2 outperforms V3, demonstrated later.
    - Important to consider performance metrics when choosing between V2 and V3.
    03:02 ⚙️ *Setting Up Whisper V3 in Google Colab*
    - Installation of the necessary packages: Transformers, Accelerate, and Datasets.
    - GPU availability check and configuration for optimal performance.
    - Loading the Whisper V3 model, setting up the processor, and creating the pipeline (see the sketch after this summary).
    05:27 🎤 *Speech-to-Text Transcription Process*
    - Creating a pipeline for automatic speech recognition using the Whisper V3 model.
    - Uploading and transcribing an audio file in a Google Colab notebook.
    - Additional options such as specifying timestamps during transcription.
    07:45 🌐 *Language Recognition and Translation*
    - V2 may be preferable when language is unknown, as it can automatically recognize it.
    - Whisper supports direct translation from one language to another.
    - Highlighting the importance of specifying the language in V3 if known.
    09:22 ⚡ *Flash Attention and Distil-Whisper*
    - Enabling Flash Attention for improved performance if the GPU supports it.
    - Introduction to Distil-Whisper, a smaller, faster version of Whisper.
    - Demonstrating how to use the Distil-Whisper medium English model in code.
    11:54 🌐 *Future Applications and Closing*
    - Exploring potential applications, like enabling speech communication with documents.
    - Encouraging viewers to explore and experiment with the Whisper model.
    - Expressing the usefulness and versatility of the Whisper model in various applications.
    Made with HARPA AI
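
    For reference, a minimal sketch of the Colab setup summarized above (03:02-05:27), assuming the Hugging Face Transformers pipeline shown in the video; the audio file name is a placeholder:

    # Install the packages mentioned in the video (Colab)
    # !pip install -q transformers accelerate datasets

    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "openai/whisper-large-v3"
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=15,
        batch_size=16,
        return_timestamps=True,   # include segment timestamps in the output
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe("audio.mp3")   # placeholder file name
    print(result["text"])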

  • @ZaazZ-s8u
    @ZaazZ-s8u 8 months ago

    Hey. I have been trying to reduce the length of the subtitles, as the captions generated by Whisper can be overwhelming, ranging between 12 and 18 words in a single caption. I am using Google Colab and so far there's no success. Here are the commands I have used:
    !whisper "FILE NAME" --model medium --word_timestamps True --max_line_width 40
    !whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5
    It works completely fine with the following command, but with a large number of words per caption:
    !whisper "FILE NAME" --model medium
    Could you please help?
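
    A possible explanation, assuming the standard openai-whisper CLI: --max_line_width and --max_words_per_line only change the subtitle files that get written (srt/vtt), not the transcript printed in the Colab output, and --max_line_width is usually paired with --max_line_count. With a recent openai-whisper release, something like the following may behave as expected:
    !pip install -U openai-whisper
    !whisper "FILE NAME" --model medium --word_timestamps True --max_line_width 40 --max_line_count 2 --output_format srt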

  • @farahabdulahi474
    @farahabdulahi474 10 months ago

    Is it just me, or is this not a big jump in improvement? At least for me,
    I wanted:
    1. speaker recognition / diarization
    2. higher accuracy in Mandarin
    Hope they do the first ASAP, and the second will hopefully get better over time. Azure already has a speech-to-text service that includes speaker recognition and is quite good. I wonder if that could affect how they prioritise this important feature.

  • @ZaazZ-s8u
    @ZaazZ-s8u 8 months ago

    Hey. I have been trying to reduce the length of the subtitles but haven't been successful. I am using Google Colab. Here are the commands:
    !whisper "FILE NAME" --model medium --word_timestamps True --max_line_width 40 (didn't succeed)
    !whisper "FILE NAME" --model medium --word_timestamps True --max_words_per_line 5 (no success)
    Help needed, please.

  • @WillyFlowerz
    @WillyFlowerz 10 months ago

    Is there still NO practical application for real-time transcription (+/- translating the text) that is readily available on Android?
    I think I heard about one or two projects that wanted to do that, but still nothing concrete more than a full year after this incredible piece of technology appeared.
    Am I missing something? Is Whisper incompatible with Android? Is there no way to apply Whisper to a continuous live audio recording?
    Has nobody managed to do it??

  • @JuanGea-kr2li
    @JuanGea-kr2li 10 months ago +3

    VERY interesting. I would love to know how to run it locally, I mean with a UI on a local computer, not in a Google notebook. It would be very, VERY useful to transcribe a video, translate it later with another tool or model, and then generate subtitles or generate audio to translate the video, everything locally :)

    • @engineerprompt
      @engineerprompt  10 months ago +2

      Let me see if I can put together a Streamlit-based UI for it.
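
      For what it's worth, a minimal sketch of what such a Streamlit UI could look like, assuming the same Transformers pipeline used in the video; the file name app.py and the model choice are illustrative, not an announced implementation:

      # app.py -- run with: streamlit run app.py
      import streamlit as st
      import torch
      from transformers import pipeline

      @st.cache_resource
      def load_pipe():
          # Load the Whisper pipeline once and cache it across reruns
          device = "cuda:0" if torch.cuda.is_available() else "cpu"
          dtype = torch.float16 if torch.cuda.is_available() else torch.float32
          return pipeline(
              "automatic-speech-recognition",
              model="openai/whisper-large-v3",
              torch_dtype=dtype,
              device=device,
          )

      st.title("Local Whisper transcription")
      audio_file = st.file_uploader("Upload an audio file", type=["mp3", "wav", "m4a"])
      if audio_file is not None:
          with st.spinner("Transcribing..."):
              result = load_pipe()(audio_file.read(), return_timestamps=True)
          st.write(result["text"])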

    • @JuanGea-kr2li
      @JuanGea-kr2li 10 months ago

      @@engineerprompt awesome, thank you!

    • @TheScifichik
      @TheScifichik 1 month ago

      @@engineerprompt did this become a reality?

  • @binayaktv2646
    @binayaktv2646 2 months ago

    I want to use a prompt for a little personalization, how do I do it?
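
    In case it helps, the openai-whisper CLI (and transcribe() in Python) accepts an initial prompt that nudges the decoder towards particular spellings or vocabulary; the prompt text below is purely illustrative:
    !whisper "FILE NAME" --model medium --initial_prompt "Speaker names: Alice, Bob. Topic: Whisper speech-to-text."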

  • @bmqww223
    @bmqww223 5 months ago

    Greetings, can this be used with WhisperX too? I tried it with the v2 model and it used to work, I think.

  • @thevinnnslair
    @thevinnnslair 9 months ago +1

    Very helpful, thanks for this

  • @matangbaka
    @matangbaka 8 months ago

    Hi, can anyone help me? I'm having a problem following the tutorial. I encountered an error in:
    pipe = pipeline("automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=15,
    batch_size=16,
    return_timestams=True,
    torch_dtype=torch_dtype,
    device=device)
    it says that:
    TypeError Traceback (most recent call last)
    in ()
    ----> 1 pipe = pipeline("automatic-speech-recognition",
    2 model=model,
    3 tokenizer=processor.tokenizer,
    4 feature_extractor=processor.feature_extractor,
    5 max_new_tokens=128,
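
    A likely cause, if the rest matches the video's notebook: return_timestams is a typo, and the Transformers pipeline expects return_timestamps, so the unknown keyword raises the TypeError. A corrected call (same arguments otherwise) would be:
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        max_new_tokens=128,
        chunk_length_s=15,
        batch_size=16,
        return_timestamps=True,   # note the corrected spelling
        torch_dtype=torch_dtype,
        device=device,
    )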

  • @ericneeds1285
    @ericneeds1285 6 months ago

    I'm a "scopist" and I need to edit transcripts with different speakers, I take it this does not differentiate speakers?

    • @engineerprompt
      @engineerprompt  6 months ago

      I have a video on speaker identification on the channel.

  • @krstoevandrus5937
    @krstoevandrus5937 7 months ago

    Hi, I am a med student and I have some medical videos to transcribe. I tried the Whisper API vs. a local Colab run of large-v3, and I found the API MUCH better. The local large-v3 run is NOT acceptable unless there is human editing, while the API output is fully okay on its own. Even for a name called moblitz (correct), the API could recognize it, while the local run produced mobits (wrong). Would you mind commenting on this for me? Thanks.

  • @rccarsxdr
    @rccarsxdr 11 months ago +1

    Helpful video! Now I can run code locally. Thanks

  • @蘇矩賢
    @蘇矩賢 10 months ago

    Thank you for sharing. May I ask whether you provide Colab code for testing? Does the new version of Whisper still have a 25 MB file-size limitation? Previously, I was able to split files for batch processing and then merge them. However, when batch-processing SRT files, there seems to be a timing issue.

  • @WEKINBAD
    @WEKINBAD 23 days ago

    Sorry, but what about transcribing from a URL, like YouTube?

  • @ACse-v7y
    @ACse-v7y 10 months ago +2

    In this, are we downloading the model or using the API interface??? Actually, I am new to it and confused. If it is the model, then it will be best for me to host it on a server.

  • @benoitmialet9842
    @benoitmialet9842 11 months ago

    Whisper is an amazing model. I use the medium version, and with fine-tuning on only 40 minutes of audio it is able to adapt to a specific domain quite well (2 epochs are enough).
    I never tried to fine-tune large-v3, but I will. Large-v2 seems worse than medium for French.
    Did you have the opportunity to fine-tune large-v3 and compare its performance with medium?

    • @engineerprompt
      @engineerprompt  11 months ago +1

      I haven’t looked at fine tuning v3 yet. But seems like it will track with large v2.

    • @billcollins6894
      @billcollins6894 11 months ago

      Can fine-tuning help accuracy if the vocabulary is limited? I only want maybe 100 words to be recognized, but I want a high probability of a match. I have also tried to see how to get a return value for the probability of a word or phrase match, but I'm not clear on how to do that.

    • @benoitmialet9842
      @benoitmialet9842 11 months ago

      @@billcollins6894 If you fine-tune Whisper with audio containing these words, it should normally increase model performance on those specific words.
      If you set word timestamps to True as shown in the video, the returned dictionary normally gives you the probability for each word as well (as in the sketch below). I haven't tried it with the pipeline, but it works.
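
      For illustration, a minimal sketch using the openai-whisper package (not the Transformers pipeline), where word_timestamps=True exposes a per-word probability; the file name is a placeholder:

      import whisper

      model = whisper.load_model("medium")
      result = model.transcribe("audio.mp3", word_timestamps=True)

      for segment in result["segments"]:
          for word in segment["words"]:
              # Each word entry carries start/end times and a confidence-like probability
              print(word["word"], round(word["probability"], 3))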

  • @Vollpflock
    @Vollpflock 11 months ago

    So what's the difference to using Whisper through the API? Is this free and even better than using it through the API?

    • @engineerprompt
      @engineerprompt  11 months ago +1

      The main difference is that it's free if you can run it locally, compared to the API, which will cost you money. Performance seems to be the same.

  • @SonGoku-pc7jl
    @SonGoku-pc7jl 11 months ago

    Thanks :) Please, what is the config if the audio is in English and I want the text translated into Spanish? Does it only translate other languages into English, or is it possible to translate English to Spanish, for example? And with Distil-Whisper, whether it's the English-only .en model or not, is it possible to translate English audio to Spanish text? Thanks! :)
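
    For context, Whisper's built-in translate task only targets English, so English audio to Spanish text generally means transcribing first and then translating with a separate model. A hedged sketch, assuming the Transformers pipelines and the Helsinki-NLP/opus-mt-en-es translation model (illustrative choices, not something shown in the video):

    from transformers import pipeline

    # 1) Transcribe the English audio (Whisper itself cannot output Spanish)
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    english_text = asr("audio.mp3")["text"]   # placeholder file name

    # 2) Translate English -> Spanish with a separate translation model
    #    (long transcripts may need to be translated in chunks)
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
    spanish_text = translator(english_text)[0]["translation_text"]
    print(spanish_text)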

  • @Nawaz-lb9eq
    @Nawaz-lb9eq 11 months ago +1

    Can you do a video on multi-speaker identification and transcription using whisper please.

    • @mbrochh82
      @mbrochh82 11 months ago

      don't think it is possible with whisper

  • @yuzual9506
    @yuzual9506 10 months ago

    You're a god of pedagogy, and I'm French!!! Thx!

  • @AustinStAubin
    @AustinStAubin 11 months ago

    Do you think you could show an example with diarization?

    • @engineerprompt
      @engineerprompt  11 months ago

      I haven't worked with it directly but I think I have seen some examples. Probably not directly in whisper but it can be used to augment. Here is something that seems interesting. tinyurl.com/yeysk4bz
      I will explore this further and see what I can come up with.

  • @contractorwolf
    @contractorwolf 9 months ago

    is there a colab for this?

  • @Hasi105
    @Hasi105 11 months ago

    Yesterday I was trying it out; nice that you explain it. Thanks! Can you tell me how to use a microphone for direct transcription? Maybe also a nice use case for TavernAI and/or MemGPT.

    • @engineerprompt
      @engineerprompt  11 months ago +2

      Sure, will create a video on it
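
      In the meantime, a minimal sketch of one way to feed microphone audio into the same pipeline, assuming the sounddevice package for recording; the duration and the pipe object from earlier are illustrative:

      import sounddevice as sd

      SAMPLE_RATE = 16000            # Whisper models expect 16 kHz audio
      SECONDS = 10                   # illustrative recording length

      # Record from the default microphone into a float32 NumPy array
      recording = sd.rec(int(SECONDS * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                         channels=1, dtype="float32")
      sd.wait()                      # block until the recording finishes

      # 'pipe' is the automatic-speech-recognition pipeline built earlier in the video
      result = pipe({"raw": recording.squeeze(), "sampling_rate": SAMPLE_RATE})
      print(result["text"])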

  • @trilogen
    @trilogen 9 months ago +1

    People want to run it locally for privacy, not route it through Google.

  • @DihelsonMendonca
    @DihelsonMendonca 10 months ago

    ⚠️ I need a good free text-to-speech. It doesn't help if you can talk to a model but it can't talk back. So, what to do? Any good, free text-to-speech?? 😮

    • @engineerprompt
      @engineerprompt  10 months ago

      You might want to check out github.com/suno-ai/bark

  • @ardavaneghtedari
    @ardavaneghtedari 10 months ago

    Thanks!

  • @easylife7775
    @easylife7775 9 months ago

    HELLO, can I use this to go from language A to language B, for example?

  • @techmoo5595
    @techmoo5595 10 months ago

    Just ran the Colab notebook and compared v3 with v2; I think v2 is better.

  • @Tyrone-Ward
    @Tyrone-Ward 9 months ago +2

    This is NOT running Whisper locally. Misleading title.

    • @listentomusic8160
      @listentomusic8160 8 months ago

      My PC has 128 MB of VRAM 😅. How am I supposed to run a 16 GB VRAM model on my local machine? 😂

  • @weslieful
    @weslieful 9 months ago

    What should I do, I wonder?