I tested audio transcription from OpenAI Whisper on Raspberry Pi. The results were astonishing!

  • Published: Jan 18, 2025

Comments • 16

  • @Ul_Nika
    @Ul_Nika 5 months ago

    Wow! Great experiment! At the end, I also wondered about the accuracy, so if that topic interests you, I'd be grateful if you shared your findings.

    • @itkacher
      @itkacher  5 months ago

      Thank you!
      You can see the accuracy at 10:57 ruclips.net/video/pH07mng2jBU/видео.htmlsi=pGX2A9TTy_gcHFqc&t=657 .
      The most common differences are punctuation and letter case.
      However, I didn't test a real-life scenario.
      The YouTube video has professional sound with speech from a professional actor.
      I don't know how good the transcription would be in a noisy environment 😂
      I'll let you know if I try it :)
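
A minimal sketch of how a comparison like the one described above could ignore punctuation and letter case, assuming a plain word error rate is an acceptable measure. The helper names `normalize` and `word_error_rate` are hypothetical and are not taken from the video:

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text, replace punctuation with spaces, split into words."""
    text = text.lower()
    # Keep letters, digits, underscores, whitespace and apostrophes.
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With this normalization, a transcript that differs from the reference only in punctuation or capitalization scores 0.0, so only actual word differences are counted.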

  • @snaggle202
    @snaggle202 3 hours ago

    Is there any way to cut the chunking time of 10 seconds down?

    • @itkacher
      @itkacher  3 hours ago

      OpenAI Whisper doesn’t support streaming. Some third-party libraries have added this feature, but I don’t know how well they perform.
      You can find a link to one of them in the comments here.

  • @3DForge-i7i
    @3DForge-i7i 4 months ago

    Would it be easy to pass the transcription to Llama to summarize it, create a task list, etc.?

    • @itkacher
      @itkacher  4 months ago

      It shouldn’t be a problem.
      There are a lot of technical issues with the transcription, as Whisper tries to transcribe sounds that aren’t voices.
      But I haven’t tried this.

    • @3DForge-i7i
      @3DForge-i7i 4 months ago

      @itkacher Yeah, but I meant on the RPi itself, with a Llama instance that might run on the Hailo?

    • @itkacher
      @itkacher  4 months ago

      I haven’t tried Llama. I've seen people run it on the CPU, and it was very slow.
      Sorry, I have no idea if it supports Hailo.

  • @cedricrueckert2399
    @cedricrueckert2399 5 months ago

    Nice work!! So if you fed the text into a translator and output the result as audio... you would have the first live translation. If you do such a project, I'm highly interested to see the results :)

    • @itkacher
      @itkacher  5 months ago

      Thank you! To be honest, there are plenty of such solutions on the market.
      Just Google "ai live translation". However, it's not so simple, and the devil is in the details.
      The transcription worked perfectly fine on speech from Netflix. In real life, sounds and noises will add some false words.
      Additionally, the narrator's quality matters.
      Then, the translation works great, but it also produces a lot of mistranslations.
      So it will work, but the quality won't be that good.
      And the process requires something more powerful, like an Nvidia Jetson, Xavier, etc.

  • @SamiP111
    @SamiP111 5 months ago

    How can I reach you? I have some questions.

    • @itkacher
      @itkacher  5 months ago

      I haven’t received any requests on LinkedIn so I assume you’ve figured out all the questions:)

  • @MadHolms
    @MadHolms 8 days ago

    Chunking into 10-second blocks is no good: you can cut in the middle of a word, and the LLM uses the context. It's better to use a wrapper project, whisper_streaming, which handles streaming audio much more correctly.

    • @itkacher
      @itkacher  8 days ago

      You are right, the 10-second approach could cut in the middle of a word.
      However, I didn't find a native solution from OpenAI
      (docs: platform.openai.com/docs/guides/speech-to-text#improving-reliability )
      The purpose of the video was to test performance and I am sure that a wrapper doesn't improve it.
      I guess they "feed" content a few times to cover the "cut" case, but it's an additional operation that will consume even more CPU.
      So if someone is looking for a reliable implementation - yes, they should think about it.
      Thank you for noticing it :)
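
The guess above about feeding content more than once can be illustrated with overlapping chunk boundaries. This is only a sketch of that guess, not a description of how whisper_streaming actually works. The function name `overlapping_chunks`, the 2-second overlap, and the 16 kHz sample rate are assumptions made for illustration:

```python
def overlapping_chunks(total_samples: int,
                       sample_rate: int = 16000,      # assumed input rate
                       chunk_seconds: float = 10.0,   # chunk length from the video
                       overlap_seconds: float = 2.0,  # hypothetical overlap
                       ) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of chunks that overlap their neighbours."""
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and longer than the overlap")
    bounds = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the last chunk already reaches the end of the audio
        start += step  # advance by less than a full chunk to create the overlap
    return bounds
```

A word cut at the end of one chunk then appears whole near the start of the next one. The price is the extra work mentioned above: with these numbers, every 8 seconds of new audio costs 10 seconds of transcription, roughly 25% more, and the duplicated words in the overlap still have to be merged afterwards.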

  • @AB-cd5gd
    @AB-cd5gd A month ago

    I gave up on Whisper and faster-whisper for my RPi voice assistant: it's slow, inaccurate, and sometimes hallucinates. For some reason, Google speech recognition is much faster and more accurate lol

    • @itkacher
      @itkacher  A month ago +1

      Hm… I haven’t tried it. Will do
      Thanks for pointing it out!