Offline Speech Recognition on Raspberry Pi 4/Nvidia Jetson Nano with Respeaker

  • Published: 28 Jan 2020
  • Faster than real-time! Based on Mozilla's DeepSpeech Engine 0.6.1.
    www.hackster.io/dmitrywat/off...
    In this video we’re going to run and benchmark Mozilla’s DeepSpeech ASR (automatic speech recognition) engine on different platforms: a Raspberry Pi 4 (1 GB), an Nvidia Jetson Nano, a Windows PC, and a Linux PC.
    The hardware for this video was kindly provided by Seeed Studio. Check out the ReSpeaker USB Mic Array and other hardware at the Seeed Studio store!
    www.seeedstudio.com/ReSpeaker...
    www.seeedstudio.com/Raspberry...
    Blog articles about DeepSpeech engine:
    hacks.mozilla.org/2019/12/dee...
    hacks.mozilla.org/2017/11/a-j...
    Mozilla's DeepSpeech Github
    github.com/mozilla/DeepSpeech
    Add me on LinkedIn if you have any questions:
    / dmitry-maslov-ai
    Credits for music:
    Portal 2 - Love as a Construct (Alex Giudici Remix) V2

Comments • 78

  • @jordainewisdom2448
    @jordainewisdom2448 4 years ago +7

    For anybody trying to find an offline speech recognition option for wake-word detection on the Raspberry Pi: try Vosk, a newer speech recognition API. It's not great at detecting a lot of words at once, but it's good at detecting one or two words accurately, which makes it great for wake-word detection. I used it in the Python script for the virtual assistant on my smart mirror. In my case I used Vosk for the wake word, and once the wake word was detected I switched to Google's cloud speech recognition, which is much more accurate; when it finishes, it goes back to Vosk. Hope this helps someone. I know it can be hard finding packages that work, but when you do, it's ecstasy. Good luck!
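
A minimal sketch of the flow described above, with the recognizers stubbed out as plain functions; in practice they would be Vosk and a cloud STT client, and the wake word and function names here are hypothetical:

```python
WAKE_WORD = "mirror"  # hypothetical wake word

def run_assistant(utterances, local_stt, cloud_stt):
    """Alternate between cheap wake-word listening and accurate transcription."""
    transcripts = []
    awake = False
    for audio in utterances:
        if not awake:
            # Cheap local pass (Vosk's role): only look for the wake word.
            if WAKE_WORD in local_stt(audio):
                awake = True
        else:
            # Accurate pass (cloud STT's role), then go back to listening.
            transcripts.append(cloud_stt(audio))
            awake = False
    return transcripts

# Toy recognizers standing in for Vosk / cloud STT:
local = lambda audio: audio.lower()       # pretend the "audio" is its own text
cloud = lambda audio: audio.capitalize()

print(run_assistant(["hey mirror", "turn on the lights"], local, cloud))
```

The point of the pattern is that the expensive recognizer only runs on the utterance that follows the wake word.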

    • @Hardwareai
      @Hardwareai  4 years ago

      Great suggestion! Actually, for hot-word detection it is entirely feasible to train the model yourself; for another project I used a modified version of the script in this article: blog.aspiresys.pl/technology/building-jarvis-nlp-hot-word-detection/

    • @imsteven3044
      @imsteven3044 1 year ago +1

      Hi Jordaine, do you have a repository for that part? I mean the Vosk code; I only need to detect the words and send the data to the cloud, the same as you.

  • @r5bc
    @r5bc 4 years ago +3

    You just won a lot of respect, a subscription, and a bell for all notifications. Thank you for the content. Please keep up the good work

    • @Hardwareai
      @Hardwareai  4 years ago

      Very glad to hear that! Out of curiosity, how do you plan to use DeepSpeech? :)

  • @alx8439
    @alx8439 1 year ago +1

    Wow, thanks for sharing that. I couldn't even dream of offline recognition on weak SBCs, but it looks like the software has finally caught up.

    • @Hardwareai
      @Hardwareai  1 year ago

      It does! By the way, I have since made a fresh video on the topic: ruclips.net/video/75H12lYz0Lo/видео.html

  • @rams9052
    @rams9052 4 years ago

    Absolutely fantastic and crisp explanation. Good work!

  • @RoboLab19
    @RoboLab19 3 years ago

    Great video about ReSpeaker and offline speech recognition! Already booked one for my robotic projects.

    • @Hardwareai
      @Hardwareai  3 years ago +1

      Cool, thanks! If you're into robotics and want to leverage DeepSpeech for parsing commands to a robot, it might be interesting to use DeepSpeech together with a slot-filling algorithm, similar to this one: github.com/IsaacAhouma/Slot-Filling-Understanding-Using-RNNs. It would help parse raw natural language into robot commands.

    • @RoboLab19
      @RoboLab19 3 years ago

      @@Hardwareai OK, thanks for the link!
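
The slot-filling idea mentioned above can be illustrated with a toy regex-based version (the linked repo uses RNNs; the patterns, intents, and slot names below are made up for illustration):

```python
import re

# Hypothetical patterns mapping transcribed utterances to robot commands.
PATTERNS = [
    (re.compile(r"(move|drive|go) (forward|backward) (\d+)"),
     lambda m: {"intent": "move", "direction": m.group(2), "distance": int(m.group(3))}),
    (re.compile(r"turn (left|right)"),
     lambda m: {"intent": "turn", "direction": m.group(1)}),
]

def parse_command(text):
    """Map a transcribed utterance to a structured robot command, or None."""
    for pattern, build in PATTERNS:
        m = pattern.search(text.lower())
        if m:
            return build(m)
    return None

print(parse_command("Move forward 3 meters"))
```

A trained slot-filling model generalizes far beyond fixed patterns, but the output format (intent plus filled slots) is the same idea.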

  • @jagsdesign
    @jagsdesign 4 years ago +1

    Wonderful video on testing Mozilla DeepSpeech. I have tried it on both the Jetson Nano and the Raspberry Pi 4 with other voice HATs, including older ReSpeaker models; great learning, and I will explore further. What other equivalent explorations are possible?

    • @Hardwareai
      @Hardwareai  4 years ago

      Thank you for your kind words!

  • @xiaoniwang4660
    @xiaoniwang4660 1 year ago

    The only method that worked for the Jetson Nano at 3 am. Thank you so much!!

  • @FuZZbaLLbee
    @FuZZbaLLbee 4 years ago +4

    Good video. They have a program for other languages that needs a lot of people to contribute. Please do this to make your language recognizable.

    • @Hardwareai
      @Hardwareai  4 years ago +2

      Yes, Common Voice! Thanks for mentioning that: voice.mozilla.org/en

  • @chindukrishnapankaj5340
    @chindukrishnapankaj5340 3 years ago

    Thanks for the video; it's very useful. I would like to thank you again for the extra effort of introducing the ReSpeaker mic array's DOA. Do you have any suggestions for indoor mapping using Lidar or any other technology? Thanks.

    • @Hardwareai
      @Hardwareai  3 years ago

      Thanks for the kind words! I assume these are two unrelated questions, right? I mean ReSpeaker and indoor mapping.
      Lidar is a stable, tried-and-true solution for mapping. It is a bit bulky and still expensive, even the cheapest ones. You can use stereo/monocular cameras for SLAM; there are plenty of packages available for ROS, for example. I introduce RTAB-Map based SLAM with a Kinect and Raspberry Pi here:
      ruclips.net/video/c5punaP01kU/видео.html
      The Kinect is a bit outdated though; you could go for RealSense cameras, which also have good ROS support. Anyway, in the end it all comes down to hardware footprint/price/the amount of software development you're willing to undertake.

    • @chrisw1462
      @chrisw1462 2 years ago

      @@Hardwareai Two ReSpeakers could be used to triangulate the position of the speaker - at least in 2D.

  • @fd-ds4061
    @fd-ds4061 4 years ago +2

    Thank you for the interesting video.
    I tried to install DeepSpeech on the Jetson Nano. Unfortunately, the link to the preview wheel with the .tflite model from your description is no longer valid. Is there another way to install DeepSpeech on the Jetson Nano?
    Many thanks in advance.

    • @Hardwareai
      @Hardwareai  4 years ago +3

      Yes, that was only a temporary build. I have already updated the article with a permanent link; here it is: github.com/mozilla/DeepSpeech/releases/download/v0.7.0-alpha.1/deepspeech-0.7.0a1-cp37-cp37m-linux_aarch64.whl

  • @varungujjar97
    @varungujjar97 4 years ago +1

    Hey, great video. Thank you for putting in the effort. Can you tell me which one is better in terms of performance and accuracy, Kaldi or DeepSpeech, if I had to run it on a Pi 4?

    • @Hardwareai
      @Hardwareai  4 years ago +1

      My pleasure! Hm... From what I understand, Kaldi is very different from DeepSpeech, in that Kaldi is a toolkit for speech recognition and not really a ready-to-deploy solution. You will certainly understand that if you look at their documentation page. It is a bit... overwhelming for a beginner :) DeepSpeech, on the other hand, comes as a ready-to-use package and is easy to set up and use; that is where its main advantage lies.

  • @ChicagoBob123
    @ChicagoBob123 4 years ago

    Awesome

  • @steveb.548
    @steveb.548 4 years ago +1

    Did you try benchmarking with TensorFlow and 128-CUDA-core acceleration, or only TFLite on a single ARM core?

    • @Hardwareai
      @Hardwareai  4 years ago +1

      As stated in the article, DeepSpeech with GPU acceleration is not available for the arm64 architecture. So the results are for .tflite on a single ARM core. Read more on that in this thread: discourse.mozilla.org/t/video-and-benchmarking-results/52946/16 :)

    • @AltMarc
      @AltMarc 4 years ago

      @@Hardwareai You can build a GPU version on the Nano, but it uses the .pbmm file, and that's pretty slow even with the GPU: around 3.5 s ("experience is proof" audio), and 17 s without the GPU.
      I had more trouble building the TFLite version; I got the native app working (by forcing it), but not the Python wheel.
      BTW, on the Xavier the .tflite model is as fast as on a 1070 GPU, and even the .pbmm model is faster than real time.

  • @eminates5698
    @eminates5698 4 years ago

    Hi Dmitry, is it possible to run this demo on the Kendryte K210 chip? From the Seeedstudio blog: "According to Jia Nan's official description, the K210's KPU has a power of 0.8 TFLOPS. For comparison, the NVIDIA Jetson Nano with its 128-CUDA-unit GPU has a power of 0.47 TFLOPS; the latest Raspberry Pi 4 has less than 0.1 TFLOPS." So if it were possible to implement this demo on the Kendryte K210, it would be more efficient than the others (in terms of price-performance ratio).

    • @Hardwareai
      @Hardwareai  4 years ago

      Well, as I recently pointed out to our sales manager, TFLOPS and TOPS comparisons are meaningless when comparing hardware of different types. First of all, the K210 KPU doesn't have any TFLOPS xD, since it uses quantized models, which operate on integers rather than floating-point numbers (the FL in TFLOPS stands for floating point). So there has clearly been a mix-up here.
      As for implementing this demo on the Kendryte K210: unfortunately, it is not possible; memory size is the limitation here. Even the smallest .tflite model for DeepSpeech is about 47 MB, which exceeds the K210's available memory (15.5 MB) by a lot. Also, DeepSpeech has RNN cells, which are not supported on the K210 (neither on the CPU nor on the KPU). What you can do with the K210 is use it for command and intent recognition by training simpler and smaller fully convolutional models. And I think they already have a couple of demos for that :)

  • @cmyk8964
    @cmyk8964 2 years ago

    I’m not in the market for a home assistant, but I was just thinking about whether it would be possible to make a home assistant that recognizes your voice commands without an Internet connection. It seems like the answer is “I don’t see why not”.

    • @Hardwareai
      @Hardwareai  2 years ago

      Yes, of course it is possible. I've been experimenting with Mycroft AI and a DeepSpeech server for offline STT; it worked reasonably well half a year ago.

  • @imsteven3044
    @imsteven3044 3 years ago

    Hi! I want to do speech-to-text and then work with the text in Python using NLTK. Would this module work for that? I want to transcribe long sentences, for example "today I studied 2 hours".

    • @Hardwareai
      @Hardwareai  3 years ago +1

      If by "module" you mean DeepSpeech, then yes. Be aware that the pre-trained model they provide only works well with clear American English.
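
As a toy illustration of the downstream step asked about above: once the STT engine has produced text like "today I studied 2 hours", plain Python (or NLTK) can pull structure out of it. The pattern and function name below are a hypothetical sketch:

```python
import re

# Hypothetical pattern: a number followed by a time unit.
DURATION = re.compile(r"(\d+(?:\.\d+)?)\s*(hour|minute|second)s?\b")

def extract_duration_seconds(transcript):
    """Return the first duration mentioned in the transcript, in seconds."""
    scale = {"hour": 3600, "minute": 60, "second": 1}
    m = DURATION.search(transcript.lower())
    if not m:
        return None
    return float(m.group(1)) * scale[m.group(2)]

print(extract_duration_seconds("today I studied 2 hours"))
```

For richer parsing (tokenization, POS tagging), this is where an NLTK pipeline would take over from the raw transcript.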

  • @rotem-whizzytherobotcreato1995
      @rotem-whizzytherobotcreato1995 3 years ago

      Can I use it with the Intel Neural Compute Stick 2 together with the RPi 4? The NCS2 runs TensorFlow Lite.

    • @Hardwareai
      @Hardwareai  3 years ago

      Good question. From what I remember, DS uses some LSTM layers, which may or may not be supported by the NCS2. You can try converting the model, and then you'll see.

  • @theabyss5647
    @theabyss5647 2 years ago

    A few problems. Any insight?
    1. I couldn't get it to work on the Jetson Nano. The furthest I managed to get by myself was installing deepspeech, but Python 3.7 can't install scipy.
    2. DeepSpeech is too inaccurate to be practical. It basically sucks. Vosk is way more accurate, but there's no Vosk for the Jetson...

    • @Hardwareai
      @Hardwareai  2 years ago

      1. I haven't done it for a while. Perhaps something has changed and I need to check again.
      2. From what I saw, the WER (word error rate) of the Vosk small models is comparable to the DeepSpeech models. I can't testify to that myself, but it would be interesting to try in the future.
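
For reference, the WER mentioned above is easy to compute yourself when comparing engines: it is the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the reference length. A minimal implementation:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("experience is proof", "experience proves it"))
```

Run both engines on the same recordings and compare the scores; lower is better, and 0.0 means a perfect transcript.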

  • @HarithaWeerathunga
    @HarithaWeerathunga 4 years ago

    How can I implement DeepSpeech on Android?

    • @Hardwareai
      @Hardwareai  4 years ago

      Don't really know much about Android development, sorry! Not my cup of tea :)

  • @hasibal-ahmed7385
    @hasibal-ahmed7385 2 years ago

    Can we somehow manage to deploy this on a Raspberry Pi Zero?

    • @Hardwareai
      @Hardwareai  2 years ago

      It's a different architecture. Possible, but if no binaries exist, you'll have to do a lot of compiling yourself.

  • @OfficialNetDevil
    @OfficialNetDevil 3 years ago

    I wish more people would use something like Twitch and share their desktop so they could live-stream open source coding projects. That way audiences could offer useful input on how to tweak the code.

    • @Hardwareai
      @Hardwareai  3 years ago

      I might do a livestream one day xD "hello and welcome to one hour of me looking up errors on Stack Overflow"

  • @jamesvirtudazo
    @jamesvirtudazo 4 years ago

    I'm thinking of having an offline home automation system.

    • @Hardwareai
      @Hardwareai  4 years ago

      A colleague of mine started making a Raspberry Pi smart mirror :) She wanted to use DeepSpeech, but thought it was too complicated (she's a complete Python beginner). Speaking of home automation, I think DeepSpeech would be a great choice for integrating into one of the home automation frameworks, like Domoticz. I wonder if anybody has done that already.

  • @ahmedelshireef
    @ahmedelshireef 3 years ago

    I am trying to install deepspeech on a Raspberry Pi 4, but I get "Could not find a version that satisfies the requirement deepspeech". Help please.

    • @Hardwareai
      @Hardwareai  3 years ago

      I installed it successfully just 2 days ago. You're using 32-bit image, right?

    • @ahmedelshireef
      @ahmedelshireef 3 years ago

      @@Hardwareai I am using a Kali Linux version for the Raspberry Pi 4.

    • @Hardwareai
      @Hardwareai  3 years ago

      @@ahmedelshireef It should still work, as long as you use a compatible Python version and architecture. Have a look at the list of packages available at github.com/mozilla/DeepSpeech/releases/tag/v0.9.3; if there is no version compatible with your Python + architecture there, then you'll need to compile from source, or use a different arch/Python version.

    • @ahmedelshireef
      @ahmedelshireef 3 years ago

      @@Hardwareai I tried to build and use every wheel and package; nothing worked on the Kali Linux Raspberry Pi version. I got another SD card, set up the 64-bit Raspberry Pi OS, and it worked! But I can't understand why it works on one operating system and not the other; both are Debian-based on a 64-bit Raspberry Pi 4.
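
As an aside on the error discussed in this thread: pip's "Could not find a version that satisfies the requirement" usually means no published wheel matches the interpreter's compatibility tags. A stdlib-only sketch that prints the values those tags are built from, for comparison against wheel filenames such as the deepspeech-0.7.0a1-cp37-cp37m-linux_aarch64.whl mentioned above:

```python
import platform
import sys
import sysconfig

# The parts of a wheel filename that must match this interpreter:
# e.g. ...-cp37-cp37m-linux_aarch64.whl needs CPython 3.7 on a 64-bit
# ARM OS (a 32-bit OS reports armv7l instead of aarch64).
info = {
    "python_tag": "cp%d%d" % sys.version_info[:2],  # cp37, cp39, ...
    "machine": platform.machine(),                  # aarch64, armv7l, x86_64, ...
    "platform": sysconfig.get_platform(),           # basis of pip's platform tag
}
for key, value in info.items():
    print(key, "=", value)
```

This also explains the SD-card result above: the 64-bit OS reports a different machine/platform than the 32-bit or Kali image, so a different set of wheels becomes installable.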

  • @StefanReich
    @StefanReich 4 years ago

    Hmm. I have DeepSpeech running on a Raspberry Pi 3 (1 GB), but it's slower than real time (about 0.3x speed, I think).

    • @Hardwareai
      @Hardwareai  4 years ago

      Yes, correct. I did try it on the RPi 3, but didn't publish the results.

    • @StefanReich
      @StefanReich 4 years ago

      @@Hardwareai It's a bit surprising that the Pi 4 is that much faster; the Pi 4 is only ~35% faster than the Pi 3. I didn't use the live version though, just ran DeepSpeech on a .wav.

  • @HalfRobot42
    @HalfRobot42 4 years ago

    Great video, but since DeepSpeech has been updated, the model files have changed slightly.
    This also means that the hotword code won't work without editing.
    (Update)
    I was able to get the DeepSpeech model to load in your script, but some ALSA errors are occurring. I'm using a Raspberry Pi 4 2 GB. Any help would be appreciated.

    • @Hardwareai
      @Hardwareai  4 years ago +1

      Sure! Can you describe your issue in the github.com/AIWintermuteAI/DeepSpeech_RaspberryPi4_Hotword issue tracker?

    • @HalfRobot42
      @HalfRobot42 4 years ago

      @@Hardwareai Submitted. Thank you for the fast response.

    • @user-oo5gj1jd9c
      @user-oo5gj1jd9c 4 years ago

      @@Hardwareai I second the question. I am not good at writing scripts, so if you could update it for deepspeech 0.7.0, that would be great.

    • @Hardwareai
      @Hardwareai  4 years ago +1

      The article and code have been updated to 0.7.*! Cheers!

    • @HalfRobot42
      @HalfRobot42 4 years ago

      @@Hardwareai much appreciated

  • @claudioguendelman
    @claudioguendelman 3 years ago

    Python 3 Artificial Intelligence: Offline STT and TTS
    Not found ---

    • @Hardwareai
      @Hardwareai  3 years ago

      There's clearly a lot of TTS out there (not all of it good quality); as for STT... well, DeepSpeech does work, especially if you're willing to fine-tune it for your specific domain.

  • @yihanghe1167
    @yihanghe1167 4 years ago

    nice shirt

  • @stopPlannedObsolescence
    @stopPlannedObsolescence 8 months ago

    Man you need a camera stabilizer

    • @Hardwareai
      @Hardwareai  7 months ago

      I have changed setups 2-3 times since then hahah. But why? Was the image shaking?

  • @chrisw1462
    @chrisw1462 2 years ago

    All of that work... and no demo??

    • @Hardwareai
      @Hardwareai  2 years ago

      4:54 ;) Since then I have started placing the demo in the first part of the video.

  • @wobble_cat
    @wobble_cat 3 years ago

    Shevtsov?

    • @Hardwareai
      @Hardwareai  3 years ago

      Who?)

    • @wobble_cat
      @wobble_cat 3 years ago

      @@Hardwareai itpedia))

    • @Hardwareai
      @Hardwareai  3 years ago

      >>> Claims the title of "the most scandalous personality on the internet" for the second half of the 2010s.
      Nah, that's not me) I just quietly and calmly explain to people how to build robots and train neural networks.

    • @wobble_cat
      @wobble_cat 3 years ago

      @@Hardwareai We're in the wrong line of work, brother)

    • @Hardwareai
      @Hardwareai  3 years ago

      I'm happy with how things are)