Offline Speech Recognition on Raspberry Pi 4/Nvidia Jetson Nano with Respeaker
- Published: 28 Jan 2020
- Faster than real-time! Based on Mozilla's DeepSpeech Engine 0.6.1.
www.hackster.io/dmitrywat/off...
In this video we're going to run and benchmark Mozilla's DeepSpeech ASR (automatic speech recognition) engine on different platforms: Raspberry Pi 4 (1 GB), Nvidia Jetson Nano, a Windows PC and a Linux PC.
The hardware for this video was kindly provided by Seeed Studio. Check out the ReSpeaker USB Mic Array and other hardware at the Seeed Studio store!
www.seeedstudio.com/ReSpeaker...
www.seeedstudio.com/Raspberry...
Blog articles about DeepSpeech engine:
hacks.mozilla.org/2019/12/dee...
hacks.mozilla.org/2017/11/a-j...
Mozilla's DeepSpeech on GitHub:
github.com/mozilla/DeepSpeech
Add me on LinkedIn if you have any questions:
/ dmitry-maslov-ai
Credits for music:
• Portal 2 - Love as a Construct (Alex Giudici Remix) V2
For anybody trying to find an offline speech recognition engine for wake-word detection on the Raspberry Pi: try Vosk, a newer speech recognition API. It's not great at recognizing long utterances, but it detects one or two words accurately, which makes it great for wake-word detection. I used it in the Python script for the virtual assistant on my smart mirror. In my case, Vosk listens for the wake word; once it's detected, I switch to Google's cloud speech recognition, which is much more accurate, and when that finishes it goes back to Vosk. Hope this helps someone. I know it can be hard finding packages that work, but when you do it's ecstasy. Good luck!
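A minimal sketch of the flow described above. The `heard_wake_word` helper is my own name for the check; the commented-out loop assumes the real Vosk API (`Model`, `KaldiRecognizer`) plus PyAudio, with placeholder paths:

```python
import json

def heard_wake_word(result_json: str, wake_words: set) -> bool:
    """Check a Vosk recognizer result for a wake word.

    Vosk's KaldiRecognizer.Result() returns a JSON string
    such as '{"text": "hey mirror"}'.
    """
    text = json.loads(result_json).get("text", "")
    return any(w in text.split() for w in wake_words)

# Streaming loop sketch (requires `pip install vosk pyaudio` and a
# downloaded Vosk model; model path and wake word are placeholders):
#
# from vosk import Model, KaldiRecognizer
# import pyaudio
#
# model = Model("model")  # path to a small Vosk model directory
# rec = KaldiRecognizer(model, 16000)
# pa = pyaudio.PyAudio()
# stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
#                  input=True, frames_per_buffer=4000)
# while True:
#     data = stream.read(4000)
#     if rec.AcceptWaveform(data) and heard_wake_word(rec.Result(), {"mirror"}):
#         pass  # hand off to a cloud recognizer here, then return to Vosk
```

The pure-Python check is split out so the mic loop stays a thin shell around it.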
Great suggestion! Actually, for hot-word detection it is entirely feasible to train the model yourself. For another project I used a modified version of the script in this article: blog.aspiresys.pl/technology/building-jarvis-nlp-hot-word-detection/
Hi Jordanie, do you have a repository for that part? I mean the Vosk code. I only need to detect the words and send the data to the cloud, same as you.
You just won a lot of respect, a subscription, and a bell for all notifications. Thank you for the content. Please keep up the good work
Very glad to hear that! Out of curiosity, how do you plan to use DeepSpeech? :)
Wow, thanks for sharing that. I couldn't even dream of offline recognition on weak SBCs, but it looks like the software has finally caught up.
It does! By the way, I have since made a fresh video on the topic: ruclips.net/video/75H12lYz0Lo/видео.html
Absolutely fantastic and crisp explanation. Good work!
Thank you kindly!
Great video about ReSpeaker and offline speech recognition! Already booked one for my robotic projects.
Cool, thanks! If you're into robotics and want to leverage DeepSpeech for parsing commands to a robot, it might be interesting to use DeepSpeech together with a slot-filling algorithm, similar to this one: github.com/IsaacAhouma/Slot-Filling-Understanding-Using-RNNs It would help parse raw natural language into robot commands.
@@Hardwareai OK, thanks for the link!
Wonderful video on testing Mozilla DeepSpeech! I have tried it on both the Jetson Nano and the Raspberry Pi 4 with other voice HATs, including older ReSpeaker models. Great learning, and I will explore further. What other equivalent explorations might be possible?
Thank you for your kind words!
The only method that worked for my Jetson Nano at 3 am. Thank you so much!
You're welcome!
Good video. They have a program for other languages that needs a lot of people to contribute. Please take part to make your language recognizable.
Yes, the Common Voice project! Thanks for mentioning it: voice.mozilla.org/en
Thanks for the video, it's very useful. I would like to thank you again for your extra effort in introducing the ReSpeaker mic array's DOA. Do you have any suggestions for indoor mapping using Lidar or any other technology? Thanks.
Thanks for the kind words! I assume these two are unrelated questions, right? I mean ReSpeaker and indoor mapping.
Lidar is a stable, tried-and-true solution for mapping. It is a bit bulky and still expensive, even the cheapest units. You can also use stereo/monocular cameras for SLAM; there are plenty of packages available, for ROS for example. I introduce RTAB-Map based SLAM with a Kinect and a Raspberry Pi here:
ruclips.net/video/c5punaP01kU/видео.html
Kinect is a bit outdated though; you could go for RealSense cameras, which also have good ROS support. Anyway, in the end it all comes down to hardware footprint, price, and the amount of software development you're willing to undertake.
@@Hardwareai Two ReSpeakers could be used to triangulate the position of the speaker - at least in 2D.
Thank you for the interesting video.
I tried to install DeepSpeech on the Jetson Nano. Unfortunately, the link to the preview wheel with the .tflite model from your description is no longer valid. Is there another way to install DeepSpeech on the Jetson Nano?
Many thanks in advance
Yes, it was only a temporary build. I have already updated the article with a permanent link; here it is: github.com/mozilla/DeepSpeech/releases/download/v0.7.0-alpha.1/deepspeech-0.7.0a1-cp37-cp37m-linux_aarch64.whl
Hey, great video. Thank you for putting in the effort! Can you tell me which one is better in terms of performance and accuracy, Kaldi or DeepSpeech, if I had to run it on a Pi 4?
My pleasure! Hm... from what I understand, Kaldi is very different from DeepSpeech in that Kaldi is a toolkit for speech recognition, not really a ready-to-deploy solution. You will certainly understand that if you look at their documentation page. It is a bit... overwhelming for a beginner :) DeepSpeech, on the other hand, comes as a ready-to-use package and is easy to set up and use, and that is where its main advantage lies.
Awesome
Agreed!
Did you try benchmarking with TensorFlow accelerated by the 128 CUDA cores, or only TFLite on a single ARM core?
As stated in the article, DeepSpeech with GPU acceleration is not available for the arm64 architecture, so the results are for the .tflite model on a single ARM core. Read more on that in this thread: discourse.mozilla.org/t/video-and-benchmarking-results/52946/16 :)
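For reference, the "faster than real-time" claim in benchmarks like this boils down to the real-time factor (inference time divided by audio duration). A sketch of how one might measure it, assuming the DeepSpeech Python API; the model and audio file names in the comments are placeholders:

```python
import time

def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: a value below 1.0 means faster than real time."""
    return inference_seconds / audio_seconds

# Benchmark sketch (paths are placeholders; on ARM boards the wheel does
# TFLite inference, so Model() is pointed at the .tflite file):
#
# import wave, numpy as np, deepspeech
# model = deepspeech.Model("output_graph.tflite")
# with wave.open("audio.wav", "rb") as w:
#     audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
#     duration = w.getnframes() / w.getframerate()
# start = time.time()
# text = model.stt(audio)
# print(text, real_time_factor(time.time() - start, duration))
```

So the Pi 4 result being "faster than real time" means this ratio came out below 1.0 for the test clips.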
@@Hardwareai You can build a GPU version on the Nano, but it uses the .pbmm file, and that's pretty slow even with the GPU: around 3.5 s (the "experience is proof" audio), versus 17 s without the GPU.
I had more trouble building the TFLite version; I got the native app working (by forcing it), but not the Python wheel.
BTW, on the Xavier the TFLite model is as fast as on a 1070 GPU, and even the .pbmm model is faster than real time.
Hi Dmitry, is it possible to run this demo on the Kendryte K210 chip? From the Seeed Studio blog: "According to Jia Nan's official description, the K210's KPU has a power of 0.8 TFLOPS. For comparison, the NVIDIA Jetson Nano with its 128 CUDA core GPU has a power of 0.47 TFLOPS; the latest Raspberry Pi 4 has less than 0.1 TFLOPS." So if it were possible to implement this demo on the Kendryte K210, it would be more efficient than the others in terms of price/performance ratio.
Well, as I recently pointed out to our sales manager, TFLOPS and TOPS comparisons are meaningless when comparing hardware of different types. First of all, the K210 KPU doesn't have any TFLOPS at all xD, since it uses quantized models, which are integer numbers and not floating point numbers (the FL in TFLOPS stands for floating point). So there has clearly been a mix-up here.
Now, for implementing this demo on the Kendryte K210: it is not possible, unfortunately; memory size is the limitation here. Even the smallest .tflite model for DeepSpeech is about 47 MB, which exceeds the K210's available memory (15.5 MB) by a lot. Also, DeepSpeech has RNN cells, which are not supported on the K210 (neither on the CPU nor on the KPU). What you can do with the K210 is use it for command and intent recognition by training simpler, smaller fully convolutional models. And I think they already have a couple of demos for that :)
I’m not in the market for a home assistant, but I was just thinking about whether it would be possible to make a home assistant that recognizes your voice commands without an Internet connection. It seems like the answer is “I don’t see why not”.
Yes, it is possible, of course. I've been experimenting with Mycroft AI and a DeepSpeech server for offline STT; it worked reasonably well half a year ago.
Hi! I want to do speech-to-text and then work with the text in Python with NLTK. Would this module work for me? I want to convert long sentences into text, for example, "today I studied 2 hours".
If by module you mean DeepSpeech, then yes. Be aware that the pre-trained model they provide only works well with clear American English.
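For the downstream part of that pipeline, a small sketch of pulling a duration out of a transcript. The `extract_hours` helper is mine, not an NLTK or DeepSpeech function; it assumes DeepSpeech-style output, which is typically lowercase, unpunctuated, and may spell numbers out. NLTK's `word_tokenize` could replace the regex for heavier processing:

```python
import re
from typing import Optional

def extract_hours(transcript: str) -> Optional[float]:
    """Pull a duration like '2 hours' or 'two hours' out of a sentence.

    Handles digits and a few spelled-out number words, since ASR output
    often contains the latter.
    """
    words_to_num = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
    m = re.search(
        r"(\d+(?:\.\d+)?|one|two|three|four|five)\s+hours?",
        transcript.lower(),
    )
    if not m:
        return None
    token = m.group(1)
    # float() accepts either the mapped int or the original digit string
    return float(words_to_num.get(token, token))

print(extract_hours("today i studied 2 hours"))    # 2.0
print(extract_hours("i slept for two hours"))      # 2.0
print(extract_hours("no duration mentioned"))      # None
```

This is deliberately narrow; for open-ended sentences you'd want a real NLU layer rather than regexes.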
Can I use it with an Intel Neural Compute Stick 2 together with the RPi 4? The NCS2 runs TensorFlow Lite.
Good question. From what I remember, DeepSpeech uses some LSTM layers, which may or may not be supported by the NCS2. You can try converting the model and then you'll see.
A few problems. Any insight?
1. I couldn't get it to work on the Jetson Nano. The furthest I managed to get by myself was installing deepspeech, but Python 3.7 can't install scipy.
2. DeepSpeech is too inaccurate to be practical. It basically sucks. Vosk is way more accurate, but there's no Vosk build for the Jetson...
1. I haven't done it for a while. Perhaps something has changed and I need to check again.
2. From what I saw, the WER (word error rate) of Vosk's small models is comparable to DeepSpeech's models. I can't vouch for that myself, but it would be interesting to try in the future.
How do I implement DeepSpeech on Android?
Don't really know much about Android development, sorry! Not my cup of tea :)
Can we somehow manage to deploy this on a Raspberry Pi Zero?
It's a different architecture. Possible, but if no binaries exist you'll have to do a lot of compiling yourself.
I wish more people would use something like Twitch and share their desktop so they could live-stream open source coding projects. That way audiences could offer useful input on how to tweak the code.
I might do a livestream one day xD "hello and welcome to one hour of me looking up errors on Stack Overflow"
I'm thinking of having an offline home automation system.
A colleague of mine started making a Raspberry Pi smart mirror :) She wanted to use DeepSpeech but thought it was too complicated (she's a complete Python beginner). Speaking of home automation, I think DeepSpeech would be a great choice for integrating into one of the home automation frameworks, like Domoticz. I wonder if anybody's done that already.
I am trying to install DeepSpeech, but pip says "Could not find a version that satisfies the requirement deepspeech" on my Raspberry Pi 4. Help please!
I installed it successfully just 2 days ago. You're using 32-bit image, right?
@@Hardwareai I am using a Kali Linux version for the Raspberry Pi 4
@@ahmedelshireef It should still work, as long as you use a compatible Python version and architecture. Have a look at the list of packages available at github.com/mozilla/DeepSpeech/releases/tag/v0.9.3 --- if there is no version compatible with your Python + architecture there, then you'll need to compile from source, or use a different arch/Python version.
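A quick way to see which wheel filename from that releases page could match your system: the `cpXY` part of a wheel name must match your interpreter and the last part must match your architecture. The `wheel_tags` helper name is mine:

```python
import platform
import sys

def wheel_tags() -> tuple:
    """Return the Python tag and machine name a wheel must match.

    A wheel named deepspeech-0.9.3-cp37-cp37m-linux_aarch64.whl, for
    example, needs ('cp37', 'aarch64') from this function.
    """
    py_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return py_tag, platform.machine()

# Print what your own system reports, then compare against the
# wheel filenames on the releases page:
print(wheel_tags())
```

If neither part matches any published wheel, pip raises exactly the "Could not find a version that satisfies the requirement" error quoted above.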
@@Hardwareai I tried to build and use every wheel and package; nothing worked on the Kali Linux Raspberry Pi version. I got another SD card and set up the 64-bit Raspberry Pi OS, and it worked! But I can't understand why it works on one operating system and not the other, when both are Debian on a 64-bit Raspberry Pi 4.
Hmm. I have DeepSpeech running on a Raspberry Pi 3 (1 GB), but it's slower than real time (about 0.3x speed, I think).
Yes, correct. I did try it on the RPi 3, but didn't publish the results.
@@Hardwareai It's a bit surprising that the Pi 4 is that much faster, since the Pi 4 is only ~35% faster than the Pi 3. I didn't use the live version though, just ran DeepSpeech on a .wav file.
Great video, but since DeepSpeech has been updated, the model files have changed slightly.
This also means that the hotword code won't work without editing.
(Update)
I was able to get the DeepSpeech model to load with your script, but some ALSA errors are occurring. I'm using a Raspberry Pi 4 (2 GB). Any help would be appreciated.
Sure! Can you describe your issue at github.com/AIWintermuteAI/DeepSpeech_RaspberryPi4_Hotword issue tracker?
@@Hardwareai Submitted. Thank you for the fast response.
@@Hardwareai I join the question. I am not good at writing scripts, so if you could update it for DeepSpeech 0.7.0, that would be great.
The article and code have been updated to 0.7.*! Cheers!
@@Hardwareai much appreciated
Python 3 Artificial Intelligence: Offline STT and TTS
Not found ---
There's clearly a lot of TTS out there (not all of it good quality); as for STT... well, DeepSpeech does work, especially if you're willing to fine-tune it for your specific domain.
nice shirt
Thanks!
Man you need a camera stabilizer
I have changed setups 2-3 times since then, hahah. But why? Was the image shaking?
All of that work... and no demo??
4:54 ;) since then I started placing demo in the first part of the video
Shevtsov?
Who? :)
@@Hardwareai itpedia))
>>> "Claims the title of 'the most scandalous personality on the internet' of the second half of the 2010s."
Oh come on, that's hardly me :) I just quietly and calmly explain to people how to build robots and train neural networks.
@@Hardwareai We're in the wrong business, brother :)
I'm happy with what I do :)