Build a Deep Learning Model that can LIP READ using Python and Tensorflow | Full Tutorial
- Published: 1 Feb 2023
- Building a machine learning model that's able to perform lip reading!
Get notified of the free Python course on the home page at www.coursesfromnick.com
Sign up for the Full Stack course here and use RUclips50 to get 50% off:
www.coursesfromnick.com/bundl...
Hopefully you enjoyed this video.
💼 Find AWESOME ML Jobs: www.jobsfromnick.com
Get the code: github.com/nicknochnack/LipNet
Links: www.tensorflow.org/api_docs/p...
Original Paper: arxiv.org/abs/1611.01599
Associated Code for Paper: github.com/rizkiarm/LipNet
ASR Tutorial: keras.io/examples/audio/ctc_a...
Oh, and don't forget to connect with me!
LinkedIn: bit.ly/324Epgo
Facebook: bit.ly/3mB1sZD
GitHub: bit.ly/3mDJllD
Patreon: bit.ly/2OCn3UW
Join the Discussion on Discord: bit.ly/3dQiZsV
Happy coding!
Nick - Science
Brilliant step by step tutorial! You earned a subscriber and a fan. Looking forward to more.
Your knowledge and delivery are unmatched, subscribed after one video!
I just discovered your channel, and one video made me fall in love with your style of sharing! Simple and easy to understand! One project that I think would be interesting is automatic assessment with deep learning, especially in the computer vision field! (I've seen many people doing this these days, hehe.)
Keep up the good work & can't wait for a new video!
Yes, we want the next part. It would be a perfect portfolio project and a way to learn a lot of stuff.
Thank you
It was an awesome deep learning experience. It's a fantastic tutorial and the results are TP. It's a ...BOOOM.....BOOOM...BOOOM situation after completing this tutorial. Thank you so much, Nick.
This is amazing!
Can it work in real-time? Not only on existing videos?
Amazing! Nick's videos are phenomenal.
Dataprooooo!! Glad you liked it!!
This is really cool 😎😎😎....Would love to watch tutorials like these in future. Keep up the good work
Thank you, Nick! Building the app will be very helpful.
You are the best! Thanks, Nick, for these super quality videos!
Thanks so much for checking it out Siraj!!
Awesome, your way of explaining is incredible
Ohhh thanks @Vinay, means a ton!!
Duuuuuuuude, I've been looking for a channel like this forever. You're amazing, thanks for sharing all your awesome projects for free.
This project is f***ing awesome. Can you make a video on measuring accuracy for LipNet?
Hello Nick. Thanks for always uploading such insightful tutorials. If my model training starts with loss=inf (I followed your tutorial throughout), will it get better, or is there something else wrong with my code/model? Please guide.
Great effort man.
Hey Nick, great video as always. I was wondering why the input is normalized, but the tokens representing each character in the output are not.
Pioneering in data science and machine learning! love the video
🙏 🙏
most awaited video ....thanks mate !!!
🎉 anytime @Chaitanya!!
Absolutely awesome, mate!
Cheers Chris 🙏
Brooo, you made my day! Thank you so much!
I was wondering where you went man!! Welcome back! :D
Ayyyy! Just took a few months off at the end of last year to get the course done. Back at it at full steam now!!
@@NicholasRenotte Hey Nick, my epoch execution takes hours even though I have an RTX 3050. What should I do?
@@NicholasRenotte Will this work in real time on any talking video?
Awesome 🤯🔥 thanks a lot Nick
🙏 thanks a mil @Olowu!!
Do words need to appear in the training to ever be predicted, or can they be built up of letter predictions? The dataset would be more useful if the number was a numeral, rather than spelled out, since that part of the alphabet never gets used. I would love to see this built out, maybe even have a "Lip Reading Keyboard" to type with your camera rather than a mouse or microphone, as it would have many accessibility applications, especially in the workplace.
Great channel, I have been watching many of your previous shows, and was so happy to see a new episode!
Heya Michael, nope it doesn't. In fact this IS a character based model so it could build up sentences of words it's never seen before. I need to get more data though to extend it out!!
That was fantastic....fabulous brooo.......you are a real genius, the godfather of machine learning and deep learning
Next Tutorial please Nicholas !!!
Loved your content...
Subscribing for greater cause....
This is why I am a subscriber, for amazing content like this.
YESSS, and I am soooo glad to have you here Jimmy!! More intense projects planned this year!
Hi @NicholasRenotte! Beginner in ML here, I have a question
about the last dense layer: model.add(Dense(char_to_num.vocabulary_size()+1, kernel_initializer='he_normal', activation='softmax'))
We use char_to_num.vocabulary_size()+1. Why are we adding '+1'? char_to_num.vocabulary_size() already includes the OOV token implicitly.
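One likely reason for the `+1`, sketched below without TensorFlow so the counting is easy to follow: CTC needs one extra output class for its "blank" symbol, and Keras's `ctc_batch_cost` reserves the last class index for it. The blank is separate from the OOV token that the lookup layer already counts. The character set here is an assumption for illustration and may not match the one used in the video.

```python
# Hypothetical character set; the tutorial's actual vocabulary may differ.
vocab = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")

# A StringLookup-style layer prepends one out-of-vocabulary (OOV) token,
# so its reported vocabulary size is len(vocab) + 1.
OOV_TOKEN = ""
lookup_vocabulary = [OOV_TOKEN] + vocab
vocabulary_size = len(lookup_vocabulary)

# CTC needs one more class for the "blank" symbol, which Keras's
# ctc_batch_cost treats as the last class index.
num_output_classes = vocabulary_size + 1
blank_index = num_output_classes - 1

print(vocabulary_size, num_output_classes, blank_index)
```

So the dense layer's width is the lookup vocabulary (characters plus OOV) plus one slot for the blank.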
That looks neat!
Thanks a mil @Enes!!
amazing tutorial , helped me a lot
quick question though, how can I use f1_score metric or accuracy metric for example with this network ??
What kind of data collecting are we talking about if we want to generalise to like, the 1000 most used English words? Could we build the model so that it can recognise phonemes from the preprocessed videos? Then the model associates the most likely phoneme separation to convert to the output string.
Thank you so much!! This is absolutely phenomenal 🔥🔥👏🏾
JABULANI!! Thanks so much for checking it out!!!
Damn ,That crazy !
Thanks Nick for another fantastic video! You're the best 🙏🏻🙏🏻
Ohhhh mannn, thanks a mil Kevyn!
we gave you a follow since you are doing good things for the whole
This f***ing awesome!!! 🔥🔥🔥
🙏
@@NicholasRenotte bro add time stamps to the video
Hey Nick, this was a very helpful video. Just a small suggestion: if you create a requirements.txt file listing the versions of all the packages you used, it would be very helpful. Even years from now we could install the specific versions that you used; otherwise it keeps creating dependency issues. 😃
Did you face the mimsave() issue? If yes, how did you solve it?
@@akshithchowdary6410 Squeezing the original nested list passed into the function (reducing its dimensionality) and converting its data type does solve the issue.
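The fix described above can be sketched with NumPy. This assumes the frames come out of the data pipeline as a normalized float array of shape (frames, height, width, 1), so values are rescaled to 0-255 before the cast. `frames_for_gif` is a hypothetical helper name, not something from the tutorial.

```python
import numpy as np

def frames_for_gif(frames):
    """Drop the trailing channel axis and convert to uint8 for GIF writers."""
    arr = np.asarray(frames, dtype=np.float32)
    if arr.ndim == 4 and arr.shape[-1] == 1:
        arr = np.squeeze(arr, axis=-1)  # (T, H, W, 1) -> (T, H, W)
    lo, hi = float(arr.min()), float(arr.max())
    if hi > lo:
        arr = (arr - lo) / (hi - lo)  # rescale to [0, 1]
    else:
        arr = np.zeros_like(arr)  # constant input: avoid division by zero
    return (arr * 255).astype(np.uint8)

# Stand-in for one normalized clip: 75 grayscale frames of 46x140 pixels.
clip = np.random.default_rng(0).normal(size=(75, 46, 140, 1))
gif_frames = frames_for_gif(clip)
print(gif_frames.shape, gif_frames.dtype)
```

The result can then be passed to something like `imageio.mimsave('animation.gif', gif_frames, fps=10)`. Some newer imageio releases may expect a `duration` argument in place of `fps` for GIFs, so check the version you have installed.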
Nick you are legend man.
Thanks a mil @Ahmad!! Appreciate it man!!
Thank you so much for this.
Anytime @princewill!!
Why did you choose fps=10 specifically for the imageio statement? How do we determine what fps value to use?
Hey Nick! How were you able to find the exact coordinates of the lip region in the frame to extract it? Thanks for helping.
I think it's just trial and error; he does it manually. But in the real world, we need to make sure the model can also find and track the lip movements.
that's one of the awesome tutorials
Ayyy thanks so much!!!
This project is making me go insane, and I am very happy with its results too. It's fascinating; it's hard even for me to understand all the words that guy says in the videos. And what's crazy is that it learns from the lip movements. It's just crazy. But I think if we want it to work with any video, it first needs to detect the mouth in the video we give it and then read the lip movements. ❤ Projects and crazy ideas like this make me fall in love with neural networks more and more. Thank you for this wonderful project, have a nice day nicknacknock
Real cool calculating
Dude! Amazing!!!
We really need the app! 🤯
That's awesome! Do you think it would be possible to train it to detect individual phonemes instead of sentences? That should actually work even better, since there are only 44 phonemes in the English language.
You're the most enjoyable software developer YouTuber to watch. Not only are you doing an amazing job, but you also share it with the community. Because of guys like you I'm regaining my trust in humanity. Thanks a lot!
Ohhh thanks so much @alx84!! Means a ton 🙏
This is so awesome
How cool is it right?!? Thanks for checking it out @kubakakauko!!
Thank you so much!!
Bro, awesome💥🔥
really liked the video, thanks for the tutorial, make a streamlit app too.
Thank you so so much 👊🏾
Anytime my guy 👊🏾
Thanks for the content, I'm learning a lot from your videos sir... but I need help with translating the models into Tflite, with labels.
Thanks for the help you are giving with the videos Nick
anyway I'm studying to make Lip Reading work in real time, a bit like seen for sign recognition
For now I will try to retrain the model with the other samples from the grid dataset
and in real time i tried to build the prediction on the lips every 75 frames but it still doesn't work very well
the predictions obtained by me are currently unusable
I will continue to study waiting for more of your videos..
Let's see what will happen
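One way to structure the "predict every 75 frames" idea is a rolling buffer that only hands a clip to the model once it holds a full window. The sketch below is an assumption about how such a loop might look. `predict_fn` stands in for whatever wraps the trained model, and the frames are placeholder integers rather than real camera input.

```python
from collections import deque

WINDOW = 75  # frames per clip expected by the model

class RollingClipBuffer:
    """Collect frames and emit a prediction every `stride` new frames."""

    def __init__(self, predict_fn, window=WINDOW, stride=WINDOW):
        self.predict_fn = predict_fn
        self.frames = deque(maxlen=window)  # oldest frames fall off automatically
        self.stride = stride
        self.since_last = 0

    def push(self, frame):
        self.frames.append(frame)
        self.since_last += 1
        full = len(self.frames) == self.frames.maxlen
        if full and self.since_last >= self.stride:
            self.since_last = 0
            return self.predict_fn(list(self.frames))
        return None

# Placeholder "model": reports the first and last frame index it was given.
buffer = RollingClipBuffer(lambda clip: (clip[0], clip[-1]), stride=25)
outputs = [r for r in (buffer.push(i) for i in range(150)) if r is not None]
print(outputs)
```

A stride smaller than the window gives overlapping clips, which may help when a sentence does not start exactly on a window boundary, at the cost of more model calls.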
Did you manage to build the real-time lip reading?
@@harshiramani7274 I created a model to make it work in real time, but it's limited since I used a custom dataset generated by me. It would take (in my opinion) more train cycles and a dataset with people with different characteristics (whites, blacks, Asians, children, elderly, etc.) saying every word in the dictionary of every language. Perhaps even a second model to concatenate the predicted words with the first model, and predict a complete sentence. I stopped working on it at the moment. I remember generating for each word in my dataset small videos where I always pronounce the same word from different angles etc., capturing 30frames from these videos.
@@giuseppedimaria6253 Yeah, I think that is fair. I am thinking of trying to build this model for my local language. Any idea how I should proceed?
I have a small question: what are the differences between the DNN model you use above and an RNN? I'm doing a college project, hope you can answer me soon, thank you very much.
Could you make a tutorial on how to use TPUs from Google Colab to run reinforcement learning, for example on solving CartPole or some other more complicated environment? Thanks a lot!! Keep up the excellent work!! I share all your vids in my team group at university! = )
Hey Nick, can you do a video on implementing the model with custom dataset?? Please
Yes please Nick
how are you not at a million subscribers yet🤯
We want that tutorial with custom dataset. Please please please sir......
Also, great tutorial again. You are a great teacher.
You got it!!
I tried to generate a dataset by saving 'tot' number of video frames in which I say certain words 'hello' for example, isolating only the lips though
I appropriately divided them into different directories, renaming them with the words associated with those frames and used a 'flow_from_directory' function which allows the folder name to be associated with this data as a label.
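The folder-name-as-label convention described above can be sketched in plain Python. This is not the Keras `flow_from_directory` implementation, only an illustration of the same idea: class indices are assigned to folder names in sorted order. `labels_from_folders` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def labels_from_folders(root):
    """Map each file under root/<word>/ to the integer label for <word>."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_index = {name: i for i, name in enumerate(classes)}
    samples = [
        (f, class_index[d])
        for d in classes
        for f in sorted((root / d).iterdir())
        if f.is_file()
    ]
    return class_index, samples

# Build a throwaway directory tree: one folder per spoken word.
with tempfile.TemporaryDirectory() as tmp:
    for word, count in [("hello", 2), ("bye", 1)]:
        folder = Path(tmp) / word
        folder.mkdir()
        for i in range(count):
            (folder / f"frame_{i}.png").touch()
    class_index, samples = labels_from_folders(tmp)
    sample_labels = [(f.parent.name, label) for f, label in samples]

print(class_index, sample_labels)
```

Note that this labels individual frames, whereas the tutorial's model consumes whole clips, so a word-level dataset built this way would still need frames grouped into sequences.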
@@giuseppedimaria6253 can you share the code
@@giuseppedimaria6253 Hey Can you share code
Hi. This is great.
How do i load a custom video and try to perform lipreading on that?
THANKS !!!
Anytime!!!
This is amazing!! Would it be possible to do live predictions using a camera?
That's the gameplan! I'm still mapping out how to do it most efficiently...but definitely!
@@NicholasRenotte pls do buddy, this is really amazing!
Any update on real time camera effort?
hey nick really need real time lipreading
@@NicholasRenotte
@@NicholasRenotte Were you able to map this out?
How to get the dataset?
Hi Nick! I would love to purchase the tutorial but the link is broken and I can't seem to find it on your site?
Hey Nick, can you tell me how the videos can be annotated? Which tool are they using?
Hi Nick, I have just started learning about machine learning and deep learning. What mathematical skills are essential for me to have a solid understanding of this field?
Top three are: linear algebra, probability and calculus!
@@NicholasRenotte Thanks for the advice! I've completed linear algebra needed for deep learning and am now learning probability. Your help is much appreciated!
Hi. Can you please (if possible) introduce a nature-inspired optimizer in this code, like GWO,
so that we can use fewer epochs and achieve good accuracy?
Bro, I am trying to make my own model, but when I try to load the dataset it shows me a permission denied error. What should I do?
Superb..
🙏🙏🙏
Yassssss we back!
Daymmmm straight!!! Thanks a mil for checking it out Satoshi!!
Hello sir, may I know which dataset is used in this project?
another bomb from you 🔥
Hello Nicholas, first of all thank u so much for ur work. i have discovered deep learning for myself thanks to u. already built and trained some models and im totally into it :D Only sad thing is that i do training on an kinda old cpu of my laptop cuz i have no gpu with cuda compatibility :(
So i decided to upgrade my pc a bit and i rly have no clue what gpu i should choose. Im currently wavering between rtx 3080 and rtx 4070 ti (although im not sure if the 4070 ti got the cuda compatibility, couldnt find this model on the nvidia cuda website). Do you have any suggestions on gpu's for machine learning? Id be very thankful about an advice! :) greetings from germany and pls keep doing with ur videos, ure a blessing for the deep learning community!
Eyo, 4070ti should definitely be cuda compatible. I’d go with that if you can! Although either is a good choice, I’ve got a 3080 ti waiting to go into my machine right now. So it’ll be pretty closely matched for the tutorials I make here if you want to follow along!!
@@NicholasRenotte thank u sir. then i'll go for the 4070 ti :) and i'll definitely follow along :D
Hi @Nicholas,
Is LipNet free to use?
data.as_numpy_iterator error:
For the people who are using colab,
I used tensorflow version 2.10.1
In load_data(), use file_name = path.split('/')[-1].split('.')[0]
For mappable_function, use this code:
    from typing import Tuple
    import tensorflow as tf

    def mappable_function(path: str) -> Tuple[tf.Tensor, tf.Tensor]:
        # Wrap the Python-side load_data so it can be used inside dataset.map()
        result = tf.py_function(load_data, [path], (tf.float32, tf.int64))
        return result
(The tf.py_function call returns a tuple of two tensors (tf.float32 and tf.int64), not a list of strings.)
And did you get an error in mimsave()?
My epoch execution takes hours even though I have an RTX 3050. What should I do?
@@akshithchowdary6410 hey bro i got the same error did you solve it ?
Thanks! this worked.
Thank you so much, It worked for me 😃
thank you sooooooooooooooo much🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥
can you create video about the DNA of a person's voice?
Thank you, Nick. I want to work on this project. Can you please guide me on how I can use my native language in the dataset for lip reading? I'm a newbie, please help.
Can you please do a tutorial on training on our own dataset/ videos..
At 43:00, what if I want to test this on a new custom video that is not of that shape (75,46,140,1)?
I am getting a shape mismatch error, as my frame shape is (99,46,140,1).
I tried changing the temporal dimension, i.e. 99, to 75 like yours, but it didn't work.
Pleasssssseee help
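A common way to handle clips that are not exactly 75 frames long is to truncate or zero-pad along the time axis before calling the model. This is a sketch under that assumption, not the tutorial's own method. `fit_to_length` is a hypothetical helper, and simply cutting frames may drop part of the spoken sentence.

```python
import numpy as np

def fit_to_length(frames, target_len=75):
    """Truncate or zero-pad a (T, H, W, C) clip to exactly target_len frames."""
    frames = np.asarray(frames)
    current = frames.shape[0]
    if current >= target_len:
        return frames[:target_len]
    # Pad only the time axis, at the end; leave H, W, C untouched.
    pad_width = [(0, target_len - current)] + [(0, 0)] * (frames.ndim - 1)
    return np.pad(frames, pad_width, mode="constant")

long_clip = np.ones((99, 46, 140, 1), dtype=np.float32)
short_clip = np.ones((60, 46, 140, 1), dtype=np.float32)

# The model also expects a leading batch axis: (1, 75, 46, 140, 1).
batch = np.expand_dims(fit_to_length(long_clip), axis=0)
print(batch.shape, fit_to_length(short_clip).shape)
```

Forgetting the leading batch axis is another frequent source of shape errors at prediction time, so it is worth printing the shape right before calling the model.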
What is the accuracy of this model, sir? Someone please reply.
Thx🔥
Glad you liked it @mustimply!!
Can you make a short video covering Jupyter notebook tricks and extensions that make notebooks easier to use, such as execution timing?
Done, added to my list for this week!
@@NicholasRenotte sorry for not mentioning but i was looking forward for lip read video it is fabulous thanks a lot for you efforts ❤️
Ayyy, all good man!! Love that you enjoyed it!
Hi Nick. Can you make videos on MLOps (a series for beginners, or at your convenience)?
Done, will see what I can do this week!!
Imagine that on all of YouTube this is the only video that details how to build this LipNet lip reading project. It just shows how hard these topics are to find.
Hey Nick, I am getting this error: InvalidArgumentError: {{function_node __wrapped__Transpose_device_/job:localhost/replica:0/task:0/device:CPU:0}} transpose expects a vector of size 5. But input(1) is a vector of size 3 [Op:Transpose]
This is insane 😶🌫️🔥🔥. Can we get a tutorial for streamlit app of the same? Please.
You got it @Tejas!!
Hey, could you please do a video of this model working on a custom dataset?
@Nicholas Rennotte. Can you please do a full tutorial on YAMNet
It would be great if you could create a tutorial on creating a custom data set.
I wanted to know how I can make my own training data and also how to predict in real time.
Hey! I got some doubts in the project. Adding to that, the code you provided also has errors in it. Would you please share updated code?
Hey! Superb work. I have been working on this model following your tutorial, but the epochs are taking forever to run. I just completed my first epoch after 9-10 hours. Moreover, Colab times out and it starts all over again 🙁 Is there anything I can do?
Did it complete? I'm currently running it and it has been going for the past 13 hours.
@@gsettu7255 Hey, did you solve that epoch issue?
@sadiaafreen1519 Hey, did you solve that epoch issue?
Your work on this is awesome, and the explanation is also great 🙌 I have found no one else explaining every step this way. But one thing: extracting the lip region using static values doesn't work for other input videos, right? Also, the total number of frames in the input video causes an error because it does not match the convolutional layer's input_shape.
What should I do about the error caused by the input video's shape not matching the convolutional layer's input shape? Can you provide a solution for that?
@@adithya7064 Did you find any solution, bro? I think I am facing the same issue.
Nic! Can I use this model to predict a video without alignments?
Yep! That's the beauty of using the CTC loss function!
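To make the CTC point concrete: the network emits one class per frame, and decoding collapses repeated classes and drops the blank symbol, so no per-frame alignment is needed to read out text. Below is a minimal greedy-decoding sketch in plain Python. The tiny alphabet and the per-frame predictions are made up for illustration, and a real pipeline would run the framework's CTC decoder on the model's softmax output.

```python
from itertools import groupby

ALPHABET = ["b", "i", "n", " "]   # hypothetical 4-character vocabulary
BLANK = len(ALPHABET)             # blank symbol takes the last class index

def ctc_greedy_decode(frame_classes, blank=BLANK):
    """Collapse consecutive repeats, then remove blanks."""
    collapsed = [cls for cls, _ in groupby(frame_classes)]
    return [cls for cls in collapsed if cls != blank]

# Per-frame argmax output for 10 frames (made-up values).
frame_classes = [0, 0, BLANK, 1, 1, 1, BLANK, 2, 2, BLANK]
decoded = "".join(ALPHABET[c] for c in ctc_greedy_decode(frame_classes))
print(decoded)
```

The blank is what lets CTC represent genuine double letters: two identical classes separated by a blank survive the collapse step as two characters.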
Hey nick can you make a tutorial on how methods are used
Like applications for this kind of tech?
hey nick where can i download the data set for this? can some one let me know?
Hey Nicholas can I try this pipeline to my custom dataset?
Sure can, with custom videos it should be fine. You might just need to tweak the load_alignments function to handle how your text is structured!!
What do the numbers in the alignments represent?
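As a rough illustration of both points above: assuming the alignment files use a "start end word" layout per line, the two numbers would be the start and end offsets of each word within the clip, and silence markers are skipped when building the target text. The sample text, the `parse_alignment` helper, and the list of markers to skip are all assumptions, so adapt them to how your own annotations are structured.

```python
def parse_alignment(text, skip=("sil", "sp")):
    """Return (start, end, word) entries, ignoring silence markers."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        start, end, word = int(parts[0]), int(parts[1]), parts[2]
        if word not in skip:
            entries.append((start, end, word))
    return entries

# Made-up example in the assumed "start end word" layout.
sample = """0 23750 sil
23750 29500 bin
29500 34000 blue
34000 35500 at
74500 74500 sil"""

entries = parse_alignment(sample)
sentence = " ".join(word for _, _, word in entries)
print(sentence)
```

Because the model is trained with CTC, only the resulting sentence is needed as a label. The offsets are useful for inspection but do not have to be fed to the network.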
hey nick, can you please explain the input size as [75,46,140,1], as per my understanding, its 75 frames per video, 1 is for grayscale. How about 46 and 140. i am not able to understand the shape of data. kindly help.
46x140 is the height x width of each frame (focusing on the lip region)
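A fixed crop that yields this shape can be sketched with NumPy slicing. The crop bounds and the source resolution below are assumptions chosen only so that the result is 46 rows by 140 columns. The right values depend on where the mouth sits in your videos.

```python
import numpy as np

# Hypothetical crop bounds; chosen only so the crop is 46 rows by 140 columns.
TOP, BOTTOM = 190, 236
LEFT, RIGHT = 80, 220

def crop_mouth(frame):
    """Cut a fixed (BOTTOM-TOP) x (RIGHT-LEFT) window out of a grayscale frame."""
    return frame[TOP:BOTTOM, LEFT:RIGHT, :]

# Stand-in video: 75 blank grayscale frames at 288x360 resolution.
video = np.zeros((75, 288, 360, 1), dtype=np.uint8)
clip = np.stack([crop_mouth(f) for f in video])
print(clip.shape)
```

A static window like this only works when every speaker sits in roughly the same position, which is why several comments here suggest detecting the mouth first for arbitrary videos.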
I guess lip reading is a very difficult task because, just by looking at a person's lips, it is very hard to know with 100% certainty what they're saying. Some words are easy to pick out, while others can easily be confused with other words...
I guess that the model may work well for vowels because the mouth position changes a lot, but when it comes to consonants, it may be more tricky. One good idea would be to predict the next word based on the previous one. For example, if the model predicts "the", the next word is very likely to be a noun. I guess this would help discern what word is next.
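The idea of using the previous word to pick the next one can be sketched as rescoring: combine how well each candidate matches the lip movements with how likely it is to follow the previous word. All numbers below are made up, and `rescore` is a hypothetical helper. A real system would take the visual scores from the model and the prior from a language model.

```python
import math

# Made-up scores: how well each candidate word matches the lip movements.
# "bin" and "pin" look almost identical on the lips.
visual_scores = {"bin": 0.45, "pin": 0.40, "been": 0.15}

# Made-up bigram probabilities P(word | previous word).
bigram_given_prev = {"bin": 0.05, "pin": 0.30, "been": 0.01}

def rescore(visual, prior, weight=1.0):
    """Combine log visual score with a weighted log language-model prior."""
    return {
        w: math.log(visual[w]) + weight * math.log(prior[w])
        for w in visual
    }

visual_best = max(visual_scores, key=visual_scores.get)
combined = rescore(visual_scores, bigram_given_prev)
best_word = max(combined, key=combined.get)
print(visual_best, best_word)
```

Here the visual evidence alone slightly prefers one word, while the context prior tips the decision to the other, which is the effect the comment is describing.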
I am Abhi from India. Your videos help me a lot, but I am facing a problem in load_data(tf.convert_to_tensor(test_path)). What should I do? Please help.