Build a Deep Learning Model that can LIP READ using Python and Tensorflow | Full Tutorial

Nicholas Renotte

91K views

Published: Feb 1, 2023
Building a machine learning model that's able to perform lip reading!
Get notified of the free Python course on the home page at www.coursesfromnick.com
Sign up for the Full Stack course here and use RUclips50 to get 50% off:
www.coursesfromnick.com/bundl...
Hopefully you enjoyed this video.
💼 Find AWESOME ML Jobs: www.jobsfromnick.com
Get the code: github.com/nicknochnack/LipNet
Links: www.tensorflow.org/api_docs/p...
Original Paper: arxiv.org/abs/1611.01599
Associated Code for Paper: github.com/rizkiarm/LipNet
ASR Tutorial: keras.io/examples/audio/ctc_a...
Oh, and don't forget to connect with me!
LinkedIn: bit.ly/324Epgo
Facebook: bit.ly/3mB1sZD
GitHub: bit.ly/3mDJllD
Patreon: bit.ly/2OCn3UW
Join the Discussion on Discord: bit.ly/3dQiZsV
Happy coding!
Nick
Science

Comments • 435

@amartyadebdas2881 1 year ago ⁺⁵
Brilliant step-by-step tutorial! You earned a subscriber and a fan. Looking forward to more.
@ArisFilms 1 year ago ⁺²
Your knowledge and delivery are unmatched, subscribed after one video!
@whoonie4685 1 year ago ⁺¹²
I just discovered your channel and one video made me fall in love with your style of sharing! Simple and easy to understand! One project that I think would be interesting is automatic assessment with deep learning, especially in the computer vision field! (I've seen many people doing this these days, hehe)
Keep up the good work & can't wait for a new video!
@JinxUS00 1 year ago ⁺²
Yes, we want the next part. It would be a perfect portfolio project and a way to learn a lot of stuff.
Thank you
@ronaktawde 7 months ago
It was an awesome deep learning learning experience. It's a fantastic tutorial and the results are TP. It's a ...BOOOM.....BOOOM...BOOOM situation after completing this tutorial. Thank you so much Nick.
@user-es8li7pw2u 1 year ago ⁺²⁰
This is amazing!
Can it work in real time? Not only on existing videos?
@datapro007 1 year ago ⁺⁴
Amazing! Nick's videos are phenomenal.
@NicholasRenotte 1 year ago ⁺²
Dataprooooo!! Glad you liked it!!
@curious_haldar 9 months ago ⁺¹
This is really cool 😎😎😎... Would love to watch tutorials like these in the future. Keep up the good work
@nazarzaki44 4 months ago
Thank you, Nick! Building the app will be very helpful.
@Seriosso 1 year ago ⁺²
You are the best! Thanks Nick for these super quality videos!
@NicholasRenotte 1 year ago ⁺¹
Thanks so much for checking it out Siraj!!
@vinayshukla3800 1 year ago ⁺¹
Awesome, your way of explaining is incredible
@NicholasRenotte 1 year ago
Ohhh thanks @Vinay, means a ton!!
@chakerayachi8468 3 months ago
Duuuuuuuude, I've been looking for a channel like this since forever. You're amazing, thanks for sharing all your awesome projects for free
@user-xz7pr6rf2x 1 year ago ⁺¹
This project is f***ing awesome. Can you make a video on finding accuracy for LipNet?
@AreebaAmin52 1 year ago ⁺⁵
Hello Nick. Thanks for always uploading such insightful tutorials. If my model training starts with loss=inf (I followed your tutorial throughout), will it get better, or is there something else wrong with my code/model? Please guide
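One possible cause of an infinite CTC loss, offered as a hedged sketch rather than a diagnosis of this particular notebook: CTC has no valid alignment when the model emits fewer time steps than the label needs, and the loss is then infinite. A label needs one step per character plus one extra step (for a blank) between every pair of identical adjacent characters. The helper names `min_ctc_steps` and `ctc_feasible` are hypothetical and the labels are made-up examples.

```python
def min_ctc_steps(label):
    """Smallest number of output time steps CTC needs to emit `label`."""
    # Identical neighbours (like the 'll' in 'hello') must be separated by a blank.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def ctc_feasible(label, time_steps):
    """True if a model producing `time_steps` outputs can align to `label`."""
    return time_steps >= min_ctc_steps(label)


print(min_ctc_steps("hello"))    # 6
print(ctc_feasible("hello", 5))  # False
```

If a check like this passes for every training sample, the cause lies elsewhere (for example in the data pipeline), which this sketch does not cover.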
@maheshwarikuntal1493 1 year ago ⁺¹
Great effort man.
@bvdlio 1 year ago ⁺²
Hey Nick, great video as always. I was wondering why the input is normalized, but the tokens representing each character in the output are not.
@jerbky 1 year ago ⁺¹
Pioneering in data science and machine learning! Love the video
@NicholasRenotte 1 year ago
🙏 🙏
@chaitanyasaibeesabathuni928 1 year ago ⁺¹
Most awaited video... thanks mate!!!
@NicholasRenotte 1 year ago ⁺¹
🎉 anytime @Chaitanya!!
@c016smith52 1 year ago ⁺¹
Absolutely awesome, mate!
@NicholasRenotte 1 year ago
Cheers Chris 🙏
@solomon_leo_27 1 year ago
Brooo, you made my day! Thank you so much!
@gdavisiv 1 year ago ⁺²
I was wondering where you went man!! Welcome back! :D
@NicholasRenotte 1 year ago
Ayyyy! Just took a few months off at the end of last year to get the course done. Back at it at full steam now!!
@ghostplays5291 9 months ago
@NicholasRenotte hey Nick, my epoch execution takes hours even though I have an RTX 3050. What should I do?
@CreatingUtopia 4 months ago
@NicholasRenotte Will this work in real time on any talking video?
@olu_the_ai_guy 1 year ago ⁺²
Awesome 🤯🔥 thanks a lot Nick
@NicholasRenotte 1 year ago
🙏 thanks a mil @Olowu!!
@HideyHoleOrg 1 year ago ⁺⁷
Do words need to appear in the training data to ever be predicted, or can they be built up from letter predictions? The dataset would be more useful if the number was a numeral, rather than spelled out, since that part of the alphabet never gets used. I would love to see this built out, maybe even have a "Lip Reading Keyboard" to type with your camera rather than a mouse or microphone, as it would have many accessibility applications, especially in the workplace.
Great channel, I have been watching many of your previous shows, and was so happy to see a new episode!
@NicholasRenotte 1 year ago ⁺³
Heya Michael, nope it doesn't. In fact this IS a character-based model, so it could build up sentences of words it's never seen before. I need to get more data though to extend it out!!
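A minimal sketch of why a character-level vocabulary is not tied to the training words: any text made of known characters can be encoded and decoded. The character set below is an assumption for illustration, and `encode`/`decode` are hypothetical helpers standing in for a Keras `StringLookup` pair; index 0 is reserved for out-of-vocabulary characters, as `StringLookup` does by default.

```python
# Assumed character set; the tutorial's actual vocabulary may differ.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")

# Index 0 is kept for out-of-vocabulary characters.
CHAR_TO_NUM = {ch: i + 1 for i, ch in enumerate(VOCAB)}
NUM_TO_CHAR = {i: ch for ch, i in CHAR_TO_NUM.items()}


def encode(text):
    """Map each character to its integer id (0 if unknown)."""
    return [CHAR_TO_NUM.get(ch, 0) for ch in text]


def decode(ids):
    """Map ids back to characters, dropping unknown ids."""
    return "".join(NUM_TO_CHAR.get(i, "") for i in ids)


# A phrase that need not appear in any training sentence still round-trips.
print(decode(encode("zebra crossing")))  # zebra crossing
```

Whether the trained model actually predicts unseen words well is a separate question that depends on the training data.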
@ditsyc6939 1 month ago
That was fantastic....fabulous brooo.......u are a real genius, the godfather of machine learning and deep learning
@Bhaveshyoutube 1 year ago
Next tutorial please Nicholas!!!
Loved your content...
Subscribing for greater cause....
@lifeofcode 1 year ago ⁺¹
This is why I am a subscriber, for amazing content like this.
@NicholasRenotte 1 year ago ⁺¹
YESSS, and I am soooo glad to have you here Jimmy!! More intense projects planned this year!
@TanmayPrakash-gt5ll 4 days ago
Hi @NicholasRenotte! Beginner in ML here, I have a question
about the last dense layer: model.add(Dense(char_to_num.vocabulary_size()+1, kernel_initializer='he_normal', activation='softmax'))
We are using char_to_num.vocabulary_size()+1. Why are we adding '+1'? char_to_num.vocabulary_size() already includes the OOV token implicitly.
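A likely explanation, assuming the model is trained with a Keras CTC loss: CTC needs one extra "blank" class on top of every real token (including the OOV token), and the Keras CTC utilities treat the last class index as that blank. The sketch below only does the counting; the character set is an assumption and `output_units` is a hypothetical helper.

```python
# Assumed character set; the tutorial's actual vocabulary may differ.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")


def output_units(vocab, num_oov=1):
    """Number of softmax units for a CTC model over `vocab`."""
    vocabulary_size = len(vocab) + num_oov  # what vocabulary_size() would report
    return vocabulary_size + 1              # +1 for the CTC blank class


units = output_units(VOCAB)
blank_index = units - 1  # the blank takes the last class index

print(units)        # 41
print(blank_index)  # 40
```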
@morgomi 1 year ago ⁺¹
That looks neat!
@NicholasRenotte 1 year ago ⁺¹
Thanks a mil @Enes!!
@yasmeennour47 1 year ago ⁺¹
Amazing tutorial, helped me a lot.
Quick question though: how can I use the f1_score metric or an accuracy metric, for example, with this network?
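For sequence outputs like these, word error rate (WER) and character error rate (CER) are commonly used in place of plain accuracy or F1. The sketch below is a generic edit-distance implementation, not something taken from the video; the function names and example sentences are hypothetical, and the reference is assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


truth = "bin blue at f two now"
guess = "bin blue at f too now"
print(word_error_rate(truth, guess))  # one wrong word out of six
print(char_error_rate(truth, guess))  # one wrong character out of 21
```

Averaging these over a held-out set of decoded predictions gives a single score per model.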
@gogyoo 7 months ago ⁺¹
What kind of data collection are we talking about if we want to generalise to, like, the 1000 most used English words? Could we build the model so that it can recognise phonemes from the preprocessed videos? Then the model would pick the most likely phoneme segmentation and convert it to the output string.
@Jaybaas 1 year ago ⁺²
Thank you so much!! This is absolutely phenomenal 🔥🔥👏🏾
@NicholasRenotte 1 year ago
JABULANI!! Thanks so much for checking it out!!!
@kevynkrancenblum5350 1 year ago ⁺¹
Damn, that's crazy!
Thanks Nick for another fantastic video! Ur the best 🙏🏻🙏🏻
@NicholasRenotte 1 year ago
Ohhhh mannn, thanks a mil Kevyn!
@thebusinesscentre 1 year ago
We gave you a follow since you are doing good things for the whole
@katu406 1 year ago ⁺¹⁸
This f***ing awesome!!! 🔥🔥🔥
@NicholasRenotte 1 year ago ⁺²
🙏
@zenfestics 10 months ago ⁺²
@NicholasRenotte bro, add timestamps to the video
@shivakumarbhatkere1807 9 months ago ⁺²
Hey Nick, this was a very helpful video. Just a small suggestion: if you create a requirements.txt file listing the versions of all the packages you used, it will be very helpful. Even in years to come we could install the specific versions that you used; otherwise it keeps creating dependency issues. 😃
@akshithchowdary6410 9 months ago
Did you face the mimsave() issue? If yes, how did you solve it?
@kunalkumar2717 8 months ago
@akshithchowdary6410 Squeezing (reducing the dimensionality) and converting the data type of the original nested list passed into the function does solve the issue.
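A minimal sketch of the fix described in the reply above, assuming the frames are float arrays of shape (frames, height, width, 1). The helper name `frames_for_gif` is hypothetical, and the min-max rescale to 0..255 is my own assumption about how to make the data-type conversion meaningful; the commenter does not say how they converted.

```python
import numpy as np


def frames_for_gif(frames):
    """Drop the trailing channel axis and convert frames to uint8 images."""
    arr = np.asarray(frames, dtype=np.float32)
    if arr.ndim == 4 and arr.shape[-1] == 1:
        arr = np.squeeze(arr, axis=-1)  # (T, H, W, 1) -> (T, H, W)
    lo, hi = float(arr.min()), float(arr.max())
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    # Rescale to 0..255 so standardized (possibly negative) pixels survive the cast.
    return np.rint((arr - lo) * scale).astype(np.uint8)


# Random stand-in for one standardized clip; real frames come from the data pipeline.
clip = np.random.randn(75, 46, 140, 1)
gif_frames = frames_for_gif(clip)
print(gif_frames.shape, gif_frames.dtype)

# The result can then be passed to the GIF writer used in the video, e.g.
# imageio.mimsave("animation.gif", gif_frames, fps=10)
```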
@ahmadshabaz2724 1 year ago ⁺¹
Nick, you are a legend man.
@NicholasRenotte 1 year ago
Thanks a mil @Ahmad!! Appreciate it man!!
@abuprincewill9205 1 year ago ⁺¹
Thank you so much for this.
@NicholasRenotte 1 year ago
Anytime @princewill!!
@mehulkumar6504 1 year ago
Why did you choose fps=10 specifically for the imageio statement? How do we determine what fps value to put in?
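A small worked example of what the fps argument changes: it only sets how fast the saved GIF plays back, not what the model sees. The 75-frame clip length matches the input shape quoted elsewhere in this thread; the 25 fps source rate is an assumption, and `playback_seconds` is a hypothetical helper.

```python
def playback_seconds(num_frames, fps):
    """Duration of an animation with `num_frames` frames shown at `fps`."""
    return num_frames / fps


print(playback_seconds(75, 25))  # 3.0  (assumed original recording speed)
print(playback_seconds(75, 10))  # 7.5  (slower, easier to inspect by eye)
```

Under these assumptions, fps=10 simply slows the preview to 0.4x; any value works as long as the playback speed suits the viewer.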
@vamsik1453 1 year ago ⁺⁹
Hey Nick! How were you able to find out the exact coordinates of the lip region in the frame to extract them? Thanks for helping.
@FireFly969 1 month ago
I think it was just trial and error; he does it manually. But in the real world, we need to make sure the model can also find and track the lip movements.
@mugunthking8735 1 year ago ⁺¹
That's one of the awesome tutorials
@NicholasRenotte 1 year ago
Ayyy thanks so much!!!
@FireFly969 1 month ago
This project is making me go insane, and I am very happy with its results too. It's fascinating; it's hard even for me to understand all the words that guy says in the videos. And what's crazy is that it learns from the lip movements. It's just crazy, but I think if we want it to work with any video, it first needs to detect the mouth in the video we give it and then read the lip movements. ❤ Projects and crazy ideas like this make me fall in love with neural networks more and more. Thank you for this wonderful project, have a nice day nicknacknock
@dclxviclan 11 months ago
Real cool calculating
@einsteinsboi 1 year ago
Dude! Amazing!!!
@NoorNoor-ki5dd 7 months ago ⁺¹
We really need the app! 🤯
@MrPhili10 1 year ago
That's awesome! Do you think it would be possible to train it to detect individual phonemes instead of sentences? That should actually work even better, since there are only 44 phonemes in the English language.
@alx8439 1 year ago ⁺¹
You're the most enjoyable software developer YouTuber to watch. Not only are you doing an amazing job, but you also share it with the community. Because of guys like you I'm regaining my trust in humanity. Thanks a lot!
@NicholasRenotte 1 year ago ⁺¹
Ohhh thanks so much @alx84!! Means a ton 🙏
@kubakakauko 1 year ago ⁺¹
This is so awesome
@NicholasRenotte 1 year ago
How cool is it right?!? Thanks for checking it out @kubakakauko!!
@00saad00 1 year ago
Thank you so much!!
@salmanulfaris2139 1 year ago
Bro, awesome💥🔥
@namangoyal8477 1 year ago
Really liked the video, thanks for the tutorial. Make a Streamlit app too.
@Les_decouvertes 1 year ago ⁺¹
Thank you so so much 👊🏾
@NicholasRenotte 1 year ago ⁺¹
Anytime my guy 👊🏾
@martinsilungwe2725 1 year ago
Thanks for the content, I'm learning a lot from your videos sir... but I need help with converting the models to TFLite, with labels.
@giuseppedimaria6253 1 year ago ⁺⁶
Thanks for the help you are giving with the videos, Nick.
Anyway, I'm studying how to make lip reading work in real time, a bit like what was shown for sign recognition.
For now I will try to retrain the model with the other samples from the GRID dataset.
In real time I tried to run the prediction on the lips every 75 frames, but it still doesn't work very well.
The predictions I get are currently unusable.
I will continue to study while waiting for more of your videos.
Let's see what will happen
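A minimal sketch of one way to feed a live camera stream to a model that expects 75-frame clips: keep a rolling buffer and emit an overlapping window every few frames instead of waiting for a fresh 75. This illustrates the buffering idea only and is not the commenter's code; `FrameWindow` and the stride of 25 are hypothetical choices.

```python
from collections import deque


class FrameWindow:
    """Rolling buffer that yields the latest `size` frames every `stride` frames."""

    def __init__(self, size=75, stride=25):
        self.buffer = deque(maxlen=size)  # old frames fall off automatically
        self.stride = stride
        self.since_last = 0

    def push(self, frame):
        """Add one frame; return a full window when one is due, else None."""
        self.buffer.append(frame)
        self.since_last += 1
        full = len(self.buffer) == self.buffer.maxlen
        if full and self.since_last >= self.stride:
            self.since_last = 0
            return list(self.buffer)
        return None


# Integers stand in for cropped lip frames coming from a camera loop.
window = FrameWindow()
clips = [clip for clip in (window.push(i) for i in range(125)) if clip is not None]
print(len(clips))                  # 3
print(clips[1][0], clips[1][-1])   # 25 99
```

Each returned window would then be preprocessed and passed to the model exactly like a clip loaded from disk.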
@harshiramani7274 10 months ago
Did you manage to build the real-time lip reading?
@giuseppedimaria6253 10 months ago
@harshiramani7274 I created a model to make it work in real time, but it's limited since I used a custom dataset generated by me. It would take (in my opinion) more training cycles and a dataset with people with different characteristics (whites, blacks, Asians, children, elderly, etc.) saying every word in the dictionary of every language. Perhaps even a second model to concatenate the words predicted by the first model and predict a complete sentence. I have stopped working on it for the moment. I remember generating, for each word in my dataset, small videos where I always pronounce the same word from different angles etc., capturing 30 frames from these videos.
@harshiramani7274 10 months ago
@giuseppedimaria6253 Yeah, I think that is fair. I am thinking of trying to make this model for my local language, any idea how I should proceed?
@user-on8it8ke6y 1 month ago
I have a small question: what is the difference between the DNN model you use above and an RNN? I'm doing a college project, hope you can answer me soon, thank you very much.
@joaobentes8391 1 year ago
Could u make a tutorial on how to use TPUs from Google Colab to run reinforcement learning, for example on solving CartPole or some other more complicated environment, I guess. Thanks a lot!! Keep up the excellent work!! I share all ur vids in my team group at university! = )
@doomatyourservice2218 3 months ago ⁺²
Hey Nick, can you do a video on implementing the model with a custom dataset?? Please
@not_humorouss 3 months ago
Yes please Nick
@ademolaadenekan1372 10 months ago
How are you not at a million subscribers yet🤯
@rbanondo 1 year ago ⁺⁴
We want that tutorial with a custom dataset. Please please please sir......
Also, great tutorial again. You are a great teacher.
@NicholasRenotte 1 year ago ⁺³
You got it!!
@giuseppedimaria6253 1 year ago
I tried to generate a dataset by saving a certain number of video frames in which I say certain words, 'hello' for example, isolating only the lips though.
I divided them into different directories, named after the words associated with those frames, and used a 'flow_from_directory' function, which allows the folder name to be associated with this data as a label.
@HarshilDangar-tc3ns 11 months ago
@giuseppedimaria6253 Can you share the code?
@mramanidevi5646 1 month ago
@giuseppedimaria6253 Hey, can you share the code?
@parthrangarajan7474 1 year ago
Hi. This is great.
How do I load a custom video and try to perform lip reading on that?
@wasgeht2409 1 year ago ⁺²
THANKS !!!
@NicholasRenotte 1 year ago ⁺¹
Anytime!!!
@addisonpratt8916 1 year ago ⁺⁶
This is amazing!! Would it be possible to do live predictions using a camera?
@NicholasRenotte 1 year ago ⁺⁵
That's the gameplan! I'm still mapping out how to do it most efficiently...but definitely!
@tamizhmalar9843 11 months ago
@NicholasRenotte pls do buddy, this is really amazing!
@mikemurphyWebSurfinMurf 8 months ago
Any update on the real-time camera effort?
@ghostplays5291 4 months ago
@NicholasRenotte hey Nick, really need real-time lip reading
@samanthaeaton414 1 month ago
@NicholasRenotte Were you able to map this out?
@satyajeetshashwat4115 8 months ago ⁺⁶
How to get the dataset?
@RobertThomsonDev 7 months ago ⁺¹
Hi Nick! I would love to purchase the tutorial but the link is broken and I can't seem to find it on your site?
@harsh_the_walker 6 months ago
Hey Nick, can you tell me how the videos can be annotated? Which tool are they using?
@a1mm288 1 year ago ⁺²
Hi Nick, I have just started learning about machine learning and deep learning. What mathematical skills are essential for me to have a solid understanding of this field?
@NicholasRenotte 1 year ago
Top three are: linear algebra, probability and calculus!
@a1mm288 1 year ago ⁺¹
@NicholasRenotte Thanks for the advice! I've completed the linear algebra needed for deep learning and am now learning probability. Your help is much appreciated!
@lucndjoli1975 9 months ago
Hi. Can you please (if possible) introduce a nature-inspired optimizer in this code, like GWO,
so that we can use fewer epochs and achieve good accuracy?
@Blaze-ie2rm 1 year ago ⁺¹
Bro, I am trying to make my own model, but when I try to load the dataset it shows me a permission denied error. What should I do?
@MrAzu1000 1 year ago ⁺¹
Superb..
@NicholasRenotte 1 year ago
🙏🙏🙏
@satoshinakamoto5710 1 year ago ⁺¹
Yassssss we back!
@NicholasRenotte 1 year ago ⁺¹
Daymmmm straight!!! Thanks a mil for checking it out Satoshi!!
@nithyashreej 7 months ago ⁺¹
Hello sir, may I know which dataset is used in this project...
@DIY_Foodie 1 year ago
Another bomb from you 🔥
@bennet5467 1 year ago ⁺¹
Hello Nicholas, first of all thank u so much for ur work. I have discovered deep learning for myself thanks to u. Already built and trained some models and I'm totally into it :D Only sad thing is that I do training on the kinda old CPU of my laptop cuz I have no GPU with CUDA compatibility :(
So I decided to upgrade my PC a bit and I rly have no clue what GPU I should choose. I'm currently wavering between the RTX 3080 and the RTX 4070 Ti (although I'm not sure if the 4070 Ti has CUDA compatibility, couldn't find this model on the NVIDIA CUDA website). Do you have any suggestions on GPUs for machine learning? I'd be very thankful for any advice! :) Greetings from Germany and pls keep going with ur videos, ure a blessing for the deep learning community!
@NicholasRenotte 1 year ago ⁺¹
Eyo, the 4070 Ti should definitely be CUDA compatible. I'd go with that if you can! Although either is a good choice, I've got a 3080 Ti waiting to go into my machine right now. So it'll be pretty closely matched for the tutorials I make here if you want to follow along!!
@bennet5467 1 year ago ⁺¹
@NicholasRenotte thank u sir. Then I'll go for the 4070 Ti :) and I'll definitely follow along :D
@AnilSingh-dy2yd 10 months ago
Hi @Nicholas,
Is LipNet free to use?
@perceptronnn 10 months ago ⁺²
data.as_numpy_iterator error:
For people who are using Colab,
I used TensorFlow version 2.10.1.
In load_data(), use file_name = path.split('/')[-1].split('.')[0]
For mappable_function, use this code:
from typing import Tuple
import tensorflow as tf
def mappable_function(path: str) -> Tuple[tf.Tensor, tf.Tensor]:
    # load_data returns (frames, alignments); declare the matching output dtypes
    result = tf.py_function(load_data, [path], (tf.float32, tf.int64))
    return result
(The tf.py_function call returns a tuple of two tensors (tf.float32 and tf.int64), not a list of strings.)
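The `file_name` change above matters because Colab paths use forward slashes while Windows paths use backslashes. A hedged sketch of a helper that handles both separators, so the same line works on either platform; `clip_name` is a hypothetical name and the example paths are illustrative only.

```python
def clip_name(path):
    """Return the file name without directories or extension.

    Accepts both '/' and '\\' as separators.
    """
    return path.replace("\\", "/").split("/")[-1].split(".")[0]


print(clip_name("data/s1/bbal6n.mpg"))       # bbal6n
print(clip_name(".\\data\\s1\\bbal6n.mpg"))  # bbal6n
```

Inside `load_data()` this would replace the hard-coded `split('/')` or `split('\\')` call, assuming `path` has already been decoded to a Python string.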
@akshithchowdary6410 9 months ago
And did you get an error in mimsave()?
@ghostplays5291 9 months ago
My epoch execution takes hours even though I have an RTX 3050. What should I do?
@varakalachandrakanth6972 7 months ago
@akshithchowdary6410 hey bro, I got the same error. Did you solve it?
@devez007 3 months ago
Thanks! This worked.
@suryasreemanth3387 1 month ago
Thank you so much, it worked for me 😃
@miladjurablu9032 1 year ago
Thank you sooooooooooooooo much🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥
Can you create a video about the DNA of a person's voice?
@GemsofPakistan1 1 year ago
Thank you Nick. I want to work on this project. Can you please guide me on how I can use my native language in the dataset for lip reading? Please guide me, I'm a newbie, please
@sunilb533 1 year ago
Can you please do a tutorial on training on our own dataset/videos?
@CreatingUtopia 4 months ago
At 43:00, what if I want to test this on a new custom video that does not have that shape, (75,46,140,1)?
I am getting a shape mismatch error, as my frame shape is (99,46,140,1).
I tried changing the temporal dimension, i.e. replacing 75 with 99, but it didn't work.
Pleasssssseee help
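One hedged way to approach the 99-versus-75 mismatch above is to bring the clip to the length the model was built for before predicting: trim extra frames and pad short clips. The sketch works on any list of frames; `fit_to_length` is a hypothetical helper, and padding by repeating the last frame is an arbitrary choice (zero frames are another option).

```python
def fit_to_length(frames, target=75):
    """Trim or pad a sequence of frames to exactly `target` frames."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame")
    if len(frames) >= target:
        return frames[:target]  # drop everything after the target length
    # Repeat the last frame until the clip is long enough.
    return frames + [frames[-1]] * (target - len(frames))


# Integers stand in for 46x140 lip crops.
print(len(fit_to_length(range(99))))  # 75
print(fit_to_length([1, 2], 4))       # [1, 2, 2, 2]
```

Trimming discards whatever is spoken after frame 75, so a longer video may be better handled by splitting it into several 75-frame clips.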
@happy-mo1qc 7 months ago ⁺¹
What is the accuracy of this model, sir? Someone please reply
@mustimply9317 1 year ago ⁺¹
Thx🔥
@NicholasRenotte 1 year ago ⁺¹
Glad you liked it @mustimply!!
@srikanthkoltur6911 1 year ago ⁺¹
Can you make a short video covering Jupyter notebook tricks and extensions that make notebooks easier to use, like execution time and so on?
@NicholasRenotte 1 year ago ⁺¹
Done, added to my list for this week!
@srikanthkoltur6911 1 year ago ⁺¹
@NicholasRenotte sorry for not mentioning it, but I was looking forward to the lip reading video. It is fabulous, thanks a lot for your efforts ❤️
@NicholasRenotte 1 year ago
Ayyy, all good man!! Love that you enjoyed it!
@raj4624 1 year ago ⁺¹
Hi Nick. Can you make videos on MLOps (a series for beginners, or at your convenience)?
@NicholasRenotte 1 year ago
Done, will see what I can do this week!!
@user-il1hu5xp2x 12 days ago
Imagine that in all of YouTube this is the only video that details how to make this LipNet lip reading project. Just shows how these topics are not easy to find
@srisahi57 5 months ago
Hey Nick, I am getting this error: InvalidArgumentError: ((function node wrapped Transpose device_/job:localhost/replica:0/task:0/device:CPU:0)) transpose expects a vector of size 5. But input (1) is a vector of size 3 [Op: Transpose]
@tejassrivastava6971 1 year ago ⁺¹
This is insane 😶‍🌫️🔥🔥. Can we get a tutorial for a Streamlit app of the same? Please.
@NicholasRenotte 1 year ago
You got it @Tejas!!
@not_humorouss 3 months ago
Hey, could you please do a video of this model working on a custom dataset?
@SVSingam273 1 year ago ⁺¹
@Nicholas Rennotte. Can you please do a full tutorial on YAMNet
@deepakgahlot6879 9 months ago
It would be great if you could create a tutorial on creating a custom dataset.
@HarshilDangar-tc3ns 11 months ago
I wanted to know how I can make my own training data and also how to predict from real-time input?
@arshvahora4224 2 months ago
Hey! I have some doubts about the project. Adding to that, the code you provided also has errors in it. Would you please share updated code?
@sadiaafreen1519 1 year ago ⁺²
Hey! Superb work. I have been working on this model following your tutorial, but the epochs are taking forever to run. I just completed my first epoch after 9-10 hours. Moreover, Colab times out and it starts all over again🙁 Is there anything I can do?
@gsettu7255 9 months ago
Did it complete? I'm currently doing it and it's been running for the past 13 hours
@OfficialGamer-kc4dz 2 months ago
@gsettu7255 Hey, did you solve that epoch issue?
@OfficialGamer-kc4dz 2 months ago
@sadiaafreen1519 Hey, did you solve that epoch issue?
@adithya7064 1 year ago
Your work on this is awesome, and the explanation is also great🙌. I have found no one else explaining every step in this way. But one thing: extracting the lip region using static values doesn't work for other input videos, right?? Also, the total number of frames in the input video causes an error because it does not match the convolutional layer's input_shape
@adithya7064 1 year ago
What to do about the error caused by the convolutional layer's input shape not matching the given input video's shape? Can you provide a solution for that??
@CreatingUtopia 4 months ago
@adithya7064 did you find any solution bro? I think I am facing the same issue
@solomon_leo_27 1 year ago ⁺¹
Nick! Can I use this model to predict a video without alignments?
@NicholasRenotte 1 year ago
Yep! That's the beauty of using the CTC loss function!
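A minimal sketch of why no alignment file is needed at prediction time: a CTC model outputs one score vector per frame, and decoding just takes the best class per frame, merges consecutive repeats and drops the blank. This is plain greedy decoding written in pure Python for illustration, not the decoding call used in the video; `greedy_ctc_decode` and the toy scores are hypothetical.

```python
def greedy_ctc_decode(frame_scores, blank):
    """Collapse per-frame best classes into a label sequence."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded, previous = [], None
    for index in best:
        # Keep a class only when it changes and is not the blank.
        if index != previous and index != blank:
            decoded.append(index)
        previous = index
    return decoded


# Toy scores over three classes: 0 = 'a', 1 = 'b', 2 = blank.
scores = [
    [0.90, 0.05, 0.05],  # a
    [0.80, 0.10, 0.10],  # a (repeat, merged)
    [0.10, 0.10, 0.80],  # blank (separates the next 'a')
    [0.70, 0.20, 0.10],  # a
    [0.10, 0.80, 0.10],  # b
    [0.20, 0.70, 0.10],  # b (repeat, merged)
]
print(greedy_ctc_decode(scores, blank=2))  # [0, 0, 1]
```

Alignments are only needed during training, as the target text for the loss.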
@Football_moments8 1 year ago ⁺¹
Hey Nick, can you make a tutorial on how the methods are used?
@NicholasRenotte 1 year ago
Like applications for this kind of tech?
@PABBATHIBLSHREEHARSHABCD 4 months ago
Hey Nick, where can I download the dataset for this? Can someone let me know?
@Sumonms 1 year ago ⁺¹
Hey Nicholas, can I try this pipeline on my custom dataset?
@NicholasRenotte 1 year ago
Sure can, with custom videos it should be fine. You might just need to tweak the load_alignments function to handle how your text is structured!!
@pranav-patil 1 year ago
What do the numbers in the alignments represent?
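A hedged sketch of how alignment lines can be read, assuming each line has the form `start end word`: the two numbers mark where the word begins and ends on the clip's timeline, and `sil` marks silence. The units are not stated in this thread, the sample lines are made up, and `parse_alignment` is a hypothetical stand-in for the tutorial's `load_alignments`.

```python
def parse_alignment(lines):
    """Return (start, end, word) tuples, skipping silence and malformed lines."""
    words = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        start, end, word = int(parts[0]), int(parts[1]), parts[2]
        if word != "sil":
            words.append((start, end, word))
    return words


# Made-up sample in the assumed format.
sample = [
    "0 23750 sil",
    "23750 29500 bin",
    "29500 34000 blue",
    "34000 74500 sil",
]
entries = parse_alignment(sample)
print(" ".join(word for _, _, word in entries))  # bin blue
```

For a custom dataset whose transcripts are structured differently, only this parsing step needs to change, as long as it still produces the sentence text.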
@namangoyal8477 1 year ago
Hey Nick, can you please explain the input size [75,46,140,1]? As per my understanding, it's 75 frames per video and 1 is for grayscale. How about 46 and 140? I am not able to understand the shape of the data. Kindly help.
@derilraju2106 1 year ago
46x140 is the height x width of each frame (focusing on the lip region)
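A small worked example of how a fixed crop produces those two numbers. The crop bounds below (rows 190 to 236, columns 80 to 220) are an assumption chosen to reproduce the 46x140 size and may not match the notebook exactly; `crop_shape` is a hypothetical helper.

```python
def crop_shape(row_start, row_stop, col_start, col_stop, channels=1):
    """Shape of frame[row_start:row_stop, col_start:col_stop] with `channels` channels."""
    return (row_stop - row_start, col_stop - col_start, channels)


height, width, channels = crop_shape(190, 236, 80, 220)
print((height, width, channels))            # (46, 140, 1)

# Stacking 75 such grayscale crops gives the model's input shape.
print((75, height, width, channels))        # (75, 46, 140, 1)
```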
@blancoarnau 4 months ago
I guess lip reading is a very difficult task, because by simply looking at a person's lips it is very difficult to know 100% what they're saying. Some words are easy to pick out while others can easily be confused with other words...
I guess the model may work well for vowels because the mouth position changes a lot, but consonants may be trickier. One good idea would be to predict the next word based on the previous one. For example, if the model predicts "the", the next word is very likely to be a noun. I guess this would help discern which word comes next.
@user-wg1wx7yx3t 1 month ago ⁺¹
I am Abhi from India. Your videos help me a lot, but I am facing a problem in load_data(tf.convert_to_tensor(test_path)). What should I do? Please help
