Build a Deep Audio Classifier with Python and Tensorflow

  • Published: 20 Sep 2024

Comments • 231

  • @sheikhshafayat6984
    @sheikhshafayat6984 2 years ago +13

    This is exactly what I was looking for for the past month, and it suddenly popped up in my recommendations!
    Can't thank you enough for this. You saved my semester!!

  • @captainlennyjapan27
    @captainlennyjapan27 2 years ago +15

    1 minute into the video, absolutely amazed by the high quality of this video. You are my favorite programming YouTuber along with FireShip and NomadCoders! Thanks so much Nicholas!

  • @guillaumegalante
    @guillaumegalante 2 years ago +22

    Thanks so much for all these great tutorials! I discovered your channel a few days ago; your way of teaching makes it really easy to understand and learn. I was wondering if you'd be able to do a series or video around recommender systems: building a recommendation engine (content-based, collaborative filtering), whether Netflix (movie) recommendations, Spotify's music recommendations (could include audio modeling) or Amazon (purchase) predictions. Many thanks! Keep up the amazing tutorials :)

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +3

      Definitely! I’m doing my own little deep learning challenge atm, will add it to the list!

    • @prajiltp8852
      @prajiltp8852 a year ago +1

      Can we use the same approach if I wanted to separate my bpos call recording from a conversation file? Like, if I train it based on my bpos recordings and after that give it an audio file, will it separate the bpos sound?? Please help

    • @dwiechannel3196
      @dwiechannel3196 a year ago

      @@NicholasRenotte please answer my question, I really need some direction.🙏🙏🙏

  • @captainlennyjapan27
    @captainlennyjapan27 2 years ago +2

    41 minutes into the video. Not for a second was I bored. Amazing

  • @Maxwell-fm8jf
    @Maxwell-fm8jf 2 years ago +2

    I worked on a similar audio classification project three months ago, hooked up to a Raspberry Pi with some sensors, but using an RCNN and librosa. A different approach from yours, but basically the same steps. Thumbs up mate!!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Woahhh, nice! What was the latency like on the rpi? Noticed when I started throwing more hardcore stuff at it, it kinda struggled a little.

    • @farhankhan5951
      @farhankhan5951 a year ago

      What have you developed in your project?

    • @ellenoorcastricum
      @ellenoorcastricum 7 months ago

      What were you using the Pi for, and do you have any tips on how to make a system that recognizes certain sounds in real time?

  • @adarshd249
    @adarshd249 2 years ago +6

    More great content from Nick. Thrilled to do a project on this

  • @IronChad_
    @IronChad_ 2 years ago +5

    You’re the absolute best with these tutorials

  • @abrh2793
    @abrh2793 2 years ago +5

    Nice one!
    Looking forward to a multi label text classification if you can!
    Thanks

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +2

      Yup, code is ready, should be out this week or next!

    • @abrh2793
      @abrh2793 2 years ago +2

      @@NicholasRenotte yo thanks a lot!
      The way you get inputs from the community and interact is nice to see

  • @rachitjasoria9041
    @rachitjasoria9041 2 years ago +16

    A much-needed tutorial!! Btw, can you make a tutorial on TTS synthesis? Not with pyttsx3... train a model to speak from provided voice data of a human

  • @gaspardbos
    @gaspardbos 9 months ago

    Mc Shubap is spinning the decks in your memory palace 😆 Great tutorial so far.

  • @enzy7497
    @enzy7497 2 years ago

    Just discovered this channel on my recommended. Really awesome stuff man! Thanks for the great content.

  • @lakshman587
    @lakshman587 a year ago

    This video is Awesome!!!
    I got to know from this video that we convert audio data into image data to approach audio-related tasks in ML!!!

  • @luisalmazan4183
    @luisalmazan4183 a year ago

    Thank you so much for these tutorials, Nicholas. A tutorial about few-shot learning would be great. Greetings from México!

  • @henkhbit5748
    @henkhbit5748 2 years ago

    Awesome sound classification project👍 I need a cappuccino break after hearing the capuchin bird sound😎

  • @DarceyLloyd
    @DarceyLloyd a year ago +3

    Great video. Would love to see a version of this done using the GPU, with multiple classifications, not just binary.

    • @0e0
      @0e0 10 months ago

      TensorFlow has GPU builds

  • @ChrisKeller
    @ChrisKeller 7 months ago

    Super, super helpful getting my project off the ground!

  • @davidcastellotejera442
    @davidcastellotejera442 a year ago +2

    Man these tutorials are amazing. Congrats for creating such great content. And thanks!!

  • @malestripper1
    @malestripper1 a year ago +1

    16:36
    44.1 kHz is the sampling rate, meaning the amplitude of the audio wave is sampled every 1/44100 of a second.

    • @orange-dd5rw
      @orange-dd5rw 29 days ago

      How can I implement detection of when the initial capuchin call started and ended? How can I get this information in the end result (like, the result should show capuchin call - 2.3 s)?

  • @sederarandi1507
    @sederarandi1507 3 months ago

    Bro you are absolute gold, thank you so much for all the effort you put into your videos and teachings
    +1 subscriber

  • @gregoryshklover3088
    @gregoryshklover3088 a year ago +2

    Nice tutorial. A few inaccuracies about stft() usage though: abs() there is not for getting rid of negatives, but for taking the amplitude of complex values. frame_length would probably be better as a power of 2...

    • @orange-dd5rw
      @orange-dd5rw 29 days ago

      How can I implement detection of when the initial capuchin call started and ended? How can I get this information in the end result (like, the result should show capuchin call - 2.3 s)?

    • @gregoryshklover3088
      @gregoryshklover3088 29 days ago

      The classification works on sliding windows of fixed size (3 s in this tutorial). One can slide the window with overlap to approximate the start of the matching sequence, or use other signal-processing methods to find the start of the sequence.
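The sliding-window idea described above can be sketched in a few lines. This is a hedged illustration, not the tutorial's code: the window and hop sizes are assumptions, and `fake_classify` is a made-up stand-in for whatever wraps the trained model's `predict` call.

```python
import numpy as np

SR = 16000            # assumed sample rate after resampling
WIN = 3 * SR          # 3-second window, as in the tutorial
HOP = SR // 2         # 0.5 s hop -> overlapping windows for finer localization

def detect_regions(wav, classify, threshold=0.5):
    """Slide a fixed window over the waveform; return (start_s, end_s) spans
    where the classifier fires. `classify` is any callable mapping a window
    of samples to a probability (in practice it would wrap model.predict)."""
    hits = []
    for start in range(0, max(1, len(wav) - WIN + 1), HOP):
        if classify(wav[start:start + WIN]) >= threshold:
            hits.append((start / SR, (start + WIN) / SR))
    return hits

# Toy demo: a "call" lives between 4 s and 6 s, and the fake classifier
# fires whenever its window overlaps that region.
wav = np.zeros(10 * SR)
wav[4 * SR:6 * SR] = 1.0
fake_classify = lambda w: w.max()
print(detect_regions(wav, fake_classify)[0])  # (1.5, 4.5): first overlapping window
```

A smaller hop gives finer start/end estimates at the cost of more model calls; merging consecutive hits yields one span per call.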

  • @supphachaithaicharoen7929
    @supphachaithaicharoen7929 2 months ago

    Thank you very much for your hard work. I really enjoy the video.

  • @stevew2418
    @stevew2418 2 years ago +1

    Amazing content and explanations. You have a new subscriber and fan!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Welcome to the team @Steve, glad you liked it!

  • @pedrobotsaris2036
    @pedrobotsaris2036 a year ago

    Good tutorial. Note that the sample rate has nothing to do with the amplitude of an audio file; rather, it is the number of times the audio is sampled per second.
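The correction above can be made concrete with trivial arithmetic (plain Python; the durations and rates are just illustrative numbers): the sample rate fixes how many samples are taken per second, nothing about their amplitude.

```python
# Sample rate = samples taken per second; amplitude is unaffected.
def num_samples(duration_s, rate_hz):
    """How many samples a clip of the given duration contains."""
    return int(duration_s * rate_hz)

# A 3-second clip at two common rates:
cd_quality = num_samples(3, 44100)  # one sample every 1/44100 s
resampled = num_samples(3, 16000)   # one sample every 1/16000 s
print(cd_quality, resampled)        # 132300 48000
```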

  • @GuidoOliveira
    @GuidoOliveira 2 years ago

    Incredible video, much appreciated. On a side note, I love your face cam; the audio is excellent too!

  • @primaryanthonychristian2419
    @primaryanthonychristian2419 a year ago

    Bro, great video and very good detailed explanation. 👍👍👍

  • @urielcalderon1661
    @urielcalderon1661 2 years ago +1

    It's him, he is back.

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +1

      Ayyyyy Uriel!! What's happening!! Thanks a mill!

    • @urielcalderon1661
      @urielcalderon1661 2 years ago

      @@NicholasRenotte Always faithful man, as long as there are deep learning tutorials we will be there

  • @zainhassan8421
    @zainhassan8421 2 years ago

    Awesome, kindly make a video on a speech recognition model using deep learning.

  • @leventelcicek6445
    @leventelcicek6445 a month ago

    you are wonderful sir

  • @thoseeyes0
    @thoseeyes0 a year ago +2

    If anyone gets an error at 22:14 for
    pos = tf.data.Dataset.list_files(POS+'\*.wav')
    neg = tf.data.Dataset.list_files(NEG+'\*.wav')
    just use / instead of \:
    pos = tf.data.Dataset.list_files(POS+'/*.wav')
    neg = tf.data.Dataset.list_files(NEG+'/*.wav')
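A slightly more portable variant of the forward-slash fix above is to let `os.path.join` build the separator for you. A hedged sketch; the folder names mirror the tutorial's `POS`/`NEG` variables, but the `data` parent directory is an assumption:

```python
import os

# Hypothetical paths, mirroring the tutorial's POS/NEG variables
POS = os.path.join('data', 'Parsed_Capuchinbird_Clips')
NEG = os.path.join('data', 'Parsed_Not_Capuchinbird_Clips')

# os.path.join inserts the right separator for the current OS, and the
# resulting glob pattern can be handed to tf.data.Dataset.list_files:
pos_pattern = os.path.join(POS, '*.wav')
neg_pattern = os.path.join(NEG, '*.wav')
print(pos_pattern)
# pos = tf.data.Dataset.list_files(pos_pattern)
# neg = tf.data.Dataset.list_files(neg_pattern)
```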

  • @chamithdilshan3547
    @chamithdilshan3547 2 years ago

    What a great video this is. Thank you so much!!! 😍

  • @Uncle19
    @Uncle19 2 years ago

    What an amazing video. Definitely earned my sub.

  • @ronaktawde
    @ronaktawde 2 years ago +1

    Very Cool video Nick Bro!!

  • @周淼-k3u
    @周淼-k3u a year ago +1

    Thank you so much for these nice tutorials! They are quite helpful! I have a small question. I saw your process of building, training and testing the model. If I want to spend less time on it, do you think it's possible to bring in existing datasets such as ESC-10 or ESC-50 with your method?

  • @dimmybandeira7
    @dimmybandeira7 2 years ago

    Very smart! Can you identify a person speaking in the midst of others speaking more quietly?

  • @oaydas
    @oaydas 2 years ago +1

    Great content, keep it up man!

  • @guruprasadkulkarni635
    @guruprasadkulkarni635 2 years ago +2

    Can I use this for classifying the audio of different guitar chords?

  • @mosharofhossain3504
    @mosharofhossain3504 a year ago +2

    Thanks for such a great tutorial. I have a question:
    What happens when resampling is done to an audio file? Does its total time change, or its number of samples, or both? Or does it depend on the specific algorithm?

    • @andycastellon919
      @andycastellon919 9 months ago

      We humans can hear up to approximately 22 kHz, and because of the Nyquist criterion you need to sample at twice the highest frequency, hence the 44100 Hz you may have seen. However, in audio analysis most of the useful information is below 8000 Hz, so we resample to 16000 Hz, losing the higher frequencies. The duration of the audio does not change; what changes is the number of samples, and hence the bits we need to store the audio.
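The resampling behaviour discussed above is easy to check. A hedged sketch using scipy's polyphase resampler rather than the tutorial's tfio call (the 440 Hz test tone is arbitrary): the sample count shrinks by the rate ratio while the duration stays the same.

```python
import numpy as np
from scipy.signal import resample_poly

# One second of a 440 Hz tone at CD rate
sr_in, sr_out = 44100, 16000
t = np.arange(sr_in) / sr_in
wav = np.sin(2 * np.pi * 440 * t)

# Polyphase resampling by the ratio 16000/44100 = 160/441
resampled = resample_poly(wav, up=160, down=441)

print(len(wav), len(resampled))                   # 44100 16000: sample counts differ...
print(len(wav) / sr_in, len(resampled) / sr_out)  # ...but both durations are 1.0 s
```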

  • @anthonylwalker
    @anthonylwalker 2 years ago +2

    Another great video. I love the setup and style of your videos now; have to love a good whiteboard! I'd be interested in seeing a drone video - I've recently got a Ryze Tello. Enjoying the Python package to control it, and the computer vision capabilities that come with it!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Definitely!! I've got one floating around somewhere, will get cracking on it! Thanks for checking it out @Anthony!

  • @NoMercy8008
    @NoMercy8008 2 years ago +3

    LET'S GO, NICK :D
    This is actually pretty awesome and once again one of those things that i feel you can do TONS of different things with.
    Voice commands and animal calls are obvious examples, but maybe you could build a device that listens for a human's breath and heart rate and detects irregularities? This could be used for diagnostic purposes, but also as a warning device for, for example, elderly people. The moment it hears weird heart or breath sounds, it gives a warning and tells them to see a doctor.
    The same thing can be applied in a bunch of different fields, I think. Listening for weird engine sounds, for example, to help diagnose engine problems before internal parts suddenly and violently become external parts.
    Also, astronomy! Listening for gravitational wave events and things like that, though I'm pretty sure they're already using tons of AI for this anyway, so it's probably being done already.
    By the by, you posted about crowdsourcing labels/labeled data the other day, i think that's a great idea, especially if you're sharing that labeled data with the public!
    Doing it this way is a much more manageable way to get labeled data since hopefully the work is being done in parallel by distributed resources ("many humans") and sharing it online means that this project essentially helps everyone wanting to play around with ML and use the data for something cool, learn about things and maybe come up with awesome ideas to change our future.
    So, awesome idea and great way to leverage the outreach you have here on YT, I love it! ❤
    What i really liked about this video in particular is the exploratory data analysis and the preprocessing alongside it.
    As we all know, it's very important to feed your data to your network in a way that makes it as easy to digest as possible, so learning more about this is absolutely essential and really fun as well! Much appreciated!
    As always, Nick, thanks a ton for your videos and for doing all this for us, much much appreciated!
    Really love this video and looking forward to the next one!
    All the best, have a great week! :)

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +3

      Heyyy, I had a good laugh at this comment "before internal parts suddenly and violently become external parts" 😂 but yes definitely agree!
      I'm hoping the crowdsourced labelling can become a thing! I know there's datasets out there but for a lot of the niche and practical stuff I have in mind, I can't really seem to find anything. I figure if people are willing to help label then I can give back to the community by showing everyone how to use and build with it!
      Have an awesome weekend!!

  • @carlitos5336
    @carlitos5336 2 years ago +1

    Thank you so much!!

  • @kavinyudhitia
    @kavinyudhitia a year ago

    Great tutorial! Thanks!!

  • @sarash5061
    @sarash5061 2 years ago +1

    This is amazing

  • @vigneshm4916
    @vigneshm4916 2 years ago +1

    Thanks for a great video. Could you please explain why we need tf.abs in the preprocess function?

  • @ayamekajou291
    @ayamekajou291 a year ago +2

    Hey Nicholas, this project is great, but how do I classify multiple animal calls using this model? I can classify the audio as capuchin or not capuchin this way, but if I included more audio classes, how could I classify the audio file by animal as well as count the calls?

  • @marioskadriu441
    @marioskadriu441 2 years ago

    Amazing tutorial. Really enjoyed that Nick 🙏🏼
    I guess in case we wanted to detect multiple sounds from the same animal, the procedure would be the same, but we would need an equal number of samples to train the neural network?
    Furthermore, in case we wanted to detect sounds from multiple animals and categorize them, would we follow the same procedure, just with softmax at the end instead of sigmoid?

  • @benbelkacemdrifa-ft1xr
    @benbelkacemdrifa-ft1xr a year ago +1

    It's a very interesting video. But can we do the test using a sound sensor?

  • @riyazshaik4006
    @riyazshaik4006 8 months ago

    Thanks so much sir. One request, sir: can you explain how to classify audio as positive, negative and neutral?

  • @insidecode
    @insidecode a year ago

    Amazing job

  • @mayankt28
    @mayankt28 2 months ago

    If you're encountering a shape issue when calling model.fit and getting the error "cannot take length of shape with unknown rank," the solution might be to explicitly set the shape of your tensors during preprocessing.

  • @empedocle
    @empedocle a year ago

    Amazing job Nicholas!! I have just one question: why didn't you also calculate the standard deviation of the files' lengths, to get a more precise interval for your window?

  • @eranfeit
    @eranfeit a year ago

    Thank you
    Eran

  • @prxninpmi
    @prxninpmi 2 years ago +1

    VERY COOL!!

  • @jawadmansoor2456
    @jawadmansoor2456 a year ago

    Thank you for the great content. How do you classify multiple sounds in a file and also get timing information? For example, one sound was made 5 seconds into the audio file and another at 8 seconds; how do we get the time and the class?

  • @harsh9558
    @harsh9558 9 months ago

    This was awesome 🔥

  • @Uebermensch03
    @Uebermensch03 2 years ago

    Thanks for uploading a great video, and one question! You sliced the audio file from its very start. I think the position where we start slicing can affect model accuracy; for instance, if you skip 1 s and then start slicing, it may yield different wav data. Do you think slicing audio from the very start of the file is a golden rule?

  • @vishalm2338
    @vishalm2338 a year ago +1

    How do you decide the values of frame_length and frame_step in tf.signal.stft(wav, frame_length=320, frame_step=32)? Appreciate any help!

  • @NuncNuncNuncNunc
    @NuncNuncNuncNunc 2 years ago

    Maybe a basic question, but what does zero padding do when getting the frequency spectrum?

  • @ahmedgon1845
    @ahmedgon1845 a year ago

    Great video, thanks so much.
    I have a small question. In the line
    spectrogram = tf.signal.stft(...)
    why did you choose
    frame_length=320
    frame_step=32?
    Can someone explain the method of choosing these?
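For choosing frame_length and frame_step, it helps to see how they set the spectrogram's shape. A hedged pure-Python sketch of the arithmetic tf.signal.stft uses with its defaults (no end padding; fft_length defaults to the smallest power of two at least frame_length, which is why Gregory's power-of-2 remark matters):

```python
def stft_shape(num_samples, frame_length, frame_step):
    """Output shape (num_frames, num_freq_bins) of tf.signal.stft with
    pad_end=False and the default fft_length."""
    # How many full windows fit when hopping by frame_step:
    num_frames = 1 + (num_samples - frame_length) // frame_step
    # Default fft_length: smallest power of two >= frame_length;
    # the one-sided spectrum then has fft_length // 2 + 1 bins.
    fft_length = 1
    while fft_length < frame_length:
        fft_length *= 2
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins

# 48000 samples = 3 s at 16 kHz, with the tutorial's values:
print(stft_shape(48000, frame_length=320, frame_step=32))  # (1491, 257)
```

So a longer frame_length buys frequency resolution at the cost of time resolution, and a smaller frame_step gives more (overlapping) frames; the (1491, 257) result matches the spectrogram shape seen in the video.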

  • @orange-dd5rw
    @orange-dd5rw 29 days ago

    How can I implement detection of when the initial capuchin call started and ended? How can I get this information in the end result (like, the result should show capuchin call - 2.3 s)?

  • @lorenzocastagno1305
    @lorenzocastagno1305 2 years ago +1

    Thanks!

  • @mendjevanelle9549
    @mendjevanelle9549 4 months ago

    Hello sir!
    I installed TensorFlow as presented, but I don't understand the reason for the error message: no module named tensorflow.

  • @badcatprod
    @badcatprod 2 years ago

    1k! ) //THANK YOU! 🤗

  • @md.abdullaalmamun3965
    @md.abdullaalmamun3965 2 years ago +2

    Please make a video on image segmentation (semantic and instance).

  • @Ankur-be7dz
    @Ankur-be7dz 2 years ago +1

    data = data.map(preprocess)
    In this part I'm getting an error: TypeError: tf__decode_wav() got an unexpected keyword argument 'rate_in', although rate_in is a parameter of tfio.audio.resample

  • @Kishi1969
    @Kishi1969 2 years ago +1

    Always giving inspiration and new knowledge... you are great, thanks.
    Advice please:
    My question is, can I buy a 2 GB NVIDIA graphics card to start computer vision? My PC is too slow when I'm using my CPU.

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +1

      I would be looking at something with more RAM if you can swing it @Benya; 2 GB is a little too small for most computer vision tasks.

    • @unteejo3678
      @unteejo3678 2 years ago

      Use Google Colab’s TPU (Tensor Processing Unit).
      It’s very fast if your model uses only CNNs.

    • @Kishi1969
      @Kishi1969 2 years ago

      @@NicholasRenotte Thanks 🙏

  • @Varadi6207
    @Varadi6207 a year ago

    Awesome explanation. Please help me create audio augmentation for health records without losing information. I worked with time_shift (-0.5 to 0.5 variation in the wav), but the model accuracy is not up to the mark.

  • @asfandiyar5829
    @asfandiyar5829 10 months ago

    Had a lot of issues getting this to work. You need Python 3.8.18 for this to work; I had that version of Python in my conda env.

  • @rajkumaraj6848
    @rajkumaraj6848 2 years ago +1

    @NicholasRenotte "The kernel appears to have died. It will restart automatically." I got this error while running model.fit. How can I solve this?

  • @ellenoorcastricum
    @ellenoorcastricum 7 months ago +1

    Is it possible to run this while my mic is always listening, doing live processing on that? Btw, this will be my first project and I know it's a lot.

  • @rohithak9419
    @rohithak9419 2 years ago

    I subscribed to this channel last October,
    but today I see that it is unsubscribed.
    I feel it's better to list all channels on a piece of paper and keep it in my wallet.

  • @thewatersavior
    @thewatersavior 2 years ago

    58:00 - Another great one, thank you; already looking forward to applying it. Quick question: why mix down the audio signals on the MP3? I get that it gets us to one channel - is there a way to just process one channel at a time? I'm imagining that would allow for some spatial awareness in the model? Or perhaps too many variables, because we are just looking for the one sound? Thinking that it would be useful to associate density with directionality... but not sure that's accurate if the original recordings were not set up to actually be directional...

    • @cadsonmikael9119
      @cadsonmikael9119 a year ago

      I think this might also introduce distortion in the result, since we have to deal with stereo microphone separation (ideally about 100-150 mm for human perception). I think the best idea is to just look at one channel in the case of stereo, at least if the microphone separation is large or unknown.
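Both options discussed above (mixing down versus keeping one channel) are one-liners on a sample-major stereo buffer. A hedged numpy sketch with a made-up 8-sample signal:

```python
import numpy as np

# Hypothetical stereo buffer: shape (num_samples, 2), one column per channel
stereo = np.stack([np.linspace(-1, 1, 8), np.linspace(1, -1, 8)], axis=1)

mono_mix = stereo.mean(axis=1)  # average the channels (what the tutorial does)
left_only = stereo[:, 0]        # ...or just keep one channel, as suggested

print(mono_mix.shape, left_only.shape)  # (8,) (8,)
```

Note the averaged mix can cancel out-of-phase content (here the two channels are exact opposites, so the mix is silence), which is one concrete version of the distortion concern.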

  • @tims.4396
    @tims.4396 a year ago

    I'm not sure about the batch and prefetch part; for me it generates empty training sets afterwards, and it only takes 8 prefetched files for training?

  • @Computer.Music.And.I
    @Computer.Music.And.I 3 months ago

    Hello Nicholas,
    I have been using this great video in my beginner courses, and last year everything was fine. Unfortunately, in today's lecture the code did not run on any of my machines or configurations... The load_16k_wav function is not able to resample the audio files, and much worse, the model.fit function complains about an input that could not be -1, 0 or 1.
    Are you willing to check and update your code? (Spent 6 hours now trying to find the error 😊)
    Thx jtm

  • @tatvamkrishnam6691
    @tatvamkrishnam6691 a year ago +1

    Tried to recreate the same. Somehow the program abruptly stops at
    hist = model.fit(train, epochs=4, validation_data=test)
    It has to do with using a lot of RAM. Any way around it for me? Thanks!

  • @malice112
    @malice112 2 years ago +1

    Thanks for the great video. Is it possible to use .mp3 files in Python instead of .wav to save disk space?

  • @GArvinthKrishna
    @GArvinthKrishna 4 months ago

    What approach is best for finding the number of blows in a recording of a jackhammer?

  • @mrsilver8151
    @mrsilver8151 5 months ago

    Thanks for the great tutorial as always, sir.
    In case I want to do voice recognition to identify whose voice a recording is,
    will these steps help me get there, or do I need to look for something more specific to this task?

  • @dumi7177
    @dumi7177 3 months ago

    What computer specifications do you have? Training the model took me 8 hrs.

  • @TheOfficalPointBlankE
    @TheOfficalPointBlankE 6 months ago

    Hey Nicholas, I was wondering if there is a way to change the code to print the timestamps in the audio clip at which each sound is recognized?

  • @Sachinkenny
    @Sachinkenny 2 years ago

    What happens when there are multiple birds in the dataset? How good is a CNN model on that kind of dataset? Also, the source training audio samples can vary in length, sometimes by minutes. How can we do the preprocessing in such cases?

  • @kundansaha2369
    @kundansaha2369 4 months ago

    I am getting an error and have tried to debug it so many ways, but it is not solved. The error is: "The procedure entry point could not be located in the dynamic link library C:\ProgramData\anaconda3\lib\site-packages\tensorflow_io\python\ops\libtensorflow_io..."

  • @mohammedyacinmeshi3398
    @mohammedyacinmeshi3398 2 years ago +2

    Hi Nick! Thank you for your extremely useful contribution to the AI learners' community

  • @Dr.AhmedQusaySabri
    @Dr.AhmedQusaySabri a year ago

    Thanks a lot

  • @Lhkk28
    @Lhkk28 a year ago

    Hello Nicholas, thanks for your video :)
    I have a question.
    I am aiming to build a model for sound event detection using deep learning (I am thinking about using an LSTM). For now I am done with the preprocessing step: I have the spectrograms of the sounds (generated using the short-time Fourier transform) and the labels (binary label arrays, 0s where there are no events and 1s where events are present). I am now confused about how to feed this data to the model. The shape of each spectrogram is (257, 626) and the shape of each label is (626,). How should I give this data to the LSTM? Can I build a model that takes the spectrograms in their current shape and gives the labels as a sequence of ones and zeros, or do I have to segment the spectrograms and give each segment a label?

  • @SA-oj3bo
    @SA-oj3bo 2 years ago +1

    What if I want to count how many times 1 specific dog barks per day? Then it is clear that samples of this dog barking are needed, but how many? And what other sounds must be sampled, and how many, if at the same place many other sounds can be heard? Thx!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Heya, I would suggest starting out with 100+ samples of the dog, you can then augment with different background sounds and white noise to build out the dataset. This is purely a starting point though, would need to do error analysis from there to determine where to put emphasis next!

    • @SA-oj3bo
      @SA-oj3bo 2 years ago +1

      @@NicholasRenotte It would be very interesting if an accurate counter could be made. This is a case of animal abuse (a dog in a small cage, neglected by the owner and barking for attention for over 2 years), so an accurate counter would be very helpful and useful for other projects.
      What I don't understand is why 1 sample/spectrogram of the barking dog would not work well enough to detect it in a recording of, for example, 24 h, because there must be very few or no sounds that have the same spectrogram. I understand it will always be different, but can 2 different sounds (a cat and a dog, for example) have 2 spectrograms that are very similar? So my question is why it is not possible to identify a specific sound in a recording by comparing the spectrogram of the sound to detect against all possible spectrograms in the recording, ms after ms? If you accept paid projects I would love your help, because this is all new to me. Regards!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      @@SA-oj3bo I think you could. If there were multiple dogs in the sample you would probably need a ton of data, though, to be able to clearly identify which dog is barking. This would allow the net to pick up the nuances of that particular bark.
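The augmentation Nicholas mentions above (padding a small set of real samples with noisy variants) can be sketched in a few lines of numpy; the SNR value and the sine-burst "bark" stand-in are arbitrary assumptions:

```python
import numpy as np

def add_white_noise(wav, snr_db=10.0, seed=0):
    """Return wav mixed with white noise at roughly the given SNR in dB."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(wav ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wav.shape)
    return wav + noise

# One clean "bark" stand-in: a short 440 Hz burst at 16 kHz
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_white_noise(clean, snr_db=10.0)
print(noisy.shape)
```

Generating several variants per clip (different seeds, SNRs, or mixed-in background recordings) is a cheap way to stretch 100 real samples into a larger training set.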

  • @ChristianErwin01
    @ChristianErwin01 7 months ago

    I've gotten to the part where you start testing the predictions, and my validation_data isn't showing up. The epochs run fine, but I have no val_precision or val_loss values; all I have are loss and precision_2.
    Any fixes?

  • @tatvamkrishnam6691
    @tatvamkrishnam6691 2 years ago

    23:30 What is the significance of len(pos) or len(neg)?
    When len(pos) is replaced with 2, I expect only the first 2 samples to have the label '1'.
    However, when I run positives.as_numpy_iterator().next(), I get a '1' label not only for the first 2 samples but also for the rest.
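On the question above: the tutorial zips each positive file with a ones-tensor of length len(pos) so every file gets the label 1. A pure-Python analogue (the file names are made up); tf.data.Dataset.zip, like Python's zip below, truncates to the shorter input, so shrinking len(pos) to 2 shortens the dataset rather than relabeling the remaining files.

```python
# Pure-Python analogue of pairing filenames with a constant label:
pos_files = ['clip_0.wav', 'clip_1.wav', 'clip_2.wav']  # hypothetical names
labels = [1.0] * len(pos_files)          # len(pos) ones -> one label per file
positives = list(zip(pos_files, labels))

truncated = list(zip(pos_files, [1.0] * 2))  # only 2 labels -> only 2 pairs
print(len(positives), len(truncated))        # 3 2
```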

  • @eggwarrior5630
    @eggwarrior5630 10 months ago

    Hi, I am working with a new audio dataset which does not require the audio-slicing part. What should I modify to loop through the folder for the last part? Any help would be greatly appreciated.

  • @amruthgadag4813
    @amruthgadag4813 a year ago

    AMAZING

  • @barutistudio1397
    @barutistudio1397 2 years ago

    Thanks

  • @SaiCharan-ev8hu
    @SaiCharan-ev8hu 4 months ago

    Hey Nicholas, trying to execute this but facing an issue, as you haven't done any preprocessing on the training data; looking for help from you

  • @kiss-bws
    @kiss-bws a year ago

    Please make a video on building your very own speech recognition using TensorFlow

  • @akashmishrahaha
    @akashmishrahaha 7 months ago

    Why are we reducing the sample rate from 44 kHz to 16 kHz? It was not clear to me.

  • @abraranita
    @abraranita 2 years ago +1

    Hi Mr. Nicholas, thank you for your YouTube content; I have benefited a lot from it. I am a PhD student in electronics and I am programming a robot. I am working on a program similar to the real-time sign language video, and I am facing some errors I could not get past. Can I send a picture of the error so you can correct it for me, please?

    • @Maxwell-fm8jf
      @Maxwell-fm8jf 2 years ago +1

      What error did you get? I worked on a similar project, but with a different approach, on an RPi hooked up with some sensors.

    • @abraranita
      @abraranita 2 years ago

      @@Maxwell-fm8jf In # Restore checkpoint:
      ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
      ValueError: `Checkpoint` was expecting a trackable object (an object derived from `TrackableBase`), got ... If you believe this object should be trackable (i.e. it is part of the TensorFlow Python API and manages state), please open an issue.

  • @gauranshluthra7520
    @gauranshluthra7520 3 months ago

    How did you upload the folder? Colab does not support folder uploads unless they are in zip format.

  • @masialk9156
    @masialk9156 2 years ago +1

    Great video Nick 👏. Can you do a tutorial about sign-language-to-text conversion which takes sign videos, not sign images, as the dataset? I am stuck with it 😴

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Definitely, added video classification to the deep learning list yesterday @Masia!

    • @masialk9156
      @masialk9156 2 years ago

      @Nicholas Renotte thanks

  • @plazon8499
    @plazon8499 2 years ago +1

    Dear Mr. Renotte,
    I'm trying to use this tutorial as a basis to build a classifier over several music genres. The only thing I don't know how to adapt is the last layer of the CNN. How should I modify it so that it outputs, say, 10 different labels? Should the labeling be modified upstream?
    (I want 10 outputs instead of 1 at the last Dense layer, but I can't just modify it directly, so I'm wondering how I should do it.)
    Thanks a lot!

    • @armelayimdji
      @armelayimdji 2 years ago

      Since the time you asked the question, you have probably solved it. However, my guess is that for your multi-class problem you should first have data for the 10 classes (samples of each of 9 music genres, plus an unclassified genre), and the last layer should be a Dense layer with 10 neurons activated by a softmax function (instead of sigmoid), which gives the predicted probability of each class. You also have to change the loss function to one of the 'categorical crossentropy' losses available in tf.keras.

    • @plazon8499
      @plazon8499 2 years ago +1

      @@armelayimdji Hey Armel, thanks a lot for the advice! Obviously I'm done with my project now and I went for something else: instead of taking the spectrograms as input to my CNN, I extracted features from the sound wave and all the physical aspects of the music to get an input vector of features, passed it through an MLP, and it worked well!

    • @farhankhan5951
      @farhankhan5951 a year ago

      I have a similar kind of problem, can you help me?
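The softmax-vs-sigmoid point in Armel's reply can be illustrated with plain numpy (no Keras needed for the intuition): softmax over 10 logits yields one probability per genre, and those probabilities sum to 1. The logit values below are made up.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: class probabilities summing to 1."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical final-layer outputs of a 10-genre classifier
logits = np.array([2.0, 0.5, -1.0, 0.1, 1.2, -0.3, 0.0, 0.8, -2.0, 0.4])
probs = softmax(logits)

print(round(probs.sum(), 6), int(probs.argmax()))  # 1.0 0
# In Keras this corresponds to Dense(10, activation='softmax') with a
# categorical cross-entropy loss, instead of Dense(1, activation='sigmoid').
```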

  • @konradriedel4853
    @konradriedel4853 2 years ago +1

    Hey Nick, I was rebuilding this project of yours on a local machine and was struggling with the spectrogram size and memory allocation for my GPU (RTX 3070). I downscaled to frame_length=160, frame_step=64 with input_shape=(748, 129, 1); so far so good. Now, regarding model.fit, my training time is 50 ms per epoch and the final results tend towards yours... did Colab train on CPU for the sake of memory with "high-scale" spectrograms maybe? Could you run it locally and give some info? I'm very unsure about the training time of 50 ms versus the 3 mins in the vid... thanks anyway for the great tutorial man!!!

    • @GGBetmen
      @GGBetmen 2 years ago

      Hello, can I ask you about these things?

  • @Joker-lr5he
    @Joker-lr5he 2 years ago

    Hey, great video; this is actually the one video I understood regarding audio classification. Also, I have a question: I'm working on a musical instrument classifier which predicts what instrument/instruments is/are used in a wav file. Can you give me some suggestions on the changes I have to implement in my project, as I'm referring to your video throughout its development? It would be a great help.

    • @Joker-lr5he
      @Joker-lr5he 2 years ago

      Also, would it be a better idea to go with the MFCC approach?

    • @BARATHANM-ln4cd
      @BARATHANM-ln4cd a year ago

      @@Joker-lr5he Hello...
      What is the progress now?
      I'm about to create a project closely related to yours,
      and then I just saw your comment pop up.

  • @iPhoneekillerr
    @iPhoneekillerr 7 months ago

    Please help me: why doesn't Colaboratory open this code? How should it be changed so that it can be opened in Colaboratory?