Training a Custom Object Detector with TensorFlow 2.0 Custom Object Detection API 2020

  • Published: Sep 19, 2024

Comments • 370

  • @Armaan_Priyadarshan
    @Armaan_Priyadarshan  3 years ago +5

    I've just posted a guide on TensorFlow Lite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html

    • @brokeunistudent2474
      @brokeunistudent2474 3 years ago

      Hey, I don't understand why my installation is wrong. I literally followed everything you did, but when I tried
      python object_detection\builders\model_builder_tf2_test.py
      i get
      AttributeError: module 'tensorflow' has no attribute 'contrib'

    • @brokeunistudent2474
      @brokeunistudent2474 3 years ago

      But isn't contrib from TensorFlow 1?

  • @AbdulHaseeb-ej4gk
    @AbdulHaseeb-ej4gk 4 years ago +8

    I was following the EdjeElectronics tutorial but ran into so many errors that I switched to your tutorial, and everything worked fine. Thank you so much bro!

  • @thomasb.3657
    @thomasb.3657 2 years ago

    Great video man, you're explaining everything clearly and even anticipate almost all errors we can encounter.
    You have now a new subscriber, your other videos seem interesting too !

  • @NoName-xs3bg
    @NoName-xs3bg 2 years ago

    Hi!! Thank you so much for your video, Armaan. You helped me get a custom TensorFlow model working.

  • @samsam-qi6qo
    @samsam-qi6qo 3 years ago +1

    Hi Armaan, Great work! Finally a TF tutorial that actually works. I have two requests:
    1. Can you create a video about converting to TFLite and actually running the model on a mobile phone? An interesting framework would be React Native, as it works on both iOS and Android.
    2. Show how to train the model on the cloud (e.g. COLAB).

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! I'm working on the TFLite conversion tutorial at the moment. I just recently had a breakthrough. I might take a look at training with Colab, although I feel there might be some other videos on YouTube that already cover this topic in depth.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago +1

      Update: I just posted a video about TFLite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html

    • @samsam-qi6qo
      @samsam-qi6qo 3 years ago

      @Armaan_Priyadarshan great Armaan! I will check it out and let you know what I think. Many thanks!

    • @fadilahduljeet2979
      @fadilahduljeet2979 2 years ago

      @samsam-qi6qo Hello, did it work for you?

  • @kaanakn6711
    @kaanakn6711 2 years ago +2

    Hi
    I tried to train on my data, but I get a message like: Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
    What can I do?

    • @nojax1522
      @nojax1522 2 years ago

      I've got the same problem

    • @mertylmaz4212
      @mertylmaz4212 2 years ago +1

      stuck there too

    • @mohdsaquib380
      @mohdsaquib380 5 months ago

      I am also stuck here. Anyone find the solution?

  • @emmanuelboafo3613
    @emmanuelboafo3613 3 years ago +2

    Hello Armaan, Great work but please can you make a similar tutorial video but this time how to train the model on Google Colab. My pc doesn't support Nvidia graphics driver

  • @siddharthbondarde5448
    @siddharthbondarde5448 2 years ago

    Hi Armaan, after entering the command to train the model I am getting this error:
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
    How can I fix this??

  • @EdjeElectronics
    @EdjeElectronics 4 years ago +4

    Nice work man! 😃

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Thank You! I’ve used a lot of your code in the past, and I love your work with TensorFlow on the Raspberry Pi!

  • @creeda.7621
    @creeda.7621 2 years ago

    @31:53 in the object detector window, my output doesn't have green boxes label. Do you know what the problem is?

  • @foxil4370
    @foxil4370 3 years ago +1

    Hi Armaan, thank you for this amazing tutorial.
    Could you help me?
    I followed your tutorial and everything worked, but then I tried to convert the model into a JS model using the console command from the official guide.
    The model converted fine (without any errors),
    but when I tried to load it in JS, I got an error like "unknown Layer".
    Do you have any advice, or even a tutorial, on how to convert and load a custom model in JS?

  • @anirudhatadapatri6213
    @anirudhatadapatri6213 3 years ago +2

    Hi Armaan. I just tried running model_main_tf2.py on my Anaconda command prompt, after a couple of minutes it says "Windows fatal exception:Access violation". What might be the problem? Would really appreciate if you could lend me a helping hand in this. Thanks in advance

  • @tetianaluhacheva7682
    @tetianaluhacheva7682 3 years ago +2

    Hi, I have many questions about this. I'm surprised at how accurately you describe everything. This is the clearest tutorial I've found on YouTube.
    First of all, I would like to wish everyone a happy New Year, with happiness and health!
    And now to the point.
    I'm building an object detection model with TensorFlow and the SSD MobileNet V2 FPNLite 640x640 architecture, like you. I use a GPU.
    Questions:
    1. You said to stop training when the loss is between 0.15 and 0.20, but you have 100 images. I have 756 images of two objects. Does that mean my loss has to be between 0.10 and 0.05? If I stop the model when the loss equals 0.02, is it overfitting?
    2. With 756 images, do I have to change the number of epochs in the num_epoch field of pipeline.config? I tried 1 and 3 (optimal) earlier but didn't see a difference, and now I use 10. I know that more epochs can improve recognition, but too many epochs can lead to overfitting, so what is the optimal value of num_epoch for MobileNet V2, and does it matter?
    3. I was told that overfitting is impossible with the MobileNet training method that you used. Is that true?
    4. If it is not true, what are the conditions for overfitting, how will I notice it, and which MobileNet settings in pipeline.config affect it?
    I think the answers to these questions will help me solve this problem:
    - The model perfectly recognizes the objects that it should, but it also marks arbitrary areas that are not objects at all. These areas are somewhat similar to each other, but the model should not detect them. I think this is underfitting, and I want to fix it by adjusting the number of epochs.
    I would be glad for any help.
    It's very important for me!
    Thank you for the answers!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! If you're facing accuracy issues, you can definitely try training for longer. I'm not sure the size of the dataset affects the target loss. However, you may have too many images. You could try halving your dataset. If you're not finding success in adjusting epochs, I'd look into lowering the learning rate for more precision. You can notice overfitting if there aren't any detections at all. Good luck! Let me know if you have any more questions.

    • @tetianaluhacheva7682
      @tetianaluhacheva7682 3 years ago

      @Armaan_Priyadarshan Okay! Thank you!!

    • @GarethBolton
      @GarethBolton 3 years ago

      @tetianaluhacheva7682 Hey! What number did you set your num_epochs to, and roughly how long did you train? I have roughly the same number of images as you, and I think I need to let the loss sit between 0.10 and 0.05? Curious to know how you set up your pipeline! :)

    • @tetianaluhacheva7682
      @tetianaluhacheva7682 3 years ago +1

      @GarethBolton I tried many values of num_epochs, from 1 to 20. The model seems to have gotten better with the last value, 20, but sometimes it seems to me that num_epochs does not matter. By 0.10 and 0.05, did you mean a loss between 0.10 and 0.05? If yes, then read above: Armaan Priyadarshan said that the size of the dataset doesn't affect the target loss. I think the size of the dataset affects the time. You need to stay between 0.20 and 0.15. As I understand it, with more images the model takes longer to reach a loss of 0.20. For example, with 100 images you might see loss = 0.20 after 3 hours of training, while with 370 images you might see it after 5 hours (the hours are only examples and do not correspond to real training times). I set the checkpoints, label map, train.record, test.record, a batch size of 4 (because I can't set more), 2 classes, and a different num_epochs each time.
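
      The relationship between steps, batch size, and passes over the data that this thread circles around can be sketched with plain arithmetic. This is only an illustration, assuming one training step consumes one batch; the 756 images and batch size of 4 come from the comments above, while the 5000-step figure is a made-up example.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of training steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


def epochs_seen(num_steps: int, num_images: int, batch_size: int) -> float:
    """Approximate number of passes over the dataset after num_steps steps."""
    return num_steps * batch_size / num_images


if __name__ == "__main__":
    # 756 images and batch size 4 are taken from the thread above.
    print("steps per epoch:", steps_per_epoch(756, 4))
    # 5000 steps is a hypothetical training length, not a recommendation.
    print("epochs after 5000 steps:", round(epochs_seen(5000, 756, 4), 1))
```

      Seen this way, a larger dataset needs more steps for the same number of passes, which matches the observation that more images mean a longer wait before the loss reaches a given value.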

  • @quentinbleneau7753
    @quentinbleneau7753 3 years ago

    Hi, when I run the command to generate the records this error is displayed :
    ...
    File "C:\Users\lynxd\anaconda3\envs\tfod\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 80, in _preread_check
    compat.path_to_str(self.__name), 1024 * 512)
    UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x92 in position 122: invalid start byte
    Can you help me pls?

  • @sathyabalamurugan4095
    @sathyabalamurugan4095 3 years ago +2

    Hi i tried implementing the transfer learning model as shown and ran into the same error that u faced when training. Could you let me know what changes u made in order to run the program? Cheers

  • @gibsonvarghese7932
    @gibsonvarghese7932 2 years ago

    How do I solve AttributeError: module 'tensorflow' has no attribute 'contrib'? I am training with Google Colab and TensorFlow 2.3.0, and the error shows up when I run this line of code: " ! sudo python3 model_builder_tf2_test.py install ". Can you help me?

  • @celalutku
    @celalutku 4 years ago +3

    Hello Armaan,
    This was the best tutorial video about Training a Custom Object Detector with TensorFlow2. I did everything on tutorial and it worked like a charm. Thank you very much. Now, I want to convert my trained model to tflite in order to use on an Android device. Can you make a tutorial video for that step?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Hi! I'm glad it worked for you! I've been looking into the TensorFlow Lite conversion for a bit as well. So far, I've had quite a few issues due to model shape and structure. I'll try to make a video with more details when I figure it out, but you can look at these links for more information on the tflite conversion tool.
      www.tensorflow.org/lite/convert/python_api
      www.tensorflow.org/lite/convert/cmdline?hl=el

    • @celalutku
      @celalutku 4 years ago

      @Armaan_Priyadarshan Hello! I tried those links first, but I guess the converters are problematic right now.
      The converter works with the code below, but the generated tflite model doesn't work on Android. I think the model shape and structure must be defined well. In addition, without the "tf.lite.OpsSet.SELECT_TF_OPS" argument the converter fails. When you use the "tf.lite.OpsSet.SELECT_TF_OPS" argument, Android can't run inference with the model (I guess this is because TensorFlow Lite lacks some of the ops used in TensorFlow 2).
      github.com/tensorflow/tensorflow/issues/42114#issuecomment-671593386
      Moreover, I have found an issue record on Tensorflow's github. Maybe this is the problem and I hope they will fix the issue soon.
      github.com/tensorflow/models/issues/9033
      Please let me know if you have any progress on converting to tflite and running on an Android device.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +2

      celalutku Oh ok. I am not too familiar with TensorFlow on Android, but if I make any progress I’ll let you know.

    • @maxbro1844
      @maxbro1844 4 years ago

      Tutorial on converting to TFLite model please ! :D

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Max Bro Hi! Unfortunately, there have been some errors with TensorFlow Lite conversion in TF2. It ended up messing up the original model and converting it to a single layer, and the resulting tflite model had a few errors as well.😞

  • @user-md1hq4bg6g
    @user-md1hq4bg6g 4 years ago +1

    Hi, I have a problem: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 117: invalid continuation byte
    and
    TypeError: memoryview: a bytes-like object is required, not 'str'?
    HELP!!!!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Эмиль Дюмеев This is a sign that one of the paths provided is invalid. Make sure you have provided the right path to each needed file or directory.
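
      A quick way to act on this advice is to check every path before launching the script. The sketch below uses only the standard library; the example paths are hypothetical placeholders, not the exact ones from the tutorial.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the paths from the input that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Hypothetical paths: replace them with the ones you pass to your command.
    candidates = [
        r"annotations\label_map.pbtxt",
        r"images\train",
        r"images\test",
    ]
    for missing in find_missing_paths(candidates):
        print(f"Missing: {missing}")
```

      Anything printed as missing is worth fixing first, since a wrong path can surface as a decoding error rather than a clear "file not found" message.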

  • @nikolajkatkjr5058
    @nikolajkatkjr5058 3 years ago

    The setup.py that is run by "python -m pip install ." upgrades tensorflow to version 2.4.1 and numpy to 1.19.5.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      This should be fine unless you want to use an older version.

  • @luyaoxu1413
    @luyaoxu1413 3 years ago +1

    Good Job! Excellent tutorials

  • @adelinvaduva7832
    @adelinvaduva7832 3 years ago

    Hi there, I trained my own model, but now I want to train on other things, and when I start training it continues from the step where I stopped last time. For example, last time I stopped at 6000 steps, and now when I start again with different pictures and files it resumes from step 6000. I'm wondering whether that is okay or whether I should clean up or start again with a clean setup. Thank you in advance for your answer!
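
    As far as I understand, training resumes because the script picks up the latest checkpoint it finds in the model directory, so a fresh run needs either a new model directory or an empty one. A minimal sketch of the second option, assuming the usual file naming (an index file called "checkpoint" plus "ckpt-*" files); the backup folder name is a hypothetical choice:

```python
import shutil
from pathlib import Path


def archive_checkpoints(model_dir, backup_name="old_run"):
    """Move checkpoint files out of model_dir into a backup subfolder.

    Returns the names of the files that were moved.
    """
    model_dir = Path(model_dir)
    backup_dir = model_dir / backup_name
    backup_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() materialises the listing before any file is moved.
    for path in sorted(model_dir.iterdir()):
        is_checkpoint = path.name == "checkpoint" or path.name.startswith("ckpt-")
        if path.is_file() and is_checkpoint:
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved
```

    The pipeline.config file stays in place, and the old checkpoints remain available in the backup folder if the earlier run is needed again.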

  • @bhatvishvesh8
    @bhatvishvesh8 4 years ago +1

    This tutorial is brilliant! Thanks a lot

  • @andreacecchetto5748
    @andreacecchetto5748 3 years ago

    Hello, I am trying this on a Raspberry Pi 4 with Raspbian but always get stuck at the workspace preparation step (python -m pip install). The error is something like: tensorflow has no attribute contrib. Any advice? Thanks

  • @sohailali5741
    @sohailali5741 4 years ago

    Very easy tutorial, thank you so much. It would be great if you can make tutorial on how to evaluate this trained model in terms of mAP (mean average precision) or Average precision and Intersection Over Union IoU.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Hi! I'm glad everything worked for you. Unfortunately, I'm not entirely sure how to measure model mAP or IoU. All available model metrics can be monitored with TensorBoard. I don't know if I can make a video guide, but the written tutorial can be found here: github.com/armaanpriyadarshan/Training-a-Custom-TensorFlow-2.x-Object-Detector#monitoring-training-with-tensorboard-optional

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
      To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.
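
      For a model stored under a different folder name, the same command can be assembled programmatically. This is only a sketch: the flags and the default folder name are copied from the comment above, while the helper function itself is a hypothetical convenience.

```python
import sys


def build_eval_command(model_name="my_ssd_mobilenet_v2_fpnlite"):
    """Assemble the evaluation command from the comment above as an argument list."""
    model_dir = "models\\" + model_name
    return [
        sys.executable, "model_main_tf2.py",
        "--pipeline_config_path", model_dir + "\\pipeline.config",
        "--model_dir", model_dir,
        # The comment above adds --checkpoint_dir for the evaluation run.
        "--checkpoint_dir", model_dir,
        "--alsologtostderr",
    ]


if __name__ == "__main__":
    # Printed rather than executed; pass the list to subprocess.run() from
    # within C:\TensorFlow\workspace\training_demo to actually start it.
    print(" ".join(build_eval_command()))
```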

  • @ashishshrivastava9966
    @ashishshrivastava9966 3 years ago

    Hello. I am using TensorFlow 2.4.1 on my system. In your video, when you ran the training command for the first time, you encountered an error named "function call stack: _dummy_computation_fn". I am encountering the same issue. How can I solve it? Please let me know.

  • @mikecooper8142
    @mikecooper8142 3 years ago

    Great video Armaan. Keep up the good work.
    I have a quick question: I want to hide the confidence percentage that is shown on each bounding box. How can I do that?
    Thank you,

  • @james870123
    @james870123 3 years ago

    I got an error here when i run model_main_tf2.py, it says ModuleNotFoundError: No module named 'official.modeling.optimization' what's the problem?

  • @handeozcan1297
    @handeozcan1297 3 years ago

    How long does the "python -m pip install ." command take? Mine is taking too long. Is that a problem? I could never get it to finish.

  • @TheOfficialTester
    @TheOfficialTester 4 years ago +2

    Hello, just wondering whether, and how, changing the number of epochs would help improve the model. Thanks

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Hi! This is a great question! I haven't done much experimenting, so I'm not sure what the answer is. You can test it out and compare evaluation results with different epochs. Evaluating the model is a step I added recently, so it's only in the written guide.

    • @TheOfficialTester
      @TheOfficialTester 4 years ago +1

      @Armaan_Priyadarshan I looked at the pipeline.config file and see that I can change the number of epochs there, but when would it start running the second epoch? With the default of one, the model training doesn't seem to stop, or have I just not run it for long enough? Thanks

  • @jojushaji3010
    @jojushaji3010 3 years ago

    Wow good job buddy

  • @endlessformsmostbeautiful8442
    @endlessformsmostbeautiful8442 3 years ago

    Hey again, I had a friend ask me, if he trains his model on his own images of say 100, and later wants to add say another 100 images, is there a way to train on top, so keep the previous 100 images it learnt on and add the 100 more, or do you have to retrain the whole thing on the 200 images instead. Thank you, this is all for educational purpose at university. Thanks again. P.s If this is possible would you consider making another video? I am sure there are lots of others out there that could benefit too

  • @solongotserendorj8203
    @solongotserendorj8203 4 years ago

    Hello. i am trying to detect beverage bottles on the shelf. How to improve accuracy.... Please give me any tips....

  • @Armaan_Priyadarshan
    @Armaan_Priyadarshan  4 years ago +2

    *Update*
    I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
    To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.

    •  4 years ago

      Bro, the cam opens but it does not detect any objects. Can you help me?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      @ I don't know if that's a camera issue. Make sure to run pip install opencv-python. Does your model work on images or video?

    •  4 years ago

      Armaan Priyadarshan The picture and cam scripts run perfectly, but they don't detect any objects.

    •  4 years ago

      Armaan Priyadarshan Actually, I got an error while training but ignored it. Could that error be causing the problem?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      @ That's probably why it's not detecting. Make sure training is working.

  • @bhawna1997
    @bhawna1997 3 years ago

    Hi, Can you tell how can I get the bounding boxes for each detected class? Like right now I am getting an array with 100 boxes' coordinates if i print the boxes in the visualize_boxes_and_labels_on_image_array function from viz_utils. Can you look into this please??
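
    The detector typically returns a fixed-size batch of candidate boxes (100 here), most of them with very low scores, so the usual approach is to filter by score and then group by class. A minimal sketch on plain Python lists, assuming the boxes, class IDs, and scores have already been pulled out of the detection output; the 0.5 threshold is an illustrative choice.

```python
from collections import defaultdict


def boxes_by_class(boxes, classes, scores, min_score=0.5):
    """Group boxes by class ID, keeping only detections scoring at least min_score."""
    grouped = defaultdict(list)
    for box, class_id, score in zip(boxes, classes, scores):
        if score >= min_score:
            grouped[int(class_id)].append(tuple(box))
    return dict(grouped)


if __name__ == "__main__":
    # Made-up detections: (ymin, xmin, ymax, xmax) in normalized coordinates.
    boxes = [(0.1, 0.1, 0.5, 0.5), (0.2, 0.2, 0.6, 0.6), (0.0, 0.0, 1.0, 1.0)]
    classes = [1, 2, 1]
    scores = [0.9, 0.8, 0.1]
    for class_id, class_boxes in boxes_by_class(boxes, classes, scores).items():
        print(class_id, class_boxes)
```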

  • @jaswanthreddysareddy1682
    @jaswanthreddysareddy1682 3 years ago

    Can you please say which part of the code saves the model and checkpoints? When we run model_main_tf2.py, we get a saved model at the end, so which part of the code does that?

  • @하진석-t6f
    @하진석-t6f 3 years ago

    Thank you for the video. It is very helpful! I have a question about the model config file. In that file there are sections like eval_config and eval_input_reader. Does eval mean validation during training, or testing the model?

  • @husamjalamneh4392
    @husamjalamneh4392 3 years ago +1

    nice work, i wanna add an image generator to get a more accurate model can you help me with that?

  • @vonimanitrasarobidyravelom9612
    @vonimanitrasarobidyravelom9612 3 years ago

    Great job😁!! Can you tell me how many epochs the program uses during training, and can we change that? And what will happen if we force the training to stop (Ctrl+C)?
    Thank you bro!!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! Sorry for the late reply, but you should be able to configure epochs in the training pipeline under eval_input_reader. Stopping training means everything done so far will be retained and you can continue whenever you want I believe.

  • @geraldmusandirire5446
    @geraldmusandirire5446 3 years ago

    Do you know how I can get the coordinates of the bounding boxes in real time from your code? I need to use them for something else. Thank you
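
    The Object Detection API reports boxes as normalized [ymin, xmin, ymax, xmax] values, so getting usable coordinates mostly means scaling by the frame size. A minimal sketch, assuming a single box has already been taken from the detection output; the 640x480 frame size in the example is a made-up value.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * image_width))
    top = int(round(ymin * image_height))
    right = int(round(xmax * image_width))
    bottom = int(round(ymax * image_height))
    return left, top, right, bottom


if __name__ == "__main__":
    # Hypothetical 640x480 webcam frame.
    print(to_pixel_box((0.25, 0.5, 0.75, 1.0), 640, 480))
```

    Calling this once per frame for every box above the score threshold gives coordinates that can be handed to other code in real time.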

  • @yanzhoufu2048
    @yanzhoufu2048 3 years ago

    Great work, thanks!!

  • @mikemiller991
    @mikemiller991 3 years ago

    Quick question for you - in the TF-webcam-opencv.py file - let's assume I have 2 classes and want to show the count for each class...what would you suggest is the approach to take to show object 1: count, object 2: count instead of the default Objects Detected that is currently displayed.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! This is totally feasible. I'd add a separate total count variable for each class and just use a conditional statement to add on to each respective variable. To do so, you can use the class of each detection.
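
      The per-class counter described here can be sketched with collections.Counter. The label names and the 0.5 score threshold are assumptions for illustration; in the webcam script the class IDs and scores would come from the detection output of each frame.

```python
from collections import Counter


def count_per_class(classes, scores, label_names, min_score=0.5):
    """Count detections per class label, ignoring scores below min_score."""
    counts = Counter()
    for class_id, score in zip(classes, scores):
        if score >= min_score:
            # Fall back to a generic name if the ID is missing from label_names.
            name = label_names.get(int(class_id), f"class {int(class_id)}")
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical two-class label map and made-up detections.
    labels = {1: "object 1", 2: "object 2"}
    counts = count_per_class([1, 2, 1, 2], [0.9, 0.8, 0.7, 0.2], labels)
    for name, total in counts.items():
        print(f"{name}: {total}")
```

      The resulting counts can then be drawn on the frame in place of the single "Objects Detected" total.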

  • @edisonvillamer7320
    @edisonvillamer7320 3 years ago +1

    I can't download SSD MobileNet V2 FPNLite 640x640. " This site can't be reached" issue

    • @topherpante6163
      @topherpante6163 3 years ago

      Same problem

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago +1

      Try this link download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz

  • @kangsalep4881
    @kangsalep4881 4 years ago +1

    Hi Armaan, thank you for making this video, it was very easy to follow! Just one question: after I finished training on my custom dataset to detect a couple of sushi types and tried to test it with your TF-image-od.py code, I somehow managed to produce this error:
    2020-10-05 11:51:36.745875: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
    my device is as follows:
    windows 10 64bit
    CUDA 10.1
    cudnn 7.6.5
    tensorflow 2.3.0
    python 3.8
    nvidia gt 740m with 425.31 driver (april 2019, this is the 'latest' driver for my gpu)
    Someone suggested reinstalling TensorFlow, but it still produces this error, and others suggested upgrading to the latest GPU driver, but in my case the latest driver dates from April 2019. Any idea how to get around this issue? Thanks again

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Hi! Unfortunately it seems your GPU is a bit too old to run TensorFlow-GPU. I believe you must have a GeForce GTX 650+ Graphics Card. You can try out TensorFlow on the CPU but it’ll be significantly slower.

    • @kangsalep4881
      @kangsalep4881 4 years ago

      @Armaan_Priyadarshan Thanks for your reply! It works with TensorFlow on the CPU. Maybe it's time to buy a new laptop!

  • @endlessformsmostbeautiful8442
    @endlessformsmostbeautiful8442 3 years ago

    Fantastic Tutorial, is there a way to save the output of the image script and more importantly the video script, would be great to save the video it displays

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago +1

      Hi! At the moment, the best option is just taking a screenshot or screen recording. However, the code for saving output wouldn't be too hard to implement. Here's an example with OpenCV that I believe would be easy to integrate with my code: www.geeksforgeeks.org/saving-a-video-using-opencv/

  • @amarantito
    @amarantito 3 years ago

    Hello Armaan.
    Do you know what loss function the SSD model uses?

  • @barsozkan63
    @barsozkan63 3 years ago

    What was the resolution of the model you used? Did you detect an object from the video, if so what was fps? Thanks for answer :)

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! I used the 640x640 model in the video. Yes, detection worked fine for me. I haven't measured FPS, however, so I can't answer your final question.

  • @bhuvneshsaini93
    @bhuvneshsaini93 3 years ago

    how to save the checkpoint in every step and also save the best checkpoint?

  • @kesinirajesh7986
    @kesinirajesh7986 3 years ago +1

    Hi Armaan. it is an excellent video, thank you for that. While doing the testing of my custom images, i faced an error like "cv2.error: OpenCV(4.3.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-7o5pnn96\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'". please help me to solve this.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago +1

      There are two possible issues. Either your OpenCV installation didn't work, or OpenCV couldn't find the image you provided. In the first case, you can fix it with pip install opencv-python. For the second case, make sure you have provided the path to one, single image in the .jpg or .png file format.

    • @kesinirajesh7986
      @kesinirajesh7986 3 years ago

      @Armaan_Priyadarshan python TF-image-od.py --image img.jpg - I ran this command and it worked. But this is for a single image. Can we run the program on many images in a folder at once?

    • @jubileem.sibandajubbs2175
      @jubileem.sibandajubbs2175 2 years ago

      I had the same problem, and I am still looking for a solution, please help

    • @kesinirajesh7986
      @kesinirajesh7986 2 years ago

      @jubileem.sibandajubbs2175 I edited the TF-image-od.py file and added code to select multiple images, and it worked.

  • @PhilippDominicSiedler
    @PhilippDominicSiedler 4 years ago

    Amazing! Good job.

  • @nothing21797
    @nothing21797 3 years ago

    Great Work! Wonderful. I tried your way and it worked well . I had this warning /error -" with ops with custom gradients. Will likely fail if a gradient is requested". What should I do ? Please help me out

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! I don't think this is an error. It's very common to encounter many such warnings before training starts. You should try proceeding and let me know if you find any other errors while testing.

  • @mikemiller991
    @mikemiller991 3 years ago

    Armaan - how important is consistent image size when training/labelling the images? I've got big 2000x2000 pixels images - should I downscale these first?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! Yes, I would definitely resize the images. Larger image sizes will lead to longer training times. There aren't too many downsides, so I would go for it.
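
      One caveat when downscaling: if the images were already labelled, the box annotations have to be scaled by the same factor. A minimal sketch of the arithmetic, assuming pixel boxes in (xmin, ymin, xmax, ymax) order; the 640-pixel target and the example box are illustrative values, not settings from the video.

```python
def fit_within(width, height, max_side):
    """Return (new_width, new_height, scale) so the longer side is at most max_side."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale), scale


def scale_box(box, scale):
    """Scale a pixel box (xmin, ymin, xmax, ymax) by the same factor as its image."""
    return tuple(round(value * scale) for value in box)


if __name__ == "__main__":
    # A 2000x2000 image, as in the question, shrunk so its longer side is 640.
    new_width, new_height, scale = fit_within(2000, 2000, 640)
    print(new_width, new_height)
    # Hypothetical annotation drawn on the original image.
    print(scale_box((500, 250, 1500, 1000), scale))
```

      Resizing before labelling avoids this bookkeeping entirely.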

    • @mikemiller991
      @mikemiller991 3 years ago

      @Armaan_Priyadarshan Thank you - I did this earlier today and the model trains well. I still get issues every now and then with my GPU blowing up (I have a GeForce GTX 1650): messages saying it can't allocate enough memory, or CUDA memory errors. Not sure if there are other settings we can try to limit how much of the GPU it uses. Also, would you suggest GPU for training and then maybe CPU for inference?

  • @matnem
    @matnem 3 years ago

    Hey man, great tutorial. Do you know why I get the "module 'tensorflow' has no attribute 'contrib'" error? Have you ever encountered this when trying to train the model? I'm using google colab, maybe that's the issue?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      I'm not sure, I think someone else was asking about this error, but I've never experienced it.

  • @hajaralewi8958
    @hajaralewi8958 3 years ago

    hey, love the video! one question: got any tips to test multiple pictures after training?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! I haven't made a script for testing multiple images at once yet. You might be able to edit one of the programs to cycle through a directory of images. The only other thing I could think of is using a Python terminal to load the model and run on multiple images.
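
      Cycling through a directory, as suggested here, could start from a helper like the one below. It only lists the image files and prints one command per image, reusing the python TF-image-od.py --image form quoted elsewhere in this thread; the set of file extensions is an assumption.

```python
import sys
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the image files directly inside folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for image_path in list_images(folder):
        # Printed, not executed: one detection command per image.
        print(f"python TF-image-od.py --image {image_path}")
```

      Running the script once per image reloads the model each time, so for large folders it is likely faster to put the same loop inside the detection script after the model has been loaded.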

  • @thomasgoode9135
    @thomasgoode9135 4 years ago

    hello
    I'm not sure if you are still answering these comments but I dont have the option to run anaconda prompt as admin. I am doing this tutorial in an account with admin controls but it still doesn't come up. Thanks in advance

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      thomas goode That should be fine for Anaconda, but can you download stuff such as CUDA and cuDNN? Those need Admin Access so just asking.

  • @monaibrahim1482
    @monaibrahim1482 3 years ago

    I am training the model now, but the loss is increasing. Is that normal, and will it decrease after a few steps, or does it mean something is wrong?
    After a while I get loss=nan. What should I do?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! This definitely isn't normal. I would try lowering the learning rate maybe.

    • @monaibrahim1482
      @monaibrahim1482 3 years ago

      @Armaan_Priyadarshan Can you please guide me on how to lower it?
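
      The learning rate lives in the optimizer section of pipeline.config and can be edited by hand. As a sketch of what changes, the helper below scales the two rate fields in a config fragment; the fragment and its numbers are illustrative assumptions, so check the values in your own file. Both rates are scaled together so the warmup rate never exceeds the base rate.

```python
import re

# Illustrative fragment only; the values in your own pipeline.config may differ.
SAMPLE_CONFIG = """
optimizer {
  momentum_optimizer {
    learning_rate {
      cosine_decay_learning_rate {
        learning_rate_base: 0.08
        total_steps: 50000
        warmup_learning_rate: 0.026666
        warmup_steps: 1000
      }
    }
  }
}
"""


def scale_learning_rates(config_text, factor):
    """Multiply learning_rate_base and warmup_learning_rate by factor."""
    pattern = r"\b((?:learning_rate_base|warmup_learning_rate):\s*)([0-9.eE+-]+)"

    def _scale(match):
        return f"{match.group(1)}{float(match.group(2)) * factor:g}"

    return re.sub(pattern, _scale, config_text)


if __name__ == "__main__":
    # Halve both rates and show the edited fragment.
    print(scale_learning_rates(SAMPLE_CONFIG, 0.5))
```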

  • @nilsk9015
    @nilsk9015 3 years ago

    Hi. Did you get to run evaluation on your data? For me it always says "Waiting for new checkpoint"

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      For me it worked. Other people also had the same issue but it worked eventually for them. You might want to make sure that you have enough checkpoint files in your training directory.

  • @lisimon6779
    @lisimon6779 4 years ago

    Hello Armaan, nice to meet you.
    I have got some problem in the first part.
    Do you have any suggestions?
    Thank you so much!
    >>> import tensorflow as tf
    Traceback (most recent call last):
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in <module>
        from tensorflow.python._pywrap_tensorflow_internal import *
    ImportError: DLL load failed while importing _pywrap_tensorflow_internal: 找不到指定的模組。 [The specified module could not be found.]
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\__init__.py", line 41, in <module>
        from tensorflow.python.tools import module_util as _module_util
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py", line 40, in <module>
        from tensorflow.python.eager import context
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\context.py", line 35, in <module>
        from tensorflow.python import pywrap_tfe
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tfe.py", line 28, in <module>
        from tensorflow.python import pywrap_tensorflow
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 83, in <module>
        raise ImportError(msg)
    ImportError: Traceback (most recent call last):
      File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in <module>
        from tensorflow.python._pywrap_tensorflow_internal import *
    ImportError: DLL load failed while importing _pywrap_tensorflow_internal: 找不到指定的模組。 [The specified module could not be found.]
    Failed to load the native TensorFlow runtime.
    See www.tensorflow.org/install/errors
    for some common reasons and solutions. Include the entire stack trace
    above this error message when asking for help.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      li simon Do you have Visual Studio 2019 with C++ Build Tools?

    • @lisimon6779
      @lisimon6779 4 года назад +2

      @@Armaan_Priyadarshan
      Dear Armaan,
      After installing Visual Studio 2019 with C++ Build Tools, TensorFlow GPU appears to have installed successfully.
      Thank you very much.
      >>> import tensorflow as tf
      2020-09-15 21:50:01.315223: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
      2020-09-15 21:50:01.322622: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
      >>> print(tf.__version__)
      2.3.0
      I will try the remaining parts this Sunday :)
      Again, thank you!!

  • @NN-bh3wn
    @NN-bh3wn 3 года назад

    Hi Armaan, very easy tutorial. I am getting an error while training the model with this command:
    python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
    and the error is
    tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations\label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect.
    ; no protocol option

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      Hi! In the path to your label map (set in pipeline.config), try using forward slashes instead of backslashes.

    • @NN-bh3wn
      @NN-bh3wn 3 года назад

      Hi Armaan, thank you very, very much for the suggestion. Yes, it worked. Training is now running. I also used forward slashes in the checkpoint, train.record, and test.record paths.
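
      Why the path in that error looks mangled, as a small illustration: backslash sequences such as \t and \a are escape codes (tab and bell), so "\training_demo\annotations" turns into " raining_demo" followed by "nnotations". The Python string below shows the same effect; the assumption here is that the config parser unescapes strings in a similar C-style way. to_forward_slashes is a hypothetical helper name.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\t" becomes a tab and "\a" a bell character,
# so this path no longer contains "training_demo" or "annotations".
broken = "workspace\training_demo\annotations"

# Forward slashes carry no escape meaning, so the path survives intact.
safe = "workspace/training_demo/annotations"


def to_forward_slashes(windows_path):
    """Convert a Windows path to the same path written with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    print(repr(broken))  # shows the embedded \t and \x07 characters
    print(repr(safe))
    # A raw string (r"...") keeps the backslashes literal before conversion.
    print(to_forward_slashes(r"C:\TensorFlow\workspace\training_demo\annotations\label_map.pbtxt"))
```

      Every path in pipeline.config can be rewritten this way, which matches what worked above for the label map, checkpoint, and record files.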

  • @arifalsyahbana3544
    @arifalsyahbana3544 3 года назад

    Great work, brother.
    In this video we are using FPN 640x640.
    Is it important to change the resolution if my source images are 1280x720?
    If yes, I have already done that, but I got:
    ValueError: Dimensions must be equal, but are 46 and 45 for '{{node ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/add}} = AddV2[T=DT_FLOAT](ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/Reshape, ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/projection_2/BiasAdd)' with input shapes: [3,46,80,128], [3,45,80,128].
    can you please help me brother?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад +1

      I believe each dimension has to be the same. If 1280x720 is your resolution adjust the model parameters to 1280x1280 maybe.
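
      A possible reading of the 46 vs 45 in that error, sketched under an assumption: if each pyramid level halves the previous one and rounds up, a 720-pixel side gives 90, 45, and 23 at levels 3, 4, and 5. Upsampling 23 by two gives 46, which cannot be added to the 45-wide map. The function names below are made up for illustration, and the halving rule is an assumption about the feature extractor rather than something taken from its source.

```python
import math


def feature_map_sizes(side, min_level=3, max_level=5):
    """Size of one spatial dimension at each pyramid level.

    Assumes every level halves the previous one and rounds up
    (stride 2 with SAME padding).
    """
    return {
        level: math.ceil(side / 2 ** level)
        for level in range(min_level, max_level + 1)
    }


def upsample_mismatches(side, min_level=3, max_level=5):
    """List (level, expected_size, upsampled_size) where 2x upsampling disagrees."""
    sizes = feature_map_sizes(side, min_level, max_level)
    mismatches = []
    for level in range(max_level, min_level, -1):
        upsampled = 2 * sizes[level]
        if upsampled != sizes[level - 1]:
            mismatches.append((level - 1, sizes[level - 1], upsampled))
    return mismatches


if __name__ == "__main__":
    for side in (640, 720, 736, 1280):
        print(side, feature_map_sizes(side), upsample_mismatches(side))
```

      Under this assumption, sides that are multiples of 32 (640, 736, 1280) pass the check, while 720 does not, so equal height and width may not be strictly required.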

  • @JordanDarbyshire
    @JordanDarbyshire 4 года назад

    Hi Armaan, I wanted to thank you so much for making this video, you literally saved my life. I'm doing an object detection project for my Data Science course. Can you please help me get metrics for evaluating my test set, without having to look at each in OpenCV individually? I need concrete numbers such as Average Precision and Average Recall to compare it to other models I've tried. I can't find how to do this anywhere.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      Jordan Darbyshire Hi! I’m glad the video helped you out! As to finding precision, recall, and mAP, I’m sorry to say I haven’t found anything yet. I did a bit of research, and I do have a few ideas. The first is training a different model. The Efficientdet and other TF2 specific models should have the ability to log mAP during training if the argument -alsologtostderr is given while running the training script. The other option I found was using matplotlib for which you can find more info here stackoverflow.com/questions/46274514/precision-recall-curve-in-tensorflow-object-detection-api. I’m sorry for not being able to give a better answer, as there isn’t much documentation or information online.

    • @JordanDarbyshire
      @JordanDarbyshire 4 года назад

      @@Armaan_Priyadarshan Ok thanks a lot Armaan. I know I'm surprised there isn't much information about this. I'll let you know if I find out anything helpful. Keep up the good work, look forward to seeing more videos from you!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Hi!
      I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
      To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.

  • @muhammadwaqasali4155
    @muhammadwaqasali4155 4 года назад

    @Armaan Priyadarshan Yes, I am using the SSD MobileNet V2 FPNLite 640x640 model. And yes, I also reduced the batch size to 4, but with no result. My GPU has 2 GB of memory; does that really matter in this regard?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Yes, you should try reducing batch_size to 2, or maybe 1 if that doesn't work

    • @edisonvillamer7320
      @edisonvillamer7320 3 года назад

      How did you download the SSD MobileNet V2 FPNLite 640x640 model? "This site can’t be reached" error.

  • @ramanabbaspour8311
    @ramanabbaspour8311 4 года назад +1

    Hi Armaan. Thanks a lot for your amazing video. I was working on this for many days until I saw your video and found the solution. I was wondering how I can export the number of pixels for the box it draws on the pills for image or video? Just like the xml file we used to annotate images but for the output. Thanks again

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Top Games Hmmmm.... This is definitely something new to try. Unfortunately, I haven’t attempted anything similar, so I’m unsure how to help. I’m sure there might be a way using OpenCV as it has a vast number of functions. If you do find anything though, feel free to share and I’d be happy to take a look.

    • @ramanabbaspour8311
      @ramanabbaspour8311 4 года назад

      @@Armaan_Priyadarshan Thanks man. I figured out that "visualize_boxes_and_labels_on_image_array" will return only the image with the box already attached to it. So I modified it and now it returns the coordinates as well as the image.

    • @ramanabbaspour8311
      @ramanabbaspour8311 4 года назад

      @@Armaan_Priyadarshan Another question, if it's possible. If the "ssd_mobilenet_v2_fpnlite" model is already trained, why did we train it again? Does this training process configure "ssd_mobilenet_v2_fpnlite" so it can detect our object better, or is it going to create a new model? Thanks a lot

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      Top Games Hi! We mostly use the pre-trained model for the pipeline.config file as well as certain checkpoint files needed for training. The my_ssd_mobilenet_v2_fpnlite folder is then used as a training directory so we can export the model later on. And I’m glad you found a solution. If you take a look at the TF-image-object-counting.py script, OpenCV might be a bit easier to work with than viz_utils as there’s a bit more flexibility. For example formatting and printing the xmin, ymin, xmax, and ymax variables from inside the loop can provide box coordinates if that’s what you wanted to do. Thanks for sharing too! You never know, someone else might be trying the same thing!
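
      For anyone after the same thing, a minimal sketch of turning one detection box into pixel coordinates. It relies on the Object Detection API reporting boxes as normalized [ymin, xmin, ymax, xmax] values between 0 and 1; box_to_pixels is a hypothetical helper name, not a function from the scripts.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Returns (xmin, ymin, xmax, ymax) as integers.
    """
    ymin, xmin, ymax, xmax = box
    return (
        int(xmin * image_width),
        int(ymin * image_height),
        int(xmax * image_width),
        int(ymax * image_height),
    )


if __name__ == "__main__":
    # Example box covering the right half of a 640x480 image, middle rows.
    xmin, ymin, xmax, ymax = box_to_pixels((0.25, 0.5, 0.75, 1.0), 640, 480)
    print(xmin, ymin, xmax, ymax)
    print("width:", xmax - xmin, "height:", ymax - ymin)
```

      The width and height in pixels follow from subtracting the corners, and the four values can be written to a file in whatever format is needed.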

  • @papabudzz
    @papabudzz 4 года назад

    How do I validate the accuracy? Also, how do I show the mAP graph, test loss, and validation loss?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Budz Altar You can follow the step regarding TensorBoard and view various model metrics. The written guide on GitHub has better instructions as there was an editing error around that time in the video

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
      To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.

  • @陈周-z1s
    @陈周-z1s 3 года назад

    How do I use the script export_inference_graph.py with TensorFlow 2.3?

  • @kbh24758
    @kbh24758 4 года назад

    So what do the two folders "test" and "train" in
    C:\TensorFlow\workspace\training_demo\images
    mean? Does TensorFlow learn from the things in "train"?
    Then what about "test"?
    Is it just for testing whether our script works?
    Basically, what I want to ask is: what should I place in these two folders?
    By the way, you did a great tutorial! Nice job!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      Hi! The test and train folders contain the images and labels of your test and train set. TensorFlow uses Supervised Learning, so these are required for your dataset. You should place your images in these folders, 20% of them in the test folder and the other 80% in the train folder. After you've labelled your images, you can generate RECORD files and train the model.

    • @kbh24758
      @kbh24758 4 года назад

      @@Armaan_Priyadarshan
      Can the images in these two folders be repeated?
      Or do they have to be totally different?
      Thanks for the reply. :>

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      @@kbh24758 You should definitely put different images in each folder. Once you've prepared your dataset, put 4/5 of the images in the training folder and the rest inside the test folder.
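
      The 80/20 split described here can be sketched in a few lines. split_dataset is an illustrative name and the file names are made up; the sketch only decides which file goes where and leaves the copying (image plus its XML label) to you.

```python
import random


def split_dataset(filenames, test_fraction=0.2, seed=0):
    """Shuffle filenames and return (train_files, test_files) with no overlap."""
    shuffled = sorted(filenames)           # fixed starting order
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical image names standing in for a real dataset.
    images = [f"img_{i}.jpg" for i in range(10)]
    train_files, test_files = split_dataset(images)
    print(len(train_files), "train:", train_files)
    print(len(test_files), "test:", test_files)
```

      Each image should land in exactly one folder, and its label file should go to the same folder.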

    • @kbh24758
      @kbh24758 4 года назад

      @@Armaan_Priyadarshan
      Sorry, I still have one problem ><
      Why can't I switch to another photo to detect?
      This always appears:
      File "TF-image-od.py", line 96, in <module>
      image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
      cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
      Why?
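
      A likely cause of that assertion, offered as a guess: cv2.imread returns None instead of raising when it cannot read a file, and cv2.cvtColor then fails with !_src.empty(). Checking the path before loading makes the real problem visible. check_image_path is a hypothetical helper, and the file name in the example is made up.

```python
from pathlib import Path


def check_image_path(image_path):
    """Return the path as a string if the file exists, else raise a clear error."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Image not found: {path.resolve()} (check the name, extension, and folder)"
        )
    return str(path)


if __name__ == "__main__":
    try:
        # Hypothetical file name; replace it with the photo you want to test.
        image_file = check_image_path("images/test/my_photo.jpg")
        print("OK to pass to cv2.imread:", image_file)
    except FileNotFoundError as error:
        print(error)
```

      If the file exists and the error persists, the image may be in a format OpenCV cannot decode.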

  • @danil-old-web
    @danil-old-web 4 года назад

    Do you have a tutorial about classifying images, rather than detecting objects in them?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Секреты Успеха No, I haven't. I might look into that. But from what I've seen so far, there's a command line tool which makes the process quite a bit simpler than this one.

  • @vasut6047
    @vasut6047 3 года назад

    Thanks, boss

  • @AlejandroJimenez-mb3vl
    @AlejandroJimenez-mb3vl 3 года назад +1

    hey nice work ... i'm having the exact same error that you had at time 23:41

    • @markpretorius6854
      @markpretorius6854 3 года назад

      I also get the same error at that exact same point. Please explain how you fixed this error

  • @topherpante6163
    @topherpante6163 3 года назад

    Sir. I can't download the SSD MobileNet V2 FPNLite 640×640. This site can't be reached issue

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад +1

      Try download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz

    • @topherpante6163
      @topherpante6163 3 года назад

      Thank you Sir

  • @lionking3608
    @lionking3608 4 года назад

    Hello, how did you fix the error when training @ 23:40?
    Errors may have originated from an input operation.
    Input Source operations connected to node ssd_mobile_net_v2fpn_keras_feature_extractor/model/Conv1/Conv2D:
    fn_1 (defined at C:\Users\a1812\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py:367)

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      It seems the object_detection API didn't build correctly. Maybe try downgrading TensorFlow to version 2.2.0 or repeating the steps to build object_detection.

    • @lionking3608
      @lionking3608 4 года назад

      Armaan Priyadarshan Do you mean pip install tensorflow-gpu==2.2.0 ? Because I already tried this but not working

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад +1

      @@lionking3608 Are you using TensorFlow CPU or GPU? You aren't able to install?

    • @lionking3608
      @lionking3608 3 года назад

      @@Armaan_Priyadarshan I used TensorFlow GPU 2.2.0 with 960M graphics card and it finally worked thanks bro

    • @lionking3608
      @lionking3608 3 года назад

      ​@@Armaan_Priyadarshan But when I ran TF-image-od.py script. The output image is zoomed in a lot and doesn't fit within my laptop screen. how do I fix this?

  • @KaranSharma-in5ug
    @KaranSharma-in5ug 4 года назад

    How do I find the test accuracy and train accuracy?
    Where is the test.record file used?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Karan Sharma Hi! The test.record file is used in the training pipeline.config, it mostly contains information about the labels and bounding boxes given for your images. For model accuracy, you can use the step with TensorBoard to find model metrics. If you want numbers on your pre-trained model, you can check the TensorFlow Model Zoo which has speed and accuracy listed for each pre-trained model.

  • @arielcairoli2473
    @arielcairoli2473 3 года назад

    It is a great tutorial. I could train my model fine but I used the ssd_mobilenet_v2_fpnlite_320x320.
    I have been trying to convert it to TFLite but I couldn't.
    Could you make a tutorial on TFLite conversion?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      I've just posted a video on TensorFlow Lite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html

  • @woonie3134
    @woonie3134 3 года назад

    I'm trying to detect food dishes from pictures and recommend a recipe after it detects what the picture contains. Any idea on how to approach this? I think your video will help with training the model to detect the food dishes, but how will I recommend a recipe for that dish? Do you think using a recipe API will work?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад +1

      Hi! I'm pretty sure this is possible. A recipe API can work too. You can check the label of the food detected and declare a recipe based on that.

    • @woonie3134
      @woonie3134 3 года назад

      @@Armaan_Priyadarshan Tysm!!!!!!!!

    • @woonie3134
      @woonie3134 3 года назад

      @@Armaan_Priyadarshan Hi, could you give me some insight on how to check the label after classifying an image? I am not understanding how to do that. If you could make a tutorial on this, it would be so helpful

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад +1

      @@woonie3134 Hi! To do so, you can write some code in the for loop I defined on line 121 in the TF-image-object-counting.py script. This line right here determines the object name of every detection: object_name = category_index[int(classes[i])]['name']. With a simple API, you can probably preserve this information to determine the label of every detection in the frame.

    • @woonie3134
      @woonie3134 3 года назад

      @@Armaan_Priyadarshan I have decided I will be storing the recipes in a dictionary and returning one as the result for the classified image. I chose to do this mainly for simplicity and to focus more on classifying food. However, I have never worked with dictionaries before, so how could I look up a result from one? If you have any knowledge on this I would appreciate it if you shared it 😁
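
      A dictionary lookup by label can be as small as the sketch below. The dish names and recipe text are placeholders, and recipe_for is a made-up helper; the label would come from the object_name value mentioned earlier in this thread.

```python
# Placeholder recipes keyed by the label the detector reports.
RECIPES = {
    "pizza": "Dough, tomato sauce, mozzarella; bake at high heat.",
    "salad": "Lettuce, tomato, cucumber; toss with dressing.",
}


def recipe_for(label):
    """Look up a recipe for a detected label, ignoring letter case."""
    return RECIPES.get(label.lower(), "No recipe found for this dish.")


if __name__ == "__main__":
    for detected in ("Pizza", "salad", "sushi"):
        print(detected, "->", recipe_for(detected))
```

      Using .get with a default avoids a KeyError when the detector reports a label that has no recipe yet.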

  • @bawz97
    @bawz97 4 года назад

    Is there any way to run this on my mac? If not, is there an equal alternative that is similar? Thank you

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 года назад

      Mr.Sloth you should be able to just adjust the commands and paths

    • @bawz97
      @bawz97 4 года назад

      @@Armaan_Priyadarshan Well, the problem is the CUDA download. There is only a limited version for Mac

  • @GamingIn30s
    @GamingIn30s 3 года назад

    Hey bro, I have done everything exactly the same, but at the end the image is displayed without any detections. Is there anything I need to change? Kindly reply.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      You should try lowering the confidence threshold to see if any detections come up. This can be done by specifying it in the command (eg. python TF-image-od.py --threshold 0.3)
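
      What the threshold does, in a simplified sketch: every detection carries a score between 0 and 1, and only those at or above the threshold are drawn. filter_detections and the sample values are illustrative and not taken from TF-image-od.py.

```python
def filter_detections(scores, classes, boxes, threshold=0.5):
    """Keep (class, score, box) triples whose score is at least threshold."""
    return [
        (cls, score, box)
        for score, cls, box in zip(scores, classes, boxes)
        if score >= threshold
    ]


if __name__ == "__main__":
    # Made-up detector output: three candidate boxes with their scores.
    scores = [0.92, 0.41, 0.18]
    classes = ["pill", "pill", "pill"]
    boxes = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.7, 0.7), (0.2, 0.6, 0.4, 0.8)]
    for threshold in (0.5, 0.3):
        kept = filter_detections(scores, classes, boxes, threshold)
        print(threshold, "->", len(kept), "detections")
```

      If nothing appears even at a low threshold, the scores themselves are low, which points at the model or the data instead of the display step.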

    • @GamingIn30s
      @GamingIn30s 3 года назад

      @@Armaan_Priyadarshan Hi, thank you so much for your reply. I can't appreciate it enough. I tried reducing the threshold to 0.3, but it didn't work. When I changed it to 0.2, it showed some detections, but they were not good enough. I stopped training after the loss reached 0.205; it couldn't get any lower even though I waited for 6000 steps. What do you think the problem is?
      Again thank you so much:-)

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      @@GamingIn30s That seems to be a fine loss, so I'm not sure that's the problem. By any chance, could you tell me what you're trying to detect?

    • @GamingIn30s
      @GamingIn30s 3 года назад

      @@Armaan_Priyadarshan I am not using any new data set . I am first trying to duplicate your results. I am using your dataset and trying to get same results as you. I trained again and i got the loss 0.19 but still there are no proper detections.

  • @user-db2md6wv3c
    @user-db2md6wv3c 3 года назад

    Hello, my computer has an Nvidia GeForce RTX 3080 GPU. I installed CUDA 10.1 and cuDNN 7.6.5, and I am using the Faster R-CNN ResNet50 V1 640x640 network. When I train the network, the loss is nan. How can I solve this problem?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      Hi! The last time I tried using ResNet models there were quite a few errors, so I haven't revisited them yet. Secondly, you'll want to update your CUDA and cuDNN versions to match the newest version of TensorFlow. Here's a link to the tested configurations: www.tensorflow.org/install/source#gpu

    • @user-db2md6wv3c
      @user-db2md6wv3c 3 года назад

      @@Armaan_Priyadarshan After referring to this page (www.tensorflow.org/install/source#gpu), I installed CUDA 10.1 and cuDNN 7.5.0 (there is no cuDNN 7.4) on Windows 10. I get an error:
      2021-01-19 10:06:57.282583: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
      2021-01-19 10:08:00.512682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
      2021-01-19 10:18:18.632749: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
      2021-01-19 10:18:18.638607: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
      Traceback (most recent call last):
      File "model_main_tf2.py", line 114, in <module>
      tf.compat.v1.app.run()
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
      _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run
      _run_main(main, args)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
      sys.exit(main(argv))
      File "model_main_tf2.py", line 105, in main
      model_lib_v2.train_loop(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop
      load_fine_tune_checkpoint(detection_model,
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint
      strategy.run(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1211, in run
      return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2585, in call_for_each_replica
      return self._call_for_each_replica(fn, args, kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 584, in _call_for_each_replica
      return mirrored_run.call_for_each_replica(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 78, in call_for_each_replica
      return wrapped(args, kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
      result = self._call(*args, **kwds)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
      return self._stateless_fn(*args, **kwds)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
      return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
      return self._call_flat(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
      return self._build_call_outputs(self._inference_function.call(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
      outputs = execute.execute(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
      tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
      tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
      (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
      [[Loss/RPNLoss/BalancedPositiveNegativeSampler/Cast_8/_302]]
      (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
      0 successful operations.
      0 derived errors ignored. [Op:__inference__dummy_computation_fn_21950]
      Errors may have originated from an input operation.
      Input Source operations connected to node functional_1/conv1_conv/Conv2D:
      functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
      Input Source operations connected to node functional_1/conv1_conv/Conv2D:
      functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
      Function call stack:
      _dummy_computation_fn -> _dummy_computation_fn

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      @@user-db2md6wv3c Hi! I believe your versions are outdated. TensorFlow 2 versions can't use cuDNN 7.5. The newest version supports CUDA 11.0 and cuDNN 8.0. The download links are: developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork and
      developer.nvidia.com/compute/machine-learning/cudnn/secure/8.0.4/11.0_20200923/cudnn-11.0-windows-x64-v8.0.4.30.zip
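
      The rule quoted in the log above ("major and minor version needs to match or have higher minor version") can be written out as a small check. This is only a sketch of that sentence, not TensorFlow's own logic, and cudnn_runtime_ok is a made-up name.

```python
def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the log: same major version, minor not lower.

    Both arguments are version strings such as "7.6.5".
    """
    loaded_major, loaded_minor = (int(part) for part in loaded.split(".")[:2])
    compiled_major, compiled_minor = (int(part) for part in compiled.split(".")[:2])
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


if __name__ == "__main__":
    # Versions taken from the log message in this thread.
    print("7.5.0 vs 7.6.5:", cudnn_runtime_ok("7.5.0", "7.6.5"))
    print("7.6.5 vs 7.6.5:", cudnn_runtime_ok("7.6.5", "7.6.5"))
```

      By that rule the cuDNN 7.5.0 install fails against a build compiled with 7.6.5, which is what the log reports.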

    • @user-db2md6wv3c
      @user-db2md6wv3c 3 года назад

      @@Armaan_Priyadarshan I followed your instructions to install CUDA and cuDNN.
      But I get an error:
      See `tf.nn.softmax_cross_entropy_with_logits_v2`.
      2021-01-20 15:11:34.676290: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
      2021-01-20 15:11:35.241074: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
      2021-01-20 15:11:35.255702: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
      2021-01-20 15:11:36.015093: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
      2021-01-20 15:11:36.017068: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
      2021-01-20 15:11:36.021009: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
      2021-01-20 15:11:36.022768: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
      Traceback (most recent call last):
      File "model_main_tf2.py", line 114, in <module>
      tf.compat.v1.app.run()
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
      _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run
      _run_main(main, args)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
      sys.exit(main(argv))
      File "model_main_tf2.py", line 105, in main
      model_lib_v2.train_loop(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop
      load_fine_tune_checkpoint(detection_model,
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint
      strategy.run(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1259, in run
      return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2730, in call_for_each_replica
      return self._call_for_each_replica(fn, args, kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 628, in _call_for_each_replica
      return mirrored_run.call_for_each_replica(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 75, in call_for_each_replica
      return wrapped(args, kwargs)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
      result = self._call(*args, **kwds)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
      return self._stateless_fn(*args, **kwds)
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
      return graph_function._call_flat(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
      return self._build_call_outputs(self._inference_function.call(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
      outputs = execute.execute(
      File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
      tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
      tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
      (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
      [[Loss/ToAbsoluteCoordinates/Assert/AssertGuard/pivot_f/_83/_55]]
      (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
      [[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
      0 successful operations.
      0 derived errors ignored. [Op:__inference__dummy_computation_fn_16270]
      Errors may have originated from an input operation.
      Input Source operations connected to node model/conv1_conv/Conv2D:
      model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
      Input Source operations connected to node model/conv1_conv/Conv2D:
      model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
      Function call stack:
      _dummy_computation_fn -> _dummy_computation_fn

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 года назад

      @@user-db2md6wv3c Have you tried downloading the most recent driver version for your graphics card?

  • @陈周-z1s
    @陈周-z1s 3 years ago

    Hello guys, I followed your tutorial and training the model works fine. But when I use the TensorFlow C API to load the model, it reports an error at if (TF_GetCode(status) != TF_OK). Why is that?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      I'm not entirely sure. I haven't done much work with the C API, so I don't have the answer to that.

    • @陈周-z1s
      @陈周-z1s 3 years ago

      @Armaan_Priyadarshan python export_tflite_ssd_graph.py
      What is its second parameter? Is it pipeline.config?
      Why do I get the error 'utf-8' codec can't decode byte 0xbe in position 140: invalid start byte?

    • @陈周-z1s
      @陈周-z1s 3 years ago

      @Armaan_Priyadarshan I want to convert the .pb model to TFLite.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @@陈周-z1s Hi! I'll try to put out a tutorial when Raspberry Pi Support is released as I feel it might do a bit better. Here's a guide you can follow for now. github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md
      You can add me on discord with the tag in the description if you want more detailed instructions.

  • @bigyansubedi3386
    @bigyansubedi3386 3 years ago

    Hey, great tutorial. How do I make model_main_tf2.py run evaluation only?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from the training_demo directory. Downgrade NumPy to 1.17.3 if you get NumPy errors.

    • @bigyansubedi3386
      @bigyansubedi3386 3 years ago

      @Armaan_Priyadarshan Oh, thank you very much for the solution, but how am I supposed to know it is running in evaluation mode?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @@bigyansubedi3386 It will provide various metrics instead of showing training step logs.

    • @bigyansubedi3386
      @bigyansubedi3386 3 years ago

      Hey, thank you for the advice. I evaluated the model and found some images that were predicted correctly, but there is no scalar or graph for precision or accuracy. I don't understand why.

  • @muhammadwaqasali4155
    @muhammadwaqasali4155 4 years ago

    Hello! @Armaan Priyadarshan As per your suggestion I downgraded TensorFlow from version 2.3.0 to 2.2.0, and now I am getting a new error:
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
    (0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
    [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    [[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    (1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
    [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
    Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
    0 successful operations.
    0 derived errors ignored. [Op:__inference__dist_train_step_44094]
    Function call stack:
    _dist_train_step -> _dist_train_step
    Do you have any idea, how to resolve it?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Muhammad Waqas Ali May I ask which pre-trained model you’re using? And can you try reducing the batch_size in your pipeline.config to 4?

    • @muhammadwaqasali4155
      @muhammadwaqasali4155 4 years ago

      @Armaan_Priyadarshan I am using the SSD MobileNet V2 FPNLite 640x640 model. I also reduced the batch size to 4, but it made no difference. My GPU has 2 GB of memory; does that really matter here?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Muhammad Waqas Ali Oh yes, that might be a bit of an issue. Try reducing batch_size to 2, and if that doesn’t work, reduce it to 1 and retrain. With only 2 GB of GPU memory, larger batch sizes will likely keep causing OOM errors.
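For readers hitting the same OOM error: the batch size discussed above lives in the train_config block of pipeline.config. The following is a minimal Python sketch of lowering it programmatically. It assumes the common layout where train_config contains a batch_size field; set_train_batch_size and SAMPLE are hypothetical names used for illustration and are not part of the tutorial's scripts.

```python
import re


def set_train_batch_size(config_text: str, new_size: int) -> str:
    """Return config_text with the first batch_size after 'train_config' replaced.

    Assumption: train_config holds its own batch_size field, as in the
    pipeline.config files shipped with the TF2 model zoo.
    """
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    # Replace only the first match so a later eval_config batch_size is untouched.
    tail, count = re.subn(r"batch_size:\s*\d+", f"batch_size: {new_size}", tail, count=1)
    if count == 0:
        raise ValueError("no batch_size field found after train_config")
    return head + tail


# Hypothetical, heavily shortened config used only to demonstrate the helper.
SAMPLE = """\
train_config {
  batch_size: 6
  num_steps: 25000
}
eval_config {
  batch_size: 1
}
"""

if __name__ == "__main__":
    print(set_train_batch_size(SAMPLE, 2))
```

To apply it to a real file, read pipeline.config as text, pass the contents through the helper, and write the result back. Keeping a backup copy first is a sensible precaution.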

  • @ismailfarhan7603
    @ismailfarhan7603 3 years ago

    can i put the price for the object model?

  • @Armaan_Priyadarshan
    @Armaan_Priyadarshan  4 years ago +2

    *Update*
    I've just made a tutorial for the Raspberry Pi as quite a few people had questions about it: ruclips.net/video/PWMQQAL0PCM/видео.html

  • @carolinelee3398
    @carolinelee3398 4 years ago

    Hi Armaan, I actually got the same error as you did at 23:44. Can you tell me how you fixed this on your computer?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Caroline Lee Hi! I think I got this error while recording as training couldn’t run simultaneously with OBS due to the limited system resources. I rebuilt the Object Detection API and fixed all the Python package versions just to make sure this was the case. Some others fixed the error by downgrading the TensorFlow version to 2.2.0.

    • @carolinelee3398
      @carolinelee3398 4 years ago

      @Armaan_Priyadarshan Thank you! That was it. My system only has 3 GB of graphics memory, and even after setting the batch size to 3 I still had to reduce it to 2. Closing some background programs helped too.

    • @carolinelee3398
      @carolinelee3398 4 years ago

      @Armaan_Priyadarshan I got another error when I was trying to export the inference graph: TypeError: 'NoneType' object is not iterable.
      The third error occurred when I ran the python TF-image-od.py file:
      Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last):
      File "TF-image-od.py", line 96, in <module>
      image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
      cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
      Any ideas on how to fix these errors? I'm trying to train a model to detect potholes in the road, so I'm not using your test files (but when I did use yours it worked perfectly). I can't figure out why Python is still looking for the old files since it was replaced with the pothole ones. I did the pip install open-cv-python command and I still had the same error.
      Also if we're testing the model out on the images under the "test" folder, why do we still need to label them?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Caroline Lee Hi! I’m pretty sure I’ve figured out how to fix your issue. The path to the image provided to the program seems to still be the default. You can provide the path to your image with the --image argument or you can edit the default path from within the program. To answer your second question, I believe TensorFlow uses the labelled test set to evaluate the model and report its loss and metrics. Because the model never trains on the test images, they are a decent way to measure accuracy, and the labels are needed to compare the predictions against the correct boxes.

    • @carolinelee3398
      @carolinelee3398 4 years ago

      @Armaan_Priyadarshan Thanks for answering!
      The pothole images were placed in the same place as the pill images. I'm not sure what you mean by the image argument. I looked at the Python file and it looks like you already have that there (args.images), so I copied the path of the test folder and pasted it where the script asks for the image paths. It threw the same error. How would I go about editing the default path in the program?
      Update: I think I know what you mean now. I edited the Python file a bit and ran this in Anaconda: TF-image-od.py --image images/test/pothole.jpg, but I still got hit with this error:
      Running inference for {'image': 'images/test/pothole.jpg'}... Traceback (most recent call last):
      File "TF-image-od.py", line 98, in <module>
      image = cv2.imread(IMAGE_PATHS)
      SystemError: returned NULL without setting an error
      Update 2: Alright, never mind, I think I got it. Do you have any tips on how I should train the model? I'm thinking I should go from non-busy backgrounds to busy backgrounds, because right now it's identifying the sky as the pothole lol

  • @anuragdalal6908
    @anuragdalal6908 4 years ago

    I'm getting this error: "tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 1: expected [6,640,640,3] but got [6,640,640,1]." Probably because my images are grayscale. Can you say what I have to change for using grayscale image? Thanks.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      I'm not sure if this is because your images are grayscale; it might be a model format or model shape error. Can you give me some more information on which step you get this error at? Are you maybe using a different model or pipeline?

    • @anuragdalal6908
      @anuragdalal6908 4 years ago

      @Armaan_Priyadarshan No, I am using the same models. It happens in this step:
      python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config

    • @anuragdalal6908
      @anuragdalal6908 4 years ago

      and I have bmp files not jpeg or png

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      @anuragdalal6908 Hi, sorry for the late reply, but my comments weren't posting for some reason. Unfortunately, the model expects three-channel RGB images, while grayscale images have a single channel. I'm not entirely sure how to help with this issue, but I found some links with more info: stackoverflow.com/questions/48744666/tensorflow-object-detection-api-1-channel-image & github.com/tensorflow/models/issues/3369
      It would also be great if you could convert the bmp images to jpg with an online converter, just in case it matters.

    • @anuragdalal6908
      @anuragdalal6908 4 years ago

      @Armaan_Priyadarshan I will try to do the image conversion using OpenCV then.

  • @muhammadwaqasali4155
    @muhammadwaqasali4155 4 years ago

    Hello, Firstly thank you so much for this tutorial. Well I am getting some error on running " python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config "
    and the error is that, " tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse " . Will you please tell me how to sort it out? Thank You.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Hi! I'm unsure of this error. Did every previous step work? Can you try downgrading to TensorFlow 2.2.0 and re-trying?

    • @muhammadwaqasali4155
      @muhammadwaqasali4155 4 years ago +1

      @@Armaan_Priyadarshan Thank you so much for your reply. let me try it on 2.2.0, will inform you.

    • @muhammadwaqasali4155
      @muhammadwaqasali4155 4 years ago +1

      Every previous steps worked fine.

    • @muhammadwaqasali4155
      @muhammadwaqasali4155 4 years ago

      Hello! @Armaan_Priyadarshan As per your suggestion I downgraded TensorFlow from version 2.3.0 to 2.2.0. Now I am getting a new error:
      tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
      (0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
      [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
      Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
      [[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]]
      Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
      (1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
      [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
      Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
      0 successful operations.
      0 derived errors ignored. [Op:__inference__dist_train_step_44094]
      Function call stack:
      _dist_train_step -> _dist_train_step
      Do you have any idea, how to resolve it?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      @@muhammadwaqasali4155 That seems to be an OOM error. Sorry for the late reply, but you should try reducing your batch_size to 2 or maybe 1 and see what happens.

  • @josecarloscouso3466
    @josecarloscouso3466 3 years ago

    Hello, thank you for your awesome tutorial. I just encountered an error at the end when trying to do the training.
    tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node EfficientDet-D0/model/stem_conv2d/Conv2D (defined at C:\Python36\lib\site-packages\object_detection\models\ssd_efficientnet_bifpn_feature_extractor.py:220) ]] [Op:__inference__dummy_computation_fn_24408]
    Errors may have originated from an input operation.
    Input Source operations connected to node EfficientDet-D0/model/stem_conv2d/Conv2D:
    args_1 (defined at C:\Python36\lib\site-packages\object_detection\model_lib_v2.py:372)
    Function call stack:
    _dummy_computation_fn
    Do you happen to know what might be causing this error? I've run two different models, and still get the same error.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      I've only encountered this error when my system was overtaxed (like when recording). To fix it, the only methods that worked for me were either restarting my system or reinstalling TensorFlow.

    • @ashishshrivastava9966
      @ashishshrivastava9966 3 years ago

      @Armaan_Priyadarshan After restarting my system, reinstalling TensorFlow, and even setting the batch size to 1, I got the message "Allocator GPU_0_bfc ran out of memory trying to allocate 2.26GiB". The script continues to run, but no loss information appears in the command prompt. How can I resolve this?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @@ashishshrivastava9966 Hi! In this case, you'll want to lower the batch size in the pipeline. First try 4, then if it still doesn't work, you can try 2, and then 1.

    • @ashishshrivastava9966
      @ashishshrivastava9966 3 years ago

      @@Armaan_Priyadarshan In my case I had kept batch size equal to one only....but I am getting the above error.

  • @kabak2abak625
    @kabak2abak625 4 years ago

    Hi, I am struggling to continue until testing the installation (python ...\model_builder_tf2_test.py). I always receive the following message:
    from tensorflow.python.keras.layers.preprocessing import image_preprocessing as image_ops
    ImportError: cannot import name 'image_preprocessing'
    TF version is 2.10
    Keras is 2.3.1
    I installed Microsoft Build Tools 2019, CUDA 10.1, and cuDNN 7.6. Unfortunately, I had to downgrade TensorFlow because of persistent error messages such as DLL load failures.
    I have been at this for 3 days.
    What am I doing wrong here?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      kabak2abak Do you have Microsoft Visual Studio 2019 with the C++ Build Tools? I believe that is a dependency for TensorFlow 2.3.0. And just a question: are you using the CPU or GPU version?

    • @kabak2abak625
      @kabak2abak625 4 years ago

      @Armaan_Priyadarshan I followed all of your instructions, including downloading CUDA, cuDNN, and the C++ Build Tools. When I try to import tensorflow, I keep getting 'ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.'
      I installed it with pip install tensorflow-gpu
      Tensorflow GPU: 2.3.0
      Python: 3.8.3

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      kabak2abak Can you try starting the process over? I usually find that this fixes most errors. Following the written guide on GitHub might help. Try uninstalling and reinstalling MSVC, Visual Studio, Anaconda, and CUDA & cuDNN. I’ve seen this error before, and it’s usually just a Visual Studio error. Make sure everything is on your PATH as well, because this can be important!
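Being "on your PATH" means that the folder containing a tool or DLL is listed in the PATH environment variable, so Windows can find it without a full path. Here is a small Python sketch for checking that. The helper name missing_from_path and the example CUDA directories are assumptions for illustration; substitute the folders from your own installation.

```python
import os


def missing_from_path(required_dirs, path_value=None):
    """Return the entries of required_dirs that are not listed in PATH.

    path_value defaults to the current PATH environment variable.
    Comparison ignores case and trailing separators where the OS does.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        return os.path.normcase(os.path.normpath(p))

    on_path = {norm(p) for p in path_value.split(os.pathsep) if p}
    return [d for d in required_dirs if norm(d) not in on_path]


if __name__ == "__main__":
    # Assumed default install locations for CUDA 10.1; adjust to your machine.
    wanted = [
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin",
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp",
    ]
    for directory in missing_from_path(wanted):
        print("Not on PATH:", directory)
```

Anything the script prints can be added under Environment Variables in the Windows system settings. A new terminal has to be opened afterwards for the change to take effect.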

    • @kabak2abak625
      @kabak2abak625 4 years ago

      @Armaan_Priyadarshan Well, I will try. I have been working on the installation for more than 3 days now without success. You mean I should uninstall MSVC, Visual Studio, Anaconda, and CUDA & cuDNN, then reinstall them?
      What do you mean by everything being on my PATH?
      I have set the environment variables for all of them. The variables are PythonPath and Path. Should I remove them as well before uninstalling?

    • @kabak2abak625
      @kabak2abak625 4 years ago

      @Armaan_Priyadarshan I have reinstalled everything and still cannot get it working, unfortunately. Did you install NVIDIA drivers?

  • @ernestoyounes2946
    @ernestoyounes2946 4 years ago

    yo how do i fix this, it took like forever man...
    INFO:tensorflow:Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite
    I1010 16:50:29.150513 23484 checkpoint_utils.py:125] Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Which step did this occur at? If you have finished training, could you let me know how many checkpoint files you have in your models\my_ssd_mobilenet_v2_fpnlite directory.

    • @ernestoyounes2946
      @ernestoyounes2946 4 years ago

      @@Armaan_Priyadarshan never mind man... it worked, i cancelled the waiting for checkpoint then i basically just do the next step

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Ernesto Younes That’s great!

  • @solongotserendorj8203
    @solongotserendorj8203 4 years ago

    Thank you

  • @NevaEdizUfuk
    @NevaEdizUfuk 1 year ago

    Hello Armaan, I am getting a failure like this:
    (tf2) C:\Tensorflow\workspace\training_demo>python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
    Traceback (most recent call last):
    File "__init__.pxd", line 942, in numpy.import_array
    RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "C:\Tensorflow\workspace\training_demo\model_main_tf2.py", line 32, in <module>
    from object_detection import model_lib_v2
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\model_lib_v2.py", line 29, in <module>
    from object_detection import eval_util
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\eval_util.py", line 35, in <module>
    from object_detection.metrics import coco_evaluation
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_evaluation.py", line 25, in <module>
    from object_detection.metrics import coco_tools
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_tools.py", line 51, in <module>
    from pycocotools import coco
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\coco.py", line 56, in <module>
    from . import mask as maskUtils
    File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\mask.py", line 3, in <module>
    import pycocotools._mask as _mask
    File "pycocotools\_mask.pyx", line 23, in init pycocotools._mask
    File "__init__.pxd", line 944, in numpy.import_array
    ImportError: numpy.core.multiarray failed to import

  • @wgf91045
    @wgf91045 4 years ago

    Hi, thanks for sharing this video. I have finished training a model with my own data. However, when I tried to test the finished model, I got the following error:
    --------------------------------------------------------------------------------------------------------------
    Loading model...Done! Took 24.696925401687622 seconds
    Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last):
    File "TF-image-od.py", line 96, in <module>
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
    ----------------------------------------------------------------------------------------------------------------
    I've tried to install opencv-python with version 4.3.0.38 but still got this error.
    Could you please help me fix it? Thanks a lot.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      I don't think you need to install that version. I'm pretty sure you didn't provide the right path to the image. The program seems to be running inference on the default image, which you won't have unless you're using my dataset. You can use the --image argument to specify the path to the image, or you can edit the default value in the program.
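The "!_src.empty()" assertion typically appears because cv2.imread returns None when it cannot read the file, and the failure only surfaces later in cv2.cvtColor. A minimal sketch of failing earlier with a clearer message follows. The function names are hypothetical and this is not the code in TF-image-od.py; OpenCV is imported lazily so the path check can run on its own.

```python
import os


def check_image_path(path: str) -> str:
    """Return path unchanged if it points at an existing file, else raise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"image not found: {path!r} (current directory is {os.getcwd()!r})"
        )
    return path


def load_image_rgb(path: str):
    """Read an image with OpenCV and return it in RGB channel order."""
    import cv2  # lazy import: only needed when an image is actually loaded

    image = cv2.imread(check_image_path(path))
    if image is None:
        # cv2.imread signals an unreadable or corrupt file by returning None.
        raise ValueError(f"OpenCV could not decode {path!r}")
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
```

Relative paths such as images/test/pothole.jpg are resolved against the directory the command is run from, which is why the error message above includes the current directory.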

    • @wgf91045
      @wgf91045 4 years ago

      @@Armaan_Priyadarshan Thanks for your reply. Yeah, you're right. I just forgot to specify my directory to the image. The rest of the program ran perfectly as expected. This tutorial is very helpful.

  • @majstor76
    @majstor76 4 years ago

    I'm working through the tutorial now. I got stuck for a bit at the git step for the TensorFlow models. It might be good to add a link to git, because for us non-programmers it can be a big problem to figure out why the command doesn't work.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Majstorsky Majstor Hi! This is a good idea. You can install git from here: git-scm.com/download/win. Then just open up a new terminal and everything should be good to go.

    • @majstor76
      @majstor76 4 years ago

      @Armaan_Priyadarshan Thanks, I've already finished and am currently training the model. I tried lots of tutorials but this one is the best; it's quite an achievement to make a tutorial for such a heavy field that an average person can follow. BTW, I tried to start the TensorBoard server but the command isn't recognized. Also, is there an option for deploying this object detector in a simple way, like an exe app that you just install and use?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Majstorsky Majstor Thanks! The step with TensorBoard is optional and not necessary for training. If you are interested, you should follow the written tutorial, as there was an editing error in the video at that timestamp. As for making a simple application, that might take a bit of work. TensorFlow is popular on Android and iOS, but deploying there is a more complicated process.

    • @majstor76
      @majstor76 4 years ago

      @Armaan_Priyadarshan Here is my first try: ruclips.net/video/C1eR6todLto/видео.html
      Is there a way to set a detection cutoff, for example to show only detections above some value, like 90%? And is there a simple GUI where I can load files and start scripts? There will be lots of typing if I want to check 100 files.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Majstorsky Majstor There is an argument called threshold that you can pass when running the script to specify your minimum confidence. For example, python TF-image-od.py -threshold 0.9 will only show detections above 90% confidence. As for a GUI, that will take a little editing of the program. However, you can make it perform inference on a directory of images.
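To illustrate what a confidence cutoff does, here is a small self-contained Python sketch. It is not the code from TF-image-od.py; filter_detections and the sample values are hypothetical, and the only assumption is that detections arrive as parallel lists of boxes, scores, and class ids.

```python
def filter_detections(boxes, scores, classes, min_score=0.9):
    """Keep only detections whose score is at least min_score.

    boxes, scores, and classes are parallel sequences: entry i of each
    describes the same detection.
    """
    keep = [i for i, score in enumerate(scores) if score >= min_score]
    return (
        [boxes[i] for i in keep],
        [scores[i] for i in keep],
        [classes[i] for i in keep],
    )


if __name__ == "__main__":
    # Hypothetical detections: (ymin, xmin, ymax, xmax) in normalized coordinates.
    boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.2, 0.6, 0.3, 0.8)]
    scores = [0.97, 0.42, 0.91]
    classes = [1, 1, 2]
    kept_boxes, kept_scores, kept_classes = filter_detections(boxes, scores, classes)
    print(kept_scores)
```

Raising min_score trades missed objects for fewer false positives, so it is worth trying a few values on images the model has not seen.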

  • @krishdesai1097
    @krishdesai1097 3 years ago

    Can you use this on a Macbook Pro 2020?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! You can, but you'll have to use the TensorFlow CPU variant. This means training will take longer and you won't be able to use your computer during training. It's probably possible though.

    • @krishdesai1097
      @krishdesai1097 3 years ago

      @@Armaan_Priyadarshan How long would the training take if I use the CPU variant? Do you go over the CPU variant in the video?

    • @krishdesai1097
      @krishdesai1097 3 years ago

      I also do not have an NVIDIA chip

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @@krishdesai1097 I attempted it on my old laptop, and it took a few days. Currently, my estimate would be at least one day.

  • @muhammadhammadsaleem8569
    @muhammadhammadsaleem8569 3 years ago

    Impressive work @Armaan Priyadarshan ... Please help me with this error, which I get after running python object_detection\builders\model_builder_tf2_test.py
    "tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse"
    Please note that I have reinstalled tensorflow-gpu==2.2.0 but still getting this error.
    Thanks.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      This error occurs when you have a background process running. For example, if you are running two python programs that use TensorFlow at the same time. I'd recommend just restarting your machine and retrying.

    • @muhammadhammadsaleem8569
      @muhammadhammadsaleem8569 3 years ago

      @@Armaan_Priyadarshan Thanks for the reply.. I restarted the system, but still getting the error.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @@muhammadhammadsaleem8569 Hi! Were you able to compile your protos correctly?

    • @muhammadhammadsaleem8569
      @muhammadhammadsaleem8569 3 years ago

      @@Armaan_Priyadarshan Yeh I compiled protos and checked the .proto files are displayed in the protos folder.

  • @user-db2md6wv3c
    @user-db2md6wv3c 4 years ago

    Does it support displaying detections from a webcam?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Not yet, but I will be adding that in a few hours.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      I've added webcam support if you were still wondering

    • @shubham2190
      @shubham2190 4 years ago

      @@Armaan_Priyadarshan where can i find the code for implementing object detection through webcam?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago +1

      Shubham The script should be located in the workspace/training_demo directory. It’s called TF-webcam-opencv.py.

    • @shubham2190
      @shubham2190 4 years ago

      Thanks 😊@@Armaan_Priyadarshan you are really a saviour 😊

  • @СоглаевПавел
    @СоглаевПавел 3 years ago +1

    Nice work!

  • @handletruck
    @handletruck 4 years ago

    Hi Armaan, I was following the video and training with a dataset that I labelled myself. Everything went smoothly until I got stuck at the 'Generating Training Data' stage. I changed the label map to two ids, went to the "scripts/preprocessing" folder, and ran "python generate_tfrecord.py". The error "IndexError: child index out of range" appeared. I'm following the tutorial exactly as in the video; I only replaced the images in the image folder with the ones I labelled. What's the problem?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Hmmmmm.... Can you raise an issue in the Github Repository? It would be great if you can send your labelmap.pbtxt, tell me where you saved the XML documents, as well as what command you ran for generate_tfrecord.py. The images should be labelled with LabelImg with a rectangular box drawn as well as a label provided. If you could send maybe one or 2 of your XML documents it would be great, because I could see if you labelled the images right. You should also make sure that you deleted the given train.record and test.record along with all the images and XML Documents of my Pill Detector.
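One way to check whether the images were labelled correctly is to scan the LabelImg XML files before generating the records. The sketch below assumes Pascal VOC style annotations, where each object element has a name and a bndbox with xmin, ymin, xmax, and ymax. It also rests on the unverified assumption that an "IndexError: child index out of range" can come from a script indexing into an object element that lacks those children. find_label_problems and SAMPLE are hypothetical names, not part of generate_tfrecord.py.

```python
import xml.etree.ElementTree as ET


def find_label_problems(xml_text: str):
    """Return a list of human-readable problems found in one annotation file."""
    problems = []
    root = ET.fromstring(xml_text)
    objects = root.findall("object")
    if not objects:
        problems.append("no <object> elements (image has no labelled boxes)")
    for index, obj in enumerate(objects):
        if not (obj.findtext("name") or "").strip():
            problems.append(f"object {index}: missing <name>")
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"object {index}: missing <bndbox>")
            continue
        try:
            # findtext returns None for a missing tag, which float() rejects.
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
        except (TypeError, ValueError):
            problems.append(f"object {index}: incomplete or non-numeric <bndbox>")
            continue
        if xmin >= xmax or ymin >= ymax:
            problems.append(f"object {index}: box has zero or negative size")
    return problems


# Hypothetical annotation with one complete object and one without a box.
SAMPLE = """
<annotation>
  <object><name>pill</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>pill</name></object>
</annotation>
"""

if __name__ == "__main__":
    for problem in find_label_problems(SAMPLE):
        print(problem)
```

Running the helper over every XML file in the train and test image folders narrows a failure down to specific files, which can then be reopened in LabelImg.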

    • @handletruck
      @handletruck 4 years ago

      @Armaan_Priyadarshan When I run the command "python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config", I get an error like this:
      pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
      tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [[0.611111104][0.758454144][0.724637747]...] [[0.205314025][0.432367146][0.533816457]...]
      [[{{node Assert_1/AssertGuard/else/_35/Assert}}]]
      [[MultiDeviceIteratorGetNextFromShard]]
      [[RemoteCall]]
      what's the matter of this error? please help

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      김응찬 I am not sure of this error. Can you try to configure training with the Pill Model and given data to see if it starts. It might be a pipeline issue or a data issue, just start training, complete the example, and find the nature of the issue.

  • @hoyt2603
    @hoyt2603 3 years ago

    What should I do if I want to train with Faster R-CNN?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      The process for altering the pipeline should be a bit different but there's not much to do.

    • @hoyt2603
      @hoyt2603 3 years ago

      @Armaan_Priyadarshan I fixed the pipeline. Do I need to fix anything else?

    • @hoyt2603
      @hoyt2603 3 years ago

      I fix the pipeline as you instructed

    • @hoyt2603
      @hoyt2603 3 years ago

      I can run SSD MobileNet V2 FPNLite 640x640,
      I also run SSD MobileNet V1 FPN 640x640
      but I can't run SSD ResNet50 V1 FPN 640x640 (RetinaNet50) and the algorithms below

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      @hoyt2603 Hi! Errors like these come up fairly often. Last time I tried using ResNet I had some issues as well. I'm pretty sure TensorFlow is trying to fix them, but I'm not sure if there's anything you can try.

  • @lisimon6779
    @lisimon6779 4 years ago

    Hello Armaan,
    I am trying to test my installation with:
    python object_detection\builders\model_builder_tf2_test.py
    However, I ran into a problem:
    [ FAILED ] ModelBuilderTF2Test.test_create_ssd_models_from_config
    File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1,512,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Add]
    ----------------------------------------------------------------------
    Ran 20 tests in 93.266s
    FAILED (errors=1, skipped=1)
    What should I do?
    Thank you very much.

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      li simon Hi! How much memory does your GPU have? If you're using an older GPU, you might want to just install the CPU-only version of TensorFlow to avoid out-of-memory (OOM) errors.
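
As a rough check on the error above, the size of the tensor that failed to allocate can be worked out from its shape. This is a small sketch, assuming "type float" in the message means 32-bit floats; the helper name `tensor_bytes` is made up for illustration.

```python
from math import prod

# Bytes per element for a few common dtypes ("float" is taken to mean float32).
DTYPE_BYTES = {"float": 4, "float32": 4, "half": 2, "float16": 2, "double": 8}


def tensor_bytes(shape, dtype="float"):
    """Return the memory needed for one dense tensor of the given shape."""
    return prod(shape) * DTYPE_BYTES[dtype]


# Shape and dtype copied from the ResourceExhaustedError message.
size = tensor_bytes([1, 1, 512, 2048], "float")
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

The tensor itself is only about 4 MiB, which suggests the GPU memory was already nearly full when this allocation was attempted, rather than this one tensor being too large.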

    • @lisimon6779
      @lisimon6779 4 years ago

      @@Armaan_Priyadarshan
      Dear Armaan,
      I am now using the CPU-only version of TensorFlow, and I successfully tested my installation with:
      python object_detection\builders\model_builder_tf2_test.py
      Thank you very much.
      However, I ran into another problem in "Training the Model".
      It takes about 30 minutes to get each new pair of INFO and I0927 log lines. Is that normal?
      I am using your pictures for the tutorial.
      ...
      Use fn_output_signature instead
      INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545
      I0927 17:20:11.610697 6572 model_lib_v2.py:649] Step 100 per-step time 17.890s loss=0.545
      INFO:tensorflow:Step 200 per-step time 18.110s loss=0.372
      I0927 17:50:35.465189 6572 model_lib_v2.py:649] Step 200 per-step time 18.110s loss=0.372
      INFO:tensorflow:Step 300 per-step time 18.313s loss=0.320
      I0927 18:20:46.434653 6572 model_lib_v2.py:649] Step 300 per-step time 18.313s loss=0.320
      INFO:tensorflow:Step 400 per-step time 18.350s loss=0.288
      I0927 18:50:58.178737 6572 model_lib_v2.py:649] Step 400 per-step time 18.350s loss=0.288
      ...

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      li simon Hi! Yes, this is totally normal with the CPU version, as it's much slower than the GPU version. As you can see, your per-step time is quite different from mine in the video, which is why training is slow. Just to make sure, could you clarify which CPU and GPU you're running with?
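
The 30-minute gap follows directly from the log: a line is printed every 100 steps, and 100 steps at roughly 18 seconds each is about 30 minutes. Here is a small sketch that reads the per-step time out of log lines like the ones above and projects a total. The function names and `TOTAL_STEPS` are hypothetical; use whatever step count your own pipeline is configured for.

```python
import re

STEP_LINE = re.compile(r"Step (\d+) per-step time ([\d.]+)s")


def parse_step_times(log_text):
    """Map each logged step number to its per-step time in seconds."""
    # A dict collapses the duplicated INFO / I0927 lines for the same step.
    return {int(step): float(secs) for step, secs in STEP_LINE.findall(log_text)}


def minutes_per_log(seconds_per_step, log_every=100):
    """Minutes between two log lines when one is printed every `log_every` steps."""
    return seconds_per_step * log_every / 60


def hours_for(seconds_per_step, total_steps):
    """Projected wall-clock hours for `total_steps` at a constant step time."""
    return seconds_per_step * total_steps / 3600


log = """INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545
INFO:tensorflow:Step 200 per-step time 18.110s loss=0.372
INFO:tensorflow:Step 300 per-step time 18.313s loss=0.320
INFO:tensorflow:Step 400 per-step time 18.350s loss=0.288"""

times = parse_step_times(log)
average = sum(times.values()) / len(times)
TOTAL_STEPS = 5000  # hypothetical target, not a number from the video

print(f"{minutes_per_log(average):.1f} minutes between log lines")
print(f"{hours_for(average, TOTAL_STEPS):.1f} hours for {TOTAL_STEPS} steps")
```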

    • @lisimon6779
      @lisimon6779 4 years ago

      @@Armaan_Priyadarshan
      Dear Armaan,
      I am not familiar with computers and programming.
      I am using an old laptop with:
      -Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz 2.40 GHz
      -8 GB RAM
      -Intel(R) HD Graphics Family
      -NVIDIA GeForce GT 730M
      I think the CPU version is too slow, so it isn't practical to do object detection with it :(
      Thank you very much!

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      li simon Hi! This makes sense. Unfortunately your GPU is an older mobile card with limited memory, which is most likely why you got OOM errors. Your CPU is also a bit old. You might want to try out Google Colab if your laptop can't take the workload.

  • @kevinchristian2805
    @kevinchristian2805 4 years ago

    Hi, I followed all the code and the tutorial in this video. Can anybody tell me why my val_loss starts at 1.99 and decreases, but stops decreasing once it reaches about 1.1? It goes 1.4, 1.1, 1.3, 1.2, 0.9 and then stays around there. Is an imbalanced dataset a big problem? I would like to convert the model to TensorFlow Lite and detect 5 objects in real time. My dataset is imbalanced, with 1000 car, 1000 human, 1000 bicycle, 1000 motorcycle, and 336 stop sign images, plus 100 test images for each class except the stop sign, which has only about 50. I took this dataset from Google Open Images Dataset V6.
    Can anybody help me?

    • @kevinchristian2805
      @kevinchristian2805 4 years ago

      How do I add an accuracy metric to TensorBoard?

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      Hi! This is totally normal, don't worry about it! Each individual step's loss is unpredictable. It has nothing to do with your dataset. Once your loss is consistently between 1.5 and 2, you can stop the program. Just try waiting and seeing!

    • @kevinchristian2805
      @kevinchristian2805 4 years ago

      @@Armaan_Priyadarshan Sorry, I am new to this topic. This is my thesis / final project and I don't know anything about this hehehe, so please be patient with me hahaha.
      I don't think I understand the part about waiting until the loss is consistently between 1.5 and 2.0. Sorry for my bad English. My TensorBoard scalars don't show overfitting and look normal, but as I said before, the loss is stable at around 1.3. Is that normal?
      How can I tell that my training went well and my model is good?
      Do I need to add an accuracy metric?
      Thank youuuu

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  4 years ago

      KEVIN CHRISTIAN TensorBoard can be ignored for the most part during model training unless you want to visualize the process. If your loss has an odd value, it's most likely just an outlier. As long as it follows a steadily decreasing pattern, you should be fine. In the command prompt, the loss is shown every 100 steps. Once the loss is consistently between 1.5 and 2 for a few hundred steps, your model is done. If it only lands between 1.5 and 2 once in 5 logs, that's a different story, as that would be an outlier. If it consistently goes below 1.3, that would mean the model is a bit more specific to your data, but it should still be fine for testing. Just stop the program, try it out, and assess the results. If you are unhappy, you can always retrain! And feel free to share your issues or ask for help! That's what I'm here for! Good luck with your thesis!
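
One way to tell a steady trend from a single outlier is to smooth the logged losses before judging them. The sketch below uses a plain moving average over the values quoted earlier in this thread; the window size and tolerance are arbitrary assumptions for illustration, not thresholds from the video.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_plateaued(values, window, tolerance):
    """True if the last two smoothed values differ by less than `tolerance`."""
    smoothed = moving_average(values, window)
    return len(smoothed) >= 2 and abs(smoothed[-1] - smoothed[-2]) < tolerance


# Loss values quoted in the question above, one per log line.
losses = [1.4, 1.1, 1.3, 1.2, 0.9]

print([round(v, 2) for v in moving_average(losses, window=3)])
print(has_plateaued(losses, window=3, tolerance=0.1))
```

A single odd value moves the smoothed curve only slightly, so a change that shows up in the averages is more likely to be a real trend.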

    • @kevinchristian2283
      @kevinchristian2283 4 years ago

      @@Armaan_Priyadarshan Thank you so much for the help.
      I added you on Discord, so maybe we can chat, because I don't have any experience with this object detection stuff.

  • @jihadanwar1893
    @jihadanwar1893 3 years ago

    Hi Armaan. Thanks a lot for your amazing video.
    When I try to train the model,
    I get this error. Please help me fix it:
    tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node SGD/NcclAllReduce}} with these attrs: [reduction="sum", shared_name="c1", T=DT_FLOAT, num_devices=2]
    Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU]
    Registered kernels:

    [[SGD/NcclAllReduce]] [Op:__inference__dist_train_step_209125]

    • @Armaan_Priyadarshan
      @Armaan_Priyadarshan  3 years ago

      Hi! Which GPU are you using? If it's a bit older with less memory, you might want to think about changing the batch size or training on your CPU.
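
The `num_devices=2` in the error suggests training was being spread across two GPUs, and NCCL all-reduce kernels are not available in every TensorFlow build (Windows builds in particular). One thing that may be worth trying, as an assumption rather than a confirmed fix, is exposing only one GPU to TensorFlow. The helper name `restrict_visible_gpus` is hypothetical; `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the listed GPU indices to CUDA; an empty list hides all GPUs."""
    value = ",".join(str(i) for i in indices) if indices else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before `import tensorflow` so the setting takes effect.
print(restrict_visible_gpus([0]))
```

Passing an empty list hides every GPU, which gives the CPU-only fallback mentioned in the reply above without reinstalling TensorFlow.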