Hey, I don't understand why my installation is wrong. I literally followed everything you did, but when I ran python object_detection\builders\model_builder_tf2_test.py I got AttributeError: module 'tensorflow' has no attribute 'contrib'
I was following the EdgeElectronics tutorial but ran into so many errors, so I followed your tutorial instead and everything worked fine. Thank you so much bro!
Great video man, you explain everything clearly and even anticipate almost all the errors we can encounter. You now have a new subscriber, your other videos seem interesting too!
Hi Armaan, Great work! Finally a TF tutorial that actually works. I have two requests: 1. Can you create a video about converting to TFLite and actually running it on a mobile phone? An interesting framework would be React Native, as it works on both iOS and Android. 2. Show how to train the model in the cloud (e.g. Colab).
Hi! I'm working on the TFLite conversion tutorial at the moment. I just recently had a breakthrough. I might take a look at training with Colab, although I feel there might be some other videos on YouTube that already cover this topic in depth.
Hi, I tried to train on my data but I get an error message like: Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations. What can I do?
Hello Armaan, Great work but please can you make a similar tutorial video but this time how to train the model on Google Colab. My pc doesn't support Nvidia graphics driver
Hi Armaan, after entering the command to train the model I am getting this error: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject How can I fix this??
Hi Armaan, thank you for this amazing tutorial. Could you help me? I followed your tutorial and everything was OK, but then I converted the model to a JS model using the console command from the official guide. The model converted fine (without any errors), but when I tried to load it in JS I got an error like "unknown Layer". Do you have any advice, or even a tutorial, on how to convert and load a custom model in JS?
Hi Armaan. I just tried running model_main_tf2.py on my Anaconda command prompt, after a couple of minutes it says "Windows fatal exception:Access violation". What might be the problem? Would really appreciate if you could lend me a helping hand in this. Thanks in advance
Hi, I have many questions about this. I'm surprised at how accurately you describe everything. This is the clearest tutorial I've found on YouTube. First of all, I would like to wish everyone a happy New Year, with happiness and health! And now to the point. I'm building an object detection model with TensorFlow and the SSD MobileNet V2 FPNLite 640x640 architecture, like you. I use a GPU. Questions: 1. You said to stop training when the loss is between 0.15 and 0.20. But you have 100 images. I have 756 images of two objects. Does that mean my loss has to be between 0.10 and 0.05? If I stop the model when the loss equals 0.02, is it overfitting? 2. With 756 images, do I have to change the number of epochs in the num_epoch field of pipeline.config? I tried 1 and 3 (optimal) earlier but didn't see a difference, and now I use 10. I know that with more epochs it can recognize better, but too many epochs can lead to overfitting, so what is the optimal value of num_epoch for MobileNet V2, and does it matter? 3. I was told that overfitting is impossible if I create the MobileNet model with the method you used. Is that true? 4. If that is wrong, what are the conditions for overfitting, how will I notice it, and which MobileNet settings in pipeline.config affect it? I think the answers to these questions will help me solve this problem: the model perfectly recognizes the objects it should, but at the same time it detects arbitrary regions that are not objects at all. It outlines areas that are hard to make sense of but somewhat similar to each other, and it should not detect them. I think this is underfitting, and I want to fix it by adjusting the number of epochs. I would be glad for any help. It's very important to me! Thank you for the answers!
Hi! If you're facing accuracy issues, you can definitely try training for longer. I'm not sure the size of the dataset affects the target loss. However, you may have too many images. You could try halving your dataset. If adjusting epochs doesn't help, I'd look into lowering the learning rate for more precision. You can notice overfitting if there aren't any detections at all. Good luck! Let me know if you have any more questions.
@@tetianaluhacheva7682 Hey! What number did you set your num_epochs to and roughly how long did you train? I have roughly the same number of images as you and I think I need to let it sit between 0.10 and 0.05? Curious to know how you set up your pipeline! :)
@@GarethBolton I tried many values of num_epochs, from 1 to 20. The model seems to have gotten better with the last value, 20. But sometimes it seems to me that num_epochs does not matter. OK, 0.10 and 0.05: did you mean a loss between 0.10 and 0.05? If yes, then read above: Armaan Priyadarshan said that the size of the dataset doesn't affect the target loss. I think the size of the dataset affects the training time. You need to sit between 0.20 and 0.15. The way I understand it, with more images the model takes longer to reach a loss of 0.20. For example, with 100 images you might train for 3 hours and see loss = 0.20 after those 3 hours; with 370 images, training takes longer, say 5 hours, and you see loss 0.20 after 5 hours (the hours are only an example and do not correspond to real training times). I set the checkpoints, labelmap, train.record, test.record, a batch size of 4 (because I can't set more), 2 classes, and a different num_epochs each time.
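A rough way to put numbers on the dataset-size point above, as a sketch only: with a fixed batch size, one pass over the dataset (one epoch) takes roughly images / batch size training steps, so more images means more steps, and more time, per pass. The helper name steps_per_epoch is made up for illustration; 756 images and batch size 4 are the figures mentioned in this thread.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of training steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


# Figures from the thread: 756 images (or 100 in the video), batch size 4.
print(steps_per_epoch(756, 4))  # 189
print(steps_per_epoch(100, 4))  # 25
```

So at the same batch size, a 756-image dataset needs about 7.5 times as many steps per pass as a 100-image one.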
Hi, when I run the command to generate the records this error is displayed : ... File "C:\Users\lynxd\anaconda3\envs\tfod\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 80, in _preread _check compat.path_to_str(self.__name), 1024 * 512) UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x92 in position 122: invalid start byte Can you help me pls?
Hi i tried implementing the transfer learning model as shown and ran into the same error that u faced when training. Could you let me know what changes u made in order to run the program? Cheers
How do I solve AttributeError: module 'tensorflow' has no attribute 'contrib'? I am training with Google Colab, TensorFlow 2.3.0. The error shows when I run this line of code: " ! sudo python3 model_builder_tf2_test.py install ". Can you help me?
Hello Armaan, This was the best tutorial video about Training a Custom Object Detector with TensorFlow2. I did everything on tutorial and it worked like a charm. Thank you very much. Now, I want to convert my trained model to tflite in order to use on an Android device. Can you make a tutorial video for that step?
Hi! I'm glad it worked for you! I've been looking in to the TensorFlow Lite Conversion for a bit as well. So far, I've had quite a few issues due to model shape and structure. I'll try to make a video with more details when I figure it out, but you can look at these links for more information on the tflite conversion tool. www.tensorflow.org/lite/convert/python_api www.tensorflow.org/lite/convert/cmdline?hl=el
@@Armaan_Priyadarshan Hello! I tried those links first, but I guess the converters are problematic right now. The converter works with the code below, but the generated tflite model doesn't work on Android. I think the model shape and structure must be defined well. In addition, without the "tf.lite.OpsSet.SELECT_TF_OPS" argument the converter fails. When you use the "tf.lite.OpsSet.SELECT_TF_OPS" argument, Android can't run inference with the model (I guess this is because TensorFlow Lite lacks some of the ops used in TensorFlow 2). github.com/tensorflow/tensorflow/issues/42114#issuecomment-671593386 Moreover, I have found an issue on TensorFlow's GitHub. Maybe this is the problem and I hope they will fix it soon. github.com/tensorflow/models/issues/9033 Please let me know if you make any progress on converting to tflite and running on an Android device.
Max Bro Hi! Unfortunately there have been some errors with TensorFlow Lite conversion with TF2. It ended up messing up the original model and converting it to a single layer, and the resulting tflite model had a few errors as well.😞
Hi, I have a problem: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 117: invalid continuation byte and TypeError: memoryview: a bytes-like object is required, not 'str'? HELP!!!!
Hi there, I started with my own model and trained it, but now I just want to train it for other things, and when I start the training it continues from the step where I stopped the process last time. For example, last time I stopped the process at 6000 steps, and now when I start it again with different pictures and files it starts from step 6000. I'm just wondering if that is okay or if I should do a cleanup or start again with a clean setup. Thank you in advance for your answer!
Hello, I am trying this on a Raspberry Pi 4 with Raspbian, but it always stops at the workspace preparation step (python -m pip install). The error is something like tensorflow has no attribute contrib. Any advice? Thanks
Very easy tutorial, thank you so much. It would be great if you can make tutorial on how to evaluate this trained model in terms of mAP (mean average precision) or Average precision and Intersection Over Union IoU.
Hi! I'm glad everything worked for you. Unfortunately, I'm not entirely sure how to measure model mAP or IoU. All available model metrics can be monitored with TensorBoard. I don't know if I can make a video guide, but the written tutorial can be found here: github.com/armaanpriyadarshan/Training-a-Custom-TensorFlow-2.x-Object-Detector#monitoring-training-with-tensorboard-optional
I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.
Hello. I am using TensorFlow version 2.4.1 on my system. In your video, when you ran the training command for the first time, you encountered an error named "function call stack: _dummy_computation_fn". I am encountering the same issue. How can I solve it? Please let me know.
Great video Armaan. Keep up with good work. I have a real quick question that, I want to disable the confidence percentage is shown on each bounding box. How can I deal with that? Thank you,
Hi! This is a great question! I haven't done much experimenting, so I'm not sure what the answer is. You can test it out and compare evaluation results with different epochs. Evaluating the model is a step I added recently, so it's only in the written guide.
@@Armaan_Priyadarshan i looked at the pipeline.config file and see I can change the number of epochs there but when would it start running the second epoch as with the default of one the model training seem not to stop or have i just not run it for long enough. thanks
Hey again, I had a friend ask me, if he trains his model on his own images of say 100, and later wants to add say another 100 images, is there a way to train on top, so keep the previous 100 images it learnt on and add the 100 more, or do you have to retrain the whole thing on the 200 images instead. Thank you, this is all for educational purpose at university. Thanks again. P.s If this is possible would you consider making another video? I am sure there are lots of others out there that could benefit too
bro, cam is opening but it does not detect any object? can you help me?
Hi, Can you tell how can I get the bounding boxes for each detected class? Like right now I am getting an array with 100 boxes' coordinates if i print the boxes in the visualize_boxes_and_labels_on_image_array function from viz_utils. Can you look into this please??
Can you please say in which part of the code you are save the model and checkpoint - when we run model_main_tf2.py file at the end we will get saved model so which part of code it is doing
Thank you for video. It is very helpful! I have a question about the model config file. In that file, there are some lines like eval_config, and eval_input_reader. Those eval means validation during the train? or test the model?
Great job😁!! Can you tell me how many epochs the program uses during training, and can we change it? And what will happen if we force (Ctrl+C) the training to stop? Thank you bro!!
Hi! Sorry for the late reply, but you should be able to configure epochs in the training pipeline under eval_input_reader. Stopping training means everything done so far will be retained and you can continue whenever you want I believe.
Quick question for you - in the TF-webcam-opencv.py file - let's assume I have 2 classes and want to show the count for each class...what would you suggest is the approach to take to show object 1: count, object 2: count instead of the default Objects Detected that is currently displayed.
Hi! This is totally feasible. I'd add a separate total count variable for each class and just use a conditional statement to add on to each respective variable. To do so, you can use the class of each detection.
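A minimal sketch of that per-class counting idea, assuming the detections are available as parallel lists of class ids and scores and that category_index maps an id to a dict with a 'name' key (as in the category_index[...]['name'] lookup quoted elsewhere in this thread). The function name, the 0.5 threshold and the sample data are made up for illustration, and a Counter stands in for the separate per-class variables.

```python
from collections import Counter


def count_per_class(classes, scores, category_index, threshold=0.5):
    """Count detections per class name, ignoring low-confidence ones."""
    counts = Counter()
    for class_id, score in zip(classes, scores):
        if score >= threshold:
            counts[category_index[int(class_id)]["name"]] += 1
    return counts


# Hypothetical data standing in for one frame's detections.
category_index = {1: {"id": 1, "name": "object 1"}, 2: {"id": 2, "name": "object 2"}}
classes = [1, 2, 1, 2, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]

counts = count_per_class(classes, scores, category_index)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

Each "name: count" string could then be drawn on the frame in place of the single total.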
Hi Armaan, thank you for making this video so easy to follow! Just one question: after I finished training on my custom dataset to detect a couple of sushi types and tried to test it with your TF-image-od.py code, I somehow managed to produce this error: 2020-10-05 11:51:36.745875: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error. My setup is as follows: Windows 10 64-bit, CUDA 10.1, cuDNN 7.6.5, TensorFlow 2.3.0, Python 3.8, NVIDIA GT 740M with the 425.31 driver (April 2019, this is the 'latest' driver for my GPU). Someone suggested reinstalling TensorFlow, but it still produces this error, and someone else suggested upgrading to the latest GPU driver, but in my case the latest driver is as old as April 2019. Any idea how to get around this issue? Thanks again
Hi! Unfortunately it seems your GPU is a bit too old to run TensorFlow-GPU. I believe you must have a GeForce GTX 650+ Graphics Card. You can try out TensorFlow on the CPU but it’ll be significantly slower.
Fantastic Tutorial, is there a way to save the output of the image script and more importantly the video script, would be great to save the video it displays
Hi! At the moment, the best option is just taking a screenshot or screen recording. However, the code for saving output wouldn't be too hard to implement. Here's an example with OpenCV that I believe would be easy to integrate with my code: www.geeksforgeeks.org/saving-a-video-using-opencv/
Hi Armaan. it is an excellent video, thank you for that. While doing the testing of my custom images, i faced an error like "cv2.error: OpenCV(4.3.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-7o5pnn96\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'". please help me to solve this.
There are two possible issues. Either your OpenCV installation didn't work, or OpenCV couldn't find the image you provided. In the first case, you can fix it with pip install opencv-python. For the second case, make sure you have provided the path to one, single image in the .jpg or .png file format.
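A quick sanity check along those lines, as a sketch: cv2.imread returns None instead of raising when it cannot read a file, and that empty image is what later trips the !_src.empty() assertion in cv2.cvtColor, so it can help to check the path before handing it to OpenCV. check_image_path is a hypothetical helper, not part of the tutorial scripts.

```python
from pathlib import Path

# File types mentioned in the reply above.
ALLOWED_SUFFIXES = {".jpg", ".png"}


def check_image_path(path_str):
    """Return None if the path looks usable, otherwise a short problem description."""
    path = Path(path_str)
    if not path.is_file():
        return f"not a file: {path}"
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return f"unexpected file type: {path.suffix}"
    return None


problem = check_image_path("img.jpg")
print(problem if problem else "path looks fine")
```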
@@Armaan_Priyadarshan python TF-image-od.py --image img.jpg - I ran this command and it worked. But this is for a single image; can we run the program on many images in a folder at once?
Great Work! Wonderful. I tried your way and it worked well . I had this warning /error -" with ops with custom gradients. Will likely fail if a gradient is requested". What should I do ? Please help me out
Hi! I don't think this is an error. It's very common to encounter many such warnings before training starts. You should try proceeding and let me know if you find any other errors while testing.
Armaan - how important is consistent image size when training/labelling the images? I've got big 2000x2000 pixels images - should I downscale these first?
Hi! Yes, I would definitely resize the images. Larger image sizes will lead to longer training times. There aren't too many downsides, so I would go for it.
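If it helps, here is a rough sketch of working out the downscaled size while keeping the aspect ratio. The 640 limit is only an assumption borrowed from the 640x640 model used in the video, and scaled_size is a made-up helper name. Keep in mind that if the images are already labelled, the box coordinates in the XML files would need to be scaled by the same factor.

```python
def scaled_size(width, height, max_side=640):
    """Return (new_width, new_height) so the longer side is at most max_side."""
    scale = min(1.0, max_side / max(width, height))  # never upscale small images
    return round(width * scale), round(height * scale)


print(scaled_size(2000, 2000))  # (640, 640)
print(scaled_size(1280, 720))   # (640, 360)
```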
@@Armaan_Priyadarshan Thank you - I did this earlier today and the model trains well. I still am getting some issues every now and then with my gpu blowing up (I have a Geforce 1650 GTX).... I get messages saying it can't allocate enough memory etc. or CUDA memory errors...not sure if there are other settings we can try to limit how much of a GPU it uses . Also would you suggest gpu for training and then maybe cpu for inference?
Hey man, great tutorial. Do you know why I get the "module 'tensorflow' has no attribute 'contrib'" error? Have you ever encountered this when trying to train the model? I'm using google colab, maybe that's the issue?
Hi! I haven't made a script for testing multiple images at once yet. You might be able to edit one of the programs to cycle through a directory of images. The only other thing I could think of is using a Python terminal to load the model and run on multiple images.
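Cycling through a directory could look roughly like this. It is only a sketch: detect_fn is a hypothetical placeholder for whatever the single-image script does with one image path, and the helper names are made up.

```python
from pathlib import Path


def list_images(folder, suffixes=(".jpg", ".png")):
    """Return the image files in a folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def run_on_folder(folder, detect_fn):
    """Call detect_fn on every image in the folder and collect the results by file name."""
    return {image_path.name: detect_fn(image_path) for image_path in list_images(folder)}


# Dummy detect_fn that returns no detections, just to show the call shape.
print(run_on_folder(".", lambda image_path: []))
```

Loading the model once and reusing it inside detect_fn would avoid paying the load time for every image.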
hello I'm not sure if you are still answering these comments but I dont have the option to run anaconda prompt as admin. I am doing this tutorial in an account with admin controls but it still doesn't come up. Thanks in advance
now i am training the model but the loss is increasing is that normal and it is going to decrease after few steps or this means that i have something wrong..? after a while i get loss=nan what should i do
For me it worked. Other people also had the same issue but it worked eventually for them. You might want to make sure that you have enough checkpoint files in your training directory.
Hello Armaan, nice to meet you. I've run into a problem in the first part. Do you have any suggestions? Thank you so much! >>> import tensorflow as tf Traceback (most recent call last): File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "", line 1, in File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\__init__.py", line 41, in from tensorflow.python.tools import module_util as _module_util File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py", line 40, in from tensorflow.python.eager import context File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\context.py", line 35, in from tensorflow.python import pywrap_tfe File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tfe.py", line 28, in from tensorflow.python import pywrap_tensorflow File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 83, in raise ImportError(msg) ImportError: Traceback (most recent call last): File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found. Failed to load the native TensorFlow runtime. See www.tensorflow.org/install/errors for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.
@@Armaan_Priyadarshan Dear Armaan, After installing Visual Studio 2019 with C++ Build Tools, I appear to install the TensorFlow GPU successfully. Thank you very much. >>> import tensorflow as tf 2020-09-15 21:50:01.315223: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found 2020-09-15 21:50:01.322622: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. >>> print(tf.__version__) 2.3.0 I will try the remaining parts in this Sunday :) Again, thank you!!
Hi, Armaan very easy tutorial. I am getting an error while training the model at this command python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config and the error is tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations\label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect. ; no protocol option
Hi, Armaan thank you very very much for the suggestion. Yes, it worked. Training is going on. I also put forward slash in checkpoint, train.record and test.record files too
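For anyone hitting the same thing: the mangled path in the error above ("workspace raining_demonnotations") looks like what happens when the \t and \a in a Windows path get read as escape sequences (tab and bell). pipeline.config isn't Python, but Python string literals behave in a similar way, so this small sketch illustrates the effect and why forward slashes avoid it. The shortened path is only for illustration.

```python
# Backslashes followed by 't' and 'a' are read as tab and bell characters.
bad = "workspace\training_demo\annotations"

# Forward slashes have no special meaning inside the string.
good = "workspace/training_demo/annotations"

# A raw string keeps the backslashes literally, as an alternative.
raw = r"workspace\training_demo\annotations"

print(repr(bad))
print(repr(good))
print(repr(raw))
```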
Great work brother. In this video we are using FPN 640x640; is it important to change the resolution if my source image is 1280x720? If yes, I've already done that, but I got ValueError: Dimensions must be equal, but are 46 and 45 for '{{node ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/add}} = AddV2[T=DT_FLOAT](ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/Reshape, ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/projection_2/BiasAdd)' with input shapes: [3,46,80,128], [3,45,80,128]. Can you please help me brother?
Hi Armaan, I wanted to thank you so much for making this video, you literally saved my life. I'm doing an object detection project for my Data Science course. Can you please help me get metrics for evaluating my test set, without having to look at each in OpenCV individually? I need concrete numbers such as Average Precision and Average Recall to compare it to other models I've tried. I can't find how to do this anywhere.
Jordan Darbyshire Hi! I’m glad the video helped you out! As to finding precision, recall, and mAP, I’m sorry to say I haven’t found anything yet. I did a bit of research, and I do have a few ideas. The first is training a different model. The Efficientdet and other TF2 specific models should have the ability to log mAP during training if the argument -alsologtostderr is given while running the training script. The other option I found was using matplotlib for which you can find more info here stackoverflow.com/questions/46274514/precision-recall-curve-in-tensorflow-object-detection-api. I’m sorry for not being able to give a better answer, as there isn’t much documentation or information online.
@@Armaan_Priyadarshan Ok thanks a lot Armaan. I know I'm surprised there isn't much information about this. I'll let you know if I find out anything helpful. Keep up the good work, look forward to seeing more videos from you!
Hi! I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.
@Armaan Priyadarshan Yes, I am using SSD MobileNet V2 FPNLite 640x640 model. And yes i also reduced the batch size upto 4 but no result. My GPU is of 2GB is that really mater in this regards?
Hi Armaan. Thanks a lot for your amazing video. I was working on this for many days until I saw your video and found the solution. I was wondering how I can export the number of pixels for the box it draws on the pills for image or video? Just like the xml file we used to annotate images but for the output. Thanks again
Top Games Hmmmm.... This is definitely something new to try. Unfortunately, I haven’t attempted anything similar, so I’m unsure how to help. I’m sure there might be a way using OpenCV as it has a vast number of functions. If you do find anything though, feel free to share and I’d be happy to take a look.
@@Armaan_Priyadarshan Thanks man. I figured out that "visualize_boxes_and_labels_on_image_array" will return only the image with the box already attached to it. So I modified it and now it returns the coordinates as well as the image.
@@Armaan_Priyadarshan Another question if its possible. I am wondering if the "ssd_mobilenet_v2_fpnlite" model is trained, why did we train it again? Would this training process configure "ssd_mobilenet_v2_fpnlite" so it can detect our object better? or is it gonna create a new model? Thanks a lot
Top Games Hi! We mostly use the pre-trained model for the pipeline.config file as well as certain checkpoint files needed for training. The my_ssd_mobilenet_v2_fpnlite folder is then used as a training directory so we can export the model later on. And I’m glad you found a solution. If you take a look at the TF-image-object-counting.py script, OpenCV might be a bit easier to work with than viz_utils as there’s a bit more flexibility. For example formatting and printing the xmin, ymin, xmax, and ymax variables from inside the loop can provide box coordinates if that’s what you wanted to do. Thanks for sharing too! You never know, someone else might be trying the same thing!
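To make the box-coordinate idea concrete, here is a small sketch. It assumes the detection boxes come back in the TensorFlow Object Detection API's normalized [ymin, xmin, ymax, xmax] order and converts one box to pixel coordinates; to_pixel_box and the sample numbers are made up for illustration.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = box
    return (
        int(xmin * image_width),
        int(ymin * image_height),
        int(xmax * image_width),
        int(ymax * image_height),
    )


# Hypothetical detection on a 640x480 image.
print(to_pixel_box([0.25, 0.5, 0.75, 1.0], 640, 480))  # (320, 120, 640, 360)
```

Box width and height in pixels are then just xmax - xmin and ymax - ymin.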
Budz Altar You can follow the step regarding TensorBoard and view various model metrics. The written guide on GitHub has better instructions as there was an editing error around that time in the video
So what are the meanings of the two folders "test" and "train" in C:\TensorFlow\workspace\training_demo\images Is it mean that my tensorflow learn what are the things in "train" ? Then what about the "test" ? Is it just for testing our script work or not ? Or basically what I want to ask is "What things should I place in these two folders ?" By the way ! You did a great tutorial ! Nice job!
Hi! The test and train folders contain the images and labels of your test and train set. TensorFlow uses Supervised Learning, so these are required for your dataset. You should place your images in these folders, 20% of them in the test folder and the other 80% in the train folder. After you've labelled your images, you can generate RECORD files and train the model.
@@kbh24758 You should definitely put different images in each folder. Once you've prepared your dataset, put 4/5 of the images in the training folder and rest inside the test folder.
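The 80/20 split described above could be done by hand or with something like the following sketch. The function name, the fixed seed and the file names are made up for illustration; each image's label file has to end up in the same folder as the image.

```python
import random


def split_dataset(filenames, test_fraction=0.2, seed=0):
    """Shuffle the file names and return (train_files, test_files)."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)  # fixed seed so the split is repeatable
    n_test = round(len(files) * test_fraction)
    return files[n_test:], files[:n_test]


# Hypothetical file names standing in for a 100-image dataset.
train, test = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
print(len(train), len(test))  # 80 20
```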
@@Armaan_Priyadarshan Sorry,Still got one problem >< Why I can't change another photo to detect? It always appear this File "TF-image-od.py", line 96, in image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor' Why?
Секреты Успеха No I haven’t I might look in to that. But from what I’ve seen so far, there’s a command line tool which makes the process quite a bit simpler than this one.
Hello, how did you fix the error when training @ 23:40? Errors may have originated from an input operation. Input Source operations connected to node ssd_mobile_net_v2fpn_keras_feature_extractor/model/Conv1/Conv2D: fn_1 (defined at C:\Users\a1812\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py:367)
@@Armaan_Priyadarshan But when I ran TF-image-od.py script. The output image is zoomed in a lot and doesn't fit within my laptop screen. how do I fix this?
Karan Sharma Hi! The test.record file is used in the training pipeline.config, it mostly contains information about the labels and bounding boxes given for your images. For model accuracy, you can use the step with TensorBoard to find model metrics. If you want numbers on your pre-trained model, you can check the TensorFlow Model Zoo which has speed and accuracy listed for each pre-trained model.
It is a great tutorial. I could train my model fine but I used the ssd_mobilenet_v2_fpnlite_320x320. I have been trying to convert to TFLite but I couldn´t. Could you make a tutorial for TFLite conversion.
Im trying to detect food dishes from pictures and recommend a recipe after it detects what the picture contains. Any idea on how to approach this? I think your video will help training the model to detect the food dishes but how will I recommend a recipe for that dish? Do you think using a recipe api will work?
@@Armaan_Priyadarshan Hi, could you give me some insight on how to check the label after classifying an image? I don't understand how to do that. A tutorial on this would be so helpful.
@@woonie3134 Hi! To do so, you can write some code in the for loop I defined on line 121 in the TF-image-object-counting.py script. This line right here determines the object name of every detection: object_name = category_index[int(classes[i])]['name']. With a simple API, you can probably preserve this information to determine the label of every detection in the frame.
@@Armaan_Priyadarshan I have decided I will be storing the recipes in a dictionary and returning it as a result to the image classified. I choose to do this mainly for simplicity and focus more on classifying food. However I have never worked with dictionaries before so how could I possibly evaluate a result from it? If you have any knowledge on this I would aprecciate if you shared it 😁
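A dictionary lookup for this could be as small as the sketch below. The labels and recipe text are placeholders, and recipe_for is a made-up name; the label passed in would be the detected object name, such as the object_name value mentioned earlier in the thread.

```python
# Hypothetical recipe table keyed by the detected label.
RECIPES = {
    "pizza": "placeholder recipe text for pizza",
    "sushi": "placeholder recipe text for sushi",
}


def recipe_for(label):
    """Look up a recipe by label, with a fallback when the label is unknown."""
    return RECIPES.get(label, "No recipe found for this dish")


print(recipe_for("sushi"))
print(recipe_for("ramen"))
```

Using .get with a default avoids a KeyError when the model returns a label that has no entry yet.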
Hey bro, I have done everything exactly the same, but at the end the image is displayed without any detections. Is there anything I need to change? Kindly reply.
You should try lowering the confidence threshold to see if any detections come up. This can be done by specifying it in the command (e.g. python TF-image-od.py --threshold 0.3).
@@Armaan_Priyadarshan Hi, thank you so much for your reply, I can't appreciate it enough. I tried reducing the threshold to 0.3 and it didn't work, but when I changed it to 0.2 it showed some detections, though not good enough. I stopped training after the loss reached 0.205; it couldn't get any lower even though I waited for 6000 steps. What do you think the problem is? Again, thank you so much :-)
@@Armaan_Priyadarshan I am not using any new data set . I am first trying to duplicate your results. I am using your dataset and trying to get same results as you. I trained again and i got the loss 0.19 but still there are no proper detections.
Hello, my computer has an Nvidia GeForce RTX 3080 GPU. I installed CUDA 10.1 and cuDNN 7.6.5, and I'm using the Faster R-CNN ResNet50 V1 640x640 network. When I train the network, the loss is NaN. How can I solve this?
Hi! The last time I tried using ResNet models there were quite a few errors, so I haven't revisited them yet. Secondly, you'll want to update your CUDA and cuDNN versions to the ones tested with the newest version of TensorFlow. Here's a link to the tested configurations: www.tensorflow.org/install/source#gpu
@@Armaan_Priyadarshan When I refered to this web( www.tensorflow.org/install/source#gpu), I installed cuda 10.1 and cudnn 7.5.0(Cudnn doesn't have 7.4) with windows 10. I have an error: 2021-01-19 10:06:57.282583: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll 2021-01-19 10:08:00.512682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll 2021-01-19 10:18:18.632749: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 2021-01-19 10:18:18.638607: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration. 
Traceback (most recent call last): File "model_main_tf2.py", line 114, in tf.compat.v1.app.run() File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run _run_main(main, args) File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main sys.exit(main(argv)) File "model_main_tf2.py", line 105, in main model_lib_v2.train_loop( File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop load_fine_tune_checkpoint(detection_model, File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint strategy.run( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1211, in run return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2585, in call_for_each_replica return self._call_for_each_replica(fn, args, kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 584, in _call_for_each_replica return mirrored_run.call_for_each_replica( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 78, in call_for_each_replica return wrapped(args, kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__ result = self._call(*args, **kwds) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call return self._stateless_fn(*args, **kwds) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__ 
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call return self._call_flat( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat return self._build_call_outputs(self._inference_function.call( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call outputs = execute.execute( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found. (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]] [[Loss/RPNLoss/BalancedPositiveNegativeSampler/Cast_8/_302]] (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]] 0 successful operations. 0 derived errors ignored. [Op:__inference__dummy_computation_fn_21950] Errors may have originated from an input operation. 
Input Source operations connected to node functional_1/conv1_conv/Conv2D: functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49) Input Source operations connected to node functional_1/conv1_conv/Conv2D: functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49) Function call stack: _dummy_computation_fn -> _dummy_computation_fn
@@user-db2md6wv3c Hi! I believe your versions are outdated. TensorFlow 2 versions can't use cuDNN 7.5. The newest version supports CUDA 11.0 and cuDNN 8.0. The download links are: developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork and developer.nvidia.com/compute/machine-learning/cudnn/secure/8.0.4/11.0_20200923/cudnn-11.0-windows-x64-v8.0.4.30.zip
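The rule quoted in the log above (same major version, and a runtime minor version at least as high as the one TensorFlow was compiled with) can be written down as a quick check. This is only a sketch of that one sentence from the log, and the helper names are made up:

```python
def parse_version(text):
    """Turn a version string such as '7.6.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the log: same major, loaded minor >= compiled minor."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# The combination reported in the log: runtime 7.5.0, compiled against 7.6.5.
print(cudnn_runtime_ok("7.5.0", "7.6.5"))
print(cudnn_runtime_ok("7.6.5", "7.6.5"))
```

The first call returns False, which is the mismatch the log complains about; the second returns True.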
@@Armaan_Priyadarshan I refered to your instruction to install cuda and cudnn. But I have an error: See `tf.nn.softmax_cross_entropy_with_logits_v2`. 2021-01-20 15:11:34.676290: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll 2021-01-20 15:11:35.241074: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll 2021-01-20 15:11:35.255702: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll 2021-01-20 15:11:36.015093: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED 2021-01-20 15:11:36.017068: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows 2021-01-20 15:11:36.021009: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED 2021-01-20 15:11:36.022768: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows Traceback (most recent call last): File "model_main_tf2.py", line 114, in tf.compat.v1.app.run() File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run _run_main(main, args) File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main sys.exit(main(argv)) File "model_main_tf2.py", line 105, in main model_lib_v2.train_loop( File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop load_fine_tune_checkpoint(detection_model, File 
"C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint strategy.run( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1259, in run return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2730, in call_for_each_replica return self._call_for_each_replica(fn, args, kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 628, in _call_for_each_replica return mirrored_run.call_for_each_replica( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 75, in call_for_each_replica return wrapped(args, kwargs) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__ result = self._call(*args, **kwds) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call return self._stateless_fn(*args, **kwds) File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__ return graph_function._call_flat( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat return self._build_call_outputs(self._inference_function.call( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call outputs = execute.execute( File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found. (0) Unknown: Failed to get convolution algorithm. 
This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]] [[Loss/ToAbsoluteCoordinates/Assert/AssertGuard/pivot_f/_83/_55]] (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]] 0 successful operations. 0 derived errors ignored. [Op:__inference__dummy_computation_fn_16270] Errors may have originated from an input operation. Input Source operations connected to node model/conv1_conv/Conv2D: model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49) Input Source operations connected to node model/conv1_conv/Conv2D: model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49) Function call stack: _dummy_computation_fn -> _dummy_computation_fn
Hello guys, I followed your tutorial and training the model works fine! But when I use the TensorFlow C API to load the model, it reports an error at if (TF_GetCode(status) != TF_OK). Why is that?
@@陈周-z1s Hi! I'll try to put out a tutorial when Raspberry Pi Support is released as I feel it might do a bit better. Here's a guide you can follow for now. github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md You can add me on discord with the tag in the description if you want more detailed instructions.
Run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from the training demo directory. Downgrade Numpy to 1.17.3 if you get Numpy errors.
Hey, thank you for the advice. I evaluated the model and found some images that were predicted correctly, but there is no scalar or graph for any precision or accuracy. I don't understand why.
Hello! @Armaan Priyadarshan As per your suggestion i downgraded my tensor flow from version 2.3.0 to 2.2.0. and now i am getting the new error, i.e. tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. (1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. 0 successful operations. 0 derived errors ignored. [Op:__inference__dist_train_step_44094] Function call stack: _dist_train_step -> _dist_train_step Do you have any idea, how to resolve it?
@@Armaan_Priyadarshan Yes, I am using SSD MobileNet V2 FPNLite 640x640 model. And yes i also reduced the batch size upto 4 but no result. My GPU is of 2GB is that really mater in this regards?
Muhammad Waqas Ali Oh yes, that might be a bit of an issue. Try reducing the batch size to 2, and if that doesn’t work, reduce it to 1 and try retraining; otherwise you'll likely keep getting OOM errors.
Caroline Lee Hi! I think I got this error while recording as training couldn’t run simultaneously with OBS due to the limited system resources. I rebuilt the Object Detection API and fixed all the Python package versions just to make sure this was the case. Some others fixed the error by downgrading the TensorFlow version to 2.2.0.
@@Armaan_Priyadarshan thank you! That was it. My system only had 3 gb of graphic memory but even though I put it as a a batch size of 3, I still had to reduce it to 2. Also closed some background programs out too which helped
@@Armaan_Priyadarshan I got another error when I was trying to export the inference graph: TypeError: 'NoneType' object is not iterable. The third error occurred when I ran the python TF-image-od.py file: Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last): File "TF-image-od.py", line 96, in image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor' Any ideas on how to fix these errors? I'm trying to train a model to detect potholes in the road, so I'm not using your test files (but when I did use yours it worked perfectly). I can't figure out why Python is still looking for the old files since it was replaced with the pothole ones. I did the pip install open-cv-python command and I still had the same error. Also if we're testing the model out on the images under the "test" folder, why do we still need to label them?
Caroline Lee Hi! I’m pretty sure I’ve figured out how to fix your issue. The path to the image provided to the program seems to still be the default. You can provide the path to your image with the --image argument, or you can edit the default path from within the program. To answer your second question, I believe TensorFlow uses the test set to evaluate the model, so it needs labels to compare the predictions against. Unlike the train set, the test set is never seen by the model during training, so it is still a decent way to measure accuracy and test.
@@Armaan_Priyadarshan Thanks for answering! The pothole images were placed in the same place as the pill images. I'm not sure what you mean by the image argument, I looked at the python file and it looks like you already have that there (args.images) so I created a copied the path of the test folder and pasted into where it asked for the image paths. It threw me the same error. How would I go about editing the default path from the program? Update: I think I know what you mean now. I edited the python file a bit and ran this in anaconda: TF-image-od.py --image images/test/pothole.jpg, but I still got hit with this error: Running inference for {'image': 'images/test/pothole.jpg'}... Traceback (most recent call last): File "TF-image-od.py", line 98, in image = cv2.imread(IMAGE_PATHS) SystemError: returned NULL without setting an error Update 2: Alright, nevermind I think I got it. Do you have any tips on how I should train the model? I'm think I should go from non-busy background > busy background because right now it's identifying the sky as the pothole lol
I'm getting this error: "tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 1: expected [6,640,640,3] but got [6,640,640,1]." Probably because my images are grayscale. Can you say what I have to change for using grayscale image? Thanks.
I'm not sure if this is because your images are grayscale, it might be a model format or model shape error. Can you give me some more information on what step you get this error on. Are you maybe using a different model or pipeline?
@@Armaan_Priyadarshan , No I am using same models. In this step: python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
@@anuragdalal6908 Hi, sorry for the late reply but my comments weren't posting for some reason. Unfortunately the pipeline expects three-channel RGB images, while grayscale images have a single channel, which matches the shapes in your error (expected [6,640,640,3] but got [6,640,640,1]). I'm not entirely sure how to help with this issue, but I found some links with more info: stackoverflow.com/questions/48744666/tensorflow-object-detection-api-1-channel-image & github.com/tensorflow/models/issues/3369 And it would be great if you could convert the bmp images to jpg with an online converter just in case it matters.
Hello, Firstly thank you so much for this tutorial. Well I am getting some error on running " python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config " and the error is that, " tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse " . Will you please tell me how to sort it out? Thank You.
Hello! @@Armaan_Priyadarshan As per your suggestion i downgraded my tensor flow from version 2.3.0 to 2.2.0. no i am getting the new one, i.e. tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found. (0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. (1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc [[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. 0 successful operations. 0 derived errors ignored. [Op:__inference__dist_train_step_44094] Function call stack: _dist_train_step -> _dist_train_step Do you have any idea, how to resolve it?
@@muhammadwaqasali4155 That seems to be an OOM error. Sorry for the late reply, but you should try reducing your batch_size to 2 or maybe 1 and see what happens.
Hello, thank you for your awesome tutorial. I just encountered an error at the end when trying to do the training. tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node EfficientDet-D0/model/stem_conv2d/Conv2D (defined at C:\Python36\lib\site-packages\object_detection\models\ssd_efficientnet_bifpn_feature_extractor.py:220) ]] [Op:__inference__dummy_computation_fn_24408] Errors may have originated from an input operation. Input Source operations connected to node EfficientDet-D0/model/stem_conv2d/Conv2D: args_1 (defined at C:\Python36\lib\site-packages\object_detection\model_lib_v2.py:372) Function call stack: _dummy_computation_fn Do you happen to know what might be causing this error? I've run two different models, and still get the same error.
I've only encountered this error when my system was overtaxed (like when recording). To fix it, the only methods that worked for me were either restarting my system or reinstalling TensorFlow.
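Another workaround that is often reported for "cuDNN failed to initialize" on GPUs with limited memory is letting TensorFlow allocate GPU memory on demand instead of reserving it all up front. This is not something covered in the video, so treat it as a sketch to try: it relies on the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, and the `allow_gpu_memory_growth` helper name is made up.

```python
import os


def allow_gpu_memory_growth(env=os.environ):
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Set this before TensorFlow initializes the GPU; doing it before the
    `import tensorflow` line is the simplest way to be sure.
    """
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


allow_gpu_memory_growth()
# import tensorflow as tf  # import only after the variable is set
```

The same effect can be had by setting the variable in the Anaconda prompt before launching model_main_tf2.py, which avoids editing the script.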
@@Armaan_Priyadarshan After restarting my system and even keeping the batch size as 1 and after reinstalling the tensorflow version I encountered "Allocator GPU_0_bfc ran out of memory trying to allocate 2.26GiB" message. Even though script continues to run but nothing appears in the command prompt console with regards to loss information. How to resolve it. Please tell....
@@ashishshrivastava9966 Hi! In this case, you'll want to lower the batch size in the pipeline. First try 4, then if it still doesn't work, you can try 2, and then 1.
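If you'd rather not edit the file by hand each time, the change can be scripted. A rough sketch, assuming the first `batch_size:` entry in pipeline.config is the one inside `train_config` (check your own file); the `set_batch_size` helper and the trimmed sample text are made up for illustration:

```python
import re


def set_batch_size(config_text, batch_size):
    """Replace the first batch_size value (assumed to be train_config's)."""
    return re.sub(
        r"batch_size:\s*\d+", f"batch_size: {batch_size}", config_text, count=1
    )


# Trimmed, made-up excerpt standing in for pipeline.config.
sample = """train_config {
  batch_size: 6
  num_steps: 25000
}
eval_config {
  batch_size: 1
}
"""
print(set_batch_size(sample, 2))
```

To apply it to the real file, read the text with `pathlib.Path(...).read_text()`, pass it through the function, and write the result back with `write_text()`.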
Hi, I am struggling to continue until testing the installation (python ...\model_builder_tf2_test.py). I always receive the following message: from tensorflow.python.keras.layers.preprocessing import image_preprocessing as image_ops ImportError: cannot import name 'image_preprocessing' TF version is 2.10 Keras is 2.3.1 I did install Microsoft Built tools 2019, cuda 10.1, cudnn 7.6. Unfortunately, I have to downgrade my TF due to persistent error messages like DLL failed load, etc. I have been doing this for 3 days. Where am I doing wrong here?
kabak2abak Do you have Microsoft Visual Studio 2019 with C++ Build Tools? I believe that is a dependency for TensorFlow 2.3.0. And just a question, are you using the CPU or GPU version?
@@Armaan_Priyadarshan I followed every of your instructions, including download CUDA, cudnn, C++ Build Tools. When I tried to import tensorflow, I always keep receiving 'ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.' I installed pip install tensorflow-gpu Tensorflow GPU: 2.3.0 Python: 3.8.3
kabak2abak Can you try restarting the process from the beginning? I usually find that fixes most errors. Following the written guide on GitHub might help. Try uninstalling and reinstalling MSVC, Visual Studio, Anaconda, and CUDA & cuDNN. I’ve seen this error before, and it’s usually just a Visual Studio error. Make sure everything is on your PATH as well because this can be important!
@@Armaan_Priyadarshan Well, I will try. I have been doing the installation more than 3 days now, and none of which is successful. You mean I should uninstall MSVC, Visual Studio, Anaconda, and CUDA & cuDNN? then reinstall them again? What does it mean by everything is on my Path? I have set environment for all. The variables are PythonPath and Path. Should I remove them as well before uninstalling?
yo how do i fix this, it took like forever man... INFO:tensorflow:Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite I1010 16:50:29.150513 23484 checkpoint_utils.py:125] Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite
Which step did this occur at? If you have finished training, could you let me know how many checkpoint files you have in your models\my_ssd_mobilenet_v2_fpnlite directory.
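A quick way to count them, as a sketch. It assumes the checkpoints are saved with the usual `ckpt-N.index` naming inside the model directory; the `list_checkpoints` helper is made up for illustration:

```python
from pathlib import Path


def list_checkpoints(model_dir):
    """Return checkpoint prefixes such as 'ckpt-3' found in model_dir."""
    names = [p.stem for p in Path(model_dir).glob("ckpt-*.index")]
    # Sort by the checkpoint number so that ckpt-10 comes after ckpt-2.
    return sorted(names, key=lambda name: int(name.split("-")[1]))


# Example (path taken from the command in this thread):
# print(list_checkpoints(r"models\my_ssd_mobilenet_v2_fpnlite"))
```

An empty list would mean no checkpoint has been written to that folder yet.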
hello armaan i geting like fail (tf2) C:\Tensorflow\workspace\training_demo>python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config Traceback (most recent call last): File "__init__.pxd", line 942, in numpy.import_array RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Tensorflow\workspace\training_demo\model_main_tf2.py", line 32, in from object_detection import model_lib_v2 File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\model_lib_v2.py", line 29, in from object_detection import eval_util File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\eval_util.py", line 35, in from object_detection.metrics import coco_evaluation File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_evaluation.py", line 25, in from object_detection.metrics import coco_tools File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_tools.py", line 51, in from pycocotools import coco File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\coco.py", line 56, in from . import mask as maskUtils File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\mask.py", line 3, in import pycocotools._mask as _mask File "pycocotools\_mask.pyx", line 23, in init pycocotools._mask File "__init__.pxd", line 944, in numpy.import_array ImportError: numpy.core.multiarray failed to import
Hi, thanks for sharing this video. I have finished training models with my own data. However, when I tried to test out the finished model, I got the following error: -------------------------------------------------------------------------------------------------------------- Loading model...Done! Took 24.696925401687622 seconds Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last): File "TF-image-od.py", line 96, in image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor' ---------------------------------------------------------------------------------------------------------------- I've tried to install opencv-python with version 4.3.0.38 but still got this error. Could you please help me fix it? Thanks a lot.
You don't need to install that version I don't think. I'm pretty sure you didn't provide the right path to the image. The program seems to be running inference on the default image which you shouldn't have unless you're using my dataset. You can use the --image argument to specify the path to the image, or you can edit the default value in the program.
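The reason the message is so cryptic is that cv2.imread returns None instead of raising when it can't read a file, so the failure only shows up later in cv2.cvtColor as `!_src.empty()`. A small check before loading makes the cause obvious. This is a sketch with a made-up helper name, not code from the script:

```python
import os


def check_image_path(path):
    """Fail early with a readable message if the image file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Image not found: {path!r}. Pass the right path with --image."
        )
    return path


try:
    check_image_path("images/test/pothole.jpg")  # hypothetical path
except FileNotFoundError as error:
    print(error)
```

Calling it on the path right before the image is read turns a wrong default path into a clear error instead of an OpenCV assertion.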
@@Armaan_Priyadarshan Thanks for your reply. Yeah, you're right. I just forgot to specify my directory to the image. The rest of the program ran perfectly as expected. This tutorial is very helpful.
I'm working through the tutorial now and got a bit stuck at the git step for the TensorFlow models. Maybe it would be good to put a link to git; for us non-programmers, figuring out why the command doesn't work can be a big problem.
Majstorsky Majstor Hi! This is a good idea. You can install git from here: git-scm.com/download/win. Then just open up a new terminal and everything should be good to go.
@@Armaan_Priyadarshan tnx , im already finished and currently training the model. I tried lots of tutorials but this one is the best, its quite the achievement to make tutorial for such heavy field and that average person can follow it.BTW I tried to start tensorboard server but it cannot recognize command? And is there option for deploying this object detector in a simple way , like some exe app that you just install and use?
Majstorsky Majstor Thanks! The step with TensorBoard is optional and not necessary for training. If you are interested, you should follow the written tutorial as there was an editing error in the video at that time stamp. As for making a simple application, that might take a bit of work. TensorFlow is popular on Android and iOS, but deploying there is a bit more of a complicated process.
@@Armaan_Priyadarshan here is my first try ruclips.net/video/C1eR6todLto/видео.html . Is there a way to make detection cutoff, for example to show only detections above some number , like 90%. And is there a bit of GUI where i can load files and start scripts? There will be lots of typing if i want to check 100 files.
Majstorsky Majstor There is an argument called threshold that you can pass when running the script to specify your minimum confidence. For example, python TF-image-od.py --threshold 0.9 will only show detections above 90% confidence. As for a GUI, that will take a little editing of the program. However, you can make it perform inference on a directory of images.
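For the directory idea, a rough sketch of the loop. The helper names and the extension list are assumptions, and `run_inference` stands in for whatever function wraps the detection code for a single image:

```python
from pathlib import Path

# Assumed set of image types; adjust to what your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(directory):
    """Return the image files in a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_on_directory(directory, run_inference):
    """Call run_inference(path) for every image and collect results by name."""
    return {p.name: run_inference(str(p)) for p in list_images(directory)}


# Example: results = run_on_directory("images/test", my_inference_function)
```

That way checking 100 files is one command instead of 100.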
Hi! You can, but you'll have to use the TensorFlow CPU variant. This means training will take longer and you won't be able to use your computer during training. It's probably possible though.
Impressive work @ Armaan Priyadarshan ... Please guide me with this error after I ran python object_detection\builders\model_builder_tf2_test.py "tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse" Please note that I have reinstalled tensorflow-gpu==2.2.0 but still getting this error. Thanks.
This error occurs when you have a background process running. For example, if you are running two python programs that use TensorFlow at the same time. I'd recommend just restarting your machine and retrying.
Hi Armaan, I was just following the video and training with the dataset that I labelled. Everything went smoothly and was blocked at the 'Generating Training Data' stage. I changed the label map to two ids and went to the "scripts/preprocessing" folder and entered "python generate_tfrecord.py", and the error "Index Error: child index out of range" appeared. I'm doing a tutorial same as the video, I just changed the images that I labelled in the image folder. what's the problem?
Hmmmmm.... Can you raise an issue in the Github Repository? It would be great if you can send your labelmap.pbtxt, tell me where you saved the XML documents, as well as what command you ran for generate_tfrecord.py. The images should be labelled with LabelImg with a rectangular box drawn as well as a label provided. If you could send maybe one or 2 of your XML documents it would be great, because I could see if you labelled the images right. You should also make sure that you deleted the given train.record and test.record along with all the images and XML Documents of my Pill Detector.
@@Armaan_Priyadarshan If I run a command "python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config", I got a error like this : pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle) tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [[0.611111104][0.758454144][0.724637747]...] [[0.205314025][0.432367146][0.533816457]...] [[{{node Assert_1/AssertGuard/else/_35/Assert}}]] [[MultiDeviceIteratorGetNextFromShard]] [[RemoteCall]] what's the matter of this error? please help
김응찬 I am not sure about this error. Can you try configuring training with the Pill Model and the provided data to see if it starts? It might be a pipeline issue or a data issue. Running the provided example from start to finish should show which one it is.
I can run SSD MobileNet V2 FPNLite 640x640, I also run SSD MobileNet V1 FPN 640x640 but I can't run SSD ResNet50 V1 FPN 640x640 (RetinaNet50) and the algorithms below
@@hoyt2603 Hi! Errors like this come up fairly often. Last time I tried using ResNet I had some issues as well. I'm pretty sure TensorFlow is trying to fix them, but I'm not sure if there's anything you can try.
Hello Armaan, I am doing to test my installation with: python object_detection\builders\model_builder_tf2_test.py However, I have got some problem . [ FAILED ] ModelBuilderTF2Test.test_create_ssd_models_from_config File "", line 3, in raise_from tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1,512,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Add] ---------------------------------------------------------------------- Ran 20 tests in 93.266s FAILED (errors=1, skipped=1) What should I do? Thank you very much.
li simon Hi! How much memory does your GPU have? If you’re using an older GPU, you might want to just install the CPU only version of TensorFlow to avoid Out of Memory or OOM Errors.
@@Armaan_Priyadarshan Dear Armaan, I am now using the CPU only version of TensorFlow, and I successfully to test my installation with: python object_detection\builders\model_builder_tf2_test.py Thank you very much. However, I got another problem in "Training the Model" It spent 30 mins to proceed only one INFO and I0927, is it normal? I am using your pictures for the tutorial. . . . Use fn_output_signature instead INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545 I0927 17:20:11.610697 6572 model_lib_v2.py:649] Step 100 per-step time 17.890s loss=0.545 INFO:tensorflow:Step 200 per-step time 18.110s loss=0.372 I0927 17:50:35.465189 6572 model_lib_v2.py:649] Step 200 per-step time 18.110s loss=0.372 INFO:tensorflow:Step 300 per-step time 18.313s loss=0.320 I0927 18:20:46.434653 6572 model_lib_v2.py:649] Step 300 per-step time 18.313s loss=0.320 INFO:tensorflow:Step 400 per-step time 18.350s loss=0.288 I0927 18:50:58.178737 6572 model_lib_v2.py:649] Step 400 per-step time 18.350s loss=0.288 . . . . . .
li simon Hi! Yes, this is totally normal with the CPU version, as it’s much slower than the GPU version. As you can see, your per-step time is much higher than mine in the video, which is why training is slow. Just to make sure, could you clarify which CPU and GPU you’re running with?
@@Armaan_Priyadarshan Dear Armaan, I am not familiar with computer and programming. I am using an old laptop with: -Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz 2.40 GHz -8 GB RAM -Intel(R) HD Graphics Family -NVIDIA GeForce GT 730M I think the CPU Version is too slow and it is impossible to play object detection by using it :( Thank you very much!
li simon Hi! This makes sense. Unfortunately your GPU doesn’t support CUDA and cuDNN which is why you got OOM Errors. Your CPU is also a bit old. You might want to try out Google Colab if your laptop can’t take the workload.
hai, I follow all your code and tutorial in this video, can anybody tell me why my val_loss start is in 1.99 its decreasing and until 1.1 its not decreasing anymore its 1.4 1.1 1.3 1.2 0.9 and cannot decreasing and stable... is imbalanced dataset is a big problem? I like to convert tf.lite and detecting 5 object in realtime, my dataset is imbalanced with 1000 car , 1000 human , 1000 bicycle , 1000 motorcycle , and 336 stop sign and with testing 100 image each class except the stop sign with only 50 image maybe, I take this dataset from google open image dataset V6, Can anybody help me?
Hi! This is totally normal, don’t worry about it! Each individual step’s loss is unpredictable. It has nothing to do with your dataset. But when your loss is consistently between 1.5 and 2, you can stop the program. Just try waiting and seeing!
@@Armaan_Priyadarshan sorry I am new for this topic , this is my thesis / final project, and I don't know anything about this hehehe so please be patient with me hahaha, so I think I dont understand about waiting it if it constantly 1,5 to 2,0 . sorry for bad english , so my tensorboard scalar is not overfit, its normal but like what i said before, its stabel ini 1,3. so is that normal? so how to see that my training is well done and my model is good do I need to add accuracy metric? Thank youuuu
KEVIN CHRISTIAN TensorBoard can be ignored for the most part during model training unless you want to visualize the process. If your loss has an odd value, it’s most likely just an outlier. As long as it follows a continual decaying pattern, you should be fine. In the command prompt, the loss should be shown after each 100 steps. Once the loss is consistently between 1.5 and 2 for a few hundred steps your model is done. If it only goes between 1.5 and 2 once after 5 logs, that’s a different story as that would be an outlier. If it consistently goes below 1.3, that would mean it’s a bit more specific but should still be fine for testing. Just stop the program, try it out and assess the results. If you are unhappy, you can always retrain! And feel free to share your issues or ask for help! That’s what I’m here for! Good luck with your Thesis!
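One way to check "consistently within a range" without eyeballing the console is sketched below. It parses log lines in the format pasted earlier in this thread and tests whether the last few logged losses all fall inside a chosen band. The function names and the band used in the demo are hypothetical; use whatever target range the tutorial recommends for your model:

```python
import re

# Matches the training log lines shown in this thread, e.g.
# "INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545"
STEP_LINE = re.compile(r"Step (\d+) per-step time [\d.]+s loss=(\d+\.\d+)")


def parse_losses(log_text):
    """Return sorted (step, loss) pairs, one per step, from a training log."""
    seen = {}
    for match in STEP_LINE.finditer(log_text):
        # Each step is logged twice (INFO and I-prefixed line); keep one entry.
        seen[int(match.group(1))] = float(match.group(2))
    return sorted(seen.items())


def loss_is_stable(losses, low, high, last_n=3):
    """True if the most recent `last_n` logged losses all lie in [low, high]."""
    recent = [loss for _, loss in losses[-last_n:]]
    return len(recent) == last_n and all(low <= loss <= high for loss in recent)


log = """
INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545
INFO:tensorflow:Step 200 per-step time 18.110s loss=0.372
INFO:tensorflow:Step 300 per-step time 18.313s loss=0.320
INFO:tensorflow:Step 400 per-step time 18.350s loss=0.288
"""
losses = parse_losses(log)
print(losses)
print(loss_is_stable(losses, low=0.25, high=0.40))
```

Requiring several consecutive values inside the band is what separates a settled loss from a single outlier.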
@@Armaan_Priyadarshan Thank you so much for the help.. I add you on discord, maybe we can chat.. because I dont have any experience in this object detection stuff
Hi Armaan. Thanks a lot for your amazing video, When I want to Training the Model I have this error, please help me to fixed: tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node SGD/NcclAllReduce}} with these attrs: [reduction="sum", shared_name="c1", T=DT_FLOAT, num_devices=2] Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU] Registered kernels:
I've just posted a guide on TensorFlow Lite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html
hey I dont understand why is my installation wrong, i literally followed everything you did but then when i tried
python object_detection\builders\model_builder_tf2_test.py
i get
AttributeError: module 'tensorflow' has no attribute 'contrib'
but isnt contrib from tensorflow1?
I was following the EdgeElectronics tutorial but encountered with so many errors so I followed your tutorial, everything worked fine. Thank you so much bro!
This is what I wanna hear!!
Great video man, you're explaining everything clearly and even anticipate almost all errors we can encounter.
You have now a new subscriber, your other videos seem interesting too !
Hi!! thank you so much for your video Armaan you help me with working with TensorFlow custom model.
Hi Armaan, Great work! Finally a TF tutorial that actually works. I have two requests:
1. Can you create a video about converting to TFLite, and actually running on a mobile phone. An interesting framework would be React Native, as it works on both IOS & Android.
2. Show how to train the model on the cloud (e.g. COLAB).
Hi! I'm working on the TFLite conversion tutorial at the moment. I just recently had a breakthrough. I might take a look at training with Colab although I feel there might be some other videos on YouTube that already cover this topic in depth.
Update: I just posted a video about TFLite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html
@@Armaan_Priyadarshan great Armaan! I will check it out and let you know what I think. Many thanks!
@@samsam-qi6qo hello, did it works for you?
Hi
I tried to train data but i have a error message like : Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
What can I do?
I've got the same problem
stuck there too
I am also stuck here. Anyone find the solution?
Hello Armaan, Great work but please can you make a similar tutorial video but this time how to train the model on Google Colab. My pc doesn't support Nvidia graphics driver
Hi Armaan, after entering the command to train the model I am getting this error:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
How can I fix this??
Nice work man! 😃
Thank You! I’ve used a lot of your code in the past, and I love your work with TensorFlow on the Raspberry Pi!
@31:53 in the object detector window, my output doesn't have green boxes label. Do you know what the problem is?
Hi Armaan, thanks you for this Amazing tutorial.
could you help me ?
I followed to your tutorial and everything is ok, but when I tried to convert this model into JS model with using console command from official guide
Model was converted fine (without an errors)
But when I tried to load this model into JS, I got an error like "unknown Layer"
maybe you have any advices or even tutorial how to convert and load custom model into JS ?
Hi Armaan. I just tried running model_main_tf2.py on my Anaconda command prompt, after a couple of minutes it says "Windows fatal exception:Access violation". What might be the problem? Would really appreciate if you could lend me a helping hand in this. Thanks in advance
Hey did you solve it?
Hi, i have many questions about it. I'm surprised at how accurately you describe everything. This is the clearest tutorial I've found on youtube.
First of all, I would like to congratulate everyone on the New Year and wish everyone happiness and health!
And now to the point.
I make a model Object Detection with Tensorflow and architecture SSD MobileNet V2 FPNLite 640x640 like you. I use GPU.
Questions:
1. You said have to stop training when loss between 0.15 and 0.20. But you have 100 images. I have 756 images of two objects. Is it means than my losses have to be between 0.10 and 0.05 ? If i stop model at the moment when losses are equal 0.02 it's overfitting ?
2. If numbers of images equal 756, have I to change in pipeline.config the number epochs in field num_epoch. I put 1 and 3(optimal) earlier, but I didn't see the difference and now 10. I know that the more epochs, it can recognize better , but it will be possible to overfitting if epochs are too many, so what is the optimal number of epochs to put in num_epoch for Model Net 2, it's important ?
3. I was told that overfitting is impossible if I use this method of creation model Mobile Net, that you used. It's true?
4. If it is wrong, what are conditions of overfitting and how will I notice overfitting, how to understand what it is happening, and under what characteristics of model MobileNet in pipeline.config ?
I think the answers to these questions will help me solve this problem :
- the model recognizes perfectly the objects that it should, but at the same time it recognizes arbitrary objects that are not even an object. It's just like areas. It just outlines areas that are not understandable, but somewhat similar to each other, but she should not recognize these areas. I think this is underfitting , and I want to change that by adjusting the number of eras.
I would be glad for any help.
It's vary important for me !
Thank you for the answers !
Hi! If you're facing accuracy issues, you can definitely try training for longer. I'm not sure the size of the dataset affects the target loss. However, you may have too many images. You can try maybe halving your dataset. If you're not finding success in adjusting epochs, I'd look into lowering the learning rate for more precision. You can notice overfitting if there aren't any detections at all. Good luck! Let me know if you have any more questions.
@@Armaan_Priyadarshan Okay ! Thank you !!
@@tetianaluhacheva7682 Hey! What number did you set your num_epochs too and roughly how long did you train? I have roughly the same number of images as you and I think I need to let it sit between 0.10 and 0.05? Curious to know how you setup your pipeline! :)
@@GarethBolton I set many values of num_epochs: 1 - 20. The model seems to have gotten better with the last value, 20. But sometimes it seems to me that num_epochs does not matter. Ok, 0.10 and 0.05, did you mean loss between 0.10 and 0.05? If "Yes" then read above, Armaan Priyadarshan said that the size of the dataset doesn't affect the target loss. I think the size of the dataset affects the time. You need to sit between 0.20 and 0.15. I understand this to mean that if you have more images, the model will take longer to reach a value of 0.20. For example, with 100 images you might train the model for 3 hours and see loss = 0.20 after 3 hours. With 370 images, training will be longer, for example 5 hours, and you will see loss 0.20 after 5 hours (the numbers of hours are only examples and do not correspond to real training times). I set checkpoints, labelmap, train.record, test.record, batch size as 4 (because I can't set more), classes as 2, and num_epochs differently each time.
Hi, when I run the command to generate the records this error is displayed :
...
File "C:\Users\lynxd\anaconda3\envs\tfod\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 80, in _preread_check
compat.path_to_str(self.__name), 1024 * 512)
UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x92 in position 122: invalid start byte
Can you help me pls?
Hi i tried implementing the transfer learning model as shown and ran into the same error that u faced when training. Could you let me know what changes u made in order to run the program? Cheers
hi ... did you solve it?
How to solve == AttributeError: module 'tensorflow' has no attribute 'contrib' , iam traning with google colab , tensorflow 2.3.0 , the error shows when run this line of code " ! sudo python3 model_builder_tf2_test.py install ", can you help me !.
Hello Armaan,
This was the best tutorial video about Training a Custom Object Detector with TensorFlow2. I did everything on tutorial and it worked like a charm. Thank you very much. Now, I want to convert my trained model to tflite in order to use on an Android device. Can you make a tutorial video for that step?
Hi! I'm glad it worked for you! I've been looking in to the TensorFlow Lite Conversion for a bit as well. So far, I've had quite a few issues due to model shape and structure. I'll try to make a video with more details when I figure it out, but you can look at these links for more information on the tflite conversion tool.
www.tensorflow.org/lite/convert/python_api
www.tensorflow.org/lite/convert/cmdline?hl=el
@@Armaan_Priyadarshan Hello! I've tried those links at first but I guess the converters are problematic right now.
Converter works with the code below but the generated tflite model doesn't work on Android. I think model shape and structure must be defined well. In addition, with out "tf.lite.OpsSet.SELECT_TF_OPS" argument converter fails. When you use "tf.lite.OpsSet.SELECT_TF_OPS" argument Android can't interfere with the model (I guess it is related to that TensorFlow Lite lack some of the OPs used in TensorFlow 2.)
github.com/tensorflow/tensorflow/issues/42114#issuecomment-671593386
Moreover, I have found an issue record on Tensorflow's github. Maybe this is the problem and I hope they will fix the issue soon.
github.com/tensorflow/models/issues/9033
Please let me know if you have any progress on converting to tflite and running on an Android device.
celalutku Oh ok. I am not too familiar with TensorFlow on Android, but if I make any progress I’ll let you know.
Tutorial on converting to TFLite model please ! :D
Max Bro Hi! Unfortunately there have been some errors with TensorFlow Lite Conversion with TF2. It ended up messing up the original model and converting it to a single layer, and the resulting tflite model had a few errors as well.😞
Hi, I have a problem: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 117: invalid continuation byte
and
TypeError: memoryview: a bytes-like object is required, not 'str'?
HELP!!!!
Эмиль Дюмеев This is a sign that one of the paths provided is invalid. Make sure you have provided the right path to each needed file or directory.
The setup.py that is run by issuing "python -m pip install ." upgrades tensorflow to version 2.4.1 and numpy to 1.19.5
This should be fine unless you want to use an older version.
Good Job! Excellent tutorials
Hi there, I started with my own model and trained it but now a just want to train for other things and when I start the training there is continuing from the step where I stopped the process last time. For example it last time I stopped the process at 6000 steps and now when I start it again with different pictures and files it starts from step 6000 and I'm just wondering if it is okay or I should do a cleanup or start with a clean setup again. Thank you in advance for your answer!
This tutorial is brilliant! Thanks a lot
Thanks! Glad it worked for you :)
Hello I am trying with Raspberry 4/Raspbian but always stopped at workspace preparation (Python m pip install). Error is something like tensorflow has no attribute contrib. Any advice? Thanks
Very easy tutorial, thank you so much. It would be great if you can make tutorial on how to evaluate this trained model in terms of mAP (mean average precision) or Average precision and Intersection Over Union IoU.
Hi! I'm glad everything worked for you. Unfortunately, I'm not entirely sure how to measure model mAP or IoU. All model metrics available can be monitored with TensorBoard. I don't know if I can make a video guide, but the written tutorial can be found here: github.com/armaanpriyadarshan/Training-a-Custom-TensorFlow-2.x-Object-Detector#monitoring-training-with-tensorboard-optional
I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within the C:\TensorFlow\workspace\training_demo directory. If you find TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.
Hello. I am using tensorflow version 2.4.1 in my system. In your video when you have run the training command for the first time you encountered an eerie named "function call stack: _dummy_computation_fn". I am also encountering same issue. How to solve this issue. Please let me know.
Great video Armaan. Keep up with good work.
I have a real quick question that, I want to disable the confidence percentage is shown on each bounding box. How can I deal with that?
Thank you,
I got an error here when i run model_main_tf2.py, it says ModuleNotFoundError: No module named 'official.modeling.optimization' what's the problem?
How long does the "python -m pip install ." command take. Mine is taking too long. İs it a problem ? I could never finish
did you find the answer?
hello just wandering how and if changing the number of epochs would help improve the model thanks
Hi! This is a great question! I haven't done much experimenting, so I'm not sure what the answer is. You can test it out and compare evaluation results with different epochs. Evaluating the model is a step I added recently so it's only in the written guide.
@@Armaan_Priyadarshan i looked at the pipeline.config file and see I can change the number of epochs there but when would it start running the second epoch as with the default of one the model training seem not to stop or have i just not run it for long enough. thanks
Wow good job buddy
Hey again, I had a friend ask me, if he trains his model on his own images of say 100, and later wants to add say another 100 images, is there a way to train on top, so keep the previous 100 images it learnt on and add the 100 more, or do you have to retrain the whole thing on the 200 images instead. Thank you, this is all for educational purpose at university. Thanks again. P.s If this is possible would you consider making another video? I am sure there are lots of others out there that could benefit too
Hello. i am trying to detect beverage bottles on the shelf. How to improve accuracy.... Please give me any tips....
bro, cam is opening but it does not detect any object? can you help me?
I don't know if that's a camera issue. Make sure you have run pip install opencv-python. Also, does your model work on images or video?
Armaan Priyadarshan pic or cam work perfectly but they dont detect any object.
Armaan Priyadarshan actually while training i got an error but i ignored, does it error cause problem?
That's probably why it's not detecting. Make sure training is working.
Hi, Can you tell how can I get the bounding boxes for each detected class? Like right now I am getting an array with 100 boxes' coordinates if i print the boxes in the visualize_boxes_and_labels_on_image_array function from viz_utils. Can you look into this please??
Can you please say in which part of the code you are save the model and checkpoint - when we run model_main_tf2.py file at the end we will get saved model so which part of code it is doing
Thank you for video. It is very helpful! I have a question about the model config file. In that file, there are some lines like eval_config, and eval_input_reader. Those eval means validation during the train? or test the model?
nice work, i wanna add an image generator to get a more accurate model can you help me with that?
Great job😁!! Can you tell me how many epoch the program use during the trainning and can we change it? And what's will happen if we force(ctrl+c) to stp the training?
Thank you bro!!
Hi! Sorry for the late reply, but you should be able to configure epochs in the training pipeline under eval_input_reader. Stopping training means everything done so far will be retained and you can continue whenever you want I believe.
Do you know how I can get the coordinates of the bounding boxes in real time from your code? I need to use them for something else. Thank you
Great work, thanks!!
Quick question for you - in the TF-webcam-opencv.py file - let's assume I have 2 classes and want to show the count for each class...what would you suggest is the approach to take to show object 1: count, object 2: count instead of the default Objects Detected that is currently displayed.
Hi! This is totally feasible. I'd add a separate total count variable for each class and just use a conditional statement to add on to each respective variable. To do so, you can use the class of each detection.
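A rough sketch of the per-class counting idea, assuming each frame yields parallel lists of class ids and scores. It uses a Counter in place of one hand-written variable per class, which scales to any number of classes. The label names, threshold, and sample values are hypothetical:

```python
from collections import Counter


def count_by_class(classes, scores, label_names, threshold=0.5):
    """Count detections per class label, ignoring low-confidence ones."""
    counts = Counter()
    for cls, score in zip(classes, scores):
        if score >= threshold:
            counts[label_names.get(cls, f"class {cls}")] += 1
    return counts


# Hypothetical label map and one frame's detections.
label_names = {1: "object 1", 2: "object 2"}
classes = [1, 2, 1, 1]
scores = [0.90, 0.80, 0.30, 0.70]

counts = count_by_class(classes, scores, label_names)
for name in label_names.values():
    print(f"{name}: {counts[name]}")
```

The formatted strings can then be drawn on the frame in place of the single "Objects Detected" line.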
I can't download SSD MobileNet V2 FPNLite 640x640. " This site can't be reached" issue
Same problem
Try this link download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
hi Armaan thank you for making this video very easy to follow! just one question, after i finished training my custom dataset to detect couple of sushi types, and try to test it with your TF-image-od.py code, somehow ive managed to produced this error :
2020-10-05 11:51:36.745875: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
my device is as follows:
windows 10 64bit
CUDA 10.1
cudnn 7.6.5
tensorflow 2.3.0
python 3.8
nvidia gt 740m with 425.31 driver (april 2019, this is the 'latest' driver for my gpu)
someone suggest to reinstall the tensorflow but it still produce this error, and the other suggest to upgrade the latest gpu driver but in my case the latest driver is as old as april 2019, any idea to navigate this issue? thanks again
Hi! Unfortunately it seems your GPU is a bit too old to run TensorFlow-GPU. I believe you must have a GeForce GTX 650+ Graphics Card. You can try out TensorFlow on the CPU but it’ll be significantly slower.
@@Armaan_Priyadarshan thanks for your reply! It works with tensorflow cpu. Maybe its time to buy new laptop!
Fantastic Tutorial, is there a way to save the output of the image script and more importantly the video script, would be great to save the video it displays
Hi! At the moment, the best option is just taking a screenshot or screen recording. However, the code for saving output wouldn't be too hard to implement. Here's an example with OpenCV that I believe would be easy to integrate with my code: www.geeksforgeeks.org/saving-a-video-using-opencv/
hello Armaan.
Do you know what loss function does the ssd model uses?
What was the resolution of the model you used? Did you detect an object from the video, if so what was fps? Thanks for answer :)
Hi! I used the 640x640 model in the video. Yes, detection worked fine for me. I haven't measured FPS, however, so I can't answer your final question.
how to save the checkpoint in every step and also save the best checkpoint?
Hi Armaan. it is an excellent video, thank you for that. While doing the testing of my custom images, i faced an error like "cv2.error: OpenCV(4.3.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-7o5pnn96\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'". please help me to solve this.
There are two possible issues. Either your OpenCV installation didn't work, or OpenCV couldn't find the image you provided. In the first case, you can fix it with pip install opencv-python. For the second case, make sure you have provided the path to one, single image in the .jpg or .png file format.
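For the second case, a small check before running detection can give a clearer message than the OpenCV assertion. This is only a sketch using the standard library; the function name and the accepted extensions are assumptions:

```python
from pathlib import Path

# Assumed set of accepted extensions; adjust to match your images.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_image_path(path):
    """Return None if the path looks usable, otherwise a short problem description."""
    p = Path(path)
    if not p.exists():
        return f"{p} does not exist"
    if not p.is_file():
        return f"{p} is not a single file"
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        return f"{p.name} is not a .jpg or .png file"
    return None


problem = check_image_path("no_such_image.jpg")
print(problem or "path looks fine")
```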
@@Armaan_Priyadarshan python TF-image-od.py --image img.jpg - i worked this command and it worked. But this for single image, can we do this programs for many images in a folder at a time
I had the same problem, and I am still looking for a solution, please help
@@jubileem.sibandajubbs2175 I edited TF-image-od.py file and wrote the commands for multiple images selection, it worked
Amazing! Good job.
Thanks!
Great Work! Wonderful. I tried your way and it worked well . I had this warning /error -" with ops with custom gradients. Will likely fail if a gradient is requested". What should I do ? Please help me out
Hi! I don't think this is an error. It's very common to encounter many such warnings before training starts. You should try proceeding and let me know if you find any other errors while testing.
Armaan - how important is consistent image size when training/labelling the images? I've got big 2000x2000 pixels images - should I downscale these first?
Hi! Yes, I would definitely resize the images. Larger image sizes will lead to longer training times. There aren't too many downsides, so I would go for it.
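One thing to watch when downscaling: if the images were already labelled, the box coordinates in the annotations have to shrink by the same factor. The arithmetic is sketched below, assuming boxes are stored as pixel values in (xmin, ymin, xmax, ymax) order. The function names and the 640-pixel target are hypothetical choices:

```python
def resized_dimensions(width, height, max_side=640):
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    scale = max_side / max(width, height)
    if scale >= 1:
        return width, height, 1.0  # already small enough; never upscale
    return round(width * scale), round(height * scale), scale


def scale_box(box, scale):
    """Apply the same scale factor to a pixel box (xmin, ymin, xmax, ymax)."""
    return tuple(round(value * scale) for value in box)


new_w, new_h, scale = resized_dimensions(2000, 2000)
print(new_w, new_h, scale)
print(scale_box((500, 250, 1500, 1000), scale))
```

Resizing before labelling avoids this step entirely, since the annotations are then created at the final size.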
@@Armaan_Priyadarshan Thank you - I did this earlier today and the model trains well. I still am getting some issues every now and then with my gpu blowing up (I have a Geforce 1650 GTX).... I get messages saying it can't allocate enough memory etc. or CUDA memory errors...not sure if there are other settings we can try to limit how much of a GPU it uses . Also would you suggest gpu for training and then maybe cpu for inference?
Hey man, great tutorial. Do you know why I get the "module 'tensorflow' has no attribute 'contrib'" error? Have you ever encountered this when trying to train the model? I'm using google colab, maybe that's the issue?
I'm not sure, I think someone else was asking about this error, but I've never experienced it.
hey, love the video! one question: got any tips to test multiple pictures after training?
Hi! I haven't made a script for testing multiple images at once yet. You might be able to edit one of the programs to cycle through a directory of images. The only other thing I could think of is using a Python terminal to load the model and run on multiple images.
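Cycling through a directory could look roughly like the sketch below. Here detect_fn is a hypothetical stand-in for whatever single-image detection code you already have, and the extension list is an assumption:

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return the image files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def run_on_directory(directory, detect_fn):
    """Call detect_fn(path) for each image and collect the results by file name."""
    return {p.name: detect_fn(p) for p in list_images(directory)}


# Demo with empty placeholder files and a dummy detect_fn.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.PNG", "notes.txt"):
        (Path(tmp) / name).touch()
    results = run_on_directory(tmp, detect_fn=lambda path: f"would detect on {path.name}")
    print(results)
```

Loading the model once and reusing it inside detect_fn avoids paying the model load time for every image.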
hello
I'm not sure if you are still answering these comments but I dont have the option to run anaconda prompt as admin. I am doing this tutorial in an account with admin controls but it still doesn't come up. Thanks in advance
thomas goode That should be fine for Anaconda, but can you download stuff such as CUDA and cuDNN? Those need Admin Access so just asking.
now i am training the model but the loss is increasing is that normal and it is going to decrease after few steps or this means that i have something wrong..?
after a while i get loss=nan what should i do
Hi! This definitely isn't normal. I would try lowering the learning rate maybe.
@@Armaan_Priyadarshan can you please guide me to lower it
Hi. Did you get to run evaluation on your data? For me it always says "Waiting for new checkpoint"
For me it worked. Other people also had the same issue but it worked eventually for them. You might want to make sure that you have enough checkpoint files in your training directory.
Hello Armaan, nice to meet you.
I have got some problem in the first part.
Do you have any suggestions?
Thank you so much!
>>> import tensorflow as tf
Traceback (most recent call last):
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\__init__.py", line 41, in
from tensorflow.python.tools import module_util as _module_util
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\__init__.py", line 40, in
from tensorflow.python.eager import context
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\eager\context.py", line 35, in
from tensorflow.python import pywrap_tfe
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tfe.py", line 28, in
from tensorflow.python import pywrap_tensorflow
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 83, in
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\Users\simon\anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 64, in
from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found.
Failed to load the native TensorFlow runtime.
See www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
li simon Do you have Visual Studio 2019 with C++ Build Tools?
@@Armaan_Priyadarshan
Dear Armaan,
After installing Visual Studio 2019 with C++ Build Tools, I appear to install the TensorFlow GPU successfully.
Thank you very much.
>>> import tensorflow as tf
2020-09-15 21:50:01.315223: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-09-15 21:50:01.322622: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> print(tf.__version__)
2.3.0
I will try the remaining parts in this Sunday :)
Again, thank you!!
Hi, Armaan very easy tutorial. I am getting an error while training the model at this command
python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
and the error is
tensorflow.python.framework.errors_impl.InvalidArgumentError: NewRandomAccessFile failed to Create/Open: C:\Tensorflow\workspace raining_demonnotations\label_map.pbtxt : The filename, directory name, or volume label syntax is incorrect.
; no protocol option
Hi! For the label map path (and the other paths) inside your pipeline.config, try using forward slashes instead of backslashes.
Hi, Armaan thank you very very much for the suggestion. Yes, it worked. Training is going on. I also put forward slash in checkpoint, train.record and test.record files too
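The mangled path in the error above ("workspace raining_demonnotations") looks consistent with the \t and \a in the backslash path being read as escape sequences. A minimal sketch, assuming you want to convert a Windows path to forward slashes before pasting it into pipeline.config; the helper name to_forward_slashes is hypothetical and not part of the tutorial:

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Return the same Windows path written with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


# A raw string keeps Python itself from treating \t or \a as escapes.
label_map = to_forward_slashes(
    r"C:\Tensorflow\workspace\training_demo\annotations\label_map.pbtxt"
)
print(label_map)
```

The same conversion applies to the checkpoint, train.record and test.record paths mentioned in the reply.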
great work brother
In this video we are using FPN 640x640.
Is it important to change the resolution if my source images are 1280x720?
If yes,
I already did that, but I got
ValueError: Dimensions must be equal, but are 46 and 45 for '{{node ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/add}} = AddV2[T=DT_FLOAT](ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/Reshape, ssd_mobile_net_v2fpn_keras_feature_extractor/FeatureMaps/top_down/projection_2/BiasAdd)' with input shapes: [3,46,80,128], [3,45,80,128].
can you please help me brother?
I believe each dimension has to be the same. If 1280x720 is your resolution adjust the model parameters to 1280x1280 maybe.
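The 46 vs 45 in the error can be reproduced with simple arithmetic. This is a rough sketch, not the model's actual code: it assumes each backbone level halves the previous one and rounds up, and that the top-down path upsamples by exactly 2x between levels 3 and 5. The function names are hypothetical.

```python
import math


def pyramid_sizes(side, max_level=5):
    """Side length of the feature map at levels 1..max_level,
    assuming each level halves the previous one and rounds up."""
    sizes = {}
    current = side
    for level in range(1, max_level + 1):
        current = math.ceil(current / 2)
        sizes[level] = current
    return sizes


def top_down_mismatches(side, min_level=3, max_level=5):
    """List (level, upsampled size from level + 1, actual size)
    wherever 2x upsampling does not line up with the level below."""
    sizes = pyramid_sizes(side, max_level)
    return [
        (level, sizes[level + 1] * 2, sizes[level])
        for level in range(min_level, max_level)
        if sizes[level + 1] * 2 != sizes[level]
    ]


for side in (640, 704, 720, 1280):
    print(side, top_down_mismatches(side))
```

Under these assumptions a height of 720 gives 45 at level 4 and 23 at level 5, which upsamples to 46, matching the shapes in the error. Sides divisible by 32 (640, 704, 1280) show no mismatch in this simplified model, so a square input is not the only option; whether a non-square input trains well is a separate question.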
Hi Armaan, I wanted to thank you so much for making this video, you literally saved my life. I'm doing an object detection project for my Data Science course. Can you please help me get metrics for evaluating my test set, without having to look at each in OpenCV individually? I need concrete numbers such as Average Precision and Average Recall to compare it to other models I've tried. I can't find how to do this anywhere.
Jordan Darbyshire Hi! I’m glad the video helped you out! As to finding precision, recall, and mAP, I’m sorry to say I haven’t found anything yet. I did a bit of research, and I do have a few ideas. The first is training a different model. The Efficientdet and other TF2 specific models should have the ability to log mAP during training if the argument -alsologtostderr is given while running the training script. The other option I found was using matplotlib for which you can find more info here stackoverflow.com/questions/46274514/precision-recall-curve-in-tensorflow-object-detection-api. I’m sorry for not being able to give a better answer, as there isn’t much documentation or information online.
@@Armaan_Priyadarshan Ok thanks a lot Armaan. I know I'm surprised there isn't much information about this. I'll let you know if I find out anything helpful. Keep up the good work, look forward to seeing more videos from you!
Hi!
I've finally figured out how to evaluate the model! The TF documentation is here: github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
To find metrics such as Average Precision and IoU, just run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from within C:\TensorFlow\workspace\training_demo. If you get TypeError: object of type cannot be safely interpreted as an integer, just downgrade your NumPy version to 1.17.3, which worked for me.
@Armaan Priyadarshan Yes, I am using the SSD MobileNet V2 FPNLite 640x640 model. And yes, I also reduced the batch size to 4, but with no result. My GPU has 2GB of memory; does that really matter in this regard?
Yes, you should try reducing batch_size to 2, or maybe 1 if that doesn't work
How did you download the SSD MobileNet V2 FPNLite 640x640 model? "This site can’t be reached" error.
Hi Armaan. Thanks a lot for your amazing video. I was working on this for many days until I saw your video and found the solution. I was wondering how I can export the number of pixels for the box it draws on the pills for image or video? Just like the xml file we used to annotate images but for the output. Thanks again
Top Games Hmmmm.... This is definitely something new to try. Unfortunately, I haven’t attempted anything similar, so I’m unsure how to help. I’m sure there might be a way using OpenCV as it has a vast number of functions. If you do find anything though, feel free to share and I’d be happy to take a look.
@@Armaan_Priyadarshan Thanks man. I figured out that "visualize_boxes_and_labels_on_image_array" will return only the image with the box already attached to it. So I modified it and now it returns the coordinates as well as the image.
@@Armaan_Priyadarshan Another question if its possible. I am wondering if the "ssd_mobilenet_v2_fpnlite" model is trained, why did we train it again? Would this training process configure "ssd_mobilenet_v2_fpnlite" so it can detect our object better? or is it gonna create a new model? Thanks a lot
Top Games Hi! We mostly use the pre-trained model for the pipeline.config file as well as certain checkpoint files needed for training. The my_ssd_mobilenet_v2_fpnlite folder is then used as a training directory so we can export the model later on. And I’m glad you found a solution. If you take a look at the TF-image-object-counting.py script, OpenCV might be a bit easier to work with than viz_utils as there’s a bit more flexibility. For example formatting and printing the xmin, ymin, xmax, and ymax variables from inside the loop can provide box coordinates if that’s what you wanted to do. Thanks for sharing too! You never know, someone else might be trying the same thing!
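For exporting box coordinates, a minimal sketch of the conversion is below. It assumes the detection boxes come back normalized to the range 0 to 1 in [ymin, xmin, ymax, xmax] order, and the function name, sample values and threshold are hypothetical rather than taken from TF-image-object-counting.py.

```python
def boxes_to_pixels(boxes, scores, image_height, image_width, threshold=0.5):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes into pixel
    (xmin, ymin, xmax, ymax) tuples, skipping low-confidence detections."""
    pixel_boxes = []
    for box, score in zip(boxes, scores):
        if score < threshold:
            continue
        ymin, xmin, ymax, xmax = box
        pixel_boxes.append((
            int(xmin * image_width),
            int(ymin * image_height),
            int(xmax * image_width),
            int(ymax * image_height),
        ))
    return pixel_boxes


# Made-up detections for illustration only.
boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.1, 0.1]]
scores = [0.9, 0.2]
print(boxes_to_pixels(boxes, scores, image_height=480, image_width=640))
```

The resulting tuples could be written to a text or XML file per image, similar to the annotation files used for training.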
How to validate the accuracy? Also how to show the mAp graph,test loss, and validation loss?
Budz Altar You can follow the step regarding TensorBoard and view various model metrics. The written guide on GitHub has better instructions as there was an editing error around that time in the video
How do I use the export_inference_graph.py script with TensorFlow 2.3?
So what is the meaning of the two folders "test" and "train" in
C:\TensorFlow\workspace\training_demo\images
Does it mean that TensorFlow learns what the things in "train" are?
Then what about "test"?
Is it just for testing whether our script works or not?
Basically, what I want to ask is: what should I place in these two folders?
By the way ! You did a great tutorial ! Nice job!
Hi! The test and train folders contain the images and labels of your test and train set. TensorFlow uses Supervised Learning, so these are required for your dataset. You should place your images in these folders, 20% of them in the test folder and the other 80% in the train folder. After you've labelled your images, you can generate RECORD files and train the model.
@@Armaan_Priyadarshan
Can the images in these two folders be repeated?
Or do they have to be totally different?
Thanks for reply. :>
@@kbh24758 You should definitely put different images in each folder. Once you've prepared your dataset, put 4/5 of the images in the training folder and the rest inside the test folder.
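The 80/20 split described above can be sketched as follows. This is only an illustration: the function name, the fixed seed and the example filenames are assumptions, and it only splits a list of names rather than moving files.

```python
import random


def split_dataset(filenames, test_fraction=0.2, seed=0):
    """Shuffle the filenames and split them into disjoint (train, test) lists."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    test_count = round(len(shuffled) * test_fraction)
    return shuffled[test_count:], shuffled[:test_count]


# Hypothetical image names standing in for a real dataset.
images = [f"img_{i:03d}.jpg" for i in range(50)]
train, test = split_dataset(images)
print(len(train), len(test))
```

When copying the files for real, each image's label file needs to go into the same folder as the image.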
@@Armaan_Priyadarshan
Sorry, I still have one problem ><
Why can't I switch to another photo to detect?
This always appears:
File "TF-image-od.py", line 96, in
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
Why?
Do you have a tutorial about classifying images, not detecting objects in them?
Секреты Успеха No, I haven’t. I might look into that. But from what I’ve seen so far, there’s a command line tool which makes the process quite a bit simpler than this one.
Thanks, boss
hey nice work ... i'm having the exact same error that you had at time 23:41
I also get the same error at that exact same point. Please explain how you fixed this error.
Sir. I can't download the SSD MobileNet V2 FPNLite 640×640. This site can't be reached issue
Try download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
Thank you Sir
Hello, how did you fix the error when training @ 23:40?
Errors may have originated from an input operation.
Input Source operations connected to node ssd_mobile_net_v2fpn_keras_feature_extractor/model/Conv1/Conv2D:
fn_1 (defined at C:\Users\a1812\anaconda3\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py:367)
It seems the object_detection API didn't build correctly. Maybe try downgrading version to 2.2.0 or repeat steps to build object_detection.
Armaan Priyadarshan Do you mean pip install tensorflow-gpu==2.2.0 ? Because I already tried this but not working
@@lionking3608 Are you using TensorFlow CPU or GPU? You aren't able to install?
@@Armaan_Priyadarshan I used TensorFlow GPU 2.2.0 with 960M graphics card and it finally worked thanks bro
@@Armaan_Priyadarshan But when I ran TF-image-od.py script. The output image is zoomed in a lot and doesn't fit within my laptop screen. how do I fix this?
How to find test accuracy and train accuracy?
Where is the test.record file is used
Karan Sharma Hi! The test.record file is used in the training pipeline.config, it mostly contains information about the labels and bounding boxes given for your images. For model accuracy, you can use the step with TensorBoard to find model metrics. If you want numbers on your pre-trained model, you can check the TensorFlow Model Zoo which has speed and accuracy listed for each pre-trained model.
It is a great tutorial. I could train my model fine but I used the ssd_mobilenet_v2_fpnlite_320x320.
I have been trying to convert to TFLite but I couldn´t.
Could you make a tutorial for TFLite conversion.
I've just posted a video on TensorFlow Lite Conversion here: ruclips.net/video/2ofuUdCDppc/видео.html
Im trying to detect food dishes from pictures and recommend a recipe after it detects what the picture contains. Any idea on how to approach this? I think your video will help training the model to detect the food dishes but how will I recommend a recipe for that dish? Do you think using a recipe api will work?
Hi! I'm pretty sure this is possible. A recipe API can work too. You can check the label of the food detected and declare a recipe based on that.
@@Armaan_Priyadarshan Tysm!!!!!!!!
@@Armaan_Priyadarshan Hi, could you give me some insight on how to check the label after classifying an image? I don't understand how to do that. If you could make a tutorial on this, it would be so helpful.
@@woonie3134 Hi! To do so, you can write some code in the for loop I defined on line 121 in the TF-image-object-counting.py script. This line right here determines the object name of every detection: object_name = category_index[int(classes[i])]['name']. With a simple API, you can probably preserve this information to determine the label of every detection in the frame.
@@Armaan_Priyadarshan I have decided I will store the recipes in a dictionary and return the matching one as the result for the classified image. I chose to do this mainly for simplicity and to focus more on classifying food. However, I have never worked with dictionaries before, so how could I look up a result from one? If you have any knowledge on this, I would appreciate it if you shared it 😁
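A minimal sketch of the dictionary lookup, reusing the object_name line quoted in the reply above. The category_index, classes and recipes values here are made-up stand-ins, and recipe_for is a hypothetical helper:

```python
# Hypothetical stand-ins for the variables the detection script provides.
category_index = {1: {"id": 1, "name": "pizza"}, 2: {"id": 2, "name": "salad"}}
classes = [2.0, 1.0]

# The recipes are stored under the same names used in the label map.
recipes = {
    "pizza": "Dough, tomato sauce, mozzarella; bake until the crust browns.",
    "salad": "Lettuce, cucumber, olive oil; toss and serve.",
}


def recipe_for(object_name):
    """Look up a recipe by label, with a fallback for unknown labels."""
    return recipes.get(object_name, "No recipe stored for " + object_name)


for i in range(len(classes)):
    object_name = category_index[int(classes[i])]['name']
    print(object_name, "->", recipe_for(object_name))
```

Using .get with a default avoids a KeyError when the model detects a label that has no recipe yet.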
Is there any way to run this on my mac? If not, is there an equal alternative that is similar? Thank you
Mr.Sloth you should be able to just adjust the commands and paths
@@Armaan_Priyadarshan Well, the problem is the Cuda download. It is only a limited version for MAC
Hey bro, I have done everything exactly the same, but at the end the image is displayed without any detections. Is there anything I need to change? Kindly reply.
You should try lowering the confidence threshold to see if any detections come up. This can be done by specifying it in the command (eg. python TF-image-od.py --threshold 0.3)
@@Armaan_Priyadarshan Hi, thank you so much for your reply. I can't appreciate it enough. I tried reducing the threshold to 0.3 and it didn't work, but when I changed it to 0.2 it showed some detections, though not good enough. I stopped training after the loss reached 0.205; it couldn't get any lower even though I waited for 6000 steps. What do you think the problem is?
Again thank you so much:-)
@@GamingIn30s That seems to be a fine loss, so I'm not sure that's the problem. By any chance, could you tell me what you're trying to detect?
@@Armaan_Priyadarshan I am not using any new data set . I am first trying to duplicate your results. I am using your dataset and trying to get same results as you. I trained again and i got the loss 0.19 but still there are no proper detections.
Hello, my computer has an Nvidia GeForce RTX 3080 GPU. I installed CUDA 10.1 and cuDNN 7.6.5, and I am using the Faster R-CNN ResNet50 V1 640x640 network. When I train the network, the loss is nan. How can I solve this?
Hi! The last time I tried using ResNet models there were quite a few errors, so I haven't revisited them yet. Secondly, you'll want to update your CUDA and cuDNN versions to match the newest version of TensorFlow. Here's a link to the tested configurations: www.tensorflow.org/install/source#gpu
@@Armaan_Priyadarshan When I referred to this page (www.tensorflow.org/install/source#gpu), I installed CUDA 10.1 and cuDNN 7.5.0 (cuDNN doesn't have 7.4) on Windows 10. I get an error:
2021-01-19 10:06:57.282583: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-01-19 10:08:00.512682: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-01-19 10:18:18.632749: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-01-19 10:18:18.638607: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.5. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
File "model_main_tf2.py", line 114, in
tf.compat.v1.app.run()
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 105, in main
model_lib_v2.train_loop(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop
load_fine_tune_checkpoint(detection_model,
File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint
strategy.run(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1211, in run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2585, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 584, in _call_for_each_replica
return mirrored_run.call_for_each_replica(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 78, in call_for_each_replica
return wrapped(args, kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
return self._call_flat(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
outputs = execute.execute(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
[[Loss/RPNLoss/BalancedPositiveNegativeSampler/Cast_8/_302]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node functional_1/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_21950]
Errors may have originated from an input operation.
Input Source operations connected to node functional_1/conv1_conv/Conv2D:
functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
Input Source operations connected to node functional_1/conv1_conv/Conv2D:
functional_1/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
Function call stack:
_dummy_computation_fn -> _dummy_computation_fn
@@user-db2md6wv3c Hi! I believe your versions are outdated. TensorFlow 2 versions can't use cuDNN 7.5. The newest version supports CUDA 11.0 and cuDNN 8.0. The download links are: developer.nvidia.com/cuda-11.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exenetwork and
developer.nvidia.com/compute/machine-learning/cudnn/secure/8.0.4/11.0_20200923/cudnn-11.0-windows-x64-v8.0.4.30.zip
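The rule stated in the log above (major version must match, and the loaded minor version must be equal or higher for cuDNN 7.0 or later) can be written as a small check. A sketch only: the function name is hypothetical and it handles plain "major.minor.patch" strings.

```python
def cudnn_compatible(loaded, compiled):
    """True if the major versions match and the loaded minor version
    is at least the minor version TensorFlow was compiled against."""
    loaded_major, loaded_minor = (int(part) for part in loaded.split(".")[:2])
    compiled_major, compiled_minor = (int(part) for part in compiled.split(".")[:2])
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# Versions taken from the log: 7.5.0 loaded, 7.6.5 expected.
print(cudnn_compatible("7.5.0", "7.6.5"))
print(cudnn_compatible("7.6.5", "7.6.5"))
```

This only mirrors the message in the log; the tested configurations page remains the reference for which CUDA and cuDNN pair goes with each TensorFlow release.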
@@Armaan_Priyadarshan I followed your instructions to install CUDA and cuDNN.
But I have an error:
See `tf.nn.softmax_cross_entropy_with_logits_v2`.
2021-01-20 15:11:34.676290: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-01-20 15:11:35.241074: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-01-20 15:11:35.255702: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-01-20 15:11:36.015093: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-20 15:11:36.017068: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-01-20 15:11:36.021009: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-01-20 15:11:36.022768: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
Traceback (most recent call last):
File "model_main_tf2.py", line 114, in
tf.compat.v1.app.run()
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 105, in main
model_lib_v2.train_loop(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 564, in train_loop
load_fine_tune_checkpoint(detection_model,
File "C:\Anaconda\envs\tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 367, in load_fine_tune_checkpoint
strategy.run(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 1259, in run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2730, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_strategy.py", line 628, in _call_for_each_replica
return mirrored_run.call_for_each_replica(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\distribute\mirrored_run.py", line 75, in call_for_each_replica
return wrapped(args, kwargs)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
return self._stateless_fn(*args, **kwds)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
return graph_function._call_flat(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
outputs = execute.execute(
File "C:\Anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
[[Loss/ToAbsoluteCoordinates/Assert/AssertGuard/pivot_f/_83/_55]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1_conv/Conv2D (defined at \site-packages\object_detection\meta_architectures\faster_rcnn_meta_arch.py:1340) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_16270]
Errors may have originated from an input operation.
Input Source operations connected to node model/conv1_conv/Conv2D:
model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
Input Source operations connected to node model/conv1_conv/Conv2D:
model/lambda/Pad (defined at \site-packages\object_detection\models\keras_models\resnet_v1.py:49)
Function call stack:
_dummy_computation_fn -> _dummy_computation_fn
@@user-db2md6wv3c Have you tried downloading the most recent driver version for your graphics card?
Hello guys, I followed your tutorial to train the model and it's OK! But when I use the TensorFlow C API to load the model, it reports an error at if (TF_GetCode(status) != TF_OK). Why is that?
I'm not entirely sure. I haven't done much work with the C API, so I don't have the answer to that.
@@Armaan_Priyadarshan python export_tflite_ssd_graph.py
What is its second parameter? Is it pipeline.config?
Why do I get the error 'utf-8' codec can't decode byte 0xbe in position 140: invalid start byte?
@@Armaan_Priyadarshan I want to convert pb to tflite
@@陈周-z1s Hi! I'll try to put out a tutorial when Raspberry Pi Support is released as I feel it might do a bit better. Here's a guide you can follow for now. github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_mobile_tf2.md
You can add me on discord with the tag in the description if you want more detailed instructions.
Hey great tutorial, I wanted to ask you How do I make the model_main_tf2 to evaluate only??
Run python model_main_tf2.py --pipeline_config_path models\my_ssd_mobilenet_v2_fpnlite\pipeline.config --model_dir models\my_ssd_mobilenet_v2_fpnlite --checkpoint_dir models\my_ssd_mobilenet_v2_fpnlite --alsologtostderr from the training demo directory. Downgrade Numpy to 1.17.3 if you get Numpy errors.
@@Armaan_Priyadarshan Oh, thank you very much for the solution, but how am I supposed to know it is running in evaluation mode?
@@bigyansubedi3386 It will provide various metrics instead of showing training step logs.
Hey, thank you for the advice. I evaluated the model and found some images that were predicted correctly, but there is no scalar or graph for any precision or accuracy. I don't understand why.
Hello! @Armaan Priyadarshan As per your suggestion i downgraded my tensor flow from version 2.3.0 to 2.2.0. and now i am getting the new error, i.e.
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
[[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
[[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_44094]
Function call stack:
_dist_train_step -> _dist_train_step
Do you have any idea, how to resolve it?
Muhammad Waqas Ali May I ask which pre-trained model you’re using? And can you try reducing the batch_size in your pipeline.config to 4
@@Armaan_Priyadarshan Yes, I am using the SSD MobileNet V2 FPNLite 640x640 model. And yes, I also reduced the batch size to 4, but with no result. My GPU has 2GB of memory; does that really matter in this regard?
Muhammad Waqas Ali Oh yes, that might be a bit of an issue. Try reducing to 2 and if that doesn’t work reduce it to 1 and try retraining otherwise you might get OOM errors.
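A small sketch of changing batch_size in pipeline.config from a script instead of by hand. It assumes the file contains a train_config block whose first batch_size is the training one (the eval batch_size comes later in the file); the function name and the sample text are made up for illustration.

```python
import re


def set_train_batch_size(config_text, new_size):
    """Replace the first batch_size that appears after 'train_config'."""
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    tail, count = re.subn(
        r"batch_size:\s*\d+", f"batch_size: {new_size}", tail, count=1
    )
    if count == 0:
        raise ValueError("no batch_size found after train_config")
    return head + tail


# Trimmed-down, hypothetical excerpt of a pipeline.config.
sample = """train_config {
  batch_size: 128
  num_steps: 50000
}
eval_config {
  batch_size: 1
}
"""
print(set_train_batch_size(sample, 2))
```

Editing the value in a text editor works just as well; the point is only that the training batch size, not the eval one, is what affects memory use during training.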
can i put the price for the object model?
Hi! I’m not entirely sure what you mean.
*Update*
I've just made a tutorial for the Raspberry Pi as quite a few people had questions about it: ruclips.net/video/PWMQQAL0PCM/видео.html
Hi Armaan, I actually got the same error as you did at 23:44. Can you tell me how you fixed this on your computer?
Caroline Lee Hi! I think I got this error while recording as training couldn’t run simultaneously with OBS due to the limited system resources. I rebuilt the Object Detection API and fixed all the Python package versions just to make sure this was the case. Some others fixed the error by downgrading the TensorFlow version to 2.2.0.
@@Armaan_Priyadarshan Thank you! That was it. My system only has 3 GB of graphics memory, and even though I set a batch size of 3, I still had to reduce it to 2. I also closed some background programs, which helped.
@@Armaan_Priyadarshan I got another error when I was trying to export the inference graph: TypeError: 'NoneType' object is not iterable.
The third error occurred when I ran the python TF-image-od.py file:
Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last):
File "TF-image-od.py", line 96, in
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
Any ideas on how to fix these errors? I'm trying to train a model to detect potholes in the road, so I'm not using your test files (but when I did use yours it worked perfectly). I can't figure out why Python is still looking for the old files since it was replaced with the pothole ones. I did the pip install open-cv-python command and I still had the same error.
Also if we're testing the model out on the images under the "test" folder, why do we still need to label them?
Caroline Lee Hi! I’m pretty sure I’ve figured out how to fix your issue. The path to the image provided to the program still seems to be the default. You can provide the path to your image with the --image argument, or you can edit the default path within the program. To answer your second question, the test set is used for evaluation rather than training: its labels are the ground truth that the model's predictions are compared against to compute the evaluation loss and metrics. Because the model never trains on those images, they are a decent way to measure accuracy.
@@Armaan_Priyadarshan Thanks for answering!
The pothole images were placed in the same place as the pill images. I'm not sure what you mean by the image argument. I looked at the Python file and it looks like you already have that there (args.images), so I copied the path of the test folder and pasted it where it asks for the image paths. It threw the same error. How would I go about editing the default path in the program?
Update: I think I know what you mean now. I edited the python file a bit and ran this in anaconda: TF-image-od.py --image images/test/pothole.jpg, but I still got hit with this error:
Running inference for {'image': 'images/test/pothole.jpg'}... Traceback (most recent call last):
File "TF-image-od.py", line 98, in
image = cv2.imread(IMAGE_PATHS)
SystemError: returned NULL without setting an error
Update 2: Alright, never mind, I think I got it. Do you have any tips on how I should train the model? I'm thinking I should go from non-busy backgrounds to busy backgrounds, because right now it's identifying the sky as a pothole lol
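For anyone else who hits the SystemError above: the log line "Running inference for {'image': ...}" suggests the whole parsed-arguments dictionary reached cv2.imread instead of the path string inside it. Here is a minimal sketch of pulling out just the string with argparse. The function name parse_image_path is made up for illustration, and the default path is simply the one from this thread; the real script may be organized differently.

```python
import argparse


def parse_image_path(argv):
    """Return the image path as a plain string (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Pick one image for inference")
    # The flag name mirrors the one used in this thread; the default is an assumption.
    parser.add_argument("--image", default="images/test/pothole.jpg",
                        help="path to a single image file")
    args = parser.parse_args(argv)
    # vars(args) is a dict such as {'image': 'images/test/pothole.jpg'}.
    # An image loader needs the string stored under that key, not the dict.
    return args.image


image_path = parse_image_path(["--image", "images/test/pothole.jpg"])
print(f"Running inference for {image_path}...")
```

If the printed line shows curly braces, the variable still holds the dictionary rather than the path.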
I'm getting this error: "tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes at component 1: expected [6,640,640,3] but got [6,640,640,1]." Probably because my images are grayscale. Can you say what I have to change for using grayscale image? Thanks.
I'm not sure if this is because your images are grayscale, it might be a model format or model shape error. Can you give me some more information on what step you get this error on. Are you maybe using a different model or pipeline?
@@Armaan_Priyadarshan , No I am using same models. In this step:
python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
and I have bmp files not jpeg or png
@@anuragdalal6908 Hi, sorry for the late reply, but my comments weren't posting for some reason. Unfortunately, the TensorFlow Object Detection API expects three-channel RGB images, while grayscale images are single-channel. I'm not entirely sure how to help with this issue, but I found some links with more info: stackoverflow.com/questions/48744666/tensorflow-object-detection-api-1-channel-image & github.com/tensorflow/models/issues/3369
And it would be great if you could convert the bmp images to jpg with an online converter just in case it matters.
@@Armaan_Priyadarshan I will try to do it using opencv then, I mean the image conversion
Hello, Firstly thank you so much for this tutorial. Well I am getting some error on running " python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config "
and the error is that, " tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse " . Will you please tell me how to sort it out? Thank You.
Hi! I'm unsure of this error. Did every previous step work? Can you try downgrading to TensorFlow 2.2.0 and re-trying?
@@Armaan_Priyadarshan Thank you so much for your reply. let me try it on 2.2.0, will inform you.
Every previous steps worked fine.
Hello! @@Armaan_Priyadarshan As per your suggestion, I downgraded TensorFlow from version 2.3.0 to 2.2.0. Now I am getting a new error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
[[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Loss/regularization_loss/write_summary/summary_cond/pivot_t/_24/_219]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[6,128,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator gpu_host_bfc
[[{{node swap_out_gradient_tape/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3_1/separable_conv2d/Conv2DBackpropFilter_0}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference__dist_train_step_44094]
Function call stack:
_dist_train_step -> _dist_train_step
Do you have any idea, how to resolve it?
@@muhammadwaqasali4155 That seems to be an OOM error. Sorry for the late reply, but you should try reducing your batch_size to 2 or maybe 1 and see what happens.
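Lowering the batch size means editing the batch_size line inside the train_config block of pipeline.config. You can do that by hand in a text editor, but here is a rough sketch of the same edit in Python. The sample text is a cut-down stand-in for a real pipeline.config (the num_steps value is invented), and set_train_batch_size is a hypothetical helper, not part of the Object Detection API.

```python
import re


def set_train_batch_size(config_text, batch_size):
    """Rewrite the first batch_size found inside train_config (hypothetical helper)."""
    start = config_text.find("train_config")
    if start == -1:
        raise ValueError("no train_config block found")
    head, tail = config_text[:start], config_text[start:]
    # count=1 leaves any later batch_size (for example in eval_config) untouched.
    new_tail, replaced = re.subn(r"batch_size:\s*\d+",
                                 f"batch_size: {batch_size}", tail, count=1)
    if replaced == 0:
        raise ValueError("train_config has no batch_size entry")
    return head + new_tail


# Cut-down sample; the batch size of 6 matches the tensor shape in the OOM log above.
sample = """train_config {
  batch_size: 6
  num_steps: 25000
}
eval_config {
  batch_size: 1
}
"""

updated = set_train_batch_size(sample, 2)
print(updated)
```

Only the training batch size changes; the evaluation batch size stays at 1.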
Hello, thank you for your awesome tutorial. I just encountered an error at the end when trying to do the training.
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node EfficientDet-D0/model/stem_conv2d/Conv2D (defined at C:\Python36\lib\site-packages\object_detection\models\ssd_efficientnet_bifpn_feature_extractor.py:220) ]] [Op:__inference__dummy_computation_fn_24408]
Errors may have originated from an input operation.
Input Source operations connected to node EfficientDet-D0/model/stem_conv2d/Conv2D:
args_1 (defined at C:\Python36\lib\site-packages\object_detection\model_lib_v2.py:372)
Function call stack:
_dummy_computation_fn
Do you happen to know what might be causing this error? I've run two different models, and still get the same error.
I've only encountered this error when my system was overtaxed (like when recording). To fix it, the only methods that worked for me were either restarting my system or reinstalling TensorFlow.
@@Armaan_Priyadarshan After restarting my system, reinstalling TensorFlow, and even keeping the batch size at 1, I got the message "Allocator GPU_0_bfc ran out of memory trying to allocate 2.26GiB". The script continues to run, but no loss information appears in the command prompt. How can I resolve this? Please tell me.
@@ashishshrivastava9966 Hi! In this case, you'll want to lower the batch size in the pipeline. First try 4, then if it still doesn't work, you can try 2, and then 1.
@@Armaan_Priyadarshan In my case I had kept batch size equal to one only....but I am getting the above error.
Hi, I am struggling to continue until testing the installation (python ...\model_builder_tf2_test.py). I always receive the following message:
from tensorflow.python.keras.layers.preprocessing import image_preprocessing as image_ops
ImportError: cannot import name 'image_preprocessing'
TF version is 2.10
Keras is 2.3.1
I installed Microsoft Build Tools 2019, CUDA 10.1, and cuDNN 7.6. Unfortunately, I had to downgrade my TF due to persistent error messages like DLL load failed, etc.
I have been doing this for 3 days.
Where am I going wrong here?
kabak2abak Do you have Microsoft Visual Studio 2019 with the C++ Build Tools? I believe that is a dependency for TensorFlow 2.3.0. And just a question: are you using the CPU or GPU version?
@@Armaan_Priyadarshan I followed all of your instructions, including downloading CUDA, cuDNN, and the C++ Build Tools. When I try to import tensorflow, I keep receiving 'ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.'
I installed it with pip install tensorflow-gpu
Tensorflow GPU: 2.3.0
Python: 3.8.3
kabak2abak Can you try restarting the process from scratch? I usually find that a clean restart resolves most errors. Following the written guide on GitHub might help. Try uninstalling and reinstalling MSVC, Visual Studio, Anaconda, and CUDA & cuDNN. I’ve seen this error before, and it’s usually just a Visual Studio error. Make sure everything is on your PATH as well, because this can be important!
@@Armaan_Priyadarshan Well, I will try. I have been doing the installation more than 3 days now, and none of which is successful. You mean I should uninstall MSVC, Visual Studio, Anaconda, and CUDA & cuDNN? then reinstall them again?
What do you mean by everything being on my PATH?
I have set environment for all. The variables are PythonPath and Path. Should I remove them as well before uninstalling?
@@Armaan_Priyadarshan I have reinstalled everything and unfortunately still cannot get it working. Did you install the NVIDIA drivers?
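On the PATH question in this thread: a program is "on your PATH" when the folder containing it is listed in the PATH environment variable, so the terminal can find it by name alone. A small sketch for checking that from Python follows. The tool names are only examples of what this kind of setup tends to rely on (nvcc ships with the CUDA toolkit, protoc compiles the protos), not an official checklist.

```python
import shutil


def find_on_path(commands):
    """Map each command name to its full location, or None if PATH cannot find it."""
    return {name: shutil.which(name) for name in commands}


# Example tool names; adjust the list to whatever your setup needs.
tools = ["python", "git", "protoc", "nvcc"]

for name, location in find_on_path(tools).items():
    print(f"{name:8} -> {location or 'NOT FOUND on PATH'}")
```

Anything reported as NOT FOUND either is not installed or lives in a folder that still has to be added to PATH. Open a new terminal after changing PATH so the change is picked up.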
yo how do i fix this, it took like forever man...
INFO:tensorflow:Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite
I1010 16:50:29.150513 23484 checkpoint_utils.py:125] Waiting for new checkpoint at models\my_ssd_mobilenet_v2_fpnlite
Which step did this occur at? If you have finished training, could you let me know how many checkpoint files you have in your models\my_ssd_mobilenet_v2_fpnlite directory.
@@Armaan_Priyadarshan never mind man... it worked, i cancelled the waiting for checkpoint then i basically just do the next step
Ernesto Younes That’s great!
Thank you
Hello Armaan, I am getting a failure like this:
(tf2) C:\Tensorflow\workspace\training_demo>python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config
Traceback (most recent call last):
File "__init__.pxd", line 942, in numpy.import_array
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Tensorflow\workspace\training_demo\model_main_tf2.py", line 32, in
from object_detection import model_lib_v2
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\model_lib_v2.py", line 29, in
from object_detection import eval_util
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\eval_util.py", line 35, in
from object_detection.metrics import coco_evaluation
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_evaluation.py", line 25, in
from object_detection.metrics import coco_tools
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\object_detection\metrics\coco_tools.py", line 51, in
from pycocotools import coco
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\coco.py", line 56, in
from . import mask as maskUtils
File "C:\Users\ufuka\.conda\envs\tf2\lib\site-packages\pycocotools\mask.py", line 3, in
import pycocotools._mask as _mask
File "pycocotools\_mask.pyx", line 23, in init pycocotools._mask
File "__init__.pxd", line 944, in numpy.import_array
ImportError: numpy.core.multiarray failed to import
How to solve this error?
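The line "module compiled against API version 0x10 but this version of numpy is 0xf" generally means a compiled package (pycocotools in this traceback) was built against a newer NumPy than the one installed in the environment. Upgrading NumPy or reinstalling pycocotools inside the tf2 environment is the usual direction to try, though which NumPy version your TensorFlow accepts is something to check for your own install. A quick sketch for seeing what is actually installed, assuming Python 3.8 or newer for importlib.metadata:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None when it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the traceback above.
for name, version in installed_versions(["numpy", "pycocotools", "tensorflow"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it from the same conda environment that you train in, otherwise it reports a different set of packages.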
Hi, thanks for sharing this video. I have finished training a model with my own data. However, when I tried to test the finished model, I got the following error:
--------------------------------------------------------------------------------------------------------------
Loading model...Done! Took 24.696925401687622 seconds
Running inference for images/test/i-1e092ec6eabf47f9b85795a9e069181b.jpg... Traceback (most recent call last):
File "TF-image-od.py", line 96, in
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-9d_dfo3_\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
----------------------------------------------------------------------------------------------------------------
I've tried to install opencv-python with version 4.3.0.38 but still got this error.
Could you please help me fix it? Thanks a lot.
You don't need to install that version I don't think. I'm pretty sure you didn't provide the right path to the image. The program seems to be running inference on the default image which you shouldn't have unless you're using my dataset. You can use the --image argument to specify the path to the image, or you can edit the default value in the program.
@@Armaan_Priyadarshan Thanks for your reply. Yeah, you're right. I just forgot to specify my directory to the image. The rest of the program ran perfectly as expected. This tutorial is very helpful.
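A note on why this error looks so cryptic: cv2.imread does not raise an exception when the file is missing, it just returns an empty result, and the "!_src.empty()" assertion only fires later inside cvtColor. Checking the path before loading gives a much clearer message. This is a minimal sketch using only the standard library; check_image_path and the suffix list are assumptions for illustration, not part of the tutorial's script.

```python
from pathlib import Path

# Assumed set of image types; extend it if you use other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_image_path(path_str):
    """Fail early with a readable message if the image path is wrong (hypothetical helper)."""
    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"No image file at {path.resolve()}")
    if path.suffix.lower() not in IMAGE_SUFFIXES:
        raise ValueError(f"{path.name} does not look like an image file")
    return path


try:
    check_image_path("images/test/this-file-does-not-exist.jpg")
except FileNotFoundError as error:
    print(error)
```

Calling it right before the image is loaded turns a confusing OpenCV assertion into a plain "no image file at ..." message showing the full path that was tried.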
I'm working through the tutorial now and got a bit stuck at cloning the TensorFlow models repo with git. Maybe it would be good to add a link to git; for us non-programmers, it can be a big problem when a command doesn't work.
Majstorsky Majstor Hi! This is a good idea. You can install git from here: git-scm.com/download/win. Then just open up a new terminal and everything should be good to go.
@@Armaan_Priyadarshan Thanks, I've already finished and am currently training the model. I tried lots of tutorials, but this one is the best. It's quite an achievement to make a tutorial for such a heavy field that the average person can follow. BTW, I tried to start the TensorBoard server, but the command isn't recognized. And is there an option for deploying this object detector in a simple way, like an exe app that you just install and use?
Majstorsky Majstor Thanks! The step with TensorBoard is optional and not necessary for training. If you are interested, you should follow the written tutorial, as there was an editing error in the video at that timestamp. As for making a simple application, that might take a bit of work. TensorFlow is popular on Android and iOS, but deploying there is a more complicated process.
@@Armaan_Priyadarshan Here is my first try: ruclips.net/video/C1eR6todLto/видео.html. Is there a way to set a detection cutoff, for example to show only detections above some value, like 90%? And is there a bit of a GUI where I can load files and start scripts? There will be lots of typing if I want to check 100 files.
Majstorsky Majstor There is an argument called threshold that you can pass when running the script to specify your minimum confidence. For example, python TF-image-od.py --threshold 0.9 will only show detections above 90% confidence. As for a GUI, that will take a little editing of the program. However, you can make it perform inference on a directory of images.
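To make the threshold idea concrete, here is a small sketch of what a confidence cutoff does, plus one way to collect every image in a folder so nothing has to be typed per file. The detections are made-up (label, score) pairs and both helper names are hypothetical; the real script works on the model's output tensors rather than a plain list.

```python
from pathlib import Path


def keep_confident(detections, threshold=0.9):
    """Keep only (label, score) pairs whose score reaches the threshold."""
    return [(label, score) for label, score in detections if score >= threshold]


def list_images(directory, suffixes=(".jpg", ".jpeg", ".png")):
    """Return the image files in a directory, sorted by name."""
    return sorted(p for p in Path(directory).iterdir() if p.suffix.lower() in suffixes)


# Made-up scores for illustration only.
detections = [("pothole", 0.97), ("pothole", 0.42), ("pothole", 0.91)]
print(keep_confident(detections))  # [('pothole', 0.97), ('pothole', 0.91)]

# Looping over a folder replaces typing one command per image.
for image_file in list_images("."):
    print("would run inference on", image_file)
```

A lower threshold shows more boxes, including weak guesses; a higher one shows fewer but more reliable boxes.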
Can you use this on a Macbook Pro 2020?
Hi! You can, but you'll have to use the TensorFlow CPU variant. This means training will take longer and you won't be able to use your computer during training. It's probably possible though.
@@Armaan_Priyadarshan How long would the training take if I use the CPU variant? Do you go over the CPU variant in the video?
I also do not have an NVIDIA chip
@@krishdesai1097 I attempted it on my old laptop, and it took a few days. Currently, my estimate would be at least one day.
Impressive work @Armaan Priyadarshan ... Please guide me with this error after I ran python object_detection\builders\model_builder_tf2_test.py
"tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse"
Please note that I have reinstalled tensorflow-gpu==2.2.0 but still getting this error.
Thanks.
This error occurs when you have a background process running. For example, if you are running two python programs that use TensorFlow at the same time. I'd recommend just restarting your machine and retrying.
@@Armaan_Priyadarshan Thanks for the reply.. I restarted the system, but still getting the error.
@@muhammadhammadsaleem8569 Hi! Were you able to compile your protos correctly?
@@Armaan_Priyadarshan Yeh I compiled protos and checked the .proto files are displayed in the protos folder.
Does it support displaying detections from a webcam?
Not yet, but I will be adding that in a few hours.
I've added webcam support if you were still wondering
@@Armaan_Priyadarshan where can i find the code for implementing object detection through webcam?
Shubham The script should be located in the workspace/training_demo directory. It’s called TF-webcam-opencv.py.
Thanks 😊@@Armaan_Priyadarshan you are really a saviour 😊
Well done!
Hi Armaan, I was following the video and training with a dataset that I labelled myself. Everything went smoothly until I got stuck at the 'Generating Training Data' stage. I changed the label map to two ids, went to the "scripts/preprocessing" folder, and ran "python generate_tfrecord.py", and the error "IndexError: child index out of range" appeared. I'm following the same steps as the video; I only replaced the images in the image folder with the ones I labelled. What's the problem?
Hmmmmm.... Can you raise an issue in the Github Repository? It would be great if you can send your labelmap.pbtxt, tell me where you saved the XML documents, as well as what command you ran for generate_tfrecord.py. The images should be labelled with LabelImg with a rectangular box drawn as well as a label provided. If you could send maybe one or 2 of your XML documents it would be great, because I could see if you labelled the images right. You should also make sure that you deleted the given train.record and test.record along with all the images and XML Documents of my Pill Detector.
@@Armaan_Priyadarshan If I run a command "python model_main_tf2.py --model_dir=models\my_ssd_mobilenet_v2_fpnlite --pipeline_config_path=models\my_ssd_mobilenet_v2_fpnlite\pipeline.config", I got a error like this :
pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [[0.611111104][0.758454144][0.724637747]...] [[0.205314025][0.432367146][0.533816457]...]
[[{{node Assert_1/AssertGuard/else/_35/Assert}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
what's the matter of this error? please help
김응찬 I am not sure of this error. Can you try to configure training with the Pill Model and given data to see if it starts. It might be a pipeline issue or a data issue, just start training, complete the example, and find the nature of the issue.
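Going back to the "child index out of range" error earlier in this thread: that message is what Python's xml.etree raises when code indexes into a child element that does not exist, so a likely culprit is an annotation file that is missing a label or a bounding box. Below is a rough checker for LabelImg-style (Pascal VOC) XML. It assumes the usual object / name / bndbox layout, and find_annotation_problems is a hypothetical helper, not part of generate_tfrecord.py.

```python
import xml.etree.ElementTree as ET

REQUIRED_BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def find_annotation_problems(xml_text):
    """Return a list of human-readable problems found in one annotation file."""
    problems = []
    root = ET.fromstring(xml_text)
    objects = root.findall("object")
    if not objects:
        problems.append("no <object> elements (nothing was labelled)")
    for index, obj in enumerate(objects):
        if not (obj.findtext("name") or "").strip():
            problems.append(f"object {index}: missing <name>")
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"object {index}: missing <bndbox>")
            continue
        for tag in REQUIRED_BOX_TAGS:
            if box.findtext(tag) is None:
                problems.append(f"object {index}: <bndbox> has no <{tag}>")
    return problems


# Tiny hand-written example with a label but no box.
broken = "<annotation><object><name>pill</name></object></annotation>"
print(find_annotation_problems(broken))  # ['object 0: missing <bndbox>']
```

Running it over every XML file in the train and test folders should point to the file that trips up the record generation, if a malformed annotation is indeed the cause.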
What should I do if I want to train with Faster R-CNN?
The process for altering the pipeline should be a bit different but there's not much to do.
@@Armaan_Priyadarshan I fixed the pipeline as you instructed. Do I need to fix anything more?
I can run SSD MobileNet V2 FPNLite 640x640 and SSD MobileNet V1 FPN 640x640,
but I can't run SSD ResNet50 V1 FPN 640x640 (RetinaNet50) or the models listed below it.
@@hoyt2603 Hi! There are often a few similar errors. Last time I tried using ResNet, I had some issues as well. I'm pretty sure TensorFlow is trying to fix them, but I'm not sure if there's anything you can try.
Hello Armaan,
I am trying to test my installation with:
python object_detection\builders\model_builder_tf2_test.py
However, I have run into a problem:
...
[ FAILED ] ModelBuilderTF2Test.test_create_ssd_models_from_config
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1,512,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Add]
----------------------------------------------------------------------
Ran 20 tests in 93.266s
FAILED (errors=1, skipped=1)
What should I do?
Thank you very much.
li simon Hi! How much memory does your GPU have? If you’re using an older GPU, you might want to just install the CPU only version of TensorFlow to avoid Out of Memory or OOM Errors.
@@Armaan_Priyadarshan
Dear Armaan,
I am now using the CPU-only version of TensorFlow, and I successfully tested my installation with:
python object_detection\builders\model_builder_tf2_test.py
Thank you very much.
However, I ran into another problem in "Training the Model".
It takes about 30 minutes between each pair of INFO / I0927 log lines. Is that normal?
I am using your pictures for the tutorial.
...
Use fn_output_signature instead
INFO:tensorflow:Step 100 per-step time 17.890s loss=0.545
I0927 17:20:11.610697 6572 model_lib_v2.py:649] Step 100 per-step time 17.890s loss=0.545
INFO:tensorflow:Step 200 per-step time 18.110s loss=0.372
I0927 17:50:35.465189 6572 model_lib_v2.py:649] Step 200 per-step time 18.110s loss=0.372
INFO:tensorflow:Step 300 per-step time 18.313s loss=0.320
I0927 18:20:46.434653 6572 model_lib_v2.py:649] Step 300 per-step time 18.313s loss=0.320
INFO:tensorflow:Step 400 per-step time 18.350s loss=0.288
I0927 18:50:58.178737 6572 model_lib_v2.py:649] Step 400 per-step time 18.350s loss=0.288
...
li simon Hi! Yes, this is totally normal with the CPU version, as it’s much slower than the GPU version. As you can see, your per-step time is quite different from mine in the video, which is why it’s slow. Just to make sure, could you clarify which CPU and GPU you’re running with?
@@Armaan_Priyadarshan
Dear Armaan,
I am not familiar with computer and programming.
I am using an old laptop with:
-Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz 2.40 GHz
-8 GB RAM
-Intel(R) HD Graphics Family
-NVIDIA GeForce GT 730M
I think the CPU version is too slow, and it is impossible to do object detection with it :(
Thank you very much!
li simon Hi! This makes sense. Unfortunately, yours is an older laptop GPU, and the OOM errors suggest it doesn't have enough memory for this workload. Your CPU is also a bit old. You might want to try out Google Colab if your laptop can’t take the workload.
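The 30-minute gap in the log above follows directly from the per-step time: the log prints every 100 steps, and each step takes about 18 seconds on this CPU. A quick back-of-the-envelope sketch is below. The 5000-step total is a made-up figure for illustration; use the number of steps set in your own pipeline.

```python
def minutes_between_logs(per_step_seconds, steps_per_log=100):
    """Minutes that pass between two log lines printed every steps_per_log steps."""
    return per_step_seconds * steps_per_log / 60


def estimated_hours(per_step_seconds, total_steps):
    """Rough total training time in hours, ignoring evaluation and checkpoint saving."""
    return per_step_seconds * total_steps / 3600


# 18.110 s is the per-step time at step 200 in the log above.
print(round(minutes_between_logs(18.110), 1))   # 30.2
# 5000 steps is a hypothetical total, not a value from the tutorial.
print(round(estimated_hours(18.110, 5000), 1))  # 25.2
```

So at 18 seconds per step, a run of a few thousand steps lands in the "about a day or more" range on this kind of CPU.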
Hi, I followed all the code and the tutorial in this video. Can anybody tell me why my val_loss starts at 1.99, decreases to about 1.1, and then stays in that range without decreasing further (1.4, 1.1, 1.3, 1.2, 0.9)? Is an imbalanced dataset a big problem? I'd like to convert the model to TFLite and detect 5 objects in real time. My dataset is imbalanced, with 1000 car, 1000 human, 1000 bicycle, 1000 motorcycle, and 336 stop sign images, plus 100 test images for each class except stop sign, which has only about 50. I took this dataset from Google Open Images Dataset V6.
Can anybody help me?
how to add accuracy metric on the tensorboard?
Hi! This is totally normal, don’t worry about it! Each individual step's loss is unpredictable. It has nothing to do with your dataset. But when your loss is consistently between 1.5 and 2, you can stop the program. Just try waiting and seeing!
@@Armaan_Priyadarshan sorry I am new for this topic , this is my thesis / final project, and I don't know anything about this hehehe so please be patient with me hahaha,
So I think I don't understand about waiting until it is consistently between 1.5 and 2.0. Sorry for my bad English. My TensorBoard scalars don't show overfitting; it looks normal, but like I said before, it's stable at 1.3. Is that normal?
So how can I tell that my training went well and my model is good?
Do I need to add an accuracy metric?
Thank youuuu
KEVIN CHRISTIAN TensorBoard can be ignored for the most part during model training unless you want to visualize the process. If your loss has an odd value, it’s most likely just an outlier. As long as it follows a continual decaying pattern, you should be fine. In the command prompt, the loss should be shown after each 100 steps. Once the loss is consistently between 1.5 and 2 for a few hundred steps your model is done. If it only goes between 1.5 and 2 once after 5 logs, that’s a different story as that would be an outlier. If it consistently goes below 1.3, that would mean it’s a bit more specific but should still be fine for testing. Just stop the program, try it out and assess the results. If you are unhappy, you can always retrain! And feel free to share your issues or ask for help! That’s what I’m here for! Good luck with your Thesis!
@@Armaan_Priyadarshan Thank you so much for the help..
I add you on discord, maybe we can chat.. because I dont have any experience in this object detection stuff
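The stopping rule described in this thread (loss staying between 1.5 and 2 for a few hundred steps, with a single odd value treated as an outlier) can be written down as a tiny check. This is only a sketch of that rule of thumb as stated above, not a general convergence test. The loss values are invented, and a window of 3 assumes one log line per 100 steps.

```python
def loss_has_settled(losses, low=1.5, high=2.0, window=3):
    """True when the last `window` logged losses all fall inside [low, high]."""
    if len(losses) < window:
        return False
    return all(low <= value <= high for value in losses[-window:])


# Invented loss logs, one value per 100 steps.
steady = [3.1, 2.4, 1.9, 1.7, 1.8]
with_outlier = [3.1, 2.4, 1.9, 2.6, 1.8]

print(loss_has_settled(steady))        # True
print(loss_has_settled(with_outlier))  # False
```

Changing low and high lets you apply the same check to whatever range your own loss settles into, such as the 1.3 mentioned above.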
Hi Armaan. Thanks a lot for your amazing video,
When I try to train the model,
I get this error. Please help me fix it:
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node SGD/NcclAllReduce}} with these attrs: [reduction="sum", shared_name="c1", T=DT_FLOAT, num_devices=2]
Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU]
Registered kernels:
[[SGD/NcclAllReduce]] [Op:__inference__dist_train_step_209125]
Hi! Which GPU are you using? If it's a bit older with less memory, you might want to think about changing the batch size or training on your CPU.
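One more thing worth checking: the error mentions NcclAllReduce with num_devices=2, which suggests TensorFlow found two GPUs and tried to split training across them using NCCL. NCCL kernels are, as far as I know, not included in Windows builds of TensorFlow, so if this is a Windows machine (an assumption, since the comment doesn't say), limiting training to one GPU may get around it. A minimal sketch using the standard CUDA_VISIBLE_DEVICES environment variable follows; restrict_to_single_gpu is a hypothetical helper, and it has to run before TensorFlow is imported.

```python
import os


def restrict_to_single_gpu(gpu_index=0):
    """Expose only one GPU to CUDA programs started from this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


visible = restrict_to_single_gpu(0)
print("CUDA_VISIBLE_DEVICES =", visible)

# Import TensorFlow only after this point so it sees a single GPU, e.g.:
# import tensorflow as tf
```

The same effect can be had by setting the variable in the terminal before launching model_main_tf2.py, which avoids editing the script at all.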