Object detection Using Detection Transformer (Detr) on custom dataset
- Published: Oct 6, 2024
- Step-by-step implementation of object detection using Detection Transformer (DETR) on a custom dataset.
Github: github.com/Aar...
Dataset: universe.robof...
*********************************************************************
For queries: comment in the comment section or email me at aarohisingla1987@gmail.com
********************************************************************
DETR, short for "DEtection TRansformer," is an object detection model built on a transformer architecture.
It was introduced in the research paper "End-to-End Object Detection with Transformers," published by researchers from Facebook AI Research (FAIR) in 2020.
A CNN backbone extracts features and sends them to a transformer for relationship modeling; during training, the resulting predictions are matched with the ground-truth objects in the image using a bipartite graph matching algorithm.
The features extracted by the CNN are flattened and combined with a positional encoding to obtain a feature sequence, which is then fed to the transformer encoder.
Each encoder layer contains a self-attention mechanism, and each decoder layer contains self-attention and cross-attention.
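The bipartite matching step described above can be illustrated with a small, library-free sketch. It brute-forces the lowest-cost one-to-one assignment between predictions and ground-truth objects. DETR itself uses the Hungarian algorithm with a cost that also includes a generalized IoU term, so the simplified cost, the function names, and the toy data below are illustrative assumptions only.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, target):
    """Toy cost: low when the class probability is high and the boxes are close."""
    probs, box = pred
    label, gt_box = target
    return -probs[label] + box_l1(box, gt_box)


def bipartite_match(preds, targets):
    """Return sorted (pred_idx, target_idx) pairs with minimal total cost.

    Brute force over all assignments; assumes len(preds) >= len(targets).
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return sorted((p, t) for t, p in enumerate(best_perm))


# Hypothetical data: three predictions (class probabilities, box), two objects.
preds = [
    ([0.1, 0.9], (0.5, 0.5, 0.2, 0.2)),
    ([0.8, 0.2], (0.2, 0.2, 0.1, 0.1)),
    ([0.5, 0.5], (0.9, 0.9, 0.1, 0.1)),
]
targets = [(0, (0.2, 0.2, 0.1, 0.1)), (1, (0.5, 0.5, 0.2, 0.2))]

matches = bipartite_match(preds, targets)
print(matches)  # prediction 2 stays unmatched
```

Predictions left unmatched, such as the third one here, are the ones DETR trains toward its "no object" class.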
#transformers #detr #computervision
Thanks for making me finally understand Detection transformers.
Glad I could help!
Thanks again for the series. Your explanation helped me understand many things.
Glad to hear that!
This code is incomplete with many issues! I can't believe you can run this training.
The code is complete. Please share the issues you are facing and I will guide you on how to run it successfully.
Hi Aarohi, I am getting this error in the training part. I have solved the other errors, since a lot of code is missing here, but I was not able to solve this one:
NameError Traceback (most recent call last)
in ()
----> 1 model = Detr(lr=1e-4, lr_backbone=1e-5, weight_decay=1e-4)
2
3 batch = next(iter(TRAIN_DATALOADER))
4 outputs = model(pixel_values=batch['pixel_values'], pixel_mask=batch['pixel_mask'])
in __init__(self, lr, lr_backbone, weight_decay)
9 super().__init__()
10 self.model = DetrForObjectDetection.from_pretrained(
---> 11 pretrained_model_name_or_path=CHECKPOINT,
12 num_labels=len(id2label),
13 ignore_mismatched_sizes=True
NameError: name 'CHECKPOINT' is not defined
Hi,
Great tutorial.
The training process takes a lot of time: about 7 hours.
How do you estimate the number of epochs?
Eran
There is no one-size-fits-all answer for the number of epochs, and experimentation is often necessary. You may start with a reasonable number of epochs, monitor the training process, and make adjustments based on the observed behavior.
Additionally, training time is influenced by hardware specifications, such as the type of GPU used, and by the batch size.
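One common way to turn "monitor the training process and adjust" into code is early stopping: set a generous epoch limit and stop once the validation loss stops improving. The sketch below is framework-free. The class name, the `patience` value, and the loss values are assumptions for illustration, not part of the tutorial code.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_to_run(val_losses, patience=3):
    """Return the epoch (1-based) at which training would stop."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


# Hypothetical validation losses, one per epoch.
losses = [2.0, 1.5, 1.2, 1.25, 1.3, 1.21, 1.1]
stopped_at = epochs_to_run(losses, patience=3)
print(stopped_at)
```

If the training loop is built on PyTorch Lightning, its built-in `EarlyStopping` callback applies the same idea without custom code.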
Great! Very cool :) ... Do you have any videos on detecting objects in videos using a pretrained ViT model or a custom dataset?
While running the code, I am getting the error "NameError: name 'CHECKPOINT' is not defined" on the line model = Detr(lr=1e-4, lr_backbone=1e-5, weight_decay=1e-4). How can I fix this issue?
Hi ma'am, thank you for this tutorial!
Can you make videos on the Video Vision Transformer and the Audio Spectrogram Transformer using custom datasets?
Will try
@@CodeWithAarohi In general, when selecting a dataset, do we have to consider the format its annotations come in, whether CSV, text, JSON, etc.? Or can we select a dataset without considering that, ma’am? Can you please explain how? 🙏
@@CodeWithAarohi
Can the lady upload a video about estimating the volume of a pile of dirt?
It can be a pile of anything like sugar, sand, corn 🌽 or dollar 💵 bills .
detections = sv.Detections.from_coco_annotations(coco_annotation=annotations)
NameError: name 'sv' is not defined
import supervision as sv
Ma'am, please upload a video on the topic "Object detection using CNN and transformers" on any custom dataset. I am finding it difficult to work on this project as a final-year student from NIT.
I will try.
Very informative video. Keep sharing all this valuable material, ma'am.
Sure 😊
Thanks for this amazing tutorial. Ma'am, could you please make a tutorial on custom training of the HAT model for image restoration? (HAT: Hybrid Attention Transformer for Image Restoration)
I hope you will consider my request.
Sure
@@CodeWithAarohi Ma'am, roughly how many days would it take you to make a video on that? I have to submit a project related to it, and I got some errors in its training.
I would be very thankful if you could make the tutorial as soon as possible.
Can you please make a video about how to handle polygon annotations?
Noted. I will try to cover it.
@@CodeWithAarohi
I am very interested in the polygon annotation video.
What code would you use to train the neural nets? YOLO? TensorFlow?
A video on how to do image detection with the LabelMe annotation software would be very nice.
Thank you very much for this tutorial. I tried to replicate it, but I got the error
"NameError: name 'image_processor' is not defined"
while trying to run the following line:
"TRAIN_DATASET = CocoDetection(image_directory_path=TRAIN_DIRECTORY, image_processor=image_processor, train=True)".
Did any of you have the same problem? How did you fix it? What do I need to initialize image_processor as beforehand?
I did this:
from transformers import DetrImageProcessor
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
These lines of code just process the image in the way that DETR requires. Let me know if it worked for you :D
@@luizakazawa3577 I have a new error:
FileNotFoundError: [Errno 2] No such file or directory: 'D:/transformer_detr_env/bone_fracture_coco/train/annotations.json'
@@luizakazawa3577 I'm running 20 epochs, but it says the image 005121…jpg was not found in the train folder. When I look in the folder, the images are there. Why is that?
@@1907hasancan Is the variable TRAIN_DIRECTORY equal to the path of your train folder?
@@luizakazawa3577 yes bro
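Both errors in this thread (a missing annotation file and an image reported as not found) can be narrowed down by checking the dataset folder before training. The sketch below compares the file names listed in a COCO annotation file against the files on disk. The helper name is hypothetical, and the file name `_annotations.coco.json` is an assumption based on how Roboflow COCO exports are typically named, so check what your own export contains.

```python
import json
import tempfile
from pathlib import Path


def find_missing_images(image_dir, annotation_file):
    """Return file names listed in a COCO annotation file that are absent from image_dir."""
    image_dir = Path(image_dir)
    with open(annotation_file, encoding="utf-8") as f:
        coco = json.load(f)
    return [
        img["file_name"]
        for img in coco["images"]
        if not (image_dir / img["file_name"]).is_file()
    ]


# Demo on a throwaway folder: a.jpg exists, b.jpg is listed but absent.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp)
    (train_dir / "a.jpg").write_bytes(b"")
    annotation_file = train_dir / "_annotations.coco.json"  # assumed file name
    annotation_file.write_text(
        json.dumps({"images": [{"id": 0, "file_name": "a.jpg"},
                               {"id": 1, "file_name": "b.jpg"}]}),
        encoding="utf-8",
    )
    missing = find_missing_images(train_dir, annotation_file)

print(missing)
```

An empty result on the real train folder would suggest the problem lies elsewhere, for example in the directory path passed to the dataset class.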
In the save and load part I get an error. It says model.device is not defined. What can I do about this error? Can you help me?
Thank you for the tutorial. Please, how can we generate the confusion matrix, and also the F1-score, recall and mAP? I look forward to your reply.
I will try to cover that in the next video.
@@CodeWithAarohi please make a video regarding that.
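Until such a video exists, here is a rough, framework-free sketch of how precision, recall, and F1 can be computed for one image. It greedily matches predictions to ground-truth boxes of the same class at an IoU threshold. The function names, the (x_min, y_min, x_max, y_max) box format, the 0.5 threshold, and the sample boxes are all assumptions for illustration. mAP requires sweeping score thresholds and is usually left to an evaluation library such as pycocotools.

```python
def iou(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, gts, iou_threshold=0.5):
    """preds: list of (box, label, score); gts: list of (box, label)."""
    matched, tp = set(), 0
    # Highest-scoring predictions claim ground-truth boxes first.
    for box, label, _ in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_j = 0.0, None
        for j, (gt_box, gt_label) in enumerate(gts):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical boxes: two objects, three predictions (one is a false positive).
gts = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 1)]
preds = [((1, 1, 10, 10), 0, 0.9), ((50, 50, 60, 60), 1, 0.8), ((20, 20, 30, 31), 1, 0.7)]
precision, recall, f1 = precision_recall_f1(preds, gts)
print(precision, recall, f1)
```

The missed detection rate is simply 1 minus recall, and counting the matched (predicted label, true label) pairs per class would give the entries of a confusion matrix.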
Thank you so much, ma'am, for this amazing video.
Most welcome 😊
This is the only channel one needs to become an expert in object detection and computer vision.
👏 great video
Thank you for this video.
I would like to ask: if I want to continue training the model from the last epoch, what should I do?
Yesterday I implemented the vision transformer image classification code with my own datasets, but the final run shows errors. I am using the PyTorch CPU version, madam.
Most of your code that I implement works.
Can you make a video on conditional GAN or conditional DCGAN? You already covered DCGAN, madam.
I will cover conditional DCGAN after finishing the work already in my pipeline.
Thank you madam
Please use image datasets such as dermatology-related datasets rather than MNIST or CIFAR-10, and please implement it in TensorFlow or Keras, madam.
While researching the different types of vision transformers, I came across Vanilla Vision Transformers. I read the paper but didn't really understand it. Can you help with a tutorial on that, please?
The terms "Vanilla Vision Transformers" and "Vision Transformers" are often used interchangeably, and both refer to the same fundamental concept which is applying the Transformer architecture directly to image data for computer vision tasks.
Hi, I can't run the code because of "AttributeError: type object 'Detections' has no attribute 'from_coco_annotations'" in
detections = sv.Detections.from_coco_annotations(coco_annotation=annotations)
How can you help me? Thank you so much.
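This AttributeError indicates that the installed supervision version does not provide `from_coco_annotations`. One possible workaround, sketched below, is to convert the COCO boxes by hand: COCO stores `[x, y, width, height]`, while corner-format boxes are `[x_min, y_min, x_max, y_max]`. The helper name and the sample annotation are hypothetical, and the commented-out `sv.Detections(xyxy=..., class_id=...)` call is an assumption to verify against the documentation of your installed version.

```python
def coco_to_xyxy(annotations):
    """Convert COCO annotations ([x, y, w, h] boxes) to corner-format boxes and class ids."""
    boxes, class_ids = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        boxes.append([x, y, x + w, y + h])
        class_ids.append(ann["category_id"])
    return boxes, class_ids


# Hypothetical annotation in COCO format.
annotations = [{"bbox": [10, 20, 30, 40], "category_id": 1}]
boxes, class_ids = coco_to_xyxy(annotations)
print(boxes, class_ids)

# Assumed usage with supervision and NumPy installed (not executed here):
# import numpy as np
# import supervision as sv
# detections = sv.Detections(xyxy=np.array(boxes, dtype=float),
#                            class_id=np.array(class_ids, dtype=int))
```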
Thank you very much! Why do we need it if we already have YOLOv8, for example?
YOLOv8 is faster, more efficient, simpler to implement, and easier to deploy on resource-constrained devices compared to transformer-based models. YOLOv8 is ideal for real-time applications and systems with limited computational resources, while transformers, though potentially more accurate, are more complex and resource-intensive. The choice depends on the specific requirements of the task.
@@CodeWithAarohi For very small objects, or objects that are occluded across the scene, where we want to make sure we detect all parts and then check, based on color as well, whether all parts of the objects are in the correct place, which one do you advise implementing?
Thanks!! What is the difference between this and YOLO?
DETR (Detection Transformer) and YOLO (You Only Look Once) are both object detection algorithms, but they use different approaches and architectures to achieve their goals.
DETR is a transformer-based architecture for object detection.
YOLO is a family of object detection algorithms that divides the input image into a grid and predicts bounding boxes and class probabilities directly from the grid cells.
DETR treats object detection as a set prediction problem. It predicts the coordinates of object bounding boxes directly, along with class labels. It uses a bipartite matching mechanism to associate predicted boxes with ground truth boxes during training.
YOLO predicts bounding boxes, class probabilities, and confidence scores for each grid cell. It uses non-maximum suppression to refine and select the final set of detections from overlapping predictions.
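To make the non-maximum suppression step mentioned above concrete, here is a minimal pure-Python sketch of greedy NMS. The function names, the (x_min, y_min, x_max, y_max) box format, the 0.5 threshold, and the sample boxes are assumptions for illustration. Production code would normally use a library implementation instead.

```python
def iou(a, b):
    """IoU of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Because DETR's bipartite matching trains each object to be claimed by a single prediction, it is designed to work without this post-processing step.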
@@CodeWithAarohi
Great explanation. Thanks.
I had the same question.
Ma'am, is there any way to get the precision and recall values for a vision transformer after training?
What is sv in box_annotator = sv.BoxAnnotator()? It is used in many places, but I could not find a reference to any module. Thanks in advance.
supervision module
Hi, I want to use the DETR network to predict and then calculate the missed detection rate. Could you tell me how to do it? Thanks!
Does the summary function work for models containing transformer layers? In my case it shows an error.
This code is not running. Error:
TypeError: 'NoneType' object is not subscriptable
I'm getting error - "NameError: name 'image_processor' is not defined"
from transformers import DetrImageProcessor
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
Hi Aarohi. Can you do a video that detects sequential video using Transformers?
I will try.
Please share code for DETR model evaluation, and also code showing how to print the confusion matrix.
Hi ma'am,
I want to make a project on image classification using ViT for disease prediction. I want to know what skills we should have before starting the project, and what software we need to create it. Please guide me.
Basics of Machine Learning and Deep Learning: Understand neural networks, training, loss functions, and optimization.
Python Programming: Learn Python and libraries like NumPy and Pandas for data manipulation.
Deep Learning Frameworks: Get familiar with TensorFlow or PyTorch for building and training models.
Convolutional Neural Networks (CNNs): Learn about CNNs, the common architecture for image tasks.
Computer Vision Basics: Understand image preprocessing, augmentation, and normalization.
Image Classification Fundamentals: Learn about data splitting, evaluation metrics, and model performance.
Vision Transformers (ViTs): Study ViT architecture, self-attention mechanisms, and differences from CNNs.
@@CodeWithAarohi Are CNN and ViT similar things? Do we use just one of them? And how do I feed in the input image?
Ma'am, can you please tell me where you downloaded the dataset from? I want to run the same dataset you used here, so that I can work on a bigger dataset after this.
universe.roboflow.com/enrico-garaiman/flowers-y6mda/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true
@@CodeWithAarohi thank you mam
Hello ma'am, do we have to write all this code ourselves, or can we get it from open-source repos?
You can clone the repo and work.
Hello Madam,
Nice video. When I try to run this code I get the following error: NameError: name 'CHECKPOINT' is not defined. Can you kindly explain why I might be getting this error?
Bro, did you find a solution for this error?
Try passing the model name directly. Replace the code below:
self.model = DetrForObjectDetection.from_pretrained(
    pretrained_model_name_or_path=CHECKPOINT,
    num_labels=len(id2label),
    ignore_mismatched_sizes=True
)
with
self.model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=len(id2label),
    ignore_mismatched_sizes=True
)
@@amitparmar8076 Thank you, sir. Yes, it works.
@@amitparmar8076 One more thing, sir. Can we have more than two directories, like daisy and dandelion, in both the test and train sets? I mean, what if we have more than two classes?
Hi Aarohi, when implementing your code in Colab I noticed two things. First, I did not find the annotation file. Second, I got an error in this statement:
TRAIN_DATASET = CocoDetection(image_directory_path=TRAIN_DIRECTORY, image_processor=image_processor, train=True)
image_processor was not found. I tried to find where it is defined but could not.
from transformers import DetrImageProcessor
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
Hi, thank you for this tutorial. I have a question about segmentation and tracking. Is there a tracking algorithm that takes a segmentation mask as input and shows the segmentation instead of a bounding box in the output? Thank you.
Not that I know of
Trackformer
I would be pleased if you could answer a question about your object detection with DETR videos. In the last step, I get an error like "'NoneType' object has no attribute 'copy'". What can I do about this error?
Hi, does the model accept polygonal annotations? They are better than rectangles.
DETR is designed to work with bounding box annotations, so you can't work with polygonal annotations.
@@CodeWithAarohi I mean for training
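For training a box-based detector on a polygon-annotated dataset, one common workaround is to reduce each polygon to its enclosing rectangle. The sketch below assumes the COCO convention of a flat `[x1, y1, x2, y2, ...]` polygon and a `[x, y, width, height]` box. The helper name and sample polygon are hypothetical. Note that the shape information inside the box is lost in this conversion.

```python
def polygon_to_coco_bbox(polygon):
    """Return the enclosing [x, y, width, height] box of a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# Hypothetical four-point polygon.
polygon = [10, 20, 50, 25, 40, 80, 15, 60]
bbox = polygon_to_coco_bbox(polygon)
print(bbox)
```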
How can we use transformer models for 3D detection on point-cloud datasets such as KITTI?
I have never tried transformers on a 3D dataset.
Where is image_processor? You didn't define it
from transformers import DetrImageProcessor
image_processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
I'm running 20 epochs, but it says 005121…jpg image not found in the train folder. but when I look in the folder there are those images, why is that?
It works for up to 8 epochs and then stops working.
Hi, I’m working with DETR for mitosis detection and my losses are decreasing very slowly, so I was thinking that modifying the learning rate might help. But as soon as I change the learning rate in the model instantiation, the model training doesn’t work. Is there any way I can change the learning rate?
Try increasing the batch size
Very well explained
Glad it was helpful!
Can you please show how your training loss and validation loss change? When I run this training, the training loss changes very little in each epoch.
Thank you. What is the link to the dataset?
You can download from: universe.roboflow.com/roboflow-100/bone-fracture-7fylg
Thanks.
Hey ma'am, I am getting this error:
NameError: name 'CHECKPOINT' is not defined. Please help me.
Did it get solved?
Bro, did it get solved?
Replace that section with:
self.model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=len(id2label),
    ignore_mismatched_sizes=True
)
It should work!
Hi, was your model pretrained? Thank you.
Yes it was
@@CodeWithAarohi Thank you. I have this error: AttributeError: type object 'Detections' has no attribute 'from_coco_annotations' in
detections = sv.Detections.from_coco_annotations(coco_annotation=annotations). Thank you so much.
Where can I download the dataset?
I took it from Roboflow.
# annotate
detections = sv.Detections.from_coco_annotations(coco_annotation=annotations)
NameError: name 'sv' is not defined
anyone can help?
Install the supervision module using pip and then import it with "import supervision as sv".
@@CodeWithAarohi thank you mam
Hi, I have a dataset and my folders are train and validation. Is the test folder mandatory?
No
Amazing
Thanks
How can I see the summary of the model?
Follow these steps:
pip install torchsummary
import torch
from torchsummary import summary
from torchvision.models.detection import detr
# Create an instance of the DETR model
detr_model = detr.detr_resnet50(num_classes=91, pretrained_backbone=True)
# Move the model to the desired device if using GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
detr_model.to(device)
# Print the summary of the model
summary(detr_model, input_size=(3, 800, 800)) # Assuming input image size is 800x800
Make sure to adjust the input_size parameter according to the size of images your model will process.
@@CodeWithAarohi How do I know what input size my model will process?
@@CodeWithAarohi It shows an error: ImportError: cannot import name 'detr' from 'torchvision.models.detection' (C:\Users\ankit\anaconda3\lib\site-packages\torchvision\models\detection\__init__.py)
I want this dataset. Can you provide the link?
Download it from Roboflow.
Tell me the exact name of the dataset.
Can you share your config file and PyTorch version?
My PyTorch version is 1.13.1, compiled with CUDA 11.7.
Super Akka☺
Thanks
👌👌
Thanks!
thank you
You're welcome
Mam, there is no Custom_model file in GitHub
You will get the custom_model folder when you start training.
Hi,
My name is Ajesh Ashok. I am a college professor doing research in the area of vision transformers. Could I get your email ID so that I can contact you for more information about this area?
aarohisingla1987@gmail.com
Thank you so much, ma'am, for this amazing video.
Keep watching
@@CodeWithAarohi Ma'am, I am getting the error "NameError: name 'CHECKPOINT' is not defined" on the line model = Detr(lr=1e-4, lr_backbone=1e-5, weight_decay=1e-4).
Try passing the model name directly. Replace the code below:
self.model = DetrForObjectDetection.from_pretrained(
    pretrained_model_name_or_path=CHECKPOINT,
    num_labels=len(id2label),
    ignore_mismatched_sizes=True
)
with
self.model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=len(id2label),
    ignore_mismatched_sizes=True
)