Complete tutorial (including Jupyter notebook):
curiousily.com/posts/multi-label-text-classification-with-bert-and-pytorch-lightning/
An important point to note here: if the data is imbalanced, one should use a metric such as AU-PRC instead of AU-ROC. But here, we took care of balancing in the preprocessing stage, so it is fine to use ROC AUC as the metric.
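To illustrate the point with made-up numbers: on a heavily imbalanced label (2 positives vs. 98 negatives below), ROC AUC can look good while average precision, a common estimate of AU-PRC, stays low. This is a minimal pure-Python sketch; the function names are my own, and in practice you would use a library implementation rather than these loops.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (a step-wise AU-PRC estimate)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


# Made-up data: 2 positives, 10 negatives scored above them, 88 negatives scored below.
labels = [1, 1] + [0] * 98
scores = [0.80, 0.70] + [0.81 + 0.01 * i for i in range(10)] + [0.005 * i for i in range(88)]

print(round(roc_auc(labels, scores), 3))            # 0.898
print(round(average_precision(labels, scores), 3))  # 0.129
```

Ten negatives outranking both positives costs little ROC AUC, because each positive still beats 88 of the 98 negatives, but it drags precision down at every recall level.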
This helped me out a lot. Best BERT text classification fine-tuning resource out there!
Superb BERT guide :) So far the best BERT guide on YouTube. Please make a video on named entity recognition using BERT.
Very nice! Can you pleeease also make a fine-tuning video on how to add domain-specific text (medical, law, finance, etc.) to BERT?
How can I use the model to run predictions using a column of text descriptions?
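One way to do it, sketched under assumptions: put the column into a list (with pandas, something like df["description"].tolist(), where the column name is hypothetical), run it through the model in batches, and threshold the per-label probabilities. predict_batch is a hypothetical callable you would write yourself around the tokenizer and the trained model (tokenize, forward pass in eval mode under torch.no_grad(), return the sigmoid outputs as lists); the 0.5 threshold is also just a default.

```python
from typing import Callable, List, Sequence


def predict_column(
    texts: Sequence[str],
    predict_batch: Callable[[List[str]], List[List[float]]],
    label_names: Sequence[str],
    batch_size: int = 32,
    threshold: float = 0.5,
) -> List[List[str]]:
    """Run a (hypothetical) batch predictor over a column of texts; return predicted label names per text."""
    predictions: List[List[str]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        for probs in predict_batch(batch):
            # Multi-label: keep every label whose probability clears the threshold.
            predictions.append([name for name, p in zip(label_names, probs) if p >= threshold])
    return predictions


# Toy predictor standing in for tokenizer + model + sigmoid.
def fake_predict_batch(batch: List[str]) -> List[List[float]]:
    return [[0.9 if "idiot" in text else 0.1, 0.2] for text in batch]


print(predict_column(["you idiot", "nice post"], fake_predict_batch, ["toxic", "insult"], batch_size=1))
# [['toxic'], []]
```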
Thanks! Please continue making videos and blog posts; they are very helpful.
Hey, are you the guy from CodeEmporium?
Hi! Where can I find the notebook for this episode? I could not find it on the GitHub link provided above. And yes, please, we want to know how to train on multiple GPUs and then how to use TPUs. Thanks! I am struggling with callbacks and don't understand why we need them.
Fun! It would be great if you did a deep dive into PyTorch Lightning!
Hey Venelin, what would be the benefit of using BCELoss here instead of CrossEntropy, which is more often used for multilabel classification? I'm using class weights instead of balancing the dataset.
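A sketch of the difference, assuming plain PyTorch: CrossEntropyLoss applies a softmax across the classes, which suits single-label (multi-class) problems where exactly one class is correct, while BCE treats each label as its own yes/no problem, so a comment can be both toxic and obscene. For class weights, nn.BCEWithLogitsLoss accepts a pos_weight tensor with one entry per label (the PyTorch docs suggest num_negative / num_positive per class as a starting point). Note that this swaps the tutorial's sigmoid + nn.BCELoss for BCEWithLogitsLoss, which folds the sigmoid into the loss and expects raw logits; the logits, targets, and weights below are random or made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

n_labels = 6
logits = torch.randn(4, n_labels)                     # raw model outputs for 4 comments
targets = torch.randint(0, 2, (4, n_labels)).float()  # multi-hot: several labels can be 1 at once

# One weight per label for the positive term; made-up values (e.g. num_neg / num_pos per label).
pos_weight = torch.tensor([1.0, 5.0, 2.0, 10.0, 2.0, 8.0])
loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

# Same loss by hand: an independent sigmoid + binary cross-entropy per label, averaged.
p = torch.sigmoid(logits)
manual = -(pos_weight * targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).mean()

print(loss.item(), manual.item())  # equal up to float rounding
```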
Thanks for the great video! Can I have the Colab link to the notebook, please?
Excellent video, very clear, thank you so much. Would love to see more!
Thank you, great video! How do I log hparams to TensorBoard?
Hi Venelin! One question: why do you subtract warmup_steps from steps_per_epoch * num_epochs when computing total_steps? Shouldn't the total steps include the warmup steps as well?
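For reference, here is the multiplier of a linear-warmup, linear-decay schedule in pure Python. As far as I know it follows the same formula as transformers' get_linear_schedule_with_warmup, where num_training_steps is the total number of steps with the warmup included. The step counts are made up.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier: 0 -> 1 linearly during warmup, then 1 -> 0 linearly until num_training_steps."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


steps_per_epoch, num_epochs, warmup_steps = 100, 10, 200
total_steps = steps_per_epoch * num_epochs  # 1000, warmup included

for step in (0, 100, 200, 600, 900):
    print(step, linear_warmup_decay(step, warmup_steps, total_steps))
# 0 0.0 / 100 0.5 / 200 1.0 / 600 0.5 / 900 0.125

# Subtracting warmup shortens the schedule: the multiplier is already 0 at step 800,
# while 200 of the 1000 training steps are still left.
print(linear_warmup_decay(800, warmup_steps, total_steps - warmup_steps))  # 0.0
```

So if the total is computed as steps_per_epoch * num_epochs - warmup_steps, the learning rate reaches 0 warmup_steps batches before training ends, and those last batches no longer update the weights.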
Amazing content. Could you take what you have already built and adjust the classes to add new features?
For example, a custom architecture built with a LightningModule, where the output of BERT's last layer is concatenated with other numerical features and the resulting vector is fed into an FC output layer.
I would really appreciate this.
I have a project like this, and it would save so much time.
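A minimal sketch of such a head in plain PyTorch. Inside the LightningModule you would run self.bert(...) as in the video, take output.pooler_output, and pass it together with the batch's numeric features (which the Dataset would also have to return) to this module. The class name, layer sizes, ReLU/dropout choice, and the 3 numeric columns are all assumptions, and random tensors stand in for real BERT outputs. Numeric features are usually standardized first so their scale doesn't swamp the text embedding.

```python
import torch
import torch.nn as nn


class TextPlusNumericHead(nn.Module):
    """Hypothetical head: concatenate a pooled BERT vector with numeric features, then classify."""

    def __init__(self, text_dim: int, n_numeric: int, n_classes: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + n_numeric, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text_emb: torch.Tensor, numeric: torch.Tensor) -> torch.Tensor:
        x = torch.cat([text_emb, numeric], dim=1)  # (batch, text_dim + n_numeric)
        return self.mlp(x)                         # one raw logit per label


head = TextPlusNumericHead(text_dim=768, n_numeric=3, n_classes=6)
text_emb = torch.randn(4, 768)  # random stand-in for output.pooler_output
numeric = torch.randn(4, 3)     # e.g. 3 standardized numeric columns per comment
logits = head(text_emb, numeric)
print(logits.shape)  # torch.Size([4, 6])
```

The head returns logits, so pair it with nn.BCEWithLogitsLoss, or apply torch.sigmoid and keep nn.BCELoss as in the tutorial.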
Nice series! It really helped me get up to speed with PyTorch Lightning. However, I found one problem with your code: the way you return the optimizer and scheduler in configure_optimizers is not correct if you want a per-step scheduler. By default, PyTorch Lightning steps the scheduler once per epoch. If you want to step it every batch (as intended with the warmup), you need to use the following format:
return {
    'optimizer': optimizer,
    'lr_scheduler': {
        'scheduler': scheduler,
        'interval': 'step'
    }
}
The 'interval': 'step' entry tells the Trainer to step the scheduler after every optimizer step (i.e. every batch, without gradient accumulation) instead of after every epoch.
Another small thing: you don't have to call DataModule.setup() manually. PyTorch Lightning will do that for you on every GPU if you are using an accelerator.
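To see that dictionary in action without Lightning, here is a standalone function that builds what configure_optimizers could return, then steps the scheduler the way the Trainer would with 'interval': 'step'. The function name and the warmup-only LambdaLR are simplified stand-ins for whatever scheduler you actually use (e.g. transformers' get_linear_schedule_with_warmup); only torch is required.

```python
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def configure_optimizers_sketch(model: nn.Module, n_warmup_steps: int) -> dict:
    """Builds the dict a LightningModule.configure_optimizers could return."""
    optimizer = AdamW(model.parameters(), lr=2e-5)

    def warmup(step: int) -> float:
        # Simplified schedule: linear warmup to the base LR, then constant.
        return min(1.0, step / max(1, n_warmup_steps))

    scheduler = LambdaLR(optimizer, lr_lambda=warmup)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},  # step every batch, not every epoch
    }


config = configure_optimizers_sketch(nn.Linear(10, 2), n_warmup_steps=2)
opt, sched = config["optimizer"], config["lr_scheduler"]["scheduler"]
print(opt.param_groups[0]["lr"])  # 0.0 right after construction
for _ in range(2):  # what the Trainer does after each batch with "interval": "step"
    opt.step()
    sched.step()
print(opt.param_groups[0]["lr"])  # 2e-05: warmup finished after 2 batches
```

With the default per-epoch interval, the same two-step warmup would take two full epochs.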
Both issues are addressed in the text/notebook tutorial. Thanks for the tips!
Thanks a lot for sharing your knowledge; this is amazing content. :) I recommend it to anyone who wants a better understanding of fine-tuning an LLM.
Hey Venelin,
I tried your code with DistilBERT. When I run the line output = self.classifier(output.pooler_output), I get the error "'BaseModelOutput' object has no attribute 'pooler_output'". How do I fix this? What changes do I need to make for the same code to work with DistilBERT?
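DistilBertModel has no pooling layer, so its output only carries last_hidden_state with shape (batch, seq_len, hidden_size). A common replacement is the hidden state of the first ([CLS]) token, or a mask-aware mean over the tokens; in your forward you could then write output = self.classifier(cls_pool(output.last_hidden_state)). The helper names are hypothetical, and random tensors stand in for the model output. Keep in mind that BERT's pooler_output is the [CLS] vector passed through an extra dense + tanh layer, so the raw [CLS] state is a close substitute rather than an identical one.

```python
import torch


def cls_pool(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """Hidden state of the first ([CLS]) token: (batch, seq_len, hidden) -> (batch, hidden)."""
    return last_hidden_state[:, 0]


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average over real tokens only; positions with attention_mask == 0 (padding) are ignored."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


# Random stand-in for DistilBertModel(...).last_hidden_state: batch 2, 4 tokens, hidden size 768.
hidden = torch.randn(2, 4, 768)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

print(cls_pool(hidden).shape)                   # torch.Size([2, 768])
print(mean_pool(hidden, attention_mask).shape)  # torch.Size([2, 768])
```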
Awesome video! Definitely interested in the deep dive on PyTorch Lightning!
How do you save this model? An answer would be much appreciated.
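Two options, sketched under assumptions. With Lightning, checkpoints written by the ModelCheckpoint callback (or by trainer.save_checkpoint("some.ckpt")) can be reloaded with ToxicCommentTagger.load_from_checkpoint(path, ...), passing the constructor arguments such as n_classes if they weren't stored with save_hyperparameters(). In plain PyTorch, the usual approach is to save the state_dict and load it into a freshly built model of the same architecture; below, a small nn.Linear stands in for the trained tagger and the file name is made up.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(768, 6)  # stand-in for the trained tagger
path = os.path.join(tempfile.mkdtemp(), "tagger_weights.pt")  # made-up file name

# Save only the learned parameters (the state_dict), not the whole Python object.
torch.save(model.state_dict(), path)

# To load: rebuild the same architecture, then copy the weights in.
restored = nn.Linear(768, 6)
restored.load_state_dict(torch.load(path))
restored.eval()  # inference mode: disables dropout
```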
@ 07:39: OK, I get reducing the number of samples because the classes are unbalanced, but why 15k vs. 15k? You have 6 categories of 'unclean' comments. If you want balance, shouldn't the number of clean comments in the training set be smaller, say, the mean size of the unclean categories? I hope you can explain. Thanks.
I am not really trying to balance the dataset. I just wanted to train faster, so I can experiment faster. The sampling reduced the dataset (by a lot) and we still got good results on the unchanged validation set.
Feel free to balance it and let me know what results you got. Thanks for watching!
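For anyone wanting to reproduce the sampling: a pure-Python sketch that keeps every comment with at least one toxic label plus a random sample of clean ones. Rows are plain dicts here (with pandas you would filter and call .sample instead); the function name and seed are hypothetical, and the label columns are assumed to be the six categories of the Jigsaw toxic comment dataset. Passing n_clean=15_000 mirrors the 15k discussed above, while the mean size of the toxic categories would give the stricter balance suggested in the question.

```python
import random
from typing import Dict, List, Sequence

# Assumed label columns (the six Jigsaw toxic-comment categories).
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def downsample_clean(rows: Sequence[Dict[str, int]], n_clean: int, seed: int = 42) -> List[Dict[str, int]]:
    """Keep every row with at least one positive label plus a random sample of clean rows."""
    toxic = [r for r in rows if any(r[c] == 1 for c in LABEL_COLUMNS)]
    clean = [r for r in rows if all(r[c] == 0 for c in LABEL_COLUMNS)]
    rng = random.Random(seed)
    sampled = rng.sample(clean, min(n_clean, len(clean)))
    combined = toxic + sampled
    rng.shuffle(combined)  # avoid all toxic rows sitting at the front
    return combined


# Toy data: 10 toxic rows, 100 clean rows.
rows = [{c: int(i < 10 and c == "toxic") for c in LABEL_COLUMNS} for i in range(110)]
subset = downsample_clean(rows, n_clean=20)
print(len(subset))  # 30
```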
This is awesome - thank you so much. Quick question for anyone: this tutorial is using the base BertModel and building the classifier head "from scratch" rather than using BertForSequenceClassification. If I wanted to tweak the model parameters e.g. number of units / layers, is it sufficient to edit the following lines of code?
import torch
import torch.nn as nn
import pytorch_lightning as pl
from transformers import BertModel

class ToxicCommentTagger(pl.LightningModule):
    def __init__(self, n_classes: int, n_training_steps=None, n_warmup_steps=None):
        super().__init__()
        self.bert = BertModel.from_pretrained(BERT_MODEL_NAME, return_dict=True)
        # (!!) I added this linear layer between the BERT encoder (self.bert) and the output layer (self.classifier)
        self.linear1 = nn.Linear(self.bert.config.hidden_size, 512)
        # (!!) changed this line so its input size matches the 512 units of linear1
        self.classifier = nn.Linear(512, n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps
        self.criterion = nn.BCELoss()

    def forward(self, input_ids, attention_mask, labels=None):
        output = self.bert(input_ids, attention_mask=attention_mask)
        # (!!) the output now goes bert -> linear1 -> classifier
        output = self.linear1(output.pooler_output)
        output = self.classifier(output)
        output = torch.sigmoid(output)
        loss = 0
        if labels is not None:
            loss = self.criterion(output, labels)
        return loss, output
Thanks for any help; love your videos.
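Editing those lines is enough for the shapes to line up (hidden_size is 768 for bert-base, then 512, then n_classes). One caveat: with nothing between linear1 and classifier, the two layers compose into a single linear map, so the extra layer adds parameters but no expressiveness. The sketch below checks that numerically with random weights and shows one way to add a non-linearity; the ReLU and dropout are my choice, not something from the tutorial.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear1 = nn.Linear(768, 512)   # same sizes as in the snippet (bert-base hidden size 768)
classifier = nn.Linear(512, 6)

# Fold both layers into one: W = W2 @ W1 and b = W2 @ b1 + b2.
merged = nn.Linear(768, 6)
with torch.no_grad():
    merged.weight.copy_(classifier.weight @ linear1.weight)
    merged.bias.copy_(classifier.weight @ linear1.bias + classifier.bias)

x = torch.randn(4, 768)  # random stand-in for output.pooler_output
print(torch.allclose(classifier(linear1(x)), merged(x), atol=1e-5))  # True: no extra capacity

# With a non-linearity in between, the stack is no longer a single linear map.
head = nn.Sequential(linear1, nn.ReLU(), nn.Dropout(0.1), classifier)
print(head(x).shape)  # torch.Size([4, 6])
```

In the snippet's forward, that would mean something like output = self.classifier(torch.relu(self.linear1(output.pooler_output))).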
Hi! Thank you so much for your guide videos!
I couldn't find that notebook anywhere in your sources. I'd like to kindly ask you to share it with me if possible. It would be very helpful for my research project!
Thank you so much sir! God bless you and your family!
How do I get the overall accuracy on the test set?
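For multi-label outputs there are two common notions of "overall" accuracy: label-wise (Hamming) accuracy over every comment/label pair, and subset (exact-match) accuracy, where a comment counts only if all of its labels are predicted correctly. A pure-Python sketch, assuming a 0.5 threshold and made-up predictions:

```python
from typing import Sequence, Tuple


def multilabel_accuracies(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (label-wise accuracy, exact-match accuracy) for multi-label predictions."""
    preds = [[int(p >= threshold) for p in row] for row in probs]
    # Label-wise (Hamming) accuracy: share of individual comment/label decisions that are right.
    n_cells = sum(len(row) for row in labels)
    correct = sum(p == y for pred_row, label_row in zip(preds, labels) for p, y in zip(pred_row, label_row))
    # Exact-match (subset) accuracy: a comment counts only if every one of its labels is right.
    exact = sum(list(pred_row) == list(label_row) for pred_row, label_row in zip(preds, labels))
    return correct / n_cells, exact / len(labels)


probs = [[0.9, 0.1, 0.2], [0.4, 0.7, 0.1]]  # made-up sigmoid outputs
labels = [[1, 0, 0], [1, 1, 0]]
print(multilabel_accuracies(probs, labels))  # (0.8333333333333334, 0.5)
```

Since most labels are 0 in toxic-comment data, label-wise accuracy can look high even for a model that predicts nothing, so it is worth reporting alongside per-label metrics.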
Thank you 😊 You are doing great work.
When I execute trainer.fit(model, data_module), it raises ValueError: The `target` has to be an integer tensor. I tried changing labels=torch.FloatTensor(labels) to labels=torch.IntTensor(labels), but then it raises RuntimeError: Found dtype Int but expected Float. How do I deal with this problem?
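The two errors pull in opposite directions: nn.BCELoss needs float targets, while the ValueError most likely comes from a metric (e.g. from torchmetrics) that validates integer targets. A common fix is to keep the labels as a FloatTensor in the dataset and cast only where the metric is called, e.g. labels.int(). In the sketch, label_accuracy is a hypothetical stand-in for such a metric, and the tensors are made up.

```python
import torch
import torch.nn as nn


def label_accuracy(preds: torch.Tensor, target: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical metric that, like many metric libraries, rejects float targets."""
    if target.is_floating_point():
        raise ValueError("The `target` has to be an integer tensor.")
    return ((preds >= threshold).int() == target).float().mean()


labels = torch.tensor([[1, 0, 1], [0, 0, 1]], dtype=torch.float32)  # stays float in the dataset
probs = torch.tensor([[0.8, 0.3, 0.6], [0.2, 0.4, 0.9]])            # made-up sigmoid outputs

loss = nn.BCELoss()(probs, labels)         # the loss gets float targets ...
acc = label_accuracy(probs, labels.int())  # ... the metric gets an integer copy
print(loss.item(), acc.item())
```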
Could you please explain both using TPU and GPU in the same tutorial?
I tried running this notebook but the Colab crashes after using all the available RAM.
Great insight into the Hugging Face library! One question: are the BERT layers also fine-tuned, or just the linear layers?
I think the BERT layers are also fine-tuned.
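That matches PyTorch's defaults: every parameter of a model loaded with from_pretrained has requires_grad=True, so if the optimizer is built from self.parameters(), the BERT weights are updated along with the linear head. To train only the head, you can freeze the encoder. A sketch with a tiny hypothetical stand-in for self.bert so it runs without downloading weights; in the real module the loop would go over self.bert.parameters().

```python
import torch.nn as nn


class TaggerSketch(nn.Module):
    """Tiny stand-in: self.bert plays the role of BertModel, self.classifier the linear head."""

    def __init__(self):
        super().__init__()
        self.bert = nn.Sequential(nn.Embedding(1000, 32), nn.Linear(32, 32))  # hypothetical encoder
        self.classifier = nn.Linear(32, 6)


def count_trainable(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


model = TaggerSketch()
before = count_trainable(model)  # every parameter, encoder included, is trainable by default

for p in model.bert.parameters():  # freeze the encoder so only the head is updated
    p.requires_grad = False
after = count_trainable(model)

print(before, after)  # 33254 198
```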
so greeeeeeat