BERT on Steroids: Fine-tuning BERT for a dataset using PyTorch and Google Cloud TPUs

  • Published: 31 Dec 2024

Comments • 76

  • @deepakkumarsuresh1921 5 years ago +2

    Amazing tutorial. Thanks a lot for starting this series; looking forward to more such videos explaining advanced concepts. Also, a suggestion: if you could also provide some links for learning the theoretical part of the concepts you teach in addition to the coding, it would be really helpful.

  • @vishwadadhania6623 4 years ago

    Very very helpful video :) thanks Abhishek!!

  • @wilfredomartel7781 2 years ago

    Nice work!

  • @ChaiTimeDataScience 5 years ago +22

    "Watch the video, Try it yourself and Ask if you need help" 🙌

    • @oltipreka3599 4 years ago

      Hello Sanyam, I asked a question in the comments of this video about pre-training the language model again when using a corpus of text quite different from the one used to pre-train the model. Since I follow you, I am quite sure you have something to say about it :).

  • @abcdefghi2650 3 years ago

    Thank you for this amazing tutorial!!

  • @wilfredomartel7781 2 years ago

    Any idea how to fine-tune BERT for domain adaptation to generate sentence embeddings with greater accuracy?

  • @oltipreka3599 4 years ago

    Hi Abhishek,
    First, thank you for your videos. They are much appreciated, as you go through all the steps of the DL training process and thus explain concepts that are often taken for granted.
    I have a question though. It seems to me that you don't fine-tune the pre-trained language model on the corpus of data you're using for the Q&A downstream task.
    Don't you think such a step, suggested by Jeremy Howard, would benefit the performance of the model, especially for a corpus of text from a specific domain (i.e. education, movie reviews, politics or whatever)? More specifically, when tokens of this new corpus are not included in the **vocab** file of BERT, they are encoded as the "[UNK]" token, right? I am wondering what would happen if these "[UNK]" tokens become increasingly frequent.
    (I know that the BERT tokenizer splits words into small chunks, but say we have an ALBERT vocab obtained with the SentencePiece tokenizer.)
    So, one solution proposed by Howard is to update the pre-trained model so that it also includes the tokens specific to the new context and train them for a few steps before training the model on the downstream tasks.
    Do you agree with such a solution (I mean, have you ever tried it), and is there a simple way to implement it for BERT(ology) models?
    Thank you in advance. ;)
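
    A minimal sketch of the vocabulary-extension idea raised above, assuming the Hugging Face transformers API; the checkpoint name and the example tokens are illustrative, not from the video:

    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    # domain-specific words that would otherwise be split aggressively or mapped to [UNK]
    new_tokens = ["anaphylaxis", "bronchodilator"]
    tokenizer.add_tokens(new_tokens)

    # grow the embedding matrix so the new ids get (randomly initialised) vectors;
    # a few epochs of masked-LM training on the domain corpus would then warm them up
    # before fine-tuning on the downstream task
    model.resize_token_embeddings(len(tokenizer))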

  • @gopalb1625 4 years ago +1

    You are a true LEGEND in the data science community

  • @rajeshbalakrishnan4161 5 years ago +8

    lol, I am waiting for your content like I used to for Game of Thrones!! Bring more

  • @exotol 4 years ago +2

    @Abhishek Thakur Reproduced your solution, very cool. But I faced a problem: if several TPU cores are used, BERT starts to lose quality vs a single TPU core or GPU, although the training speed increases several times. Have you ever noticed such a thing? Or maybe you know how to fix it?

    • @abhishekkrthakur 4 years ago

      yes. adjust lr, batch size, steps to fix it. it takes a while unfortunately :(

  • @bernardogarciadelrio3630 3 years ago

    Hi Abhishek, first of all, thank you for all the content you create. I'm specializing in NLP and your videos are really useful. Could I ask why you always use PyTorch over TensorFlow? Is it just because you are more comfortable with it, or is there some other advantage of using PyTorch for NLP? I haven't done much DL so far, so at this point I need to choose a framework and am just trying to figure out which one is better for NLP applications. I know there is a big discussion about this, but I would like to know your opinion. Thank you in advance and keep it up!

  • @pavelpeskov1895 4 years ago +2

    Thanks for the video! Why do we add a Spearman correlation?

    • @abhishekkrthakur 4 years ago +2

      This code was used for a kaggle competition which had spearman as the evaluation metric. That's why :)
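
      For context, the competition metric can be computed with scipy; a minimal example with made-up arrays:

      import numpy as np
      from scipy import stats

      targets = np.array([0.0, 0.5, 1.0, 0.2])
      predictions = np.array([0.1, 0.4, 0.9, 0.3])

      # Spearman rank correlation in [-1, 1]
      spearman, _ = stats.spearmanr(targets, predictions)
      print(spearman)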

  • @pedro_kangxiao2598 4 years ago +1

    I followed the instructions to set up the virtual machine, but I'm not able to import torch.
    It shows: OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory.

    • @abhishekkrthakur 4 years ago

      you have found the hidden gem. haha
      so this is what u need to do:
      1. choose the version of intel mkl from here: software.intel.com/en-us/mkl/choose-download
      2. download the full version using wget or similar to the vm.
      3. follow the on screen instructions and install it.
      4. add the path to LD_LIBRARY_PATH
      then it will work

    • @pedro_kangxiao2598 4 years ago

      @@abhishekkrthakur: do you think I should use the env: conda activate torch-xla-0.5? It is fine for me to install transformers. Thank you for the response, Abhishek!

    • @DrPepperment 4 years ago +1

      This is a big overlooked step. There should be a video tut on just this install

    • @abhishekkrthakur 4 years ago

      @@DrPepperment i agree. :)

  • @exploresingapore9477 3 years ago

    Hi Abhishek, do you have any idea about NLP-to-SQL? I mean, I want to create an API which accepts an English sentence and converts it into a SQL query. Is there any built-in library available?

  • @antojaa 5 years ago

    Thank you for such great videos, Abhishek! It would be very helpful to see one about how you do model tuning in ml framework.

  • @vijayendrasdm 4 years ago

    Hi Abhishek
    Can we prepare custom data inputs using tokenizers and use BERT on them?
    You create the data as something like
    [CLS] [Q-Title] [Q-Body] [SEP] [Answer] [SEP].
    Why not create [CLS] [Q-Title] [SEP] [Q-Body] [SEP] [Answer] [SEP], or something else?
    Also, as I see it, the dataset has other potentially useful features. Can we append them?
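
    A minimal sketch of how either layout described above could be built with the Hugging Face BERT tokenizer; the example strings are made up and this is not the code from the video:

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    q_title = "Why is the sky blue?"
    q_body = "I have always wondered about the physics behind it."
    answer = "It is caused by Rayleigh scattering of sunlight."

    # layout from the comment above: [CLS] title body [SEP] answer [SEP]
    ids_a = tokenizer.encode(q_title + " " + q_body, answer, add_special_tokens=True)

    # alternative layout: [CLS] title [SEP] body [SEP] answer [SEP], built manually
    cls_id, sep_id = tokenizer.cls_token_id, tokenizer.sep_token_id
    ids_b = (
        [cls_id]
        + tokenizer.encode(q_title, add_special_tokens=False) + [sep_id]
        + tokenizer.encode(q_body, add_special_tokens=False) + [sep_id]
        + tokenizer.encode(answer, add_special_tokens=False) + [sep_id]
    )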

  • @aungmyat5497 3 years ago

    I am new to NLP, and my country's language is not included in tools like NLTK, spaCy and so on. How can I work with it, and where can I start learning? Can you give some advice, please?

  • @kvp9553 3 years ago

    Hi Abhishek!! Great work :)
    May I know whether continuous retraining is possible using BERT?
    I.e., I have a fine-tuned model; can I further tune it using an additional dataset without merging the new dataset with the old one?
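
    A minimal sketch of continued fine-tuning, assuming the earlier model was saved with save_pretrained to a hypothetical "./my-finetuned-bert" directory and that new_dataloader is an assumed DataLoader over the additional data only:

    import torch
    from transformers import BertForSequenceClassification

    # reload the already fine-tuned checkpoint instead of the original bert-base weights
    model = BertForSequenceClassification.from_pretrained("./my-finetuned-bert")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for batch in new_dataloader:  # new data only; no need to merge with the old set
        optimizer.zero_grad()
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["labels"])
        outputs.loss.backward()
        optimizer.step()

    A small learning rate helps avoid forgetting what was learned on the original dataset.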

  • @hanman5195 4 years ago

    Please help me with setting up BERT on a local machine, like installing and importing the required libraries.

  • @NP-hf5ri 3 years ago

    Hello Abhishek, thanks a lot for the video, it's really very helpful.
    I am working on a message classification task, but I don't have target values for it. Is it somehow possible to fine-tune BERT on the available dataset for an unsupervised learning task? I was thinking of getting sentence embeddings from BERT and then using these embeddings with some clustering technique to cluster similar messages.
    But I am not sure how I can fine-tune BERT on my dataset for this task. Thank you.
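
    A minimal sketch of the clustering idea described above (mean-pooled BERT embeddings fed to KMeans); the messages, model name and cluster count are illustrative:

    import torch
    from sklearn.cluster import KMeans
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").eval()

    messages = ["my order has not arrived", "reset my password", "where is my parcel"]

    with torch.no_grad():
        enc = tokenizer(messages, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**enc).last_hidden_state              # (batch, seq_len, hidden)
        mask = enc["attention_mask"].unsqueeze(-1).float()   # ignore padding tokens
        embeddings = (hidden * mask).sum(1) / mask.sum(1)    # mean pooling

    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings.numpy())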

  • @c0mmment 5 years ago

    Do you ever use position_ids?
    Does Google Colab offer multiple nodes for TPU?
    How do you find the TPU IP address when on Google Colab?

  • @moruyelawrence8806 5 years ago

    Excellent videos!

  • @vikktorhugoboss 4 years ago +1

    Hello, thank you for this tutorial. One question:
    Have you uploaded the BERT files to the TPU cloud machine?

  • @rt-odsc8270 4 years ago +2

    I can't thank you enough. I really appreciate your time and efforts. If you have a Patreon page, please let us know.

    • @abhishekkrthakur 4 years ago +10

      Everything is for free :) No patron! Thank you!

    • @rt-odsc8270 4 years ago +1

      Abhishek Thakur then I am going to buy all your books.

    • @Proprogrammer001 4 years ago

      @@rt-odsc8270 Aw, wholesome threat lol

  • @aqsa8599 4 years ago

    How can I fine-tune BERT for a non-English language and then do classification?
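
    One common route is to start from a multilingual checkpoint and fine-tune it on labelled data in the target language; a minimal sketch, with an illustrative label count and example sentence:

    from transformers import BertForSequenceClassification, BertTokenizer

    # bert-base-multilingual-cased covers ~100 languages
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=3)

    enc = tokenizer("ceci est un exemple de phrase", return_tensors="pt")
    logits = model(**enc).logits  # fine-tune with a labelled dataset exactly as in the video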

  • @yassinealouini6658 5 years ago

    Thanks for the great tutorial. Can you share how you get the TPU credit?

  • @ShekharPrasadRajak 3 years ago

    Thank you so much, Abhishek, for the amazing videos. I would like to use ALBERT instead of RoBERTa or basic BERT. Can you please suggest how I should approach this? In huggingface most of the code looks very much the same, but the performance and loss value are not that good with the ALBERT v2 model. Is that why you use BERT most of the time and not ALBERT?
    Thanks!
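
    A minimal sketch of swapping checkpoints via the Auto classes; the identifiers are standard Hugging Face model names, and hyperparameters (learning rate, warmup, epochs) usually need separate tuning for ALBERT, which may explain the gap mentioned above:

    from transformers import AutoModel, AutoTokenizer

    model_name = "albert-base-v2"  # or "bert-base-uncased", "roberta-base", ...
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    enc = tokenizer("BERT on steroids", return_tensors="pt")
    hidden_states = model(**enc).last_hidden_state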

  • @sagnikroy6405 2 years ago +1

    Too many introductory videos on YT; nobody teaches beyond that. Here you come. Thanks @Abhishek :)

  • @riasingh2558 4 years ago

    This is a helpful video for BERT fine-tuning! Can you also upload such task fine-tuning (say, text classification) using Transformer-XL and Longformer? Thanks, again!

  • @filipd8734 4 years ago +1

    Thank you for sharing this amazing video!!! Running the code as per this tutorial, I have a training reproducibility problem and results differ between runs. I am setting the seed as below:

    import os
    import random

    import numpy as np
    import torch

    def seed_everything(seed: int):
        random.seed(seed)
        os.environ["PYTHONHASHSEED"] = str(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

    Would you share how I could reproduce results after training?

  • @anishjain3663 4 years ago

    Bhai, I am trying to make a Hindi quotes generator using BERT, so what should the approach be? I have tried this for about 3 days, 12 hrs, but didn't get anything working and got demotivated about how to keep up with things. Thanks for your reply and help.

  • @stephennfernandes 5 years ago

    An amazing video, Sir. Thank you so much!!!

  • @xumingwang1255 5 years ago

    Thank you !

  • @souvikpal8436 3 years ago

    Well, I might sound stupid, but what's the difference between pytorch and pytorch-nightly, anyone, please?

  • @王晓康-y6g 4 years ago

    Thank you for kindly publishing this tutorial. I tried to run this tutorial on a TPU after running it on a GPU, but when I run the code on a TPU it gets stuck with the following error:
    /device:TPU:5
    2020-01-29 10:10:54.036518: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:217] XRT device (LOCAL) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
    2020-01-29 10:10:54.036536: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:217] XRT device (LOCAL) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
    2020-01-29 10:10:54.036552: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:221] Worker grpc://34.90.164.126:8470 for /job:tpu_worker/replica:0/task:0
    2020-01-29 10:10:54.036604: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:225] XRT default device: TPU:0
    2020-01-29 10:10:54.036631: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1114] Configuring TPU for master worker tpu_worker:0 at grpc://34.90.164.126:8470
    2020-01-29 10:10:54.055859: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "ErfinvGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_DOUBLE } } }') for unknown op: ErfinvGrad
    2020-01-29 10:10:54.055955: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "ErfinvGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }') for unknown op: ErfinvGrad
    2020-01-29 10:10:54.055976: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "NdtriGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_DOUBLE } } }') for unknown op: NdtriGrad
    2020-01-29 10:10:54.056005: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "NdtriGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }') for unknown op: NdtriGrad

    • @王晓康-y6g 4 years ago

      FYI, I run the container by these two commands: docker pull gcr.io/tpu-pytorch/xla:r0.5; docker run -it --shm-size 16G gcr.io/tpu-pytorch/xla:r0.5

  • @tamvominh3272 4 years ago

    Thank you for your very helpful video. I would like to ask a couple of questions about BERT. Is it possible to fine-tune BERT for a regression task? And if it is possible, how can I modify the code for this purpose, I mean, which parts do I need to keep and which parts do I need to change? I have followed some instructions but it hasn't worked, and as far as I know there is no clear instruction for a regression task using BERT. Thank you so much.

    • @tamvominh3272 4 years ago

      Additionally, my data looks like this:
      input - a sentence: "I really like it"
      output - a label/score: "2.5"
      I really hope to receive your instruction! Thanks.
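
      A minimal sketch of one possible regression setup (my own assumption, not the video's code): a single linear head on the pooled output, trained with MSE loss against the real-valued score:

      import torch
      import torch.nn as nn
      from transformers import BertModel, BertTokenizer

      class BertRegressor(nn.Module):
          def __init__(self):
              super().__init__()
              self.bert = BertModel.from_pretrained("bert-base-uncased")
              self.out = nn.Linear(self.bert.config.hidden_size, 1)  # one real-valued output

          def forward(self, input_ids, attention_mask):
              pooled = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask).pooler_output
              return self.out(pooled).squeeze(-1)

      tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
      model = BertRegressor()
      enc = tokenizer("I really like it", return_tensors="pt")
      score = model(enc["input_ids"], enc["attention_mask"])
      loss = nn.MSELoss()(score, torch.tensor([2.5]))  # the label/score from the example above

      In recent transformers versions, BertForSequenceClassification with num_labels=1 also switches to an MSE (regression) loss, which can be a simpler route.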

  • @favor2012able 4 years ago

    Thanks for the great video @Abhishek Thakur! At 44:18, can you show how you fixed the error? I think you fixed it in the terminal, but the screen was showing VS Code.

  • @logeshbalasubramanian7901 4 years ago

    Hi! Nice video!! I have a doubt. Is it possible to apply BERT or any other model to text generation using keywords? For instance, keywords: Sam dog house
    Sentence: Sam along with his dog lives in a beautiful house.

    • @pythondatascience6601 4 years ago

      You can't use BERT for text generation. As BERT isn't a causal language model, it won't support language translation either.

    • @logeshbalasubramanian7901 4 years ago

      Yeah, I found that out. I also found out how to use GPT-2 to generate text from keywords. Thanks for the reply.

    • @harshabongle4564 3 years ago

      @@logeshbalasubramanian7901 I am working on a similar problem. Could you please tell us how you achieved it using GPT-2?
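
      One possible approach (an assumption on my part, not necessarily what the commenter did) is to prompt GPT-2 with the keywords and let it continue; a minimal sketch:

      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

      keywords = ["Sam", "dog", "house"]
      prompt = "Keywords: " + ", ".join(keywords) + ". Sentence:"

      input_ids = tokenizer.encode(prompt, return_tensors="pt")
      output = model.generate(input_ids, max_length=40, do_sample=True, top_p=0.9,
                              pad_token_id=tokenizer.eos_token_id)
      print(tokenizer.decode(output[0], skip_special_tokens=True))

      For stricter keyword control, a common follow-up is to fine-tune GPT-2 on (keywords, sentence) pairs formatted like the prompt.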

  • @tapanjain9213 4 years ago

    Hi Abhishek, could you point me to a few resources on BERT to go through before watching the video?

    • @sarojchhatoi1307 4 years ago

      ruclips.net/video/OyFJWRnt_AY/видео.html This has an amazing explanation of transformer based models

  • @mayukh_ 4 years ago

    Hey Abhishek, can you please write a dlframework similar to mlframework, using TensorFlow 2.0 and the TensorFlow Dataset API? Please think about making it, as it can be really useful.

    • @abhishekkrthakur 4 years ago +1

      mlframework will convert to dlframework eventually. :)

  • @rakshittherakki99 4 years ago

    Hi Abhishek, I tried to replicate the whole code but I was getting an error in the loss function.

    • @rakshittherakki99 4 years ago

      Resolved the error: I was sending the complete target at once instead of indexing it like self.targets[item] in the __getitem__ function.
      I used free credits on GCP, which has a limit on the number of cores, so I was not able to run with batch_size=32 and max_len=512; I reduced the length to 256 and bam, the code ran successfully :)
      Thank you for this great video
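
      A minimal sketch of the per-item indexing described above; the field names are illustrative:

      import torch
      from torch.utils.data import Dataset

      class BertDataset(Dataset):
          def __init__(self, input_ids, attention_mask, targets):
              self.input_ids = input_ids
              self.attention_mask = attention_mask
              self.targets = targets

          def __len__(self):
              return len(self.targets)

          def __getitem__(self, item):
              # index a single example; returning self.targets (the full array) here
              # is what caused the loss-function error mentioned above
              return {
                  "input_ids": torch.tensor(self.input_ids[item], dtype=torch.long),
                  "attention_mask": torch.tensor(self.attention_mask[item], dtype=torch.long),
                  "targets": torch.tensor(self.targets[item], dtype=torch.float),
              }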

  • @Samarpanrai94 4 years ago

    What's your IDE?

  • @abuubaida9812 2 years ago

    Changes for TPU: 40:20

  • @saikrishna3591 4 years ago

    I've tried to write the same code. When I run with nprocs = 8 it throws an error (Exception: process 0 terminated with exit code 1), but when I run with nprocs = 1 it works! I'm trying it on Colab. The single TPU core and CUDA versions worked fine!
    I've set up the environment like this ->
    VERSION = "20200325" #@param ["1.5" , "20200325", "nightly"]
    !curl raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
    !python pytorch-xla-env-setup.py --version $VERSION
    Thanks for the videos, man! They help!

    • @abhishekkrthakur 4 years ago +1

      pytorch nightly is having issues for a while now. see some solutions here: github.com/pytorch/xla/issues/1927

    • @ojulhao 4 years ago

      Did you solve it? I'm having the same issue with Colab

    • @vishwadadhania6623 4 years ago

      @@ojulhao For running in Colab: in the xmp.spawn() call you need to pass one extra argument,
      start_method = 'fork'
      because Colab only supports "forking" of processes instead of "spawning" when creating new processes.
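
      A minimal sketch of that call; the training function _mp_fn and the nprocs value are illustrative:

      import torch_xla.distributed.xla_multiprocessing as xmp

      def _mp_fn(index, flags):
          # per-core training loop goes here
          pass

      flags = {}
      # 'fork' instead of the default 'spawn' so it works inside Colab notebooks
      xmp.spawn(_mp_fn, args=(flags,), nprocs=8, start_method="fork")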

  • @TheAnubhav27 5 years ago +2

    Is there a poor man's version for this video? :P

    • @abhishekkrthakur 5 years ago

      is there something i was unable to explain properly?

    • @TheAnubhav27 5 years ago

      @@abhishekkrthakur No, I meant: are there cheaper GPU or TPU options for learning deep learning? Compute power is costly.

    • @imranfool 5 years ago

      @@TheAnubhav27 You have access to free GPU and TPU via Google Colab. But if you want to boost the training speed on GPU, try NVIDIA Apex.