Amazing tutorial! Thanks a lot for starting this series; looking forward to more such videos explaining advanced concepts. Also, a suggestion: if you could also provide some links for learning the theoretical part of the concepts you teach, in addition to the coding, it would be really helpful.
Very very helpful video :) thanks Abhishek!!
Nice work!
"Watch the video, Try it yourself and Ask if you need help" 🙌
Hello Sanyam, I asked a question in the comments of this video about whether it is worth pre-training the language model again when using a corpus of text quite different from the one used to pre-train the model. Since I follow your work, I am quite sure you have something to say about it :).
Thank you for this amazing tutorial!!
Any idea how to fine-tune BERT for domain adaptation, to generate sentence embeddings with higher accuracy?
Hi Abhishek,
First, thank you for your videos. They are much appreciated, as you go through all the steps of the DL training process and thus explain concepts that are often taken for granted.
I have a question though. It seems to me that you don't fine-tune the pre-trained language model on the corpus of data you're using for the Q&A downstream task.
Don't you think such a step, suggested by Jeremy Howard, would benefit the performance of the model, especially for a corpus of text from a specific domain (e.g. education, movie reviews, politics or whatever)? More specifically, when tokens of this new corpus are not included in the **vocab** file of BERT, they are encoded as the "[UNK]" token, right? I am wondering what would happen if the number of these "[UNK]" tokens gets increasingly high.
(I know that the BERT tokenizer splits words into small chunks, but say we have an ALBERT vocab obtained with the SentencePiece tokenizer.)
So, one solution proposed by Howard is to update the pre-trained model so that it also includes the tokens specific to the new context, and train them for a few steps before training the model for the downstream task.
Do you agree with such a solution (I mean, have you ever tried it), and is there a simple way to implement it for BERT(ology) models?
Thank you in advance. ;)
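For reference, one simple way to try this with Hugging Face models (just a sketch of the general idea, not something shown in the video) is to add the domain-specific tokens to the tokenizer and resize the embedding matrix before continuing masked-LM training on the new corpus:

from transformers import BertTokenizer, BertForMaskedLM

# Sketch only: the token list below is a hypothetical placeholder for real domain terms.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

domain_tokens = ["exampledomainterm1", "exampledomainterm2"]  # hypothetical tokens
num_added = tokenizer.add_tokens(domain_tokens)
model.resize_token_embeddings(len(tokenizer))  # new embedding rows start randomly initialized
# ...then continue masked-LM training on the domain corpus for a few epochs
# before fine-tuning on the downstream task.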
You are a true LEGEND in the data science community
lol, I am waiting for your content like I used to wait for Game of Thrones!! Bring more.
@Abhishek Thakur I reproduced your solution, very cool. But I ran into a problem: if several TPU cores are used, BERT starts to lose quality vs a single TPU core or a GPU, although training speed increases several times. Have you ever noticed such a thing? Or maybe you know how to fix it?
Yes. Adjust the LR, batch size and number of steps to fix it. It takes a while, unfortunately :(
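To make that concrete, here is a rough sketch of the usual heuristic (an assumption on my part, not a rule stated in the video): with 8 TPU cores the effective batch size is 8x larger, so the learning rate and warmup are usually adjusted accordingly.

import torch_xla.core.xla_model as xm

base_lr = 3e-5                       # hypothetical LR that worked on a single core / GPU
per_core_batch_size = 4
num_cores = xm.xrt_world_size()      # 8 on a v3-8 TPU

effective_batch_size = per_core_batch_size * num_cores
scaled_lr = base_lr * num_cores      # linear-scaling heuristic; treat it as a starting point
# Warmup and total steps should also be recomputed, since each core now sees
# only 1/num_cores of the batches per epoch.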
Hi Abhishek, first of all, thank you for all the content you create. I'm specializing in NLP and your videos are really useful. Could I ask why you always use PyTorch over TensorFlow? Is it just because you are more comfortable with it, or is there some other advantage of using PyTorch for NLP? I haven't done much DL so far, so at this point I need to choose a framework and am just trying to figure out which one is better for NLP applications. I know there is a big discussion about this, but I would like to know your opinion. Thank you in advance and keep it up!
Thanks for the video! Why do we add a Spearman correlation?
This code was used for a Kaggle competition which had Spearman correlation as the evaluation metric. That's why :)
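For context, a minimal sketch of such a metric (an illustration, not the exact code from the video): average the column-wise Spearman correlation between targets and predictions.

import numpy as np
from scipy import stats

def mean_spearman(targets, predictions):
    # targets, predictions: arrays of shape (num_samples, num_columns)
    scores = []
    for col in range(targets.shape[1]):
        rho, _ = stats.spearmanr(targets[:, col], predictions[:, col])
        scores.append(np.nan_to_num(rho))  # guard against constant columns
    return float(np.mean(scores))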
I followed the instructions to set up the virtual machine, but I am not able to import torch.
It shows: OSError: libmkl_intel_lp64.so: cannot open shared object file: No such file or directory.
You have found the hidden gem, haha.
So this is what you need to do:
1. Choose the version of Intel MKL from here: software.intel.com/en-us/mkl/choose-download
2. Download the full version to the VM using wget or similar.
3. Follow the on-screen instructions and install it.
4. Add the install path to LD_LIBRARY_PATH.
Then it will work.
@@abhishekkrthakur: do you think I should use the env: conda activate torch-xla-0.5? It works fine for me to install transformers. Thank you for the response, Abhishek!
This is a big, overlooked step. There should be a video tutorial on just this install.
@@DrPepperment i agree. :)
Hi Abhishek, do you have any idea about NLP-to-SQL? I want to create an API that accepts an English sentence and converts it into an SQL query. Is there any built-in library available?
Thank you for such great videos, Abhishek! It would be very helpful to see one about how you do model tuning in mlframework.
Hi Abhishek,
Can we prepare custom data inputs using tokenizers and use BERT on them?
You create the data as something like
[CLS] [Q-Title] [Q-Body] [SEP] [Answer] [SEP].
Why not [CLS] [Q-Title] [SEP] [Q-Body] [SEP] [Answer] [SEP], or something else?
Also, as far as I can see, the dataset has other potentially useful features. Can we append them?
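For illustration, a minimal sketch (names and max length are assumptions, not the exact code from the video) of packing the title plus body as segment A and the answer as segment B with the Hugging Face tokenizer; extra [SEP] tokens inside segment A are a design choice you can experiment with.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def build_inputs(title, body, answer, max_len=512):
    first_part = title + " " + body          # segment A; segment B is the answer
    enc = tokenizer.encode_plus(
        first_part,
        answer,
        add_special_tokens=True,             # adds [CLS] ... [SEP] ... [SEP]
        max_length=max_len,
        truncation=True,
        padding="max_length",
        return_token_type_ids=True,
    )
    return enc["input_ids"], enc["attention_mask"], enc["token_type_ids"]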
I am new to NLP, and my country's language is not supported by tools like NLTK, spaCy and so on. How can I handle that, and where can I start learning? Can you give some advice, please?
Hi Abhishek!! Great work:)
May I know whether continuous retraining is possible using BERT?
i.e., I have a fine-tuned model. Can I tune it further using an additional dataset, without merging the new dataset with the old one?
Please help me with setting up BERT on a local machine, e.g. installing and importing the required libraries.
Hello Abhishek, thanks a lot for the video, it's really very helpful.
I am working on a message classification task, but I don't have target values for it. Is it somehow possible to fine-tune BERT on the available dataset for an unsupervised learning task? I was thinking of getting sentence embeddings from BERT and then using these embeddings with some clustering technique to cluster similar messages.
But I am not sure how I can fine-tune BERT on my dataset for this task. Thank you.
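As a starting point, a rough sketch (an assumption, not from the video) of the embed-then-cluster idea without any fine-tuning: mean-pool BERT's last hidden states and run KMeans on the resulting vectors.

import torch
from transformers import BertTokenizer, BertModel
from sklearn.cluster import KMeans

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentences):
    enc = tokenizer(sentences, padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    mask = enc["attention_mask"].unsqueeze(-1)            # zero out padding positions
    summed = (out.last_hidden_state * mask).sum(dim=1)
    return (summed / mask.sum(dim=1)).numpy()

# Hypothetical example messages
messages = ["refund not processed", "where is my order", "money not returned"]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embed(messages))
print(labels)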
Do you ever use position_ids?
Does Google Colab offer multiple nodes for TPU?
How do I find the TPU IP address on Google Colab?
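If it helps, my understanding (an assumption, not something shown in the video) is that Colab exposes the TPU address through an environment variable, which the pytorch/xla env-setup script reads for you:

import os

tpu_address = os.environ.get("COLAB_TPU_ADDR")  # e.g. "10.0.0.2:8470" on a Colab TPU runtime
print(tpu_address)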
Excellent videos!
Hello, thank you for this tutorial, one question:
Have you uploaded the BERT files to the TPU cloud machine?
yes
I can't thank you enough. I really appreciate your time and efforts. If you have patron page please let us know.
Everything is for free :) No patron! Thank you!
Abhishek Thakur then I am going to buy all your books.
@@rt-odsc8270 Aw, wholesome threat lol
How can I fine-tune BERT for a non-English language and then do classification?
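One common option (a sketch of the general idea, not from the video) is to start from the multilingual checkpoint and fine-tune it for classification exactly the same way as English BERT:

from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=2,                    # hypothetical number of classes
)
# Tokenize the non-English texts with this tokenizer and train with the same
# loop used for English BERT fine-tuning.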
Thanks for the great tutorial. Can you share how you got the TPU credits?
Thank you so much, Abhishek, for the amazing videos. I would like to use ALBERT instead of RoBERTa or basic BERT. Can you please suggest how I should approach it? In Hugging Face most of the code looks very much the same, but the performance and the loss value are not as good with the ALBERT v2 model. Is it the case that you use BERT most of the time and not ALBERT?
Thanks!
Too many introductory videos on YT; nobody teaches beyond that. Then here you come. Thanks @Abhishek :)
Correct bro
What are you doing, can we connect?
This is a helpful video for BERT fine-tuning! Can you also upload such task fine-tuning (say, text classification) using Transformer-XL and Longformer? Thanks, again!
Thank you for sharing this amazing video!!! Running the code as per this tutorial, I have a training reproducibility problem and results differ between runs. I am setting the seed as below:
import os
import random
import numpy as np
import torch

def seed_everything(seed: int):
    # seed Python, NumPy and PyTorch (CPU and CUDA) and force deterministic cuDNN
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
Would you share how I could reproduce results after training?
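For the second part (reproducing results after training), a minimal sketch of the usual approach (an assumption, not from the video): save the trained weights and reload them for evaluation, so predictions no longer depend on training-time randomness.

import torch
import torch.nn as nn
from transformers import BertModel

# Hypothetical model class standing in for whatever was trained above.
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.out = nn.Linear(768, 30)   # hypothetical number of targets

    def forward(self, ids, mask, token_type_ids):
        pooled = self.bert(input_ids=ids, attention_mask=mask,
                           token_type_ids=token_type_ids).pooler_output
        return self.out(pooled)

model = Model()
# ... training happens here ...
torch.save(model.state_dict(), "model.bin")      # persist the trained weights

# Later: reload the exact same weights and switch to eval mode so dropout is
# disabled and repeated inference runs give identical outputs.
model.load_state_dict(torch.load("model.bin", map_location="cpu"))
model.eval()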
Bhai, I am trying to make a Hindi quotes generator using BERT. What should the approach be? I have tried for about 3 days, 12 hours each, but didn't get anything working and got demotivated. How do you keep up with things? Thanks for your reply and help.
An amazing video, Sir, thank you so much!!!
Thank you !
Well, I might sound stupid, but what's the difference between PyTorch and PyTorch nightly? Anyone, please?
Thank you for kindly publishing this tutorial. I tried to run it on a TPU after running it on a GPU, but on the TPU it gets stuck with the following error:
/device:TPU:5
2020-01-29 10:10:54.036518: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:217] XRT device (LOCAL) TPU:6 -> /job:tpu_worker/replica:0/task:0/device:TPU:6
2020-01-29 10:10:54.036536: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:217] XRT device (LOCAL) TPU:7 -> /job:tpu_worker/replica:0/task:0/device:TPU:7
2020-01-29 10:10:54.036552: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:221] Worker grpc://34.90.164.126:8470 for /job:tpu_worker/replica:0/task:0
2020-01-29 10:10:54.036604: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:225] XRT default device: TPU:0
2020-01-29 10:10:54.036631: I tensorflow/compiler/xla/xla_client/xrt_computation_client.cc:1114] Configuring TPU for master worker tpu_worker:0 at grpc://34.90.164.126:8470
2020-01-29 10:10:54.055859: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "ErfinvGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_DOUBLE } } }') for unknown op: ErfinvGrad
2020-01-29 10:10:54.055955: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "ErfinvGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }') for unknown op: ErfinvGrad
2020-01-29 10:10:54.055976: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "NdtriGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_DOUBLE } } }') for unknown op: NdtriGrad
2020-01-29 10:10:54.056005: E tensorflow/core/framework/op_kernel.cc:1579] OpKernel ('op: "NdtriGrad" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_FLOAT } } }') for unknown op: NdtriGrad
FYI, I run the container with these two commands: docker pull gcr.io/tpu-pytorch/xla:r0.5; docker run -it --shm-size 16G gcr.io/tpu-pytorch/xla:r0.5
Thank you for your very helpful video. I would like to ask a couple of questions about BERT. Is it possible to fine-tune BERT for a regression task? And if so, how can I modify the code for this purpose? I mean, which parts do I need to keep and which parts do I need to change? I have followed some instructions but it hasn't worked, and as far as I know there is no clear guide for a regression task using BERT. Thank you so much.
Additionally, my data looks like this:
input - a sentence: "I really like it"
output - a label/score: "2.5"
I really hope to receive your guidance! Thanks.
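In case it helps, a rough sketch (an assumption, not the exact code from the video) of what typically changes for regression: a single output unit and an MSE loss; the data pipeline can stay the same.

import torch
import torch.nn as nn
from transformers import BertModel

class BertRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.drop = nn.Dropout(0.3)
        self.out = nn.Linear(768, 1)          # single continuous output instead of class logits

    def forward(self, ids, mask, token_type_ids):
        outputs = self.bert(input_ids=ids, attention_mask=mask,
                            token_type_ids=token_type_ids)
        return self.out(self.drop(outputs.pooler_output)).squeeze(-1)

def loss_fn(predictions, targets):
    return nn.MSELoss()(predictions, targets.float())   # e.g. target 2.5 for "I really like it"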
Thanks for the great video, @Abhishek Thakur! At 44:18, can you show how you fixed the error? I think you fixed it in the terminal, but the screen was showing VS Code.
Hi! Nice video!! I have a doubt: is it possible to apply BERT or any other model to text generation from keywords? For instance, keywords: Sam, dog, house.
Sentence: Sam, along with his dog, lives in a beautiful house.
You can't use BERT for text generation out of the box. Since BERT isn't trained as an autoregressive language model, it won't support language translation either.
Yeah, I found that. I also found out how to use GPT-2 to generate text from keywords. Thanks for the reply.
@@logeshbalasubramanian7901 I am working on a similar problem. Could you please tell me how you achieved it using GPT-2?
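For anyone else looking, here is a rough zero-shot sketch with GPT-2 (my assumption of the general approach, not the commenter's actual solution); fine-tuning GPT-2 on keyword-to-sentence pairs formatted the same way usually works much better.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Keywords: Sam, dog, house. Sentence:"     # hypothetical prompt format
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))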
Hi Abhishek, could you point me to a few resources on BERT to go through before watching the video?
ruclips.net/video/OyFJWRnt_AY/видео.html This has an amazing explanation of transformer-based models.
Hey Abhishek, can you please write a dlframework similar to mlframework, using TensorFlow 2.0 and the TensorFlow Dataset API? Please think about making it, as it can be really useful.
mlframework will convert to dlframework eventually. :)
Hi Abhishek, I tried to replicate the whole code, but I was getting an error in the loss function.
Resolved the error: I was sending the complete target array at once instead of indexing it like self.targets[item] in the __getitem__ function.
I used the free credits on GCP; since they have a limit on the number of cores, I was not able to run with batch_size=32 and max_len=512, so I reduced the length to 256 and bam, the code ran successfully :)
Thank you for this great video
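For anyone hitting the same error, a minimal sketch of the fix (names are assumptions, not the exact code from the video): __getitem__ must return one target per item, not the whole target array.

import torch

class BertDataset(torch.utils.data.Dataset):
    def __init__(self, ids, masks, token_type_ids, targets):
        self.ids = ids
        self.masks = masks
        self.token_type_ids = token_type_ids
        self.targets = targets

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, item):
        return {
            "ids": torch.tensor(self.ids[item], dtype=torch.long),
            "mask": torch.tensor(self.masks[item], dtype=torch.long),
            "token_type_ids": torch.tensor(self.token_type_ids[item], dtype=torch.long),
            "targets": torch.tensor(self.targets[item], dtype=torch.float),  # one row, not the full array
        }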
What's your IDE?
Changes for TPU: 40:20
I've tried to write the same code. When I run with nprocs = 8 it throws an error (Exception: process 0 terminated with exit code 1); when I run with nprocs = 1 it works! I'm trying it on Colab. The single TPU core and the CUDA versions worked fine!
I've set up the environment like this ->
VERSION = "20200325" #@param ["1.5" , "20200325", "nightly"]
!curl raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION
Thanks for the videos, man! They help!
PyTorch nightly has been having issues for a while now. See some solutions here: github.com/pytorch/xla/issues/1927
Did you solve it? I'm having the same issue with Colab
@@ojulhao For running in Colab: in the xmp.spawn() call you need to pass one extra argument,
start_method = 'fork',
because Colab only supports "forking" of processes instead of "spawning" when creating new processes.
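In code, that amounts to something like this (a sketch; _mp_fn stands in for the training entry point used in the video):

import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(rank, flags):
    ...  # build the datasets, model and optimizer and run the training loop here

FLAGS = {}
xmp.spawn(_mp_fn, args=(FLAGS,), nprocs=8, start_method="fork")  # "fork" is required on Colab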
Is there a poor man's version for this video? :P
Is there something I was unable to explain properly?
@@abhishekkrthakur No, I meant: are there cheaper GPU or TPU options for learning deep learning? Compute power is costly.
@@TheAnubhav27 You have access to free GPU and TPU via Google Colab. But if you want to boost the training speed on GPU, try NVIDIA Apex.