GPT-2 (the basics for understanding GPT-3)
- Published: Feb 6, 2025
- GPT-3 is a highly capable NLP deep learning model. In order to understand GPT-3 or its later versions, we should first understand its foundation, GPT-2. I cover how GPT-2 achieved zero-shot learning and set high scores on multiple NLP benchmarks, and give examples of how GPT-2, as a single model, can perform multiple NLP tasks without fine-tuning.
3:49 So the GPT-2 shown here is actually GPT-1, right..? Thanks for the easy explanation.
Hello Sir,
if we take a question similarity task, the input to BERT is:
CLS token + Question 1 + SEP + Question 2 + SEP
I read that the input to GPT-2 is:
Question 1 + Question 2 + CLS token.
Is this correct?
If yes,
should we use the CLS token to represent the input for classification, as we do in BERT?
Thanks for a good question. BERT and GPT are different, and the CLS token only exists in BERT. You can use the last token from GPT for classification, but the result may be worse than BERT's. The GPT-2 research paper does not cover a question similarity task, so there is no official answer to your question, and I honestly don't know what to expect from GPT for the input q1 + q2 + special token. I think a possible solution for question similarity is to use good sentence embeddings and check their similarity, or to use a Siamese network to see if the pair is similar. I hope this answer at least gives some direction for your question.
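For what it's worth, here is a minimal sketch of the sentence-embedding route mentioned above, assuming the sentence-transformers library; the checkpoint name and the 0.7 threshold are illustrative choices, not something from this thread:

```python
# Minimal sketch of question similarity via sentence embeddings.
# Assumes the sentence-transformers library; checkpoint and threshold
# are illustrative assumptions, not from the original thread.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

q1 = "How old are you?"
q2 = "What is your age?"

# Encode both questions into fixed-size sentence embeddings.
emb1, emb2 = model.encode([q1, q2], convert_to_tensor=True)

# Cosine similarity close to 1.0 suggests the questions are paraphrases.
score = util.cos_sim(emb1, emb2).item()
print(f"similarity = {score:.3f}, similar = {score > 0.7}")
```

A Siamese setup would instead fine-tune the encoder so that similar pairs land close together under this same cosine score.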
@TheEasyoung Thank you for your prompt reply.
I am not sure about the last token used by GPT-2 for classification. Do you mean the EOS token, or should we append a CLS token to the end of the input after adding it to the vocabulary, then use the representation of this CLS token for classification as is done in BERT?
Will the output in GPT-2 be output[0][:,-1], the output embedding for the last token? In BERT, the [CLS] embedding is output[0][:,0], the output for the first token, which feeds the pooled output.
With regard to padding:
In BERT, the pad token is appended to the end of the input (on the right), whereas in GPT-2 the pad token is placed at the beginning!
Thanks in advance.
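For reference, a rough sketch of the indexing and padding points raised above, assuming the Hugging Face transformers API (the model names are just the standard base checkpoints, not something prescribed in this thread):

```python
# Sketch of pulling a sentence-level representation from GPT-2 vs BERT.
# Assumes the Hugging Face transformers API.
import torch
from transformers import GPT2Tokenizer, GPT2Model, BertTokenizer, BertModel

text = ["how are you", "how are you doing"]

# --- GPT-2: no CLS token, so use the hidden state of the last real token.
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_tok.pad_token = gpt2_tok.eos_token   # GPT-2 has no pad token by default
gpt2_tok.padding_side = "left"            # left padding keeps the last real token at index -1
gpt2 = GPT2Model.from_pretrained("gpt2")

batch = gpt2_tok(text, return_tensors="pt", padding=True)
with torch.no_grad():
    out = gpt2(**batch)
gpt2_repr = out.last_hidden_state[:, -1]  # == out[0][:, -1], last-token embedding

# --- BERT: the [CLS] token is the *first* token, position 0.
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

batch = bert_tok(text, return_tensors="pt", padding=True)  # pads on the right
with torch.no_grad():
    out = bert(**batch)
bert_repr = out.last_hidden_state[:, 0]   # == out[0][:, 0], the [CLS] embedding
```

Left padding matters for GPT-2 because with right padding [:, -1] would pick up a pad token instead of the last real word.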
While BERT has a CLS token for classification, GPT-2 doesn't, unless you train it with a CLS token at the end. The more GPT-2-like way is to format each pair as plain text with the label at the end, like the lines below (see the sketch after this reply):
train data 1: how are you, s1, how are you doing, s2, true
train data 2: i am a boy, s1, thanks, s2, false
You will need to make sure you have enough data for generative training if you do it the GPT-2 way.
BERT is pretrained for classification with the CLS token, so BERT should be easy for you to fine-tune and use, whereas you won't find a GPT-2 ready-made for your use case.
I hope this answers your question.
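As a rough illustration of the generative formatting described in the reply above, here is a small sketch; the separator strings (s1, s2) and label words follow the example lines, but the helper function itself is hypothetical:

```python
# Sketch of serializing question pairs into plain text for generative
# GPT-2 fine-tuning, following the format shown in the reply above.
# The helper and exact separators are illustrative assumptions.
def to_training_text(question1: str, question2: str, is_similar: bool) -> str:
    label = "true" if is_similar else "false"
    return f"{question1}, s1, {question2}, s2, {label}"

pairs = [
    ("how are you", "how are you doing", True),
    ("i am a boy", "thanks", False),
]

# Each line becomes one language-modeling example for fine-tuning.
for q1, q2, y in pairs:
    print(to_training_text(q1, q2, y))
```

At inference time you would feed everything up to "s2," and let the fine-tuned model generate the word true or false.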
@TheEasyoung Many thanks for your clarification.