Multi-Class Language Classification With BERT in TensorFlow

  • Published: 23 Dec 2024

Comments • 75

  • @MaryamYassi
    @MaryamYassi 1 year ago +2

    I wanted to express my sincere appreciation for your videos on RUclips. They have been immensely helpful to me in my Ph.D. thesis, particularly in understanding how to pre-train using MLM and fine-tune the BERT model.
    I thoroughly enjoy watching your videos, and they have provided valuable insights and guidance for my research. Thank you for creating such informative and engaging content.

  • @aditya_01
    @aditya_01 3 years ago +2

    The best video on how to use BERT in TensorFlow, thank you!

  • @tildo64
    @tildo64 10 months ago

    I don't comment on videos, but your video is so clear and easy to understand I had to just say thank you! I have been trying to solve a multi-class problem with an LLM for months without significant progress. Using your video, I was able to make more progress by training a BERT model in a few days than I have in months! Please keep posting. It's immensely helpful for the rest of us.

  • @kennethnavarro3496
    @kennethnavarro3496 3 years ago +1

    Thank you so much for this tutorial. Most tutorials really piss me off because they always refer back to other videos they made regarding why things work, but you explained each step as you did it, and this is super good for someone with a temperament like mine. Appreciate it, you're a beast!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      haha thanks Kenneth, I try to assume we're starting at the start for every video :)

  • @achrafoukouhou1016
    @achrafoukouhou1016 3 years ago +3

    This video is excellent, sir. I had been looking for a video like this for 2 straight days.

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      That's awesome to hear, happy you found it, thanks!

  • @meredithhurston
    @meredithhurston 2 years ago +2

    Thanks so much, James. On my 1st attempt I was able to get to ~51% accuracy. I will need to make some tweaks, but I'm so excited about this! Woohoo!

  • @krishnanvs5946
    @krishnanvs5946 3 years ago

    Very crisp and nicely structured, with the objective of the exercise stated right at the start

    • @jamesbriggs
      @jamesbriggs  3 years ago

      thanks, useful to know stating the objective helps!

  • @anityagangurde5329
    @anityagangurde5329 2 years ago

    Thank you so much!! I was really stuck with the prediction part for a very long time. This will help me a lot.

  • @chrisp.784
    @chrisp.784 3 years ago

    Thank you so much sir! Best video I've ever seen on RUclips, it clearly explains each step.

  • @MdSaeemHossainShanto
    @MdSaeemHossainShanto 2 years ago

    At 42:00, on cell 9, it returns an array of what? What do those numbers mean?

  • @manuadd192
    @manuadd192 2 years ago

    Hey, great video! Just got a question: in my data set some texts have multiple labels. Can I just set multiple labels to 1 in the labels[] array at 13:47?

  • @alexaskills3447
    @alexaskills3447 3 years ago +1

    This was great! One question: what if you wanted to use additional features besides the BERT embeddings in the training data set? What would be the best approach? Do some type of model stacking where you take the output of the sentiment model and use that combined with other features as input to another model? Or is there a better way to merge/concatenate the additional features onto the BERT word vector training data?

  • @simonlindgren
    @simonlindgren 3 years ago +1

    This is a fantastic tutorial! Excellent stuff, even for non-experts. I wonder how one would go about adding (domain-specific) tokens to the BERT tokenizer before training. Where in the workflow can that be done?

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      Hi Simon, there are two approaches, you train from scratch (obviously this takes some time) OR you can add tokens, I want to cover this soon but here's an example github.com/huggingface/transformers/issues/1413#issuecomment-538083512
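      A minimal sketch of the add-tokens route (assuming a HuggingFace tokenizer and TF BERT model; the example tokens are just placeholders):
      from transformers import BertTokenizer, TFAutoModel
      tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
      bert = TFAutoModel.from_pretrained('bert-base-cased')
      # add the new domain-specific tokens to the vocabulary
      tokenizer.add_tokens(['mytoken1', 'mytoken2'])
      # resize the embedding matrix so the new token IDs get embedding rows
      bert.resize_token_embeddings(len(tokenizer))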

    • @simonlindgren
      @simonlindgren 3 years ago

      @@jamesbriggs Great! So add tokens to tokenizer before training on the labeled data, right?

  • @serhatkalkan2339
    @serhatkalkan2339 2 years ago

    Great tutorial! I wonder if the seq_length has to be that long if we work with short phrases?

  • @adityanjsg99
    @adityanjsg99 3 years ago

    This video helped, thanks. Using BERT does need a GPU subscription, though.

  • @dhivyasubburaman8828
    @dhivyasubburaman8828 3 years ago

    Really good tutorial! Thank you so much, an awesome teacher... you made understanding the model easy and simple. Is there any similar tutorial for bertformultilabelsequenceclassification, or can the same code be used for multi-label classification?

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Thanks! You should be able to use the same code, just change the output layer dimensions to align with your new number of output labels :)
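      For true multi-label data (several 1s per row) you would typically also swap the softmax for a sigmoid and use binary cross-entropy; a rough sketch, assuming `embeddings`, `input_ids`, `mask`, and `num_labels` follow the notebook's naming:
      # classifier head with one sigmoid unit per label instead of a softmax over classes
      x = tf.keras.layers.Dense(1024, activation='relu')(embeddings)
      y = tf.keras.layers.Dense(num_labels, activation='sigmoid', name='outputs')(x)
      model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)
      model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])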

  • @maxhuttmann4760
    @maxhuttmann4760 2 years ago

    James, thank you! I had been stuck on extracting the BERT embedding for a TF layer, since almost everyone now shows this part using other libraries like TensorFlow Hub, Text, etc., and I cannot use them in my project due to limitations.
    Will try your approach. Thanks a lot!

  • @agahyucel4502
    @agahyucel4502 3 years ago

    Hi, first of all thank you for this nice video. How can we make a confusion matrix and classification report here?

  • @minhajulislamchowdhury1101
    @minhajulislamchowdhury1101 2 years ago

    How can I find the confusion matrix for this kind of dataset?

  • @marwamiimi1935
    @marwamiimi1935 2 years ago

    Hello, thank you for this great video.
    I followed the steps but I have an error.
    Can you help me please?

  • @lasimanazrin6212
    @lasimanazrin6212 2 years ago

    Getting this error: Unknown layer: Custom>TFBertMainLayer. Please ensure this object is passed to the `custom_objects`
    Anybody have any idea?

  • @gloriaabuka5644
    @gloriaabuka5644 2 years ago

    Thank you for this very explanatory video. I tried following along with another dataset, but each time I try to one-hot-encode my labels with these 3 lines of code
    arr = df['rating'].values
    labels = np.zeros((num_samples, arr.max()))  # (my label values are from 1-10)
    labels[np.arange(num_samples), arr] = 1
    I get: "numpy.float64 object cannot be interpreted as an integer".

  • @datascientist7802
    @datascientist7802 3 years ago +1

    Hi Sir, great explanation, and I followed along to implement the same, but I got this error when training the model:
    InvalidArgumentError: Data type mismatch at component 0: expected double but got int32.
    [[node IteratorGetNext (defined at :1) ]] [Op:__inference_train_function_20701]

    • @jamesbriggs
      @jamesbriggs  3 years ago

      seems like one of the datatypes for (probably) your inputs is wrong, you will need to add something like dtype=float32 to your input layer definitions
      OR it may be that your data must be converted to float first before being processed by the model
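      A rough sketch of the first option (the exact dtype just has to match whatever your arrays/pipeline provide; the layer names mirror the video but the dtype here is only illustrative):
      input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='float64')
      mask = tf.keras.layers.Input(shape=(512,), name='attention_mask', dtype='float64')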

    • @abhishekchack8065
      @abhishekchack8065 3 years ago

      Xids = np.float64(Xids)
      Xmask = np.float64(Xmask)
      dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))
      Just convert Xids and Xmask to float64 before creating the pipeline.

  • @plashless3406
    @plashless3406 1 year ago

    This is awesome.

  • @luiscao7241
    @luiscao7241 3 years ago

    Hi James Briggs, I found that with this way of dividing validation/train data, the validation and train sets vary every time. When I save the trained model and load it to evaluate on the validation data again, I get different results on each run. Should I split the train/validation data from the beginning instead of using SPLIT = 0.9 each time? Does it compromise the accuracy of the trained model? Thanks

  • @harveenchadha
    @harveenchadha 3 years ago +1

    Excellent! Where can I find the code used in the video?

    • @jamesbriggs
      @jamesbriggs  3 years ago +2

      Code is split between a few different notebooks on Github - they're all in this repo folder: github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model - hope it helps :)

    • @harveenchadha
      @harveenchadha 3 years ago

      @@jamesbriggs Thanks. That surely helps! Keep up the good work James, I see you are working on a Transformers course. Will be looking forward to it!

  • @panophobia8527
    @panophobia8527 2 years ago

    After training I get around 60% accuracy. When I try to predict I never get the model to predict Sentiment 0 or 4. Do you have any idea why the model has problems with these?

  • @luiscao7241
    @luiscao7241 3 years ago

    Great tutorial! Thanks

  • @asimsultan8191
    @asimsultan8191 3 years ago +1

    Thank you for such an amazing collection:) Just 1 question, While loading the model, I get this error: ValueError: Cannot assign to variable bert/embeddings/token_type_embeddings/embeddings:0 due to variable shape (2, 768) and value shape (512, 768) are incompatible.
    Can you let me know why is that so? Thank you so much in advance.

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      Hey Asim, I would double check that you are tokenizing everything correctly, the 512 that you see is the standard number of tokens consumed by BERT, which we set when encoding our text with the tokenizer :)

    • @asimsultan8191
      @asimsultan8191 3 years ago

      @@jamesbriggs I got it and solved the problem. Thank you so much :)

  • @henkhbit5748
    @henkhbit5748 3 years ago

    Nice example! Could you also use the same technique if you want to classify text into more than 5 categories, for example 10 or 20? And what if each class is not perfectly balanced and it is NOT English text? 😉

    • @jamesbriggs
      @jamesbriggs  3 years ago

      haha yes you could, you have different language BERT models that are pretrained - if there was not the language you wanted, we'd want to train from scratch on the new language (mentioned in the last comment) - as for training with more categories, yes we could do that using the same code we use here, we just switch our training data to the new 10-20 class data, and update classifier layer output size to match :)
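      A sketch of what changes for, say, 10 classes (variable names follow the notebook; `arr` is assumed to hold integer class IDs 0-9 and `x` the last hidden Dense layer):
      num_classes = 10
      # one-hot label array is now num_classes wide
      labels = np.zeros((num_samples, num_classes))
      labels[np.arange(num_samples), arr] = 1
      # classifier output layer matches the new number of classes
      y = tf.keras.layers.Dense(num_classes, activation='softmax', name='outputs')(x)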

  • @meylyssa3666
    @meylyssa3666 3 years ago

    Great tutorial, like always, thanks!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Thanks I appreciate these comments a lot! :)

  • @salmanshaikh4866
    @salmanshaikh4866 3 years ago

    Hi there, I am trying to generate a confusion matrix, but due to the dataset being shuffled I'm not able to, and it's giving me random values. Any ideas what to do? (The accuracy and loss are pretty good whilst training the model.)
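    One workaround (a sketch; `val_ds` stands for whatever validation pipeline was set aside, and class IDs are recovered with argmax):
    import numpy as np
    from sklearn.metrics import confusion_matrix, classification_report
    y_true, y_pred = [], []
    for batch_inputs, batch_labels in val_ds:
        # taking inputs and labels from the same batch keeps them aligned even if the dataset reshuffles
        preds = model.predict(batch_inputs)
        y_pred.extend(np.argmax(preds, axis=1))
        y_true.extend(np.argmax(batch_labels.numpy(), axis=1))
    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred))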

  • @Moxgusa
    @Moxgusa 3 years ago

    Hi James, first of all, good tutorial!
    I tried implementing the same architecture with a different dataset but the model training time is insane, it's 50+ hours. Do you have any clue why it takes so much time?
    Thank you!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      it can be a long time, it will depend on the hardware setup you have, I'm using a 3090 GPU so it is reasonably fast, I would double check that you are using GPU (if you have a compatible GPU). If you search something like 'tensorflow GPU setup' you should find some good explanations - hope that helps!
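      A quick way to confirm TensorFlow actually sees a GPU (standard API call, just as a hint):
      import tensorflow as tf
      print(tf.config.list_physical_devices('GPU'))  # an empty list means training is falling back to CPU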

  • @gokulgupta1021
    @gokulgupta1021 3 years ago

    Nice informative video. It would be nice if you could help me understand how to change this to PyTorch:
    # create the dataset object
    dataset = tf.data.Dataset.from_tensor_slices((Xids, Xmask, labels))
    def map_func(input_ids, masks, labels):
        # convert our three-item tuple into a two-item tuple where the input item is a dictionary
        return {'input_ids': input_ids, 'attention_mask': masks}, labels
    # then use the dataset map method to apply this transformation
    dataset = dataset.map(map_func)

    • @jamesbriggs
      @jamesbriggs  3 years ago

      I'm not using PyTorch for sentiment analysis in this example, instead for masked language modeling, but the dataset build logic is very similar, this video at ~14:57:
      ruclips.net/video/R6hcxMMOrPE/видео.html
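      Not from the video, but a rough PyTorch equivalent of that pipeline could look like this (a sketch only; `Xids`, `Xmask`, `labels` are the same arrays built earlier):
      import torch
      from torch.utils.data import Dataset, DataLoader
      class SentimentDataset(Dataset):
          def __init__(self, Xids, Xmask, labels):
              self.Xids, self.Xmask, self.labels = Xids, Xmask, labels
          def __len__(self):
              return len(self.labels)
          def __getitem__(self, i):
              # return the same dict-style inputs that the TF map_func builds
              return ({'input_ids': torch.tensor(self.Xids[i]),
                       'attention_mask': torch.tensor(self.Xmask[i])},
                      torch.tensor(self.labels[i]))
      loader = DataLoader(SentimentDataset(Xids, Xmask, labels), batch_size=16, shuffle=True)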

  • @faressayah9897
    @faressayah9897 3 years ago +1

    Amazing tutorial 👏👏👏.
    If you are going to use your model on another machine, it's better to save it in h5 format.
    # Saving the model
    model.save("your_model.h5")
    # Loading the model on another machine
    import tensorflow as tf
    import transformers
    model = tf.keras.models.load_model('your_model.h5',
        custom_objects={'TFBertMainLayer': transformers.TFBertMainLayer})

    • @jamesbriggs
      @jamesbriggs  3 years ago +1

      hey Fares, thanks and appreciate the info - I assume you recommend so due to us then only having a single file to transfer - rather than several?

    • @faressayah9897
      @faressayah9897 3 years ago +1

      @@jamesbriggs
      I am working on a hate speech detection project. I trained the model on Kaggle and, after saving it, it worked in the same notebook but not on my local machine. Saving directly also needs the configuration to be saved.
      I didn't find how to do that, so I saved the model in h5 format.

  • @soysasu
    @soysasu 3 years ago

    Hi sir, I'm following along step by step on Google Colab but it's running out of RAM. They give me 12.69 GB; in most cases that happens due to code problems. Any idea? Thank you!

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Google Colab can be difficult with the amount of memory you're given, transformers use *a lot* - one thing that can help is loading your data in batches (so you're not storing it all in memory), one of my recent videos covers this, it might help: ruclips.net/video/r-zQQ16wTCA/видео.html
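      One way to stream the data rather than holding it all in memory (a sketch only; the chunked .npz files are hypothetical and the shapes assume 512-token inputs with 5 one-hot labels):
      from pathlib import Path
      import numpy as np
      import tensorflow as tf
      def gen():
          for path in sorted(Path('data').glob('chunk_*.npz')):  # hypothetical chunk files on disk
              chunk = np.load(path)
              for ids, mask, label in zip(chunk['Xids'], chunk['Xmask'], chunk['labels']):
                  yield {'input_ids': ids, 'attention_mask': mask}, label
      dataset = tf.data.Dataset.from_generator(
          gen,
          output_signature=(
              {'input_ids': tf.TensorSpec((512,), tf.int32),
               'attention_mask': tf.TensorSpec((512,), tf.int32)},
              tf.TensorSpec((5,), tf.float64)))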

    • @soysasu
      @soysasu 3 years ago

      @@jamesbriggs Okay, I'll see it. Thank you!

  • @gloriaabuka9129
    @gloriaabuka9129 2 years ago

    Thank you for this great video. I tried following along with another dataset, but each time I try to one-hot-encode my labels I keep getting an error that says "numpy.float64 object cannot be interpreted as an integer". Any idea how to fix this? Thank you.

    • @abAbhi105
      @abAbhi105 2 years ago

      Same here, did you find any solution?

    • @gloriaabuka9129
      @gloriaabuka9129 2 years ago

      @@abAbhi105 Yes, I did. I cast my array elements to integer:
      arr = arr.astype(int)
      labels[np.arange(num_samples), arr - 1] = 1

  • @digvijayyadav4168
    @digvijayyadav4168 3 years ago

    Hi there, please can you share the notebook?

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Hey it's not necessarily exactly the same, but you will find very similar code here github.com/jamescalam/transformers/tree/main/course/project_build_tf_sentiment_model

  • @amitjaiswar8593
    @amitjaiswar8593 3 years ago

    Is it an implementation or fine-tuning?
    #model.layers[2].trainable = False

    • @jamesbriggs
      @jamesbriggs  3 years ago

      hey Amit, this sets the internal BERT layers to not train, but still allows us to train the classifier layers (which are layers 3, 4, etc), we can actually train the BERT layer too by removing that line, but training time will be much longer

  • @faisalq4092
    @faisalq4092 1 year ago

    I want something from scratch

  • @vidopulos
    @vidopulos 3 years ago +1

    Hi. Excellent tutorial! I have a problem when I'm trying to replicate your code: in the part where I'm using tokenizer.encode_plus() I get ValueError: could not broadcast input array from shape (15) into shape (512). It says that the error is here - Xids[i, :] = tokens['input_ids']. Thanks.

    • @jamesbriggs
      @jamesbriggs  3 years ago

      Does it work if you write Xids[:, i] = tokens['input_ids']? Otherwise, double-check the Xids dimensionality with Xids.shape and make sure it lines up to what we would expect (eg num_samples and 512)

    • @francesniu
      @francesniu 3 years ago

      I had the same issue, and I solved it by putting pad_to_max_length = True instead of padding = 'max_length'.
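      For reference, a tokenizer call along these lines should always come back at 512 tokens (a sketch; the right padding flag depends on your transformers version - newer releases use padding='max_length', older ones pad_to_max_length=True):
      tokens = tokenizer.encode_plus(text, max_length=512,
                                     truncation=True, padding='max_length',
                                     add_special_tokens=True)
      # both lists are now exactly 512 long, so the row assignment broadcasts cleanly
      Xids[i, :] = tokens['input_ids']
      Xmask[i, :] = tokens['attention_mask']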

  • @Mrwheelsful
    @Mrwheelsful 3 years ago

    Hi James, at the very end, when you predicted your new sentiment data with your model, you assigned it to:
    probs = model.predict(test)
    I would like to know how to export the predicted data to CSV format so that one can submit it on Kaggle.
    test['sentiment'] = model.predict(test['phrase'])
    submission = test[['tweetid', 'sentiment']]
    submission.to_csv('bertmodel.csv', index=False)
    Is this the correct way of going about it? :) Because I want sentiment values when exported.

    • @jamesbriggs
      @jamesbriggs  3 years ago

      I think you might need to perform a np.argmax() operation on the model.predict output, to convert from output logits to predicted labels, but otherwise it looks good :)
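      A sketch of that extra step (`test_ds` stands for the encoded pipeline built from the `test` dataframe, without shuffling so row order matches - both names are just placeholders here):
      probs = model.predict(test_ds)                 # probabilities, shape (n_samples, n_classes)
      test['sentiment'] = np.argmax(probs, axis=1)   # convert probabilities to predicted class labels
      test[['tweetid', 'sentiment']].to_csv('bertmodel.csv', index=False)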

  • @madhavimourya1157
    @madhavimourya1157 3 years ago

    Hi James, great explanation, and I followed along to implement the same, but I got this error:
    InvalidArgumentError: indices[2,2] = 29200 is not in [0, 28996)
    [[node model/bert/embeddings/Gather (defined at /usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_tf_bert.py:188) ]] [Op:__inference_train_function_488497]
    I know it's related to the embedding token ID. Can you help me figure out how to resolve this?

    • @madhavimourya1157
      @madhavimourya1157 3 years ago

      Luckily, I got the solution :)

    • @jamesbriggs
      @jamesbriggs  3 years ago

      @@madhavimourya1157 Oh good to hear, was it in your dataset definition?