Thank you, this is the best guide and notebook I've ever seen. Everything is so easy to understand. I would love to see more of your content.
most useful content I've ever seen on the internet about BERT for me ! Thanks alot
this is gold. no one explains in this detail
Absolutely fantastic tutorial. Thank you so much.
Just a small edit:
1) If you're using this in 2022, it'll give you some errors, especially in create_model().
The error is: Cannot convert a symbolic Tensor (bert/encoder/layer_0/attention/self/strided_slice:0) to a numpy array.
Please use tensorflow 2.2 and numpy 1.19.5, and don't run the tensorflow-gpu install line (see the install sketch after this comment).
2) Also, the GitHub repo and this video have different create_model function definitions. Nothing big; it can be handled without any hassle if you carefully apply the basics.
I hope this helps.
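For reference, the version pins from point 1 would look roughly like this in a Colab cell (versions as suggested above, not verified against newer runtimes):
!pip install tensorflow==2.2
!pip install numpy==1.19.5
# skip / comment out the tensorflow-gpu install line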
thank u :) it helps me a lot
Thanks. I tried tf 2.2 and numpy 1.19.5 but I'm still getting the same error.
At 42:13, in line 29, self.max_seq_len will take the longest len(token_ids) even if it's longer than the original max_seq_len. If the goal is to take the largest length when it's less than max_seq_len and to truncate when it's longer, then at line 19 just introduce a temp variable '_max_seq_len = 0', in line 29 assign to '_max_seq_len' instead, and after the loop exits set 'self.max_seq_len = min(self.max_seq_len, _max_seq_len)'.
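A minimal, self-contained sketch of that fix (the names are illustrative, not the exact class from the video):
def compute_max_seq_len(token_id_lists, requested_max_seq_len=128):
    _max_seq_len = 0
    for token_ids in token_id_lists:
        _max_seq_len = max(_max_seq_len, len(token_ids))
    # take the largest observed length, but never exceed the requested cap
    return min(requested_max_seq_len, _max_seq_len)

# sequences of length 3 and 202 with a cap of 128 -> 128
print(compute_max_seq_len([[101, 2023, 102], [101] + [0] * 200 + [102]], 128))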
After training is completed,
model.save_weights('bert_weights.h5')
model.load_weights('bert_weights.h5')
will help you save the model and load it for testing later. Thank you :)
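One caveat (assuming the create_model, data and bert_ckpt_file names from the video): save_weights stores only the weights, so the same architecture has to be rebuilt before loading:
model.save_weights('bert_weights.h5')
# later, in a fresh session:
model = create_model(data.max_seq_len, bert_ckpt_file)  # rebuild the same architecture
model.load_weights('bert_weights.h5')
model.evaluate(data.test_x, data.test_y)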
Arun, can you please explain why the validation accuracy is better than the training accuracy during the first three epochs of training?
First of all, thank you very much Venelin for this great tutorial. Actually the best implementation I've seen for BERT with Keras so far. Please keep going with the Keras series videos. I just want to give a bit of information on why we got more tokens than words for train.text[0]: BERT not only tokenizes words but also generates sub-words for unknown/out-of-vocabulary words. The tokens for train.text[0] are the 11 below (excluding [CLS] and [SEP]). Sub-word tokens always start with "##" unless they are the first sub-word of an unknown word. For example, in this sample "westbam" is tokenized as "west" (without ##, since it's the first piece of the word), "##ba" and "##m".
['listen',
'to',
'west',
'##ba',
'##m',
'al',
'##umb',
'allergic',
'on',
'google',
'music']
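If you want to reproduce this split quickly, here's a sketch using the Hugging Face tokenizer instead of the FullTokenizer from the video (my assumption; it uses the same bert-base-uncased WordPiece vocabulary, so the pieces should match):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("listen to westbam alumb allergic on google music"))
# ['listen', 'to', 'west', '##ba', '##m', 'al', '##umb', 'allergic', 'on', 'google', 'music']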
Hey, nice video mate!
The reason it's giving a larger number of tokens is that tokenizers in BERT/Transformers work on the principle of sub-word tokens: they can split a single word into multiple tokens.
The code at the link you provided is not running; it gives an error when we call the function create_model().
Try with tensorflow 2.2 and comment out the tensorflow-gpu line:
#!pip install tensorflow-gpu >> /dev/null
!pip install tensorflow==2.2
I too have this problem
NotImplementedError Traceback (most recent call last)
in <module>()
----> 1 model = create_model(data.max_seq_len, bert_ckpt_file)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/autograph/impl/api.py in wrapper(*args, **kwargs)
235 except Exception as e: # pylint:disable=broad-except
236 if hasattr(e, 'ag_error_metadata'):
--> 237 raise e.ag_error_metadata.to_exception(e)
238 else:
239 raise
NotImplementedError: in converted code:
/usr/local/lib/python3.7/dist-packages/bert/model.py:80 call *
output = self.encoders_layer(embedding_output, mask=mask, training=training)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:842 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/bert/transformer.py:234 call *
layer_output = encoder_layer(layer_input, mask=mask, training=training)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:842 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/bert/transformer.py:176 call *
attention_output = self.self_attention_layer(layer_input, mask=mask, training=training)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:842 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/bert/transformer.py:121 call *
attention_head = self.attention_layer(layer_input, mask=mask, training=training)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:842 __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
/usr/local/lib/python3.7/dist-packages/bert/attention.py:90 call *
mask = tf.ones(sh[:2], dtype=tf.int32)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/array_ops.py:2571 ones
output = _constant_if_small(one, shape, dtype, name)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/array_ops.py:2306 _constant_if_small
if np.prod(shape) < 1000:
:6 prod
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:3052 prod
keepdims=keepdims, initial=initial, where=where)
/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py:86 _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:736 __array__
" array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (bert/encoder/layer_0/attention/self/strided_slice:0) to a numpy array.
Awesome video! I have tried to do it myself and struggled a lot. Thanks for your detailed instruction! :)
I will just say "You are awesome". Explained it so nicely and in such a simplified manner. Keep rocking!
Query: You first added the [CLS] and [SEP] tokens to each text. Suppose the length of some message is more than max_seq_len; in that case it will be truncated, and the truncation will drop the [SEP] token. Doesn't that impact the classifier?
Please share your thoughts. Thanks.
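For what it's worth, a tiny sketch of a truncation that keeps both special tokens (illustrative only, not the code from the video):
def truncate_keep_special(tokens, max_seq_len):
    # tokens look like ['[CLS]', ..., '[SEP]']
    if len(tokens) <= max_seq_len:
        return tokens
    return tokens[:max_seq_len - 1] + ['[SEP]']

print(truncate_keep_special(['[CLS]', 'a', 'b', 'c', 'd', '[SEP]'], 4))
# -> ['[CLS]', 'a', 'b', '[SEP]']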
Just one request for you: please continue with Keras and TensorFlow. I really don't like PyTorch.
u are the OG God
The BERT tokenizer does piece-wise (WordPiece) tokenization, so some words may be broken into multiple sub-word tokens to generate ids, hence the larger number of tokens.
Yeah, that is exactly what I was thinking.
Can anyone explain why we used tanh in the hidden layers instead of ReLU, which is more commonly used?
What would you suggest: a custom-trained model or a Rasa/Dialogflow integration?
Hi, nice tutorial. I had a request: could you tell us what method you would have used to deal with unbalanced classes?
Love the way he pronounces BERT
very nicely explained!!
Hello Venelin, thanks for this amazing video.
Do you have a tutorial on how to use BERT with TensorFlow (not PyTorch), but for multi-label classification?
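Not a full tutorial, but the usual change for multi-label is just the head and the loss; a hedged sketch (num_classes and the 768-dim pooled input are illustrative, not the exact model from the video):
from tensorflow import keras

num_classes = 7                                  # illustrative
cls_output = keras.layers.Input(shape=(768,))    # pooled [CLS] vector from BERT
probs = keras.layers.Dense(num_classes, activation="sigmoid")(cls_output)

model = keras.Model(inputs=cls_output, outputs=probs)
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss="binary_crossentropy",                  # each label is scored independently
    metrics=["binary_accuracy"],
)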
Hi, train.intent is not working. Please let me know how to handle this, it's urgent.
Where can I get the clean CSV? Also, how big is the dataset per intent?
model = create_model(data.max_seq_len, bert_ckpt_file)
I am getting an error after running this line.
Congrats ... very well laid out and just the thing I was looking for, particularly to connect BERT and TF2/Keras. Quick question: TensorBoard will not start with your code (and I never got it to work in Colab myself) ... any idea how to fix this in the code (imports? tools? code?). Thank you and
... keep up the good job!
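Not sure why it fails for you; the usual way to bring TensorBoard up inside Colab is the notebook magic (assuming the log/intent_detection directory used later in the notebook):
%load_ext tensorboard
%tensorboard --logdir log/intent_detection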
Impressive. I just wanted to know how I can improve a BERT model as a starting point for a master's project; I was thinking of optimizing the layers of the transformer.
log_dir = "log/intent_detection/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%s")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=log_dir)
history = model.fit(
x=data.train_x,
y=data.train_y,
validation_split=0.1,
batch_size=16,
shuffle=True,
epochs=5,
callbacks=[tensorboard_callback]
How long does this step take to finish?
Please answer me!!!
Thank you.
How can I detect contradiction in text using a BERT model?
Thank you a lot for this video. Also, I want to ask you about intents: how is it possible to add more intents to the model? Thanks in advance :)
Hi Venelin,
I think you made a minor mistake in the function _pad(self, ...). You add the two tokens [CLS] and [SEP], but if the length of the entry is larger than the given maximum sequence length, the truncation might remove them from the token list.
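A minimal sketch of a pad/truncate helper that reserves room for both special token ids (names and ids are illustrative, not the video's _pad):
def pad_ids(token_ids, max_seq_len, cls_id=101, sep_id=102, pad_id=0):
    body = token_ids[1:-1][:max_seq_len - 2]          # strip specials, truncate the body
    ids = [cls_id] + body + [sep_id]
    return ids + [pad_id] * (max_seq_len - len(ids))  # right-pad with zeros

print(pad_ids([101, 7, 8, 9, 10, 102], 4))  # -> [101, 7, 8, 102]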
superb video sir
Really useful! Thanks!
Hey, why didn't you use the mask input and the segment input among your inputs?
In the 28th block, line no. 14: why the [:, 0, :] slice in keras.layers.Lambda?
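In case it helps anyone, the slice just picks the first time step (the [CLS] position) out of the (batch, seq_len, hidden) output; a tiny illustration with fake data (55 and 768 are the shapes from this notebook, the rest is made up):
import numpy as np

seq_output = np.zeros((2, 55, 768))  # pretend BERT output: batch=2, seq_len=55, hidden=768
cls_vectors = seq_output[:, 0, :]    # keep only the [CLS] token's vector for each example
print(cls_vectors.shape)             # (2, 768)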
Very interesting and useful, thanks Venelin! What about using Bert with other non-English languages?
Looks great, and thanks. But I'm surprised that you were unaware of the WordPiece tokenizer in BERT 😃 The confusion was priceless.
Very Informative. Thanks.
great video!
thank you so much!
You’re amazing
When I call the model function I get: TypeError: Layer input_spec must be an instance of InputSpec. Got: InputSpec(shape=(None, 55, 768), ndim=3). How do I resolve this?
#!pip install tensorflow-gpu >> /dev/null
!pip install tensorflow==2.2
very nice !
Very good video, enjoyed watching and applying at the same time :p subscribed for that ^^
the way he says bert😂
That internet speed, 108 Mbps, damn
great videos!