Join My AI Career Program
www.nicolai-nielsen.com/aicareer
Enroll in My School and Technical Courses
www.nicos-school.com
The repository link is not in the description
For anyone who is getting poor results:
1. The dataset is small, so a random split might not generalise well. For example, the training set might contain a much higher percentage of one digit than another.
2. You can use OpenCV for preprocessing. Using morphological transformations to remove noise can improve performance immensely.
3. To avoid overfitting, I found that a Gaussian noise layer can help. It makes the training data harder to learn, and therefore harder to overfit.
Hope this helps!
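Tip 1 can be checked directly by comparing how often each character appears in the training labels versus the validation labels. A minimal plain-Python sketch, assuming the labels are the file-name strings used elsewhere in this thread; `char_frequencies` and `split_drift` are hypothetical helper names, not part of the video's code.

```python
from collections import Counter


def char_frequencies(labels):
    """Relative frequency of each character over a list of label strings."""
    counts = Counter("".join(labels))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}


def split_drift(train_labels, val_labels):
    """Absolute difference in relative character frequency between two splits.

    A character that appears in only one split gets its full frequency as drift.
    """
    train_freq = char_frequencies(train_labels)
    val_freq = char_frequencies(val_labels)
    chars = sorted(set(train_freq) | set(val_freq))
    return {ch: abs(train_freq.get(ch, 0.0) - val_freq.get(ch, 0.0)) for ch in chars}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    train = ["2b827", "3bnyf", "22d5n"]
    val = ["x362g", "2b827"]
    worst = sorted(split_drift(train, val).items(), key=lambda kv: -kv[1])[:5]
    for ch, drift in worst:
        print(ch, round(drift, 3))
```

Characters with a large drift, or that show up in only one split, suggest the split is unrepresentative; re-splitting or collecting more samples of the rare characters may help.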
I have a question! How do I avoid misdetecting characters, especially between two similar characters? For example, the letter Z is detected as 2, S as 5, I as 1, etc.
@@kalifardiansyah5863 You may need more training data, or a larger CNN architecture.
I tried it with 60,000 images and used morphological techniques, but it still isn't accurate; val_loss just doesn't go below 14.
@@HarshpreetSingh-jz2lf Do you have a class imbalance in the dataset? Is the model built correctly? Is the data preprocessed correctly?
I can't help you if you don't provide any context except for "it no work"
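To see which similar-looking characters get confused (Z/2, S/5, I/1 above), one option is to count substitutions between true and predicted labels on the validation set. A sketch, assuming decoded predictions are plain strings; `char_confusions` is a hypothetical helper, and pairs of different lengths are simply skipped rather than aligned.

```python
from collections import Counter


def char_confusions(true_labels, pred_labels):
    """Count (true_char, predicted_char) substitutions, position by position.

    Only pairs of equal length are compared; length mismatches are skipped.
    """
    confusions = Counter()
    for truth, pred in zip(true_labels, pred_labels):
        if len(truth) != len(pred):
            continue
        for t, p in zip(truth, pred):
            if t != p:
                confusions[(t, p)] += 1
    return confusions


if __name__ == "__main__":
    # Hypothetical ground truth and predictions.
    truth = ["Z5S1", "SI2Z", "ABCD"]
    preds = ["25S1", "5I2Z", "ABC"]
    for (t, p), n in char_confusions(truth, preds).most_common():
        print(f"{t!r} predicted as {p!r}: {n}")
```

If one pair dominates, checking whether one of the two characters is rare in the training data (the class-imbalance question above) is a reasonable next step.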
I've finally ended with this working configuration:
from os import path
from tensorflow.keras.layers import StringLookup  # TF >= 2.6; older versions: layers.experimental.preprocessing
images = sorted(map(str, list(data_dir.glob("*.png"))))  # data_dir: pathlib.Path of the captcha folder
labels = [img.split(path.sep)[-1].split(".png")[0] for img in images]
vocab = sorted(set("".join(labels)))
max_length = max(len(label) for label in labels)
char_to_num = StringLookup(vocabulary=vocab, mask_token=None, num_oov_indices=0, oov_token="[UNK]")
num_to_char = StringLookup(vocabulary=char_to_num.get_vocabulary(), invert=True, mask_token=None,
                           num_oov_indices=0, oov_token="[UNK]")
The rest of the code is as in the video.
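The label and vocabulary lines of that configuration can also be written with pathlib, which avoids splitting on `path.sep` by hand. A sketch, assuming (as in this thread) that each file name without its extension is the label; `labels_from_paths` and `build_vocab` are hypothetical names.

```python
from pathlib import PurePath


def labels_from_paths(image_paths):
    """File name without directory and extension, e.g. 'data/2b827.png' -> '2b827'."""
    return [PurePath(p).stem for p in image_paths]


def build_vocab(labels):
    """Sorted character set (a stable char -> index order) and the longest label length."""
    vocab = sorted(set("".join(labels)))
    max_length = max(len(label) for label in labels)
    return vocab, max_length
```

Sorting the vocabulary keeps the character-to-index order identical on every run, which matters because the same order has to be used when decoding.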
Hello, do you have the captcha code for this clip? Could you please send it to me?
Thank you
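@@mortezarisan3261 If you mean the model code, yes:
# Training driver; run it after the definitions in the second snippet.
# img_width, img_height, epochs, early_stopping_patience, train_dataset, validation_dataset,
# vocab, num_to_char and CTCLayer come from the rest of the code in the video.
train_model = build_train_model(vocab)
train_model.summary()
early_stopping = EarlyStopping(monitor="val_loss", patience=early_stopping_patience, restore_best_weights=True,
                               min_delta=1e-5)
history = train_model.fit(train_dataset, validation_data=validation_dataset, epochs=epochs,
                          callbacks=[early_stopping], verbose=1)
prediction_model = get_prediction_model(train_model)
compile_prediction_model(prediction_model)
prediction_model.summary()
____
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.backend import ctc_decode
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (LSTM, Bidirectional, Conv2D, Dense, Dropout, Input, MaxPooling2D,
                                     Reshape)
from tensorflow.keras.optimizers import Adam

def decode_batch_predictions(pred, num_to_char):
    # Greedy CTC decoding; each sample's input length is the full time axis (pred.shape[1]).
    input_len = tf.ones(pred.shape[0]) * pred.shape[1]
    results = ctc_decode(pred, input_length=input_len, greedy=True)[0][0]
    # Map indices back to characters and strip any OOV token from the decoded string.
    return [tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8").replace(num_to_char.oov_token, "")
            for res in results]

def build_train_model(vocab: list) -> Model:
    # Images are fed transposed as (width, height, 1), so width becomes the time axis.
    input_img = Input(shape=(img_width, img_height, 1), name="image")
    labels = Input(name="label", shape=(None,), dtype="float32")
    x = Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv1")(input_img)
    x = MaxPooling2D((2, 2), name="pool1")(x)
    x = Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv2")(x)
    x = MaxPooling2D((2, 2), name="pool2")(x)
    # Two 2x2 poolings downsample by 4; merge height and channels into one feature axis.
    new_shape = ((img_width // 4), (img_height // 4) * 64)
    x = Reshape(target_shape=new_shape, name="reshape")(x)
    x = Dense(64, activation="relu", name="dense1")(x)
    x = Dropout(.2)(x)
    x = Bidirectional(LSTM(128, return_sequences=True, dropout=.25))(x)
    x = Bidirectional(LSTM(64, return_sequences=True, dropout=.25))(x)
    # One extra output unit for the CTC blank token.
    x = Dense(len(vocab) + 1, activation="softmax", name="out2vec")(x)
    output = CTCLayer(name="ctc_loss")(labels, x)
    # Define the model
    model = Model(inputs=[input_img, labels], outputs=output, name="ocr_model")
    model.compile(Adam())
    return model

def get_prediction_model(train_model: Model) -> Model:
    # Inference model: image in, per-timestep softmax out (no label input, no CTC loss).
    return Model(inputs=train_model.get_layer(name="image").output,
                 outputs=train_model.get_layer(name="out2vec").output)

def compile_prediction_model(prediction_model: Model):
    prediction_model.compile(Adam())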
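For intuition about what `ctc_decode(..., greedy=True)` in `decode_batch_predictions` does, here is a NumPy sketch of greedy (best-path) CTC decoding for a single sample: take the most likely class at each time step, collapse repeats, then drop the blank. It assumes the blank is the last class, which matches the extra unit in `Dense(len(vocab) + 1)`; it illustrates the idea and is not the Keras implementation.

```python
import numpy as np


def greedy_ctc_decode(probs, blank=None):
    """Greedy (best-path) CTC decoding for one sample.

    probs: array of shape (time_steps, num_classes) with per-step class scores.
    blank: index of the CTC blank; defaults to the last class.
    """
    probs = np.asarray(probs)
    if blank is None:
        blank = probs.shape[-1] - 1
    best_path = probs.argmax(axis=-1)
    decoded = []
    previous = None
    for idx in best_path:
        idx = int(idx)
        # Keep a symbol only if it differs from the previous step and is not the blank.
        if idx != previous and idx != blank:
            decoded.append(idx)
        previous = idx
    return decoded
```

The blank between two identical symbols is what allows doubled characters (like "22" in a captcha) to survive the collapse step.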
I'm trying to run this code, but I'm getting an error: InvalidArgumentError: Graph execution error.
Can anyone help with this?
Where is the repository link?
I am not able to find it in the description.
## Preprocessing
# Mapping characters to integers
char_to_num = layers.experimental.preprocessing.StringLookup(
vocabulary=list(characters), mask_token=None
)
# Mapping integers back to original characters
num_to_char = layers.experimental.preprocessing.StringLookup(
vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
Thanks, I thought I was the only one!
Thanks, man
Thanks so much!
Dear Coding Lib! I'm here with the captcha project. It seems like turning shuffle on messes up the shuffling function and produces an incorrect split. I have yet to find a solution and would really appreciate it if you looked into it! If shuffle is off, it works well. Another person pointed out the same bug: the labels end up on the wrong images.
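One way to rule the split itself out is to shuffle a single index permutation and apply it to both the image paths and the labels, so every pair stays together. A NumPy sketch; the name `split_data` and the return order (x_train, x_valid, y_train, y_valid) are assumptions modelled on the Keras captcha OCR example, and the `seed` parameter is added here for reproducibility.

```python
import numpy as np


def split_data(images, labels, train_size=0.9, shuffle=True, seed=None):
    """Split paired image paths and labels using ONE shared permutation."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    indices = np.arange(len(images))
    if shuffle:
        # Shuffle the indices once and index both arrays with them, so pairs stay aligned.
        np.random.default_rng(seed).shuffle(indices)
    n_train = int(len(images) * train_size)
    train_idx, valid_idx = indices[:n_train], indices[n_train:]
    return images[train_idx], images[valid_idx], labels[train_idx], labels[valid_idx]
```

Shuffling `images` and `labels` with two independent shuffles is the classic way to get labels on the wrong images. If pairs still come out mismatched with a split like this, the character lookup (discussed in the StringLookup replies below) is the more likely culprit.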
## Preprocessing
# Mapping characters to integers
char_to_num = layers.experimental.preprocessing.StringLookup(
vocabulary=list(characters), mask_token=None
)
# Mapping integers back to original characters
num_to_char = layers.experimental.preprocessing.StringLookup(
vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
)
@@HassanKhan-ei2wh Thanks, man
@@HassanKhan-ei2wh When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images. So I removed num_oov_indices, and now my training code with early stopping is not working. Any solution for this?
@@syedmuzammilahmed6872 Just add num_oov_indices=0 to num_to_char as well; it helped me.
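Why mismatched `num_oov_indices` puts the wrong text on the images: with `mask_token=None`, `StringLookup` reserves the first `num_oov_indices` indices for the OOV token. If `char_to_num` reserves 0 slots but `num_to_char` keeps the default of 1, every index decodes to the character one position earlier in the vocabulary. The sketch below models that offset with plain dictionaries (a simplified model of the behaviour, not the Keras implementation); the vocabulary is made up for illustration.

```python
def make_forward(vocab, num_oov_indices):
    """char -> index; the first num_oov_indices indices are reserved for OOV."""
    return {ch: i + num_oov_indices for i, ch in enumerate(vocab)}


def make_inverse(vocab, num_oov_indices, oov_token="[UNK]"):
    """index -> char, using the same reservation rule."""
    table = {i: oov_token for i in range(num_oov_indices)}
    table.update({i + num_oov_indices: ch for i, ch in enumerate(vocab)})
    return table


def round_trip(label, vocab, forward_oov, inverse_oov, oov_token="[UNK]"):
    """Encode with one reservation setting and decode with another."""
    forward = make_forward(vocab, forward_oov)
    inverse = make_inverse(vocab, inverse_oov, oov_token)
    return "".join(inverse.get(forward[ch], oov_token) for ch in label)


if __name__ == "__main__":
    vocab = sorted(set("2345bcdn"))  # hypothetical vocabulary
    print(round_trip("bc3", vocab, 0, 0))  # bc3 (consistent)
    print(round_trip("bc3", vocab, 0, 1))  # 5b2 (shifted by one)
```

With both layers using the same setting (both 0, as in the configuration posted earlier in this thread, or both left at the default), labels survive the round trip; mixing them shifts every character.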
Hi Nicolai, when I add the "num_oov_indices" = 0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images in the visualization part, before training and creating the model. So I removed "num_oov_indices", and now my early-stopping training code is not working: the code stops in the very first epoch. Any solution for this?
Hey, thank you very much for this beautiful explanation of the code and of the philosophy behind OCR with an LSTM and a CTC layer.
Can you please verify that the code still works? I was executing it and it was working, but now it doesn't. I think there is a problem in mapping characters to numbers and numbers back to their original characters with layers.experimental.preprocessing.StringLookup. I ran it in Google Colab, but when I tried to visualize the data it didn't show the correct label text.
I would be very thankful if you could verify it and suggest a fix for mapping characters to numbers and numbers back to their original characters.
Thank you very much for watching! The code should not depend on anything and should be working every time, hmm 🤔
Hi, I'm getting the same error. Did you find the solution?
Same here. I used only digits as labels, so I removed "char_to_num" and "num_to_char".
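For digit-only labels a lookup layer isn't strictly needed, because each digit can be its own index. A sketch, assuming labels contain only 0-9, that the output layer then has 11 units (10 digits plus the CTC blank), and that decoded sequences are padded with -1, which is what `keras.backend.ctc_decode` does to my knowledge; `encode_digits` and `decode_digits` are hypothetical helpers.

```python
def encode_digits(label):
    """'0429' -> [0, 4, 2, 9]; raises ValueError for non-digit characters."""
    return [int(ch) for ch in label]


def decode_digits(indices):
    """Turn decoded indices back into a string, skipping -1 padding and the blank (10)."""
    return "".join(str(int(i)) for i in indices if 0 <= int(i) <= 9)
```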
Do you think I can use your code to decode the digits of my water counter?
Maybe you can try EasyOCR for that!
Can I extract text from images with this, by the way? My final project is to extract text from images, but I can't code. I need help, please.
Hey, I am not getting accurate results. I checked your GitHub, and for some reason the labels aren't matching the captchas during testing. What would you recommend?
Fix the code / fix the labels
Can anyone provide me with the code?
Can you provide the library versions you used?
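The versions used in the video aren't stated in this thread, but when reporting a problem it helps to post your own. A small standard-library sketch using `importlib.metadata` (Python 3.8+); the package list is only an example.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", platform.python_version())
    for name, version in package_versions(["tensorflow", "keras", "numpy", "opencv-python"]).items():
        print(name, version or "not installed")
```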
Excuse me bro, I have an issue when I'm running the build_model() function after the CTC loss.
It happens on line 43, at x = layers.Reshape(target_shape = new_shape, name='reshape')(x)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
73
74 # Call the function to build the model
---> 75 model = build_model()
76 model.summary()
in build_model()
41 # floor division gives the quotient of a division, discarding the remainder
42 new_shape = ((img_width // 4), (img_height // 4) * 64)
---> 43 x = layers.Reshape(target_shape = new_shape, name='reshape')(x)
44 x = layers.Dense(64, activation='relu', name='dense1')(x)
45 x = layers.Dropout(0.2)(x)
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
975 if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
976 return self._functional_construction_call(inputs, args, kwargs,
--> 977 input_list)
978
979 # Maintains info about the `Layer.call` stack.
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
1113 # Check input assumptions set after layer building, e.g. input shape.
1114 outputs = self._keras_tensor_symbolic_call(
-> 1115 inputs, input_masks, args, kwargs)
1116
1117 if outputs is None:
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
846 return tf.nest.map_structure(keras_tensor.KerasTensor, output_signature)
847 else:
--> 848 return self._infer_output_signature(inputs, args, kwargs, input_masks)
849
850 def _infer_output_signature(self, inputs, args, kwargs, input_masks):
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
886 self._maybe_build(inputs)
887 inputs = self._maybe_cast_inputs(inputs)
--> 888 outputs = call_fn(inputs, *args, **kwargs)
889
890 self._handle_activity_regularization(inputs, outputs)
/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in call(self, inputs)
537 # Set the static shape for the result since it might lost during array_ops
538 # reshape, eg, some `None` dim in the result could be inferred.
--> 539 result.set_shape(self.compute_output_shape(inputs.shape))
540 return result
541
/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in compute_output_shape(self, input_shape)
528 output_shape = [input_shape[0]]
529 output_shape += self._fix_unknown_dimension(input_shape[1:],
--> 530 self.target_shape)
531 return tf.TensorShape(output_shape)
532
/usr/local/lib/python3.7/dist-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
516 output_shape[unknown] = original // known
517 elif original != known:
--> 518 raise ValueError(msg)
519 return output_shape
520
---------------------------------------------------------------------------
And this is the error message:
ValueError: total size of new array must be unchanged, input_shape = [50, 50, 64], output_shape = [50, 768]
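Reading the numbers in that error: `Reshape` needs the same number of elements before and after, but the tensor reaching it is (50, 50, 64), which is 160,000 values, while new_shape = (50, 768) holds 38,400. Since `(img_height // 4) * 64 = 768`, img_height was about 50 at that line, whereas a (50, 50, 64) conv output implies the Input was built at roughly 200 x 200, so the two parts of build_model() seem to use different sizes (an inference from the numbers only). The pure-Python sketch below reproduces that arithmetic; the helper names are hypothetical. In the Keras code itself, deriving the target from the tensor, e.g. `new_shape = (x.shape[1], x.shape[2] * x.shape[3])` right before the Reshape, avoids this kind of mismatch as long as those dimensions are static.

```python
from math import prod


def conv_output_shape(img_width, img_height, filters=64, num_pools=2):
    """Shape after the Conv2D(padding="same") + MaxPooling2D((2, 2)) stack used in this thread.

    "same" convolutions keep width and height; each 2x2 pooling halves them (floor).
    """
    w, h = img_width, img_height
    for _ in range(num_pools):
        w, h = w // 2, h // 2
    return (w, h, filters)


def reshape_target(conv_shape):
    """Keep width as the sequence axis and merge height and channels into features."""
    w, h, c = conv_shape
    return (w, h * c)


def same_size(shape_a, shape_b):
    """Reshape only works when both shapes hold the same number of elements."""
    return prod(shape_a) == prod(shape_b)


if __name__ == "__main__":
    actual = (50, 50, 64)   # tensor reaching Reshape, from the traceback
    requested = (50, 768)   # new_shape, from the traceback
    print(prod(actual), prod(requested), same_size(actual, requested))  # 160000 38400 False
    print(reshape_target(actual))  # (50, 3200)
```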
Thanks a lot!
Thanks for watching!
Hi, how can I handle captchas that have 6 digits per picture? Currently it's 5 digits in your example. I know I need to change something in the model, but I can't seem to figure it out :( The error I keep getting is: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]
How should I change it, or how do I figure out what I need to change?
Did you find the solution?
Hello, in my case I changed the dataset to images with 4 characters and the model adapted to all 4, which would mean you need a separate model for each label length.
Hey, did you fix it?
@@arslanmushtaq9774 Hey, did you fix it?
Hi Nicolai, thanks for the great explanation. Could you please explain how to measure accuracy?
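The model in this thread is compiled without metrics (only a CTC loss), so one common approach is to decode predictions to strings first and then compare them with the true labels. A sketch of two such measures: whole-captcha (sequence) accuracy, and character error rate based on Levenshtein distance. The helper names are hypothetical and both functions assume non-empty label lists.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def sequence_accuracy(true_labels, pred_labels):
    """Fraction of captchas predicted exactly right."""
    pairs = list(zip(true_labels, pred_labels))
    if not pairs:
        raise ValueError("no labels to compare")
    return sum(t == p for t, p in pairs) / len(pairs)


def character_error_rate(true_labels, pred_labels):
    """Total edit distance divided by the total number of true characters."""
    edits = sum(levenshtein(t, p) for t, p in zip(true_labels, pred_labels))
    total = sum(len(t) for t in true_labels)
    if total == 0:
        raise ValueError("no characters to compare")
    return edits / total
```

Sequence accuracy is the number that matters for solving captchas; character error rate shows whether the model is "almost right" (one character off) or failing completely.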
Can I use this model for license plates?
Hi! I'm trying the code, but I'm getting an error while training: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]. Can anyone help me fix this?
Same, bro.
I was replicating this project with another dataset I made, and got that error because I forgot a letter when labelling a file.
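Following that answer: "[tensor]: [5], [batch]: [6]" means labels of different lengths ended up in one batch, and batching requires every label tensor to have the same shape. A sketch that flags files whose label length differs from the most common one, assuming labels come from file names as in this thread; `find_length_outliers` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePath


def find_length_outliers(image_paths):
    """Return (most common label length, paths whose label length differs from it)."""
    if not image_paths:
        return None, []
    labels = {p: PurePath(p).stem for p in image_paths}
    expected = Counter(len(label) for label in labels.values()).most_common(1)[0][0]
    outliers = sorted(p for p, label in labels.items() if len(label) != expected)
    return expected, outliers
```

Fix or drop the reported files. If your captchas genuinely have 6 characters, the label length (and max_length) follows from the data once every file is labelled consistently, as the 4-character reply above describes.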
Is it suitable for text recognition tasks?
Yes, if you just train it on what you want to recognize.
@@NicolaiAI Thank you I will try.
Hi Nicolai,
I was wondering whether there is a way to feed wider images of text into this kind of network, or to have some kind of dynamic input size?
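One common workaround (not something covered in the video) is to keep the fixed input size and pad every image on the right up to a maximum width, so narrower text images fit the same model. A NumPy sketch, assuming images are (height, width[, channels]) arrays scaled to [0, 1] with a light background, hence the padding value of 1.0; padding would happen before the transpose to (width, height). `pad_to_width` is a hypothetical helper, and images wider than the target would still need resizing.

```python
import numpy as np


def pad_to_width(img, target_width, fill_value=1.0):
    """Right-pad a (height, width) or (height, width, channels) image to target_width."""
    width = img.shape[1]
    if width > target_width:
        raise ValueError(f"image width {width} exceeds target width {target_width}")
    pad = [(0, 0)] * img.ndim
    pad[1] = (0, target_width - width)  # pad only the width axis, on the right
    return np.pad(img, pad, mode="constant", constant_values=fill_value)
```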
Hi, I am a beginner in this field. I've watched your video and implemented this code. It's working fine, but I need to test a single captcha image. How can I do that? I tried, but the prediction was not good. Please help me out if you can. 🥺
Have you found the answer to this?
How do I implement it??? You just read through all the docs.
Hi, 80% of the video is implementation
@@NicolaiAI I have a single captcha and I trained my model. So how can I solve that captcha?
What do you mean by a single captcha? In the video they are also passed through the model one by one.
@@NicolaiAI 😔😔😔 I am a noob with TF.
I wanted to make an API that receives a captcha as base64, solves it, and sends back the captcha response.
@@NicolaiAI I want to input a single CAPTCHA and have the model predict it.
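For the single-image question: the model expects a batch, so one image has to go through the same preprocessing as the training data and get a batch dimension of 1. The sketch mirrors the preprocessing steps of the Keras "OCR model for reading Captchas" example, which the code in this thread closely resembles (decode PNG as grayscale, scale to [0, 1], resize, transpose so width is the time axis). The 200 x 50 size is an assumption and must match whatever img_width/img_height you trained with; `preprocess_single_image` is a hypothetical name.

```python
import tensorflow as tf

IMG_WIDTH = 200   # assumption: must match the training img_width
IMG_HEIGHT = 50   # assumption: must match the training img_height


def preprocess_single_image(png_bytes, img_width=IMG_WIDTH, img_height=IMG_HEIGHT):
    """PNG bytes -> float32 tensor of shape (1, img_width, img_height, 1)."""
    img = tf.io.decode_png(png_bytes, channels=1)          # uint8, (H, W, 1)
    img = tf.image.convert_image_dtype(img, tf.float32)    # scale to [0, 1]
    img = tf.image.resize(img, [img_height, img_width])    # (img_height, img_width, 1)
    img = tf.transpose(img, perm=[1, 0, 2])                # (img_width, img_height, 1)
    return tf.expand_dims(img, axis=0)                     # add the batch dimension
```

Usage, with the names from the code posted earlier in this thread: get the bytes with `tf.io.read_file("captcha.png")` (or `base64.b64decode(captcha_b64)` for the API case), then call `prediction_model.predict(preprocess_single_image(png_bytes))` and pass the result to `decode_batch_predictions(preds, num_to_char)`, taking element `[0]`.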
Sir,
Why can't we use regular object detection to detect the numbers?
How do I crack a captcha with 6 characters (digits and letters)?
Hey, did you find a method?
Does anyone else have the same error?
File "C:\Users\user\PycharmProjects\ocr_gas\ocr.py", line 135, in call *
label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
ValueError: slice index 1 of dimension 0 out of bounds. for '{{node ocr_model_v1/ctc_loss/strided_slice_2}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](ocr_model_v1/ctc_loss/Shape_2, ocr_model_v1/ctc_loss/strided_slice_2/stack, ocr_model_v1/ctc_loss/strided_slice_2/stack_1, ocr_model_v1/ctc_loss/strided_slice_2/stack_2)' with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = , input[2] = , input[3] = .
Call arguments received by layer "ctc_loss" " f"(type CTCLayer):
• y_true=tf.Tensor(shape=(None,), dtype=float32)
• y_pred=tf.Tensor(shape=(None, 50, 12), dtype=float32)
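In that traceback `y_true` reaches the CTC layer with shape `(None,)`, but `tf.shape(y_true)[1]` needs a second (label length) axis, so labels should arrive as a `(batch_size, label_length)` tensor. The `build_train_model` posted in this thread declares `Input(name="label", shape=(None,), dtype="float32")` for that. A label input declared with a different shape, or a pipeline yielding flat label vectors, are plausible causes (an assumption, since the poster's code isn't shown). A NumPy sketch of what a well-formed label batch looks like; `stack_label_batch` is a hypothetical helper.

```python
import numpy as np


def stack_label_batch(encoded_labels):
    """Stack equal-length encoded labels into a (batch_size, label_length) float32 array."""
    if not encoded_labels:
        raise ValueError("empty label batch")
    lengths = sorted({len(label) for label in encoded_labels})
    if len(lengths) != 1:
        raise ValueError(f"labels have different lengths: {lengths}")
    batch = np.asarray(encoded_labels, dtype="float32")
    # Rank 2 is what lets the CTC layer read the label length from shape[1].
    assert batch.ndim == 2, batch.shape
    return batch
```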