Building an OCR Model to Crack Captchas: A Neural Network Tutorial with Keras and TensorFlow

  • Published: 11 Nov 2024
  • Science

Comments • 69

  • @NicolaiAI
    @NicolaiAI  1 year ago +1

    Join My AI Career Program
    www.nicolai-nielsen.com/aicareer
    Enroll in My School and Technical Courses
    www.nicos-school.com

    • @theuser810
      @theuser810 1 year ago +2

      The repository link is not in the description

  • @axelanderson2030
    @axelanderson2030 2 years ago +10

    For anyone who is getting poor results:
    1. The small dataset means that a random split might not generalize well: for example, the training split might contain a much higher percentage of one digit than another.
    2. You can use OpenCV for preprocessing, which can improve performance. Using morphological transformations to remove noise can help immensely.
    3. To avoid overfitting, I found that a Gaussian noise layer can help. It makes the data harder to learn and therefore harder to overfit.
    Hope this helps!
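
    A rough sketch of points 2 and 3 above (not from the video; the kernel size and noise level are guesses that would need tuning for a given captcha set):
    import cv2
    import numpy as np
    from tensorflow.keras import layers

    def denoise(img_gray: np.ndarray) -> np.ndarray:
        # Morphological opening (erosion followed by dilation) removes small speckles.
        kernel = np.ones((2, 2), np.uint8)
        return cv2.morphologyEx(img_gray, cv2.MORPH_OPEN, kernel)

    def add_input_noise(x):
        # GaussianNoise is only active during training, so it regularizes
        # without changing behaviour at inference time.
        return layers.GaussianNoise(0.1)(x)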

    • @kalifardiansyah5863
      @kalifardiansyah5863 1 year ago

      I have a question: how do you avoid mis-detecting characters, especially between two similar characters? For example, the letter Z gets detected as 2, S as 5, I as 1, etc.

    • @axelanderson2030
      @axelanderson2030 1 year ago

      @@kalifardiansyah5863 you may require more training data, or a larger CNN architecture

    • @HarshpreetSingh-jz2lf
      @HarshpreetSingh-jz2lf 1 year ago

      I tried it with 60,000 images and used morphological techniques, but it still isn't accurate; val_loss just doesn't go below 14.

    • @axelanderson2030
      @axelanderson2030 1 year ago

      @@HarshpreetSingh-jz2lf Do you have a class imbalance in the dataset? Is the model built correctly? Is the data preprocessed correctly?
      I can't help you if you don't provide any context except "it no work".
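
      A quick way to check for the class imbalance mentioned above (assuming labels is the list of label strings parsed from the filenames, as elsewhere in these comments):
      from collections import Counter

      char_counts = Counter("".join(labels))
      for char, count in sorted(char_counts.items()):
          print(char, count)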

  • @megistone
    @megistone 2 months ago +1

    I've finally ended up with this working configuration:
    from os import path
    from tensorflow.keras.layers import StringLookup
    # data_dir is the pathlib.Path of the captcha image folder, as in the video.

    images = sorted(map(str, list(data_dir.glob("*.png"))))
    labels = [img.split(path.sep)[-1].split(".png")[0] for img in images]
    vocab = sorted(set("".join(labels)))
    max_length = max(len(label) for label in labels)
    char_to_num = StringLookup(vocabulary=vocab, mask_token=None, num_oov_indices=0, oov_token="[UNK]")
    num_to_char = StringLookup(vocabulary=char_to_num.get_vocabulary(), invert=True, mask_token=None,
                               num_oov_indices=0, oov_token="[UNK]")
    And the rest of the code is like in the video.

    • @mortezarisan3261
      @mortezarisan3261 2 months ago

      Hello, do you have the captcha code for this clip? Please send it to me.

    • @int-64
      @int-64 1 month ago

      Thank you

    • @megistone
      @megistone 1 month ago

      @@mortezarisan3261 if u mean model code, yes:
      import tensorflow as tf
      from tensorflow.keras import Model
      from tensorflow.keras.backend import ctc_decode
      from tensorflow.keras.callbacks import EarlyStopping
      from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Reshape, Dense,
                                           Dropout, Bidirectional, LSTM)
      from tensorflow.keras.optimizers import Adam
      # CTCLayer, img_width, img_height, the datasets and the other constants are as in the video.

      train_model = build_train_model(vocab)
      train_model.summary()
      early_stopping = EarlyStopping(monitor="val_loss", patience=early_stopping_patience,
                                     restore_best_weights=True, min_delta=1e-5)
      history = train_model.fit(train_dataset, validation_data=validation_dataset, epochs=epochs,
                                callbacks=[early_stopping], verbose=1)
      prediction_model = get_prediction_model(train_model)
      compile_prediction_model(prediction_model)
      prediction_model.summary()
      ____
      def decode_batch_predictions(pred, num_to_char):
          results = ctc_decode(pred, tf.ones(pred.shape[0]) * pred.shape[1], "greedy")[0][0][:, :]
          return [tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8").replace(num_to_char.oov_token, "")
                  for res in results]

      def build_train_model(vocab: list) -> Model:
          input_img = Input(shape=(img_width, img_height, 1), name="image")
          labels = Input(name="label", shape=(None,), dtype="float32")
          x = Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv1")(input_img)
          x = MaxPooling2D((2, 2), name="pool1")(x)
          x = Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_normal", padding="same", name="Conv2")(x)
          x = MaxPooling2D((2, 2), name="pool2")(x)
          new_shape = ((img_width // 4), (img_height // 4) * 64)
          x = Reshape(target_shape=new_shape, name="reshape")(x)
          x = Dense(64, activation="relu", name="dense1")(x)
          x = Dropout(0.2)(x)
          x = Bidirectional(LSTM(128, return_sequences=True, dropout=0.25))(x)
          x = Bidirectional(LSTM(64, return_sequences=True, dropout=0.25))(x)
          x = Dense(len(vocab) + 1, activation="softmax", name="out2vec")(x)
          output = CTCLayer(name="ctc_loss")(labels, x)
          # Define the model
          model = Model(inputs=[input_img, labels], outputs=output, name="ocr_model")
          model.compile(Adam())
          return model

      def get_prediction_model(train_model: Model) -> Model:
          return Model(inputs=train_model.get_layer(name="image").output,
                       outputs=train_model.get_layer(name="out2vec").output)

      def compile_prediction_model(prediction_model: Model):
          prediction_model.compile(Adam())
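
      For the questions elsewhere in the comments about testing a single captcha image, a minimal sketch built on the helpers above (img_width, img_height, prediction_model, num_to_char and decode_batch_predictions are assumed to exist as defined in this thread and in the video):
      import tensorflow as tf

      def predict_single(image_path: str) -> str:
          # Preprocess one image the same way as the training pipeline:
          # grayscale, float in [0, 1], resized, transposed so width becomes the time axis.
          img = tf.io.read_file(image_path)
          img = tf.io.decode_png(img, channels=1)
          img = tf.image.convert_image_dtype(img, tf.float32)
          img = tf.image.resize(img, [img_height, img_width])
          img = tf.transpose(img, perm=[1, 0, 2])
          pred = prediction_model.predict(tf.expand_dims(img, axis=0))
          return decode_batch_predictions(pred, num_to_char)[0]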

  • @omkarmestry4117
    @omkarmestry4117 1 year ago +2

    I'm trying to run this code but I'm getting an error like InvalidArgumentError: Graph execution error.
    Can anyone help with this?

  • @adepusairahul7375
    @adepusairahul7375 11 months ago +1

    Where is the repository link?
    I am not able to find it in the description.

  • @hsnhsynglk
    @hsnhsynglk 3 years ago +6

    ## Preprocessing
    # Mapping characters to integers
    char_to_num = layers.experimental.preprocessing.StringLookup(
        vocabulary=list(characters), mask_token=None
    )
    # Mapping integers back to original characters
    num_to_char = layers.experimental.preprocessing.StringLookup(
        vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
    )

  • @GuyJustCool
    @GuyJustCool 3 years ago +4

    Dear Coding Lib! I'm here with the Captcha project! It seems like turning shuffle on messes up the train/validation split. I have yet to find a solution, and would really appreciate it if you looked into it! If shuffle is off, it works well. Another person pointed the bug out too: labels end up on the wrong images.

    • @HassanKhan-ei2wh
      @HassanKhan-ei2wh 1 year ago +1

      ## Preprocessing
      # Mapping characters to integers
      char_to_num = layers.experimental.preprocessing.StringLookup(
          vocabulary=list(characters), mask_token=None
      )
      # Mapping integers back to original characters
      num_to_char = layers.experimental.preprocessing.StringLookup(
          vocabulary=char_to_num.get_vocabulary(), mask_token=None, invert=True
      )

    • @syedmuzammilahmed6872
      @syedmuzammilahmed6872 1 year ago

      @@HassanKhan-ei2wh Thanks Man

    • @syedmuzammilahmed6872
      @syedmuzammilahmed6872 1 year ago

      @@HassanKhan-ei2wh When I add the num_oov_indices=0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images. So I removed num_oov_indices, and now my model training code with early stopping is not working. Any solution for this?

    • @megistone
      @megistone 2 months ago

      @@syedmuzammilahmed6872 Just add num_oov_indices=0 to num_to_char as well; that helped me.
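
      Two things in this thread seem to cause the "labels on wrong images" symptom. First, if images and labels are shuffled independently before the train/validation split, the pairing is destroyed; shuffling a single index permutation and applying it to both arrays avoids that. A sketch (function and argument names are illustrative):
      import numpy as np

      def split_data(images, labels, train_fraction=0.9, shuffle=True, seed=42):
          # Shuffle ONE permutation of indices and apply it to both arrays,
          # so image i always keeps label i.
          images, labels = np.array(images), np.array(labels)
          indices = np.arange(len(images))
          if shuffle:
              np.random.default_rng(seed).shuffle(indices)
          cut = int(len(images) * train_fraction)
          return (images[indices[:cut]], images[indices[cut:]],
                  labels[indices[:cut]], labels[indices[cut:]])
      Second, char_to_num and num_to_char have to be built with matching settings. With the default num_oov_indices=1, index 0 is reserved for [UNK]; if char_to_num uses num_oov_indices=0 but num_to_char is left at the default, every decoded index comes back one position off (index 0 decodes to [UNK]), which can also show up as wrong label text under the images. Setting num_oov_indices=0 on both lookups, as in the working configuration further up, keeps them consistent.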

  • @syedmuzammilahmed6872
    @syedmuzammilahmed6872 1 year ago

    Hi Nicolai, when I add the "num_oov_indices"=0 parameter to the StringLookup code, the model training code works, but it puts labels on the wrong images in the visualization part, before training and creating the model. So I removed "num_oov_indices", and now my model training code with early stopping is not working; it stops in the very first epoch. Any solution for this?

  • @souhailel-ghayam4714
    @souhailel-ghayam4714 3 years ago +8

    Hey, thank you very much for this beautiful explanation of the code and the philosophy behind OCR with an LSTM and a CTC layer.
    Can you please check whether the code still works? I was executing it and it was working, but now it doesn't. I think there is a problem in mapping characters to numbers and mapping numbers back to their original characters with layers.experimental.preprocessing.StringLookup. I tried to run it in Google Colab, but when I try to visualize the data it doesn't show the correct label text.
    I would be very thankful if you could verify it and suggest a fix for the character-to-number and number-to-character mapping.

    • @NicolaiAI
      @NicolaiAI  3 years ago

      Thank you very much for watching! The code should not depend on anything and should work every time, hmm 🤔

    • @nadyasudusinghe2213
      @nadyasudusinghe2213 2 years ago +1

      Hi, I'm getting the same error. Did you find the solution?

    • @traderdaniel4749
      @traderdaniel4749 2 years ago +1

      Same here. I used only digits as labels, so I removed "char_to_num" and "num_to_char".

  • @benoitd94
    @benoitd94 1 year ago +2

    Do you think I can use your code to read the digits on my water meter?

    • @NicolaiAI
      @NicolaiAI  1 year ago +2

      Maybe u can try EasyOCR for that!

  • @şulemeşe-z7w
    @şulemeşe-z7w 10 months ago

    Can I extract text from images with this, by the way? My final project is extracting text from images, but I can't get the coding right. I need help, please.

  • @prathamshah5521
    @prathamshah5521 9 months ago

    Hey, I am not getting accurate results. I checked your GitHub, and for some reason the labels aren't matching the captchas during testing. What would you recommend doing?

    • @LucasDM4
      @LucasDM4 4 months ago

      Fix the code / fix the labels

  • @alokthakur3298
    @alokthakur3298 10 months ago +1

    Can anyone provide me with the code?

  • @int-64
    @int-64 2 years ago +1

    Can you provide the library versions you used?

  • @hendrywijaya1017
    @hendrywijaya1017 3 years ago

    Excuse me bro, I have an issue when I'm running the build_model() function after the CTC loss.
    It happens on line 43, at x = layers.Reshape(target_shape=new_shape, name='reshape')(x):
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    in ()
         73
         74 # Call the function to build the model
    ---> 75 model = build_model()
         76 model.summary()
    in build_model()
         41 # floor division keeps only the integer part of the division
         42 new_shape = ((img_width // 4), (img_height // 4) * 64)
    ---> 43 x = layers.Reshape(target_shape = new_shape, name='reshape')(x)
         44 x = layers.Dense(64, activation='relu', name='dense1')(x)
         45 x = layers.Dropout(0.2)(x)
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
        975   if _in_functional_construction_mode(self, inputs, args, kwargs, input_list):
        976     return self._functional_construction_call(inputs, args, kwargs,
    --> 977                                               input_list)
        978
        979   # Maintains info about the `Layer.call` stack.
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _functional_construction_call(self, inputs, args, kwargs, input_list)
       1113   # Check input assumptions set after layer building, e.g. input shape.
       1114   outputs = self._keras_tensor_symbolic_call(
    -> 1115       inputs, input_masks, args, kwargs)
       1116
       1117   if outputs is None:
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _keras_tensor_symbolic_call(self, inputs, input_masks, args, kwargs)
        846   return tf.nest.map_structure(keras_tensor.KerasTensor, output_signature)
        847 else:
    --> 848   return self._infer_output_signature(inputs, args, kwargs, input_masks)
        849
        850 def _infer_output_signature(self, inputs, args, kwargs, input_masks):
    /usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in _infer_output_signature(self, inputs, args, kwargs, input_masks)
        886   self._maybe_build(inputs)
        887   inputs = self._maybe_cast_inputs(inputs)
    --> 888   outputs = call_fn(inputs, *args, **kwargs)
        889
        890   self._handle_activity_regularization(inputs, outputs)
    /usr/local/lib/python3.7/dist-packages/keras/layers/core.py in call(self, inputs)
        537   # Set the static shape for the result since it might be lost during array_ops
        538   # reshape, e.g. some `None` dim in the result could be inferred.
    --> 539   result.set_shape(self.compute_output_shape(inputs.shape))
        540   return result
        541
    /usr/local/lib/python3.7/dist-packages/keras/layers/core.py in compute_output_shape(self, input_shape)
        528   output_shape = [input_shape[0]]
        529   output_shape += self._fix_unknown_dimension(input_shape[1:],
    --> 530                                               self.target_shape)
        531   return tf.TensorShape(output_shape)
        532
    /usr/local/lib/python3.7/dist-packages/keras/layers/core.py in _fix_unknown_dimension(self, input_shape, output_shape)
        516   output_shape[unknown] = original // known
        517 elif original != known:
    --> 518   raise ValueError(msg)
        519 return output_shape
        520
    ---------------------------------------------------------------------------
    And this is the error message:
    ValueError: total size of new array must be unchanged, input_shape = [50, 50, 64], output_shape = [50, 768]
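
    That ValueError means the Reshape target does not hold the same number of values as the pooled feature map: 50 * 50 * 64 = 160000, while 50 * 768 = 38400. In other words, the img_width/img_height constants used to build new_shape do not match the size of the images the Input layer actually receives (or width and height got swapped in the resize/transpose step). A quick check, assuming the tf.data pipeline yields {"image": ..., "label": ...} batches as in the video:
    for batch in train_dataset.take(1):
        print("batch image shape:", batch["image"].shape)  # expected (batch, img_width, img_height, 1)
        print("constants:", img_width, img_height)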

  • @EnsignerTV
    @EnsignerTV 3 years ago +4

    Thanks a lot!

    • @NicolaiAI
      @NicolaiAI  3 years ago

      Thanks for watching!

  • @tricialamjingyi
    @tricialamjingyi 2 years ago +3

    Hi, how can I adapt this for captchas that have 6 digits per picture? Currently it's 5 digits in your example. I know I need to change something in the model, but I can't seem to figure it out. :( The error I keep getting is: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6].
    How should I change it, or how do I work out what I need to change?

    • @arslanmushtaq9774
      @arslanmushtaq9774 2 years ago +1

      Did you find the solution?

    • @kentoky6568
      @kentoky6568 2 years ago

      Hello, in my case I tried changing the dataset to images with 4 characters and it adapted to all 4; that would mean you should make a model for each different length.

    • @aryangupta2051
      @aryangupta2051 1 year ago +1

      Hey, did you fix it?

    • @aryangupta2051
      @aryangupta2051 1 year ago

      @@arslanmushtaq9774 Hey, did you fix it?
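
      The "Cannot add tensor to the batch" error in this thread comes from the tf.data pipeline rather than the model: labels are batched as dense tensors, so every label in a batch must encode to the same length, and here a 5-character label is being batched together with 6-character ones. A quick audit, assuming labels is the list of label strings parsed from the filenames:
      from collections import Counter

      length_counts = Counter(len(label) for label in labels)
      print(length_counts)  # e.g. Counter({6: 1039, 5: 1}) points at one mislabelled file

      expected = max(length_counts, key=length_counts.get)
      print([lab for lab in labels if len(lab) != expected])  # the offending labels
      If the dataset genuinely mixes label lengths, the labels would have to be padded (for example with Dataset.padded_batch) and the CTC label lengths computed accordingly; for a single captcha style, the simplest fix is to make all labels the same length.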

  • @ehsanroshan7068
    @ehsanroshan7068 2 years ago

    Hi Nicolai, thanks for the great explanation. Could you please explain how to measure accuracy?
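
    One way to measure accuracy for this kind of model (a sketch, not from the video) is exact-match sequence accuracy over the validation set, reusing the prediction_model, num_to_char and decode_batch_predictions helpers shown in the comments above:
    import tensorflow as tf

    correct, total = 0, 0
    for batch in validation_dataset:
        preds = prediction_model.predict(batch["image"])
        pred_texts = decode_batch_predictions(preds, num_to_char)
        true_texts = [tf.strings.reduce_join(num_to_char(label)).numpy().decode("utf-8")
                      for label in batch["label"]]
        correct += sum(p == t for p, t in zip(pred_texts, true_texts))
        total += len(true_texts)
    print("sequence accuracy:", correct / total)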

  • @coconutnut21
    @coconutnut21 1 year ago

    Can I use this model for license plates?

  • @Konnits
    @Konnits 1 year ago +1

    Hi! I'm trying the code but I'm getting an error while training: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]: [5], [batch]: [6]. Can anyone help me fix this?

    • @arpittalmale6468
      @arpittalmale6468 1 year ago

      same bro

    • @lanasillomaster7034
      @lanasillomaster7034 11 months ago

      I was replicating this project with another dataset I made and got that error because I forgot a letter when labelling one file.

  • @chelvanchelvam4332
    @chelvanchelvam4332 3 years ago +2

    Is it suitable for a text recognition task?

    • @NicolaiAI
      @NicolaiAI  3 years ago +2

      Yes if u just train it on what u want to recognize

    • @chelvanchelvam4332
      @chelvanchelvam4332 3 years ago +1

      @@NicolaiAI Thank you I will try.

  • @alexmoruz1993
    @alexmoruz1993 2 years ago

    Hi Nicolai,
    I was wondering, is there a way to feed wider images with text into this kind of network, or to have a dynamically sized input?

  • @abhisekseal8044
    @abhisekseal8044 2 years ago +1

    Hi, I am a beginner in this field. I've watched your video and implemented this code. It's working fine, but I need to test a single captcha image; how can I do that? I tried, but the prediction was not good. Please help me out if you can. 🥺

    • @GODS_CODM
      @GODS_CODM 2 years ago

      Have you found the answer to this?

  • @bbtvines
    @bbtvines 3 years ago +1

    How do I implement it??? You just read all the docs

    • @NicolaiAI
      @NicolaiAI  3 years ago +3

      Hi, 80% of the video is implementation

    • @creatur
      @creatur 3 years ago

      @@NicolaiAI I have a single captcha and I trained my model. So how can I solve that captcha?

    • @NicolaiAI
      @NicolaiAI  3 years ago

      What do u mean by single captcha? In the video they are passed through the model one by one too

    • @creatur
      @creatur 3 years ago

      @@NicolaiAI 😔😔😔 I am a noob with tf.
      I wanted to make an API which gets a captcha as base64, solves it, and sends back the captcha response

    • @GODS_CODM
      @GODS_CODM 2 years ago +1

      @@NicolaiAI I want to input a single captcha and have the model predict it

  • @Cordic45
    @Cordic45 2 years ago

    Sir,
    why can't we use regular object detection to detect the numbers?

  • @ZainAbdin-e7s
    @ZainAbdin-e7s 1 year ago

    How do I crack a captcha with 6 digits and characters?

  • @traderdaniel4749
    @traderdaniel4749 2 years ago

    Does anyone else have the same error?
    File "C:\Users\user\PycharmProjects\ocr_gas\ocr.py", line 135, in call *
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")

    ValueError: slice index 1 of dimension 0 out of bounds. for '{{node ocr_model_v1/ctc_loss/strided_slice_2}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](ocr_model_v1/ctc_loss/Shape_2, ocr_model_v1/ctc_loss/strided_slice_2/stack, ocr_model_v1/ctc_loss/strided_slice_2/stack_1, ocr_model_v1/ctc_loss/strided_slice_2/stack_2)' with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = , input[2] = , input[3] = .


    Call arguments received by layer "ctc_loss" (type CTCLayer):
    • y_true=tf.Tensor(shape=(None,), dtype=float32)
    • y_pred=tf.Tensor(shape=(None, 50, 12), dtype=float32)
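
    That traceback says y_true arriving at the CTC layer has shape (None,): each label is a single scalar rather than a sequence of character indices, so tf.shape(y_true)[1] does not exist. That is consistent with dropping char_to_num for digit-only labels (mentioned further up): even then, each label string has to be encoded as a vector of per-character indices. A hedged sketch of that encoding step (newer TF; older versions expose StringLookup under layers.experimental.preprocessing):
    import tensorflow as tf
    from tensorflow.keras.layers import StringLookup

    # Keep a per-character lookup even for digit-only captchas, so that a label
    # like "30254" becomes a length-5 vector of indices instead of one scalar.
    char_to_num = StringLookup(vocabulary=list("0123456789"), mask_token=None)

    def encode_label(label_str: str) -> tf.Tensor:
        chars = tf.strings.unicode_split(label_str, input_encoding="UTF-8")
        return char_to_num(chars)  # shape: (label_length,)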