Build a Custom OCR Model in TensorFlow: A Step-by-Step Tutorial

  • Published: Jan 8, 2023
  • In this tutorial, we will explore how to recognize text from images using TensorFlow and the CTC loss function in a neural network model. We will start with an introduction to text recognition and the different approaches used to extract text from images. We will then dive into the specifics of using TensorFlow and the CTC loss function to build our custom OCR system. The tutorial will also introduce a new open-source library called MLTU (Machine Learning Training Utilities) that can be used to store code for future projects. By the end of this tutorial, you will have a working OCR model that you can use to recognize text from images. This is the first part of a tutorial series, so stay tuned for more in-depth content on text recognition and other machine-learning topics.
    Text Version Tutorial: pylessons.com/ctc-text-recogn...
    GitHub: github.com/pythonlessons/mltu...
    pypi: pypi.org/project/mltu/
    #machinelearning #python #tensorflow #opencv #ocr
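
A model trained with CTC loss emits one prediction per time step, so its raw output has to be collapsed into a string. Below is a minimal sketch of greedy CTC decoding, assuming the per-frame argmax indices are already available and that the blank token is the last index, `len(vocab)`. That blank position is an assumption for illustration; the video's actual decoder is not shown here. The function name `ctc_greedy_decode` is hypothetical.

```python
from itertools import groupby


def ctc_greedy_decode(frame_indices, vocab):
    """Greedy CTC decoding: merge repeated indices, then drop blanks.

    Assumes the blank token is index len(vocab) (an assumption, not
    confirmed by the tutorial).
    """
    blank = len(vocab)
    # Step 1: collapse consecutive duplicates (a a blank a -> a blank a).
    collapsed = [index for index, _ in groupby(frame_indices)]
    # Step 2: remove blanks and map the remaining indices to characters.
    return "".join(vocab[index] for index in collapsed if index != blank)


vocab = "abc"
# Per-frame argmax: a, a, blank, a, b, b, blank, c
frames = [0, 0, 3, 0, 1, 1, 3, 2]
print(ctc_greedy_decode(frames, vocab))  # aabc
```

The blank between the two `a` runs is what lets the decoder keep a genuine double letter instead of merging it.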

Comments • 93

  • @codewithme6974
    @codewithme6974 1 year ago +3

    Hi, thanks for all your videos. They helped me a lot.
    You deserve more subs and views on your channel.
    Thanks again!

  • @AGASTRONICS
    @AGASTRONICS 1 year ago +4

    Cool, have been waiting for this.

  • @Rkoleerock
    @Rkoleerock 1 year ago +2

    You are an angel, bro! Thank you!

  • @maggiezhang145
    @maggiezhang145 1 year ago +2

    Thanks for the tutorial. Curious if there is a good tutorial on Text Detection you recommend?

  • @TeslaTube
    @TeslaTube 1 year ago +2

    This was a great video! I’m trying to get an OCR model to work with Hebrew handwriting. What are my best options for gathering a training set?

    • @PyLessons
      @PyLessons 1 year ago +1

      Thanks! Try searching for open-source datasets; otherwise you'll need to make one yourself.

  • @winterx9969
    @winterx9969 7 months ago +1

    Where can I find the dataset and the image folder?

  • @hamzaomari7052
    @hamzaomari7052 5 months ago

    Which architecture is this model based on?
    Can you point me to a way to research the state of the art in OCR, especially for digital character recognition?

  • @pranay6177
    @pranay6177 7 months ago

    Where can I get your complete notebook for reference?

  • @alanferrari7843
    @alanferrari7843 1 year ago +3

    Hi, hello. I want to say that this tutorial is amazing!! Thank you very much.

  • @user-pt9vq9dv1g
    @user-pt9vq9dv1g 1 year ago +5

    Hi, I have a question. You trained your model with annotation_train.txt and annotation_test.txt. I am curious what you wrote in those files, because I am also trying to create my own custom model. Thanks in advance for your response.

    • @constantlearner3755
      @constantlearner3755 3 months ago

      It contains the image path/name, a tab separator, and the label (what's written in the image). In general, the file used in the video could either be the same or contain an extra column for the bounding box (BB).
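
For readers building their own annotation file, here is a minimal sketch of reading the layout described in the reply above: one sample per line, image path and label separated by a tab. This layout is taken from the reply only; the files used in the video may be structured differently. The function name `parse_annotations` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def parse_annotations(text, image_root="."):
    """Parse 'image_path<TAB>label' lines into (path, label) pairs.

    The tab-separated layout is an assumption based on the comment
    thread, not a confirmed description of the tutorial's files.
    """
    samples = []
    for line in text.splitlines():
        # Skip blank lines and lines that do not have a tab separator.
        if "\t" not in line:
            continue
        image_path, label = line.split("\t", 1)
        full_path = PurePosixPath(image_root) / image_path.strip()
        samples.append((full_path.as_posix(), label.strip()))
    return samples


# Hypothetical file content with two samples.
example = "images/0001.png\thello\nimages/0002.png\tworld\n"
dataset = parse_annotations(example, image_root="data")
print(dataset)
```

Each resulting pair can then be fed to whatever data provider the training script expects.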

  • @sharaths2397
    @sharaths2397 11 months ago

    @Python lessons, could you please clarify where I should upload my custom data? Also, the models section has many files in it. How do I handle that?

  • @primalvision4029
    @primalvision4029 1 year ago +1

    Hi, thank you for posting OCR videos. Can you please tell me what the inference speed of the model is when using a CPU?

    • @PyLessons
      @PyLessons 1 year ago +1

      It depends on the CPU, but it's pretty quick. I haven't measured it.

  • @hovat
    @hovat 1 year ago

    On an M1 Max GPU, training has been running for 3+ hours and is still on the first epoch. I think maybe I've done something wrong, but I don't want to end the script at this point. I added many custom examples of alphanumeric sequences. I feel like the M1 Max should be able to handle a batch size of 1024. Do you think a lower batch size is needed?

    • @PyLessons
      @PyLessons 1 year ago

      Even the latest GPUs take time, so I think that's normal.

  • @jongameshow
    @jongameshow 1 year ago +1

    First of all, well done on this video. It is very interesting. I just wanted to ask you a question. Is it possible to use this on a video instead of images? I am trying to train a model that reads number plates. However, number plates vary from one country to another, and I am trying to train a model on my country's number plate format.

    • @PyLessons
      @PyLessons 1 year ago

      Yes, you can! You would need to iterate over each frame of the video, the same as with images. The difference is that you would need some kind of plate detector and would run this recognition on the plate crop.

    • @oroneki
      @oroneki 1 year ago

      @jonnyseyc14 I am trying to solve the same problem here... Beginning to research... any advice?

    • @avintimilsina
      @avintimilsina 6 months ago

      @jongameshow I was also working on the same thing. Did you manage to make this model work for your license plate detection? If yes, can you give me some reference on that?
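
The per-frame approach from the author's reply in this thread (iterate over frames, detect the plate, recognize the crop) can be sketched as follows. This is only an outline under stated assumptions: `detect_plate` and `recognize_text` are hypothetical callables standing in for a real plate detector and the trained OCR model, and frames are plain lists of rows so the sketch runs without extra libraries. With OpenCV, frames would typically come from `cv2.VideoCapture` as NumPy arrays instead.

```python
def recognize_plates(frames, detect_plate, recognize_text):
    """Run detection and then recognition on every frame.

    frames: iterable of images, each a list of pixel rows.
    detect_plate: hypothetical callable returning (x, y, w, h) or None.
    recognize_text: hypothetical callable mapping a crop to a string.
    """
    results = []
    for frame_index, frame in enumerate(frames):
        box = detect_plate(frame)
        if box is None:
            continue  # no plate found in this frame
        x, y, w, h = box
        # Crop the detected region before handing it to the recognizer.
        crop = [row[x:x + w] for row in frame[y:y + h]]
        results.append((frame_index, recognize_text(crop)))
    return results


# Toy stand-ins: "pixels" are characters, and the plate is wherever "A" appears.
def toy_detector(frame):
    return (2, 0, 2, 2) if "A" in frame[0] else None


def toy_recognizer(crop):
    return "".join("".join(row) for row in crop)


video = [
    [list("..AB.."), list("..CD..")],  # frame with a "plate"
    [list("......"), list("......")],  # frame without one
]
print(recognize_plates(video, toy_detector, toy_recognizer))  # [(0, 'ABCD')]
```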

  • @Anas-nw6mf
    @Anas-nw6mf 9 months ago +2

    Hi sir, just a basic question: what prior knowledge do I need to fully understand this tutorial and the other ones?

    • @PyLessons
      @PyLessons 9 months ago

      You should be familiar with programming, Python, and TensorFlow.

  • @nickmoreno1332
    @nickmoreno1332 9 months ago +2

    Hello, I have a quick question. I’m using a custom dataset. What’s the ideal dataset size? My CER stays at 1.00. Is it because my dataset is too small?

    • @PyLessons
      @PyLessons 8 months ago +1

      There is no such thing as an ideal size. It depends on your dataset's quality and complexity. If the CER stays at 1.00, try expanding the dataset or improving the model architecture.

  • @CetDocteur
    @CetDocteur 24 days ago +1

    Great tutorial, bad accent, but it still helps me a lot. I love it!

    • @PyLessons
      @PyLessons 16 days ago

      Glad to hear that! Hope to fix the accent sometime in the future :D

  • @alexrobles8883
    @alexrobles8883 11 months ago

    What can I do if I'm getting the following error in training?: Failed to find data adapter that can handle input: ,

    • @PyLessons
      @PyLessons 11 months ago

      First, open an issue on GitHub. Second, check that you are doing everything correctly with the data, because this error usually appears when you use a wrong path or your data is None.

  • @astronaut1861
    @astronaut1861 1 year ago +1

    Hi, thanks for the video. While watching it, I saw that the WER is 1.000. What does that mean? Why doesn't the WER go down?

    • @PyLessons
      @PyLessons 1 year ago

      1.00 means that no word is predicted correctly, but it does go down during training.
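
To make the CER and WER values discussed in these threads concrete, here is a minimal sketch that computes both from a Levenshtein edit distance. It is a generic definition of the two metrics, not a copy of the metric implementation used in the video, which may differ in details such as normalization. The function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or word lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, prediction):
    """Character error rate: character edits per reference character."""
    return edit_distance(reference, prediction) / max(len(reference), 1)


def wer(reference, prediction):
    """Word error rate: word edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / max(len(ref_words), 1)


print(cer("hello", "hallo"))  # 0.2
print(wer("hello", "hallo"))  # 1.0
print(wer("hello", "hello"))  # 0.0
```

For single-word images, WER stays at 1.0 until the whole word is right, while CER already drops as individual characters become correct. A CER stuck at 1.0 therefore points to a more basic problem than a WER stuck at 1.0.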

  • @mayursgowda934
    @mayursgowda934 1 year ago

    Hello sir, can you please tell me how to do the annotations for a custom dataset?

    • @PyLessons
      @PyLessons 1 year ago

      Hey, crop a piece of text, give it a label -> repeat :)

  • @alanferrari7843
    @alanferrari7843 1 year ago

    Sorry for disturbing you :'( . I'm trying to reproduce this using sentences (between 1 and 6 words, for example) instead of a single word, and I don't know how to do it.
    My idea is for the algorithm to predict sentences instead of only a word.
    I don't know how to handle some of the config variables. For example:
    Is it correct to put words instead of characters in config.vocab?
    For configs.max_text_length, can it be the number of words in a sentence instead of the length of a word? (For example, my longest sentence has 64 characters including spaces, and this same sentence has 15 words.)
    You said that I can initialize "CWERMetrics" with (padding_token = "padding token"). In my specific case, should the "padding token" be my configs.max_text_length, or should it be len(config.vocab)?
    Thank you, and so so so sorry, really.

    • @PyLessons
      @PyLessons 1 year ago

      I have another tutorial for sentences, check it out
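
As a rough illustration of the variables asked about in this thread, the sketch below encodes a sentence at the character level. It assumes that the vocabulary stays a set of characters (with the space added), that the maximum length is counted in characters, and that the padding index is `len(vocab)`. These are assumptions for illustration and are not confirmed by the author's reply; the sentence tutorial he mentions is the authoritative source. The function name `encode_label` is hypothetical.

```python
def encode_label(text, vocab, max_text_length):
    """Map characters to vocab indices and pad to a fixed length.

    Assumes the padding token is len(vocab), i.e. one past the last
    real character index (an assumption, not confirmed by the source).
    """
    padding_token = len(vocab)
    # Characters missing from the vocab are skipped in this sketch.
    indices = [vocab.index(char) for char in text if char in vocab]
    indices = indices[:max_text_length]
    return indices + [padding_token] * (max_text_length - len(indices))


# Character-level vocab that includes the space, so sentences can be encoded.
vocab = "abcdefghijklmnopqrstuvwxyz "
print(encode_label("a cab", vocab, max_text_length=8))
# [0, 26, 2, 0, 1, 27, 27, 27]
```

Under these assumptions, the longest sentence in the question would need a maximum length of 64 (characters), not 15 (words).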

  • @benoitd94
    @benoitd94 11 months ago

    Do you think I can use your code to read the digits of my water meter?

    • @PyLessons
      @PyLessons 11 months ago

      Yes, you can ;)

  • @astronaut1861
    @astronaut1861 1 year ago

    Hi, thanks for the great video!! I got the error "Failed to find data adapter that can handle input: ," in train_data_provider. Are there any parameters that I have to pass?

    • @PyLessons
      @PyLessons 1 year ago +1

      Could you raise an issue on GitHub with more details: what you are doing, which mltu version you use, and which Python version you use? One sentence doesn't give enough detail :)

    • @astronaut1861
      @astronaut1861 1 year ago +1

      @PyLessons Yep, actually I just solved that problem, and now I have another one: when I try to load the model, it says "Unknown loss function: CTCloss. Please ensure this object is passed to the `custom_objects` argument." Could you please show me how to load the model? My mltu version is 0.1.3 and TensorFlow is 2.1.0!

    • @PyLessons
      @PyLessons 1 year ago

      @astronaut1861 Try keras.models.load_model(path, compile=False)

    • @astronaut1861
      @astronaut1861 1 year ago

      @PyLessons Thank you, bro!!

    • @asmabenbrahem5658
      @asmabenbrahem5658 11 months ago

      Hello, how did you solve it, please? @astronaut1861

  • @midhauxgaming5719
    @midhauxgaming5719 4 months ago +1

    Hi, can I ask for your dataset? I really want to try training the model again. Thank you!

    • @PyLessons
      @PyLessons 2 months ago

      You may not be able to access the dataset website from your location. Try using a VPN to access the dataset.

  • @vkrts9176
    @vkrts9176 1 year ago

    Hi
    Can this project detect "Handwritten Text" in an image?

    • @PyLessons
      @PyLessons 1 year ago +1

      Wait for my next tutorial.

    • @vkrts9176
      @vkrts9176 1 year ago

      @PyLessons Thank you, dear...

  • @bouchrasaidi1174
    @bouchrasaidi1174 4 months ago +1

    Is there any tutorial you recommend for text detection, please?

    • @PyLessons
      @PyLessons 4 months ago

      There are plenty of tutorials, but soon I'll create a tutorial on how to train YOLO detection with the mltu package.

  • @nareshmalviya3100
    @nareshmalviya3100 4 months ago

    When I run the prediction code, I get this error:
    AttributeError: "ImageToWordModel" object has no attribute 'input_shape'

    • @PyLessons
      @PyLessons 4 months ago

      I need to know which mltu version you are using. If you are using the latest version, the attribute changed from input_shape to input_shapes (a list of shapes).
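
A version-tolerant way to handle the attribute rename mentioned in the reply above is to check for the newer name first and fall back to the older one. The sketch below uses only the two attribute names given in the reply (`input_shape` and `input_shapes`); which mltu release introduced the change is not stated in the thread, and the helper `get_input_shape` and the dummy classes are hypothetical.

```python
def get_input_shape(model):
    """Return the first input shape under either attribute name.

    Newer models are assumed to expose `input_shapes` (a list of
    shapes), older ones `input_shape`, as described in the thread.
    """
    shapes = getattr(model, "input_shapes", None)
    if shapes:
        return shapes[0]
    return model.input_shape


# Dummy stand-ins for models from an older and a newer library version.
class OldStyleModel:
    input_shape = (1, 32, 128, 3)


class NewStyleModel:
    input_shapes = [(1, 32, 128, 3)]


print(get_input_shape(OldStyleModel()))
print(get_input_shape(NewStyleModel()))
```

Inside a prediction class, the same helper could replace direct reads of `self.input_shape`, so the code runs on both versions.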

  • @philipplange4474
    @philipplange4474 1 year ago

    Could you make a video on how to build a Keras R-CNN (OCR) with XML files (from label imager) as annotations? I.e., recognize text from images, where the training data also comes from images and XML files?

    • @PyLessons
      @PyLessons 1 year ago

      That's Python basics and doesn't need a video. You should be able to handle it yourself.

  • @zainulabdin8822
    @zainulabdin8822 1 year ago

    Hi! First of all, when I create the model, the ONNX file is not generated. And while training the model, do we need to name each captcha image with the characters it contains?

    • @PyLessons
      @PyLessons 1 year ago

      Python version? TensorFlow version? It's up to you how you preprocess the data; there is no fixed standard.

  • @arfazkhankhan74
    @arfazkhankhan74 8 months ago

    I really need help

  • @LordWildbeast
    @LordWildbeast 1 year ago

    Does this work with the easyocr library?

    • @PyLessons
      @PyLessons 1 year ago

      Not sure, I haven't tried it.

  • @mugumemalte8667
    @mugumemalte8667 1 year ago

    Hello sir, can I use this to extract characters from images?

    • @PyLessons
      @PyLessons 1 year ago

      Hi, yes, of course

    • @mugumemalte8667
      @mugumemalte8667 1 year ago

      @PyLessons Thanks, sir, but pip install mltu==0.1.3 doesn't seem to work with the latest Python, 3.11.

    • @PyLessons
      @PyLessons 1 year ago

      @mugumemalte8667 Thanks, I'll check.

  • @roshinik4967
    @roshinik4967 1 year ago

    The dataset is too large to handle on normal systems. Can we use some other dataset? Please suggest one.

    • @PyLessons
      @PyLessons 1 year ago

      What do you mean by normal systems? You can always decrease the batch size if it doesn't fit on your GPU.

    • @roshinik4967
      @roshinik4967 1 year ago

      I mean systems without a GPU.

    • @PyLessons
      @PyLessons 1 year ago

      @roshinik4967 Without a GPU you can't train such models, and there is no other option apart from Google Colab or renting GPU compute.

    • @roshinik4967
      @roshinik4967 1 year ago

      Even Colab Pro can't handle it; we have already tried.

    • @roshinik4967
      @roshinik4967 1 year ago

      That's why I'm asking whether there is some other dataset that works well with this code.

  • @davidhrgl
    @davidhrgl 1 year ago

    I tried to load your .h5 model:
    from tensorflow import keras
    model = keras.models.load_model('model.h5')
    but I get the following error:
    "bad marshal data (unknown type code)"
    Am I missing some function when loading the model? I am using TensorFlow 2.11.0.

    • @PyLessons
      @PyLessons 1 year ago

      Hey,
      I just tried it with TensorFlow 2.11, everything was fine:
      model = keras.models.load_model("Models/1_image_to_word/202212012033/model.h5", compile=False)

    • @davidhrgl
      @davidhrgl 1 year ago

      @PyLessons UserWarning: model is not loaded, but a Lambda layer uses it. It may cause errors.
      config, custom_objects, "function", "module", "function_type". It doesn't work :(

  • @d_cobra
    @d_cobra 1 year ago

    Can you share the dataset?

    • @PyLessons
      @PyLessons 1 year ago

      The link to the dataset is in the text version of the tutorial.

  • @baskarkumar3902
    @baskarkumar3902 1 year ago

    Dataset link?

  • @arfazkhankhan74
    @arfazkhankhan74 8 months ago

    Hi, I have emailed you. Could you please look into it?

  • @coconutnut21
    @coconutnut21 1 year ago

    Can I use this in Google colab?

  • @ntchindagiscard3870
    @ntchindagiscard3870 3 months ago +1

    Even though the project seems great, the explanation is really bad, so in the end it's just not comprehensible... Whether it's the English or something else that is the barrier to comprehension, I don't know, but the delivery is awful.

    • @PyLessons
      @PyLessons 2 months ago

      Thank you for your feedback. I'll work on improving the clarity of the explanation to ensure better comprehension moving forward.

  • @coderoze4176
    @coderoze4176 8 months ago +1

    It is sad that you just handwaved the explanation for CTC loss.

    • @PyLessons
      @PyLessons 8 months ago +1

      Hi, it's not worth explaining CTC loss here, because it's pure math and only 0.1% of the people who use it are interested in how it works in depth. There are many sources that explain how it works step by step if you need them.

  • @shreyakorada5874
    @shreyakorada5874 6 months ago

    annotation_val.txt and annotation_train.txt: I'm kind of stuck here. Can you help me with this?

    • @PyLessons
      @PyLessons 6 months ago

      Check it again, it's nothing magical. If you can't solve it, open an issue on GitHub :)