Build a Custom OCR Model in TensorFlow: A Step-by-Step Tutorial
- Published: 8 Jan 2023
- In this tutorial, we will explore how to recognize text from images using TensorFlow and the CTC loss function in a neural network model. We will start with an introduction to text recognition and the different approaches used to extract text from images. We will then dive into the specifics of using TensorFlow and the CTC loss function to build our custom OCR system. The tutorial will also introduce a new open-source library called MLTU (Machine Learning Training Utilities) that can be used to store code for future projects. By the end of this tutorial, you will have a working OCR model that you can use to recognize text from images. This is the first part of a tutorial series, so stay tuned for more in-depth content on text recognition and other machine-learning topics.
Text Version Tutorial: pylessons.com/ctc-text-recogn...
GitHub: github.com/pythonlessons/mltu...
pypi: pypi.org/project/mltu/
#machinelearning #python #tensorflow #opencv #ocr
Hi, thanks for all your videos. Those videos helped me a lot.
You deserve more subs and views on your channel.
Thanks again.
Cool, have been waiting for this.
More to come!
you are an angel bro! Thank u
You're welcome!
Thanks for the tutorial. Curious if there is a good tutorial on Text Detection you recommend?
Savvy😂😂😂😂
This was a great video! I’m trying to get an OCR model to work with Hebrew handwriting, what’s my best options for gathering a training set?
Thanks, try searching for open-source datasets; otherwise you'll need to make one yourself
Where can I find the dataset and the image folder?
Which architecture is this model based on?
Can you point me to resources for researching the state of the art in OCR, especially for digital character recognition?
Where can I get your complete notebook for reference?
Hi, hello. I want to say that this tutorial is amazing!! Thank you very much.
You are welcome!
Hi, I have a question. You trained your model with annotation_train.txt and annotation_test.txt. I am curious what kind of things you wrote in those files, because I am also trying to create my custom model. Thanks in advance for your response.
It contains the image path/name, a tab separator, and the label (what's written in the image). In general it could either be just that or contain an extra column for a bounding box.
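As a sketch, an annotation line in that tab-separated format could be parsed like this (the filename and label here are made-up examples, not from the actual dataset):

```python
# Hypothetical example of one annotation_train.txt line and how it could be
# parsed: image path, a tab, then the label (an optional extra column could
# hold a bounding box). The filename and label below are invented.
def parse_annotation_line(line: str):
    # Split on the first tab only, so labels containing tabs stay intact.
    path, label = line.rstrip("\n").split("\t", 1)
    return path, label

path, label = parse_annotation_line("images/word_001.png\thello\n")
print(path, label)  # images/word_001.png hello
```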
@PyLessons could you please clarify where I should upload my custom data? Also, in the Models section you have many files; how do I set that up?
Hi, thank you for posting OCR videos. Can you please tell me the inference speed of the model on CPU?
It depends on the CPU, but it's pretty quick; I haven't measured it
On an M1 Max GPU training has been running for 3+ hours on first epoch still...I think maybe I've done something wrong but don't want to end the script at this point. I added many custom examples of alphanumeric sequences. I feel like the M1 Max should be able to handle a batch size of 1024. Do you think this sounds like a lower batch size is needed?
Even the latest GPUs take time, so I think that's normal
First of all, well done on this video. It is very interesting. I just wanted to ask you a question: is it possible to use this on a video instead of images? I am trying to train a model that reads number plates. However, number plates vary from one country to another, and I am trying to train a model with my country's number plate format.
Yes, you can! You would need to iterate over each frame from the video, same as with images. The difference is that you would need some kind of plate detector and run this recognition on the plate crop
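A minimal sketch of that per-frame loop; `detect_plates` and `recognize` are hypothetical stand-ins for a plate detector and the trained OCR model, and in practice the frames would come from something like `cv2.VideoCapture`:

```python
# Sketch: run plate detection + text recognition on every frame of a video.
# `frames` is any iterable of frames (e.g. read from cv2.VideoCapture);
# `detect_plates` returns (x, y, w, h) boxes; `recognize` is the OCR model.
# Both callables are placeholders for this illustration.
def recognize_plates(frames, detect_plates, recognize):
    results = []
    for frame in frames:
        for (x, y, w, h) in detect_plates(frame):
            # Crop the plate region before running recognition on it.
            crop = [row[x:x + w] for row in frame[y:y + h]]
            results.append(recognize(crop))
    return results
```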
@jonnyseyc14 I am trying to solve the same problem here... Beginning to research... any advice?
@jongameshow I was also working on the same thing. Did you manage to make this model work for your license plate detection? If yes, can you give me some reference on that?
Hi sir, just a basic question: what pre-knowledge do I need to fully understand this tutorial and the other ones?
Familiarity with programming, Python, and TensorFlow
Hello, I have a quick question. I’m using a custom dataset. What’s the most ideal dataset size? My CER stays at 1.00. Is it because my dataset is too small?
There is no such thing as an ideal size; it depends on your dataset's quality and complexity. If the CER stays at 1.00, try expanding the dataset or improving the model architecture
Great tutorial, bad accent, but it still helps me a lot. I love it!
Glad to hear that! Hope to fix accent sometime in the future :D
What can I do if I'm getting the following error in training?: Failed to find data adapter that can handle input: ,
First, open an issue on GitHub. Second, check that you are handling your data correctly, because this error usually comes up when you use a wrong path or your data is None
Hi, thanks for the video. While watching this, I saw that the WER is 1.000. What does that mean? Why doesn't the WER go down?
1.00 means that no word is predicted correctly, but it does go down during training
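For reference, a generic sketch of how WER behaves: word-level edit distance divided by the number of reference words. This is the standard textbook definition, not necessarily the exact mltu implementation:

```python
# Classic dynamic-programming Levenshtein distance between two sequences.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            # min of deletion, insertion, substitution (or match if x == y)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[len(b)]

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: word edits needed, normalized by reference length.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

print(wer("the cat sat", "xxx yyy zzz"))  # 1.0 -> no word is correct
print(wer("the cat sat", "the cat sat"))  # 0.0 -> perfect match
```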
Hello sir, can you please tell me how to do the annotations for a custom dataset.
Hey, crop a text, give it label -> repeat :)
Sorry for disturbing you :'( I'm trying to reproduce this using sentences (between 1 and 6 words, for example) instead of only a word, and I don't know how to do it.
My idea is to create an algorithm that predicts sentences instead of only a word.
I don't know how to handle some of the config variables. For example:
Is it correct to put words instead of characters in config.vocab?
For configs.max_text_length, can it be the number of words in a sentence instead of the length of a word? (For example, my longest sentence has 64 characters including spaces, and that same sentence has 15 words.)
You said that I can initialize "CWERMetrics" with (padding_token = "padding token"). In my specific case, should my configs.max_text_length be the padding token, or should it be len(config.vocab)?
Thank you, and so so so sorry, really.
I have another tutorial for sentences, check it out
Do you think I can use your code to decode the digit of my water counter?
Yes, you can ;)
Hi, thanks for the great video!! I got the error "Failed to find data adapter that can handle input: ," in train_data_provider. Are there any parameters that I have to pass?
Could you raise an issue on GitHub with more details: what you're doing, what mltu version you use, what Python version you use? One sentence isn't enough detail :)
@@PyLessons Yep, actually I just solved that problem, and now I have another one: when I want to load the model it says "Unknown loss function: CTCloss. Please ensure this object is passed to the `custom_objects` argument." Could you please teach me how to load the model? My mltu version is 0.1.3 and TensorFlow is 2.1.0!
@@astronaut1861 model.load(path, compile=False) try this
@@PyLessons Thank you Bro!!
@@astronaut1861 Hello, how did you solve it, please?
Hi, can I ask for your dataset? I really want to try to train the model again. Thank you!
You may not be able to access the dataset website from your location; try using a VPN to access it
Hi
Can this project detect handwritten text from an image?
Wait for my next tutorial
@@PyLessons Thank you Dear....
Is there any tutorial you recommend for text detection, please?
There are plenty of tutorials, but soon I'll create a tutorial on how to train YOLO detection with the mltu package
When I run the prediction code, I get this error:
AttributeError: "ImageToWordModel" object has no attribute 'input_shape'
I need to know what mltu version you are using; in the latest version it changed from input_shape to input_shapes (a list of shapes)
could you make a video how to make a Keras R Cnn (ocr) with XML files (from label imager) as annotations? i.e. recognize text from images for training but also comes from images and XML files?
It's Python basics and doesn't need a video; you should be able to handle that by yourself
Hi! First of all, when I create the model, the onnx file is not generated. Also, while training the model, do we need to name each image with the captcha's characters?
Python version, TensorFlow version? It's up to you how you preprocess the data; there is no fixed standard
I really need help
Does this work with the easyocr library?
Not sure, I haven't tried it
Hello sir, can I use this to extract characters from images?
Hi, yes, of course
@@PyLessons thanks sir, but pip install mltu==0.1.3 doesn't seem to work on the latest Python 3.11
@@mugumemalte8667 thanks, I'll check
The dataset is too large to handle on normal systems. Can we use some other dataset? Please suggest one.
What do you mean by normal systems? You can always decrease the batch size if it doesn't fit on your GPU
I mean systems without a GPU
@@roshinik4967 Without a GPU you can't train such models; there is no other option apart from Google Colab or renting GPU compute
Even Colab Pro doesn't support it, we already tried
So I'm only asking whether there is some other dataset that works well with this code?
I try to load your .h5 model:
from tensorflow import keras
model = keras.models.load_model('model.h5')
but I get the following error:
"bad marshal data (unknown type code)"
Am I missing some function when loading the model? I am using Tensorflow 2.11.0
Hey,
I just tried it with TensorFlow 2.11, everything was fine:
model = keras.models.load_model("Models/1_image_to_word/202212012033/model.h5", compile=False)
@@PyLessons UserWarning: model is not loaded, but a Lambda layer uses it. It may cause errors.
I tried config, custom_objects, "function", "module", "function_type", but nothing works :(
can you give the dataset?
Link to dataset is in text version tutorial
Dataset link?
Read description
Hi I have mailed you , could you please look into it
Can I use this in Google colab?
Yes, why not
Even though the project seems great, the explanation is really bad, so in the end it's just not comprehensible... I don't know if English or something else is the barrier to comprehension, but the delivery is awful.
Thank you for your feedback. I'll work on improving the clarity of the explanation to ensure better comprehension moving forward
It is sad that you just handwaved the explanation for CTC loss.
Hi, it's not worth explaining CTC loss here, because it's pure math and only 0.1% of the users who use it are interested in how it works in depth. There are many sources that explain how it works step by step if you need them
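One small piece that is easy to show without the math is the decoding side: greedy CTC decoding just takes the per-timestep argmax, collapses repeats, and drops the blank token. A sketch with a made-up three-character vocabulary; the blank-index convention here is an assumption, not necessarily what the tutorial's code uses:

```python
# Greedy CTC decoding sketch: collapse repeated predictions, drop the blank.
# Here index 0 is assumed to be the blank and vocab holds the real characters.
def ctc_greedy_decode(indices, vocab, blank=0):
    out = []
    prev = None
    for i in indices:
        if i != blank and i != prev:
            out.append(vocab[i - 1])  # vocab is indexed without the blank
        prev = i
    return "".join(out)

vocab = "abc"
# e.g. per-timestep argmax over the model's outputs: [blank, a, a, blank, b]
print(ctc_greedy_decode([0, 1, 1, 0, 2], vocab))  # "ab"
```

Note that repeats separated by a blank survive (so "aa" is representable), which is exactly why CTC needs the blank symbol at all.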
I'm kind of stuck with annotation_val.txt and annotation_train.txt; can you help me with this?
Check it again, it's nothing magical. If you can't solve it, open an issue on GitHub :)