The Code
Added 14 Sep 2021
Tesseract OCR - Lesson 2: Training Tesseract for new font
jTessBox Editor: sourceforge.net/projects/vietocr/files/jTessBoxEditor/
Step 1: Make box files for images that we want to train
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox
Eg: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox
{*Note: After making the box files, we have to correct any wrongly identified characters in them.}
Step 2: Create .tr file (combining the image file and the box file)
Syntax: tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train
Eg: tesseract train.my.exp0.tif train.my.exp0 box.train
Step 3: Extract the charset from the box files (Output for this command i...
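The naming pattern in Steps 1 and 2 is easy to mistype, so here is a minimal Python sketch that builds both command lines from the three parts of the name. The helper names (`training_basename`, `makebox_command`, `boxtrain_command`) are made up for illustration, and the sketch only assembles the arguments shown in the description; it does not run tesseract.

```python
from __future__ import annotations


def training_basename(lang: str, font: str, exp: int) -> str:
    """Return the [langname].[fontname].exp[N] base name used in Steps 1 and 2."""
    return f"{lang}.{font}.exp{exp}"


def makebox_command(lang: str, font: str, exp: int, ext: str = "tif") -> list[str]:
    """Step 1: arguments that ask tesseract to write a .box file for one image."""
    base = training_basename(lang, font, exp)
    return ["tesseract", f"{base}.{ext}", base, "batch.nochop", "makebox"]


def boxtrain_command(lang: str, font: str, exp: int, ext: str = "tif") -> list[str]:
    """Step 2: arguments that ask tesseract to write a .tr file from image + box."""
    base = training_basename(lang, font, exp)
    return ["tesseract", f"{base}.{ext}", base, "box.train"]


if __name__ == "__main__":
    # Reproduces the two "Eg:" lines from the description.
    print(" ".join(makebox_command("train", "my", 0)))
    print(" ".join(boxtrain_command("train", "my", 0)))
```

Both commands use the same image name and the same base name, which is why the `exp0` part has to match in every argument.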
Views: 49,925
Videos
Creating Tesseract OCR using Python: part-1 installing and getting started with Tesseract
5K views, 3 years ago
This video is the first part of the series to create an OCR using Tesseract in Python. In this video, I will be teaching how to download and install Tesseract and get started with it. Index of Tesseract: digi.bib.uni-mannheim.de/tesseract/
Can you share your training dataset?
We can use this command to create the box file instead of an external tool: tesseract /Users/sonam/Downloads/train_sample.png train_sample batch.nochop makebox
this was very helpful thank you very much!
damn you're my hero
This worked perfectly for me! I trained a model to decipher text from the Gravity Falls ARG (I didn't want to do the soul contract by hand). It needs a little fine tuning, but in the end, it gave me the majority of the text correctly! Thank you!
good job bro (y)
Thanks! Please upload part 3.
Very good video. Please continue your channel and make more such videos please.
Thank you for the video. What if I want to train on multiple images and get a single trained file as the result?
I run the mftraining command and it only says "no shape table file", and then nothing happens.
I'm facing the same issue.
I'm facing this error: 'tesseract' is not recognized as an internal or external command, operable program or batch file.
Set the path correctly: search for "path" in the Windows search, open the environment variables, edit the Path variable and add a new entry (e.g. c:/programfiles/tesseractocr).
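A quick way to confirm the PATH fix worked is to ask Python where it finds the program. This is a small sketch using the standard library's `shutil.which`; the function name `find_executable` is just an illustrative wrapper.

```python
from __future__ import annotations

import shutil


def find_executable(name: str = "tesseract") -> str | None:
    """Return the full path of `name` if it is reachable through PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path is None:
        # Same situation as the "'tesseract' is not recognized" message.
        print("tesseract is not on PATH; add its install folder and reopen the terminal")
    else:
        print(f"tesseract found at {path}")
```

A terminal that was already open before PATH was edited usually keeps the old value, so run the check from a new window.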
Hey, how can I combine two traineddata files into a single traineddata file?
Thank you very much
How can I use this custom-trained Tesseract model with YOLOv8 to recognize license plate numbers? Please help.
did you find the solution???
@@dalinsixtus6752 No Sir
Is this some sort of joke? You downloaded jTessBoxEditor and then did the whole process in a command line. What the hell is the purpose of jTessBoxEditor then??
To edit the bounding boxes. You can add bounding boxes wherever necessary when training for new languages.
You need jTessBoxEditor to correct the data, because if you train before correcting it, the training will fail.
This helped a lot in understanding the generation process of traineddata. Thank you!
Thanks a lot for the video! Gave up making part 3?! You should do it! Congratulations!
Where is part 3 ?
you saved my code & my day ... thanks ( stdout is a masterpiece )
How can we train the model with some specific user's handwritten data?
Good tutorial, one of the best, thanks!
Note: if the shapetable file wasn't created, you need to run the shapeclustering command to generate it.
Example: shapeclustering -F <font_properties file created previously> -U <unicharset created previously> <tr file created previously>
Or, on Windows: shapeclustering.exe -F <font_properties file created previously> -U <unicharset created previously> <tr file created previously>
Hey, thanks for your contribution! I still haven't been able to finish the process because, even after running your command, shapetable doesn't seem to generate. It's only generated after I run the next command (step 5), but the other two files in the video are not created. When I try to run the command again, I get an error saying "Failed to read shape table shapetable" Do you know why this may be?
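For anyone filling in the placeholders of the shapeclustering command above, here is a small Python sketch that assembles it. The `-F` and `-U` flags are taken from the comment; the file names `font_properties`, `unicharset` and `train.my.exp0.tr` are assumptions based on the names used elsewhere on this page, and the helper only builds the argument list without running anything.

```python
from __future__ import annotations


def shapeclustering_command(
    font_properties: str,
    unicharset: str,
    tr_files: list[str],
    windows: bool = False,
) -> list[str]:
    """Build the shapeclustering call quoted in the comment above."""
    exe = "shapeclustering.exe" if windows else "shapeclustering"
    return [exe, "-F", font_properties, "-U", unicharset, *tr_files]


if __name__ == "__main__":
    # File names are assumed; use the ones produced by your earlier steps.
    cmd = shapeclustering_command("font_properties", "unicharset", ["train.my.exp0.tr"])
    print(" ".join(cmd))
```

Whether this resolves the "Failed to read shape table" message is not something the thread confirms; other commenters report that the Tesseract version also matters.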
great video, waiting for Lesson 3
When I copy and paste this command into cmd: tesseract train.my.exp0.tif train.my.exp0 batch.nochop makebox it says that it doesn't recognize it.
You saved like a week load of work for me!
I am trying to train tesseract in a Linux machine, I am getting segmentation fault in Step 5??
Hi, I'm getting this error while creating the .tr file: "APPLY_BOXES: boxfile line 6/25 ((421,1325),(494,1378)): FAILURE! Couldn't find a matching blob". If anyone knows how to solve it, please share the solution.
do you have an answer to it?
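Nobody in the thread answers this, but a first step could be to look at the box-file line named in the message. The sketch below assumes the usual box-file layout of one symbol per line, `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image. It only flags boxes that are empty or fall outside the image; it cannot tell whether the box really covers a character, so treat it as a sanity check and not as a fix. The sample lines and the image size are invented.

```python
from __future__ import annotations

from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Split one box-file line; the last five fields are integers."""
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def suspicious_boxes(lines: list[str], width: int, height: int) -> list[tuple[int, Box]]:
    """Return (line_number, box) for boxes that are empty or outside the image."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        box = parse_box_line(line)
        inside = (0 <= box.left < box.right <= width
                  and 0 <= box.bottom < box.top <= height)
        if not inside:
            flagged.append((number, box))
    return flagged


if __name__ == "__main__":
    sample = [
        "T 421 1325 494 1378 0",   # plausible box
        "e 500 1325 500 1378 0",   # zero width
        "s 510 1325 560 3000 0",   # taller than the image
    ]
    for number, box in suspicious_boxes(sample, width=1000, height=2000):
        print(f"line {number}: {box}")
```

Lines that pass this check can still fail in tesseract; opening the same line in jTessBoxEditor shows whether the rectangle actually sits on a character.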
Hello I have a business inquiry. Please DM me.
Thank you, hope have lesson 3~~
Thank you for your video. It was very much useful. Can you please share the next part too?
Hey! Have you done your work on tesseract or doing?
I have an error at the last step, when using it to read the image. It says: error opening data file, make sure the TESSDATA_PREFIX environment variable is set to the tessdata directory. But I already put Program Files\Tesseract-OCR into my Path environment variable. Can you help with this?
I'm trying to follow this video, and in step 5 I get this error: "Warning: No shape table file present: shapetable". What is happening?
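Note that the message above asks for TESSDATA_PREFIX, which is a separate variable from Path. Here is a minimal Python sketch to check it, assuming (as the quoted message says) that the variable should point at the tessdata directory itself; some releases may expect a different folder level, so compare with the path printed in your own error. The function names and the language name `train` are illustrative.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def traineddata_path(lang: str, env: Mapping[str, str] | None = None) -> Path | None:
    """Return where <lang>.traineddata is expected, or None if the variable is unset."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None
    return Path(prefix) / f"{lang}.traineddata"


def check_traineddata(lang: str, env: Mapping[str, str] | None = None) -> str:
    """Describe whether the traineddata file can be found through TESSDATA_PREFIX."""
    path = traineddata_path(lang, env)
    if path is None:
        return "TESSDATA_PREFIX is not set"
    if not path.is_file():
        return f"missing: {path}"
    return f"ok: {path}"


if __name__ == "__main__":
    # "train" matches the -l train used on this page; change it to your language name.
    print(check_traineddata("train"))
```

If the result is "missing", either the variable points at the wrong folder or the new traineddata file was never copied into it.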
Hey, did you ever figure it out? I'm getting the same error message.
@@samuelbastias3752 I think running the commands with administrator permissions and deleting the older files will fix your issues
This is the old way, pre Tesseract 4, not for the LSTM network. Classical Indian youtuber
Thanks @The Code. Not all files were generated! What could be the issue?
Thanks a lot! Very useful tutorial, and thanks for the material too!
Hi man, awesome tutorial. Quick question: I'm struggling with step 5. My tesseract creates only one file (train.unicharset) instead of four as in your tutorial (missing: inttemp, pffmtable, normproto), and I get this in cmd:
Warning: No shape table file present: shapetable
Reading train.my.exp0.tr ...
Flat shape table summary: Number of shapes = 11 max unichars = 1 number with multiple unichars = 0
At 04:41 I can see that you get 3 more lines in cmd. Maybe you can give me some advice?
Issue occurred on Tesseract 5.X.... after installing Tesseract 4.1 issue is not present
@@adamchochowski5357 Thank you so much for following up with the solution! MVP
For multiple images, should I make multiple traineddata files or only a single traineddata? If single, how do I train on multiple images?
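One possible answer, as far as I understand the legacy (pre-LSTM) workflow this video follows: run Steps 1 and 2 once per image with a different exp number, then pass all the resulting .box and .tr files to the later tools in one call so a single traineddata comes out. The sketch below only builds those command lines. The `-F` and `-U` flags appear in the shapeclustering comment on this page; the `-O` flag and the file names `font_properties` and `unicharset` are my assumptions from the legacy training documentation, so check them against your Tesseract version.

```python
from __future__ import annotations


def combined_training_commands(lang: str, font: str, exp_numbers: list[int]) -> list[list[str]]:
    """Build one unicharset_extractor, mftraining and cntraining call for all images."""
    bases = [f"{lang}.{font}.exp{n}" for n in exp_numbers]
    boxes = [f"{base}.box" for base in bases]
    trs = [f"{base}.tr" for base in bases]
    return [
        ["unicharset_extractor", *boxes],
        # -O and the two input file names are assumptions, see the note above.
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", *trs],
        ["cntraining", *trs],
    ]


if __name__ == "__main__":
    for cmd in combined_training_commands("train", "my", [0, 1, 2]):
        print(" ".join(cmd))
```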
Excellent, thank you. At 1:16, an incidental note on pronunciation: the “v” in “converting” is a voiced “f” sound, rather than any “w”-related sound. “v” is positioned next to “w”, but that's misleading; they don't sound alike. Their sound production is different. “v” is more closely related to “f”. Say the word “fee.” Make and hold the “f” sound. Then, while holding the “f” sound, hum. “v” is a vibrating “f”. Regards
Thanks for the tutorial. How do I train data for the Urdu and Arabic languages? What would the font properties be? I have an Urdu font and hundreds of Urdu samples in JPG format. No clue where or how to start.
Thanks for this. I was able to duplicate the process in Linux. However, there was zero improvement in the recognition of my handwriting at all. I don't know if I did something wrong or Tesseract is that bad lol. Thanks again.
This is what I get when I test the PNG. What are these errors?
C:\Users\Laser\Desktop\Tesseract>tesseract HONEYBEE FONT.png stdout -l train
read_params_file: Can't open stdout
read_params_file: Can't open l
read_params_file: Can't open train
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 0138B858 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatalstm-punc-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 0138B908 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatalstm-word-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 013AC150 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatalstm-number-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 04899FC0 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatapunc-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 013B52A0 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddataword-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 03646348 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatanumber-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 04D43150 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatabigram-dawg)
ObjectCache(6AAB5A88)::~ObjectCache(): WARNING! LEAK! object 04D43788 still has count 1 (id \Program Files (x86)\Tesseract-OCR\tessdata/eng.traineddatafreq-dawg)
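A likely cause, though the thread does not confirm it: the file name "HONEYBEE FONT.png" contains a space and is not quoted, so cmd passes HONEYBEE and FONT.png as two separate arguments and everything after them shifts by one position. That would fit both the "image file not found" lines and the attempts to open stdout and train as config files. The Python sketch below uses `subprocess.list2cmdline` from the standard library to show how the argument list looks when quoted for a Windows command line; the wrapper name `quoted_command` is illustrative.

```python
from __future__ import annotations

import subprocess


def quoted_command(args: list[str]) -> str:
    """Join arguments into one command line, quoting those that contain spaces."""
    return subprocess.list2cmdline(args)


if __name__ == "__main__":
    args = ["tesseract", "HONEYBEE FONT.png", "stdout", "-l", "train"]
    print(quoted_command(args))
    # prints: tesseract "HONEYBEE FONT.png" stdout -l train
```

Renaming the file so it has no space avoids the quoting question altogether.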
Hello can you please upload part 2 how to prepare images for better accuracy.
What does this mean? Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica libpng warning: iCCP: known incorrect sRGB profile
cannot find letters on geometric shapes. how can i solve this?
What is your Tesseract version
4.0
Why is my Tesseract just reading the .tr file but not writing the pffmtable, inttemp, and normproto files?
have u found the solution bro?
i'm having the same problem
Yes, I use Tesseract v4.0.0 and work fine
use tesseract v4.0.0 and ensure eng.traineddata file present in tessdata folder.
I tried running mftraining but it never ends? Any fix for this?
Nice explanation, the steps were easily understood. Can you share content or a video on how to train and use GD&T (mechanical characters)?
Hi, did you find some good examples with GD&T?
Thank you! Finally, I found somebody that explains this for beginners!
It appears that you need tesseract 4.1 running for this tutorial as with 5.0-alpha i couldn't pass the last steps
that's true
@Devdevdevdev idk, the probably can, but you will need a lot of samples to train that thing
@Devdevdevdev how many pages do you train with
@Devdevdevdev yes you can train more, and you probably should
@Devdevdevdev I didn't post any kind of script, I think you are mistaking me for someone else. You should watch some kind of tutorial on how to generate the training data. First of all, you should have a font. If you don't have a font, which is obvious in the case of handwritten stuff, then the only way to generate 5, 10, or 50+ pages would be to write software that cuts out the letters at predefined rectangle positions and then generates a page of randomly spread letters, with the predefined rectangles recording which letter each one is. If you can program, that shouldn't be hard. Then generate many pages containing the letters.
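A rough Python sketch of the box-file half of that idea: pick random symbols, place each one in a cell of a fixed grid, and write the rectangle for every cell in box-file form (`symbol left bottom right top page`, with y measured from the bottom of the page). Everything here is an assumption for illustration: the function name, the grid layout used in place of truly random positions, and the cell and glyph sizes. Pasting the cut-out letter images into the same rectangles would need an imaging library and is left out.

```python
from __future__ import annotations

import random


def synthetic_page_boxes(
    alphabet: str,
    page_width: int,
    page_height: int,
    cell: int = 60,
    glyph: int = 40,
    seed: int = 0,
    page: int = 0,
) -> list[str]:
    """Return box-file lines for random symbols laid out on a regular grid."""
    rng = random.Random(seed)          # seeded so a page can be regenerated
    margin = (cell - glyph) // 2       # centre each glyph inside its cell
    lines = []
    for row in range(page_height // cell):
        for col in range(page_width // cell):
            symbol = rng.choice(alphabet)
            left = col * cell + margin
            right = left + glyph
            # Rows are counted from the top; box files count y from the bottom.
            top = page_height - (row * cell + margin)
            bottom = top - glyph
            lines.append(f"{symbol} {left} {bottom} {right} {top} {page}")
    return lines


if __name__ == "__main__":
    for line in synthetic_page_boxes("abc", page_width=180, page_height=120):
        print(line)
```

The same rectangles can then drive the image generation, so the page image and its box file always agree.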
Hi, I am getting error while training the data. Could you please tell which tesseract version you are using?
it's in the movie, it's 4.0