Very clear and informative. Thank you for sharing.
@jhimitchakma2806 Thanks! Keep exploring and learning.
Can we fetch bold words with Tesseract or any other open-source API?
You can look into:
1. In Tesseract 3, the recognition result includes metadata with the recognized font. It is probably not very reliable, but it might work for basic fonts and let you tell bold from non-bold text.
2. In Tesseract 4, you can export hOCR output and configure it to give you a box around each character (I am not sure about Tesseract 3). I am not sure how reliable these boxes are either, but if they are good enough, you could run a second algorithm that just classifies whether a single character is bold or not, and then remove the non-bold text from the Tesseract output.
3. If you have precise line boxes before running Tesseract, you could also train an algorithm that segments the bold part of each line, then crop the image and run Tesseract only on the bold parts. This would probably be the most technical solution, but I think it could work as well.
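To make option 2 a bit more concrete, here is a rough sketch of what the "second algorithm" could look like. It assumes you have already cropped each character with its box and binarized it into rows of 0 (background) and 1 (ink). The function names, the `regular_width` baseline, and the `ratio` threshold are all made up for illustration and are not part of Tesseract. It uses the average length of horizontal ink runs as a crude proxy for stroke thickness, which will overestimate on characters with long horizontal bars, so treat it as a starting point rather than a finished classifier.

```python
from typing import List, Sequence

# A glyph is a binarized character crop: rows of 0 (background) / 1 (ink).
Glyph = Sequence[Sequence[int]]


def horizontal_runs(row: Sequence[int]) -> List[int]:
    """Return the lengths of consecutive ink-pixel runs in one row."""
    runs = []
    length = 0
    for px in row:
        if px:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:  # run that touches the right edge
        runs.append(length)
    return runs


def mean_stroke_width(glyph: Glyph) -> float:
    """Average horizontal run length, used as a rough stroke-width proxy."""
    runs = [r for row in glyph for r in horizontal_runs(row)]
    return sum(runs) / len(runs) if runs else 0.0


def is_bold(glyph: Glyph, regular_width: float, ratio: float = 1.4) -> bool:
    """Hypothetical rule: bold if strokes are `ratio` times thicker than
    the stroke width measured on known regular-weight text."""
    return mean_stroke_width(glyph) >= ratio * regular_width


# Toy example: the same vertical bar drawn thin and thick.
THIN_BAR = [[0, 0, 1, 0, 0, 0] for _ in range(6)]
BOLD_BAR = [[0, 1, 1, 1, 0, 0] for _ in range(6)]

if __name__ == "__main__":
    baseline = mean_stroke_width(THIN_BAR)
    print(is_bold(THIN_BAR, baseline), is_bold(BOLD_BAR, baseline))
```

In practice you would estimate `regular_width` from text on the same page that you know is not bold, and decide per word (for example by majority vote over its characters) rather than per character.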
Hey, I want the data for all the fonts in Tesseract. Where can I get it?
Go through this link. It might have what you are looking for.
tesseract-ocr.github.io/tessdoc/Fonts.html
Thank you for the video!
The 3rd link doesn't work!
Yes, maybe. I have not tested that one out.