Extract Text From An Image Using Java | tesseract OCR | JavaTalent | Java

  • Published: 10 Nov 2024

Comments • 8

  • @jhimitchakma2806 25 days ago +1

    Very clear and informative. Thank you for sharing.

    • @javatalent 25 days ago

      @jhimitchakma2806 Thanks! Keep exploring and learning.

  • @kushprakash123 8 months ago +1

    Can we extract bold words with Tesseract or any other open-source API?

    • @javatalent 8 months ago

      You can look into:
      1. In Tesseract 3, the result metadata includes a recognized font. It is probably not very reliable, but it might work for basic fonts and tell bold from non-bold.
      2. In Tesseract 4, you can export hOCR output and configure it to emit boxes around each character (I am not sure about Tesseract 3). I am not sure how reliable these boxes are either, but if they are good enough, you could feed them to a second algorithm that classifies whether a single character is bold, then remove the non-bold text from the Tesseract output.
      3. If you have precise line boxes before running Tesseract, you could also train an algorithm that segments the bold part of each line, then crop the image and run Tesseract only on the bold parts. This would probably be the most technical solution, but I think it could work as well.
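
To make option 2 a bit more concrete, here is a minimal sketch in plain Java (no Tess4J dependency). It assumes you already have hOCR text, for example from running the CLI as `tesseract input.png out hocr`, and that word spans look like `<span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>`. The class `BoldWordSketch`, its methods, and the "ink density" measure are hypothetical names and a stand-in heuristic chosen for illustration, not part of Tesseract or of the approach shown in the video. It works on word boxes rather than character boxes to keep the parsing short.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldWordSketch {

    /** One recognized word and its pixel bounding box, as read from hOCR. */
    static final class Word {
        final String text;
        final int x0, y0, x1, y1;

        Word(String text, int x0, int y0, int x1, int y1) {
            this.text = text;
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }
    }

    // Assumption: "class" appears before "title" inside the tag, and word
    // spans contain no nested <span> elements. A real HTML parser is more robust.
    private static final Pattern WORD_SPAN = Pattern.compile(
        "<span[^>]*class=['\"]ocrx_word['\"][^>]*"
        + "title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
        Pattern.DOTALL);

    /** Extracts every ocrx_word span with its bounding box. Entities are not decoded. */
    static List<Word> parseWords(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD_SPAN.matcher(hocr);
        while (m.find()) {
            // Drop inline markup such as <strong> or <em> around the word text.
            String text = m.group(5).replaceAll("<[^>]+>", "").trim();
            words.add(new Word(text,
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }

    /** Fraction of pixels inside the word box whose luminance is below the threshold. */
    static double inkDensity(BufferedImage img, Word w, int threshold) {
        int dark = 0;
        int total = 0;
        for (int y = Math.max(0, w.y0); y < Math.min(img.getHeight(), w.y1); y++) {
            for (int x = Math.max(0, w.x0); x < Math.min(img.getWidth(), w.x1); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                if (luminance < threshold) {
                    dark++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) dark / total;
    }

    public static void main(String[] args) {
        // Synthetic hOCR and image, so the sketch runs without Tesseract installed.
        String hocr =
            "<span class='ocrx_word' id='word_1_1' title='bbox 0 0 40 20; x_wconf 95'>Hello</span> "
            + "<span class='ocrx_word' id='word_1_2' title='bbox 50 0 90 20; x_wconf 91'>World</span>";

        BufferedImage img = new BufferedImage(100, 20, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 20; y++) {
            for (int x = 0; x < 100; x++) {
                // White page, with a black block covering the top half of the second box.
                boolean ink = x >= 50 && x < 90 && y < 10;
                img.setRGB(x, y, ink ? 0x000000 : 0xFFFFFF);
            }
        }

        for (Word w : parseWords(hocr)) {
            System.out.printf("%s -> ink density %.2f%n", w.text, inkDensity(img, w, 128));
        }
    }
}
```

Ink density alone is a rough signal: it also varies with font, size, and which letters a word contains. One hedged way to use it is to compare each word against the median density of the page and treat clear outliers as bold candidates, then verify on your own documents.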

  • @sukeshpandey9904 6 months ago +1

    Hey, I want the data for all the fonts in Tesseract. Where can I get it?

    • @javatalent 6 months ago

      Go through this link. It might have what you are looking for:
      tesseract-ocr.github.io/tessdoc/Fonts.html

  • @runrunning4359 6 months ago +1

    Thank you for the video!
    The 3rd link doesn't work!

    • @javatalent 6 months ago

      Yes, maybe. I have not tested that one.