How to Train Tesseract OCR Engine 5 on Custom Data

  • Published: Dec 10, 2024

Comments • 15

  • @SL7Tech
    @SL7Tech  2 months ago +1

    Important: when preparing the dataset, each image and its ground truth file must have the same name apart from the extension. Otherwise the trainer will throw an error.
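
A minimal sketch of how this naming rule could be checked before training. It assumes images end in `.png` or `.tif` and ground truth files end in `.gt.txt`; the comment does not state these suffixes, so adjust them to your dataset. The function name `find_unpaired` is made up for illustration.

```python
from pathlib import Path

# Assumed conventions (not stated in the comment): images end in .png or .tif,
# ground truth transcriptions end in .gt.txt.
IMAGE_SUFFIXES = (".png", ".tif")
GT_SUFFIX = ".gt.txt"


def find_unpaired(dataset_dir):
    """Return (images without ground truth, ground truth files without image).

    Both lists contain base names, i.e. file names without the extension.
    """
    root = Path(dataset_dir)
    image_names = set()
    gt_names = set()
    for path in root.iterdir():
        if path.name.endswith(GT_SUFFIX):
            # Strip the full ".gt.txt" suffix, not just ".txt".
            gt_names.add(path.name[: -len(GT_SUFFIX)])
        elif path.suffix.lower() in IMAGE_SUFFIXES:
            image_names.add(path.stem)
    return sorted(image_names - gt_names), sorted(gt_names - image_names)
```

Calling `find_unpaired("data/kernsys-ground-truth")` would return two lists of base names; both should be empty before training starts.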

  • @aritradeb1935
    @aritradeb1935 1 day ago

    Most of my data has two lines. What should I do in that case?

  • @inkmaze
    @inkmaze 1 month ago

    I got a combine_tessdata failure at 12:39, please help.

    • @SL7Tech
      @SL7Tech  1 month ago +1

      @inkmaze Can you share the log?

    • @inkmaze
      @inkmaze 1 month ago

      @SL7Tech Sure
      You are using make version: 4.4.1
      combine_tessdata -u ../tessdata//deu_latf.traineddata data/deu_latf/engplus
      process_begin: CreateProcess(NULL, combine_tessdata -u ../tessdata//deu_latf.traineddata data/deu_latf/engplus, ...) failed.
      make (e=2): The system cannot find the file specified.
      make: *** [Makefile:207: data/deu_latf/engplus.lstm-unicharset] Error 2

    • @inkmaze
      @inkmaze 1 month ago +1

      @SL7Tech Oh, I forgot to add Tesseract to PATH, LOL
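
The "cannot find the file specified" message in the log above means make could not locate the `combine_tessdata` executable. A small sketch for checking this up front, using Python's standard `shutil.which`; the tool list and the name `missing_tools` are illustrative assumptions, so extend the list with whatever your Makefile calls.

```python
import shutil

# Tools assumed to be needed: the main binary plus the one named in the log.
REQUIRED_TOOLS = ("tesseract", "combine_tessdata")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If `missing_tools()` returns a non-empty list, add the Tesseract installation directory to PATH before running `make training`.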

  • @appsscope2487
    @appsscope2487 21 days ago

    If I need to train on Arabic numbers, can I do it the same way? There is no Arabic number dataset to download!

    • @SL7Tech
      @SL7Tech  21 days ago

      @appsscope2487 You can create the dataset yourself, and yes, follow this procedure for fine-tuning. Remember to pass the language type as RTL.
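
A hedged sketch of how the RTL setting might be passed on the command line. It assumes the training Makefile reads a `LANG_TYPE` variable (check your Makefile for the exact name) and reuses the variables seen elsewhere in this thread. The model name `arabic_numbers` and the start model `ara` are hypothetical placeholders.

```python
import shlex


def build_training_command(model_name, start_model, tessdata,
                           lang_type="", max_iterations=2000):
    """Assemble the `make training` argument list.

    LANG_TYPE is an assumed variable name; it is only added when set.
    """
    args = [
        "make", "training",
        f"MODEL_NAME={model_name}",
        f"START_MODEL={start_model}",
        f"TESSDATA={tessdata}",
        f"MAX_ITERATIONS={max_iterations}",
    ]
    if lang_type:
        args.append(f"LANG_TYPE={lang_type}")
    return args


# Hypothetical values for an Arabic fine-tuning run.
command = build_training_command("arabic_numbers", "ara", "../tessdata",
                                 lang_type="RTL")
print(shlex.join(command))
```

The list form can be handed directly to `subprocess.run(command, check=True)` from the directory that contains the Makefile.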

  • @SidhuOp
    @SidhuOp 1 month ago

    Since pytesseract is terrible with alphanumeric words, can we train it on that kind of dataset?

    • @st1np
      @st1np 1 month ago

      True, I've been trying for a long time to train it on alphanumeric text in the Consolas font, but Tesseract is still very inaccurate. HELP

  • @markmacharia5187
    @markmacharia5187 19 days ago

    I ran into this error:
    $ make training MODEL_NAME=kernsys START_MODEL=eng TESSDATA=../tessdata/ MAX_ITERATIONS=2000 LEARNING_RATE=0.001
    You are using make version: 4.4.1
    tesseract "data/kernsys-ground-truth/image_001.png" data/kernsys-ground-truth/image_001 --psm 13 lstm.train
    No box data found in 'data/kernsys-ground-truth/image_001.box'.
    Failed to read boxes from data/kernsys-ground-truth/image_001.png
    Error during processing.
    make: *** [Makefile:248: data/kernsys-ground-truth/image_001.lstmf] Error 1

    • @SL7Tech
      @SL7Tech  19 days ago

      Make sure that the ground truth file is not empty.
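
One way to run that check over a whole dataset folder, as a rough sketch. It assumes ground truth files use the `.gt.txt` suffix and are UTF-8 encoded; the function name `find_empty_ground_truth` is made up for illustration. It only detects empty or whitespace-only files, which is one possible cause of the error above, not necessarily the only one.

```python
from pathlib import Path


def find_empty_ground_truth(dataset_dir, suffix=".gt.txt"):
    """Return names of ground truth files that are empty or whitespace-only."""
    empty = []
    for path in sorted(Path(dataset_dir).glob(f"*{suffix}")):
        # errors="replace" keeps the scan going if a file has bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if not text.strip():
            empty.append(path.name)
    return empty
```

For the log above, `find_empty_ground_truth("data/kernsys-ground-truth")` would list any blank transcriptions in that folder.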

    • @markmacharia5187
      @markmacharia5187 19 days ago

      @SL7Tech it is not empty