Tesseract OCR: Extract Text From Any Image

Brodie Robertson

35K views

Published: Dec 18, 2024

Comments • 41

@M19pickles 2 years ago ⁺¹⁰
I have used tesseract a bit for digitizing recipes from recipe books. When it does not give good results on a first pass, I have found that altering the image can help a lot. Converting the image to black and white, adjusting the contrast, and even enlarging the image can all improve results.
@BrodieRobertson 2 years ago ⁺¹
Sanitizing the image in any way possible will always improve the results with anything like this
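
The clean-up steps described in this thread (contrast adjustment, black-and-white conversion, enlarging) can be sketched in plain Python. This is only an illustration of the operations, not what either commenter actually ran: the image is assumed to be a grayscale list of rows of 0-255 integers, and the function names, the threshold of 128, and the scale factor of 2 are all hypothetical choices. In practice an image library or editor would do the same work on real files.

```python
# Illustrative pre-OCR clean-up on a grayscale image stored as a list of rows
# of 0-255 integers. The layout and all default values are assumptions.

def stretch_contrast(gray):
    """Rescale pixel values so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]


def binarize(gray, threshold=128):
    """Convert to pure black (0) and white (255) with a fixed threshold."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]


def upscale(gray, factor=2):
    """Enlarge by an integer factor using nearest-neighbour repetition."""
    out = []
    for row in gray:
        wide = [p for p in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def preprocess(gray, factor=2, threshold=128):
    """Stretch contrast, binarize, then enlarge, in that order."""
    return upscale(binarize(stretch_contrast(gray), threshold), factor)
```

Which steps help varies by image, so it is worth trying each one separately before chaining them.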
@code8986 2 years ago ⁺¹³
I've been a subscriber to your channel for a couple of years and I like all of your videos, but this is my favorite kind of video -- about a *useful*, open source tool or utility, not specific to Arch linux, not about gaming, not about drama in the industry. I'm not saying the other kinds are bad or that you shouldn't make them, but I like this kind the best.
@BrodieRobertson 2 years ago ⁺⁴
I have many interests but this is one of them
@Fooftilly 2 years ago ⁺¹⁴
So if Google is maintaining this project, is Google Lens just a front-end for Tesseract?
@vaisakh_km 2 years ago
is it 😯
@Fooftilly 2 years ago
@Mr. Rich B.O.B It must be that they have done some sort of optimization to detect the language before OCR-ing the whole page. I have also seen the same mistakes in tesseract as in Google Lens.
@sazk4000 2 years ago ⁺²
I use tesseract in Python to read text and train the machine. Good job explaining this 👏
@t01 2 years ago ⁺¹
A little script can be done alongside a screenshot utility to get OCR from screenshots directly to the clipboard
@rcht958 2 years ago
Haha, I did something similar with pytesseract back when I used Windows. I didn't find it very useful, though
@muctebanesiri 4 months ago
You have a video about every topic. That's awesome.
@dergeneralfluff 2 years ago ⁺²
I started using tesseract for a project to gather the text off my memes hosted on my personal szurubooru (In an attempt to be able to search for set text, so you are able to actually find stuff within the thousands of images).
It has been very hit or miss. Sometimes it gets text right down to the punctuation, other times it gets nothing. On low-res, bad-contrast images where I think it has no shot, it gets it; on clean images it gets nothing. Sometimes doing crazy image manipulation helps, sometimes unmodified is best.
What I can say is that handwritten Latin letters are impossible for it, so manga scanlation text is just blank for it, at least with the English language setting
@negirno 2 years ago
It also doesn't support formatted text like bold, italic, etc. Honestly, I'm usually better off just transcribing the image text by typing it myself.
@atomixhawk 2 years ago ⁺¹
I use a keybinding to invoke a script which takes a screenshot of an area, pipes it to tesseract, and copies the resulting text to the clipboard.
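
A script along these lines might look like the sketch below. The comment does not say which tools were used, so the screenshot and clipboard commands here are assumptions: `maim -s` for region capture and `xclip` for the clipboard (both X11 tools; a Wayland setup would need different ones). The `stdin` and `stdout` arguments are Tesseract's own way of reading an image from a pipe and printing the text instead of writing a file. The function names are hypothetical.

```python
import subprocess

# Assumed helper tools, not named in the comment; swap in your own.
SCREENSHOT_CMD = ["maim", "-s"]                        # select a region, PNG on stdout
OCR_CMD = ["tesseract", "stdin", "stdout"]             # image from stdin, text to stdout
CLIPBOARD_CMD = ["xclip", "-selection", "clipboard"]   # text from stdin


def clean_text(raw):
    """Drop the form feed and surrounding whitespace Tesseract may emit."""
    return raw.replace("\f", "").strip()


def screenshot_to_clipboard(lang="eng"):
    """Capture a region, OCR it, copy the text to the clipboard, return it."""
    png = subprocess.run(SCREENSHOT_CMD, check=True, capture_output=True).stdout
    ocr = subprocess.run(OCR_CMD + ["-l", lang], input=png,
                         check=True, capture_output=True)
    text = clean_text(ocr.stdout.decode("utf-8", errors="replace"))
    subprocess.run(CLIPBOARD_CMD, input=text.encode("utf-8"), check=True)
    return text
```

Saving this as a file that calls `screenshot_to_clipboard()` and binding it to a key in the window manager would give roughly the workflow described.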
@khaibaromari8178 9 months ago
The sad thing about pytesseract is that it works as long as the background of your image is of semi-color; other than that, it messes up everything.
@ELHASSANEMOUMADARFAK 1 year ago
Does anyone know how I could add a second language in the same command line? I tried the following command and it doesn't work: tesseract filename.jpeg - -l ara[+spa] filename.txt
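
For the question above: in Tesseract's usage text the square brackets mark an optional part and are not typed literally, languages are joined with `+`, and the command takes one output base (Tesseract appends `.txt` itself). A minimal sketch of building such a command follows; the helper name is hypothetical, and it assumes the traineddata files for each language are installed.

```python
def tesseract_lang_command(image, output_base, languages):
    """Build a tesseract command line that uses one or more languages.

    Hypothetical helper: `languages` is a list of Tesseract language codes,
    which are joined with '+' for the -l option.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


# Equivalent shell command: tesseract filename.jpeg filename -l ara+spa
cmd = tesseract_lang_command("filename.jpeg", "filename", ["ara", "spa"])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which would write `filename.txt`.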
@EastEndKeith 1 year ago
Thanks for the helpful video Brodie. Do you know if you can use Tesseract to convert a non-OCR'ed PDF into a PDF that contains OCR'ed text?
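
One way to approach this, sketched under assumptions: Tesseract can write a searchable PDF from an image when `pdf` is given as the output config, but as far as I know the `tesseract` command reads image files rather than PDFs. For a PDF that has no text layer, a separate tool such as ocrmypdf (which runs Tesseract internally) takes a PDF in and writes a PDF with a text layer. The helper names below are hypothetical.

```python
def image_to_searchable_pdf_cmd(image, output_base, lang="eng"):
    """Tesseract writes <output_base>.pdf with the recognized text embedded."""
    return ["tesseract", image, output_base, "-l", lang, "pdf"]


def pdf_to_searchable_pdf_cmd(input_pdf, output_pdf, lang="eng"):
    """ocrmypdf adds an OCR text layer to a scanned PDF."""
    return ["ocrmypdf", "-l", lang, input_pdf, output_pdf]
```

Either list can be handed to `subprocess.run(..., check=True)` once the corresponding tool is installed.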
@larry_the 2 years ago ⁺²
Have you watched Gabriel Dropout?
@BrodieRobertson 2 years ago ⁺²
I have, it's great
@damarh 1 year ago
Oh lol, something just hit me: Google Lens uses their own Tesseract OCR for extracting text and sends it to your PC where you are logged in with your Google account.
@davidr2421 2 years ago
Excellent. I made the mistake of writing a couple thousand small notes in the stock Samsung notepad on my phone, and it turns out the garbage developers only allow you to bulk export them as PDFs instead of plain text. This will come in handy.
@billeterk 2 years ago ⁺²
Pdftotext will work better if there’s a text layer. If not… that’s nuts!
@BrodieRobertson 2 years ago
Are they saved as handwritten notes in the PDF or converted into a font?
@davidr2421 2 years ago
They're typed notes, not handwritten. Surely there's a plain text representation stored somewhere, but the program doesn't allow you to export it and they're not accessibly located as text files anywhere I can find.
I'm not sure how the pdfs are constructed but I'll check out pdf to text conversion too and see if that works.
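
The check suggested in this thread (try pdftotext first, fall back to OCR only if there is no text layer) could be sketched like this. `pdftotext` is the Poppler command-line tool; `-layout` keeps the page layout and `-` sends the text to standard output. The function names and the 20-character cut-off are hypothetical choices, not anything from the thread.

```python
import subprocess


def extract_text_layer(pdf_path):
    """Return whatever text pdftotext finds in the PDF (may be empty)."""
    result = subprocess.run(["pdftotext", "-layout", pdf_path, "-"],
                            check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


def has_text_layer(extracted, min_chars=20):
    """Treat the PDF as text-based if enough non-whitespace characters came out.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars
```

If `has_text_layer(extract_text_layer("note.pdf"))` is false, the pages are probably images and would need OCR instead.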
@someonestolemyname 2 years ago
This seems fairly nice for searching tango. Maybe you should also check if it can do well checking the words one by one, perhaps with some other ways of framing them. I am curious how it works on Middle Eastern languages like Arabic and Hindi, though.
@solidhyrax 2 years ago
Are you able to input a URL instead of a local file on your PC? This would be very useful.
@solidhyrax 2 years ago ⁺¹
OK, I just checked, and it does work with URLs. Awesome.
@patrickmclaughlin6013 2 years ago
Anyone know how to set xsane to use tesseract?
@vaisakh_km 2 years ago
😔 still have to use google lens...
@mattaku9430 1 year ago
Google can't just repeat that; Google Drive to Google Docs conversion beats Tesseract
@mskiptr 2 years ago
Reupload?
@BrodieRobertson 2 years ago ⁺¹
Maybe a similar thumbnail
@mskiptr 2 years ago
@BrodieRobertson I just remember first learning about tesseract from some video, and was pretty sure it was yours
@BrodieRobertson 2 years ago
@mskiptr Maybe I mentioned it, but I know I never did a video on it
@uksuperrascal 2 years ago
See your odyssey tips and tricks for my comment
@YannMetalhead 2 years ago
Good video.
@ondeexistirumestatistaeues9566 2 years ago
argh, google, no, thanks
@bologna3048 2 years ago
I'm a huge japan-fan, love the culture, the food and people... but anime/weebs? Cringe.
@leoliu2079 2 years ago
I found using --oem=1 helpful; it forces the new ML model, which helps in a lot of cases
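
For reference, Tesseract's help text documents the engine mode option in the form `--oem N` (flag and value as separate arguments), with 1 selecting the LSTM neural-net engine only. A small sketch of building that command follows; the helper name is hypothetical, and whether the legacy modes work depends on the installed traineddata files.

```python
# OCR engine modes as listed by `tesseract --help-oem`.
OEM_MODES = {
    0: "legacy engine only",
    1: "neural nets LSTM engine only",
    2: "legacy + LSTM engines",
    3: "default, based on what is available",
}


def tesseract_oem_command(image, output_base, oem=1):
    """Build a tesseract command line that selects an OCR engine mode."""
    if oem not in OEM_MODES:
        raise ValueError(f"unknown OEM mode: {oem}")
    return ["tesseract", image, output_base, "--oem", str(oem)]
```

For example, `tesseract_oem_command("page.png", "page")` corresponds to running `tesseract page.png page --oem 1`.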
@LNDFHACKER 2 years ago ⁺¹
Take a look at ocrmypdf