Google Gemini AI Vision - OCR Text Extraction with Python

Tech Expert Tutorials

1.7K views


Published: 8 Nov 2024
Description: Gemini AI OCR Python API
In this video, we show you how to set up the Google Gemini AI API and use it to extract text and other information from images. Later, we will show you how accurate the output is, so please stick around.
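The description promises a look at how accurate the output is. The video's own evaluation method isn't described here; one common, engine-agnostic measure is the character error rate (CER): the edit distance between the OCR output and a hand-checked reference text, divided by the length of the reference. A minimal pure-Python sketch (the function names are illustrative and not part of any Gemini API):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# "I" misread as "l" and "0" misread as "O": 2 errors in 12 characters
print(character_error_rate("Invoice 1024", "lnvoice 1O24"))
```

A CER of 0 means a perfect transcription. Scoring Gemini, Tesseract, or any other engine against the same reference text makes their outputs directly comparable.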
Gemini AI can extract text from images and interpret their contents: the model takes an image as input and answers questions about it. You provide the image by uploading a file. The model has many other abilities that we will cover in another video.
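As a rough sketch of what providing an image looks like at the API level, the snippet below builds a request body for the Gemini REST `generateContent` method with the image embedded as base64 `inline_data`. The model name `gemini-1.5-flash` and the `v1beta` endpoint are assumptions based on the public REST docs; the video itself may use the Python SDK instead. Nothing is sent over the network here.

```python
import base64

# Assumption: model name and endpoint follow the public Gemini REST API
# (v1beta generateContent); check the current docs before relying on them.
MODEL = "gemini-1.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_ocr_request(image_bytes: bytes, mime_type: str = "image/png",
                      prompt: str = "Extract all text from this image.") -> dict:
    """Return a generateContent request body: the image as base64
    inline_data, followed by the text instruction."""
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
```

To send it, POST `json.dumps(body)` to `ENDPOINT` with `Content-Type: application/json` and your API key in the `key` query parameter. Larger files can instead be uploaded through the Gemini File API and referenced by URI.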
For text extraction, Gemini AI performs Optical Character Recognition (OCR): it analyzes images of text, deciphers the characters, and turns them into editable digital text.
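Whatever happens inside the model, the extracted text comes back as ordinary text in the JSON response. A minimal sketch of pulling it out, assuming the REST response shape (`candidates[0].content.parts[*].text`); the SDKs wrap this for you (for example `response.text` in the Python SDK).

```python
def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate in a
    generateContent JSON response."""
    candidates = response.get("candidates") or []
    if not candidates:
        # e.g. the prompt was blocked; promptFeedback usually says why
        raise ValueError(f"no candidates returned: {response.get('promptFeedback')}")
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


sample = {"candidates": [{"content": {"role": "model",
                                      "parts": [{"text": "TOTAL "}, {"text": "$9.25"}]}}]}
print(extract_text(sample))
```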
For image recognition and classification, Gemini AI uses its large language model (LLM) to interpret what it sees in the image you upload.
You can use this model to solve a myriad of problems involving images, documents, chatbots, speech and even writing code.
For example, suppose you ask users to upload an image of a document for a specific purpose, such as proof of address or age. When the image is uploaded, you can ask Gemini AI what the image shows, what text it contains, and what type of document it is. The model can then help verify that the uploaded document is appropriate and contains the necessary information.
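The proof-of-age scenario becomes checkable if you ask Gemini to answer in JSON and then validate the reply in code instead of trusting free text. Everything here is hypothetical: the field names (`document_type`, `full_name`, `date_of_birth`), the accepted document types, and the assumption that the prompt asks for an ISO `YYYY-MM-DD` date.

```python
import json
import re
from datetime import date

# Hypothetical schema that the prompt asks the model to follow.
REQUIRED_FIELDS = ("document_type", "full_name", "date_of_birth")
ACCEPTED_TYPES = {"passport", "driver_license", "national_id"}


def parse_model_json(reply: str) -> dict:
    """Parse the JSON object in a model reply, tolerating a ```json fence."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in reply")
    return json.loads(match.group(0))


def verify_age_document(reply: str, min_age: int, today: date) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    data = parse_model_json(reply)
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not data.get(field)]
    if problems:
        return problems
    if data["document_type"] not in ACCEPTED_TYPES:
        problems.append(f"unexpected document type {data['document_type']!r}")
    born = date.fromisoformat(data["date_of_birth"])  # raises ValueError if malformed
    # Subtract one year if this year's birthday hasn't happened yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    if age < min_age:
        problems.append(f"holder is {age}, below {min_age}")
    return problems
```

An empty list means the document passes. A date returned in another format raises `ValueError`, which is a reasonable signal to ask the user for a clearer image.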
Other examples include extracting data from forms and tables in invoices or receipts, converting handwritten notes into text, and handling multiple languages in one image.
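For tables in invoices or receipts, one option is to ask the model to return the table as Markdown and parse that. A minimal sketch, assuming the prompt requests a pipe table with `Item`, `Qty`, and `Price` columns (hypothetical names):

```python
from decimal import Decimal


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Markdown table into a list of row dicts.

    Lines that don't start with '|' are ignored, as is the |---|---|
    separator row. Escaped pipes inside cells are not handled.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if not lines:
        return []

    def cells(line: str) -> list[str]:
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[1:]:
        if not set(line) - set("|-: "):
            continue  # separator row such as |------|:---:|
        rows.append(dict(zip(header, cells(line))))
    return rows


def receipt_total(rows: list[dict[str, str]]) -> Decimal:
    """Sum Qty * Price with Decimal to avoid float rounding on money."""
    return sum((int(r["Qty"]) * Decimal(r["Price"]) for r in rows), Decimal("0"))
```

Recomputing the total and comparing it with the total printed on the receipt is a cheap way to catch misread digits.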
Want to learn more about AI and its potential applications? Stay tuned for future videos where we explore the fascinating world of AI!
📁 Code repo on GitHub: github.com/Tec...
Related Videos:
▶️ Python, Conda and VSCode Video: • Python Conda and Jupyt...
▶️ Azure OCR Video: • Azure AI Vision API fo...
▶️ GCP OCR Video: • Google Cloud Vision AP...
▶️ OpenAI OCR Video: • OpenAI GPT Vision OCR ...
▶️ Gemini AI OCR Video: • Google Gemini AI Visio...
▶️ AWS OCR Video: • AWS Textract API OCR T...
Related Videos/Playlists:
▶️ Google Cloud Vision API (Part 1): OCR Text Extraction Tutorial - • Google Cloud Vision AP...
▶️ Google Cloud Vision API (Part 2): Object Detection Tutorial - • Google Cloud Vision AP...
▶️ Google Cloud Vision API (Part 3): Landmark Detection Tutorial - • Google Cloud Vision AP...
▶️ Google Cloud Vision API (Part 4): Facial Detection Tutorial - • Google Cloud Vision AP...
▶️ Google Cloud Vision API (Part 5): Label Detection Tutorial - • Google Cloud Vision AP...
▶️ Google Cloud Vision API Playlist - • Google Cloud Vision API
💻 Our channel: / @techexperttutorials
💥 link to subscribe: / @techexperttutorials
▶️ Most recent video: • CSharp Async Await Exp...

Comments • 11

@EvitandTM · 15 days ago
Thanks a lot sir🙏
@aldoseba · 1 month ago
Is there a way to create a doc with images and OCR from something like a scanned book?
@TechExpertTutorials · 1 month ago
Creating a text document from a scanned book is possible, but the accuracy may not be 100%. You would need to edit and correct the mistakes in the output after the OCR is finished. Recreating the images would be more difficult.
@aldoseba · 1 month ago
@TechExpertTutorials That's fine, but I'm looking for a way to do it in one step: OCR and image recognition into a doc file, then rearrange and correct it.
@BhrantoPathik · 2 months ago
Which model is it using for extracting the text from images?
@TechExpertTutorials · 2 months ago
This uses version 1.5 of the Google Gemini large language model to extract the text. This multimodal model combines multiple functions in a single model. Find more details here: ai.google.dev/gemini-api/docs/text-generation?lang=python
@BhrantoPathik · 2 months ago
@TechExpertTutorials I was trying to do the same task using open-source models. Google's OCR engine, Tesseract, is terrible for such tasks, so I tried detecting the text regions with object detection models and then passing the bounding boxes to the OCR, but its performance is not up to the mark. Any suggestions on how to proceed with open-source models?
@TechExpertTutorials · 2 months ago
@BhrantoPathik I have found that open-source models are currently not as accurate as the paid models from OpenAI and Google Gemini. This could change as larger and more robust models are built and released as open source. I also tested pytesseract; see this video for details: ruclips.net/video/UBpFPBVlINw/видео.html
@crisostomoibarra1760 · 1 month ago
Can you teach us how to use Document AI with PHP? Thank you so much. I also want to learn how to use the extracted text to fill in a form. Is that possible?
@TechExpertTutorials · 1 month ago
You should be able to take the extracted text from Gemini AI and add that information to a web or text-based form. I don't know of any third-party tools that would simplify this, but it could be done.
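A minimal sketch of the idea in this reply, assuming the extracted text contains `Label: value` lines. The snippet maps recognised labels to your own form's field names through a hypothetical `FIELD_MAP` and URL-encodes the result as it would be sent by an HTML form POST. It is written in Python like the rest of the video, not PHP.

```python
from urllib.parse import urlencode

# Hypothetical mapping from labels in the extracted text (lowercased)
# to the field names of your own web form.
FIELD_MAP = {
    "name": "full_name",
    "address": "street_address",
    "date of birth": "dob",
}


def extract_fields(ocr_text: str) -> dict[str, str]:
    """Pick 'Label: value' lines out of OCR output and rename them to form fields."""
    form = {}
    for line in ocr_text.splitlines():
        label, sep, value = line.partition(":")  # split on the first colon only
        key = FIELD_MAP.get(label.strip().lower())
        if sep and key and value.strip():
            form[key] = value.strip()
    return form


text = "Name: Jane Doe\nAddress: 1 Main St\nNotes: none"
print(urlencode(extract_fields(text)))
```

Unrecognised labels (like `Notes` above) are skipped, so the user can review and complete the remaining fields by hand.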
@TechExpertTutorials · 2 months ago
Please like and subscribe
