Document Understanding with UiPath's Intelligent OCR - Full Tutorial

  • Published: 9 Jul 2024
  • Learn how to use UiPath Intelligent OCR to extract and process data from structured and unstructured documents, with an in-depth look at the document processing steps:
    - Defining Document Taxonomy with Taxonomy manager
    - Classifying Documents using Keyword based Classifier
    - Extracting Data using Form and ML based Extractors
    - Human Validation with Present Validation Station
    ⏩Fast Forward to:
    00:00 - Start
    00:45 - Types of Document - with fixed and varying format
    01:37 - Basic Steps of any Document Processing
    01:49 - What is Document Taxonomy?
    05:13 - Types of Data Extraction methods in UiPath?
    09:07 - How to use Taxonomy Manager to define Document Taxonomy?
    14:47 - Digitizing Document
    18:15 - How to Classify Documents using Keyword based classifier?
    25:14 - Intro to UiPath's data extraction Scope
    26:40 - How to setup UiPath's Form Based Extractor?
    31:41 - How to setup UiPath's Machine Learning Based Extractor?
    34:31 - How to setup UiPath's Present Validation Station?
    35:13 - How to Export Extraction Result?
    🔔Subscribe:
    / @botbotgo4902
    ▶️ Intelligent Automation with UiPath Playlist:
    • Intelligent Automation...
    📁Git Repository:
    gitlab.com/botbotgo/DocumentP...
    📄Intelligent OCR Documentation:
    docs.uipath.com/activities/do...
    🎵Music:
    www.bensound.com/

Comments • 96

  • @botbotgo4902
    @botbotgo4902  4 years ago +1

    I have updated the workflow by adding the Train Classifier Scope. This lets you train keyword-based classifiers in cases where they are unable to classify the document.

  • @nobi6139
    @nobi6139 3 years ago

    Excellent video! Very detailed. It took the complexities out of document understanding. I learnt a lot from this video. Thanks heaps!

  • @ezrateferra8146
    @ezrateferra8146 3 years ago +1

    Great explanations! Thanks a million

  • @deathsquad383
    @deathsquad383 3 years ago

    Great tutorial, thank you for posting

  • @chandrayeddala6673
    @chandrayeddala6673 3 years ago

    Superb explanation, clear and clean. Thank you.

  • @MateusLyra1991
    @MateusLyra1991 4 years ago

    Hello, awesome video. Many thanks. I am from Brazil and this was really helpful. Looking forward to more videos.

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Thanks for the feedback, Mateus Lyra. Please comment if there is a specific video you are looking for.

  • @ateruel84
    @ateruel84 3 years ago +2

    Amazing explanation, congratulations!

  • @nehaaggarwal8001
    @nehaaggarwal8001 1 year ago

    thank you for this very informative video

  • @anushnayak6080
    @anushnayak6080 3 years ago

    Amazing content. Looking forward to more videos that will help us.
    Thanks

  • @RaniSingh-dx9tt
    @RaniSingh-dx9tt 3 years ago

    Very informative!!

  • @andersjensenorg
    @andersjensenorg 4 years ago +2

    Hey Anurag, awesome work you are putting in 😊👍💪 Kind regards, Anders

  • @visumanelli9360
    @visumanelli9360 1 year ago

    Very Nice

  • @renukadevi1829
    @renukadevi1829 3 years ago

    Beautiful video, please make more such videos.

  • @JS-zm5se
    @JS-zm5se 3 years ago

    Excellent Explanation

  • @mscoder9902
    @mscoder9902 3 years ago

    Thank you

  • @larryding7618
    @larryding7618 2 years ago

    oh my gosh. i hope you can upload more videos.

  • @omololasamson5877
    @omololasamson5877 1 year ago

    Thanks for the good job you're doing. How did you get the endpoint, or is it the same for everyone?

  • @prashantrai5911
    @prashantrai5911 3 years ago +1

    Where do you get this endpoint when using the Machine Learning Extractor? I don't understand that point. Can you elaborate on it?

  • @allthecommonsense
    @allthecommonsense 2 years ago

    31:09 I don't see a "due date" on that invoice, yet you seem to have configured a custom area and edited that process out. Seems to me like a mistake.

  • @sassydebbie
    @sassydebbie 1 year ago

    Hi, good day. Please I can't seem to download those packages you mentioned. Do you have any idea how I can work it out?

  • @viralesvideos
    @viralesvideos 3 years ago

    A question: how do I stop it from showing the percentage or the "Present Validation Station" screen? Every time, it asks me to select the area at 96%, and it always extracts it correctly.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      If you are confident in the extraction confidence percentage, there is no need to use the Present Validation Station activity in the flow. You can check the exported result directly in Excel.

  • @swatikarot7486
    @swatikarot7486 4 years ago +1

    great video. I had a question, why does the message box pop up twice with the outputDT string?

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Hello Swati!
      The dataset created from the export extraction result is a collection of DataTables. This collection has two DataTables, *Simple Field* and *Simple Field Formatted*, which is why you get two message boxes. To check the names of the DataTables yourself, add a message box in the For Each loop with "table.TableName".
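
A rough Python sketch of the idea in this reply, not UiPath code. The two table names come from the reply; the `export_result` dictionary, its sample rows, and the helpers `table_names` and `pick_table` are hypothetical stand-ins for the DataSet and the For Each loop.

```python
# Illustrative stand-in for the DataSet described above: table name -> rows.
# The sample rows are invented for this sketch.
export_result = {
    "Simple Field": [{"Key": "Invoice Number", "Value": "INV-001"}],
    "Simple Field Formatted": [{"Key": "Invoice Number", "Value": "INV-001"}],
}


def table_names(dataset):
    """Return the table names in order, like reading table.TableName in a loop."""
    return list(dataset)


def pick_table(dataset, name):
    """Return the rows of one named table so the output is shown only once."""
    return dataset[name]


for name in table_names(export_result):
    # One iteration per table, which is why two message boxes appear.
    print(name)
```

Selecting a single table by name (or by index) instead of looping over all of them is one way to end up with only one set of output.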

    • @swatikarot7486
      @swatikarot7486 3 years ago

      @@botbotgo4902 Thanks for the response. I did implement the for each loop with "table.TableName" and saw formatted and unformatted output. But if I unchecked the ‘FormatValuesIfPossible’ option in data extraction scope, then there will be duplication of data. How can I get only one set of data here?

  • @sktanaka
    @sktanaka 3 years ago

    Great video, thanks. One question: if you are satisfied with the results, can you remove the "Present Validation Station" command so it does not prompt the user every time? I have dozens of invoices to be processed on an unattended machine.

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Yes, you can remove the Present Validation Station to skip the human in the loop.

  • @MrAyubX
    @MrAyubX 3 years ago

    Great video. What is not clear to me is the documentPath variable you specified in the Digitize Document activity. I do not think you showed how you set that up, though I assume it is a variable that holds the path of the file, correct? If so, could we alternatively specify the file path directly in the document path field without creating the documentPath variable? Thank you

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Yes, your understanding is correct. We can either specify the path directly in the document path field, or create a variable for it and pass that variable in.

  • @tibyanralibi
    @tibyanralibi 3 years ago

    Hi, this is a good video. I have a question about the license for the Intelligent OCR activities. Are the activities free, or do we have to pay for licenses? Thank you

  • @premacharles8610
    @premacharles8610 3 years ago +1

    Anurag, can you please tell me how to extract line items from the invoice along with these details. I want to write it to excel preferably for a case when each document might have different number of line items

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Hi Prema, you can go with the Form Based Extractor to extract the line items from the table.

  • @ronak7480
    @ronak7480 3 years ago

    Hey Anurag,
    thank you for the wonderful explanation.
    I have one issue with the invoice date: it is not coming out properly in the CSV file.
    It comes out like: Key,Value "Month","5".
    The actual date in my PDF is: May 26/20.
    It would be great if you could help with this.

    • @botbotgo4902
      @botbotgo4902  3 years ago

      Can you please check whether the date is present in the text coming out of the Digitize Document activity?
      If not, no extractor will be able to extract the date, and you might have to try other OCR engines.
      If the date is present, you need to do some trial and error with the different extractor activities.

  • @zoeyuwang6123
    @zoeyuwang6123 2 years ago

    Could you share the whole project? I want to study it carefully.

  • @aryashrivastav6187
    @aryashrivastav6187 3 years ago

    What if we have lots of PDFs, each with lots of pages? Can it still extract specific data?

  • @shankota5547
    @shankota5547 3 years ago

    Hello @botBotGo,
    That was a great explanation. Currently I am able to extract a single page with specific extraction fields. How do I loop through all the pages in a PDF file with similar invoices?

    • @botbotgo4902
      @botbotgo4902  3 years ago

      If you are using the Community version, you can only process documents with a maximum of 2 pages at once.
      One workaround would be to use some UiPath PDF activities to break your single PDF file into multiple PDF files and then loop through them.
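
The page arithmetic behind this workaround can be sketched in Python. This is only an illustration under assumptions: `page_chunks` is a hypothetical helper, the 2-page limit is taken from the reply, and the actual splitting would still be done with PDF activities (or any PDF library) using these ranges.

```python
def page_chunks(total_pages, max_pages=2):
    """Split pages 1..total_pages into consecutive (first, last) page ranges.

    Each range covers at most max_pages pages, so every resulting file
    stays within the per-document limit mentioned above.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


# A 5-page PDF would be split into three files covering these page ranges.
for first, last in page_chunks(5):
    print(f"pages {first}-{last}")
```

If every invoice occupies exactly one page, calling the helper with `max_pages=1` yields one range per invoice.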

  • @gauravbatra10
    @gauravbatra10 5 months ago

    Hi Bro, I am not able to select 5 pieces of information on Page 1. I am only able to select one. Are you using the Shift or Ctrl key to select 5 pieces of information? I am working on a 2-page PDF. Please suggest. I am waiting

  • @issacpaul9846
    @issacpaul9846 3 years ago

    Hey, will you do a video on regex-based extraction?

  • @prasadparalikar954
    @prasadparalikar954 3 years ago

    Hello sir,
    I also want to extract the items along with their specified costs into the Excel file. Can I do that?
    Please help

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      You can use the Form Based Extractor to extract the line items in the table.

  • @zoeyuwang6123
    @zoeyuwang6123 2 years ago +1

    I made a process based on your video, but I get an error in one place:
    Data Extraction Scope: Index was outside the bounds of the array.
    And I can't fix it.
    Can you help me?

    • @savagestroke4943
      @savagestroke4943 2 years ago

      When defining the keywords, make sure that you typed "invoice", "receipt", and "walmart" correctly.

  • @rajatdhammi
    @rajatdhammi 3 years ago

    Hi, while setting up the form extractor, you manually specify the location of the second image (you choose 2.jpg), but at the end you change the document path from 2.jpg to 3.jpg.
    If we specify the location manually, how does the form extractor fetch the correct information?

    • @botbotgo4902
      @botbotgo4902  3 years ago +1

      Hey,
      the file that you upload in the form extractor is just for generating a template. So no matter which document you read, it will still work as long as the structure and the positions of the various elements in the document remain the same.
      Having said that, if you try to extract data from an invoice with a different structure, the extraction won't work.

    • @rajatdhammi
      @rajatdhammi 3 years ago

      @@botbotgo4902 OK, got it.
      One more query: when I tried the same with the create doc validation action and wait for validation action activities, with Present Validation Station commented out, I get this error:
      "An extension of type 'UiPath.Activities.Contracts.Persistence.IPersistenceBookmarks' must be configured in order to run this workflow."
      (I have created the storage bucket in Orchestrator.)

  • @chongyihyang309
    @chongyihyang309 4 years ago

    Hi, may I know why you used both the Form Extractor and the ML Extractor? Also, why does the workflow produce 2 sets of the same data table? What do I do if I only need 1?

    • @patilrc
      @patilrc 4 years ago

      A DataSet is a collection of DataTables; you can try Dataset.Tables(0) and check.

    • @botbotgo4902
      @botbotgo4902  4 years ago +1

      Hey, sorry for replying late!
      1. *Why I used both extractors* - I wanted to show that it is possible to combine extractors. It can happen that some attributes cannot be extracted by one of the extractors, and in that case the other extractor is used. The extractors are tried from left to right: only if the leftmost extractor cannot get a particular attribute (or its confidence score is below the set threshold) is the next extractor used. With Configure Extractors you can also decide which attributes are extracted by which extractor.
      2. You get a list of tables (also known as a *DataSet*); always take the first one from the list -> *Dataset.tables(0)*
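
The left-to-right fallback described in point 1 can be sketched in Python as follows. This is a simplified model, not the UiPath implementation: `first_confident_value`, the two toy extractors, their sample values, and the thresholds are all hypothetical.

```python
def first_confident_value(field, extractors):
    """Try extractors in order and return the first sufficiently confident value.

    extractors is a list of (extract, min_confidence) pairs, where
    extract(field) returns (value, confidence) or None if nothing was found.
    """
    for extract, min_confidence in extractors:
        result = extract(field)
        if result is None:
            continue  # this extractor could not find the field
        value, confidence = result
        if confidence >= min_confidence:
            return value
    return None  # every extractor failed; a human would have to enter the value


def form_extractor(field):
    # Hypothetical template-based results: (value, confidence).
    found = {"invoice_number": ("INV-001", 0.95), "total": ("12.00", 0.40)}
    return found.get(field)


def ml_extractor(field):
    # Hypothetical machine-learning results: (value, confidence).
    found = {"total": ("120.00", 0.90)}
    return found.get(field)


extractors = [(form_extractor, 0.80), (ml_extractor, 0.80)]
for field in ("invoice_number", "total", "due_date"):
    print(field, "->", first_confident_value(field, extractors))
```

In this toy run the form extractor supplies the invoice number, the ML extractor takes over for the low-confidence total, and the due date stays empty for manual entry.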

    • @chongyihyang309
      @chongyihyang309 4 years ago

      I see. Thanks for the help

  • @KiranPudi
    @KiranPudi 3 years ago

    How can we use the Intelligent Keyword Classifier in the Classify Document Scope?

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      Intelligent Keyword Classifier for handwritten documents not for unstructured documents

  • @shalinisingh2816
    @shalinisingh2816 4 years ago

    Page 1 has less than 5 selected words as Page Matching Information. Please select at least 5 words.
    This notification appears on the screen when I am creating the template. It keeps popping up again and again, even after extracting the elements.

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Hello Shalini!
      You need to select 5 keywords on the page.
      Please watch from 30:00

    • @shalinisingh2816
      @shalinisingh2816 4 years ago

      @@botbotgo4902 I did it the same way. Let me check again whether I am making a mistake.

  • @souravsingh4305
    @souravsingh4305 4 years ago

    Hi Anuraag. This video is very important for RPA beginners. Thank you for this. But I am facing an issue while creating a template: after I supply a custom keyword for the field I am extracting and configure it, I see a long red error message that we cannot even read.

    • @botbotgo4902
      @botbotgo4902  4 years ago +1

      Hello Sourav!
      I am sorry but I cannot understand what you mean

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 That's cool Anuraag, I could solve the error. Is this solution also applicable to image invoices?

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 Hello Anurag. I'm using the workflow you showed, but it does not always extract the values.

    • @botbotgo4902
      @botbotgo4902  4 years ago +1

      @@souravsingh4305 hello saurav!
      So where are you facing problems? I mean with which extractor are you working?

    • @souravsingh4305
      @souravsingh4305 4 years ago

      @@botbotgo4902 I'm working with the form extractor. However, I have invoices of all types, including PDFs, receipts, images, scanned PDF invoices, etc. Which extractors should I use to get the values from all types of invoices?

  • @shalinisingh2816
    @shalinisingh2816 4 years ago

    I am not getting OmniPage OCR in my activities panel.

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Hello Shalini,
      You need to install this package before you can use it.
      To Install go to 06:37
      1. go to Manage packages in Studio
      2. click on All Packages
      3. Search for UiPath.OmniPage.Activities
      4. Install it

    • @shalinisingh2816
      @shalinisingh2816 4 years ago

      @@botbotgo4902 Yes, It is. Thanks for your prompt response. :)

  • @aakashm.2495
    @aakashm.2495 4 years ago

    Hey Anurag. Thanks for the video.
    How can I perform this on multiple PDFs at a time?

    • @patilrc
      @patilrc 4 years ago

      You can use a For Each loop and provide the path of the folder that contains the multiple PDF files.
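
The same loop-over-a-folder idea, sketched in Python with the standard library only. The helpers `pdf_files` and `process_folder` and the `process_document` callback are hypothetical names; in the workflow, the callback would correspond to the digitize, classify, and extract steps run for one file.

```python
from pathlib import Path


def pdf_files(folder):
    """Return the PDF files directly inside folder, sorted by name."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


def process_folder(folder, process_document):
    """Run process_document on every PDF in folder; map file name -> result."""
    return {path.name: process_document(path) for path in pdf_files(folder)}
```

A call such as `process_folder("invoices", my_pipeline)` (both names are placeholders) would then return one result per PDF found in the folder.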

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Hey Year down!
      Sorry for replying late. I have made a video in which I solve an RPA challenge by extracting data from multiple PDFs - ruclips.net/video/56AOiixQPKY/видео.html
      Let me know if this is what you were looking for.

    • @aakashm.2495
      @aakashm.2495 4 years ago

      @@botbotgo4902 Data Extraction Scope: Index was outside the bounds of the array. I am facing this error.

    • @aakashm.2495
      @aakashm.2495 4 years ago

      @@patilrc Data Extraction Scope: Index was outside the bounds of the array. I am facing this issue.

    • @botbotgo4902
      @botbotgo4902  4 years ago

      @Year Down this mainly happens because the classifier is not able to classify your document. You have to check whether the classification worked and, if it did not, extract the data manually in the Present Validation Station. To check whether classification worked:
      1. After the classification scope activity, add an *IF Activity*
      2. In the *IF Activity*, check whether the condition *classificationResult.Any* is True
      3. In the true section, place your *data extraction scope*
      4. In the false section, add an *assign activity* and assign extractionResults = Nothing
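
The four steps above amount to a guard around the extraction step. A minimal Python sketch of that logic, assuming a hypothetical `run_extraction` helper and an `extract` callback standing in for the data extraction scope:

```python
def run_extraction(classification_results, extract):
    """Extract only when classification produced at least one result.

    Mirrors the IF activity above: a non-empty list plays the role of
    classificationResult.Any, and returning None plays the role of
    assigning extractionResults = Nothing.
    """
    if classification_results:
        # Safe to index: there is at least one classification result.
        return extract(classification_results[0])
    return None  # unclassified document: leave extraction to manual validation


print(run_extraction(["Invoice"], lambda doc_type: f"fields for {doc_type}"))
print(run_extraction([], lambda doc_type: f"fields for {doc_type}"))
```

Without the guard, indexing the first element of an empty result list is the kind of access that fails with an out-of-bounds error.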

  • @laxmipriyapradhan1704
    @laxmipriyapradhan1704 2 years ago

    Sir, can you please do this for PAN card and Aadhaar JPG files? I have tried many times without success, and when I give the whole folder path it shows an error and I don't know why. Please help me with this task: we have folders for different candidates, each candidate has their own PAN and Aadhaar card images, and I need to extract particular fields such as the Aadhaar number and PAN number and store them in a file. Storing them in MySQL would be even better for me. Could you show how to pass a whole folder through the documentPath variable, where each candidate has their own Aadhaar and PAN card? Please, I really need this.

  • @ponnusamyk5258
    @ponnusamyk5258 3 years ago

    Man, is there a link to download the invoice file?

  • @sampledemo2947
    @sampledemo2947 4 years ago

    Hello!! Thanks for the video. This is Rohit S. Lanjewar. Please help me: how can I change the confidence percentage of each invoice field in the Present Validation Station when using the Intelligent Form Extractor in UiPath Document Understanding?

    • @botbotgo4902
      @botbotgo4902  4 years ago +1

      Hello Sample Demo!!
      Sorry for replying late. The confidence score is something that you set per extractor: if an attribute that this extractor should extract scores below the threshold, that field is not extracted. In such cases you can try a combination of extractors, so that if one extractor fails, the next one is used. If all of them fail, the user has to enter the value explicitly.
      Did I answer your question?

  • @sushantshiwakoti5578
    @sushantshiwakoti5578 4 years ago +1

    Can you do it with handwritten documents? It will be helpful for everyone. Thank you

    • @botbotgo4902
      @botbotgo4902  4 years ago

      Hello Sushant!
      For handwritten documents with fixed formats (for example, a bank account opening form), you can use the Intelligent Form Extractor.

    • @sushantshiwakoti5578
      @sushantshiwakoti5578 4 years ago

      @@botbotgo4902 Thank you

    • @ayahabuhantash5948
      @ayahabuhantash5948 4 years ago

      @@botbotgo4902 Is the Intelligent Form Extractor the extractor we used in this video? Thanks a lot.

  • @allthecommonsense
    @allthecommonsense 2 years ago

    Also... seems like a mistake to ASSUME that classification result will only match 1 document type. You never check how many matches it got, and *assume* it's always classificationResult(0)

  • @WebHNT
    @WebHNT 2 years ago

    Can you share the slides with me? The video is interesting and helpful. Thank you!!!

  • @allthecommonsense
    @allthecommonsense 2 years ago

    You overcomplicated the classification keywords by using "Add a new set" instead of just typing the right syntax into the first set to add multiple keywords. No need to have more than 1 set in these examples.

  • @umaramnath1961
    @umaramnath1961 2 years ago

    Hello! I followed your tutorial. I am trying to extract data from the receipt using ML extractor. I used "du.uipath.com/ie/receipts" as the end point but I am not getting the dropdown under the ML extractor while defining the attributes of the document to be extracted. Can you please help me solve this?

  • @tejasvimangal2184
    @tejasvimangal2184 3 years ago

    Thanks for a great informative video. Just had a question in mind. If we have to define a keyword.json file for document classification, then what is the use of taxonomy.json?

    • @mohanrajs8832
      @mohanrajs8832 3 years ago

      The taxonomy identifies the fields that need to be extracted; the bot then extracts those same fields using Intelligent OCR.