Process Complex Tables with AI Builder and Form Processing

  • Published: Aug 1, 2024
  • This video shows you how to tag complex tables in a Form Processing model and train the model to process them. For this demo, I created a sample form that doesn't have a standard tabular structure. I walk you through how to create a Form Processing model and use the new tagging feature for tables.
    I posted the sample training set used in this demo on GitHub below,
    github.com/SteveWinward/Power...
    Details on this feature can be found below,
    flow.microsoft.com/en-us/blog...
    Video timeline below,
    Intro: (0:00)
    Sample Form with Complex Table: (0:49)
    Create the Model: (1:27)
    Create a Collection: (2:20)
    Tag the Documents: (2:58)
    Test the Model: (5:14)
    Publish the Model: (5:59)
    Create a Sample Flow: (6:10)
    Limitations: (10:13)
    Summary: (11:16)
  • Science

Comments • 23

  • @amrkalammansoori4411 21 days ago

    Hi, I am your 1,000th subscriber :). Btw, the video is very informative and addressed my needs.

  • @mannymorales977 3 years ago +1

    Hey, Steve - even with limitations, this is amazing! Thanks so much for sharing!

  • @annapurnabonakurthi9174 3 months ago

    Hi, thanks for the video. Is it possible to add an alphanumeric data variable to our custom model in Power Automate? Thanks.

  • @vasudevaacharya 3 years ago +1

    Thank you for the video, it was quite informative. I have one question - Is it possible to retrain the model which we have already created?

    • @SteveWinward 3 years ago

      Yes you can! Check out the details here.
      docs.microsoft.com/en-us/ai-builder/manage-model#retrain-and-republish-existing-models

  • @rameshbabuc5981 4 months ago

    Thanks, Steve, for this video. One quick question: is it possible to read table rows whose content continues from page 1 to page 2? My use case is below.
    I need to extract information in tabular format from the order confirmation PDFs we receive. Each PDF has multiple items, and each item has a name, description, vendor, and delivery date.
    So the table has four columns (Name, Description, Vendor, Delivery Date), with each row representing an item.
    The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page. For example, when a description starts at the bottom of page 1 and continues on page 2, I am unable to tag the rows that span the two pages.

    For example, suppose this is the PDF:
    -----some text--------------------------------------------
    -----some text---------------------------------------------
    code: 101
    description: this is first item
    Vendor: XYZ1
    delivery date: 12.01.2024

    code: 102
    description: this is second item
    Vendor: XYZ2
    delivery date: 13.01.2024

    code: 103
    description: this is third item

    -------page 1 ends here---------


    -------page 2 begins here--------
    description(Continuing from Page): this is third item Continuing
    Vendor: XYZ3
    delivery date: 14.01.2024

    code: 104
    description: this is fourth item
    Vendor: XYZ4
    delivery date: 15.01.2024

    code: 105
    description: this is fifth item
    Vendor: XYZ5
    delivery date: 16.01.2024

    ---------some text here--------------------------------
    ------------------------------page 2 ends----------------------
    ------------------------------pdf ends----------------------------


    The document cannot be tagged correctly with a custom model when the Description on page 1 continues on page 2. For the above document, the tagged tables look like this:
    Code  Description          Vendor  Delivery Date
    101   this is first item   XYZ1    11.01.2024
    102   this is second item  XYZ2    12.01.2024
    103   this is third item   XYZ3    13.01.2024

    Code  Description                           Vendor  Delivery Date
          Some text are continuing from page 1
    104   this is fourth item                   XYZ4    14.01.2024
    105   this is fifth item                    XYZ5    15.01.2024
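
One possible post-processing workaround, sketched in Python. It assumes each page has already been extracted as its own list of rows (dictionaries keyed by the column names above) and that a row with an empty Code on a later page continues the last item of the previous page. The `merge_continuations` helper and the row layout are hypothetical; they are not part of AI Builder.

```python
from typing import Dict, List

Row = Dict[str, str]
COLUMNS = ("Code", "Description", "Vendor", "Delivery Date")


def merge_continuations(pages: List[List[Row]]) -> List[Row]:
    """Flatten per-page tables, folding rows without a Code into the previous row."""
    merged: List[Row] = []
    for page in pages:
        for row in page:
            code = row.get("Code", "").strip()
            if code or not merged:
                # A row with a Code (or the very first row) starts a new item.
                merged.append({col: row.get(col, "").strip() for col in COLUMNS})
                continue
            # No Code: assume this row continues the previous item.
            previous = merged[-1]
            for col in COLUMNS[1:]:
                extra = row.get(col, "").strip()
                if not extra:
                    continue
                if col == "Description" and previous[col]:
                    previous[col] = previous[col] + " " + extra  # append the text
                elif not previous[col]:
                    previous[col] = extra  # fill a value that was missing so far
    return merged


# Hypothetical per-page results modelled on the example above.
page1 = [
    {"Code": "102", "Description": "this is second item",
     "Vendor": "XYZ2", "Delivery Date": "13.01.2024"},
    {"Code": "103", "Description": "this is third item"},
]
page2 = [
    {"Description": "Continuing", "Vendor": "XYZ3", "Delivery Date": "14.01.2024"},
    {"Code": "104", "Description": "this is fourth item",
     "Vendor": "XYZ4", "Delivery Date": "15.01.2024"},
]
rows = merge_continuations([page1, page2])
for row in rows:
    print(row)
```

The empty-Code rule is only a heuristic: it would merge the wrong rows if the model ever fails to read a Code that is actually present.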

    • @maksymdehtiar2564 2 months ago

      Did you find a solution for this limitation? I am currently facing the same problem.

  • @TheKrutikapadia 1 year ago

    Hi Steve, I have a question. If the table is a list of credit card transactions that spans more than one page of the PDF, and the number of pages varies from client to client, will an AI Builder model work for that?

  • @KrazyMO 2 years ago

    How would you add this information to a database?

  • @banihas22 3 years ago +1

    Hey Steve, using "form processing" I trained a model but when I go to create a cloud flow I don't see the option to "process and save from forms". Any ideas? This is a big issue since it wouldn't let me select my model. Thanks for the content!

    • @SteveWinward 3 years ago +1

      Hmm. I think you have to create Flows that are in a solution to be able to access the AI Builder actions in Power Automate. I’m pretty sure you also need to be in an environment that has a Dataverse database provisioned.

  • @limychelseafc 3 years ago +1

    Hi Steve, do we have to tag each row in the table (even if we have 100 rows)?

    • @SteveWinward 3 years ago

      It depends. If your table is a complex table like the one described in this video, then yes, you do. If you have a simpler table structure, you do not have to tag the individual rows (see the link below on how to do this).
      docs.microsoft.com/en-us/ai-builder/create-form-processing-model#tag-tables

  • @gusgemmiti847 3 years ago +1

    Hi Steve, I'm trying to train on extracting data from Sales Orders. The tables are, as you described, not simple. I've followed your steps and completed the training. However, in the testing phase it doesn't recognize all the tables in any given document, i.e., tables exist on separate pages and are not well segmented by page-breaks.
    Anyway, my question is how can I manually fix the model so that it correctly chooses the tables per page, i.e., I can possibly use the number of pages to ID how many tables should be per document. Also, some text does not seem to be resolving on both the training sets and Test sets.
    Any help would be appreciated.

    • @SteveWinward 3 years ago

      Hey Gus. You are running into some of the limitations with the feature as it is today. I tried outlining those limitations in this section: ruclips.net/video/TULjNIznJYk/видео.html. I think you would want to have different training sets: one with tables on one page, some with tables on multiple pages. Each new page would be its own table that you have to combine after the fact. If actual rows are spanning two different pages, that will be a lot more challenging to resolve. Hope this helps.
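
The "combine after the fact" step could look roughly like the Python sketch below. It assumes each page's table has already been extracted as a list of row dictionaries together with its page number. The `combine_page_tables` name and the input shape are illustrative assumptions, not actual AI Builder output.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def combine_page_tables(page_tables: List[Tuple[int, List[Row]]]) -> List[Row]:
    """Concatenate per-page tables in page order, recording each row's source page."""
    combined: List[Row] = []
    for page_number, rows in sorted(page_tables, key=lambda item: item[0]):
        for row in rows:
            # Copy the row so the caller's data is not modified.
            combined.append({**row, "page": str(page_number)})
    return combined


# Hypothetical input: pages may arrive out of order.
tables = [
    (2, [{"Code": "104", "Vendor": "XYZ4"}]),
    (1, [{"Code": "101", "Vendor": "XYZ1"}, {"Code": "102", "Vendor": "XYZ2"}]),
]
combined = combine_page_tables(tables)
print([row["Code"] for row in combined])
```

This only handles tables that break cleanly between rows; rows that themselves span two pages need extra handling.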

    • @gusgemmiti847 3 years ago

      @SteveWinward Hi Steve, thanks for the quick response. I actually tried splitting the documents into single pages and found keys in the docs that I can use to re-combine the data. However, I'm getting OCR issues even with single pages in both training and testing scenarios. I posted a question with a snapshot of a page here: powerusers.microsoft.com/t5/General-Power-Automate/PDF-quot-Forms-Processing-quot-ocr-missing-text/td-p/915463
      I'm also trying to convert the PDFs into Word documents first and then train the system. However, here I'm running into situations where the conversion does not always handle barcodes that exist in the PDFs.
      I really want this to work. Can you perhaps give some guidance here?
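
For the split-and-recombine approach described above, a minimal Python sketch might group single-page results by a shared key. The field names (`order_number`, `page`, `rows`) and the `regroup_pages` helper are hypothetical; the actual keys depend on what can be read reliably from each page.

```python
from collections import defaultdict
from typing import Dict, List


def regroup_pages(page_results: List[dict], key_field: str = "order_number") -> Dict[str, List[dict]]:
    """Group rows from single-page results by a document key, in page order."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for result in sorted(page_results, key=lambda r: r["page"]):
        grouped[result[key_field]].extend(result["rows"])
    return dict(grouped)


# Hypothetical single-page results from two different sales orders.
results = [
    {"order_number": "SO-2", "page": 1, "rows": [{"item": "C"}]},
    {"order_number": "SO-1", "page": 2, "rows": [{"item": "B"}]},
    {"order_number": "SO-1", "page": 1, "rows": [{"item": "A"}]},
]
orders = regroup_pages(results)
print(orders)
```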

    • @SteveWinward 3 years ago

      I sent your post over to someone in the product group to take a look. They posted a response to your forum question. Hope that helps.

  • @jodybryan4683 3 years ago +1

    Hi Steve, I have a trained model that identifies 300+ fields from a PDF file. I would like to write the values of those fields to an Excel file. When I build a flow that uses an Excel file saved on SharePoint, I get this message when I try to save the flow: "The dynamic schema response from API 'commondataserviceforapps' operation 'GetPredictionSchema' is too large, only schemas with at most '1024' properties are supported". Is there a limit on the number of fields one can extract using the AI model? I have another model with only 10+ fields that works OK. I need all 300+ fields. Is there a way around this issue?

    • @SteveWinward 3 years ago

      Check out my response to your comment on the Power Platform Form Processing with AI Builder video.

  • @sempaxbolong6259 2 years ago +1

    How do you handle multiple invoices in one PDF? There is no feature called document separation.