Training UiPath Document Understanding ML Models - Data Manager - Part 3 | RPA

• Published: 7 Sep 2024

Comments • 35

  • @MukeshKala 3 years ago +4

    This Complete Playlist is in my To Learn List.

    • @LahiruFernando 3 years ago

      Awesome!!! Also feel free to share the feedback :)

  • @TharaRaman 3 years ago +1

    Thanks for sharing it. A much-awaited video!

  • @hemantbonde6348 2 years ago +1

    Hi Lahiru
    I found the issue. I wasn't clicking the tick sign in front of auto_retraining after setting it to "True", so it was not getting updated. Now I'm able to get the result of the Training Pipeline schedule run as "Successful". Once again, thanks for supporting me in the learning process.

    • @LahiruFernando 2 years ago +1

      Hey.. So sorry for the late reply.. Had a really crazy day yesterday.. I'm so happy to hear that you were able to get it resolved.
      Thank you so much for letting me know brother!!!
      Always feel free to reach out for anything!!

    • @hemantbonde6348 2 years ago

      @@LahiruFernando Sure brother

  • @yashobantadash6670 1 year ago +2

    Awesome as always, bro! Do we need to always choose the minor version as the latest (which is 5 here), or 0, bro? In one of your videos I heard you say that we should always choose the minor version as 0. A lil bit confused here bro 😶

    • @LahiruFernando 1 year ago +1

      Hey bro...
      Yes.. we should always select the minor version as 0 for Training Pipelines. We can use the latest minor version for Evaluation and ML Skill creation. This is what UiPath recommends as well.
      In one of the videos I used the latest, because I wanted to do the training faster on a smaller dataset. But it has to be 0 all the time for training runs. Hope this clears your doubt.

    • @yashobantadash6670 1 year ago

      @@LahiruFernando cleared! thanks bro 😊😊✌✌

  • @aditivaishampayan9781 2 years ago +1

    Thanks for the video. Very helpful. I have trained & re-trained the ML model on the DEV tenant, so now I want to move it to the PROD tenant. Should I just upload the trained dataset to the PROD tenant and then create a new Package -> new ML Skill? Or should I upload the labelled dataset and start from training on the PROD tenant? Thanks in advance, Aditi

    • @LahiruFernando 2 years ago +2

      Hello Aditi,
      Thank you so much for the awesome feedback. Really means a lot.
      Regarding your question: You can download the data you exported from the Data Manager (basically download that entire dataset), and upload it to the Production environment. Create a Data Labeling session there as well and map it to the same dataset so that it can auto retrain on the live data.
      Next, use that dataset to train your model in production and deploy it as a skill..
      Hope this helps.. Feel free to reach out anytime..

  • @thanuthomas7003 2 years ago +1

    How many docs are ideal for the training run schedule?

    • @LahiruFernando 2 years ago +1

      Hi Thanu,
      You need to have a minimum of 10 unique documents for a training run. This minimum requirement comes from the Data Manager session. Any number of documents above 10 is fine.
      If you are thinking about the accuracy of the extraction with a training job that has only 10 documents - it may not be sufficient. In that case, you can use the same Export you did from Data Manager to run multiple training jobs to improve the accuracy.
      Over time, when you have more and more documents to feed, it will have more and more unique documents to train on. This will eventually improve the accuracy.
      A quick note on how the training works:
      It uses both the new data and the old data for every training run. Hence, every training run has a much larger dataset to train on than the previous run.
      Hope this helps.
      Let me know if anything is confusing.. Would be happy to help.
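      The cumulative behaviour described in the reply above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not UiPath code; the numbers mirror the 10-document minimum mentioned above:

      ```python
      def dataset_size_per_run(initial_docs, new_docs_per_run, runs):
          """Unique documents available at each training run, assuming every
          run reuses all previously exported documents plus the newly
          uploaded ones (as described in the reply above)."""
          sizes = []
          total = initial_docs
          for _ in range(runs):
              sizes.append(total)
              total += new_docs_per_run
          return sizes

      # Starting from the 10-document minimum and adding 5 new docs between runs:
      print(dataset_size_per_run(10, 5, 4))  # [10, 15, 20, 25]
      ```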

  • @prakashr9493 2 years ago +1

    Hello Sir,
    Quick question: what happens to the export schedule if the bot processed only 5 documents and uploaded only those 5 documents to the dataset using ML Extractor training? You mentioned 10 documents are required to export.

    • @LahiruFernando 2 years ago

      A good question.
      So, 10 documents is the minimum requirement for the initial training. Once you do the initial training, those 10 documents will always remain in the Data Manager. Once you run the process on 5 documents, those 5 will get uploaded to Data Manager automatically (based on how you configure the trainer activity). Then the Data Manager has 15 documents, and that is what it is going to export and train on again..
      Hope this helps!

  • @hemantbonde6348 2 years ago +1

    Hi Lahiru
    Now, after training through the UiPath workflow, I cannot see any fine-tune folder getting created in the Dataset. Also, the Machine Learning Extractor Trainer activity now has ML Skill and Output Folder properties, but no Dataset and Project properties.
    After the completion of the export schedule, I cannot see any auto-export folder getting created in the export folder of the Dataset.
    And while scheduling the pipeline, when I selected the path with export\ in "Choose input dataset", the pipeline fails with: Exception: Document type default not valid, check that document type data is in dataset folder and follows folder structure.

    • @LahiruFernando 2 years ago

      Hey.. I'm sorry for the late reply.. on vacation :-)
      The training pipeline does not generate the fine-tune folder in the dataset, bro. The fine-tune folder gets generated through the output given by the Machine Learning Extractor Trainer.
      About the missing properties of the trainer activity: I think you are using an older version. Look for the latest version (also enable the pre-release option) and use that in the workflow. It should have the properties to select the AI Center project and dataset :-)
      On the error you got while running the training pipeline: since you do not have the fine-tune folder yet, you have to point to the exported folder. That is the folder with the name you provided during Export from Data Manager.
      Let me know if anything is not clear bro. Happy to help!

    • @hemantbonde6348 2 years ago +1

      @@LahiruFernando Heartfelt thanks for the help and for answering my questions. I'll check it out and will let you know if I get stuck somewhere. Have a great vacation!

    • @LahiruFernando 2 years ago +1

      @@hemantbonde6348 awesome.. always feel free to reach out my friend

    • @hemantbonde6348 2 years ago

      @@LahiruFernando Thanks, bro. It worked. Just one question: I want to create a schedule for both the export and the pipeline at the same time. But while creating the pipeline schedule, I need to point to the auto-export folder, which gets created only after the scheduled export is run. And pointing to the export\ path gives an error. So how do I create a schedule for both the export and the pipeline at the same time?

  • @iamraj9419 3 years ago +1

    Hi Lahiru,
    If we have PDFs which contain more than one page, some fields remain the same across all the pages of a particular PDF (invoice no, name, company address), while fields like total and net amount are missing on some pages.
    Can you let me know what the workaround is here?
    Thanks in advance

    • @LahiruFernando 3 years ago

      Hi Iamraj,
      Good question. In such a case, you can still have all the fields in the ML model. However, the one requirement we need to meet is having at least 10 documents that contain all the fields you are looking for.
      So for example:
      Let's say you are training an invoice model.
      You got 10 unique documents, but 2 of them do not have the company address field. This violates the minimum requirement. So what you can do is find a few more new documents that have the company address field. Let's say you added 5 more documents and those have all the fields.
      So now you have 15 documents, and for the company address field you have 13 unique docs. This is enough to meet the requirement. From there onwards, all you need to do is build on top of that :)
      Hope this helps
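      The per-field counting described above can be sketched in Python. This is purely illustrative (not a UiPath API); the document ids and field names are hypothetical, and the 10-document threshold is the one mentioned in the reply:

      ```python
      MIN_DOCS_PER_FIELD = 10

      def fields_below_minimum(labeled_docs):
          """labeled_docs maps a document id to the set of fields labeled in it.
          Returns the fields present in fewer than MIN_DOCS_PER_FIELD documents."""
          counts = {}
          for fields in labeled_docs.values():
              for field in fields:
                  counts[field] = counts.get(field, 0) + 1
          return {f for f, c in counts.items() if c < MIN_DOCS_PER_FIELD}

      # 15 invoices in total, but only 13 of them contain "company_address":
      docs = {f"inv_{i}": {"inv_no", "total", "company_address"} for i in range(13)}
      docs.update({f"inv_x{i}": {"inv_no", "total"} for i in range(2)})
      print(fields_below_minimum(docs))  # set() -> every field meets the minimum
      ```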

    • @str8totehtop 3 years ago +1

      I've communicated with someone from UiPath. As far as I know, if you confirm splitting your pages, it considers every page a new document for now. But there will be a new feature in an upcoming release where you can decide not to split the document; then it will look for certain fields over the span of the whole document. I hope I got your problem right and that the info I received was correct, because I am looking forward to that feature too :)

  • @hemantbonde6348 2 years ago +1

    Hi Lahiru
    I want to create a schedule for both the export and the pipeline at the same time. But while creating the pipeline schedule, I need to point to the auto-export folder, which gets created only after the scheduled export is run. And pointing to the export\ path gives an error. So how do I create a schedule for both the export and the pipeline at the same time?

    • @LahiruFernando 2 years ago

      Hello Hemant,
      Yes.. the Export will create several folders in the dataset, one for each export. So you will need to run this schedule before the training schedule time.
      In the training schedule, you have to point to the Export folder, as that contains all the auto-export folders.
      This is what I have done in one of my training schedules:
      "Invoice_Training/export/"
      You mentioned you get an error.. Can I know what the error is?
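      One way to avoid pointing a training pipeline at an empty export folder is to check for completed exports first. A minimal sketch (not UiPath tooling; the folder layout follows the "Invoice_Training/export/" example above, but the subfolder names are assumptions):

      ```python
      import tempfile
      from pathlib import Path

      def has_completed_export(dataset_root):
          """True once the dataset's export folder contains at least one
          export subfolder (each completed export appears as its own folder)."""
          export_dir = Path(dataset_root) / "export"
          return export_dir.is_dir() and any(p.is_dir() for p in export_dir.iterdir())

      # Demo on a throwaway directory structure:
      root = Path(tempfile.mkdtemp()) / "Invoice_Training"
      (root / "export").mkdir(parents=True)
      print(has_completed_export(root))            # False: no export produced yet
      (root / "export" / "auto_export_1").mkdir()  # hypothetical auto-export folder
      print(has_completed_export(root))            # True: safe to train on export/
      ```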

    • @hemantbonde6348 2 years ago +1

      @@LahiruFernando
      I'm getting the following error:
      Error Details : Pipeline failed due to ML Package Issue
      Exception: Document type default not valid, check that document type data is in dataset folder and follows folder structure.
      Do I have to create the training schedule only once the execution of the export schedule is completed? Because right now I'm creating the export and training schedules together. I also made sure that the export run was completed before the training schedule's start time.

    • @LahiruFernando 2 years ago

      @@hemantbonde6348 hmm.. Okay..
      So just a quick check..
      When you create your Training Pipeline, just check whether you do the following..
      1. Select Training Run
      2. Select package
      3. select major version
      4. select minor version (always 0)
      5. set Auto Retraining option to TRUE
      6. Select Dataset "Export" folder
      7. Run the pipeline (run now / schedule / time-based)
      Have you done as above?
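      For reference, the checklist above can be expressed as a simple validation over a configuration dict. This is only an illustration of the rules (Training Run, minor version 0, auto-retraining on, export folder as input); the keys and values here are hypothetical, not UiPath's actual API:

      ```python
      def validate_training_config(cfg):
          """Return a list of problems with a hypothetical pipeline config."""
          problems = []
          if cfg.get("pipeline_type") != "TRAIN":
              problems.append("pipeline type must be a Training Run")
          if cfg.get("minor_version") != 0:
              problems.append("minor version must be 0 for training pipelines")
          if cfg.get("auto_retraining") is not True:
              problems.append("auto_retraining must be set to True (and confirmed)")
          if not str(cfg.get("input_dataset", "")).rstrip("/").endswith("export"):
              problems.append("input dataset should point at the dataset's export folder")
          return problems

      cfg = {
          "pipeline_type": "TRAIN",
          "package": "InvoiceModel",    # hypothetical package name
          "major_version": 1,
          "minor_version": 0,
          "auto_retraining": True,
          "input_dataset": "Invoice_Training/export/",
      }
      print(validate_training_config(cfg))  # [] -> configuration looks consistent
      ```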

    • @hemantbonde6348 2 years ago +1

      @@LahiruFernando Yes. I have followed all the above steps.

    • @LahiruFernando 2 years ago

      @@hemantbonde6348 Hmm.. That's weird.. I tried the same thing on my personal environment trying to replicate your issue, and it is working fine for me. Same configuration works in my other environments too..
      Maybe you are missing something. Shall we connect over whatsapp, to have a call to discuss and see how to resolve this?
      - You can ping me on my email below and I will share my number :)
      Email: lahirufernando90@gmail.com

  • @mozammilrizwan8611 2 years ago

    Hi Lahiru, thank you for such wonderful training information. I followed all three parts (1, 2, 3) and created the same, but in the Present Validation Station some data is still missing even though the model was trained multiple times. What do I need to do to increase the accuracy?

    • @LahiruFernando 2 years ago

      Hello, thank you so much for the awesome feedback my friend.. I'm sorry for the late reply.
      So about your question:
      To improve the accuracy, I would suggest adding more documents for the training in Data Manager.
      I'll give you an example.
      We once did an invoice processing project with a lot of vendors to process. What we did was collect a minimum of 10 invoices from each vendor (each vendor has its own format), use that entire collection in the Data Manager, and do the labeling. The training pipeline run on that data gave a very accurate output. We performed multiple training runs on that large dataset to further improve the accuracy..
      Hope this gives an idea about how to improve the accuracy on yours?
      Feel free to reach out to me any time for anything.. Happy to help

    • @mozammilrizwan8611 2 years ago

      @@LahiruFernando Thank you for the reply. As you suggested, I had already trained a vendor with 11 invoices multiple times, but a few fields are still not being extracted even after training more than 30 times. Should I email you the invoices?

    • @LahiruFernando 2 years ago

      @@mozammilrizwan8611 When you created your training pipeline, how did you configure the minor version? Try setting it to the lowest and train a couple of times to see if that works..