What an evolved version of 2010 Notepad instructions. Loved it.
I like the style of this video, damn!! Nice job
Awesome. Easy to follow. Ty
What if I have just one docx file that I want to fine-tune the model on? How can I achieve that?
If it's a huge docx file, you could break it up and fine-tune on that. If you just need some sort of RAG/grounding, a no-code RAG solution like this would be best: ruclips.net/video/G3vyK5ZgSjQ/видео.html (works with docx files).
@@nodematic Thanks! Would data preparation be the same no matter the file type? Just extract the text, then convert it into key pairs?
For fine-tuning, yes, you'll just need the extracted text to train on. Something like python-docx could do the extraction. Split the extracted text into small enough samples that you don't run out of memory during the fine-tuning process (get the best GPU you can for this, like a Colab A100).
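A minimal sketch of that extract-and-chunk step, assuming `pip install python-docx`; the file name and chunk size are placeholders, and simple character-based chunking is just one option:

```python
# Extract all paragraph text from a .docx file, then split it into
# fixed-size chunks to use as fine-tuning samples.
from docx import Document

doc = Document("report.docx")  # placeholder file name
text = "\n".join(p.text for p in doc.paragraphs if p.text.strip())

chunk_size = 2000  # characters per sample; tune to fit GPU memory
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
print(f"{len(chunks)} training samples prepared")
```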
But the f16 merge was saving as .bin. How can I save as safetensors natively?
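If the merge produced a standard Hugging Face transformers checkpoint, one likely fix is re-saving with `safe_serialization=True` (a sketch; the directory names are placeholders):

```python
# Reload the merged f16 model and re-save it in safetensors format.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("merged-f16")  # placeholder path
model.save_pretrained("merged-f16-safetensors", safe_serialization=True)
```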
Please show us how to make embeddings with Vertex AI and how to deploy something as a web app or Android app from Vertex. Thank you so much for the wonderful content.
Thanks for the suggestion - we'll add that to our video plans.
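In the meantime, a minimal sketch of text embeddings on Vertex AI, assuming `pip install google-cloud-aiplatform`; the project ID, region, and model name are placeholders:

```python
# Generate a text embedding with the Vertex AI SDK.
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="my-project", location="us-central1")  # placeholders
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
embeddings = model.get_embeddings(["Thanks for the wonderful content."])
print(len(embeddings[0].values))  # dimensionality of the embedding vector
```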
Agree. That would be a great piece of content to watch.
@@nodematic Thanks!
@@xXWillyxWonkaXx yup
I have a folder with many PDFs, and I would like to fine-tune a model to summarize these PDFs and answer questions on my website. Is there a way to do that using the example from this video?
You would first have to figure out the parsing logic to correctly extract the text, then feed it to a summarizer. If all you want is a summary, there are many good models available on Hugging Face that you can use directly, or just get a Gemini API key and use Gemini for it; it should do a decent job.
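A minimal sketch of that parse-then-summarize flow, using pypdf for extraction and a Hugging Face summarization pipeline (one choice among many; assumes `pip install pypdf transformers`, and the file name is a placeholder):

```python
# Extract text from a PDF and summarize it with a pretrained model.
from pypdf import PdfReader
from transformers import pipeline

reader = PdfReader("document.pdf")  # placeholder file name
text = "\n".join(page.extract_text() or "" for page in reader.pages)

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
# BART's input window is ~1024 tokens, so summarize a truncated slice here;
# a real pipeline would chunk the document and summarize each chunk.
summary = summarizer(text[:3000], max_length=150, min_length=40)
print(summary[0]["summary_text"])
```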
@ thank you
@@igorcastilhos Also, if you want to use the PDFs as context to answer questions, you'll probably need to parse them and then put them into a vector store so they can be retrieved when needed; this is called RAG (retrieval-augmented generation).
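A minimal sketch of that parse-embed-retrieve loop, using pypdf and ChromaDB as one possible vector store (assumes `pip install pypdf chromadb`; the file name, chunk size, and question are placeholders):

```python
# Parse a PDF, store its chunks in a vector database, and retrieve
# the chunks most relevant to a question (to pass to an LLM as context).
import chromadb
from pypdf import PdfReader

reader = PdfReader("resolution.pdf")  # placeholder file name
text = "\n".join(page.extract_text() or "" for page in reader.pages)
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]

client = chromadb.Client()  # in-memory; use PersistentClient for disk storage
collection = client.create_collection("resolutions")
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

results = collection.query(query_texts=["What does the resolution decide?"], n_results=3)
print(results["documents"][0])  # top matching chunks to feed the model
```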
@@abdulsami5843 I'm using Ollama with the Web UI tool. Inside it, I'm adding PDFs of attorneys' resolutions to the Knowledge collection so they can ask about them whenever they want. The main reason to use Ollama (llama3.2) instead of the OpenAI API is that it's free. But I'm having problems accessing the Web UI (localhost:3000) on our server from my machine; it doesn't show the models installed on the server machine.
@@abdulsami5843 Also, the RAG feature doesn't have very good documentation. In my case, we have a shared folder on Microsoft Windows (like C:/), and the attorneys and advocates will send new PDFs into it through the website. I wanted to use RAG for that, but it is very hard.
Will this software building studio ever be available for public use?
Yes, it's available now at softwaresim.com/pricing/
very nice
Hello, I'm from Brazil, and I'm new to AI. I would like to build an AI to automate university work, as I have a lot of it and can't keep up. I want an AI that can write papers like me, using my own texts. What adjustments or training should I do? Do I need to change a parameter?
My plan is to use about 10 review texts from my articles and one expanded summary. I want the AI to write like me without triggering AI plagiarism detection.