🚀 Ready to Master Real-World AI and Land High-impact Roles in just 4 weeks? Join my Hands-on Training and Transform your Career Today 🎯: www.maryammiradi.com/training
Thank you! This is great information. Subscribed! - Will be watching your previous videos too! 😊
@@learning_rust really glad to hear!! 👋Let me know if you have any questions
Can I join training if I don’t have programming skills?
@@hassantristed You need Python programming skills first; they will help you a lot in your AI and data science career. I can send you free tutorials, and it takes about 3 weeks to learn. Then you can start my program. If you are interested, let me know.
@@maryammiradi please share. I tried to learn python and know the basics.
Well done!
Complete, with excellent structure. Thank you!
Great video, ma'am! I'd love to see more of these.
Thanks a lot! There will definitely be more videos like this.
I absolutely adore the way you present information with such calmness and clarity. It's like a soothing breeze on a hot day! Thank you immensely, Dr. Maryam. Words can't fully capture the depth of my gratitude for your incredible guidance!
Thank you very much for the kind words!! 🙏 Super glad to hear!!
your videos are amazing and very excellent in the context of artificial intelligence and machine learning ,lots of love and thanks from lahore pakistan
Thank you and glad to hear!
That was such a clear and insightful explanation. Thanks a lot
Glad to hear it is insightful 👋
Your efforts are always seen in content 💯Always appreciated.
Thanks a lot! Glad to hear!
Beautifully explained. Thank you...
Thank you very much! Very glad to hear!!
Thank you! It is very useful and much appreciated 😊
@@Naejbert really glad to hear!!
Excellent video-useful, informative, and easy to comprehend and follow. On a separate note, the presentation is beautiful, and the editing work is great! Please continue publishing this amazing material. 👍
Very glad to hear!! Thank you very much for the kind words
Great work.
Thank you!
Looks like the tutorial crashes at the moment of looking for a directory; it needs a little fix. The video is absolutely awesome and insightful.
Thank you! What do you mean by crash? Do you mean in the Google Colab Notebook?
What I never see in these RAG tutorials is how you modify the contents of the RAG. How do you remove old/redundant data from the vector database and update or add new content? How do you modify an existing PDF, for example, and have the database remove the old data and add the new data?
Here's how you can manage it using LangChain and vector databases like ChromaDB or FAISS:
1. Deleting Old Data:
In ChromaDB, deletion happens on a collection (not on the client itself); you can remove old or redundant data by specifying the document ID or a metadata filter:
collection.delete(ids=["your_document_id"])
2. Updating Content:
To update an existing PDF or document, first delete the associated embeddings using the document's ID or metadata:
collection.delete(ids=["your_document_id"])
After modifying your document, re-process the PDF and generate new embeddings using LangChain's text splitter and embedding functions.
3. Adding New Content:
For new documents, just process them with the same text-splitting and embedding workflow:
collection.add(documents=texts, embeddings=embeddings, ids=["new_document_id"])
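The delete-then-re-add update pattern above can be sketched end-to-end with a toy in-memory store (`TinyVectorStore` and `fake_embed` are illustrative stand-ins, not a real vector-DB API):

```python
import hashlib

class TinyVectorStore:
    """A toy in-memory stand-in for a vector DB, illustrating the
    delete -> re-embed -> add update pattern."""

    def __init__(self):
        self.docs = {}  # id -> (text, embedding)

    def add(self, texts, embeddings, ids):
        for doc_id, text, emb in zip(ids, texts, embeddings):
            self.docs[doc_id] = (text, emb)

    def delete(self, ids):
        for doc_id in ids:
            self.docs.pop(doc_id, None)  # no-op if the id is unknown

    def update(self, doc_id, new_text, embed_fn):
        # The update pattern: delete the stale entry, then re-embed and re-add.
        self.delete([doc_id])
        self.add([new_text], [embed_fn(new_text)], [doc_id])

# Fake embedding function; a real pipeline would call an embedding model here.
def fake_embed(text):
    digest = hashlib.md5(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

store = TinyVectorStore()
store.add(["old revenue figures"], [fake_embed("old revenue figures")], ["doc_1"])
store.update("doc_1", "revised revenue figures", fake_embed)
```

The key point is that the document ID is the handle for the whole lifecycle: as long as you store a stable ID per source chunk, updating a PDF is just deleting by those IDs and re-adding the re-processed chunks.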
It's an excellent video, but why did you remove the Colab file?
Thanks! It is now back in the description. I removed it accidentally when formatting the rest of the description.
@maryammiradi thank you so much, this helps us learn 🙏
@@takimdigital3421 Happy to hear that this helps you guys! Are you a Data Scientist?
Thank you for this! I'm curious about your thoughts on the HyDE retriever as opposed to multi-query? I'm struggling with accuracy in a software manual of 4500 pages I'm trying to have in my RAG pipeline. I'm attempting semantic chunking of the document today. But when I've tried splitting the PDF, it often loses context. This seems to happen often with concepts widely dispersed across the document. For example, "financial management" is referenced frequently in other sections of the manual but has its own dedicated sections.
When comparing HyDE (Hypothetical Document Embeddings) retriever to multi-query retriever for your use case, there are some key differences to consider. HyDE generates a hypothetical document to create embeddings and then uses those embeddings to retrieve documents, which can sometimes help retain better context, especially for long and dispersed documents like your 4500-page software manual. It works well when concepts are scattered across a large document because it relies more on the embedding space to find semantically related content.
On the other hand, multi-query retriever focuses on generating multiple different queries based on the original input query, which can be useful to cover different ways the content could be expressed. However, it may struggle if concepts are heavily dispersed without a central unifying section, as it might retrieve too many unrelated sections.
Since you're attempting semantic chunking but facing context loss, HyDE might perform better in capturing broader contextual links across the entire document. I also suggest experimenting with the chunk size: sometimes a smaller size loses too much context, while a larger one may work better depending on the density of information. You could also consider using a sliding-window approach when splitting the document to maintain overlapping context across chunks.
Let me know how it goes or if you need more detailed answers.
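To make the HyDE idea concrete, here is a minimal self-contained sketch; a toy bag-of-words "embedding" and a stubbed `fake_llm` stand in for a real embedding model and LLM, and `hyde_retrieve` is a hypothetical helper name:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query, docs, generate_fn, k=1):
    # HyDE: first ask an LLM for a *hypothetical* answer document,
    # then retrieve real chunks closest to that document's embedding,
    # rather than to the (often short and vague) query itself.
    hypothetical = generate_fn(query)
    q_emb = embed(hypothetical)
    ranked = sorted(docs, key=lambda d: cosine(q_emb, embed(d)), reverse=True)
    return ranked[:k]

# Stub generator; a real pipeline would call an LLM here.
fake_llm = lambda q: "financial management covers budgeting reports and ledgers"

docs = [
    "budgeting reports and ledgers are configured in the finance module",
    "network settings are configured in the admin panel",
]
top = hyde_retrieve("how do I manage finances?", docs, fake_llm)
```

Note how the hypothetical answer shares vocabulary with the relevant chunk even though the query itself does not; that is exactly why HyDE can help with concepts dispersed across a large manual.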
can we use semantic splitter instead of recursive splitter?
Yes, but the results can be different for different use cases! It depends on whether cohesion or structure matters more. Recursive Text Splitter: This strategy breaks the text recursively at various levels (e.g., paragraphs, sentences) until each chunk is below a certain token limit. It's commonly used when you need to ensure chunks fit within the input size of the language model while maintaining logical coherence at the sentence or paragraph level. Ideal for texts where preserving hierarchical structure (like chapters, sections) matters and token limits need to be strictly respected.
Semantic Text Splitter: It analyzes the meaning of the text and splits it based on semantic units (like coherent thoughts or topics). This can lead to better context preservation within chunks compared to recursive splitting, which is based primarily on structural rules. A semantic splitter is more beneficial for tasks where maintaining contextual integrity and understanding within the chunks is crucial. It is especially helpful for documents where sections vary greatly in length or meaning, and you want the model to have a clearer context.
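As a rough illustration of the recursive strategy described above, here is a simplified character-based sketch; real splitters (like LangChain's) usually count tokens and handle separators more carefully, and `recursive_split` is a hypothetical helper name:

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Simplified recursive splitting: try the coarsest separator first,
    and fall back to finer ones only when a piece is still over the
    chunk_size limit (measured in characters here, not tokens)."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        if sep in text:
            pieces = text.split(sep)
            chunks, current = [], ""
            for piece in pieces:
                candidate = (current + sep + piece) if current else piece
                if len(candidate) <= chunk_size:
                    current = candidate  # keep packing pieces together
                else:
                    if current:
                        chunks.append(current)
                    if len(piece) > chunk_size:
                        # Recurse with finer separators on oversized pieces.
                        chunks.extend(recursive_split(piece, chunk_size, separators))
                        current = ""
                    else:
                        current = piece
            if current:
                chunks.append(current)
            return chunks
    # No separator worked: hard-cut as a last resort.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

text = "Chapter 1. Setup basics.\n\nInstall the tool. Configure paths. Run checks."
chunks = recursive_split(text, chunk_size=30)
```

A semantic splitter would instead decide the boundaries by embedding similarity between adjacent sentences, so a chunk ends where the topic shifts rather than where a separator happens to fall.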
How can we deploy it further or make an interface for the user to interact with?
We can wrap a web-based interface around it, for example with Streamlit. I will make videos in the future on Streamlit for LLM apps.
@@maryammiradi thanks. we are waiting
Hello ma'am, where is your time series analysis video?
It will be part of mini-trainings to come.
I have been working with a RAG for a while and I haven't had much luck with the accuracy of local models, especially if the data is structured data, which I assume is because it doesn't ingest well into a vector database. For example, if you have a PDF that contains an extension list, the data is structured and not connected in a way that works well with cosine similarity search. I had thought about putting the structured data into a SQL database and using semantic routing to decide whether to pull context from a SQL database versus a vector database. Do you have an opinion about using SQL databases with vector databases to increase accuracy?
That's a great question! You're right: structured data like extension lists or database-like information doesn't always fit well into vector databases, since they're optimized for unstructured text and work best with cosine similarity on dense embeddings. For highly structured data, SQL databases can be much more accurate.
Hybrid Approach (Vector + SQL): What you mentioned about using SQL for structured data and vector databases for unstructured data is spot on. You could implement semantic routing where the system decides which database to query based on the nature of the input.
SQL excels at handling structured queries with exact matches, while vector databases can handle the fuzziness needed for semantic or context-based searches.
After retrieving the relevant data from both databases, you can rank or combine the results using a post-processing step that merges the outputs from the vector and SQL queries. This way, you leverage the strengths of both systems for better overall accuracy.
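A minimal sketch of that hybrid idea, assuming a keyword-based router in place of a real semantic router, SQLite for the structured side, and a word-overlap stand-in for vector search:

```python
import sqlite3

# -- Structured side: an extension list in SQLite (exact-match queries). --
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extensions (name TEXT, ext TEXT)")
conn.executemany("INSERT INTO extensions VALUES (?, ?)",
                 [("Alice", "1001"), ("Bob", "1002")])

# -- Unstructured side: a stand-in for real vector search. --
docs = ["Reset your voicemail PIN from the settings menu.",
        "Conference calls support up to eight participants."]

def vector_search(query):
    # Placeholder for embedding search: pick the doc sharing the most words.
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return max(docs, key=score)

def route(query):
    """Naive router: exact-lookup phrasing goes to SQL, everything else
    to the vector side. A production system would use an LLM or a
    trained classifier for this routing decision instead of keywords."""
    if any(kw in query.lower() for kw in ("extension of", "extension for")):
        name = query.rstrip("?").split()[-1]
        row = conn.execute("SELECT ext FROM extensions WHERE name = ?",
                           (name,)).fetchone()
        return row[0] if row else "not found"
    return vector_search(query)

sql_answer = route("What is the extension for Alice")
fuzzy_answer = route("How do I reset my voicemail PIN")
```

The post-processing step mentioned above would then merge or rank results when a query plausibly touches both stores, rather than committing to a single branch.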
@@maryammiradi I appreciate the response. Maybe you could create a future video on how to optimize that kind of approach using local models.
@@jim02377 Sure! Let me know what other challenges you face.
From a NEW subscriber: "Wow, wow, wow, ... wow, ... wow!" [to quote our latest "Pitch Meeting"] ...
Very glad to hear!! Thank you very much!!
Am I able to purchase your trainings if I'm in the USA?
Of course! A high percentage of my students are from the United States. Let me know if you have other questions.
Cool. But by 'fully local' do you mean 'running on google colab'?
"Local" for a RAG means you can connect the retriever to your LLM without using the internet. I have used Llama 3 through Ollama on my local computer. This is suitable for industries with sensitive data that do not allow the data scientist to connect to LLMs over the internet. Google Colab is just a notebook with the ability to use a GPU, as CPUs are not enough.
@@maryammiradi Very cool.
Is it common practice to regenerate the embeddings and vector DB in case you don't get good answers or you find a better way of chunking? You might have hundreds of PDF files.
Chunking strategy can significantly affect the quality of answers. If your initial chunk sizes were too small or too large, or if important context got split across chunks, it might be helpful to reprocess the documents with a refined chunking method. This is particularly important when dealing with hundreds of PDFs, as better chunking can lead to more accurate embeddings and improved retrieval.
Regenerating embeddings isn’t always necessary unless the chunking change is substantial or the initial embeddings didn’t capture the meaning well. You can also update just a portion of the database rather than starting from scratch if needed.
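One way to avoid regenerating everything is to record a content hash per chunk at index time and re-embed only the chunks whose content changed. A small sketch under that assumption (`chunks_to_reembed` and the stored hashes are hypothetical, not part of any specific vector-DB API):

```python
import hashlib

def content_hash(chunk):
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(chunk.encode()).hexdigest()

def chunks_to_reembed(new_chunks, stored_hashes):
    """Compare freshly produced chunks against the hashes recorded at
    index time; return only chunks that are new or changed, so the
    rest of the vector DB can be left untouched."""
    return [c for c in new_chunks if content_hash(c) not in stored_hashes]

# Hashes saved when the PDFs were first indexed (hypothetical store).
stored = {content_hash("Q2 revenue grew 15%."),
          content_hash("Margins held steady.")}

new_chunks = ["Q2 revenue grew 15%.",       # unchanged: skipped
              "Margins improved sharply."]  # changed: re-embedded
todo = chunks_to_reembed(new_chunks, stored)
```

With hundreds of PDFs this turns a full rebuild into an incremental one, though a wholesale change of chunking strategy still invalidates all the stored hashes and forces a full re-index.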
Hope this helps! 😊
Question: how could you have 20+ years of AI experience when the first generative AI was published in 2006?!
Recent generative AI with LLMs is just a very small part of AI. AI science includes machine learning, deep learning, and generative models like GANs, and the field itself has been around for some 50 years. As you can see, fathers of AI like Yann LeCun and Geoffrey Hinton have been active for quite a number of years.
@@maryammiradi thanks for clarifying ;)
Can you also share the PDF files for the Colab code?
Of course!! Here are the links:
s201.q4cdn.com/141608511/files/doc_financials/2025/Q225/Rev_by_Mkt_Qtrly_Trend_Q225.pdf
s201.q4cdn.com/141608511/files/doc_financials/2025/Q225/Q2FY25-CFO-Commentary.pdf
s201.q4cdn.com/141608511/files/doc_financials/2025/q2/78501ce3-7816-4c4d-8688-53dd140df456.pdf
Hi, I need your help: I have an existing RAG pipeline with Llama 3.1 and am trying RAG for Persian characters, but it doesn't work; Llama answers the questions incorrectly. I have done a lot of research, and it seems the embedding model does not work for Persian characters. Can you suggest an embedding model that embeds Persian correctly?
Yes, you can obtain embeddings for Persian text to build a local Retrieval-Augmented Generation (RAG) system. Several multilingual and language-specific models support Persian and can generate high-quality embeddings suitable for tasks like semantic search, clustering, and retrieval in RAG systems.
Here are some options you might consider:
LaBSE (Language-agnostic BERT Sentence Embedding):
Developed by Google, LaBSE is designed to produce language-agnostic sentence embeddings for 109 languages, including Persian.
LASER (Language-Agnostic SEntence Representations):
Developed by Facebook AI, LASER supports over 90 languages, including Persian.
mBERT (Multilingual BERT):
Supports 104 languages, Persian included.
XLM-R (XLM-RoBERTa):
An improved version of XLM, supporting 100 languages. Provides robust performance across various languages, including Persian.
Persian-Specific Models:
HooshvareLab/bert-base-parsbert-uncased: A BERT model pre-trained on large Persian corpora.
m3hrdadfi/albert-fa-zwnj: An ALBERT model optimized for Persian.
You can fine-tune models like mBERT or XLM-R using the Sentence Transformers library to produce better sentence embeddings for Persian.
This might require some labeled data for tasks like paraphrase identification or semantic textual similarity in Persian.
Hope this helps.
@@maryammiradi I can't thank you enough. I'll do research on these models, try them, and share the results with you.
I hope Ollama will not take the whole memory of my PC. Why not use HuggingFace-hosted Llama 3? I will try to code your tutorial though :) Since it will be on Colab, I don't think it makes any difference. So, I will code it.
Hugging Face is also great, but Ollama is for when you want to work locally. You will need a GPU; in Google Colab I used a T4, which is a small GPU. That will really help.
Ollama is nice since you can test your prompts interactively as well as run code. I use AWS Sagemaker JupyterLab.
@@vncstudio Hugging Face’s hosted models are fantastic, especially for cloud-based work like Google Colab or AWS SageMaker JupyterLab. Colab's T4 GPU or SageMaker's resources are great for running the model without worrying about local hardware limitations.
However, Ollama is specifically designed for running LLMs locally without needing cloud infrastructure. This can be useful if you want complete control over your environment, especially if you're working offline or want to avoid cloud costs.
@@maryammiradi My requirement is to work in a regulated cloud environment. So it would either be Ollama or Transformers in offline mode. 🙂
I don't find the 4 PDFs in the notebook folders. Is it possible to put the PDFs into the folder?
I can do that, but then after one day Google Colab removes temporary data. I can give you the links to the PDFs.
@@maryammiradi Yes, would you please share the links for me to download? Maybe Google does not allow attaching the PDFs.
Would you please share the PDF links? Thank you.
Of course!! Here are the links:
s201.q4cdn.com/141608511/files/doc_financials/2025/Q225/Rev_by_Mkt_Qtrly_Trend_Q225.pdf
s201.q4cdn.com/141608511/files/doc_financials/2025/Q225/Q2FY25-CFO-Commentary.pdf
s201.q4cdn.com/141608511/files/doc_financials/2025/q2/78501ce3-7816-4c4d-8688-53dd140df456.pdf
@@maryammiradi Thank you very much
Still no free alternative to ElevenLabs?
ElevenLabs is one of the best solutions for high-quality AI-generated voices, but there are some free alternatives: Coqui TTS, Google Text-to-Speech API, and Mozilla TTS.