Synthetic DATA Generation using LANGCHAIN 🦜️🔗

  • Published: Aug 21, 2024
  • In this video, I will show you how to create synthetic data using LangChain and OpenAI models.
    Synthetic data refers to artificially generated data that imitates the characteristics of real data without containing any information from actual individuals or entities. It is typically created through mathematical models, algorithms, or other data generation techniques. Synthetic data can be used for a variety of purposes, including testing, research, and training machine learning models, while preserving privacy and security.
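As a minimal illustration of the "mathematical models" route mentioned above (a classical statistical technique, not the LangChain/LLM approach used in the video), the sketch below fits a very simple per-column model to a few hypothetical records (mean and standard deviation for numeric columns, observed values for categorical ones) and samples new rows from it. The records and column names are made up; note that sampling each column independently discards any correlation between columns.

```python
import random
import statistics

# Hypothetical "real" records, used only to fit the simple model.
REAL_ROWS = [
    {"age": 34, "charge": 1200.0, "plan": "basic"},
    {"age": 51, "charge": 2300.0, "plan": "premium"},
    {"age": 29, "charge": 950.0, "plan": "basic"},
    {"age": 45, "charge": 1800.0, "plan": "standard"},
]


def fit_column_models(rows):
    """Fit a per-column model: (mean, stdev) for numbers, value list for categories."""
    models = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            models[column] = ("numeric", statistics.mean(values), statistics.stdev(values))
        else:
            # Keeping the full list preserves the observed category frequencies.
            models[column] = ("categorical", values)
    return models


def sample_rows(models, n, seed=0):
    """Draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for column, model in models.items():
            if model[0] == "numeric":
                _, mean, stdev = model
                row[column] = round(rng.gauss(mean, stdev), 2)
            else:
                row[column] = rng.choice(model[1])
        synthetic.append(row)
    return synthetic


if __name__ == "__main__":
    for row in sample_rows(fit_column_models(REAL_ROWS), n=3):
        print(row)
```

None of the sampled rows is copied from the input, but with only a handful of source records such a model can still leak information, so this is an illustration rather than a privacy guarantee.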
    Happy Learning 😎
    👉🏼 Links:
    GitHub repo: github.com/sud...
    LangChain documentation: python.langcha...
    ------------------------------------------------------------------------------------------
    ☕ Buy me a Coffee: ko-fi.com/data...
    ✌️Patreon: / datasciencebasics
    ------------------------------------------------------------------------------------------
    🔗 🎥 Other videos you might find helpful:
    🔥 Databricks playlist: • 30 Days Of DataBricks
    ⛓️ Langflow: • ⛓️ langflow | UI For 🦜...
    ⛓️ Flowise: • Flowise | UI For 🦜️🔗 L...
    🔥Chainlit playlist: • Chainlit
    🦜️🔗 LangChain playlist: • LangChain
    ------------------------------------------------------------------------------------------
    🤝 Connect with me:
    📺 YouTube: www.youtube.co...
    👔 LinkedIn: / sudarshan-koirala
    🐦 Twitter: / mesudarshan
    🔉Medium: / sudarshan-koirala
    💼 Consulting: topmate.io/sud...
    #langchain #llm #synthetic #syntheticdata #datasciencebasics

Comments • 27

  • @seththunder2077 · 10 months ago +1

    This is amazing! Can you please make a more comprehensive version of this that uses real data as an example? It doesn't have to be medical, just something that shows the full procedure.

  • @hadikhantec · 2 months ago

    Thanks! That's a very practical use case. Can you make a full-scale video?

  • @nasiksami2351 · 3 months ago

    Great tutorial! Is there an open-source implementation of this approach available?

  • @aanchalrawat · 5 months ago

    Really amazing!

  • @teja3925 · 1 month ago

    Hello,
    How do I generate data when there are two tables with a PK/FK relationship? Is the model capable enough to generate such related data?
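Nothing in a few-shot prompt forces an LLM to keep primary and foreign keys consistent, so whatever generation approach is used, it seems prudent to validate the output afterwards. A minimal stdlib sketch, assuming the two generated tables have already been parsed into lists of dicts; the table and column names (`customers`, `orders`, `customer_id`) are hypothetical.

```python
from collections import Counter

# Hypothetical generated tables: customers (parent) and orders (child).
customers = [
    {"customer_id": 1, "name": "A. Rivera"},
    {"customer_id": 2, "name": "B. Chen"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.5},
    {"order_id": 12, "customer_id": 3, "amount": 12.0},  # FK with no parent row
]


def duplicate_keys(rows, key):
    """Return primary-key values that appear more than once (a PK must be unique)."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, count in counts.items() if count > 1)


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


if __name__ == "__main__":
    print("duplicate customer ids:", duplicate_keys(customers, "customer_id"))
    print("orphan orders:", orphan_rows(orders, "customer_id", customers, "customer_id"))
```

Rows flagged by either check could then be dropped or regenerated.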

  • @pseudoartist · 4 months ago

    Awesome, bro, awesome!

  • @devyanshrastogi · 9 months ago

    I saw your video about fine-tuning Llama 2 on your own data. Can you please make a similar video on fine-tuning Zephyr or Mistral 7B on Google Colab using Abhishek Thakur's AutoTrain, and then show how to use the fine-tuned model?

  • @sivaprasadatla · 1 month ago

    Please share the approach for synthetic data generation using Azure OpenAI, as I have an Azure OpenAI key.

    • @datasciencebasics · 1 month ago

      Hello, you can quickly use Azure OpenAI by importing Azure OpenAI from LangChain.
      For reference, here is the link: python.langchain.com/v0.2/docs/integrations/llms/azure_openai/

    • @sivaprasadatla · 1 month ago

      @datasciencebasics Thanks a lot! I will check.

  • @henkhbit5748 · 10 months ago

    Interesting video 👍 I'm curious: if you have lookup fields with only 4 possible values, are the generated values still valid after generation? Also, for fields produced by some algorithm, for example a bank account number, does the generated value still pass the check constraint for that field when generation is based on the few-shot examples? And can this also be done using an open-source LLM?
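Few-shot examples show the model what valid values look like but do not enforce them, so one option is to check every generated row afterwards. A minimal sketch, assuming the lookup field is a hypothetical `status` column with four allowed values and that the "bank number" is an IBAN, whose ISO 13616 mod-97 checksum is a standard check constraint (only the checksum is verified here, not country-specific lengths).

```python
import string

# Hypothetical lookup field with exactly four allowed values.
ALLOWED_STATUS = {"open", "closed", "pending", "cancelled"}
IBAN_CHARS = set(string.ascii_uppercase + string.digits)


def iban_is_valid(iban):
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35) and require a remainder of 1."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not set(compact) <= IBAN_CHARS:
        return False
    rearranged = compact[4:] + compact[:4]
    # int(ch, 36) maps "0"-"9" to 0-9 and "A"-"Z" to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def invalid_rows(rows):
    """Return (row index, reason) pairs for rows that break either constraint."""
    problems = []
    for index, row in enumerate(rows):
        if row["status"] not in ALLOWED_STATUS:
            problems.append((index, "status not in lookup table"))
        if not iban_is_valid(row["iban"]):
            problems.append((index, "IBAN checksum failed"))
    return problems


if __name__ == "__main__":
    generated = [
        {"status": "open", "iban": "GB82 WEST 1234 5698 7654 32"},
        {"status": "archived", "iban": "GB82 WEST 1234 5698 7654 33"},
    ]
    print(invalid_rows(generated))
```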

  • @prashantt022 · 8 months ago

    Good content, very helpful. Could you advise?
    If we check the statistical correlation between the real and synthetic data, would it be above 90%?

    • @datasciencebasics · 8 months ago +1

      I haven't checked it personally. That would be a good check, though, before using this in real use cases.

  • @Player13.917 · 2 months ago

    I am unable to create a two-level nested JSON using this example. Can anyone help here?

  • @ankit85jain · 8 months ago

    Could you suggest what other open-source models we can use to generate synthetic data?

    • @datasciencebasics · 8 months ago

      I haven't tried other open-source models myself. You can try them and see if it works. Also, one thing to check is how statistically close the synthetic and real data are.
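One simple way to quantify "how statistically close" a single numeric column is: the two-sample Kolmogorov-Smirnov statistic, i.e. the largest vertical gap between the empirical CDFs of the real and synthetic values (0 means identical distributions of the samples, 1 means no overlap). The sketch computes it with the standard library only; applying it per column, and choosing a threshold for "close enough", are assumptions left to the reader.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    real_sorted = sorted(real)
    syn_sorted = sorted(synthetic)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in real_sorted + syn_sorted:
        cdf_real = bisect_right(real_sorted, value) / len(real_sorted)
        cdf_syn = bisect_right(syn_sorted, value) / len(syn_sorted)
        d = max(d, abs(cdf_real - cdf_syn))
    return d


if __name__ == "__main__":
    real_ages = [25, 32, 47, 51, 62, 38, 44]
    synthetic_ages = [28, 35, 44, 58, 60, 70, 31]
    print(round(ks_statistic(real_ages, synthetic_ages), 3))
```

If a p-value is also needed, SciPy's `scipy.stats.ks_2samp` computes the same statistic along with one.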

  • @harshadahadawale9533 · 8 months ago

    I have built an application using the same code, but I'm getting an output parser error when passing sample data to the LangChain library.

  • @user-yi8lk1ki9y · 7 months ago

    Hi, good video. Can we use LangChain for multi-table data generation with referential integrity?

    • @ankit85jain · 6 months ago

      This video just explains the same example that LangChain gives in its documentation. I am also looking for more real-world, scenario-based examples of data generation.

  • @ShubhamKumar-je5dm · 6 months ago

    I'm using AzureChatOpenAI instead of ChatOpenAI and it's not working. Any idea?

  • @sebiraj149 · 8 months ago

    Could you let me know which versions of OpenAI and LangChain were used in this video?

    • @datasciencebasics · 8 months ago

      I used the latest versions available when the video was uploaded (Oct 27), so you can look them up by searching for each package on this site:
      pypi.org/

  • @orlandocastellanos9263 · 9 months ago

    Which framework is better for enterprise applications, Haystack or LangChain?

    • @datasciencebasics · 9 months ago

      I haven't explored Haystack yet, so I can't say which one, but knowing both might be beneficial!

    • @orlandocastellanos9263 · 9 months ago

      @datasciencebasics Thanks for the recommendation, but is LangChain good enough to work at scale in production?

    • @datasciencebasics · 9 months ago

      It depends on what kind of app you want to build and deploy. The underlying models are key, as LangChain is just the framework. That said, this field is still evolving and constant upgrades are necessary.