LLMs: Data Privacy and Protection, PII Anonymisation

  • Published: 6 Sep 2024
  • python.langcha...
    #datascience #machinelearning #deeplearning #datanalytics #predictiveanalytics
    #artificialintelligence #generativeai #largelanguagemodels #naturallanguageprocessing
    #computervision #transformers #embedding #graphml #graphdatascience
    #datavisualization #businessintelligence #montecarlosimulation #simulation #optimization
    #python #aws #azure #gcp

Comments • 22

  • @SridharKumarKannam
    @SridharKumarKannam  1 month ago +1

    If you found this content useful, please consider sharing it with others who might benefit. Your support is greatly appreciated :)

  • @syedibrahimkhalil786
    @syedibrahimkhalil786 26 days ago

    Subbed. Are there any applications of LLMs for maintaining or enhancing "crowdsourced data quality", or for "improving the transparency and trustworthiness of the data anonymization process"?

    • @SridharKumarKannam
      @SridharKumarKannam  26 days ago +2

      AFAIK, Microsoft Presidio is the best tool for data anonymization and PII detection.

    • @syedibrahimkhalil786
      @syedibrahimkhalil786 26 days ago

      @SridharKumarKannam Aha. Looking forward to any other LLM-based videos on crowdsourced data quality. Thanks!

  • @ghrangelr
    @ghrangelr 7 months ago

    Thank you very much.

  • @user-iv3jf5iv2z
    @user-iv3jf5iv2z 6 months ago +1

    Is a database needed, or is the mapping stored in LangChain's buffer memory? I'm thinking from an application-level perspective: if multiple prompts reach the LLM at the same time, how does it de-mask (de-anonymize) the response for the right prompt?

    • @SridharKumarKannam
      @SridharKumarKannam  6 months ago

      It's stored in memory (a buffer). What you suggested is useful for a production-level application: store the mappings in an external database. I'm not clear about your second question; it should work fine even with multiple prompts at the same time.
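A minimal sketch of the reply's production suggestion: keep the placeholder-to-original mappings in an external database instead of in memory. It assumes SQLite (Python's built-in `sqlite3`) as the store; `SqliteMappingStore` is a hypothetical class, and the e-mail regex is a crude stand-in for Presidio's detectors, not LangChain's or Presidio's API. Keying every mapping by a request ID is what keeps concurrent prompts apart: both requests below receive the same placeholder, yet each reply is restored to its own value.

```python
import re
import sqlite3

# Crude stand-in for a PII detector (e-mail addresses only). A real
# pipeline would use Presidio's analyzer here instead of a regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class SqliteMappingStore:
    """Hypothetical store: placeholder -> original value, keyed by request ID."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pii_map ("
            " request_id TEXT NOT NULL,"
            " placeholder TEXT NOT NULL,"
            " original TEXT NOT NULL,"
            " PRIMARY KEY (request_id, placeholder))"
        )

    def anonymize(self, request_id: str, text: str) -> str:
        # Continue numbering if this request already has stored mappings.
        (start,) = self.conn.execute(
            "SELECT COUNT(*) FROM pii_map WHERE request_id = ?", (request_id,)
        ).fetchone()
        rows = []

        def replace(match: "re.Match[str]") -> str:
            placeholder = f"<EMAIL_{start + len(rows) + 1}>"
            rows.append((request_id, placeholder, match.group(0)))
            return placeholder

        masked = EMAIL_RE.sub(replace, text)
        with self.conn:  # commit all mappings for this call in one transaction
            self.conn.executemany("INSERT INTO pii_map VALUES (?, ?, ?)", rows)
        return masked

    def deanonymize(self, request_id: str, text: str) -> str:
        # Only this request's mappings are used, so concurrent requests never mix.
        for placeholder, original in self.conn.execute(
            "SELECT placeholder, original FROM pii_map WHERE request_id = ?",
            (request_id,),
        ):
            text = text.replace(placeholder, original)
        return text


store = SqliteMappingStore()
masked_a = store.anonymize("req-a", "Email alice@example.com about the invoice.")
masked_b = store.anonymize("req-b", "Email bob@example.org about the refund.")
print(masked_a)  # Email <EMAIL_1> about the invoice.
print(masked_b)  # Email <EMAIL_1> about the refund.

llm_reply = "I have drafted a message to <EMAIL_1>."
print(store.deanonymize("req-a", llm_reply))
print(store.deanonymize("req-b", llm_reply))
```

A real multi-threaded service would need one connection per thread (or a server database), since a `sqlite3` connection is by default restricted to the thread that created it.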

  • @utkarshashinde9167
    @utkarshashinde9167 3 months ago

    Very informative videos, sir. Please just add a link or any reference to the notebook.

  • @dr.salilpattnaik3429
    @dr.salilpattnaik3429 8 months ago

    Thank you very much. Very nice, crisp and clear presentation. I learned a lot. Can you please share the code?

  • @ukcp265
    @ukcp265 2 months ago +1

    How do you handle PII in tabular data, such as CSV or Excel files?

    • @SridharKumarKannam
      @SridharKumarKannam  2 months ago

      AFAIK, there isn't any direct way unless you turn your tabular data into a string.
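A minimal sketch of the "turn it into a string" route for CSV data: each cell is treated as a short string, passed through a text anonymizer, and the table is written back with the same shape. `anonymize_text`, `anonymize_csv` and the two regexes are hypothetical stand-ins for a Presidio-style anonymizer; only Python's standard `csv` module is assumed.

```python
import csv
import io
import re

# Crude regex stand-ins for a text anonymizer. Names, addresses, etc. need an
# NER-based detector such as Presidio; these patterns only catch two shapes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def anonymize_text(text: str) -> str:
    """Replace every detected span with its entity label, e.g. <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def anonymize_csv(src: str) -> str:
    """Run the text anonymizer over each cell string and rebuild the table."""
    reader = csv.reader(io.StringIO(src))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # header row, assumed to contain no PII
    for row in reader:
        writer.writerow([anonymize_text(cell) for cell in row])
    return out.getvalue()


data = (
    "name,contact,note\n"
    "Alice,alice@example.com,call +44 20 7946 0958\n"
    "Bob,bob@example.org,no phone\n"
)
print(anonymize_csv(data))
```

Note that `Alice` and `Bob` pass through untouched, because a regex cannot recognise names; that is the job of an NER-based detector. If the schema is known in advance, masking a whole column such as `name` is simpler and does not depend on detection at all.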

  • @karthikb.s.k.4486
    @karthikb.s.k.4486 8 months ago +1

    Nice explanation. Where can we find the code, please?

    • @SridharKumarKannam
      @SridharKumarKannam  8 months ago

      python.langchain.com/docs/guides/privacy/presidio_data_anonymization/qa_privacy_protection

    • @karthikb.s.k.4486
      @karthikb.s.k.4486 8 months ago

      @SridharKumarKannam Thank you

  • @seththunder2077
    @seththunder2077 8 months ago

    I was wondering how exactly I can do this with ConversationalRetrievalChain, because I'm not using LCEL; it's still buggy and a bit confusing.

    • @SridharKumarKannam
      @SridharKumarKannam  8 months ago

      The Presidio library is from Microsoft; LangChain simply integrated it into its framework. You can use Presidio standalone: github.com/microsoft/presidio
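A sketch of the standalone idea without LCEL: anonymize the question before it reaches the chain, then restore the original values in the answer. It assumes the existing chain can be wrapped as a plain function from question string to answer string (for ConversationalRetrievalChain that means a small adapter, not shown). `with_privacy`, `fake_chain` and the e-mail regex are hypothetical; the regex only stands in for Presidio's detection step.

```python
import re
from typing import Callable, Dict, Tuple

# Crude stand-in for Presidio's detectors: e-mail addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each e-mail for a placeholder; return the text and the mapping."""
    mapping: Dict[str, str] = {}  # placeholder -> original
    reverse: Dict[str, str] = {}  # original -> placeholder, so repeats share one

    def replace(match: "re.Match[str]") -> str:
        original = match.group(0)
        if original not in reverse:
            placeholder = f"<EMAIL_{len(mapping) + 1}>"
            mapping[placeholder] = original
            reverse[original] = placeholder
        return reverse[original]

    return EMAIL_RE.sub(replace, text), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def with_privacy(chain: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any question -> answer callable; the chain only sees placeholders."""

    def wrapped(question: str) -> str:
        masked, mapping = anonymize(question)  # mapping lives per call
        return deanonymize(chain(masked), mapping)

    return wrapped


# Hypothetical stand-in for an existing (non-LCEL) chain; records its input.
seen = []


def fake_chain(question: str) -> str:
    seen.append(question)
    return f"Drafted a reply for: {question}"


ask = with_privacy(fake_chain)
print(ask("Write to alice@example.com and cc alice@example.com"))
print(seen[0])  # Write to <EMAIL_1> and cc <EMAIL_1>
```

Because the mapping is created inside each call, two questions handled at the same time do not share placeholders.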

  • @satyamsharma4692
    @satyamsharma4692 3 months ago

    Any library or language pack that we can use for Indian data?

    • @SridharKumarKannam
      @SridharKumarKannam  3 months ago +1

      I haven't used anything specific to India. I'll let you know if I come across anything.

    • @chad9756
      @chad9756 13 days ago +1

      @SridharKumarKannam Yes, please. If there's anything we can do to train on such a dataset, I would really appreciate a video on it.
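On the question about Indian data: a minimal sketch of pattern-based detection for two Indian identifiers, assuming their common shapes (PAN: five uppercase letters, four digits, one uppercase letter; Aadhaar: 12 digits, optionally grouped 4-4-4). These shape checks are assumptions, skip any checksum validation, and will over-match. In Presidio, regexes like these would typically be registered as custom recognizers through its `PatternRecognizer` class; the function names below are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed shapes only; no checksum validation is performed.
INDIAN_PATTERNS = {
    "IN_PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
    "IN_AADHAAR": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def find_indian_ids(text: str) -> List[Tuple[str, str]]:
    """Return (entity_label, matched_text) pairs for every pattern hit."""
    hits: List[Tuple[str, str]] = []
    for label, pattern in INDIAN_PATTERNS.items():
        hits.extend((label, m.group(0)) for m in pattern.finditer(text))
    return hits


def mask_indian_ids(text: str) -> str:
    """Replace every hit with its entity label, e.g. <IN_PAN>."""
    for label, pattern in INDIAN_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


sample = "PAN ABCDE1234F, Aadhaar 1234 5678 9012."
print(find_indian_ids(sample))
print(mask_indian_ids(sample))  # PAN <IN_PAN>, Aadhaar <IN_AADHAAR>.
```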