Use ChatGPT, Mistral and ollama for Text Processing in R | Step-By-Step Tutorial

  • Published: 21 Aug 2024
  • DESCRIPTION AND CODE
    Using web API calls, you can communicate with a whole lot of AI chatbots. But setting up the API calls for different chatbots can be a bit tedious. That's why my new {tidychatmodels} package gives you a common interface for these kinds of chat models. You can find everything you've seen in this video at albert-rapp.de...
    GITHUB REPO
    You can find the package on GitHub at github.com/Alb...
    📈 CREATE EFFECTIVE CHARTS
    Check out my video course to create insightful data visualizations with ggplot at arapp.thinkifi...
    MORE VIDEOS
    📺 Avoid duplicate R code in 150 seconds • Avoid duplicate code w...
    📺 Shiny modules in 100 seconds • Shiny Modules in 100 S...
    📺 Fast explainer playlist • Explainer videos
    Subscribe at 👉 / @rappa753
    MORE CONTENT
    - weekly 3-minute newsletter about R, DataViz and webdev at 3mw.albert-rap...
    - LinkedIn at / albert-rapp-9a5b9b28b
    #rstats #dataviz #ggplot #dplyr

Comments • 29

  • @rappa753
    @rappa753  2 months ago

    If you enjoyed this video and want to level up your R skills even further, check out my latest video courses:
    📍Data Cleaning Master Class at data-cleaning.albert-rapp.de/
    📍Insightful Data Visualizations for "Uncreative" R Users at arapp.thinkific.com/courses/insightful-data-visualizations-for-uncreative-r-users

  • @Aaqib..
    @Aaqib.. 5 months ago +4

    Best videos on R on YouTube, thanks a lot

    • @rappa753
      @rappa753  5 months ago

      You're welcome 🤗

  • @infuriatinglyopaque57
    @infuriatinglyopaque57 5 months ago +2

    Looks super cool, can't wait to try this out! Some feature requests: functions to create vector embeddings from PDFs, which could then be included in the chat via RAG. It would also be useful to include support for the Anthropic API, since the new Claude 3 models seem to be rivalling GPT-4. For future videos, it would be cool to see how your beautiful visualization techniques could be used to visualize the chat history, or to create tables or figures that compare the responses of different models to the same prompt, or the responses of the same model to different prompts.

    • @rappa753
      @rappa753  5 months ago

      Anthropic support was just pushed to the package repository. If you update, you should be able to use Anthropic :) Beware, though, that you will have to use the new api_version argument in create_chat(), because Anthropic requires it.
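
      A minimal sketch of what that might look like, assuming the same pipeline verbs shown in the video; the model name, token limit, and version string below are placeholders, not confirmed package defaults:

      ```r
      library(tidychatmodels)

      # Sketch only: Anthropic requires a version header, passed here via the
      # new api_version argument of create_chat().
      create_chat('anthropic',
                  api_key = Sys.getenv('ANTHROPIC_API_KEY'),
                  api_version = '2023-06-01') |>      # placeholder version string
        add_model('claude-3-opus-20240229') |>        # placeholder model name
        add_params(max_tokens = 300) |>
        add_message('Summarize this text: ...') |>
        perform_chat() |>
        extract_chat()
      ```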

  • @gecarter53
    @gecarter53 5 months ago +1

    Great! Looking forward to related videos where you add additional functionality.

  • @MKhan-zo8xo
    @MKhan-zo8xo 5 months ago +1

    This is cool! In my opinion, having the skills to leverage AI will become a necessity for any data-facing position soon. More videos showing the application of AI for data analysts/scientists would be great!

    • @rappa753
      @rappa753  5 months ago +1

      Will try my best to add some more in the near future 🥳

  • @sadettindemirel
    @sadettindemirel 5 months ago +1

    I have tried the Mistral AI and OpenAI examples. It works great. If one is interested in looping the perform_chat() function, this should work:
    for (i in 1:nrow(data)) {
    print(i)
    question
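
    The loop in the comment above is cut off by the comment box. A complete sketch of the same idea might look like the following; the `data$question` column, the prebuilt `chat` object, and the shape of `extract_chat()`'s output are all assumptions, not code from the video:

    ```r
    # Hypothetical completion of the truncated loop: send one question per
    # row of `data` and collect the assistant's answers.
    answers <- character(nrow(data))
    for (i in 1:nrow(data)) {
      print(i)
      question <- data$question[i]        # assumed column name
      answers[i] <- chat |>               # `chat` built via create_chat() |> add_model() |> ...
        add_message(question) |>
        perform_chat() |>
        extract_chat() |>
        dplyr::filter(role == 'assistant') |>
        dplyr::pull(message)
      Sys.sleep(1)                        # small delay helps avoid rate limits
    }
    ```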

  • @sadettindemirel
    @sadettindemirel 5 months ago +2

    Tremendous work, I will definitely try out this package. Many generative AI models offer only Python or curl support. This video made my day. I just have two questions: 1) Would any of the models at Ollama work with this package? 2) There are powerful, fine-tuned AI models on Hugging Face; can you add support for Hugging Face and other AI models such as Gemini and Anthropic? Great work, thank you for the tutorial.

    • @rappa753
      @rappa753  5 months ago

      Ollama models should work as long as they use the standard chat interface (which hopefully covers all of them, but I'm not sure about that). Other vendors are coming in the near future too 🤗

  • @krushnachChandra
    @krushnachChandra 5 months ago +1

    Helpful tutorial. I was thinking of doing some text processing on PubMed data in the context of a specific disease, so I will use this as my starting point.

    • @rappa753
      @rappa753  5 months ago

      Nice. Good timing I guess 😀

  • @holopecopeco111
    @holopecopeco111 3 months ago +1

    Thank you for creating a great R package. Is there a function like the httr2 package's req_timeout() in tidychatmodels? In my environment, it times out when I use gpt-4-turbo.

    • @rappa753
      @rappa753  3 months ago

      That's a great feature request, but it will probably be quite a while before I find the time to implement it.

  • @adamdesutter6393
    @adamdesutter6393 3 months ago +1

    Great package, with infinite purposes. Thank you!
    I still struggle with an "HTTP 429 Too Many Requests" error.
    Do you have an idea how to get around it?
    Many thanks in advance

    • @rappa753
      @rappa753  3 months ago +1

      If you use this package in a loop, you could try adding a bit of a time delay to the loop. A bit of Sys.sleep() after each iteration will probably do.
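
      The suggestion above, sketched out; `ask_model()` is a hypothetical wrapper around the chat pipeline, not a package function:

      ```r
      # Pause between iterations so the API's rate limit is not exceeded.
      results <- vector('list', nrow(data))
      for (i in seq_len(nrow(data))) {
        results[[i]] <- ask_model(data$text[i])  # hypothetical one-shot chat wrapper
        Sys.sleep(2)                             # ~2 s delay between requests
      }
      ```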

    • @adamdesutter6393
      @adamdesutter6393 3 months ago

      Many thanks for your answer

  • @ambhat3953
    @ambhat3953 2 months ago

    Looks good; unfortunately, at my workplace I can't use these 3rd-party AI tools/packages... I wish there was a workaround.

    • @rappa753
      @rappa753  2 months ago

      That's unfortunate. Is it just the tidychatmodels package that you are not allowed to use? If so, you could use the httr2 package and do the API requests manually 🤔

    • @ambhat3953
      @ambhat3953 2 months ago

      @rappa753 Connecting to any 3rd-party AI tool is banned... so can't help it for now. Hopefully things change in the near future.

  • @NewbiaLeogetti
    @NewbiaLeogetti 4 months ago

    This is great, thank you so much for making this video.
    My question is -- say I have a dataframe with the text of a document from some meeting in each row. I want to use AI to summarize each document. I created a function called summarize_text(), based on this tutorial, that works great at summarizing one document. But when I try to make it summarize each document in the dataframe (df |> mutate(summary = summarize_text(text))), I get the error "HTTP 400 Bad Request." Any ideas on how to deal with this issue?

    • @rappa753
      @rappa753  4 months ago +1

      Your summarize_text() function is likely not vectorized. Hence, you will need to iterate over the documents one by one. You may do that with df |> mutate(summary = map(text, summarize_text)). If your summarize_text() returns a single character value, you can also replace map with map_chr.
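
      The reply above in a self-contained form; summarize_text() here is a toy stand-in (in the real case it would call the chat model), and the data is made up:

      ```r
      library(dplyr)
      library(purrr)

      # Toy stand-in for a one-document-at-a-time summarizer.
      summarize_text <- function(txt) paste('Summary of:', substr(txt, 1, 10))

      df <- tibble(text = c('First meeting notes ...', 'Second meeting notes ...'))

      # map_chr() applies the function to each row's text and returns a
      # character vector, so the result fits straight into a mutate() column.
      df <- df |> mutate(summary = map_chr(text, summarize_text))
      df$summary
      #> [1] "Summary of: First meet" "Summary of: Second mee"
      ```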

    • @NewbiaLeogetti
      @NewbiaLeogetti 4 months ago +1

      @rappa753 That works! Thank you so much!

  • @dasrotrad
    @dasrotrad 5 months ago +1

    This is great. Did I understand correctly that I can access various file formats? For example, I can summarize a txt file, but if I change the file format to MS Word docx or to a PDF, it will not work. The error with a PDF is + perform_chat()
    Error in `httr2::req_perform()`:
    ! HTTP 400 Bad Request.
    Backtrace:
    1. tidychatmodels::perform_chat(...)
    5. httr2::req_perform(prepared_engine)

    • @rappa753
      @rappa753  5 months ago

      How did you read in the PDF? You will need to stick plain text into the add_message function. Does whatever function you use to read the PDF return plain text?

    • @dasrotrad
      @dasrotrad 5 months ago

      @rappa753 I changed this line to 'pdf': subtitle |>
      paste(collapse = ' '). The code broke. If I used subtitle |>
      paste(collapse = ' '), the code executed, but it produced info regarding XML code.

    • @dasrotrad
      @dasrotrad 5 months ago +1

      @rappa753 I didn't think to ask ChatGPT to modify the code before I posted. GPT suggested: Replace 'docs/your_pdf_file.pdf' with the path to your PDF file. This code will extract text from the PDF file using the pdf_text() function from pdftools, then it will pass the extracted text to the chat model as before. Make sure you have the pdftools package installed in your R environment.
      And for docx file types, ChatGPT recommends the following; however, this modification did not resolve the issue. I am still not able to summarize a Word .docx file type: Replace 'docs/your_docx_file.docx' with the path to your MS Word (.docx) file. This code will extract text from the Word document using the read_docx() function from readtext, then it will pass the extracted text to the chat model as before. Make sure you have the readtext package installed in your R environment.
      The suggestions by GPT resolved the PDF issue. This is really awesome, Albert. Thank you.
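
      For reference, the PDF route discussed above can be sketched like this; the file path is a placeholder and the `chat` pipeline is assumed to be built as in the video:

      ```r
      library(pdftools)

      # pdf_text() returns one character string per page; collapse the pages
      # into a single plain-text string before handing it to add_message().
      pdf_pages <- pdf_text('docs/your_pdf_file.pdf')   # placeholder path
      pdf_plain <- paste(pdf_pages, collapse = ' ')

      chat |>                                  # chat built via create_chat() etc.
        add_message(paste('Summarize this document:', pdf_plain)) |>
        perform_chat()
      ```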

    • @rappa753
      @rappa753  5 months ago

      @@dasrotrad Glad that you got it working :)