How to deploy LLMs (Large Language Models) as APIs using Hugging Face + AWS

  • Published: 26 Oct 2024

Comments • 62

  • @ajith_e
    @ajith_e 1 year ago +18

    Great tutorial that focuses on practical deployment. I've come across multiple tutorials that involve Google Colab, which are great for testing things out, but API access to the LLM is what we need for building practical applications.
    I have a few questions:
    1. What's the estimated cost per day, assuming no changes are made to the underlying infrastructure?
    2. Do the steps remain the same for deploying other LLMs?
    3. How can we maintain the context (of the previous response) in a conversation?
    4. How can we provide custom information through RAG?
    Thanks again for making this tutorial.

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +2

      Great questions. Many of these are highly use-case dependent.
      I cover many of these topics in my blog: skanda-vivek.medium.com/
      Thanks for the feedback!

    • @sanjeevav1069
      @sanjeevav1069 6 months ago

      @@scienceineverydaylife3596 But all the blogs are members-only; could we get access some day?

  • @nat.serrano
    @nat.serrano 1 year ago +10

    Thank you. Better than the Amazon videos prepared by 100 people. The only question is: how cheap is SageMaker? 😅

  • @justcreate1387
    @justcreate1387 1 year ago +11

    Great video!
    I noticed that you've set the Lambda function timeout to three minutes. However, it's triggered by AWS API Gateway, which has a maximum integration timeout of about 30 seconds. Therefore, if the Lambda function execution exceeds that, its response will never be sent unless you've configured the invocation to be asynchronous (see the sketch below). Just an observation I thought I'd share.
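
    For reference, a minimal sketch of invoking a Lambda function asynchronously with boto3, so the caller is not held to API Gateway's timeout; the function name and payload here are hypothetical:

        import json
        import boto3

        lambda_client = boto3.client("lambda")

        # InvocationType="Event" makes the call asynchronous: Lambda queues the
        # request and returns HTTP 202 immediately instead of waiting for the result
        response = lambda_client.invoke(
            FunctionName="my-llm-inference-function",  # hypothetical name
            InvocationType="Event",
            Payload=json.dumps({"inputs": "What is deep learning?"}),
        )
        print(response["StatusCode"])  # 202 for async invocations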

    • @VIVEKKUMAR-kx1up
      @VIVEKKUMAR-kx1up 1 year ago +1

      Great observation!!

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +2

      Good point!

    • @abdulwaqar844
      @abdulwaqar844 1 year ago +1

      Can we make an endpoint from the AWS Lambda function itself, instead of using API Gateway?

    • @justcreate1387
      @justcreate1387 1 year ago +1

      @@abdulwaqar844 Yes indeed. It's called a Lambda 'function URL'. It just takes a few clicks to set up the function URL endpoint.

    • @abdulwaqar844
      @abdulwaqar844 1 year ago +1

      @@justcreate1387 Yes - that way we can avoid API Gateway's execution-time limit, which is very low if we are working with ML models.
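
      As a rough sketch, a function URL can also be created programmatically with boto3 (the function name is hypothetical):

          import boto3

          lambda_client = boto3.client("lambda")

          # Create an HTTPS endpoint directly on the function, bypassing API Gateway
          url_config = lambda_client.create_function_url_config(
              FunctionName="my-llm-inference-function",
              AuthType="AWS_IAM",  # use "NONE" for an unauthenticated public URL
          )
          print(url_config["FunctionUrl"])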

  • @MoThoughNo
    @MoThoughNo 4 months ago +2

    Why does everyone skip the most important part of the AWS setup for automation, which is how to create the Lambda code?! Is there a resource on how to write the Lambda handler code?
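
    For what it's worth, a minimal sketch of what such a Lambda handler might look like; the endpoint name is hypothetical, and the payload format depends on the deployed model:

        import json
        import boto3

        runtime = boto3.client("sagemaker-runtime")

        def lambda_handler(event, context):
            # API Gateway passes the HTTP request body as a JSON string
            body = json.loads(event.get("body") or "{}")
            response = runtime.invoke_endpoint(
                EndpointName="huggingface-llm-endpoint",  # hypothetical name
                ContentType="application/json",
                Body=json.dumps({"inputs": body.get("prompt", "")}),
            )
            result = json.loads(response["Body"].read().decode("utf-8"))
            return {"statusCode": 200, "body": json.dumps(result)}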

  • @VIVEKKUMAR-kx1up
    @VIVEKKUMAR-kx1up 1 year ago +5

    Informative video!
    I have requested the ml.m5.2xlarge instance - I will be given permission after 2 working days!!

  • @thecryptobeard
    @thecryptobeard 7 months ago +2

    Is there an easier way to do this with Ollama?

  • @CsabaTothMr
    @CsabaTothMr 1 year ago +2

    I'm deploying a LLaVA model (image + text); how can I invoke that?

  • @brayancuenca6925
    @brayancuenca6925 1 year ago +4

    Can you make a video tutorial on using Next.js with it?

  • @MichaelMcCrae
    @MichaelMcCrae 1 year ago +2

    I received an error running the code above: UnexpectedStatusException: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check.

  • @georgekuttysebastian1412
    @georgekuttysebastian1412 2 months ago

    Great video. I'm pretty unfamiliar with the cloud; I just want to make sure that I can get an LLM to serve multiple endpoints for multiple users. If so, how do I find out the number of users that can be serviced?

  • @RamakrishnanGuru
    @RamakrishnanGuru 1 year ago +4

    It would really help if you could provide an approximate cost of trying out this tutorial on AWS. Is there any info someone can share?

  • @islamicinterestofficial
    @islamicinterestofficial 1 year ago

    Thanks for the video. Can we run a 33B model with instance_type="ml.g4dn.2xlarge", which you also used in the video? If not, which instance type should I use?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +1

      Hi - unfortunately that is too small, as it is a 32 GB instance. I would suggest an instance with at least 33*2 = 66 GB - so a 70 GB or greater instance like ml.p3.16xlarge or ml.g4dn.12xlarge.
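
      The arithmetic behind that suggestion, as a rough rule of thumb (weights only, ignoring activation and KV-cache overhead):

          # fp16 weights take roughly 2 bytes per parameter
          params_billion = 33
          weights_gb = params_billion * 2   # ~66 GB just for the weights
          print(f"Pick an instance with well over {weights_gb} GB of GPU memory")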

    • @islamicinterestofficial
      @islamicinterestofficial 1 year ago

      @@scienceineverydaylife3596 Thanks for the quick reply. I'll definitely use them then ...

  • @davidthiwa1073
    @davidthiwa1073 4 months ago

    Yes, what about costs? Any easier platforms you have tried?

  • @buksa7257
    @buksa7257 9 months ago

    I'm having a bit of trouble understanding the following: I believe you're saying the Lambda function calls the SageMaker endpoint (the place where we host the LLM). But then who calls the Lambda function? When is that function triggered? Does it need another endpoint?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  8 months ago

      The Lambda function is called from API Gateway (whenever a user invokes the API).
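
      In other words, the flow is client -> API Gateway -> Lambda -> SageMaker endpoint. A minimal sketch of the client side (the URL is hypothetical):

          import requests

          # POST to the API Gateway stage URL; API Gateway triggers the Lambda,
          # which in turn invokes the SageMaker endpoint
          resp = requests.post(
              "https://abc123.execute-api.us-east-1.amazonaws.com/dev/predict",
              json={"prompt": "What is SageMaker?"},
          )
          print(resp.json())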

  • @prestonmccauley43
    @prestonmccauley43 1 year ago

    Amazon SageMaker is pretty complex and the UI is horrible - any other ways to deploy? It tried to charge me about 1000 dollars for the "free" usage, because it spins up something like 5 instances - instances that don't show up in the console directly; you have to open the separate SageMaker instance viewer.

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago

      Yes - you can deploy quantized models locally using desktop apps (ruclips.net/video/BPomfQYi9js/видео.html&ab_channel=DataScienceInEverydayLife) or look at other third-party solutions like Lambda Labs.

  • @arvindshelke8889
    @arvindshelke8889 1 year ago +1

    I am setting this up for the first time; can you share the role config?
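
    For anyone in the same spot, a minimal sketch: inside a SageMaker notebook, the execution role (which needs SageMaker permissions, e.g. via the AmazonSageMakerFullAccess managed policy) is usually fetched like this:

        import sagemaker

        # Returns the IAM role ARN attached to the notebook/Studio environment
        role = sagemaker.get_execution_role()
        print(role)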

  • @Vijayakumar-nk9fr
    @Vijayakumar-nk9fr 1 year ago +1

    The response generated by the model is limited to very few characters; how can I adjust the model to generate more text? Can anyone please help?

    • @tootemakan
      @tootemakan 7 months ago

      Increase the number of tokens generated; obviously it will be more compute-heavy.
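
      For Hugging Face TGI-style endpoints, generation length is typically controlled via "max_new_tokens" in the request parameters - a rough sketch, assuming the predictor from the deployment step:

          payload = {
              "inputs": "Explain transformers in detail:",
              "parameters": {
                  "max_new_tokens": 512,  # raise this for longer responses
                  "temperature": 0.7,
              },
          }
          result = predictor.predict(payload)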

  • @mohammadkhair7
    @mohammadkhair7 1 year ago +1

    Great video, thank you.
    I cannot find the source code (deploying.ipynb) in the repository.
    I would appreciate a link to it here and in the description.

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +1

      Here is the blog link: skanda-vivek.medium.com/deploying-open-source-llms-as-apis-ec026e2187bc

  • @MichealAngeloArts
    @MichealAngeloArts 1 year ago

    Many thanks for the great video! Just wondering if you share your code/notebooks anywhere, e.g. GitHub etc.

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +1

      You can find blog posts here: skanda-vivek.medium.com/
      And github code here: github.com/skandavivek/

    • @gabesusman4592
      @gabesusman4592 1 year ago

      @@scienceineverydaylife3596 Which repo? I'm looking for the Lambda logic.

  • @indianhub6001
    @indianhub6001 1 year ago

    Bro, I have a question - kindly reply please. Can you suggest a channel that covers NLP in detail from scratch, i.e. totally beginner-friendly?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago

      Try a Hugging Face crash course (though that is a bit more advanced - but it covers topics like transformers).
      Otherwise I'd suggest making yourself comfortable with the fundamentals of ML, deep learning, etc.

  • @anujcb
    @anujcb 1 year ago

    Did you upgrade boto3 in the notebook to get this going? Any other packages upgraded?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago

      Here's the blog link; I don't think I needed any other upgrades/downgrades:
      skanda-vivek.medium.com/deploying-open-source-llms-as-apis-ec026e2187bc
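
      If an upgrade does turn out to be needed, it's one line in the notebook - a common first step when a deploy call fails with an unexpected attribute or parameter error:

          %pip install --upgrade boto3 sagemaker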

    • @anujcb
      @anujcb 1 year ago

      @@scienceineverydaylife3596 Thank you, I got this working, although the response time is slow, as you mentioned.

    • @anujcb
      @anujcb 1 year ago

      @scienceineverydaylife3596 Have you tried sending embeddings to the model as context?

  • @SD-rg5mj
    @SD-rg5mj 1 year ago

    Hello, can the text generated by Hugging Face (or another AI, e.g. image-to-text) go directly into a cell of a Google Sheet, as I can do with copy and paste via the ChatGPT API?
    Anyway, thank you very much for your videos.

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago

      Not sure - you might look up whether there is a way to integrate Google Sheets with API calls.

    • @SD-rg5mj
      @SD-rg5mj 1 year ago

      @scienceineverydaylife3596 Sorry, I'm not sure I understand you - I speak bad English.
      Are you asking if it is possible to connect ChatGPT to Google Sheets through an API? Yes, that's possible; there are several tutorials. If you want I'll show you one - it's very simple, even for a beginner like me.
      Can I send you an email?

  • @allailqadrillah
    @allailqadrillah 1 year ago

    Very informative video. I am a student and want to deploy my LLM as a personal portfolio project. Is it possible to do it for free on AWS, and what are the limitations?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +1

      I believe there are some free endpoints for 2 months or so - but not the GPU or accelerated endpoints needed for LLMs (though good enough for BERT-like models if you want to get started):
      aws.amazon.com/pm/sagemaker/

  • @chrisder1814
    @chrisder1814 4 days ago

    thanks

  • @smokyboy3536
    @smokyboy3536 1 year ago

    Only helpful if you already know what you are doing.

  • @nonameyet5069
    @nonameyet5069 8 months ago

    Haha, I am the 1000th subscriber :)

  • @SO-vq7qd
    @SO-vq7qd 1 year ago

    If you could show how to deploy a fine-tuned HF model and monetize it, you'll be rich.

  • @cas818028
    @cas818028 1 year ago

    This is pretty expensive to host and run, just FYI.

    • @ainanirina758
      @ainanirina758 1 year ago

      Do you have the pricing for each service?

    • @scienceineverydaylife3596
      @scienceineverydaylife3596  1 year ago +1

      Here is the pricing for the various AWS SageMaker endpoints - what you choose depends on your model's compute needs: aws.amazon.com/sagemaker/pricing/

    • @ainanirina758
      @ainanirina758 1 year ago

      @@scienceineverydaylife3596 Thank you so much for your quick answer!
      I would really like to get as close to $0 running cost as possible, even if it needs a lot of initial investment. Do you think running the API from our own computer/server would be realistic for production? What would be the requirements?
      Thank you for your time.

  • @Kaushikshresth
    @Kaushikshresth 11 months ago

    Hey, I want to contact you - how can I?