Deploy ANY Open-Source LLM with Ollama on an AWS EC2 + GPU in 10 Min (Llama-3.1, Gemma-2 etc.)

  • Published: 8 Sep 2024

Comments • 15

  • @DevelopersDigest · a month ago

    The best way to support this channel? Comment, like, and subscribe!

  • @hpongpong · a month ago +1

    Great concise presentation. Thank you so much!

  • @ryanroman6589 · a month ago +1

    This is super valuable. Awesome vid!

  • @rembautimes8808 · a month ago +1

    Thanks, very nice tutorial!

  • @dylanv3044 · 27 days ago +1

    Maybe a dumb question: how do you turn the streamed data you receive into readable sentences?

    • @DevelopersDigest · 26 days ago

      You could accumulate tokens, split on sentence-ending punctuation (. ! ?), and then send each response after a grouping function like that.
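
A minimal TypeScript sketch of that idea, for anyone asking the same thing. It is not from the video: the endpoint, model name, and prompt are assumptions (Ollama's streaming /api/generate on the default local port); the point is the buffering and sentence-splitting.

```typescript
// Minimal sketch: stream tokens from Ollama's /api/generate endpoint, buffer them,
// and emit complete sentences. Assumes Node 18+ (global fetch) and a local Ollama server.

async function streamSentences(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model: "llama3.1", prompt, stream: true }),
  });

  const decoder = new TextDecoder();
  let buffer = ""; // text that has not yet formed a complete sentence

  // Each chunk is one or more newline-delimited JSON objects,
  // e.g. {"response":" token","done":false}. (For simplicity this assumes whole lines per chunk.)
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    for (const line of decoder.decode(chunk, { stream: true }).split("\n")) {
      if (!line.trim()) continue;
      buffer += JSON.parse(line).response ?? "";

      // Split on sentence-ending punctuation (. ! ?) followed by whitespace.
      const parts = buffer.split(/(?<=[.!?])\s+/);
      buffer = parts.pop() ?? ""; // keep the unfinished tail for the next chunk
      for (const sentence of parts) console.log("SENTENCE:", sentence);
    }
  }
  if (buffer.trim()) console.log("SENTENCE:", buffer.trim()); // flush whatever is left
}

streamSentences("Explain what Ollama does in two or three sentences.").catch(console.error);
```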

  • @alejandrogallardo1414 · a month ago +1

    For models at ~70B, I am getting timeout issues using vanilla Ollama. It works with the first pull/run, but times out when I need to reload the model. Do you have any recommendations for persistently keeping the same model running?
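
One thing worth checking for that (not covered in the video): Ollama has a keep_alive request option that controls how long a model stays loaded in memory after a request, plus an OLLAMA_KEEP_ALIVE environment variable for the server. A rough sketch of passing it per request; the model tag and duration here are only examples.

```typescript
// Sketch: pass keep_alive so Ollama keeps the weights loaded between requests,
// instead of unloading the model (and paying the reload cost) after the default idle window.
// keep_alive accepts a duration string like "30m" or "24h", or -1 to keep it loaded indefinitely.

async function generate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({
      model: "llama3.1:70b", // example model tag
      prompt,
      stream: false,
      keep_alive: "24h", // keep the 70B weights resident in GPU/CPU memory between calls
    }),
  });
  const data = await res.json();
  return data.response;
}

// Server-wide alternative: set OLLAMA_KEEP_ALIVE (e.g. "24h") in the environment
// of the ollama service before starting it.
```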

  • @nexuslux · 28 days ago

    Can you use Open WebUI?

  • @danielgannage8109 · a month ago

    This is very informative! Thanks :)
    Curious why you used a g4dn.xlarge GPU ($300/month) instead of a t3.medium CPU ($30/month)? I assumed the 8-billion-parameter model was out of reach with regular hardware. What max model size works with the g4dn.xlarge GPU? To put it in perspective, I have a $4K MacBook (16 GB RAM) that can really only run the large (150 million) or medium (100 million parameter) sized models, so I think the t3.medium CPU on AWS can only run the 50 million parameter (small) model.
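
Rough sizing, for what it's worth (back-of-envelope numbers, not from the video): the g4dn.xlarge carries a single NVIDIA T4 with 16 GB of VRAM, and a model needs roughly parameters × bytes-per-parameter of memory plus some overhead, so a 4-bit-quantized 8B model fits comfortably while fp16 or 70B models do not.

```typescript
// Back-of-envelope VRAM estimate: parameters (in billions) * bytes per parameter,
// plus a rough allowance for the KV cache and runtime overhead. Assumptions, not measurements.

function estimateVramGb(paramsBillions: number, bytesPerParam: number, overheadGb = 1.5): number {
  return paramsBillions * bytesPerParam + overheadGb;
}

const T4_VRAM_GB = 16; // g4dn.xlarge has one NVIDIA T4 with 16 GB

console.log(estimateVramGb(8, 0.5).toFixed(1), "GB - 8B at 4-bit quantization, fits on the T4");
console.log(estimateVramGb(8, 2).toFixed(1), "GB - 8B at fp16, does not fit in 16 GB");
console.log(estimateVramGb(70, 0.5).toFixed(1), "GB - 70B at 4-bit, needs a much larger GPU");
console.log("T4 budget:", T4_VRAM_GB, "GB");
```

By the same arithmetic, a 4-bit-quantized 7-8B model should also fit in 16 GB of RAM on a MacBook, so the quantization level tends to matter more than the small/medium/large label.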

  • @BeCodeless-dot-net · a month ago +1

    Nice explanation!