Comments •

  • @EcoTekTermika • 2 months ago

    This looks quite slow on a 1.5B model. What time did you get for your model?

  • @rubencontesti221 • 3 months ago

    Another great video, Mark! I wish you could share some good options for locally hosting large models like Llama 3 70B quantized to 4 bits. I'm curious about the cheapest ways to host these models on my own server. Thank you!

    • @learndatawithmark • 3 months ago

      For full-size models, I think Hugging Face's inference server is the best option - github.com/huggingface/text-generation-inference
      For quantized models, the llamafile project is doing some cool work on a super fast server - github.com/Mozilla-Ocho/llamafile
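
      If anyone wants to try the text-generation-inference route, here's a minimal sketch (not from the video) of calling a locally running TGI server over its HTTP /generate endpoint. It assumes the server has already been started, e.g. via the official Docker image, and is listening on port 8080; the prompt and generation parameters are just placeholders.

      ```python
      import requests

      # Assumes a text-generation-inference (TGI) server is already running
      # locally and listening on port 8080 (the URL, prompt, and parameters
      # below are placeholders, not values from the video).
      TGI_URL = "http://localhost:8080/generate"

      payload = {
          "inputs": "Explain 4-bit quantization in one sentence.",
          "parameters": {"max_new_tokens": 100, "temperature": 0.7},
      }

      response = requests.post(TGI_URL, json=payload, timeout=60)
      response.raise_for_status()

      # TGI returns the completion under the "generated_text" key
      print(response.json()["generated_text"])
      ```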