I'm getting real-time responses. Great tutorial
This is insanely fast. Thanks for this tutorial
This solution is only for online inference. I want to run in offline mode, what should I do?
LLMs are computationally expensive, so running them locally takes a lot of resources. You can consider running them locally if you have enough computational power to handle their needs.
I have enough compute power. In fact, I have 3 RTX 8000s with 144 GB of VRAM in total. I run a quantized Llama-3 model in offline mode; the model is only 40 GB, but my inference time is very high and I want to reduce it. Is it possible to use Groq offline, or is another option available?
Compute power: 144 GB of VRAM
@@MuhammadAdnan-tq3fx Groq uses their proprietary hardware (the LPU), not GPUs. The optimization is at the hardware level.
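In other words, the LPUs are only reachable through Groq's hosted API; there is no downloadable runtime. A minimal sketch of that online access with the official `groq` Python client is below (the model ID is an assumption and may differ from the one used in the tutorial):

```python
# Minimal sketch of online-only access to Groq's LPUs via the hosted API.
# Assumes `pip install groq` and a GROQ_API_KEY environment variable;
# the model ID below is illustrative.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])
completion = client.chat.completions.create(
    model="llama3-70b-8192",  # illustrative model ID
    messages=[{"role": "user", "content": "Explain what an LPU is in one sentence."}],
)
print(completion.choices[0].message.content)
```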
@@antonvinny How can I use Groq offline??
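You can't: Groq's LPUs are only accessible through their cloud API, so there is no offline mode. For faster offline inference on a 3x RTX 8000 setup, one option is a throughput-oriented engine such as vLLM with tensor parallelism. The sketch below is a rough illustration under assumptions: vLLM is installed, a pre-quantized AWQ Llama-3 70B checkpoint is used (the checkpoint name is illustrative), and the model is sharded across 2 of the 3 GPUs because the tensor-parallel degree must evenly divide the attention-head count.

```python
# Rough sketch: multi-GPU offline inference with vLLM and an AWQ-quantized model.
# Assumes `pip install vllm` and an illustrative AWQ checkpoint; adjust to your setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/llama-3-70b-instruct-awq",  # illustrative AWQ checkpoint
    quantization="awq",
    tensor_parallel_size=2,  # must divide the attention-head count; 3 typically won't
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize why LPUs differ from GPUs."], params)
print(outputs[0].outputs[0].text)
```

Beyond the engine, keeping the whole model in VRAM (no CPU offload) and batching requests together are the usual levers for cutting local inference latency.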