Elvis mentioned "there were other libraries like torchchat" — what are those? After some quick research, I learned it's libraries like ollama (i.e., llama.cpp) or vLLM. But that makes me wonder: why aren't people just using native PyTorch for LLM inference? And if these aren't stupid questions, does `torchchat` handle things like distributed inference, or does it mainly create an API around the LLM to lower the barrier for inference (e.g., no need to encode/decode strings and tokens yourself)?
The first part is right: ollama, HF, llama.cpp, and many others. For the second part, torchchat is quite new, and I doubt it has extensive support for things like distributed inference yet, although that's changing quickly as LLM-focused features land in PyTorch.
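To make the "encode/decode strings and tokens" point concrete, here's a minimal sketch of what hand-rolled LLM inference looks like in plain PyTorch, using a Hugging Face tokenizer and model. The checkpoint name and the naive greedy loop are illustrative assumptions, not torchchat's or any library's actual API:

```python
# Minimal sketch of "native PyTorch" LLM inference.
# Assumes the Hugging Face `transformers` package for the tokenizer/weights;
# the model name and the greedy-decoding loop are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = "Why do people use dedicated LLM inference libraries?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids  # encode: str -> token ids

with torch.no_grad():
    for _ in range(50):  # naive greedy decoding, one full forward pass per token (no KV cache)
        logits = model(input_ids).logits
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))  # decode: token ids -> str
```

Everything a serving library automates — KV caching, batching, quantization, streaming, a chat/HTTP API — has to be bolted onto a loop like this by hand, which is the practical answer to "why not just native PyTorch."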