Thanks, keep up the good work.
Thanks, will do!
In my specific case (7900XT), it tries to locate ROCm0, and since I am using Windows, my computer freezes.
Why does response_token/s gradually decrease after a while?
I use an RX 6600 XT.
Hello and thanks for watching. I'm not sure, but in a chat situation, I would think it could be because it is storing previous prompts in memory in order to "remember" the previous conversations. How much it keeps could depend on the context window of the model. I have not tested this but that's my best guess. Someone else may have a better answer.
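To illustrate that guess, here is a rough Python sketch. It is not how any particular chat app actually works, and all the names in it (rough_token_count, build_prompt, simulate_chat) are made up for the example. It assumes the app resends the whole chat history each turn, counts tokens crudely by splitting on whitespace, and drops the oldest messages once the context window is full.

```python
def rough_token_count(text):
    # Crude stand-in for a real tokenizer (assumption): one token per word.
    return len(text.split())


def build_prompt(history, context_window):
    # Keep the most recent messages that fit in the context window,
    # walking backwards from the newest message.
    kept = []
    used = 0
    for message in reversed(history):
        cost = rough_token_count(message)
        if used + cost > context_window:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept, used


def simulate_chat(turns, words_per_message, context_window):
    # Returns the prompt size (in rough tokens) the model would see each turn.
    history = []
    sizes = []
    for _ in range(turns):
        history.append(" ".join(["word"] * words_per_message))  # user message
        _, used = build_prompt(history, context_window)
        sizes.append(used)
        history.append(" ".join(["word"] * words_per_message))  # model reply
    return sizes


if __name__ == "__main__":
    print(simulate_chat(turns=5, words_per_message=10, context_window=50))
```

If the guess is right, the prompt grows every turn until it hits the context window and then levels off, so a long chat gives the model more to process per reply than a fresh one does.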
@TigerTriangleTech Thanks for your reply to my question. Hope you can solve it later 😁