Thanks for the video! I will start testing this API with a POC I'm working on now, to learn.
They should sell their LPUs instead and compete with Nvidia. They would surely get lots of backing and investment. Otherwise they will probably be copied and fade away quickly.
Great video! Can you make a voice chatbot using Groq in one of your next videos, please? I would also love to see whether you do this on Streamlit, or whether it's too slow and you use something else. Thanks so much for your videos.
Planning on making that. For the voice chatbot, I might just do a CLI, though.
Why can't you use the ConversationalRetrievalChain instead of the ConversationChain? It can handle the memory by default, so there's no need to maintain it externally.
@prompt Engineering
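Not the video's code, but a minimal sketch of what that looks like, assuming the classic LangChain chain API and the langchain_groq integration (the model name and the toy index are placeholders):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq

# Tiny in-memory index just to make the example self-contained.
vectorstore = FAISS.from_texts(
    ["Groq's LPU is a chip designed for fast LLM inference."],
    embedding=HuggingFaceEmbeddings(),
)

# The chain reads and updates this memory itself, so nothing is tracked externally.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatGroq(model_name="mixtral-8x7b-32768"),  # reads GROQ_API_KEY from the environment
    retriever=vectorstore.as_retriever(),
    memory=memory,
)

print(chain.invoke({"question": "What is the LPU?"})["answer"])
print(chain.invoke({"question": "Who makes it?"})["answer"])  # follow-up is resolved via memory
```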
This is next level. OpenAI has some serious competition.
Please create a step-by-step video guide on using the Groq API with Streamlit.
That's coming soon
Thanks for your content! I'm using Streamlit as well and want to set content for the system role, for example "answer me in short sentences in Italian", so it does this for every prompt. Where can I do this in the code? I used the Streamlit chatbot repo. Thanks in advance.
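While waiting for a reply: here's a minimal sketch of the usual pattern, not the exact repo code. It assumes the official groq Python client and the typical Streamlit session-state message list; the model name is a placeholder. The trick is to prepend the system message on every call:

```python
import streamlit as st
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

SYSTEM_MESSAGE = {
    "role": "system",
    "content": "Answer me in short sentences in Italian.",
}

if "messages" not in st.session_state:
    st.session_state.messages = []  # user/assistant turns only

if prompt := st.chat_input("Your message"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # placeholder model name
        # Prepend the system message on every call so it governs each turn
        # without being stored in the visible history.
        messages=[SYSTEM_MESSAGE] + st.session_state.messages,
    )
    reply = response.choices[0].message.content
    st.session_state.messages.append({"role": "assistant", "content": reply})

for m in st.session_state.messages:
    st.chat_message(m["role"]).write(m["content"])
```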
Awesome stuff!!!!
Here's the question: can Groq cards also do inference for art, audio, and voice models, or is it LLM-inference specific? It is, well, super fast... the only worry is literally the latency from you to the endpoint... so if it's, say, a streaming, interruptible feed you are giving the model, then the use cases for TTS and speech applications just went through the damn roof!
I am not sure, but I was listening to Chamath (who is an investor in Groq), and he was talking about the initial use cases of the hardware. It seems like they were focused on vision, so it might have that ability.
I am trying to put together an example of an end-to-end speech conversation; let's see how that goes.
Very helpful.
How do you control the output of the LLM for a single input?
What is the time to receive the first chunk when streaming?
It depends on the number of input tokens. With a one-line instruction it's below 1 second. If you include the context of a RAG system, it can take up to 3 seconds for the first token to arrive (with 30k tokens of context).
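If you want to measure it yourself, here's a rough sketch with the official groq client (model name is a placeholder): stream the response and time the first chunk.

```python
import time
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

start = time.perf_counter()
stream = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,  # yields chunks as they are generated
)

for chunk in stream:
    # The first chunk marks the time-to-first-token.
    print(f"First chunk after {time.perf_counter() - start:.2f}s")
    break
```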
What are the rate limits of the free API? Is it necessary to provide a credit card?
It's free at the moment, and there is a rate limit as well. It seems to keep changing; last time I checked, it was around 20 messages per minute.
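If you hit the limit, a simple hedge is to back off and retry. This sketch assumes the groq client raises RateLimitError on HTTP 429, like the OpenAI-style SDKs it mirrors (model name is a placeholder):

```python
import time
from groq import Groq, RateLimitError

client = Groq()  # reads GROQ_API_KEY from the environment

def ask(messages, retries=5):
    """Call the chat endpoint, sleeping and retrying when rate-limited."""
    for attempt in range(retries):
        try:
            resp = client.chat.completions.create(
                model="mixtral-8x7b-32768",  # placeholder model name
                messages=messages,
            )
            return resp.choices[0].message.content
        except RateLimitError:
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("still rate-limited after retries")

print(ask([{"role": "user", "content": "Hi"}]))
```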
You're really good, man ^^👍
Hi, does this API have function calling? Regards.
How can the Groq chip run Mixtral 8x7B with just 250 GB of VRAM?
Because of the Groq LPU...
Almost a baby version of a quantum computer, if you can actually perfect a model based on the speed of responses to your questions using the Groq chip...
If the temperature could be adjusted to a negative value, what would the impact on generation be? (Consider it hypothetical if that case doesn't exist.)
It would be the same as setting it to zero :) Basically, if you set it to zero, it will pick the single most probable next token. If you set a higher value, it can sample among the most probable tokens.
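To see the effect described above, a quick sketch with the official groq client (model name is a placeholder), comparing temperature 0 with a higher value:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment
question = [{"role": "user", "content": "Name a color."}]

for temp in (0.0, 1.0):
    out = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # placeholder model name
        messages=question,
        temperature=temp,  # 0 -> pick the most probable token; higher -> sample
        max_tokens=10,
    )
    print(temp, "->", out.choices[0].message.content)
```

Run it a few times: the temperature-0 answer should stay (near-)identical across runs, while the higher-temperature one varies.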
Can it run other models?
I tried a few things with this and it is incredibly fast.
I agree!
Can we fine-tune this and use it?
You can't fine-tune via their API yet.
Hi, is it free or paid?
Free at the moment
Wow
Fuck all these cloud-only AI services, release the cards!
Yes, otherwise they will fade away quickly. Their window of opportunity is small. Money wants a piece of the Nvidia cake now, not tomorrow.
Groq is not an LLM; it can run an LLM.
YALLM ... it is almost becoming daily news ... Yet Another LLM.
Fast but useless. These OSS models are still way behind GPT-4.
Bro, Groq outsmarts GPT-4 with its 70B model.
It is way faster than GPT-4.
This is an ad.
Did someone say free?
For the time being :)