If you found DSPy useful, check out this TextGrad tutorial: ruclips.net/video/6pyYc8Upl-0/видео.html.
TextGrad is another awesome LLM Prompt Optimization library that also tries to reduce prompt engineering in favor of a more programmatic approach.
I kinda feel guilty that I am seeing such content without paying anything! This is gold...Thank you!
Thanks!
You could always donate to his channel 😉
This is the only video or resource I've seen on DSPy that makes ANY sense. Great job!
Thanks to you, my friend, I finally learned what I hadn't been able to understand for days. I really wanted to learn DSPy, but I just couldn't get it until now.
Thanks a lot.
Glad to hear that! Your insistence has paid off! Good luck with your DSPy journey!
Same here, learnt what I couldn't from my other attempts
When this module came out, the docs were very confusing. Thank you for such a great explanation!
Very good and clear explanation, thanks buddy👌
Thanks for the video! It would be great to see how to use DSPy for agents.
Thank you for sharing this step-by-step tutorial. I tried DSPy with a local Ollama setup running Llama 3.1, and the Chain of Thought module gave a different answer. I've shared the result below. [I don't know anything about football.]
Reasoning: Let's think step by step in order to answer this question. First, we need to identify the team that won the World Cup in 2014. The winner of the tournament was Germany. Next, we should find out who scored the final goal for Germany. It was Mario Götze who scored the winning goal against Argentina. To determine who provided the assist, we can look at the details of the game and see that Toni Kroos made a long pass to André Schürrle, who then crossed the ball to Götze.
Yes, the results will vary depending on the underlying LLM as well as the specific instructions.
As a German I enjoyed the chosen example a lot 😄
😂
Yes boss! Subscribed! Great video and very much untapped territory; the only well-made tutorial for DSPy!
Thanks!
Wonderful content and presentation. Loved the way you explained it. Keep it up!
I love the contents and presentations in this video! Keep it up!💙
😇
Great video man, also loved the one piece T-shirt ;)
Thanks for noticing!
I love the content & found it really useful. Thank You!
I have only one suggestion: "ZOOM IN" on the code section, as it's really difficult to see the code.
Thanks for the suggestion! Will keep it in mind next time.
Thanks for sharing this amazing tutorial
I've been impressed with the ability of LLMs to summarize academic research but also getting really frustrated with the limits and hallucinations. I wonder if programming my own LLM is the answer.
Thanks!
Really nice content. I liked and subscribed :-). Is there something that can be done easily with DSPy that can't be done with Langchain?
Thanks man!!!
Thank you for making this video. This has been a great hands-on experience learning DSPy.
What are your thoughts on this: for building AI agents or more robust prompt programming, what other frameworks can be used?
My current go-to framework is definitely DSPy, although it has its fair share of issues (like no async programming support, continuous changes, etc.). There are a bunch of frameworks that try to do prompt tuning; I have played around with TextGrad, which uses feedback loops to improve LLM-generated content/prompts. Many people swear by Langchain, although I personally haven't used it much. Langfuse is a good tool for maintaining prompts and tracking usage/sessions, though it's unnecessary for small-scale apps. I have also used LiteLLM, which is a good solution for load balancing if you are using multiple different LLM providers in your application.
It might sound funny, but the most useful framework I have used for prompting is good old function composition in Python with some Jinja2 templating. :)
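To make that concrete, here is a minimal sketch of the kind of thing I mean (the template text and helper names are made up for illustration, assuming Jinja2 is installed):

from jinja2 import Template

QA_TEMPLATE = Template(
    "Answer the question using the context below.\n\n"
    "Context:\n{{ context }}\n\n"
    "Question: {{ question }}\n"
    "Answer:"
)

def build_context(passages: list[str]) -> str:
    # Join retrieved passages into a single context block.
    return "\n---\n".join(passages)

def build_prompt(question: str, passages: list[str]) -> str:
    # Compose the final prompt from smaller, testable pieces.
    return QA_TEMPLATE.render(context=build_context(passages), question=question)

print(build_prompt(
    "Who assisted the winning goal in the 2014 World Cup final?",
    ["André Schürrle crossed to Mario Götze, who scored in extra time."],
))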
GOAT
Awesome content! Could you please try TextGrad? The Stanford folks released a paper about it in July.
Thanks for the suggestion! Sounds like a good idea for a future video.
Pretty nice explanation.
Nice username 🍎
Thanks!
Awesome! Can you please share the Colab link for the examples shown in the video?
Excellent 👌
Thank you:)
Great video! Can you also please share the notebook with this code? It would help us do the hands-on work ourselves. Thanks!
Thanks! As mentioned in the video, currently all the code produced in the channel is for Patreon/Channel members.
Excellent video. Thank you. Can I grab the resulting prompt? I know it is supposedly a new paradigm which abstracts it away, but some may still want to revert back to using a simple prompt in prod. post optimization
Yes, you can use the inspect_history() function as shown in the video (around 4:30) to check out all the previous prompts run by a module.
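Roughly, it looks like this (a sketch based on the DSPy version used in the video; newer releases may expose this a bit differently, and the model name is just an example):

import dspy

lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

qa = dspy.ChainOfThought("question -> answer")
qa(question="Who assisted Mario Gotze's goal in the 2014 World Cup final?")

# Prints the most recent prompt DSPy actually sent (instructions, few-shot
# demos, and the model's completion), which you can copy into prod if needed.
lm.inspect_history(n=1)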
Hi! I am trying to use an LLM (Mistral-Nemo) for sentiment analysis. The issue I'm facing is that for the same input text, it returns different responses. Sometimes it identifies the sentiment as positive, and other times as negative. I have set the temperature to 0, but this hasn't resolved the problem. Can using DSPy help solve this inconsistency, or is there another solution?
Also, great video with a precise and crisp explanation! Kudos 👏
That's indeed strange. Setting the temperature to zero and keeping the entire prompt the same generally returns the same output, because the LLM chooses the next tokens greedily. When I say "entire prompt", that includes the instructions as well as the input question. I don't know the implementation details of Mistral-Nemo, but if it still samples tokens at t=0 during decoding, then I'm afraid we can't do much to make it deterministic. Again, I'm not sure; you may want to test with other LLMs.
DSPy might help, you could try it. Note that DSPy by default caches the results of past prompts in the session, basically reusing the cached outputs when the same prompt is re-run. This means that to test consistency correctly, you must first turn caching off. (Ctrl+F "cache" here: dspy-docs.vercel.app/docs/faqs)
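If you do try DSPy, a sentiment predictor can be as small as this (a sketch; the signature string is my own, and the OpenAI client is just a placeholder for your Mistral-Nemo setup). Remember to disable the cache as per the FAQ before checking run-to-run consistency:

import dspy

# Placeholder LM config: swap in whichever client you use for Mistral-Nemo.
lm = dspy.OpenAI(model="gpt-3.5-turbo", temperature=0)
dspy.settings.configure(lm=lm)

# "text -> sentiment" is an inline signature; DSPy builds the prompt around it.
classify = dspy.Predict("text -> sentiment")
result = classify(text="The delivery was late, but support fixed it quickly.")
print(result.sentiment)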
@@avb_fj I guess trying DSPy with few-shot examples might help with getting a consistent result. Thank you so much though!
Great video.
However, DSPy seems to be very fragile; it breaks easily.
E.g., at 11:00 you ask "What is the capital of the birth state of the person who provided the assist for Mario Gotze's in the football World Cup finals in 2014?" and it answers 'Mainz', which you said is correct.
But if I make the question slightly different by adding "goal" after "Gotze's", so the question is now "What is the capital of the birth state of the person who provided the assist for Mario Gotze's goal in the football World Cup finals in 2014?", it answers "Research".
In general it's the underlying LLM that can be "fragile". Remember that DSPy is just converting your program into a prompt and sending it to the LLM. The LLM generates the answer, which depends on the input prompt and temperature settings. Either way, as long as the concepts make sense, don't worry about replicating the test cases shown in the video!
@@avb_fj I tested your exact code with and without the "goal". It responded correctly to both prompts using a local Ollama model, gemma2:27b. DSPy seems to work well with local models that are >20B parameters. Smaller local models, especially Mistral-Nemo:12b, work in some cases but tend to fail with multi-step (ChainOfThought) modules.
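For reference, this is roughly how I pointed DSPy at the local Ollama model (the OllamaLocal client name matches the DSPy version I'm on; it may differ in newer releases):

import dspy

# Point DSPy at a local Ollama server running gemma2:27b.
ollama_lm = dspy.OllamaLocal(model="gemma2:27b")
dspy.settings.configure(lm=ollama_lm)

qa = dspy.ChainOfThought("question -> answer")
pred = qa(question="What is the capital of the birth state of the person who "
                   "provided the assist for Mario Gotze's goal in the "
                   "football World Cup finals in 2014?")
print(pred.answer)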
Great explanation.
Can I have the link for the notebook that you showed in the video?
As I mentioned in the video and on the description, the code is currently members/patrons only.
I just tried it. However, it failed on the first try of the "basic" stuff. With GPT-4, BasicQA keeps returning "Question: ... Answer: ...", but I only need the answer itself, not the whole "Question: ... Answer: ..." block. So in what sense do I not have to worry about the prompting part?
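For context, my signature is basically the BasicQA one from the intro notebook; I'm assuming that tightening the output field's desc is the intended way to constrain the format, but I'd like to confirm:

import dspy

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    # The desc nudges the LM to emit only the short answer, not the template.
    answer = dspy.OutputField(desc="often between 1 and 5 words")

generate_answer = dspy.Predict(BasicQA)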
If the 60M-parameter model gets 50% accuracy, how can you improve this without using a bigger model? Because if you use a bigger model, it actually just memorizes the data better. So it is basically overfitting, isn't it?
Also, at 12:42 I am getting:
answer='Mario Götze' confidence=0.9
not André Schürrle.
I quadruple-checked that my code is the same as yours.
Can we get the optimized prompt using DSPy, like we can with TextGrad? If yes, how can we do it?
I believe so. It's on my bucket list of things to try out one day. Look into this page: dspy-docs.vercel.app/docs/building-blocks/optimizers
and look for the COPRO and MIPROv2 optimizers.
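I haven't run this end-to-end myself, but going by the optimizer docs the flow is roughly this (a sketch; the program, metric, and trainset are placeholders, and the exact arguments may differ across DSPy versions):

import dspy
from dspy.teleprompt import COPRO

# Placeholder program and metric, just to show the optimizer flow.
program = dspy.ChainOfThought("question -> answer")

def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

trainset = [
    dspy.Example(question="Who won the 2014 World Cup?", answer="Germany").with_inputs("question"),
]

optimizer = COPRO(metric=exact_match, verbose=True)
compiled = optimizer.compile(
    program,
    trainset=trainset,
    eval_kwargs=dict(num_threads=1, display_progress=True),
)

# After compiling and running the program once, lm.inspect_history() shows the
# rewritten instructions, which is effectively the "optimized prompt".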
I have a question: can I use models other than OpenAI? I'm running my own models on DeepInfra.
Haven't tested this myself, but I assume you can call DeepInfra models using the OpenAI API by changing the base_url parameter.
deepinfra.com/docs/openai_api
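Something along these lines should work if I'm reading the DeepInfra docs right (untested; the base URL comes from their docs page linked above, and the model name is just an example):

from openai import OpenAI

# DeepInfra exposes an OpenAI-compatible endpoint, so we only override base_url.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_API_TOKEN",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # whichever model you deployed
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)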
@@avb_fj Thanks man ✨
@@avb_fj Also, another question: can I use other vector DBs like Astra DB for RAG?
Thanks.
@@JeevaPadmanaban check out the supported ones here:
dspy-docs.vercel.app/docs/category/retrieval-model-clients
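The general pattern is the same for all of them: configure a retrieval client via dspy.settings and query it with dspy.Retrieve. A sketch using the ColBERTv2 server from the official intro notebook (swap in whichever supported client you need; whether Astra DB is on that list, I'd have to check):

import dspy

# Wikipedia-abstracts ColBERTv2 server used in the official intro notebook.
colbert = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
dspy.settings.configure(rm=colbert)

retrieve = dspy.Retrieve(k=3)
passages = retrieve("2014 World Cup final winning goal").passages
for p in passages:
    print(p)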
Champ
you didn't show us the goal :(
Haha, so I had a 5-second clip in before, but FIFA claimed copyright on it, so I had to remove it.
@@avb_fj ah no worries, but thanks so much for teaching us I really appreciate it.
Please share the Colab from this video, not the DSPy example.
Btw, I'm not a football fan, and although you mentioned in the note that that's okay... no, it's really not. Even after watching this video a few times, who did what still doesn't register in my brain...
Fair criticism. I regretted using these examples soon after shooting the video. Future tutorials won’t have examples like these.
I was hoping this would make the prompts autonomous. I feel like you still need to understand prompting well before you can use this :(
You do realize that GPT-3.5 Turbo was deprecated, aka it no longer exists.
Thanks for your comment. The DSPy documentation and official tutorial still use it (links below), and it worked out for the examples I was going for in the tutorial. Whether that particular LM is deprecated or not isn't really important; you can replace it with whichever model you prefer… the concepts remain the same.
dspy-docs.vercel.app/docs/tutorials/rag
github.com/stanfordnlp/dspy/blob/main/intro.ipynb
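Swapping the model is a one-line change anyway; a sketch with the DSPy client used in the tutorial (the model name below is just an example):

import dspy

# Replace the model string with whichever model your provider offers.
lm = dspy.OpenAI(model="gpt-4o-mini", max_tokens=300)
dspy.settings.configure(lm=lm)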