Is there any chance you can share the source code for your Streamlit app? I've been looking to create my own LLM benchmarking tool in Streamlit as well, and when I saw you pull out your benchmarking app I got super excited. But unfortunately there's no link in the description :(
github.com/mrspiggot/LuciSummarizationApplication With thanks and apologies. I've just updated the description. Enjoy the repo!
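In case it's useful while you dig into the repo, here is a rough, hypothetical sketch of the kind of Streamlit benchmarking harness being discussed. It is not code from the linked repository; the model names are placeholders and call_model is a stub you would replace with real provider API calls and your own scoring.

```python
# Minimal, hypothetical Streamlit benchmarking harness (not the linked repo's code).
# Run with: streamlit run app.py
import time
import streamlit as st

MODELS = ["gpt-4", "claude-3-opus", "gemini-pro"]  # assumed names; edit to taste


def call_model(model_name: str, prompt: str) -> str:
    """Placeholder: wire this up to the provider SDK of your choice (OpenAI, Anthropic, etc.)."""
    return f"[{model_name} response to: {prompt[:40]}...]"


st.title("LLM Benchmark Sketch")
prompt = st.text_area("Benchmark prompt", "Explain binary search in simple terms.")
selected = st.multiselect("Models to compare", MODELS, default=MODELS)

if st.button("Run benchmark"):
    rows = []
    for model in selected:
        start = time.perf_counter()
        output = call_model(model, prompt)
        latency = time.perf_counter() - start
        rows.append({"model": model, "latency_s": round(latency, 3), "output": output})
    st.table(rows)  # side-by-side comparison of outputs and latencies
```

From there you can swap the latency column for whatever metric matters to your use-case (accuracy against a reference answer, cost per call, and so on).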
@lucidateAI No, thank you for the rapid response. You, sir, just earned another subscriber 👍
Thanks! I hope you enjoy the other videos on the channel as much as this one.
How have you got on with the code in the repo? Have you been able to use it as a platform to add your own functionality?
@lucidateAI ❤
Which LLM, of all the ones you have tested so far (in general, not only the ones you discussed in this video), is currently the best at breaking down university-level subjects using pedagogical tools? If I ask the model to read 2-3 books on pedagogical tools, can it properly learn how to use those tools and actually apply them to explain the subjects more clearly and effectively?
This video is focused on which models perform the best at generating source code (that is to say Java, C++, Python etc.). This other video -> Text Summarisation Showdown: Evaluating the Top Large Language Models (LLMs) ruclips.net/video/8r9h4KBLNao/видео.html covers text generation/translation/summarization etc. Perhaps that one is closer to what you are looking for? In either event, the key takeaway is this: by all means rely on public, published benchmarks. But if you want to evaluate models on your specific use-case (and if I understand your question correctly, I think you do), then it is worth considering setting up your own tests and your own benchmarks for your own specific evaluation. Clearly there is a trade-off here. Setting up custom benchmarks and tests isn't free. But if you understand how to build AI models, then it isn't that complex either.
@lucidateAI I've reformulated my question a bit since it wasn't clear enough. Could you read it again, please?
Thanks for the clarification. The challenge with reading 2 or 3 books will be the size of the LLM's context window (the number of tokens that can be input at once). Solutions to this involve using vector databases - example here -> ruclips.net/video/jP9swextW2o/видео.html This involves writing Python code and using development frameworks like LangChain (there's a rough sketch of the idea at the end of this reply). You may be an expert at this, in which case I'd recommend some of the latest Llama models and GPT-4. Alternatively you can use Gemini and Claude 3 and feed in sections of the books at a time (up to the token limit of the LLM). These models tend to perform the best when it comes to breaking down complex, university-level subjects. They seem to have a strong grasp of pedagogical principles and can structure explanations in a clear, easy-to-follow manner.
That said, I haven't specifically tested having the models read books on pedagogical tools and then applying those techniques. It's an interesting idea though! Given the understanding these advanced models already seem to have, I suspect that focused training on pedagogical methods could further enhance their explanatory abilities.
My recommendation would be to experiment with a few different models, providing them with sample content from the books and seeing how well they internalize and apply the techniques. You could evaluate the outputs to determine which model best suits your needs.
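For completeness, here is the sketch of the vector-database approach mentioned above. It is not code from the linked video; the file name, chunk sizes and retrieval parameters are assumptions, and it requires an OpenAI API key plus faiss-cpu installed. The import paths follow the classic LangChain layout, so adjust them to whatever version you have installed.

```python
# Hypothetical sketch of the "vector database" approach for books larger than the context window.
# Newer LangChain versions move these imports into langchain_community / langchain_openai.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# 1. Load the book text (assumed to be a plain-text file you have prepared).
with open("pedagogy_book.txt", encoding="utf-8") as f:
    book_text = f.read()

# 2. Split it into chunks small enough to fit comfortably inside the model's context window.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(book_text)

# 3. Embed the chunks and store them in a local FAISS index.
store = FAISS.from_texts(chunks, OpenAIEmbeddings())

# 4. At question time, retrieve only the most relevant passages and pass them to the LLM
#    alongside the question, instead of trying to feed in the whole book at once.
question = "Which pedagogical techniques help explain university-level subjects?"
relevant = store.similarity_search(question, k=4)
context = "\n\n".join(doc.page_content for doc in relevant)
print(context)  # combine `context` + `question` in a prompt to GPT-4, Claude 3, Gemini, etc.
```

The same pattern works for your experiment: index the pedagogy books once, then ask each model to explain a topic using only the retrieved passages, and compare how well each one applies the techniques.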