Dolly 2.0 has a fully permissive license for commercial use, I thought? It's listed in your talk as a proprietary license. Also, the Mosaic fully open-source models look promising (MPT-7B).
This is correct! We got a bit tripped up on Dolly 2.0, as the licensing is weirdly complicated.
From what we can tell, it has an MIT license on the weights (huggingface.co/databricks/dolly-v2-12b), an Apache license on the training/inference code (github.com/databrickslabs/dolly/blob/master/LICENSE) and a CC-BY-SA license on the training data (github.com/databrickslabs/dolly#model-overview).
MPT wasn't out yet when these were recorded (three weeks ago, late April 2023), but we agree it looks promising. Especially the long context window models!
Awesome, appreciate the detailed response!
Fantastic "part 3" in a sequence of topics. The speaker (Josh) is very comfortable explaining application development for LLMs, which is our main focus in developing an AI certificate at our college. Josh is clearly experienced and enthusiastic about this field, and explains topics well!
This is probably the best talk on LLMOps from the dev perspective, as opposed to from the DevOps perspective, on the internet.
I started working in this space a few months after GloVe and Word2Vec embeddings came out back in 2014. I have to say, when I see the word "bootcamp" in a title I usually run for the hills, but this guy gave a great presentation with a coherence and fluency showing he actually has experience and didn't just learn this from index cards 5 minutes before the presentation (my usual experience with bootcamps). Bravo!
If anyone else is exploring this chat, it's good to note that because LLMs are moving so fast, even more Apache 2.0 models have been released since this presentation. RedPajama and GPT4All-J variants have Apache 2.0 licenses, and from memory their performance is decent.
👍
What is the latest on these?
A gem of a resource. Concise and clear.
What a phenomenal talk! Amazing slides, kept simple, yet they really do add something to your great explanations. Showing the difference between then and now, between DNN and LLM operations, was also great, and the wrap-up in the first half was very welcome.
An amazing presentation. We definitely need more videos/content like this that can help navigate the fast-paced, dynamic tech world. Thank you.
Dear YouTube algo, please give me more recommendations like this.
Very useful session. I've learned a lot -- especially the evaluation metrics for LLMs. Thank you!
Dude, this was awesome. Thanks for spilling the beans on what's to come in our space ;)
Goldmine of information. Love it!
This is exactly the thing I was looking for (having made a codebase analysis tool with an LLM that I want to share with my team). Thank you for making this video for free. Much appreciation to whoever runs this channel.
My boy Josh Tobin. Legend.
Great talk; there's a lot of work to be done in the LLM deployment/production scene for software engineering/DevOps.
I like how this was recorded maybe two weeks ago?
It's already a bit aged: look at Anthropic announcing their 100k context (not out yet), and even more promising, the 65k-context MPT-7B-StoryWriter-65k+ by Mosaic.
Crazy how this field is progressing.
## Choosing a base language model
- Trade-offs to consider:
- Out-of-the-box quality
- Speed and latency
- Cost
- Fine-tunability
- Data security
- License permissiveness
- Conclusion: Start with GPT-4 for most use cases
## Managing prompts and chains
- Level 1: No tracking
- Level 2: Manage in git
- Level 3: Use a specialized tool (if needed)
## Evaluating performance
- Build evaluation set incrementally:
1. Start small
2. Use LM to generate test cases
3. Add more data as you discover failure modes
- Metrics (see the sketch after this outline):
- Accuracy (if correct answer exists)
- Reference matching (if reference answer exists)
- Which is better? (if previous answer exists)
- Incorporates feedback? (if human feedback exists)
- Static metrics (if no data exists)
## Deployment
- Call API from frontend
- Isolate LM logic as separate service (if needed)
## Monitoring
- Outcomes
- Model performance metrics
- Common issues: incorrect answers, toxicity, etc.
## Improving the model
- Use feedback to improve prompt
- Optionally fine-tune the model
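To make the metric fallback above concrete, here is a minimal sketch (my own illustration, not code from the talk) of picking the strongest available signal per eval example; all names and heuristics are hypothetical stand-ins:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalExample:
    prompt: str
    model_answer: str
    correct_answer: Optional[str] = None    # exact ground truth, if any
    reference_answer: Optional[str] = None  # a known-good answer, if any
    previous_answer: Optional[str] = None   # output of the last prompt/model version
    human_feedback: Optional[str] = None    # free-form user feedback, if any


def score(example: EvalExample) -> tuple[str, float]:
    """Return (metric_name, score), falling back through the metric list above."""
    if example.correct_answer is not None:
        # Accuracy: exact match against the known-correct answer.
        return "accuracy", float(
            example.model_answer.strip() == example.correct_answer.strip()
        )
    if example.reference_answer is not None:
        # Reference matching: toy token overlap here; embedding similarity
        # or an LLM judge would be a stronger real-world choice.
        ref = set(example.reference_answer.lower().split())
        out = set(example.model_answer.lower().split())
        return "reference_overlap", len(ref & out) / max(len(ref), 1)
    if example.previous_answer is not None:
        # "Which is better?": normally an LLM or human judges A vs. B;
        # a tie placeholder stands in for that judgment here.
        return "pairwise_preference", 0.5
    if example.human_feedback is not None:
        # "Incorporates feedback?": crude check that the feedback text
        # is reflected in the new answer.
        addressed = example.human_feedback.lower() in example.model_answer.lower()
        return "feedback_incorporated", float(addressed)
    # Static metrics: no labels at all, so check basic output validity.
    return "static_nonempty", float(len(example.model_answer.strip()) > 0)
```

In practice, reference matching and pairwise preference are usually delegated to embedding similarity or an LLM judge rather than the toy heuristics above.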
Great talk!!
Fantastic talk!
A very impressive, high-quality lecture; excited to learn more. I'm looking to get started with making my own chatbot tutor.
I found exactly what I was searching for! The explanation was amazing and the insights were great
So good, man. I really loved the model comparisons.
All one needs to do to track prompt accuracy, at least at a basic level, is track prompts in git, as he mentions, but then have an automation pipeline that runs prompt changes against a ground-truth or fine-tuning data set, probably in CI/CD. Have that pipeline output statistical measurements and voilà: automated prompt comparison.
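As a sketch of that pipeline (my own hypothetical illustration; the file names, the JSONL format, and `call_model` are assumptions, not anything from the talk), a CI step could look like:

```python
import json
import pathlib
import statistics


def call_model(prompt: str, example_input: str) -> str:
    """Placeholder: replace with your actual LLM call (API or local model)."""
    raise NotImplementedError


def evaluate_prompt(prompt_path: str, dataset_path: str) -> float:
    """Score a git-tracked prompt file against a ground-truth JSONL set."""
    prompt = pathlib.Path(prompt_path).read_text()
    examples = [
        json.loads(line)
        for line in pathlib.Path(dataset_path).read_text().splitlines()
        if line.strip()
    ]
    # Each example looks like {"input": "...", "expected": "..."}.
    scores = [
        float(call_model(prompt, ex["input"]).strip() == ex["expected"].strip())
        for ex in examples
    ]
    return statistics.mean(scores)


if __name__ == "__main__":
    # In CI/CD, fail the build if the changed prompt regresses below a bar.
    accuracy = evaluate_prompt("prompts/summarize.txt", "eval/ground_truth.jsonl")
    print(f"prompt accuracy: {accuracy:.2%}")
    assert accuracy >= 0.85, "prompt change regressed below the accuracy bar"
```

Diffing these numbers across prompt versions in git gives exactly the automated comparison described.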
Wow. Great timing for this. Thanks! The only model I'm missing in the comparison is Open Assistant, which seems to be fully "open".
Really good talk! New to LLMs and learned a lot. At the end, when you were talking about the iteration cycle, you described how you would come up with an idea as an individual, experiment a bit, then share it with your team.
As a software developer, I find that pair or mob programming is a really good approach at the start of a new piece of work. Do you have any thoughts on 'pair-prompting' as a way to improve the initial stage of the project? After all, interacting with an LLM is a conversation, so having a few people working together on refining prompts could help reduce the biases/assumptions you introduce as an individual.
It feels quite ancient after OpenAI Dev Day? Things can become obsolete in months?
Why weren't GPT-J models included in the open source discussion?
Can we find the slides used anywhere? The fine-tuning-related slides were skipped due to a shortage of time, but it seemed there was a lot of useful information in them too. If a link to the slides is available, kindly share.
Ignore the comment. I found the slide links in the description. Thank you! Excellent presentation ❤
So, my question is: how is Flan-T5's context length listed as 2K? As far as I know, it should be 512. Am I wrong?
Well, your open-source slide was just wrong. OpenRAIL absolutely does allow commercial use for both the BLOOM and BLOOMZ models. Oddly enough, BLOOMZ, which is a lot like GPT-3.5, is conspicuously missing from your slides.
The move from MLOps to LLMOps will be quite humbling for the MLOps world/hype. LLMs mean custom internal DS/ML functions are no longer that important when you have a commodity API to use. LLMOps then just becomes basic data engineering and management again.
Definitely possible -- that's why we spent less time on deployment in the LLM Bootcamp than in our Deep Learning Course.
But if FOSS models and finetuning take off, then MLOps concerns about experiment management and model versioning will come roaring back!
I'm surprised to see claude-instant got only 1 out of 4 stars for quality.
I've been using both ChatGPT 3.5 and claude-instant, and I much prefer claude-instant.
In my opinion, if ChatGPT 3.5 receives 3 stars, then claude-instant deserves at least the same.
The issue with OpenAI's models is that they put too many filters/constraints on them: if I ask ChatGPT something considered "sensitive", it just outright refuses to answer the question.
I'm doing initial coding on an open-source model; then I can switch to GPT-4 once I know I'm not doing anything stupid like infinite loops.
Interesting approach! For intensive and open-ended applications like agents, the LM calls can definitely add up to a ton of tokens.
When using model providers, follow best practices for all cloud services, like putting guardrails in place to limit the pain from surprise bills.
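As one hedged sketch of such a guardrail (a hypothetical wrapper, not any provider's actual API; the 4-characters-per-token estimate is only a rough rule of thumb):

```python
class BudgetExceeded(RuntimeError):
    pass


class BudgetedClient:
    """Wrap any provider call in a hard token budget to cap surprise bills."""

    def __init__(self, call_api, max_tokens_total: int):
        self._call_api = call_api        # your real provider call
        self._budget = max_tokens_total  # hard cap across all calls
        self._used = 0

    def complete(self, prompt: str, max_tokens: int = 512) -> str:
        # Rough request-side estimate: ~4 characters per token, plus the
        # maximum number of tokens the completion is allowed to generate.
        estimate = len(prompt) // 4 + max_tokens
        if self._used + estimate > self._budget:
            raise BudgetExceeded(
                f"call needs ~{estimate} tokens; "
                f"only {self._budget - self._used} left in budget"
            )
        response = self._call_api(prompt, max_tokens=max_tokens)
        self._used += estimate  # better: record exact usage the provider reports
        return response
```

A production version would read exact token counts from the provider's usage metadata and alert well before the hard cap is hit.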
I was expecting more on deployment.
Come on.. 😢😮😅
You guys barely mentioned prompt injection attacks. Come on, this is a crucial aspect for the future of LLMs.
We agree that mitigating prompt injection is critical for LLM-powered apps that use tools or access possibly sensitive information!
Because prompt injection isn't solved yet, we covered it in our What's Next? lecture, where we discuss multiple safety+security concerns for LLM software: ruclips.net/video/ax_R4yz1WwM/видео.html
cool
Claude was supposed to be 100k context...?
That only just dropped. I doubt this is up to date? I'm only at 5:33 atm.
@@StephenRayner It is supported in Poe now
Correct! These videos are about three weeks old, and a lot happened in the FOSS model world in that time.
@@The_Full_Stack Does this mean that the time to obsolescence is getting drastically shorter?
Dolly is not proprietary.
Llama is OSS now
Source? Can't find anything regarding this
On the official model card it still says: License Non-commercial bespoke license
@@sachinkun21 OpenLLaMA is an open reproduction of LLaMA with the original architecture but trained on the RedPajama dataset.
@@sachinkun21 Released under the name OpenLLaMA under Apache 2.0
Oh, that one. I thought you meant Meta's. I haven't experimented with OpenLLaMA, so I can't say anything about its performance, but Meta's LLaMA, if open-sourced, will also open doors for its popular dialogue derivatives such as Vicuna and Koala.
This presentation is horribly outdated after one week. There are now super-competent, uncensored open-source LLMs that can be used as Auto-GPTs with LangChain and Pinecone and 100K tokens. C'mon, this bootcamp needs to chill or go streaming every second day to stay relevant.
Some of the material we cover does change quickly, and the state of play for FOSS models happened to change a lot in the three weeks since we recorded this video! Here's hoping they keep improving.
We really like HELM (crfm.stanford.edu/helm) and the LMSys leaderboard (chat.lmsys.org/?leaderboard) for keeping up with capabilities and benchmarking models against one another. What do you use?
The presentation is mainly about how to evaluate, test, and deploy LLMs. Can you elaborate on what is "horribly" outdated about these topics?
@@The_Full_Stack I used hyperbole to point to the super-fast progress AI is making, and to suggest that this kind of conference would probably be better off waiting until the progress reaches a steady state. I didn't mean to hurt people's feelings. Sorry if I did.
@@BodinhoDE The Vicuna 13B model and comparable ones are far, far better than what is suggested here (where they are rated as basically useless); that's the only misleading part of this video. But also, they can't update the video every week, so it's hard to be annoyed!