- Videos: 40
- Views: 71,488
Luke Monington
United States
Joined Jul 24, 2020
Niels Bantilan: Bridging the Frontier of ML and Infrastructure at Union.AI
On this installment of the AI Ascent Podcast, join us as we dive deep with Niels Bantilan, Chief Machine Learning Engineer at Union.AI.
Discover the philosophy and tech driving Union.AI's primary open-source offering, Flyte, an orchestration system designed for tasks ranging from data processing to machine learning. Our conversation gets to the core of Union.AI, revealing their focus on data orchestration, infrastructure, and the reliable allocation of GPUs from the cloud. With users including tech giants like Spotify and Stripe, Flyte offers solutions to bridge the gap between infrastructure teams and data practitioners. Niels also touches upon the dynamic nature of startups, ...
Views: 129
Videos
Flemming Miguel: Translating AI Strategy into Real-World Success | AI Ascent Podcast #2
119 views · 1 year ago
In this second episode of the AI Ascent Podcast, we dive deep into the world of AI consulting with Flemming Miguel. He shares the projects he's been working on, and we dig into AGI, AI safety, the gap between open-source and proprietary LLMs, and the future of the job market. You can reach Flemming Miguel here: flemming@mapmakers.live Mapmakers.live
Codie Petersen: Sparsey Decoded and the Dawn of Superintelligence | AI Ascent Podcast #1
174 views · 1 year ago
In this first episode of the AI Ascent Podcast, we discuss topics such as the gap between Open-Source LLMs and Proprietary Models, AGI, autonomous agents, AI consulting, and groundbreaking AI technologies like Sparsey. We'll discuss how these AI models echo the functioning of the human brain and their implications for industries like gaming. Get insights on brain implants, regulatory prospects ...
Open-Source LLMs Unleashed! Proven Strategies for Real-World Application to Dominate the Competition
1.1K views · 1 year ago
Article: medium.com/@lukemoningtonAI/introduction-d28b225b6025 Twitter: lukemoningtonAI Research Paper: arxiv.org/abs/2304.13712 Dive into the world of Open-Source Large Language Models (LLMs) and uncover their transformative impact on the field of natural language processing (NLP). This video delves deep into the capabilities of LLMs, and how they effectively address various NLP ta...
How to Fine-Tune Open-Source LLMs Locally Using QLoRA!
10K views · 1 year ago
In this video, I walk through a Colab notebook to fine-tune an open-source LLM locally using QLoRA. Here is my Twitter: lukemoningtonAI Here is the Hugging Face Colab notebook for QLoRA: colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing Feel free to post questions in the comments section! 0:00 Introduction 0:24 Explanation of Python Libraries 2...
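As a rough, hedged sketch of the kind of QLoRA setup a notebook like this typically walks through (the model name and hyperparameters below are illustrative placeholders, not necessarily the notebook's exact code):

```python
# Minimal QLoRA-style setup sketch. Model name and hyperparameters are
# placeholders, not the notebook's exact values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "EleutherAI/gpt-neox-20b"  # placeholder; any causal LM on the Hub

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",               # NormalFloat4, as in the QLoRA paper
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],      # module names depend on the model
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the small LoRA adapters train
```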
AI Breakthrough Democratizes Open-Source LLMs! Introducing QLoRA and the Guanaco Family
4.7K views · 1 year ago
A major event just happened in the world of open-source LLMs: the launch of QLoRA and Guanaco. QLoRA, an open-source fine-tuning method, and Guanaco, the first model family trained with this new technique, mark a significant paradigm shift in AI fine-tuning. These developments have democratized the fine-tuning of AI models, significantly reducing the computationa...
MPT-7B Guide: Chat with Open-Source LLMs Entirely Locally!
5K views · 1 year ago
This video provides a guide on how to run MPT-7B locally through two methods. The first is through Docker on Hugging Face. The second is through the oobabooga text-generation-webui. This same process will work on any operating system; just make sure to download the correct zip file for your operating system on the oobabooga webui. Here are all of the links: MP...
The Open Source LLM Revolution: RedPajama-INCITE Models Leading the Charge!
3.2K views · 1 year ago
Dive into the world of open source AI as we explore RedPajama's latest innovation, the INCITE family of models. In this video, we delve deep into the remarkable achievements of RedPajama, uncovering the capabilities of their 3B and 7B models, including base, instruction-tuned, and chat models. This impressive project aims to revolutionize the AI landscape by delivering high-performing models th...
Open Source LLMs Score Again! Introducing MPT-7B, Commercially Usable and Free of Charge
6K views · 1 year ago
In this video, I introduce MPT-7B, the latest development in the MosaicML Foundation Series. It is a commercially usable, open-source LLM that was trained on 1 trillion tokens of text and code. Watch as we compare MPT-7B to popular open-source LLMs like the LLaMA series from Meta, the Pythia series from EleutherAI, the StableLM series from StabilityAI, and the OpenLLaMA model from Berkeley AI R...
AI Brain Implants are Closer Than You Think: Inside the Race to Push Human Potential
958 views · 1 year ago
Dive into the groundbreaking world of AI brain implants with our enlightening video focusing on the trailblazing efforts of Neuralink and Blackrock Neurotech. In this video, we probe into the astounding potential of this cutting-edge technology, discussing how it might reshape everything from healthcare to virtual reality (VR) gaming. In the second half, we contrast Neuralink's ambitious vision...
PrivateGPT Quickstart Guide for Windows: Chat With PDF, TXT, and CSV Files Privately!
1.4K views · 1 year ago
Welcome to our quick-start guide to getting PrivateGPT up and running on Windows 11. PrivateGPT is a powerful tool built on local language models (LLMs) that allows you to interact with your documents privately, without an internet connection, ensuring that no data leaves your execution environment. Built using innovative tools such as LangChain, GPT4All, LlamaCpp, Chroma, and SentenceTransformers, PrivateGPT ...
AI Deconstructed: Astonishing New Insights into Today's Most Sophisticated LLMs
1.8K views · 1 year ago
Install Stable Diffusion Locally: Quick Setup Guide with Ubuntu 22.04
10K views · 1 year ago
Google AI Document LEAKED: How Open Source is Quietly Defeating Google and OpenAI!
15K views · 1 year ago
Memory-Augmented Models: LLMs That Can Handle 2+ Million Tokens
173 views · 1 year ago
ChatGPT's Legal Nightmare: How OpenAI Plans to Tackle Data Privacy Concerns
124 views · 1 year ago
AutoGPT Quickstart Guide - More Exciting Than ChatGPT
286 views · 1 year ago
How to Run LLMs Locally without an Expensive GPU: Intro to Open Source LLMs
563 views · 1 year ago
How Deep Learning Paved the Way for AI's Next Frontier
114 views · 1 year ago
Mind Games: How AI is Evolving to Outsmart and Persuade Us
105 views · 1 year ago
Pandas Tutorial - Importing Data and Slicing with df.loc() and df.iloc()
112 views · 4 years ago
Luke, your presentation is excellent. Your code works up until training is complete. After that I really need to save it locally. I realize you set this up for running on Colab, but the Python code itself runs fine on Python 3.10. However, if I save it with model.save_pretrained(model_id), it fails to load successfully. Could you be so kind as to include a save-locally segment in your video and then demonstrate the ability to load and query that saved model? The code you show after training does not work. Much obliged, Amigo!
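One possible way to do the saving and reloading the commenter asks about, as a hedged sketch: after training, save only the LoRA adapter, then reattach it to a freshly loaded base model. Paths, the base model name, and the generation call below are placeholders, not code from the notebook.

```python
# Hedged sketch: persist the LoRA adapter locally, then reload it for inference.
# Paths and model names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Right after trainer.train():
#   model.save_pretrained("./my-qlora-adapter")      # adapter weights + config
#   tokenizer.save_pretrained("./my-qlora-adapter")

base_id = "EleutherAI/gpt-neox-20b"                   # same base model used for training
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("./my-qlora-adapter")

model = PeftModel.from_pretrained(base, "./my-qlora-adapter")
inputs = tokenizer("Ask not what your country", return_tensors="pt").to(base.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```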
captain?
sauce?
This guy knows his stuff of course!
Amazing video
One question: in the fine-tuning dataset, the quote is {"quote":"“Ask not what you can do for your country. Ask what’s for lunch.”","author":"Orson Welles","tags":["food","humor"]}, but why does the output after fine-tuning still show the quote from JFK?
Great video! Explained everything clearly! The notebook runs smoothly out of the box, LOL!
This looks like a website you are using which is the opposite of fine-tuning something locally.
That's an excellent short summary!
Hi Luke, does this work for any other models like Llama 2? And by doing it this way, is everything kept locally on your own machine? I'm specifically thinking about the dataset. If I have a custom dataset, which I do not want to be uploaded to any of Hugging Face's servers, will this approach ensure that? From the code in your video, it doesn't seem to utilize any external servers for fine-tuning, correct?
How to install?
not working anymore
I'm pondering the best way to structure datasets for fine-tuning a language model. Does the structure depend on the specific LLM used, or is it more about the problem we aim to solve? For instance, if I want to train a model on my code repository and my dataset includes columns like repository name, path to file, code, etc., would it enable the model to answer specific questions about my repository?
Could you make an update on this? I think it has changed a small amount.
if we have data like this in a json file [ { "conversation_id": 1, "job_role": "Software Developer", "messages": [ {"role": "system", "content": "you are an interview ."}, {"role": "assistant", "content": "Welcome to the interview for the Software Developer role. I'm [Your Name], and I'll be your interviewer. Can you explain the concept of microservices architecture and its advantages over monolithic architecture?"} ] }, { "conversation_id": 1, "job_role": "Software Developer", "messages": [ {"role": "system", "content": "you are an interview ."}, {"role": "user", "content": "Candidate: Microservices architecture is an architectural style where an application is composed of loosely coupled and independently deployable services. Each service focuses on a specific business functionality. This approach offers advantages such as scalability, fault isolation, and independent deployment, enabling teams to work on different services simultaneously."}, {"role": "assistant", "content": "Score: 9.7"} ] }, { "conversation_id": 1, "job_role": "Software Developer", "messages": [ {"role": "system", "content": "you are an interview ."}, {"role": "assistant", "content": "Interviewer: Thank you for your detailed response. Now, let's discuss software testing. Can you explain the differences between unit testing and integration testing, and why both are important in the development process?"} ] } ] so how to train the model on this
Is there a version of this tutorial that isn't using a Jupyter notebook? I hate working with these.
So that is the easy way to do this ...
I love your stuff. Thank you so much! Would you consider dark screen mode for your desktop though? That would save my eyes, especially if I preview on a TV.
Regulation is evil. Especially in a young evolving technological field. GPUs? Decent gaming GPUs are more than sufficient. Try to lock those up. The dystopian total Police State needed to stop anything on the internet at large and on everyone's computer is much much more dangerous than LLMs.
What should I put in the command-line arg if I have both low VRAM and low RAM?
Loved it. Very clear and detailed explanation. Following.
Love your tutorial on QLoRA. Would you happen to have a tutorial on fine-tuning an LLM with PDF or DOCX files? I already tried vector embedding with the Falcon-7B model to embed some of my local data files, but the output was not good. I wanted to see if the output will be better with fine-tuning.
Musks robot can basically only walk in a straight line... Slowly... So no matter how intelligent it might be, if it tries to kill you - just step aside.
Great video, I look forward to the in-depth analysis. Nice view from the top of the Sydney Harbour Bridge hey!
Hello, a really exciting project with a lot of potential! This raised two questions for me: 1. How to customize the model by training it to recognize and follow specific storylines? 2. Is there generally no interface for internet use for privateGPT or was it just not configured accordingly in the current project? Thanks and keep it up!
Awesome, but how do I merge the LoRA into the model like in the original 4-bit repo?
I'm desperately searching for this type of content
Thanks, great tutorial!
I actually didn't know that
how lazy can we get
It’s not being lazy, it’s called being smarter. A regular Uber driver makes $10 after all the deductions but a Waymo software engineer who drives the cars from afar makes $162,000 just at the base level.
so safety is lazy now
Useful, concise summary of a rapidly changing landscape.
Thanks!
Great article. I just heard about using an LLM to subtask fine-tuned models for outputs that the LLM then uses in its response. That might be the way to go. I also wanted to mention that we do have a problem with human bias when using humans to rate the output of a model. Perhaps we need a mechanism for defining "human values" before we try to align AI to some philosophical Jello TM.
Ah yes, that's a good idea: have an LLM provide the response, but give it access to fine-tuned models to leverage each of their strengths. Yes, humans also have biases. When the models are trained using techniques such as RLHF, those biases can be imprinted into the model. The filters that are placed on the models may not be the right filters. In the open-source space, luckily there are a lot of models, so a different model can be used if the filter isn't helpful.
How do I train the LLM to answer multiple choice questions with predictable levels of difficulty?
That sounds like a natural language understanding task, so a fine-tuned model would likely perform the best. It’s a task where the input is a sequence and the output is a prediction, which means that an encoder-type model would be preferred. So then it would be a matter of curating a dataset, deciding on a model (ideally a model pre-trained on similar data), and then fine-tuning the model
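As a hedged sketch of that encoder-style setup (the base model, label set, and tiny example dataset below are illustrative assumptions, not a recommendation from the thread):

```python
# Hedged sketch: fine-tune an encoder model to predict a question's difficulty
# level. Model name, labels, and data are placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, TrainingArguments, Trainer)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)               # e.g. easy / medium / hard

data = Dataset.from_dict({
    "text": ["What is 2 + 2?", "Prove that sqrt(2) is irrational."],
    "label": [0, 2],
})
tokenized = data.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="difficulty-model", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```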
Thanks for touching on the topic of bias. It is something that is not being talked about nearly enough.
For sure! Yea it’s an important topic
Don't suppose you know how to modify the code to work on a TPU?
Hmm 🤔 I haven’t done any work with the TPUs, so I’m not sure on this one
do you have a video on pre-training an LLM?
No, I don't have any videos on that. Pre-training can be a lot more resource intensive and requires more GPU VRAM. With the parameter-efficient techniques, fine-tuning can now be done on a single GPU. But when it comes to pre-training from scratch, that would require substantial compute. For example, MPT-7B required 440 GPUs trained over the course of 9.5 days.
Maybe a discussion on what issues arise with scaling, like parallelism.
@@user-wr4yl7tx3w The parallelism is an interesting point. I remember reading the research papers as these transformers were growing. Initially, the entire model could fit on a single GPU. But then the models grew too large to fit on a single GPU, so they had to find a way to split a model across multiple GPUs by putting a layer onto each GPU (or some other splitting method). Then the models kept growing and got to the size that a single layer couldn't fit on a single GPU. So that required further advancements such that a part of a layer could fit on each GPU. This all ties back to the hardware problem of AI: the fact that model size has increased by 1000x+ while GPU VRAM size has only increased by ~3x. I was thinking about creating a video on that. It might be an interesting video.
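For reference, a minimal sketch of how that layer-splitting looks with today's Hugging Face tooling, assuming a placeholder model name:

```python
# Hedged sketch: let accelerate place a large model's layers across the
# available GPUs (spilling to CPU if needed). Model name is illustrative.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-13b",        # placeholder large causal LM
    device_map="auto",         # assign blocks to GPU 0, GPU 1, ..., then CPU
    torch_dtype="auto",
)
print(model.hf_device_map)     # shows which device each transformer block landed on
```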
Finally! someone who actually walks through the code line by line with explanation :D thank you for creating this tutorial
You’re welcome!
Thanks was going to try and figure this out my self. Appreciate the walk through
For sure!
I've tried the full-precision Guanaco 65B. Its quality is nowhere near GPT-3.5-Turbo, at least not in simple coding tasks. It is still the best open-source model I've tried so far. Its license is not friendly to commercial use either, so we've got another toy model useful only for YouTubers to make a video about.
I don't think any models compare with ChatGPT and GPT-4 when it comes to coding. Not even Google Bard / PaLM 2.
Probably one of the best walkthroughs, thanks!
No problem!
This is so cool. Would be great to have a good understanding of the timeline for fine-tuning a model in this awesome new way of doing it. Especially how it relates to the size of a given dataset
That's also going to depend on factors such as model size and context length. For a smaller model, the fine-tuning will be much faster. Last night I was fine-tuning a 5-billion-parameter model on a sequence-to-sequence dataset of ~50,000 rows, and it was looking like it would take 36 hours on my RTX 4090. But this is going to depend a lot. Ideally, it would be good to find ways to fine-tune the models so that they can be done in less time. The QLoRA paper says: "Using QLORA, we train the Guanaco family of models, with the second best model reaching 97.8% of the performance level of ChatGPT on the Vicuna [10] benchmark, while being trainable in less than 12 hours on a single consumer GPU; using a single professional GPU over 24 hours we achieve 99.3% with our largest model, essentially closing the gap to ChatGPT on the Vicuna benchmark." So I certainly think it is possible to fine-tune a model and get great performance in 12-24 hours.
Falcon has just been re-licensed as Apache 😂 👍 The QLoRA Falcon derivatives will not only outperform LLaMA, but will be fully open source.
Yea for sure! I'm excited to do some fine-tuning with Falcon 40B
Wow, this is slow. Funny that it thinks Trump gave the speech, not Biden. It must have inferred that due to its training date and the fact that it is the president's speech. Thanks for this info. I did get it working, at least.
Yea it's definitely slow. It's running on the CPU, which is going to cause a massive performance hit. If there was a way to connect it up to a GPU, the speed would be much better. Glad you got it working!
Does fine-tuning get you better results than relying on a vector store, as a way to include personal knowledge?
Fine-tuning and using a vector store are two distinct methods for incorporating domain-specific knowledge into an AI model. There are advantages to each.

Fine-tuning involves adjusting the parameters of an already trained model using a smaller, specific dataset. This allows the model to specialize in a particular domain or task. Fine-tuning can provide excellent results, but it may require a sizable dataset of the specific knowledge domain. Fine-tuning can also be used to train the model to respond in a particular way, which would be important in a business context because the business may want the LLM to respond in a way that aligns with the brand. Fine-tuning can also be used to create filters so that the LLM doesn't say anything that it isn't supposed to say. There are other ways of creating a filter though, such as RLHF.

A vector store can allow the model to retrieve information that it might not have had access to during training. It's a more flexible way to add new information after training, but the AI must still be capable of utilizing that information effectively. The retrieval of information from the vector store may not be as smooth or contextual as with a fine-tuned model.

In terms of achieving better results, it depends on the specifics of the task or application. For many real-world AI applications, a combination of fine-tuning and a vector store might be used. Fine-tuning can help the model understand and generate language more effectively in a specific context, while a vector store can provide the model with access to a large, up-to-date knowledge base.
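For context, a minimal sketch of the vector-store side of that comparison, assuming sentence-transformers embeddings and a toy in-memory corpus (the model name and documents are placeholders):

```python
# Hedged sketch: retrieve the most relevant document from a tiny "vector store"
# and splice it into the prompt. Corpus and embedding model are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Our refund policy allows returns within 30 days.",
    "The warehouse ships orders Monday through Friday.",
]
doc_embeddings = encoder.encode(docs, convert_to_tensor=True)

query = "Can I return a product after two weeks?"
query_embedding = encoder.encode(query, convert_to_tensor=True)
best = util.cos_sim(query_embedding, doc_embeddings).argmax().item()

prompt = f"Answer using this context:\n{docs[best]}\n\nQuestion: {query}"
print(prompt)   # this prompt would then be passed to the (fine-tuned or base) LLM
```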
If I wanted to fine-tune the model on a language that the original training data likely didn't include, how much text would I need and how would I approach that?
Well, there are multiple considerations that would go with that. For example, what kind of task are you looking to do in the other language? Is it just next-token prediction? Or are you looking to do something like sequence-to-sequence translation, where you give it input in one language and then it translates into another language? In a case like that, you would set up your training pipeline slightly differently from the way that it is done in the video. This would be done by designing a dataset that has an input and output, and then maybe creating a pipeline using AutoModelForSeq2SeqLM.

The amount of data required would also depend on the task and on the size of the model. Some of the most powerful LLMs have an emergent capability where they have already learned additional languages that they were not explicitly meant to learn. Additionally, there are many LLMs available on the HuggingFace hub that are already trained to work with multiple languages. So, depending on your use case, you can pick the correct model accordingly. The larger models are much more powerful. Usually I try to use the largest LLM possible since that will get the best response. I notice a huge jump in performance as I move from using a 3B model to a 7B and up to 30B+.

To answer your question about how much text would be required, this can vary, but on the OpenAI website, they say "The more training examples you have, the better. We recommend having at least a couple hundred examples. In general, we've found that each doubling of the dataset size leads to a linear increase in model quality." But I think if you were teaching an LLM that had no prior exposure to a new language, it would likely take much more than a few hundred examples.
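A hedged sketch of what that AutoModelForSeq2SeqLM pipeline could look like for a translation-style fine-tune (the model, language pair, and one-row dataset below are placeholders):

```python
# Hedged sketch: sequence-to-sequence fine-tuning for translation.
# Model name, language pair, and data are illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainingArguments,
                          Seq2SeqTrainer)
from datasets import Dataset

model_name = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

pairs = Dataset.from_dict({
    "source": ["Hello, how are you?"],
    "target": ["سلام، حال شما چطور است؟"],
})

def preprocess(batch):
    inputs = tokenizer(batch["source"], truncation=True)
    labels = tokenizer(text_target=batch["target"], truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = pairs.map(preprocess, batched=True, remove_columns=pairs.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="translation-model", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```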
@@lukemonington1829 thanks for the elaborate response. Actually the language I want to train it on (it's Persian btw) is already generated by some of the free language models, however they don't make a single error-free sentence at the moment, and even GPT-4 makes many mistakes writing it. As my country is lacking teachers, I was thinking about training it to explain educational material in a back-and-forth fashion where the students can ask anything. So it will be similar to training an assistant, I think. The fact that any kind of training can be done with just a few hundred examples is great news to me. I was thinking about much higher numbers as I am very new to all this. Thanks
@@sinayagubi8805 I'm going to be posting my next video (hopefully tomorrow) where I'll touch on this point a little bit. The video will be a practical guide to using LLMs. This falls into the machine translation task of text generation. It turns out that for general machine translation tasks, using a regular LLM (without fine-tuning) is better. But when it comes to low-resource machine translation, where there is a limited amount of parallel corpora available, fine-tuning the model can lead to a large increase in performance
What about using a vector DB like Chroma DB for the dataset? Is that possible in this case?
@@lukemonington1829 cool, eagerly waiting for it.
48GB consumer grade?
I thought the same thing when I read the research paper and they were implying that 48GB is consumer grade. When I wrote this article, I kept that idea in. I can't speak for them on why they consider 48GB to be consumer grade. But if I were to hypothesize, I think the reason is that a professional GPU with 48GB of VRAM can be rented online for less than $1 per hour. In the paper, they didn't do any fine-tuning runs for longer than 24 hours, which means that it would cost at most $24 per fine-tuning run. This is cheap enough that a normal person can do their own fine-tuning runs if they wanted to.
@@lukemonington1829 It's amazing that a single GPU with 48GB can train any LLM, but consumer grade is pushing things too far :)
@@rolyantrauts2304 yea, I agree. At the 48GB size, it couldn't be done locally anymore, so that feels like it should be another tier higher.
I guess if you have unified memory as in a Mac then maybe it's possible?
@@rolyantrauts2304 yes, it definitely could be done by splitting the model between the GPU and the CPU. This can also be done in Windows / Linux. It's even possible to split between the GPU, CPU, and hard drive for extremely large models. Through a process like that, inferencing can be done with even the largest of models on a consumer-level computer. But splitting between devices dramatically reduces the speed at which training and inferencing can be done. Personally, if I can't load the full model onto the GPU, then it's not practical for me to use since it is too slow. I think training would be impractical too, it would take a lot longer. But it would be possible if someone was determined to do that lol
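A minimal sketch of that GPU/CPU/disk splitting, assuming Hugging Face accelerate-style offloading (the model name and memory caps are placeholders):

```python
# Hedged sketch: load a model too large for the GPU by capping GPU memory and
# offloading the remaining layers to CPU RAM and then to disk. Values are illustrative.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",                     # placeholder large model
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "48GiB"},    # per-device memory caps
    offload_folder="./offload",                 # spill whatever is left to disk
    torch_dtype="auto",
)
# Inference still works, but each forward pass shuttles weights between devices,
# which is why it is much slower than keeping the whole model on the GPU.
```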
Fun fact: this is the exact reason why OpenAI is pushing for regulations, so that open source will be years behind the bigger companies.
It does appear that the big companies will be able to use regulations in order to build a moat so that startups and small companies won’t be able to compete without huge amounts of funding
In linux how do u select a file located for batch
I'm not quite sure what you mean
Hi, could you show this with a different model, like a LLaMA model, and different datasets? I tried to change the model to a LLaMA model and got a bunch of errors in the tokenizer. Also, it would be great if you made a video about how to prepare and clean datasets for training! Thanks
I appreciate the suggestions! Yea I think that would be good. I'm looking into preparing and cleaning my own datasets too. I might make a video on that! And yes, it looks like it doesn't work with all models. So far, I've tested it with Pygmalion and WizardLM. I'll be testing with others in the future. I tried LLaMA as well and got errors too. Also, I was having trouble initially getting this running locally, so I ended up creating a Dockerfile and docker-compose, and that solved my problem. Let me know if it would be helpful for me to post the code on GitHub for that!
Thinking about it a little more, it should definitely work with the LLaMA model, since the Guanaco model family was fine-tuned based off of the LLaMA model. There must be a way to do it.
@@lukemonington1829 Thank you