- Videos: 55
- Views: 213,716
AemonAlgiz
Joined 28 Nov 2006
I’m an AI and Machine Learning expert! I love talking about and teaching the cutting edge in technology. I make tutorials and funny videos about technology. I also have dogs, and I love them too, so expect the occasional random video of my dogs.
Large Language Models Process Explained. What Makes Them Tick and How They Work Under the Hood!
Explore the fascinating world of large language models in this comprehensive guide. We'll begin by laying a foundation with key concepts such as softmax, layer normalization, and feed forward layers. Next, we'll delve into the first step of these models, tokenization, explaining the difference between word-wise, character-wise, and sub-word tokenization, as well as how they impact the models' understanding and flexibility.
After setting a strong base, we'll dive deeper into the fascinating process of embedding and positional encoding. Learn how these techniques translate human language into a language that models can understand, ultimately creating a space where tokens can relate to one an...
Views: 3,268
Videos
SuperHOT, 8k and 16k Local Token Context! How Does It Work? What We Believed About LLM’s Was Wrong.
3.9K views · 1 year ago
Hey everyone! Today, we're delving deep into SuperHOT, an innovative approach that dramatically extends the context length of the LLMs we're used to, to 8K and even a whopping 16,000 tokens. You might wonder, how is this possible? Or what were the hurdles we had to overcome? In this video, we unravel these questions and discuss the strategies we developed to address the issue of context length. We...
Reinforcement Learning From Human Feedback, RLHF. Overview of the Process. Strengths and Weaknesses.
1.6K views · 1 year ago
Dive into the captivating world of Reinforcement Learning from Human Feedback (RLHF), one of the most sophisticated topics in fine-tuning large language models. This comprehensive guide offers an overview of crucial concepts, focusing on powerful techniques like Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). We begin with an exploration of reinforcement learning...
Landmark Attention Training Walkthrough! QLoRA for Faster, Better, and Even Local Training.
2.7K views · 1 year ago
Hey all, I think I've got the audio fixed this time, hopefully! Explore the intricate process of utilizing Landmark attention for fine-tuning models, ensuring better context awareness. This detailed walkthrough explains how to correctly set up OobaBooga for Landmark attention and subsequently adjust your models and related hyperparameters for optimal results. This is done leveraging the incredi...
Why Do LLM’s Have Context Limits? How Can We Increase the Context? ALiBi and Landmark Attention!
10K views · 1 year ago
In this video, we discuss large language models and why they have context length limits. We begin by explaining the significance and challenges of increasing the context length limit. The video offers a detailed explanation of computational concepts like Big O notation, its implications, and examples. It also clarifies how time and space complexity works with attention layers in large language ...
How To Create Datasets for Finetuning From Multiple Sources! Improving Finetunes With Embeddings.
40K views · 1 year ago
Today, we delve into the process of setting up data sets for fine-tuning large language models (LLMs). Starting from the initial considerations needed before dataset construction, we navigate through various pipeline setup questions, such as the need for embeddings. We discuss how to structure raw text data for fine-tuning, exemplified with real coding and medical appeals scenarios. We also exp...
QLoRA PEFT Walkthrough! Hyperparameters Explained, Dataset Requirements, and Comparing Repo's.
4.2K views · 1 year ago
Today we explore two different applications for fine-tuning large language models using QLoRA: the Alpaca QLoRA and the official QLoRA. We delve into the setup and installation process, highlighting the ease of the Alpaca QLoRA compared to the more powerful but complex official QLoRA. Walkthroughs of each setup, troubleshooting advice, as well as an explanation of the functionalities and differen...
QLoRA Is More Than Memory Optimization. Train Your Models With 10% of the Data for More Performance.
7K views · 1 year ago
Today we explore the groundbreaking innovation in fine-tuning large language models - QLoRAs or Quantized Low-Rank Adapters. Delving into its rationale, underlying mathematics, and the advantages it holds over the previous versions of LoRAs, we present a comprehensive guide to understanding and leveraging this new technology. We start with a quick recap of LoRAs and their role in making trainin...
PEFT LoRA Finetuning With Oobabooga! How To Configure Other Models Than Alpaca/LLaMA Step-By-Step.
16K views · 1 year ago
This is my most requested video to date! A more detailed walk-through of how to perform LoRA fine-tuning! In this comprehensive tutorial, we delve into the nitty-gritty of leveraging LoRA (Low-Rank Adaptation) to fine-tune large language models, utilizing Oobabooga and focusing on models like Alpaca and StableLM. The video begins with a discussion about the important considerations to bear in mind ...
What Is Positional Encoding? How To Use Word and Sentence Embeddings with BERT and Instructor-XL!
1.6K views · 1 year ago
Correction: 0:00 It appears that the audio on this video got de-synced after uploading. Today we dive into the fascinating world of word and sentence embeddings and their role in large language models. In this video, we explore how positional encoding and word embeddings are used to enable a deeper understanding of natural language. We delve into concepts like the attention layer, the importanc...
LLaMa GPTQ 4-Bit Quantization. Billions of Parameters Made Smaller and Smarter. How Does it Work?
25K views · 1 year ago
We dive deep into the world of GPTQ 4-bit quantization for large language models like LLaMa. We'll explore the mathematics behind quantization, emergent features, and the differential geometry that drives this powerful technique. We'll also demonstrate how to use the GPTQ 4-bit quantization with the Llama library. This video is a must-watch if you're curious about optimizing large language mod...
Fine-Tune Language Models with LoRA! OobaBooga Walkthrough and Explanation.
39K views · 1 year ago
In this video, we dive into the world of LoRA (Low-Rank Adaptation) to fine-tune large language models. We'll explore how LoRA works, its significance in reducing memory usage, and how to implement it using oobabooga's text generation web UI. Whether you're a beginner or a pro, this step-by-step tutorial will help you harness the power of LoRA to improve your language model's performance. Do...
Google's Concern Over No Moat Against Open Source LLMs: The Breakthrough Technologies Behind It.
932 views · 1 year ago
What is Tokenization in Transformers and How Are They Made? Byte Pair Encoding Explained Simply.
1.9K views · 1 year ago
Embeddings in Machine Learning! What Are They and How Do We Use Them? How We Can Teach Computers.
1.2K views · 1 year ago
Updated Installation for Oobabooga Vicuna 13B And GGML! 4-Bit Quantization, CPU Near As Fast As GPU.
13K views · 1 year ago
Why Search Your Data When You Can Ask It? Vector Databases and Embeddings to Level Up Your Corpus.
737 views · 1 year ago
Vicuna 13B V1.1! With 4-Bit Quantization, What Can't it Run On? OobaBooga One Click Installer.
4.4K views · 1 year ago
FastChat Vicuna Can Support What!? A Complete Walkthrough. Will you use ChatGPT again?
4.5K views · 1 year ago
ChatGPT Killer On Your Computer? Let's Install FastChat Alpaca Vicuna!
4.1K views · 1 year ago
Hugging GPT - Automate Your Life, With GPT and Hugging Face!
568 views · 1 year ago
Fine tune ChatGPT. Leverage Your Data to Enhance Search and Improve Your Chat Experience.
720 views · 1 year ago
I'm still lost when it comes to models; there seems to be error after error when training with Oobabooga.
The amount of energy and resources needed to train and develop LLMs is sorely overlooked.
Nice explanation, thank you!!
Thanks for the clear and concise explanation, it was perfect.
Thanks a lot for this explanation. How can you even out the errors via the next weights when you do not know in advance what activation value the weights will be multiplied with?
top notch content
Wow, this is incredibly cool for a web interface, and to keep my training going while I wait for my Python to get better 😂 Would love to know why you used that specific model and what models it can be used with. I'd love to try it on an uncensored one.
Thank you for this. I am sick of seeing clickbait or low-quality content when I'm trying to self-educate about LLMs and get some good knowledge.
Do you have any videos on how to use already existing Loras? I'm running 7-10b Exl2 models, but I have no idea how to use Loras.
Wow, such an in-depth training. I am an IT/cybersecurity senior and played with LLMs a couple of years back, training a statistics model. Your videos helped me a lot to get up to speed again. Thank you for the detailed training.
tks
thanks
You’re literally a genius! I appreciate you taking the time to share the knowledge with us! Exactly what I was looking for… how to create a dataset and in such a well put together video. Thank you
It only works with Nvidia... the title of the video is misleading: "NO GPU REQUIRED", but in the example he used the GPU.
Great explanation. Some questions: when we are quantising and computing the quantisation loss, do we not need to supply some data for it to compute the loss against? If not, how exactly is this loss computed? (Surely we need some inputs and expected outputs to compute this loss; is this why all of the weight errors were 0 when you quantised?) If we do, could this be interpreted as a form of post-training quantisation 'fine-tuning'? By this I mean that we can use domain data in the quantisation process to help preserve the emergent features in the model that are most useful for our specific domain dataset? Thanks!
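On the calibration question above: GPTQ-style quantization does use a small calibration set, and the objective it minimizes is the layer's output error on those activations rather than the raw weight error. A minimal NumPy sketch of that distinction (the round-to-nearest `quantize_4bit` below is a toy stand-in, not the actual GPTQ solver):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w, n_levels=16):
    """Toy symmetric round-to-nearest 4-bit quantizer (illustrative only)."""
    scale = np.abs(w).max() / (n_levels / 2 - 1)
    q = np.clip(np.round(w / scale), -n_levels / 2, n_levels / 2 - 1)
    return q * scale

W = rng.normal(size=(64, 64))      # one layer's weights
X = rng.normal(size=(128, 64))     # calibration activations fed to that layer

W_q = quantize_4bit(W)

# GPTQ-style methods minimize the *output* error on calibration data,
# ||X W^T - X W_q^T||^2, not just the weight error ||W - W_q||^2.
output_err = np.mean((X @ W.T - X @ W_q.T) ** 2)
weight_err = np.mean((W - W_q) ** 2)
print(f"weight MSE: {weight_err:.5f}, output MSE on calibration data: {output_err:.5f}")
```

Because the loss is measured against calibration activations, feeding domain-specific data at quantization time can indeed bias which weights are preserved most accurately, which is close to the "quantisation fine-tuning" idea in the comment.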
Finally some freaking great tutorial! Practical, straight to the point and it works!!
I would pay a lot of money for this information, thank you.
Thanks for the walkthrough! So I'm running the utility on an i7 16 GB RAM (no GPU) laptop, but for me it takes close to 10 mins for each response to fully generate. How did you manage to speed yours up? I'm also using Vicuna 7b
Hmm... I would like to be able to update the LLM by extracting the documents in a folder, extracting the text, and fine-tuning it in. I suppose the best way would be to inject it as a text dump; how, please? I.e., take the whole text and tune a single epoch only, as well as saving my chat history as an input/response dump, single epoch only.

Question: each time we fine-tune, does it take the last layer, make a copy, train the copy, and replace the last layer? Since the model weights are FROZEN, does this mean they don't get updated? If so, is the LoRA applied to this last layer, essentially replacing it? If we keep replacing the last layer, do we essentially wipe over the previous training? I have seen that you can target specific layers; how do we determine which layers to target, then create the config to match those layers?

Question: how do we create a strategy for regular tuning without destroying the last training? Should we be targeting different layers each fine-tuning?

Also, why can we not tune it live, i.e. while we are talking to it, or discuss with the model and adjust the model while talking? Is adjusting the weights done by autograd in PyTorch with the optimizer, e.g. the Adam optimizer? With each turn we could produce the loss from the input by supplying the expected outputs to compare with similarity, so if the output is over a specific threshold it would fine-tune according to the loss (optimize once), switching between train and evaluation, freezing a specific percentage of the model; essentially working with a live brain? How can we update the LLM with conversation, e.g. by giving it a function (function calling) to execute a single training optimization based on user feedback, i.e. positive and negative votes and the current response chain? And if RAG was used, should the retrieved content be tuned in? Sorry for the long post, but it all connects to the same thing!
Thank you so much for simplifying this to such extent. Subscribed
Great video! I am wondering if Oobagooga provides a command line way to perform training. I am using a lab server which the web UI is not accessible.
Just found your channel. Excellent Content - Another sub for you sir!
Amazing, loved it
Hi @aemonAlgiz , I am new to Python (and LLMs) and wanted to try creating a dataset from a book as well. However when running the provided code, I got a warning: "Token indices sequence length is longer than the specified maximum sequence length for this model (181602 > 2048). Running this sequence through the model will result in indexing errors Max retries exceeded. Skipping this chunk." (which happened a lot). The new .JSON file was empty. I tried changing the "model_max_length": from 2048 to 200000 in the tokenizer_config from my model, but that only made the warning disappear (but the result was the same). Would love if anyone has a solution to this :)
Did you get the solution??
@@abhaypratap7415 nope
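The error in this thread means the raw text was fed to the tokenizer in one piece, exceeding the model's 2048-token limit; raising `model_max_length` only silences the warning. One fix is to split the text into token-sized chunks before processing. A minimal sketch, using whitespace splitting as a stand-in for the real tokenizer (`chunk_text` and the 2048 budget are illustrative, not the video's code):

```python
def chunk_text(text, max_tokens=2048, tokenize=str.split):
    """Split text into pieces that each fit the model's token budget.
    `tokenize` here is plain whitespace splitting; swap in your real
    tokenizer's tokenize() for accurate counts."""
    tokens = tokenize(text)
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

book_text = "word " * 10_000            # stand-in for the extracted book text
chunks = chunk_text(book_text, max_tokens=2048)
print(len(chunks))                      # 5 chunks, each within the budget
```

Each chunk then fits through the model individually, and the per-chunk outputs can be collected into the JSON dataset afterwards.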
Thank you, Aemon! I think your channel is underrated.
"Token indices sequence length is longer than the specified maximum sequence length for this model (249345 > 2048). Running this sequence through the model will result in indexing errors." I am facing this issue; please help with a resolution.
Can you explain the code to convert PDF to JSON? I don't know how you're doing that. It's great, and that's what we need. Thanks in advance.
Content is good... but the comedic interruptions are not helping much, please.
Where are you, buddy? Why did you ditch your YouTube channel? This is some of the best content available on the intricate topic of advanced LLM fine-tuning and such. Please come back to us 😆 Where are you? We want new content about OpenRouter, Claude, etc.!
great explanations thanks a lot for your efforts making this great content!
MAKE MORE VIDEOS
I'm completely new to ML and LLMs and found this incredibly insightful. Love this type of content.
What is the moat for Nvidia? Can you make a video on that?
What if you have 500 PDFs you want to use as data?
Thank you so much for this video! It was incredibly clear and the concrete example you provided really helped to understand. Your content stands out, and it's greatly appreciated!☺
I came up with an idea (similar to your "crackpot" idea) on my own a week ago, which involves "giving memories to neural networks" based on my knowledge and some research in biology (even though I'm not a biologist myself), and hacked it into PyTorch to test on some pretrained models. I'm still tinkering with this concept of "memories"; it requires some tuning of how much the "memories" influence the "future state" of the model and which parts of layers to apply the "memory hack" to... but it works (kind of). There are some cases when the model goes a bit "crazy", being "overwhelmed" with past "memories", so I'm still playing with different numbers. Yet I can't test it on huge models, because I'm just a random guy with some Linux and programming skills, and I'm running my tests on a 14-year-old CPU with 8GB RAM 😂
3:00 This is a misrepresentation of softmax. It assigns proportionally more weight to bigger values - that's why it has "max" in its name.
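The commenter's point is easy to check numerically: softmax doesn't pick the maximum, it weights all values, with larger inputs getting a disproportionately larger share of the probability mass. A minimal sketch:

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])      # [0.09, 0.245, 0.665]
```

The input 3 is only 3× the input 1, yet it receives over 7× the weight; that exponential amplification of larger values is the "max" in softmax.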
I get "can only concatenate list (not "Tensor") to list" and "Error None"
great job!
Did you merge the qlora with the quantized model or the base model? Is it possible to train on the quantized version and apply it to the base version?
thank you
Dude seriously your content is so clear and easy to follow keep it up!
Greetings. Is everything ok at your end ? We haven’t heard from you for a while and just checking in. Anyway hope you are alright.
I know this video is almost a year old, but I am having some trouble with any of the models I know of. I have tried many different models and there is always an error here. WARNING LoRA training has only currently been validated for LLaMA, OPT, GPT-J, and GPT-NeoX models. (Found model type: LlamaCppModel) WARNING LoRA training has only currently been validated for LLaMA, OPT, GPT-J, and GPT-NeoX models. (Found model type: NoneType) Which models do you use? Does this tutorial still work?
"big O notation" is not the same thing as worst case. You take an input distribution and you can consider average/worst case over that distribution. O gives an "upper bound" to the complexity of that case. If you take quicksort with uniform input distribution, one would typically say that the worst case is O(n^2) and the average case is O(n log n). You could equally well say that the worst case is O(n^3) although this gives you less information.
It's interesting ... still a bit in the dark as to what goal is expected from those inputs and what it does in the final results. Is the goal to style the answers in a certain way (which is how the formatted input seems to focus on) or is the goal to supplement content (a correct Turing biography)? If content is involved, can we train a model to follow a fictional lore, get characters to express behaviors that appear "normal" to its world, or redirect behavior (if someone speaks ill of the king, characters will always respond negatively - or - reacting according to political alignment). And if it's possible, is a large explanatory text like the Turing biography a valid input format to do so ?
Using an RTX 3070, I ran the LoRA with your data in only 5 minutes, not hours! But inference is time-consuming. Thanks, professor! I'll be your fan.
do you have a video on how to prepare a dataset for creative writing?