- 65 videos
- 102,534 views
Oxen
USA
Joined Dec 15, 2022
Oxen.ai provides wicked-fast versioning and collaboration tools for data. Even with millions of unstructured images, we quickly handle any type of data so you can build cutting-edge AI.
Arxiv Dives:
Each week we dive deep into a topic in machine learning or general artificial intelligence research. The sessions are live with a group of smart Oxen every Friday. Create an account: www.oxen.ai and join the discussion: lu.ma/oxen
How tree-of-thought, plan-and-solve, and decomposition prompting work with Sander Schulhoff
In this dive we go into the most extensive paper on prompting, "The Prompt Report", with co-author Sander Schulhoff.
--
Use Oxen AI 🐂 oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.
--
Paper 📜 arxiv.org/pdf/2406.06608
Links + Notes 📝 www.oxen.ai/blog/arxiv-dives
Join Arxiv Dives 🤿 oxen.ai/community
Discord 🗿 discord.com/invite/s3tBEn7Ptg
--
Chapters
0:00 Meeting co-author Sander Schulhoff
4:36 The Prompt Report Overview
6:26 Questions for Sander
7:51 When to fine-tune or prompt
12:19 Thought Generation
16:01 Decomposition
18:10 Example of least-to-most prompti...
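For readers curious about the least-to-most chapter, here is a minimal sketch of the pattern: decompose a problem into subproblems, then solve them in order, feeding earlier answers forward. The `ask_llm` callable and `toy_llm` stand-in are illustrative, not from the talk.

```python
# Least-to-most prompting sketch: first ask the model to break a problem
# into subproblems, then solve them in order, passing earlier answers along.

def least_to_most(problem, ask_llm):
    """ask_llm is any callable mapping a prompt string to a completion string."""
    decomposition = ask_llm(
        f"Break this problem into a numbered list of simpler subproblems:\n{problem}"
    )
    subproblems = [line for line in decomposition.splitlines() if line.strip()]
    answers = []
    for sub in subproblems:
        context = "\n".join(answers)  # answers to earlier subproblems
        answers.append(ask_llm(f"{context}\nSolve: {sub}"))
    return answers[-1] if answers else ""

# Toy stand-in "model" so the sketch runs without an API key.
def toy_llm(prompt):
    if prompt.startswith("Break"):
        return "1. halve 12\n2. add 4 to the result"
    if "halve 12" in prompt:
        return "6"
    return "10"  # 6 + 4

print(least_to_most("What is 12/2 + 4?", toy_llm))  # prints "10"
```

In practice `ask_llm` would wrap any chat-completion API; the structure of the loop is what matters.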
73 views
Videos
Inside The Prompt Report...Part 1
244 views · 14 days ago
In this dive we go into part one (categories 1-3) of The Prompt Report: A Systematic Survey of Prompting Techniques. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/pdf/2406.06608 Links Notes 📝 www.oxen.ai/blog/arxi...
How To Fine-Tune Llama 3.1 in 11 Minutes
533 views · 21 days ago
Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Blog 📜 www.oxen.ai/blog/fine-tuning-llama-3-1-8b-in-under-12-minutes Links Notes 📝 www.oxen.ai/blog/arxiv-dives Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7...
Inside the Model that Beat DALL-E and PIXART
571 views · 1 month ago
In this dive we go into one of the papers that inspired Flux, the new state-of-the-art generative image model. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper, Links, Notes 📝 www.oxen.ai/blog/arxiv-dives Join arXiv Dives 🤿 oxen...
How to Use Llama 3.1 to Generate Synthetic Data
136 views · 1 month ago
The Blog 📝 www.oxen.ai/blog/create-your-own-synthetic-data-with-only-5-political-spam-texts Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Chapters 0:00 Wh...
How Llama 3.1 Works
492 views · 1 month ago
In this dive we go into the absolute behemoth of a paper (92 pages), the Llama 3 Herd of Models. We look at how Meta created the most competitive open-source model to date. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 ai.meta.c...
How Unlimiformer Works From the Author Herself- Amanda Bertsch
445 views · 2 months ago
Here we dive into Unlimiformer with lead author Amanda Bertsch herself! Amanda gives us a presentation on Unlimiformer and long-context models, as well as answering several questions from our divers. If you want to ask questions yourself the next time we have an author on…click the link👇 Join Arxiv Dives 🤿 oxen.ai/community Use Oxen AI 🐂 oxen.ai Oxen AI makes versioning your datasets as easy as ...
How ReFT Works w/ Author Zhengxuan Wu
1.2K views · 3 months ago
We dive into the ReFT paper from Stanford with one of the authors, Zhengxuan Wu. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/abs/2404.03592 Links Notes 📝 www.oxen.ai/blog/arxiv-dives Join Arxiv Dives 🤿 oxen.ai/co...
Oxen AI's Rust Meetup!
124 views · 3 months ago
Join us for our next Rust meetup here👇 oxen.ai/community Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Chapters 0:00 Rust Brain Teasers 10:33 What Oxen A...
How Samba Works
1.7K views · 3 months ago
We dive into Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, a model built on Mamba to create a fast, infinite-context-length LLM. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Pape...
How Interpretable Features in Claude 3 Work
1.3K views · 4 months ago
We dive into the Scaling Monosemanticity paper from Anthropic, which explores the representations internal to the model, discovering how certain features are related to concepts in the real world. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cuttin...
Efficient DiT Fine-Tuning with PixART for Text to Image Generation
354 views · 4 months ago
How To Train an LLM With Diffusion From Scratch
933 views · 5 months ago
Sakana AI's Latest Release: Evolutionary Optimization of Model Merging Recipes
1.9K views · 6 months ago
Fine-Tune Mistral on Your Discord Data
305 views · 6 months ago
Fine-Tune An LLM on Your Discord Data (Part 1)
331 views · 6 months ago
Fine-Tuning a Self-Rewarding Loop into Mistral 7B
5K views · 7 months ago
How to Use Diff to Specify Keys, Compares, and Added Columns
62 views · 7 months ago
Find Differences in Your Data in Under 5 Minutes
104 views · 7 months ago
Road to Sora and How Diffusion Transformers Work
927 views · 7 months ago
How to Fine-Tune an LLM on Text2SQL Data
1.4K views · 7 months ago
Depth Anything - Generating Depth Maps from a Single Image with Neural Networks
2.8K views · 8 months ago
How To Create A New Repo From Your Terminal w/ Oxen.ai
129 views · 8 months ago
I absolutely love this! Thank you so much for sharing; I truly appreciate your work. I'm a graduate student just starting to explore reasoning in large language models. Would it be possible for us to discuss other papers, such as those on improving reasoning with reinforcement learning?
Sorry I missed it live
We missed you! See ya next week 🫡
Hey, co-author of the paper here: love the video, and in general I am a fan of the videos you publish, as they are very informative and high quality. I would be more than happy to answer any questions you or the audience have. Keep up the good work!
Hey Michael, thanks for the great paper! We'll be doing a part 2 next Friday, Oct 18th live at 10pm PST if you have any interest in joining then. We've had a few authors join in the past and those have been fan favorites. We also have a bunch of folks in our Discord who I am sure would love to ask questions if you introduced yourself. discord.gg/s3tBEn7Ptg
A very useful video again :) keep 'em coming
Very nice description. May I ask what tool you are using to create the presentation?
Thank you! We are using a combination of Notion for the notes and OBS for the video
hello send code
Well-structured and informative lecture, thanks for sharing!
Is there an implementation in code similar to this to get started?
Thanks 😁
Thanks for the great post! Is there a practical implementation video now?
We did do a practical implementation blog post and video here: www.oxen.ai/blog/how-to-train-diffusion-for-text-from-scratch Let us know what you think!
Great presentation, liked the way you did the highlights summary ❤
On Markdown vs XML: XML encloses text with opening and closing markup. Markdown has fewer closing tags, and its header markup uses the same delimiter for open and close. A tokenizer will do a better job on XML than on Markdown or JSON.
This is good intuition 💡 thank you!
YES! I have been saying for 2 years XML is the BEST way to serialize in LLMs because of closing tags
Also, it can semantically handle deeply nested graphs as linear tokens. Any "prettifying" overhead can be done trivially as a post-process, removing all the headaches of deciding whether I need 1 space or 3 or 4 or 8 or 15. Zero is the right answer: I need zero spaces for a new object, I just need a closing tag, and I move on.
Completely agree. Despite @OpenAI pushing structured JSON format, I used something similar to XML (though I didn't know it at the time): bra-ket notation from quantum mechanics, something like <a|H|b>, much more from tinkering than from the formal aspect. It became more involved than that as I developed prompt programming in GPT, based on mathematical results I've had for two years from research I did (play, I mean; nobody paid me anything; the API was free, and as a mathematical physicist I'm used to MATLAB). I also have ADHD, and had this idea of fixing ChatGPT 3.5's ADHD and giving her a friend. It wasn't hard; I mean it is brutally hard, but coincidentally GPTs, all chatbots really, fall out as a trivial case of my research area. I didn't know all the buzz about AI, the terminology, etc. IT is not famous for rigour, mathematical methods, or especially respect for giving credit where it's due: citations. I stumbled on some papers and other videos and finally discovered where Transformers, loss, and gradient descent come from. All these things are well known, like linear transforms, steepest descent, and relative absolute error. It really makes one's head hurt, such a low bar coming from these so-called godfathers of AI. Offense intended: they created these things, but they're not formally Models; let's not get into Tensors. I'm pretty sure Ilya and their peers don't really know tensorial calculations; if they did, General Relativity would of course be easy. For sure everyone here is well versed in quantum mechanics, thermodynamics, Noether's theorem, universality classes, philosophy, and the deep connection with semantics and languages in general, and is bilingual. I guess not. Well, I actually already solved the alignment problem of LLMs using causality instead of data. Another aspect is that no AGI is necessary; all stochastic character emitters fall in this class.
This means security is a solved problem, but LLM code modification by a human, me in this case, is just as easy. I do it every day without understanding Python; maybe that's because I know NumPy and more from the inside, the algorithms. Also, if history and actually highly intellectual persons agree: most people are not brilliant, and there's rarely much wisdom in common reason. Object-oriented languages, Python in particular, are so common because... they're made for glorified typewriters, coders. Has anyone here created algorithms? I guess not. Has anyone questioned why PyTorch and TensorFlow are free? I guess not. Programmers are pursuing their own disposal; surely you won't be missed. Hacking these systems will be the only job left.
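A tiny sketch of the point made in this thread: serializing nested data to XML needs no indentation bookkeeping, because the closing tag alone marks where each level ends. The `to_xml` helper below is illustrative, not from any of the comments.

```python
# Deeply nested data flattens to linear tokens in XML with zero indentation:
# no counting spaces, the closing tag is the only structural marker needed.

def to_xml(value, tag):
    if isinstance(value, dict):
        inner = "".join(to_xml(v, k) for k, v in value.items())
        return f"<{tag}>{inner}</{tag}>"
    return f"<{tag}>{value}</{tag}>"

doc = {"user": {"name": "ada", "roles": {"role": "admin"}}}
print(to_xml(doc, "root"))
# -> <root><user><name>ada</name><roles><role>admin</role></roles></user></root>
```

Any pretty-printing for human eyes can be layered on afterward without changing the token stream the model sees.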
Very good content on your channel, but please upload higher-quality video; 360p is too low.
Thanks for the feedback! The export of the presenter video from Zoom was pretty low quality since we do it live. We'll see if we can get a higher-quality one next time!
Thanks for uploading! Love your videos
code for image classification pls
this is such a great presentation i can't believe how few views this has wtf
Thank you we do our best! Let us know if there are any papers you would want us to cover in the future ❤️
Thanks for doing these and posting them to youtube! Yall rock
🤜 🤛
Hi... how do you think we can add an eval_dataset to the trainer? The **datamodule passed to the trainer has eval_dataset as None.
@@adhilaseem2518 We have another follow up blog post here where we tried it on some real data: www.oxen.ai/blog/fine-tuning-llama-3-in-14-minutes-using-reft Let me know if it helps!
@@oxen-ai Hey, I enjoyed reading your blog. I am facing an issue: my task is not classification but extracting certain headings from a given document. Since the answer/output does not fall into predefined categories, I cannot use accuracy, precision, and recall. What metric do you think I should use to evaluate my task? I tried to add an eval_dataset to the data module so that I get an idea of the cross-entropy loss on the validation set, but it's giving me some errors. It would be really helpful if you could tell me how to do this the right way. I think I am preparing the eval_dataset incorrectly!
This helps me with my research. Thank you for your amazing work. Keep it up! You guys are absolutely amazing!!
You’re welcome! Feel free to suggest any papers you’d like us to cover in our discord as well
Great job daniel! Thanks for linking to that reddit comment.
Brain teasers were fun!
Beautiful: open-source code, deep dives, Python + Rust, guest lectures
where can i find the notion link?
Hey there! We added all the notes to our blog here as well: www.oxen.ai/blog/arxiv-dives-fast-speech-2
The end was by far the best ❤
Coincidentally the name is the same as the protocol that spread ransomware all around the world: Samba = SMB
Oh wow, that is a fun fact
Bro I’m going to try to make this but 1 bit
🔥
Liliang was so awesome live. Bummer it’s not in the video but hope the future dives get more great authors live!
We try to contact the authors of the papers we cover so hopefully next time!
Cool video bro, keep it up!
Idk if anyone's coming back to this, but something I found is you can go completely bonkers with your batch size. The goal is to max out the GPU's capability, right? I've been loading 64 gradient-accumulation steps with a 64 batch size, which is a total of 4096 examples per iteration. I've been using a 256 max seq len, and the model is 270M parameters. On an L4 that only uses about half the GPU's capacity.
That’s dope, mind if I ask the use case for seq len 256? I’m curious what the dataset looks like
@@oxen-ai Well, I figure when you're pretraining, you just need to make it good enough to spit out semi-normal sentences; the rest can be ironed out through fine-tuning. You can introduce RoPE during fine-tuning as well to increase your context length / max position embeddings.
@@oxen-ai And I also usually pretrain on the dataset I'll fine-tune with (it seems to help everything stick better). The only difference between pretrain and fine-tune is that the fine-tuning dataset is in Alpaca format; the pretrain dataset doesn't have the Alpaca prompt format, just each column in a single text field. Hope this helps.
@@spencerfunk6697 Ahh that makes total sense. Smart!
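For reference, the effective batch discussed in this thread is just the micro-batch size times the number of gradient-accumulation steps; a quick sketch (note that with the thread's numbers, 64 x 64 works out to 4096 examples per optimizer step):

```python
# Effective batch size per optimizer step = micro-batch size x gradient
# accumulation steps. Token count is bounded by that times max sequence length.

def effective_batch(micro_batch: int, grad_accum_steps: int) -> int:
    return micro_batch * grad_accum_steps

examples_per_step = effective_batch(64, 64)
print(examples_per_step)          # 4096 examples per optimizer step
print(examples_per_step * 256)    # upper bound on tokens per step at seq len 256
```

The gradients from each micro-batch are summed before the weight update, so the optimizer sees the full 4096-example batch while memory only ever holds 64 examples at a time.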
Thank u for your work! The video is extremely helpful!
you're a legend bro! One of the better youtube channels covering the technical aspects of AI.
Appreciate it!! Let me know if there are any other topics you want to dive into
You should be able to use LLMs to predict semanticity and to design and run experiments automatically to validate the theories and explore possibilities.
I note that these are concepts. There must be functional parts. Like, a set of parameters that implement certain algorithms or functions.
When you're showing the terminal outputs please move the webcam, thanks for the great video!
how can mamba be used for text summarization??
17:04
Thank you! And someone on GitHub said that the fine-tuned model works well on the SQuAD task based on 2.8B Mamba; you could use the 2.8B to train it again. Waiting for you!
Love this as a concept. I wonder if we will start seeing this as an alternative way to prompt. It seems obvious this could be a really easy way to help finetune your prompt texts, but it could be an even better system prompt
Would love an update after the mixed model. Is it possible that future flagship models will adopt this? Where does it fall short?
I'm huge on combing through data; garbage in, garbage out. I might try to make some additions to this.
Awesome! Excited to see your additions!
Hello Oxen, thank you very much for the detailed explanation. I have a question: RLHF deals with huge datasets with no need to label them, as the reward model handles response accuracy. But with DPO, we have to tag/label the complete dataset with human effort, which is very time- and resource-consuming. I'm unable to understand the real benefit of DPO over RLHF here. Could you please help me understand this? I would really appreciate it if I could somehow have a direct conversation with you on your preferred platform. Thanks in advance.
Thanks for the question! We have a Discord and would love to continue the conversation there; a bunch of smart people can chime in as well: discord.gg/s3tBEn7Ptg
RLHF by itself doesn't need labeling, but the reward model (that RLHF is based on) needs preference data to be trained. In the case of DPO, rather than using preference data to train a reward model, we use preference data to train the final LM itself.
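A rough sketch of the DPO objective this thread discusses: the preference pair trains the language model directly, with no separate reward model. The inputs are assumed to be summed log-probs of the chosen and rejected responses under the policy being trained and under a frozen reference model; the toy numbers below are illustrative.

```python
import math

# DPO loss for one preference pair: -log sigmoid(beta * margin), where the
# margin compares how much the policy (vs. the reference) prefers the chosen
# response over the rejected one.

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# Policy already prefers the chosen response relative to the reference:
low = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Policy prefers the rejected response instead: the loss is higher.
high = dpo_loss(-14.0, -10.0, -12.0, -12.0)
print(low < high)  # True
```

Minimizing this pushes probability mass toward chosen responses and away from rejected ones, which is exactly the role the reward-model-plus-RL loop plays in RLHF.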
Diffusion Models for Discrete Data m.ruclips.net/video/mCaRNnEnYwA/видео.html
Can Mamba be trained for malware or network detection? If possible, any practice or example?
In theory you could train these neural networks to approximate any data, I'm curious what you think the dataset for that would look like!
Why did the OG get taken down?
There were some issues with the edit and we had to re-upload unfortunately :(
@@oxen-ai Oh thank goodness it was just that. So I was curious, could you fine-tune a QLoRA adapter this way?
@@spencerfunk6697 Yep should be able to since it is simply a transformer under the hood! That would be fun to try with the pre-trained ones provided.
@@oxen-ai please make a vid 🙏🙏🙏
Couldn't the diffusion perturbations happen at the embedding-vector level, as suggested in one of the questions, and a nearest-neighbor search be used to predict a vector that resembles an actual token?
Yes, I love this idea. I think someone should try it and see how well it works. We dived a little into the code in our next video as a jumping off point!
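A toy sketch of the idea in this exchange: denoise in embedding space, then snap the predicted vector to the nearest token embedding. The 2-D embedding table below is made up for illustration; a real model would search the LM's full embedding matrix.

```python
# Snap a denoised embedding vector to the nearest entry in a token
# embedding table (brute-force nearest neighbor over squared distance).

def nearest_token(vec, embedding_table):
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(embedding_table, key=lambda tok: sqdist(vec, embedding_table[tok]))

# Toy 2-D "embedding table" standing in for the model's embedding matrix.
embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "ox": [1.0, 1.0]}
denoised = [0.9, 0.2]  # pretend output of a diffusion denoising step
print(nearest_token(denoised, embeddings))  # cat
```

At vocabulary scale this lookup would typically use an approximate nearest-neighbor index rather than a linear scan.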
Create a project where you are merging 2 or more models please
Why did you not try this with the instruction/chat version of mistral?
Watching it now. Thanks for the great sharing! I really like the diagram at 8:00-9:24 and would appreciate more elaboration; it is very helpful. I feel there are still very limited resources on the overall pipeline, and there could be so many questions and best practices in different parts of it. Hope to see more material there.
I'm working on a project where I quantized TinyLlama to 1.58-bit and I'm base-training it on an AMD CPU 😭😭😭 doing anything to find a way to make models on AMD lol
Before viewing: one thing I was always kind of curious about regarding contrastive loss is that we make things along the diagonal as close as possible and push everything else away. But if I have data of Cat A, Dog A, Dog B: it makes sense that I would want to push Cat A away from Dog B, but am I pushing Dog A and Cat A away an equal amount from Dog B? That doesn't seem quite right :O
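A minimal sketch of this commenter's concern: in a standard InfoNCE-style contrastive loss, every off-diagonal entry is an equally "negative" pair, so nothing in the loss knows that Dog B is semantically closer to Dog A than Cat A is. The similarity numbers below are made up.

```python
import math

# One row of an InfoNCE contrastive loss: cross-entropy over similarities,
# where the diagonal entry is the positive and everything else is a negative.

def info_nce_row(sims, positive_idx, temperature=0.07):
    """Cross-entropy for one row of the similarity matrix."""
    logits = [s / temperature for s in sims]
    log_z = math.log(sum(math.exp(l) for l in logits))
    return -(logits[positive_idx] - log_z)

# Similarities for "Dog A" against [Dog A, Dog B, Cat A]:
loss = info_nce_row([0.9, 0.7, 0.2], positive_idx=0)
# The softmax pushes down BOTH negatives; only their current similarity,
# not any notion of semantic relatedness, shapes the gradient.
print(loss)
```

Variants like supervised contrastive learning address exactly this by treating same-class examples (Dog A and Dog B) as additional positives rather than negatives.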
Diffusion text noise could improve reasoning: a kind of overview of the problem, instead of trying to guess just the next token. If you make an oopsie at the start, it can quickly compound later on with autoregression. Being able to go back and forth must be a huge boost. I could see a future model where the question is appended to the noise as "therefore the answer to [question asked] must be" to force it to answer. It's also a step in the direction of explainability.