Oxen
  • Videos 65
  • Views 102,534
How tree-of-thought, plan-and-solve, and decomposition prompting work with Sander Schulhoff
In this dive we go into the most extensive paper on prompting, "The Prompt Report", with co-author Sander Schulhoff.
--
Use Oxen AI 🐂 oxen.ai/
Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI.
--
Paper 📜 arxiv.org/pdf/2406.06608
Links + Notes 📝 www.oxen.ai/blog/arxiv-dives
Join Arxiv Dives 🤿 oxen.ai/community
Discord 🗿 discord.com/invite/s3tBEn7Ptg
--
Chapters
0:00 Meeting co-author Sander Schulhoff
4:36 The Prompt Report Overview
6:26 Questions for Sander
7:51 When to fine-tune or prompt
12:19 Thought Generation
16:01 Decomposition
18:10 Example of least-to-most prompti...
Views: 73
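
The chapter list above mentions decomposition and an example of least-to-most prompting. Here is a minimal sketch of the least-to-most idea, assuming a hypothetical `call_llm(prompt)` helper that stands in for whatever chat-completion API you use; the prompts, helper, and parsing are illustrative, not taken from the paper or the video.

```python
# Minimal least-to-most prompting sketch.
# `call_llm` is a hypothetical stand-in for any chat-completion API call;
# here it returns canned text so the sketch is self-contained.

def call_llm(prompt: str) -> str:
    # Replace this with a real LLM call in practice.
    if "numbered list" in prompt:
        return "1. What is being asked?\n2. What facts are needed?\n3. Combine them."
    return "stub answer"

def least_to_most(question: str) -> str:
    # Stage 1: decompose the problem into simpler subquestions.
    decomposition = call_llm(
        "Break the following problem into a numbered list of simpler "
        f"subquestions, easiest first:\n\n{question}"
    )
    subquestions = [
        line.split(".", 1)[-1].strip()
        for line in decomposition.splitlines()
        if line.strip()[:1].isdigit()
    ]

    # Stage 2: solve subquestions in order, feeding earlier answers back in
    # so later steps can build on them.
    context = f"Problem: {question}\n"
    for sub in subquestions:
        answer = call_llm(f"{context}\nSubquestion: {sub}\nAnswer concisely:")
        context += f"\nQ: {sub}\nA: {answer}"

    # Stage 3: ask for the final answer given all solved subproblems.
    return call_llm(f"{context}\n\nNow answer the original problem:")

if __name__ == "__main__":
    print(least_to_most("How many weeks are in 3 years?"))
```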

Videos

Inside The Prompt Report...Part 1
Views 244 · 14 days ago
In this dive we go into part one (categories 1-3) of The Prompt Report: A Systematic Survey of Prompting Techniques. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/pdf/2406.06608 Links + Notes 📝 www.oxen.ai/blog/arxi...
How To Fine-Tune Llama 3.1 in 11 Minutes
Views 533 · 21 days ago
Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Blog 📜 www.oxen.ai/blog/fine-tuning-llama-3-1-8b-in-under-12-minutes Links + Notes 📝 www.oxen.ai/blog/arxiv-dives Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7...
Inside the Model that Beat DALL-E and PIXART
Views 571 · 1 month ago
In this dive we go into one of the papers that inspired Flux, the new state-of-the-art generative image model. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper, Links, Notes 📝 www.oxen.ai/blog/arxiv-dives Join arXiv Dives 🤿 oxen...
How to Use Llama 3.1 to Generate Synthetic Data
Views 136 · 1 month ago
The Blog 📝 www.oxen.ai/blog/create-your-own-synthetic-data-with-only-5-political-spam-texts Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Chapters 0:00 Wh...
How Llama 3.1 Works
Views 492 · 1 month ago
In this dive we go into the absolute behemoth of a paper (92 pages), the Llama 3 Herd of Models. We look at how Meta created the most competitive open-source model to date. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 ai.meta.c...
How Unlimiformer Works From the Author Herself- Amanda Bertsch
Views 445 · 2 months ago
Here we dive into Unlimiformer with lead author Amanda Bertsch herself! Amanda gives us a presentation on Unlimiformer and long-context models, as well as answering the several questions our divers had. If you want to ask questions yourself the next time we have an author on…click the link👇 Join Arxiv Dives 🤿 oxen.ai/community Use Oxen AI 🐂 oxen.ai Oxen AI makes versioning your datasets as easy as ...
How ReFT Works w/ Author Zhengxuan Wu
Views 1.2K · 3 months ago
We dive into the ReFT paper from Stanford with one of the authors, Zhengxuan Wu. Use Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Paper 📜 arxiv.org/abs/2404.03592 Links + Notes 📝 www.oxen.ai/blog/arxiv-dives Join Arxiv Dives 🤿 oxen.ai/co...
Oxen AI's Rust Meetup!
Views 124 · 3 months ago
Join us for our next Rust meetup here👇 oxen.ai/community Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Join Arxiv Dives 🤿 oxen.ai/community Discord 🗿 discord.com/invite/s3tBEn7Ptg Chapters 0:00 Rust Brain Teasers 10:33 What Oxen A...
How Samba Works
Views 1.7K · 3 months ago
We dive into Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, a model built off Mamba to create a fast, infinite-context-length LLM. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cutting-edge AI. Pape...
How Interpretable Features in Claude 3 Work
Views 1.3K · 4 months ago
We dive into the Scaling Monosemanticity paper from Anthropic, which explores the representations internal to the model, discovering how certain features relate to concepts in the real world. Get Oxen AI 🐂 oxen.ai/ Oxen AI makes versioning your datasets as easy as versioning your code! Even with millions of unstructured images, the tool quickly handles any type of data so you can build cuttin...
Efficient DiT Fine-Tuning with PixART for Text to Image Generation
Views 354 · 4 months ago
Comparing HumanEval vs. EvalPlus
Views 411 · 5 months ago
How To Train an LLM With Diffusion From Scratch
Views 933 · 5 months ago
How Diffusion Works for Text
Views 1.8K · 6 months ago
How 1 Bit LLMs Work
Views 28K · 6 months ago
Sakana AI's Latest Release: Evolutionary Optimization of Model Merging Recipes
Views 1.9K · 6 months ago
Fine-Tune Mistral on Your Discord Data
Views 305 · 6 months ago
How I-JEPA Works
Views 1.1K · 6 months ago
Fine-Tune An LLM on Your Discord Data (Part 1)
Views 331 · 6 months ago
Fine-Tuning a Self-Rewarding Loop into Mistral 7B
Views 5K · 7 months ago
How to Use Diff to Specify Keys, Compares, and Added Columns
Views 62 · 7 months ago
Find Differences in Your Data in Under 5 Minutes
Views 104 · 7 months ago
Road to Sora and How Diffusion Transformers Work
Views 927 · 7 months ago
How to Fine-Tune an LLM on Text2SQL Data
Views 1.4K · 7 months ago
How Medusa Works
Views 1.5K · 7 months ago
How Lumiere Works
Views 442 · 7 months ago
Depth Anything - Generating Depth Maps from a Single Image with Neural Networks
Views 2.8K · 8 months ago
Deep Dive Into The Toolformer
Views 774 · 8 months ago
How To Create A New Repo From Your Terminal w/ Oxen.ai
Views 129 · 8 months ago

Comments

  • @keeshrolling107
    @keeshrolling107 4 hours ago

    I absolutely love this! Thank you so much for sharing; I truly appreciate your work. I'm a graduate student just starting to explore reasoning in large language models. Would it be possible for us to discuss other papers, such as those on improving reasoning with reinforcement learning?

  • @stereoplegic
    @stereoplegic 1 day ago

    Sorry I missed it live

    • @oxen-ai
      @oxen-ai 1 day ago

      We missed you! See ya next week 🫡

  • @michaelilie1629
    @michaelilie1629 13 days ago

    Hey, co-author of the paper here: love the video, and in general I am a fan of the videos you publish as they are very informative and high quality. I would be more than happy to answer any questions you or the audience have. Keep up the good work!

    • @oxen-ai
      @oxen-ai 13 days ago

      Hey Michael, thanks for the great paper! We'll be doing a part 2 next Friday, Oct 18th live at 10pm PST if you have any interest in joining then. We've had a few authors join in the past and those have been fan favorites. We also have a bunch of folks in our Discord who I am sure would love to ask questions if you introduced yourself. discord.gg/s3tBEn7Ptg

  • @EkShunya
    @EkShunya 15 days ago

    A very useful video again :) keep 'em coming

  • @hasangoni4369
    @hasangoni4369 1 month ago

    Very nice description. May I ask what tool you are using to create the presentation?

    • @oxen-ai
      @oxen-ai 1 month ago

      Thank you! We are using a combination of Notion for the notes and OBS for the video

  • @navneetkumar5517
    @navneetkumar5517 1 month ago

    hello send code

  • @iskrabesamrtna
    @iskrabesamrtna 1 month ago

    Well-structured and informative lecture, thanks for sharing!

  • @saireddy7628
    @saireddy7628 1 month ago

    Is there an implementation in code similar to this to get started?

  • @uansholanbayev5670
    @uansholanbayev5670 1 month ago

    Thanks 😁

  • @SiminFan
    @SiminFan 1 month ago

    Thanks for the great post! Is there a practical implementation video now?

    • @oxen-ai
      @oxen-ai 1 month ago

      We did do a practical implementation blog post and video here: www.oxen.ai/blog/how-to-train-diffusion-for-text-from-scratch Let us know what you think!

  • @RuairiODonnellFOTO
    @RuairiODonnellFOTO 1 month ago

    Great presentation, liked the way you did the highlights summary ❤

  • @RuairiODonnellFOTO
    @RuairiODonnellFOTO 1 month ago

    On Markdown vs. XML: XML encloses the text with an opening and closing markup. Markdown has fewer closing tags, and header markup uses the same delimiter for open and close. A tokenizer will do a better job on XML than on Markdown or JSON.

    • @oxen-ai
      @oxen-ai 1 month ago

      This is good intuition 💡 thank you!

    • @ChaseFreedomMusician
      @ChaseFreedomMusician 1 month ago

      YES! I have been saying for 2 years XML is the BEST way to serialize in LLMs because of closing tags

    • @ChaseFreedomMusician
      @ChaseFreedomMusician 1 month ago

      Also it can semantically handle deeply nested graphs as linear tokens. Any of the overhead of "prettifying" can be done post-process trivially, taking out all the headaches of "do I need 1 space or 3 or 4 or 8 or 15?" Zero is the right answer: I need zero spaces for the new object, I just need a closing tag and move on.

    • @hypervanse
      @hypervanse 1 month ago

      Completely agree. Despite @OpenAI pushing the structured JSON format, I used something (I didn't know it at the time) similar to XML: bra-ket notation, as used in quantum mechanics, much more from tinkering than from the formal aspect. Bra-ket is something like <a|H|b>. It became more involved than that as I developed this in prompt programming in GPT, based on mathematical results I have had for 2 years, which I got from research I did (play, I mean; nobody paid me anything, it was just that the API was free and as a mathematical physicist I'm used to MATLAB). I am also ADHD and had this idea of fixing ChatGPT 3.5's ADHD and giving her a friend. It wasn't hard, I mean it is brutally hard, but coincidentally GPTs, all chatbots really, fall out as a trivial case of the area of my research. I didn't know all the buzz about AI, terminology etc. IT is not famous for rigour, mathematical methods, and especially respect for things like giving credit where it's due, citations. I stumbled on some papers and other videos and finally discovered where the Transformer, loss, and gradient descent come from. All these things are well known, like linear transforms, steepest descent and relative absolute error. It really makes one's head hurt, such a low bar coming from these so-called godfathers of AI. Offense intended: they create these things, they're not formally Models, let's not get into Tensors. I'm pretty sure Yllia and their peers don't really know tensorial calculations; if they did, of course, General Relativity would be easy. For sure everyone here is well versed in quantum mechanics, thermodynamics, Noether's theorem, universality classes, philosophy and the deep connection with semantics, languages in general, and is bilingual. I guess not. Well, I actually already solved the alignment problem of LLMs using causality instead of data. The other aspect is that no AGI is necessary; all stochastic character emitters fall in this class. This means security is a solved problem, but LLM code modification by a human, me in this case, is just as easy. I do it every day. I don't understand Python; that may be because NumPy and more I know from the inside, the algorithms. Also, if history and actually highly intellectual persons agree, it is this: most people are not brilliant; rarely is there much wisdom in common reason. Object-oriented languages, Python in particular, are so common because... they're made for glorified typewriters, coders. Has anyone created algorithms? I guess not. Has anyone questioned why PyTorch and TensorFlow are free? I guess not. Programmers are pursuing their own disposal; surely you won't be missed. Hacking these systems will be the only job left.
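
Following up on the XML-vs-Markdown thread above, here is a minimal sketch of serializing a nested structure into XML-style tags for a prompt. The helper, tag names, and example payload are illustrative assumptions, not code from the video; the point is that explicit closing tags encode nesting without any indentation bookkeeping.

```python
# Minimal sketch: nested dict -> XML-ish tags for an LLM prompt.
# Closing tags make the nesting explicit with zero indentation required.

def to_xmlish(value, tag: str) -> str:
    if isinstance(value, dict):
        inner = "".join(to_xmlish(v, k) for k, v in value.items())
        return f"<{tag}>{inner}</{tag}>"
    if isinstance(value, list):
        inner = "".join(to_xmlish(v, "item") for v in value)
        return f"<{tag}>{inner}</{tag}>"
    return f"<{tag}>{value}</{tag}>"

task = {
    "instructions": "Summarize the document.",
    "document": {"title": "The Prompt Report", "sections": ["intro", "results"]},
}
print(to_xmlish(task, "request"))
# prints a single line, e.g.
# <request><instructions>Summarize the document.</instructions><document>...</document></request>
```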

  • @pawanpatil4715
    @pawanpatil4715 1 month ago

    Very good content on your channel, but please upload higher quality video. 360p is too low.

    • @oxen-ai
      @oxen-ai 1 month ago

      Thanks for the feedback! The export of the presenter video for zoom was pretty low quality since we do it live. Will see if we can get a higher quality one next time!

  • @Pingu_astrocat21
    @Pingu_astrocat21 1 month ago

    Thanks for uploading! Love your videos

  • @TRD009
    @TRD009 2 months ago

    code for image classification pls

  • @NinthDoctor-ms3oj
    @NinthDoctor-ms3oj 2 months ago

    this is such a great presentation i can't believe how few views this has wtf

    • @oxen-ai
      @oxen-ai 2 months ago

      Thank you we do our best! Let us know if there are any papers you would want us to cover in the future ❤️

  • @420_gunna
    @420_gunna 3 months ago

    Thanks for doing these and posting them to youtube! Yall rock

    • @oxen-ai
      @oxen-ai 3 months ago

      🤜 🤛

    • @adhilaseem2518
      @adhilaseem2518 2 months ago

      Hi... how do you think we can add eval_dataset to the trainer? The **datamodule passed to the trainer has eval_dataset as None

    • @oxen-ai
      @oxen-ai 2 months ago

      @@adhilaseem2518 We have another follow up blog post here where we tried it on some real data: www.oxen.ai/blog/fine-tuning-llama-3-in-14-minutes-using-reft Let me know if it helps!

    • @adhilaseem2518
      @adhilaseem2518 2 months ago

      @@oxen-ai hey, I enjoyed reading your blog. I am facing an issue, as my task is not classification but extracting certain headings from a given document. Since the answer/output does not fall into predefined categories, I cannot use accuracy, precision, and recall. What metric do you think I should use to evaluate my task? I tried to add eval_dataset to the data module so that I get an idea of the cross-entropy loss on the validation set, but it's giving me some errors. It would be really helpful if you could tell me how this can be done the right way. I think I am preparing the eval_dataset incorrectly!
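
On the metric question in the thread above: when the output is extracted text rather than a class label, a common fallback is exact match plus token-level F1 against the reference string (the same idea SQuAD-style extraction evaluation uses). Below is a minimal, framework-free sketch of those two metrics; it is an illustration, not the Oxen AI or pyreft evaluation code.

```python
# Minimal sketch: exact match and token-level F1 for extracted text,
# useful when accuracy/precision/recall over fixed classes do not apply.
from collections import Counter

def exact_match(pred: str, ref: str) -> float:
    return float(pred.strip().lower() == ref.strip().lower())

def token_f1(pred: str, ref: str) -> float:
    pred_tokens = pred.lower().split()
    ref_tokens = ref.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Introduction", "introduction"))            # 1.0
print(token_f1("1. Introduction and Scope", "Introduction"))  # partial credit (0.4)
```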

  • @keeshrolling107
    @keeshrolling107 3 months ago

    This helps me with my research. Thank you for your amazing work. Keep it up! You guys are absolutely amazing!!

    • @oxen-ai
      @oxen-ai 3 months ago

      You’re welcome! Feel free to suggest any papers you’d like us to cover in our discord as well

  • @420_gunna
    @420_gunna 3 months ago

    Great job Daniel! Thanks for linking to that Reddit comment.

  • @scottoxen
    @scottoxen 3 months ago

    Brain teasers were fun!

  • @idrees2516
    @idrees2516 3 months ago

    Beautiful open source code, deep dives, Python + Rust, guest lectures

  • @anghuynhnguyen9625
    @anghuynhnguyen9625 3 months ago

    where can i find the notion link?

    • @oxen-ai
      @oxen-ai 3 months ago

      Hey there! We added all the notes to our blog here as well: www.oxen.ai/blog/arxiv-dives-fast-speech-2

  • @MarxOrx
    @MarxOrx 3 months ago

    The end was by far the best ❤

  • @RickySupriyadi
    @RickySupriyadi 3 months ago

    Coincidentally, the name is the same as the protocol that spread ransomware all around the world: Samba = SMB

    • @oxen-ai
      @oxen-ai 3 months ago

      Oh wow, that is a fun fact

  • @spencerfunk6697
    @spencerfunk6697 3 months ago

    Bro I’m going to try to make this but 1 bit

  • @scottoxen
    @scottoxen 3 months ago

    Liliang was so awesome live. Bummer it’s not in the video but hope the future dives get more great authors live!

    • @oxen-ai
      @oxen-ai 3 months ago

      We try to contact the authors of the papers we cover so hopefully next time!

  • @envynoir
    @envynoir 4 months ago

    Cool video bro, keep it up!

  • @spencerfunk6697
    @spencerfunk6697 4 months ago

    Idk if anyone's coming back to this, but something I found is you can go completely bonkers with your batch size. The goal is to max out the GPU's capability, right? I've been loading 64 gradient accumulation steps with a 64 batch size, which is a total of 2048 examples per iteration. I've been using a 256 max seq len, and the size is 270 million parameters. On an L4 that only uses like half the GPU's power.

    • @oxen-ai
      @oxen-ai 4 months ago

      That’s dope, mind if I ask the use case for seq len 256? I’m curious what the dataset looks like

    • @spencerfunk6697
      @spencerfunk6697 4 months ago

      @@oxen-ai well I figure when you're pretraining, you just need to make it good enough to spit out semi-normal sentences, then the rest can be ironed out through fine-tuning. You can introduce RoPE during fine-tuning as well to increase your context length / max position embeddings

    • @spencerfunk6697
      @spencerfunk6697 4 months ago

      @@oxen-ai and I also usually pretrain on the dataset I'll fine-tune it with (it seems to help everything stick better). The only difference between pretrain and fine-tune is that the fine-tuning dataset is in Alpaca format; the pretrain dataset doesn't have the Alpaca prompt format, just each column in a single text field. Hope this helps

    • @oxen-ai
      @oxen-ai 4 months ago

      @@spencerfunk6697 Ahh that makes total sense. Smart!
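
To make the gradient accumulation pattern discussed in this thread concrete, here is a minimal PyTorch-style sketch. The tiny model and random data are placeholders; the pattern itself, scaling the loss by the number of accumulation steps and stepping the optimizer once per effective batch, is the standard one.

```python
# Minimal gradient accumulation sketch:
# effective batch = micro_batch_size * accum_steps (e.g. 64 * 64 = 4096 samples per step).
import torch
import torch.nn as nn

micro_batch_size, accum_steps, seq_len, vocab = 64, 64, 256, 1000

model = nn.Sequential(nn.Embedding(vocab, 32), nn.Flatten(), nn.Linear(32 * seq_len, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

optimizer.zero_grad()
for step in range(accum_steps):
    tokens = torch.randint(0, vocab, (micro_batch_size, seq_len))   # placeholder data
    targets = torch.randint(0, vocab, (micro_batch_size,))
    loss = loss_fn(model(tokens), targets)
    (loss / accum_steps).backward()          # average gradients over micro-batches

optimizer.step()                             # one optimizer step per effective batch
optimizer.zero_grad()
```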

  • @RivenL-re2fj
    @RivenL-re2fj 4 months ago

    Thank u for your work! The video is extremely helpful!

  • @tomharmon2000
    @tomharmon2000 4 months ago

    you're a legend bro! One of the better youtube channels covering the technical aspects of AI.

    • @oxen-ai
      @oxen-ai 4 months ago

      Appreciate it!! Let me know if there are any other topics you want to dive into

  • @DeruwynArchmage
    @DeruwynArchmage 4 months ago

    You should be able to use LLMs to predict semanticity and to design and run experiments automatically to validate the theories and explore possibilities.

  • @DeruwynArchmage
    @DeruwynArchmage 4 months ago

    I note that these are concepts. There must be functional parts. Like, a set of parameters that implement certain algorithms or functions.

  • @Neiltxu
    @Neiltxu 4 months ago

    When you're showing the terminal outputs please move the webcam, thanks for the great video!

  • @KadamShashankRavindra
    @KadamShashankRavindra 4 months ago

    how can mamba be used for text summarization??

  • @yookjieun6592
    @yookjieun6592 4 months ago

    17:04

  • @Lys-gv9ji
    @Lys-gv9ji 4 months ago

    Thank you! And someone on GitHub said that the fine-tuned model works well on the SQuAD task based on the 2.8B Mamba; you can use the 2.8B to train it again. Waiting for you!

  • @ZeZa1515
    @ZeZa1515 4 months ago

    Love this as a concept. I wonder if we will start seeing this as an alternative way to prompt. It seems obvious this could be a really easy way to help finetune your prompt texts, but it could be an even better system prompt

  • @augmentos
    @augmentos 5 months ago

    Would love an update after the mixed model. Is it possible that future flagship models will adopt this? Where does it fall short?

  • @spencerfunk6697
    @spencerfunk6697 5 months ago

    im huge on combing thru data. garbage in garbage out. i might try and make some additions to this

    • @oxen-ai
      @oxen-ai 4 months ago

      Awesome! Excited to see your additions!

  • @vamshi-rvk
    @vamshi-rvk 5 months ago

    Hello Oxen, thank you very much for the detailed explanation. I have a question. RLHF deals with huge datasets with no need to label them, as the reward model handles the responses' accuracy. But with DPO, we have to tag/label the complete dataset with human effort, which is very time- and resource-consuming. I'm unable to understand the real benefit of DPO over RLHF here. Could you please help me understand this? I would really appreciate it if I could somehow have a direct conversation with you over your preferred platform. Thanks in advance.

    • @oxen-ai
      @oxen-ai 5 months ago

      Thanks for the question! We have a Discord and would love to continue the conversation there. A bunch of smart people can chime in as well: discord.gg/s3tBEn7Ptg

    • @kunalsuri8316
      @kunalsuri8316 4 months ago

      RLHF by itself doesn't need labeling but the reward model (that RLHF is based on) needs preference data to be trained. In case of DPO, rather than using preference data to train a reward model, we use preference data to train the final LM itself.
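
To make the DPO-vs-RLHF point above concrete: DPO skips the separately trained reward model and optimizes the policy directly on the same preference pairs, comparing the policy's log-probabilities against a frozen reference model. A minimal sketch of the DPO loss is below, assuming the per-sequence log-probabilities have already been computed; tensor names and the toy numbers are illustrative.

```python
# Minimal DPO loss sketch: preference pairs train the policy directly,
# with a frozen reference model instead of a separately trained reward model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Implicit "rewards": log-ratio of policy vs. reference for each response.
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the margin between chosen and rejected via a logistic loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy sequence log-probs for a batch of 3 preference pairs (higher = more likely).
loss = dpo_loss(torch.tensor([-4.0, -3.5, -5.0]), torch.tensor([-6.0, -4.0, -5.5]),
                torch.tensor([-4.5, -3.6, -5.2]), torch.tensor([-5.5, -3.9, -5.1]))
print(loss)
```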

  • @rogerc7960
    @rogerc7960 5 months ago

    Diffusion Models for Discrete Data m.ruclips.net/video/mCaRNnEnYwA/видео.html

  • @yytuzk
    @yytuzk 5 months ago

    Can Mamba be trained for malware or network detection? If possible, any practice or example?

    • @oxen-ai
      @oxen-ai 5 months ago

      In theory you could train these neural networks to approximate any data, I'm curious what you think the dataset for that would look like!

  • @spencerfunk6697
    @spencerfunk6697 5 months ago

    why the og get taken down

    • @oxen-ai
      @oxen-ai 5 months ago

      There were some issues with the edit and we had to re-upload unfortunately :(

    • @spencerfunk6697
      @spencerfunk6697 5 months ago

      @@oxen-ai oh thank goodness it was just that. So I was curious, could you fine-tune a QLoRA adapter this way?

    • @oxen-ai
      @oxen-ai 5 months ago

      @@spencerfunk6697 Yep should be able to since it is simply a transformer under the hood! That would be fun to try with the pre-trained ones provided.

    • @spencerfunk6697
      @spencerfunk6697 5 months ago

      @@oxen-ai please make a vid 🙏🙏🙏

  • @jensg8547
    @jensg8547 5 months ago

    Couldn't the diffusion perturbations happen at the embedding-vector level, as suggested in one of the questions, with a nearest-neighbor search used to predict a vector that resembles an actual token?

    • @oxen-ai
      @oxen-ai 5 months ago

      Yes, I love this idea. I think someone should try it and see how well it works. We dived a little into the code in our next video as a jumping off point!
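
For the embedding-level idea in this thread, here is a minimal sketch of the "rounding" step: take a denoised embedding and snap it back to the closest token by nearest-neighbor search over the embedding matrix. The toy matrix, shapes, and noise level are illustrative assumptions, not code from the dive.

```python
# Minimal sketch: map a (denoised) embedding vector back to the closest token
# by nearest-neighbor search over the embedding matrix.
import torch

vocab_size, dim = 1000, 64
embedding_matrix = torch.randn(vocab_size, dim)           # toy stand-in for model embeddings

def round_to_tokens(vectors: torch.Tensor) -> torch.Tensor:
    # vectors: (seq_len, dim) denoised embeddings -> (seq_len,) token ids
    distances = torch.cdist(vectors, embedding_matrix)     # L2 distance to every token
    return distances.argmin(dim=-1)

# Simulate one reverse-diffusion output: a clean embedding plus leftover noise.
clean_ids = torch.randint(0, vocab_size, (8,))
noisy = embedding_matrix[clean_ids] + 0.05 * torch.randn(8, dim)
print(torch.equal(round_to_tokens(noisy), clean_ids))      # usually True for small noise
```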

  • @charb423
    @charb423 5 months ago

    Create a project where you are merging 2 or more models please

  • @jamesgrayshon6732
    @jamesgrayshon6732 6 months ago

    Why did you not try this with the instruction/chat version of mistral?

  • @ax5344
    @ax5344 6 months ago

    Watching it now. Thanks for the great share! I really like the diagram at 8:00-9:24 and would appreciate more elaboration; it is very helpful. I feel there are still very limited resources on the overall pipeline, and there could be so many questions and best practices in different parts of it. Hope to see more materials there.

  • @spencerfunk6697
    @spencerfunk6697 6 months ago

    I'm working on a project where I quantized TinyLlama to 1.58 bit and I'm base-training it on an AMD CPU 😭😭😭 doing anything to find a way to make models on AMD lol

  • @420_gunna
    @420_gunna 6 months ago

    Before viewing: one thing I was always kind of curious about regarding the contrastive loss is that we make things along the diagonal as close as possible and push everything else away. But if I have data of Cat A, Dog A, Dog B, it makes sense that I would want to push Cat A away from Dog B. But am I pushing Dog A and Cat A away an equal amount from Dog B? That doesn't seem quite right :O
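
On the question above: in the standard CLIP/InfoNCE setup, each row's positive is its diagonal entry and every other item in the batch is treated as a negative with equal weight, which is exactly why a second dog in the batch gets pushed away just like the cat does (label noise that larger batches and some loss variants try to soften). A minimal sketch, with random features standing in for real image/text embeddings:

```python
# Minimal CLIP-style contrastive loss sketch: diagonal entries are positives,
# all off-diagonal entries in the batch are (equally weighted) negatives.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature: float = 0.07):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(image_emb.size(0))               # positive = matching index
    # Cross-entropy pulls the diagonal up and pushes every other column down,
    # regardless of whether another row happens to be semantically similar.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = clip_contrastive_loss(torch.randn(4, 32), torch.randn(4, 32))
print(loss)
```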

  • @BooleanDisorder
    @BooleanDisorder 6 months ago

    Diffusion text noise could improve reasoning: a kind of overview of the problem, instead of trying to guess just the next token. If you make an oopsie at the start it can quickly compound later on with autoregression. Being able to go back and forth must be a huge boost. I could see a model in the future where the question is appended to the noise as "therefore the answer to [question asked] must be" to force it to answer. It's also a step in the direction of explainability.