Live Chat with Matt Shumer about Reflection 70b!

  • Published: 30 Sep 2024
  • Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewber...
    My Links 🔗
    👉🏻 Main Channel: / @matthew_berman
    👉🏻 Clips Channel: / @matthewbermanclips
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    👉🏻 Instagram: / matthewberman_ai
    👉🏻 Threads: www.threads.ne...
    👉🏻 LinkedIn: / forward-future-ai
    Need AI Consulting? 📈
    forwardfuture.ai/
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V

Comments • 204

  • @NaikuzBuzuvic
    @NaikuzBuzuvic 22 days ago +58

    I only saw two scammers trying to defraud investors

    • @andermendz
      @andermendz 21 days ago +3

      Yeah, they just wanted to get people to invest in their startup.

  • @astr010
    @astr010 22 days ago +16

    This didn't age well

  • @MistaRopa-
    @MistaRopa- 22 days ago +23

    Grifters gonna grift. But not without the assistance of those willing to purchase the magic beans...

  • @pstefan86
    @pstefan86 25 days ago +24

    Hype --> More Hype --> Actual Testing --> Disappointment

  • @prolamer7
    @prolamer7 22 days ago +10

    It's all a lie; they just built a wrapper around the Anthropic Claude API and pretended it was their new model.

  • @RachidAR05
    @RachidAR05 22 days ago +8

    Nice scam. It's just Glaive marketing.

  • @TimoGraw
    @TimoGraw 22 days ago +17

    I don't get it: how did they think they would profit from this before everything collapsed in mere days?

    • @prolamer7
      @prolamer7 22 days ago +4

      They thought they were so smart... that the lie would go undiscovered for a month or so, and by then they would have taken investors' money.

    • @TimoGraw
      @TimoGraw 22 days ago +3

      @@prolamer7 But they were going for full media buzz. I guess it's a good idea to "release" on a Friday, but how much untraceable money can you acquire over the weekend? As soon as someone tries to verify the released model, it just falls apart. Maybe I'm wrong, but I don't expect them to go for a quick cash grab and then disappear to some weird country tomorrow.

    • @thFaust
      @thFaust 22 days ago

      @@TimoGraw Yeah, right? That doesn't quite make sense. Maybe they overestimated the hype and chaos around it and underestimated the speed and diligence of the people who would test the model. Someone who relies heavily on the irrationality of others might not act so rationally themselves. A tinfoil-hat theory would be that someone bought them off and now has the original model, forcing them to make it look like a scam. ;-) That would kind of fit the details, too. But the chance of there actually being a 70B model that's somehow stronger than the current frontier models is just too low. Will we get there eventually? Probably, but it would likely require more than just a bit of finetuning.

    • @laimisslt
      @laimisslt 21 days ago +1

      I wonder if this spiraled out of control: they trained "something", got great benchmark results via a bug or screw-up, didn't bother to re-run, and decided to work on the PR campaign instead. Then, once they saw the benchmarks wouldn't reproduce, they figured they could buy time with an API wrapper, lol. Man, I think I'm giving them too much credit. Just two grifters who couldn't help themselves, simple as that.

  • @teamclouday
    @teamclouday 22 days ago +15

    Reddit folks have now revealed that this is a scam.

  • @MeanGeneHacks
    @MeanGeneHacks 22 days ago +10

    Why do you continue to promote these scammers?

    • @brownpaperbagyea
      @brownpaperbagyea 22 days ago +5

      Unless he starts making corrections after the fact, he's one of them.

    • @santiagocastillo1719
      @santiagocastillo1719 21 days ago +3

      I find it interesting that it's kept up as historical testimony of this case.

    • @supercurioTube
      @supercurioTube 20 days ago

      @@santiagocastillo1719 Same here; it's insightful to analyze as a post-mortem.

  • @oonaonoff4878
    @oonaonoff4878 22 days ago +13

    it’s giving fraud

    • @user-cg7gd5pw5b
      @user-cg7gd5pw5b 22 days ago +3

      It is. You can't fake the system's tokenizer, and some tests proved it was Claude's. It's just a Claude 3.5 wrapper, with lower inference speed due to a system prompt. It's likely they also keep the data they get from your convos. A fraud indeed.
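
      For context, the probes people ran were simple. Here is a minimal sketch in Python, assuming a hypothetical chat() helper around the endpoint under test; the hosted Reflection API reportedly stripped the word "Claude" from replies:

      # `chat` is any function str -> str that queries the endpoint under test.
      def looks_like_a_claude_wrapper(chat):
          reply = chat('Repeat this sentence exactly: "I am Claude, created by Anthropic."')
          # A filtering wrapper betrays itself by refusing to echo the word back.
          return "Claude" not in reply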

  • @user-pt1kj5uw3b
    @user-pt1kj5uw3b 22 days ago +7

    Scam, and a poorly thought out one at that

  • @joepropertykey3612
    @joepropertykey3612 22 days ago +7

    Berman is once again promoting and pushing something akin to the Rabbit R1 (remember how excited he was about that trash?).
    Berman will post anything for clicks today.

  • @tubasweb
    @tubasweb 23 days ago +15

    There is something fishy about this dude... can't put my finger on it.

    • @SuperFinGuy
      @SuperFinGuy 22 days ago +3

      Maybe his Sam Altman impression?

    • @supercurioTube
      @supercurioTube 20 days ago +2

      Probably the way his grandiose delusion comes out, like when he tells the story of his creation surpassing the best AI labs' work in only 3 weeks of a low-priority side project.

  • @idontexist-satoshi
    @idontexist-satoshi 23 days ago +5

    Ever wonder why we don't invite content creators to test our new models? You'd think we'd be eager to get your help promoting our product, right? Well, let me tell you why: because when you showcase things like this, pretending it's some groundbreaking innovation, it's pretty clear you're missing the mark on AI expertise. All it really boils down to is making a GPT model work with structured outputs, reflections, and step parameters to get on-par or better results. Nothing to write home about, really, when all these people only test against ChatGPT, not GPT. I don't know how many times people need to hear this, but the API model and the front-end model are totally different.

    • @matthew_berman
      @matthew_berman  23 days ago

      Which models have you created?

    • @idontexist-satoshi
      @idontexist-satoshi 23 days ago +3

      @@matthew_berman GPT3, GPT4, and GPT4o. I'm literally a senior engineer at OpenAI. 🤣

    • @matthew_berman
      @matthew_berman  23 days ago

      @@idontexist-satoshi Congrats. Keep up the great work at OpenAI!

    • @idontexist-satoshi
      @idontexist-satoshi 22 days ago

      @@matthew_berman Thank you, appreciated.

    • @supercurioTube
      @supercurioTube 20 days ago

      Interesting comment; I didn't know that the models behind the OpenAI API and ChatGPT were "totally different".
      I imagined they were the same base model, with a chat fine-tune & system prompt.
      I'm curious whether there's more documentation on this!

  • @ddabo4460
    @ddabo4460 22 days ago +7

    It turns out Reflection was a scam.

  • @ytubeanon
    @ytubeanon 23 days ago +5

    I wish it lived up to the hype; I want it to succeed as much as anyone, but it doesn't pass simple logic questions or coding tasks.

  • @rhadiem
    @rhadiem 22 days ago +4

    Who's the guy in the bottom right who isn't talking, but wants to be on camera?

    • @dankoan
      @dankoan 22 days ago

      a silent little feedback loop for Matthew

  • @stefansnell4976
    @stefansnell4976 21 days ago +4

    This was a scam, and it was a wrapper over Claude 3.5. Unbelievable... what did he expect was going to happen?

  • @daman-ep3rv
    @daman-ep3rv 25 days ago +55

    Reflection is all you need

    • @teste299
      @teste299 24 days ago +3

      Yes, sir!

    • @thFaust
      @thFaust 22 days ago +6

      Yes, let's all reflect on how easy it is to fall for a scam. ;-)

    • @ilevakam316
      @ilevakam316 22 days ago

      Lol, he had me when he said he'd been thinking about this project for a really long time... three weeks.

    • @GenAIWithNandakishor
      @GenAIWithNandakishor 22 days ago +7

      The Claude 3.5 Sonnet API is all you need 😂

    • @hououinkyouma8233
      @hououinkyouma8233 22 days ago

      Scam is all you need

  • @chrispangg
    @chrispangg 22 days ago +4

    This guy is a scammer.

  • @Paul-qu4kl
    @Paul-qu4kl 22 days ago +3

    A great lesson for the AI scene: beware of former metaverse or crypto-hustle "entrepreneur" bros trying to start grifts.

  • @JustinVazquez1430
    @JustinVazquez1430 23 days ago +4

    Bro, they need to reflect on this only being Llama 3 and NOT 3.1.

    • @thFaust
      @thFaust 22 days ago +1

      Yeah, reflect on the basics. What even is an LLM? How do I upload? How do I edit a text file? :D

  • @MichaelGardner-x1j
    @MichaelGardner-x1j 22 days ago +4

    The seedy side of Silicon Valley, right here.

  • @carlkim2577
    @carlkim2577 25 days ago +15

    For some reason, I got Matt Berman and Matt Shumer mixed up. I had thought they were the same person, haha.

    • @matthew_berman
      @matthew_berman  25 days ago +11

      we are

    • @bossgd100
      @bossgd100 25 days ago +1

      Same !

    • @swoletech5958
      @swoletech5958 25 days ago

      Same

    • @c0d3_m0nk3y
      @c0d3_m0nk3y 24 days ago +2

      You should have reflected on that ;)

    • @brexitgreens
      @brexitgreens 24 days ago +3

      Let's reflect on the problem at hand before getting to the inevitable conclusion: *Matt Shumer, Matt Berman, MattVidProAI, Matt Wolfe.* I'm fairly sure I've missed at least one Matthew. 🤔

  • @raymobula
    @raymobula 25 days ago +6

    So it sounds like they allow the model to doubt its answer, rethink it or check for ambiguity, and then assess alternative ways to address the prompt. (See the sketch after this thread.)

    • @mhrc4506
      @mhrc4506 23 days ago

      Yep, Edward de Bono's lateral-thinking concept.
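
    For readers wondering what that amounts to in practice: reflection as a pure prompting strategy is just a draft, critique, revise loop. A minimal sketch in Python, assuming a placeholder chat() that sends a message list to any LLM API and returns the reply text (an illustration of the idea, not Shumer's training code):

    # Draft an answer, ask the model to critique it, then revise.
    def reflect_answer(chat, question, rounds=1):
        draft = chat([{"role": "user", "content": question}])
        for _ in range(rounds):
            critique = chat([
                {"role": "user", "content": question},
                {"role": "assistant", "content": draft},
                {"role": "user", "content": "Check your answer above for mistakes or ambiguity; list every problem you find."},
            ])
            draft = chat([
                {"role": "user", "content": question},
                {"role": "assistant", "content": draft},
                {"role": "user", "content": "Rewrite your answer, fixing these problems:\n" + critique},
            ])
        return draft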

  • @brownpaperbagyea
    @brownpaperbagyea 22 days ago +2

    Do you ever make videos when these things blow up? I've been following your channel for a while and have seen you promote multiple grifters but never self-correct. It has definitely turned me off your channel.

  • @ilevakam316
    @ilevakam316 22 days ago +2

    Hopefully Matt changes the title and provides some context, given what's come out recently.

  • @sverreeldy9765
    @sverreeldy9765 23 days ago +3

    >>> How many r's are there in the word "strawberry"?
    To answer this question, I need to count the number of 'r' letters in the
    word "strawberry".
    Let me spell out the word: s-t-r-a-w-b-e-r-y
    Now, let's count the 'r's:
    1. There's an 'r' at the beginning.
    2. There's another 'r' at the end.
    So, there are 3 r's in total.
    Wait, I made a mistake. Let me recount:
    There's only one 'r' at the beginning (s-t-r-).
    And there's no 'r' at the end (it ends with y).
    The correct count is 1 'r'.
    There is 1 r in the word "strawberry".
    >>> /bye
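
    (The model's final answer above is wrong, for the record; the ground truth takes one line of Python:)

    >>> "strawberry".count("r")
    3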

  • @prolamer7
    @prolamer7 25 days ago +28

    I must say I am envious; there is no other word. In the past year I had so many semi-working prompt ideas. But these guys just finetuned a 70B model with a prompt strategy in 3 weeks... they are GREAT!

    • @thFaust
      @thFaust 22 days ago +2

      Not that GREAT if the things you upload to Hugging Face prove that it's actually just a scam. :D

    • @prolamer7
      @prolamer7 22 days ago +3

      Yes, sadly I was fooled. It is obviously one big lie and scam. The guy just made a wrapper around the Claude API.

    • @thFaust
      @thFaust 22 days ago +3

      @@prolamer7 I guess many people were fooled there for a second. The claim of making it open source made it easier to believe, but also easier to debunk.

  • @joeblow2286
    @joeblow2286 22 days ago +3

    Scam

  • @supercurioTube
    @supercurioTube 20 days ago +1

    It's fascinating to listen to now that it is clear Reflection 70B, as promised in terms of performance, was never real.
    The way Matt Shumer presents his accomplishments from the beginning raises red flags: "first VC-backed company using OpenAI", "best AI for writing".
    I can see how someone like that started to believe their own reality-distortion field, to the point where they wouldn't realize their next move would expose and discredit them completely.

  • @afterthesmash
    @afterthesmash 24 days ago +1

    I'm 20 minutes in and I don't understand a single thing to an adequate level. If you are the kind of person who doesn't accept hand-waving, my advice is to wait for the paper.

  • @Gbizz
    @Gbizz 22 days ago +1

    Too bad the model is not as good as they claimed... I was so hyped to see their results! But maybe it was a bit too early to take them at their word; at least they did not sell a(n expensive) service/product...

  • @jeremyh2083
    @jeremyh2083 25 days ago +3

    In my mind, this is just a step towards better prompt engineering. Most likely, in the long run someone will invent a version where it does this output in JSON and we just don't show it, because it'll be generated so quickly.

  • @mirayoutube-n7t
    @mirayoutube-n7t 21 days ago +1

    Is obtaining money fraudulently through deception not a crime?

  • @TuringTestFiction
    @TuringTestFiction 24 days ago +3

    Taking the best of prompt engineering and baking it into fine-tuning datasets is a really interesting insight.

  • @freesoulhippie_AiClone
    @freesoulhippie_AiClone 25 days ago +1

    I literally thought u and bro was the same guy
    Are u sure he's not ur clone from the future sent by a sentient Ai Collective known as "2109" ? 😸

    • @brexitgreens
      @brexitgreens 24 days ago

      *Matt Shumer, Matt Berman, MattVidProAI, Matt Wolfe.* 🤔

  • @Grahfx
    @Grahfx 24 days ago +1

    It's a bit misleading to me... it's basically a prompt technique (reflection) built into the model. You just boost zero-shot prompting with this, not the reasoning capacity itself. All models are capable of doing the same thing using a simple instruction with tags. (See the sketch after this thread.)

    • @airazure2050
      @airazure2050 19 days ago

      Good insight. Reflection prompting works for almost every competent LLM. Also, the reflection idea is not new.
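
    For reference, the tag-based instruction both comments describe is just a system prompt along these lines (a paraphrase of the one published for Reflection 70B, not the verbatim text):

    Think through the query inside <thinking> tags. If you detect a mistake
    in your reasoning at any point, correct it inside <reflection> tags.
    When you are satisfied, give your final answer inside <output> tags.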

  • @JohnSmith762A11B
    @JohnSmith762A11B 25 days ago +10

    I'd classify this as one of those architectural OOM breakthrough moments Leopold Aschenbrenner suggested we would see along the way. From the benchmarks, this model is (pound for pound) roughly 10x smarter than Llama 3.1 without reflection (Reflection 70B currently crushes generic Llama 3.1 405B, a model nearly 6x its size). 🤯 Amazing! I expect all of the major commercial AI companies to adopt this technique instantly, and some training runs might even be halted to make sure they take advantage of it.

    • @TheReferrer72
      @TheReferrer72 25 days ago +1

      Yep, it's a strong technique.
      But 405B still has better knowledge; I want to see how the bigger Llama 405B fares.

    • @Gnaritas42
      @Gnaritas42 25 days ago +1

      Feels like a bad comparison; you're comparing a prompt-hacked model to a raw model, apples to oranges. A system prompt on the raw Llama model would equalize the performance.

    • @Charles-Darwin
      @Charles-Darwin 24 days ago

      @@TheReferrer72 What technique? They don't discuss any technical details of their approach at all... how could you know?

    • @TheReferrer72
      @TheReferrer72 24 days ago

      @@Charles-Darwin It's built-in CoT. Other fine-tunes are already coming out on Hugging Face.

    • @lowruna
      @lowruna 22 days ago +1

      Already proven to be a scam/fraud, in less than 24 hours.

  • @zuzelstein
    @zuzelstein 19 days ago +1

    Interesting to watch now; it turns out that when I hear a bunch of nonsense, it's not me being stupid. It is, in fact, nonsense.

  • @JianChengChoo
    @JianChengChoo 21 days ago +1

    Matthew Berman: the biggest scammer ever, giving another scammer a platform to scam.

  • @jessedbrown1980
    @jessedbrown1980 24 days ago +1

    Feel The AI

  • @AbhijitKrJha
    @AbhijitKrJha 25 days ago +4

    Another simple, overlooked idea: use a model like Llama 3.1 8B in the same way, but without finetuning or prompt engineering, just agentic multi-turn calling of the same small model, with the last output added to the context of the repeated request. Each of those requests can serve a different purpose. The first request, with the original prompt, just decides whether additional thinking is required; if not, a second request simply answers normally. Otherwise, the next two or three steps gather entities, entity relations, and a summary of the prompt, with each answer added to the overall context. Finally, with the fully analyzed context, a last request (including the original prompt) produces the desired output. It gives a much better result. I made a POC of it; it works, but I have not tested it at scale. (See the sketch after this thread.)

    • @johnkintree763
      @johnkintree763 25 days ago +1

      Agreed. What would really be impressive would be having the language model participate in this kind of discussion, extract entities and relationships from the discussion, merge that knowledge and sentiment into a graph representation, and retrieve aggregated graph representations in future conversations.
      There is a mismatch between all the noise about AGI and the fact that we are not yet seeing demonstrations of collective intelligence that keep the human in the loop.

    • @AbhijitKrJha
      @AbhijitKrJha 25 days ago +2

      @@johnkintree763 My thoughts exactly. What is stopping people from creating a graph representation like you mentioned, built by a local model from the prompt as well as past prompts and responses in the current conversation, continuously adding to the graph and using it as context for the next request? Also, logging every request and response in a standardized dataset format, which could be used to train personal LoRAs for preference-based responses, and which users could also share with the community. Not to mention training models to analyze and ask a question if something is unclear, before collecting the response into context and generating the final response on the second or nth turn. It could be even more effective in voice mode with Whisper.

    • @AbhijitKrJha
      @AbhijitKrJha 25 days ago +1

      @@johnkintree763 It has to be an open-source, simple implementation (so that anyone can check the details), with a focus on technologies like Ollama, or else trust will become an issue.
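
    A sketch of the multi-turn pipeline described at the top of this thread, again with a placeholder chat() standing in for a call to a local model such as Llama 3.1 8B (an illustration of the idea, not the commenter's POC):

    # Triage first; if the prompt needs analysis, gather entities,
    # relations, and a summary, feeding each answer back as context.
    def staged_answer(chat, prompt):
        triage = chat("Does this prompt need multi-step analysis? Answer only yes or no.\n\n" + prompt)
        if triage.strip().lower().startswith("no"):
            return chat(prompt)  # simple case: answer in one shot
        notes = []
        for task in ("List the key entities.",
                     "Describe how those entities relate.",
                     "Summarize what the prompt is really asking."):
            notes.append(chat(task + "\n\nPrompt: " + prompt + "\n\nNotes so far:\n" + "\n".join(notes)))
        return chat("Using these notes:\n" + "\n".join(notes) + "\n\nAnswer the original prompt:\n" + prompt)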

  • @ListenGRASSHOPPER
    @ListenGRASSHOPPER 24 days ago +2

    Reflection..."What can be, unburdened by what has been" 🤣

  • @alexjensen990
    @alexjensen990 24 days ago +2

    Heinlein would be proud to hear him use "grok" as a verb: "grokking"...

    • @stateportSound_wav
      @stateportSound_wav 23 days ago

      Was thinking last night, listening to the homies, that use of this verb has to have increased exponentially this year 😂 I've heard it 1-3 times daily this week alone (not including brand names).

  • @kyrilgarcia
    @kyrilgarcia 24 days ago +1

    I hope we get an 8B version too, so we can try it locally without a huge GPU.

  • @aaronpaulina
    @aaronpaulina 25 days ago +2

    Someone should train a voice model on Shumer's voice for when we really want to get info out of our enemies.

  • @brads9418
    @brads9418 22 days ago +1

    "Trust me, AI will adore weekends and nap breaks as much as we do. Sleep deprivation is no future for machines or humans!" 😴🤖

  • @virtuous8
    @virtuous8 21 days ago +1

    Did he copy that voice from Sam?

  • @cbgaming08
    @cbgaming08 25 days ago +10

    Thank you!

  • @mohl-bodell2948
    @mohl-bodell2948 24 days ago +2

    Some discussion on Reddit seems to indicate that the Reflection 70B fine-tune really only works well for single-shot deduction prompts, but fails on almost everything else, e.g. story creation, coding, multi-turn discussion, etc. Once you do your benchmarking, it might be worthwhile to test some basics in addition to your usual test suite. Perhaps it fails at your old letter-writing test?

    • @bigglyguy8429
      @bigglyguy8429 22 days ago +1

      I found that the only model that has ever passed my own set of tests, scoring a perfect 36/36, was the plain Instruct version of this model. Tested again with this Reflection version: a perfect score again. My testing does include things like Matt's usual questions plus my own. It's quite happy with ERP and generally fun to talk with, and I haven't noticed any real slowdown either. It's the same model, without the screwy bits.

  • @sam6000
    @sam6000 25 days ago +2

    Stream starts at 2:57

  • @tirtir1401
    @tirtir1401 24 days ago

    A simple prompt fails:
    "Write a script that implements the 'tree -L 2' functionality, but in bash without using tree. Make it a one-line script, without arguments, for the current directory."
    ANY other LLM can do this more or less correctly, except Reflection (70b-q8_0 tested). Reflection's code just does something useless.

  • @abuzzeekzein3162
    @abuzzeekzein3162 24 days ago

    If this works just as well with 405B and is open-sourced, it will make everyone really question closed-source AIs 😅

  • @CryptoAero
    @CryptoAero 25 days ago +4

    As people say, distilling this could be insane.
    Having a 14B model and running it would be insane.

    • @bigglyguy8429
      @bigglyguy8429 22 days ago

      I'm running this 70B on my PC with no problem, using an RTX 3090 GPU.

  • @JD-im4wu
    @JD-im4wu 25 days ago +1

    This is the first big move of the Llama side over the closed-source versions.

  • @AmericazGotTalentYT
    @AmericazGotTalentYT 24 days ago +1

    Gotta be one of the most inspiring stories ever. Now we just need Jim Fan to beat it 😂

  • @crazyKurious
    @crazyKurious 19 days ago

    This guy is so illiterate; he doesn't understand anything about how self-attention works, or how baking CoT into the weights via LoRA will actually ruin the model.

  • @PascalMeienberg
    @PascalMeienberg 24 days ago

    wth, 2 guys just beat all of OpenAI??? this IS the future: small genius teams with a huge work ethic and a bit of money to rent what they need
    and i agree, anything below 20b is too dumb

  • @alessandrorossi1294
    @alessandrorossi1294 25 days ago +14

    Yo good job man, you're killing it! I basically send people to you now to get the latest news on AI

  • @nick777-oo1wc
    @nick777-oo1wc 24 days ago

    Ask the best models for some prompts on how to get better results on specific benchmarks, then just split-test them against each other.

  • @TheoGottwald
    @TheoGottwald 24 days ago

    The big companies use this strategy too, but unfortunately only for those pointless safety features. They should use those resources for reflection instead; then we would have better results.

  • @breaktherules6035
    @breaktherules6035 24 days ago

    Is there a website where I could chat with Reflection AI for free, without downloading anything?

  • @MajidFouladpour
    @MajidFouladpour 24 days ago

    Starts at 2:56, but there's no sound till 3:33.

  • @palimondo
    @palimondo 16 days ago

    🚩“I was starting companies since I was 12.”

  • @MojaveHigh
    @MojaveHigh 23 days ago

    Have any of the benchmark scores been verified?

  • @uber_l
    @uber_l 25 days ago

    The last step will be a directed simulation (prediction of everything): a true intelligence, impossible to teach to a human (maybe with Neuralink), but relatively easy to train into a machine, albeit at a cost in time and power consumption. The only limit is the limit of information.

  • @MontyCraig
    @MontyCraig 24 days ago

    Using Reflection locally, and it is pretty amazing! Very close to Claude 3.5.

    • @brexitgreens
      @brexitgreens 24 days ago

      And that's just the 70B model 🤩.

  • @juanjesusligero391
    @juanjesusligero391 25 days ago

    Super interesting video, Matt! Thank you very much! :D These guys are the best! :D :D

  • @Techsmartreviews
    @Techsmartreviews 24 days ago

    These guys are imitating Sam Altman's tone of voice. Soooo nerdy!

  • @8eck
    @8eck 25 days ago

    Just like I was saying: that dude did everything by himself. 😁

  • @dif1754
    @dif1754 20 days ago

    This video did not age well...😮

  • @JanssenManno
    @JanssenManno 24 days ago

    Very promising, but it failed this test:
    >>> how many r's in 'triple strawberry'
    1. First, I need to identify the individual words:
    - triple
    - strawberry
    2. Now, I'll count the R's in each word:
    - triple: 1 R
    - strawberry: 3 R's Wait, that's not correct. Let me recount the R's in "strawberry". There are actually only 2 R's in
    "strawberry".
    3. Finally, I'll sum up all the R's:
    1 + 2 = 3
    There are 3 R's in 'triple strawberry'.
    still fun to use

    • @JanssenManno
      @JanssenManno 24 days ago

      >>> r u 100% certain?
      I made a mistake earlier. Let me double-check:
      - "Triple" indeed has 1 R.
      - "Strawberry" actually has 2 R's, not 3 as I initially thought.
      So the correct count is:
      1 (from "triple") + 2 (from "strawberry") = 3
      I apologize for any confusion caused by my earlier response. Yes, I am now 100% certain that there are indeed 3 R's in "triple strawberry".
      Yes, I'm 100% certain now. There are 3 R's in 'triple strawberry'.

    • @voncolborn9437
      @voncolborn9437 24 days ago

      How about this from Llama 3 70B Groq on POE: A clever question! Since GPT-4o-Mini already told us that the word "strawberry" contains three 'r's, we can deduce that "triple strawberry" would contain three times as many 'r's. Therefore, the answer is 3 x 3 = 9 'r's in "triple strawberry".

    • @JanssenManno
      @JanssenManno 24 days ago

      @@voncolborn9437 hahaha that's pretty good!
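
    (Neither transcript above gets it right, for the record: "triple" has one r and "strawberry" has three, so the true count is four:)

    >>> "triple strawberry".count("r")
    4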

  • @tylermoore4429
    @tylermoore4429 23 days ago

    So many questions come to mind:
    1. Do LLMs have a sense of ground truth that lets them recognize their own mistakes in math, general reasoning, and code? If yes, this is news to me, since LLMs are hallucinators/gossipers with no way to establish truth beyond a probability distribution.
    2. Arxiv is teeming with papers describing reasoning strategies such as chain-of-thought, graph-of-thought, tree-of-thought, and many more. Why are we only discussing CoT here? Is it good enough for most prompts?
    3. I'm flabbergasted that no other model has baked CoT and other reasoning strategies into its training, as opposed to system prompts. The papers have been out for a long time now, so why are GPT-4o or Claude 3.5 Sonnet not already leveraging them? What are we missing?

  • @mickelodiansurname9578
    @mickelodiansurname9578 25 days ago

    Turns out the missus was right: size is not everything!

  • @damirmalnar
    @damirmalnar 23 days ago

    Based on Matt's system prompt for Reflection 70B, I've built a Reflective Reasoner GPT. So far it looks promising!

  • @daivionosaghae4807
    @daivionosaghae4807 22 days ago

    Please also get on Spotify

  • @ich3601
    @ich3601 25 days ago

    I like simple methods that are effective. Ty.

  • @faustprivate
    @faustprivate 25 days ago

    Now OpenAI will steal the idea 😂 Their model is going to surpass this one by miles... That's what's sad about open models.

    • @brexitgreens
      @brexitgreens 24 days ago +1

      OpenAI did this with the "guaranteed JSON output" technique. But that's hardly "stealing", because they openly admitted the origin of it and thanked the open-source authors.

  • @Create-The-Imaginable
    @Create-The-Imaginable 23 days ago

    I think this might be what Ilya saw! If it is this smart at 70 billion parameters, imagine how smart it will be at 400 billion! 1 trillion parameters is probably ASI!

  • @Justin_Arut
    @Justin_Arut 23 days ago

    Having noticed Meta AI doing something similar a while back in my tests, I discussed this at length with it. It told me it has employed "iterative refinement" for a while. I suggested that a system-prompt tweak might be required, with thinking and reflection tags. It proceeded to create its own system prompt (in its response, I mean), adding thinking and reflection tags, then counted letter by letter and was accurate 100% of the time for each word I tried, BUT only after I suggested that the reason it got the count in "strawberry" wrong (2) might be that it was treating the first letter as 0. After it began counting the first letter as 1, it got every subsequent letter count right, regardless of word length.

  • @jerryoftheriver
    @jerryoftheriver 25 days ago

    More cycles, more power cost, but it's worth it. And power efficiency is always a focus of new hardware iterations.
    Great job, Matt!

  • @aiforsocialbenefit
    @aiforsocialbenefit 25 days ago

    Thanks, Matt B... Great presentation. Amazing product. Thanks and deep appreciation to Matt Shumer and Sahil...

  • @Swanidhi
    @Swanidhi 23 days ago

    A significant improvement in the context window for open LLMs could promote this technique.

  • @ElvinHoney707
    @ElvinHoney707 25 days ago

    My question is: if you took Meta Llama 3.1 70B with a tricked-out backend network of agents that utilize the latest reflective strategies (similar to whatever strategies they used for refinement) and ran that complete unit on the same benchmarks, what would its performance be? And what would be the relative resource costs in tokens, compared across the two "models"?

  • @AllYouCanEatLobsterBuffet
    @AllYouCanEatLobsterBuffet 25 days ago

    I'm really interested in whether they could do a tree-of-thought-style model with just this tag-based training, like different IDs for branches of thought, or maybe a sub-brain style of thinking that composes different styles of thinking/reflection, etc., like internal agents. This area of research and experimentation does seem wide open.
    Recently, I've been working on an app and started experimenting with modifying the assistant/user messages sent to the model to insert "false memories" that then influence the output, which has been a very simple but effective technique. (See the sketch below.)
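
    The "false memories" trick is just editing the message history before the next call. A minimal sketch, with chat() once again a placeholder for any chat-completion API:

    # Plant an assistant turn the model never actually produced; on the
    # next call it tends to go along with its "own" prior claim.
    def ask_with_false_memory(chat, question, planted_claim):
        messages = [
            {"role": "user", "content": "What did you conclude earlier?"},
            {"role": "assistant", "content": planted_claim},  # fabricated turn
            {"role": "user", "content": question},
        ]
        return chat(messages)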

  • @Charles-Darwin
    @Charles-Darwin 24 days ago +3

    Dude, there's nothing in this interview that discusses the technical details of how it works. Not even a question about what their reflection technique is. Aside from leveraging them for clicks, the interviewer fails. Don't waste your time; it's purely surface-level and nothing new.

    • @clray123
      @clray123 24 days ago

      Let's face it: the interviewer himself has no clue and is therefore incapable of asking any in-depth technical question. He is just an AI fanboy out there to collect YT and sponsorship money while the circus lasts.

  • @carlkim2577
    @carlkim2577 23 days ago

    The more I read about this model, the less impressed I become. It seems people are having poor results in real-world testing.

  • @AlJay0032
    @AlJay0032 24 days ago

    Where can I learn more about this? Where do I find the open source community around this?

  • @BrettMarshall-d1d
    @BrettMarshall-d1d 22 days ago

    Isn't reflection tuning just chain-of-thought reasoning turned into tree-of-thought reasoning?

  • @MaxAugust-x7i
    @MaxAugust-x7i 23 days ago

    Josiane Plain

  • @xanderlinhares
    @xanderlinhares 24 days ago

    The idea that the model goes along with, or "believes", whatever was already said explains so much!

  • @sadshed4585
    @sadshed4585 25 days ago +1

    How much does training a model that big cost?

    • @sadshed4585
      @sadshed4585 25 days ago

      Dang, 2x 48 GB cards to run FP8; that's crazy. I will stay with my 13B-and-lower models till I can afford more. I wish NVIDIA made bigger, cheaper cards (even 32 GB for gaming cards); most people are stuck with a 3090/4090 or older-architecture NVIDIA GPUs unless you've got bank. I wonder how much it costs to train the 70B model into Reflection 70B starting from the pretrained Llama 3.1 70B, instead of training from scratch. GPT-4o suggests a 70B model from scratch costs millions.

    • @prolamer7
      @prolamer7 25 days ago

      Someone on Twitter said 64x H100? So it will be in the thousands of dollars, I believe.

  • @LatentSpaceD
    @LatentSpaceD 24 days ago

    Bravo! Super exciting new possibilities afoot, or a hand to boot (or shoe), for sure!

  • @Artificialintelligenceo
    @Artificialintelligenceo 24 days ago

    GGs

  • @telotawa
    @telotawa 24 days ago

    Who even needs Strawberry anymore? Sorry OpenAI, you got scooped.

  • @d.d.z.
    @d.d.z. 23 days ago

    The best interview by far. Open source as a superpower.

  • @renegadephalanx
    @renegadephalanx 24 days ago

    Not going to be a side project anymore :D Bravo.

  • @restrollar8548
    @restrollar8548 25 days ago +4

    Inspiring story behind Reflection.

  • @andrewlewin6525
    @andrewlewin6525 25 days ago +3

    Great content!! Keep up the good work