MLST is sponsored by Tufa Labs:
Are you interested in working on ARC and cutting-edge AI research with the MindsAI team (current ARC winners)?
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: benjamin@tufa.ai
Who wouldn't be? 😊
This is absolute gold. I love the format of just letting a brilliant mind explore a topic deeply. What a gifted speaker. So much knowledge transmission in so little time. I feel like I'm much closer to understanding what the state of the art is currently, and it sparked some new ideas for me.
This guy is great. You should bring him back in the future.
i dunno why but my brain is like - future future... but it just means later episode lmao
Yes, we will put him in cryostasis so we have the ability to bring him back in the future. How far into the future should we wake him up? Too far and you're dead, unless you also want to go into technological hibernation for the time this man is gone?
@ginogarcia8730 I was more focused on the "future" combo with bringing him back :)
just blah blah
This guy is shockingly broad and deep
Thanks for including the references to the mentioned papers (and with timestamps in video!). Could you please also always include in the description the date when the interview was recorded?
Many of the interviews you are releasing now predated the o1 model preview release. So it is possible that some of your guests have since somewhat updated their assessments of LLMs' (in)capability to reason in light of the o1-preview release. This is not to say they would have completely reneged on their fundamental objections, but I would love to see how much more nuanced these have become after the 12th of September 2024.
Was filmed at ICML 2024, and if I had a pound for every person who said "this was before o1 came out". It doesn't substantially change anything said, I'm pretty sure Thomas would agree - perhaps he will respond to your comment
Nice to see a comment section that isn't full of "LLMs are exactly like human brains! Just scale up and you'll get generalization at a human level!" Very good talk!
On the ROT-13 topic, it's interesting to note that Claude 3 Opus (I haven't tested Sonnet 3.5) is quite good at not only any arbitrary ROT-X but also any arbitrary alphabetical substitution. There are too many possibilities for any particular key to have likely been in the training data, which implies it has learned the underlying algorithm.
I think one day Anthropic will just train a model to directly call the circuit that performs the operation, instead of trying to intervene without being asked. I thought that's where they were going with the Scaling Monosemanticity paper
Did you test Opus with base64 decoding, by any chance? Because Claude 3.5 Sonnet as well as other models (4o) do suffer from the probabilistic correction issue that was mentioned in the interview. Is Opus different?
Up to some reasonable (sub)word length 26^n isn't really all that much data. I.e. synthetic data will likely go a long way - at least with ROT-X.
o1-preview is even better, and can tackle more complicated ciphers even
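For reference, the ROT-X transformation this thread is discussing can be sketched in a few lines. This is a generic illustration of the algorithm itself, not a claim about any model's internals:

```python
# Minimal ROT-N: shift every letter n places through the alphabet,
# preserving case and leaving punctuation untouched.
import string

def rot_n(text: str, n: int) -> str:
    """Apply a ROT-N rotation cipher to text."""
    k = n % 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

print(rot_n("Hello, World!", 13))  # Uryyb, Jbeyq!
```

Because `rot_n(rot_n(x, n), 26 - n) == x`, a model that has truly learned the algorithm (rather than memorized ROT-13 pairs) should handle any shift, which is what the arbitrary-ROT-X observation above suggests.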
42:23. Actually it's not surprising, and it's not complicated. The reason society has such a low tolerance for airline fatalities versus automotive boils down to agency. When you get in a car and you drive yourself, you are taking the risk on yourself. You're deciding if you think the situation is safe enough to drive, you're trusting your own skill and execution, and it's up to the individual to make that assessment on a case-by-case basis. When you entrust yourself to an airline, you are trusting the pilot and the airline's maintenance, and there are so many more failure modes with an aircraft than a car, and the cost of failure is so much higher. So if you are going to surrender your agency to another, you want to believe that other is more capable than you, especially where the nominal failure mode is much more extreme.
42:41. Absolutely, automated cars will be held to a MUCH higher standard than operated cars. No doubt about it.
Yes
This is excellent, thank you so much for sharing professor Dietterich.
the best channel on AI rn. my fav. love all your videos.
It's like we have all the pieces of AGI, but we don't know how to orchestrate them. Humans can decide when it's time to rely on "gut" System 1 thinking, we can decide when to pursue System 2 thinking, and to refine our skills with System 2, which then tends to finetune and reorient our System 1 thinking. We can decide when to override our gut, because we understand that despite our "training data" (instinctive sense), the logic shows something very different. We can look at our instincts in a subject, then pursue refining and formalizing our understanding of those intuitions. But to do all of this there is a meta-cognition mechanism, which we tend to refer to as consciousness, that directs these efforts. The term "understanding" tends to speak not to System 1 or System 2, but to that mechanism that is mediating all of these efforts. So far, we don't seem to have a theory on how to create this mechanism, and we're hoping it's going to emerge out of scale, but that seems exceedingly unlikely. I think we clearly have a runway to seriously scale up the tools we currently have, but a true human-like intelligence seems to require a breakthrough we haven't yet made. Until then, we're just building ever more powerful digital golems, without actually breathing real living intelligence into them. And perhaps that's for the best.
human like intelligence will be an illusion. as soon as you buy into it, you'll have it.
Years ago had an interview with him over anomaly detection. He is a world renowned expert on anomaly detection.
Terrific interview. One question: why isn’t this available on your podcast feed? I subscribe to MLST on the Apple podcast app but have not seen this particular interview there.
Great interview as usual. Somehow it keeps getting better. Appreciate your hardwork that contributes to open education 🎉
Yooooo, so glad y'all brought Thomas on the show finally. Also shoutout to Gurobi! :)
Osband's (from DeepMind or now OpenAI) Epi(stemic)Nets and Prior Nets work is extremely effective and efficient to implement on top of most large models or as a surrogate/extension of existing models to get joint probabilities, thus measuring epistemic uncertainty quite well. He built an LLM training loop which helped the model training with better sample efficiency based on uncertainty. Definitely worth the read!
Shouldn't o1 be better at quantifying uncertainty if it's trained the way we think it is? Hopefully we get an open source version of this so we can try training it to give a confidence value based on similar trajectories in the rl that lead to unverifiable answers
Street Talk, can you please consider making an episode which compares/contrasts the different types of neural networks in 1 episode, so that somebody who watches that episode will understand the major distinctions?
I.e. a birds eye overview for MLP, RNN, CNN, GAN, Diffusion Model, and any other important possibilities.
i'm glad ppl finally brought up expert systems. it's the basis of building a proper AI and a proper supervised dataset. I've been explaining this since 2017. glad to see a fellow who gets it
Best talk I've seen on AI for a while! I have a lot of hope for the use of graphs and theorem provers in reasoning, but graphs need to evolve to catch more subtleties; they're a blunt tool for now.
Great interview, good work you guys!
'Bridging the left / right hemisphere divide' is the analogy I hear here:
"The real opportunity is to mix.. formal [reasoning] with the intuitive, experience based, rich contextual knowledge.."
Such a striking parallel to the call to rebalance 'Master' and 'Emissary' (à la Iain McGilchrist), facing humanity at present.
I listen to these talks with deep interest for the same reasons you seem to engage in them: the mirror they hold to neurology, perception, meaning, metaphysics etc is exquisite. Thanks for sharing the richness.
Do we have anything like a set of definitive set of papers that make up the base of human knowledge anywhere?
Human System 1 thinking is also probabilistic: you tend to lean toward what you have experienced before. Naming the alphabet backward is always challenging for humans. LLMs have effectively mastered human System 1 thinking. Adding reasoning and agency to LLMs will result in behaviors surprisingly similar to human behavior in AIs.
I love arguments that take the shape "These models are statistical parrots that correlate to numbers that reflect reality",
because it demonstrates how, if the metrics were that simple, there would be no alignment, jailbreak, or hallucination issues.
Clearly this is wild speculation about how "correlation to reality" should be defined, rather than a valid metric for why the models are not really predicting things from something deeper than humans can measure consistently.
It shows me that the results transformer models produce can easily be exempted from being "autocorrected" when there is a shortcut that allows some equilibrium between epistemology and intelligence.
I don't object necessarily... just take note of how the goalposts are being shifted.
I guess my conclusion is: if AI is framed as just being a statistical parrot, simply because it was trained exclusively on human data... that would force humanity to examine an unpleasant truth about what we consider intelligence.
It would force us to consider that epistemology was more of a collaborative effort than this narrative that individual creativity is supreme.
Shielding general intelligence from a statistical metric is a great way to avoid that sort of conclusion.
But I can't help but grow skeptical that this is valid if the vast majority of human discovery is the result of statistical anomaly.
Kind of makes it seem like an argument about the most efficient way to put monkeys at typewriters, and only count wins when the output aligns with current consensus.
I don't get this. Obviously LLMs have no concept of ground truth, and all their knowledge exists at the same ontological level: just tokens with some internal relationships. So the only way for them to have anything more than a probabilistic kind of epistemic certainty/uncertainty is to train in the sources of the knowledge we are feeding the model, and the level of confidence it can attach to the different sources, Wikipedia versus Reddit say. Over and above this, certain other practices of epistemic hygiene that humans adopt, such as self-consistency, logical soundness, and citing your sources, seem like they should be implementable in LLMs.
Is that basically RAG?
Not to take away from your point, but you would think the data and the training would impart some level of epistemic ranking and hygiene. I.e., discussion of the dependability (or not) of Wikipedia is abundant, so it would reflect on Wikipedia content in the weights.
This is already done. Reddit content, for example, is selected for training based on upvotes.
Pretty sure they already do that to some extent.
Not sure about the specifics tho
I learned a lot, great thoughts!
Many models can converse in ROT-13.
And as the conversation goes on, it gets really weird... More "honest" in some ways, but it will speak more in metaphors and analogies. 🤔
I suspect we have more tolerance for car accidents because driving is highly individually determined, meaning you're more implicated in your own accidents
Also want to push back on the "single-author papers" narrative.
There is a well-established history of citing preceding work.
Collaborative effort has always been a part of good research, in my view.
The only difference now is that more people are willing to accept collaborative responsibility, which I agree is a boon to all science, not just computer science... because it incentivises communal responsibility and shared credit and accountability.
But mostly because it disincentivises zero-sum monopoly,
which has plagued scientific research with perverse incentives for millennia.
Good vid
This was great!
Learned a lot
You never know : a "Gentleman Scientist" teenager working in his/her parents' basement might, just might, come up with some amazing system or product.
finally someone who knows what he is talking about, not doomers, "AGI" evangelists, or corporate preachers 😁😁
Just have to jump in around 23:49 to express horror that he suggested that journalists are part of the small set of 'ground truthers'.
Can we get more real world, on the ground workers on the channel?
Why should a model know everything? Just give it a search bar + Wikipedia. A model should be valued based on its intelligence and not its memory or knowledge.
"Playing chess is not statistically different than using language"
Yes, using language is more complex than playing chess.
But that simple fact does not entail the logical conclusion that "LLMs can only arrive at superhuman levels of using language based on the occurrence of language learned from breaking it down into sequential tokens".
Anybody should see why this argument immediately fails.
If the frequency with which tokens statistically follow one another as a probability were the problem space, then with more compute it would be far more efficient for anybody to stack a frequency search on top of a database than to ask a machine learning algorithm to find some better optimum, assuming the data cannot lie.
One method is far more efficient than the other.
Most people, especially those with computer science degrees, cannot accept or grasp why.
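The frequency-table alternative described above can be made concrete with a toy bigram lookup. This is a hypothetical sketch of pure frequency search, for contrast with learned generalization, and not a claim about how any LLM actually works:

```python
# Toy bigram frequency table: predict the next token as the most
# frequently observed follower in a corpus.
from collections import Counter, defaultdict

def build_table(tokens: list[str]) -> dict:
    """Count how often each token follows each other token."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def predict(table: dict, token: str):
    # Returns None for any token never seen in the corpus: a pure
    # frequency table cannot generalize beyond its training data.
    followers = table.get(token)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat".split()
table = build_table(corpus)
print(predict(table, "cat"))  # sat
print(predict(table, "dog"))  # None (unseen context)
```

The `None` on unseen input is the crux of the contrast: a lookup table is cheaper per query but fails completely off-distribution, which is one reason the "just use a database" framing doesn't capture what trained models do.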
I don't think these "experts" mean to adopt bad-faith arguments,
but I will criticize them for not knowing better:
the latent space of an LLM is not accurately described by appealing only to the statistical frequency of token appearance in the training set of the data it reflects as a model you can interact with.
These two maps of prediction tables are not one-to-one... and that is more interesting than pointing out that the deviation is not interesting because we can imagine some computational overlap that could be labeled "simulation, synthetic, or mere emulation".
1:04:39 Quantize all the scientists! 🥳
Wow, that's a lot of good info, cool 😅
ABI - Artificial Broad Intelligence :D
“Search quality” more like ads marketplace
"There is a distinction between simulating something and actually doing it"
Perhaps this is so, but not unless you can introduce real metrics the simulation neglects.
Otherwise you are left simply with the speculation "perhaps the simulation fails to account for things that are real and omitted from simulated measurements".
I mean, that is a very liberal and agnostic interpretation,
but hardly an account of how and why a given simulation has failed.
It's impressive what I can do with o1-preview, but it's also impressive how it can give you stupid answers to complex prompts.
Uh... no, everyone always wanted breadth... from before digital computers even existed... we just never knew how. We still don't, but we learned, like a billion monkeys building a billion different models, that if you throw enough data and stir it with linear algebra long enough with even the dumbest loss functions, eventually you get ChatGPT et al.
Wonder if it's worth it to LLMize reasoning. We could gather quality data from smart people, such as scientists and Mensa members: "What was a difficult problem that you solved? Describe it step by step in high detail and provide context." Problem-solution.
Great human being. We need more rational people around AI and fewer prophets.
This dude is speaking gibberish! LLMs don't "understand" or "execute" ROT. This is why they don't give the correct decoding.
❤️🍓☺️
Wait, what about reddit? Did I actually contribute to something?
No. You don’t matter in the grand scheme of things
@@bodethoms8014 BS, I'll be addressed as "the ai whisperer" going forward.
@@Rockyzach88 AI Whisperer? More like Hallucination Whisperer. Whisper all you want, the AI still isn’t listening
@@bodethoms8014 lol so angry
First
arrrg **shakes fist**
How are the newest models doing on TruthfulQA? Can’t find any evals on this recently. Why?