Take your personal data back with Incogni! Use code bycloud at the link below and get 60% off an annual plan: incogni.com/bycloud
maybe we are all bots and the dead internet theory is true
Please create your own style of thumbnails and stop trying to mimic Fireship lol... I'm being honest rn, but the content in your videos doesn't feel as interesting/funny/easy-to-understand as his. Hope you take that as constructive criticism, because you do cover lots of cool topics that Fireship doesn't.
I disagree with the comments complaining that the video is too technical. I really like that you provide enough detail to roughly understand the technique, awesome video!
Agreed, this is very approachable to someone who knows some architecture fundamentals!
I found the pacing a bit off. In general, very well edited and summarized information.
But it's hard to keep track of all the vocabulary; personally I'd either need to linger longer on these details or get an even shorter overview of those aspects.
I really like the style of Yannic Kilcher's paper reviews, but his videos are also 3 times as long, so in any case it's a tradeoff of what one prefers.
@@seriousbusiness2293 Honestly, I feel like it's because his target audience was different, and now it's more technical, so he'd need more time to explain those concepts instead of expecting a baseline understanding. But going into more detail would quickly scale up the video length, which would also hurt his YT channel, considering we all expect 5~15 minute videos from this channel at best. So, yeah, it's a trade-off.
Yeah, they might as well just watch Fireship, because that's what they're asking for.
@@w花b But the problem is that bycloud tries to mimic Fireship's thumbnail style to lure in Fireship viewers, then throws them off with 10+ minute videos of overly technical stuff, when they'd prefer ~5 minutes of mixed interesting, meme-y, simplified content instead.
Turn all the hidden states into ML models? That scream of pain you all just heard was from the interpretability researchers ;)
OK, but then their employment is secured forever LOL
We need black boxes for the black boxes!
But imagine if those ML models are CNNs and you can see how the kernels adapt to the context of the input in real time, wouldn't that actually be easier to interpret?
@@koktszfung CNNs are more like DL, ML models are simpler
@@naumbtothepaine0 which simpler ML models? XGBoost? SVM? Because CNNs are ML models too
In conclusion, Trouble in Terrorist Town is cooler than some transformers and some snakes.
Please make a video explaining all of these terms, apart from that, keep the technical videos coming!
Another day, another attempt to re-invent LSTMs
What's that now?
@@babyjvadakkan5300 type of RNN that Google used to use (or still does?) for language translation before we got transformers
It's more of a generalization of both LSTMs and attention; it's theoretically much more powerful IMO
It's definitely an interesting idea
holy shit this method is so interesting. and the way they encapsulated the entire idea into the title LOL!
I love that you tell us how the method in the paper roughly works. A lot of YouTube channels just say this new technique is better without any explanation and just show results, so I have to skim the paper to get the gist of it.
Yup, makes you feel like you're learning something rather than just getting information without enough context
"too technical for this video"
man you lost me at the thumbnail
me too bro, yet I still watched the entire video 💀
My brother is speaking Greek
As an IA, thank you for teaching me this, I will use it to train myself
intelligently artificial
@@ginqus inteligencia artificial
intelligent anti-africa
@@truongao5425 😂 troll
I would never suspect that this video would help me write my PhD, but the "compression heuristic" is exactly the term I needed but didn't know to express my idea. Thank you!
This channel explaining AI and using anime references in the visuals is exactly what I needed. Great video!
Good short, dense overview of an even denser subject matter.
Still waiting for the paper that modularizes all these component processes and flows then runs training against all the permutations to bootstrap itself.
As always, thank you for the video! I really appreciate the amount of technical details here. Don't know why other people complain but I love it!
Let's put transformers into transformers. Maybe we end up with baby transformers.
Ah yes, hot transformers in transformers action
"Mom! They are adding more weights to the models again!"
It’s trainable models all the way down! Great video, thanks!
2:32 Waiting for bycloud to be on that page like others!
I got up to 6 minutes and loved the ride! Gonna have a snack and p and dive right back in!
How do we know the ones complaining about the bots in youtube chat aren't bots themselves?
I have definitely seen bots complain about bots before. In fact, you could also be a bot. Who knows at this point?
I might be a bot
I'm definitely not a bot, what a stupid idea.
Ban both then. Spamming is annoying either way.
The interesting thing is it's probably cheaper for a bot to spam "bot" than to create LLM comments.
It's some mamba jamba
More videos like this, please.
Audience: Less reading, more technical content!
Also audience: AAAAAAHH, MY EYES! TOO TECHNICAL FOR MY EYES AND EARS! 😢
Perfect amount of complexity, please do not make your longer videos like this simpler. I'm not involved in any form of computer science, but I've kept up with AI since TensorFlow was brand new, and I understood almost everything first try
TTT is literally short-term memory. Wild.
that intro was amazing
6:08 it resembles Trouble in Terrorist Town
I love words.
Love the video!!
Cool video. For a beginner, all these terms together seem very technical, can someone suggest a playlist to learn more in depth about these topics?
What part are you struggling with?
Super well explained. And full of memes
Training on test data... unless I severely misunderstand this, I'm just going to say: "yikes, nope, get out, and don't come back"
Sooooo many bot comments!
They post innocent comments so they can change them into ads later
@@bolon667 I think you are right. The comments I noticed earlier are gone?
Earth shattering.
great video, but transformers in practice don't hit the naive quadratic cost unless you implement attention the vanilla way (kernels like FlashAttention avoid materializing the full attention matrix, though compute still scales quadratically)
I want a TTT-Linear (T) with TTT-MLP (M) as its inner loop.
Good video, thanks
I know some of these words!
I tried to read this when you wrote about it in your newsletter, but it was not an easy paper
Whenever a new architecture takes over, the tech companies heavily invested into developing hardware specifically optimized for the transformer architecture are gonna be sad.
Nahh they actually cooking with this architecture though
Bumblebee is my favorite
4:11 Why not use wavelet transform for this?
I think it would be useful here since
did I understand this?
Wouldn't this model be slow in operation if it has to train on the context?
Bro, this model is too complicated to be simplified more. Keep up the complexity, it's what makes it interesting
Dude, no kidding, I came up with something similar a month ago. In concept. I'm afraid I have a limited number of insights in my lifetime. And without time to pursue them, I will never make any difference in the world. 😢 But hey, that also proves, to me at least, that my math intuition is on point. 😅
A lot of people have zero insights.
It's important to work on your ideas to test them in reality
If you thought of it, other people thought of it or will, so don't worry about not being the one who gets credit; what matters is that the idea is in the memosphere
Imagine giving money to a service for a sense of security because it is now the status quo to let every substantial company out there infringe on your privacy rights.
Just a thought. What parallel universe is this?
This is crazy
Well, a year or so ago I had thoughts about going into ML, but you have lost me on this one. 👍
I guess it's only gonna get more complicated from now on.
Nowadays it's easier than ever to learn ML. You should start with something simple enough that you understand around 80% of it, with only 20% actually new.
There are lots of free classes from MIT, Stanford, etc., plus lots of tutorials, examples, and code documentation.
First get a general yet simple view of NNs, then choose what you'd like to specialize in: image recognition, text, or something else
Does this relate in any way to liquid time-constant neural networks?
so nobody's gonna talk about how we just got rickrolled at 3:43?
I watched half of the video and this is too technical for me. I’m skipping this one. Congrats to everyone who understands this video!
it's like RNN's hidden states are just ML models
thanks for watching till halfway tho
Just watch it 3 times
@bycloudAI I'll explain bro, hold up, I'm getting what he's saying. You need to break it down in simple terms that relate to real-world apps. Visualize.
@@bycloudAI btw this was posted on r/singularity where there are more normies - obv u need normies if you want growth, but any technical video is automatically going to have a very niche audience, understandably, so you probably don't mind that as well.
i mean i watch ur stuff and most of it goes over my head but it's interesting regardless, just letting u know the feedback here is kind of skewed.
my brain is exploding send help
Next up is cisformers
detransformers
biformers
formers
@@anywallsocket forms
performers?
MAMBA IF YOU CAN HEAR ME PLEASE SAVE US
No, no, no, I do not want to add neural networks to recursion, I JUST BEGAN TO UNDERSTAND RECURSION DON'T DO THIS TO ME!!!
This looks like Block-Recurrent Transformers by DeLesley Hutchins
good videos 👍
tell me the current paradigm is hitting a dead end without telling me
The human brain is a recurrent neural network, not a transformer. Eventually, recurrent will win.
But who said the human brain is better than the transformer?
Lol, that's an intense amount of memeage
Yeah, if you made up everything you said in the video I wouldn't be able to tell at all. Stuff is getting harder and harder to understand.
Tho didn't they warn us against meta-optimizers due to the alignment becoming impossible?
Short answer: no
Yo dawg, I heard you like ML models...
It seems very convoluted, but I guess it should learn with less data? That could be good for startups that don't have big datasets.
why is every LLM's OUTPUT context window fixed to 4096?
AFAIK, output context windows are not a thing for the models themselves; the model is just called once for every token it has to generate, and you can repeat that process a million times if you want. However, it's not useful for the LLM to output text up to the point where its prompt falls out of its context window, so in the early days the "output window" was just set to the model's context window. Nowadays it's probably capped for economic reasons: LLMs get more expensive the longer the input is, so by limiting the output window, they force you to pay for tokens several times, once as the model's output and again as input to the subsequent calls
To stop it. While still giving enough space to make “satisfying” answers.
A single brain neuron needs something like 5 layers to encode its behavior. So this kinda maps each node to something like a neuron.
I know biological features map poorly to neural nets, but neurons in the brain change how and when they fire as the brain learns.
just give them more memory!
Nice explanations, but go easy on the vocabulary. I don't reckon every joe out there will understand all the terms. The pacing is too quick, too.
Wouldn’t that take forever to train??
I hate such advertisement shockers that are not separated adequately from the main material. Not gonna subscribe to a channel that does that. 😢
Very good 😀
You should consider banning bots on your channel.
You just uploaded and 3 bots have already commented; the dark internet is scary 😢
@@kingki1953 Actually the dark internet is really lame right now. You can spot these comments from a mile away:
"Your videos are always so informative and interesting! Thank you for that!"
"Thank you for your work! Your videos are always top notch!"
"Always a pleasure to watch your videos! I will be looking forward to new episodes!"
fractal AI models
Are we all botted comments?
I'm here for the waifu memes
Good video tho
guys, I'm in high school and I'm trying to choose a career path. My no.1 choice, considering the things I like and that I'm good at, is becoming an AI researcher. Can anyone in the academic world tell me if it would be a fun job or not?
It definitely is. But the field is getting increasingly complex, fast-paced, and hyper-competitive. I'd recommend studying computer science and mathematics, since you will not be able to compete in this field without a very strong mathematical background. Other than that, go for it. I'm a researcher in parallel processing and numerical high-performance computing. It is definitely fun and rewarding, but be prepared for a painful journey.
This video was too much for me.
damn, yeah, that's AI stuff, right? hahaaa. tbh I understood a quarter of this, but I really enjoy a lot of your videos
look at the amount of bots lol
This is nothing. Check out any popular video about trading
what?
The quadratic complexity is not the main problem of current LLMs. It's that they are dog sh*t at reasoning (and tasks that depend on it) and a better scaling with context length won't solve that.
leenear
dude are all your videos infomercials for half the video????????
would love to collaborate and learn with you
First!
Nah, the bot beat you to first bro
Bro, your thumbnails and Fireship's thumbnails look similar. Someone has to change/alter their thumbnail
Yeah, I did not understand shit. Basically, better architecture
"unlock linear complexity having expressive memory bla bla bla bla bla bla" - was this written by ChatGPT?
It sounds human to me, even if it contains some technical jargon. ChatGPT writes differently
what a load of bollocks