Loving these shorts for transformers🙌
Thanks so much!
Loving your shorts! I was actually asked this question in an interview. Please keep posting videos like these ❤
Thanks for the feedback. Super glad this type of short was helpful to you 🙂
Great series of shorts about Transformers!
Can you please explain cascaded/DenseNet next? They seem underrated, like more powerful versions of ResNet.
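For anyone curious in the meantime, the core difference is that ResNet adds features back (y = x + F(x)) while DenseNet concatenates them, so every layer sees all earlier outputs directly. A rough sketch of a dense block, assuming PyTorch, with linear layers standing in for the paper's convolutions (the DenseBlock name, growth parameter, and sizes here are just illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of the input and all
    previous layers' outputs, instead of a summed skip as in ResNet."""
    def __init__(self, dim: int, growth: int, num_layers: int):
        super().__init__()
        # Layer i takes dim + i*growth features and emits `growth` new ones.
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(dim + i * growth, growth), nn.ReLU())
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # Concatenate everything seen so far and grow the feature set.
            features.append(layer(torch.cat(features, dim=-1)))
        return torch.cat(features, dim=-1)

x = torch.randn(2, 32)
block = DenseBlock(32, growth=16, num_layers=3)
print(block(x).shape)  # torch.Size([2, 80]) = 32 + 3 * 16
```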
I don't know if I understand this correctly. Isn't the reason for the residual connection that it is often easier to learn the difference between state x and state y than the full transformation from x to y? That's how it's explained in the ResNet paper.
So kinda like gradient boosting residuals?
Okay
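For anyone who wants to see the "learn the difference" idea concretely, here's a minimal sketch of a residual block, assuming PyTorch (the ResidualBlock name and layer sizes are just illustrative): the inner layers only have to model the residual F(x) = H(x) − x, and the skip connection adds x back.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the inner layers learn the residual F(x) = H(x) - x
    rather than the full mapping H(x)."""
    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # skip connection adds the input back

x = torch.randn(4, 64)
block = ResidualBlock(64)
y = block(x)  # same shape as x; if F(x) ≈ 0, the block is near-identity
```

Note how easy the identity mapping becomes: if the extra layers aren't helping, they only need to push F(x) toward zero, which is the ResNet paper's argument for why deeper stacks stay trainable.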
Keep going, man! Great to see useful content rather than shitty shorts 🤭