![Sourish Kundu](/img/default-banner.jpg)
- 14 videos
- 78,020 views
Sourish Kundu
United States
Joined Aug 14, 2023
Hi all! I produce content about computer science and technology in general, along with some of my thoughts on life. I believe that in order to become the best version of yourself, one must find true passion and joy in what they work on and this channel is me sharing that with others.
I recently graduated college from the University of Wisconsin-Madison majoring in Computer Science, Data Science, & Economics. One thing that was really hard for me during this period was preparing for internships and getting offers. I also want to share some of my best tips and advice for current college students so hopefully their experiences are a little bit smoother than mine!
All content on this channel is produced by and is the intellectual property of Sourish Kundu LLC.
Day in the Life of a Machine Learning Systems Engineer @ TikTok Bay Area!
Join me on a typical day as a college new grad balancing my work as a machine learning systems engineer at TikTok with my YouTube channel. Life after college has converged onto a much more predictable routine for me and I'm really excited to share that with you guys!
*Disclaimer: All opinions are my own, and do not represent the position or opinions of the Company.*
Resources:
Galvatron Paper: arxiv.org/abs/2211.13878
MegaBlocks Paper: arxiv.org/abs/2211.15841
Timestamps:
0:00 - Good Morning!
0:46 - Morning Routine & YouTube
1:26 - Get Ready for Work
2:02 - Driving to Work
2:37 - Workout
3:10 - Starting the Work Day
4:10 - What I do at TikTok
6:09 - Lunch
6:33 - After Lunch
6:56 - Dinner
7:27 - Driving H...
Views: 805
Videos
Who's Adam and What's He Optimizing? | Deep Dive into Optimizers for Machine Learning!
44K views · 1 month ago
Welcome to our deep dive into the world of optimizers! In this video, we'll explore the crucial role that optimizers play in machine learning and deep learning. From Stochastic Gradient Descent to Adam, we cover the most popular algorithms, how they work, and when to use them. 🔍 What You'll Learn: Basics of Optimization - Understand the fundamentals of how optimizers work to minimize loss funct...
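The Adam update the video builds toward fits in a few lines. Below is an illustrative NumPy sketch of a single Adam step, not the video's actual code; the toy objective f(w) = w² and the hyperparameter values are assumptions for demonstration only.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (momentum)
    m = b1 * m + (1 - b1) * grad
    # Second moment: exponential moving average of squared gradients
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction counteracts the zero-initialization of m and v
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Per-parameter adaptive step
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimize f(w) = w^2, whose gradient is 2w
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

After enough steps, w settles close to the minimum at 0; the same update applies unchanged to whole weight tensors, since every operation is element-wise.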
Building My Ultimate Machine Learning Rig from Scratch! | 2024 ML Software Setup Guide
11K views · 2 months ago
Join me on an exhilarating journey as I dive into the world of Machine Learning by building my very own ML rig from scratch! This video is your all-in-one guide to assembling a powerful machine learning computer, explaining every component's role in ML tasks, and setting up the essential software to get you up and running in the world of artificial intelligence. 🔧 What's Inside: Component Selec...
Welcome to NVIDIA GTC 2024! | Let's Go to the World's Premiere AI Conference
225 views · 2 months ago
Join me on an incredible journey to NVIDIA GTC 2024, the groundbreaking tech conference where the future of AI, gaming, and graphics comes alive! In this vlog, I dive into the heart of innovation, bringing you exclusive insights, interviews, and a firsthand look at the latest breakthroughs in technology. 🚀 Highlights Include: Keynote Speeches: Get the lowdown on the cutting-edge announcements f...
Transform Any Room into an Art Gallery with AR! || Intro to Augmented Reality with SwiftUI - Part 2
166 views · 3 months ago
Welcome back to Part 2 of our exciting journey into Augmented Reality app development! Building on the basics covered in Part 1, this tutorial takes your AR skills further by introducing dynamic picture frames and interactive features that bring your wall art to life. Using Apple's advanced RealityKit and ARKit libraries, we'll enhance our app to make your memories not just visible but interact...
Transform Any Room into an Art Gallery with AR! || Intro to Augmented Reality with SwiftUI - Part 1
252 views · 3 months ago
In this tutorial, we dive into the fascinating world of Augmented Reality (AR) by creating an app that brings your memories to life right on your walls! Using Apple's powerful RealityKit and ARKit libraries, I'll guide you step-by-step on how to develop an AR application that allows you to place pictures from your camera roll onto any wall in your home, office, or anywhere you like. Each image ...
Cascadia Code Calls to Coders! || Why and How to Customize Your IDE's Font
654 views · 4 months ago
👨💻🔍 In this quick chat, we explore Cascadia Code, the innovative font designed specifically for developers and its impact on the coding experience. 🌟 What's Inside: 0:00 - Intro 0:45 - Why Cascadia Code 1:48 - Ligatures 2:27 - Setup & Installation 3:51 - Conclusion 🖥️ Discover the ergonomic benefits of Cascadia Code's design, see how its unique features like ligatures and spacing enhance reada...
Let's Create a NAS! || An Automated Backup Solution with the Zimaboard, TrueNAS, & Proxmox
727 views · 5 months ago
Today, we'll be creating a NAS, or a Network Attached Storage to store all of my files. We'll talk about how to set one up using TrueNAS inside of a Proxmox virtual machine. After discussing redundancy and backups, we'll also proceed to create an automated backup solution that backs up our NAS to BackBlaze's servers. 📘 Chapters: 0:00 - Intro 0:49 - Parts Used for the Build 3:06 - Zimaboard BIOS...
Uncovering Meaning Amidst Randomness! || A Beginner's Guide to Monte Carlo Integration
760 views · 5 months ago
🎲 Welcome to our deep dive into Monte Carlo Integration! 🎲 In this video, we're unraveling the mysteries of one of the most intriguing and powerful mathematical techniques used in various fields from finance to physics: Monte Carlo Integration. Perfect for students, professionals, and anyone with a curiosity for mathematics and computational methods! 🔍 What You'll Learn: 1. Monte Carlo Integrat...
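The core idea described above fits in a few lines: sample points uniformly, average the function values, and scale by the interval width. A minimal sketch (illustrative only, not the video's code; the test integrand x² is an assumption):

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=0):
    # Average f at uniform random points, then scale by the interval width
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# The integral of x^2 over [0, 1] is exactly 1/3
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
```

The error shrinks like 1/√n regardless of dimension, which is exactly why the technique shines for the high-dimensional integrals in finance and physics mentioned above.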
Understanding Bloom Filters || How to Save Space at the Cost of Certainty!
462 views · 6 months ago
Welcome to our in-depth exploration of Bloom Filters! In this video, we demystify this advanced data structure, making it accessible and understandable for beginners. We'll be inserting our favorite fruits into the bloom filter and learning what the catch is when we go to retrieve them! 🔍 What You'll Learn: - Conceptual Overview: Get a clear understanding of what Bloom Filters are and their uni...
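The fruit example described above can be sketched roughly as follows. This is a hypothetical minimal implementation, not the video's code; the sizing (1,000 bits, 3 hash functions) is an arbitrary assumption.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set k bits per item.
    Lookups may give false positives, but never false negatives."""

    def __init__(self, size=1000, k=3):
        self.size, self.k = size, k
        self.bits = [False] * size

    def _indices(self, item):
        # Derive k independent indices by salting one hash function
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx] = True

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent"
        return all(self.bits[idx] for idx in self._indices(item))

bf = BloomFilter()
for fruit in ["apple", "banana", "cherry"]:
    bf.add(fruit)
```

This is the space/certainty trade-off in the title: the filter stores at most k bits per item instead of the items themselves, at the cost of occasional false positives.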
How to Find the Best Apartment with Optimal Stopping Theory || The Secretary Problem Explained
615 views · 6 months ago
🔍 Unraveling the Mysteries of the Secretary Problem! 🧠 Welcome to our deep dive into the fascinating world of the Secretary Problem, also known as the Marriage Problem or the Best Choice Algorithm! This mathematical puzzle has perplexed and intrigued researchers and enthusiasts alike. In this video, we'll explore the intricacies of this classic problem, delving into its history, mathematical fo...
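The classic 37% rule the video explores can be simulated in a few lines. A sketch, not the video's code; the pool size of 100 candidates and the trial count are assumptions.

```python
import random

def stop_after_exploring(candidates, explore_frac=0.37):
    """Optimal-stopping rule: skip the first ~37% of candidates,
    then take the first one better than everything seen so far."""
    cutoff = int(len(candidates) * explore_frac)
    best_seen = max(candidates[:cutoff]) if cutoff else float("-inf")
    for c in candidates[cutoff:]:
        if c > best_seen:
            return c
    return candidates[-1]  # forced to take the last candidate

# Simulate: how often does the rule land the single best candidate?
rng = random.Random(42)
trials, wins = 2000, 0
for _ in range(trials):
    pool = list(range(100))   # candidate quality scores 0..99
    rng.shuffle(pool)
    if stop_after_exploring(pool) == 99:
        wins += 1
success_rate = wins / trials
```

The success rate hovers around 1/e ≈ 37%, which is the theoretical optimum for this strategy, far better than the 1% a random pick would achieve.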
Landing that First Tech Internship and Beyond! || Advice I Wish I Had as a CS Freshman
407 views · 7 months ago
👨💻 Are you a computer science student struggling to land that all-important first internship? Look no further! In this video, I share my personal journey and the strategies that helped me secure my first internship in the competitive field of computer science. 🔍 What You'll Learn: Resume Must-Haves: Discover the key elements every computer science resume should include to catch a recruiter's e...
The Ultimate Guide to UW-Madison || Why You Should Apply!
4.7K views · 8 months ago
Welcome to the world of the Wisconsin Badgers! 🦡 If you're a high school senior contemplating your college choices, this guide is specially crafted for you. Dive into the heart of UW-Madison with me as I share my personal journey, lessons learned, and the myriad reasons why this iconic institution might just be your dream college destination. In this video, we'll explore: - The unparalleled aca...
Train AI to Beat Super Mario Bros! || Reinforcement Learning Completely from Scratch
14K views · 8 months ago
Today we'll be implementing a Reinforcement Learning algorithm named the Double Deep Q Network algorithm. A lot of other videos will use a library like Stable Baselines, however, today we'll be building this completely from scratch. It'll be used to train the computer to play Super Mario Bros on the NES! This is a tutorial aimed at people that have a base level understanding of ML, but not nece...
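The key Double DQN idea, decoupling action *selection* from action *evaluation*, can be sketched independently of the full training loop. This is an illustrative fragment, not the video's from-scratch implementation; the array shapes and values are made up for demonstration.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    # The online network CHOOSES the best next action...
    best_actions = np.argmax(next_q_online, axis=1)
    # ...but the target network EVALUATES it (reduces overestimation bias)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done = 1) get no bootstrapped future value
    return rewards + gamma * (1.0 - dones) * evaluated

# Two made-up transitions with two actions each
rewards = np.array([1.0, 0.0])
next_q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
next_q_target = np.array([[5.0, 7.0], [2.0, 9.0]])
dones = np.array([0.0, 1.0])  # second transition is terminal
targets = double_dqn_targets(rewards, next_q_online, next_q_target, dones)
```

In a real training loop these targets would be regressed against the online network's Q-values for the actions actually taken.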
The problem I have with these videos is that they always assume the "terrain" is like a bowl. However, I test on a problem with 5 parameters where the "terrain" is more like the Grand Canyon: the path is extremely narrow and the "walls" are steep. None of these algorithms work very well by themselves. I have also found that a line search really helps, more than any of the mentioned methods. I did give this video a thumbs up because it is one of the best on this topic. BTW, for my data, scipy.optimize.least_squares works MUCH better. Memory is cheap,
Wow, super complete information. I'm subscribed now 😮
Great video! Though I'm having trouble running the parts of the code that involve calling classes from other modules. Is there anything I can do about it? This is the error message: ImportError: cannot import name 'AgentNN' from 'agent_nn'
Excellent
“Only two things”? I think you forgot one, bud.
You look more Gujarati/Marwari than Bengali.
Good video, thank you!
Hi Sourish, the video was very helpful. I found the following config on Amazon; how would you rate it? I plan to run some Ollama models and a few custom projects leveraging smaller-size LLMs: Cooler Master NR2 Pro Mini ITX Gaming PC, i7 14700F, NVIDIA GeForce RTX 4060 Ti, 32GB DDR5 6000MHz, 1TB M.2 NVMe SSD.
Thank you for such an easy, simple, and great explanation. I searched for a quick overview of how Adam works and found your video. I am actually training a DRL REINFORCE policy gradient algorithm with theta parameters as weights and biases from a CNN, which is exactly where Adam is involved. Thanks again, very informative.
This feels like a step by step tutorial, great job! I’m building my RTX 4070 Ti machine learning PC soon, can’t wait!
This video is amazing! You covered the most important topic in ML, with all the major optimization algorithms. I literally had no idea about Momentum, NAG, RMSprop, AdaGrad, or Adam. Now I have a good overview of them all and will deep dive into each. Thanks for the video! ❤
I'm really glad to hear that it was helpful! Good luck on your deep dive!
Woah what a great video! And how you are helping people on the comments kind of has me amazed. Thank you for your work!
Haha thank you, I really appreciate that!
Fantastic video for those learning about the topic. Thank you!
10:19 What a weird formula for NAG! It's much easier to remember a formulation where you always take the antigradient: you *add* the velocity and take the gradient with a *minus* sign. The formula just changes to V_{t+1} = b V_t - a grad(W_t + b V_t), W_{t+1} = W_t + V_{t+1}. It's more intuitive and more similar to standard GD. Why would anyone want to change these signs? How often do you subtract velocity to update a position? Do you want to *add* the gradient to update V right after explaining that we generally subtract the gradient to minimize the loss function? It makes everything twice as hard and just... wtf...
Hi! Thanks for bringing this up! I've seen the equation written in both forms, but probably should've elected for the one suggested by you! This is what I was referring to for the equation: www.arxiv.org/abs/1609.04747
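For what it's worth, the commenter's preferred formulation drops straight into code: look ahead along the velocity, take the gradient there, and always subtract it. A toy sketch (a simple quadratic objective assumed for illustration; not from the video):

```python
def nag_step(w, v, grad_fn, lr=0.1, beta=0.9):
    # Look ahead to where the velocity would carry us, take the gradient
    # there, and SUBTRACT it (antigradient), per the comment above
    v = beta * v - lr * grad_fn(w + beta * v)
    return w + v, v

# Toy problem: minimize f(w) = w^2, whose gradient is 2w
w, v = 5.0, 0.0
for _ in range(200):
    w, v = nag_step(w, v, lambda x: 2 * x)
```

Both sign conventions produce identical iterates; this form just keeps "subtract the gradient, add the velocity" consistent with vanilla gradient descent.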
Great, clear, and thorough content. I look forward to seeing more! 🤓
Awesome, thank you!
I echo other comments; this is such a great video and you can see the effort put in, and you present your knowledge really well. Keep it up :)
Wow, thank you!
Thanks for the great explanations! The graphics and benchmark were particularly useful.
I'm really glad to hear that!
I tried using momentum for a 3SAT optimizer I worked on in 2010. It doesn't help with 3SAT since all variables are binary. It's cool that it works with NNs though!
Oh wow that's an interesting experiment to run! Glad you decided to try it out
I'm new to coding in general. How do I make a Python file like you did?
Hi! I would recommend setting up an IDE such as VSCode, along with a local Python environment with Anaconda. Then you'll need to install and set up PyTorch; the instructions are in the repo. There are plenty of resources online about how to get started with programming, so I would treat Google as your best friend!
Check out the fracm optimizer (Chen et al., "An Adaptive Learning Rate Deep Learning Optimizer Using Long and Short-Term Gradients Based on G-L Fractional-Order Derivative").
I wonder if you can combine NAG and fractional gradients to make a generally even better optimizer.
Very clearly explained, thanks!
Glad you liked it
is the code available?
Unfortunately, the code for the animations isn't ready for the public, haha. It's wayyy too messy. I also didn't include the code for the optimizers because the equations are straightforward to implement; how you use the gradients to update the weights depends greatly on how the rest of the code is structured.
What is the total Cost of this Setup?
Hi! The total cost was about 2.8k although some parts I probably should’ve gone cheaper on like the motherboard. I have a full list of the parts in the description
@@sourishk07 Thank you! I did not notice the sheet in the description. Very Helpful!
This video is super helpful my god thank you
I’m really glad you think so! Thanks
I might just wanna attend GTC next year, nice video!
Love to hear it! Hopefully I’ll see you there
The implementation has more information than a whole semester of my MSc in AI.
Haha love to see that. Thanks for watching!
Sorry, did I misunderstand something, or did you say SGD when it was only GD you talked about? When were stochastic elements discussed?
I guess technically I didn’t talk about how the dataset was batched when performing GD, so no stochastic elements were touched upon. However, I just used SGD as a general term to talk about vanilla gradient descent, like how PyTorch and Tensorflow’s APIs are structured.
@@sourishk07 I see! It would be interesting to see if/how the stochastic element helps with the landscape l(x, y) = x^2 + a|y| or whatever that example was :)
If you're interested, consider playing around with batch & mini batch gradient descent! There's been a lot of research on how batch size affects convergence so it might be a fun experiment to try out.
Am I the only one who doesn't understand the RMSProp math formula? What is the gradient squared, is it per component or is it the Hessian? How do you divide a vector by another vector? Could someone explain, please?
Hi! Sorry, this is something I should've definitely clarified in the video! I've gotten a couple other comments about this as well. Everything in the formula is component-wise. You square each element in the gradient matrix individually & you perform component-wise division, along with the component-wise square root. Again, I really apologize for the confusion! I'll make sure to make these things clearer next time.
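The component-wise reading described in this reply is easy to make concrete with NumPy, where `**`, `/`, and `np.sqrt` already operate element-wise. An illustrative sketch with toy hyperparameters assumed, not the video's code:

```python
import numpy as np

def rmsprop_step(w, grad, s, lr=0.01, rho=0.9, eps=1e-8):
    # Everything is element-wise: square each gradient component...
    s = rho * s + (1 - rho) * grad ** 2
    # ...then divide component-by-component (no Hessian involved)
    w = w - lr * grad / (np.sqrt(s) + eps)
    return w, s

# Toy problem: minimize f(w) = sum(w_i^2) in 2-D; the gradient is 2w
w = np.array([3.0, -4.0])
s = np.zeros_like(w)
for _ in range(3000):
    w, s = rmsprop_step(w, 2 * w, s)
```

Because each parameter is normalized by its own running gradient scale, both components converge at roughly the same rate despite different starting magnitudes.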
what a title 😂
Appreciate the visit!
Very Clear Explanation! Thank you. I especially appreciate the fact that you included the equations.
Thank you! And I’m glad you enjoyed it
I wonder if we could use the same training loop NVIDIA used in the DrEureka paper to find even better optimizers.
Hi! Using reinforcement learning in the realm of optimizers is a fascinating concept and there's already research being done on it! Here are a couple cool papers that might be worth your time: 1. Learning to Learn by Gradient Descent by Gradient Descent (2016, Andrychowicz et al.) 2. Learning to Optimize (2017, Li and Malik) It would be fascinating to see GPT-4 help write more efficient optimizers though. LLMs helping accelerate the training process for other AI models seems like the gateway into AGI
@@sourishk07 Thanks for the answer!
The intuition behind why the methods help with convergence is a bit misleading imo. The problem is not in general with slow convergence close to optimum point because of a small gradient, that can easily be fixed with letting step size depend on gradient size. The problem that it solves is when the iterations zig-zag because of large components in some directions and small components in the direction you actually want to move. By averaging (or similar use of past gradients) you effectively cancel out the components causing the zig-zag.
Hello! Thanks for the comment. Optimizers like RMSProp and Adam do make the step size dependent on gradient size, which I showcase in the video, so while there are other techniques to deal with slow convergence near the optimum due to small gradients, these optimizers still help. Maybe I could've made this part clearer though. Also, from my understanding, learning rate decay is a pretty popular technique, so wouldn't that just slow down convergence even more as the learning rate decays and the loss approaches the region with smaller gradients? However, I definitely agree with your bigger point about these optimizers preventing the loss from zig-zagging! In my RMSProp example, I do show how the loss is able to take a more direct route from the starting point to the minimum. Maybe I could've showcased a bigger example where SGD zig-zags more prominently to further illustrate the benefit that RMSProp and Adam bring to the table. I really appreciate you taking the time to give me feedback.
@@sourishk07 Yeah, I absolutely think the animations give good insight into the different strategies within "moment"-based optimizers. My point was more that even with "vanilla" gradient descent methods, the step sizes can be handled to not vanish as the gradient gets smaller, and that real benefit of the other methods is for altering the _direction_ of descent to deal with situations where eigenvalues of the (locally approximate) quatratic form differs in orders of magnitude. But I must also admit that (especially in the field of machine learning) the name SGD seem to be more or less _defined_ to include a fixed decay rate of step sizes, rather than just the method of finding a step direction (where finding step sizes would be a separate (sub-)problem), so your interpretation is probably more accurate than mine. Anyway, thanks for replying and I hope you continue making videos on the topic!
Thanks for sharing your insights! I'm glad you enjoyed the video. Maybe I could make a video that dives deeper into step sizes or learning rate decay and the role that they play on convergence!
Absolutely loved the graphics and intensive paper based proof of working of different optimizers , all in the same video. You just earned a loyal viewer.
Thank you so much! I'm honored to hear that!
The “problem” the Adam algorithm is presented as solving here (the one with local and global minima) is simply wrong. In small numbers of dimensions this is in fact a problem, but the condition for the existence of a local minimum grows more and more restrictive with the number of dimensions. So in practice, when you have millions of parameters and therefore dimensions, local minima that aren't the global minimum will simply not exist; the probability of such existence is unfathomably small.
Hi! This is a fascinating point you bring up. I did say at the beginning that the scope of optimizers wasn't just limited to neural networks in high dimensions, but could also be applicable in lower dimensions. However, I probably should've added a section about saddle points to make this part of the video more thorough, so I really appreciate the feedback!
This Server is a dream 😄
Haha stay tuned for a more upgraded one soon!
I used to have networks where the loss was fluctuating in a very periodic manner every 30 or so steps and I never knew why that happened. Now it makes sense! It just takes a number of steps for the direction of Adam weight updates to change. I really should have looked this up earlier.
Hmm while this might be Adam's fault, I would encourage you to see if you can replicate the issue with SGD w/ Momentum or see if another optimizer without momentum solves it. I believe there are a wide array of reasons as to why this periodic behavior might emerge.
why not using a metaheuristic approach?
Hi! There seems to be many interesting papers about using metaheuristic approaches with machine learning, but I haven't seen too many applications of them in industry. However, this is a topic I haven't looked too deeply into! I simply wanted to discuss the strategies that are commonly used by modern day deep learning and maybe I'll make another video about metaheuristic approaches! Thanks for the idea!
Great video dude!
Thanks so much! I've seen your videos before! I really liked your videos about Policy Gradients methods & Importance Sampling!!!
@@sourishk07 thanks! There was some hard work behind them, so I'm happy to hear they're appreciated. But I don't need to tell you that. This video is a masterpiece!
I really appreciate that coming from you!!
Gemini 1.5 Pro: This video is about optimizers in machine learning. Optimizers are algorithms used to adjust the weights of a machine learning model during training. The goal is to find the optimal set of weights that will minimize the loss function. The video discusses four different optimizers: Stochastic Gradient Descent (SGD), SGD with Momentum, RMSprop, and Adam.
* Stochastic Gradient Descent (SGD) is the simplest optimizer. It takes a step in the direction of the negative gradient of the loss function. The size of the step is determined by the learning rate.
* SGD with Momentum is a variant of SGD that takes into account the history of the gradients. This can help the optimizer converge more quickly.
* RMSprop is another variant of SGD that adapts the learning rate for each parameter of the model. This can help prevent the optimizer from getting stuck in local minima.
* Adam is an optimizer that combines the ideas of momentum and adaptive learning rates. It is often considered a very effective optimizer.
The video also discusses the fact that different optimizers can be better suited to different tasks. For example, Adam is often a good choice for training deep neural networks.
Here are some of the key points from the video:
* Optimizers are algorithms used to adjust the weights of a machine learning model during training.
* The goal of an optimizer is to find the optimal set of weights that will minimize the loss function.
* There are many different optimizers available, each with its own strengths and weaknesses.
* The choice of optimizer can have a significant impact on the performance of a machine learning model.
Thank you Gemini for watching, although I'm not sure you learned anything from this lol
Very nicely explained. Wish you brought up the relationship between these optimizers and numerical procedures though. Like how vanilla gradient descent is just Euler's method applied to a gradient rather than one derivative.
Thank you so much. And there were so many topics I wanted to cram into this video but couldn't in the interest of time. That is a very interesting topic to cover and I'll add it to my list! Hopefully we can visit it soon :) I appreciate the idea
7hrs of work per day, that's pretty sweet. Wondering how many hours your Chinese colleagues do...
It really depends on the team! My work life balance is pretty good, but some nights I do have to work after I get back home!
Gem of a channel!
Thank you so much!
What a good video! I watched it and bookmarked it so I can come back to it when I understand more about the topic.
Glad it was helpful! What concepts do you feel like you don’t understand yet?
I don't know what I did for YouTube to randomly bless me with this gem of a channel, but keep your work up, man. I love your content; it's nice to see people with similar passions.
I’m really glad to hear that! Thanks for those kind words
This is so cool, I'm definitely gonna try this when I get my hands on some extra hardware. Amazing video. I can also imagine this must be pretty awesome if you're some sort of scientist/student at a university who needs a number-crunching machine, since you're not limited to being at your place or some PC lab.
Yes, I think it’s a fun project for everyone to try out! I learned a lot about hardware and the different software involved.
Just found out your channel. Instant follow 🙏🏼 Hope we can see more Computer Science content like this. Thank you ;)
Thank you so much for watching! Don't worry, I have many more videos like this planned! Stay tuned :)
I need help. I tried using the code and the trials are being saved somewhere, but I can't find them. Can you tell me where they're getting stored? Edit: I found it. It was stored in the C:\Users\(UserName)\AppData\Local\Temp folder.
If you're simply running main.py, then the checkpoints should be saved in the same directory as main.py under a folder titled 'output.' Let me know if that's what you were looking for!
@@sourishk07 what do I do if I can't find the output folder?
@@simsimhaningan Are there any errors while running main.py? My guess is you're not in the same folder as main.py when you run it. Make sure you're in the root directory of the repository when you run main.py!
Why do you move your head so much
LMAO idk man...
I remembered when my teacher gave me assignment on optimizers I have gone through blogs, papers and videos but everywhere I see different formulas I was so confused but you explained everything at one place very easily.
I'm really glad I was able to help!
love that title haha
Haha thank you!