Thank you so much for the video. I am a chemical engineer who just started learning about Bayesian Optimization as a potential strategy to optimize the reactive system I am currently working on. You nicely summed up the basics. I also appreciate the visual representation of the kappa effect on the acquisition function and the selection of the next sampling point. Waiting for more such informative videos.
that slack notification sound at 4:30 got me checking my slack 👀
🤣🤣🤣🤣🤣🤣
Thank you! I am a synthetic chemist and I am trying to learn about Bayesian optimisation for predicting optimal reaction conditions. I would love to learn more about acquisition functions and how to transform variables like temperature, solvents, and reactants into a mathematical model.
Is this about Knots and Splines?
Excellent explanation with visual intuition. One thing that was not clear to me is what differentiates minimization and maximization problems. For example, say my f_objective returns the metric R2 (to be maximized): how do I configure the search for this? And if I change the metric to mean squared error (MSE, to be minimized), what changes in the optimization?
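In case it helps: a minimal sketch of the usual convention (my own, not from the video; the scoring function below is a hypothetical placeholder). Most Bayesian optimization libraries only minimize, so a "maximize R2" problem is typically handled by negating the metric, while MSE is passed through unchanged.

```python
def train_and_score(params):
    """Hypothetical stand-in for fitting a model and returning (r2, mse)."""
    x = params["x"]
    mse = (x - 2.0) ** 2      # toy error surface with its optimum at x = 2
    r2 = 1.0 - mse            # toy score where higher is better
    return r2, mse

def objective_r2(params):
    r2, _ = train_and_score(params)
    return -r2                # minimizing -R2 is the same as maximizing R2

def objective_mse(params):
    _, mse = train_and_score(params)
    return mse                # MSE is already lower-is-better, pass it through

# Both objectives share the same optimum; only the sign convention changes.
print(objective_r2({"x": 2.0}), objective_mse({"x": 2.0}))
```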
Very nice! I really would like to see a video explaining the Tree Parzen Estimator
Hi, it's not very clear to me. So we start with a subset of the original dataset and keep adding new points to better model the function. This is done using a method somewhat similar to gradient descent, which tells us which points from the original dataset should be added next to continue evaluating the function, and kappa is similar to the learning rate in GD. Does that summarize it?
With gradient descent, we use gradient information; here we never use gradients. Instead, we model the unknown black-box function as a Gaussian process: give it an x and it gives back a mean and a standard deviation for the output y. That is why the standard deviation is zero at the points that have actually been sampled. Kappa is indeed a hyperparameter, similar to a learning rate, but here it is used to decide which point to sample next in the search for the global minimum. If kappa is low, we are in effect assuming high confidence in the modeled function, so we sample points close to the lowest point found so far (exploitation). If kappa is high, we are assuming we don't fully trust the modeled function, so we try points all over the input domain (exploration).
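In code, that explore/exploit trade-off looks roughly like this. A toy sketch (not the library from the video; it uses scikit-learn's GaussianProcessRegressor as a stand-in GP): fit the GP to the sampled points, then pick the next point by minimizing a lower confidence bound mu - kappa * sigma.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                      # toy black-box function we pretend is expensive
    return np.sin(3 * x) + 0.3 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))        # a few initial samples
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X, y)

grid = np.linspace(-2, 2, 400).reshape(-1, 1)
mu, std = gp.predict(grid, return_std=True)   # std is ~0 at the sampled points

for kappa in (0.1, 5.0):
    lcb = mu - kappa * std                    # lower confidence bound (minimization)
    x_next = grid[np.argmin(lcb)]
    print(f"kappa={kappa}: next sample at x={x_next[0]:.3f}")
```

With the small kappa the next sample lands near the current best; with the large kappa it moves toward regions where the GP is still uncertain.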
Can we use Bayesian optimization to find a parameter that minimizes the function? Please make a video on that.
This video is very compact and intuitive, so it is very helpful for me to understand what Bayes Opt is. Thank you for the good explanation. :D
Sorry for asking such a naive question (as a total beginner)...
Why isn't the pure standard deviation (which directly indicates the uncertainty of the prediction throughout the search space) used as the acquisition function?
Really great video! You boil it down to the essentials, and that is very well explained. Just a quick question: when you talk about the function you are minimizing, you are basically encapsulating the neural network model and its weights into a black box, where the only inputs to that function are the hyperparameters and the only output is the result of a loss function, correct? In your opinion, would Bayesian optimization scale to a large number of hyperparameters?
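To make that black-box framing concrete, a toy sketch (my own, not from the video; scikit-learn's MLPRegressor and a synthetic dataset are placeholders): hyperparameters go in, a validation loss comes out, and the trained weights never leave the box.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data standing in for a real problem
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

def objective(hyperparams):
    """Black box: hyperparameters in, validation loss out.
    The network and its weights are trained inside and never exposed."""
    lr, hidden = hyperparams
    model = MLPRegressor(hidden_layer_sizes=(int(hidden),),
                         learning_rate_init=lr,
                         max_iter=300, random_state=0)
    model.fit(X_train, y_train)
    return mean_squared_error(y_val, model.predict(X_val))

# One expensive evaluation, as the optimizer would request it
print(objective((1e-3, 32)))
```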
Thank you :)
Good vid mate, I'd like to watch a video on the different kinds of GPs and when to choose which kind!
@paretos-com is this out now?
Thank you for the easy explanation! great content 🔥
I find it so strange that a GP for regression is often used to merely optimize hyperparameters for a NN. In the model I have designed, the whole NN is a GP for regression, although in an unconventional format.
Awesome work! Has the video about hyperparameter tuning been uploaded?
Thanks. Other videos?
I'd like to see a vid on how to use this optimization method for hyperparameter tuning in a NN
This is great! Very straight to the point and easy to understand. Thank you!
Is there a more basic video? Don’t really understand Gaussian processes.
Really helped me to dig my way into the topic 🤞🏼
Thank you for such a nice video! Very clear explanation and demo.
Thank you for this video, very clear, I needed it to optimize an expensive function!
Optimization King 🔥🔥💯💯
Thank you so much, it was really useful. I need more detailed knowledge about Gaussian processes. In particular, I want to learn how the original function is modeled using Gaussian process concepts. If possible, please explain it in another video.
World-class content in the making
this channel is incredible, thanks
good explanation, thanks.
Nicely explained, subscribed 👍
Thank you for this great video !
thank you
Very clear, thank you
Black Box Problem Solver💯💯💯💯🤝🤝
That's a pretty good explanation for complete beginners. Very helpful, thanks mate.
Well explained!!!!
Thanks a lot!!
Amazing!!!
Didn't get it :(
Why? :( What did you miss?
Bayesian optimization tutorial in Spanish, in case anyone is interested: ruclips.net/video/nNRGOfneMdA/видео.html
What a lousy video. It does NOT tell you how to optimize hyperparameters. Instead, it covers Gaussian regression.
Well explained, thank you. Just in case it doesn't show up in the suggestions, paretos follow-up to this video for hands-on BayesOpt tutorial is here.
paretos - Coding Bayesian Optimization (Bayes Opt) with BOTORCH - Python example for hyperparameter tuning
ruclips.net/video/BQ4kVn-Rt84/видео.html
🔥
You are literally reading a script on the video, bro