Adjoint State Method for an ODE | Adjoint Sensitivity Analysis

  • Published: 25 Aug 2024

Comments • 91

  • @MachineLearningSimulation
    @MachineLearningSimulation  3 years ago +4

    The derivations are inspired by this paper: www.sciencedirect.com/science/article/pii/S1053811914003097
    Links to go deeper: arxiv.org/abs/1806.07366

  • @frankfang2902
    @frankfang2902 2 years ago +3

    Thank you so much for your video. I was just reading the paper on neural ODE, and your explanation really helps.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +3

      Nice! That's an amazing paper. It also won the best paper award at the conference it was presented at, which it truly deserves. I am going to make a video on their particular view on the adjoint method, because it looks slightly different from our approach here. In this video, we derived the adjoint problem as
      d(lambda)/dt = - del(f)/del(u)^T lambda - del(g)/del(u)^T
      lambda(t=T) = 0
      Here we have an inhomogeneous ODE with a homogeneous (=zero) terminal condition. The ODE is inhomogeneous because of the last term with "g", as there is no lambda multiplied with it, loosely speaking. We needed that form of the adjoint because we defined the loss functional as the integral over the entire time horizon (from 0 to T).
      And that is the big difference to Neural ODEs. Their loss functional only considers the value of u at the very end of the time horizon with t=T. Then (and only then), they can make the ODE homogeneous and consider the loss functional derivative as the inhomogeneous terminal condition. In symbolic notation:
      d(lambda)/dt = - del(f)/del(u)^T lambda
      lambda(t=T) = - del(J)/del(u)^T
      I hope that adds some more detail for anyone interested in the relation to Neural ODEs :)
      And of course, thanks for the comment and the feedback
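For anyone who wants to sanity-check the Neural-ODE-style form numerically, here is a minimal SciPy sketch on a toy problem of my own choosing (not from the video). Note the sign convention: it uses lambda(T) = +dJ/du, i.e. an overall minus relative to the comment above, which does not affect the final gradient.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# Toy problem: du/dt = theta*u, u(0) = u0, terminal loss J = u(T).
# Analytically u(T) = u0*exp(theta*T), so dJ/dtheta = u0*T*exp(theta*T).
theta, u0, T = 0.7, 1.3, 2.0
ts = np.linspace(0.0, T, 2001)

# Forward solve, evaluated on a uniform grid.
fwd = solve_ivp(lambda t, u: theta * u, (0.0, T), [u0],
                t_eval=ts, rtol=1e-10, atol=1e-12)
u = fwd.y[0]

# Homogeneous adjoint ODE with inhomogeneous terminal condition:
# d(lambda)/dt = -theta*lambda,  lambda(T) = dJ/du(T) = 1.
bwd = solve_ivp(lambda t, lam: -theta * lam, (T, 0.0), [1.0],
                t_eval=ts[::-1], rtol=1e-10, atol=1e-12)
lam = bwd.y[0][::-1]  # reorder so lam[i] matches ts[i]

# Parameter gradient: dJ/dtheta = int_0^T lambda * (df/dtheta) dt, df/dtheta = u.
grad = trapezoid(lam * u, ts)
exact = u0 * T * np.exp(theta * T)
print(grad, exact)  # should agree to solver/quadrature accuracy
```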

    • @frankfang2902
      @frankfang2902 2 years ago +3

      @@MachineLearningSimulation Thank you so much for your response! I did the derivation again following your explanation, this time without expanding the loss functional J as an integral over 0 to T, but just as a functional of u(T) and \theta. And in the end, I do arrive at the homogeneous ODE with the inhomogeneous terminal condition! I am really grateful for your elaboration; it really helps me understand why the adjoint method helps the training of the neural network. Please keep up your great work and I will always be your ardent supporter!

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +2

      Well done 👍 It was also an enlightening moment the first time I managed to do the derivation and then their paper suddenly started to make sense 😅
      Thanks for the nice comment :)
      I think it will be extremely helpful for someone in a similar situation.

    • @pauljeha8110
      @pauljeha8110 2 years ago +1

      @@MachineLearningSimulation Hi! Thank you for the super helpful video! I am working on Neural ODEs and I struggle to reconcile what I read in their paper with your derivation. Would you mind sharing the details of your derivation that lands on a homogeneous equation with a terminal condition? :)
      Thank you very much !

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      Hi @@pauljeha8110,
      I can give you some pointers on how you could come up with such a form. Redo the derivations as shown in the video, but with the loss functional defined as J = int( g * delta(t-T) ). In words: the loss functional is not just an integration of the composite loss function over the time horizon, but of its product with the Dirac delta function. The delta function has the property of evaluating g at T when convolved with it (see the property titled "translation" under en.wikipedia.org/wiki/Dirac_delta_function#Properties ). Then, you would still have the delta function multiplied with del g/del u and del g/del theta in the step below the "plug back in" of the handwritten notes raw.githubusercontent.com/Ceyron/machine-learning-and-simulation/main/english/adjoints_sensitivities_automatic_differentiation/adjoint_ode_by_lagrangian.pdf . At this point, you can apply the property of the Dirac delta function and move the quantities del g/del theta and lambda^T * del g/del u * d u/d theta outside of the integral by evaluating them at the terminal point T.
      The expression at this stage would look like:
      d L/d theta = int_0^T [ lambda^T * del f/del theta + (lambda^T del f/del u + (d lambda/d t)^T) d u/d theta ] dt + lambda(0)^T * d u/d theta (0) - lambda(T)^T * d u/d theta (T) + del g/del theta (T) + del g/del u (T) * d u/d theta (T)
      Under a similar argument as in the video, you obtain the adjoint ODE from the bracketed term inside the integral. Now, the last summands outside of the integral are troublesome, as they would need the solution sensitivities du/d theta at the terminal time T, which you do not want to calculate. You can get rid of them by choosing lambda such that it equals del g/del u at the terminal time, because then these two expressions cancel each other. For this, notice the negative sign from the boundary evaluations during the integration by parts.
      Notice also that you get another term for the loss sensitivity evaluation (titled step (3) in the summary).
      Hope that helped :)
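To make the derivation above concrete, here is a small numerical sketch (my own toy problem, not from the handwritten notes) of the video's integral-loss formulation: the inhomogeneous adjoint ODE with zero terminal condition, followed by the gradient quadrature.

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# Toy check: du/dt = theta*u, loss J = int_0^T g dt with g = u^2/2.
theta, u0, T = 0.5, 1.0, 1.5

# Forward solve with dense output so u(t) can be queried inside the adjoint RHS.
fwd = solve_ivp(lambda t, u: theta * u, (0.0, T), [u0],
                dense_output=True, rtol=1e-10, atol=1e-12)

# Inhomogeneous adjoint ODE, homogeneous terminal condition:
# d(lambda)/dt = -del f/del u * lambda - del g/del u = -theta*lambda - u(t),
# lambda(T) = 0.
def adj_rhs(t, lam):
    return -theta * lam - fwd.sol(t)[0]

ts = np.linspace(0.0, T, 2001)
bwd = solve_ivp(adj_rhs, (T, 0.0), [0.0], t_eval=ts[::-1], rtol=1e-10, atol=1e-12)
lam = bwd.y[0][::-1]

# dL/dtheta = int_0^T (lambda * del f/del theta + del g/del theta) dt;
# here del g/del theta = 0 and del f/del theta = u.
u = fwd.sol(ts)[0]
grad = trapezoid(lam * u, ts)

# Closed-form reference obtained by differentiating J(theta) directly:
e = np.exp(2.0 * theta * T)
exact = u0**2 * (2.0 * theta * T * e - e + 1.0) / (4.0 * theta**2)
print(grad, exact)
```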

  • @skewer45
    @skewer45 2 years ago +1

    Thanks for the video - your explanations were crystal-clear! I just had a couple of questions on evaluating the time integral to compute dL/dtheta:
    1. Let's say that both the forward and adjoint ODEs are solved with an adaptive stepper over the horizon [0, T]. If the forward solve computes u at the intermediate times [0, t0, t1, ... tN, T], the adjoint solve isn't guaranteed to sample lambda at the same points. To fix this, I guess we'd need to *force* the adjoint stepper to evaluate lambda at the forward solve's timesteps, {t0, t1, ..., tN} - is that correct?
    2. Building on the previous point that the two solvers may sample the solutions at different points: imagine an ODE where du/dt changes rapidly in the interval 0 < t < T/2, forcing the forward solver to take many small steps in that region. OTOH, say the system's adjoint is very oscillatory in the other interval, T/2 < t < T. Then, if we only evaluate u and lambda at the forward system's timesteps, the computed quadrature would be very inaccurate over T/2 < t < T. Would this significantly affect the computed sensitivities' accuracy, in practice?
    Thanks again!

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      Hey, thanks for the feedback and the great questions 😊
      Probably a point regarding both of your questions: In general, the adjoint ODE will be as well-behaved as, or even better than, the forward/classical ODE. This is because the adjoint problem will always be linear and therefore causes far less trouble than a nonlinear ODE. On top of that, the adjoint ODE has a system matrix that is the Jacobian of the forward/classical problem. In a first-order stability analysis of a nonlinear dynamical system, one looks at the eigenvalues/eigenvectors (=the spectrum) of the Jacobian and thereby determines certain properties of the system, some of which could potentially cause trouble in the ODE integration. Both the forward/classical and the adjoint problem then share these properties. Also check out @Schorsch Unterlugauer's comment where I provide some more details on that.
      Based on that, let's come to your questions 😉:
      1) Practically speaking, this is probably also how I would do this. Most interfaces to adaptive ODE solvers will anyway give you the option to evaluate the time stepping at certain points. You would also do that in the forward/classical solve, since usually you are not interested in all time steps your adaptive solver takes. I like to view the time steps an adaptive solver uses as something "internal to the solution" which a simple user might not be interested in. Hence, you can just use a uniform spacing in time to both evaluate the forward/classical and the adjoint pass and let the adaptive ODE solver do the magic in between for you.
      2) This is a really good question. There is quite some black magic involved in correctly and efficiently applying any kind of adjoint method. For problems with time causality like ODEs and PDEs it becomes even more difficult. I do not have too much experience yet, but I would agree with you that if you under-resolve certain points in the sensitivity integral, your computed gradient might not be as accurate. In some cases, it might then be helpful to use something other than a uniform spacing, but that is probably highly problem-dependent. A different view, though: it might actually not be as problematic to have a gradient which is not 100% accurate. If you are not doing local sensitivity analysis, but rather use the gradient as part of an optimization process, an imperfect gradient could still be sufficient. Since the gradient is used to update the parameters and this update process is iterative over many optimization steps, you could still end up at a quite good local optimum.
      That would be my take on the questions😊. Hope it was helpful. Let me know what you think.
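Point 1) can be sketched as follows; the right-hand sides are made up purely for illustration, the point being the shared `t_eval` grid for both passes.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Both passes are *evaluated* on the same user-chosen uniform grid via t_eval,
# while each adaptive solver still takes its own internal steps in between.
T = 3.0
grid = np.linspace(0.0, T, 301)  # shared evaluation grid

# Hypothetical forward problem:
fwd = solve_ivp(lambda t, u: np.cos(5.0 * t) - u, (0.0, T), [1.0],
                t_eval=grid, rtol=1e-8, atol=1e-10)
# Hypothetical adjoint-like linear problem, integrated backward in time:
bwd = solve_ivp(lambda t, lam: lam - np.sin(t), (T, 0.0), [0.0],
                t_eval=grid[::-1], rtol=1e-8, atol=1e-10)

# fwd.y[0][i] and bwd.y[0][::-1][i] now both refer to time grid[i], so the
# sensitivity quadrature can be formed pointwise.
print(fwd.t.shape, bwd.t.shape)
```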

    • @skewer45
      @skewer45 2 years ago

      @@MachineLearningSimulation Thanks so much for giving such a thoughtful and detailed response! As I dig deeper into how professional CFD codes implement this type of optimization, I'm definitely starting to feel your point, that these adjoint solvers are powered by some deep, arcane magic 😆 I'm personally trying to apply this work for problems in electrostatics and ion beam optics, but even the task of forming the correct (continuous) adjoint equations doesn't seem very straightforward (especially for more complicated cost functionals).
      And yes, it definitely seems more and more common to use some surrogate or statistical estimate of the gradient in the update loop - perhaps the most visible examples are in the RL & ML communities. Thanks again for taking the time to write these answers! I'll keep hammering away at my problem and am looking forward to more of your awesome content :)

    • @lbf_
      @lbf_ 2 years ago +1

      @@MachineLearningSimulation Don't we also need the same intermediate time steps because the Jacobian df/du in the adjoint equation is evaluated at u? For this reason in the Neural ODE paper they also solve backward in time for the original ODE for u (another benefit of this is not having to save the solution at intermediate times).

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      @@lbf_ Yes, certainly this is also one way to do it. :) I don't think there is one definitive approach. The disadvantage of solving the original ODE backward is the additional computational cost, which you would avoid by storing intermediate points in time. You could of course also choose some checkpointing strategy to get the best of both worlds.
      This paper might also be a good pointer: arxiv.org/abs/1812.01892

    • @lbf_
      @lbf_ 2 years ago +1

      @@MachineLearningSimulation Thanks for the quick response and the paper link! very much appreciated :)

  • @shaunkaufmann6091
    @shaunkaufmann6091 28 days ago

    Thanks for the vid! Now I need to figure out what this looks like if theta is a function of t.

  • @schorschunterlugauer3763
    @schorschunterlugauer3763 2 years ago +2

    Really great explanation, by far the best I have seen!!! This is the first time I have commented on a video, but I simply had to thank you.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Thank you :)
      When I first learned about the topic, I was also desperately searching for good but thorough explanations. It is amazing to see that I could deliver that. Thanks again

    • @schorschunterlugauer3763
      @schorschunterlugauer3763 2 years ago

      @@MachineLearningSimulation One question just popped up for me: can you maybe explain to me a little bit more why the original and the adjoint systems share the same stability? In what way does the linear system with the Jacobian of f w.r.t. u link to the nonlinear ODE with f on the RHS? This might seem trivial to you, so sorry for asking such a simple question.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +1

      ​@@schorschunterlugauer3763 Hey, that's a great question! (and actually not a simple one to answer ;) ) I should have probably elaborated on it in the video: I think one has to differentiate between the stability of the ODE system itself and whether an integrator applied to it will be stable, for instance given a certain dt step size.
      I was referring to the former and somehow indirectly to the latter. When analyzing ODE systems, in particular nonlinear ones as generally assumed in the video, one can test for first-order stability by linearizing the system and looking at the eigenvalues/eigenvectors of the Jacobian matrix. This Jacobian matrix reappears (transposed, with a minus sign) as the system matrix in the adjoint ODE (given the same linearization point u). Therefore, both systems might share similar stability properties. Since the Jacobian for the adjoint system is evaluated at the points of the forward u trajectory, this should yield a similar set of eigenvalues/eigenvectors.
      However, it was probably a little bold to say that they "share the same stability". I do not have a mathematical proof. What one can say for sure is that the adjoint system will be "easier to solve"; that is because it is a linear ODE and you do not run into all the troubles of nonlinear dynamics.
      Let me know if that answer was deep enough ;)
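The spectrum part of the argument can be illustrated in a few lines: a matrix and its transpose always share the same eigenvalues (the Jacobian below is made up purely for illustration).

```python
import numpy as np

# The adjoint system matrix is (minus) the transposed Jacobian evaluated along
# the forward trajectory. A matrix and its transpose share the same
# eigenvalues, which underpins the "similar spectrum" argument above.
J = np.array([[0.0, 1.0],
              [-2.0, -0.3]])  # toy Jacobian
ev_fwd = np.sort_complex(np.linalg.eigvals(J))
ev_adj = np.sort_complex(np.linalg.eigvals(J.T))
print(ev_fwd, ev_adj)  # identical spectra
```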

    • @schorschunterlugauer3763
      @schorschunterlugauer3763 2 years ago +1

      @@MachineLearningSimulation Thank you a lot for the detailed explanation. This really makes sense to me :)

  • @gamebm
    @gamebm 2 months ago +1

    This video saved my day. Thank you so much!

  • @fuvapro
    @fuvapro 2 years ago +2

    That is amazing, thank you very much for making these videos!

  • @hardikbhardava9721
    @hardikbhardava9721 6 days ago

    Thanks for the great explanation. However, I am wondering why you didn't add the initial-value condition to the Lagrange function? Only lambda^T times the ODE was added.

  • @daikithereal
    @daikithereal 1 year ago +1

    Hi!! I love the video! Just at 10:15, I think u0 is an initial constant, so d(u0)/d(theta) might be equal to 0.

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +1

      Thanks for the comment and the kind words 😊
      I would say it depends on how you set it up. There could definitely be cases in which the parameters enter both the right-hand side and the initial condition, i.e., they use a "shared parameter vector". In that case, the initial condition is not a full constant. Of course, if that is not the case, then you are right: it would be a constant and that derivative vanishes.
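A tiny worked example of such a shared parameter (my own, not from the video), checked against the closed-form total derivative:

```python
import numpy as np

# Shared parameter vector: du/dt = theta*u with initial condition u(0) = theta,
# so u(T) = theta*exp(theta*T).
theta, T = 0.6, 1.0

# Chain rule: total derivative = RHS contribution + initial-condition contribution.
rhs_part = theta * T * np.exp(theta * T)  # u0 held fixed: u0*T*exp(theta*T) with u0 = theta
ic_part = np.exp(theta * T) * 1.0         # (du(T)/du0) * (du0/dtheta), and du0/dtheta = 1
total = rhs_part + ic_part

exact = np.exp(theta * T) * (1.0 + theta * T)  # d/dtheta of theta*exp(theta*T)
print(total, exact)
```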

    • @daikithereal
      @daikithereal 1 year ago

      @@MachineLearningSimulation That's interesting that initial values aren't always constant...hmmm...and thank you for answering me!😆

  • @crapadopalese
    @crapadopalese 4 months ago +1

    6:00 This next step doesn't look correct to me, e.g., if I choose a specific
    g(u,\theta) = \theta^u,
    I don't think I would get a sum with these two terms. Am I missing something? The right result of differentiating the integrand would have been just
    \theta^u
    Your equation proposes something else

  • @user-kj1ve8mw8l
    @user-kj1ve8mw8l 1 year ago +2

    Hi, nice video and explanation.
    A practical example would also be highly appreciated, to consolidate the lecture.

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +2

      Thanks for the kind comment 😊
      Indeed, a simple python implementation similar to the ones for linear systems is missing. I planned for a video, but got lost in the details. Will definitely have one at some point in the future. Until then, maybe you find this draft script of mine helpful: github.com/Ceyron/machine-learning-and-simulation/blob/main/english/adjoints_sensitivities_automatic_differentiation/adjoint_ode.py

    • @user-kj1ve8mw8l
      @user-kj1ve8mw8l 1 year ago +1

      @@MachineLearningSimulation A question if possible.
      In this derivation it is also assumed that the variable u is dependent on the parameters, theta. Is this correct?

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      @@user-kj1ve8mw8l Yes, that's correct. If you change theta, your primal solution changes :)

    • @user-uc2lk1uf2b
      @user-uc2lk1uf2b 1 year ago

      @@MachineLearningSimulation, I have been watching your videos. I have been trying to solve a least-squares problem subject to an ODE system (minimal model for OGTT) that doesn't have an analytic solution. I have implemented an algorithm to compute the gradient but have not succeeded. Can I contact you to exchange ideas? Thank you a lot for the explanations in your videos.

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +1

      @@user-uc2lk1uf2b There will be a video release on Monday with a different perspective of differentiating over ODE integration for reverse sensitivities. Maybe this can be helpful to the solution of your problem 😊
      Feel free to ask your question as a comment there if you can break it down. Unfortunately, I am unable to provide free personal consultation, simply because of my limited time. I'm sorry.

  • @srinivasd3778
    @srinivasd3778 1 year ago +1

    Amazingly Explained!! Really Helpful!

  • @viniciusviena8496
    @viniciusviena8496 1 year ago +1

    I really like your explanations! Thanks for sharing.

  • @alemorita92
    @alemorita92 2 months ago

    Many thanks for the great video, I was looking for something along these lines. I have two questions.
    The first is regarding how you just chose that the term multiplying the Jacobian vanishes. Where do you get the freedom to impose that without this hurting the optimization procedure? From my recollection on Lagrange multipliers, we usually solve for them once we consider the variation of the Lagrangian to be zero, but we don't get much freedom - in fact, whatever they end up being may even have physical meaning, e.g. being normal forces on mechanical systems which are constrained to be on a surface.
    The second question regards why using Lagrange multipliers should work in the first place. I understand that, if we wanted to find saddle points for the loss then indeed this is the path; but how do we justify using a Lagrange multiplier we found during the minimization process to compute the Jacobian of solution w.r.t to parameters in general?

  • @kidding2640
    @kidding2640 2 years ago +1

    Great video! Helps a lot. Thank you so much!

  • @ccuuttww
    @ccuuttww 3 years ago +1

    I haven't studied ODEs that deeply, but here's a like for your effort.

  • @wilhelmkirchgaessner7879
    @wilhelmkirchgaessner7879 2 years ago +1

    incredibly helpful, thank you very much

  • @hamidrezamoazzami
    @hamidrezamoazzami 2 years ago +1

    Amazing work! Thank you so much.

  • @schorre8313
    @schorre8313 1 year ago +2

    Very nice video. Can you maybe tell me why we can't just use reverse-mode AD? Or is it just the same?

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago +2

      Thanks a lot 😊
      That's a great question. The overall goal of both approaches is similar: to obtain gradient estimates in a runtime that is (nearly) constant in the number of parameters.
      There is indeed a difference between the two. The literature of optimal control usually calls them Discretize-then-Optimize (DtO) and Optimize-then-Discretize (OtD). The latter for the approach shown in the video (that is based on deriving a continuous adjoint ODE) and the former for the approach you suggested, i.e., just call reverse-mode AD through the solver's computational steps.
      The comparison is not trivial because OtD comes in different flavors. Something that was not considered too intensely in the video is the fact that when solving the adjoint ODE, you need access to the primal trajectory, i.e. the u(t) from the forward solve. The classical approach would be to just save it from the forward pass. However, one could also run the original ODE (not just the adjoint ODE) backward in time to always get that information when needed for the vjp evaluations in the adjoint ODE. That is the approach taken in the Neural ODE paper. It is elegant and theoretically has an O(1) memory footprint.
      Additionally, there is the open point of how to evaluate the final integral with the lambda(t) in order to arrive at the parameter gradient. This can either be done after the full adjoint trajectory has been integrated, or "on the fly" while solving it. The latter is again what the Neural ODE paper did. Again, quite smart.
      However, in subsequent analyses it was shown that the elegant approach of the Neural ODEs has problems, especially if the ODE is chaotic.
      To answer your question: to my understanding, the reverse-mode AD approach should converge toward an OtD approach (with the forward trajectory stored) as the temporal discretization is refined. I do not have a proof; this is just my intuition.
      Hope that helped a bit :)
      Some recommended reading: docs.sciml.ai/SciMLSensitivity/stable/
      arxiv.org/abs/1902.10298
      arxiv.org/abs/2005.13420
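For completeness, a hand-rolled Discretize-then-Optimize sketch (my own toy, not any library's API): reverse-mode differentiation through an explicit-Euler discretization, which converges toward the continuous-adjoint result as the step size shrinks.

```python
import math

# DtO sketch: reverse-mode differentiation through an explicit-Euler
# discretization of du/dt = theta*u with terminal loss J = u_N.
theta, u0, T, N = 0.7, 1.0, 1.0, 1000
h = T / N

# Forward pass, storing the whole trajectory (the autodiff "tape"):
us = [u0]
for _ in range(N):
    us.append(us[-1] * (1.0 + theta * h))

# Reverse pass through u_{n+1} = u_n*(1 + theta*h):
#   dJ/du_n    = dJ/du_{n+1} * (1 + theta*h)
#   dJ/dtheta += dJ/du_{n+1} * u_n * h
u_bar, theta_bar = 1.0, 0.0  # seed: dJ/du_N = 1
for n in reversed(range(N)):
    theta_bar += u_bar * us[n] * h
    u_bar *= (1.0 + theta * h)

# As h -> 0 this converges to the continuous (OtD) result u0*T*exp(theta*T):
print(theta_bar, u0 * T * math.exp(theta * T))
```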

  • @toddmorrill5345
    @toddmorrill5345 2 months ago +1

    Can you say why du/d\theta is difficult to compute? I'm happy to look at references or other videos if that's easier! Thanks for the terrific content.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 months ago +1

      First of all, thanks for the kind comment and nice feedback 😊
      I think the easiest argument is that this Jacobian is of shape len(u) by len(theta), for which each column has to be computed separately. This means solving a bunch of additional ODEs. If you have a lot of compute, you could do that in parallel (yet for large theta, let's say greater than 10'000- or 100'000-dimensional parameter spaces, which is reasonable for deep learning, this is infeasible and you have to resort to sequential processing). With the adjoint method, one only solves one additional ODE.
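The "one extra ODE per parameter" point can be sketched with forward sensitivity equations on a toy problem of my own choosing, checked against finite differences:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy problem: du/dt = a*u + b. Differentiating the ODE w.r.t. each parameter
# yields one tangent/sensitivity ODE per parameter, solved alongside the
# original one -- exactly the cost the adjoint method avoids.
a, b, u0, T = 0.4, 0.3, 1.0, 2.0

def rhs(t, y):
    u, s_a, s_b = y
    return [a * u + b,       # original ODE
            a * s_a + u,     # sensitivity ODE for du/da
            a * s_b + 1.0]   # sensitivity ODE for du/db

sol = solve_ivp(rhs, (0.0, T), [u0, 0.0, 0.0], rtol=1e-10, atol=1e-12)
u_T, du_da, du_db = sol.y[:, -1]

# Finite-difference check of du(T)/da:
eps = 1e-6
up = solve_ivp(lambda t, y: [(a + eps) * y[0] + b], (0.0, T), [u0],
               rtol=1e-10, atol=1e-12).y[0, -1]
um = solve_ivp(lambda t, y: [(a - eps) * y[0] + b], (0.0, T), [u0],
               rtol=1e-10, atol=1e-12).y[0, -1]
print(du_da, (up - um) / (2.0 * eps))
```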

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 months ago +1

      There is a similar video to this which approaches it more from the autodiff perspective rather than the optimal control perspective. I think this can also be helpful: ruclips.net/video/u8NL6CwSoRg/видео.html

  • @sameerpurwar4836
    @sameerpurwar4836 2 years ago +1

    This is amazing, thanks.

  • @kehanli9999
    @kehanli9999 2 years ago +1

    Thanks a lot, very helpful

  • @Elias-ce3ze
    @Elias-ce3ze 8 months ago +1

    The videos are amazing. What is the software you have been using to write?

  • @engenglish610
    @engenglish610 2 years ago +2

    🙏🙏🙏🙏 Please, can you present Gamma Mixture Models for us: theory and implementation?
    👍👍👍 Thank you in advance.

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Hey, I will definitely put it on my To-Do list, but it will probably have to wait since there are some other topics/videos I want to do before it. :)

  • @xiaoxianrou142
    @xiaoxianrou142 1 year ago +1

    Thanks a lot for the explanation! It's very clear. I have one question, btw. Here the parameter theta is time-invariant, right? I'm wondering if there is any research area working on time-dependent parameters, where theta(t) is not constant. Looks more like a variational optimization to me.

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Thanks a lot :). Happy to hear. I remembered it to be quite a relief after I uploaded the video. These derivations bugged me for a long time. I also did a follow-up video that puts the adjoint ODE techniques more into the perspective of automatic differentiation, relates them to Neural ODEs and explains the O(1) memory backprop, in case you are interested: ruclips.net/video/u8NL6CwSoRg/видео.html
      Regarding your question: Yes, that also seems to me like it would be a rather niche optimization problem. Typical real-world optimizations (like in ML, operations research, design opt, inverse problems etc.) are almost exclusively finite dimensional. You could encode some form of time-dependency if you used "the first half of the parameters for the first half of your integration horizon and the second half for the second half of the horizon". However, I am unsure if there is any application for it.
      Still, thanks again for your sincere interest :)

  • @matthiashoffmann8814
    @matthiashoffmann8814 1 year ago +1

    Hey, thanks for the great video, really helped me with understanding the topic quickly! One question: How do you deal with the different time grids of the forward and backward solution? When doing the forward solution with an adaptive step-size integrator I do not expect to get the same timepoints as for the backwards solution. Thanks in advance!

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Thanks a lot for the kind comment 😊
      That's an interesting question. There are many options, the easiest I think is to (linearly) interpolate the primal/forward solution when needed in the adjoint/backward pass.
      Oftentimes, even if one has an adaptive integrator, one does not save all states at the time levels the adaptive time stepping heuristic chose, but only at a predefined temporal mesh (often chosen to be uniformly distributed).
      Another strategy (which was popularized by the Neural ODE paper) is to run the primal solution reversely in time, next to the adjoint pass.
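The interpolation option might look like this (the stored trajectory below is a stand-in, purely illustrative):

```python
import numpy as np
from scipy.interpolate import interp1d

# Store the forward solution on its own grid, then query it at whatever times
# the backward (adjoint) solver happens to need.
t_fwd = np.linspace(0.0, 2.0, 11)   # times saved during the forward pass
u_fwd = np.exp(0.5 * t_fwd)         # stand-in for a stored trajectory
u_of_t = interp1d(t_fwd, u_fwd, kind="linear")

# Inside the adjoint RHS one would now call u_of_t(t) for arbitrary t:
val = float(u_of_t(0.73))
print(val)  # close to exp(0.365), up to linear-interpolation error
```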

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      I discussed all the options in this video which is a bit more from the perspective of how to integrate ODE integration into an AD engine. 😊
      It's a long one, but should hopefully answer all the open points of this video.

    • @matthiashoffmann8814
      @matthiashoffmann8814 1 year ago +1

      @@MachineLearningSimulation Thank you a lot for the answer!

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      You're welcome 😊

  • @hfkssadfrew
    @hfkssadfrew 2 years ago +1

    Very nice video. But can you post a video about efficient Jacobian-vector products? For example: even storing df/du, an NxN matrix, can be crazy if N is a million in CFD. People usually don't do that. Can you dig a bit further?

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Thanks a lot for the kind words. :)
      You are absolutely right: in realistic applications with many degrees of freedom, like in CFD, it is infeasible to store full Jacobians. I have something planned in that regard, and I also want to show some coding with Jacobian-vector products in JAX. Stay tuned ;)

  • @EMG-space1999
    @EMG-space1999 2 years ago +2

    Very nice! Would you also have a didactic python example? 😊

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago +3

      Thanks a lot 😊
      I had a video planned on that, but I got a bit distracted by the thesis I'm currently writing. After the thesis is done I will provide more coding examples on the adjoint method (probably starting from April on).
      However, I uploaded my python script that I used in preparation for the video, you can find it here github.com/Ceyron/machine-learning-and-simulation/blob/main/english/adjoints_sensitivities_automatic_differentiation/adjoint_ode.py
      It is not as well documented as some other examples on the channel, but I think you can still find it helpful. 😊
      I will replace it once I have the proper video up.

    • @EMG-space1999
      @EMG-space1999 2 years ago +1

      @@MachineLearningSimulation Many thanks. I'll have a look at the code.

  • @t.w.7065
    @t.w.7065 1 year ago +2

    Is g a given function?

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Yes, it's a given function. For instance, it could be a quadratic loss :)
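As a concrete (assumed, not from the video) example of such a given g, a quadratic tracking loss:

```python
# Quadratic tracking loss as a possible choice for g:
def g(u, u_ref):
    return 0.5 * (u - u_ref) ** 2  # integrand of the loss functional

def dg_du(u, u_ref):
    return u - u_ref               # the del g/del u entering the adjoint ODE

print(g(3.0, 1.0), dg_du(3.0, 1.0))  # 2.0 2.0
```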

    • @t.w.7065
      @t.w.7065 1 year ago

      @@MachineLearningSimulation I see. Thank you! I guess I am a bit confused: when evaluating dJ/d_theta, do we typically perform this analysis for all t in the time domain? I feel that this could be quite different from one time point to another. Or do we just care about the sensitivity at t = T?

  • @huzaifaunjhawala6942
    @huzaifaunjhawala6942 1 year ago

    This is a great video, thank you so much!
    I however would like to understand the difference between the continuous adjoint sensitivity method and the discrete adjoint sensitivity method. I imagine that this boils down to the constraints of the optimization problem, where for the discrete method we would require a constraint for each and every time step. Are there any other differences? Are you planning to make a video on that, and do you have any resources from which I could read and understand?

    • @MachineLearningSimulation
      @MachineLearningSimulation  7 months ago

      Hi,
      thanks for the kind words and the interesting question :). Sorry for the late reply.
      I wanted to make a video about the difference between the continuous and the discrete adjoint method, but it opens up a lot of questions, and I wasn't happy with my approach so far.
      The continuous adjoint method is elegant and never requires us to "open the black box" of how the ODE is integrated, which is necessary for the discrete method (typically done by just calling the autodiff engine on the solution process). The discrete method, on the other hand, computes the correct gradient of the discrete optimization problem.
      There are many things that can cause differences between the two. I can recommend checking out the manuals by the two currently most popular ODE integration libraries in high-level languages : docs.sciml.ai/SciMLSensitivity/stable/manual/differential_equation_sensitivities/ and docs.kidger.site/diffrax/api/adjoints/ . They also have references.
      Once you look into the literature, you easily find yourself in a rabbit hole. Especially the field of Computational Fluid Dynamics has spent decades discussing pros and cons of both continuous and discrete adjoints to the PDEs they are solving (for PDEs it becomes even more challenging because the continuous adjoint requires the correct adjoint boundary conditions).

  • @rollingmean6835
    @rollingmean6835 2 years ago +1

    Why are you not uploading any new videos?

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Hey, I am on vacation at the moment. Sadly, I wasn't able to produce any videos before that. I expect to release new videos beginning of October. :)
      Additionally, I am also preparing an introductory course in Scientific Python that I am going to hold at TU Braunschweig in October. It is planned to record it and then upload it to RUclips later on. If that works out, you can find it next to the introductory videos that are already up: ruclips.net/video/fJtErsjgk2w/видео.html

    • @rollingmean6835
      @rollingmean6835 2 years ago +1

      @@MachineLearningSimulation October 1st week is already over, be a good boy and please release new videos asap. :)

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Haha 😂
      No worries, a new video is already partially recorded. It will be up in the next days. In the weeks thereafter, uploads should then be more frequent.

  • @radwamuhammad8106
    @radwamuhammad8106 2 years ago +1

    Could you help me apply this in MATLAB? Or do you have any material that may help me?

    • @MachineLearningSimulation
      @MachineLearningSimulation  2 years ago

      Hey,
      thanks a lot for commenting :)
      Unfortunately, I do not use MATLAB, so I cannot help you much with an implementation in that language. However, I have a video planned where I implement the adjoint state method in Python. It's unclear when I will be able to do this video, probably not before the end of April this year.
      If you know MATLAB, it should be simple to pick up Julia. The great package DifferentialEquations.jl offers many tutorials on the adjoint method together with automatic differentiation, e.g. here: diffeq.sciml.ai/stable/analysis/sensitivity/
      Apart from that, I can understand that resources on these methods are rather rare on the internet. Sorry about that.

    • @radwamuhammad8106
      @radwamuhammad8106 2 years ago

      @@MachineLearningSimulation thanks a lot for your interest 😄😄