Fourier Neural Operator for Parametric Partial Differential Equations (Paper Explained)

  • Published: 25 Aug 2024

Comments • 136

  • @DavenH
    @DavenH 3 years ago +77

    The intro is cracking me up, had to like.

  • @AE-cc1yl
    @AE-cc1yl 3 years ago +48

    Navier-Stonks equations
    📈

  • @RalphDratman
    @RalphDratman 1 year ago +2

    "Linearized ways of describing how a system evolves over one timestep" is BRILLIANT!
    I never heard PDEs described in such a beautiful, comprehensible way,
    Thank you Yannic Kilcher.

  • @dominicisthe1
    @dominicisthe1 3 years ago +13

    Cool to see a paper like this pop up on my YouTube. I did my MSc thesis on the paper's first reference, solving ill-posed inverse problems using iterative
    deep neural networks.

  • @taylanyurtsever
    @taylanyurtsever 3 years ago +27

    Vorticity is the cross product of the nabla operator with the velocity field, i.e. the curl, which can be thought of as the rotational flow in that region (blue clockwise and red counterclockwise); a small finite-difference sketch follows this thread.

    • @judgeomega
      @judgeomega 3 years ago +6

      or more simply: twisting

    • @CharlesVanNoland
      @CharlesVanNoland 2 years ago +2

      AKA "curl" en.wikipedia.org/wiki/Curl_(mathematics)
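
    A minimal finite-difference sketch of that definition (NumPy; the Taylor-Green field below is just a convenient test field, not something from the video):

      import numpy as np

      # Velocity field (u, v) on a periodic [0, 2*pi)^2 grid.
      n = 64
      x = np.linspace(0, 2 * np.pi, n, endpoint=False)
      X, Y = np.meshgrid(x, x, indexing="ij")
      u = np.sin(X) * np.cos(Y)     # Taylor-Green vortex, divergence-free
      v = -np.cos(X) * np.sin(Y)

      # In 2D the curl reduces to a scalar: w = dv/dx - du/dy.
      dx = x[1] - x[0]
      w = np.gradient(v, dx, axis=0) - np.gradient(u, dx, axis=1)
      # w > 0: counterclockwise rotation (red), w < 0: clockwise (blue)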

  • @errorlooo8124
    @errorlooo8124 3 years ago +19

    So basically what they did is kind of like taking a regular neural network layer, adding JPEG compression before it and JPEG decompression after it, then building a network out of those and training it on Navier-Stokes images to predict the next images. The reason I say JPEG is that the heart of JPEG is transforming an image into the frequency domain using a Fourier-like function; the extra processing JPEG does is mostly non-destructive (duh, you want your compressed version to be as close to the original as possible), plus a neural network would probably not be impeded by the extra processing, and their method throws away some of the modes of the Fourier transform too.

    • @errorlooo8124
      @errorlooo8124 3 years ago +3

      @Pedro Abreu Yeah, the DCT is derived from the DFT, which is basically the Fourier transform adapted to sampled data instead of a continuous function. (The DCT is just the real component of the DFT, with a bit of offsetting (it uses n + 1/2) and less rotation (it uses pi instead of 2pi).)
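
      A quick NumPy check of that relationship (the mirror-and-phase-rotate identity below is one standard way to get the DCT-II from a plain FFT, written from memory):

        import numpy as np
        from scipy.fft import dct, fft

        x = np.random.randn(16)
        n = len(x)

        ref = dct(x, type=2)                  # DCT-II straight from scipy

        # Same values from the FFT of the mirrored (even-extended) signal,
        # with the half-sample (n + 1/2) offset applied as a phase rotation:
        y = np.concatenate([x, x[::-1]])
        k = np.arange(n)
        via_fft = (np.exp(-1j * np.pi * k / (2 * n)) * fft(y)[:n]).real

        assert np.allclose(ref, via_fft)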

  • @channuchola1153
    @channuchola1153 3 years ago +3

    Wow, simply awesome. Fourier and PDEs, good to see them together.

  • @shansiddiqui8673
    @shansiddiqui8673 3 years ago +4

    Fourier Neural Operators aren't limited to periodic boundary conditions: the linear transform W works as a bias term which keeps track of non-periodic BCs.
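
    To make that concrete, here is a condensed PyTorch sketch of one Fourier layer with that W path (loosely modeled on the paper's public code; the names and the simplified mode slicing are mine, and the real implementation also keeps the negative-frequency corner of the spectrum):

      import torch
      import torch.nn as nn

      class FourierLayer2d(nn.Module):
          def __init__(self, channels, modes):
              super().__init__()
              self.modes = modes
              # learned complex weights mixing the lowest Fourier modes
              self.R = nn.Parameter(
                  torch.randn(channels, channels, modes, modes,
                              dtype=torch.cfloat) / channels)
              # pointwise linear path: carries non-periodic content past the FFT
              self.W = nn.Conv2d(channels, channels, kernel_size=1)

          def forward(self, x):                 # x: (batch, channels, h, w)
              xf = torch.fft.rfft2(x)           # to frequency space
              out = torch.zeros_like(xf)
              m = self.modes
              out[:, :, :m, :m] = torch.einsum(
                  "bixy,ioxy->boxy", xf[:, :, :m, :m], self.R)
              spectral = torch.fft.irfft2(out, s=x.shape[-2:])
              return torch.relu(spectral + self.W(x))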

  • @user-fl8ql8fe8w
    @user-fl8ql8fe8w 1 year ago +1

    This is an excellently clear description. Thanks for the help.

  • @PatatjesDora
    @PatatjesDora 3 years ago +3

    Going over the code is really nice!

  • @pradyumnareddy5415
    @pradyumnareddy5415 3 years ago +4

    I like it when Yannic throws shade.

  • @kazz811
    @kazz811 3 years ago +5

    Cool video as usual. Quick comment: vorticity is simply the curl of the velocity field and doesn't have much to do with "stickiness". Speaking of which, viscosity (which measures forces between the fluid's molecules) is not actually related to "stickiness" either; that property is governed by surface tension (how the fluid interacts with an external solid surface). You can have highly viscous fluids which don't stick at all.

  • @soudaminipanda
    @soudaminipanda 9 months ago +1

    Fabulous explanation. Crystal clear

  • @DavenH
    @DavenH 3 years ago +10

    I hope this is going to lead to much more thorough climate simulations. Typically these require vast amounts of supercomputer time and are run just once a year or so, but it sounds like just a small amount of cloud compute would run them on this model.
    Managing memory would then be the challenge, though, because I don't know how you could afford to discretize the fluid dynamics of the atmosphere into isolated cells, where each part affects and flows into the others. It's almost like you need to do it all at once.

    • @PaulanerStudios
      @PaulanerStudios 3 years ago +2

      Well, from what I have seen, climate simulations are at the moment also discretized into grids for memory management... at least the ones where I have looked at the code... I guess it's more of a challenge to enforce boundary conditions in this model such that neighbouring cells don't diverge at their shared boundaries... I guess traditional methods for dealing with this would suffice tho... you'd still have to blend the boundaries occasionally, so the timesteps can't be arbitrarily large

    • @DavenH
      @DavenH 3 years ago +1

      @@PaulanerStudios Hmm. Maybe take a page from CNNs and calculate 3x3 grid cells, so you get a centre cell with boundaries intact, then stride 1 cell and do another 3x3 calculation; hopefully the interaction falloff is steep enough to then stitch the centre cells together without discontinuities. Or maybe you need to do 5x5 cells, throwing away all but the centres.
      Another thing: I thought the intra-cell calculations were hand-made heuristics in these climate simulations, not actually Navier-Stokes. Could be wrong, but if not, even eliminating those heuristics and putting in "real" simulations would be a good improvement.

    • @PaulanerStudios
      @PaulanerStudios 3 years ago +2

      @Mustache Merlin The thing with every compute job is the von Neumann bottleneck... running massively parallel compute jobs on CPU or GPU, the limiting factor is always memory bandwidth... since neural networks are in the most basic sense matrix multiplications interspersed with nonlinearities, VRAM is the limiting factor for how large a given multiplication/network and thus network input can be... there is really no sense in streaming anything from a drive, no matter how fast, because performance will tank by orders of magnitude for backprop and such if the network (and computation graph) can't be held in graphics memory at once... If you're arguing the case for regular simulations, well, supercomputers already have terabytes or petabytes of RAM... the issue is swapping the data used for computation in and out of cache and subsequently registers... Optane drives will not solve the memory bottleneck there either... the only thing they can solve is maybe memory price, which really is not a limiting factor in HPC (most of the time)

  • @mansisethi8127
    @mansisethi8127 1 month ago

    Thank you for the paper presentation!!

  • @herp_derpingson
    @herp_derpingson 3 years ago +5

    36:30 I like the idea of throwing away high FFT modes as regularization. I wish more papers did that.
    37:35 IDK if throwing out the little jiggles is a good idea, because Navier-Stokes is a chaotic system and those little jiggles were possibly contributing chaotically. However, perhaps the residual connection corrects for that.
    46:10 XD
    I wish the authors had ablated the point-to-point convolution and shown how much it helps, and the same for throwing away modes.
    I also wish the authors had shown a graph of error accumulation over time.
    I really liked the code walkthrough. Do it for other papers too if possible.

  • @clima3993
    @clima3993 2 years ago

    Yannic always gives me the illusion that I understand things that I actually don't. Anyway, a good starting point, and thank you so much!

  • @idiosinkrazijske.rutine
    @idiosinkrazijske.rutine 3 years ago +5

    Looks similar to what is done in so-called "spectral methods" for the simulation of fluids. I'm sure this is where they drew their inspiration from.

  • @dawidlaszuk
    @dawidlaszuk 3 years ago +6

    Coming from signal processing and getting my head into the Deep™ world, I'm happy to see Fourier showing up. Great paper and a good start, but I agree with the overhype. For example, throwing away modes is the same as masking with a square function, which in the signal space is like convolving with a sinc function, and that's a highly "ripply" func (a tiny demo follows this thread). Navier-Stokes in general is chaotic, and small perturbations will change the output significantly over time. I'm guessing they don't see/show these effects because of their data composition. But it is a good start and maybe an idea for others: for example, replace the Fourier kernel with Laplace and use proper filtering techniques.

    • @DavenH
      @DavenH 3 years ago +1

      Hey Dawid, you produce any YT content? I'm also from DSP and doing Deep learning, curious what you're working on.
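
    The ripple point above, as a tiny NumPy demo (a step signal is simply the worst case for ringing):

      import numpy as np

      n, keep = 256, 12
      t = np.arange(n)
      signal = (t > n // 2).astype(float)       # a sharp jump

      # Hard truncation = rectangular mask in frequency space,
      # i.e. convolution with a sinc in signal space.
      spec = np.fft.fft(signal)
      mask = np.zeros(n)
      mask[:keep] = 1                           # low positive modes (incl. DC)
      mask[-(keep - 1):] = 1                    # matching negative modes
      truncated = np.fft.ifft(spec * mask).real

      print(np.abs(truncated - signal).max())   # Gibbs-style overshoot at the jump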

  • @antman7673
    @antman7673 3 years ago +1

    Vorticity is derived from vortex.
    The triangle pointing down is the nabla operator. It was pointing to the lowest value.

  • @markh.876
    @markh.876 3 years ago +4

    This is going to be lit when it comes to Quantum Chemistry

  • @billykotsos4642
    @billykotsos4642 3 years ago +5

    Damn, the opening title blew my mind.

  • @Mordenor
    @Mordenor 3 years ago +37

    Normal broader impact statement: this may have negative consequences for society and military applications
    This paper: I AM THE MILITARY

  • @kristiantorres1080
    @kristiantorres1080 3 years ago

    Thank you! I was just reading this paper and somewhere around page 5 I started to fall asleep. Your video will help me understand this paper better.

  • @simoncorbeil4081
    @simoncorbeil4081 7 months ago +1

    Great video, however I would like to correct a few facts. If the Navier-Stokes equations need the development of new and efficient methods like neural networks, it's essentially because they are strongly nonlinear, especially at high Reynolds number (low viscosity, as with air and water, the typical fluids we meet daily), where turbulence is triggered. Also, I want to rectify something: the Navier-Stokes system shown in the paper is in the incompressible regime, and the second equation is the divergence of the velocity, which is the mass conservation equation, nothing related to vorticity (it's more the opposite: vorticity would be the cross product of the nabla operator with the velocity field).
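
    For reference, the incompressible system the paper works with (the 2D Navier-Stokes equations in vorticity form on the torus, reconstructed here from the paper's setup):

      \partial_t w(x,t) + u(x,t) \cdot \nabla w(x,t) = \nu \, \Delta w(x,t) + f(x),
      \qquad \nabla \cdot u(x,t) = 0, \qquad w = \nabla \times u,

    so the second equation is indeed mass conservation, and the vorticity w is the curl of the velocity, exactly as this comment says.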

  • @diegoandrade3912
    @diegoandrade3912 1 year ago

    Fabulous, thank you for the explanation and the time you took to create this video. Keep it coming.

  • @raunaquepatra3966
    @raunaquepatra3966 3 years ago +2

    I wish the authors had shown the effects of throwing away modes in some nice graphs 😔.
    Also, show the divergence of this method from the ground truth (using the simulator) when it is used in an RNN fashion (i.e. feeding the final output of this method back to itself to generate time steps, possibly to infinity), and show at what point it starts diverging significantly.

  • @JurekOK
    @JurekOK 3 years ago +16

    So . . . they have taken an expensive function (which is itself already an approximation of an even more expensive function) and trained up an approximation of it.
    Then there is no comparison of the predictions with any experiment (let alone a rigorous one), only with that original "reference" approximated function.
    Is this a big deal? I was doing that during the 2nd year of my undergrad in mechanical engineering, 18 years ago. Come on.
    How about the long-term stability of their predictor? How does it deal with singularities at corners? Moving or deforming objects? What is the divergence rate? Is the damping spectrally correct? My point is that this demo is really unimpressive to a person who actually uses fluid dynamics for product design. It might be visually impressive for the entertainment industry.
    Hyped titles galore.

  • @lucidraisin
    @lucidraisin 3 years ago +3

    Woohoo! New video!

  • @esti445
    @esti445 4 months ago

    8:30 It is the Laplacian operator: the second derivative with respect to space.

  • @tedonk03
    @tedonk03 2 years ago +2

    Thank you for the awesome explanation, really clear and helpful. Can you do one for PINNs (Physics-Informed Neural Networks)?

  • @reinerwilhelms-tricarico344
    @reinerwilhelms-tricarico344 3 years ago +1

    I found this article quite abstract (which may explain why it's interesting ;-). I could sort of get it after first reading an article by the same authors where they explain neural operators for PDEs in general (Neural Operator: Graph Kernel Network for Partial Differential Equations, 2020). There they show that the kernel they learn is analogous to the Green's function of the PDE (a sketch of that analogy follows this thread).

    • @kristiantorres1080
      @kristiantorres1080 3 years ago

      It is abstract and there are some things that I don't understand. Is this the paper you are referring to? arxiv.org/abs/2003.03485

    • @reinerwilhelms-tricarico344
      @reinerwilhelms-tricarico344 3 years ago

      @@kristiantorres1080 Yes. I read that paper and it somehow helped me understand the paper presented here.
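
    The analogy from this thread, written out (my paraphrase of the neural-operator papers): a linear PDE L u = f with suitable boundary conditions has the Green's function solution

      u(x) = \int_D G(x, y) \, f(y) \, dy,

    and the neural operator keeps the same integral form but learns the kernel,

      (\mathcal{K} v)(x) = \int_D \kappa_\theta(x, y) \, v(y) \, dy,

    with the FNO special case \kappa_\theta(x, y) = \kappa_\theta(x - y), which turns the integral into a convolution that the FFT can evaluate.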

  • @lestroarmonico
    @lestroarmonico 3 years ago +2

    6:26 Vorticity is a derivative of viscosity? No, it is not. Viscosity is a property of the fluid; vorticity is ∇×V (the curl of the velocity). Edit: And at 8:18, that is not the vorticity equation; that is the continuity equation, which is about conservation of mass. Very helpful video, as I'm currently studying this very paper myself, but there are a few mistakes you've made that need correction :)

  • @airealguy
    @airealguy 3 years ago +6

    So I think this approach has some flaws and has been hyped too much. The crux of the problem is the use of FFTs, which impose some severe constraints on CFD problems. First, consider complex geometries (i.e. those that are not rectangular). How does one take an FFT of something that is not rectangular? You can map the geometry to a rectangular coordinate system using a spatial transform, but then the learned parameters are specific to that transform and thus to that geometry. Secondly, there are no good ways to do FFTs efficiently at large scales (i.e. scales above the memory space of one processor). Even the best algorithms, such as heFFTe, which can achieve 90% of the theoretical max performance, are quite poor in comparison to the algorithmic performance of standard PDE solvers; heFFTe only achieves an algorithmic performance of 0.05% of peak on Summit. So while this is fast on small-scale problems, it will likely suffer major performance problems at large scales and will be difficult if not impossible to apply to complex non-rectangular geometries. The neural operator concept is probably a good one, but the basis function makes this difficult to apply to general-purpose problems. We need a basis function that is expanded in perception but not global like an FFT. Even chopping the FFT off can have issues. If you want to compute a N

    • @crypticparadigm2180
      @crypticparadigm2180 3 years ago

      Great points... On the topic of memory consumption and allocation for neural networks: what are your thoughts about Neural Ordinary Differential Equations?

  • @CoughSyrup
    @CoughSyrup 1 year ago

    This is really huge. I see no reason this couldn't be extended to solve the magnetohydrodynamic behavior of plasma, and made to work for the 3D equations, which currently require supercomputers to model. Imagine making it run on a desktop PC.
    That would mean modeling plasma instabilities inside fusion reactors.
    Maybe with fast or real-time modeling, humanity can finally figure out an arrangement of magnets in 3D that keeps a plasma stable and robust to excursions.

  • @Andresc93
    @Andresc93 1 year ago

    Thank you, you just saved me a bunch of time

  • @sujithkumar824
    @sujithkumar824 3 years ago +11

    Download this video to save it personally, because it could be taken down under pressure from the author, for stupid reasons.

    • @herp_derpingson
      @herp_derpingson 3 years ago +2

      Why?

    • @judgeomega
      @judgeomega 3 years ago +2

      @@herp_derpingson I think the author can neither confirm nor deny any reasoning for a takedown

    • @sujithkumar824
      @sujithkumar824 3 years ago +1

      @@judgeomega Yes, I'm glad Yannic didn't even respond publicly to her; this is exactly the treatment every attention seeker should get.

    • @matthewtang1489
      @matthewtang1489 3 years ago +1

      What?? The paper author or an article author? Is there a fiasco about this?

    • @amarilloatacama4997
      @amarilloatacama4997 3 years ago +1

      ??

  • @andyfeng6
    @andyfeng6 3 years ago +2

    The triangle means the Laplace operator

  • @DamianReloaded
    @DamianReloaded 3 years ago +1

    47:00 If they wanted to predict longer sequences, they could use the solver for the first tensor they input and just feed the last 11 steps of the latest prediction back in, right? (See the rollout sketch after this thread.) I wonder after how many steps it would begin to diverge if they used the maximum possible resolution of the data.

    • @YannicKilcher
      @YannicKilcher 3 years ago +1

      True, but as you say, the problems would pile up
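
    A minimal sketch of that rollout loop (PyTorch; `model` stands in for the trained network, and the 10-frame input window follows the paper's 2D time-stepping setup as I recall it):

      import torch

      @torch.no_grad()
      def rollout(model, history, n_future):    # history: (batch, h, w, 10)
          frames = []
          for _ in range(n_future):
              nxt = model(history)              # predict one frame: (batch, h, w, 1)
              frames.append(nxt)
              # slide the window: drop the oldest frame, append the prediction,
              # so prediction errors feed back into later inputs
              history = torch.cat([history[..., 1:], nxt], dim=-1)
          return torch.cat(frames, dim=-1)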

  • @surbhikhetrapal1975
    @surbhikhetrapal1975 22 days ago

    Hi, I found this review of the paper very helpful. I could not locate the code at the link shared in the video description. Does anyone know under what name this code lives in the GitHub neuraloperator repository?

  • @yusunliu4858
    @yusunliu4858 3 years ago +4

    The process Fourier transform -> multiplication -> inverse Fourier transform seems like a low-pass filter. If that is so, why not just apply a low-pass filter to the input A'? Maybe I didn't get the idea correctly.

    • @YannicKilcher
      @YannicKilcher 3 years ago +1

      I think one of the steps is actually explicitly a low pass filter, so you're right

    • @weishkysiliy4420
      @weishkysiliy4420 2 years ago

      @@YannicKilcher Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

    • @YannicKilcher
      @YannicKilcher 2 years ago +2

      @@weishkysiliy4420 the architecture is somewhat agnostic to the resolution, unlike traditional image classifier models

    • @weishkysiliy4420
      @weishkysiliy4420 2 years ago

      ​@@YannicKilcher
      After training at a small size (64*64), can I load the model directly and change the input dimensions to 256*256? Can I understand it this way?

      @weishkysiliy4420 2 years ago
      @weishkysiliy4420 2 года назад

      @@YannicKilcher I really like your song. Nice prelude

  • @mohsensadr2719
    @mohsensadr2719 2 years ago

    Very nice work explaining the paper. I was wondering if you have any comments on the following:
    - Fourier works well if you have equidistant grid points. I think that if the initial data points are random in space (or on an unstructured grid), one has to include more and more terms in the Fourier expansion, given the irregularity of the mesh.
    - FNO has to be coupled with an exact solver, since one has to give the solution of the first several time steps as input.
    - I think it is not possible to train FNO on a small solution domain and then use it on larger ones. Any comments on that?

    • @weishkysiliy4420
      @weishkysiliy4420 2 years ago

      Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

  • @konghong3885
    @konghong3885 3 years ago +3

    Jokes aside, as a physics student I wonder:
    is it possible to apply periodic boundary conditions in the FNO?
    And how do you actually estimate the error of the solver? For MCMC the error can be estimated probabilistically, but not in the ML case.

    • @artyinticus7149
      @artyinticus7149 3 years ago +1

      Highly unlikely

    • @dominicisthe1
      @dominicisthe1 3 years ago

      I think it is the non-periodic boundary conditions you are worried about.

  • @boffo25
    @boffo25 3 years ago +1

    Nice explanation

  • @antman7673
    @antman7673 3 years ago

    So this is kind of like an approximation of the development of the fluid with pixels, instead of the infinite-resolution "vector graphic" provided by the equation.

  • @digambarkilledar003
    @digambarkilledar003 5 months ago

    What are the numbers of input channels and output channels?

  • @beginning_parenting
    @beginning_parenting 3 years ago

    On line 87 of the FNO3D code, it is mentioned that the input is a 5D tensor (batch, x, y, t, in_channels). What do the in_channels represent? Does that mean that each point in (x, y, t) is a vector containing 13 channels?

  • @MaheshKumar-iw4mv
    @MaheshKumar-iw4mv 1 year ago

    Can FNO be used to train on data from reaction-diffusion dynamics with no-flux boundary conditions?

  • @davenovo69
    @davenovo69 3 years ago +1

    Great channel!
    What app do you use to annotate PDFs?

  • @meshoverflow2150
    @meshoverflow2150 3 years ago +1

    Would there be any advantage to doing the convolution in frequency space with a conventional CNN, for, say, image classification? On the surface it seems like it could be faster (given that an FFT is very fast) than regular convolution, but I assume there's a good reason why it isn't common practice.

    • @nx6803
      @nx6803 3 years ago +1

      Octave convolutions are sorta based on the same intuition, yet don't actually use the FFT.

    • @andrewcutler4599
      @andrewcutler4599 3 years ago +1

      Convolution preserves spatial relationships, which makes it useful for images; neighboring pixels are often related to one another. A CNN in FFT world would operate on frequencies, and it's not clear that there is a window where only nearby frequencies should be added together to form feature maps.

    • @meshoverflow2150
      @meshoverflow2150 3 years ago +4

      @@andrewcutler4599 The CNN wouldn't operate on frequencies though. Multiplication in frequency space IS convolution, so a feed-forward network in frequency space should do exactly the same thing as a conventional CNN (a quick numerical check follows this thread). I feel like the feed-forward net should be smaller than the equivalent CNN, hence the question.

    • @DavenH
      @DavenH 3 years ago +1

      @@meshoverflow2150 Interesting observation.
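
    A quick NumPy check of that equivalence (note it is exact for circular convolution with a full-size kernel; a standard CNN's small zero-padded kernels differ at the borders):

      import numpy as np

      n = 128
      x = np.random.randn(n)
      k = np.random.randn(n)                    # kernel zero-padded to input size

      # Circular convolution written out directly...
      direct = np.array([sum(x[m] * k[(i - m) % n] for m in range(n))
                         for i in range(n)])

      # ...equals pointwise multiplication in frequency space.
      via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

      assert np.allclose(direct, via_fft)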

  • @sui-chan.wa.kyou.mo.chiisai
    @sui-chan.wa.kyou.mo.chiisai 3 years ago +10

    8:30 Triangle for the Laplace operator?

  • @JM-ty6uq
    @JM-ty6uq 3 years ago

    24:40 I suppose it's worth mentioning that you can make a cake with 0.5 eggs or 2 eggs

  • @southfox2012
    @southfox2012 3 years ago +1

    great

  • @weishkysiliy4420
    @weishkysiliy4420 2 years ago

    Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?
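
    A sketch of why that works (my simplification of the paper's spectral layer: the learned weights act only on the lowest Fourier modes, which exist at every grid size, so nothing is tied to the training resolution):

      import torch

      modes = 12
      R = torch.randn(1, 1, modes, modes, dtype=torch.cfloat)   # "learned" weights

      def spectral_apply(x):                    # x: (batch, 1, h, w)
          xf = torch.fft.rfft2(x)
          out = torch.zeros_like(xf)
          out[:, :, :modes, :modes] = torch.einsum(
              "bixy,ioxy->boxy", xf[:, :, :modes, :modes], R)
          return torch.fft.irfft2(out, s=x.shape[-2:])

      # The SAME weights run at both resolutions:
      y_lo = spectral_apply(torch.randn(1, 1, 64, 64))     # training size
      y_hi = spectral_apply(torch.randn(1, 1, 256, 256))   # evaluation size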

  • @sohrabsamimi4353
    @sohrabsamimi4353 3 years ago +1

    Thank you so much for this video! Can you explain how we learn the matrix R at 32:36?

    • @pedromoya9127
      @pedromoya9127 2 years ago

      Typically by backpropagation, updating its weights according to the loss.

    • @weishkysiliy4420
      @weishkysiliy4420 2 years ago

      @@pedromoya9127 Training at a lower resolution, then evaluating directly at a higher resolution: I don't understand how it can do that?

  • @sinitarium
    @sinitarium 6 months ago

    Amazing! This must be how Nvidia DLSS works!?

  • @konghong3885
    @konghong3885 3 years ago +1

    Behold, the new title format for the ML community

  • @perlindholm4129
    @perlindholm4129 3 years ago

    Idea: scale down the ground-truth video, then train one model on a small 4x4 part of a frame and learn the corresponding 16x16 submatrix of the original frame. This way you can train two models, each on a different aspect of the calculation: one that learns the scaled-down time evolution and one that learns the upscaling.

  • @cedricvillani8502
    @cedricvillani8502 2 years ago

    You should update your video

  • @Beingtanaka
    @Beingtanaka 11 months ago

    Here for MC Hammer

  • @jean-pierrecoffe6666
    @jean-pierrecoffe6666 3 years ago +1

    Hahahahaha, excellent intro

  • @acharyavivek51
    @acharyavivek51 2 years ago

    Very scary how AI is progressing.

  • @Neomadra
    @Neomadra 3 years ago +1

    I don't quite get why you said (if I understood you correctly) that the prediction cannot be made arbitrarily far into the future. Couldn't you just use the output of the forward propagation as the new input for the next round of forward propagation? So you apply a chain of forward propagations until you reach the time you want. If memory is a problem, then you can simply clear the memory of the previous outputs.

    • @seamusoblainn4603
      @seamusoblainn4603 3 years ago +1

      Perhaps because the network is making predictions, as opposed to the ground-truth sim, which is using physics. In the latter there is only what its rules generate, while in the former you are feeding predictions forward, which must by necessity diverge, and at a fine degree of granularity it probably does from the beginning.

    • @YannicKilcher
      @YannicKilcher 3 years ago

      it's true, but you regress to the problem you have when running classic simulations

  • @RalphDratman
    @RalphDratman 1 year ago

    All those little bumps could be creating the digital environment in which the upper layers of GPTx are doing their magic.

  • @kesav1985
    @kesav1985 3 years ago +3

    So much fuss about curve-fitting!
    Curve-fitting is not a numerical scheme for solving PDEs. :-)

  • @artyinticus7149
    @artyinticus7149 3 years ago +1

    Imagine using the intro to politicize the paper.

    • @artyinticus7149
      @artyinticus7149 3 years ago +1

      @adam smith Imagine using the military for non-political purposes.