0:02:35 Kolmogorov-Arnold Representation Theorem KART
~The only true multivariate function is the sum.
0:03:45 Details of (two-layer) KART: 1D edge functions and node sums
0:05:05 KAN Kolmogorov-Arnold Network (orig 2-layer)
0:05:55 Multi-layer KAN
0:07:55 MLP and KAN comparison
0:09:45 B-splines basics
0:14:30 B-spline Cox-de Boor recursion formula (inefficient)
0:14:45 Implementation tricks: residual activations, initialization, grid update
0:38:05 Q: Expressivity vs generalization, bias-variance tradeoff, U-shape loss as fn of p (number of features)
0:39:15 Q: What if activation is out of range of the finite spline domain? -> Use the residual activation fn!
0:40:40 KANs to solve physics problems from raw data or already partially processed data?
0:43:15 KANs to solve PDEs?
0:44:35 Grid resolution finetuning is done manually
0:47:20 Can you replicate KANs by MLPs with the right breadth and depth? Yes. Would be nice to see a unified theory.
0:51:18 What's the novelty of KANs? At the technical level what makes a KAN a KAN?
0:58:16 Inductive bias: whether KAN's or DNN's inductive biases better fit a task: vision, language, science
0:59:25 History of connectionism vs symbolism
1957 - Frank Rosenblatt, Invention of perceptron
1969 - Marvin Minsky & Seymour Papert, Perceptrons: An Introduction to Computational Geometry: "Perceptrons cannot do XOR"
1974 - Paul Werbos, "A multi-layer perceptron can do XOR"
1975 - Robert Hecht-Nielsen, Kolmogorov networks (2 layer, width 2n+1)
1988 - George Cybenko, "2 Layer Kolmogorov networks can do XOR"
1989 - Tomaso Poggio, "KA is irrelevant for neural networks"
2012 - year of modern deep learning
Expert systems/symbolic regression vs KANs vs MLP/Kolmogorov networks
1:04:20 KAN vs MLP philosophy: High internal degrees of freedom, reductionism, parts are important vs low internal degrees of freedom, holism, interaction of parts is important
1:08:45 Intricacies of developing something new: KANs beyond 2 layers
1:11:45 Github repos
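The Cox-de Boor recursion mentioned at 0:14:30 can be sketched in a few lines of plain Python. This is a naive illustrative implementation (not the code from the talk or the pykan repo), and its recursive form is exponential in the degree, which is exactly why it is labeled inefficient above:

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis
    function on knot vector t, evaluated at x. Naive and exponential in k,
    which is why the talk calls this form inefficient."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) \
            * bspline_basis(i + 1, k - 1, t, x)
    return left + right

# Sanity check: on a uniform knot grid the quadratic (degree-2) bases
# form a partition of unity on the interior interval [t[2], t[5]].
knots = [float(j) for j in range(-2, 6)]  # 8 uniform knots -> 5 quadratic bases
total = sum(bspline_basis(i, 2, knots, 1.5) for i in range(5))
```

Efficient implementations instead evaluate all k+1 nonzero bases at x in one pass rather than recursing per basis function.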
Wow, this is blowing up. Most of the journal club videos get hundreds of views. This already has thousands! I look forward to watching the talk and reading the paper.
Gemini 1.5 Pro: This video is about Kolmogorov-Arnold Networks (KANs) presented by Ziming Liu, a PhD student at MIT. KANs are a new type of neural network architecture inspired by the Kolmogorov-Arnold representation theorem. This theorem states that any continuous multivariate function can be represented as a finite sum of compositions of continuous single-variable functions.
The video talks about the following aspects of KANs:
* Motivation: Why KANs were developed and what problems they address (0:00-2:22)
* Mathematical foundations: Explanation of the Kolmogorov-Arnold representation theorem (2:22-7:44)
* Visualization of KANs: How KANs are visualized as networks (7:44-12:12)
* Training KANs: How to train a KAN to approximate a function (12:12-15:37)
* Comparison with MLPs: How KANs compare to traditional Multi-Layer Perceptrons (MLPs) (15:37-20:22)
* Applications of KANs: Examples of using KANs for symbolic and special function approximation (20:22-29:31)
* Interpretability of KANs: How KANs can be interpreted to reveal the underlying structure of the function they approximate (29:31-41:26)
* Discovery with KANs: How KANs can be used to discover new relationships between variables (41:26-47:22)
* Case study: Recovering scientific results with KANs (47:22-58:12)
* Open questions and future directions: Discussion on limitations and future research areas for KANs (58:12-1:00:00)
In conclusion, KANs are a promising new direction in neural network research that leverages the Kolmogorov-Arnold representation theorem to achieve interpretable function approximation. They have the potential to be particularly useful in scientific applications where understanding the relationships between variables is important.
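For reference, the two-layer form of the theorem that the summary refers to can be written as:

$$
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
$$

where each inner function $\phi_{q,p}$ and each outer function $\Phi_q$ is a continuous function of a single variable, so the only genuinely multivariate operation is addition. KANs generalize this by stacking more than two such layers and parameterizing the 1D functions with learnable B-splines.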
thank you for uploading this :)
So based on my understanding, this model's usability hinges on the assumption that one has a "perfect" mapping whereby no information is lost when applying Kolmogorov's theorem to recover the 1D edge functions? Because that in itself can be extremely difficult, even as a near approximation.
I suggest using piecewise functions instead of splines, which shows some similarities with FEM and may be easier to train.
You simply make activation functions for each connection and protect the activation output between layers; it would probably work. Except I don't know how to protect the activations between layers in a graceful way.
Softmax helps self-attention protect *that*.
BN seems not to be used anymore, but it actually protects *that* too.
None of them is graceful, since they all distort the forward path in some way.
Or, if we don't use the B-spline, we could still use a sigmoid (with MIG) to do a similar job.
Edit: the sigmoid way doesn't provide any known interpretability. It's only the black-box way.
I didn't expect to see a mention of Jones polynomials... last time I talked about those was... well, in the 80s.
Since you haven't given up on KAN, you can apply a normalization function to the whole dataset. E.g., x = y^2 may be out of bounds for large values of x; you can simply represent the section of the B-spline where the curve of the derivative would explode with a compact representation, while keeping the curve x = y^2, and on the side would be its multipliers. E.g., you can represent a billion with "b" as on a calculator, which also saves space. Its multipliers would show the difference between x = y^2 and nx = y^2... I don't know if I understood it right; if I did, best of luck with your PhD.
So in principle, clearly you could simply take the functions KAN is built upon to be NNs.
Furthermore, you could take a KAN of KANs, which strikes me as a second way to "go deep" on KANs.
It also feels a little bit to me like the connections between objects, functions, functionals, natural transformations... - i.e. you'd essentially be able to encode category theoretical notions in KANs. - Is that a reasonable comparison to make?
If so, I wonder if you could take your base objects to be, say, the primitives of your favourite proof assistant, plus arbitrarily deep, arbitrarily nested KANs, to efficiently find arbitrary functions that represent well whatever relationships you'd throw at them.
It's probably not at all easy to do, but that'd seem to me to be the most powerful version.
I like the 1D plots showing the integration. Great for PDEs.
So n can be represented as a function itself, instead of going to infinity.
Yes we KAN? I swear I've already heard this somewhere.
Can KAN be extended to a math transformer?
Can we use this for time series, to forecast future values?
Thank you very much for your video. I still have a question: is the KAN network suitable for multiple outputs?
Interesting.......Thanks.
What about vector functions?
Do you also have an example to solve ODE using KAN?
Thanks guys
exactly 10% of your subs liked
Thanks
E.g., π is present in the circle formula, so KAN is good for producing formulas.
The question at 1:18:43 killed me 🤣
ruclips.net/video/5p4JEXweboE/видео.html
Guy took 5mins to ramble wtf
Are we talking here about a general representation theory? Are b-splines the only basis set that can be used? What about wavelets, Fourier series, etc.?
People are now experimenting with other curves. Radial basis functions seem to be a low-cost drop-in for splines. But people are also using Fourier or wavelet bases, for example, which are not splines at all.
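As a concrete sketch of the RBF drop-in idea (toy `centers`, `weights`, and `width` chosen here for illustration, not any particular repo's API):

```python
import numpy as np

def rbf_edge(x, centers, weights, width):
    """Gaussian radial-basis edge function: a learnable 1D activation
    phi(x) = sum_j w_j * exp(-((x - c_j) / width)^2), a common drop-in
    replacement for the B-spline basis on a KAN edge."""
    x = np.asarray(x, dtype=float)
    # shape (..., n_centers): one Gaussian bump per grid point
    basis = np.exp(-(((x[..., None] - centers) / width) ** 2))
    return basis @ weights

centers = np.linspace(-2.0, 2.0, 9)   # hypothetical uniform grid
weights = np.tanh(centers)            # toy weights: roughly fit tanh
y = rbf_edge(np.array([0.0, 1.0]), centers, weights, width=0.5)
```

The appeal is that each basis element is a single smooth `exp`, with no knot bookkeeping or recursion, while still giving local control like a spline grid.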
Kolmogorov-Arnold networks tutorial in Spanish: ruclips.net/video/Jb9wMCPUlnc/видео.html
God, I wish this entire "AI boom" had happened when I was in college almost 20 years ago. I would have been able to publish so many papers. Now it's a sigmoid, boom, paper; now it's an exponent, paper; now a spline, paper. What's next, a directed graph, paper, a fully connected graph, paper? When exactly did the level of research papers start to look like my freshman-year homework?
Piss off ghost
Why so salty 😅?
Bro's jealous
The research meta nowadays is who imagines it first, implements it, and puts it on arXiv first.
You can still publish
So circle × circle = donut, but to define direction you need trigonometry... e.g. circle × sin 2 or something, or sin(circle), or circle·sin(x) = donut. Invite Homer, please... 1 hole, then train it to find the holes of knots...
An LLM to design a physical language representation... a sphere representing nothing, then twisted and stretched to represent some memories... a cluster of neurons might represent memory but still be capable of processing... since audio and videos are the same zeros and ones.
Legend!
Can we make a cnn with kan layer
Yes you can... Change the grid to a subset of pixels in the window
Hannes, if you say thank you after a speaker has answered your question, you let them know that you're done. Just saying "yup" is kinda rude.
👏😎
Tegmark attention-whoring again and giving a bad name to physicists. This is a completely worthless paper. Learning activation functions isn't a new idea; it's just unnecessary.
You should give reasons for this comment
A statement with no arguments is unscientific.