The Physics Of Associative Memory

  • Published: 9 Jul 2024
  • Get 20% off at shortform.com/artem
    In this video we will explore the concept of Hopfield networks - a foundational model of associative memory that underlies many important ideas in neuroscience and machine learning, such as Boltzmann machines and Dense associative memory.
    Socials:
    X/Twitter: x.com/ArtemKRSV
    Patreon: patreon.com/artemkirsanov
    OUTLINE:
    00:00 Introduction
    02:17 Protein folding paradox
    04:23 Energy definition
    08:25 Hopfield network architecture
    14:03 Inference
    18:40 Learning
    22:48 Limitations & Perspective
    24:43 Shortform
    25:54 Outro
    References:
    1) Downing, K.L., 2023. Gradient expectations: structure, origins, and synthesis of predictive neural networks. The MIT Press, Cambridge, Massachusetts.
    2) towardsdatascience.com/hopfie...
    3) ml-jku.github.io/hopfield-lay...
    Credits:
    Protein folding: • protein folding
    🎵 Music licensed from Lickd. The biggest mainstream and stock music platform for content creators
    Viva La Vida by Coldplay, lickd.lnk.to/4aEPvoID License ID: RXj082JWjbA
    Try Lickd FREE for 14 days for unlimited stock music and get 50% off your first mainstream track: app.lickd.co/r/47462149f85b4b...

Comments • 145

  • @ArtemKirsanov
    @ArtemKirsanov  7 дней назад +10

    Join Shortform for awesome book guides and get 5 days of unlimited access! Get 20% off at shortform.com/artem

    • @NicholasWilliams-uk9xu
      @NicholasWilliams-uk9xu 6 дней назад +1

      I have a more streamlined answer to the protein problem. The protein doesn't start folding when it's a complete sequence; it folds as the sequence is being built. This computationally and temporally constrains the degrees of movement, limiting the number of molecular forces at work at any one given time. Meaning that the part of the sequence that has already been constructed is already folded into its low-energy state, and the part that hasn't been built isn't perturbing the current folding stage. The folding process is constrained to occur as sequentially as possible, not in parallel.
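
      A toy sketch of this "fold while you build" intuition (the ±1 residue chain and random pairwise couplings below are assumptions for illustration, not real protein physics): greedily fixing each new element given the already-built prefix touches about 2N configurations, versus 2^N for a search over the finished chain.

      import itertools
      import numpy as np

      rng = np.random.default_rng(0)
      N = 12
      J = rng.normal(size=(N, N))      # toy pairwise couplings between "residues"
      J = (J + J.T) / 2

      def energy(state):
          s = np.array(state)
          return -0.5 * s @ J[:len(s), :len(s)] @ s

      # Search over the finished chain: all 2**N configurations.
      best_global = min(itertools.product([-1, 1], repeat=N), key=energy)

      # "Fold while building": greedily fix each new residue given the already-folded prefix.
      chain = []
      for i in range(N):
          chain.append(min([-1, 1], key=lambda v: energy(chain + [v])))

      print("configurations examined exhaustively:", 2 ** N)
      print("configurations examined sequentially:", 2 * N)
      print("global-search energy:", round(energy(best_global), 3))
      print("sequential-fold energy:", round(energy(chain), 3))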

    • @NicholasWilliams-uk9xu
      @NicholasWilliams-uk9xu 6 дней назад

      This is top notch content, good work.

    • @NicholasWilliams-uk9xu
      @NicholasWilliams-uk9xu 6 дней назад

      A threshold activation heatmap over a parallel distribution of temporal sequential threads is more descriptive. Each thread operates in its own input/output relative connection space and favors specific input sequences over time, with maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicating a highly favored temporal sequence and (i_3/time - i_2/time - i_1/time...) indicating the least favored temporal sequence (with temporal sequences in between these two extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output is calculated as (1 - |L - E|) to output partners. Favored detection sequences can be defined by an integer defining the most favored position within the temporal sequential thread. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds and shifting the sequence of most value for each thread. There are more optimizations to further optimize this style of learning, such as extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network as it is trained (it begins to handle its own mutations internally based on inference). Then it's a matter of reading the heat map to see what parts of the network like doing certain tasks, and seeing the state transitions of the network.
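
      One literal, purely illustrative reading of the thread scheme described above (every class name, formula choice, and number here is an assumption drawn from the comment, not from the video):

      class TemporalThread:
          """One 'thread': favors a particular input order, fires 1 - |L - E| once amplified past threshold."""
          def __init__(self, favored_order, threshold, latent_timeframe):
              self.favored = {inp: pos for pos, inp in enumerate(favored_order)}
              self.T = threshold          # activation threshold
              self.L = latent_timeframe   # expected latent timeframe for coordinated activation
              self.A = 0.0                # accumulated amplification
              self.t0 = None              # arrival time of the first input

          def receive(self, inp, t):
              if self.t0 is None:
                  self.t0 = t
              # Inputs arriving at their favored position add amplification; others subtract.
              expected = self.favored.get(inp)
              arrived = t - self.t0
              self.A += 1.0 if expected is not None and abs(expected - arrived) < 1 else -1.0
              if self.A > self.T:
                  E = t - self.t0                        # elapsed time
                  return max(0.0, 1 - abs(self.L - E))   # output sent to partner threads
              return None

      thread = TemporalThread(favored_order=["i1", "i2", "i3"], threshold=2, latent_timeframe=2)
      for t, inp in enumerate(["i1", "i2", "i3"]):
          out = thread.receive(inp, t)
      print("output after the favored sequence:", out)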

  • @Dent42
    @Dent42 6 дней назад +88

    Ladies, gentlemen, and fabulous folks of every flavor, the legend is back!

    • @tonsetz
      @tonsetz 6 дней назад +3

      bro got lost in obsidian css configuration, but now he returns to brain cells

    • @giacomogalli2448
      @giacomogalli2448 4 дня назад +3

      He's something else, manages to make computational neuroscience engaging WHILE not giving up on the details

  • @tfburns
    @tfburns 6 дней назад +126

    John Hopfield wasn't the first to describe the formalism which has been subsequently popularised as "Hopfield networks".
    It seems much fairer to the wider field and long history of neuroscientists, computer scientists, physicists, and so on to call them "associative memory networks", i.e. Hopfield was definitely not the first/only to propose the network some call "Hopfield networks". For instance, after the proposal of Marr (1971), many similar models of associative memory were proposed, e.g., those of Nakano (1972), Amari (1972), Little (1974), and Stanley (1976), which all have a very similar (or exactly the same) formalism as Hopfield's 1982 paper.
    Today, notable researchers in this field correct their students' papers to replace instances of "Hopfield networks" with "associative memory networks (sometimes referred to as Hopfield networks)" or something similar. I would encourage you to do the same in your current/future videos.
    I deeply regret making a similar mistake regarding this topic in one of my earlier papers. However, I am glad to correct the record now and in the future.
    Refs:
    D Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262(841):23-81, July 1971.
    Kaoru Nakano. Associatron-a model of associative memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3):380-388, 1972. doi: 10.1109/TSMC.1972.4309133.
    S.-I. Amari. Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11):1197-1206, 1972. doi: 10.1109/T-C.1972.223477.
    W.A. Little. The existence of persistent states in the brain. Mathematical Biosciences, 19(1):101-120, 1974. ISSN 0025-5564. doi: 10.1016/0025-5564(74)90031-5.
    J. C. Stanley. Simulation studies of a temporal sequence memory model. Biological Cybernetics, 24(3):121-137, Sep 1976. ISSN 1432-0770. doi: 10.1007/BF00364115.

    • @Marcus001
      @Marcus001 6 дней назад +31

      Wow, you cited your sources in a YouTube comment! Thanks for the info.

    • @TiagoTiagoT
      @TiagoTiagoT 6 дней назад +3

      See also: Stigler's law

    • @maheshkanojiya4858
      @maheshkanojiya4858 6 дней назад +2

      Thank you for sharing your knowledge

    • @ArtemKirsanov
      @ArtemKirsanov  6 дней назад +22

      Wow, thanks for the info!

    • @NicholasWilliams-uk9xu
      @NicholasWilliams-uk9xu 6 дней назад

      Relative connection spaces are dimensionally agnostic; they don't presuppose a dimensionality for each node in the connection space, which makes them better at tracking large distributions (where a heat map can highlight areas of activity [threshold activations], to see the areas that light up when the system is doing a specific task or undergoing a specific sensory data pattern). This way the dimensionality isn't constrained to a 2D sheet with a predefined curvature manifold, and you can better see the modal transitions of the system via this heat map.

  • @SriNiVi
    @SriNiVi 5 дней назад +11

    This is an insanely educational video. As an ML researcher working on representation learning for multi-modal retrieval, I find this insanely helpful and relatable. I think you just gave me a new area to look at now - how exciting, I owe you one.

  • @davidhand9721
    @davidhand9721 2 дня назад +6

    In fact, many proteins function in a _local_ minimum that is _not_ the global minimum. This is why proteins denature irreversibly when exposed to heat; there's an energy barrier that they can never come back from if they cross it.

  • @ProgZ
    @ProgZ 6 дней назад +16

    At the beginning, when you mention the O(n) problem, as a programmer it just intuitively makes you want to use a tree or a hash map lol
    In any case, another banger!
    It's fascinating to see how these things work!

    • @vastabyss6496
      @vastabyss6496 2 дня назад

      I had the same thought. Though, a hashmap or something similar probably wouldn't work, since many times the key is incomplete or noisy, which would cause the hashing function to return a hash that would map to the wrong index
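
      A small sketch of that contrast (toy random patterns; the Hopfield part uses the standard Hebbian outer-product rule): an exact hash lookup misses as soon as one bit of the key flips, while the associative network still settles onto the nearest stored pattern.

      import numpy as np

      rng = np.random.default_rng(1)
      N, P = 64, 3
      patterns = rng.choice([-1, 1], size=(P, N))

      # Exact-match "hash map" retrieval, keyed on the full bit string.
      table = {p.tobytes(): i for i, p in enumerate(patterns)}

      # Content-addressable retrieval: Hebbian outer-product weights, then iterate to a fixed point.
      W = sum(np.outer(p, p) for p in patterns) / N
      np.fill_diagonal(W, 0.0)

      def recall(x, steps=20):
          x = x.copy().astype(float)
          for _ in range(steps):
              x = np.sign(W @ x)
              x[x == 0] = 1
          return x

      noisy = patterns[0].copy()
      noisy[:5] *= -1   # corrupt 5 bits of the "key"

      print("hash lookup:", table.get(noisy.tobytes(), "miss"))
      print("hopfield best match:", int(np.argmax(patterns @ recall(noisy))))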

  • @ricklongley9172
    @ricklongley9172 6 дней назад +25

    Minor correction: 'Cells that fire together, wire together' was coined by Carla Shatz (1992). Unlike Donald Hebb's original formulation, Shatz's summary of Hebbian learning eliminates the role of axonal transmission delays. By extension, neural networks which remain true to Hebb's original definition should go beyond rate coded models and instead simulate the time delays.
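
    A minimal sketch of that distinction (the exponential STDP-style window and all constants are assumptions, used only to contrast the two rules): a rate-based Hebbian update cares only that two cells are co-active, while a timing-sensitive rule also cares which one fired first and by how much.

    import numpy as np

    def hebbian_rate(pre_rate, post_rate, lr=0.01):
        # "Fire together, wire together": only co-activity matters, no notion of delay.
        return lr * pre_rate * post_rate

    def timing_rule(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
        # Timing-sensitive: potentiate when pre fires before post (dt = t_post - t_pre > 0),
        # depress when the order is reversed.
        return a_plus * np.exp(-dt / tau) if dt > 0 else -a_minus * np.exp(dt / tau)

    print("rate rule, both cells active:   ", hebbian_rate(10, 10))
    print("pre leads post by 5 ms (dt=+5): ", timing_rule(+5.0))
    print("post leads pre by 5 ms (dt=-5): ", timing_rule(-5.0))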

    • @NicholasWilliams-uk9xu
      @NicholasWilliams-uk9xu 6 дней назад +4

      Yes, latent time parameters need to be implemented. A threshold activation heatmap over a parallel distribution of interconnected temporal sequential threads is more descriptive in targeting what he is trying to convey in larger distributions where the Hopfield computational structure fails. Each thread operates in its own input/output relative connection space and favors specific input sequences over time, with maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicating a highly favored temporal sequence and (i_3/time - i_2/time - i_1/time...) indicating the least favored temporal sequence (with temporal sequences in between these two extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output is calculated as (1 - |L - E|) to output partners. Favored detection sequences can be defined by an integer for each input within a temporal sequential thread (a mutable, trainable param), representing the input's favored position within the temporal sequence. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds and shifting the sequence of most value for each thread. There are more optimizations to further optimize this style of learning, such as extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network. Then it's a matter of reading the heat map to see what parts of the network like doing certain tasks given a specific time slice, and seeing the state transitions of the network when other distributions become active.

  • @u2b83
    @u2b83 2 дня назад +2

    4:00 LOL "The ball doesn't search through all possible trajectories to select the optimal parabolic one."
    The visualization of the "trajectory space" is even funnier ;)
    I suspect there are different encodings for proteins with identical function, but which are more robust w.r.t. folding consistently.

  • @danishawp32
    @danishawp32 7 дней назад +19

    Finally, you are back 🎉

  • @didack1419
    @didack1419 6 дней назад +8

    I was thinking about your channel less than an hour ago.

  • @FreshMedlar
    @FreshMedlar 6 дней назад +7

    Thanks for the incredible quality in your videos

  • @raresmircea
    @raresmircea 6 дней назад +3

    Exceptional pedagogical skill! I’m not able to hold these types of explanations in my mind, so any attempt at following such a web of relations would quickly have me lost. But this is a masterclass in clear considerate communication 🙏

  • @NeuroScientician
    @NeuroScientician 6 дней назад +5

    This looks like run-of-the-mill gradient descent - how does it resolve false bottoms?

  • @psuedonerd1236
    @psuedonerd1236 7 дней назад +15

    Respect for using Coldplay 🔥🔥🔥

  • @thwhat6567
    @thwhat6567 6 дней назад +3

    You're back!!! Awesome vid as always.

  • @user-lx6qz6zs7i
    @user-lx6qz6zs7i День назад

    Man, really thankful for your content. I was fascinated by your video about TEM, and started trying to fully understand that network (and memory in general) in my leisure time about a year ago. I learned about latent variables, the transformer architecture (fantastic videos by Andrej Karpathy), autoencoders, etc., but got stuck at (modern) Hopfield nets, which I think are super important in the architecture of TEM. Very glad to see that you're starting to touch the field of Hopfield nets; this is probably the best video about vanilla HNs I've ever watched. Really looking forward to your videos about Boltzmann machines and modern Hopfield nets. Always appreciate your videos!

  • @shasun99
    @shasun99 6 дней назад +2

    I've been waiting for your video for so long. Thank you so much

  • @agurasmask2210
    @agurasmask2210 6 дней назад +1

    Much love bro incredible video ❤ thank you

  • @Mede_N
    @Mede_N 6 дней назад +5

    Awesome video, like always.
    Just a small nitpick: your speaker audio jumps between the left and right audio channel, which is quite distracting - especially with headphones.
    You can easily solve this by setting the voice audio track to "mono" when editing the video.
    Cheers

  • @zeb4827
    @zeb4827 6 дней назад +1

    very cool video, keen to see how the broader argument progresses in this series

  • @jeffevio
    @jeffevio День назад

    Another great video! I really liked your energy landscape and gradient descent animations especially.

  • @laurentpayot3464
    @laurentpayot3464 5 дней назад

    Awesome. I just can’t wait for the next video!

  • @asdf56790
    @asdf56790 6 дней назад

    As always, outstanding video!

  • @user-vf5jc4ig7o
    @user-vf5jc4ig7o 5 дней назад

    Can't wait to see this video!!!!!

  • @felipemldias
    @felipemldias 6 дней назад

    Man, I just love your videos

  • @swamihuman9395
    @swamihuman9395 6 дней назад

    - Excellent! Thx.
    - Very well presented: clear/concise, yet fairly comprehensive - and w/ great visualizations.
    - Keep up the great content!...

  • @antonionogueras6533
    @antonionogueras6533 6 дней назад +1

    So good. Thank you

  • @jverart2106
    @jverart2106 6 дней назад

    I was reading and watching videos about metacognition and bayesian probability and now you have thrown me into a new rabbit hole! 😅
    Your videos are incredible and it's great to have a new one. Thank you!

  • @SteveRowe
    @SteveRowe 5 дней назад

    This was really clear, accurate, and easy to follow. 10/10, would watch again.

  • @PastaEngineer
    @PastaEngineer 6 дней назад

    This is incredibly well put together.

  • @vinniepeterss
    @vinniepeterss 6 дней назад

    great video as always

  • @foreignconta
    @foreignconta 6 дней назад

    Waited for this...!!

  • @keithwallace5277
    @keithwallace5277 4 дня назад

    Love you man

  • @spiralsun1
    @spiralsun1 5 дней назад

    One of the best, most clear videos I’ve ever seen EVER❤ 🙏🏻THANK YOU ❤😊

  • @tornyu
    @tornyu 22 часа назад

    I love that you made everything dark mode. (Noticed when I saw the Wikipedia logo)

  • @josephlabs
    @josephlabs 6 дней назад +1

    I wanted to do research on something like this a year or two ago. This is amazing, I've got some work to do with this.

  • @jamesphillips9403
    @jamesphillips9403 6 дней назад

    Holy cow, this makes a complex topic so intuitive.

  • @filedotjar
    @filedotjar 3 дня назад

    Super interesting to see that the Hopfield network basically reinvents binary operations like XOR and XNOR for neurons, with the two differentiated by the sign of the weight.
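
    A tiny check of that observation, using the pairwise energy term -w·x_i·x_j from the video (reading the ±1 states as bits is the only assumption): a positive weight favors agreement (XNOR-like), a negative weight favors disagreement (XOR-like).

    for w in (+1, -1):
        print(f"weight {w:+d}:")
        for xi in (-1, +1):
            for xj in (-1, +1):
                e = -w * xi * xj   # pairwise energy contribution
                print(f"  x_i={xi:+d} x_j={xj:+d}  energy={e:+d}  {'low (favored)' if e < 0 else 'high'}")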

  • @kellymoses8566
    @kellymoses8566 День назад

    One reason I like using Neo4j is that graph networks seem to work a bit like human memory, with links between things making it fast to find related items.

  • @F_Sacco
    @F_Sacco 6 дней назад +5

    Hopfield networks are amazing! They are studied in physics, biology, machine learning, mathematics and chemistry
    The rabbit hole goes extremely deep

  • @SilentderLaute
    @SilentderLaute 6 дней назад

    Another awesome video :)

  • @deotimedev
    @deotimedev 4 дня назад

    Thank you so much for creating this video, genuinely one of the most educational I've ever come across. I've been trying to learn more about how brains work since that's always been something I've been very curious about literally since birth, and along with entropy being my favorite physics concept, this video has just led to me googling and researching for the last 4 hours (it's 3am lol) trying to find out more. Really impressed with how complicated, yet still high-quality and clear, some of the topics in this are, and I'm really looking forward to watching the rest of your videos to learn more about how all of this stuff works in such an intricate way

    • @ArtemKirsanov
      @ArtemKirsanov  4 дня назад

      Thank you!!

    • @joeybasile1572
      @joeybasile1572 2 дня назад

      Please keep going. Keep dedicating your time to your pursuit of wonder.

  • @is44ct37
    @is44ct37 6 дней назад

    Great video

  • @pushyamithra223
    @pushyamithra223 День назад

    please try to make more videos, your content is extremely good

  • @marcoramonet1123
    @marcoramonet1123 6 дней назад

    This is one of the best channels

  • @Dawnarow
    @Dawnarow 6 дней назад

    Thank you. This is unbelievably simple and potentially more accurate than any other speculation. Next step: determining the shape of the proteins and categorizing them. The tools may not be there yet... but a good hypothesis certainly helps to reach certain conclusions.

  • @Harsooo
    @Harsooo 6 дней назад

    It's a blast to listen and be blown away :)
    Greetings from Austria, keep doing what you're doing!

  • @dann_y5319
    @dann_y5319 2 дня назад

    Awesome

  • @edsonjr6972
    @edsonjr6972 6 дней назад

    My God, your videos are amazing

  • @MrGustavier
    @MrGustavier 6 дней назад

    Genius !

  • @13lacle
    @13lacle 2 дня назад

    Great video as always. For 22:50, has anyone tried stacking layers of Hopfield networks yet as a workaround? Basically, each layer acts in its own feature-level space and resolves that level's most likely feature, then passes it up to the next higher-order Hopfield feature space to be resolved there. It seems like this would allow you to store exponentially more patterns overall, as they get resolved separately, avoiding the overly busy final energy landscape.
    Also, interestingly, you can see how it would carve out the energy landscape from just the raw inputs with this. You have the Hopfield network constantly comparing itself to some abstraction of the source input layer, meaning the more times a pattern is seen, the stronger it gets in the Hopfield network. Also, for faster convergence, it is likely that the greater the difference between x_i and h_i, the faster x_i updates.
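
    A rough sketch of the stacking idea above (the random projection between layers and all sizes are assumptions; this is not something from the video): layer 1 cleans up the pattern in its own feature space, and a second Hopfield layer then resolves a higher-level code derived from layer 1's fixed point.

    import numpy as np

    rng = np.random.default_rng(2)

    def hebbian(patterns):
        W = (patterns.T @ patterns) / patterns.shape[1]
        np.fill_diagonal(W, 0.0)
        return W

    def settle(W, x, steps=20):
        x = x.astype(float)
        for _ in range(steps):
            x = np.sign(W @ x)
            x[x == 0] = 1
        return x

    # Layer 1 stores low-level feature patterns; layer 2 stores codes built from
    # layer-1 fixed points via a fixed random projection.
    n1, n2, P = 100, 40, 4
    feats = rng.choice([-1, 1], size=(P, n1))
    proj = rng.normal(size=(n2, n1))
    codes = np.sign(feats @ proj.T)

    W1, W2 = hebbian(feats), hebbian(codes)

    noisy = feats[2] * np.where(rng.random(n1) < 0.15, -1, 1)   # corrupt ~15% of bits
    clean1 = settle(W1, noisy)                                   # resolve in the feature space
    clean2 = settle(W2, np.sign(proj @ clean1))                  # then resolve in the higher-level space

    print("layer-1 overlap with the true pattern:", int(clean1 @ feats[2]), "/", n1)
    print("layer-2 code recovered:", bool((clean2 == codes[2]).all()))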

  • @porroapp
    @porroapp 5 дней назад

    12:21 Watching this to maximise my happiness. Max happiness Min unhappiness, this is the way. Thank you!

  • @futurisold
    @futurisold 4 дня назад +2

    > "when you throw a ball the ball doesn't search through all the possible trajectories"
    QM has entered the chat

  • @simdimdim
    @simdimdim День назад

    2:16 a* great introduction

  • @ExistenceUniversity
    @ExistenceUniversity 6 дней назад +2

    This content is so high level, it's almost impossible to tell if it is true or not. Physically and philosophically, I have bought in, but my want of it to be true doesn't make it so.
    I cannot imagine this is wrong, but where does this come from?
    This stuff is just crazy and I don't know if it is crazy good or just crazy lol but I'm along for the ride

  • @h.mrahman2805
    @h.mrahman2805 День назад

    Please make a video about modern Hopfield nets or dense associative memory, since it's a different and more general perspective on Hopfield nets.

  • @ArbaouiBillel
    @ArbaouiBillel 6 дней назад

    Agree 💯

  • @Tutul_
    @Tutul_ 3 дня назад

    Because the neurons have two edge weights (A->B and B->A), might that explain the case where we have a memory just out of reach and get locked trying to retrieve it?

  • @Neptoid
    @Neptoid 6 дней назад

    Your font missed the contextual ligatures

  • @thegloaming5984
    @thegloaming5984 4 дня назад

    Can you do a video on the work of Dmitry Krotov showing that attention mechanisms are a special case of associative memory networks?

  • @Snowflake_tv
    @Snowflake_tv 6 дней назад

    Long time no see😁

  • @angeldiaz5520
    @angeldiaz5520 5 дней назад

    So that means that our neurons do some type of gradient descent? That’s very interesting to know

  • @Xylos144
    @Xylos144 5 дней назад +1

    Great video. A little sad to see that anti-training wasn't mentioned. It doesn't really solve the problem of training two sequences that are 'close' together, so that's fair. But it does help, and it has an interesting analogy with physiology.
    Essentially, if you try to train two sequences that are too close to each other, their valleys will overlap, which means you might aim for one specific sequence and end up falling into the other. And if they're really close, you'll actually create a new local minimum that sits between the two.
    In those cases, what you can do is identify all your local minima and then run the algorithm backwards, training hills on top of all your local minima. For stand-alone minima this doesn't matter, because they're still local minima. But if a minimum is a false sequence that sits between two or more neighboring targets, this builds a hill in between those neighboring valleys, helping to make those nearby sequences more distinct.
    As Geoffrey Hinton has pointed out, this has an interesting conceptual analog to dreaming, where we seem to replay experiences and concepts from our day (to a vague extent), and sleeping/dreaming also seems to help with learning and memory. Similarly, making the Hopfield network focus on its memories while playing them backwards, so to speak, helps to solidify its own memory.
    It may be little more than a metaphorical analog, but I think it's still quite interesting.
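
    A rough sketch of that "anti-training" idea, in the spirit of the classic Hopfield unlearning procedure (sizes, the learning rate, and the way spurious states are detected are all assumptions): store patterns with the Hebbian rule, let the network settle from random starts, and raise the energy of any attractor that is not a stored memory.

    import numpy as np

    rng = np.random.default_rng(3)
    N, P, eps = 80, 6, 0.05
    patterns = rng.choice([-1, 1], size=(P, N))

    W = (patterns.T @ patterns) / N
    np.fill_diagonal(W, 0.0)

    def settle(x, steps=30):
        x = x.astype(float)
        for _ in range(steps):
            x = np.sign(W @ x)
            x[x == 0] = 1
        return x

    def is_stored(x):
        # Matches one of the trained memories (or its mirror image).
        return (np.abs(patterns @ x) == N).any()

    # "Dream" phase: settle from random starts and unlearn (raise the energy of)
    # every attractor that is not an actual memory.
    spurious = 0
    for _ in range(200):
        s = settle(rng.choice([-1, 1], size=N).astype(float))
        if not is_stored(s):
            spurious += 1
            W -= eps * np.outer(s, s) / N
            np.fill_diagonal(W, 0.0)

    print("spurious attractors unlearned:", spurious)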

    • @ArtemKirsanov
      @ArtemKirsanov  4 дня назад +2

      That’s exactly right! Boltzmann machines, which are an improved version of Hopfield nets, in fact do just that with contrastive Hebbian learning, by increasing the energy of “fake” memories.
      Hopfield networks, being the first model, don’t have that property in the conventional form though. So we will talk about this in the Boltzmann machines video.
      Good catch!

    • @Xylos144
      @Xylos144 4 дня назад

      @@ArtemKirsanov Ah, gotcha, I didn't realize the idea of 'anti-learning' applied to Boltzmann machines. I've only messed with restricted Boltzmann machines, and I always thought of them as stacked reversible autoencoders.
      Never thought that the updating method may be replicating the same 'anti-learning' process - though it does make sense, since autoencoders are trying to make a bunch of weird, distinct hyper-dimensional valleys. Maybe it's more apparent with the more general Boltzmann machine.
      Looking forward to that video!

  • @mc.ivanov
    @mc.ivanov 6 дней назад

    Didn't you already upload a video on Boltzmann machines? I thought I saw it last year.

  • @andersreality
    @andersreality 7 дней назад +3

    Just in time for my studies into Z12 cyclic groups, which surely have nothing to do with cognition 🧐

  • @manymany1191
    @manymany1191 23 часа назад

    Do you have merch?

  • @davidhand9721
    @davidhand9721 6 дней назад

    I've always thought of energy as unhappiness, too.

  • @roshan7988
    @roshan7988 7 дней назад +1

    Wow

  • @wege8409
    @wege8409 5 дней назад

    Hey, someone pointed me to your page because of the links between neuroscience and ML. I think that the thalamus might be a cross-modal autoencoder. The main reason I think this is that there are 10 times as many connections going into the thalamus as there are going out, which sounds like encoding to me. I was wondering, does that ring true to you?

  • @benfield1866
    @benfield1866 6 дней назад

    How does this relate to Friston's free energy principle?

  • @angelodelrey3357
    @angelodelrey3357 5 дней назад

    This seems like Friston's explanation of inference. Can you explain that too in your videos?

  • @ShpanMan
    @ShpanMan 3 дня назад

    AGI will likely need thinking in this way, and at least partial models of how our brain works.

  • @Kram1032
    @Kram1032 6 дней назад

    It seems to me this "fire together wire together" notion is actually also present in the attention mechanism of transformers, except you don't just have a 1D 1 bit + - but rather an nD dot product.
    This still has the same basic structure: Neurons try to more closely align to each other. But the added dimensions give each neuron more ways to accomplish that: Among three neurons, one neuron can be pretty close to the other two while those two might be pretty far away from each other.

  • @kmo7372
    @kmo7372 6 дней назад

    I wonder if the convergence process can be done in parallel, state by state.
    That would be awesome.

  • @TeslaElonSpaceXFan
    @TeslaElonSpaceXFan 5 дней назад +1

    😍

  • @mgostIH
    @mgostIH 6 дней назад +1

    Wait, this is just the Ising model with extra steps!

  • @mastershooter64
    @mastershooter64 6 дней назад +1

    Just as I suspected, everything is just the principle of stationary action! It's all just making an action functional stationary. What if we considered non-local actions?

  • @waff6ix
    @waff6ix 6 дней назад

    STUFF LIKE THIS IS SO INTERESTING TO ME TO LEARN😮‍💨GOD DESIGNED SUCH AN AMAZING CREATION🤩🙏🏾🙏🏾🙏🏾

  • @neon_Nomad
    @neon_Nomad 6 дней назад

    I use an Excel spreadsheet for all my memories

  • @vinniepeterss
    @vinniepeterss 6 дней назад

    ❤❤

  • @disgruntledwookie369
    @disgruntledwookie369 6 дней назад +3

    Ironically, within the framework of quantum mechanics, one could actually say that the ball *does* "search" every possible path in order to find the "correct" one. It simply performs the "search" in parallel, not sequentially. And it's less of a search and more of an average of all paths. The principle of stationary action is the driving principle behind Newtonian dynamics and itself follows directly from the interference between many "virtual" trajectories; it turns out that the paths which are close to the "true path" (the classical path) have very little variance in their action, which roughly speaking means that they end with nearly identical phase shifts (e^iHt/h, Ht ~ action, h = Planck constant) and can interfere constructively, whereas paths which are far from the "true path" have wildly varying actions, even if two paths are similar to each other. So they pick up big phase shifts and end up interfering destructively, leaving only the contributions from the paths "close to" the classically observed path. As far as I know this is the only way to derive the principle of stationary action, and the same basic idea is essential to finding transition probability amplitudes in QFT. It really does seem like the universe simultaneously tries all conceivable paths, superimposed together.

    • @asdf56790
      @asdf56790 6 дней назад +2

      One could also say this for optics with Fermat's principle or classical mechanics with Hamilton's principle. Even though variational formulations are mathematically beautiful, I'd be cautious to assume that "reality works this way" i.e. "searches through all paths". They are one equivalent description of many (even though it is remarkable that they pop up basically everywhere).

    • @disgruntledwookie369
      @disgruntledwookie369 6 дней назад +1

      @@asdf56790 I agree with your caution. I'm just increasingly convinced that theory and experiment are pointing this way and the only obstacle is our flawed intuition and prejudice. We want there to be only a single, consistent reality. But this forces us into some intense mental and mathematical gymnastics to make the equations of QM fit observations. If you take the equations at face value then you have no trouble, you just have to contend with the idea that reality is not a single line of well-defined events, but multiple histories occurring simultaneously and generally able to interfere with each other. An electron passing through a Stern-Gerlach device would then actually travel both paths, in "separate worlds", but these paths can still interfere and superimpose so long as you don't take any steps to determine which path was taken. Like if you redirect the paths to converge into a single path and put the whole thing in a box so you can only see the output, you cannot determine which path was taken and you can show experimentally that the output electron superposition of spin states is preserved. But the universe doesn't know ahead of time (superdeterminism notwithstanding) whether you will take a peek in the box and catch the electron with its pants down. In my view, the explanation requiring the fewest assumptions is that all paths really are taken, but with the assumptions that 1. When we observe a property we can only observe definite values, not superpositions, and 2. paths can interfere (unless decoherence has occurred). A poor and rushed explanation but this is kind of my thought process. As you say, there are many alternative interpretations. It's pretty much philosophy at this point. 😅

    • @raresmircea
      @raresmircea 6 дней назад

      What’s the status of all those parallel paths? You’ve used the term "virtual" so you seem to view them as part of some kind of potentiality, not getting actualized in the observer measured reality (I’m using these terms very loosely, I don’t know exactly what they mean). If I’ve understood right, in the MWI there’s no actual convergence on a path, each of the possible parallel paths are actualized paths that get to be part of reality.

    • @mastershooter64
      @mastershooter64 5 дней назад +1

      @@asdf56790 This is true; all subsequent physical theories are simply better and better approximations of reality, and we shouldn't assume reality works that way without more experimental and theoretical verification.

  • @peterfaber7124
    @peterfaber7124 5 дней назад

    Great explanation!
    This applies to fully connected networks, correct? So memories are Dense Distributed Representations.
    It's not how the brain does it. The brain uses the opposite: Sparse Distributed Representations.
    Is there any way you could explain it using SDRs?

  • @JoaoLucas________
    @JoaoLucas________ 6 дней назад +1

    In short: unconscious intuition guided by the principle of energy conservation of the absolute ego.

  • @jagsittermedsimonochjobbar
    @jagsittermedsimonochjobbar 6 дней назад

    🙇

  • @nanotech_republika
    @nanotech_republika 6 дней назад

    Basic question: you are using the word "inference" for the outcome during the training stage (about 15 minutes into the video). But I've heard that word used specifically for the recall stage only, not the training stage - for example, in the context of transformers. Can you clarify? Are you simply using that term incorrectly, or does the usage vary?

    • @ArtemKirsanov
      @ArtemKirsanov  4 дня назад

      That’s right, inference usually refers to running the computation with fixed weights. In the case of associative memory this is when we’re recalling the pattern. It is different from training, when we’re setting the weights.
      I’m not sure where in the video I used “inference” in the context of training. Can you specify the exact time code?

    • @nanotech_republika
      @nanotech_republika 4 дня назад

      @@ArtemKirsanov Sorry, I misunderstood what you described at about 13:30.

  • @revimfadli4666
    @revimfadli4666 2 дня назад +1

    Does this mean artbros were right about diffusion models being databases?

    • @u2b83
      @u2b83 2 дня назад

      My guess is they're "databases" of chaotic attractors. After all, the diffusion process is effectively a differential equation that evolves over time and settles in some stable basin.

  • @Antoinedionsexo
    @Antoinedionsexo 6 дней назад

    Hi there, great videos! I am curious about your view of free will and agency. I myself explore this philosophical/psychological/biological subject, and your videos are reinforcing notions I found in books like "Chaos and Nonlinear Psychology" (Schulter et al.) and "Determined" (Sapolsky), among others. It seems to me that, from a scientific point of view at least, the notions of dualism and free will don't make sense. Is that something you think about? :) In any case, keep up the work, it's really appreciated here. Antoine Dion, Canada

  • @StephenRayner
    @StephenRayner 5 дней назад

    ❤❤❤❤❤❤❤❤❤❤❤❤

  • @johnwolves2705
    @johnwolves2705 6 дней назад

    Sir, so protein folding is affected by gravity too 😵😱

  • @SALAVEY13
    @SALAVEY13 6 дней назад +1

    wow! simplifying weights to be symmetric is a pretty huge simplification imho, so in the brain, contrary to neural nets that supposedly work like the brain, neurons are not activated in layers from left to right; there is a mess of simultaneous back-propagating waves, cool
    second time my brain exploded at "maximizing the sum is the same as minimizing the loss", daaamn.. so just like that the hill on the graph becomes a valley - I just think it semantically triggers an analogy with gravity, which is imho bad. Look - even in the protein folding analogy, there is an energy "push", not a gravity "pull"; proteins can't fold TOO fast, right, there is some limit to how fast they can be pushed into place. This analogy breaks when we start thinking about valleys instead of hills on the graph, as it would become obvious that the steeper the valley, the faster the solution converges, BUT I just think a TOO steep path sometimes can't be jumped around, so it's maybe kinda misleading. And if we keep the hill representation, there should be a "path up" that is not so steep, so that we can afford it with the available energy.. very interesting!)

    • @SALAVEY13
      @SALAVEY13 6 дней назад

      Let's break it down and elaborate on each aspect.
      Symmetric Weights
      Simplifying weights to be symmetric is a pretty huge imho:
      In biological neural networks, connections between neurons (synapses) are not necessarily symmetric. The strength and direction of the connection from neuron A to neuron B can be different from the connection from neuron B to neuron A. This asymmetry introduces a layer of complexity that is not present in traditional Hopfield networks.
      Hopfield networks simplify this by assuming symmetric weights (i.e., w_ij = w_ji). This simplification makes the mathematical analysis more tractable and ensures convergence to a stable state. In other words, it guarantees that the network will not get stuck in an infinite loop of state changes, but will eventually settle into a stable configuration (a local minimum in the energy landscape).
      Neurons Activation and Propagation in the Brain
      In brain contrary to neural nets that supposedly work like brain neurons are not activated in layers from left to right, there is a mess with simultaneous back propagation waves, cools:
      Traditional artificial neural networks, like those used in deep learning, often operate in discrete layers where information flows in a feedforward manner from input to output. However, the brain operates very differently. Neurons in the brain are massively interconnected and communicate through a complex network of excitatory and inhibitory signals, often in parallel and asynchronously.
      In this context, the analogy with Hopfield networks is closer because Hopfield networks also consider a fully connected network where every neuron can potentially influence every other neuron. This creates a more "messy" and intertwined propagation of activation, which is more biologically plausible.
      Maximizing Happiness vs. Minimizing Energy
      Maximizing the sum is same as minimizing the loss:
      This concept is a fundamental principle in optimization. In the context of Hopfield networks, the "happiness" of the network (which is the sum of weighted pairwise state agreements) can be thought of as the inverse of the "energy" of the network. Maximizing the happiness is equivalent to minimizing the energy.
      When you maximize the sum of weighted agreements, you are effectively pushing the system towards a state where all the neurons are in configurations that are most favorable given their connections. This is conceptually similar to minimizing the loss function in machine learning, where you adjust weights to make the network's predictions as accurate as possible.
      Energy Landscape and Analogy with Protein Folding
      Energy "push" vs. gravity "pull":
      Your observation about the analogy between protein folding and neural network optimization is insightful. In protein folding, the molecule finds its stable configuration by moving towards a state of lower potential energy. This process is driven by physical forces that minimize the system's energy, much like a ball rolling downhill due to gravity.
      In neural networks, this analogy is often visualized as moving towards the bottom of a valley in an energy landscape. The idea is that the system will naturally "descend" towards the lowest energy state, which corresponds to the optimal configuration.
      However, as you pointed out, this analogy can be misleading if taken too literally. In protein folding, the process is driven by physical interactions that might have constraints on how quickly the system can reach its stable state. Similarly, in neural networks, if the energy landscape has very steep valleys, the system might face difficulties in finding the optimal path, as it might get stuck in local minima or encounter barriers that prevent smooth convergence.
      Steepness and Convergence:
      In optimization, a very steep valley (or hill) in the energy landscape can indeed pose problems. If the gradient is too steep, the optimization algorithm might take very large steps and overshoot the minimum, or it might face numerical instability. This is why techniques like gradient clipping or adaptive learning rates are used in training neural networks to handle such issues.
      Path Representation:
      When you mention that the "steeper the valley the faster solution will be converged," it's important to consider that while steep gradients can accelerate convergence initially, they can also cause problems if the system overshoots or gets trapped. A more moderate and manageable path can ensure steady and reliable convergence to the optimal solution.
      Conclusion
      In summary, your comment highlights the nuances and complexities in drawing analogies between biological processes and artificial neural networks. The simplifications made in Hopfield networks (such as symmetric weights) serve to make the system mathematically tractable and ensure convergence, but they do not capture all the intricacies of biological neural networks. Additionally, the analogy with energy landscapes, while useful, must be carefully interpreted to account for practical challenges in optimization.
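
      A compact check of the "maximizing happiness = minimizing energy" point above (random weights and states are assumptions): the weighted sum of pairwise agreements is exactly the negative of the standard Hopfield energy, and a single asynchronous update never increases that energy when the weights are symmetric.

      import numpy as np

      rng = np.random.default_rng(4)
      n = 10
      W = rng.normal(size=(n, n))
      W = 0.5 * (W + W.T)              # symmetric weights, as in the video
      np.fill_diagonal(W, 0.0)
      x = rng.choice([-1, 1], size=n)

      happiness = 0.5 * sum(W[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
      energy = -0.5 * x @ W @ x
      print("happiness == -energy:", np.isclose(happiness, -energy))

      # A single asynchronous update of one unit never increases the energy (requires symmetric W):
      i = rng.integers(n)
      x_new = x.copy()
      x_new[i] = 1 if W[i] @ x >= 0 else -1
      print("energy non-increasing:", -0.5 * x_new @ W @ x_new <= energy + 1e-12)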

    • @SALAVEY13
      @SALAVEY13 6 дней назад

      ruclips.net/video/43nsldnfkM8/видео.html

    • @mishaerementchouk
      @mishaerementchouk 2 дня назад +1

      The symmetry assumption is actually strange. The way the objective function is defined, the antisymmetric part of the weights doesn't matter - the extrema of the function are not affected by it. This assumption is only needed for the essentially sequential Hopfield update rule to converge. But realistic neurons are not updated in some orderly fashion. This raises a lot of questions.
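
      A quick numerical check of that point (random weights are an assumption): x^T A x = 0 for any antisymmetric A, so only the symmetric part of the weights enters the energy; the symmetry assumption matters for the update dynamics, not for where the extrema sit.

      import numpy as np

      rng = np.random.default_rng(5)
      n = 12
      W = rng.normal(size=(n, n))
      np.fill_diagonal(W, 0.0)
      W_sym = 0.5 * (W + W.T)    # symmetric part
      W_anti = 0.5 * (W - W.T)   # antisymmetric part

      x = rng.choice([-1, 1], size=n)
      print("same energy with full vs. symmetric weights:",
            np.isclose(-0.5 * x @ W @ x, -0.5 * x @ W_sym @ x))
      print("antisymmetric part contributes nothing:", np.isclose(x @ W_anti @ x, 0.0))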

  • @justanotherytaccount1968
    @justanotherytaccount1968 5 дней назад

    Comment for the algorithm

  • @artemonstrick
    @artemonstrick 6 дней назад +1

    This channel has criminally few subscribers.

  • @DarrenMcStravick
    @DarrenMcStravick 7 дней назад +1

    Hopefully no copyright issues from using coldplay 🤞😬

    • @ArtemKirsanov
      @ArtemKirsanov  7 дней назад +8

      hehe, I paid for the license to use it, so it should be okay :)

  • @107cdb
    @107cdb 6 дней назад +1

    I didn't want to sleep anyway.

  • @tofolcano9639
    @tofolcano9639 6 дней назад +1

    So real neurons are connected asymmetrically, but Hopfield networks have to have symmetric connections in order to actually work.
    But as it turns out, Hopfield networks are not useful for practical purposes, so despite purposely being different from biological memory in order to actually work, they still don't work.
    Also, the neurons in the brain aren't all connected with each other, but this type of artificial neuron is.
    So what's the deal? Haven't we found anything better than this since then? Isn't ChatGPT's ability to recall information much, much better than this system?

  • @LettersAndNumbers300
    @LettersAndNumbers300 6 дней назад

    I want to work with you so bad

  • @MDNQ-ud1ty
    @MDNQ-ud1ty 4 дня назад

    Realize that the product is actually `and`, and 1,1 or 0,1 are equivalent. What is being computed is logic functions. A NN is simply a logic machine. It is trained to find logical representations of the data. That is, it's a program. There is no difference between data, logic/math, and programs. They are different forms of the same thing. Similarly to how a banker, a politician, and a criminal are all just forms of the same thing.

  • @supthos
    @supthos 6 дней назад

    just use a directed graph? 10:10

  • @Patapom3
    @Patapom3 3 дня назад +1

    It looks way too complicated compared to basic symbolic memory... And it also has an O(N²) flavor to it! Clearly I would be surprised if actual biological memory looked anything like these networks...
    It's much simpler to elicit a few key features of a memory from inputs and then do an exhaustive search on the few possibilities that have been selected...

    • @mishaerementchouk
      @mishaerementchouk 2 дня назад

      It’s an associative memory - a knee-jerk reaction. A certain degree of sloppiness is allowed but then it’s O(1). The apparent O(N^..) is due to the artificially sequential Hopfield rules.

    • @u2b83
      @u2b83 2 дня назад

      Personally, I suspect we hash things similarly to Bloom filters.
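
      For reference, a minimal Bloom-filter sketch (the double-hashing scheme and sizes are assumptions): it answers "possibly seen / definitely not seen" quickly and compactly, but unlike an associative memory it cannot reconstruct the stored item from a partial cue.

      import hashlib

      class BloomFilter:
          def __init__(self, m=1024, k=4):
              self.m, self.k = m, k
              self.bits = bytearray(m)

          def _positions(self, item):
              h = hashlib.sha256(item.encode()).digest()
              h1 = int.from_bytes(h[:8], "big")
              h2 = int.from_bytes(h[8:16], "big")
              return [(h1 + i * h2) % self.m for i in range(self.k)]   # double hashing

          def add(self, item):
              for p in self._positions(item):
                  self.bits[p] = 1

          def might_contain(self, item):
              return all(self.bits[p] for p in self._positions(item))

      bf = BloomFilter()
      bf.add("cortex")
      print(bf.might_contain("cortex"))        # True
      print(bf.might_contain("cerebellum"))    # False with high probability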

  • @tedarcher9120
    @tedarcher9120 6 дней назад

    Thoughts or experiences are semi-cyclical patterns of activation of particular neurons; memory is stored in the likelihood of one neuron activating others; when an experience or a thought activates enough of the neurons that store a memory, it is recalled