2 Years of My Research Explained in 13 Minutes

  • Published: 22 Aug 2024

Comments • 213

  • @dhruv0x0x0
    @dhruv0x0x0 1 month ago +67

    Doing a literature survey or just getting into a field would be 1000x easier if, in some ideal world, we could have videos like these. Thank you so much for the effort you put into making this video while being in academia too!!!

    • @EdanMeyer
      @EdanMeyer 1 month ago +2

      There is plenty of space for others to join me; I wish there were more quality YouTube videos about research

  • @JoeTaber
    @JoeTaber 1 month ago +82

    My Policy model appreciates good jokes about subscribing.

  • @IxCIHAoX
    @IxCIHAoX 1 month ago +52

    Really cool, congratulations. You can really be proud of yourself.
    I did just the coursework and thesis, which is what we are supposed to do at my uni. And now, after graduating, I really regret never doing anything more in that regard. Videos like yours really motivate me to try researching in my free time. But I still don't really know where to start, and fear it will be a waste of time. But videos like yours, from people at the start of their academic career, give me hope. Thanks

    • @PhiKehr
      @PhiKehr 1 month ago +9

      Research is never a waste of time. Even if nothing comes out of it, you can still show other people with similar research interests what not to do.

    • @MatthewKelley-mq4ce
      @MatthewKelley-mq4ce 1 month ago

      I didn't really understand "waste of time" either. No end product doesn't mean the journey was pointless, as it often means more experience to carry forward. That said, I understand the emotional sentiment and uncertainty. It can be daunting.

    • @kaganaytekin1804
      @kaganaytekin1804 18 days ago

      @@PhiKehr Is it really the case? Most papers that are accepted to highly respected venues are solely about what works. I feel it's unlikely that people have not tried to publish nice papers about what does not work. Counterexamples are very much appreciated. 🙏🙏

  • @andrestorres7343
    @andrestorres7343 1 month ago +10

    I love the way you approach research. The questions you ask are very good!

    • @EdanMeyer
      @EdanMeyer 1 month ago +1

      Asking the right questions is perhaps the most important thing in research, and something I improved a lot at over the past 2 years. Still a lot more to go through!

  • @firewind1334
    @firewind1334 1 month ago +8

    You have a real knack for presenting complicated things in an easy-to-understand/digest manner, man! Your research is awesome and I can’t wait till I can pick up an ejmejm-brand robot third arm

  • @thatotherguy4245
    @thatotherguy4245 1 month ago +66

    I wish I understood more than 1% of what you said, but from what I did get, it looks very impressive. Congratulations.

    • @ckpioo
      @ckpioo 1 month ago +8

      If you want, I would suggest starting from the basics, because the way this was formatted was extremely basic and layman-friendly

    • @premmahyavanshi7741
      @premmahyavanshi7741 1 month ago

      I was feeling down thinking I didn’t know enough, thank you for reminding me why I’m studying so much ❤

    • @ruiyangxu790
      @ruiyangxu790 10 days ago

      Welcome to the “cult”! 😂

  • @pieriskalligeros5403
    @pieriskalligeros5403 1 month ago +5

    Edan you're just amazing dude! Nobody asked you to explain AE compression, bottlenecks, latent states, and yet you did! Love your format and your research, keep shinin' and keep educating on YouTube with your iPad! RL professors, take note (looking at you, slide/PPT addicts!). Also, as you were explaining the 1-step (TD0?) action prediction, I couldn't help but wonder if LLMs (say GPT-2) could just be model-based RL agents!

  • @gobblegobble2559
    @gobblegobble2559 14 days ago +2

    Dude, your 13 minutes of video literally explained more than a whole semester at uni!!! Thanks!!!

  • @dylancope
    @dylancope 1 month ago +6

    Very interesting work! Congrats on being accepted to RLC 😊 and thank you for sharing.
    My research is in MARL and communication between agents. I generally work with discrete communication symbols for various reasons, and I've often argued for intuitive benefits of discretisation (even in noise-free channels). It's nice to see some concrete work that lines up with my intuitions!

    • @omrisharon
      @omrisharon 1 month ago

      Sounds super interesting! Can you share a link to one of your papers? Have you tried adding "self-communication" between each agent and itself in addition to the communication between agents?

  • @ramzykaram296
    @ramzykaram296 1 month ago +8

    You're the first channel to ask the questions I have in my mind, the ones that always push me to read more papers: why things work!! Usually channels and books focus on tutorials and catchy titles. I have set notifications for your channel to 'all' and hope to see more videos

  • @rmt3589
    @rmt3589 23 days ago +1

    This is amazing, and puts an explanation to a concept I've been pondering. It would be a great start to a self-simulating module for an AI, even if it only works on a projection like a video game state. You gained a new subscriber!

  • @thebarrelboy9552
    @thebarrelboy9552 1 month ago +4

    Really nice to see a new video from you Edan, keep it up! You're finding your style with the videos, both art-style and script-wise, and I love it; I think this is one of your best videos so far :)
    This topic is super interesting as well. I based my master's thesis on the precursor to DreamerV3, PlaNet, and it's so cool to see these deep dives into model-based RL methods, keep it up!
    Please don't dumb down your videos more, you've found the perfect sweet spot now in my opinion, where we can actually learn the small interesting advancements of different methods and papers without going down into the maths and formula derivations. It's perfect!

  • @hdhgdhgdhfhjfjhfjh
    @hdhgdhgdhfhjfjhfjh 1 month ago +11

    A simple explanation could be that a continuous autoencoder defeats the whole purpose of 'compressing' the information, as it can represent a far wider range of information than a discrete autoencoder.

  • @TheSMKD
    @TheSMKD 1 month ago +9

    Wow, this is great research progress. Thanks for sharing here on YouTube!

  • @leeris19
    @leeris19 1 month ago +18

    I can't help but associate discrete and continuous representations with learning rates in algorithms like gradient descent. If a representation is discrete, its step size (learning rate) is bigger (moving from 0 to 1 takes only a single step), allowing it to learn faster, but it may sometimes miss the local minimum or never reach it. If it's continuous, moving from 0 to 1 takes more steps, so it learns more slowly, but with enough steps it will reach the local minimum more precisely. Other than that, great work!

    • @unvergebeneid
      @unvergebeneid 1 month ago +1

      If it was just that, why can't you just adapt the step size in the continuous case? Make it 1.0 if you find that 1 works great in the discrete case. So while I find your analogy fascinating, I'm not quite convinced it holds any water.

    • @leeris19
      @leeris19 1 month ago +1

      @@unvergebeneid I did not mean for the values of the matrix to be the "actual" step size. What I'm trying to say is that the movement of the values inside can be thought of as stepping. A continuous 1.0 has many more steps to consider in order to reach 2.0, unlike a discrete 1, which only considers a single step, namely 2, to reach 2. This is why 1.0 and 1 are different in my analogy

    • @unvergebeneid
      @unvergebeneid 1 month ago

      @@leeris19 hm okay, I still don't quite get it. I mean, with gradient descent you can _choose_ a step size. Or do you mean the gradient itself being quantised instead of continuous?

    • @strangelaw6384
      @strangelaw6384 1 month ago

      @@unvergebeneid I think what they mean is that gradient descent would converge faster when applied to a discretized function than a continuous function because there would be fewer local minima?

  •  1 month ago +4

    This is very interesting for me. I have come to realize the importance of discrete representations from a totally different perspective. For my cognitive science PhD thesis (which I will present in a few days, wish me luck :D) I was doing experiments on artificial grammar learning and generalization with various architectures. Only memory structures that allow recording to memory as discrete (and addressable) representations allow truly learning complex grammars.
    Other architectures decrease the error, but when you investigate out-of-distribution data with longer sequence lengths than in training, they fail. The only way to achieve this is to learn the rules. And to apply the rules and store representations, my opinion is that the feasible and learnable way is to have discrete representations that are not vaguely meshed into each other but can be distinctly and separately recalled.
    IMHO this opens up the possibilities of neurosymbolic processing, which is an important part of our syntactic and higher-level cognitive capabilities.
    For sensory processing, parallel processing and non-discrete representations are OK and work well. When it comes to higher-level cognitive stuff, we need to process sequentially, step by step, almost like a computer doing operations one at a time. And we have to be able to apply operations in an instruction-like way with discrete representations, so that they are basically variables for the brain.

    •  1 month ago

      A little disclaimer: I might have a slightly different definition of what a discrete representation is, but it's still interesting :)

  • @ryanstout22
    @ryanstout22 1 month ago +2

    Great work! I remember looking for papers a few years ago to explain why discrete representations outperformed continuous ones in many situations. There wasn't much research on it at the time. (I did a few experiments and came to the same conclusion: it enabled a model with fewer parameters to learn more efficiently, but if you weren't parameter-constrained they would both get to the same point.) Looking forward to reading the paper!

  • @joshuasonnen5982
    @joshuasonnen5982 1 month ago +1

    THANK YOU! You explained model-based RL better than anyone else! I've been trying to learn this for a month

  • @AIShipped
    @AIShipped 1 month ago +1

    Please make more of this type of video on this type of subject. One of the first times I was truly interested in a video

  • @keshamix_
    @keshamix_ 25 days ago

    Simply wow! An amazing video. I’m not really in the field, not even a uni student yet, just curious about technology, but I feel like I understood most of what was mentioned there! And it only proves the point that you are an amazing narrator, good job!

  • @jordynfinity
    @jordynfinity 9 days ago

    I just subscribed because I'd love to hear more about your RL research.

  • @denizcanbay6312
    @denizcanbay6312 1 month ago +3

    Feels like the curse of dimensionality: fewer numbers to tweak in the discrete case, so it's faster to converge to the model

  • @ElijahGalahad
    @ElijahGalahad 1 month ago

    Thanks for sharing such an interesting exploration of a few problems that also intrigued me. I quite like the part where you mention that "you never know the secret sauce of xxx", where xxx refers to some great/famous research. Thanks for your endeavour in finding out "the secret sauce"!

  • @OscarTheStrategist
    @OscarTheStrategist 1 month ago +1

    Sort of like real life :)
    Thanks for the vid. Wonderfully explained!

  • @ivandeneriev7500
    @ivandeneriev7500 1 month ago +2

    This seems like such a simple question:
    Continuous models can be richer than discrete ones but need more data to work

  • @feifeizhang7757
    @feifeizhang7757 23 days ago

    What a great topic! It is just what I needed to hear ❤

  • @apeguy21
    @apeguy21 1 month ago +2

    Hey, first of all, congrats on the paper; the content is very interesting.
    Just wanted to say that at the end I noticed a slight typo in your plot titles regarding sparsity, where both plots have "Sparisty" instead of "Sparsity". Not a big deal, but I thought I would mention it since I noticed it. Great work.

  • @fiNitEarth
    @fiNitEarth 1 month ago +4

    What a fascinating topic! I’m finishing my master’s in statistics and data science in a year and I’m thinking about pursuing a PhD in representation learning and reinforcement learning! What universities do research in this field?

  • @akkokagari7255
    @akkokagari7255 1 month ago

    THANK YOU SO MUCH
    I figured it probably existed but I never knew the concept was called "model-based reinforcement learning"
    Cool video

  • @CC-1.
    @CC-1. 1 month ago +1

    It’s similar to using maximum and minimum functions in mathematics for various tasks (if your counter and the button that increases the counter are offset, you can use the maximum value of the counter to get the result). Instead of an offset, you might use clamping effects, where values are restricted to 0 or 1 rather than more precise values. Given that the environment may introduce noise, especially for modal values, it could be easier to obtain values, though it might be coincidental. Additionally, editing fewer values is typically easier than editing many. While continuous adjustments can lead to better learning over time, it takes longer due to the complexity of optimizing many digits.
    Here by noise I mean pseudo-noise, possibly

  • @CppExpedition
    @CppExpedition 1 month ago +1

    In this video I have learnt that subscribing to Edan Meyer would give me the reward I expect

  • @LatelierdArmand
    @LatelierdArmand 10 days ago

    super interesting!! thanks for making a video about it

  • @richardbloemenkamp8532
    @richardbloemenkamp8532 1 month ago

    Great work, great approach, great video. Very intriguing where the difference really originates from. I always like mini-grid approaches.

  • @MaxGuides
    @MaxGuides 1 month ago +1

    Excellent work, love your simple explanations.

  • @bobsmithy3103
    @bobsmithy3103 1 month ago +1

    Hmm, very cool research. I enjoyed your explanations and graphics; they made it really easy to understand even with only cursory ML knowledge. I'll have to keep in mind to test discrete representations in the future.

  • @pauljones9150
    @pauljones9150 1 month ago +1

    Very cool! Love the pencil on paper style

  • @coopercoldwell
    @coopercoldwell 1 month ago

    This reminds me of Chris Olah’s “Toy Models of Superposition” paper, where the claim is made that sparsity increases a model’s propensity to “superimpose” features to represent them more compactly, which makes the model behave like a much larger model.

  • @eliasf.fyksen5838
    @eliasf.fyksen5838 1 month ago +2

    This was very interesting, although I would love to hear your take on how stochasticity in the environment might change this. My initial thought is that when there is stochasticity, a perfect continuous world model is impossible (or at least very hard to learn, as you would need to learn a continuous distribution), so it will attempt to make average predictions of next states, and over time it will "bleed" out into a blurry state. Discrete representations, however, have less of this problem, since you can easily sample the distribution for the next state. This seems to be an inherent advantage of discrete latents when simulating trajectories IMO, but I might be wrong... :P

    • @EdanMeyer
      @EdanMeyer 1 month ago

      This is a great thought! It's the same thing I thought of when I started looking into discrete representations, and why I thought they might be great for stochastic environments. As it turns out, there is still a problem, because in a world model the discrete state variables are dependent on each other (unless you explicitly learn them in a way where this is not the case, which is very hard). If you sample them independently, that leads to problems (see the sketch after this thread).

    • @eliasf.fyksen5838
      @eliasf.fyksen5838 1 month ago

      @@EdanMeyer So cool that someone else seems to be thinking along the same lines. I've thought about this before, and it has always appeared to me that this is one of the primary purposes of the KL divergence term in the Dreamer architecture: to attempt to make the different one-hot vectors in the state independent of each other, as this would maximise the predictability of each distribution independently
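
      A minimal, illustrative sketch of the independent-sampling issue discussed in this thread (not code from the paper), assuming a DreamerV3-style latent of 32 categorical variables with 32 classes each:

      import numpy as np

      rng = np.random.default_rng(0)

      # Hypothetical world-model output: one set of logits per latent variable.
      logits = rng.normal(size=(32, 32))           # (num_variables, num_classes)
      probs = np.exp(logits - logits.max(axis=1, keepdims=True))
      probs /= probs.sum(axis=1, keepdims=True)    # per-variable softmax

      # Sampling each variable from its own marginal distribution. This is the
      # problematic step: the 32 variables are generally dependent on each other,
      # so independent samples can combine values that never co-occur in the
      # actual environment.
      sample = np.array([rng.choice(32, p=p) for p in probs])
      print(sample)  # 32 integers, each in [0, 32)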

  • @rockapedra1130
    @rockapedra1130 1 month ago +1

    Very cool and clear video. Loved it!

  • @Marksman560
    @Marksman560 1 month ago

    Quality work dude! Good luck on your future endeavors ♥

  • @SirajFlorida
    @SirajFlorida 17 days ago

    Great work man. 🙌

  • @AnalKumar02
    @AnalKumar02 1 month ago

    This is an excellent video showcasing your research. I wish more people made such videos of their papers (I know I am going to once my paper is published).

  • @BlissfulBasilisk
    @BlissfulBasilisk 1 month ago

    Super cool! Can’t wait to check out the paper to get a better understanding. I’ve been reading Anthropic’s work on feature analysis with SAEs for transformer interpretability, and this has so many similarities!

  • @h.h.c466
    @h.h.c466 1 month ago +1

    It's fascinating how concepts from RL can mirror strategies used by successful leaders and executives, isn't it? Like VAEs in the world of deep learning, effective leaders also excel at balancing detail with abstraction; or it may be that they shrink the world model to fit their stage :-)

  • @deror007
    @deror007 1 month ago +1

    I find this inspiring. Good Work!

  • @chrisschrumm6467
    @chrisschrumm6467 1 month ago

    Very interesting. Moving RL to the real world is difficult but I feel like this research moves us one step closer toward that objective.

  • @albertwang5974
    @albertwang5974 1 month ago +1

    Yes, for some small-scale cases you can do better with discrete representations, but for most cases continuous representations will win, since they can represent more states while consuming the same computing resources!

    • @EdanMeyer
      @EdanMeyer 1 month ago

      Not necessarily. Even just with 32 discrete variables, each with 32 possible values, you can represent 32^32 ≈ 10^48 possible states. That's kind of the point of the work: it's not clear that continuous is always better.
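
      A quick back-of-the-envelope check of that count, in plain Python:

      # 32 categorical variables, each taking one of 32 values.
      num_vars, num_vals = 32, 32
      states = num_vals ** num_vars
      print(f"{states:.2e}")  # ~1.46e+48, i.e. on the order of 10^48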

  • @ladyravendale1
    @ladyravendale1 1 month ago

    Very interesting video, I enjoyed it a ton

  • @AleixSalvador
    @AleixSalvador 1 month ago

    It may be that, since the set of possible encoders for the discrete case is much smaller (36^n vs 42000000000^n), the optimization algorithms run much faster. In any case, great video and amazing work.

  • @ninjalutador8761
    @ninjalutador8761 1 month ago

    If you're still doing research, it could be interesting to incorporate the findings from the Mamba paper, since they address the complexity and vanishing/exploding gradient problems of RNNs. Maybe either PPO or the world model could perform better with long-term memory.

  • @greg7633
    @greg7633 1 month ago +1

    Really great work!🚀

  • @yensteel
    @yensteel 28 days ago

    Never imagined how critical Sudoku is to reinforcement learning!

  • @username.9421
    @username.9421 28 days ago +1

    I have thought about the same question you posed in the research paper before, and wondered if the performance of one-hot discrete representations might arise from the smaller distribution of inputs for the actor.
    Have you thought about seeing what happens if you were to normalise the continuous autoencoder's output to length 1? This would allow the autoencoder to utilise the continuous space of each dimension, whilst also reducing variation in the actual scalars when two latent vectors are similar.
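
    A minimal sketch of the normalisation idea suggested above (illustrative only; normalize_latent is a hypothetical helper): project each continuous latent onto the unit sphere, keeping its direction but removing its scale.

    import numpy as np

    def normalize_latent(z, eps=1e-8):
        """L2-normalise each latent vector in a batch of shape (batch, latent_dim)."""
        norm = np.linalg.norm(z, axis=-1, keepdims=True)
        return z / (norm + eps)

    z = np.random.randn(4, 16)              # hypothetical batch of continuous latents
    z_unit = normalize_latent(z)
    print(np.linalg.norm(z_unit, axis=-1))  # all ~1.0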

  • @Talec-7
    @Talec-7 1 month ago

    I don't know much, but I think it's not because of a lack of model capacity; it's because an integer representation fits more neatly into the binary environment. With a continuous representation small errors compound, but with integers the errors don't compound, because they are not off by 0.01, they are exactly 0 or 1. As you add more capacity, the models are able to overfit and ignore this integer benefit.

  • @emmettraymond8058
    @emmettraymond8058 1 month ago

    I haven't finished the video, but I'd like to hazard a preemptive guess for why discrete representations are better: decisiveness and cleaner categorization. The model might more cleanly 'switch modes' given fewer, clearer options, and if a neuron has to set a threshold for specific behavior, it's probably easier with less information to sift through.

  • @raimondomancinelli2654
    @raimondomancinelli2654 1 month ago +1

    Great video! I would like to read your thesis!

  • @rrogerx92III
    @rrogerx92III 12 days ago

    Great video!!

  • @VincentKun
    @VincentKun 1 month ago +1

    This video made my policy get better rewards when I think to enroll in a PhD.

  • @____2080_____
    @____2080_____ 1 month ago

    8:05 I think your research here is onto something.
    If you apply this to humans, we have a discrete way of learning. We do tasks without worrying about how we’re breathing. We don’t try to boil the ocean with our consciousness, trying to have this ultimate omnipresent awareness. We are so discrete, we don’t even listen to loved ones talking.
    You’ve essentially eliminated the continuous model’s tendency to boil the ocean. If it’s simply a basic task of getting the key to open the door, you’ve helped it focus. A continuous model trying to do the same thing is noticing the texture of every spinning electron it can perceive at every step it makes in the universe, and is busy trying to count them while it is trying to open the door and find the key.
    That is a better analogy.

  • @ALPAGUN77
    @ALPAGUN77 1 month ago +1

    Super interesting stuff and awesome presentation!
    Two questions:
    1. Are you using nominal (unordered) or ordinal (ordered) states in your discrete representation?
    2. Did you play around with the number of discrete states in the encoding?
    My hypothesis would be that the discrete approach should approach the continuous one with increasing size of the latent space (for ordinal encoding that is)

    • @EdanMeyer
      @EdanMeyer 1 month ago

      1. Ordinal
      2. Yes, increasing the size of the latent space does not necessarily help. There is a sweet spot.

  • @swannschilling474
    @swannschilling474 1 month ago

    Thanks a lot for this one!!

  • @younesselbrag
    @younesselbrag 1 month ago

    Thank you for sharing such educational content!!
    I would like to see more about RL (self-play) for LLM agents and how they’re merged

  • @byrnesy924
    @byrnesy924 1 month ago

    Wow, incredible video. I’ve got a hobby project implementing DQN reinforcement learning for a task (going poorly haha); I would be fascinated to compare it to the architecture in your work

  • @Jack-tz7wj
    @Jack-tz7wj 1 month ago +1

    The comparison between discrete and continuous learning rates reminds me a lot of the work done on comparing the kinetics of chemical reactions. I'm wondering if there is an equivalence where certain problems/world modelling that are "thermodynamically" gated or "kinetically" gated which would map onto discrete and continuous methodology.

  • @JohnDlugosz
    @JohnDlugosz 1 month ago

    I have to point out that the concrete representation of a floating-point variable in the computer is not actually continuous. It has the same number of discrete states as an integer of the same length (fewer, actually, due to reserved states), just spread out over a larger domain. It's really a difference of _linear_ vs _log_ spacing.
    The floating-point representation might not have enough integers packed around 0. As soon as you exceed 2^24, it starts to *round off* and you lose exact representation. In contrast, the signed integer representation goes up to 2^31 without loss.
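
    A small numpy check of the 2^24 behaviour described above, for anyone who wants to see it concretely:

    import numpy as np

    x = np.float32(2**24)
    print(x + np.float32(1) == x)   # True: 16777217 is not representable in float32
    print(np.float32(2**24 + 1))    # 16777216.0, the odd integer rounds away
    print(np.int32(2**31 - 1))      # 2147483647: a signed 32-bit int stores every
                                    # integer up to 2^31 - 1 exactly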

  • @noahchristensen3718
    @noahchristensen3718 25 days ago

    Just watched this, and I love the visuals. I would really like to know more about AI and its use in pattern recognition, because I have a lot of data that I find strenuous to analyze. Some patterns I can identify, but unfortunately my small brain doesn't have the recognition skills to understand the interplay between them. I'd like to train or build a model which will coherently describe where they are coming from. If OP reads this, I would love any information that brings me closer to understanding.

  • @niceshotapps1233
    @niceshotapps1233 10 days ago

    c) A discrete representation better represents the world because the world itself (the one you run your experiments on) is discrete.
    Basically, you introduce outside a priori knowledge about your problem into your solution through its architecture.
    A continuous representation has one more thing to learn about your toy world, namely that it's discrete ... possibly that's why it takes longer

  • @calcs001
    @calcs001 29 days ago

    fantastic! 🎉

  • @cmilkau
    @cmilkau 1 month ago

    For descriptions of the real world, we like to use discrete AND continuous variables.

  • @rukascool
    @rukascool 1 month ago +1

    easy sub, good stuff :)

  • @eveeeon341
    @eveeeon341 1 month ago

    This immediately makes me think of how human brains "prune" memories. While we don't have a good understanding of the brain, it's thought that sleep helps consolidate memories and knowledge by getting rid of less important information and solidifying the key points. This feels very akin to the stricter representation of a discrete model, in that it just throws away noise and ignores nuance that continuous models may capture.

  • @R.GrantD
    @R.GrantD 26 days ago

    It seems like it would be worth attempting to use a combination of discrete and continuous in real-world applications. Also, it may be that the better choice depends on which representation more naturally reflects the world being modeled.

  • @tuongnguyen9391
    @tuongnguyen9391 1 month ago +4

    Can you make a YouTube course on your own research?

    • @EdanMeyer
      @EdanMeyer 1 month ago +1

      I'll maybe consider making a course on how to do research after I've done a bit more research myself lol

    • @tuongnguyen9391
      @tuongnguyen9391 1 month ago

      @@EdanMeyer A primer on how RL works, in Josh Starmer's explanation style, might get more views and create more impact. What frustrates engineering folks the most is having to explain all this academic stuff to product managers who have no technical background.

  • @hanyanglee9018
    @hanyanglee9018 1 month ago

    The magic comes from the noise.

  • @WernerBeroux
    @WernerBeroux 28 days ago

    Most LLMs also try reducing quantisation while keeping the scale, as opposed to reducing the scale but keeping the quants. The (simple) extreme of that is binary. It would be interesting to see whether binary is ideal for learning speed compared to, say, ternary.

  • @michaela.delacruzortiz7976
    @michaela.delacruzortiz7976 1 month ago

    0s and 1s are nice to manage and monitor. The code and algorithms to mangle (normalize and denormalize) 0s and 1s are less daunting and sometimes more efficient. Boiling data down to 0s and 1s takes a decent amount of work, but it's worth it because the AI can get a stronger hold on the data in less training time, or at least it feels that way. 0s and 1s are fairly precise, while floating-point values are less precise, and though this might not be a problem for the AI, it sometimes gets nasty trying to monitor the AI or a system of neural networks during training and validation. Until AGI arrives, the data gathering, training and validation process is very involved, and the simpler the numbers can be, the better. Even when AGI arrives, the challenge will be how to control its intake and output of misinformation and stuff like that. Discrete representation is kind of cool because it still works with relatively simple values, but it goes a layer of abstraction higher, so the AI can dumb down its output for us to see and understand rather than presenting more complicated binary matrices. Better monitoring. I can also see it as a nice way to normalize data, but then the AI has to be programmed so that it knows the values being trained on are just integers, and not floating-point values that happen to be rounded very neatly to whole values, so that it doesn't overthink itself into taking too long to train on relatively cookie-cutter data, because then that defeats the point of normalization.

  • @robodoctor
    @robodoctor 29 days ago

    This is pretty interesting. You've earned a subscriber :) Thanks for making this into a video and sharing it with the world!
    I had a question: could it be that policy learning from discrete representations is better because discrete representation learning encodes the environment dynamics faster than continuous representation learning? One way to verify this would be to plot the autoencoder loss for discrete and continuous representations.

  • @HaroldSchranz
    @HaroldSchranz 9 days ago

    Cool video on a field which I am curious enough about to want to dabble in. So I am just wondering, based on research on nonlinear models in a totally different field: sometimes less is more (more manageable and faster to converge). Dimensionality and parameter count do not need to be truly high to capture the essence of a model. Of course, the efficiency of the approach used will affect the effort required - kind of similar to how different methods of numerical integration, quadrature or Monte Carlo, can require adaptive and even importance sampling and coarse graining.

  • @jakeaustria5445
    @jakeaustria5445 1 month ago

    Really cool. I tried creating discrete neural networks myself hehe, but I only made a little progress.

  • @shouvikdey7078
    @shouvikdey7078 1 month ago +1

    Want to see more RL stuff.

    • @EdanMeyer
      @EdanMeyer 1 month ago +1

      There will be more

  • @phillipmorgankinney881
    @phillipmorgankinney881 7 days ago

    Really appreciate you making this video, I love autoencoders.
    Could the higher rate of learning on the discrete model be because the total space of representations for discrete models is dramatically smaller? If you imagine latent space modeling as some map of all possible representations to explore while looking for the best possible representation, then a space where every vector is N x 32 is a much, much smaller world than a space where every vector is N x 42,000,000,000. I imagine local optima are easier to stumble into.
    It's like looking at an image, 32px x 32px, on your 1080p monitor. In a flash you see it's an elephant, and you can find its trunk. But if that same picture were 42 billion x 42 billion, displayed at true scale on your 1080p monitor, and someone asked you to find the elephant's trunk... you're just too close to the elephant. You'll be doing a lot of scrolling around, following clues until you find what you're looking for.

  • @BooleanDisorder
    @BooleanDisorder 1 month ago

    It kinda feels like we're prodding the very nature of knowledge and behavior.
    I look forward to 1.58-bit byte-based multimodal models.

  • @01174755
    @01174755 10 days ago

    I was reading through the paper, and there's a particular sentence that I'm finding a bit challenging to fully grasp: 'Demonstrating that the successes of discrete representations are likely attributable to the choice of one-hot encoding rather than the “discreteness” of the representations themselves.'
    I was under the impression that one-hot encoding is a form of discreteness. If one-hot encoding is effective, wouldn't that imply that the 'discreteness' aspect also plays a role?
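
    For what it's worth, a tiny illustration (my own hypothetical example, not from the paper) of the two encodings that sentence contrasts; both carry the same discrete value, but only the second is one-hot:

    import numpy as np

    num_classes = 8
    value = 3  # a discrete latent value

    index_encoding = np.array([value], dtype=np.float32)             # discrete, fed as a raw index
    one_hot_encoding = np.eye(num_classes, dtype=np.float32)[value]  # discrete AND one-hot

    print(index_encoding)    # [3.]
    print(one_hot_encoding)  # [0. 0. 0. 1. 0. 0. 0. 0.]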

  • @markdatton1348
    @markdatton1348 28 days ago

    In order to represent the world better, could you make a relatively small but very high dimensional tensor network, to represent sort of "tiers" of outcomes? For example, one dimension of the tensor may represent how happy a result should be, ranging from unhappy to happy. Or angry. Etc. In that way, you could modify the interpretation of the world via dimensionality rather than pure scale?

  • @vslaykovsky
    @vslaykovsky 1 month ago

    Edan, this is a great discovery! I have two questions:
    - Do you think quantized models have some traits of discrete representation models and therefore perform better than continuous models of the same size?
    - Could discrete representations act as a bridge between continuous models and neurosymbolic models?

  • @duttaoindril
    @duttaoindril 1 month ago

    If I understand this correctly, discrete representation learning isn't hands-down better than continuous representation learning, but it is better in specific scenarios, especially with high state dimensionality requiring high adaptability.
    I'd be curious to see if there is an in-between of discrete and continuous, maybe where you use both side by side in the world modeling, to get the benefits of both for learning?

    • @EdanMeyer
      @EdanMeyer 1 month ago

      We do use a sort of in-between in the paper, called Fuzzy Tiling Activations. And as for when one is better, that goes beyond the scope of what we looked at in the paper, but it's an important question.

  • @corgirun7892
    @corgirun7892 1 month ago

    nice work

  • @VictorGallagherCarvings
    @VictorGallagherCarvings 1 month ago +1

    Ok, I have another paper to read.

  • @judepuddicombe8748
    @judepuddicombe8748 1 month ago

    Maybe snapping to discrete representations means the prediction accumulates less noise?

  • @relapse7545
    @relapse7545 1 month ago +1

    Amazing video as always!
    I have a follow-up question I'd like to ask. How does the gap between discrete and continuous representations change when learning the policy for a VQ-VAE that has a larger embedding table, i.e. larger than the 32 you mentioned?

    • @omrisharon
      @omrisharon 1 month ago

      It is equivalent to having more capacity. He took the number 32 because in DreamerV3 they use 32 vectors of length 32 for the world representation.

    • @EdanMeyer
      @EdanMeyer 1 month ago +1

      There is an ablation on this in the appendix of the paper. There is a sweet spot for each task, and 32 was the sweet spot here. More or less than 32 performs worse. On other tasks it would be different.
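
      For readers less familiar with VQ-VAEs, a minimal sketch (illustrative, not the paper's code) of what the embedding table does: each continuous encoder output is snapped to its nearest codebook entry. The table size of 32 and the latent dimension of 16 are just assumptions for the example.

      import numpy as np

      rng = np.random.default_rng(0)

      codebook_size, dim = 32, 16
      codebook = rng.normal(size=(codebook_size, dim))   # the (learnable) embedding table

      z_e = rng.normal(size=(4, dim))                    # hypothetical continuous encoder outputs

      # Nearest-neighbour lookup: squared distance from each z_e to every codebook entry.
      dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
      indices = dists.argmin(axis=1)   # discrete codes, one per encoder output
      z_q = codebook[indices]          # quantised latents that get passed to the decoder

      print(indices)                   # four integers in [0, 32)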

  • @marcomonti5758
    @marcomonti5758 1 month ago

    cool video!

  • @acyclone
    @acyclone 1 month ago

    Great subject to research! I’ll check out your paper soon. Doesn’t the discrete method prevail more often due to the fact that the step size (while learning) is bigger by default?

  • @ElijahGalahad
    @ElijahGalahad 1 month ago

    My takeaway: a discrete latent space models the world more accurately with less capacity, and enables faster adaptation of the policy learned on it.
    It would be better if you shed more light on why discrete latent space models have such benefits, or provided some links. Thanks!

  • @andydataguy
    @andydataguy 1 month ago

    The blinding paper at 13:24 gave my pupils 3rd degree burns

  • @ilyboc
    @ilyboc 1 month ago

    Congratulations!

  • @grapesurgeon
    @grapesurgeon 1 month ago

    Congrats!

  • @serkhetreo2489
    @serkhetreo2489 1 month ago

    Congratulations

  • @CharlesVanNoland
    @CharlesVanNoland 1 month ago +2

    Where do I click the Policy button to subscribe???

  • @lucar6897
    @lucar6897 1 month ago +1

    In the graph shown at 13:07, the discrete representation learns faster than the continuous representation even before changing the layout of the puzzle. Is this because we are still in the early stages of learning (eg time step 1 in the graph shown at 11:08)? If so, does the discrete representation still learn faster if you increase the time the model has before changing the puzzle (eg let it get up to time step 4-5 where continuous representation performed better in the graph at 11:08)?

  • @arngorf
    @arngorf 1 month ago +1

    As a computer scientist (and PhD) focusing on numerical methods, image analysis, etc., I was initially a bit dissatisfied by how the ML field seemingly did not understand why stuff worked the way it did, but just threw more data at larger models, using the 572nd attempted model architecture. It is nice to see that the past 5 years have increasingly been spent trying to understand better what happens behind the big pile of linear algebra. Your research is among the most interesting in the field.
    When I first saw the Dreamer v3 network and its discrete world representation, I too was a bit confused, because it seems to defy logic. Surely a continuous space can represent more, so it must be more powerful. Well, as has been the case many times, there are many moving parts in a neural network, and intuition might be right in one place, but a competing factor outweighs it, making it not dominant in the particular space you are working in, so you need to look elsewhere for improvement.
    Similarly, larger networks are not always better, because your loss function becomes harder to optimize, forcing you to also change the model architecture, both to better model a particular problem and to improve the performance of the optimizer by modifying the loss landscape.
    It's nice to be able to look at the ML field now and see interesting results as something we have yet to understand. It is now more like a gold rush than just a group of simpletons throwing linear algebra at a ton of internet-scraped data.
    Nice work sir!