don't lru_cache methods! (intermediate) anthony explains

  • Published: 11 Jan 2022
  • today I show a common pitfall with `lru_cache` and how it will almost always be a memory leak if used on a method!
    - what is lru_cache: • python: functools.lru_...
    - what is a decorator: • python @decorators - (...
    - pytest lru_cache performance regression: • pathlib is slow! false...
    playlist: • anthony explains
    ==========
    twitch: / anthonywritescode
    discord: / discord
    twitter: / codewithanthony
    github: github.com/asottile
    stream github: github.com/anthonywritescode
    I won't ask for subscriptions / likes / comments in videos but it really helps the channel. If you have any suggestions or things you'd like to see please comment below!
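To make the pitfall concrete, here is a minimal sketch (illustrative names, not code from the video): `functools.lru_cache` on a method stores `self` inside every cache key, so the class-level cache keeps each instance alive indefinitely.

```python
import functools
import weakref


class Slow:
    @functools.lru_cache(maxsize=None)  # cache lives on the class, keyed by (self, x)
    def compute(self, x):
        return x * 2


obj = Slow()
ref = weakref.ref(obj)
obj.compute(3)

del obj
# the class-level cache still holds `obj` as part of a cache key,
# so the instance is never garbage collected -- the weakref stays alive
print(ref() is not None)  # True: the instance "leaked"
```

With a plain (uncached) method, `ref()` would return None after the `del`.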

Comments • 65

  • @ponysmallhorse • 2 years ago +17

    THANK YOU!!!! Found a memory leak in very old script.

  • @magnuscarlsson6785 • 2 years ago +11

    Another great video!
    And a special thanks for showing why these unexpected things happen, like how the `_` keeps the garbage collector away.
    I had forgotten about this while watching, so I was right there with you ;-)

    • @anthonywritescode • 2 years ago +3

      yep -- I have another video about that as well: ruclips.net/video/VKz1aQbNnyI/видео.html

  • @kvetter • 1 year ago +12

    Your example has another problem with @cache on a method: if you change the value of self.y, then the cached value will be incorrect.
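A sketch of that staleness problem: the cache key is `(self, x)`, and the default `__hash__` is identity-based, so mutating `self.y` does not invalidate anything.

```python
import functools


class C:
    def __init__(self, y):
        self.y = y

    @functools.lru_cache(maxsize=None)
    def compute(self, x):
        return self.y + x


c = C(10)
first = c.compute(1)   # 11, cached under the key (c, 1)
c.y = 100
second = c.compute(1)  # still 11 -- the key (c, 1) is unchanged, so the stale value is returned
print(first, second)
```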

  • @cmyuii • 2 years ago +5

    wow that's a sneaky one - simple but something i hadn't considered - cheers for fixing my code yet again!

  • @sparkyb6 • 2 years ago +6

    At the end when you mentioned creating an object pool and doing some magic in __new__, I wondered whether I could also just stick a lru_cache on __new__ to do that. It worked, but I had to move the initialization also inside __new__, because if I just call the superclass __new__ and leave initialization to __init__, even though __new__ will return the same object each time (for the same y), it will re-call __init__ on it and replace that inner lru_cache (self.compute). Just thought that was interesting.
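A sketch of what this comment describes (illustrative; note that the pool cache itself now holds every instance forever, so this trades the method-cache leak for a deliberate object pool):

```python
import functools


class C:
    @functools.lru_cache(maxsize=None)  # pool: one instance per distinct y
    def __new__(cls, y):
        self = super().__new__(cls)
        # initialize here: if __init__ existed it would be re-run on every
        # C(y) call, clobbering any per-instance state
        self.y = y
        return self


a = C(1)
b = C(1)
c = C(2)
print(a is b, a is c)  # same y -> the pooled object; different y -> a new one
```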

    • @sparkyb6 • 2 years ago +2

      apologies for what YouTube did to my underscores

    • @anthonywritescode • 2 years ago +1

      lol yeah youtube really hates underscores. the `__new__` / `__init__` thing is kind of annoying, I haven't found a nice way to work around it yet :(

    • @yehoshualevine • 1 year ago

      @@anthonywritescode ___init___ with triple underscore to cause youtube to print two (and understand the 3rd as markdown)

  • @jbrnds • 2 years ago +2

    So factoring out compute as a separate function `compute(y, x)` and decorating that with lru_cache will work correctly, and even speeds things up across instances. The `C.compute(x)` method will just return `compute(self.y, x)`.

    • @anthonywritescode • 2 years ago +2

      yes, that is precisely what I said in the video

    • @jbrnds • 2 years ago +2

      @@anthonywritescode perfect! Just wanted to be sure if i understood correctly. Thanks for the great videos you are always making and your humble energy. You are a great explainer and i use your maintained projects daily. A deep bow.

  • @tobb10001 • 2 years ago +5

    Another solution would be to create a static function to do the computation and put @lru_cache on that. So the actual method would only pass the needed values to the cached function instead of the whole object.
    Would allow sharing cache between multiple instances, but would remove the ability to modify the object.
    Or am I missing something completely here? 😅

    • @anthonywritescode • 2 years ago

      yep that would work -- and is one of the alternatives I outlined in the video

    • @tobb10001 • 2 years ago

      @@anthonywritescode then I must have missed it, my bad. 😄

    • @anthonywritescode • 2 years ago

      heh yeah the subtlety being that a static class function and a module function aren't really any different :)

  • @rdean150 • 1 year ago +4

    Clever solution to assign the function as an instance variable in the __init__. I usually just take the other approach you describe - define the decorated function at module level and have the instance method call it with the relevant attribute values. Your other approach is clever but given the shared nature of the module-level function solution, it seems the simplest solution is probably still the way to go. Not to mention that it helps ensure my primarily-dotnet-coding teammates can still understand what the code is doing. *sigh*
    Anyway, thanks for the good tip. I appreciate that you cover advanced Python topics. Most coding channels seem to cater primarily to beginners, which makes sense but is disappointing for the folks with some years under their belt already.
    Cheers!

    • @anthonywritescode • 1 year ago +2

      the problem with your approach is the cached value outlives the lifetime of the object instance -- there's a very specific reason to make it instance-cached

    • @rdean150 • 1 year ago

      @@anthonywritescode But if the cache keys depend only on simple values, the fact that the cached value outlives whatever object may have originally requested it is not a problem at all. Just because object_a requested a value originally, it does not mean that cached value is unique to object_a or that object_b cannot request the same value even after object_a has been garbage collected. As long as you are requesting the values via
      memoizedfunc(object_a.val1, val2)
      rather than
      memoizedfunc(object_a, val2)
      Then the lifespan of object_a is not particularly relevant.
      If the values ARE unique to the individual object that requested it, then yeah it doesn't make sense to use a shared module-level cache, as there will never actually be any sharing of the values that cache contains, and it may end up evicting values even while they are still needed if you set a maxsize on the cache. Which it sounds like may have been the situation in your use-case and definitely a consideration when applying these principles. So, fair point!

    • @anthonywritescode • 1 year ago +1

      your situation sounds more like it shouldn't have been a method to begin with

    • @rdean150 • 1 year ago

      @@anthonywritescode Lol yeah that's true, and I didn't write it as such. But given the simplicity of the caching decorators, it is certainly easy to imagine people just slapping the decorator on a method anyway without thinking for very long about it. After all, if they had thought about it much, they would have recognized the reference count implications of self being used in a cache key also. Assuming that they understood how the decorators work, of course. Which may or may not be more of a stretch than the developer thinking about whether the cached values could/should be used by other instances. TBH I'm not sure which is more likely.

  • @alexandreboisselet8336 • 2 years ago +4

    Thank you for the great explanation 🙏
    How well does caching work when the compute has **kwargs?

    • @anthonywritescode • 2 years ago +3

      it caches using the name and value of each named argument: github.com/python/cpython/blob/8c49d057bf8618208d4ed67c9caecbfa71f7a2d0/Lib/functools.py#L462-L470
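In other words, keyword arguments become part of the key by name and value, so spelling a call differently produces a separate cache entry. A small sketch:

```python
import functools


@functools.lru_cache(maxsize=None)
def f(a, **kwargs):
    return a + sum(kwargs.values())


f(1, b=2)
f(1, b=2)  # hit: same keyword names and values
f(1, c=2)  # miss: different keyword name, even though the result is equal
info = f.cache_info()
print(info.hits, info.misses)  # 1 2
```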

  • @kramstyles • 3 months ago +1

    How on earth does someone know so much? How can I attain this level of expertise?

  • @siddsp02 • 2 years ago +4

    Why not use weakref in this case, and construct a caching function using weakdicts?

    • @anthonywritescode • 2 years ago

      you certainly could -- but many objects are not weak referenceable so the caching mechanism wouldn't be that useful

    • @siddsp02 • 2 years ago

      @@anthonywritescode Fair enough. I think the standard library specifically includes weak methods (though I haven't read into it), so maybe something could be possible. This was an interesting video!

    • @anthonywritescode • 2 years ago

      you would have to weakly reference all the called _parameters_ -- not the method itself (after all, the tuple of parameters is what's used to make a cache key)

  • @sopidf • 2 years ago +1

    Great video, thank you! Why do you show your keyboard and hands?

    • @anthonywritescode • 2 years ago

      I also stream on twitch and it's fun -- I used to toggle the scene when I'd record for youtube but I'd always forget so I just keep it now

  • @unvergebeneid • 2 years ago +1

    Oof, what a gotcha! I never would've guessed this behavior!

  • @ZephyrNX9 • 2 years ago +1

    So these classes would get garbage collected at the end of the program? Or would it memory leak even after Python exits?

    • @anthonywritescode • 2 years ago +1

      you can't really leak memory outside of your program -- when the program ends the memory space is torn down

  • @lord_toad • 10 months ago

    No one ever uses indefinite caching.. It's like saying don't use loops "while 1" because they'll run forever.. duuuh

  • @Tyokok • 2 years ago

    Thanks a lot for great video!

  • @xan1716 • 2 years ago

    I was thinking an exception to this rule might be cached classmethods. Unlikely to blow up the memory since classes typically are created at parse time, right?

    • @xan1716 • 2 years ago

      and they'll probably not be garbage collected till the end of the program, anyways (unless a class is defined in a closure, or something)

    • @anthonywritescode • 2 years ago

      they are still descriptors so you're going to get the instance passed through them (if it's called on the instance) and that's what'll get cached

    • @xan1716 • 2 years ago

      @@anthonywritescode woah -- that had not occurred to me! tricky, tricky stuff.. :)

  • @RoyAAD • 3 days ago

    Was this fixed with @cache in 3.12? Cause it seems not to have a maxsize argument.

    • @anthonywritescode • 3 days ago +1

      cache is just a shorthand for maxsize=None
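That equivalence is easy to check: `functools.cache` (3.9+) is `lru_cache(maxsize=None)`, i.e. unbounded, so the method pitfall is unchanged (arguably worse, since nothing is ever evicted).

```python
import functools


@functools.cache  # same as @functools.lru_cache(maxsize=None)
def f(x):
    return x * 2


f(3)
print(f.cache_info().maxsize)  # None -> unbounded cache
```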

    • @RoyAAD • 3 days ago

      Is this a problem also for functions?
      And did you do the sequel to solve the each instance cache? If yes can you put the link in the description please?

    • @anthonywritescode • 3 days ago

      I would hope the first part is pretty clear from the video explaining _why_ this is a problem (and why or why not that applies to plain functions). I didn't follow up with that but it basically involves calling lru_cache in `__init__`
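A sketch of that per-instance approach (illustrative; the instance and its cache form a reference cycle, which Python's cycle collector can still reclaim):

```python
import functools


class C:
    def __init__(self, y):
        self.y = y
        # wrap the bound method in a cache stored ON the instance, so the
        # cache goes away together with the instance instead of leaking
        self.compute = functools.lru_cache(maxsize=None)(self._compute)

    def _compute(self, x):
        return self.y + x


c = C(10)
print(c.compute(1), c.compute(1))  # 11 11 -- the second call hits the per-instance cache
```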

    • @RoyAAD • 3 days ago

      ​@@anthonywritescode Yes. Your videos are one of the best on python. I always learn something new.

  • @abdelghafourfid8216 • 1 year ago

    Why did the underscore variable not get re-assigned to None?

  • @sadhlife • 2 years ago

    Instead of __new__ I'd probably use a class decorator with another lru cache:

        from functools import cache

        def classcache(cls):
            @cache
            def wrapper(*a):
                return cls(*a)
            return wrapper

        @classcache
        class C:
            def __init__(self, x):
                self.x = x
                print("made new")

        print(C(1))
        print(C(1))

  • @arnoldwolfstein • 2 years ago +1

    Thanks for the video Anthony. I'm just wondering; whether you're using Ubuntu in VM or on a host (main or dual boot)

    • @arnoldwolfstein • 2 years ago

      Probably in a VM, I just saw your VM video.

    • @anthonywritescode • 2 years ago +1

      on this machine (and most of the things I actively develop on) I'm in a VM -- though I did have a dual booted macbook at my last job

    • @arnoldwolfstein • 2 years ago

      @@anthonywritescode Great; I'm in a similar situation: running in a VM or dual-booting on a macOS host. Based on your experience -- which I can totally count on :) -- which option would you prefer?

    • @arnoldwolfstein • 2 years ago

      I mean for most cases you are using VM as you said, but do you see any performance difference?

    • @anthonywritescode • 2 years ago

      getting linux to run on a modern mac is a ton of work -- a VM is much much easier. as for performance, most of the difference is in io as that has to be virtualized -- but the cpu usage usually has direct hardware support and doesn't really suffer from being in a VM -- this is the steps I used last time I dual booted, but that was back in 2015: github.com/asottile/scratch/wiki/Ubuntu-on-MBP

  • @OrCarmi • 2 years ago +2

    Great video! This is a pretty big gotcha, I'd expect a warning about this in python docs

    • @petertillemans2231 • 1 year ago

      There is: this is from the docs:
      > In general, the LRU cache should only be used when you want to reuse previously computed values. Accordingly, it
      > doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each
      > call, or impure functions such as time() or random().

  • @StephenBuergler • 1 year ago

    Does python have weak references? If it did would it help here?

    • @anthonywritescode • 1 year ago +1

      only a small number of things are weak referenceable (and with significant overhead). strings for example aren't

  • @wexwexexort • 1 year ago

    Fantastic!

  • @NoProblem76 • 11 months ago

    oh no memory leak

  • @average_random_ant985 • 2 years ago +1

    You forgot to mention the cached_property decorator. It solves all of the issues raised, and it is the simplest solution.

    • @anthonywritescode • 2 years ago +4

      cached_property does not help because the function takes a parameter
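That limitation in a sketch: `cached_property` caches a single attribute per instance (stored in the instance `__dict__`, so nothing leaks), but it cannot accept the `x` parameter that a method like `compute(x)` needs.

```python
import functools


class C:
    def __init__(self, y):
        self.y = y

    @functools.cached_property
    def doubled(self):  # no parameters possible: it's attribute access, not a call
        return self.y * 2


c = C(5)
print(c.doubled)  # computed once and stored in c.__dict__
print(c.doubled)  # subsequent access just reads the stored value
```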

  • @akshaymestry971 • 2 years ago

    USEFUL GEM... 💠