ONE TERABYTE of RAM saved with a single line of code (advanced) anthony explains

  • Published: 3 Jul 2024
  • today I show off a small change I made at work with huge impact and explain how it works!
    - fork vs spawn • multiprocessing: fork(...
    - python sucks at copy-on-write • python sucks at copy-o...
    playlist: • anthony explains
    ==========
    twitch: / anthonywritescode
    discord: / discord
    twitter: / codewithanthony
    github: github.com/asottile
    stream github: github.com/anthonywritescode
    I won't ask for subscriptions / likes / comments in videos but it really helps the channel. If you have any suggestions or things you'd like to see please comment below!
  • Science

Comments • 78

  • @htol78
    @htol78 5 months ago +46

    The thing I would really enjoy is the troubleshooting process that led to this solution.

    • @anthonywritescode
      @anthonywritescode  5 months ago +25

      you'll want to check out next week's video then :)

  • @avapsilver
    @avapsilver 5 months ago +47

    i work at datadog and it's so cool seeing you use it and visualize everything nicely!!

    • @Slangs
      @Slangs 5 months ago +2

      It always feels good to see our product being used in the wild, even by major companies. Great job guys, amazing product

  • @myalpaca5
    @myalpaca5 5 months ago +40

    How do you locate the place in the code where optimization is possible? Did you learn about gc.freeze() somewhere else first and then realize it could be used in the project? Or did you notice high memory usage for the services and then actively look for potential solutions and encounter gc.freeze()?

    • @anthonywritescode
      @anthonywritescode  5 months ago +23

      it depends on the framework and how things are set up. usually you want it as late in the parent process before forking as possible.
      I've known about this particular function for a while (even made a video on it a year or so ago). I'm currently trying to upgrade python and was hunting for a memory leak and decided to try this out for fun (and profit). had some success with this and similar approaches at previous employers
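A minimal sketch of the pattern described above (freeze as late in the parent as possible, then fork), assuming a toy POSIX parent process rather than any particular framework; `app_state` is a hypothetical stand-in for whatever the real application loads before forking:

```python
import gc
import os

# hypothetical stand-in for everything the parent loads before forking
app_state = {"config": {"debug": False}, "cache": [str(i) for i in range(100_000)]}

gc.disable()  # keep a collection from running between freeze() and fork()
gc.freeze()   # move all currently tracked objects to the permanent
              # generation, so child collections never touch their headers

pid = os.fork()
if pid == 0:
    # child: collect only what gets allocated after the fork
    gc.enable()
    os._exit(0)  # a real worker would process tasks here
else:
    os.waitpid(pid, 0)
```

The point is that the collector never walks (and therefore never writes to) the frozen objects in the children, so their pages can stay shared with the parent under copy-on-write.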

    • @lucaalberigo6302
      @lucaalberigo6302 5 months ago +7

      For me, locating a problem is usually a mix of debugging, experience (checking known bottlenecks for your application, e.g. disk access, API interactions, parsing of big data sources, DB queries), and benchmarking: running operations on different data to evaluate response times. You follow the data step by step until you hit a performance drop in a specific function (rarely is your whole chain of calls equally slow in all parts).
      The whole optimization process usually goes like this: optimization is needed for a certain piece of code because it is too slow or resource-consuming; we analyze the code to try to understand the cause of the issue (e.g. an inefficient algorithm, too much memory used, an operation slowed by too many API/database requests). We first try to just make the code better. If that's not sufficient, we apply known but more complex optimization methods (if appropriate) like caching or optimizing external interactions. If we are still not satisfied, we look for new solutions: studying existing libraries, checking whether we need new tools, or even restructuring part of the code/infrastructure.
      It is a set of skills you acquire with study (knowing the industry way to do something) and by knowing the tools at your disposal through reading your libraries' documentation; with time you build up a set of solutions, at least for many common problems.

  • @codeman99-dev
    @codeman99-dev 5 months ago +4

    Talk about some great numbers to add to the resume!

  • @redcrafterlppa303
    @redcrafterlppa303 5 months ago +33

    Isn't that why you generally avoid fork and use threads instead? All threads live in the same process, sharing the heap while having their own unique stacks.

    • @JohnZakaria
      @JohnZakaria 5 months ago +14

      But Python can't run with true parallelism when you use threads. Maybe the new subinterpreters will deliver a solution

    • @redcrafterlppa303
      @redcrafterlppa303 5 months ago +24

      @@JohnZakaria I would say that's a design flaw in the language. Just another reason to hate on python 😂

    • @JohnZakaria
      @JohnZakaria 5 months ago +7

      Python was designed at a time when single-core CPUs were the norm.
      Yeah, it might be a problem now.
      Yes, they could release Python 4 and break everything to make that work, but that's painful for everyone

    • @wernersmidt3298
      @wernersmidt3298 5 months ago +5

      ​@@JohnZakaria Wasn't there some news that they are going to remove the GIL?

    • @JohnZakaria
      @JohnZakaria 5 months ago +5

      You're right, I forgot about PEP 703.
      I think it was more for library devs.
      The PEP by itself wouldn't speed up code.
      If I remember correctly it would slow down regular code

  • @__Brandon__
    @__Brandon__ 5 months ago +1

    great work

  • @vinitkumar2923
    @vinitkumar2923 5 months ago +8

    Could we use this in any Django project that uses Celery, or is it only specific to Sentry?

  • @ember2081
    @ember2081 5 months ago +8

    you've got to be so proud of yourself jesus

  • @jlowe_n
    @jlowe_n 4 months ago

    Hey Anthony - I just found your last few videos and they have been great - I've been using memray, cProfile, and pystack a lot over the last year and it's good to see how other folks are using them.
    One question on gc.freeze() --- I've tried to recreate the standard Python behavior with CoW and fork with a basic example (load a handful of modules, fork, do some minor calculations, force gc.collect). Examining the shared and unique memory sets on Debian, I don't seem to be able to reproduce the issue in trivial cases.

    • @anthonywritescode
      @anthonywritescode  4 months ago

      it's impossible to tell without seeing your setup

  • @skreftc
    @skreftc 5 months ago +5

    This is a great video. Could you mention whether you saw a visible change in CPU usage and task latency?
    We implemented this at work and we did see a decrease in memory consumption, but CPU increased quite a bit, which also shows up as some tasks taking twice as long.

    • @anthonywritescode
      @anthonywritescode  5 months ago +2

      our CPU didn't change noticeably, if anything it improved a tiny bit (which is what I expect)

  • @pieter5466
    @pieter5466 5 months ago +14

    4:40 Oh this is cool, I really need to learn more about the C implementation underlying Python.
    edit: now I wonder how a circular garbage collector works...

    • @lonterel4704
      @lonterel4704 5 months ago +3

      Generational algorithm

    • @Barteks2x
      @Barteks2x 5 months ago +3

      I don't know for sure how it's implemented in Python, but in general a GC works not by deleting the stuff that needs to be deleted, but by attempting to find everything that is referenced and keeping that (by traversing the object graph and keeping everything that is reachable)
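That is a fair description of a tracing ("mark and sweep") collector; CPython itself instead uses reference counting plus a cycle detector over container objects. A toy sketch of the "keep everything reachable" idea, with a hypothetical `refs` list standing in for an object's outgoing references:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing references to other Obj instances

def mark(roots):
    """Mark phase: walk the object graph from the roots; anything never
    visited is unreachable and would be swept afterwards."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) not in seen:
            seen.add(id(obj))
            stack.extend(obj.refs)
    return seen

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)   # b is reachable through the root a
c.refs.append(a)   # c points at a, but nothing points at c: c is garbage
live = mark([a])   # contains a and b, not c
```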

    • @throwaway3227
      @throwaway3227 5 months ago +1

      It's not the way Python does it, but Floyd’s Cycle Finding Algorithm is a pretty interesting way of finding circular references.
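For the curious, the tortoise-and-hare is short enough to sketch on a plain linked list (again, this is not what CPython's cycle detector does; it is just the general algorithm):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def has_cycle(head):
    """Floyd's algorithm: advance slow by one and fast by two; the two
    pointers can meet again only if the chain of .next links loops."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False

straight = Node(1, Node(2, Node(3)))    # 1 -> 2 -> 3 -> None
looped = Node(1)
looped.next = Node(2, Node(3, looped))  # 3 points back at 1
```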

  • @Jorge86797
    @Jorge86797 5 months ago

    In my work I also noticed this (9:25): the block algorithm is specifically aligned for small-object optimizations.
    However, I need to optimize it for storing bigger objects: bytes and str objects with sizes up to 5-10 MB (to be precise, thousands of incoming and outgoing HTML responses), which as we know are immutable and require a contiguous block of that size to store.
    As a result I have a strange situation where the process has, for example, 50 MB of free RAM allocated in total, but since it doesn't have a single free contiguous block of 5 MB, it asks the OS to allocate more RAM. So I quickly run out of RAM while having a lot of free memory I can't use efficiently (all inside a single process).
    Where or how can I get more detailed info about this? In what direction should I look?

    • @anthonywritescode
      @anthonywritescode  5 months ago

      try jemalloc perhaps?

    • @Jorge86797
      @Jorge86797 5 months ago

      @@anthonywritescode Thank you for the advice. I will try that

  • @lonterel4704
    @lonterel4704 5 months ago +2

    I think you can also do this trick with gunicorn

    • @anthonywritescode
      @anthonywritescode  5 months ago +2

      yep! or really any prefork framework

    • @sepgh2216
      @sepgh2216 5 months ago

      Exactly why I came to the comments. Wondering if anyone has tried this on Gunicorn and saw the results.

  • @australianman8566
    @australianman8566 2 months ago

    how did he open paint when he's on ubuntu?

  • @rkdeshdeepak4131
    @rkdeshdeepak4131 5 months ago +2

    Hey how do you use these windows apps directly on your linux desktop?

    • @drz1
      @drz1 5 months ago +2

      VM

    • @rkdeshdeepak4131
      @rkdeshdeepak4131 5 months ago

      @@drz1 I know that, but how does he make the individual apps appear directly on the Linux desktop? I have seen it multiple times, e.g. Paint in this video

    • @kamilogorek
      @kamilogorek 5 months ago

      @@rkdeshdeepak4131 This is not a Linux desktop. It's Windows with a Linux VM in fullscreen mode, so he can simply tab out to other Windows apps

    • @anthonywritescode
      @anthonywritescode  5 months ago +5

      not even full screen either but yes -- I crop the obs scene to just the Linux vm

    • @shadowpenguin3482
      @shadowpenguin3482 5 months ago +1

      I had to think a bit to understand. To put it in other words: he does not have a Windows VM in Linux, but a Linux VM in Windows. OBS is running on Windows and is cropped to the area of the Linux VM. When he moves a Windows window on top of the Linux VM window, it is not in the VM but on top of it.

  • @trainerprecious1218
    @trainerprecious1218 5 months ago

    i am sorry if i missed it, but what does "paging into those objects" mean?

    • @anthonywritescode
      @anthonywritescode  5 months ago

      without going into too much detail: memory is segmented into chunks called pages. when paged in they become resident (copied from the parent process)

  • @itay51998
    @itay51998 5 months ago +3

    I know some python but not so in-depth, can barely understand what you are showing in cpython.
    How would one learn this stuff?

    • @CouchPotator
      @CouchPotator 5 months ago

      That would be because the cpython stuff is C code, not Python. And most of that code is macros (the lines beginning with a #), which, to simplify, is code that runs before the rest is compiled. Mostly it's checking which compiler and system it's going to be used on.
      __GNUC__ indicates GCC and __clang__ the Clang C compiler, respectively. __STDC_VERSION__ is the version of the C language standard being used. _MSC_VER is the version of Microsoft's Visual C compiler.

  • @brookskd87
    @brookskd87 5 months ago

    Neat trick. Instead of using Celery prefork, why not use the solo worker, which is single-process, and let k8s scale the workers? This works well for our application and uses much fewer resources. The health probes and pod termination are tricky with long-running tasks, but possible by touching a file periodically. This way k8s handles hung tasks, and you scale up with more pods, not more worker processes.

    • @anthonywritescode
      @anthonywritescode  5 months ago

      in theory that's better. practically though there are memory leaks and significant (unused) overhead of just getting the django app initialized. so single worker would be pretty wasteful (that prefork had such an impact is kind of a testament to that)
      if each worker were a separate service that had very specific dependencies it would probably make sense? though that would involve tons of work since we have hundreds of different tasks

  • @spaghettiking653
    @spaghettiking653 1 month ago

    If you disable the GC at this point before the fork, doesn't that make your program never free memory at any point after the fork? Do you ever re-enable the GC?

  • @eduardmart1237
    @eduardmart1237 5 months ago +1

    Can you make a guide on how to use Celery with Flask and Django? Especially when you create celery workers and wait for them in Flask.

    • @anthonywritescode
      @anthonywritescode  5 months ago +1

      personally I would not recommend using celery. the architectural decision to use it at work predates me and is almost too big to change at this point

    • @eduardmart1237
      @eduardmart1237 5 months ago

      @@anthonywritescode what are the alternatives?

    • @anthonywritescode
      @anthonywritescode  5 months ago

      any work queue really

  • @ractheworld
    @ractheworld 5 months ago +2

    What a good engineer! This is why some guys rake in more dough than others.

  • @smccrode
    @smccrode 5 months ago

    Hope you get a raise or a bonus for this! ;)

  • @christianremboldt1557
    @christianremboldt1557 5 months ago

    You know how to make programs more efficient
    I know how to use Paint more efficiently
    We are not the same

  • @sconnz
    @sconnz 5 months ago

    Jeeze what type of server has 6+ terabytes of ram 😮

    • @anthonywritescode
      @anthonywritescode  5 months ago +3

      not a single server, a kubernetes cluster

    • @sconnz
      @sconnz 5 months ago

      @@anthonywritescode Thanks, that makes sense.

  • @Rachelebanham
    @Rachelebanham 5 months ago

    dang python sucks at copy on write!!

  • @miguelborges7913
    @miguelborges7913 5 months ago

    Is that an ubuntu vm on windows?

  • @ferdynandkiepski5026
    @ferdynandkiepski5026 5 months ago +2

    No GC, no problem.

    • @anthonywritescode
      @anthonywritescode  5 months ago +4

      the gc is still on to be clear. just it's not processing a large chunk of objects any more
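A quick way to see that freeze() only sidelines objects rather than turning the collector off: `gc.get_freeze_count()` reports how many objects were parked in the permanent generation, and collections keep running for everything allocated afterwards (the numbers here come from a toy allocation, not the video's workload):

```python
import gc

survivors = [[i] for i in range(10_000)]  # tracked container objects

before = gc.get_freeze_count()  # 0 unless freeze() was already called
gc.freeze()                     # park every currently tracked object
frozen = gc.get_freeze_count()  # now covers survivors and much more
gc.collect()                    # the collector still runs; it simply
                                # skips the frozen (permanent) generation
```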

  • @danieloberhoff1
    @danieloberhoff1 5 months ago +4

    hmm, tbh i would never run something this big in python. maybe rather nodejs? but maybe that has another can of worms... still, the severe performance problems I keep running into with python would strongly disincentivise investing this deeply in it on a high-performance server...

    • @robertfletcher8964
      @robertfletcher8964 5 months ago

      at this point you're looking at Rust, C++, or Go, all of which have their own cans of worms.
      Really though, I think this video proves that Python is currently doing the job at enormous scale, and it's being used by a lot of smart and very experienced people.

    • @joshix833
      @joshix833 5 months ago +2

      NodeJS has big performance problems too. Something native like Rust would be better

  • @protonjinx
    @protonjinx 5 months ago +4

    this just reinforces my belief that garbage collection based memory management is evil

    • @anthonywritescode
      @anthonywritescode  5 months ago +3

      a bit naive don't you think

    • @nezbrun872
      @nezbrun872 5 months ago

      @@squishy-tomato Projection much?
      Yeah, just throw hardware at the problem. Cloud vendors must love you.

  • @andrey6104
    @andrey6104 5 months ago +1

    Didn't understand a damn thing, but the video is interesting. Thanks, Anthony.

    • @k1zmt
      @k1zmt 4 months ago

      What exactly didn't you understand? They told the garbage collector not to track references, so its structures stopped being copied into the child processes.

  • @Terrados1337
    @Terrados1337 5 months ago +1

    Imagine someone trying to learn Python: they start on their merry way, learning the basics, building their first hello world. And then you run in Dumbledore style and ask them calmly: "HARRY! Did you waste a terabyte of RAM using garbage collection?!?!"

  • @user-ni5hr1bw6z
    @user-ni5hr1bw6z 5 months ago +2

    Do you think that running gc.freeze() after gc.collect() would improve memory usage more?
        def _create_worker_process(self, i):
            worker_before_create_process.send(sender=self)
            gc.collect()  # Issue #2927
            return super()._create_worker_process(i)
    I put that signal just before collect, and that's why this came to mind.

    • @anthonywritescode
      @anthonywritescode  5 months ago +1

      collect will likely make it worse because it will make more holes in arenas

  • @djtomoy
    @djtomoy 5 months ago

    Huh?