To the inevitable "nuh huhh" commenters: save your collective breaths. I have lost any interest in discussing this with anyone unwilling to listen. This video isn't about language superiority or "what X was designed to do", it's about showing a way to write better code involving python. If that is helpful to you, great. If that somehow makes you angry, maybe take a step back and re-evaluate your priorities in life. I will no longer be responding to "no this is wrong" comments, and I _will_ remove any of them that are disrespectful or offensive. You have been warned.
Your channel fills the much-needed niche for intermediate-level programmers. Sometimes I'd rather watch a succinct video on a topic than read through pages of docs or the source. Got yourself a new sub :)
5:48 you are not underestimating the CPU, but the JIT compiler, which just folded your for loop operating on fixed input into a constant. This also explains why it barely goes above 100% when run in parallel, since the part of the execution that runs without the GIL is very short for each run.
Even if Python can create system threads, as you demonstrated at 2:00 ~ 2:55, I think it is NOT truly multi-threaded because of the GIL. A user who uses system threads expects those threads to literally run in parallel, but in Python, thread scheduling is effectively governed by the GIL, which means that only a single thread is allowed to run at a time. You can even see this at 2:53: only a single thread (the one that is highlighted) is in state R (Running) and the others are in state S (Sleeping), due to the GIL. In the latter case, I'm not familiar with the internals of NumPy and the like, but it bypasses the GIL through FFI, and thus almost all (still, not all) threads are in state R. But this is thanks to the libraries that use FFI, not Python's threading. So it is not a fair example for arguing that "Python is not single-threaded". Any language that has a GIL can bypass it by using FFI, or libraries that use FFI, like this. So it is literally true that "Python is NOT Single Threaded", but many people don't think of it that way because pure Python execution is still sequential.
Finally someone gets it. No matter how many threads you spawn under a single process, in reality only one thread is running at a time due to the GIL, while the true meaning of multithreading is that multiple threads are being executed on multiple cores at the same effing time.
The video literally shows how to do the opposite (as in of course normal ass python code has to retain the gil but there's sufficient ways to release it in situations where it matters) ... Like, the proof is in the pudding maybe try a bit of it?
@@JackofSome I think when some smart people said that Python is single threaded, what they really meant was that pure Python will have a single thread running at any given time, even though you might have spawned some new threads manually. Tbh, if I were not an intermediate Python programmer, what I really would have taken away from point 1 was that "using the threading module I can run multithreaded apps using multiple cores just like Java or C#". Also, maybe if you have time, you can make a part two of this video and add these points to make it a lot clearer what multithreading means in Python. Multithreading in itself is a somewhat advanced topic to wrap your head around, and with the Python GIL in the mix, it gets a lot harder.
The reason I made this video is because I've conversed with good programmers that have been coding in python for years and still think that python threads are green threads (rather than system threads) managed by some background runtime, and that python itself can only run one thread. This hurts their ability to write good multithreaded python code (even if they're only using pure python code but especially if they're using libraries implemented in C or C++, which 90% of the time is the case).
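The point about system threads can be checked from inside Python itself. This is a minimal sketch, assuming Python 3.8+ (where `threading.get_native_id()` is available); the helper name `collect_native_ids` is made up for illustration.

```python
import threading

def collect_native_ids(n_threads=4):
    """Start n_threads Python threads and return each one's OS-level thread ID."""
    ids = []
    ids_lock = threading.Lock()
    # The barrier keeps every thread alive until all have reported, so the
    # OS cannot recycle the ID of a thread that has already exited.
    barrier = threading.Barrier(n_threads)

    def worker():
        with ids_lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

native_ids = collect_native_ids()
print(native_ids)                                  # values differ from run to run
print(threading.get_native_id() in native_ids)     # False: the main thread has its own ID
```

These are the same IDs a tool like htop lists per thread, which is one way to see that the kernel, not a Python runtime, owns and schedules them.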
Very good demonstration. There are 4 distinct levels of thread execution touched on; physical processors, SMT (simultaneous multi threading, where one core has multiple contexts; "virtual core" is a bit of a misnomer as they have their own physical registers and are just as real as the sibling they share execution units with), operating system threads (which were repeatedly referred to as "proper"), and in-process coprocesses ("green" threads). The load spike moving from processor to processor in the GIL demonstration is probably execution, not threads, moving between cores. It just varies which thread acquires the GIL. Linux tends to keep threads in the same core when possible, something that can be enforced using thread affinity. This reduces overhead from warming up the caches for particular tasks. In some processor cores, e.g. Xmos XS1, a single thread can't use all the execution resources; in XS1, 4 threads are required to saturate the pipeline. The multiprocessing example probably bottlenecked in the scheduling of work or transfer of parameters and results, which are serialized due to execution on the parent control process. Larger queues are often used to mitigate this, but it won't help if the task generation takes longer than task execution.
The main issue is that a video like this will have the opposite of the intended effect: people are now going to go around and use threads simply because they heard that Python is in fact using proper system threads, ignoring the nuance. I've seen that happen often enough. Ultimately, in Python threads ACT as if they aren't real threads, which is nicely proven by spawning a few threads that all append the current timestamp to separate lists. You can very clearly see that the multiple threads are never executing at the same time. This is your pudding. It's what the threads act like. That they don't act like that in NumPy etc. is a nice detail, but ultimately it doesn't disprove the core statement: Python threads are not (read as: don't act like) threads. I will continue to point this out to beginners and I will continue to recommend other ways of speeding up their code, like using numba (with GIL-released threads if you want more speed). Ultimately, it's the same as telling people Python is interpreted: it's also compiled, but explaining that to beginners is entirely pedantic, as Python acts entirely like an interpreted language.
Man, you make great videos. It's just at the right level. As someone who would class themselves as an intermediate Python programmer, your videos provide just the right amount of information and examples for me to immediately apply them and run with it. Thank you and keep up the great work!
People say "single threaded" when they don't know what it means. What they mean is, Python does not run code in parallel (multicore). Using C++ in Python (e.g. through numpy) IMHO is not a positive, it's an escape hatch that has been used over and over again because of this Python limitation. Python succeeded despite its GIL, not because of it.
Python does not run _Python_ code in parallel. The GIL is fine. It does its job for reducing complexity and adding safety in concurrent code. I dunno if I'd call ffi an escape hatch. It's definitely not done because of multi processing, it's done because of speed reasons (that's a whole other can of worms). With the ML and Data Science success it's become python's primary MO. I find myself constantly wrapping C++ code in python these days just to get a more pleasant and flexible interface.
Thanks for the awesome video! I have a question though, sorry if it is dumb. At 7:28 you say "The code that we've written doesn't need an interaction with the interpreter". I don't really understand this. How does it not need an interpreter if one is used to execute code? Maybe you can explain what you mean, please? Or at least point me to a place to read more about it? Thanks again for the content!
That bit of code is being JIT compiled by numba and runs "outside" of the python interpreter (similar to calling a C function). So once the function call has been made the code has no additional interactions with the python interpreter until it returns.
@@userou-ig1ze Pretty much, unless like in this example it's some external piece of code (or system call) that does not rely on anything managed by the interpreter. To the interpreter these are simply black boxes: it copies some data to them and gets something when they return.
Good video. Racket calls its green threads "thread" (which may not be ideal in all cases) and its multiprocessing thing "places". I like that it calls it places, because it is a nice analogy. I started writing an explanation but it looked too much like a book, so I deleted it. Another thing I want to mention: it also has "distributed places" which is a way to do distributed programming / writing programs that run across multiple machines (of course there is no magic to make that super easy).
When I was first learning Python 3 years ago, I fell into this category of being upset with Python's GIL and claiming it's "not true multithreading." This being because I came from a C++ background, where if I needed to do parallel processing on complex data types/structures, such as flying PSO particles in a distributed fashion, it was fairly straightforward in C++ but rather impossible in Python without the performance hit. Since then, I've learned to use Python as a "wrapper" for calling more complex operations underneath. Occasionally, I still find myself using the multiprocessing library as a crutch to run a set of numpy or pandas operations in parallel, especially when I'm in a bind. But, the one thing Python seriously lacks is native performant parallel processing on complex structures. It's not a problem if you can fit your code into the interface of one of the underlying libraries, but can be a problem for novel approaches in research. It's still my biggest gripe about Python, even tho it's good at almost everything else. The whole pickle approach to passing args to processes almost made our work a no-go in a previous project.
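The cost of the pickle approach comes from the fact that arguments are serialized and rebuilt on the other side, so a worker process only ever sees a copy. A small sketch of that round trip, done in one process for simplicity; the `particle` dictionary is a made-up stand-in for a complex structure.

```python
import pickle

particle = {"position": [0.0, 1.5, -2.0], "velocity": [0.1, 0.0, 0.3]}

# multiprocessing serializes arguments with pickle, so the worker gets a copy.
payload = pickle.dumps(particle)
received = pickle.loads(payload)

received["position"][0] = 99.0   # a change made "in the worker"...
print(received is particle)      # False: a separate object
print(particle["position"][0])   # 0.0: the original is untouched
print(len(payload), "bytes serialized")
```

For large or deeply nested structures the serialized payload, and the time spent producing it, grows with the data, and any result has to make the same trip back.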
Totally agree with you there. It's certainly a limitation and one you have to be hyper aware of whenever performance matters. On the multiprocessing front, I think python 3.8 introduced shared memory which should help a great deal, but I've never tried it.
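For reference, the feature mentioned here is the `multiprocessing.shared_memory` module (Python 3.8+). This is only a sketch of the attach-by-name mechanism: both handles live in one process to keep it short, whereas in real use the second handle would be opened inside a worker process.

```python
from multiprocessing import shared_memory

# Create a named block of shared memory; other processes can attach by name.
block = shared_memory.SharedMemory(create=True, size=16)
try:
    block.buf[:5] = b"hello"

    # Second handle on the same memory (normally opened in a worker process).
    view = shared_memory.SharedMemory(name=block.name)
    try:
        seen = bytes(view.buf[:5])
        view.buf[0] = ord("j")       # written in place, nothing is pickled
    finally:
        view.close()

    after = bytes(block.buf[:5])
finally:
    block.close()
    block.unlink()                   # free the block once nobody needs it

print(seen, after)                   # b'hello' b'jello'
```

The block is raw bytes, so it suits buffers and arrays far better than arbitrary Python objects.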
The fundamental reason for the GIL is that you cannot manage reference-counting without it. Python, like Perl, uses reference-counting as a first resort for managing memory, only going to full-on garbage-collection à la Java when reference counting is no longer enough (like when you have reference cycles). The trouble with using garbage collection for everything is that it means your innocent little script, if it runs for long enough, can end up consuming all available memory on the system unless you impose artificial limits to constrain it. Python tries to avoid this. Hence the GIL. There are Python implementations that have no GIL -- e.g. Jython. Most people prefer CPython with its GIL -- should that be a surprise?
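Both mechanisms described here can be observed in CPython. A rough sketch, assuming CPython (the exact numbers returned by `sys.getrefcount` are an implementation detail, so only the difference is shown); the `Node` class is invented for the example.

```python
import gc
import sys

data = []
before = sys.getrefcount(data)
alias = data                      # one more reference to the same list
after = sys.getrefcount(data)
print(after - before)             # 1

class Node:
    def __init__(self):
        self.other = None

gc.collect()                      # start from a clean slate
a, b = Node(), Node()
a.other, b.other = b, a           # reference cycle: the counts never reach zero
del a, b
collected = gc.collect()          # the cycle collector has to step in
print(collected > 0)              # True
```

Every one of those count updates happens on a plain integer inside the object, which is the bookkeeping the GIL keeps safe when several threads touch the same object.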
Makes perfect sense. The GIL is great in my book. I'm just trying to show that some programmers underestimate Python because they misunderstand the true limitations posed by the GIL.
One recurring design question with CPython's ref counter is why the ref count isn't atomic or otherwise synchronized at a scope finer than the whole running program.
Subbed now. I just started learning Python 3 days ago. I had no background in computer programming at all but found the Python for Dummies book really helpful. Then I found your channel. I love your work. Please guide me on where to start as a total noob and beginner. Thanks so much.
Async in python is different. It refers to green threads which are not really system threads. They run under an event loop and usually are contained inside a single system thread. Very helpful for things like web frameworks but not for computationally heavy tasks.
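A quick way to see that coroutines share one system thread is to record the thread ID from inside each of them. A minimal sketch using only `asyncio` and `threading`; the task names and delays are arbitrary.

```python
import asyncio
import threading

async def task(name, delay, log):
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)     # hands control back to the event loop
    log.append((name, "end", threading.get_ident()))

async def main():
    log = []
    await asyncio.gather(task("a", 0.05, log), task("b", 0.01, log))
    return log

log = asyncio.run(main())
thread_ids = {tid for _, _, tid in log}
events = [(name, what) for name, what, _ in log]
print(len(thread_ids))             # 1: everything ran on the same thread
print(events)
```

The two tasks interleave (both start before either ends), but only at the `await` points, which is why this helps with waiting on I/O and not with number crunching.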
Great video! I think the misconception comes from the expectation that python should make threads that run parallel. The GIL prevents them from running in parallel, but your operating system will schedule the threads, so it acts more like async as an analogy but letting the OS take the wheel. This is all by design to prevent 90s-2000s era thread lock management like in C++ which was a mess. And anyways the web has shown the world that a single thread is all you need for GUIs (JavaScript stuff). Almost everything else like you said with numpy will bypass the GIL with c code.
"Pure python bytecode will at any time be able to use only one core". That design decision is about to be changed btw. In the next year or two the GIL will no longer be a requirement.
I think it would be helpful to add what kind of tasks can be multi-threaded in Python, since for certain computational tasks, without numba, the GIL would block multiple threads and make processing single-threaded.
This is more or less the reason why I quit Python. Nowadays I often use Erlang, a language that is about as slow as Python but allows for parallelism. You may want to check it out.
1:00 when did you ever do smth computationally intensive in python? i guess never.... meanwhile me trying to figure out how to run through 2^32 combinations of numbers using python :/
Don't. Pure Python is the wrong language for it. Use numba or cython for a familiar feel. Or learn to use mypyc. Or use a completely different language.
Very nice video. I enjoyed watching this and the other videos I have seen from you so far. I think it would be great to hear more about how you would approach/manage parallelizing computations with a few more threads, on the order of ~20-40, with a focus on the typical bottlenecks you face and how you address them.
This is a great demonstration. I had no idea that a package existed where you could switch off the GIL. Unfortunately, this only really seems to work for numerical stuff, and, according to my experiments, doesn't even raise an exception when there is something in the njitted function it can't deal with. Is there any reason in principle why a package couldn't be made where the GIL gets turned off, and everything is able to function as in a normal Python environment... all at the dev's risk of course?
It's not turning the GIL off, it's just releasing it when it's not needed. C/C++ extensions can do this as well. Pretty much any FFI can so long as you're doing something that doesn't need to interact with the python interpreter. Also njitted functions absolutely do raise exceptions when there's errors.
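As a rough illustration of "releasing it when it's not needed", here is a sketch using numba's `nogil=True` option with ordinary Python threads. The `sum_of_squares` function is made up for the example, and the `ImportError` fallback is an assumption added only so the snippet still runs where numba isn't installed (in which case nothing is compiled and the GIL is never released).

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import njit          # real JIT compiler, if installed
except ImportError:
    # Fallback for this sketch only: a do-nothing decorator, so the code
    # runs as plain Python (no speedup, no GIL release).
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@njit(nogil=True)                   # release the GIL while the compiled code runs
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

sum_of_squares(10)                  # with numba, the first call triggers compilation

sizes = [100_000, 200_000, 300_000, 400_000]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(sum_of_squares, sizes))

print(results)
```

The function only touches numbers, never Python objects, which is what makes it safe for the compiled code to run without holding the lock.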
I've actually never heard anyone mistake Python for being single-threaded. I feel like most people understand pretty well that "it only uses up to 1 CPU worth of processing". What I see much more often is people saying Node.js is a single-threaded language.
With the GIL in place, does it make a difference (other than being technically wrong)? And regardless of how its internals might work, Node.js has worker threads, so you can have multiple single-threaded JavaScripts in one process.
In my case, I created a function, def s(a): sleep(a). I use 5 threads and call s(5) five times in each thread, and the total time is 25 secs. But when I change the function to def s(): sleep(5), with 5 threads and the same target, the total time is 0.05s. How?
Thanks a lot. By the way, do you happen to know a good book on Python (intermediate level and up)? I have been programming in Python for 3 years now, but I have never read a book (maybe because I have been programming for 15 years now, and Python really started as a hobby and I went on with it). Now I'd like to read a book from the ground up, a bible of sorts, and get a better understanding of the language, the latest standards, etc. I'd be grateful.
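The actual code isn't shown, so this is only a guess at the cause. Two things commonly produce timings like these: writing `target=s(5)`, which calls the function immediately in the main thread instead of handing it to the thread, and stopping the timer before the threads have been joined. A sketch of the usual pattern, with a shorter sleep so it runs quickly; `run_threads` is a hypothetical helper.

```python
import threading
import time

def s(a):
    time.sleep(a)

def run_threads(delay, count=5):
    """Start `count` threads that each sleep for `delay` seconds; return elapsed time."""
    # Pass the function object and its arguments separately. Writing
    # target=s(delay) would call s right here, in the main thread.
    threads = [threading.Thread(target=s, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                    # without join() the timer stops before the threads finish
    return time.perf_counter() - start

elapsed = run_threads(0.2)
print(f"{elapsed:.2f}s")            # roughly one delay, not five
```

Sleeping threads overlap because `time.sleep` gives up the GIL while it waits.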
Never read a book myself either. Just watch talks from experienced people. Raymond Hettinger is pretty good. Reading standard library and other popular libraries is also a good way to learn
Great video. But from the perspective of the developer that only writes and cares about "pure" Python, you really don't have to worry about race conditions since the GIL should coordinate concurrent tasks supposedly, right?
This was the top video in Google search when we were trying to investigate Python's usability as a server for 10000 websockets. Consider altering the description to "Python is NOT Single Threaded for calculations..." or something. Otherwise the description de facto works as "misleading".
The description is literal fact, with details and nuances being presented in the video. I would hope that if you're doing research for a big project you'd do a bit more than reading the title of a YouTube video.
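On the race-condition question: the GIL protects the interpreter's internal state, not a program's own invariants. A thread switch can land between the read and the write of something like `value += 1`, so shared state still needs a lock even in pure Python. A minimal sketch with a made-up `Counter` class:

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "read, add, write" is several bytecode steps; the lock makes
        # the whole update atomic with respect to other threads.
        with self._lock:
            self.value += 1

def hammer(counter, times):
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)   # 40000
```

Without the lock the result may or may not come out right, depending on the interpreter version and timing, which is exactly what makes such bugs hard to find.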
Is my understanding correct? Python threads perform tasks concurrently by using single processor core without parallelism. With Python multiprocessing we can spawn multiple processes with different interpreters on different cores and achieve parallelism.
No. Python threads are also running in parallel but are artificially locked to running one at a time. In some cases you can get rid of this lock. With python 3.12 this is about to become a lot simpler
Firstly, I don't think anybody who knows Python thinks it's single-threaded. If it were, it wouldn't need a GIL. Secondly, async is not green threads, it's cooperative.
I think the argument of lesser humans is ironically exactly the inverse: Python is single-threaded BECAUSE of the GIL. I.e. unless you know how to escape it (which I imagine also gets progressively more complex in larger code bases), it would, for all intents and purposes, be single threaded. From that perspective the argument could follow that, in theory, all programming languages CAN use multiple cores (e.g. by spawning multiple processes) but it's not practical. ...In my hallucination of a devil's advocate, the argument could even follow that, because of this, you are right, but whatever you said has no practical relevance for inexperienced users (and since, according to the author, this misconception is common even in professional environments, I would assume this is an unexpectedly large proportion).
The real reason that no one changes these languages to have full parallelism like C++ is that you can't change a library or a compiler or an interpreter to do that, at least not while still having garbage collection. It has to be baked in from the initial design, and the decisions that you have to make to make that possible are significantly different. If you wrote a new version of the language that did all that, none of the existing libraries would work with the new language. It still might be worthwhile, but it's a deeply breaking change AND it requires rewriting from the ground up. The kludges that are happening now, copying data across processes instead of sharing it, might be enough, though it IS possible to write an actually sharing version of a dynamic language. Note that Numba's nogil only works on code that doesn't use Python types... That's quite a restriction. By the way, languages with advanced garbage collectors like Java and .NET don't actually use system threads, or rather they have no more than one system thread per hardware thread. This is because advanced garbage collectors need to do context switches only at safe points. Otherwise, finishing a GC phase would have to wait for all language threads (including the ones that don't currently have an active timeslice) to count out the phase change, because of the limitations on how cores see changes that other cores have made in memory. So these advanced systems DO use green threading: they use one green thread at a time per hardware thread or hardware hyperthread. Google's Go language is similar except that it doesn't even emulate having preemptive multitasking. So no preemptive multitasking and no green threads. Instead there's a limit on the number of simultaneous threads so that counting out a GC phase only requires a response from the currently running hardware threads.
I don't know what you mean by a hardware thread on Windows. If you use multithreading in C# (I have done so since 2002), then one C# thread corresponds to one Windows system thread.
@@JackofSome Alright. I'm also looking into it, but I read some articles saying that it's not thread safe to release the GIL, so I was wondering if you know of any implementations that are thread safe.
Really cool way to "bypass" the GIL and run multithreaded Python. I've actually spent years trying to find a good solution and finally have some promising results with numba, so thanks for the amazing video. It's very difficult to integrate numba compiled code with Pandas, though. Not sure if you managed to run a multi-threaded, GIL-free pandas groupby function? Cheers
If htop marks the different threads with different pids, doesn't this mean that we can delegate the different threads to different cores? I really didn't understand the second point. If the first is true and we are creating kernel level threads (green threads as opposed to user level threads is sooo weird to me), then it should be true that we can assign these very threads to different cores?
They automatically get scheduled across multiple cores but python has a restriction where no thread can run python bytecode without first acquiring a singleton global lock (called the GIL). So pure python code would still behave in a single core fashion even as the OS schedules the threads across multiple cores. When calling out to non pure python code (e.g. compiled C++ code or llvm bytecode generated by numba) often the lock can be released allowing for some degree of parallel processing.
@@JackofSome Thank you! But I have one more question. Let's assume a scenario where we have, for example, Python code with, let's say, 2 threads. We said that both of them can't run bytecode at the same time because of the GIL. We schedule them to run, and at this point we call some non-pure-Python code, releasing the GIL. What would happen to those two threads we started at the beginning? Let's visualize: Starting T1. Starting T2. T1 -> has lock. T2 -> waits. T1 -> releases lock. T2 -> has lock. T1 -> waits. Starting non-pure-Python code and releasing the GIL. T2 -> has lock (not anymore?). T1 -> what does this thread do? T3, T4 -> started from non-pure-Python code. Doesn't releasing the singleton GIL have an effect on the threads T1 and T2 that we started at first, and is this not dangerous?
I need to handle thousands of huge plain-text files, and there are nested loops in my script. So could you please give me some advice on which acceleration lib to use? Numba? Multiprocessing? Or something else?
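The standard library has examples of this too. `hashlib` is implemented in C and is documented to release the GIL while hashing large inputs, so plain Python threads can hash several buffers at once. A sketch, with the buffer sizes picked arbitrarily; how much faster the threaded version is depends on the machine, so no timing is claimed here.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def digest(buf):
    # C code that releases the GIL while it works on a large buffer
    return hashlib.sha256(buf).hexdigest()

buffers = [bytes([i]) * 8_000_000 for i in range(4)]   # four 8 MB inputs

start = time.perf_counter()
sequential = [digest(b) for b in buffers]
t_seq = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))
t_thr = time.perf_counter() - start

print(threaded == sequential)                              # True: same results either way
print(f"sequential {t_seq:.3f}s, threaded {t_thr:.3f}s")   # timings vary by machine
```

The threads still take turns on the GIL for the tiny Python parts (calling the function, building the result string); only the hashing itself runs without it.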
Maybe misconceptions, but doesn't change the fact that Python's threading model is fundamentally broken as long as the GIL exists. I use python regularly mostly in the ways you imply are completely fine and also still regularly bump into the GIL. There are simply some computationally intensive things that are difficult to fit into things like numpy or pytorch, and there is a lot of stuff that is even difficult to fit well into the multiprocessing paradigm of python.
I don't really disagree with you. Python has definite limitations. The purpose of this video isn't to say "python is great for parallel processing" but demonstrate that if you understand the nature of python threading and the GIL you can make the language a little bit more useful and write better code. This kind of stuff becomes pretty important for me on a regular basis (e.g. I had a C++ camera driver that I wanted a python API for which needed to trigger N cameras at the same time. My colleague was adamant that this was only possible by running N threads on the C++ side and managing those separately, which is not the case. This video is meant to bust that kind of thinking).
@@JackofSome Yeah, I am not faulting you for getting the information out there. You just had the misfortune of reminding me of one of my least favorite aspects of the language. :)
I am doing computation-heavy (and memory-heavy) work right now, and the whole team doesn't want to do it in vanilla Python. They're pushing to use pandas instead. Which is a perfect example of why Python is single threaded. Not because it factually is, but because in actual tasks no one uses Python multithreading.
This is exactly what I need, I am done with seeing "Introduction to *insert item here*" stuff, when all I wanted was some intermediate level information. Much appreciated mate, you earned another sub.
Great video! My question: is it possible to make my own good-looking console that can display a lot of data at once (with Python)? Like the core performance monitor in the i3 console, something like in your video. Thanks!
It definitely is. There's some nice libraries I've seen in the past but can't remember the name of. Try searching for "python TUI library". Curses or ncurses is the standard but also a bit of a pain to work with. I've definitely seen simpler libraries.
Hey... great video. To emphasize how dumb these sorts of blanket statements are: I run a large number of simultaneous applications built in Python, and the option that we use, which people will say is cheating but which totally avoids the GIL, is subprocess. Yeah, you just launch multiple processes and they have no issue with the GIL at all. We run systems with 900+ co-operating processes, and they replaced C programs that used IPC mechanisms... by avoiding formal IPC (all IPC mechanisms are just ways to slow code down so that processes don't smash common resources, so the fundamental improvement is to require fewer common resources). By modifying algorithms to use mechanisms that avoided locks whenever possible, the Python ended up faster (like 10x) than the C it replaced. Yes, it uses a lot of memory. My app is looking for execution speed, and we can afford memory... one use case, others will differ.
Right off the bat: the advice on using multiprocessing instead of threading came from Guido himself. It's not a "misconception", it's a quote from the creator of Python, and he said it without the context of "what if you offload some tasks in your code to C++ code". The second part is, nobody ever claimed that Python is using green threads; of course it creates a system thread. But because of the GIL, every thread waits for the GIL to free up before executing, making your threading tasks not truly concurrent and rendering it single threaded in that regard, no matter what you wrote in your code. Unlike NodeJS, Golang and others.
The reason I made this video is that I keep running into programmers in the wild, some with decades of experience, that say that python has green threads. I'm glad you've never run into someone like that. If I have a task that runs more efficiently when using a solution like numba + python threading vs multiprocessing, then I'm using threading, I don't care what Guido says, since the proof is in the pudding, not in some dogma. Now if I'm forced to use only pure python and nothing else, then yes I would use pure python, but I'll probably resign if it comes to that
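The basic shape of the subprocess approach can be sketched in a few lines. This is not the commenter's system, just a toy: `WORKER_CODE` is a hypothetical one-line worker, and each child is a separate Python interpreter with its own GIL.

```python
import subprocess
import sys

# Hypothetical worker: square the argument and report the child's process ID.
WORKER_CODE = "import os, sys; print(int(sys.argv[1]) ** 2, os.getpid())"

# Start all children first so they run at the same time, then collect output.
procs = [
    subprocess.Popen([sys.executable, "-c", WORKER_CODE, str(n)],
                     stdout=subprocess.PIPE, text=True)
    for n in range(4)
]
outputs = [p.communicate()[0].split() for p in procs]

squares = [int(sq) for sq, _ in outputs]
pids = {pid for _, pid in outputs}
print(squares)        # [0, 1, 4, 9]
print(len(pids))      # 4: one process per worker
```

The memory cost mentioned in the comment follows directly from this design, since every child loads its own interpreter and its own copy of whatever data it needs.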
It's creating new instances of Python as well as making copies of the data that's shared (that last one may not be true in newer versions of python). The ram use adds up
@@JackofSome Ah, it's the copying of the root instance's data to the new instances that probably costs the most. I wonder what the memory footprint of a pure Python instance is if you create a new process immediately after starting the root process.
The main thesis is pretty wrong. Nobody cares if Python threads are implemented using kernel threads or not. What ACTUALLY matters is whether you can get SIMULTANEOUS EXECUTION, because that's what's going to let you use all of the many cores that are usually available. If you can't do that easily (e.g. you need to resort to multiple processes, or some third-party facility), then Python is not amenable to a VERY WIDE range of applications. This non-amenableness explains something said near the beginning: When was the last time you saw a CPU-bound Python program? The reason this doesn't exist is because when people set out to write a CPU-intensive application, they immediately throw out Python as a possible implementation language. Numba looks pretty interesting, and was new to me, so I did get something out of this, but the fact that you have to explicitly go out of your way to release the GIL in special sections of code proves the general wisdom correct: When using CPython, you have to jump through extra hoops to use all of your many cores, because in its "native" state, Python doesn't let you do that.
I agree with your last statement, that was pretty much the point of my video. I've run into enough Python developers that don't understand it and that's why I made the video. You _do_ understand it. Though, a few things: 1. Using FFI is in python's DNA, this is why numpy/Tensorflow/et al are so effective. Some of these libraries use their own internal threading, others are just very good at releasing the GIL at the right time. 2. Yes, Numba is pretty amazing, I have a video on that too.
What a clickbait video. Offloading some calculations through FFI does not make Python multithreaded in the way programmers think of when they hear "multithreading". Nor does sequential concurrency.
Ok. A good understanding of what I'm showing here _does_ make Python more useful than many assume it is, and that's the ultimate point. We can sit here and split hairs, or we can get back to work and write better code :)
Hey man, nice video. The GIL part was really interesting. But you need to invest more time in the code. I can't understand what you are trying to do if you are writing it on the fly in the console. You need to have ready-made notebooks and go over them while you speak. And add some more complex examples, closer to real life.
Wow.. thank you :) I have an app that runs a lot of threads. While I don't know if this will really work (it uses some shared resources, like print() to the console), I would like it if Python offered this built in, as a bypass for "sure" functions. I wonder if we can mix nogil code with GIL-enabled code (I've seen that decorators exist). I will try this.. Thank you again
Yeah.. that's no good.. I use a lot of Python objects that rely on the GIL... nogil assumes that you don't use them. I had a glimmer of hope :d. I can't use multiprocessing because I use some synchronization between threads.. I could use IPC (not many options on Windows) or the DB (which is a bit slow and sometimes unreliable under heavy multithreaded access). I would have liked a fast and safe way to update my code. Too bad :(
Nevertheless.. I learned something about the GIL in Python.. and for that I bow to you. I will explore the Python code a bit (if I understand any of it... Python programmer with little C knowledge) and see if I find any clues to improve my code
You can mix and match. Numba really shines when you have heavy operations that can be cleanly abstracted into simple functions. There's also @jitclass but I don't think it's out of beta yet.
Not a topic I was wondering about much, but as a Python programmer but without much in-depth knowledge, I find myself Googling and Wikipediaing a bunch of terms mentioned in the video just so I can follow. PS In the last few years, I have built up a stubborn belief that given enough effort, there is nothing that I cannot eventually understand. So I am not intimidated by more advanced content such as this as I was before. This video (and I'm sure the others) are perhaps more informative than you think.
I'm superman 👀 Yeah those are sped up. I try to make videos as concise as possible and it's my firm belief that no one wants to see me slowly type in code
@@JackofSome i've internally convinced myself that your wpm is 300+ and that i am a pathetic germ cell, rejected by the egg, and doomed to die off simple sugars.
It's interesting, but in a way, it feels like you just confirmed my idea of python... Python is just a facade. In order to show that "you can", you had to use numpy, which is actually C. So, in the end, you are not using multiple python threads/processes, but executing multiple C tasks. With Python, I have found that most of the libraries that I use, are just facades to tools written either in C, C++ or Java.
A few things to note: 1. I'm using numba, not numpy. Numba translates Python code to LLVM, much in the same way that the Rust compiler translates Rust code into LLVM. Saying "so this is just C in the end" is a bit disingenuous because that would be like saying "so Rust is just C then", which is not quite accurate. 2. You're right, Python makes a great interface to lower level languages, and is often best when used this way. 3. You're wrong, in that these aren't separate C/C++ processes (no more than the regular Python environment is, since CPython is ultimately implemented in C). The thread is spawned by Python, it's managed by the OS, the calls are all made by the Python interpreter, etc. Python calling a C function isn't much different from C++ calling a C function. The point of the video is that Python is useful, but sometimes people incorrectly think that it can't do certain things (limiting its usefulness to them), and those assumptions may be flawed.
@@JackofSome Thanks for your response. What I meant is that at about 8:58, you used numpy in a function to get all cores to work. It was a bit odd to me that it was not possible for you to do it with just Python. TBH in minute 1 you actually make a point that pretty much applies to 99% of python apps. Everyone is so concerned about peak performance, and then they have some site that receives 100 visits per day... And thanks for the clarification in #3.
Oh I see what you mean. Yeah, I forgot I was using a numpy function in the function I JIT compiled. You don't need that though. If you watch my numba video I show something very similar with pure Python code (which of course gets JIT compiled) and what you can achieve there by releasing the GIL. Amen on the performance panic part though. People do worry too much sometimes when they really don't have a bottleneck.
There are some misconceptions here. In the old days we had 1 cpu with 1 core. Nevertheless full multi-threading was possible, because the OS would apply time slices and alternate the threads using its thread manager. This is very useful with a GUI program, because the GUI code would be sitting idle most of the time, waiting for a keypress or mouse click. So the worker threads would be able to do their job in the meantime. Only when the user would take some action, the GUI thread would kick in. So the GUI would remain responsive, while the worker threads would do their long tasks. Think of reindexing a database. Also handy with scripts, e.g. when you want to cancel a job.
Of course. Back in my mechanical engineering days before I knew about threads I ended up making something similar to threading by using timers in an arduino so I could microstep some motors. Fun times
For best python performance use mpi4py. I have used upwards of 10,000 CPUs with it (about 200 nodes). Threading just plainly does not work efficiently enough in an HPC environment.
I wouldn't. I've been using patterns like this to supercharge my python codebases for years, even beating an equivalent C++ implementation in one case. If you can elaborate more on what you mean or show examples then we can actually have a discussion.
Yea, but don't you get rid of some of the advantages python provides. Wouldn't it be better just to stick to a language more fitted for multithreading i.e. C, C++, C#, Java, etc...?
I don't see how? If you're doing anything data related in python, you're using numpy. If you're doing any computer vision, you're using opencv. If you have a few pieces of code that need to be accelerated you may be using numba or cython. I'm just showing how you can make the most of your CPU given these existing scenarios. Being able to write the absolute fastest code ever isn't the only consideration when choosing a language.
Numba is too young. It doesn't even support passing a Python list into a function; we need to use numpy instead. And even then it doesn't support unicode arrays. According to the numba documentation, built-in functions like .split() are slower in nopython mode than the CPython implementation. There are too many limits when using numba which are not covered in this video. For industrial applications I think cython is the best option to accelerate Python.
As with all things, it depends. I've been happily using numba in production for over two years, and will continue to do so. Edit: just realized this comment wasn't on the numba video 😅
@@JackofSome The official numba docs themselves list all the limitations. Maybe you only do numerical computations. In that case it might work. But it fails for many other general purpose applications.
I'm aware of the limitations. Just pointing out that "just use cython/c/rust" isn't quite a good response to them since so much is situation dependent.
You basically proved yourself wrong in the beginning of the video by assuming that the very common programming task of heavy calculations is for some reason very rare. In my current project I have multiple threads that do some data preprocessing before feeding it to Keras, and that requires multiple cores. And no, it's not just numpy arrays, it's selection from the sqlite database and then formulating input as numpy arrays accepted by the model. It is very slow and programmatically heavy to move processed data between processes through pipes. It would be much faster with proper threading. I know it's my fault that I am too lazy to rewrite the processing in Go or something like that, but still it proves Python is very bad for multicore programming at the moment.
Any chance you can share a gist of the preprocessing code? I do a fair bit of data preprocessing work for neural nets and am often able to make it work with threading. Of course the video isn't meant to be a one size fits all. It's more "python is more useful for threading than people seem to believe".
In all probability, just like your video, people here (and probably you) will hate this comment... However, your video itself makes clear how bad Python is on the (multi-core) machines running it today. There are multiple reasons, some of which you might already agree with, but the worst is your pinned comment. It's one of the first tech videos I've seen with such a defensive comment. I don't care how accurate your video is or isn't, I didn't expect such a comment and such a defensive approach. Also, just pointing out the common misconceptions and not why those misconceptions arose in the first place is one bad thing. Not accepting that Python itself is not at all performant for multi-threading is another. If you need a wrapper for C libraries, many languages provide a bridge. Python might just be rich in available C-binding libraries. But libraries don't make a programming language. There are languages solving multi-threading in far superior ways on top of offering C bindings for running C code, e.g. Golang, Rust, Lua, Clojure, each in different ways and getting way better results. Heck, even JavaScript (Node.js) is way better and faster and has a better way of handling multi-threading. This is not a hate comment, I repeat. It's just that, despite your explanation and any amount of research into the evolution of the GIL through the years and through different versions of Python and its interpreter, it still seems hacky and lags behind modern machine capability by miles. If Python wants to be just a wrapper, maybe it's not worth being a language (for multi-threading). Maybe I am a noob at Python (totally true), but Python seems to be just a toy for kids to start doing some small funny stuff without knowing anything about CPU architecture, processes and threads.
No you're absolutely right, the GIL was never designed with multi core in mind. That said that kind of doesn't matter to most of us as our job isn't "pick the best language" it's "here's the most convenient language that was chosen 10 years ago, make the best of it" and that's going to be Python for a LOT of things for a while. The point of this video is to make the best of Python. Oh and I do dislike your comment, but only because it's been said a million times before and ultimately has no relevance to this specific video.
Uhm, you're spreading misinformation with clickbait titles. Having multiple threads does not mean you're executing them in parallel as "system level threads." It doesn't matter how many threads you have running, only ONE is executing at a time, essentially making Python behave as a single thread. That is what people say and always have. The GIL is designed right into its core; trying to say "hey look guys I see another thread on htop, lolz proven wrong" is the worst attempt to disprove this ever. Prove they are executing at the same time with pure Python code. They aren't.
1. You didn't watch the whole video 2. That's not the take a lot of people have. There's a great deal of misinformation that python threads are green threads and there's absolutely no way to get anything initiated in python to run in parallel, pure python or otherwise. That's what this video is meant to counter. In this and other videos I show threads spawned from python executing code (either llvm code or compiled C++) running in parallel, which of course works fine when it's ok to release the GIL. You'd see this if... point 1.
@@JackofSome I did watch the full video, your title is clickbait. Again, the GIL makes native Python behave AS a single thread, and that IS what people say, regardless of how many threads you spawn. So you're simply wrong and trying to use really flimsy "proof" to make a point that's simply false. You can't get threads to run in parallel in the true sense of a pthread, it's built into the very core of Python. If you're making a channel to educate people on Python, then you have a responsibility to know what you're talking about, and you're simply wrong in this video. You also don't seem to understand why the GIL is still related to why multiprocessing is so inefficient. Go watch some David Beazley videos on the subject.
1. No, that's not what the people I've met say. You're lucky if you've never encountered that. 2. Non-native Python code (as I said, either LLVM code generated by Numba or compiled code from C++/Rust/whatever) will run in parallel fine inside Python threads. The proof is in the pudding. I rely on that behavior in a number of my codebases. The video and title are perfectly fine for those they intend to educate. You're apparently not in that category, and that's ok.
To the inevitable "nuh huhh" commenters: save your collective breaths. I have lost any interest in discussing this with anyone unwilling to listen. This video isn't about language superiority or "what X was designed to do", it's about showing a way to write better code involving python. If that is helpful to you, great. If that somehow makes you angry, maybe take a step back and re-evaluate your priorities in life.
I will no longer be responding to "no this is wrong" comments, and I _will_ remove any of them that are disrespectful or offensive. You have been warned.
Your channel fills the much needed niche for intermediate-leveled programmers. Sometimes I'd rather watch a succinct video on a topic than read through pages of docs or the source. Got yourself a new sub :)
You call this intermediate!?! Boi I’ve got a long way to go lol
*whispers* we're all intermediate, expert is a made up word
@@twinkytwinklier4047 at least you're working on it. The world needs more programmers, especially the ones who do FOSS work
5:48 you are not underestimating the CPU, but the JIT compiler, which just folded your for loop operating on fixed input into a constant. This also explains why it barely goes above 100% when run in parallel, since the part of the execution that runs without the GIL is very short for each run.
What JIT compiler? He's using CPython, isn't he?
@@DanEllis numba does jitting on the function level like pypy does jitting on the program level
9:00 was mind-blowing ! ! Gonna try that some day when I do some heavy work . Thank you for sharing your knowledge !
Even if python can make system threads as you demonstrated in 2:00 ~ 2:55, I think it is NOT truly multi-threaded because of the GIL.
A user who uses system threads expects those threads to literally run in parallel; but in Python, thread scheduling is gated by the GIL, which means that only a single thread is allowed to run at a time.
You can even see this at 2:53: only a single thread (the one that is highlighted) is in state R (Running) and the others are in state S (Sleep), due to the GIL.
In the latter case, I'm not familiar with the internals of NumPy and the like, but it bypasses the GIL through FFI, and thus almost all (though still not all) threads are in state R. This is thanks to the libraries that use FFI, not Python's threading, so it is not a fair example for arguing that "Python is not single-threaded". Any language that has a GIL can bypass it by using FFI or libraries that use FFI like this.
So, it is literally true that "Python is NOT Single Threaded", but many people don't think so, because it is still sequential.
You just described the video and its thesis.
finally someone gets it. No matter how many threads you spawn under a single process, in reality only one thread is running at a time due to GIL while true meaning of multithreading is that multiple threads are being executed on the multiple cores at the same effing time.
The video literally shows how to do the opposite (as in of course normal ass python code has to retain the gil but there's sufficient ways to release it in situations where it matters) ...
Like, the proof is in the pudding maybe try a bit of it?
@@JackofSome I think when some smart people said that Python is single threaded, what they really meant was that pure Python will have a single thread running at any one time, even though you might have spawned some new threads manually.
Tbh, if I were not an intermediate Python programmer, what I really would have taken away from point 1 was that "using the threading module I can run multithreaded apps using multiple cores just like Java or C#".
Also, if you have time, maybe you can make a part two of this video and add these points to make it a lot clearer what multithreading in Python means.
Multithreading in itself is a somewhat advanced topic to wrap your head around, and with the Python GIL in the mix, it gets a lot harder.
The reason I made this video is because I've conversed with good programmers that have been coding in python for years and still think that python threads are green threads (rather than system threads) managed by some background runtime, and that python itself can only run one thread. This hurts their ability to write good multithreaded python code (even if they're only using pure python code but especially if they're using libraries implemented in C or C++, which 90% of the time is the case).
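One quick way to check the "system threads, not green threads" point for yourself is to ask the OS for each thread's id. The sketch below uses only the standard library; `collect_native_ids` is a made-up helper name, and `threading.get_native_id()` needs Python 3.8 or newer. It only shows that each Python thread is backed by its own kernel thread; it says nothing about whether they run in parallel.

```python
import threading


def collect_native_ids(n_threads=4):
    """Start n_threads Python threads and record the OS-level id of each."""
    barrier = threading.Barrier(n_threads)  # keeps all threads alive together
    lock = threading.Lock()
    ids = []

    def worker():
        barrier.wait()  # wait until every thread has been created
        with lock:
            ids.append(threading.get_native_id())  # kernel thread id
        barrier.wait()  # nobody exits before everyone has recorded its id

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


native_ids = collect_native_ids()
print(len(set(native_ids)), "distinct OS thread ids:", sorted(native_ids))
```

These are the same ids that htop lists when it is set to show threads.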
The little detail of you speeding up typing code made my day. It only saves a few seconds but makes the video just a bit more enjoyable to watch
For quite a while there I thought he was just typing blazingly fast.
Me too
Very good demonstration.
There are 4 distinct levels of thread execution touched on; physical processors, SMT (simultaneous multi threading, where one core has multiple contexts; "virtual core" is a bit of a misnomer as they have their own physical registers and are just as real as the sibling they share execution units with), operating system threads (which were repeatedly referred to as "proper"), and in-process coprocesses ("green" threads). The load spike moving from processor to processor in the GIL demonstration is probably execution, not threads, moving between cores. It just varies which thread acquires the GIL. Linux tends to keep threads in the same core when possible, something that can be enforced using thread affinity. This reduces overhead from warming up the caches for particular tasks. In some processor cores, e.g. Xmos XS1, a single thread can't use all the execution resources; in XS1, 4 threads are required to saturate the pipeline. The multiprocessing example probably bottlenecked in the scheduling of work or transfer of parameters and results, which are serialized due to execution on the parent control process. Larger queues are often used to mitigate this, but it won't help if the task generation takes longer than task execution.
The main issue is that a video like this will have the opposite of the intended effect: people are now going to go around and use threads simply because they heard that Python is in fact using proper system threads, ignoring the nuance. I've seen that happen often enough.
Ultimately, in Python threads ACT as if they aren't real threads, which is nicely proven by spawning a few threads that all append the current timestamp to separate lists. You can very clearly see that the multiple threads are never executing at the same time. This is your pudding. It's what the threads act like. That they don't act like that in NumPy etc. is a nice detail, but ultimately it doesn't disprove the core statement: Python threads are not (read as: don't act like) threads. I will continue to point this out to beginners and I will continue to recommend other ways of speeding up their code, like using numba (with GIL-released threads if you want more speed).
Ultimately, it's the same as telling people Python is interpreted. It's also compiled, but explaining that to beginners is entirely pedantic, as Python acts entirely like an interpreted language.
exactly
Man, you make great videos. It's just at the right level. As someone who would class themselves as an intermediate Python programmer, your videos provide just the right amount of information and examples for me to immediately apply them and run with it. Thank you and keep up the great work!
People say "single threaded" when they don't know what it means. What they mean is, Python does not run code in parallel (multicore). Using C++ in Python (e.g. through numpy) IMHO is not a positive, it's an escape hatch that has been used over and over again because of this Python limitation. Python succeeded despite its GIL, not because of it.
Python does not run _Python_ code in parallel.
The GIL is fine. It does its job for reducing complexity and adding safety in concurrent code.
I dunno if I'd call ffi an escape hatch. It's definitely not done because of multi processing, it's done because of speed reasons (that's a whole other can of worms). With the ML and Data Science success it's become python's primary MO. I find myself constantly wrapping C++ code in python these days just to get a more pleasant and flexible interface.
100% true
Thanks for the awesome video!
I have a question though, sorry if it is dumb.
At 7:28 you say "The code that we've written doesn't need an interaction with the interpreter". I don't really understand this.
How can it not need an interpreter if one is used to execute the code?
Could you explain what you mean, please? Or at least point me to a place to read more about it?
Thanks again for the content!
That bit of code is being JIT compiled by numba and runs "outside" of the python interpreter (similar to calling a C function). So once the function call has been made the code has no additional interactions with the python interpreter until it returns.
@@JackofSome When WOULD it need the interpreter? Maybe this needs a video for all the n00bs lagging behind :-/
@@userou-ig1ze If it had to access a global (variable, function) in the course of its execution.
@@IvanSeminara so that's any other library whatsoever?
@@userou-ig1ze Pretty much, unless like in this example it's some external piece of code (or system call) that does not rely on anything managed by the interpreter. To the interpreter these are simply black boxes: it copies some data to them and gets something when they return.
Good video. Racket calls its green threads "thread" (which may not be ideal in all cases) and its multiprocessing thing "places".
I like that it calls them places, because it is a nice analogy. I started writing an explanation but it looked too much like a book, so I deleted it.
Another thing I want to mention: it also has "distributed places" which is a way to do distributed programming / writing programs that run across multiple machines (of course there is no magic to make that super easy).
When I was first learning Python 3 years ago, I fell into this category of being upset with Python's GIL and claiming it's "not true multithreading." This being because I came from a C++ background, where if I needed to do parallel processing on complex data types/structures, such as flying PSO particles in a distributed fashion, it was fairly straightforward in C++ but rather impossible in Python without the performance hit.
Since then, I've learned to use Python as a "wrapper" for calling more complex operations underneath. Occasionally, I still find myself using the multiprocessing library as a crutch to run a set of numpy or pandas operations in parallel, especially when I'm in a bind. But the one thing Python seriously lacks is native performant parallel processing on complex structures. It's not a problem if you can fit your code into the interface of one of the underlying libraries, but it can be a problem for novel approaches in research. It's still my biggest gripe about Python, even though it's good at almost everything else. The whole pickle approach to passing args to processes almost made our work a no-go in a previous project.
Totally agree with you there. It's certainly a limitation and one you have to be hyper aware of whenever performance matters.
On the multiprocessing front, I think python 3.8 introduced shared memory which should help a great deal, but I've never tried it.
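For reference, the shared memory feature mentioned above lives in `multiprocessing.shared_memory` (Python 3.8+). Here is a minimal sketch of the idea, with both handles opened in one process purely to keep it short; in real use the second handle would be opened by name from a worker process, and the 16-byte size is arbitrary.

```python
from multiprocessing import shared_memory

# Create a named block of shared memory. Only its name has to be sent to a
# worker process; the data itself is never pickled or copied.
owner = shared_memory.SharedMemory(create=True, size=16)
try:
    owner.buf[:5] = b"hello"

    # Attaching by name is what a worker would do (done in-process here).
    view = shared_memory.SharedMemory(name=owner.name)
    seen = bytes(view.buf[:5])      # the attached handle sees the data
    view.buf[0] = ord("j")          # a write through one handle...
    changed = bytes(owner.buf[:5])  # ...is visible through the other
    view.close()
finally:
    owner.close()
    owner.unlink()  # release the block once every user is done with it

print(seen, changed)
```

A numpy array can be laid over `buf` with `np.ndarray(shape, dtype, buffer=shm.buf)`, which is the usual way to avoid the pickle round trip for large arrays.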
The fundamental reason for the GIL is that you cannot manage reference-counting without it. Python, like Perl, uses reference-counting as a first resort for managing memory, only going to full-on garbage-collection à la Java when reference counting is no longer enough (like when you have reference cycles). The trouble with using garbage collection for everything is that it means your innocent little script, if it runs for long enough, can end up consuming all available memory on the system unless you impose artificial limits to constrain it. Python tries to avoid this. Hence the GIL.
There are Python implementations that have no GIL -- e.g. Jython. Most people prefer CPython with its GIL -- should that be a surprise?
Makes perfect sense. The GIL is great in my book. I'm just trying to show that some programmers underestimate Python because they misunderstand the true limitations posed by the GIL.
One recurring design question with cpython's ref counter is why the ref count isn't atomic or otherwise synchronized to a scope finer than running program
You can do both reference counting and garbage collection in the same languages without a GIL.
@@timseguine2 Prove it. Show us a thread-safe implementation of reference-counting that does not impede multi-threading.
@@lawrencedoliveiro9104 define "does not impede". Your ask is vague enough that whatever I answer will conveniently not fit your definition.
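For anyone following this thread who hasn't seen the two mechanisms side by side, here is a small CPython-specific sketch of "reference counting first, cycle collector second" as described at the top of the thread. The `Node` class is made up for illustration, and the automatic collector is switched off only so the demo is deterministic.

```python
import gc
import weakref


class Node:
    """Plain object that can point at another Node."""

    def __init__(self):
        self.other = None


gc.disable()  # rule out an automatic cycle collection during the demo
try:
    # Case 1: no cycle. Dropping the last reference frees the object at once.
    a = Node()
    a_ref = weakref.ref(a)
    del a
    freed_by_refcount = a_ref() is None

    # Case 2: two objects pointing at each other never reach refcount zero.
    b, c = Node(), Node()
    b.other, c.other = c, b
    b_ref = weakref.ref(b)
    del b, c
    survived_refcount = b_ref() is not None

    gc.collect()  # the cycle collector finds and frees the unreachable pair
    freed_by_gc = b_ref() is None
finally:
    gc.enable()

print(freed_by_refcount, survived_refcount, freed_by_gc)
```

The refcount updates in case 1 are plain, non-atomic increments and decrements in standard CPython, which is the part that relies on the GIL.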
Subbed now. I just started learning Python 3 days ago. I had no background in computer programming at all but found the Python for Dummies book really helpful. Then I found your channel. I love your work. Please guide me on where to start as a total noob and beginner. Thanks so much.
3:15 What do you mean by async? Don't the threading library's threads already work in an asynchronous way, because of the context switches?
Async in python is different. It refers to green threads which are not really system threads. They run under an event loop and usually are contained inside a single system thread. Very helpful for things like web frameworks but not for computationally heavy tasks.
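To make the difference concrete, here is a small standard-library sketch (the names `task` and `main` are just for illustration). Both coroutines report the same thread id, and they only interleave because each one hands control back to the event loop at its `await`.

```python
import asyncio
import threading


async def task(name, delay, seen_threads, order):
    seen_threads.add(threading.get_ident())  # which thread am I running on?
    await asyncio.sleep(delay)  # hand control back to the event loop
    order.append(name)


async def main():
    seen_threads, order = set(), []
    await asyncio.gather(
        task("slow", 0.2, seen_threads, order),
        task("fast", 0.05, seen_threads, order),
    )
    return seen_threads, order


seen_threads, order = asyncio.run(main())
print(len(seen_threads), "thread used; finish order:", order)
```

If one of the coroutines did heavy computation instead of awaiting, the other could not make progress until it finished, which is why this model suits I/O-bound work and not number crunching.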
Great video!
I think the misconception comes from the expectation that python should make threads that run parallel. The GIL prevents them from running in parallel, but your operating system will schedule the threads, so it acts more like async as an analogy but letting the OS take the wheel.
This is all by design to prevent 90s-2000s era thread lock management like in C++ which was a mess. And anyways the web has shown the world that a single thread is all you need for GUIs (JavaScript stuff). Almost everything else like you said with numpy will bypass the GIL with c code.
So this means that PYTHON is NOT PARALLEL....great
"Pure python bytecode will at any time be able to use only one core".
That design decision is about to be changed btw. In the next year or two the GIL will no longer be a requirement.
@@JackofSome Exciting times for sure!!! I sure am gonna miss blowing my leg off with sketchy c++ interop to get more performance lol.
I think it would be helpful to add what kind of tasks can be multi-threaded in Python, since for certain computational tasks, without numba, the GIL would block multiple threads and make processing single-threaded.
2:45 what are the errors there?
More or less the reason why I more or less quit Python.
Nowadays I often use a language that is more or less as slow as Python but allows for parallelism, Erlang. You may want to check it out.
1:00 when did you ever do something computationally intensive in Python? I guess never.... meanwhile me trying to figure out how to run through 2^32 combinations of numbers using Python :/
Don't. Pure Python is the wrong language for it. Use numba or cython for a familiar feel. Or learn to use mypyc. Or use a completely different language.
Very nice video. I enjoyed watching this and the other videos I have seen from you so far. I think it would be great to hear more about how you would approach/manage parallelizing computations with a little more threads in the order of ~20-40 threads with a focus on typical bottlenecks you face and how you address them.
lowkey bummed that the frame did not zoom in more to the empty space below the text items hahaha
This is a great demonstration. I had no idea that a package existed where you could switch off the GIL. Unfortunately, this only really seems to work for numerical stuff, and, according to my experiments, doesn't even raise an exception when there is something in the njitted function it can't deal with. Is there any reason in principle why a package couldn't be made where the GIL gets turned off, and everything is able to function as in a normal Python environment... all at the dev's risk of course?
It's not turning the GIL off, it's just releasing it when it's not needed. C/C++ extensions can do this as well. Pretty much any FFI can so long as you're doing something that doesn't need to interact with the python interpreter.
Also njitted functions absolutely do raise exceptions when there's errors.
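A minimal sketch of that "release it when it's not needed" pattern, for anyone who wants to try it. Assumptions: numba is installed (the `ImportError` fallback is added here only so the script still runs without it, serially), and `sum_of_squares` and `run_in_threads` are made-up names. The loop is also small enough that LLVM may simplify it heavily, so a real workload should be heavier before you read anything into htop.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without numba, use a no-op decorator
    # so the code still runs (as ordinary Python, holding the GIL).
    def njit(*args, **kwargs):
        return lambda func: func


@njit(nogil=True)  # compiled code drops the GIL while it runs
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(n, n_threads=4):
    sum_of_squares(10)  # trigger compilation once, before the threads start
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(sum_of_squares, [n] * n_threads))


results = run_in_threads(100_000)
print(results)
```

The function body touches no Python objects once it is compiled, which is exactly the condition under which releasing the GIL is safe.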
I've actually never heard anyone mistake Python for being single-threaded. I feel like most people understand pretty well that "it only uses up to 1 CPU worth of processing". What I see much more often is people saying Node.js is a single-threaded language.
With the GIL in place, does it make a difference (other than being technically wrong)?
And regardless of how its internals might work, Node.js has worker threads, so you can have multiple single-threaded JavaScripts in one process.
In my case, I created a function:
def s(a):
    sleep(a)
I use 5 threads and call s(5) five times in each thread, and the total time is 25 secs. But when I change the function to
def s():
    sleep(5)
with 5 threads and the same target, the total time is 0.05 s. How?
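The thread-creation code isn't shown in the question, so this is only a guess at one common cause: writing `target=s(5)` calls the function immediately on the main thread, whereas `target=s, args=(5,)` lets the new thread call it. A near-zero total, on the other hand, is typically what you measure when threads are started but never joined before the timer stops. The sketch below contrasts the first two cases with short sleeps; `timed` is a made-up helper.

```python
import threading
import time


def s(a):
    time.sleep(a)


def timed(make_thread, n=5, delay=0.2):
    """Build n threads, start them, wait for all of them, return the time."""
    start = time.perf_counter()
    threads = [make_thread(delay) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # without this the timer would stop before the sleeps end
    return time.perf_counter() - start


# Bug: s(d) runs right here, on the main thread, while the Thread object is
# being built, so the five sleeps happen one after another.
sequential = timed(lambda d: threading.Thread(target=s(d)))

# Fix: pass the function itself plus its arguments; each thread calls it.
concurrent = timed(lambda d: threading.Thread(target=s, args=(d,)))

print(f"target=s(d): {sequential:.2f}s   target=s, args=(d,): {concurrent:.2f}s")
```

`time.sleep` releases the GIL while it waits, so the fixed version takes roughly one delay in total, not five.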
hey I am getting this error when I am importing numba
OSError: Could not load shared object file: llvmlite.dll
Thanks a lot.
By the way, do you happen to know a good book on Python (intermediate level and up)? I have been programming in Python for 3 years now, but I have never read a book on it (maybe because I have been programming for 15 years now, and Python really started as a hobby and I went on with it). Now I'd like to read a book from the ground up, a bible of sorts like
there is for some languages, and get a better understanding of the language, the latest standards etc. as well.
I'd be grateful.
Never read a book myself either. Just watch talks from experienced people. Raymond Hettinger is pretty good. Reading standard library and other popular libraries is also a good way to learn
@@JackofSome Thanks a lot Jack. really appreciate it.keep up the great work
Which linux tool u using to visualise the threads on the left ?
Htop
Great video. But from the perspective of a developer who only writes and cares about "pure" Python, you really don't have to worry about race conditions, since the GIL is supposed to coordinate concurrent tasks, right?
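On that question: the GIL protects the interpreter's own internals, not your program's invariants. Something like `counter.value += 1` is a read, an add and a write, and a thread switch can land between those steps, so shared state still needs a lock in pure Python. A minimal sketch, with the `Counter` class and `run` helper made up for illustration:

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read, add and write back are separate steps; the lock makes them
        # one indivisible unit as far as other threads are concerned.
        with self._lock:
            self.value += 1


def run(n_threads=4, n_increments=50_000):
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


total = run()
print(total)
```

Without the lock the final count may or may not come out right depending on the Python version and timing, which is what makes this kind of bug hard to spot.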
So numba with nogil=True lets all the threads run together, so it's running in parallel. And what does parallel=True do?
nogil=True allows Python threads to have true parallelism. parallel=True instructs numba to run its own system level threads.
hey
great video
another point you could tackle is using taskset and mpiexec when calling python
Amazing! Could you please make the literate programming tutorial for Spacemacs? Stay healthy! ❤️
Is this possible in Jupyter Notebooks? Because if it is, the Machine learning model training is gonna be whole lot faster even for deep neural nets.
This was the top video in Google search when we were trying to investigate Python's usability as a server handling 10,000 websockets.
Consider altering the description to "Python is NOT Single Threaded for calculations..." or something.
Otherwise the description de facto works as "misleading".
The description is literal fact, with details and nuances being presented in the video. I would hope that if you're doing research for a big project you'd do a bit more than reading the title of a YouTube video.
Is my understanding correct?
Python threads perform tasks concurrently by using single processor core without parallelism.
With Python multiprocessing we can spawn multiple processes with different interpreters on different cores and achieve parallelism.
No. Python threads are also running in parallel but are artificially locked to running one at a time. In some cases you can get rid of this lock.
With python 3.12 this is about to become a lot simpler
Thank you for this video!
great video! One question: What is the OS that you are using?
This is Ubuntu with i3 window manager and xfce desktop environment
Wow I learned something new today. Thanks!
Firstly, I don't think anybody who knows Python thinks it's single-threaded. If it were, it wouldn't need a GIL. Secondly, async is not green threads, it's cooperative.
I think the argument of lesser humans is ironically exactly the inverse: Python is single-threaded BECAUSE of the GIL. I.e. unless you know how to escape it (which I imagine also gets progressively more complex in larger code bases), it is, for all intents and purposes, single-threaded. From that perspective the argument could follow that in theory all programming languages CAN use multiple cores (e.g. by spawning multiple processes), but it's not practical.
...in my hallucination of a devil's advocate, the argument could even follow that, because of this, you are right, but whatever you said has no practical relevance for inexperienced users (since according to the author this misconception is common even in professional environments, I would assume this is an unexpectedly large proportion).
The real reason that no one changes these languages to have full parallelism like C++ is that you can't change a library or a compiler or an interpreter to do that - not and still have garbage collection anyway. It has to be baked in from the initial design - and the decisions that you have to make to make that possible are significantly different. If you wrote a new version of the language that did all that, none of the existing libraries would work with the new language. It still might be worthwhile, but it's a deeply breaking change AND it requires rewriting from the ground up.
The kludges that are happening now, copying data across processes instead of sharing them might be enough - though it IS possible to write an actually sharing version of a dynamic language. Note Numba's nogil only works on code that doesn't use Python types... That's quite a restriction.
By the way, languages with advanced garbage collectors like Java and .NET don't actually use system threads, or rather they have no more than one system thread per hardware thread. This is because advanced garbage collectors need to do context switches only at safe points. Otherwise, finishing a gc phase would have to wait for all language threads (including the ones that don't currently have an active timeslice) to count out the phase change because of the limitations on how cores see changes that other cores have made in memory.
So these advanced systems DO use green threading, they use one green thread at a time per hardware thread or hardware hyperthread.
Google's go language is similar except that it doesn't even emulate having preemptive multitasking. So no preemptive multitasking and no green threads. Instead there's a limit on the number of simultaneous threads so that counting out a gc phase only requires a response from the currently running hardware threads.
I don't know what you mean by a hardware thread on Windows. If you use multithreading in C# (I have done so since 2002), then one C# thread corresponds to one Windows system thread.
@@ernstraedecker6174 Windows system thread != necessarily hardware thread. It's whatever windows gives you :)
Hey Jack, which window manager do you use, i3? Also, how did you get the status bar at the bottom, and top?
It's i3 mixed with xfce. Top is i3bar. Bottom is the xfce panel
@@JackofSome ??????????????
That's possible???????
HOW
If you Google i3 and xfce4 you'll find your answer
Hi can you also give an example of stuff like this in Cython?
I don't use cython but there's a way to release the GIL in cython as well. I recommend googling it. There's some more nuances to it.
@@JackofSome alright. I'm also looking into it but read some articles saying that it's not thread safe to release the GIL, so I was wondering if you know of some implementations that are thread safe.
Hey! I loved this video. Do you have your dotfiles on github? (i3, bars, terminal &such)
can you compare cython with numba and how to use them? Thanks
I don't know how to use Cython actually. If I ever learn I can make a video
Very good job! Kudos!
I am curious if there is a way to use multi-threading on a pandas array where I can use the .apply on multiple columns in multi-thread mode ?
Really cool way to "bypass" the GIL and running python multithread. I've actually spent years trying to find a good solution and finally have some promising results with numba, so thanks for the amazing video. Very difficult to integrate numba compiled code with Pandas, though. Not sure if you managed to run a multi-threaded, GIL-free pandas groupby function? Cheers
Could you perhaps cover exceptions with multiprocessing? I've seen a lot of solutions on stackoverflow and none of them look remotely practical.
Interesting, but the colors on your terminal are difficult to read.
If htop marks the different threads with different pids, doesn't this mean that we can delegate the different threads to different cores? I really didn't understand the second point. If the first is true and we are creating kernel level threads (green threads as opposed to user level threads is sooo weird to me) then it should be true that we can assign these very threads to different cores?
They automatically get scheduled across multiple cores but python has a restriction where no thread can run python bytecode without first acquiring a singleton global lock (called the GIL). So pure python code would still behave in a single core fashion even as the OS schedules the threads across multiple cores.
When calling out to non pure python code (e.g. compiled C++ code or llvm bytecode generated by numba) often the lock can be released, allowing for some degree of parallel processing.
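A minimal sketch of both points, standard library only. It assumes Python 3.8+ for `threading.get_native_id`, and it uses `time.sleep` as a stand-in for compiled code that releases the lock, since `time.sleep` gives up the GIL while it blocks. The names `worker`, `barrier` and `N_THREADS` are made up for illustration.

```python
import threading
import time

N_THREADS = 4
native_ids = []
barrier = threading.Barrier(N_THREADS)


def worker():
    # Wait until all threads are alive so the OS cannot reuse an id.
    barrier.wait()
    # This is the kernel-level thread id, the same kind htop lists.
    native_ids.append(threading.get_native_id())
    # time.sleep releases the GIL while it blocks, standing in here
    # for compiled code that releases the lock.
    time.sleep(0.2)


start = time.perf_counter()
threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"distinct OS thread ids: {len(set(native_ids))}")
print(f"elapsed: {elapsed:.2f}s for {N_THREADS} sleeps of 0.2s")
```

If the sleeps were serialized by the lock, the total would be close to 0.8s. Because the lock is released during each sleep, the threads overlap and the total stays close to a single sleep.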
@@JackofSome Thank you !
But I have one more question. So let's assume a scenario where we, for example, have some python code with, let's say, 2 threads. We said that they can't both run bytecode at the same time because of the GIL. We schedule them to run and at this time we call some non pure python code, releasing the GIL. What would happen to those two threads we started at the beginning?
Let's visualize:
Starting T1
Starting T2
T1 -> has Lock
T2-> Waits
T1-> releases Lock
T2-> has Lock
T1-> waits.
Starting non pure python code and releasing the GIL.
T2-> Has Lock ( not anymore?)
T1 -> What does this thread do?
T3,T4-> Started from non pure python code.
Doesn't releasing the singleton GIL have an effect on the threads T1 and T2 that we started at first, and is this not dangerous?
I need to handle thousands of huge plain text files, and there are nested loops in my script. So could you please give me some advice on an acceleration lib? Numba? Multiprocessing? Or something else?
By the way, in my script Pandas and os are major modules.
Cython + multiprocessing probably. Or use a different language
Maybe misconceptions, but doesn't change the fact that Python's threading model is fundamentally broken as long as the GIL exists. I use python regularly mostly in the ways you imply are completely fine and also still regularly bump into the GIL. There are simply some computationally intensive things that are difficult to fit into things like numpy or pytorch, and there is a lot of stuff that is even difficult to fit well into the multiprocessing paradigm of python.
I don't really disagree with you. Python has definite limitations. The purpose of this video isn't to say "python is great for parallel processing" but demonstrate that if you understand the nature of python threading and the GIL you can make the language a little bit more useful and write better code. This kind of stuff becomes pretty important for me on a regular basis (e.g. I had a C++ camera driver that I wanted a python API for which needed to trigger N cameras at the same time. My colleague was adamant that this was only possible by running N threads on the C++ side and managing those separately, which is not the case. This video is meant to bust that kind of thinking).
@@JackofSome Yeah, I am not faulting you for getting the information out there. You just had the misfortune of reminding me of one of my least favorite aspects of the language. :)
I am doing computation (and memory) heavy work right now, and the whole team doesn't want to do it in vanilla python. They're pushing to use pandas instead.
Which is a perfect example of why python is single threaded. Not because it factually is, but because in actual tasks no one uses python multithreading.
The fact that you use numpy to utilize your cpu in full is proof as well. Python is single threaded. Some of its extensions are not. But Python is.
This is exactly what I need, I am done with seeing "Introduction to *insert item here*" stuff, when all I wanted was some intermediate level information. Much appreciated mate, you earned another sub.
Great video! My question: it is possible to make my own good looking console that can represent a lot of data at once (with python). ( like the core performance monitor in i3 console.) (something like in your video)
Thanks!
It definitely is. There's some nice libraries I've seen in the past but can't remember the name of. Try searching for "python TUI library". Curses or ncurses is the standard but also a bit of a pain to work with. I've definitely seen simpler libraries.
Hey... great video. To emphasize how dumb these sorts of blanket statements are... I run a number of heavily multi-process applications built in python, and the option that we use, which people will say is cheating but which totally avoids the GIL, is: subprocess. Yeah, you just launch multiple processes and they have no issue with the GIL at all. We run systems with 900+ co-operating processes and they replaced C programs that used IPC mechanisms... by avoiding formal IPC (all IPC mechanisms are just ways to slow code down so that they don't smash common resources, so the fundamental improvement is to require fewer common resources). By modifying algorithms to use mechanisms that avoided locks whenever possible, the python ended up faster (like 10x) than the C it replaced. Yes, it uses a lot of memory. My app is looking for execution speed, and we can afford memory... one use case, others will differ.
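A rough sketch of the subprocess approach described above, standard library only. This is not the commenter's actual system: the child program `CHILD_CODE` is a made-up placeholder that only reports its pid. Each child is a separate interpreter process with its own GIL.

```python
import os
import subprocess
import sys

# Hypothetical child program: each child just reports its own pid.
CHILD_CODE = "import os; print(os.getpid())"

# Launch the children first so they all run at the same time...
children = [
    subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                     stdout=subprocess.PIPE, text=True)
    for _ in range(3)
]
# ...then collect their output.
child_pids = [int(child.communicate()[0]) for child in children]

print("parent pid:", os.getpid())
print("child pids:", child_pids)
```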
Right off the bat, The advice on using multiprocessing instead of threading came from Guido himself, it's not a "misconception", it's a quote from the creator of Python, And he said it without the context of "what if you offload some tasks in your code to a CPP code".
The second part is, nobody ever claimed that python is using green threads, of course it creates a system thread. But because of the GIL every thread waits for the GIL to free up for execution, making your threading tasks not truly concurrent and rendering it effectively single threaded in that regard, no matter what you wrote in your code, unlike NodeJS, Golang and others.
The reason I made this video is that I keep running into programmers in the wild, some with decades of experience, that say that python has green threads. I'm glad you've never run into someone like that.
If I have a task that runs more efficiently when using a solution like numba + python threading vs multiprocessing, then I'm using threading. I don't care what Guido says, since the proof is in the pudding, not in some dogma.
Now if I'm forced to use only pure python and nothing else, then yes I would use pure python, but I'll probably resign if it comes to that
What linux distribution do you use btw ?
This was Ubuntu, but on another computer I also use Manjaro. All my configs are the same and carry over from one to the other
@@JackofSome How do you get "Spotlight" typing searching in Ubuntu?
That’s really helpful! Could you talk about cuda programming?
Soon. Cuda through Numba :)
Does anyone know technically why ProcessPoolExecutor utilizes 10gb more ram?
It's creating new instances of Python as well as making copies of the data that's shared (that last one may not be true in newer versions of python). The ram use adds up
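A small sketch of the "copies of the data" part, under the assumption that we only look at arguments handed to a worker: multiprocessing serializes those with pickle and the worker rebuilds its own copy. The round trip is done inside one process here to keep the example simple, and the `data` payload is made up.

```python
import pickle

# Hypothetical payload standing in for data handed to a worker process.
data = {"values": list(range(100_000))}

# multiprocessing serializes arguments with pickle to send them across.
payload = pickle.dumps(data)
# The receiving process rebuilds its own, separate copy.
copy_in_worker = pickle.loads(payload)

print(f"pickled size: {len(payload)} bytes")
print("equal contents:", copy_in_worker == data)
print("same object:", copy_in_worker is data)
```

With several workers each holding a rebuilt copy, plus the interpreter overhead of each process, the total adds up quickly.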
@@JackofSome Ah, it's the copying of the root instances data to the new instances that probably cost the most. I wonder what the memory footprint of a pure Python instance is if you create a new process immediately after starting the root process.
Intro slow zoom-in causes motion sickness.
The main thesis is pretty wrong. Nobody cares if Python threads are implemented using kernel threads or not. What ACTUALLY matters is whether you can get SIMULTANEOUS EXECUTION, because that's what's going to let you use all of the many cores that are usually available. If you can't do that easily (e.g. you need to resort to multiple processes, or some third-party facility), then Python is not amenable to a VERY WIDE range of applications. This non-amenableness explains something said near the beginning: When was the last time you saw a CPU-bound Python program? The reason this doesn't exist is because when people set out to write a CPU-intensive application, they immediately throw out Python as a possible implementation language.
Numba looks pretty interesting, and was new to me, so I did get something out of this, but the fact that you have to explicitly go out of your way to release the GIL in special sections of code proves the general wisdom correct: When using CPython, you have to jump through extra hoops to use all of your many cores, because in its "native" state, Python doesn't let you do that.
I agree with your last statement, that was pretty much the point of my video. I've run into enough Python developers that don't understand it and that's why I made the video.
You _do_ understand it. Though, a few things: 1. Using FFI is in python's DNA, this is why numpy/Tensorflow/et al are so effective. Some of these libraries use their own internal threading, others are just very good at releasing the GIL at the right time., 2. Yes, Numba is pretty amazing, I have a video on that too.
And I talked about it with someone...
I know that Python can overcome those limitations.
What a clickbait video. Offloading some calculations through FFI does not make python multithreaded in the way programmers think of when they hear "multithreading". Nor does sequential concurrency.
Ok.
A good understanding of what I'm showing here _does_ make Python more useful than many assume it is, and that's the ultimate point.
We can sit here and split hair, or we can get back to work and write better code :)
@@JackofSome Now that's a good point. Agree 100%.
How the hell does he type so fast ? is your PC in sync with your brain ?
Love the content btw
Through the magic of video editing.
I speed up parts because I have no interest in forcing people to watch me type some stuff.
Hey man nice video. The GIL part was really interesting. But you need to invest more time in the code. I can't understand what you are trying to do if you are writing it on the fly in the console. You need to have ready made notebooks and go over them while you speak. And maybe some more complex examples, closer to real life
great demo
I mean if you're calling out to C then the GIL doesn't apply.
It does. The call still originates from the python interpreter which will acquire the lock first, but you can instruct it to release the lock.
Wow.. thank you :) I have an app that runs a lot of threads. While I don't know if this will really work (using some shared resources like print() to console), I would like it if python offered this built in, to bypass the GIL for "sure" functions. I wonder if we can use intermediate nogil with gil enabled (I've seen that decorators exist). I will try this .. Thank you again
You can totally use intermediate nogil inside gil enabled functions. Just can't do the opposite.
yeah.. that's no good .. I use a lot of python objects that rely on the gil... nogil assumes that you don't use them. I had a glimmer of hope :d. I can't use multiprocessing because I use some synchronization between threads .. I could use ipc (not many options on windows) or the db (which is a bit slow and sometimes unreliable with many concurrent multithreaded accesses). I would have liked a fast and safe way to update my code. Too bad :(
Nevertheless .. I learned something about the gil in python.. and for that I bow to you. I will explore the python code a bit (if I understand any ... python programmer with little c knowledge) and see if I find any clues to improve my code
You can mix and match. Numba really shines when you have heavy operations that can be cleanly abstracted into simple functions.
There's also @jitclass but I don't think it's out of beta yet.
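A hedged sketch of the "mix and match" idea: an ordinary Python function (`run_batch`, which holds the GIL) hands a simple numeric function (`sum_of_squares`) to a thread pool, and that inner function is compiled with Numba's `nogil=True` option. Both function names are made up for illustration. The `try`/`except` fallback is an assumption added so the sketch still runs without Numba installed, in which case there is no compilation and no GIL release.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch still
    # runs (without any speedup or GIL release) when Numba is missing.
    def njit(*args, **kwargs):
        return lambda func: func


@njit(nogil=True)
def sum_of_squares(n):
    # Simple numeric loop with no Python objects, so Numba can compile
    # it and run it with the GIL released.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_batch(sizes):
    # Ordinary GIL-holding Python that calls into the nogil function.
    with ThreadPoolExecutor(max_workers=4) as ex:
        return list(ex.map(sum_of_squares, sizes))


sum_of_squares(10)  # warm-up call so compilation happens only once
results = run_batch([200_000] * 4)
print(results)
```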
Great video.
Not a topic I was wondering about much, but as a Python programmer but without much in-depth knowledge, I find myself Googling and Wikipediaing a bunch of terms mentioned in the video just so I can follow. PS In the last few years, I have built up a stubborn belief that given enough effort, there is nothing that I cannot eventually understand. So I am not intimidated by more advanced content such as this as I was before. This video (and I'm sure the others) are perhaps more informative than you think.
That's a fantastic attitude to have. More power to you!
how tf do you type so fast? are you speeding those sections up?
I'm superman 👀
Yeah those are sped up. I try to make videos as concise as possible and it's my firm belief that no one wants to see me slowly type in code
@@JackofSome i've internally convinced myself that your wpm is 300+ and that i am a pathetic germ cell, rejected by the egg, and doomed to die off simple sugars.
It's interesting, but in a way, it feels like you just confirmed my idea of python... Python is just a facade. In order to show that "you can", you had to use numpy, which is actually C. So, in the end, you are not using multiple python threads/processes, but executing multiple C tasks.
With Python, I have found that most of the libraries that I use, are just facades to tools written either in C, C++ or Java.
A few things to note:
1. I'm using numba, not numpy. Numba translates python code to llvm, much in the same way that the rust compiler translates rust code into llvm. Saying "so this is just C in the end" is a bit disingenuous because that would be like saying "so rust is just C then", which is not quite accurate.
2. You're right, python makes a great interface to lower level languages, and is often best when used this way.
3. You're wrong, in that these aren't separate C/C++ processes (no more than the regular python environment is since CPython is ultimately implemented in C). The thread is spawned by python, it's managed by the OS, the calls are all made by the python interpreter, etc. Python calling a C function isn't much different than C++ calling a C function.
The point of the video is that python is useful, but sometimes people incorrectly think that it can't do certain things (limiting its usefulness to them) and those assumptions may be flawed.
@@JackofSome Thanks for your response. What I meant is that at about 8:58, you used numpy in a function to get all cores to work. It was a bit odd to me that it was not possible for you to do it with just Python.
TBH in minute 1 you actually make a point that pretty much applies to 99% of python apps. Everyone is so concerned about peak performance, and then they have some site that receives 100 visits per day...
And thanks for the clarification in #3.
Oh I see what you mean. Yeah I forgot I was using a numpy function in the function I jit compiled. You don't need that though. If you watch my numba video I show something very similar with pure python code (which of course gets jit compiled) and what you can achieve there by releasing the GIL.
Amen on the performance panic part though. People do worry too much sometimes when they really don't have a bottleneck
@@JackofSome Sure, I'll check it. Thanks.
I'll blow your bubble by telling java runtime is written in C
Thank you for having the key takeaways at the beginning, and for not burying them in the middle or the end
There are some misconceptions here. In the old days we had 1 cpu with 1 core. Nevertheless full multi-threading was possible, because the OS would apply time slices and alternate the threads using its thread manager.
This is very useful with a GUI program, because the GUI code would be sitting idle most of the time, waiting for a keypress or mouse click. So the worker threads would be able to do their job in the meantime. Only when the user would take some action, the GUI thread would kick in. So the GUI would remain responsive, while the worker threads would do their long tasks. Think of reindexing a database.
Also handy with scripts, e.g. when you want to cancel a job.
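A minimal sketch of the cancel case, standard library only. The names `long_job`, `cancel` and `progress` are made up, and `time.sleep` stands in for real work such as the database reindexing mentioned above. The main thread stays free, as a GUI loop would, and flips a `threading.Event` that the worker checks between steps.

```python
import threading
import time

cancel = threading.Event()
progress = []


def long_job(total_steps):
    # Worker: checks the cancel flag between small units of work.
    for step in range(total_steps):
        if cancel.is_set():
            return
        progress.append(step)
        time.sleep(0.01)  # stand-in for real work, e.g. reindexing


worker = threading.Thread(target=long_job, args=(1000,))
worker.start()

# The main thread stays free (a GUI loop would sit here) and can
# cancel whenever it likes.
time.sleep(0.1)
cancel.set()
worker.join()

print(f"cancelled after {len(progress)} of 1000 steps")
```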
Of course. Back in my mechanical engineering days before I knew about threads I ended up making something similar to threading by using timers in an arduino so I could microstep some motors. Fun times
Damn! Thanks!!
For best python performance use mpi4py. I have used upwards of 10,000 CPUs with it (about 200 nodes). Threading just plainly does not work efficiently enough in an HPC environment.
Since I am very very interested in making video games by Python... ( 2d, 2.5d, and 3d ) this video is very important.
6:20 ... eeehhh 🤨 help(ex.map) !!!?
In that situation I'd probably need to do help(ThreadPoolExecutor.map), or use a question mark, but point taken.
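For reference, a small sketch of looking up `map` on the class without leaving the session, using `inspect.signature` as a compact alternative to `help`. The exact parameter list printed depends on the Python version.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor

# Look the method up on the class, no executor instance required.
signature = inspect.signature(ThreadPoolExecutor.map)
print(signature)
# help(ThreadPoolExecutor.map) would print the full docstring.

# Using map itself: results come back in input order.
with ThreadPoolExecutor(max_workers=2) as ex:
    squares = list(ex.map(pow, [1, 2, 3], [2, 2, 2]))
print(squares)
```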
I would study those results a little bit more. You dont seem to be using multi-threading ...
I wouldn't. I've been using patterns like this to supercharge my python codebases for years, even beating an equivalent C++ implementation in one case.
If you can elaborate more on what you mean or show examples then we can actually have a discussion.
Yea, but don't you get rid of some of the advantages python provides. Wouldn't it be better just to stick to a language more fitted for multithreading i.e. C, C++, C#, Java, etc...?
I don't see how? If you're doing anything data related in python, you're using numpy. If you're doing any computer vision, you're using opencv. If you have a few pieces of code that need to be accelerated you may be using numba or cython.
I'm just showing how you can make the most of your CPU given these existing scenarios. Being able to write the absolute fastest code ever isn't the only consideration when choosing a language.
I see your point
Senpai Jack is hot to the core!
2:10 ...aaaand I lied. :D
Bahahahahaah. I made a conscious choice to leave that in the video
threading is fundamentally single cored anyway
Can you elaborate? Do you mean the general concept of threading?
@@JackofSome I think he is just trolling.
@@tahaan99 how is he trolling?
You can show ray and plasmastore and how big objects can be shared across processes so memory usage will stay lower due to no copies.
So you just said, when you don't use python you can get threads, isn't it?
You can believe whatever you want to believe. I'll continue to write more performant code.
Numba is too young. It doesn't even support a python list being passed into a function. Instead we need to use numpy, and even then it doesn't support unicode arrays. According to the numba documentation, built in functions like .split() are slower than the cpython implementation in nopython mode. There are too many limits while using numba which are not covered in this video. For industrial applications I think cython is the best option to accelerate python.
As with all things, it depends. I've been happily using numba in production for over two years, and will continue to do so.
Edit: just realized this comment wasn't on the numba video 😅
@@JackofSome The official numba docs themselves list all the limitations. Maybe you only do numerical computations. In that case it might work. But it fails for many other general purpose applications.
I'm aware of the limitations. Just pointing out that "just use cython/c/rust" isn't quite a good response to them since so much is situation dependent.
You basically proved yourself wrong in the beginning of the video by assuming that very common programming task of heavy calculations is for some reason very rare.
In my current project I have multiple threads that do some data preprocessing for feeding it to the keras and that requires multiple cores. And no it's not just numpy arrays, it's selection from the sqlite database and then formulating input as numpy arrays accepted by the model.
It is very slow and programmatically heavy to move processed data between processes through the pipes. Would be much faster with proper threading. I know it's my fault that I am too lazy to rewrite processing in go or something like that but still it proves python is very bad for multicore programming at the moment.
Any chance you can share a gist of the preprocessing code? I do a fair bit of data preprocessing work for neural nets and am often able to make it work with threading.
Of course the video isn't meant to be a one size fits all. It's more "python is more useful for threading than people seem to believe".
Thanks Sir, Dave Beazley would be impressed i guess.
Oh? How come?
@@JackofSome just guessing, u know :D
recursion go brrrrr
Your content is great, thanks!
nogil + threading = true parallelism? I wish I knew this trick earlier so I don't have to dig into the shithole of multiprocessing.
dont confuse parallel with multithreading...
Well then it's a good thing I didn't
This video is quite misleading for the audience it targets.
I'm not trying to target absolute novices...
In all probability, just like your video, people here (and probably you) will hate this comment... However, your video itself clarifies how bad Python is in contrast to the new (multi-core) machines running it. There are multiple reasons and some you might already agree with, but the worst is your pinned comment. It's one of the first tech videos I've seen with such a defensive comment. I don't care how accurate your video is or not, I didn't expect such a comment and defensive approach. Also, just pointing out the common misconceptions and not why those misconceptions came about in the first place is one bad thing. Not accepting that python itself is not at all performant for multi-threading is another. If you need a wrapper for all C libraries, many languages provide a bridge. Python might just be rich in available C-binding libraries. But libraries don't make a programming language.
There are languages solving multi-threading in far superior ways on top of giving C-binding for running C-code e.g. Golang, Rust, Lua, Clojure, each in different ways and getting way better result. Heck, even JavaScript (Node.js) is way better and faster and has a better way of handling multi-threading.
No hate comment, I repeat again. It's just that, despite your explanation or going through extensive research on the evolution of the GIL through the years and through different versions of python and its interpreter, it still seems hacky and lagging behind modern machine capability by miles. If python wants to be just a wrapper, maybe it's not worth being a language (for multi-threading).
Maybe I am a noob at python (totally true), but python seems to be just a toy for kids to start doing some small funny stuff without knowing anything about CPU arch, processes and threads.
No you're absolutely right, the GIL was never designed with multi core in mind.
That said that kind of doesn't matter to most of us as our job isn't "pick the best language" it's "here's the most convenient language that was chosen 10 years ago, make the best of it" and that's going to be Python for a LOT of things for a while.
The point of this video is to make the best of Python.
Oh and I do dislike your comment, but only because it's been said a million times before and ultimately has no relevance to this specific video.
Anyone understood this stuff? Timey-Wimey stuff to me.
You have to know:
* Basic Python programming
* Basic Multithreading and Multiprocessing in Python
Please tell how to run node js in Android in separate video 🙏 please I request you
Bruh. We are not using pure Python for computationally intensive work because we CAN'T. Python can't do it.
r/whoosh
Uhm, you're spreading misinformation with clickbait titles. Having multiple threads does not mean you're executing these in parallel as "system level threads." It doesn't matter how many threads you have running, only ONE is executing at a time, essentially making python behave as a single thread. That is what people say and always have. The GIL is designed right into its core, and trying to say "hey look guys i see another thread on htop, lolz proven wrong" is the worst attempt to disprove this ever. Prove they are executing at the same time with pure python code. They aren't.
1. You didn't watch the whole video
2. That's not the take a lot of people have. There's a great deal of misinformation that python threads are green threads and there's absolutely no way to get anything initiated in python to run in parallel, pure python or otherwise. That's what this video is meant to counter. In this and other videos I show threads spawned from python executing code (either llvm code or compiled C++) running in parallel, which of course works fine when it's ok to release the GIL. You'd see this if... point 1.
@@JackofSome I did watch the full video, your title is clickbait. Again, the GIL makes native python behave AS a single thread, that IS what people say, regardless of how many threads you spawn. So you're simply wrong and trying to use really flimsy "proof" to make a point that's simply false. You can't get threads to run in parallel in the true sense of a pthread, it's built into the very core of python. If you're making a channel to educate people on python, then you have a responsibility to know what you're talking about, and you're simply wrong in this video. You also don't seem to understand why the GIL is still related to why multiprocessing is so inefficient. Go watch some David Beazley videos on the subject
Secondly, you're using c extensions as examples of "releasing the gil". This isn't python anymore, so again you're misleading people
1. No, that's not what the people I've met say. You're lucky if you've never encountered that.
2. Non native python code (as I said, either LLVM code generated by Numba or compiled code from C++/rust/whatever) will run in parallel fine inside python threads. The proof is in the pudding. I rely on that behavior in a number of my codebases.
The video and title are perfectly fine for those it intends to educate. You're apparently not in that category, and that's ok.