Python is Too Slow
- Published: 27 Jul 2024
- Recorded Offline at 26.09.2021
References:
- Porth Source Code: github.com/tsoding/porth
- Porth Development Playlist: • Making Programming Lan...
- Project Euler: projecteuler.net/
- Project Euler Solving Stream: • I made a whole Languag...
- Porth Simulation in Go: github.com/drocha87/go-porth
- PyPy: www.pypy.org/
Support:
- Patreon: / tsoding
- Twitch Subscription: / tsoding
- Streamlabs Donations: streamlabs.com/tsoding/tip
Feel free to use this video to make highlights and upload them to RUclips (also please put the link to this channel in the description)
"Python is not slower than C!"
*shows code that has byte magic, and assembly in the middle*
However, the test point is with list operations not byte magic and assembly.
Once your language becomes self-hosted (compiler written in your own language) so that you no longer have to bootstrap, that will be an amazing feeling for anyone involved in the project.
22:26 "that's why I'm unemployed"
That cracked me up. xD
Very cool video, seeing the process of profiling and troubleshooting porth was incredibly entertaining. The only thing I can think of that could be improved is the distinct lack of green frogs. In your future videos consider including more green frogs.
14:03 yes, you can set 0 to the first enumerand and the auto() will follow.
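A minimal sketch of the enum tip above. The opcode names are hypothetical and are not taken from the Porth source; the only point is that `auto()` continues counting from an explicit first value of 0.

```python
from enum import IntEnum, auto


class Op(IntEnum):
    # Hypothetical opcode names, used only for illustration.
    PUSH_INT = 0        # explicit starting value
    PLUS = auto()       # auto() continues from the previous member
    DUP = auto()
    COUNT_OPS = auto()  # last member doubles as "number of real ops"


# Zero-based members can index a plain list, e.g. a per-op counter table.
counters = [0] * Op.COUNT_OPS
counters[Op.DUP] += 1
```

Using `IntEnum` here is a design choice for the sketch: it lets members act as list indices directly. With a plain `Enum`, `.value` would be needed instead.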
"oh don't tell me he switches to C AGAIN!!!"
In college we could choose which language we used for algorithms homework, there were run duration requirements for full marks and people who used python got a 5x time bonus because it’s so much slower. I’ll never forget that.
That would mean their algorithms needed to be faster than yours cause Python is about 200x slower than pure C.
The fact that 15 million appends was 100 seconds in your interpreter and was 6 seconds with pops in the stack stresstest tells me that your if ladder for all ops is very slow and could be optimized by putting common operations earlier in the ladder and potentially by eliminating it altogether and turning it into a map lookup to find a function pointer to call.
I am addicted to the channel
Tso did you try to run porth.py with python -O (disabled asserts) to check if it runs any faster?
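A small sketch of what `-O` changes, assuming nothing about porth.py itself: it runs a one-line snippet in a child interpreter with and without the flag. The helper name `run_snippet` is made up for this example.

```python
import subprocess
import sys

# With -O the assert statement is compiled away, so the print is reached.
SNIPPET = "assert False, 'assert ran'; print('assert skipped')"


def run_snippet(*flags):
    """Run SNIPPET in a child interpreter; return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


normal = run_snippet()         # assert fires, non-zero exit code
optimized = run_snippet("-O")  # assert removed, snippet prints and exits 0
print("normal:", normal)
print("with -O:", optimized)
```

Whether this makes an interpreter loop noticeably faster depends on how many asserts sit on the hot path, so it is worth measuring rather than assuming.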
I made sim run many times faster by avoiding the "elifs" with a map of functions. Alternatively, just putting the most common intrinsics like DUP, SWAP, DROP etc first in the "elifs" already makes it noticeably faster, for obvious reasons.
I find sim pretty usable now actually. With pypy, euler problem04 takes 10 seconds, and my PC is clocking at like a 1GHz, so.
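The "map of functions" idea from the comments above could look roughly like this. It is a sketch, not the actual porth.py code: the opcode strings and handler names are assumptions made for the example.

```python
from typing import Callable, Dict, List

Stack = List[int]


def op_dup(stack: Stack) -> None:
    stack.append(stack[-1])


def op_swap(stack: Stack) -> None:
    stack[-1], stack[-2] = stack[-2], stack[-1]


def op_plus(stack: Stack) -> None:
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


# Hypothetical opcode names mapped to their handlers.
HANDLERS: Dict[str, Callable[[Stack], None]] = {
    "dup": op_dup,
    "swap": op_swap,
    "plus": op_plus,
}


def run(program: List[str], stack: Stack) -> Stack:
    for op in program:
        # One dict lookup replaces walking an if/elif ladder top to bottom.
        HANDLERS[op](stack)
    return stack


result = run(["dup", "plus", "swap"], [2, 5])
```

The trade-off is that each operation now costs a function call, which is not free in CPython, so the gain over a well-ordered `elif` chain should be measured on the real workload.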
when I see you struggle with the Python enum, I'm so happy with the way Go does enumerations xD
update: what is that travesty with the time function? :O
30:07 One simple optimization for the stack is to initialize it with a sequence of zeroes of some size (maybe 1k is enough) and, instead of pushing and popping, just do ordinary getitem and setitem with the help of some sort of "stack pointer" local var. Probably append and pop deal with memory allocation/freeing, and maybe that's what makes them slow.
Or you could use NumPy arrays, though that would add a dependency. Or use the deque type, where searching is O(1) rather than O(N).
@@craiglobo2165 agreed. I always forget to use these...
@@craiglobo2165 Array indexing is O(1)
@@homelikebrick42 I'm pretty sure it would be O(n). Python probably uses linked lists, which are non-contiguous in memory.
@@HiHi-ur3on python does not use linked lists.
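The preallocated stack suggested at the top of this thread can be sketched as follows. The class name `FixedStack` and the default capacity are assumptions for illustration. As the replies note, CPython lists are contiguous arrays, so indexing is O(1).

```python
class FixedStack:
    """Stack backed by a preallocated list and an explicit stack pointer."""

    def __init__(self, capacity: int = 1024) -> None:
        self.items = [0] * capacity  # allocated once, never resized
        self.sp = 0                  # index of the next free slot

    def push(self, value: int) -> None:
        # Plain item assignment; raises IndexError when the stack is full.
        self.items[self.sp] = value
        self.sp += 1

    def pop(self) -> int:
        if self.sp == 0:
            raise IndexError("pop from empty stack")
        self.sp -= 1
        return self.items[self.sp]


s = FixedStack(capacity=4)
s.push(34)
s.push(35)
total = s.pop() + s.pop()
```

This is not guaranteed to beat `list.append` and `list.pop`: both are implemented in C and appends are amortized O(1), while the wrapper above adds Python-level method calls. Inlining the index arithmetic into the interpreter loop would be the fairer comparison.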
cython is also a cool alternative if you like python syntax. It compiles python code to C with "appropriate" levels of interaction with Python's C libraries. It has been around for a long time.
A bit of a learning curve, but I think it works quite nicely.
If you're a beginner learning Python, don't worry because this is the right time. By the time you're pretty good at python, version 3.11 will be released, which targets a 2x performance improvement of the vanilla interpreter (CPython)
Would using switch() help? Or maybe python has labels? Or function pointers?
Since you started the project and created the "stack" I was telling myself this will blow up big xd
You should consider hiding operations that weren't used from displaying and maybe even sorting list by time or use count
I wish I could make my own programming language. A compiled crossover between C and Lua, free of bloat, free of safety paranoias, and with optional typing, and begin-end blocks (implicit begin, like Lua's).
Alas, I still have a lot to learn and a looong way to go. :)
Really interesting one! Thanks!!
On my laptop using pypy for problem 4, on windows, it runs kinda fast
906609
wall: 00m:04s:111ms
user: 00m:00s:62ms
kernel: 00m:00s:31ms
Hi man, great videos!
Where are you from? I hear you have an accent
And could you arrange your videos in playlists so that it will be easier to track series?
Surely the Go version should be called Gorth? :)
C version: Corth, Java Version: Jarth and so on
I'm waiting for Haskrth (working title) XD
The append/pop loop gets a lot faster for me if I don't start it with an empty list. I used %timeit in IPython to measure it and it goes from 23.4ms±20.9 µs down to 8.97ms±58.3 µs for 100.000 Iterations. I assume this is because python deallocates/reallocates the memory of the list if it gets empty?
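A way to reproduce that measurement outside IPython, using the standard `timeit` module. The function name `churn` is made up, and the sketch only measures; it does not confirm the deallocation explanation offered in the comment.

```python
import timeit


def churn(stack, iterations=100_000):
    """Append and immediately pop, leaving the stack at its starting size."""
    for _ in range(iterations):
        stack.append(69)
        stack.pop()
    return len(stack)


# Same loop, once starting from an empty list and once from a one-element list.
empty_time = timeit.timeit(lambda: churn([]), number=5)
warm_time = timeit.timeit(lambda: churn([0]), number=5)

print(f"starting empty:     {empty_time:.4f}s")
print(f"starting non-empty: {warm_time:.4f}s")
```

Absolute numbers depend on the machine and the Python version, so only the ratio between the two lines is meaningful.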
28:55 knowing exactly just how slow python is, I was thinking “oh dear, you’re in for a surprise, boss” 😂
I spat my drink when he was like “are you fucking kidding me??”
Love the content, love the voice and the quality of the audio. Just one recommendation: you should use a de-esser or EQ to lower the sibilance in your S and CH. You could make it a preset and forget about it for the rest of your YouTube career! Amazing content! Keep it up
What a nice font. Can u give me a link to download that ? Is it `Losevka` or `JetBrains Mono` ?
ee-o-sev-ka, `I' not `L'.
22:25 “That’s why I’m unemployed.” Lol Tsoding makes my day, I love this series
mate, I guess you should use smth like np.ndarray with dtype=int instead of a plain list, because a list by default allocates each element on the heap separately (to keep the list dynamically typed) and it cannot take advantage of knowing that you're using only ints in it
honestly expecting some optimisations for "stack" :)
pypy was unexpected solution for me.
what was the tool used for zooming onto python? is that a obs thing?
github.com/tsoding/boomer
I guess using list and vector is not the same. I'm not sure how list in Python is implemented, but C++ std::vector allocates one byte if you push into it (which is slow) and doubles the capacity when a new element is inserted and does not fit, copying or moving all objects to the newly allocated vector. But when you push and pop, capacity remains equal to one; it just internally increases size, assigns 69 and decreases size. It's not the stack operation. I'm not sure how to use the stack directly, maybe by calling an empty function which stores the instruction pointer on the stack, but it does more. Reading from a vector in the Microsoft implementation of std::vector checks if you are accessing a valid element and the integrity of the vector, which is why debug mode on Windows is painfully slow.
I guess std::list in C++ is very similar to deque (double ended queue) and std::stack or std::queue are deque with certain operations forbidden. Stack can be replaced by vector which is much faster.
In python it might be the same. Pushing means that you need to update head and tail of list to point at inserted element and allocate memory for it and free it which is very slow.
That's why I'm writing python compiler
Wow
That’s rlly cool
@@joshuachan6317 sure, unfortunately it's long and time-consuming process, but at least it's interesting
How would that fit with pythons typing system
@@evanwilliams2048 wdym by fit?
It is a very very small thing but i have to say, you can open a link in new tab using mouse middle click
a missed opportunity to call Go-Porth, gorth
this guys voice is amazing
If i'm not mistaken, python's pop for lists is O(N). Try switching to a O(1) implementation. It should be as easy as :
```
from collections import deque
stack = deque()
```
That's what I was thinking too deques are more optimised for this
The test program he had is still incredibly slow with that, but I guess it would help in the emulator
list.pop is not O(n). Why would it be?
@@Astana1337 pop() is not the same as pop(0) and pop() is O(1)
@@DJminecraft2445 Who said pop()? they said pop.
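To compare the two containers discussed in this thread under the same stack workload, a rough sketch follows. The helper name `fill_and_drain` is hypothetical. Both `list.pop()` and `deque.pop()` remove from the right-hand end, which is a constant-time operation for both types.

```python
import timeit
from collections import deque


def fill_and_drain(make_stack, n=200_000):
    """Push n items, then pop them all from the same end."""
    stack = make_stack()
    for i in range(n):
        stack.append(i)
    while stack:
        stack.pop()
    return len(stack)


list_time = timeit.timeit(lambda: fill_and_drain(list), number=3)
deque_time = timeit.timeit(lambda: fill_and_drain(deque), number=3)

print(f"list:  {list_time:.4f}s")
print(f"deque: {deque_time:.4f}s")
```

Since neither container has to shift elements here, any difference comes from constant factors rather than from algorithmic complexity.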
It would be smart to show time per operation on average
oh really? thats a surprise ofc. only a select few know this. btw love your videos
Did he just miss it that the 64bit read and write operations are full of swaps?
Shouldn't you use the timeit function to be timing these?
It lags in load64 and store64 because each of them has about 8 dups, swaps, and overs. Although the author said that this is not the reason (access mem 64).
Curious... why not make a Py3-to-C++ (Lightweight) standalone 3D Engine whereas the the C++ initially interprets which variable types to assign? I did that with JS which worked perfect.
Euler ❌
Oiler ✅
Doesn't python have a real bad Big O for pop from a list?
Nvm. only for list.pop(0)
And as far as I know, python list are arrays under the hood, and depending on how "array-growth" is implemented there is the possibility that python has to copy the memory a lot of times.
You could use something like flamegraphs to see where the Python VM spends a lot of its time. But in the end, just switching to PyPy is a lot easier, and maybe CPython is really just freaking slow in this case
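As a lighter-weight stand-in for a flamegraph, the standard library's `cProfile` can show where time goes. This sketch profiles a made-up `simulate` function, not the real Porth simulator, and prints the top entries sorted by cumulative time.

```python
import cProfile
import io
import pstats


def simulate(iterations=50_000):
    """Stand-in workload: stack traffic similar to an interpreter loop."""
    stack = []
    for i in range(iterations):
        stack.append(i)
        stack.append(1)
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)
        stack.pop()
    return len(stack)


profiler = cProfile.Profile()
profiler.enable()
simulate()
profiler.disable()

# Collect the report in a string so it can be printed or saved.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

`cProfile` reports per-function totals rather than a call-stack picture, and its own overhead inflates the cost of very small calls such as `append` and `pop`.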
i almost didn't recognize you. You been going to the gym, seem more fit. That's good.
what is the name of the software u use to code in the video?
Emacs I believe
VIM
@@lalmiahmed3573 That's not Vim .. look at the status bar. He's using YAS, Company, and ElDoc. Obviously this is GNU Emacs.
Cython would be good to try for this
Semicolons are too slow in python
I've always been skeptical about this: if I write a language (say, one similar to C but with no pointers and with a Boehm GC) in Ruby (which is slower than Python) and bootstrap it, will it produce software as fast as C? For example, a recursive fib(45) takes around 16-18 seconds in PyPy on my computer, and the same thing compiled with gcc takes around 6-8 seconds! Is it possible to make a compiler that gets at least near the performance of C? (Note: I give the example only to make my question easier to understand.)
W porth he can emulate it in python (slow) and can compile it to assembly (fast, but probs a bit slower then C cause optimizations)
C compilers are complex and do lots of optimization. If you managed to get the same optimization level in your compiler, then yes. You can also output something other than assembly, like LLVM or C code; both would achieve the same speed and you would not need to do the optimization yourself
people use cpython because it's the default and reference implementation (and pypy has ... issues ... with c modules)
Instead of a list, use a deque. And instead of huge if else if chains (for the bytecode) use a dictionary that maps the code to the function, to get o(1) lookups.
dont know why i never thought of that.. but thats really creative and useful thank you
From the future!
I just ran the same stack stress test using all of the latest versions of Python, PyPy, and C++. The results are wild:
Python 3.11.7 - 1.3s. Respectable now!
C++ - 0.5s with optimizations off. Pretty good! With optimization level 3 it dropped to 0.07s! (compiled using g++ as tsoding did)
PyPy 7.3.15(Python 3.10.13) - 0.02s!! It somehow is even faster than C++ fully optimized.
Now, that being said, I also have had many instances where PyPy ran SIGNIFICANTLY slower than regular Python, so take it for what you will. That is impressive though. If only it was always like that, it would make PyPy and sometimes even Python a real speed contender 😂
lists aren't made for speed or efficiency especially pop and append methods. Use a NumPy array or Initialize all values to None for the max number of operations for the stack and use a stack pointer variable
lists are made for efficient appends.
@@greggoog7559 That has little to do with JavaScript and the list type. It has more to do with jit compilation.
@@greggoog7559 Because it is a language, just like JavaScript is not jit compiled. V8 is a jit compiler for JavaScript, which you probably use. PyPy is a jit compiler for python.
@@jensrenders4994 You are underestimating the imortance of data structures.
Also, if your list doesn't contain 50 000 elements and those elements are not very large (which is like 0.95 of all cases), then the data structure with the fastest appends is a plain array.
@@patryk_49 in which implementation of which language?
13:45 are you serious? Hahaha that was funny.
List ist optimized for convenience
Like almost everything in python
haskell moment
Write Porth in Porth
That's the goal. Really looking forward to the step of self-hosting.
List is slow in python. My suggestion is to maybe look into python's native arrays (just `import array`) since you're using the same datatype for the elements or the deque datastructure (i think `from collections import deque`).
Alternatively there's always Cython, however this may be a bit too much of an investment for just a simulation mode
No, Python’s list is like C++ std::vector. A linked list suits a queue, but it is absolutely not made for stack tasks. List is the fastest native Python stack solution.
@@markervictor I had a quick look online and you're right, putting an edit in my comment
it is not a linked list
@@danilo2735 yes I know, a previous commenter pointed it out to me. I amended my comment
@@atrumluminarium Sorry. I didn't notice the other comment.
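For reference, the `array` module mentioned at the top of this thread supports the same `append` and `pop` calls as a list. This is only a sketch of the API; the `"q"` typecode (signed 64-bit integers) is an assumption about what the stack would hold.

```python
from array import array

# "q" stores signed 64-bit integers in one contiguous buffer.
stack = array("q")

stack.append(34)
stack.append(35)
b = stack.pop()
a = stack.pop()
stack.append(a + b)

top = stack[-1]
```

Values outside the signed 64-bit range raise `OverflowError`, and every read converts the stored value back into a Python int object, so this is not automatically faster than a plain list.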
Now someone need to do a tool that convert python to this language and probably obfuscate it and compile it to a .exe or .bin in Linux and so on 😅
Linux is ".bin"? I never see someone said that. Both are ".bin" if we exclude file extension. Converting a Python to this language is kind of... a little impossible. Python is filled with many built-in function/modules.
Simply put, pypy doesn't support the latest features, especially the walrus operator which I use a lot
*Opens cpp doc to show example*
*Happy Programmer noises*
Takeaway : Python may be easy to pick up, but hard to master
Why would you bother mastering something that is a shity language by design? Only to realize that at the end of the day, the language sucks and there is nothing you can do about it? To be implementing more layers of abstraction and workarounds that will get exponentially more difficult to remove in the future? Is this something to be proud of for the next generations to come? Opinions lie, numbers don't. If python is this slow it should of been deprecated long time ago and thrown out the window. This is what happens in every other industry except this one.
Weird, my timings in regular CPython are 1.7 seconds for 15.000.000 push/pops
start_time = time.perf_counter()
end_time = time.perf_counter()
print(end_time - start_time)
It was never meant to be fast
After watching this, I profiled my Rust implementation of the simulator with the problem 4 implementation using VTune, which ran it for 2 second's worth of execution to get good data (so executed about 10 times).
Of that 2 second runtime, a full 500ms was spent executing Swap. Digging a little deeper, 400ms is spent on the single instruction used to load: MOVDQU. It's not like it's hitting RAM because the top of the stack is hot, so I'm a little surprised it takes that long.
Other hotspots were division at 239ms, and fetching the next instruction at 800ms. Neither are particularly surprising.
stack stress took 0.5 sec with pypy
1. Claim Python is slow
2 (optional). Learn Python
yeah but.. python is slow.. by design, it can't be faster than C or C++, heck even go
@ishdx, Go is compiled, by design, to a native machine. But, the size of the result file is very big (usually 1 MB). You are comparing an interpreter "programming" language to a compiler "programming" language, that is unfair.
@@FaranAiki yes but do you realize that to run a python executable you need the runtime, and the runtime is far above the 1 MB threshold, CPython runtime is about 4 MB which is far more than something like go result executable, but that's only the runtime, because you also need to bundle libraries. in result if you want to have a final EXE file that an end user can execute without going through installing python and dependencies, you have to bundle the runtime with the executable, as such resulting executable will be over the 4 mb threshold, which is higher footprint.
i like python but if you want to go fast, python is probably not the language you need. C/C++ without CRT can give an minimal executable as small as 4kb, without any runtime or dependencies needed.
@@ishdx9374 The Circuit Python runtime is under 1.5 kb
@@Cons-Cat i'm pretty sure it's not possible, can you link some sources? for example i downloaded a patch from their website and it's 1.4~ megabytes, not kilobytes
Once the language becomes self aware, it will optimize it self 😆
pypy does not support many useful libraries like pandas, some numpy functions, many scipy modules, and lots of other libraries
Yeah, I already encountered problems with pypy when I tried to use it with mypy which uses some internal CPython API :D
Yeah, you assumed that Python lists are arrays... Big mistake.
Try using f strings to format stuff when printing. Much more convenient.
28:00
one thing i found in python is that it's actually way better to use the beginning of a list, NOT the end, for stack-like operations.
var = [1,2,3]
var[:0] = [0]
var == [0,1,2,3]
var.pop(0)
var == [1,2,3]
of course using the end of a list for stack operations means you'll be trawling down the entire list each time. Tuple accessing is faster because, due to their immutability, the computer can create a constant-lookup set of memory locations, whereas lists cannot be optimized by their very definition
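One way to check the claim above is to time both approaches on lists of different lengths. The function names are made up for this sketch. Because CPython lists are arrays, inserting or removing at index 0 has to shift every remaining element, while `append` and `pop()` work at the end.

```python
import timeit


def end_ops(stack, rounds=1_000):
    """Push and pop at the end of the list."""
    for _ in range(rounds):
        stack.append(0)
        stack.pop()


def front_ops(stack, rounds=1_000):
    """Push and pop at the front, using the slice insert from the comment."""
    for _ in range(rounds):
        stack[:0] = [0]
        stack.pop(0)


for n in (1_000, 100_000):
    stack = list(range(n))  # both functions leave the list unchanged
    end_t = timeit.timeit(lambda: end_ops(stack), number=1)
    front_t = timeit.timeit(lambda: front_ops(stack), number=1)
    print(f"n={n:>7}: end {end_t:.5f}s, front {front_t:.5f}s")
```

If front operations were the cheaper choice, their time would stay flat as `n` grows; comparing the two rows shows how each approach scales with list length.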
okay 30:00 shocked me. i had no idea even tiny lists were this slow
You might consider to try to rewrite some parts in cython. It's almost the same grammar.
Cython, not CPython; he already uses CPython
In fact, it *is* the same but adds C types in C style or Python compatible style, and the compilation step.
I'm not sure if he'll find it worth it just for the sake of simulation mode but I guess it's an option
If you want to make python programs faster, please use a Jit such as PyPy or Numba, or use libraries with fast C backends such as numpy, or add C types and compilation using Cython.
You might want to watch the video before commenting...
@@welltypedwitch What do you mean? I watched the video... the comments are not any less valid.
@@barendscholtus1786 because he ended up using pypy as well?
ayy another i3 user in the wild
Python is definitely slower than C. But you seem to be one of the many who ignore the rich, or should I say vast ... or even ... gigantic store of packages, all coded in C ...
So if your Python code is very much slower than it would run in pure C, then you know just one thing for sure ... your Python code has an awful lot of potential for optimizations ... ;-)
Focusing on isolated language statements has nothing to do with real-life applications!
My future path is writing MVPs in Python and then rewriting, or rather optimizing, it all in Go ... ;-)
9:45
Python has a type in the collections library that does what you want.
You pass it a lambda or type to default to.
Code:
from collections import defaultdict
d = defaultdict(int)
# d = defaultdict(lambda: 0) # not needed because the default int is 0
d["Op1"]
print(d['Op1'])
d["Op2"] += 1
d["Op2"] += 1
print(d['Op2'])
Output:
0
2
Of course your list implementation is probably better because I would expect that it produces less interference in performance measurements. But it's good to know.
Thanks for the interesting video.
Please remove if elif chains and use dictionary for managing
Never!
@@TsodingDaily That's sacrilegious man :D
push pop was eye opening, god even js is better
Fcuk! I am switching
31:04 wtf hahaha
deque should be used for stacks, also please remove elif's - use list lookups instead
why not just port it to c++?
That is like saying, "Why not rewrite the entire thing into C++?"
@@FaranAiki yes, don't see any problems
@Puncherino Kripperino, wait, is this interpreter or compiled language? I do not follow the series.
@@FaranAiki Porth is a compiled language, but Tsoding also implemented the simulation mode, which is basically interpretation.
@Puncherino Kripperino, that would be very slow. An interpreted language (Porth) interpreted by an interpreter (Python) by which that interpreter is interpreted by another interpreter (CPU) is of course slow.
But, still, to rewrite it into C++ takes a lot of effort.
make porth in porth in porth :)
Hi--lllo
its not that python is slow but python sucks, when you got better languages now like phix and nim, which give you 1 binary file that has no dependencies , python code it once distribution hell forever
The accent is top-notch!
this is not a news xD its obvious
wtf?!
Lol
let's use Pony lang!
You're unemployed because you are opinionated!!
I noticed for a long time that Python is slow like a turtle on crutches, so I program in C ++. Python is really bullshit Lego for kids.
The guy in this video barely knows how to use Python lol. Then he's surprised that it's "slow".
This is the most bullshit take I've ever seen.