@@Nohomosapien the joke was that the whole point was to speed test it, not to actually have "1000000000" in the console. So it makes no sense to write n = 1 billion lol
@@TheBuilder the only available optimisation is just to skip the iterations and add a billion at once. Compiling with an optimisation flag will change nothing, or will make the test meaningless, I think.
You see, Python is actually good since it gives you time to get a glass of water, do something, play a game, graduate and get a diploma, get a job, witness the year 3000, all while the interpreter is doing its thing.
I told one of my professors that python was probably the best language to learn about parallelism and concurrency. He looked like I had just admitted to supporting dog fighting. I then told him that python was so slow and inefficient, you would be able to visibly see the time difference from running on multiple threads. He laughed.
@@croma2068 It does have some niche uses. And I honestly believe it should be taught to children instead of cursive. That being said, it is pretty slow. And when it's not slow, it's because it's using some other language.
Python is honestly the best. Sure, if you're working on a large product or in a competitive setting, you'll need C++, Java, or sometimes C. But languages are just syntax; after a while, using a new language is just deciding on the best language for the task and then a few Google searches. Python has so many uses, not just for learning. It's heavily used in ML, has many useful libraries (graphing, numpy), and can do straight-up magic compared to other languages. Whenever I want to automate or check something, I just open IDLE. Edit: people replying about the "magic" just being simple pseudocode, yes that's exactly right lol. In just a few seconds I can reformat data with nested dict/list comprehensions into a structure that would take a dozen lines in other languages. I know lots of the stuff is written in C/C++ anyway; that's like saying go use assembly because everything ends up like that anyway. My whole point is that python is extremely useful, and obviously there's a time and place.
@@noornasri5753 "straight up magic" aka it can do what other languages do in slightly shorter syntax, but it by no means has an advantage over other languages library-wise. The only reasons python is so heavily used are: 1. It's installed by default on all major linux distros, so it's a prime choice for automation without further setup on fresh boxes. 2. Due to its simple syntax, it's been picked up by data scientists to use for ML, but mind you, ALL the heavy lifting is done in C++, so you could easily take, say, javascript, which is also very easy to use (and has equivalent packages for everything) but is also much faster (using this shitty benchmark, written in node I get performance that's just a few ms slower than c++). Python is not the best, never will be. It's just a very conveniently placed language. Want a language that actually does magic that no other language can? Learn rust.
@@MrMeaty6 Well, @filipburcevski9566 is right because it's going to take you more than one second to name each number... So yeah, it's going to take a few centuries.
I studied C++ and C# in university, but before that I was learning python (through online courses) for an exam. Now it is funny to remember how the teacher said that if we didn't optimise our code, it would calculate for longer than the exam lasts.
When I was at university (ok, it was Thames Poly. It only became a university when I left), if you wanted to add a gigabyte of memory to your computer you'd first have to apply for planning permission to build an extension. Now, 32GB comes in those handy blister packs, sometimes near the till for 'impulse purchases'.
@@herseem I know the feeling. I wrote my first programs in 1978 in high school. I wrote my first professional programs in Fortran in 1984 on a VAX computer shared by about 20 people. It only had about 256 MB of RAM to support all the users.
@Game Plays 1230 Avoid dynamic features in inner loops, cache results, and use aggregate functions like the array functions in numpy, since they are written in C and use vector instructions
This is so true. I had to rewrite an entire program during an exam because the way I wrote it the first time was so slow that I couldn't iterate it the number of times needed before running out of time
That's why ChatGPT crashes so much after we've had a "long" conversation. When we open an old conversation, it crashes the entire browser, does not process new messages and crashes the entire chat. That too on the Android app...
I cut my teeth on C and only came to Python for work purposes fairly recently. I'm often told in code reviews that I'm getting "paren happy" in my conditionals and a few other places. Not so much an issue with semicolons, weirdly.
You might think that the major time difference was because he used different coding languages, but python would’ve been way faster if he just removed a few zeros.
I had never delved into programming before, but this seemed so straightforward that I couldn't resist giving it a try. Using VS Code, I successfully executed it! Thank you; it was an enjoyable experience!
You are lying lmao. You need to set up several things before using C++ and Python in VS Code. You can't just run it out of the box by downloading a code editor app.
Sometimes you don’t need the program to run quickly. You just need it to run. That’s why I love Python. Although it’s certainly not the fastest language, its ease of use is great for beginners, and I can write a program quicker in Python than I can in c++. I don’t think I would be a programmer if Python didn’t exist, quite honestly. And keep in mind I love C++ too just for different reasons.
You said everything right. However, there are people who are sure that there is only 1 language for all tasks in the world, and that is python. I say this because I know a "hacker" who brute-forces passwords in python (brute force to find a hash collision) and wondered why it took so long
Jokes aside, the level of draft code you can make in python is unmatched by any other language. And honestly, for most stuff nowadays python is truly enough.
@@ruynobrega6918 That depends on the application. There are a lot of languages that are MUCH easier to use in particular applications. I have done a lot of scripting programs in Perl and it is much easier than Python for that. And MATLAB is much easier for many engineering applications.
Is there a reason why that's not done by default? What would the assembly code look like? Would it just be an add instruction a billion times? Wouldn't that binary be absurdly large?
@@morgard211 sometimes optimization gets in the way of debugging, so it's an option. Enabling optimizations would probably result in the compiler computing the loop at compile time and giving you a binary that just prints the value
@@morgard211 also optimization is not perfect and sometimes breaks functionality in complex programs, requiring additional time to figure out what the optimizer did and how to tell it not to do that.
@@morgard211 Assembly is also capable of loops bro (jumps/calls). So about 5 lines are enough for this (if you actually loop through it unoptimized). If you compile with -O3, the compiler would recognise the loop and set the variable to 1,000,000,000 instantly instead of adding 1 a billion times
@@chri-k Optimization only breaks code if there is UB in it (assuming that you have a good compiler). If it breaks a complex program, then that is because someone wrote wrong code somewhere in it. But if the faulty code is in a wrong place, then it is extremely hard to figure out how to fix it, maybe harder than it's worth.
I ran it in Common Lisp and Racket (translated by ChatGPT because I still suck at them) on Linux Mint with a 5600X CPU: - Python took 39 seconds. - Common Lisp with SBCL took 0.222 seconds. - I gave up on GNU CLISP after 6 minutes. - Racket took 0.845 seconds. I altered them to print the results to make sure they actually did the task, and they printed one billion. The Racket script is relatively complicated and might not be optimized well. But these results are crazy!
I love these weird languages. Like who uses them. Where did they come from. Why are there so many tutorials for them but literally no one has written a legit program outside of playing around with the language. Why does every university student have an inexplicable urge to learn one and then act superior about it. Like if you told a coworker you wrote a sphere packing algorithm in Rust that can calculate physics for 5 billion spheres in 3 femtoseconds they'd be like 'OK cool but can you just like edit the HTML so this Button glows when you hover over it?' it's like looking up the Wikipedia page for different regional variants of pizza and seeing people put cold grated cheese and sardines on hot pizza sauce in some small rural village in Cambodia. It's like OK. Sure go ahead I won't stop you. The rest of us are going to have the normal pizza but I appreciate the creativity. Maybe I'm missing out who knows though. I just don't have the time or energy to sit down and figure out how to make pizza that way or to go to a really specific restaurant that does this. I'd rather just get better at making normal pizza. There are plenty more good normal pizza classes to choose from than looking for classes for how to make this specific hot sauce pizza. And then they hold up a slice of the cold sardine and cheese pizza on hot sauce and you just nod and smile like 'looks good' before going back to eating your own pizza. It makes you think 'maybe making pizza is an art form and I'm just blind to beauty. Maybe I'm part of the problem of why people don't eat pizza this way. Is this really normal pizza? Is there such a thing as normal pizza? If all humans were to stop existing, would pizza still be normal? Is there some sort of recipe for pizza inherently stored in the mathematical language that makes up the universe? Or was normal pizza just in the right place in the right time?' It doesn't matter anyway because these pizzas aren't going to make themselves. 
And enough people like normal pizzas and I have enough experience with normal pizza that I'm going to keep going with making normal pizza. It doesn't matter that I'm not the best chef in the world because everyone likes pizza. And sure, I could make sardine and hot sauce pizza and be the best sardine and hot sauce pizza maker in the world but ultimately I just want to make pizza that a lot of people are happy to eat. I do wish when I was growing up I just learned how to make burgers instead tbh.
@@vladimirarnost8020 Thanks, that's interesting. Is there ever a danger this type of 'optimisation' happens in the wrong situation?...where it is absolutely not wanted?
@@ChrisM541 Good question. The compiler is usually doing a good job not breaking code by over-optimising it. However, when the goal of such a loop is to wait for a certain amount of time, e.g. in embedded code touching hardware directly, the loop removal might break such code as it would simply run too fast.
Python enjoyers: NOOOOOO you're not doing it right. Rust enjoyers: Let me try my best to show why Rust is faster than C++ than this video does, and why it should be the best! C++ Enjoyers: I wonder which libraries written in C++ they're going to use...
As far as I know, most of the libraries used by python or rust are written in C, not C++. Edit: just to clarify, rust mostly uses libraries written in rust, but there still are a lot of C libraries used in rust, mostly because they are already high performance, well established and well documented.
@@TheBuilder it's not even that, the range pseudo-iterator is implemented in C, not in python. You spend much more time running C code when using range compared to a manual while loop
To be honest, Python was my first language before I went on to Java, C and C++. It's very good to learn with, very high level and general-purpose, but when you dig deeper, it falls short except for certain tasks like machine learning.
Similar story with me, except I started with C, switched to Python and thought it was the best thing ever since I could write 5 lines of code to do practical things....then I started working and realized lacking a strict type system makes Python and JavaScript prone to errors especially on larger projects
Used python to calculate random line vectors for FEM analysis and it took a while. Tried optimizing the code and simplifying equations, but that only helped a little. Maybe it's not good for a lot of math calculations; floats make code slow too.
It would be really nice to include javascript timings. On my machine: C++ is 0.66s, Python is 42.37, and JS (Node) is (amazingly) 0.53s. Presumably V8 is doing some crazy optimization, so I added a Math.random() check in the loop, and it went up to 6.3 seconds. Still very impressive.
We're studying FEM/FEA by implementing it in Python code atm. And while it sits there forever using 26GB RAM and 96% CPU, I think to myself, how fast would it be if written in C++? (Maybe it wouldn't be that bad of a difference since Numpy is written in C though. Idk really...) However, I wouldn't stand a chance of being able to implement the ideas/concepts we are using it for in such a short time if it was written in C++... It would at least double if not triple the time spent on the project is my guess 😅 Of course such an application would be implemented in C++ for an actual program. But each tool has its place. For us it is having an easy language, in order to learn FEA by implementing it in code, the coding part being secondary to the concepts we're coding, at least within the course we're taking. That's where python shines I guess. It minimizes the "barrier" between the scientific/engineering concept and the code implementation, and it's a great language for just that reason. Different tools for different purposes.
"the coding part being secondary to the concepts": code always comes second. It doesn't matter what you're doing, whether it's a store, business rules in DDD, or a game. In our company, we model the business in DDD using C#. (python is too slow and its object orientation is weak, enforced only by convention).
C++ and Fortran are best if you're solving CFD or FEA problems because as far as I know, C++ and Fortran are fastest when handling multidimensional array. OpenFOAM has its entire library written in C++ and is a great open source CFD solver.
Well, python was created organically, so it wasn't planned through, and then people created amazing libraries like numpy. So if someone did the same with c++, it would be as awesome as python. Or, if someone wrote libraries to call compiled languages from python, it would just use them for speed?
Can someone explain this to a non-programmer? Wouldn't the code sent to the CPU be the same? Don't compilers, even if they're run in real time, change the code into assembly instructions for the processor? I would assume that such a simple program would result in the same instructions being seen by the CPU.
I know I am two months late to answer, but I think your question is interesting. Your mistake is to consider Python a "compiled at runtime" language, which is not true. Compiled languages are transformed into CPU instructions during compilation, as you seem to know. Python is an interpreted language: for each line of code, the python program (python.exe or /bin/python) reads the line (reading a string is itself a set of CPU instructions) and updates its state in memory according to the python code. It is as if there is a layer between the CPU and the code; the python code is never translated into instructions. The python program is for python what the CPU is for the C/C++ languages: the executor. However, the python program is itself made of CPU instructions, and running it to execute each code line takes CPU time.

The same applies to hardware: you can have an algorithm implemented with instructions that run on a CPU, but you can also design a chip specialised for that same algorithm, and it will be quite a bit faster than the CPU (if it is well designed) because there is one layer less (no instructions translating the algorithm any more). However, you lose the versatility of a CPU, which can execute different algorithms. It is the same with python: it is a language where programs can be run and debugged quickly because there is no compilation time, but with less efficiency than compiled languages (some tools try to avoid that by compiling python code at runtime, like Numba, and get better performance).
You can get even faster c++ code if you enable optimizations with the -O2 flag. Although it's possible that the compiler optimizes the loop and removes it 😅
There are ways to ensure that the loop is not removed:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

uint32_t n;

int main() {
    clock_t begin = clock();
    volatile int deopt;
    for (n = 0; n != 1000000000; ++n) {
        (void) deopt;
    }
    clock_t end = clock();
    double spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f ", spent);
}

0.350250 with -O3, 0.98 with -O0.
True story, my friend and I went to a coding interview where they treated all languages the same and had a run-time limit. We both got the question, I used python w/ DP, he used C w/ brute force :)
Now process data on each iteration and you'll see how python's performance decreases dramatically. Depending on the case, I've noticed that python is about 500× slower. For example, try calling a method within a custom class.
@@ILubBLOfficial that's OK. However, not everything can be run with those libraries. I'd prefer integrating python with C++ using a wrapper such as Pybind11, SIP, Shiboken, Boost.Python, etc. That approach to optimisation is much more powerful as it brings the best of python and C/C++ together. By the way, Numpy is a decent approach. However, Eigen, which is its C++ counterpart, is much faster and more optimised. Pandas is a huge package; too many things there that aren't always necessary.
@@ajskateboarder realistically speaking, many companies, especially in embedded systems, prototype in python and rewrite everything in C/C++ for production. I still consider that Python and C++ work fine together. They are just tools, not religion.
@@ILubBLOfficial Eigen, which is the C++ equivalent of numpy, is about 20 or 30 times faster than numpy. And also, you can't always avoid Python bottlenecks. However, I don't criticise Python for that. I think it is still a great scripting tool for a range of applications. What I criticise is that there seems to be a new generation of developers emerging, most of them beginners, who frenetically love a specific programming language as if it were a sort of religion. They wrongly think they can do everything in a single programming language, and they don't admit the weaknesses of their favourite tool. They are just tools, not religions. There are also many areas in which Python is a better choice than other tools, but when it comes to optimisation, performance, and concurrency, Python performs very poorly, probably worse than the vast majority of its competitors, and that's why it is recommended to learn more than just python.
A modern compiler such as the GNU C++ compiler you use in the video does many passes and optimisations. There is a possibility the compiler may have changed the code to something other than 1 billion increment instructions, which would make this comparison unfair. However, the point still stands that python will always be much slower for operations like this; it's not the right tool for this job, just as you wouldn't use a screwdriver to punch a nail into a wall, though the screwdriver has uses of its own
I think there are some settings you can tweak for cout speeds. I saw it in an article written by GeeksforGeeks or something; it's about competitive coding
Stupid debates, honestly, and the video is just showing the truth. Python is just slow; everyone should know that, but it's really easy to use and beginner friendly. C++ is just fast; everyone should know that, but it's really hard to use and not beginner friendly at all. All programming languages are created for different purposes:
-Python -> Small 2D games, Websites, AI, High level stuff...
-C++ -> Game engines (2D/3D), Hardware, System, Low level stuff...
So people who say Python is better than C++ or vice versa, you guys are just kids who just discovered programming. --- In Life You Will Not Only Learn One Programming Language, You Should Know At Least Two Or More ---
@@frisoverweij7977 Because I can create programs ten times faster with Python. Its "slow" speed is almost unnoticeable in 99% of cases. In the remaining 1% of cases you can use some optimization methods and get good results.
Python code's 1st execution is also its compiling stage. On the 2nd run, it is way faster. You should either have included C++ compiling time + run time, or compared python's 2nd run
@@catfan5618 And if you think it is compiled. So where does the binarie goes? Where is executable file? Because when you run python script it does not change it's form. It's still editable. Even if it would be compiled to RAM, there is no sense to not get compiled files.
@@sinoichi Compiled does not mean we get a binary. See Java, we compile our Code, but dont get a binary. Python compiles it code into bytecode and stores this into .pyc files.
@@jebbi2570 The great thing about Python is that whenever you truly want/need C's speed, you can just write a shared library providing that functionality in C, then call it from Python. Of course this too has its limits, but it's a nice workaround for when there's one significant bottleneck.
@@TheBuilder My own, though I'm going to use LLVM as a backend because I don't want to lose my sanity. It's gonna support both high-level and low-level features. Also, it's gonna be gradually typed. I can tell you more about it if you want
@@arthur1112132 But it's C/C++, so the compiler gives you the power to choose whether you, the creator of your own code, want to let the compiler do that or not.
I had my "road to Damascus" moment in 1981 when I wrote and ran a similar routine on the original IBM PC... first interpreted BASIC, then compiled BASIC, then C. I was showing my kids. Well, I was blown away. Even though I had written some "pretty good" code (worth about $1,000,000) for my employer, I NEVER wrote another line in BASIC.
My first programming experience was with ZX Basic. The Speccy allowed you to do Assembly (or rather literal machine code) by "poking" memory (i.e. storing literal values in it and then calling the desired memory address to execute it), but sadly my child brain couldn't comprehend it. Basic was slow of course, and I had to scratch my head for a while trying to understand how all the other programs ran so fast. At some point I learned there was a C editor, but I couldn't find the tape anywhere.
C++ has the advantage of using register variables. That reduces variable memory accesses to zero while in the loop. Python must push and pop n on and off the stack.
@@blip666 Compiler Explorer can show the assembly output of compiled programs, and you can see it directly interacting with cpu registers (eax, edx, etc.). Cpu registers are far faster than memory accesses. I don't use python so I don't know how it works internally, but it's physically impossible for a language to optimize with cpu registers without compiling to binary. There are way more reasons why c/cpp/rs are magnitudes faster than python. 1. Yes, it is a compiled language, so any "actions" are just a few instructions to the cpu. 2. Static typing also makes it faster because memory is constant size and there's no need for extra memory to store types. 3. Static typing and other restrictions allow optimizing compilers, esp. llvm, to analyze data flow and other stuff to inline/remove/optimize assembly. 4. Garbage collectors do add overhead to the code, I don't think by much, but languages without gc, especially rust, know exactly what parts of memory to deallocate.
@@Fl4shback it would do that if any optimizations were enabled for the compilation in the video; the author said in another comment that it felt unfair to do since python can't do that, lol
well, I tried Python. On my i7 that's 10 years old, the while loop took 52 sec, but after that I decided to try the for loop: 22 secs only. I'm not a programmer, but if c++ or whatever can run a simple loop faster than the code in another language, does it mean that a complex program would be faster by the same amount?
A complex program would be even faster. But for python there are libraries written in C++, and you can use them in python to speed up computing compared with native python code.
I'm a new CS student with practically no experience. This was a great example to showcase different use cases of the various languages. I did not understand how much of a difference there would be! I think I have it right that python reads line by line, which takes longer?
That’s just one of the reasons. I’d look up the difference between a compiler and an interpreter for more details. It’s also important to understand that it takes a lot more steps to translate Python code into binary that your computer can read than it does for C++. Running Python code has a lot more overhead because of all of the other things that need to happen in the background to make it run which makes it take a lot longer.
The whole point of python is to daisy-chain modules written in C++ that have built-in vectorisation and parallelism. Numpy is faster than a naive C++ implementation of the same code, and it is also significantly faster to implement. Programs most suited for python:
-Embarrassingly parallel
-Involve a lot of repeated calculations
-Involve large-scale data manipulation
-Designed because you want the output
Main reason it's the language of AI and the physical sciences (Fortran if numpy can't be jerry-rigged).
Even when you're calling C/C++ or Fortran code to do the heavy lifting, writing a program meant to do serious computation should not be done in Python. It will still harm your performance very substantially. To show this: say you run some program fully written in Fortran in which 1% of your runtime is setup and internal logic, and 99% is "heavy lifting". Moving your 1% to python blows it up by multiple orders of magnitude (100x slower is not unreasonable), so your runtime DOUBLES. Just because you're too lazy to statically type your variables? Python is a disaster for scientific computing.
@@Maxifichter You forgot that python is already done before your ultrafast C++ code even starts running, cause you had to spend 3 months writing a library that already exists in python.
@@Shower_T Sure, writing code in a proper programming language is slower; let's be generous and say it takes three times (3x) as long to write C++ as Python? That still is nowhere close to the 100x (or higher) performance loss when you run Python code. Plus your C++ is more or less guaranteed to work properly for a while; the Python team can decide on a whim to delete some method you were relying on (this actually happened to me once), so you'll have guaranteed ongoing maintenance in a Python codebase. Python should never be used for any applications that need to be performant, or that are going to exist for longer than a few months (or however long Python versions last)
@@Maxifichter So you purposely installed a version that breaks your code? That's not Python's fault; the solution is simply to not be stupid lmao. If your program is shit, sure, blame the language because you don't know better. I'm sure the three weeks of additional coding time are worth the three hours of runtime you saved, which are worthless btw, because guess what, while your code's running anyway, you could've just done something else.
@@Maxifichter most of the time it's more like 1% setup and 99% heavy lifting *in python*, where switching to pure C++ offers a ~1% speedup, assuming you have implemented the most efficient versions of all the wheels you are reinventing.
If you compiled C++ with (strong) optimizations on, it could actually become constant time. The compiler is allowed to optimize away the loop entirely as its body has no observable effects. Python can't do that because it's not a compiled language to begin with.
Addendum: For some machines, C/C++ simple loops as shown run faster if you add one or more (try it yourself) "nop" instructions into the loop. It may appear bizarre but can be easily explained: it's because of instruction alignment and the CPU's execution pipeline. E.g.:

#include <iostream>

int main() {
    size_t n = 0;
    while (n++ < 1'000'000'000)
        asm(
            "nop;"
            "nop;"
            "nop;"
        );
    std::cout << n << std::endl;
}
@@TheBuilder if the compiler did not eliminate the loop completely, then it would add nops with strong optimization, and I can tell you that from experience
arguably inline assembly might be a cheat in C++'s favor, but yes, it would be much better on newer processors, not only because of the pipeline but also because of instruction fetching from memory reading multiple bytes per clock cycle, which leads to filling the pipeline itself
i knew python was slower, but didn't know it was this slow. about a week ago, we were given an assignment where we had to calculate the value of sin with the Taylor series expansion of sine (sin x = x - x^3/3! + x^5/5! - ...). i figured the larger the value of n (the number of terms), the more accurate the value of sine would be. so i started out with n = 1,000,000. didn't see an output for about 10s so i killed the program, thinking i did something wrong. after i spent a lot of time on it, i decided to just put n = 1 and see what happens. it executed. i was so mad.
@@JacobKinsley exactly. so i did 1300. that is the highest value that executes it seems. the value i got was close enough, and is workable with, but it still bugs me for some reason. im a bit OCD regarding stuff like this.
Nobody just counts, except to understand the overhead. The things I would do at each increment are probably written in C, and likely designed to run on a GPU. I use python to execute functions in libraries. These are likely to be faster implementations than I could write in any language, simply because they are typically open source and have lots of eyes on them. Was that the point, that python is slow at loops?
@@uwirl4338 no, with O2 optimization, the compiler will just set n to 1 billion before the while loop. To truly optimize the code, he can add the register keyword when declaring n; that makes the program run 4 times faster on my pc
@@wrnlb666 Oh really? I was under the impression the register keyword didn't even really do much on modern C/C++ compilers and was more a relic of the past, since modern compilers are so much better at optimizing now.
@@uwirl4338 -O2 will make an enormous difference. The optimizer can and will realize that the end value of n does not depend on anything unknown at the time of compilation, so it can be precalculated. If you look at the generated assembly, -O2 eliminates the loop completely. I tested the code on godbolt with that option, and -O0 takes 3 seconds compared to -O2 at 30 microseconds. That is an optimization factor of 100,000, so an enormous difference. That is a runtime measure in the program around all of the code shown in the video, so any startup time is not included. If, in the optimized variant, I measure the time up to just before the value is printed, the time I get is 0 µs. So the 30 microseconds is the time to print the result. If you add the volatile keyword to n, the loop can't be optimized away and you get 3 s regardless of -O0 or -O2.
@@taragnor compiler optimization only optimizes things that don't make the result look different. In this case, setting n to 1 billion before the while loop doesn't make the result different, because n is not volatile and the compiler knows that it will not be changed outside of the program.
You should brush up on your python syntax, or watch the results of the program(s) running. The python while loop is closed by the outdent of the print statement.
Be careful benchmarking something that simple with a constant number. Some compilers will set n directly to 1,000,000,000. Declare n as volatile, or read n in with a scanf.
How can it be 2.4 seconds to do that??? How does the optimiser know whether it's a pure function, or whether your n isn't changed elsewhere... it is not a constant if you are using n++ or n += 1😅
@@salocin9695 The compiler doesn't do it, because optimizations are off. With -O3, the program would not need 2.4 seconds to run, only a few milliseconds, because it would just print 1,000,000,000 without doing any work with the number.
Tried the same thing on my PC with C++ and C and found something weird. With the exact same C++ code my PC finishes in 1.27 seconds, but if I replace 'size_t' with an 'int' it finishes in 1.1 seconds. Which is kind of to be expected since size_t is bigger than int, but when I tried the same in C there was NO DIFFERENCE in speed between int and size_t??? Meaning C++ is slower with size_t than C for some reason. C did both in 1.1 seconds btw
@@noapoleon_ C and C++ performance is identical. That's why I suspect one of the programs got compiled for 64-bit and the other for 32: on 64-bit, size_t is twice as large and takes more time to process
Can you compile and run the Python code on Linux rather than using the native interpreter approach? I have only used Python for programming microcontrollers (dumping the compiled code to the microcontroller's E² chip for execution on power-up).
Idk whether it's still the case, but PyPy (a Python interpreter designed for speed) used to beat C++ on some regular-expression benchmark (basically C++ had to redo all the work for every iteration, whereas the JIT could hardwire the regular expression and optimize it).
if the comparison was against std::regex, then that's no surprise, because std::regex is hilariously slow. like, "outperformed by literally any other option" slow. CTRE would be a different story, though :) tracing JITs are interesting tech though! there are definitely cases where it would be totally expected to trash an AOT compiler's optimizations.
@@asuasuasu JITs shine when you have data that is constant in practice but variable in theory. Like matching repeatedly against a fixed regex. A precompiled regex lib has to analyse the regex every time; a JIT can compile a short program that represents the regex.
@@cmilkau C++'s std::regex is not a good choice (nor a good way to benchmark C++ vs anything else); it is slow due to its design, and almost anything outperforms it.
@@cmilkau you can harness similar optimizations on compiled languages using PGO with a modern compiler to generate more optimized machine code for your input space.
@@vadiks20032 Probably something gets optimized if the compiler knows how long the loop will run for. Then again, I also heard that all for loops in python are while loops under the hood
In python, while-loops are faster than for-loops. Python's while-loops are exactly like C while-loops, but Python for-loops are more complex and contain more overhead. Python's for-loop has to call "next()" on an object to fetch the next value, but "next()" tells you it doesn't have any more values using an exception which has to be caught and used to break out of the for-loop. It can be implemented using a while-loop and a try-except block.
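The mechanism described above can be sketched directly in Python. This is a simplified model, not CPython's actual implementation (which is in C), but it shows the iter()/next()/StopIteration machinery a for-loop performs:

```python
# Roughly what "for x in iterable: total += x" desugars to.
# Simplified sketch: exhaustion is signalled by a StopIteration
# exception, which we catch to break out of the loop.
def sum_with_desugared_for(iterable):
    total = 0
    it = iter(iterable)        # get an iterator object
    while True:
        try:
            x = next(it)       # ask the iterator for the next value
        except StopIteration:  # no more values left
            break
        total += x
    return total

print(sum_with_desugared_for(range(5)))  # → 10
```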
@@AnimatorArt24 that's a stackoverflow question from 13 years ago, about Python 2's xrange. Python 3 doesn't have xrange anymore, and the language went through many changes in 13 years.
@@TheBuilder Not really, we just need more tests. The implementation of ++n is something like: n += 1; return n; whereas the n++ implementation is: l = n; n += 1; return l;
If I run this raw C with no optimization flags on my Asus potatobook from 2016, I get similar times to this guy. But just using the -O2 flag drops it to less than 10 ms. It's an incredibly weird "flex" anyway, considering Python also takes well under a second to do this if you wrap the thing in a function and use a JIT compiler
As a software engineer for the past 35 years I have found that the different computer languages are a tradeoff of speed of writing the program versus the speed of running the program. Interpreted languages like Python are great for quick and dirty one-time programs but you wouldn't want to write a Python program that mines Bitcoins.
Python is a relevant language for some tasks, but its main claim is that it's a modern interpreted language that's easy to teach. A few years ago it would have been BASIC, compiled BASIC, then Pascal. Java tried to slide in, but was pigeonholed for slow web applications. Things have to move forward, just like C, C++ and so on.
That's an important trade-off. Has anyone here experimented with compiling Python code with Cython or Numba? I wonder how it would perform in this situation. Also, Python 3.11 got a bit faster now. But still much slower than Cpp of course.
Numba needs to JIT compile your loop. It converts the code into C and compiles it on-the-fly. That's why the first execution is slow. If you use the function a lot, it's as fast as C. And it's even smart enough to put your vectorizable code onto the GPU without you noticing, it's crazy good 😂
@@marcotroster8247 Really? I was not expecting that; with numba the time is about 1/3 of a plain C++ loop... I was expecting something similar in performance, so about 6 s...
There are multiple ways you could speed up counting in Python. You can use the multiprocessing module to count in parallel across multiple CPU cores. You can use Numba, a Just-In-Time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. You can also use a library called CuPy that leverages NVIDIA GPUs to accelerate array computations. These methods speed up the count to a few seconds.
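As a rough illustration of the multiprocessing idea above, here is a minimal stdlib-only sketch that splits the count into chunks across worker processes and sums the partial results. The chunk count and the scaled-down total are arbitrary demo choices, not tuned values; the same structure would apply to counting to 1 billion.

```python
# Parallel counting sketch using only the standard library.
from concurrent.futures import ProcessPoolExecutor

def count_to(limit):
    # the same busy loop as in the video, just parameterised
    n = 0
    while n < limit:
        n += 1
    return n

def split(total, workers):
    # divide the work into near-equal chunks; remainder goes to the last one
    chunk = total // workers
    chunks = [chunk] * workers
    chunks[-1] += total - chunk * workers
    return chunks

if __name__ == "__main__":
    # scaled down so the demo finishes quickly; same idea for 1 billion
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(sum(pool.map(count_to, split(1_000_000, 4))))
```

Whether this actually helps depends on chunk size: each worker still runs the slow interpreted loop, so the speedup is at best linear in the number of cores.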
@@Hephasto Nowadays if you use a decent C compiler, it generates like 95% perfect asm code. For "more realistic" = more complicated programs it's even faster to write in C unless you REALLY know how to optimize in assembly, just because there are already 50 years of optimization experience in the compiler
@@donovan6320 You can try it: just do the same loop but compile with the /O2 or /O3 flag. It won't compute values that aren't read. A better way to test how quickly a computer can increment would be to start a timer and use some time limit as the condition to exit the loop. The result should be in the hundreds of millions per second, but I'll test it myself in a sec
In C++ u can use ++n as probably the fastest way to increase a value by 1. Also a for loop can be faster. A Python for loop runs a C loop under the hood, so it would be much faster.
@@zephor6664 It may be faster, since i++ usually cannot be implemented without an intermediate copy, but ++i is done in place and returns a reference to itself.
@@decode9242 Both ++i and i++ produce the same INC instruction for integers. There is no difference. If you take their value, e.g. x = ++i vs x = i++, then there is a difference in _when_ the value of i is copied to x, but i is still simply incremented if it resides in a register, in most cases. Depending on the context, the compiler might also use a LEA instruction to perform the arithmetic if it deems it beneficial. Where the prefix vs. postfix notation _may_ make a difference is with C++ iterators. It's generally preferable to use the prefix form, i.e. ++it. If operator++ is inlined, even the postfix form might result in identical code: if the value of it++ is not used anywhere, the temporary copy is optimised away. It depends on many factors, including the compiler type, version, CPU type, register size, surrounding code, etc.
I actually came across this recently. I was trying out MicroPython for the first time on an RP2040 Raspberry Pi Pico W. The embedded version of "Hello world" is "Blinky": you blink the LED on and off. Wow! I figured out the gpio library, but couldn't be bothered to find the "Delay" function/method/module, so I just counted to 100,000. I was expecting to have to ramp that up much higher, and was thankful I didn't need to think about datatypes in Python... but no. The LED blinked at about 1 Hz; counting to 100,000 took 500 ms! To be honest I haven't switched it back on again. I don't think it's the RP2040's fault, I think it's Python's fault. If I did that in C on the same board the LED would just look a little dim and flickery. If I did it in ASM it would just look like it was on.
"counting up to a number" is a very bad baseline for comparisons. For one, these languages have extremely different performance characteristics. (Micro)Python is naively interpreted. It absolutely doesn't have advanced optimization passes that would make it able to reason about this. A C++ compiler will be able to make sense of your counting, and is likely to trash away your loop, 1. because it'd be able to figure out that you're just counting up to 100000, and 2. because you're probably not even using the value... so it can just optimize it away entirely. Though, if you need anything reasonably fast then MicroPython is not a good idea, for sure. I heavily use Python and I dislike the idea of using Python in embedded programming. Assembly is not a magic bullet. If you know your C/C++ and optimization, then dropping down to assembly is rather unlikely to let you write faster code.
Bear in mind that the raspberry pi foundation's goal is to teach people programming. Python is their standard language for everything they do (hence raspberry "pi"), because it's easy to learn. Having said that, you can program the pico in cpp as well.
Ported an NQUEENS algorithm to both J2ME (SPH-M330) and an arduino (ATMEGA328P). The arduino was ~20% faster. Arduino UNO clones usually clock the ATMEGA328P at 16Mhz. The Samsung has a full 32bit ARM processor at (probably) 192Mhz. It might even utilize the "jazelle" instruction set. Thats honestly really fucking pathetic for Java. Needless to say, I was incredibly pleased to find out that my stupid conway GOL demo ported to C (Qualcomm BREW) was so fast that blinkers are a blur on screen.
It depends upon the quality of the optimizer. A good optimizer could perform the calculation at compile time and simply spit out the calculated answer at run time, even for Python.
The reason is that Python is an interpreted language, meaning in a loop like this each line has to be "compiled" over and over again. C++ is compiled once into system-specific assembly code and then executed.
The Python ecosystem was developed around the idea of calling highly specialized functions to do complex tasks. It doesn't matter that Python is slow. As a developer you are just using some library anyway, and all the functions in that library are probably written in some other language, like C++ or Java
You must have disabled optimization in gcc; obviously the optimized code is faster because it would just straight up print the one billion and skip the loop
@@alexanderd.7818 In this case, he would get the result immediately, even if he 'looped' to a trillion. That's roughly the difference between O(1) and O(n) algorithms.
@@vladimirarnost8020 It depends on whether the dead code elimination is enabled or not. If it is, then the loop will be removed and the program will complete execution immediately. Otherwise, it will loop through all iterations honestly and it will take approximately 0.5s on his hardware.
What about if you compile both projects? Btw 2 different languages for 2 different uses. Love python versatility and the huge number of free libraries basically for every use
My tests with Rust:
rustc: real 1.184s, user 1.183s, sys 0.001s
Cargo (debug): real 1.177s, user 1.176s, sys 0.001s
Cargo (release): real 0.003s, user 0.000s, sys 0.003s
System runs on a Ryzen 7 5800U with 16 GB of RAM and Fedora 36 Workstation.
what, how? i have been waiting for over 7 minutes and its still going on..... here is the code i used for rust:

fn main() {
    let mut number = 0;
    while number < 1_000_000_000 {
        number += 1;
        println!("{}", number);
    }
}

can you send the code you used?
But your Python example wasn't slower in _human time,_ only in _CPU time._ I'd say it was about half as many characters long, so would have taken you half the time to type. Python takes less time to learn, less time to set up, less time to code, less time to debug, and less time to deploy. How long it takes for the code to actually execute is almost completely irrelevant in most use cases. Even in contrived examples where it takes very long indeed, CPU hours are still very cheap compared to man hours. I've been writing hacky Python for more than 20 years, never worrying about optimizing anything, and rarely had a script that took more than a few seconds to run. Most times it will have finished faster than I could have even typed the compile command in C++. But if you happen to have a use case where speed is indeed imperative, then you switch to whatever language is more suitable, of course. Be pragmatic, not dogmatic. A fighter jet is faster than a bicycle, but not when you just want to go get a pizza.
Every data type in Python is a lot more complicated compared to C or C++. For example, every time you interact with a number in Python, the interpreter calls various other functions, while in C or C++ the operation is only a few assembly instructions
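Part of that overhead is visible directly: every Python int is a full heap-allocated object (refcount, type pointer, digit array), so even a tiny number takes far more memory than a 4-byte C int. The exact byte counts vary by CPython build, so this sketch only prints them:

```python
import sys

# A C "int" is typically 4 bytes. A CPython int is a full object,
# so even small numbers are an order of magnitude larger,
# and the size grows with the magnitude of the number.
print(sys.getsizeof(1))        # e.g. 28 bytes on 64-bit CPython
print(sys.getsizeof(10**100))  # much larger still
```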
int foo() {
    constexpr size_t count = 1000000000;
    size_t n = 0;
    for (size_t i = 0; i < count; ++i)
        ++n;
    return n;
}

GCC 12.1 -O3:

foo():
    mov eax, 1000000000
    ret

The code in the video is not comparable with Python, because when compiled at a lower optimization level the compiler isn't working with all the lights on, so to speak, and when optimizing it will fold the constant like above. It's toy code. You need a more complicated expression, and the inputs have to come from another translation unit without link-time code generation enabled.
We can of course make the "n" volatile, but then the compiler is forced to read and write, which will make the code garbage. At least the loop will be run, but again the compiler isn't fully utilised and the test is garbage. Same when using std::atomic, it'll just add a lock prefix.

// volatile size_t n = 0;
foo():
    mov QWORD PTR [rsp-8], 0
    mov edx, 1000000000
.L2:
    mov rax, QWORD PTR [rsp-8]
    add rax, 1
    mov QWORD PTR [rsp-8], rax
    sub rdx, 1
    jne .L2
    mov rax, QWORD PTR [rsp-8]
    ret

// std::atomic n = 0;
foo():
    mov QWORD PTR [rsp-8], 0
    mov eax, 1000000000
.L2:
    lock add QWORD PTR [rsp-8], 1
What we really want is something like this:

.L3:
    add rax, 1
    vpaddq ymm0, ymm0, ymm3
    vpaddq ymm1, ymm1, ymm3
    vpaddq ymm2, ymm2, ymm3
    cmp rax, rdx
    jne .L3
    vpaddq ymm0, ymm0, ymm1
    vpaddq ymm0, ymm0, ymm2
    ret

24 additions per iteration in three accumulators with no dependency on each other, then these are summed after the loop (the reduce-add at the end is omitted). ~8x the speed of the scalar loop. Then put these into their own threads and you get another linear speedup. It's still toy code, though... benchmark something that can be compared; just incrementing something a billion times isn't comparable between languages in a meaningful way unless C++ is crippled, and then the comparison is nonsense. It's nonsense when not crippled as well, hehheh
This is what I have read:

A C++ compiler translates C++ source code into machine code and stores it on disk with the file extension .o (here, hello.o). The linker then links this object file with the standard library files required by the program and thus creates an executable, which is again saved on disk. When we run the C++ program, the executable is loaded from disk into memory and the CPU executes it (one instruction at a time).

For Python, this is what happens: Python doesn't convert its code into machine code, something the hardware can understand. It converts it into something called bytecode. So within Python, compilation happens, just not to a machine language: it is to bytecode (.pyc or .pyo), and this bytecode can't be understood by the CPU. So we need an interpreter, called the Python Virtual Machine, to execute the bytecode.

Python Virtual Machine (PVM): the bytecode then goes to the main part of the process, the Python Virtual Machine. The PVM is the main runtime engine of Python. It is an interpreter that reads and executes the bytecode, instruction by instruction, ultimately driving the machine-level operations the CPU of the system actually understands.
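You can inspect that bytecode yourself with the standard dis module; each source line of the counting loop turns into several instructions that the PVM dispatches one at a time, which is a big part of why the loop is slow:

```python
import dis

def count(limit):
    # the same counting loop as in the video
    n = 0
    while n < limit:
        n += 1
    return n

# Print the bytecode instructions the Python VM interprets one by one.
dis.dis(count)
```

The exact opcodes vary between Python versions, but the point stands: one `n += 1` is several interpreted instructions, versus a single machine instruction in compiled C++.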
You can highly optimize the Python code by setting n = 1 billion before the loop begins.
Yeah and also the c++ one. But maybe he did that for a reason
@L Oh yeah, can you make me understand the "joke"?
@@Nohomosapien the joke was that the whole point was to speed test it, not to actually have "1000000000" in the console. So it makes no sense to write n = 1 billion lol
@@Nohomosapien are you that dumb bro
The C++ compiler should actually do this with enough optimization enabled.
I am really surprised that Python was only about 50 times slower.
I didn't enable any optimizations in the compiler
c=2.432; p=112.447; p/c
>>> 46.23643092105264
lol
@@TheBuilder the only available optimisation is just to skip iterations and add a billion at once. Compiling with an optimisation flag will change nothing, or will make the test meaningless, I think.
there are so many that the compiler does a better job optimizing the code than you would. With Python it's not an option, but any compiled language can be equally fast
You can optimize python by writing a library in c++ that will count to 1 billion for it.
LOL
Yeah yeah😂
You see, Python is actually good since it gives you time to get a glass of water, do something, play a game, graduate and get a diploma, get a job, witness the year 3000, all while the compiler is doing its thing.
Isn't Python interpreted?
@@budgethvick225 It is.
@@budgethvick225 it gets compiled then interpreted
@@nvcbl tru
Python is not a compiler, genius
I told one of my professors that python was probably the best language to learn about parallelism and concurrency. He looked like I just admitted to support dog fighting. I then told him that python was so slow and inefficient, you would be able to visibly see the time difference from running on multiple threads. He laughed.
Every time I stand up for Python people look at me like I just quoted Hitler
@@croma2068 It does have some niche uses. And I honestly believe it should be taught to children instead of cursive. That being said, it is pretty slow. And when it's not slow, it's because it's using some other language.
there are issues with the design of the language but the speed is fine for what i use it for
Python is honestly the best. Sure if you're working on a large product or in a competitive setting, you'll need Cpp, Java, or sometimes C. But languages are just syntax, after a while using a new language is just deciding on the best language for the task then a few google searches. Python has so many uses, not just for learning. It's very heavy in ml, has many useful libraries (graphing, numpy), and can do straight up magic compared to other languages. Whenever I want to automate or check something, I just open idle.
Edit: people replying about the "magic" just being simple pseudocode, yes that's exactly right lol. In just a few seconds I can reformat data with nested dict/list comprehensions into a structure that would take a dozen lines in other languages. I know lots of the stuff are written in C/Cpp anyways, that's like saying go use assembly because everything ends up being like that anyways. My whole point is that python is extremely useful, obviously there's a time and place.
@@noornasri5753 "straight up magic" aka it can do what other languages do in slightly shorter syntax but it by no means has an advantage over other languages library wise.
The only reasons python is so heavily used are:
1. It's installed by default on all major Linux distros, so it's a prime choice for automation without further setup on fresh boxes.
2. Due to its simple syntax, it's been picked up by data scientists to use for ML, but mind you, ALL the heavy lifting is done in C++, so you could easily take say JavaScript, which is also very easy to use (and has equivalent packages for everything) but is also much faster (using this shitty benchmark, written in Node I get performance that's just a few ms slower than C++).
Python is not the best, never will be. It's just a very conveniently placed language.
Want a language that actually does magic that no other language can? Learn Rust.
Tip: You can use underscores in Python to separate numbers. Example: 1_000_000_000
thanks
Wtf is notation!? That is so ugly
Aint spaces just better
It won't work @@living-in-ohio
@@living-in-ohio aren't*
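For the record, the underscore separators in the tip above are a real language feature (PEP 515, Python 3.6+): the parser simply ignores them, whereas spaces inside a numeric literal are a syntax error. A quick check:

```python
# Underscore digit separators are purely visual (PEP 515, Python 3.6+).
billion = 1_000_000_000
print(billion == 10**9)       # True
# You can also put them back in when formatting output:
print(format(billion, "_"))   # 1_000_000_000
```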
I once tried counting to 1 Billion in my head.
It took me a few centuries.
I am glad I finally finished with that.
same tbh
Me too, but I optimized the loop und just thought of 1 billion.
bro how slow were u counting??? 1 bil seconds is 31 years
@@MrMeaty6 Well, @filipburcevski9566 is right because it's going to take you more than one second to name each number...
So yeah, it's going to take a few centuries.
try saying 555.555.555 in one second :)
I studied C++ and C# in university, but before that I was learning Python (through online courses) for an exam. Now it's funny to remember how the teacher said that if we didn't optimise our code, it would calculate for longer than the exam lasts
When I was at university (ok, it was Thames Poly. It only became a university when I left), if you wanted to add a gigabyte of memory to your computer you'd first have to apply for planning permission to build an extension. Now, 32GB comes in those handy blister packs, sometimes near the till for 'impulse purchases'.
@@herseem I know the feeling. I wrote my first programs in 1978 in high school. I wrote my first professional programs in Fortran in 1984 on a VAX computer shared by about 20 people. It only had about 256 MB of RAM to support all the users.
@Game Plays 1230 Avoid dynamic features in inner loops, cache, use aggregate functions like the array functions in numpy since they are written in C and use vector instructions
This is so true. I had to rewrite an entire program during an exam because the way I wrote it the first time was so slow that I couldn't iterate it the number of times needed before running out of time
Just had to make the thumbsup an even 1000.
That's why ChatGPT crashes so much after we've had a "long" conversation. When we open an old conversation, it crashes the entire browser, does not process new messages and crashes the entire chat. That too on the Android app...
the semicolon in python is just perfect, as someone who writes in C++
Muscle memory 💪
Also parentheses in the condition
I cut my teeth on C and only came to Python for work purposes rather recently. I'm often told in code reviews that I'm getting "paren happy" in my conditionals and a few other places. Not so much an issue with semicolons, weirdly.
@@bobbyfeet2240 yeah. when I do python I’m like well why shouldn’t I use parentheses if it still works after I put them in
Funny thing is you can write semicolons in Python if you really want to. It basically just ignores them.
You might think that the major time difference was because he used different coding languages, but python would’ve been way faster if he just removed a few zeros.
exactly, and it would have taken even longer for c++ if u added more zeroes
These are the facts that they don't want you to know.
My question is, why not just set N = 999,999,999? Wouldn’t that speed things up?
How about we just remove the “1” and it’ll be infinitely faster
@@ohneko6193 ohno🤣
Perfect examples between compiler vs interpreter.
Hey give Python a break! I bet you can't count to a billion that quickly!
python does in 2minutes what it would take the average person a life time to do
@@TheBuilder about 30 years if 1 second stands for 1 number.
@@dekippiesip and don't forget numbers get harder and longer to count as you go
@@technolus5742 Something else gets harder and longer
@@ProgThoughts yea-
but he probably dosent have that so leave him alone lmao
This is a great showcase of how much python has improved over the years, I mean this would probably not finish a few years ago.
😂😂😂😂😂😂😂😂
So a few years ago the fastest way to run this program was to first wait a few years.
Talk about looking on the bright side...
@@ronald3836 😂😂😂
@@ronald3836and so I did 😈
I had never delved into programming before, but this seemed so straightforward that I couldn't resist giving it a try. Using VS Code, I successfully executed it! Thank you; it was an enjoyable experience!
Glad you enjoyed it
AI bot 😅
You are lying lmao. You need to set up several things before using C++ and Python in VS Code. You can't just run it out of the box by downloading a code editor app.
@@connectionlost7216 these smartass dudes fishing for love in yt comments..
@@connectionlost7216 Who said he didn't set up anything, dumbass?
Sometimes you don’t need the program to run quickly. You just need it to run. That’s why I love Python. Although it’s certainly not the fastest language, its ease of use is great for beginners, and I can write a program quicker in Python than I can in c++. I don’t think I would be a programmer if Python didn’t exist, quite honestly. And keep in mind I love C++ too just for different reasons.
You said everything right. However, there are people who are sure there is only one language for all tasks in the world, and that is Python. I say this because I know a "hacker" who brute-forces passwords in Python (brute force to find hash collisions) and wondered why it took so long
Bruh tell us another thing everybody already knows
@@niyazleushkin you can use python for everything, as long as the libs are compiled in C lmao
Jokes aside, the level of draft code you can make in Python is unmatched by any other language. And honestly, for most stuff nowadays Python is truly enough.
@@ruynobrega6918 That depends on the application. There are a lot of languages that are MUCH easier to use in particular applications. I have done a lot of scripting programs in Perl and it is much easier than Python for that. And MATLAB is much easier for many engineering applications.
Please turn on -O2/-O3 in C++ and the time will be measured in milliseconds (there will be no loop...)
Is there a reason why that's not done on default? How would the assembly code look like? Would it just be an add instruction bilion times? Wouldn't that binary be absurdly large?
@@morgard211 sometimes optimization gets in the way of debugging, so it's an option. Enabling optimizations would probably result in the compiler computing the loop at compile time and giving you a binary that just prints the value
@@morgard211 also optimization is not perfect and sometimes breaks functionality in complex programs, requiring additional time to figure out what the optimizer did and how to tell it not to do that.
@@morgard211 Assembly is also capable of loops bro (jumps/calls)
So about 5 lines are enough for this (if you actually loop through it unoptimized)
If you compile with -O3, the compiler recognizes the loop and sets the variable to 1000000000 instantly instead of adding 1 a billion times
@@chri-k Optimization only breaks code if there is UB in it (assuming you have a good compiler). If it breaks a complex program, that is because someone wrote wrong code somewhere in it. But if the faulty code is in a wrong place, it is extremely hard to figure out how to fix it, maybe harder than it's worth.
I ran it in Common Lisp and Racket (translated by ChatGPT because I still suck at them) on Linux Mint with a 5600x CPU:
- Python took 39 seconds.
- Common Lisp via SBCL took 0.222 seconds.
- I gave up on GNU Clisp after 6 minutes.
- Racket took 0.845 seconds.
I altered them to print the results to make sure they actually did the task, and they printed one billion. The Racket script is relatively complicated and might not be optimized well. But these results are crazy!
I love these weird languages. Like who uses them. Where did they come from. Why are there so many tutorials for them but literally no one has written a legit program outside of playing around with the language. Why does every university student have an inexplicable urge to learn one and then act superior about it. Like if you told a coworker you wrote a sphere packing algorithm in Rust that can calculate physics for 5 billion spheres in 3 femtoseconds they'd be like 'OK cool but can you just like edit the HTML so this Button glows when you hover over it?' it's like looking up the Wikipedia page for different regional variants of pizza and seeing people put cold grated cheese and sardines on hot pizza sauce in some small rural village in Cambodia. It's like OK. Sure go ahead I won't stop you. The rest of us are going to have the normal pizza but I appreciate the creativity. Maybe I'm missing out who knows though. I just don't have the time or energy to sit down and figure out how to make pizza that way or to go to a really specific restaurant that does this. I'd rather just get better at making normal pizza. There are plenty more good normal pizza classes to choose from than looking for classes for how to make this specific hot sauce pizza. And then they hold up a slice of the cold sardine and cheese pizza on hot sauce and you just nod and smile like 'looks good' before going back to eating your own pizza. It makes you think 'maybe making pizza is an art form and I'm just blind to beauty. Maybe I'm part of the problem of why people don't eat pizza this way. Is this really normal pizza? Is there such a thing as normal pizza? If all humans were to stop existing, would pizza still be normal? Is there some sort of recipe for pizza inherently stored in the mathematical language that makes up the universe? Or was normal pizza just in the right place in the right time?' It doesn't matter anyway because these pizzas aren't going to make themselves. 
And enough people like normal pizzas and I have enough experience with normal pizza that I'm going to keep going with making normal pizza. It doesn't matter that I'm not the best chef in the world because everyone likes pizza. And sure, I could make sardine and hot sauce pizza and be the best sardine and hot sauce pizza maker in the world but ultimately I just want to make pizza that a lot of people are happy to eat. I do wish when I was growing up I just learned how to make burgers instead tbh.
@@JacobKinsleyemotional read 10/10
If you use -O2 or higher, the C++ loop gets folded completely, making it even faster
You mean unrolled?
@@ChrisM541No, he means replaced with a precomputed constant, i.e. 1000000000.
@@vladimirarnost8020 Thanks, that's interesting. Is there ever a danger this type of 'optimisation' happens in the wrong situation?...where it is absolutely not wanted?
@@ChrisM541 Good question. The compiler is usually doing a good job not breaking code by over-optimising it.
However, when the goal of such a loop is to wait for a certain amount of time, e.g. in embedded code touching hardware directly, the loop removal might break such code as it would simply run too fast.
@@vladimirarnost8020 you can specify whether functions or certain code parts get optimized or not, just for that purpose
Python enjoyers: NOOOOOO you're not doing it right.
Rust enjoyers: Let me try my best to show why Rust is faster than C++ than this video does, and why it should be the best!
C++ Enjoyers: I wonder which libraries written in C++ they're going to use...
As far as I know most of the libraries used by python or rust are written in C, not C++
Edit: just to clarify, rust mostly uses libraries written in rust, but there are still a lot of C libraries used in rust, mostly because they are already high performance, well established, and well documented.
@@Longus07 isn't c also c++ cuz if you run c code in c++ it'll still work so technically all c code is c++ but weaker
@@shengalabu8050 NO
@@shengalabu8050 most of Java code will run in c# excluding the libraries and imports
@@Longus07 bruh why you talking bout java?
Can we get a demonstration of how long it takes to write and debug some c++ data processing tasks that python can do in one line of code?
I mean to a beginner like you it looks like a mess of complicated jargon, but to someone who uses the language it's just as simple, just more involved
Isn't For Loop like twice as fast compared to While Loop in Python? C++ would still obviously win but I believe Python's score would look better.
Using the Numba library can make Python comparable to other languages for simple operations.
@@TheBuilder Yes, with JIT. Doesn’t work for everything though.
@@TheBuilder it's not even that, the range psudo iterator is implemented in c, not in python. You spend much more time running c code when using range compared to a manual while loop
Python's for loop is like 5 times slower than while, as it basically uses a while loop but also try/except, which majorly slows it down
@@Eknoma ruclips.net/video/Qgevy75co8c/видео.html
No it is not
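For anyone who wants to test the for-vs-while claim themselves, here's a minimal timeit sketch (absolute numbers depend on your interpreter version and machine, so only the relative comparison is meaningful):

```python
import timeit

def count_while(n):
    # manual counter: every increment and comparison is interpreted bytecode
    i = 0
    while i < n:
        i += 1
    return i

def count_for(n):
    # range() is implemented in C, so part of the loop machinery leaves the interpreter
    c = 0
    for _ in range(n):
        c += 1
    return c

N = 1_000_000
print("while:", timeit.timeit(lambda: count_while(N), number=5))
print("for:  ", timeit.timeit(lambda: count_for(N), number=5))
```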
To be honest, Python was my first language to then go on to Java, C and C++. Its very good to learn with very high level and general programming but when you dig deeper, it falls short except from certain tasks like machine learning.
Similar story with me, except I started with C, switched to Python and thought it was the best thing ever since I could write 5 lines of code to do practical things....then I started working and realized lacking a strict type system makes Python and JavaScript prone to errors especially on larger projects
Machine learning in python is done in C...
@@sepxviii731 good one 😂👍
@Tarik B. try Julia
@Tarik B. You live in the past century. Julia sometimes is even faster than C
Used python to calculate random line vectors for FEM analysis; it took a while. Tried optimizing the code and simplifying the equations, but it only helped a little. Maybe it's not good for a lot of math calculations; floats make code slow too.
i hope you were using numpy not base python
@@TheBuilder I had written the program based on sympy.
It would be really nice to include javascript timings. On my machine: C++ is 0.66s, Python is 42.37, and JS (Node) is (amazingly) 0.53s. Presumably V8 is doing some crazy optimization, so I added a Math.random() check in the loop, and it went up to 6.3 seconds. Still very impressive.
nodejs is now the god of all languages
@@nvcbl nah is not. I hate nodejs
there is something wrong with your c++ compiler or computer then
@@nvcbl lul this pice of shut up mark and Go away
Ricky, i don’t know…
But there is a package for that
import counting_library, speed
speed(counting_library.count(100000000))
and good packages are written in C 😁, like numpy. Python is just a powerful API for C libraries.
Don't you think the C++ compiler removed the effectively empty loop from the assembly code?
yes and Python is interpreted*.
No lol, then the runtime would be 1 millisecond, not 2 seconds.
No. -O is obviously not applied
We're studying FEM/FEA by implementing it in Python code atm. And while it sits there forever using 26GB RAM and 96% CPU I think to myself, how fast would it be if written in C++? (Maybe it wouldn't be that bad of a difference since Numpy is written in C though. Idk really...) However, I wouldn't stand a chance to be able to implement the ideas/concepts we are using it for in such a short time if it was written in C++... It would at least double if not triple the time spent on the project is my guess 😅 Off course such an application would be implemented in C++ for an actual program. But each tool has its place. For us it is having an easy language, in order to learn FEA by implementing it in code, the coding part being secondary to the concepts we're coding at least within the course we're taking. That's where python shines I guess. It minimizes "barrier" between the scientific/engineering concept and the code implentation, and it's a great language for just that reason. Different tools for different purposes.
If "the coding part is secondary to concepts", then MATLAB/Octave beat Python in that regard. Yet it's Python that gets all the f*ing praise smh.
@@MajinSahaMathematica is better
"the coding part being secondary to the concepts"
always code is the second. It doesn't matter what you're doing whether it's a store or business rules in DDD or a game.
In our company, we model the business in DDD using C#. (python is too slow and has poor objectivity - only by convention).
Did you consider using PyPy if applicable?
C++ and Fortran are best if you're solving CFD or FEA problems because as far as I know, C++ and Fortran are fastest when handling multidimensional array. OpenFOAM has its entire library written in C++ and is a great open source CFD solver.
Guys all language are for different things for speed it's better to use c++
generally this
Well, python was created organically so it wasnt planned through, and then people created amazing libraries like numpy.
so if someone would do the same with c++ it would be as awesome as python, OR, if someone would write libraries to use compiled languages in python it would just use them for speed ?
Can someone explain this to a non-programmer? Wouldn’t the code to the CPU’s be the same?
Don’t compilers, even if they’re being done in real time, change the code into assembly instructions for the processor?
I would assume that such a simple program would result in the same instructions being seen by the CPU.
I know I am two months late to answer, but I think your question is interesting.
Your mistake is to consider Python as a « compiled at runtime » language, which is not true. Compiled languages are transformed into CPU instructions during compilation, as you seem to know. Python is an interpreted language: for each line of code, the python program (python.exe or /bin/python) reads the line (reading a string is itself a set of CPU instructions) and updates its state in memory according to the python code.
It is like there is a layer between the CPU and the code; the python code is never translated into CPU instructions. The python program is for Python what the CPU is for the C/C++ language: the executor. However, the python program is itself made of CPU instructions, and running it to execute each code line takes CPU time.
The same applies to CPUs: you can have an algorithm implemented with instructions that run on a CPU, but you can also design a chip specialized for that same algorithm, and it will be quite a bit faster than the CPU (if it is well designed) because there is one layer less (no more instructions translating the algorithm). However, you lose the versatility of a CPU, which can execute different algorithms. It is the same with Python: it is a language where programs can be written and debugged quickly because there is no compilation time, but with less efficiency than compiled languages (some tools try to avoid that by compiling python code at runtime, like Numba, and get better performance).
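One way to see that extra layer for yourself is CPython's dis module, which prints the bytecode the interpreter loops over instead of native CPU instructions (a quick sketch):

```python
import dis

def count(n):
    total = 0
    while total < n:
        total += 1
    return total

# each line of Python becomes several bytecode ops the VM must dispatch one by one
dis.dis(count)
```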
@@tholod Why isn't the explanation higher? Cheers.
You can get even faster c++ code if you enable optimizations with the -O2 flag. Although it's possible that the compiler optimizes the loop and removes it 😅
that would be unfair since Python still dutifully does all those additions
@@TheBuilder True although this also highlights the power of compiled languages that can optimise away unnecessary computation
@@jormaig In python we call it optimizing away the software engineer
There are ways to ensure that loop is not removed.
#include <stdio.h>
#include <stdint.h>
#include <time.h>

uint32_t n;

int main() {
    clock_t begin = clock();
    volatile int deopt = 0;
    for (n = 0; n != 1000000000; ++n) { (void) deopt; }
    clock_t end = clock();
    double spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", spent);
}
0.350250 with -O3, 0.98 with -O0.
@@MTQvODg Hmmm… And how exactly would you suggest to benchmark it in a “good” way?
True story, my friend and I went to a coding interview where they treated all languages the same and had a run-time limit. We both got the question, I used python w/ DP, he used C w/ brute force :)
Dang! Brute force? 😂
And who did better
@@8BitGamerYT1 the Python probably still runs, and the C brute force finished before the guy fully lifted his finger off Enter
im learning python and javascript rn but whats the difference between like python and c++
Now process data on each iteration and you'll see how python performance decreases dramatically. Depending of the case I've noticed that python is a about 500× slower. For example try calling a method within a custom class.
@@ILubBLOfficial that's OK. However, not everything can be run with those libraries. I'd prefer integrating Python with C++ using a wrapper such as pybind11, SIP, Shiboken, boost::python, etc. That approach to optimisation is much more powerful, as it brings the best of Python and C/C++ together. By the way, Numpy is a decent approach. However, Eigen, which is its C++ counterpart, is much faster and more optimised. Pandas is a huge package; too many things there that aren't always necessary.
@@ajskateboarder realistically speaking, many companies, especially in embedded systems, prototype in Python and rewrite everything in C/C++ for production. I still consider that Python and C++ work fine together. They are just tools, not religion.
@@ILubBLOfficial Eigen, which is the C++ equivalent to numpy, is about 20 or 30 times faster than numpy. And you can't always avoid Python bottlenecks. However, I don't criticize Python for that; I think it is still a great scripting tool for a range of applications. What I criticize is that there seems to be a new generation of developers, most of them beginners, who frenetically love a specific programming language as if it were a sort of religion. They wrongly think they can do everything in a single programming language, and they don't admit the weaknesses of their favourite tool. They are just tools, not religions. There are also many areas in which Python is a better choice than other tools, but when it comes to optimisation, performance, and concurrency, Python performs very poorly, probably worse than the vast majority of its competitors, and that's why it is recommended to learn more than just Python.
@@christianm4906 Where did you get your statistics from. From what I see its more like 2-15x slower.
A modern compiler, such as the GNU C++ compiler used in the video, does many passes and optimisations. There is a possibility the compiler changed the code into something other than 1 billion increment instructions, which would make this comparison unfair. However, the point still stands that Python will always be much slower for operations like this; it's not the right tool for this job, just as you wouldn't use a screwdriver to punch a nail into a wall, though the screwdriver has uses of its own.
Optimizations were turned off in the compiler, counting up to a billion actually takes a second or two on most processors
I think there’s some settings you can tweak for cout speeds. I saw it at an article wrote by geeks for geeks or something it’s about competitive coding
Yes, n += 1 is faster than n = n + 1 for some reason
@@mb_entity cout is called every time there is a
iirc, maybe im mixing it up with printf
Iirc it's cin.tie(0); sync_with_stdio(false);
@@mb_entity You are not dumb, you are actually correct.
@@04ZFZ I think this is it
Stupid debates honestly, and the video is just showing the truth.
Python is just slow, everyone should know that, but it's really easy to use and beginner friendly.
C++ is just fast, everyone should know that, but it's really hard to use and not beginner friendly at all.
All programming languages are created for different purposes:
-Python -> Small 2D games, Websites, AI, High level stuff...
-C++ -> Game engines (2D/3D), Hardware, System, Low level stuff...
So people who say Python is better than C++ or vice versa, you guys are just kids who just discovered programming.
--- In Life You Will Not Only Learn One Programming Language, You Should Know At Least Two Or More ---
Well said! Both have their own purpose and everyone who codes should learn both at least a little.
but Python is really better than C++
@@lyflfflyflff Nah not really in terms of games and stuff.
@@lyflfflyflff Why do you think Python is better?
@@frisoverweij7977 Because I can create programs ten times faster with Python. Its "slow" speed is almost unnoticeable in 99% of cases. In the remaining 1% you can use some optimization methods and get good results.
Python code’s 1st execution is also its compiling stage. On the 2nd run, it is way faster. You should have either calculated c++ compiling time + run time or compare python’s 2nd run
Python does not compile anything... it's an interpreted language. The faster second run is just a cache layer, and it's only temporary :)
@@sinoichiWrong, it compiles the code into byte code which then gets interpreted.
@@catfan5618 Then you wouldn't need python installed. 😉 Sorry, python scripts are executed line by line by the python interpreter.
@@catfan5618 And if you think it is compiled, where does the binary go? Where is the executable file? Because when you run a python script it does not change its form; it's still editable. Even if it were compiled to RAM, there would be no sense in not keeping the compiled files.
@@sinoichi Compiled does not mean we get a native binary. See Java: we compile our code but don't get a native binary. Python compiles its code into bytecode and stores it in .pyc files.
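You can watch that compile step happen with the built-in compile(), which turns source text into a code object, the same kind of thing that gets cached in .pyc files (a small sketch):

```python
# compile source text to a bytecode code object, then execute it
code = compile("x = 2 + 3", "<string>", "exec")
print(type(code).__name__)  # 'code'

ns = {}
exec(code, ns)  # the interpreter runs the bytecode, not the source string
print(ns["x"])  # 5
```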
Does that mean python is very slow as compared to c in running big blocks of code ?
I think Python is faster than C when I want to get something practical done
@@TheBuilder python is a great scripting language with enormous support
but it has a purpose and that is not same as that of C
@@TheBuilder But it quickly become slow when the project evolves.
@@jebbi2570 How much will your project evolve anyway? Many programs out there don't need to evolve that much.
@@jebbi2570 The great thing about Python is that whenever you truly want/need C's speed, you can just write a shared library providing that functionality in C, then call it from Python.
Of course this too has its limits, but it's a nice workaround for when there's one significant bottleneck.
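A minimal version of that workaround uses ctypes from the standard library; this sketch assumes a Unix-like system where the C math library can be located, and just calls an existing C function rather than a custom shared library:

```python
import ctypes
import ctypes.util

# locate and load the system C math library (assumes a Unix-like OS)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = (ctypes.c_double,)

print(libm.sqrt(2.0))  # C's sqrt, called directly from Python
```

A custom C library compiled with `gcc -shared -fPIC` can be loaded and called the same way.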
I'm actually planning to work on a Python-like compiled programming language soon. This video remotivated me. Thank you
Golang or building your own?
@@TheBuilder My own, though I'm going to use LLVM as a backend because I don't want to lose my sanity. It's gonna support both high-level and low-level features. Also, it's gonna be gradually typed. I can tell you more about it if you want
It's not something I'm familiar with. but if you want to share more you can contact me through email or join the discord
Nim?
@@RedHair651 Nim's not that Pythonic even though it's somewhat more humane than the average language
Why not replace the endl with "\n"? Doesn't this reduce the flushing operation being done in the assembly? You could probably make the C++ even faster.
You can get the time down to 25.8 seconds using a for loop instead of a while loop in python.
I lied, that was without printing the number. for loops are still faster though.
@@swiftz6098 well, that is absurd.
he can also make the C++ Program about 4 times faster using register int n
@@wrnlb666 the compiler should automatically do that tho
@@arthur1112132 But it's C/C++, so the compiler gives you the power to choose whether you, the creator of your own code, want to let the compiler do that or not.
I had my “road to Damascus” moment in 1981 when I wrote and ran a similar routine on the Original IBM PC. . . first, interpreted BASIC, then Compiled BASIC, then in C. I was showing my kids. Well, I was blown away. Even though I had written some “pretty good” code (worth about $1,000,000) for my employer, I NEVERwrote another line in BASIC.
My first programming experience was with ZX Basic. Speccy allowed to do Assembly (or rather literal machine code) by "poking" memory (e.g. storing literal values in it and then calling a desired memory address to execute), but sadly my child brain couldn't comprehend it. Basic was slow of course and I had to scratch my head trying to understand how all the other programs ran so fast for a while. At some point I learned there was a C editor, but I couldn't find the tape anywhere.
Python is better because it allows you to have office chair races; and when the boss asks you to return to work you can tell him:
"Compiling!"
Underrated comment 😂
If you tell your boss you are compiling python, you'll likely get fired, as even generating bytecode is a sub second task
I was curious to get the times on my machine.
C : 1s740
Python: 47s
Perl: 34s
JS ~2.0 sec
java 0.025s
Python on PyPy: 1.083s
The catch: it doesn't include compiler optimizations on the C++ part. Otherwise it would probably turn into just "cout << 1000000000".
Python has -O too
@@Darqonikand -OO
@@2024.GamerYT Python's -O (and -OO) doesn't do much to be fair - it pretty much just removes assertions. It won't make your code much faster.
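To make the point concrete, -O strips assert statements by setting __debug__ to False, so the effect is only visible if your code leans on assertions (a small sketch; run it once with plain `python` and once with `python -O`):

```python
def safe_div(a, b):
    # this check disappears entirely under `python -O`
    assert b != 0, "b must be nonzero"
    return a / b

print("assertions enabled:", __debug__)  # False when run with -O
print(safe_div(6, 3))
```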
You can use an underscore as a separator for integers in python for better readability btw
but who needs readability in a 2 line file???
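For reference, the separator is purely cosmetic; the parser ignores the underscores (Python 3.6+):

```python
n = 1_000_000_000          # same value as 1000000000
assert n == 10**9
print(f"{n:,}")            # prints 1,000,000,000
```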
C++ has the advantage of using register variables. That reduces variable memory accesses to zero while in the loop. Python must push and pop n on and off the stack.
How do you know this ?
I too would like to see more info, but it's a pretty plausible explanation.
@@blip666 Compiler explorer can show the assembly output of compiled programs and you can see it directly interacting with cpu registers (eax, edx, etc.). Cpu registers are far faster than memory accesses. I don't use python so I don't know how it works internally but it's physically impossible for a language to optimize with cpu registers without compiling to binary.
There are way more reasons why C/C++/Rust are magnitudes faster than Python. 1. Yes, it is a compiled language, so any "actions" are just one instruction to the CPU. 2. Static typing also makes it faster because memory is constant size and there's no need for extra memory to store types. 3. Static typing and other restrictions allow optimizing compilers, esp. LLVM, to analyze data flow and other stuff to inline/remove/optimize assembly. 4. Garbage collectors add overhead to the code, I don't think by much, but languages without GC, especially Rust, know exactly what parts of memory to deallocate.
I would expect the compiler to just optimize the loop away and set the value to 1 billion...
@@Fl4shback it would do that if any optimizations were enabled for the compilation in the video, the author said in another comment that it felt unfair to do since python cant do that, lol
well, I tried Python. On my 10-year-old i7 the while loop took 52 sec, but after that I decided to try the for loop: 22 secs only.
I'm not a programmer, but if C++ or whatever can run a simple loop faster than the code in another language, does it mean a complex program would be faster by the same factor?
A complex program would be even faster. But for Python there are libraries written in C++ that you can use to speed up computing compared to native python code.
Not necessarily.
0:08 wait, you can put apostrophes in numbers in C++ ?? is it also available in C?
C++ yes (since C++14); C didn't get digit separators until C23, I believe
Depends on the editor
0:28 I KNEW IT 😂
Python is actually faster if you ignore the numbers
Im a new CS student with practically no experience. This was a great example to showcase different use cases of the various languages. I did not understand how much of a difference there would be! I think i have it right because python reads line by line which takes longer?
That’s just one of the reasons. I’d look up the difference between a compiler and an interpreter for more details. It’s also important to understand that it takes a lot more steps to translate Python code into binary that your computer can read than it does for C++. Running Python code has a lot more overhead because of all of the other things that need to happen in the background to make it run which makes it take a lot longer.
Okay seriously how can you live with a tab width of EIGHT? And also why do you have different tab sizes for C++ and Python?
mine is 2😂
Mine is -1 😁
@@Valentyn007 🗿
@@perelium-x While I personally don't like 2, I think it's tolerable at the very least lmao
@@Valentyn007 giga based
The whole point of python is to daisy chain modules written in c++ that have built in vectorisation and parallelism. Numpy is faster than a naive c++ implementation of the same code and it is also significantly faster to implement.
Programs most suited for python are:
- Embarrassingly parallel
- Involve a lot of repeated calculations
- Involve large scale data manipulation
- Designed because you want the output
Main reason it's the language of AI and the physical sciences (Fortran if numpy can't be jerry-rigged).
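A tiny illustration of the "daisy chaining" point, assuming NumPy is installed: the million-element loop below happens inside NumPy's compiled C code, not the Python interpreter:

```python
import numpy as np

# one vectorized call replaces a million-iteration Python loop
a = np.arange(1_000_000, dtype=np.int64)
total = a.sum()
print(total)  # 499999500000
```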
Even when you're calling C/C++ or Fortran code to do the heavy lifting, a program meant to do serious computation should not be written in Python. It will still harm your performance very substantially. To show this: say you run some program fully written in Fortran in which 1% of your runtime is setup and internal logic, and 99% is "heavy lifting". Moving your 1% to Python blows it up by multiple orders of magnitude (100x slower is not unreasonable), so your runtime DOUBLES. Just because you're too lazy to statically type your variables? Python is a disaster for scientific computing.
@@Maxifichter You forgot that python is already done before your ultrafast C++ code even starts running, cause you had to spend 3 months writing a library that already exists in python.
@@Shower_T Sure, writing code in a proper programming language is slower, let's be generous and say it takes three times (3x) as long to write C++ than Python? That still is nowhere close to the 100x (or higher) performance loss when you run Python code. Plus your C++ is more or less guaranteed to work properly for a while, the Python team can decide on a whim to delete some method you were relying on (this actually happened to me once), so you'll have guaranteed ongoing maintenance in a Python codebase. Python should never be used for any applications that need to be performant, or are going to exist for longer than a few months (or however long else Python versions last)
@@Maxifichter So you purposely installed a version that breaks your code? Thats not Pythons fault, the solution is simply to not be stupid lmao. If your program is shit, sure blame the language beacuse you dont know better. I'm sure the three weeks additional coding time are worth the three hours runtime you saved which are worthless btw, because guess what while your codes running anyway, you couldve just done something else.
@@Maxifichter most the time it's more like 1% setup and 99% heavy lifting *in python*, where switching to pure c++ offers a ~1% speedup assuming you have implemented the most efficient versions of all the wheels you are reinventing.
If you compiled C++ with (strong) optimizations on, it could actually become constant time. The compiler is allowed to optimize away the loop entirely as its body has no observable effects.
Python can't do that because it's not a compiled language to begin with.
Addendum:
For some machines, C/C++ simple loops as shown run faster if you add one or more (try it yourself) "nop" instructions into the loop. It may appear bizarre but it is easily explained: instruction alignment and the CPU's execution pipeline.
Eg.:

#include <iostream>

int main() {
    size_t n = 0;
    while (n++ < 1'000'000'000)
        asm(
            "nop;"
            "nop;"
            "nop;"
        );
    std::cout << n << '\n';
}
good idea but not something people should be doing by hand
@@TheBuilder if the compiler did not eliminate the loop completely, it would insert nops itself under strong optimization; I can tell you that from experience
arguably inline assembly might be a cheat in C++'s favor, but yes, it would do much better on newer processors, not only because of the pipeline but also because instruction fetching from memory reads multiple bytes per clock cycle, which helps keep the pipeline itself filled
for best optimization you can set n to 1 billion and use a prefix decrement inside the condition
I thought gcc turns i++ into ++i automatically?
i knew python was slower, but didn't know it was this slow.
about a week ago, we were given an assignment where we had to calculate the value of sin from its Taylor series (sin x = x - x^3/3! + x^5/5! - ...). i figured the larger the number of terms n, the more accurate the value of sine would be. so i started out with n = 1,000,000. didn't see an output for about 10s so i killed the program, thinking i did something wrong. after i spent a lot of time on it, i decided to just put n = 1 and see what happens. it executed. I was so mad.
It is a 2-year-old video; now python is just 20x slower without any optimizations
To be fair that's a huge number to start out with
@@JacobKinsley exactly. so i did 1300. that is the highest value that executes it seems. the value i got was close enough, and is workable with, but it still bugs me for some reason. im a bit OCD regarding stuff like this.
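For what it's worth, the series converges after a handful of terms if each term is built from the previous one, which avoids computing huge powers and factorials separately; a sketch (taylor_sin and the term recurrence are just illustrative, not the assignment's code):

```python
import math

def taylor_sin(x, terms=15):
    # term_{k+1} = term_k * (-x^2) / ((2k+2)(2k+3)), starting from term_0 = x
    term = x
    total = 0.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

print(taylor_sin(1.0), math.sin(1.0))  # agree to many decimal places
```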
Nobody just counts- except to understand the overhead. The things I would do at each increment are probably written in C, and likely designed to run on a GPU.
I use python to execute functions in libraries. These are likely to be faster implementations of code than I could write in any language simply because they are typically open source and have lots of eyes on them.
Was that the point- python is slow at loops?
I think if you use -o2 while compiling c++ it will be more optimized
Unlikely to make a huge difference in this simple of a program
@@uwirl4338 no, with O2 optimization, the compiler will just set n to 1 billion before the while loop. To truly optimize the code, he can add a register keyword when declaring n, that will make the program runs 4 times faster on my pc
@@wrnlb666 Oh really? I was under the impression the register keyword didn't even really do much on modern C/C++ compilers and was more a relic of the past, since modern compilers are so much better at optimizing now.
@@uwirl4338 -O2 will make an enormous difference. The optimizer can and will realize that the end value of n does not depend on anything unknown at the time of compilation, so it can be precalculated. If you look at the generated assembly, -O2 eliminates the loop completely. I tested the code on godbolt with that option: -O0 takes 3 seconds compared to -O2 at 30 microseconds. That's an optimization factor of 100,000, so an enormous difference.
That is a runtime measure in the program around all of the code shown in the video, so any startup time is not included. If I let the optimized variant measure the time up to just before the value is printed, the time I get is 0 µs. So the 30 microseconds is the time to print the result.
If you add the volatile keyword to n it can't be optimized away, and you get ~3 s regardless of -O0 or -O2.
@@taragnor compiler optimizations only change things that don't make the observable result different. In this case, setting n to 1 billion before the while loop doesn't change the result, because n is not volatile and the compiler knows it will not be changed outside the program.
C++ code just counts, while the other purposefully counts AND "prints", maliciously increasing the buffer size? Gauging apples vs pineapples? 🤔
Wdym?
You should brush up on your python syntax, or watch the results of the program(s) running.
The python while loop is closed by the outdent of the print statement.
Be careful with making something that simple with a constant number. Some compilers will set n directly to 1,000,000,000. Set n to volatile, or set n with a scanf.
Thanks, finally someone understanding the red flag here; nothing is done with the number, and the compiler gets it
How can it be 2.4 seconds to do that??? How does the optimiser know it's a pure function, or that n is not changed elsewhere? It is not a constant if you are using n++ or n += 1 😅
I don't see the "constant" value being set anywhere... n++ is not a constant and n is initially 0...
Timing function should avoid that.
@@salocin9695 The compiler doesn't get it, because optimizations are off. With -O3, the program would not need 2.4 seconds to run, it would need only a few milliseconds, because it would just print 1,000,000,000 without doing any work for it.
Tried the same thing on my PC with c++ and c
Found something weird.
With the exact same C++ code my PC finishes in 1.27 seconds.
But if I replace the 'size_t' with an 'int' it finishes in 1.1 seconds, which is kind of to be expected since size_t is bigger than int. But when I tried the same in C there was NO DIFFERENCE in speed between int and size_t??? Meaning that C++ is slower with size_t than C for some reason.
C did both in 1.1 seconds btw
Maybe you have compiled the C program for 32 bit?
@@griglog how do you do that??
@@griglog c++ was slower that doesn't make sense
@@noapoleon_ C and C++ performance is identical. That's why I suppose one of the programs got compiled for 64-bit and other for 32, because in 64 size_t type is twice larger and takes more time to process
@@noapoleon_ To change the target platform you should add arguments to the compiler command. And to check what you've got, run "file " on Linux
can you compile and run the python code in Linux rather than using the native interpreter execute approach? I have only used python for programming microcontrollers (dumping the compiled code to the microcontroller's EEPROM chip for execution on power-up).
Idk whether it's still the case but Pypy (a python interpreter designed for speed) used to beat C++ on some regular expression benchmark (basically C++ had to redo all the work for every iteration whereas the JIT could hardwire the regular expression and optimize it).
if the comparison was against std::regex, then that's no surprise, because std::regex is hilariously slow. like, "outperformed by literally any other option" slow.
CTRE would be a different story, though :)
tracing JITs are interesting tech though! there are definitely cases where it would be totally expected to trash an AOT compiler's optimizations.
@@asuasuasu JIT shine when you have data that is constant in practice but variable in theory. Like, matching repeatedly against a fixed regex. A precompiled regex lib has to analyse the regex every time. A JIT can compile a short program that represents the regex.
@@cmilkau but also std::regex is slow and you should never really use it.
@@cmilkau C++'s std::regex is not a good choice (nor a good way to benchmark C++ vs anything else), it is slow due to it's design, almost anything outperforms it.
@@cmilkau you can harness similar optimizations on compiled languages using PGO with a modern compiler to generate more optimized machine code for your input space.
I would point out that while loops are actually slower than for loops in python, but i guess it's fair because you used a while loop in c++ too.
why are they slower wtf
@@vadiks20032 Probably something gets optimized if the compiler knows how long the loop will run for.
Then again, I also heard that all for loops in python are while loops under the hood
In python, while-loops are faster than for-loops.
Python's while-loops are exactly like C while-loops, but Python for-loops are more complex and contain more overhead.
Python's for-loop has to call "next()" on an object to fetch the next value, but "next()" tells you it doesn't have any more values using an exception which has to be caught and used to break out of the for-loop. It can be implemented using a while-loop and a try-except block.
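The try/except mechanics described above can be sketched like this; a rough equivalent of what a for-loop does, not CPython's actual implementation:

```python
items = [10, 20, 30]

# "for x in items: total += x" roughly desugars to:
total = 0
it = iter(items)          # the for-loop first asks the object for an iterator
while True:
    try:
        x = next(it)      # fetch the next value
    except StopIteration: # "no more values" is signalled by an exception...
        break             # ...which is caught and used to break out of the loop
    total += x

print(total)  # 60
```

All of that dispatch happens per iteration, which is part of why the for-loop carries extra overhead compared to a bare while-loop.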
@@monochromeart7311 In python, while loops are slower. stackoverflow /questions/1377429/what-is-faster-in-python-while-or-for-xrange
@@AnimatorArt24 that's a stackoverflow question from 13 years ago, about Python 2's xrange.
Python 3 doesn't have xrange anymore, and the language went through many changes in 13 years.
@TheBuilder
Sorry, I have a question.
Why are you using n++?
After all, ++n is faster.
try compiling both and check the instructions generated, they'll be identical, at least with gcc.
@@TheBuilder
Not really, we just need more tests.
the implementation of ++n is something like this: n += 1; return n;
and the n++ implementation: l = n; n += 1; return l;
@@atcheshire3030 look at the generated assembly here if you don't believe me godbolt.org/z/r6E3Tv45M
@@TheBuilder
Thank you.
I understand.
The compiler simply optimized the code.
For loop in python is faster than while, should use first one
It's faster but not by much
@@TheBuilder it is much faster, around double
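You can check the for-vs-while claim yourself with a timeit micro-benchmark; results vary a lot by Python version and machine, so treat the numbers as indicative only:

```python
import timeit

N = 1_000_000

# Time 5 runs of each loop shape doing the same amount of iteration work.
for_time = timeit.timeit("for _ in range(N): pass", globals={"N": N}, number=5)
while_time = timeit.timeit("i = 0\nwhile i < N: i += 1", globals={"N": N}, number=5)

print(f"for loop:   {for_time:.3f}s")
print(f"while loop: {while_time:.3f}s")
```

On recent CPython the for-loop usually wins, since range iteration happens in C while the while-loop re-executes the comparison and increment as interpreted bytecode each pass.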
@@TheBuilder uhh... "from numba import jit". Wrap the counter in a function with the decorator @jit(looplift=True). The same operation takes well under 1s
2.5s is excessively high, what's your CPU? I get ~250 ms on C# JIT optimized, ~750 ms unoptimized, with a Ryzen 7 laptop CPU
Got 290 ms on c# compiled for release and run on a linux laptop with a Xeon E-2286M CPU.
If I run this raw C with no optimization flags on my ASUS potatobook from 2016, then I get similar times to this guy. But just using the -O2 flag drops it to less than 10ms.
It's an incredibly weird "flex" anyway considering python also takes well under a second to do this if you wrap the thing in a function and use a JIT compiler
As a software engineer for the past 35 years I have found that the different computer languages are a tradeoff of speed of writing the program versus the speed of running the program. Interpreted languages like Python are great for quick and dirty one-time programs but you wouldn't want to write a Python program that mines Bitcoins.
personally i wouldnt want to write any program that mines bitcoins
Python is a relevant language for some tasks, but its main claim is that it's a modern interpreted language - easy to teach.
A few years ago, it would have been BASIC, compiled BASIC, then Pascal. Java tried to slide in, but was pigeonholed for slow web applications.
Things have to move forward, just like C, C++ and so on.
«««slow web applications»»»😁😁😁
Then what is not in your opinion?
That's an important trade-off. Has anyone here experimented with compiling Python code with Cython or Numba? I wonder how it would perform in this situation.
Also, now with Python 3.11 it got a bit faster. But still much slower than Cpp of course.
3.11 version made it 15% slower for me lol.
Just tested on a RPi4:
- Python 3.9 + Numba: 2.257s
- C++: 6.052s
I honestly was not expecting this...
Numba needs to JIT compile your loop. It compiles the code to machine code on the fly (via LLVM). That's why the first execution is slow. If you use the function a lot, it's as fast as C. And it's even smart enough to put your vectorizable code onto the GPU without you noticing, it's crazy good 😂
@@marcotroster8247 Really, I was not expecting that with Numba the time is about 1/3 of a plain C++ loop... I was expecting something similar in performance, so about 6s...
@@agalliazzo Numba is really good at vectorizing loops 😂 I suppose it's doing multiple additions at once with SIMD optimizations or GPU 😎
There are multiple ways you could speed up counting in Python. You can use the multiprocessing module to count in parallel across multiple CPU cores, or use Numba, a Just-In-Time (JIT) compiler that translates a subset of Python and NumPy code into fast machine code. You can also use a library called CuPy that leverages NVIDIA GPUs to accelerate array computations. These methods can bring the count down to a few seconds.
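As a rough sketch of the multiprocessing idea: split the target across worker processes and sum the partial counts. The worker count and demo target below are arbitrary (kept far below a billion so it finishes quickly):

```python
from multiprocessing import Pool

def count_chunk(n):
    # Count to n the slow way, one increment at a time.
    c = 0
    while c < n:
        c += 1
    return c

if __name__ == "__main__":
    target = 8_000_000          # demo-sized; scale up at your own patience
    workers = 4
    with Pool(workers) as pool:
        # Each worker process counts its own share of the total in parallel.
        parts = pool.map(count_chunk, [target // workers] * workers)
    print(sum(parts))           # 8000000
```

Note that plain threads wouldn't help here: CPython's GIL serializes pure-Python loops, which is why separate processes (or a JIT like Numba) are needed for a real speedup.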
How sad Python has to use multi cores to do basic counting
You can use _ in python the same way you use ' in C++: 1_000_000_000
You can also keep the semicolon: print(123); works in python without issue
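Both points check out; underscores as digit separators (Python 3.6+) and trailing semicolons are valid Python:

```python
n = 1_000_000_000;  # underscore separators, plus a (redundant but legal) semicolon
print(n == 10**9)   # True
```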
You really added a semicolon in Python 😂
Python enjoyers be like: oh my gosh you added STRUCTURE and READABILITY to your code?? lol lmao wtf is that what are you?? A loser????
this is called muscle memory 😂
subconscious bias
You can do a for loop instead in Python to optimize it even further
Or don't ask python to do things it isn't meant to do
Why exactly is it so much slower? Is it the way it handles loops, the numbers, or something entirely different?
By the way, these types of loops would normally just be optimized to print out 1 billion from the start.
I’d love to see C and assembly for comparison. But not shitty count, something more realistic
@@Hephasto
Nowadays, if you use a good C compiler, it generates close-to-perfect asm code. For "more realistic" (i.e. more complicated) programs it's even faster to write in C unless you REALLY know how to optimize in assembly, simply because there are already 50 years of optimization experience baked into the compiler
But that would have taken less than a millisecond.
@@donovan6320 You can try it, just do the same loop but compile with the -O2 or -O3 flag. It won't compute values that aren't read.
Now a better way to test how quickly a computer can increment would be to start a timer and use some time limit as the condition to exit the loop. The result should be in the hundreds of millions per second, but I'll test it myself in a sec
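The timer-based variant described above could look something like this in Python (increments_in is a made-up name for illustration; the batch size of 10,000 is an arbitrary choice to keep clock checks out of the measurement):

```python
import time

def increments_in(seconds):
    # Increment as fast as possible until the deadline passes,
    # checking the clock only every 10,000 iterations so the
    # timing calls don't dominate the measurement.
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        for _ in range(10_000):
            n += 1
    return n

print(f"roughly {increments_in(0.5) * 2:,} increments/second")
```

The same idea ports directly to C++ with std::chrono; comparing the two rates gives the increments-per-second figure the comment is after.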
@@puppergump4117 I can too lol, would do it
In C++ u can use ++n as probably the fastest way to increase the value by 1. Also a for loop can be faster.
Python's for loop over range() iterates in C under the hood, so it would be much faster.
Why would ++n be faster? It's still the same assembly instructions right? Isn't it just returning before the other instructions?
@@zephor6664 It may be faster, since i++ usually cannot be implemented without an intermediate copy, but ++i is done in place and returns a reference to itself.
@@decode9242 Both ++i and i++ produce the same INC instruction for integers. There is no difference. If you take their value, e.g. x = ++i; vs x = i++;, then there is a difference in _when_ the value of i is copied to x, but i is still simply incremented if it resides in a register, in most cases. Depending on the context, the compiler might also use a LEA instruction to perform the arithmetic if it deems it beneficial.
Where the prefix vs. postfix notation _may_ make a difference is with C++ iterators. It's generally preferable to use the prefix form, i.e. ++it. If operator++ is inlined, even the postfix form might actually result in identical code: if the value of it++ is not used anywhere, the temporary copy is optimised away.
It depends on many factors, including the compiler type, version, CPU type, register size, surrounding code, etc.
I have just started learning python as my first language, should i change to C or C++? Pls help
no, python is a great first language
I actually came across this recently. I was trying out MicroPython for the first time on an RP2040 Raspberry Pi Pico W.
The embedded version of "Hello world" is "Blinky", you blink the LED on and off. Wow!
I figured out the gpio library, but I couldn't be bothered to find the "Delay" function/method/module, so I just counted to 100,000. I was expecting to have to ramp that up much higher and was thankful I didn't need to think about datatypes in Python... but no. The LED blinked at about 1Hz; counting to 100,000 took 500ms! To be honest I haven't switched it back on again. I don't think it's the RP2040's fault, I think it's Python's fault.
If I did that in C on the same board the LED would just look a little dim and flickery. If I did it in ASM it would just look like it was on.
"counting up to a number" is a very bad baseline for comparisons. For one, these languages have extremely different performance characteristics.
(Micro)Python is naively interpreted. It absolutely doesn't have advanced optimization passes that would make it able to reason about this.
A C++ compiler will be able to make sense of your counting, and is likely to throw your loop away: 1. because it'd be able to figure out that you're just counting up to 100,000, and 2. because you're probably not even using the value... so it can just optimize it away entirely.
Though, if you need anything reasonably fast then MicroPython is not a good idea, for sure. I heavily use Python and I dislike the idea of using Python in embedded programming.
Assembly is not a magic bullet. If you know your C/C++ and optimization, then dropping down to assembly is rather unlikely to let you write faster code.
Bear in mind that the raspberry pi foundation's goal is to teach people programming. Python is their standard language for everything they do (hence raspberry "pi"), because it's easy to learn.
Having said that, you can program the pico in cpp as well.
Ported an NQUEENS algorithm to both J2ME (SPH-M330) and an Arduino (ATMEGA328P).
The Arduino was ~20% faster. Arduino UNO clones usually clock the ATMEGA328P at 16MHz.
The Samsung has a full 32-bit ARM processor at (probably) 192MHz. It might even utilize the "Jazelle" instruction set. That's honestly really fucking pathetic for Java.
Needless to say, I was incredibly pleased to find out that my stupid conway GOL demo ported to C (Qualcomm BREW) was so fast that blinkers are a blur on screen.
I didn't see any characters printed to the console. How can we check that these two examples are equivalent?
they're not, his first code is literally just pasting 1b into the console instead of actually putting every number from 1 to 1b into the console.
@@XSuperModzzX???
It depends upon the quality of the optimizer. A good optimizer could perform the calculation at compile time and simply spit out the calculated answer at run time, even for Python.
you can make it faster by doing cout
The reason is that Python is an interpreted language, meaning in a loop like this each line has to be "compiled" over and over again. C++ is compiled once into system-specific machine code and then executed.
Correct answer, C++ is a middle-level language, Python is a scripting/interpreted language
@@Goose-f7d ohhh
What is the biggest factor in this time difference? I have only used Python for development. Please?
The Python ecosystem was developed around the idea of calling highly specialized functions to do complex tasks. It doesn't matter that Python is slow. As a developer you are just using some library anyway, and all the functions in that library are probably written in some other language, like C++ or Java
@@TheBuilder I see Thank you very much
You must have disabled optimization in gcc; obviously the optimized code is faster because it would just straight up print the one billion and skip the loop
His C++ took 2.4 seconds to complete….clearly the loop is still running.
It doesn't skip anything. A program with the loop removed will complete in 10-200 nanoseconds, depending on the API call convention and operating system.
Optimizations are disabled by default. With optimizations he would get ~3x improvement of performance.
@@alexanderd.7818 In this case, he would get the result immediately, even if he 'looped' to a trillion.
That's roughly the difference between O(1) and O(n) algorithms.
@@vladimirarnost8020 It depends on whether the dead code elimination is enabled or not. If it is, then the loop will be removed and the program will complete execution immediately. Otherwise, it will loop through all iterations honestly and it will take approximately 0.5s on his hardware.
C++ is faster no doubt. Python is still pretty awesome though and doesn't take away from its usefulness.
Good thing this wasn't a test of usefulness
@@ozymandias_times9663 right
What about if you compile both projects? Btw 2 different languages for 2 different uses. Love python versatility and the huge number of free libraries basically for every use
My tests with Rust:
rustc:
real 1.184s
user 1.183s
sys 0.001s
Cargo (debug):
real 1.177s
user 1.176s
sys 0.001s
Cargo (release):
real 0.003s
user 0.000s
sys 0.003s
System runs on a Ryzen 7 5800U with 16 GB of RAM and Fedora 36 Workstation.
if you inspect the binary you'll probably see a syscall to print the computed value
@@TheBuilder Yup, pretty much.
what, how? i have been waiting for over 7 minutes and its still going on.....
here is the code i used for rust
fn main() {
    let mut number = 0;
    while number < 1_000_000_000 {
        number += 1;
        println!("{}", number);
    }
}
can you send the code you used?
@@QmVuamFtaW4 you are printing the number every loop
The compiler might be smart enough to just print the final value
I've just tried this with pypy:
c++ 1.04s
python 40.21s
pypy 1.01s
wow nice! I didn't even know there was something even faster, I will check it out ^^
this is because the back-end of Python is written in C. You can get the Python interpreter to loop in C rather than in Python
@@thecheesybagel8589 exactly. It was strange for me to watch speed competitions when the developer did not use all the capabilities of Python for this
@@felixmarshall8614 he also didn't use the capabilities of C++, like -O2
I want to see the compiled code to see if the compiler is able to optimize the loop out.
Interestingly, this is about the same amount of time that a *_real_* python takes to suffocate its prey...
... and about 777 times faster than it took to confuse a cat
But your Python example wasn't slower in _human time,_ only in _CPU time._ I'd say it was about half as many characters long, so would have taken you half the time to type.
Python takes less time to learn, less time to set up, less time to code, less time to debug, and less time to deploy. How long it takes for the code to actually execute is almost completely irrelevant in most use cases. Even in contrived examples where it takes very long indeed, CPU hours are still very cheap compared to man hours.
I've been writing hacky Python for more than 20 years, never worrying about optimizing anything, and rarely had a script that took more than a few seconds to run. Most times it will have finished faster than I could have even typed the compile command in C++.
But if you happen to have a use case where speed is indeed imperative, then you switch to whatever language is more suitable, of course. Be pragmatic, not dogmatic.
A fighter jet is faster than a bicycle, but not when you just want to go get a pizza.
Respect my man, you explained it wonderfully
Well said!
I’m new to programming. I’ve been learning intro python. Can someone explain why the languages have such different processing speed?
every data type in Python is a lot more complicated compared to C or C++. For example, every time you interact with a number in Python, the program calls various other functions, while in C or C++ the operation is only a few assembly instructions
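One concrete way to see that overhead: a C int is typically 4 bytes, while every Python int is a full heap object carrying a type pointer and reference count. The exact sizes below are typical for 64-bit CPython but not guaranteed:

```python
import sys

# A plain Python int is a boxed object, ~28 bytes on 64-bit CPython,
# compared to 4 bytes for a typical C int.
print(sys.getsizeof(1))

# Python ints are arbitrary precision, so the object grows with magnitude.
print(sys.getsizeof(10**100))
```

Every `n += 1` in the hot loop allocates and manages objects like these, rather than incrementing a machine register.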
I want to see well written assembly vs optimized C/C++ vs Python with numba and/or Cython optimizations.
@engineer gaming 💌
#include <cstddef> // for size_t

int foo()
{
    constexpr size_t count = 1000000000;
    size_t n = 0;
    for (size_t i = 0; i < count; ++i)
        ++n;
    return n;
}
GCC 12.1 -O3
mov eax, 1000000000
ret
The code in the video is not comparable with Python because when compiled with lower optimization level the compiler isn't working with all the lights on, so to speak, and then optimizing it will fold the constant like above. It's toy code. Need more complicated expression and the inputs have to come from another translation unit without link-time code generation enabled.
We can of course make the "n" volatile but then the compiler is forced to read and write, which will make the code garbage. At least the loop will be run but again the compiler isn't fully utilised and the test is garbage. Same when using std::atomic, it'll just add a lock prefix.
// -----------------------------------------------------------------------
// volatile size_t n = 0;
// -----------------------------------------------------------------------
foo():
mov QWORD PTR [rsp-8], 0
mov edx, 1000000000
.L2:
mov rax, QWORD PTR [rsp-8]
add rax, 1
mov QWORD PTR [rsp-8], rax
sub rdx, 1
jne .L2
mov rax, QWORD PTR [rsp-8]
ret
// -----------------------------------------------------------------------
// std::atomic n = 0;
// -----------------------------------------------------------------------
foo():
mov QWORD PTR [rsp-8], 0
mov eax, 1000000000
.L2:
lock add QWORD PTR [rsp-8], 1 //
What we really want is something like this..
.L3:
add rax, 1
vpaddq ymm0, ymm0, ymm3
vpaddq ymm1, ymm1, ymm3
vpaddq ymm2, ymm2, ymm3
cmp rax, rdx
jne .L3
vpaddq ymm0, ymm0, ymm1
vpaddq ymm0, ymm0, ymm2
ret
24 additions per iteration in three accumulators with no dependency on each other; these are summed after the loop... the reduce-add at the end is omitted. ~8x the speed of the scalar loop. Then put these into their own threads and you get another linear speedup.
It's still toy code, tho.. benchmark something that can be compared, just incrementing something a billion times isn't comparable between languages in meaningful way unless C++ is crippled and then the comparison is nonsense. It's nonsense when not crippled as well, hehheh
@@n00blamersegfault lol, you need the sys_exit syscall on Linux and the TerminateProcess function on Windows
I got 46 seconds using a while loop and 43 seconds using a for loop in python3.10.6.
Try 3.11
try C
Truly guys, which one is faster, Python or C++.....????
This is what I have read:
A C++ compiler translates C++ source code into machine language code and stores it on disk with the file extension .o (here, hello.o). The linker then links this object code file with the standard library files required by the program and thus creates an executable file, which is again saved to disk. When we run the C++ program, the executable file is loaded from disk into memory, and then the CPU executes it (one instruction at a time).
For python this what happens:
Internal working of Python
Python doesn’t convert its code into machine code, something that hardware can understand. It converts it into something called byte code. So within Python, compilation happens, but it’s just not in a machine language. It is into byte code (.pyc or .pyo) and this byte code can’t be understood by the CPU. So we need an interpreter called the Python virtual machine to execute the byte codes.
Python Virtual Machine (PVM): the bytecode then goes to the main part of the conversion, the Python Virtual Machine (PVM). The PVM is the main runtime engine of Python. It is an interpreter that reads and executes the bytecode file, instruction by instruction. Here the Python Virtual Machine translates the byte code into machine code, the binary language consisting of 0s and 1s. The machine code is highly optimized for the machine it is running on. This binary language is only understandable by the CPU of the system.
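You can inspect the bytecode the PVM interprets yourself with the standard dis module; every opcode printed below is dispatched on every pass through the loop:

```python
import dis

def count(n):
    # The same kind of counting loop discussed in the thread.
    i = 0
    while i < n:
        i += 1
    return i

# Print the bytecode instructions the PVM executes for each iteration
# (opcode names vary between Python versions).
dis.dis(count)
```

Seeing the handful of instructions behind a single `i += 1` makes the per-iteration interpreter overhead, and thus the gap to compiled C++, much more concrete.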
now do it with JavaScript