Probably worth noting: Polars is quicker because it's multi-threaded and uses all the cores on the machine, whereas pandas is single-threaded.
Thank you for pointing that out! I appreciate it.
Nice take, but a benchmark must go beyond speed. How many resources (CPU, memory) are used to achieve the apparently faster speed?
I’m not sure. I’ll have to analyze that next.
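A minimal sketch of how CPU and memory could be recorded alongside wall-clock time, using only the standard library. `profile_call` is a hypothetical helper, not something from the linked script. Comparing CPU time with wall time also bears on the multi-threading point above: a CPU/wall ratio well above 1 suggests more than one core was busy. One caveat: `tracemalloc` only sees allocations routed through Python's allocator. NumPy (and therefore pandas) registers its array buffers with it, but Polars allocates from Rust, so its memory will most likely not show up; a process-level measure such as peak RSS would be needed there.

```python
import time
import tracemalloc


def profile_call(func, *args, **kwargs):
    """Hypothetical helper: run func once, report wall time, CPU time, peak traced memory.

    Caveat: tracemalloc only tracks memory allocated through Python's allocator
    (NumPy arrays included); native allocators that bypass it are not counted.
    """
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process
    try:
        result = func(*args, **kwargs)
    finally:
        cpu_seconds = time.process_time() - cpu_start
        wall_seconds = time.perf_counter() - wall_start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {
        "wall_s": wall_seconds,
        "cpu_s": cpu_seconds,
        "peak_mib": peak_bytes / 2**20,
    }


if __name__ == "__main__":
    # Stand-in workload; in the benchmark this would be the pandas or Polars join call.
    _, stats = profile_call(sorted, list(range(100_000, 0, -1)))
    print(stats)
```

For the join benchmark, the same helper could wrap the pandas or Polars call, e.g. `profile_call(left.merge, right, on="key", how="left")`, with `left`, `right`, and `key` standing in for the script's actual tables and column.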
pandas has a join() method. It's supposedly faster than merge(). You just have to set the join columns as the index before calling it.
Thanks for the comment. However, I can't get join() to be faster than merge(); in fact, join() is 4x slower than merge() in my code. In the pandas section of my code (github.com/ekbrown/scripting_for_linguists/blob/main/Script_polars_pandas_left_join.py), when I comment out my merge() line and uncomment the two set_index() lines and the join() line, the script is 4x slower. If you get set_index() + join() to be quicker than merge(), please leave a reply with how. Thanks!
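A minimal sketch for checking this, assuming toy data: the table sizes, the `key`/`x`/`y` column names, and the function names are hypothetical, not taken from the linked script. It times `merge()`, `set_index()` + `join()`, and `join()` on frames indexed ahead of time, so the cost of the `set_index()` calls can be seen separately from the join itself. Whether `join()` comes out ahead likely depends on whether those `set_index()` calls sit inside the timed region and on the data; no particular result is implied here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy tables; the linked script's data and column names differ.
rng = np.random.default_rng(0)
N_LEFT, N_RIGHT = 100_000, 20_000
left = pd.DataFrame({"key": rng.integers(0, N_RIGHT, N_LEFT), "x": rng.random(N_LEFT)})
right = pd.DataFrame({"key": np.arange(N_RIGHT, dtype=np.int64), "y": rng.random(N_RIGHT)})

# Indexes built once, outside any timed region.
left_idx = left.set_index("key")
right_idx = right.set_index("key")


def left_merge():
    # Left join on a column; "key" stays an ordinary column.
    return left.merge(right, on="key", how="left")


def left_join_with_set_index():
    # set_index() is paid on every call, like two set_index() lines plus join().
    return left.set_index("key").join(right.set_index("key"), how="left")


def left_join_preindexed():
    # Only the join itself; "key" ends up as the index, not a column.
    return left_idx.join(right_idx, how="left")


if __name__ == "__main__":
    for fn in (left_merge, left_join_with_set_index, left_join_preindexed):
        # Best of 3 repeats, 5 calls each, reported per call.
        seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
        print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```

The two approaches also return differently shaped frames: `merge()` keeps `key` as a column, while `join()` leaves it as the index, so a `reset_index()` is needed before comparing the results row for row.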