Polars vs. Pandas vs. Tidyverse vs. data.table for Left Join of Data Frames

  • Published: 26 Jan 2025

Comments • 7

  • @ahmedal-attar3478 4 months ago +2

    Probably worth noting: Polars is quicker because it's multi-threaded and uses all the cores on the machine, whereas pandas is single-threaded.

    • @ekbphd3200 4 months ago

      Thank you for pointing that out! I appreciate it.

    • @paulselormey7862 4 months ago +1

      Nice take. A benchmark must go beyond speed, though. How many resources (CPU, memory) are used to achieve the apparently faster speed?

    • @ekbphd3200 2 months ago

      I’m not sure. I’ll have to analyze that next.
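
One rough way to start on the resource question is to record CPU time and peak memory next to wall-clock time. The sketch below is only an illustration and is not taken from the video's script: `measure` and `toy_left_join` are hypothetical helpers, and the toy join is a stand-in for a real pandas or Polars call. It uses only the Python standard library.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn once; return its result plus wall time, CPU time, and peak traced memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU seconds of the whole process, all threads
    try:
        result = fn(*args, **kwargs)
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, {"wall_s": wall, "cpu_s": cpu, "peak_bytes": peak}


def toy_left_join(left, right):
    """Hypothetical stand-in workload: a pure-Python left join via dict lookup."""
    lookup = dict(right)
    return [(key, val, lookup.get(key)) for key, val in left]


# Made-up data: 100,000 left rows, and right rows for even keys only
left = [(i % 1000, i) for i in range(100_000)]
right = [(k, str(k)) for k in range(0, 1000, 2)]

joined, stats = measure(toy_left_join, left, right)
print(stats)
```

If `cpu_s` comes out well above `wall_s`, more than one core was probably busy, which is what you would expect from a multi-threaded library. One caveat: `tracemalloc` only sees memory requested through Python's own allocator, so buffers allocated inside native code (Polars is written in Rust) would likely be missed, and an OS-level measurement would be needed for a fair memory comparison.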

  • @gardnmi 4 months ago

    pandas also has a join() method, which is supposedly faster than merge(). You just have to set the join columns as the index before calling it.

    • @ekbphd3200 4 months ago

      Thanks for the comment. However, I can't get join() to be faster than merge(); in fact, join() is 4x slower than merge() in my code. In the pandas section of my code here:
      github.com/ekbrown/scripting_for_linguists/blob/main/Script_polars_pandas_left_join.py
      when I comment out my merge() line and uncomment the two set_index() lines and the join() line, it runs 4x slower. If you get set_index() + join() to be quicker than merge(), please leave a reply explaining how. Thanks!
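
For anyone who wants to try the comparison themselves, here is a minimal sketch of the two approaches discussed in this thread. It is not the code from the linked script: the data frames and the column names `key`, `x`, and `y` are made up for illustration, and the timings will depend on your data and pandas version. It also times join() on frames that are already indexed, to separate the cost of set_index() from the cost of the join itself.

```python
import timeit

import pandas as pd

# Hypothetical toy data; the linked script uses its own data frames
left = pd.DataFrame({"key": [i % 1000 for i in range(50_000)], "x": range(50_000)})
right = pd.DataFrame(
    {"key": range(0, 1000, 2), "y": [f"v{k}" for k in range(0, 1000, 2)]}
)


def with_merge():
    # Column-based left join
    return left.merge(right, on="key", how="left")


def with_join():
    # join() matches on the index, so "key" is moved into the index first;
    # the set_index() calls are deliberately part of what gets timed
    return left.set_index("key").join(right.set_index("key"), how="left").reset_index()


# Pre-indexed frames, to time the join without the set_index() overhead
left_idx = left.set_index("key")
right_idx = right.set_index("key")


def with_join_preindexed():
    return left_idx.join(right_idx, how="left")


merged = with_merge()
joined = with_join()

# Check that both approaches give the same rows, ignoring row order
same = (
    merged.sort_values("x").reset_index(drop=True)
    .equals(joined.sort_values("x").reset_index(drop=True))
)
print("same result:", same)

for fn in (with_merge, with_join, with_join_preindexed):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

If `with_join_preindexed` is much quicker than `with_join`, most of the time is going into set_index() rather than into the join, which may be one reason the two-step version looks slower than merge().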