Python for HPC

  • Published: 10 Nov 2024

Comments • 4

  • @Morpho32 4 years ago +1

    Very good video, thank you!
    However, I replicated exactly the same groupby example using Numba, and it is still much slower than the Pandas groupby. I don't understand how he got the njit version to run so quickly. My Pandas groupby took 3 seconds, while the njit version takes 14 seconds, so it is much slower.

    • @Morpho32 4 years ago

      OK, I noticed what I had done differently: I had created a DataFrame and not a Series. I am surprised by how much difference this makes! Basically, creating a DataFrame instead of a Series makes it much slower. I would understand that for the Pandas groupby, but I don't understand why it affects the njit version. Nowhere in the function do we specify whether the input is a DataFrame or a Series, since we pass NumPy arrays to the function and not the Pandas objects. So how come Numba is much slower? How does it know?
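One way a jitted function can tell the two cases apart: Numba compiles a separate specialization for each combination of argument types, and the number of array dimensions is part of an array's type. The sketch below assumes the Series case hands over 1-D arrays and the single-column DataFrame case hands over 2-D arrays of shape (n, 1). The array names are hypothetical, and the Numba import is optional so the snippet still runs without it.

```python
import numpy as np

# Hypothetical stand-ins for what a Series and a one-column DataFrame
# would typically provide once converted to NumPy arrays.
from_series = np.arange(5, dtype=np.float64)   # shape (5,)
from_frame = from_series.reshape(-1, 1)        # shape (5, 1)

print(from_series.ndim, from_frame.ndim)

try:
    import numba
    # typeof reports the type Numba would compile for; 1-D and 2-D
    # arrays are distinct types, so each gets its own compiled version.
    print(numba.typeof(from_series))
    print(numba.typeof(from_frame))
except ImportError:
    pass  # Numba not installed; the ndim comparison above still holds
```

The function body never mentions Pandas, but the dimensionality of the arrays it receives can still change which machine code runs.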

    • @Morpho32 4 years ago

      OK, replying to my own message: Numba can tell whether we started from a DataFrame or a Series because at some point we use np.zeros_like(m). Basically, if m is a DataFrame, the output buffer (output in the function) follows the DataFrame's shape, which slows the function down a lot compared with when the output is built from a Series.
      So if you start from a DataFrame, you can just replace m_numba = np.zeros_like(m) with m_numba = np.zeros(len(m)) and it will be fast.
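The replacement described above can be sketched as follows. np.zeros_like copies the shape of its argument, so a 2-D input gives a 2-D buffer, while np.zeros(len(m)) always gives a 1-D buffer. Here `m` is a hypothetical (3, 1) array standing in for a DataFrame-shaped groupby result, and `group_sum` is an illustrative kernel, not the function from the video. The no-op `njit` fallback is only there so the snippet runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):  # no-op fallback so the sketch runs without Numba
        return func

@njit
def group_sum(keys, values, out):
    # Add each value into the output slot selected by its integer key.
    for i in range(keys.shape[0]):
        out[keys[i]] += values[i]
    return out

keys = np.array([0, 1, 0, 2, 1])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

m = np.zeros((3, 1))           # hypothetical DataFrame-shaped result, 3 groups
buf_like = np.zeros_like(m)    # inherits the 2-D shape (3, 1)
buf_flat = np.zeros(len(m))    # always 1-D, shape (3,)

result = group_sum(keys, values, buf_flat)
print(buf_like.shape, buf_flat.shape, result)
```

Passing the 1-D buffer keeps every `out[...] += ...` a scalar update, which is presumably why the np.zeros(len(m)) version is the fast one.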