Jake VanderPlas - Performance Python: Seven Strategies for Optimizing Your Numerical Code

  • Published: 27 Jan 2025

Comments • 12

  • @jfr9964 5 years ago +14

    I am analysing a chaotic system and needed to compute a large number of trajectories for different initial conditions. Even with multiprocessing and every optimization I could think of, it was taking forever. Then I watched this video and promptly added three lines of code (import numba plus two decorators) and got a speed-up by a factor of 20. Words cannot express how grateful I am for this video.
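
For readers who have not seen the pattern, here is a minimal sketch of what "import numba plus two decorators" might look like. The logistic map, the function names `trajectory` and `many_trajectories`, and the parameter values are illustrative assumptions, not the commenter's actual code. The fallback decorator is there only so the sketch still runs where Numba is not installed, in which case there is no speed-up.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def trajectory(x0, r, n_steps):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    out = np.empty(n_steps)
    x = x0
    for i in range(n_steps):
        out[i] = x
        x = r * x * (1.0 - x)
    return out


@njit
def many_trajectories(x0s, r, n_steps):
    """Compute one trajectory per initial condition in x0s."""
    result = np.empty((x0s.shape[0], n_steps))
    for k in range(x0s.shape[0]):
        result[k, :] = trajectory(x0s[k], r, n_steps)
    return result


# Hypothetical workload: five initial conditions, 1000 steps each.
initial_conditions = np.linspace(0.1, 0.9, 5)
paths = many_trajectories(initial_conditions, 3.9, 1000)
```

The explicit loops are the point here: Numba compiles them to machine code on the first call, so there is no need to contort the algorithm into vectorized form. Any speed-up depends heavily on the workload, so the factor of 20 reported above should not be expected in general.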

  • @thinkmichaelthink 6 years ago +20

    The seven strategies are:
    1. Line profiling (4:37)
    2. NumPy (5:33)
    3. Specialized Data Structures (9:10)
    4. Cython (12:07)
    5. Numba (13:47)
    6. Dask (15:23)
    7. Find an Existing Implementation (18:56)
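
As a small illustration of strategy 2, the sketch below compares a pure-Python loop with an equivalent vectorized NumPy expression. The sum-of-squared-differences task and the function names are illustrative assumptions, not taken from the talk.

```python
import timeit

import numpy as np


def sum_sq_diff_loop(a, b):
    """Pure-Python loop: one interpreter round trip per element."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return total


def sum_sq_diff_numpy(a, b):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    diff = a - b
    return float(np.dot(diff, diff))


rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

loop_time = timeit.timeit(lambda: sum_sq_diff_loop(a, b), number=3)
numpy_time = timeit.timeit(lambda: sum_sq_diff_numpy(a, b), number=3)
print(f"loop: {loop_time:.4f} s, numpy: {numpy_time:.4f} s")
```

The exact timings vary by machine, which is why strategy 1 (profiling) comes first: measure before deciding what to rewrite.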

    • @maciejurbanski6146 5 years ago

      To make this list closer to complete:
      PyPy - a JIT like Numba, but for all your code, at the cost of compatibility
      PyTorch - think NumPy, but GPU-based (arrays -> tensors)

  • @FinallyAFreeUsername 6 years ago +6

    Another great JVDP talk.

  • @yinzhangfred 6 years ago +2

    I am a bit surprised that he didn't mention PyPy, which can be fast not only for numerical computation but for string operations as well. Then again, he would have had to point out that it is not yet very compatible with pandas/NumPy.

    • @magno5157 5 years ago

      Can PyPy compete with the speed of NumPy and pandas? Or is PyPy slower?

    • @yinzhangfred 5 years ago

      @magno5157 PyPy and NumPy are for different things. If your data fits into RAM and is in an array or data frame format, then I'd say go for NumPy or pandas (I personally use pandas, though in my experience it uses about 4x the RAM of the data size). On the other hand, if you write 'vanilla' Python or need to process your data line by line (for whatever reason), PyPy is good. It can speed up even string operations, which Numba cannot.

    • @magno5157 5 years ago

      @yinzhangfred You've just made me curious. Could you give me examples of numerical computations that do not require arrays and data frames?

    • @yinzhangfred 5 years ago

      @magno5157 Ah, you are right. I did not read the title of the talk carefully: it says "optimizing your NUMERICAL code". The examples I have are not really numerical. For example, I use PyPy to turn a large structured log file into CSV. If I hadn't come across PyPy, I would have had to write it in C.
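
A minimal sketch of the kind of line-by-line job described here: plain Python that parses a structured log and writes CSV, using only the standard library. The log format, the regular expression, and the names `LINE_RE`, `FIELDS`, and `log_to_csv` are hypothetical; the commenter's real log format is not given. Code in this style is what PyPy can speed up, since the time goes into interpreter-level string handling rather than array math.

```python
import csv
import io
import re

# Hypothetical log format: "2019-03-01 12:00:05 [INFO] worker-3: job finished"
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) \[(?P<level>\w+)\] "
    r"(?P<source>[^:]+): (?P<message>.*)$"
)
FIELDS = ["date", "time", "level", "source", "message"]


def log_to_csv(lines, out):
    """Write one CSV row per parseable log line; return the number of rows."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not fit the assumed format
        writer.writerow(match.groupdict())
        rows += 1
    return rows


# In-memory stand-ins for the input log and the output file.
sample = io.StringIO(
    "2019-03-01 12:00:05 [INFO] worker-3: job finished\n"
    "garbage line\n"
    "2019-03-01 12:00:07 [ERROR] worker-1: disk full, retrying\n"
)
buffer = io.StringIO()
n_rows = log_to_csv(sample, buffer)
```

For a real file, `sample` and `buffer` would be replaced by objects from `open(...)`, with `newline=""` on the output file as the `csv` module documentation recommends.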

  • @ErickMuzartFonsecadosSantos 6 years ago +9

    I was expecting one of the strategies for numerical optimization in Python to be GPU execution, through PyTorch/CUDA, for example. Any comment on that?

    • @ErickMuzartFonsecadosSantos 6 years ago +8

      Ray Donnelly, executing numerical calculations on the GPU goes beyond machine learning use cases. Take a look at general-purpose computing on GPUs: en.m.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
      An obvious use case would be to offload compute-intensive calculations, such as matrix multiplications, to the GPU, thus improving the performance of Python code. PyTorch provides functionality similar to NumPy and so could be a viable alternative for "optimizing your numerical code".
      I would have liked feedback on this approach, benchmark references, or just experiences porting, say, Numba code to PyTorch.

    • @JJ-xi2vp 6 years ago

      I was also a bit disappointed, although there is another good talk about CuPy, which aims to be a NumPy for the GPU. That sounds great.
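
A minimal sketch of the "NumPy for the GPU" idea: the same array code runs on CuPy when a usable GPU is present and on NumPy otherwise. The alias `xp`, the flag `on_gpu`, and the function `gram_matrix` are illustrative assumptions, and the fallback logic is one common idiom, not something taken from the talk mentioned above.

```python
import numpy as np

try:
    import cupy as xp

    xp.zeros(1)  # touch the device so a missing or unusable GPU fails here
    on_gpu = True
except Exception:
    # Assumption: fall back to NumPy on any import or device error.
    xp = np
    on_gpu = False


def gram_matrix(x):
    """Return x @ x.T; works for both NumPy and CuPy arrays."""
    return x @ x.T


data = xp.arange(12, dtype=xp.float64).reshape(3, 4)
result = gram_matrix(data)

# Copy back to host memory if the computation ran on the GPU.
result_host = xp.asnumpy(result) if on_gpu else result
```

For an array this small the transfer overhead would outweigh any GPU gain; offloading tends to pay off only for large, compute-intensive operations such as the matrix multiplications mentioned earlier in the thread.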