This guy is python gold.
+Артемий Артемий пшшш
@Артемий Артемий ৃৃ
The fact that he showed that you can do nearest neighbors without a single loop really shows the power of numpy
Jake is a great speaker. Enjoyed and learned a lot from his talks. Waiting for more
We are using his textbook in our intro to data science class, and his writing is also very informative and accessible at the same time.
Absolutely brilliant, still being appreciated, thank you Jake
awesome video. everyone starting out with numpy should watch this video. makes so much more sense now to me.
Thanks, very informative. The tips make my program run a lot faster.
The strategies for fast looping begin at 7:15
That is cool man, fortranmagic in ipython notebooks!!!!
Amazing video! thank you astronomer.
I was wondering how I could reduce the for loops in my project, and I happened to come across this video 😮 Thanks a lot 😁
Great talk. Learned a lot.
Really great and enlightening talk.
Great talk. Since this talk, have any other methods been developed to make loops faster, other than numpy? Anyone?
Numba and pypy. Also cython
Amazing insight
Great talk.
This is awesome. Thanks.
I like the embedded image of the speaker, but not when it obscures part of the current slide. ;-(
What makes it fast is what makes it slow...Zen.
Joke aside, 10x for the vid! Useful info.
Excellent talk.
Are the slides available anywhere?
What if the data is dynamic, e.g. a few data points are added every second? The process might start with no data at the beginning of the day and end up with millions of rows by the end of the day. This is typical for financial time series. I presume inserting elements or copying the array each time would not be very efficient. Is pandas or any other implementation good enough for such use cases?
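One common way to handle this kind of append-heavy workload is to preallocate a buffer and double its capacity whenever it fills up, so that copies happen only occasionally. Below is a minimal sketch of that idea, assuming rows with a fixed number of numeric columns; the `GrowableBuffer` class and its method names are hypothetical, not part of numpy or pandas.

```python
import numpy as np


class GrowableBuffer:
    """Append-only 2-D buffer that doubles its capacity when full (hypothetical helper)."""

    def __init__(self, n_cols, capacity=1024, dtype=np.float64):
        self._data = np.empty((capacity, n_cols), dtype=dtype)
        self._size = 0  # number of rows actually filled

    def append(self, rows):
        # Accept a single row or a batch of rows.
        rows = np.atleast_2d(np.asarray(rows, dtype=self._data.dtype))
        needed = self._size + len(rows)
        if needed > len(self._data):
            # Grow geometrically so the copy cost is amortized over many appends.
            new_capacity = max(needed, 2 * len(self._data))
            grown = np.empty((new_capacity, self._data.shape[1]), dtype=self._data.dtype)
            grown[:self._size] = self._data[:self._size]
            self._data = grown
        self._data[self._size:needed] = rows
        self._size = needed

    @property
    def values(self):
        # A view of the filled part only; no copy is made.
        return self._data[:self._size]


buf = GrowableBuffer(n_cols=2, capacity=2)
buf.append([1.0, 2.0])
buf.append([[3.0, 4.0], [5.0, 6.0]])
```

Vectorized operations can then run on `buf.values` at any point during the day without copying the data first.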
Correct me if I'm wrong, but couldn't you use array slicing to operate on the array? Editing the sliced array edits the original array as well.
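As far as I know this holds for basic slices, which are views onto the original memory, but not for integer-array ("fancy") indexing, which returns a copy. A small sketch of the difference (the variable names are just for illustration):

```python
import numpy as np

x = np.arange(10)

view = x[2:5]       # basic slice: a view onto x's memory
view[:] = 0         # writes through to x

fancy = x[[7, 8]]   # integer-array indexing: an independent copy
fancy[:] = -1       # x is not affected by this

print(x)
print(np.shares_memory(x, view), np.shares_memory(x, fancy))
```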
10:16 It gives me 5.19 ms in pure Python and 47.4 us with numpy. Is Python getting faster, or are computers?
The problem I have is that I use functions that cannot be trivially simplified to ufuncs. Stuff like detecting a rising edge, for example. How do you speed up those kinds of loops?
Cristi Neagu Check out numba!
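For the rising-edge case specifically, a loop-free version may be possible by comparing the signal with a shifted copy of itself. This is only a sketch, assuming "rising edge" means the sample goes from below a threshold to at or above it; the function name `rising_edges` is made up for the example. Loops that carry state from one step to the next are where Numba tends to be the better fit.

```python
import numpy as np


def rising_edges(signal, threshold):
    """Return indices i where signal[i-1] < threshold <= signal[i]."""
    signal = np.asarray(signal)
    above = signal >= threshold
    # An edge is a sample that is above while its predecessor was not.
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


edges = rising_edges([0, 0, 1, 1, 0, 1, 0], threshold=0.5)
```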
awesome video!
23:45 - KNN with pure numpy
Just a doubt, though: even if a numpy ufunc takes a whole array as its argument, doesn't it still have to loop over the array's elements internally to produce the output? So is it right to say that looping does not happen in numpy?
Thanks. Very useful tips. But the nearest neighbors example shows a fatal flaw in losing loops: the diff matrix you generated transforms your 1000*3 input into a 1000*1000*3 one. This leads to a MemoryError with larger input data. I am sorry, but having a fast loop is still a must.
Can you not work in batches and minimize the number of single operations?
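A rough sketch of what batching could look like here: loop over chunks of rows, so the temporary difference array has shape (chunk_size, n, 3) instead of (n, n, 3), while each chunk is still processed with broadcasting. The function name and the `chunk_size` default are assumptions for illustration, not something from the talk.

```python
import numpy as np


def nearest_neighbors_chunked(X, chunk_size=256):
    """Index of each point's nearest neighbor, computed one chunk of rows at a time."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    nearest = np.empty(n, dtype=np.intp)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Temporary of shape (stop - start, n, n_dims) rather than (n, n, n_dims).
        diff = X[start:stop, np.newaxis, :] - X[np.newaxis, :, :]
        dist_sq = (diff ** 2).sum(axis=-1)
        # Exclude each point's zero distance to itself.
        dist_sq[np.arange(stop - start), np.arange(start, stop)] = np.inf
        nearest[start:stop] = dist_sq.argmin(axis=1)
    return nearest


X = np.random.default_rng(0).random((50, 3))
nearest = nearest_neighbors_chunked(X, chunk_size=7)
```

The Python-level loop now runs about n / chunk_size times instead of n times, so most of the work still happens inside numpy.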
How does X.reshape(1000, 1, 3) - X end up in a result with shape(1000, 1000, 3)? I can't figure it out. Help!!!
Figured it out by myself. haha
Could you explain please?
@@OlumideOni docs.scipy.org/doc/numpy/user/basics.broadcasting.html
@@tabtang thank you
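For anyone else stuck on the same question, here is a short sketch of the broadcasting rule at work. Shapes are aligned from the right, and any axis of length 1 (or a missing leading axis) is stretched to match the other operand.

```python
import numpy as np

X = np.random.default_rng(0).random((1000, 3))
A = X.reshape(1000, 1, 3)

# Shapes are compared from the trailing axis backwards:
#   A: (1000,    1, 3)
#   X: (      1000, 3)   -> treated as (1, 1000, 3)
# Each length-1 axis is stretched, giving (1000, 1000, 3).
diff = A - X

print(diff.shape)
# diff[i, j] holds X[i] - X[j] for every pair of points.
```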
11:45 How many of these tasks can also be done using itertools?
They will be slower than ufuncs.
Itertools is mainly built keeping in mind memory efficiency and not really execution speed.
what about recursion?
26:02 one could also just use D[D==0] = np.inf
That'd accidentally change the distance between two distinct points that happen to occupy the same space and not just between a point and itself.
I think it depends on whether that's acceptable in your model.
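A tiny sketch of the difference, using three made-up points where the first two coincide. Masking every zero hides the genuine zero distance between the two coincident points, whereas overwriting only the diagonal (here with `np.fill_diagonal`) hides just the self-distances.

```python
import numpy as np

# Points 0 and 1 occupy the same position.
X = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
D = ((X[:, np.newaxis, :] - X) ** 2).sum(axis=-1)

D_mask = D.copy()
D_mask[D_mask == 0] = np.inf       # also hides the 0 between points 0 and 1

D_diag = D.copy()
np.fill_diagonal(D_diag, np.inf)   # hides only each point's distance to itself

nearest_mask = D_mask.argmin(axis=1)
nearest_diag = D_diag.argmin(axis=1)
```

With the mask, points 0 and 1 each report point 2 as their nearest neighbor; with the diagonal fill, they report each other.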
A bit embarrassing, but I haven't thought about that... it was too obvious xD
Thank you. I know it's outdated, but it was awesome.
Why is that outdated? Are there any more efficient techniques out there to make Python faster?
Good lecture, but it would have been more interesting if he had compared NumPy to other competing numerical computing software such as Matlab, for example...!
OR....just go back to FORTRAN (or C)
almost everyone who did numpy knows this - seemingly very basic and nothing hacky!
you should link us to one of your talks anna
But he is right anyway.
not everyone