Thanks, Jiayi Wu -- at 4:49, for the implementation of the self.distance function, you can refer to this video: ruclips.net/video/uLs-EYUpGAw/видео.html
Hey Emma, I appreciate the time and effort you put into creating such amazing content. Your channel helped me get a DS offer from a top tech company; the A/B testing series is really intuitive. The ML-related topics aren't as complicated as other sources make them, they're easy to understand, and the implementation part is awesome. Looking forward to watching more ML-related videos!
Thanks Emma! This is helpful! Looking forward to the videos on optimizing the naive KNN algorithm and more ML algorithms!
The time complexity is not O(MN), since N (the number of features) is actually a constant. So the complexity is O(M) + O(M log M) = O(M log M). Also, it's unnecessary to save all the distances; only the top k matter, and a fixed-size heap of k elements could be used for those. So the space complexity is constant.
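For illustration, a rough sketch of that fixed-size heap idea in Python; the helper name and the sample distances here are made up for the example:

import heapq

def k_smallest(distances, k):
    # Keep at most k elements in a max-heap (via negation), so extra space stays O(k)
    # rather than O(N); each update is O(log k), giving O(N log k) time overall.
    heap = []
    for d in distances:
        if len(heap) < k:
            heapq.heappush(heap, -d)
        elif d < -heap[0]:  # new distance beats the current k-th smallest
            heapq.heapreplace(heap, -d)
    return sorted(-x for x in heap)

print(k_smallest([4.2, 1.1, 3.5, 0.9, 2.7], 2))  # [0.9, 1.1]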
log(n_samples) is not necessarily larger than n_features: 2**20 is about 1 million and 2**30 is about 1 billion, so log(n_samples) is only around 20-30 even for very large datasets, and we could easily have more than 30 features.
I really appreciate your effort in preparing this content! By far, going over your videos is the most efficient way for me to recap the key concepts for data science interviews. Thanks a lot!
I'm glad you like them! Thanks for taking the time to tell me and best of luck with your interviews.
Helpful
Thanks a lot for your content and effort. Your videos are very helpful for revision. Looking forward to more videos on other algorithms.
Hi Emma, thank you so much for making awesome videos! It helped me a lot!
Incredible video! Awesome explanation
Thank you Emma! Great content as always. It's so hard to find reliable resources for preparing for MLE interviews these days, haha.
Thanks for your comment, Minh. I'm glad to hear you're finding my content helpful! 😊
I think the predict function is only for one-point prediction. I have made some updates so it works on a whole dataset:

def predict(self, data, k):
    predict_output = []
    for point in data:
        # distance from this query point to every training point, paired with the training label
        distance_label = [
            (self.get_distance(point, train_point), train_label)
            for train_point, train_label in zip(self.x, self.y)
        ]
        # take the k nearest neighbors and average their labels
        neighbors = sorted(distance_label)[:k]
        predict_output.append(sum(label for _, label in neighbors) / k)
    return predict_output
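In case it helps to run the snippet above end to end, here is a self-contained sketch that wraps it in a hypothetical KNN class with a simple Euclidean get_distance; the class name, constructor, and sample data are assumptions for illustration, not from the video:

import math

class KNN:
    def __init__(self, x, y):
        self.x = x  # training feature vectors
        self.y = y  # training labels

    def get_distance(self, a, b):
        # Euclidean distance between two equal-length feature vectors
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def predict(self, data, k):
        predict_output = []
        for point in data:
            distance_label = [
                (self.get_distance(point, train_point), train_label)
                for train_point, train_label in zip(self.x, self.y)
            ]
            neighbors = sorted(distance_label)[:k]
            predict_output.append(sum(label for _, label in neighbors) / k)
        return predict_output

knn = KNN(x=[[0, 0], [1, 1], [5, 5], [6, 5]], y=[0, 0, 1, 1])
print(knn.predict([[0.5, 0.5], [5.5, 5.0]], k=2))  # [0.0, 1.0]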
Can you send me the whole code for KNN?
Hi Emma, thank you so much for all your videos, they are all super helpful! Can you please do more ML and Python coding videos in the future?
Yep! More to come, stay tuned!
Amazing content, thank you so much 🤟
Great video as always, Emma! Do you think they ask about A/B testing in new-grad interviews if it's not in the job description?
It depends on the company; if they are in need of A/B testing experts, they might ask during the interview process. Also, don't fully trust the job description. I'd highly recommend asking the recruiter what kinds of questions will be asked so that you can prepare accordingly!
Isn't the space complexity from the distance_label array going to be O(m) + O(n), since we are first calculating the distance for each feature m, then summing it into a single value, then storing that value for each point in the training set n?
Hi Emma, thank you so much for your videos! I learned so much from them. Can you do one on Decision Trees?
Hi CC! Thank you for the feedback, glad you find my content helpful! Sure, we can do a video on Decision Tree, stay tuned! :)
Would it be possible to use a min heap instead of sorting the points by distance (and implement this in linear time instead of NlogN)?
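For reference, a rough sketch of that min-heap idea with Python's heapq, using made-up (distance, label) pairs: building the heap is O(N), and popping the k nearest is O(k log N), so it avoids the full O(N log N) sort.

import heapq

distance_label = [(4.2, 1), (1.1, 0), (3.5, 1), (0.9, 0), (2.7, 1)]  # illustrative (distance, label) pairs
k = 2
heapq.heapify(distance_label)  # builds a min-heap keyed on distance in O(N)
neighbors = [heapq.heappop(distance_label) for _ in range(k)]  # k pops, O(k log N)
print(neighbors)  # [(0.9, 0), (1.1, 0)]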
Nice video! May I get the source code from this video?
The interviewer asked me about this. It was quite embarrassing to give the wrong answer.
Am I missing self.distance somewhere? Thank you!
Sorry about missing the function. You can refer to ruclips.net/video/uLs-EYUpGAw/видео.html for the implementation.
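In the meantime, Python's standard library also has a built-in Euclidean distance (math.dist, available since Python 3.8) that could stand in for self.distance in a pinch:

import math

# math.dist returns the Euclidean distance between two points of equal dimension
print(math.dist([0, 0], [3, 4]))  # 5.0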
Hey Emma, I'm stuck on the definition of the data structures for x and y (ruclips.net/video/P-mM9396Dn8/видео.html); would you mind explaining this in more depth?