Got an offer from FB, this question was the technical screen! Thanks again for the great content
Hey Nathan, congrats on FB, it's so awesome your hard work paid off!!!!
Thank you so much for the support, I really appreciate it! 😊
Hey! Congrats on your offer. Would you mind if I ask you some questions personally through other platforms? How can I contact you?
Hi Nathan, for what position was this asked? Looking for ML position questions, if anyone has any experience please share
I did not know we can heapify a list like that. Thanks man!
Hey, I love your videos but this is not the most optimal, or even the 2nd most optimal solution.
You can improve this solution by having a min heap of size k. Then, go through the list of points and add to the heap. If the size of the heap exceeds k, pop from the heap. By the end, your heap will contain the k closest points. This is O(nlogk) time (which is similar to your solution) but only O(k) space, whereas your solution is O(n) space.
There is also an O(n) solution based on quick sort called quick select.
This may not work for cases where the closest point to the origin is at an index>k.
Wait, if you have a min-heap and pop from it whenever the size exceeds k, you will just end up popping the closest points, because you popped the smallest distance (a min heap has the min at the top). Instead, use a max heap by adding (-distance, x, y) to the heap; then you will be left with the k smallest distances (closest points). But otherwise, you are correct.
We can optimize the solution by pushing each value with heappush(res, [d, x, y]) as we compute the distance, rather than going through the res list one more time. That's the way I thought of it at first, but it is not possible, b/c the problem statement needs only the closest. So we need to find all of the closest and return k of them.
That's correct. Keeping a heap of size k can bring the time complexity to O(nlogk). But I think we should make a MAX HEAP of size k rather than a min-heap. Then, when an element 'x' comes in, we just compare whether 'x' is smaller than the largest element in the max heap (the top one). If x is smaller, we can drop the previous top element and add 'x' to our max heap.
OP certainly meant max heap, not min heap. This uses less memory, but in my case actually takes longer (I guess because of the extra remove operations while adding). It depends on the input. Time complexity is the same.
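The size-k heap discussed in this thread can be sketched in Python. This is a minimal sketch, assuming squared distances as the key (the function name `k_closest` is mine): heapq is a min-heap, so distances are negated to get max-heap behavior, keeping the farthest of the current k candidates at the root.

```python
import heapq

def k_closest(points, k):
    # A min-heap on the NEGATED squared distance acts as a max-heap:
    # the root is the farthest of the k candidates kept so far.
    heap = []
    for x, y in points:
        d = x * x + y * y
        heapq.heappush(heap, (-d, x, y))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the farthest candidate
    return [[x, y] for _, x, y in heap]
```

This keeps memory at O(k) and time at O(n log k), matching the complexities claimed in this thread.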
I really like this problem, your explanation helped me really understand it. As a js developer I would like to know how to implement a heap during the interview, otherwise I will have to use a sorted array
Leetcode lets you use priority queues(same as min/max heap) from `datastructures-js/priority-queue`. Hopefully in an interview environment you can use something like this package or just explain that the sorted array will act as a heap.
I got this in an Amazon interview. I just sorted the array by distance and selected the first k elements. (Not the best solution, but it passed all the test cases.)
Was it an online assessment? Onsite, the interviewer will ask you to optimise it.
@@appcolab it was online assessment but I solved the question again on leetcode using the same method and it was pretty efficient (faster than 93%)
Awesome, did you go on to clear the onsite interviews?
@@appcolab I am still in the interview process I just did my phone interview and I am awaiting the results
All the best!!
It's me again. Thank you NeetCode.
Just wondering, will we be allowed to use this kind of built-in function like heapify in an interview? BTW I have been watching your videos for a while, they are really really really helpful!
I'm sure you would; it wouldn't be reasonable to expect the candidate to write the heap functions as part of another algorithm in a 20-minute interview!
You can generally use builtins in an interview. If you can use the built-in dictionary, why not the built-in heap?
@@MrRichiban Well there are non-heap based approaches like just sorting the list like he said in the video. Even still, in most cases you'd be able to use libraries like heapq in an interview - if you for some reason can't then just sort the list.
I just realized you sound like Daily Dose of Internet
OUR daily dose of internet
shit you're right
Can you explain how heapify is O(n)? To build a binary heap, don't you need to call heap reorder n times making it O(nlog(n))?
Each insert of an element into a heap is log(N), and there are N elements to insert, so the maximum upper bound for the time complexity is O(NlogN). However, after doing the math, the approximate time complexity is O(N) instead. The derivation involves some math; all you need to know is that building a heap takes O(N) time. That's enough.
The first part is O(NlogN): looping through the points array and adding each point to the heap. A better solution would be using a max heap, which improves it to O(NlogK).
The first part is actually O(N) because at that point in the code, minHeap wasn't turned into a heap yet. It was still a list, so appending to the list was an O(1) operation
@@megharamanathan5997 So it takes O(1) * n to append. Then heapify takes O(nlogn).
heappop k times takes O(klogn). Is that correct?
@@megharamanathan5997 But he's still sorting the array to get the answer. Sorting takes nlogn
@@studyaccount794 I would not say so. If you already have all your elements, you can turn that list into a heap in linear time. Second, he isn't popping N elements off the heap, only K. That means the efficiency would still be O(N + Klog(N)).
@@david_trey5523 How could you convert a list to min heap in O(n)?
such a bizarre problem. they might as well just give you an array of ints and ask you to heap sort it
Great explanation, thank you
i dont know how you come up with this kind of solutions, wow
how does heapify know what its key is?
i haven't checked, but i imagine it goes left to right in the same way sorting does if you pass in an iterable of tuples. Compare index 0's and if they are the same, then only compare index 1's, and so on.
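That guess matches how heapq actually behaves: it has no key parameter, so the ordering falls out of Python's lexicographic tuple comparison. A tiny demonstration:

```python
import heapq

# heapq has no key argument; it relies on lexicographic tuple
# comparison: compare index 0 first, then index 1 on ties, and so on.
heap = [(10, 'far'), (2, 'near'), (2, 'also-near')]
heapq.heapify(heap)
smallest = heapq.heappop(heap)  # both 2s tie, so 'also-near' < 'near' decides
```

This is also why people push (distance, x, y) tuples: the distance in slot 0 dominates the comparison.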
Such a clear and efficient explanation
If we are using heaps, we heapify on the distance values, which takes O(n) time, and the k pop operations need klog(n) time, i.e. the total time is n + klog(n).
But if we use a sorted array, it takes O(nlogn) time, and the k pop operations take constant time, i.e. the total time is just nlogn.
Is nlogn really greater than n + klogn?
If we replace the heap-append line with heappush and remove the heapify completely, the time complexity for that for loop will be O(nlogn), i.e. n heappushes is nlogn. Right?
So by this logic it's worse than appending and then heapifying, right? Kinda counter-intuitive!
heap = []
# for loop: n times log n
for p in points:
    x, y = p
    d = x ** 2 + y ** 2
    heappush(heap, [d, x, y])
res = []
# k times log n
while k:
    d, x, y = heappop(heap)
    res.append([x, y])
    k -= 1
return res
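The two constructions in this thread can be compared directly; a small sketch with squared distances showing both heaps pop in the same order, even though building with heapify is O(n) while n pushes cost O(nlogn):

```python
import heapq

points = [[3, 3], [5, -1], [-2, 4]]
k = 2

# O(n) build: append everything, then heapify once
heap_a = [[x * x + y * y, x, y] for x, y in points]
heapq.heapify(heap_a)

# O(n log n) build: n individual pushes
heap_b = []
for x, y in points:
    heapq.heappush(heap_b, [x * x + y * y, x, y])

# Both are valid heaps, so the k pops agree
res_a = [heapq.heappop(heap_a)[1:] for _ in range(k)]
res_b = [heapq.heappop(heap_b)[1:] for _ in range(k)]
```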
Quick select with random pivots, O(N) average time. I chose quick select here because all the data was present (no stream) and we are not performing any transformative operations on the actual inputs (as in Last Stone Weight).
"""
We have points Xi, Yi, and an integer k.
We want the k closest points to the origin.
We could use a heap to solve this problem in O(NlogN) (heapify plus popping...)
However, a better solution is to use quick select, which gets an average-case runtime of O(N)!
The closest points will have the smallest Euclidean distances...
"""
import math
import random

class Solution:
    def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]:
        def partition(l, r):
            # choose a random pivot
            pivot = random.randint(l, r)
            pivot_distance = euclidean_distance(points[pivot])
            # move pivot to end
            points[pivot], points[r] = points[r], points[pivot]
            # store idx for swapping
            store_idx = l
            for i in range(l, r):
                if euclidean_distance(points[i]) < pivot_distance:
                    points[i], points[store_idx] = points[store_idx], points[i]
                    store_idx += 1
            # swap store_idx with pivot
            points[r], points[store_idx] = points[store_idx], points[r]
            return store_idx

        def quick_select(points, k):
            left, right = 0, len(points) - 1
            pivot_idx = len(points)
            # continue partitioning while pivot_idx != k
            while pivot_idx != k:
                pivot_idx = partition(left, right)
                # the k smallest lie to the left
                if pivot_idx > k:
                    right = pivot_idx - 1
                # the pivot is in its final position and everything up to it
                # is smaller, so search to its right
                else:
                    left = pivot_idx + 1
            # return k closest points
            return points[:k]

        def euclidean_distance(point):
            x, y = point
            return math.sqrt(x ** 2 + y ** 2)

        # perform quick select and return the k smallest
        return quick_select(points, k)
Hey, can we just use a lambda function as the sort key, sort, and then drop the elements beyond k?
def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]:
    points.sort(key=lambda x: x[0]**2 + x[1]**2)
    return points[:k]
Sounds like the sort approach would be O(nlogn) while the heap approach would be O(n + klogn). But yeah, the sort approach is way faster than the heap approach on leetcode.
@@jwsw Same doubt, man. Do I need to learn heaps now, or should I go on with sorting?
I was looking for this comment desperately.
Your suggestion is valuable.
@@jwsw Why is it faster if O(nlogn) is slower than O(n)?
When solving this problem in Ruby, sorting the distances without a min heap was faster.
We don't actually need the square root: for a > b >= 0 we have sqrt(a) > sqrt(b), so comparing squared distances gives the same order.
Why do heappop? Use indexing directly, an O(n + k) solution.
Can you explain bounded robot in the next video?
5:31 why did you re-draw the circle
Got the N*log(K) solution. But remember not to apply leetcode style programming in real life, especially not in python. Exploit numpy parallelism/vectorization instead of using sequential loops:
points = np.array(points)
dists = np.linalg.norm(points, axis=1)
k_indices = np.argpartition(dists, k-1)[:k]
return points[k_indices].tolist()
Faster than 95% of leetcode solutions, although with huge variance since their timing system is broken.
You cannot import np in interviews
How did python automatically use min heap? Didn't you say in another video that Python only supports max heap, and to use min heap we have to use the negative values?
It was the opposite, python has min heap and we had to use the negative to simulate a max heap
Correction: heapify on a single node (sift-down) is O(logn); here it's referring to build-heap, which is O(n).
Time Complexity: O(n + klog(n)) right?
Yup. In the case where k = n, this is actually worse than sorting.
I solved this problem using sorting.
This could actually be solved using quick select in O(n) time and O(1) extra space; not sure if the heap solution is acceptable in an F and G equivalent interview.
It looks like quick select gives me Time Limit Exceeded due to a lot of duplicates in the list.
@@PeachHeartBlog You can use 3-way partitioning to deal with duplicates.
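The 3-way (Dutch national flag) partition mentioned here can be sketched as follows; `key` is a hypothetical callable (e.g. squared distance), and the middle "equal to pivot" band is what stops quickselect from going quadratic on many duplicate distances:

```python
import random

def three_way_partition(a, lo, hi, key):
    # Partition a[lo..hi] around a random pivot's key. Afterwards:
    # a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot.
    p = key(a[random.randint(lo, hi)])
    lt, i, gt = lo, lo, hi
    while i <= gt:
        ki = key(a[i])
        if ki < p:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif ki > p:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt
```

In quickselect you then recurse only when k falls outside the [lt, gt] band, so runs of equal distances are settled in a single pass.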
I don't understand the running time of this. Adding an element to a heap is O(log N). Doing this for N elements is O(N log N). Popping from the heap K times is O(K log N). Since K < N, the total running time should be O(N log N).
Can anyone explain where we actually have to use a Heap / Priority Queue? For this problem, the time complexity of normal sorting and the min-heap seems to be the same.
In the worst case you are correct, but minheap will be faster in the average case where k != n. Heapify operation does NOT take nlogn time, it takes n time. So the overall complexity of the minheap solution is O(n + klogn) , whereas with sorting it's O(nlogn)
@@YeetYeetYe Why does Heapify operation take n time? Isn't heap insertion log n? and doing that n times is basically "n log n"?
@@xalizalizx I would highly recommend this video as an explanation: ruclips.net/video/MiyLo8adrWw/видео.html&ab_channel=AlgorithmswithAttitude
heapify doesn't just insert N times. It uses Floyd's algorithm which is O(N). The intuition is that binary trees are bottom heavy; in a 15 node tree, 12 of the nodes are in the bottom two levels, and only 3 are on the top two levels. The bottom nodes don't have to be moved far to heapify. The top nodes might have to move further but there are fewer of them. If you actually count how many nodes have to be moved and sum it up , it works out to O(N) in the end.
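The bottom-up build described above can be sketched as a minimal min-heap version (function names are mine):

```python
def sift_down(a, i, n):
    # Push a[i] down until both of its children are >= it.
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_heap(a):
    # Floyd's build: sift down each internal node, starting from the
    # last parent. Summing the move counts over all levels gives O(n).
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
```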
There is actually an O(N) solution for this too
There is a faster max heap solution. There is also an on-average O(N) solution using quick select as described here: ruclips.net/video/8F4H1G2MmY8/видео.html
Apologies. The quick select solution now gives a TLE on LeetCode because of a pathological testcase. But the max heap solution is still good.
why is it obvious that we need to pop k times?
Hey! Can you do Kth largest element in an array? Thanks for this video!
It's the exact same question as this.
Just use a max-heap instead of the min-heap,
i.e. just store the negative of the value in the heap.
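For the Kth-largest variant, the negation trick mentioned here looks like this (a quick sketch; `kth_largest` is my name for it):

```python
import heapq

def kth_largest(nums, k):
    # Negate so Python's min-heap acts as a max-heap; after k-1 pops
    # the root holds the kth largest value.
    heap = [-x for x in nums]
    heapq.heapify(heap)
    for _ in range(k - 1):
        heapq.heappop(heap)
    return -heap[0]
```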
u r a god!
Your approach has complexity nlogn, so the heap's power is unused.
The complexity is supposed to be nlogk; check out the code below:
from heapq import heapify, heappush, heappop
from math import sqrt

class Solution:
    def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]:
        x = []
        i = 0
        while i < k:
            first = points[i][0]
            second = points[i][1]
            heappush(x, (-sqrt(first * first + second * second), points[i]))
            i += 1
        while len(points) > i:
            first = points[i][0]
            second = points[i][1]
            heappush(x, (-sqrt(first * first + second * second), points[i]))
            heappop(x)
            i += 1
        answer = []
        for i, j in x:
            answer.append(j)
        return answer
There's no need for an actual SQRT; all you need is the squares for comparisons.
Those SQRTs actually hurt performance a lot.
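A quick check that dropping the sqrt can't change the answer, since sqrt is monotonic on non-negative inputs:

```python
import math

points = [[1, 3], [-2, 2], [5, -1]]
# Sorting by squared distance gives the same order as sorting by the
# true Euclidean distance, because sqrt preserves order on [0, inf).
by_square = sorted(points, key=lambda p: p[0] ** 2 + p[1] ** 2)
by_sqrt = sorted(points, key=lambda p: math.hypot(p[0], p[1]))
```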
This screams QuickSelect to me
Share the code plz if you do it
You need to whiteboard till you arrive at the solution (and NOT leave it halfway) before you jump to the coding section.
Why is the video quality so bad?
Please, just make sure your internet connection is strong then select 720p quality.
I believe this is an NlogN solution. A better solution does it in Nlogk.
Would my 1 line solution be cheating?
return heapq.nsmallest(k, points, key=lambda x: x[0]*x[0] + x[1]*x[1])
Runtime: 1071 ms, faster than 64.20% of Python3 online submissions for K Closest Points to Origin.
Memory Usage: 20.3 MB, less than 79.53% of Python3 online submissions for K Closest Points to Origin.
actually, heapify is faster than nsmallest
Thanks!
Hey Poojan - thank you so much, I really appreciate it!! 😊
@@NeetCode I am a fan of your videos, pls keep it up!