Anyone who can come up with solutions like this, on the fly, without having first studied the problem, is a genius and I salute them. Also thank you Neetcode for this video
no such human exists
The only purpose of asking this kind of question in an interview is probably to check whether you've memorized all the LeetCode solutions.
Testing preparedness. This kind of problem solving is healthy and great but not a predictor of success as a developer.
@@thePontiacBandit hard disagree. I think asking this in an interview is extremely unhealthy and a waste of time for everyone involved
@@APudgyPanda96 I agree with you. I'm trying to imagine why they would even have something like this. If I structured interviews like this, I'd be testing preparedness. It wouldn't be a sign of a good developer at all.
I dealt with a slightly different problem in the 1980s. I was to find the "moving median" of N points in a data sequence that keeps on coming. But I wasn't going to do this with programming, but with dedicated hardware, like gate arrays. At any instant, there were N saved points. The oldest was discarded, the sample just arrived was inserted, and the median of the N points in the list was reported. I used a min-heap and a max heap. The exciting thing was that all the comparisons and data movements in working with a heap can be done in parallel, so the O(log N) time per sample reduces to O(1).
You're on another level, Mr Radar - God Mode.
That's so fascinating.
wow!!
Hey I just solved this on leetcode..no. 480. It's the same problem?
haha, you've been doing DSA since the 1980s, even implementing it in low-level hardware, so this channel must be too easy for you, I think.
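For anyone curious what the moving-median idea above looks like in software (rather than parallel hardware), here is a minimal sketch of my own: it keeps the last N samples both in arrival order and in sorted order, so each step costs O(N) due to list shifting, not the O(1) the dedicated hardware achieves.

import bisect
from collections import deque

def moving_medians(stream, n):
    # yield the median of the last n samples seen so far
    window = deque()   # samples in arrival order
    ordered = []       # the same samples, kept sorted
    for x in stream:
        window.append(x)
        bisect.insort(ordered, x)                             # O(n) insert (shifting)
        if len(window) > n:
            oldest = window.popleft()                         # discard the oldest sample
            ordered.pop(bisect.bisect_left(ordered, oldest))  # O(n) delete
        mid = len(ordered) // 2
        if len(ordered) % 2:
            yield ordered[mid]
        else:
            yield (ordered[mid - 1] + ordered[mid]) / 2

For example, list(moving_medians([5, 15, 1, 3], 3)) yields [5, 10.0, 5, 3].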
This is just incredible... for some reason I either never learned, or don't remember learning, Heaps in my computer science education at university. Throughout the years I have always heard the word Heap but never really drilled down into what it did or where it is useful. This really inspired me to fill the gap!!
Started grinding seriously as I have a big tech interview coming up. The more obscure problems have video solutions on YouTube which are horribly explained. I really appreciate how great your solutions are now, and I already loved them before. Thank you NeetCode, hopefully everything goes well!
Man you're actually so goated. These explanations legitimately make me understand the structures and concepts, and I can code out the solution from understanding the logic before you go through writing the code. While I'm sure Google was great, I'm glad you're focusing full time on this because you're the absolute best resource I've come across
10:18
But two, that's "two" big of a difference 😁
Thank you NeetCode, your videos have helped me understand hard problems a bunch!
I'm a bit confused about why we add new numbers directly to the small heap by default. Couldn't this be optimized by comparing the incoming number to both root nodes and adding it to the small/large heap depending on whether the new value is less than or greater than the roots?
Like let's take 3547:
First add 3 to the small heap - O(1)
Next 5 comes in, is greater than 3, add to large heap - O(1)
Next 4 comes in, is between the values, add to small heap - O (log n)
Next 7 comes in, is larger than 4 and larger than 5, add to large heap - O (log n)
if that last number was a 1, it's smaller than 4 and 5, so add it to the small heap. Now the heaps are unbalanced, so pop 4 from the small heap and push it to the large heap - O(log n) + O(log n)
This way whenever your trees are imbalanced you can just pop the root of the larger one and push it to the smaller one to keep them always balanced and at most you have to push and pop once rather than twice with the "swap" method in the video.
I may be missing something, but it seems inefficient at first glance.
Other than that, great video, you always explain these so concisely 👍
You're right.
I may have figured out why: if one of the 2 heaps keeps growing because of the values we keep pushing, it becomes too large. We are already making a swap, so why not do it without all the lookups.
@@dumdumbringgumgum2940 You are correct, but remember the lookup is merely an O(1) and it can save you a couple of O(log n) operations in case the numbers are sort of balanced such as in the example
totally!! I guess he just wanted to make the code cleaner and easier to understand.
Sounds right to me.. instead of always adding to first heap.. deciding on which one to add will save some rebalancing steps.
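For concreteness, the insert rule discussed in this thread might look like the sketch below (my own names; small is the negated max-heap holding the lower half, large is the min-heap holding the upper half). Rebalancing then only happens when the sizes drift more than one apart:

import heapq

def add_num(small, large, num):
    # peek at the roots (O(1)) to decide which half num belongs to
    if large and num >= large[0]:
        heapq.heappush(large, num)
    else:
        heapq.heappush(small, -num)
    # rebalance only when the sizes differ by more than one
    if len(small) > len(large) + 1:
        heapq.heappush(large, -heapq.heappop(small))
    elif len(large) > len(small) + 1:
        heapq.heappush(small, -heapq.heappop(large))

Compared with always pushing to one side first, this saves the extra push/pop on inputs that already land in the right half, at the cost of slightly more branching.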
An alternative solution could be to use self-balancing trees (AVL, Red-Black, etc.) where all the operations are log_n. However, we don't need find and instead just need to maintain a separate pointer to the median; then after each insert, depending on whether the inserted number is greater or less than the current median, we move the median pointer to the element adjacent to the previous median.
I had this in an interview and was asked:
- "If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"
- "If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"
How would you go about this? :) Thanks!
bump, I'm also curious how one would do this.
Did you give any answers, or did your interviewers give any response, Bernhard?
My only thought was you could use an 8-bit integer or float.
Maybe initialise an array of size 100, where the key is the number and the value is the frequency. You can then get the answer in constant time by iterating through until you reach the halfway point.
If 99% of the numbers are in the range you could do the same thing but have two extra variables for "over 100" and "under 100" and this would work as long as the median number was in the range 0-100
@@mongoose2014 wow that's great. Had you done something like that on a previous problem? Very clever of you.
Yes, as @mongoose2014 said, one can use set/dict approach to solve this.
@@TheMadRunner00 I still don't understand the 99% part, could you please elaborate on it?
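To make the counting idea above concrete, here is a rough sketch (the class name and the under/over counters are my own additions): a frequency array indexed 0..100 plus counts of out-of-range values, which answers findMedian in effectively constant time as long as the median itself falls inside [0, 100].

class MedianFinder100:
    def __init__(self):
        self.freq = [0] * 101   # freq[v] = how many times v has been seen, 0 <= v <= 100
        self.under = 0          # values below 0 (only relevant for the 99% variant)
        self.over = 0           # values above 100
        self.total = 0

    def addNum(self, num: int) -> None:
        self.total += 1
        if num < 0:
            self.under += 1
        elif num > 100:
            self.over += 1
        else:
            self.freq[num] += 1

    def kth(self, k: int) -> int:
        # k-th smallest (1-indexed), assuming it lies inside [0, 100]
        seen = self.under
        for v in range(101):
            seen += self.freq[v]
            if seen >= k:
                return v
        raise ValueError("k-th element lies outside [0, 100]")

    def findMedian(self) -> float:
        if self.total % 2:
            return float(self.kth(self.total // 2 + 1))
        return (self.kth(self.total // 2) + self.kth(self.total // 2 + 1)) / 2

Both loops are bounded by 101 entries, so every operation is O(1) and space is O(1), matching the reasoning in the thread above.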
Very nice visual explanation. I would suggest that instead of saying the sizes are approximately equal, we say the size difference must be at MOST 1?
Yes if its not, simply remove from whichever is larger and insert into the other.
Excellent work on explaining this!
I was wondering if you can do a follow-up video expounding on the other approaches to the problem? IMHO, the primary reason this is a LeetCode hard problem is the myriad of approaches to solving it, especially given different constraints (i.e. memory bound - how do we do it with Reservoir Sampling, Segment Trees, etc.). I saw the LeetCode solution to this problem mentions these different approaches, but I think you will do a much better job of explaining them :)
is leetcode premium worth it to see leetcode official solutions?
A bit shorter and easier:
import heapq
class MedianFinder:
def __init__(self):
self.min_heap = [] # To store the larger half of numbers
self.max_heap = [] # To store the smaller half of numbers
def addNum(self, num: int) -> None:
# Push the negated number onto the max-heap (heapq is a min-heap, so negation makes it behave as a max-heap)
heapq.heappush(self.max_heap, -num)
# Move the largest number of the smaller half from the max-heap into the min-heap
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
# If the min-heap has more elements than the max-heap, balance the heaps
if len(self.min_heap) > len(self.max_heap):
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
def findMedian(self) -> float:
# If the total count of numbers is odd, the median is the top of the max-heap
if len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0]
# If the total count of numbers is even, the median is the average of the tops of both heaps
return (-self.max_heap[0] + self.min_heap[0]) / 2
Whenever I come to see your approach I only have to watch 1/3 of the video cuz your explanation is so good
Even tho this is a hard problem and I have trouble with mediums and even some easy problems, I was able to get this one pretty easily. So I guess everyone has a type of problem that just clicks. No point beating yourself up over not getting all of the patterns immediately.
thanks neetcode for amazing explanation !!!!.....
you are doing a great job man.....there are a few youtube channels which implement problem code in python...YOU ARE THE BEST OF THEM 👑
Thanks for the explanation!
I found the below solution a bit cleaner yet still intuitive.
The idea is that for every new number,
1. we first push it onto the small heap (the max-heap)
2. then pop the largest element from the small heap and push it onto the big heap (the min-heap)
3. if the small heap ends up with fewer elements (unbalanced), move one element back
The code gets much shorter this way:
class MedianFinder:
def __init__(self):
self.maxHeap = []
self.minHeap = []
def addNum(self, num: int) -> None:
heapq.heappush(self.maxHeap, num * -1)
heapq.heappush(self.minHeap, heapq.heappop(self.maxHeap) * -1)
if len(self.minHeap) > len(self.maxHeap):
heapq.heappush(self.maxHeap, heapq.heappop(self.minHeap) * -1)
def findMedian(self) -> float:
if len(self.maxHeap) == len(self.minHeap):
return (self.minHeap[0] + self.maxHeap[0] * -1) /2
else:
return self.maxHeap[0] * -1
Awesome explanation... thank you!!
Thanks, much appreciated
Thank you for this great explanation! I was wondering why can't we use a self-balancing BST like AVL or Red Black Tree ?
You can definitely use them but heaps are more optimal
Thanks!
wow, it's automatic. Thanks a lot for all the awesome videos, hope you enjoy your coffee and have a great weekend!
Thank you so much, I really appreciate it! 😊
@@NeetCode Your videos lift my python skill to the next level. Will let you know when I get my first offer! Appreciate it!
Here is a bit more concise solution, if anyone is having difficulties with the one in the video:
class MedianFinder:
def __init__(self):
self.left = []
self.right = []
heapq.heapify(self.left)
heapq.heapify(self.right)
def addNum(self, num: int) -> None:
heapq.heappush(self.left, -num)
# Balance the largest element to min-heap
heapq.heappush(self.right, -heapq.heappop(self.left))
# Maintain size property
if len(self.right) > len(self.left):
heapq.heappush(self.left, -heapq.heappop(self.right))
def findMedian(self) -> float:
if len(self.left) > len(self.right):
return -self.left[0]
else:
return (-self.left[0] + self.right[0]) / 2.0
Thank you Neetcode for the amazing job you are doing.
Why do you do this thing where you first add to the left by default, and then check & rebalance between the two heaps? Can't you peek at the min/max in either tree and decide where it should go based on that, and based on the current sizes of each tree without having to actually go through an `add` operation?
how do you recognize that you can use 2 heaps instead of an array + binary search
Thanks for making this clear!!!
WONDERFUL EXPLAINATION..Thank you❤
The solution is understandable, but it would take weeks of full-time work for me to find it by myself. Like every test, mock coding verifies your knowledge of patterns.
the addNum function should be SO MUCH simpler. You don't need to check anything if you keep everything on track:
if "small" is the larger heap, push the element there, pop from there, and push the popped element to "bigger" - and vice versa.
The heaps themselves do the work of keeping track of who's bigger/smaller.
Is using heapq._heapify_max a standard way to implement a max heap, or do we need to use -1 with a min heap?
No. First, this is not a standard method; it's an implementation detail that is subject to change without notice. Second, it will introduce bugs if used outside the implementation of heapq.nsmallest.
It's an implementation-specific helper, alongside heapq._heappop_max and heapq._heappush_max, used to aid the implementation of heapq.nsmallest.
They will not maintain a proper max-heap structure when used in ways other than inside heapq.nsmallest.
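A tiny demonstration of why that matters (my own snippet): heapq._heapify_max does arrange a list as a max-heap, but the public heappush still sifts with min-heap comparisons, so the max-heap property is silently broken afterwards.

import heapq

h = [3, 1, 4, 1, 5]
heapq._heapify_max(h)   # private CPython helper: arranges h as a max-heap
print(h[0])             # 5 -- the maximum is at the root

heapq.heappush(h, 9)    # heappush sifts assuming MIN-heap ordering...
print(h[0])             # ...still 5: the new maximum 9 is buried, so the root is no longer the max

Negating values and using the ordinary min-heap functions, as in the video, avoids this entirely.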
For some interesting reason it runs a bit slower if we add the incoming num into the big heap instead of the small one, and make the necessary amendments to the code to pop from big instead of small.
I have 0 knowledge of DSA or OOP, yet I think this is the easiest LeetCode problem that exists.
I can do this only using C !
Great stuff, thanks mate! I've been thinking of migrating to Python for interviews, but the edge case with the max heap again made me stop doing that :)
Good point, the language used should be expected to add more time (as there are usually more compilation errors going from Python to Java to, say, C++)
Instead of using heaps or priority queues, you can use a segment tree. Let me explain: each node from l to r represents the position where each value in the segment [l, r] lies in the sorted array. Then you can use lazy updates to update the positions and binary search on the segment tree to find the median :"D have fun.
I know it's silly, but inserting into an empty heap is O(1).
Getting the max or min is O(1) if it doesn't involve removing it. I think you covered that though. Awesome video. I liked and subscribed!
Hi, I have a question about this question. Can I use binary search to do insertion sort when adding a new number? This will also be a O(logn) algorithm
But only to find the insertion index. To actually insert it would still be O(n) because you potentially have to move/shift every item in the list.
@@peterparker892 Make sense! Thank u so much!
We can also do this with SortedList from sortedcontainers.
a fantastic explanation as always, Thanks!
question: if you're doing a bunch of logn operations for every add, isn't this a nlogn algorithm? if you're adding n elements, and each time you'll do logn operations, this becomes nlogn. you may as well have just sorted it.
Well spotted
sorting 1 time takes nlogn , sorting n times will take (n^2)log(n) :)
getMedian can be called any number of times
@@sasageyo9571 you can just store a single var, is_sorted, and set it when you sort (when calling get_median), then unset the var when you insert. Seems a lot less annoying than creating two different heaps.
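Here is roughly what that is_sorted idea could look like (a sketch with my own naming; findMedian plays the role of get_median):

class MedianFinderLazySort:
    def __init__(self):
        self.nums = []
        self.is_sorted = True

    def addNum(self, num: int) -> None:
        self.nums.append(num)        # O(1); just mark the list as dirty
        self.is_sorted = False

    def findMedian(self) -> float:
        if not self.is_sorted:
            self.nums.sort()         # O(n log n), but only when the list is dirty
            self.is_sorted = True
        n = len(self.nums)
        mid = n // 2
        if n % 2:
            return float(self.nums[mid])
        return (self.nums[mid - 1] + self.nums[mid]) / 2

This is simple, but if addNum and findMedian calls alternate, every query pays for a full sort, whereas the two-heap version stays at O(log n) per insert and O(1) per query.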
perfect solution and illustration, thanks so much
Got asked this during a 20 minute technical interview for a FAANG internship. Got the brute force solution but didn't get anywhere close to optimal. Safe to say I didn't do well 😔
That's a beautiful problem and it deserves hard difficulty. I absolutely would not come up with this solution before seeing this video.
for anyone looking for a brief solution check this out :
class MedianFinder:
def __init__(self):
self.heap = [] , []
def addNum(self, num: int) -> None:
left , right = self.heap
heapq.heappush(left , -heappushpop(right , num))
if len(right) < len(left):
heapq.heappush(right , -heapq.heappop(left))
def findMedian(self) -> float:
left, right = self.heap
if len(right) > len(left):
return float(right[0])
else:
return float(-left[0]+ right[0])/2
Great explanation. Couldn't heaps be avoided in this case? We can insert the values from our stream into our list in log time by using binary search (by definition empty and lists of size 1 are always sorted) and then find the median that way. Am I missing something?
N log N
Yes. For 1 element it's log 1, for 2 it's log 2, for 3 it's log 3, ... and for N it's log N. Do the sum: log 1 + log 2 + log 3 + ... + log N = log(1×2×3×...×N) = log(N!) ~ N log N. So it is N log N.
@@dadisuperman3472 isn't that the same for the heap solution?
@@mirkowaechter Yes, almost. The video's solution is 3·N log N, which asymptotically is equivalent to N log N. But sorted insertion, as in the question above, is exactly N log N.
@@dadisuperman3472 the video solution is O(logn), not O(nlogn).
Great Explanation! - Question: Why can't we find the index to insert the element to array list using Binary Search and add it to that index as we go? Is it because insertion to a particular index in ArrayList is O(n) operation?
Yup, that's correct! You will need to shift the elements so it will be O(n).
Why can't I think of such amazing solutions 😢
Wouldn't you get an index out of range if the first action was to find the median, before adding at least two elements?
Line 34 would run but there wouldn't be anything to return from the heaps.
line 34: return ( -1 * self.small[0] + self.large[0]) / 2
This is correct. The problem states that addNum() will be called first so index out of range won't happen. You could add a guard clause that returns 0 if both heaps are empty.
best explanation ever! thank you so much for this.
Could we use python's ._max_heap instead of multiplying by -1?
heya! im not sure why the solution works for you, but i had to divide by 2.0 instead of 2 because of some floating error - took very long to debug this :/ had to read through the discuss section in leetcode to find some answers - but thanks for doing this!
Can you link it here?
Thanks for this I was wondering why mine was working too. This fixed it.
The brute force solution is actually O(n^2). The reason is because it takes n time to search the array to know where to insert the current number, and then the actual insertion takes another n because you have to shift over all the elements after it.
Then it means that we have to do 2*n operations, not n^2, so it is still O(n)
@@Lulit999 No, the insertion takes n, and you insert n times, that is O(n(n)) = O(n^2)
@@electric336 but you need to calculate the time complexity for the given function (one call that inserts one element), not for all possible uses of the function (calling it continuously)
Best explanation. Thank you
Thanks for making this look easy! And isn't that O(n), as we are finding the lengths of small and large?
Thank You for the amazing explanation Neetcode
Jesus, on an interview I have no idea how they expect us to come up with this if we never saw this heap solution before. I'm just gonna give them the in-order sorting solution and talk to them about the 2 heap solution if I ever get this, cause this is tough to remember.
It's not that hard once you get familiar with different DSs and have practiced multiple examples utilizing them, keep it up, people!
amazing explanation, thank you very much sir !
Shouldn't the return self.small[0] be -1 * self.small[0] in the findMedian method? As we are artificially negating them for our purposes.
I don't understand when we are checking that every num in small is <= every num in large
I think the heap solution is not very intuitive, but thanks NeetCode for such a good explanation
Any reason why insertion via binary search wouldn't work in the same time complexity?
I think because bst is not generally balanced. Other people said that a balanced search tree would be an alternative solution.
Very nicely presented, thanks!
I don't understand why line 16 doesn't output "IndexError: list index out of range" when he tries to get the first index of the empty large heap. What am I missing?
Thank you so much for making this amazing video!
I believe in C++ we can't delete arbitrary elements from a heap, so we would have to make modifications to this approach.
not too bad! I thought I had to implement my own heap
Best Explanation!
Which board are you using?
very well explained !!!
Binary search actually works for this problem.
awesome explanation.. thanks a lot!
Why don't you use sortedcontainers? Is it that we are not allowed to use this library?
Nice explanation!
In interviews this is how we have to explain it, but in any OA or CP we can just use this:
bisect.insort(nums, num) - this will do all the min-heap work
What do OA and CP abbreviate?
@@PippyPappyPatterson Online Assessment and Competitive Programming
@@consistentthoughts826 ohhhh wow thank you. Seems like every company has two rounds of OAs, so that makes a lot of sense.
I really need to dig into `bisect` because I keep seeing it but don't know how to use it yet.
Do you know of any Easy Leetcode problems with solutions that use `bisect`? That's normally how I learn the fastest.
@@PippyPappyPatterson try " first and last occurrence of an element in an array" question in leetcode. You will get an idea
@@consistentthoughts826 Thanks chief, happy tech-whale hunting.
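For anyone who wants to see bisect in action on this exact problem, here is a minimal sketch (class name is mine): bisect.insort keeps the list sorted, so the binary search is O(log n) but the insert itself is still O(n) because of shifting.

import bisect

class MedianFinderBisect:
    def __init__(self):
        self.nums = []

    def addNum(self, num: int) -> None:
        bisect.insort(self.nums, num)   # O(log n) search + O(n) shift

    def findMedian(self) -> float:
        n = len(self.nums)
        mid = n // 2
        if n % 2:
            return float(self.nums[mid])
        return (self.nums[mid - 1] + self.nums[mid]) / 2

This is often fast enough in practice because the shift is a low-level memmove, even though the asymptotics are worse than the two-heap solution.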
Why didn't you use -1* in find median def while calculating median ?
he did
@NeetCode ,
What do you think is the problem with using re-balancing BST directly which keeps root as the median (in odd case)?
I mean ,it seems like you are trying to do the exactly same thing indirectly
Maintaining the balanced bst is O(n) no? Since you may have to shift all nodes.
@@atd233 Why would it take O(n) time on average? It's like calling the heapify function.
So it's technically the same thing
@@RS-vu4nn you only heapify once. In fact you don't even really need to call it since you start with an empty array. You don't heapify afterwards.
When adding an item to the heap, it takes logn. Popping is O(1).
With the balanced BST, you have to re balance on both add and pop.
@@atd233 Yeah you are technically right, in case of balanced tree that would be the case. I think user-defined balanced tree would be the right approach but that would be too much of hassle ,that's why people generally use two heaps instead.
Thanks for helping with clarification.
@@atd233 Popping from a heap is also O(lg n) because the heap "nodes" have to be repositioned to maintain the min or max heap property. Peeking at the top of a heap is O(1).
Can't we do it with a set that keeps elements in sorted order? It would be less painful than using two heaps; max insert complexity is still log(n) and getting the median would also be O(1)
Sets don't insert in sorted order...
@@thewatcherlollol Are you sure?
Check the difference between an ordered set and an unordered set in C++. If a certain language does not have an ordered set, that does not mean it does not exist. You can always make one if a language's std library does not have it.
set in cpp is implemented using a balanced search tree, so yes, that would work. But sets in most languages are not ordered.
Speed will be a little faster if we check whether the item is greater than or equal to self.large[0], then decide which heap to add it to:
import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # maxheap (stored negated)
        self.large = []  # minheap
        heapq.heapify(self.small)
        heapq.heapify(self.large)

    def addNum(self, num: int) -> None:
        # compare against the smallest of the large heap to pick a side
        if self.large and num >= self.large[0]:
            heapq.heappush(self.large, num)
        else:
            heapq.heappush(self.small, -1 * num)
        # rebalance so the sizes never differ by more than one
        if len(self.small) - len(self.large) > 1:
            temp = -1 * heapq.heappop(self.small)
            heapq.heappush(self.large, temp)
        if len(self.small) - len(self.large) < -1:
            temp = -1 * heapq.heappop(self.large)
            heapq.heappush(self.small, temp)

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return -1 * self.small[0]
        if len(self.large) > len(self.small):
            return self.large[0]
        return (-1 * self.small[0] + self.large[0]) / 2
isn't heappop() an O(logn) operation, since the heap has to be rearranged later, instead of O(1) ?
We have to peek not pop
Another solution: implement a BST. When you invoke addNum(), insert into the BST, which is an O(lg N) operation (assuming it is balanced).
Then findMedian is similar to implementKthSmallest element in BST.
const medianUsingBST = () => {
let root = undefined;
let size = 0;
// Ensures sort during insertion
const insertIntoBST = (num) => {
if (!root) {
root = new TreeNode(num);
size++;
return;
}
const helper = (curr, parent) => {
if (!curr) {
if (num < parent.val) {
parent.left = new TreeNode(num);
} else {
parent.right = new TreeNode(num);
}
size++;
return;
}
if (num > curr.val) {
helper(curr.right, curr);
} else {
helper(curr.left, curr);
}
}
helper(root, undefined);
}
const findKthSmallestElementInBst = (k) => {
    let counter = 0;
    let result = undefined;
    const inOrder = (curr) => {
        if (!curr) {
            return; // stop at empty subtrees
        }
        inOrder(curr.left);
        counter++;
        if (counter === k) {
            result = curr.val;
        }
        inOrder(curr.right);
    }
    inOrder(root);
    return result;
}
const addNum = (num) => {
insertIntoBST(num);
}
const findMedian = () => {
if (size % 2 === 1) {
return findKthSmallestElementInBst(Math.ceil(size / 2));
}
return (findKthSmallestElementInBst(size / 2) + findKthSmallestElementInBst(size / 2 + 1)) / 2;
}
return { addNum, findMedian}
}
I think the find is O(n) here though? Because you're going through half the nodes to get the median. Which is linear. With heap it's constant. And insertion will be O(logn) for both.
You can store the iterator pointing to the middle and ++ or -- it.
The heaps solutions is easier to implement though
Great... Although you code in python, which is above my head, thanks for the intuition though;
Thank you so much🥰🥰🥰
Can anyone tell me: at 23:16, is there any case where the input will reach line 24? If so, what would a sample input be?
Go to 12:51, it will answer your query.
I like the way you think, it's impressive
How do you come up with the idea that we require 2 priority queues, one min and one max? Please give the intuition as well.
What is time complexity for this solution?
Can we use the library heap functions in an interview?
yes
ask during the interview
Hello sir could you please explain why sometimes it's self.small but sometimes it's just small?? I'm really confused, thank you!
He fixes it at the end so all of them have the self. prefix.
Came up with this solution without watching the coded solution -
class MedianFinder:
def __init__(self):
self.maxHeap = [] # lower half of the sorted array
self.minHeap = [] # upper half of the sorted array
def addNum(self, num: int) -> None:
heappush(self.maxHeap, -num)
# if the largest element of maxHeap exceeds the smallest of minHeap, the halves overlap - move it over
if self.minHeap and self.maxHeap and (-self.maxHeap[0]) > self.minHeap[0]:
heappush(self.minHeap, -heappop(self.maxHeap))
# check if difference in size of min heap and max heap is within allowable limit
sizeDiff = len(self.maxHeap) - len(self.minHeap)
if sizeDiff>1: # maxHeap has 1+ extra elements
heappush(self.minHeap, -heappop(self.maxHeap))
elif sizeDiff < 0: # minHeap has extra elements
heappush(self.maxHeap, -heappop(self.minHeap))
def findMedian(self) -> float:
if (len(self.minHeap) + len(self.maxHeap))%2==0: # even size
return (self.minHeap[0] - self.maxHeap[0])/2
else:
return -self.maxHeap[0]
So the maximum time complexity for the add operation will be 3 log N, correct?
Definitely the best videos, without a doubt.
the brute force in the video code:
class MedianFinder:
    def __init__(self):
        self.data = list()

    def addNum(self, num: int) -> None:
        # sorting by shifting: find the insertion index, then shift the tail right
        index = len(self.data)
        for i in range(len(self.data)):
            if self.data[i] > num:
                index = i
                break
        self.data.append(num)
        i = len(self.data) - 1
        while i > index:
            self.data[i] = self.data[i - 1]
            i -= 1
        self.data[index] = num

    def findMedian(self) -> float:
        half = len(self.data) // 2
        if not len(self.data) % 2:
            return (self.data[half - 1] + self.data[half]) / 2.0
        return float(self.data[half])
Can't you use an order statistic tree for this?
Great video
Does anyone know how to answer the follow-up questions?
1. If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
I think we can replace 2 Heaps with 2 TreeMaps. The key is the number, value is the count. We need to keep the sum of values of 2 TreeMaps and update each time we balance size of the 2 TreeMaps.
Because we have only 100 keys => time is O(1) and space is also O(1)
2. If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
I think 2 TreeMaps still work without changing in this case. But time and space complexity is the same as the original problem.
What do you think?
I think you can do this with a simple array. Store the frequency of each number. In the getMedian() function, loop through the array and find the index where the cumulative frequency is equal to (number_of_elements/2).
In case only 99% numbers are between 1-100, the median will definitely fall between 1-100. In such a case, store count of numbers lesser than 1 and greater than 100 and find the element at index (total_no_of_elements/2).
Two things I'd like to point out:
1) the above methods are for an odd number of elements. An even number of elements just requires a simple check.
2) I am horrible at this so I can't be certain. I tried but if you think I am wrong, I'd be happy to hear.
Your explaining is good as f
Here's a cheat solution with fast performance.
from sortedcontainers import SortedList
class MedianFinder:
def __init__(self):
self.list_num = SortedList()
def addNum(self, num: int) -> None:
self.list_num.add(num)
def findMedian(self) -> float:
n = len(self.list_num)
if n % 2:
return self.list_num[n//2]
return (self.list_num[n//2] + self.list_num[n//2-1])/2
How can we create a heap in JS?
Instead of a heap can't we maintain a simple list. For every insert we add the element in the sorted order using binary search.
Yes you can do that. But using the list insertion (list.insert()) operation has the time complexity of O(n). So that'll again result in a O(log n) + O(n) ≈ O(n) operation for add( ).
class MedianFinder:
def __init__(self):
self.small = [] # the smaller half of the list, max heap (invert min-heap)
self.large = [] # the larger half of the list, min heap
def addNum(self, num):
if len(self.small) == len(self.large):
heappush(self.large, -heappushpop(self.small, -num))
else:
heappush(self.small, -heappushpop(self.large, num))
def findMedian(self):
if len(self.small) == len(self.large):
return float(self.large[0] - self.small[0]) / 2.0
else:
return float(self.large[0])
You are amazing!!!!!
this is amazing