I got a similar question in my onsite interview with Amazon (not the same question, but the same concept). I did not know bucket sort, so I used the sorting method. The interviewer said there was a way of getting linear time complexity and I did not know what to do.
I didn't see anybody come up with this solution in the discussion on LeetCode; all of the solutions used a heap, or maybe some of them used quickselect. So I was afraid that I had analyzed my algorithm wrong, but after watching you I know that I was right about my solution.
Awesome video! One small thing: shouldn't it be O(n log k) instead of O(k log n) for the heap solution? Since there are n elements, a heap of size k means log k time per heap operation, and n calls, so n * log k.
The final array that you create may have a lot of holes. Say there are 1000 elements, but consisting of only 1's and 2's, then the list will be full of holes. It can be further optimized by keeping a 'max_freq' variable which will provide an upper bound on the size of the array. This max_freq can be updated while creating the hash-map.
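A sketch of that tweak (hypothetical helper name; tracks max_freq while counting so the bucket list only needs max_freq + 1 slots rather than len(nums) + 1):

```python
def build_freq_buckets(nums):
    count, max_freq = {}, 0
    for n in nums:
        count[n] = count.get(n, 0) + 1
        max_freq = max(max_freq, count[n])
    # upper bound on bucket index is max_freq, not len(nums)
    freq = [[] for _ in range(max_freq + 1)]
    for n, c in count.items():
        freq[c].append(n)
    return freq
```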
Or maybe this is not the most efficient way to do it for all kinds of inputs, like the one you mentioned with unsorted numbers; it doesn't give the top frequent ones. A dictionary with a max heap is the one that can handle all possible inputs.
I'm thinking you can just invert the frequencyMap to {frequency: list of values with that frequency} and then sort the keys in that inverted map. This sorting would be O(sqrt(n) log(sqrt(n))) (which is < O(n)) because there cannot be more than O(sqrt(n)) different frequencies (if each value has a different frequency, then n = O(c * (c+1) / 2), with c being the number of distinct frequencies). Then it's just a matter of iterating over the reverse-sorted keys and adding values to a result array until that array reaches length k:

lookupDict = defaultdict(int)
for n in nums:
    lookupDict[n] += 1

inverseDict = defaultdict(list)
for key, v in lookupDict.items():
    inverseDict[v].append(key)

sortedKeys = sorted(inverseDict.keys(), reverse=True)
sortedKeysIndex = 0
res = []
while k > 0:
    values = inverseDict[sortedKeys[sortedKeysIndex]]
    if len(values) > k:
        res.extend(values[:k])
    else:
        res.extend(values)
    k -= len(values)
    sortedKeysIndex += 1
return res
So basically the map of frequency to values can actually be of size O(n*n)? The first 'n' is due to the worst-case frequency being anywhere from 1 to n, and the second 'n' is because at most 'n' numbers can have the same frequency of 1. I kinda understand how it's not really O(n*n) and a bit less than that, as the duplicates are being consumed into a single key, but it's still more than O(n), right?

Another observation: the only frequencies that have a value attached to them are the subsets that add up to N. In the given example, 100 occurs once, 2 occurs twice and 1 occurs thrice, so the sum of frequencies is actually 1+2+3=6, which is N in this case. So the worst-case space complexity is actually O(n * length of the longest subset of frequencies that sums to N).

Don't really know how I ended up with a Subset Sum problem just because I couldn't justify the O(n) space complexity lol. Any clarifications are appreciated :D
My current status: understand the question, have an idea of how to start (not how to finish), watch just your drawing explanation, realize what is missing, write the whole solution.
What I don't quite understand is that when I implemented this solution, the speed and memory were not as efficient compared to my initial solution:

```
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = 1 + count.get(num, 0)
        res = sorted(count, key=count.get, reverse=True)
        return res[:k]
```

Even though using sorted() here causes it to be n log n... Can someone explain why this one appears to be much quicker than the solution in the video?
I love how your videos are always so damn clear :) ! Thanks a lot. Btw, here is a weird(?) quicksort version that beats 93%:

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    count = list(Counter(nums).items())
    def quick(l, r):
        pivot, p = count[r][1], l
        for i in range(l, r):
            if pivot >= count[i][1]:
                count[i], count[p] = count[p], count[i]
                p += 1
        count[r], count[p] = count[p], count[r]
        if p > len(count) - k:
            return quick(l, p - 1)
        if p < len(count) - k:
            return quick(p + 1, r)
    quick(0, len(count) - 1)
    ind = len(count) - k
    return [count[i][0] for i in range(len(count)) if i >= ind]
Question: let's assume we have the array [1,2,2,3,3,4,4,4] and k = 2. Then for the first most frequent we will have 4, but what about the 2nd most frequent element? The count of both 2 and 3 is 2, so how do we decide which value to insert into our resulting list??? Please help.
PS: "The test cases are generated such that the answer is always unique." Is this statement accounting for that edge case?
That was a very interesting way to solve a specific real-life problem, by counting and working with a hash and an array as a way to identify the K most frequent elements. This can be useful for small to big businesses handling inventory, or when you need to pack up your stuff to travel.
Amazing content as usual!! But I still don't understand, why is it O(n) even if there are 2 nested for loops? Thanks for creating these helpful videos!
The inner "for" loop on line 13 generally won't be iterating through an O(n) list, but instead just the number of input numbers that occurred a particular number of times.
I think, in the worst case, it will be O(n^2). Let's consider the case nums = [1,2,3,4,5,6] and k = 6; then on line 13 the inner loop freq[1] = [1,2,3,4,5,6] will iterate "n" times. But in the average case it is O(n).
@@sravankumar4195 I believe you are correct; in the worst case this would be O(n^2) for the situation you described. You have to iterate over the whole frequency list (size n), and then when you get to the solution set (in your case index 1) you have to iterate over n elements once more. I had the same thought.
I used a dictionary and a pair of while loops in Python. Basically, while the nums list > 0, I would .pop the 0th number off the list and check whether the number was in the dict as a key. If it was, add one to the value for that key; if not, add it to the dict as a key with a value of one. When the nums list was exhausted, another while loop for k > 0: use the Python max function to get the key with the highest value, add that key to the results list, set the key's value to 0, and subtract 1 from k. I'm a novice though, so this may be suboptimal?
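The approach described above might look like this in code (my paraphrase, not the commenter's actual code; and yes, it is suboptimal — both pop(0) and the repeated max() add extra linear scans compared to the bucket solution):

```python
def top_k_repeated_max(nums, k):
    counts = {}
    while len(nums) > 0:
        n = nums.pop(0)                      # pop(0) is O(n) per call; a plain for-loop is faster
        counts[n] = counts.get(n, 0) + 1
    res = []
    while k > 0:
        best = max(counts, key=counts.get)   # key with the highest count, O(m) scan each time
        res.append(best)
        counts[best] = 0                     # zero it out so it isn't picked again
        k -= 1
    return res
```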
Got this question for Google. What to do then if the input is streaming (like a log)? I guess we keep updating the count (histogram) and rebuild the freq array every time?
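One way to sketch the streaming variant (hypothetical StreamingTopK class; assumes exact counts fit in memory — real log pipelines often switch to approximate structures like a Count-Min sketch rather than rebuilding a frequency array per query):

```python
import heapq
from collections import Counter

class StreamingTopK:
    def __init__(self):
        self.counts = Counter()

    def add(self, num):
        self.counts[num] += 1            # O(1) per incoming event

    def top_k(self, k):
        # O(m log k) per query, where m = distinct values seen so far
        pairs = heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])
        return [num for num, _ in pairs]
```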
A simpler way:

from collections import Counter

def dx(nums: list, k: int):
    x = Counter(nums).most_common()[:k]
    ls = []
    for i, j in x:
        ls.append(i)
    return ls
I did the solution with a priority queue and hashmap, and it seemed to have better time complexity and space efficiency than using bucket sort. I feel like this is tricky because, when we are working out solutions for problems, we start analyzing which data structure we are going to use, its time complexity, etc. based on how those data structures are regularly implemented. The thing is, algorithms and built-in functions in languages have improved drastically, so they take less time than they theoretically should. It's tricky, but are those things that we should consider as well?
Heap solution (faster than 95% of solutions on leetcode):

dic = {}
for i in nums:
    if i not in dic:
        dic[i] = 1
    else:
        dic[i] += 1
heaped = []
for key in dic:
    heaped.append([-dic[key], key])
heapq.heapify(heaped)
res = []
for i in range(k):
    freq, x = heapq.heappop(heaped)
    res.append(x)
return res
I think your solution is O(n^2), because in the last part you performed n operations inside n:

for i in range(len(freq) - 1, 0, -1):  ==== O(n)
    for n in freq[i]:  ==== O(n)
For the first heap solution, you can get a total complexity of n log(k). Instead of heapifying the whole dictionary, you can heap-push and heap-pop as you go through the dictionary: when len(heap) > k, heap-pop.
In this case, a heap sort might be more optimal. Even though your code is O(n), it's making a ton of allocations, which may make it practically slower in some cases.
Actually there is a small improvement on this solution, given the following observation: the length of the hashmap is the number of distinct elements in the array; all the rest are repetitions of some or all of those elements. So if we suppose that all elements are repeated only once except one element, which is repeated enough to fill the array, we can calculate the largest achievable frequency as follows:

maxFreq = len(arr) - len(hashmap) + 1

That means the size of the frequency array is just maxFreq + 1, instead of len(nums) + 1:

freq = [[] for _ in range(maxFreq + 1)]

That was my 50 cents contribution ;)
Well, the mistake I made was watching other Python tutorials. The good thing is that now I am watching your videos. You are a genius and you explain well, but I have to watch every video twice to understand correctly. Thank you
First of all, thank you very much @neetCode for this amazing explanation. Does anyone have suggestions or improvements on the solution below?

class Solution {
    topKFrequent(nums, k) {
        let countObj = {};
        const resArr = [];
        let ctr = nums.length;
        for (const i of nums) {
            countObj[i] = (countObj[i] || 0) + 1;
        }
        while (ctr > 0) {
            for (const key in countObj) {
                if (countObj[key] == ctr && resArr.length < k) {
                    resArr.push(Number(key));
                }
            }
            if (resArr.length == k) {
                break;
            }
            ctr--;
        }
        return resArr;
    }
}
Firstly, thank you for being so amazing with your videos! Actually I was wondering: in the for loop on line 12, since inside that loop we keep checking the length of res, doesn't that increase the time complexity beyond the linear expectation?
I had actually thought of the 2nd array implementation you mentioned, with N array size, but I didn't think of how I would extract the top K as you did by going backwards! You're a genius!
Nope, if you are using a maxHeap it will be k log n. Because for that you need to heapify the whole thing first so n elements would be in the heap. Then you would pop 'k' times from the heap of size 'n'. So O(k log n). This is Valid if you use a maxHeap!!! If you are using a minHeap then you would be correct, then the heap would have a max size of k as you said. And you would loop n times and push then poll when size reaches k.
Man, I could not do it myself because I didn't understand the problem clearly. After 50 seconds of the video I understood and did it with ease, thanks a lot. Now I am going to watch the rest of the video to learn the optimal solution)
Got the same question in one of my phone screens. I came up with the map and heap approach. The interviewer asked for an optimization for memory, and I could not think of any other solution. It cost me the next round, unfortunately. If only I had noticed your solution!!!! :-( Again! Great explanation and "neat" solution!!
If we were to use the max heap approach, how would we apply heapfiy if heapify only takes in a list and not a dictionary? Thanks for the amazing video!
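One common answer to that question: heapify can't take a dict directly, so build a list of (negated count, value) pairs from the dict and heapify that list instead (a sketch, using negation because Python's heapq is a min-heap):

```python
import heapq
from collections import Counter

def top_k_via_heap(nums, k):
    count = Counter(nums)
    # negate counts so the min-heap behaves like a max heap
    heap = [(-c, num) for num, c in count.items()]
    heapq.heapify(heap)                      # O(m) for m distinct numbers
    return [heapq.heappop(heap)[1] for _ in range(k)]
```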
Does this become linear time because, we iterate the list of nums (n), then the map of counts (n), and eventually the frequency list (n). Which ends up as O(n + n +n) = O(3n) = O(n) ???
The simplest method is:

def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    cache = {}
    for c in nums:
        if c not in cache:
            cache[c] = 1
        else:
            cache[c] = cache[c] + 1
    cache = dict(sorted(cache.items(), key=lambda i: i[1], reverse=True))
    return list(cache.keys())[:k]

O(n log n)
I believe the length of the frequency/bucket array could be lower than len(nums) + 1 because of the rule that the # of distinct elements >= k. This tells us that there will be at least k distinct elems and since each one must have at least one frequency, no single number could occupy all len(nums) spots (unless k is 1). Therefore I think it could be further optimized (albeit minimally lol) to: freq = [ [ ] for i in range( (len(nums) + 1) - (k - 1) ) ]
Is log(n) a better solution than n log(n) and log(n)?... Can you create a video on how to find Big O and which one is better than which? I know there are many videos on YouTube, but yours will be the best.
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        _map = {}
        res = []
        for num in nums:
            _map[num] = 1 + _map.get(num, 0)
        heap = [(val, key) for key, val in _map.items()]
        for val, key in sorted(heap, reverse=True)[0:k]:
            res.append(key)
        return res
Also, in the first approach, it's mentioned that if we use a max heap the total time will be O(k log n), which seems incorrect: first, to add all items to the max heap, we have to traverse all items, which in the worst case is O(n), and each add takes O(log n), so in total it becomes O(n log n). If we use a min heap and only hold at most k items, then it gives O(n log k).
I guess this is what makes the question medium? I just did a quick hash count of numbers, then Object.entries()'d it out, sorted by the second index, and pulled the first k values out of that. So O(n log m). I wonder if this is weak in an interview. Offering the bucket sort solution (or sometimes other niche solutions) just feels like I musta done the same question, or a nearly identical one, to pull it quickly out in an interview though... I'm surprised NeetCode starts with two mediums whose solutions feel a bit hyper-specific. Is it intentional?
Hey! Could I get a bit of clarification please? Why is your approach at 3:10 not O(n)? I thought it would be, as we would first need to find the max value in the given array (nums), then create our bucket array with that number as the upper limit? But since max() takes O(n) time, I'm quite confused. Thanks in advance!
Can we not use a heap/priority queue instead of using an array? Wouldn't that automatically keep the most frequent elements at the top if the sorting was done according to the count?
I used a dict to maintain the num->occurrence pair for all items. Barely got the AC, but the sorting part threw me off... had to use a lambda function to sort the dict by value. The optimal solution is a lot more subtle. Great content, very informative and well explained.
Hey everyone, I have one question. Why is this solution considered O(n), where n is the size of the nums array? Why are we not looking at the worst-case complexity for lines 11-16 at 12:52?
If all items are unique:
Line 12: i will iterate over the range (0, n) // size of freq is n+1
Line 13: if all items inside nums are unique, every item will be present at the first index
This will lead to looping through the entire length of the array, n => this gives us O(n^2)
or alternatively, sort the count table by values and then return the first k elements:

# create hash table where each num in nums is a key
# increment the value for each key repetition
table = dict()
for n in nums:
    if n in table:
        table[n] += 1
    else:
        table[n] = 1

# sort by values
table = sorted(table.items(), key=lambda item: item[1], reverse=True)

# store answers
answer = list()
for item in table[:k]:
    answer.append(item[0])
return answer
I don't understand why it's considered bucket sort. I get that each index acts as a bucket for the # of occurrences, but the actual sorting part never takes place as far as I can tell.
I appreciate the time you put making and sharing all your content for free. Here is the $10 I might have spent on your udemy course.
Thank you so much!!!
Does he have udemy course???
@@onlysubscriptions2152 nah, just a hypothetical $10 he would’ve spent since most people pay wall this content, but he does it for free.
As a side note, consider donating directly to the creators if they have a donation link, because RUclips takes a whopping 30% of your donation.
In this case, Neetcode accepts Patreon donations, which takes a more reasonable commission of about 8%.
@@hamdi_even 8% is too high tbh. Not for neetcode specifically, he sells his own courses and is set. But for someone else who might be in need of the money 8% is ridiculous
I have never practiced DSA in my life, not even in college. After getting laid off, I stumbled across your videos to learn DSA. They are so crisp, informative, and to the point. I can't thank you enough.
Hi you got any job?
@@e889. not yet.
Update?
Hey guys, yes I did. Wouldn't have been possible without Neetcode.
@@rhitamdutta1996 Nice
I love you man. You're an actual angel. Your explanations are always so clear. And your drawings are so easy to understand.
Thanks, appreciate the kind words 🙂
@@NeetCode you are an angel. :)
Best youtube channel for leetcode problems hands down.
I used your previous video on groupAnagrams to solve this: just hashmapped the array into a defaultdict(int), then sorted the dictionary entirely in descending order. Your videos have been really helpful; first time I solved a medium all by myself.
That's so cool that Python has a convenient way to sort a hashmap by value. I looked into it for Java, and it is a nightmare. I would have to create my own comparator. I think if I'm doing that, I may as well just learn bucket sort at this point.
haha, did the same, cheers
@@trenvert123 Java is sooooo verbose......
@@trenvert123 In Go I faced the same problem; I spent 20 minutes trying to sort a hash map by values (failed). So I just copied the values into a new array and sorted them there lol.
@@Albert-nc1rj you can also insert the contents of your hashMap into a second hashMap, but use the values from frequency hashMap as the keys of the second hashMap.
Once you have done that, take all the keys out into an array, sort the array and retrieve first k values, use first hashMap to get integers and return that.
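A sketch of that inverted-map idea in Python (using lists as the second map's values, so ties on frequency don't overwrite each other — a detail the description glosses over):

```python
from collections import Counter, defaultdict

def top_k_inverted(nums, k):
    freq = Counter(nums)                 # value -> frequency (the first hashmap)
    by_freq = defaultdict(list)          # frequency -> values (the second hashmap)
    for num, f in freq.items():
        by_freq[f].append(num)
    res = []
    for f in sorted(by_freq, reverse=True):  # highest frequency first
        for num in by_freq[f]:
            if len(res) < k:
                res.append(num)
    return res
```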
Am I the only one who is a little confused as to how this solution is O(N)? You loop through the array, which is the size of the array, and then at each index you might have to loop up to N more times. So how is this not O(n^2)?
Edit: Never mind, I think I realize it now. I figured I would write it out for anyone who might still be confused. As we traverse through the array, we go through the whole array, so this is O(n). But we aren't doing an operation n times at each stop; we are doing N more operations spread across the entire array. So even though the for loops are nested, we are doing N more operations throughout a for loop which is N, so the total is just N+N, which simplifies to O(N).
Thanks a lot buddy! I was scratching my head off to find out this same doubt.
Now that I saw your comment, I was able to understand it.
Thanks again!!
So essentially the inner loop is just operating on the subset of N elements?
I would say it's n+k rather than n+n, because the size of the res array is k. So after looping through the freq array of size n, you only need to fill the res array k times then stop, so k more operations. Still it's O(n)
@@quanmai5759 Yeah, I also think it will be n+k.
@@ahmedmansour5032 But in the worst case you will face frequency = 1 for each element in nums, and O(N) is always calculated for the worst case. I have made a comment on the video; please go through it, you will understand what I am saying.
for the heap solution, it's better to use a min heap of size k rather than using a max heap and then removing max k times. Using the min heap, you would remove min and add the next frequency. by the end, you are left with k most frequent ones and removing the min gives you the answer. You can reduce this to n log k and not n log n
and even for values like k = 1e9, logk is around 30 so the complexity is around O(30*n) which is basically O(n)
@@gurmukhsinghnirman4935 indeed!
How would you cap the size of the heap `h` at size `k`?
As you're adding frequencies, `if len(h) > k: heapq.heappop(h)`?
@@PippyPappyPatterson so let's suppose k = 3 and you have numbers [1 , 2 , 3, 4, 5]. You can find the k-largest or in this case 3rd largest using a min-heap of size 3. As you add in numbers, your heap can grow like this:
[ 1 ]
[ 1 , 2 ]
[ 1, 2 , 3 ] ** you're capped at 3 ***
[2, 3, 4] ** add next( 4 ) and remove min (1)**
[3, 4, 5] ** add 5, remove 2 **
Now the head of the heap will be the 3rd largest element.
@@PippyPappyPatterson here
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        num_to_count = collections.defaultdict(int)
        for num in nums:
            num_to_count[num] += 1

        min_heap = []
        for num in num_to_count:
            if len(min_heap) < k:
                heapq.heappush(min_heap, (num_to_count[num], num))
            else:
                heapq.heappushpop(min_heap, (num_to_count[num], num))

        res = []
        while min_heap:
            _, val = heapq.heappop(min_heap)
            res.append(val)
        return res
Amazing contents! The best algorithms channel that focus on logic and thinking in a clear way. Happy to have found this channel, been writing neetcode ever since.
The algorithm that you explained at 3:15 was counting sort and not bucket sort. What you did, however, towards the end was similar to (not the same as) bucket sort.
is it different?
@@namoan1216 no, he's wrong
I'd like to suggest a minor improvement:
for n in nums:
    count[n] = 1 + count.get(n, 0)
    if count.get(n, 0) > max_val:
        max_val = count.get(n, 0)
freq = [[] for i in range(max_val + 1)]
The max_val variable is used to track the maximum frequency instead of using the length of nums.
This can potentially save some space if the maximum frequency is significantly less than the length of nums.
Addition of max_val adds a small const time overhead
The one thing I don't like about heap-usage questions is that most of the time you have to default to some library to do it, cause I doubt any of us could code up a heap in a phone screen.
People say you are supposed to learn enough to be able to figure out leetcode problems, as opposed to memorizing leetcode. Are we seriously supposed to have figured this method out? This was so specific...
When you encounter a similar problem next time, you will think , well, i already saw it somewhere. I can do this.
@@edd4851 Really? How many leetcode problems are solved this way?
@@netanelkaye3014 1😅
its best not to think about it lol
@@netanelkaye3014 none of them
This is how I did it: I basically converted the list into a dict, sorted it by values, and then took out the k most frequent values. But I really appreciate your videos.

from collections import Counter

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        value = []
        my_dict = dict(Counter(nums))
        my_dict2 = dict(sorted(my_dict.items(), key=lambda item: item[1], reverse=True))
        mlis = list(my_dict2.keys())
        for i in range(k):
            value.append(mlis[i])
        return value
While counting, you can keep track of the max occurrences and then you only need to initialize freq to that max instead of len(nums)
Good one ...I spent a lot of time understanding this but finally got it..🤗🤗
Sure but it's still O(N).
thx
count = {}
maxFreq = 0  # or 1
for each in nums:
    count[each] = count.get(each, 0) + 1
    maxFreq = max(maxFreq, count[each])
freq = [set() for i in range(maxFreq + 1)]
I came up with this solution originally but really appreciated the thoughtful description of the linear solution!
result = defaultdict(int)
for num in nums:
    result[num] += 1
result = dict(sorted(result.items(), key=lambda item: item[1]))
return list(result.keys())[-k:]
One thing to note in Python. Please do not initialize like this (if you are):
bucket = [[]] * (len(nums) + 1)
this references to the same list and your output will be wrong.
Use the bucket = [[] for _ in range(len(nums)+ 1)] , as mentioned in the video.
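For context, the full bucket approach that this initialization feeds into might look like the following (my reconstruction of the solution being discussed; details may differ from the video's exact code):

```python
from collections import Counter
from typing import List

def top_k_buckets(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    bucket = [[] for _ in range(len(nums) + 1)]   # index = frequency
    for num, c in count.items():
        bucket[c].append(num)
    res = []
    for i in range(len(bucket) - 1, 0, -1):       # walk from highest frequency down
        for num in bucket[i]:
            res.append(num)
            if len(res) == k:
                return res
```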
Your video got me a job as an SDE at AWS!!
Congratulations 🎉🎉
Hey John, which neetcode 150 questions did they ask? I have a phone interview coming up
We can optimize it more by storing the maxFrequency while creating the HashMap (which has the integer and their corresponding frequency). Then, the next iteration to get the required elements can start from this maxFrequency instead of N.
I am a bit confused: at 12:25, lines 12-13, wouldn't it become an N^2 solution if all the elements are distinct?
yeah, the example he showed is actually the worst case and it may be O(n²) but we can't say n² cause we don't find n elements at every index, this is an exceptional and not like general nested loop
@@ParryHotter73 ok got it. thanks buddy
Also, you can do res += freq[i] on line 13. The problem description says the solution is unique, so as we add elements, res will either reach exactly k or fewer — no need to run an inner loop.
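A sketch of the full bucket solution with that tweak (the function wrapper and names are mine, not the video's exact code):

```python
def top_k_frequent(nums, k):
    count = {}
    for n in nums:
        count[n] = count.get(n, 0) + 1
    # bucket index = frequency, bucket contents = numbers with that frequency
    freq = [[] for _ in range(len(nums) + 1)]
    for n, c in count.items():
        freq[c].append(n)
    res = []
    for i in range(len(freq) - 1, 0, -1):
        res += freq[i]          # bulk-add instead of an explicit inner loop
        if len(res) >= k:       # with a unique answer this lands exactly on k
            return res
```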
For the return function, an alternate way is to use the extend() method in Python:
    res = []
    ptr = len(frequency) - 1
    while len(res) < k:
        res.extend(frequency[ptr])
        ptr -= 1
    return res
Result size will be wrong if len(frequency[-1]) > k I think
@@kestiv2429 The question guarantees that there is a unique solution, hence every time we extend the result array, at some point it will (has to) be equal to k. If it were not guaranteed, then you would be right.
You can also use count = Counter(nums). That does the counting by itself — no need for a loop. Another day thanking Guido van Rossum for making my life easier.
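For reference, a quick illustration of what Counter gives you out of the box:

```python
from collections import Counter

count = Counter([1, 1, 1, 2, 2, 3])
print(count)                 # frequency map, highest counts first
print(count.most_common(2))  # the k most frequent as (num, freq) pairs
```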
Your explanation is like art! Thank you!
Thanks for the video! I came up with the same solution except I assumed each element is "repeated unique number of times" from the problem statement - "It is guaranteed that the answer is unique.". So instead of looping over each lists, I just considered the first element.
is it just me or after struggling on a particular part for a while.... then it hits!!!!! best feeling ever. NeetCode, I am following along every problem and have the confidence that I'll get my dream tech position. thank you, its the same feeling I had when I found out about khan academy in high school
we're both gonna get the dream job homie!
I think the runtime of using a heap is O(n log k): we need O(n) to construct the count map, and removing an item costs O(log k)?
Yes, this is even what the leetcode official answers says
Isn't it O(log n) to remove from a heap with n elements? And we do that k times, so that makes O(k log n).
@@michaelchen9275 If we restrict the heap to be of size k (since we only care about k most frequent), at worst case we'll end up popping n elements. i.e. O(n log k)
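A sketch of the size-k min-heap approach being described in this thread (function name is mine, not the video's code):

```python
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    heap = []                           # min-heap of (freq, num), capped at k
    for num, c in count.items():
        heapq.heappush(heap, (c, num))
        if len(heap) > k:
            heapq.heappop(heap)         # evict the current least frequent
    return [num for _, num in heap]     # O(n log k) overall
```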
This is perfect, thank you bro, took me a while to understand this problem, gonna need to redo it without looking at the solution in a couple days.
Thank you, liked and commented again to support
I got a similar question in my onsite interview with Amazon (not the same question, but the same concept). I did not know bucket sort, so I used the sorting method. The interviewer said there was a way to get linear time complexity, and I did not know what to do.
Holy this is so much clearer than the quick select one.... Thank you so much
But not constant space complexity
I didn't see anybody come up with this solution in the discussion on LeetCode; all of the solutions used a heap, or maybe some used quickselect.
So I was afraid that I had analyzed my algorithm wrong, but after watching you I know that I was right about my solution.
In fact, the top solution in the discuss section uses bucket sort itself.
Awesome video! One small thing, shouldn't it be O(n log k) instead of O(k log n) for the heap solution since there's n elements, heap of size k means log k time to heapify and n calls so n * log k?
Yes, it isn’t O(n) like he said
The final array that you create may have a lot of holes. Say there are 1000 elements, but consisting of only 1's and 2's, then the list will be full of holes. It can be further optimized by keeping a 'max_freq' variable which will provide an upper bound on the size of the array. This max_freq can be updated while creating the hash-map.
or may be this may not be the most efficient way to do this for all kinds of inputs like the one you said and unsorted numbers. It doesn't give the top frequent one's. Dictionary with maxheap is the one that can handle all possible inputs.
I'm thinking you can just invert the frequency map to {frequency: list of values with that frequency} and then sort the keys of that inverted map. This sort would be O(sqrt(n) log(sqrt(n))) (which is < O(n)), because there cannot be more than O(sqrt(n)) distinct frequencies: if every value had a different frequency, then n >= c(c+1)/2, with c being the number of distinct frequencies.
Then it's just a matter of iterating over the reverse sorted keys and adding values to a resultArray until that array reaches length k
    lookupDict = defaultdict(int)
    for n in nums:
        lookupDict[n] += 1

    inverseDict = defaultdict(list)
    for key, v in lookupDict.items():
        inverseDict[v].append(key)

    sortedKeys = sorted(inverseDict.keys(), reverse=True)
    sortedKeysIndex = 0
    res = []
    while k > 0:
        values = inverseDict[sortedKeys[sortedKeysIndex]]
        if len(values) > k:
            res.extend(values[:k])
        else:
            res.extend(values)
        k -= len(values)
        sortedKeysIndex += 1
    return res
really love the approach taken, thank you
So basically the map of frequency to values could actually be of size O(n·n)? The first n is because the worst-case frequency ranges from 1 to n, and the second n is because at most n numbers can share a frequency of 1. I kinda understand how it's not really O(n·n) — and a bit less than that, since the duplicates are consumed into a single key — but it's still more than O(n), right? Another observation: the only frequencies that have a value attached are subsets that add up to N. In the given example, 100 occurs once, 2 occurs twice, and 1 occurs thrice, so the sum of frequencies is 1+2+3 = 6, which is N in this case. So the worst-case space complexity is actually O(n · length of the longest subset of frequencies that sum to N). Don't really know how I ended up with a Subset Sum problem just because I couldn't justify the O(n) space complexity lol. Any clarifications are appreciated :D
How come the solution code doesn't work when we instantiate our freq array like this:
freq = [[]] * (len(nums) + 1)
My current status: Understands the Questions, Have an Idea of how to start (not how to finish), Watches just your drawing explanation, Realize what is missing, Writes the whole solution.
that's great man. Keep going
keep grinding. You will get there
I got my first job after following your neetcode 150, 2 years ago. now after the layoff i am here again learning the dsa.
What I don't quite understand is that when I implemented this solution, the speed and memory were not as efficient compared to my initial solution:
```
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = 1 + count.get(num, 0)
        res = sorted(count, key=count.get, reverse=True)
        return res[:k]
```
Even though using sorted() here makes it O(n log n)... Can someone explain why this one appears to be much quicker than the solution in the video?
sorted() uses Timsort (not quicksort), which has O(n log n) time complexity; its highly optimized C implementation often beats a hand-written O(n) Python loop in wall-clock time for LeetCode-sized inputs.
I love how your videos are always so damn clear :) ! Thanks alot
Btw, here is a weird(?) quickselect version that beats 93%:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = list(Counter(nums).items())
        ind = len(count) - k  # the k most frequent end up at count[ind:]

        def quick(l, r):
            pivot, p = count[r][1], l
            for i in range(l, r):
                if pivot >= count[i][1]:
                    count[i], count[p] = count[p], count[i]
                    p += 1
            count[r], count[p] = count[p], count[r]
            if p > ind:
                quick(l, p - 1)
            elif p < ind:
                quick(p + 1, r)

        quick(0, len(count) - 1)
        return [num for num, _ in count[ind:]]
Nice!
Mine beats 98.62 time and 90 on space and is two lines long :)
@@heathergray4880 would you share your code for learning please?
Question: let's assume we have the array [1,2,2,3,3,4,4,4] and k = 2. The most frequent element is 4, but what about the 2nd most frequent? The count of both 2 and 3 is 2, so how do we decide which value to insert into our resulting list?
Please help.
PS: "The test cases are generated such that the answer is always unique." — is this statement accounting for that edge case?
That was a very interesting way to solve a specific real-life problem: counting and working with a hash map and array to identify the k most frequent elements. This can be useful for small to big businesses handling inventory, or when you need to pack up your stuff to travel.
Amazing content as usual!! But I still don't understand, why is it O(n) even if there are 2 nested for loops? Thanks for creating these helpful videos!
The inner "for" loop on line 13 generally won't be iterating through an O(n) list, but instead just the number of input numbers that occurred a particular number of times.
I think, in worst case, it will be O(n^2). Let's consider the case, nums = [1,2,3,4,5,6] and k =6, then on line 13 the inner loop freq[1] = [1,2,3,4,5,6] will iterate "n" times. But in Avg. case it is O(n).
@@sravankumar4195 Nope, it's O(n) only.
@@sravankumar4195 I believe you are correct, in the worst case this would be O(n^2) for the situation you described. You have to iterate over the whole frequency list(size n) and then when you get to the solution set(in your case index 1) you have to iterate over n elements once more. I had the same thought
@@Logan-mj3wx What about, if instead of a for loop, you simply did res += freq[i]. Is that operation faster O(1)?
Nice algorithm, but it looks like it will fail when the nums list is [-1, -1], which makes c equal to -1, throwing an index-out-of-range exception.
I used a dictionary and a pair of while loops in Python. Basically, while the nums list has elements, I .pop the 0th number off the list and check whether it's already a key in the dict; if it is, add one to its value, otherwise add it as a key with a value of one. When the nums list is exhausted, another while loop runs while k > 0: use Python's max function to get the key with the highest value, add that key to the results list, set that key's value to 0, and subtract 1 from k. I'm a novice though, so this may be suboptimal?
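If I follow the description, a sketch of that approach might look like this (names are mine). It works, but each max() call scans the whole dict, so the second loop is roughly O(n·k):

```python
def top_k_frequent(nums, k):
    counts = {}
    while len(nums) > 0:
        n = nums.pop(0)                     # note: pop(0) is itself O(n) per call
        counts[n] = counts.get(n, 0) + 1
    res = []
    while k > 0:
        top = max(counts, key=counts.get)   # key with the highest count
        res.append(top)
        counts[top] = 0                     # "use up" that key
        k -= 1
    return res
```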
Got this question for google, what to do then if input is streaming (like a log)? guess we keep updating the count (histogram), and rebuild the freq array everytime?
A simpler way:
    from collections import Counter

    def dx(nums: list, k: int):
        x = Counter(nums).most_common()[:k]
        ls = []
        for i, j in x:
            ls.append(i)
        return ls
We can do a little space optimization by sizing the bucket to max(counts) + 1 instead of nums.length + 1.
Good point!
I did the solution with a priority queue and a hashmap, and it seemed to have better time and space efficiency than bucket sort. I feel like this is tricky because when we design solutions for problems, we start by analyzing which data structures to use, their time complexity, etc., based on how those data structures are typically implemented. The thing is, algorithms and built-in functions in languages have improved so drastically that they take less time than they theoretically should.
It's tricky, but is that something we should consider as well?
Heap solution (faster than 95% of solutions on leetcode):
    dic = {}
    for i in nums:
        if i not in dic.keys():
            dic[i] = 1
        else:
            dic[i] += 1
    heaped = []
    for key in dic:
        heaped.append([-dic[key], key])
    heapq.heapify(heaped)
    res = []
    for i in range(k):
        freq, x = heapq.heappop(heaped)
        res.append(x)
    return res
I think your solution is O(n²), because in the last part you performed n operations inside n:
    for i in range(len(freq) - 1, 0, -1):  # O(n)
        for n in freq[i]:                  # O(n)
For the first heap solution, you can get a total complexity of O(n log k). Instead of heapifying the whole dictionary, heappush as you go through it, and heappop whenever len(heap) > k.
n log k is generally bigger than k log n. For example, with k = 1200 and n = 1500 (base-10 logs):
n·log k ≈ 1500 × 3.079 ≈ 4618.8
k·log n ≈ 1200 × 3.176 ≈ 3811.3
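The arithmetic above (base-10 logs, which is fine for comparing the two sides) can be checked directly:

```python
import math

n, k = 1500, 1200
n_log_k = n * math.log10(k)   # roughly 4618.8
k_log_n = k * math.log10(n)   # roughly 3811.3
print(n_log_k, k_log_n)
```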
I was wondering where you were going with the bucket. This is so clever !!! Brilliant !!
nick white, kevin, neetcode best guys to get the perfect explanation...
In this case, a heap sort might be more optimal. Even though your code is O(n), it's making a ton of allocations, which may make it practically slower in some cases.
11:42 you could also use `for i in reversed(freq):` and modify the code as needed
Actually there is a small improvement on this solution, given the following observation: the length of the hashmap is the number of distinct elements in the array; everything else is a repetition of some of those elements. So if we suppose that all elements appear only once except one element, which repeats enough to fill the array, we can calculate the largest achievable frequency as follows:
    maxFreq = len(arr) - len(hashmap) + 1
That means the frequency array only needs maxFreq + 1 slots instead of len(nums) + 1:
    freq = [[] for _ in range(maxFreq + 1)]
That was my 50 cents contribution ;)
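A sketch of the bucket solution with this bound applied (the function wrapper is mine, not the video's code):

```python
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    # one element can repeat at most len(nums) - len(count) + 1 times
    max_freq = len(nums) - len(count) + 1
    freq = [[] for _ in range(max_freq + 1)]
    for num, c in count.items():
        freq[c].append(num)
    res = []
    for i in range(max_freq, 0, -1):   # walk buckets from most to least frequent
        for num in freq[i]:
            res.append(num)
            if len(res) == k:
                return res
```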
The mistake I made: I watched other Python tutorials.
The good thing that happened: I am watching your videos.
You're a genius and you explain things well, but I have to watch every video twice to understand it correctly.
Thank you
First of all thank you very much @neetCode for this amazing explanation. Anyone has suggestions or improvements on the below solution??
class Solution {
    topKFrequent(nums, k) {
        const countObj = {};
        const resArr = [];
        let ctr = nums.length;
        for (const i of nums) {
            countObj[i] = (countObj[i] || 0) + 1;
        }
        while (ctr > 0) {
            for (const key in countObj) {
                if (countObj[key] == ctr && resArr.length < k) {
                    resArr.push(Number(key)); // object keys are strings, so convert back
                }
            }
            if (resArr.length == k) {
                break;
            }
            ctr--;
        }
        return resArr;
    }
}
There is a mistake at the end: the heap solution is O(n log k), not O(k log n)! Otherwise this is a very good explanation, thanks!
Firstly, thank you for being so amazing with your videos!
Actually I was wondering: in the for loop on line 12, since inside that loop we keep checking the length of res, doesn't that increase the time complexity beyond linear?
Nooo, actually the inner loop runs only n times in total, so that's n (outer loop) + n (inner loop) = 2n, which is O(n).
i had actually thought of the 2nd array implementation you said with N array size.. but i didnt think on how I would extract the top K as you did by going backwards! you're a genius!
For the heap, the time complexity should be O(n log k), since the max size of the heap can be capped at k and n is the number of elements.
Nope, if you are using a max-heap it will be O(k log n), because for that you need to heapify the whole thing first, so all n elements are in the heap.
Then you pop k times from a heap of size n, so O(k log n).
That's valid if you use a max-heap!
If you are using a min-heap, then you would be correct: the heap would have a max size of k as you said, and you would loop n times, pushing and then popping once the size reaches k.
My man! I've spent HOURS watching you. FYI you can do the counter with collections, and save few lines of implementation
How is that?? Can you pls share
@@yossarian2909 from collections import Counter, then Counter(nums) gives the frequency dict
@@hamzasayyid8152 continue the code
I was solving 692, and I got AC, I remembered this video came back to this. Thanks for such detailed videos.
Man I could not do it myself because I didn't understand problem clearly, after 50 seconds of the video I understand and did it with ease, thanks a lot. Now I am going to watch rest of the video to learn optimal solution)
Watching this after attempting on NeetCode is soooo good!
Man you light up my leetcode
Got the same question in one of my phone screens. I came up with the map-and-heap approach, the interviewer asked for a memory optimization, and I could not think of any other solution. It cost me the next round, unfortunately. If only I had seen your solution! :-(
Again! Great explanation and "neat" solution!!
This solution is also not memory optimised. Using "min" heap instead of max heap with size k would be memory optimised.
@@harigovind11 Wouldn't it still be O(N) since you still need to add them to a map to store the frequencies?
@@harigovind11 "Using "min" heap instead of max heap with size k would be memory optimised. " can you explain how ?
You don't need bucket sort for linear time. You can just store the number, freq. pairs and run quickselect.
If we were to use the max heap approach, how would we apply heapfiy if heapify only takes in a list and not a dictionary? Thanks for the amazing video!
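One way (a sketch, not the video's code): heapq only operates on lists, so convert the dict into a list of (-count, num) pairs first, then heapify that:

```python
import heapq
from collections import Counter

def top_k_frequent(nums, k):
    count = Counter(nums)
    heap = [(-c, num) for num, c in count.items()]     # negate counts for a max-heap
    heapq.heapify(heap)                                # O(n) to build
    return [heapq.heappop(heap)[1] for _ in range(k)]  # k pops, O(k log n)
```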
Does this become linear time because we iterate the list of nums (n), then the map of counts (n), and eventually the frequency list (n), which ends up as O(n + n + n) = O(3n) = O(n)?
Why would we initialise freq with (length of array + 1) empty lists instead of just len(array)
It should be `for i in range(len(freq)-1, -1, -1):` instead of `for i in range(len(nums)-1, 0, -1):` right?
You don't really need to loop on line 13 — just add freq[i] wholesale and return once the length of res is equal to or greater than k.
man this was the video , great great explanation
The simplest method is:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        cache = {}
        for c in nums:
            if c not in cache:
                cache[c] = 1
            else:
                cache[c] = cache[c] + 1
        cache = dict(sorted(cache.items(), key=lambda i: i[1], reverse=True))
        return list(cache.keys())[:k]
O(n log n)
I believe the length of the frequency/bucket array could be lower than len(nums) + 1 because of the rule that the # of distinct elements >= k. This tells us that there will be at least k distinct elems and since each one must have at least one frequency, no single number could occupy all len(nums) spots (unless k is 1). Therefore I think it could be further optimized (albeit minimally lol) to:
freq = [ [ ] for i in range( (len(nums) + 1) - (k - 1) ) ]
Could you explain how the for loop works within the array, I didn't understand that part?
freq = [[] for i in range(len(nums) + 1)]
how does it work for negative numbers ? For eg : arr [ -1, -1] and k = 1
Is log(n) a better solution than n log(n) and log(n)? Can you create a video on how to find big O and which one is better than which? I know there are many videos on YouTube, but yours will be the best.
    class Solution:
        def topKFrequent(self, nums: List[int], k: int) -> List[int]:
            _map = {}
            res = []
            for num in nums:
                _map[num] = 1 + _map.get(num, 0)
            heap = [(val, key) for key, val in _map.items()]
            for val, key in sorted(heap, reverse=True)[0:k]:
                res.append(key)
            return res
Also, in the first approach it's mentioned that using a max-heap takes O(k log n) total, which seems incorrect: to add all items to the max-heap we traverse all n items, and each add takes O(log n), so it becomes O(n log n).
If we use a min-heap holding at most k items, it gives O(n log k).
Time complexity for building heap is actually O(n) and not O(n log n). O(n log n) is not the tightest bound.
Without knowing bucket sort, I was only able to come up with the hashmap-sorting solution. Thanks, mate, for the O(n) solution — great explanation.
I guess this is what makes the question medium? I just did a quick hash count of numbers and then Object.entries()'d it out and sorted by second index, then pulled the first k values out of that. So O(n log m). I wonder if this is weak in an interview. Offering bucket sort solution (or sometimes other niche solutions) just feels like I musta done the same question or a nearly identical question to pull it quickly out of an interview though... I'm surprised Neet code starts with two mediums whose solution feel a bit hyper specific. Is it intentional?
Is this solution also works when theres no limit on the values of the initial array? Or it assumes that the values in the array are bounded?
Hey! Could I receive a bit of clarification please? Why is your approach at 3:10 not O(n)? I thought it would be as would first need to find the max value in the given array (nums), then create our bucket array with that number as the upper limit? But since max( ) takes O(n) time - I'm quite confused. Thanks in advance!
fantastic explanation! the only issue I think is the first for loop should be for i in range(len(freq) - 1, -1, -1)
Not needed, because that last slot (index 0) is for values that appear 0 times in the array, so it'd be an empty list either way.
Trust me bro, you are amazing explaining things. Thanks a lot for such content. Pls keep posting..
Using the two loops at the end actually increases the time complexity.
This won't work for negative values. The question you are referring to from Leetcode also has negative values as inputs in the array
Can we not use a heap/priority queue instead of using an arrya? Wouldn't that automatically keep the most frequent elements at the top if the sorting was done according to the count?
Made it simple but efficient as always!
I used a dict to maintain the num->occurance pair for all items. Barely got the AC but the sorting part threw me off...had to use a lambda function to sort the dict by value.
The optimal solution is a lot more subtle.
Great content, very informative and well explained.
i did the exact same thing lmao
The explanation was so amazing that I understood how to solve half way through the video!
Note that for the last loop using reversed(freq) is faster and more readable
Hey everyone, I have one question.
Why is this solution considered O(n), where n is the size of the nums array? Why are we not looking at the worst-case complexity for lines 11-16 at 12:52?
If all items are unique:
Line 12: i will iterate over the range (0, n) // size of freq is n+1
Line 13: if all items inside nums are unique, every item will be present at index 1
This will lead to looping through the entire length of the array, n
=> this gives us O(n^2)
Or alternatively, sort the count table by values and then return the first k elements:
    # create hash_table where each num in nums is a key
    # increment the value for each key repetition
    table = dict()
    for n in nums:
        if n in table:
            table[n] += 1
        else:
            table[n] = 1

    # sort by values
    table = sorted(table.items(), key=lambda item: item[1], reverse=True)

    # store the answers
    answer = list()
    for item in table[:k]:
        answer.append(item[0])
    return answer
ur solution is much faster for me, thank u
Best explanation this helped me in solving Top K Frequent Elements and Sort Characters By Frequency
Thank you man! Blessed to have you!
The solution and thought process is genius! Can't come up with this optimal solution by myself, thanks a lot.
I don't understand why it's considered bucket sort. I get that each index acts as a bucket for the number of occurrences, but the actual sorting part never takes place as far as I can tell.