This is with O(m*n*logn) solution:

def groupAnagrams(strs: list[str]):
    sorted_strs = [''.join(sorted(s)) for s in strs]
    res = {}
    for i, s in enumerate(sorted_strs):
        if s in res:
            res[s].append(strs[i])
        else:
            res[s] = [strs[i]]
    return list(res.values())
I first solved this problem with binary search by answer: since we know how to check if two strings are anagrams, we can use an is_anagram(a: str, b: str) -> bool function in our binary search over the input array. But the time complexity is bad for this solution.. :)
It is given that n can be at most 100, and log(100) is 2, making the O(n·m·log(n)) solution faster than O(n·m·26). Let me know if I am thinking correctly.
Well, technically speaking, when we use log here it's log base 2. Your value of 2 only works if you're using the default log base of 10. Either way, we get 6 and change when using log base 2, which is still smaller than 26. The counting solution of 26 is ONLY better if we're using strings roughly larger than 2^26, about 67,000,000 characters.
log_10(100) = 2 (log base 10 of 100 equals 2)
log_2(100) = 6.644 (log base 2 of 100 equals 6.644)
There's no need to subtract 'a''s ASCII value from each character. Just use each character's ASCII code directly as the index into the count array, so change it to *count = [0] * 256.*
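A minimal sketch of that 256-slot variant, in case it helps; the function name is mine and it assumes the usual lowercase-ASCII input (the unused slots simply stay zero):

from collections import defaultdict

def group_anagrams_ascii(strs):
    # Index the count array by each character's raw ASCII code,
    # so there is no need to subtract ord('a'); the key is just longer.
    res = defaultdict(list)
    for s in strs:
        count = [0] * 256
        for c in s:
            count[ord(c)] += 1
        res[tuple(count)].append(s)
    return list(res.values())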
I don't think I could ever have come up with the NC solution during an interview. Best I could do was using the hash() function:

def groupAnagrams(strs: List[str]) -> List[List[str]]:
    hash_index_map = collections.defaultdict(list)
    for i, word in enumerate(strs):
        hash_value = 0
        for char in word:
            hash_value += hash(str(ord(char)))
        hash_index_map[hash_value].append(strs[i])
    return hash_index_map.values()

This passes all the LC checks, though of course it isn't foolproof due to the possibility of distinct words adding up to the same hash value, so I doubt it would fly during an interview. IRL I'd probably just do:

def group_anagrams(strings):
    res = defaultdict(list)
    for s in strings:
        res[tuple(sorted(s))].append(s)
    return res.values()

And call it a day.
You can sort a string s in O(|s|) time and O(1) extra space, so I went with the first solution:

def groupAnagrams(self, strs):
    """
    :type strs: List[str]
    :rtype: List[List[str]]
    """
    classDict = {}
    for string in strs:
        # canonicalize
        letters = list(string)
        letters.sort()
        key = ",".join(letters)
        if key not in classDict:
            classDict[key] = []
        classDict[key].append(string)
    return list(classDict.values())
[0] * 26: I guess this vector representing the counts of a..z can be directly converted into a sorted string (aabbbcccc => {0: 2, 1: 3, 2: 4}), so it's basically exactly the same == should be the same n log n if written in the same language. If it's faster, then we've discovered a sort algorithm faster than n log n >.< Which is not likely to be the case.
I kinda hate the optimal solution. I was like _this_ close to coming up with this answer, except dictionaries aren't hashable, and I couldn't figure out how to compare letters on the spot other than spending an hour making a letters dictionary ({a:0, b:1} etc). I feel like most people wouldn't come up with ascii on the spot in an interview. This question relies entirely on already knowing the "trick" so to speak
Isn't the overall time complexity for the sorting approach even longer than M*NlogN, because you also have to do a string compare, which means checking N characters against N characters? Or is that factored into the M portion of the time?
If by string compare you mean converting the string to a hash map key, this is not true: the keys are hashed, so there's a hash number associated with each string and the hash map doesn't compare strings.
One thing you mentioned, which created a lot of confusion in the comments: it's not O(m * n * 26), it is O(m * (n + 26)). You don't calculate the key for each character in a string, you calculate it once per string. So the sorting algorithm is better only for strings with an average length lower than around 11.
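A quick back-of-the-envelope check of that crossover claim, assuming log base 2 and ignoring constant factors (rough per-string operation counts, not a benchmark):

import math

# Compare per-string costs: comparison sort ~ n*log2(n) vs. count-array key ~ n + 26.
for n in (8, 11, 12, 16, 100):
    print(n, round(n * math.log2(n), 1), n + 26)
# The two costs cross near n = 11: 11*log2(11) is about 38.1 vs 11 + 26 = 37.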
here is a better solution:

from typing import List

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagrams = {}
        for s in strs:
            key = "".join(sorted(s))
            if key in anagrams:
                anagrams[key].append(s)
            else:
                anagrams[key] = [s]
        return list(anagrams.values())
My solution in typescript:

function groupAnagrams(strs: string[]): string[][] {
    const sortedMap = new Map<string, string[]>();
    for (let i = 0; i < strs.length; i++) {
        // Sort each item
        const key = strs[i].split("").sort().join("");
        // Add items as keys in the map. If one exists already, concat it so it is in the right form [..., ...]
        sortedMap.set(key, (sortedMap.get(key) || []).concat(strs[i]));
    }
    return Array.from(sortedMap.values());
};
What is the space complexity of this solution? Will it be constant because we're using a single array of count 26 and we can reuse that array for every word? Or is it O(m*n) where m is the number of strings and n is the average length of the string?
If every word is unique, you need to store a count for that word in the dict, so it would be O(26 * n), or O(n), where n is the number of strings. In this case the average length of the words does not matter, since we compress that info down to 26 ints. Note that if we clone the words it's O(n * m), though if you had very long words you would likely store pointers to the words rather than duplicate them.
Not sure how keeping the array as a key works. I just did an ASCII sum of each character in the string: there can be at most 100 characters per string per the problem constraint, and lowercase z is 122, so 122 * 100 = 12,200 is within the range of an int. So my map was map<int, vector<string>>. Still O(m*n) in time and O(n) in space in the worst case.
I've found an even faster solution that doesn't require ascii math:

def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    dict = defaultdict(list)
    for str in strs:
        sortedstr = ''.join(sorted(str))
        if sortedstr in dict:
            dict[sortedstr].append(str)
        else:
            dict[sortedstr] = [str]
    return dict.values()

it's quite similar but essentially it sorts each str in strs. Then, if sortedstr is in dict, append to dict[sortedstr], otherwise append [str] to dict. return dict.values(). hope this helps someone
Your code is great and it helped me. Still, I modified it a bit: you can remove the if condition, because defaultdict already handles the case your if/else covers.

def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    res = defaultdict(list)
    for i in strs:
        sortstr = "".join(sorted(i))
        res[sortstr].append(i)
    return res.values()
Hey Navdeep, just a small suggestion. In fact this could be a feature I think would be useful on neetcode: instead of giving an unsolved problem when you click shuffle, can you add a feature called "revise" which gives us a random solved problem so that we can re-do it?
At first I thought about putting each word in a list of sets, as a set, and comparing them. I'm still learning about sets and hashmaps etc, so if anyone could tell me whether converting each word into a set would even work, I'd appreciate it.
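A quick check of why a plain set can be lossy as a key, since anagrams have to match letter counts, not just letters (hypothetical example):

from collections import Counter

# A set only records which letters occur, not how many times,
# so words with the same letters but different counts collide:
print(set("aab") == set("ab"))  # True -> "aab" and "ab" would be grouped together
# Keeping the counts (e.g. a frozenset of (letter, count) pairs) removes that ambiguity:
print(frozenset(Counter("aab").items()) == frozenset(Counter("ab").items()))  # False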
Rate my solution. I'm a beginner.

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        hashMap = {}
        for word in strs:
            key = str(sorted(word))
            if key not in hashMap.keys():
                hashMap[key] = [word]
            else:
                hashMap[key] += [word]
        return list(hashMap.values())
I originally tried to solve this by creating a hash using the char codes of each char in a string and using the hash as the key to group the values. I had a ton of collisions as my hashing function was crappy. Was wondering if you'd revisit this and solve with hashing?
Interestingly, the sorting solution beats the non-sorting ones for some reason. I also used the non-sorting solution and was wondering why my solution wasn't better than the ones that used sorting.
A constant multiple like 26 does not contribute to big O complexity, but multiplying by log(n) does. Assume large test cases when thinking about time complexity.
@@sp33r I think you underestimate how small log(n) is. 2^26 = 67 million. I feel confident that the cases wouldn't be that large, and that's just for breaking even.
Sorting in the first method contributes a log(n) factor, giving a run time of O(m * n * logn). The second solution gives a run time of O(m * n * 26), meaning the first solution is faster in cases where logn < 26. We know from the constraints that 0 <= n <= 100, and for the counting solution to win we'd need logn > 26, i.e. n > 67,108,864. Even in the real world, I find it hard to imagine a string larger than 67 million characters.
Can you explain strs: List[str] in the parameters? I decided to divorce Java too, but I don't recognize this kind of type annotation and couldn't find it by searching the internet either.
I have a question about the O(m * n * log(n)) solutions. I also came up with that, but one has to return the unsorted strings. So I am assuming that you also made a deep copy of strs so as not to fiddle with the order of characters?
Could we just use Counter on the strings, or does the ordering of how the hash map comes out for each string make that unusable here?
I really like NeetCode's solution, but it could be hard to come up with it on the spot. So I've come up with the solution below: sort each str and use it as the key of the defaultdict.

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        res = defaultdict(list)
        for string in strs:
            sortedString = ''.join(sorted(string))
            res[sortedString].append(string)
        return res.values()
This is pretty much what I did. It's better to at least have a working idea implemented first, whether it is brute force or not, that you can then improve on. I don't think this solution is something you can think of off the top of your head in an interview, like you said. Actually, according to the constraints of the problem, there is a limit on the max string size, which is 100. So even if the first solution is O(m n log(n)) and the second one is O(26*m*n), log(100) is about 6.6, so the former is actually faster in this scenario since there is a bound on string size. If you were to remove the bound, then you'd have to compare at what point log(n) outgrows 26, which would be when n = 2^26, or the string size is 67 million characters long. This sort of deep dive is probably better to talk about with your interviewer and discuss the logistics of your solution, since in the real world, storing strings 67+ million characters long within an array seems unrealistic IMO.
That vector is basically the same as the sorted string; they are identical, but instead of aabbcc it will be {a: 2, b: 2, c: 2}. Should be the same n log n; if not, then we could sort stuff this way and have better-than-native sort performance! "Dictsort" could be the name, I guess (probably it would be something like bubble sort, but bubble is slow ;) ).
Amazing solution! I was here thinking after 30 minutes of brainstorming, "how's he going to approach this one". 😆
p.s. If you don't want to use the imported package:

def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    store = {}
    for word in strs:
        count = [0] * 26
        for char in word:
            count[ord(char) - ord('a')] += 1
        key = tuple(count)
        if key in store:
            store[key].append(word)
        else:
            store[key] = [word]
    return list(store.values())
For those who don't know what defaultdict(list) is: it means "defining a dictionary with values that default to a list". Also, it will give you an error if you forget to add this to your code:
from collections import defaultdict
I was wondering what the "defaultdict(list)" did, thanks!
Thanks so much for the insight!
Also, we need to define the hashmap as res= collections.defaultdict(list)
@@ikthedarchowdhury8947 I think it was clear that importing defaultdict from collections did the job, no need for collections.defaultdict
Thank you!
If you don't want to use defaultdict() from collections, you may do this:
def groupAnagrams(strs):
res = {}
for s in strs:
count = [0] * 26 # a .... z
for c in s:
count[ord(c) - ord("a")] += 1
key = tuple(count)
if key in res:
res[key].append(s)
else:
res[key] = [s]
return list(res.values())
Also, you could use:
res[key] = res.get(key, []) + [s]
Instead of if/else
@@Mrorange016 That's exactly what I was thinking. I am glad someone thought of this too lol
I went with a O(m*n*log(n)) solution. This one seems super advanced to come up with on the spot, but I would have never thought of sorting words and key them in that way. Both solutions are super smart.
Same here, I got the O(m*n*log(n)) solution first fairly easily, then failed the tests which require better time complexity. From when I was last grinding interview prep about 6 years ago, I think I would've gotten the better solution, but right now I feel the same: no chance of thinking of it on the spot in an interview. However, past experience tells me that it'll come back with practice. 10 months on from your comment, how are you feeling?
@@falcomomo I get 6ms runtime for O(mnlogn) Java solution, but 22ms for O(mn). idk how
@@gradientO Probably because there is a lot of overhead in the solution above: initialising the dictionary, converting lists to tuples, etc.
Also, the m*n*logn solution works fine here. If you check LeetCode, the string length is pretty small; it is at most 100.
@@falcomomo can you tell me your strategy, how you did it in m*nlogn?
Reason why O(M * N * logN) is better in this case:
It's wild, but in this case O(M * N * logN) is better than O(M * N * 26), because for logN to reach 26 you'd need N to be 2^26 (about 67 million), which is practically impossible here.
That's why I consider logN to be constant because it literally is (in the programming context).
Just did this problem with O(m . nlog(n)) yay, just something to note:
- The time complexity between sorting and using a map for character frequency is very similar (if not equal) depending on the average length of the strings. The problem in LeetCode specifies that the string lengths vary from 0 to 100, making it so the sorting time isn't as impactful on O. (Tested multiple times both solutions runtime as well)
- The algorithm will take more space with the frequency-map solution than with the sorting solution if the strings do not have many characters (depending on the sorting algorithm you use as well), as it takes O(n) space for the algorithm to sort the string. This is because the strings may be far shorter than 26 characters, which is the size of the frequency map, so it's a thing to take into consideration for sure.
So, in other words, if you see that the string length is specified as having the possibility of being very large then the frequency map is the way to go, otherwise the sorting algorithm is the best solution.
Pretty sure you won't read this Neet but I just had a meeting with you and Pooja Dutt, both of you are awesome.
I was so confused by how the O(m.n) solution was behind the O(m.n.logn) solution in time complexity on leetcode. Your explanation helped me understand it now with the smaller size of each individual string.
For anyone wanting to write exact same solution in Javascript, here it is:
var groupAnagrams = function (strs) {
let res = {}; //mapping charCount to list of anagrams i.e "1e,1a,1t": ["eat", "tea", "ate"]
for(let s of strs) {
//To count each char:
let count = new Array(26).fill(0); //for a....z
//We'll index a to 0 till z to idx 25
for(let c of s) {
count[c.charCodeAt(0) - "a".charCodeAt(0)] += 1;
}
let commaSeparatedCount = count.join(",");
if(commaSeparatedCount in res) {
res[commaSeparatedCount].push(s)
} else {
res[commaSeparatedCount] = [s];
}
//console.log("res: ", res);
//console.log("Object.values(res): ", Object.values(res))
}
return Object.values(res);
}
Basically, I don't know Python, so I typed out line by line Neet's python solution to understand the output of each of the lines to reproduce this solution in Javascript.
Thank you Neetcode!
do we know space complexity?
@@beyondlimits8159 It should be O(26) because in worst case we can have combination against every letter.
@@sahildhawan22 But in worst case, we have unique words. And since our object maps our array of unique letters to a unique word, shouldnt the space be O(n), where n is length of our input array?
the join is so smart!
thank you.
Always love the direct and no nonsense explanations! Great job!
Thanks! Much appreciated
@@NeetCode Why do you avoid using Counter()?
Wow! Very different from my solution. In my head I went about it by grabbing the word at strs[0], checking whether every char in that word exists in any of the other words, and if so appending all matching words to a group. After all possible words have been found we remove them from the strs list, so the element at strs[0] is now a new word that is not an anagram of the original strs[0].
is the complexity o(n)^2 ?
@@nicholas_as i think that'd be O(n^2 * k) where k is the length of the largest string
Ooh, that makes sense. So did you find that checking the length of the string should also be there? Because `eat` can be in `eaten` but they are not anagrams.
i did this also but i got "time limit exceeded"
first, thank you @NeetCode for such great videos
so I am doing these problems on the Jupyter Notebook. Here are the minor changes to get the code to work:
1. add "from collections import defaultdict"
2. lowercase the word "List"
def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
3. change "return res.values()" to "list(res.values)"
for #2, you need to import typing.
from typing import List
Thanks. Great work!
Here is a solution without using defaultdict. I also used pretty print here to see the result.
import pprint
strs = ["eat" ,"tea","tan","ate","nat","bat"]
result = {}
for i in strs:
key = [0] * 26
for c in i:
key[ord(c) - ord('a')] += 1
if tuple(key) in result:
result[tuple(key)].append(i)
else:
result[tuple(key)] = [i]
pprint.pprint(result)
I like this more! Good job.
without importing the module you can use list(result.values())
you can also use result[tuple(key)] = result.get(tuple(key), []) + [i]
This one was really difficult for me for some reason.. I need to come back and solve this again without looking at the solution.
Thanks man, liked and comment for support.
it's been a year. Have u solved it?
Its been 6 months since @stefano_schmidt asked if you solved it. Have you solved it?
It's been 7 months since @stefano_schmidt asked if u solved it. Have u solved it?
It's been 8 months since @stefano_schmidt asked if you solved it. Have you solved it?
It's been 10 days since @mmkamron asked if you solved it. Have you solved it?
According to the question, each string has at most 100 chars => log(100) < 7 => O(mnlogn) < O(7mn). Therefore, the first solution is much faster than the second solution, which is O(26mn).
O(26mn) is better than O(m*nlogn) only when the average string length is at least 67,108,864 chars.
If max string length is very large and more than 2^26, here is my solution to combine both solutions:
class Solution {
fun groupAnagrams(strs: Array<String>): List<List<String>> {
// TO(m(26a + blogb)) / SO(mn)
// m: strs.length
// a: average length of elements which has 2^26 chars or more
// b: average length of elements which has less than 2^26 chars
// n: average length of all elements
val hm = mutableMapOf<Any, List<String>>()
val switcherLength = 1 shl 26 // 26 could be adapted according to the # of accepted chars
for (str in strs) { // O(m)
val key = if (str.length < switcherLength) {
// replace joinToString("") with concatToString() for better performance if using Kotlin 1.4 or newer
str.toCharArray().apply { sort() }.joinToString("") // O(blog(b))
} else {
val counts = mutableMapOf<Char, Int>()
for (char in str) { // O(26a)
counts[char] = 1 + (counts[char] ?: 0)
}
counts
}
hm[key] = (hm[key] ?: mutableListOf()) + str
}
return hm.values.toList()
}
}
@neetcode I think the O(m.n.logn) solution will always be optimum as log(n) will always be smaller than 26 in all of the cases
indeed lol
the average length of a string would have to be > 1.0e26; then neetcode's solution would be better. For log(n) = 26, n would need to be massive.. Neetcode is overlooking something here.
Yes, correct, as max string length is 100 under constraints section
On leetcode they are about the same for python3 with the sorting solution being a little bit better on the memory.
sorting version - 104ms, 17.9mb
counts array version - 104ms, 19.8mb.
@@qbertrtrtg 2^26, not 1.0e26, actually.
If you're like me and doing this cause you couldn't figure it out on your own, I usually take notes on my code in the comments so I understand the logic when I come back to it and to solidify it in my head.
hope this helps!
Thank you @NeetCode for all the great videos! I just want to point out that the optimal approach is great for a large n, but in the case of this particular problem, because n is limited to a max of 100 (0 <= n <= 100), the simpler sorting approach does just as well.
Agreed. A sorting-based approach is consistently a little bit faster in my testing.
👍
Also, the sorting-based variant is more universal, because it's not limited to only 26 characters. Adding uppercase letters won't double the time complexity, for example.
if you want the LeetCode's exact output (if order mattered) then the following code does it:
from collections import defaultdict
class Solution:
def groupAnagrams(self, strs: list[str]) -> list[list[str]]:
res = defaultdict(list)
strs.sort()
for s in strs:
count = [0] * 26 # a ... z
for c in s:
count[ord(c) - ord("a")] += 1
res[tuple(count)].append(s)
return sorted(list(res.values()),key=len)
This one worked! The one in the video did not; it grouped words that are not anagrams.
Bro. I've been an engineer for like five years now, and you are so much better than me at this. I've decided to become really good. I want to be able to do this shit. I'm not going to burn myself out, but I'm going to train neetcode 150 until i can do all 150 problems without looking up solutions. Might take me a whole year or more, but i dont care. I'm going to be able to solve these. I know for a fact it will make me a better problem solver and engineer.
Thank you! Never would have thought of making the keys the ASCII character count, that is pretty clever
awesome but I wish you went through one example step by step. i bet it's simple but it takes me an hour to fully understand the problem :(
The type of res.values() is dict_values. You might need list(res.values()) to avoid type error if you encounter one.
thanks bro
Great video!
I want to point out that the group anagrams problem limits the string length to size 100 so the m*n*logn solution will be faster for all cases as the worst case would be for string size 100.
m*100*log(100) ≈ m*100*7 < m*100*26.
Thanks for the great point. Mind helping me? I don't get why 26 is 'technically' part of the complexity. Sure count has a length of 26, but inserting it into the dict is O(1), isn't it?
Hi @@shan504 ,
The best way I can explain is when inserting into a dictionary the dictionary has to do a look up to check if the key already exists in the dictionary. The look up portion of the insert is why we have 26 as part of our time complexity.
Normally the key to a dictionary would be a primitive type (int, byte, short, long, float, double, boolean, and char) and during these cases a look up would take O(1). But as shown in the video the solution is using a list as it's key instead of a primitive type. Now when the dictionary does a look up it has to check each integer within the list(26 of them as opposed to 1) to figure out what the key needs to be.
I hope this helps clear things up.
@@shonsanchez6403 Hi Shon. Thank you! I was thinking that was the case, though I think in general, at least for an interview, I would just say the look up is essentially O(1). Thanks again :)
@@shonsanchez6403 the runtime is not O(26mn) but O(m(n + 26)). The O(26) comes before and after counting the characters, not for each character, so technically:
200m > 126m
but this is incomplete. Sorting takes O(n log n) = 200 + convert the string to a tuple in O(n) = 100 + hash the tuple in O(n) = 100 = 400
The hash map approach takes: create an empty array of size 26 in O(26) + count each character in O(n) = 100 + convert the array to a tuple in O(26) + hash the tuple in O(26) = 178
400m > 178m
so the hash map approach does less operations. All of this in the worst case.
@@TCErnesto but you can convert this hashmap into the sorted string directly and vice versa. The hashmap can't be less than n log n this way.. If it were faster this way, we would have sorted using a hashmap and converted it back. All the O(N) "savings" here come from not counting the operations done via the hashmap..
Here's a slightly shorter version of the same thing :
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
res = defaultdict(list)
for s in strs:
res[frozenset(Counter(s).items())].append(s)
return res.values()
Such a clever solution, I wonder if I will get to this level of problem solving to come up with something similar on my own ._.
did you get
@@kivanx yes, but back then, I have been out of touch for DSA for quite a long time lol
@@Mustafa-099 why
@@kivanx I got caught up in full time job and have been prepping to go for grad school rn
@@Mustafa-099 same time? it must be hard af
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
sorted_map = defaultdict(list)
for item in strs:
sorted_map[tuple(sorted(item))].append(item)
return sorted_map.values()
I thought of this way
Pls can u explain the function of the default dict to me ??
Maintain a dictionary and a list. The dict pairs (sorted_string: length of list). Check if the sorted string is already in the dict; if not, just append a new group to the list (which is of list[list[]] type). If it is found, take the stored length from the dict, as it is the position of that sorted string's group, and append the word to that list inside the parent list. Fastest method so far; see the sketch below.
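A minimal sketch of that index-based bookkeeping, assuming I'm reading the description right (the names are mine):

def group_anagrams_indexed(strs):
    pos = {}     # sorted string -> index of its group inside `groups`
    groups = []  # list of lists, built in place
    for s in strs:
        key = "".join(sorted(s))
        if key not in pos:
            pos[key] = len(groups)  # remember where this key's group lives
            groups.append([])
        groups[pos[key]].append(s)
    return groups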
I was able to come up with a solution before looking it up :)
not optimal but I was glad I was able to solve it. Thanks in part to your videos.
def groupAnag(list):
result = {}
for item in list:
itemResult = []
for letter in item:
itemResult.append(letter)
itemResult.sort()
itemResult = ''.join(itemResult)
if itemResult in result:
result[itemResult].append(item)
else:
result[itemResult] = [item]
my_list = [i for i in result.values()]
return my_list
For practical scenarios, I found that it's faster to sort each word. I actually got a better result on leetcode by sorting the words and then checking if they're in a dictionary.
First of all, thanks for your work.
I coded first and second solutions in python3, solution 1 (using sorted strings) seems to perform better in speed use and in memory use on leetcode test cases.
Just did this and you're right. According to leetcode, the ultimate solution of this video beats 27.88% of solutions based on runtime and 5.36% based on memory, whereas a sorted solution beats 87.87% based on runtime and 84.54% based on memory. It varies, but the second solution is never worse in performance or space complexity (edit: just ran the non-sorted code again and it beat an iteration of the sorted code. I just read a reddit comment that says LC's % beat is basically a random number generator, and I think that might be accurate).
I'm still new at this, but I suspect leetcode isn't very good at reporting how well code actually performs.
can i get the solution for this question using sorted strings
@@kushwanthaddepalli5236 here you are
anagrams = defaultdict(list)
for word in strs:
sorted_word = tuple(sorted(word))
anagrams[sorted_word].append(word)
return list(anagrams.values())
I went with a different approach where I just check whether sorted(string) is in the dict, if not then just store an array against that sorted string in the dict
dic = defaultdict(list)
for s in strs:
dic[tuple(sorted(s))].append(s)
return dic.values()
# this is one way of using a built-in function, but what anna explained applies to all langs!
Since big O is an expression of worst-case compute complexity, at 7:57 n is the 'longest' string (instead of the 'average' length).
This solution looks better
def group_anagrams(words):
# Step 1: Create an empty dictionary
anagram_groups = {}
# Step 2: Iterate through each word
for word in words:
# Step 3: Sort the letters of the word
sorted_word = ''.join(sorted(word))
# Step 4: Check if sorted_word is in the dictionary
if sorted_word in anagram_groups:
# If it is, append the word to the list of anagrams
anagram_groups[sorted_word].append(word)
else:
# If not, create a new key with sorted_word and set its value as a list with the word
anagram_groups[sorted_word] = [word]
# Step 5: Return the values of the dictionary
return list(anagram_groups.values())
# Test the function
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
Thank you so much
Thanks NeetCode! Great github link
Easier Code ig
d={} #sorted string : list of anagrams
for word in strs:
key="".join(sorted(word))
if key in d:
d[key].append(word)
else:
d[key] = [word]
return d.values()
underrated comment
As a developer with 5+ years of experience, I usually work a lot with comparing ASCII characters, that's why it makes sense for companies to throw this into the interview... Precious coding interviews.
def group_anagrams(strings):
anagram_groups = {}
for string in strings:
canonical = ''.join(sorted(string))
if canonical in anagram_groups:
anagram_groups[canonical].append(string)
else:
anagram_groups[canonical] = [string]
return list(anagram_groups.values())
Instead of counting characters at all the 26 indices and checking for similar counts , I used sorted() with O(nlogn) time and went directly to appending.
That's what I did before this vid as well, but that's still m*nlogn, isn't it?
wow, your roadmap helps a lot - i've managed to solve this issue from the first try. thanks!
you have to return list(res.values()) for the edge cases where input is empty list or one word or no anagrams
I think you have to return list(res.values()) for all the test cases. Cause output is expected to be in a list format but res.values() return dict_values
res = defaultdict(list) creates a defaultdict object named 'res' where each key will have a default value of an empty list. This is particularly useful when you want to append items to lists within a dictionary without having to initialize the list manually for each new key.
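A tiny illustration of that behavior, with made-up keys:

from collections import defaultdict

res = defaultdict(list)
res["aet"].append("eat")  # no KeyError: an empty list is created on first access
res["aet"].append("tea")
print(dict(res))          # {'aet': ['eat', 'tea']}

plain = {}
# plain["aet"].append("eat")  # a regular dict would raise KeyError here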
class Solution {
    public List<List<String>> groupAnagrams(String[] strs) {
        Map<String, List<String>> map = new HashMap<>();
        for (String s : strs) {
            int[] count = new int[26];
            for (int j = 0; j < s.length(); j++) count[s.charAt(j) - 'a']++;
            String key = Arrays.toString(count);
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return new ArrayList<>(map.values());
    }
}
A more efficient (time/space) way would be to sort each str element in strs and put the original form in a hashmap, with the hashmap values being lists of similar anagrams (i.e. map = { 'aet': ['tea','ate'], 'abt': ['bat','tab'], ... }); then you just output the hashmap's values.
He mentioned this in the beginning. For the runtime within leetcode it definitely is faster, but the overall big O notation has it as O(m*nlogn), so if you have massive strings, hundreds of thousands of characters, his solution is much faster.
I did the same thing. The problem had a constraint where len(string) <= 100.
Easier solution, "from collections import defaultdict
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
d=defaultdict(list)
for st in strs:
key=''.join(sorted(st))
d[key].append(st)
return d.values()
"
Set up a dictionary and output list.
Iterate thru the input strings.
For each input string, create a separate sorted string.
If the sorted string is a key in the dictionary,
append the input string to a list in the key's value
If the sorted string is not a key in the dictionary,
set up a new key value pair where the key is the sorted string,
and the value is a list holding just the input string.
Iterate thru the values of the the dictionary,
append them to the output list.
Return this output list.
Maybe we can get rid of mapping strings with integer list and use sorted keys.
Something like below worked for me:
ans = defaultdict(list)
for str in strs:
sorted_str = sorted(str)
ans[tuple(sorted_str)].append(str)
return ans.values()
I think that this is the best solution
now the struggle starts. Actually you just need to learn these toy problems. These problems have their own language. This is the golden handshake
I made a hashmap where I put the sorted version of each string. For every element in the list, I check if its sorted version exists in the hashmap or not; if it exists, I append the element to the list for that sorted string, otherwise I create a new one.
Once done, I create a list of list of the values present in the dictionary.
I have given my implementation in python below. Thank you for your explanation. :) :)
class Solution:
def stringSorting(self, inp: str) -> str:
inp = sorted(inp.lower())
return "".join(inp)
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
pairs = {self.stringSorting(strs[0]):[strs[0]]}
result = []
for i in range(1, len(strs)):
tmp = self.stringSorting(strs[i])
if tmp in pairs:
pairs[tmp].append(strs[i])
else:
pairs[tmp] = [strs[i]]
return [val for key, val in pairs.items()]
Here's a more intuitive key-value pair that I used.
KEY --> sorted word (just call sorted() on a word in strs)
VALUE --> same as Neet's soln, the list of anagrams (not sorted)
like in the video, you'll need to convert the keys to tuples
Even if the complexity of inserting the "count" into the dictionary is O(n), isn't it still O(m * n)? The inserting happens after the inner loop. I don't get the O(m*n*26) part; I thought it was O(m * (n + 26)).
inserting into a hash map is always O(1), but yeah the total runtime is O(m(n + 26)) = O(mn)
The reason why it’s O(m*n*26) is because of the lookup time for a dictionary. Because the key is no longer a primitive (int, char, float etc) but instead an array of characters the look up time is no longer 0(1) but O(size of the array I.e 26) because the map now has to compare each element of the key to check for its equality.
@@TCErnesto inserting into a hash map is not always O(1) that is only the case with a primitive key (ints, bool, char) but not with keys that are complex data structures like arrays and objects because of the hashing algorithm which will now take more than one element into account to create a hash key. In addition the dictionary has to compare the key with input to prevent a hashing collision and overriding existing data.
@@shonsanchez6403 yeah hashing an array will take O(x) where x is the size of the array. In this case hashing the tuple will take O(26), but this operation takes place after counting the characters which takes O(n). The tuple is not being hashed every time a character is counted, that would be the time you mention, O(26n). But first the chars are counted, and then only after that, the tuple is hashed, therefore the runtime is O(n + 26).
@@TCErnesto I believe you’re right about the complexity. Must have been thinking of the algorithm different in my head 😅 but looking at the actual code cleared things up. Good discussion.
Wow, I wouldn't have thought of the ASCII solution. I think the sorted one is a bit simpler to understand and come up with, and for these particular LeetCode tests the performance looks the same, but maybe for very long strings your proposed solution would pay off. Thanks for sharing!
If we sort each string, the time complexity would be O(m*n*log(n)). In the presented solution, it is O(m*n*26), which reduces to O(m*n).
However, according to the constraints on LeetCode, n is at most 100, so log(n) is only about 7, and neither 7 nor 26 matters when a computer can do roughly 10^9 operations per second. Basically, both provide similar performance.
You shouldn't stick so much to the constraints.
class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        d = defaultdict(list)
        for item in strs:
            sortItem = ''.join(sorted(item))
            d[sortItem].append(item)
        return d.values()
I did this in O(n) time by creating my own hash function where the elements have to be the same but their order doesn't matter (still failing some test cases, probably due to collisions), but it was so tiring, and this simple solution is much better.
Interesting, can you share your hash function? How did you manage to ensure no collisions?
@@gopalakrishnan9610 I don't remember exactly now, but I used the ASCII codes of the letters with their position in the word as a multiplier; I don't remember the whole hash function anymore.
My solution, which doesn't use defaultdict, and uses squares to avoid collisions:
dict = {}
for str in strs:
    hash = 0
    for c in list(str):
        hash += ord(c)**2
    dict.setdefault(hash, []).append(str)
return dict.values()
how am i supposed to come up with this on the spot?
Much simpler and probably faster:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    d = defaultdict(list)
    for k in strs:
        key = "".join(sorted(k))
        d[key].append(k)
    return list(d.values())
For this solution, Leetcode reported run time of 80ms, beating 99.84%, and uses 17.1MB of memory, beating 84.2%.
Ken, looks like you're right. However, sort() inside a for loop might not be an efficient solution when it comes to scalability. What if we have very long strings, or a list of 1 million words with long strings? Sorting itself is n log n in the worst case; putting that inside a for loop over m words makes it m * n log n, which would not be an ideal implementation at scale.
@@geld5220 Good point. In a real life situation, we'd know if the strings were very long, and in that case, I suppose the video's method would be faster. However in that case, there is the overhead of converting a list to a tuple which takes O(n) time for each string.
@@kenhaley4 sorted(k) gives a sorted list so your code also has the overhead of joining the list into a string, which takes O(n) time for each string.
I agree with your other points though. I used tuple(sorted(k)) for one of my solutions. In practice your code works well because the strings are short on LeetCode (max 100 chars).
You automatically assumed that sorting the letters of each string is O(n log(n)). That only applies to a comparison-based sort. However, you can sort using counting sort, which takes O(26) + O(n) time (i.e., O(n)) plus O(n) additional space. If we serialize the output of the counting sort as each character followed by its count, instead of constructing the whole sorted string, we end up with exactly the same kind of key. A hashmap can then be used as usual to finish the problem.
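Here is a minimal sketch of that counting-sort idea; the helper name and the letter-then-count key format are my own choices for illustration, assuming lowercase a-z only:
from collections import defaultdict
from typing import List

def counting_sort_key(s: str) -> str:
    # Count each letter in O(n + 26); no comparison sort involved.
    count = [0] * 26
    for c in s:
        count[ord(c) - ord("a")] += 1
    # Serialise as letter-count pairs, e.g. "eat" -> "a1e1t1"; anagrams share the same key.
    return "".join(f"{chr(i + ord('a'))}{count[i]}" for i in range(26) if count[i])

def group_anagrams(strs: List[str]) -> List[List[str]]:
    groups = defaultdict(list)
    for s in strs:
        groups[counting_sort_key(s)].append(s)
    return list(groups.values())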
I initially thought about it and discarded it, but it actually beats the array-as-key and hashmap-as-key solutions, nice!
TIL about counting sort, wow.
# simpler 4 line solution with O(mn) time
res = defaultdict(list)
for word in strs:
    res[frozenset(Counter(word))].append(word)
return list(res.values())
This is the kind of solution that I conceptually thought of but couldn't implement because I never heard of frozenset before. Thank you!
Please, I really want to know what defaultdict does. Thank you.
Your solution will not work with this test case: ["ddddddddddg","dgggggggggg"]
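Right - frozenset(Counter(word)) keeps only the letters and throws away the counts, so those two words collide. A possible fix (just a sketch) is to key on the Counter's items so the counts become part of the key:
from collections import Counter, defaultdict
from typing import List

def group_anagrams(strs: List[str]) -> List[List[str]]:
    res = defaultdict(list)
    for word in strs:
        # frozenset of (letter, count) pairs: hashable, order-independent,
        # and it distinguishes "ddddddddddg" from "dgggggggggg".
        res[frozenset(Counter(word).items())].append(word)
    return list(res.values())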
This is with O(m*n*logn) solution:
def groupAnagrams(strs: list[str]):
    sorted_strs = [''.join(sorted(s)) for s in strs]
    res = {}
    for i, s in enumerate(sorted_strs):
        if s in res:
            res[s].append(strs[i])
        else:
            res[s] = [strs[i]]
    return list(res.values())
I first solved this problem with binary search by the answer.
Since we know how to check whether two strings are anagrams, we can use an is_anagram(a: str, b: str) -> bool function in that search over the input array. But the time complexity is bad for this solution. :)
It is given that n can be max of 100, and log(100) is 2 making the O(n·m·log(n)) solution faster than O(n·m·26). Let me know if I am thinking correctly.
Well, technically speaking, when we use log here, it's log base 2.
Your value of 2 only works if you use the default log base of 10.
Either way, we get 6 and change with log base 2, which is still smaller than 26.
The 26-based solution is only better for strings roughly longer than 2^26, about 67,000,000 characters.
log_10(100) = 2 [log base 10 of 100 equals 2]
log_2(100) = 6.644 [log base 2 of 100 equals 6.644]
There's no need to subtract 'a''s ASCII value from each character. Just use the character's ASCII code directly as the index into the count array, i.e. change it to count = [0] * 256.
But why? It will increase memory costs
@@pythoncoding1092 A memory increase of 1840 bytes is completely insignificant. The increased code complexity is probably a bigger overhead.
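For reference, a tiny sketch of that 256-slot variant (illustrative only):
def make_key(s: str) -> tuple:
    count = [0] * 256          # index by the character's code point directly
    for c in s:
        count[ord(c)] += 1     # no "- ord('a')" offset needed
    return tuple(count)        # anagrams still map to identical 256-long tuples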
Thanks NeetCode, lovely solution. I was going the opposite way, using strings as keys, and overcomplicated things. Thanks.
I don't think I could ever have come up with the NC solution during an interview. Best I could do was using the hash() function:
def groupAnagrams(strs: List[str]) -> List[List[str]]:
    hash_index_map = collections.defaultdict(list)
    for i, word in enumerate(strs):
        hash_value = 0
        for char in word:
            hash_value += hash(str(ord(char)))
        hash_index_map[hash_value].append(strs[i])
    return hash_index_map.values()
This passes all the LC checks, though of course it isn't foolproof due to the possibility of distinct words adding up to the same hash value, so I doubt it would fly during an interview. IRL I'd probably just do:
def group_anagrams(strings):
    res = defaultdict(list)
    for s in strings:
        res[tuple(sorted(s))].append(s)
    return res.values()
And call it a day.
Sorting each string is cheap here (|s| is at most 100), so I went with the first solution:
def groupAnagrams(self, strs):
    """
    :type strs: List[str]
    :rtype: List[List[str]]
    """
    classDict = {}
    for string in strs:
        # canonicalize: anagrams share the same sorted letters
        letters = list(string)
        letters.sort()
        key = ",".join(letters)
        if key not in classDict:
            classDict[key] = []
        classDict[key].append(string)
    return list(classDict.values())
Amazing solution, learnt a new pattern and new method today!
thanks
[0] * 26
I guess this vector of per-letter counts carries exactly the same information as the sorted string (aabbbcccc => {0: 2, 1: 3, 2: 4}). The difference is that building the counts is effectively a counting sort, which runs in O(n + 26) rather than O(n log n); that isn't a contradiction, because the n log n lower bound only applies to comparison-based sorts.
I kinda hate the optimal solution. I was like _this_ close to coming up with this answer, except dictionaries aren't hashable, and I couldn't figure out how to compare letters on the spot other than spending an hour making a letters dictionary ({a:0, b:1} etc). I feel like most people wouldn't come up with ascii on the spot in an interview. This question relies entirely on already knowing the "trick" so to speak
Isn't the overall time complexity for the sorting approach even longer than M*NlogN because you also have to do a string compare which is check N characters to N characters?
Or is that factored into the M portion of the time?
If by string compare you mean the hash map lookup: the key string is hashed (and, on a hash match, compared) in O(N), but that is the same order as building the key in the first place, so it is absorbed into the M * N term rather than adding another factor.
One thing you mentioned, which created a lot of confusion in the comments, is that it's not O(m * n * 26), it is O(m * (n + 26)) - you don't calculate the key for each character in a string, you calculate it once per string. So the sorting approach is better only for strings with an average length below around 11.
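A quick back-of-the-envelope check of that break-even point, ignoring constant factors and comparing the two per-string costs n * log2(n) and n + 26:
import math

for n in range(2, 16):
    print(n, round(n * math.log2(n), 1), n + 26)
# Around n = 11, n * log2(n) (about 38) overtakes n + 26 (37), matching the estimate above.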
Here is a better solution:
from typing import List

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagrams = {}
        for s in strs:
            key = "".join(sorted(s))
            if key in anagrams:
                anagrams[key].append(s)
            else:
                anagrams[key] = [s]
        return list(anagrams.values())
way over my head, to just over my head, to within reach if I jump. many thanks
My solution in typescript:
function groupAnagrams(strs: string[]): string[][] {
    const sortedMap = new Map();
    for (let i = 0; i < strs.length; i++) {
        // Sort each item
        const key = strs[i].split("").sort().join("");
        // Add items as keys in the map. If one exists already, concat it so it stays in the form [..., ...]
        sortedMap.set(key, (sortedMap.get(key) || []).concat(strs[i]));
    }
    return Array.from(sortedMap.values());
}
3:42 there is also one more constraint: 0 <= strs[i].length <= 100.
What is the space complexity of this solution? Will it be constant because we're using a single array of count 26 and we can reuse that array for every word? Or is it O(m*n) where m is the number of strings and n is the average length of the string?
If every word is unique you need to store a count entry for each word in the dict, so it would be O(26 * n), i.e. O(n), where n is the number of strings. In this case the average length of the words doesn't matter, since we compress that information down to 26 ints.
Note that if we also clone the words it's O(n * m), though with very long words you would likely store pointers to the words rather than duplicating them.
This solution just blew my mind
Anyone able to do this solution cleanly in C++, using the set as a key for the hash?
class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        sortmap = {}
        for i in strs:
            sorted_str = "".join(sorted(i))
            if sorted_str not in sortmap:
                sortmap[sorted_str] = [i]
            else:
                sortmap[sorted_str].append(i)
        return list(sortmap.values())
Not sure how keeping the array as a key works.
I just did an ASCII sum of each character in the string:
since there can be at most 100 characters per string per the problem constraints, and lowercase 'z' is 122, the maximum sum is 122 * 100 = 12,200, well within the range of an int.
So my map was from int to vector.
Still O(m*n) in time and O(n) in space in the worst case.
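One caveat with summing character codes: two words that are not anagrams can share the same sum, so that key can merge groups that should stay separate. A tiny illustration:
# "ad" and "bc" are not anagrams, but their ASCII sums are equal:
print(sum(ord(c) for c in "ad"))  # 97 + 100 = 197
print(sum(ord(c) for c in "bc"))  # 98 + 99  = 197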
I've found an even faster solution that doesn't require ascii math:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    dict = defaultdict(list)
    for str in strs:
        sortedstr = ''.join(sorted(str))
        if sortedstr in dict:
            dict[sortedstr].append(str)
        else:
            dict[sortedstr] = [str]
    return dict.values()
it's quite similar but essentially it sorts each str in strs. Then, if sortedstr is in dict, append to dict[sortedstr], otherwise append [str] to dict. return dict.values()
hope this helps someone
Your code is great and it helped me. Still, I modified it a bit: you can remove the if condition, because with defaultdict the append already handles the case your if was guarding against.
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    res = defaultdict(list)
    for i in strs:
        sortstr = "".join(sorted(i))
        res[sortstr].append(i)
    return res.values()
Hey Navdeep, just a small suggestion. In fact, this could be a useful feature on neetcode: instead of giving an unsolved problem when you click shuffle, can you add a feature called "revise" which gives us a random solved problem, so that we can re-do problems we've already solved?
One more great way to group anagrams is to use hashmap ;)
For example:
anagrams = {}
for string in strs:
    sortedString = "".join(sorted(string))
    if sortedString in anagrams:
        anagrams[sortedString].append(string)
    else:
        anagrams[sortedString] = [string]
return list(anagrams.values())
underrated comment
Amazing... how do you come up with these ideas? It is simple, but it is hard to get to these solutions without hints :)
Joye, just noticed: the constraints say the length of each word is at most 100, so in that case sorting (log n is about 7) is faster than looping 26 times.
At first I thought about putting each word into a list of sets, as a set, and comparing them. I'm still learning about sets and hashmaps, so if anyone could tell me whether converting each word into a set would even work, I'd appreciate it.
Some words can have duplicate letters, which would disappear when cast to a set.
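A one-line illustration of that pitfall:
# "aab" and "ab" are not anagrams, but as sets they look identical:
print(set("aab") == set("ab"))  # True, so a set-based key would wrongly group them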
Rate my solution. I'm a beginner.
class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        hashMap = {}
        for word in strs:
            key = str(sorted(word))
            if key not in hashMap:
                hashMap[key] = [word]
            else:
                hashMap[key] += [word]
        return list(hashMap.values())
I originally tried to solve this by creating a hash using the char codes of each char in a string and using the hash as the key to group the values. I had a ton of collisions as my hashing function was crappy. Was wondering if you'd revisit this and solve with hashing?
anagram_groups = {}
for string in strings:
    canonical = ''.join(sorted(string))
    if canonical in anagram_groups:
        anagram_groups[canonical].append(string)
    else:
        anagram_groups[canonical] = [string]
return list(anagram_groups.values())
Why do we use the average size of the strings in the Big O Time Complexity, rather than, say, the size of the largest string?
gosh when I tried it myself my solution was so much longer and tedious, but your solution looks beautiful lol
Good explanation, thank you.
yeah I would've never thought of the ASCII value stuff. Twas thinking about going with Counter and using the result as a key if possible
Interestingly, the sorting solution beats the non-sorting ones for some reason. I also used the non-sorting solution and was wondering why it isn't faster than the ones that used sorting.
Should we trade the log(n) term for 26 in the time complexity? Are the test cases that large?
Constant multiple like 26 does not contribute to big O complexity but multiplying by log(n) does. Assume large test cases when thinking about time complexity
@@sp33r I think You underestimate how small log(n) is. 2^26 = 67 million. I feel confident that the case wouldn't be that large, and that's just for breaking even
Is there a reason why, when I ran the first method suggested, it gave a better time than the method used as the final solution?
Sorting each string in the first method takes O(n log n).
This gives us a run time of O(m * nlogn) or O(m * n * logn)
The second solution gives us a run time of O(m * n * 26)
Meaning that our first solution is faster in cases where: logn < 26
We know from the constraints that each string has at most 100 characters, so log(n) stays far below 26. For the second solution to be faster instead, we'd need log(n) > 26, which means:
n > 67,108,864
Even in the real world, I find it hard to imagine a string longer than 67 million characters.
No way, the code is this concise and clear in Python. That's it. I am divorcing Java for leetcode. Switching to python.
Can you explain strs: List[str] in the parameters? I decided to divorce Java too, but I don't recognize this kind of thing and couldn't find it on the internet either.
I have a question about the m * n * log(n) solutions. I also came up with that, but we have to return the original (unsorted) strings, so I'm assuming you also kept a copy of strs so the character order isn't disturbed?
Could we just use the Counter method for strings or does the ordering of how the hash map comes out individually for each string make that unusable here?
I really like NeetCode's solution, but it could be hard to come up with it on the spot. So I've come up with the solution below: sort each str and use it as the key of the defaultdict.
from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        res = defaultdict(list)
        for string in strs:
            sortedString = ''.join(sorted(string))
            res[sortedString].append(string)
        return res.values()
This is pretty much what I did. It's better to at least have a working idea implemented first, whether it's brute force or not, that you can then improve on. Like you said, I don't think this solution is something you can come up with off the top of your head in an interview, and actually, according to the problem's constraints, there is a limit on the maximum string size: 100.
So even if the first solution is O(m * n * log(n)) and the second one is O(26 * m * n), log2(100) is about 6.6, so the former is actually faster in this scenario since there is a bound on the string size. If you removed the bound, you'd have to compare at what point log(n) outgrows 26, which happens when n = 2^26, i.e. the string is about 67 million characters long. This sort of deep dive is probably better to discuss with your interviewer, since in the real world storing strings 67+ million characters long in an array seems unrealistic IMO.
The sorting approach has a faster runtime on LeetCode but this solution with the custom key for each anagram feels more optimal somehow (for C++).
That vector is basically the same information as the sorted string; they are identical except that instead of aabbcc it stores counts like {a: 2, b: 2, c: 2}. Building the counts is effectively a counting sort, which runs in O(n + 26) rather than O(n log n); that's not a contradiction, because the n log n lower bound only applies to comparison-based sorts.
Amazing solution! I was here thinking after 30 minutes of brainstorming, "how's he going to approach this one". 😆
P.S.
If you don't want to use the imported package:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
    store = {}
    for word in strs:
        count = [0] * 26
        for char in word:
            count[ord(char) - ord('a')] += 1
        try:
            store[tuple(count)].append(word)
        except KeyError:
            store[tuple(count)] = [word]
    return list(store.values())
Or you can use setdefault to deal with uninitialised values, like:
store.setdefault(tuple(count), []).append(word)
(Plain store.get(tuple(count), []).append(word) wouldn't work here, because the default list returned by get is never stored back into the dictionary.)
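A quick sketch of the difference between get and setdefault here (illustrative values):
store = {}
key = ("a", 1)

# get() returns a temporary default list; appending to it never touches the dict.
store.get(key, []).append("eat")
print(store)        # {}

# setdefault() inserts the default into the dict first, then returns it.
store.setdefault(key, []).append("eat")
print(store)        # {('a', 1): ['eat']}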