The 3rd solution does not solve the space problem. It relies on getting a deep copy of the array that you can change as you want in your function, which means it always requires N * size_of(int) memory. The 2nd solution can handle a shallow copy / const reference to the array and requires this amount of memory only in the worst case (no duplicates). One could improve the 3rd solution by creating an array of booleans/bits to indicate that a number was already visited, while using a const reference to the original array. This would result in N bits of additional memory instead of N * size_of(int).
@@dassbah5316 But it depends on the caller whether you get a copy or not (and if it's not a copy, whether it's ok to change it in the first place!). In the second solution the worst case is actually that the caller created a copy _and_ you create your set/bitmask. Tbh, if I were the interviewer I wouldn't primarily be looking for a great solution to the task; I would be looking for the interviewee to ask good questions, like "Are we optimizing for cpu cycles or memory usage?" or "Do we expect large arrays, or a lot of calls to this function with small arrays?" Because that third solution is dereferencing the array and calling Math.abs a whole lot, the first solution, despite being O(n^2), will actually run faster up to a certain array length, because each individual iteration does less work. O-notation gives an indication of how the algorithm scales, not generally how fast it is. Both you and the talking head in the video make a whole bunch of assumptions that I, as an interviewer, wouldn't be that happy about. Over-confidence is a way too common characteristic in programmers, one I learned to watch out for.
Third solution is amazing. You're using the logic used in a dictionary/hash table solution but saving space by using the same array. Wish I could come up with this kind of solution by myself haha.
5:24 Just looking at the problem, I'd use a hashset and add elements to it. The hashset only has to be as large as the array, so the memory complexity is O(n), and the time complexity is worst-case O(n): you iterate over the list and check whether each value is already in the set; if so, return that value.
@@BrieoRobino Hashsets don't come for free when considering the overhead and O(1+load-factor) complexity. Worse yet, if the implementation doesn't use clever implementations like universal hashing (Java doesn't iirc), the complexity could be theoretically higher. The wording of the question has restricted the cardinality of the universe of keys to be exactly the same as N. Thus, you can use direct addressing with O(N+N) complexity with an array structure to simulate a hashmap/hashset. Model the index as the keys (input) and the content as a tally (ie +1 when seen). With this method you save some overhead that is associated with a typical implementation of a hashset.
@@sfcs3743 Not everything is a complexity test. This isn't the 1970s, where we had to deal with memory limits imposed by the technology. Any potential overhead from using a hash set is negligible. If I were doing the code review of someone whose solution to this problem was as over-engineered as his third solution, I would suggest a hashset. Sometimes readability and maintainability are far more important than saving a tiny amount of time and a few bits.
We can actually solve this by using a BitSet to keep track of whether an index is flagged or not. It's like an array of bools, except each bool is only a single bit in a small array of bytes. The added memory footprint is relatively small and the original array gets to go untouched!
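A minimal sketch of the BitSet idea above, in Java (the class and method names here are my own, not from the video); one bit per possible value, and the input array goes untouched:

```java
import java.util.BitSet;

class BitSetDup {
    static int firstDuplicate(int[] a) {
        BitSet seen = new BitSet(a.length + 1); // values are in [1, a.length]
        for (int v : a) {
            if (seen.get(v)) return v; // bit already set: v was seen before
            seen.set(v);
        }
        return -1; // no duplicates
    }
}
```

Space is roughly n bits instead of the n machine words a HashSet of boxed Integers would cost.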
If the problem was not about first duplicate but any duplicate, then I think there's a way of doing it in O(n) time and O(1) space without destroying the input. Not sure if this is possible for finding the first duplicate though...
Given the requirements, it seems like a nasty side effect. A step should be added to reset the values to positive. This method is also not thread safe. Is that a requirement? Who knows. This is why I hate interview questions.
When you first explained the question I immediately knew that it had something to do with the indices, but the only way I could think of was to sort the array and compare the indices with the value stored at each index. But that would give a wrong result if there were multiple duplicates. Your solution was amazing!
I kinda thought about the last solution in my head. There's an important drawback to this solution that ultimately makes it *O(n)* space complexity: we need a bit per element to store that it has been previously visited. In your case two's complement complicates things, but for simplicity's sake assume the int has a dedicated sign bit.

I originally did this for unsigned integers too, and as a way to store the state I simply added the size of the array to the element and used subtraction/modulo/division to recover the information. Both these methods require n (the array size) to be less than half of the maximum value the container can hold. For a 32-bit int, it has to be signed, or unsigned divided by two: 2^31 (signed way) = (2^32)/2 (unsigned way).

Thus there is a stipulation that n cannot exceed half the number of states an element's container can hold, making it O(1) for ∀n < N/2, where N is the max number of states each element could hold. Conversely, if we redefined this N to be the size of the array, it would lead to assigning an extra bit for each element. The cost of this space by itself is logarithmic, but the key takeaway is that it has the potential to double N. The calculation when considering this potential: total space *O(n)* = O(N) + O(N). (I did not add the logarithmic term, log32(n) for a 32-bit container, because it would make this recursive.) Δspace (removing the original array): *O(n)* = O(N) + O(1), and since N (max size of the element container) = n (the array size), *O(n)* = O(n).
Nick you explained this so clearly step by step asking us to notice the minute details that the elements in array are between 1 to arr.length , then making the use of that point . Great 🤘
A small comment on 9:25: being O(N^2) doesn't mean it takes you N^2 time; it means your growth rate is quadratic. The time taken could literally be anything (even 0.000001 ms), but the point here is: if you double the input size, the time taken will not double, it will quadruple (2^2 = 4). Quadruple the input size and it will take roughly 16 times as long as the original.
I went through a year or 2 of being taught this in college and never once understood it, and here it took not even 5 minutes to explain it in a way this easy to understand.
I've seen a few responses saying to just sort the array. Here's why you can't. The problem requires you to find the 'first duplicate' in the array. If you have arr = [3, 3, 1, 1, 1, 1]. you can obviously tell that the first duplicate is 3. If you sort the array [1, 1, 1, 1, 3, 3], the first duplicate is now 1. The problem doesn't want to know if there's ANY duplicate. It specifically wants to know which set of numbers in the original array produces a duplicate first. Sorting the numbers completely disregards that requirement.
I think the first solution would've been more optimal if the second pointer looped only until min_second_index, so i+1 <= j < min_second_index, and then if (a[i] == a[j]) { min_second_index = j }.
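A hedged sketch of this bounded brute force in Java (class/method names are mine), which also stops the outer loop once i reaches min_second_index, as other commenters suggest below:

```java
class BruteForceDup {
    static int firstDuplicate(int[] a) {
        int minSecondIndex = a.length; // sentinel: no duplicate found yet
        // Stop early once i passes the best second index found so far.
        for (int i = 0; i < a.length - 1 && i < minSecondIndex; i++) {
            // Only scan up to the current best: a match beyond it can't win.
            for (int j = i + 1; j < minSecondIndex; j++) {
                if (a[i] == a[j]) { minSecondIndex = j; break; }
            }
        }
        return minSecondIndex < a.length ? a[minSecondIndex] : -1;
    }
}
```

Still O(n^2) in the worst case (no duplicates), but the bound prunes a lot of the inner scanning when a duplicate appears early.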
I would argue that making the numbers negative is actually using O(n) space. There's a much more intuitive solution that doesn't require modifying the elements -- only swapping them. Go through the array and put each item "in place". Unless it's already in place, if you try to put it in place and there's already the right element there, that's the answer.
The solution below is similar but a little different conceptually, and easier to understand. Since the input list cannot have negative values (the elements are at least 1), keep making elements negative as you move through the loop, and if you already find the negative of the current element, return it. (Solution in Python:)

def first_duplicate(in_list):
    for i in range(len(in_list)):
        if -in_list[i] in in_list:
            return in_list[i]
        else:
            in_list[i] = -in_list[i]
    return -1
You will not get around O(n) space complexity in either the second or third case, making the algorithm and analysis in the third case incorrect. If you program this in a strongly-typed language, the integers will be either signed or unsigned, so it could happen that you won't actually have the space to make a number negative (e.g. if you have unsigned 8-bit ints and an array that fits >127 integers, which is a valid input here). In that case you need to expand the data type to a bigger size (here, 16 bits), and expanding every integer would mean Θ(n) space complexity. If you use a non-strongly-typed programming language, it will probably expand it internally, leading to O(n) space complexity, and if you don't consider this, the algorithm (or your analysis) is incorrect. In summary, you still need space to save the sign, and you'd get incorrect results if the space just isn't there (see the given example). The reason they give you an array with entries less than or equal to the array size is exactly so that you don't have to hash the numbers to begin with; a HashSet is overkill. They probably just want you to create an array of size n where you use the number as an index to check whether you've encountered it before, which is again Θ(n) space complexity.
The last one is called negative marking. This way, you integrate the "seen" operation with the index. However, you would not need to deduct one if the elements were in [1, n] and there were n+1 of them: as you go through the list, just negate the element at that index, nums[nums[idx]] = -nums[nums[idx]]. Later, when you reach a duplicate and try to make nums[nums[idx]] negative again, it's already negative, so you realize you've met that number before.
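For reference, a minimal Java sketch of the negative-marking variant from the video (values in [1, n], so value v maps to index v-1; class and method names are mine). Note that it mutates the input:

```java
class NegMark {
    static int firstDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int idx = Math.abs(a[i]) - 1;         // value v maps to index v-1
            if (a[idx] < 0) return Math.abs(a[i]); // already marked: duplicate
            a[idx] = -a[idx];                      // mark value as seen
        }
        return -1;
    }
}
```

If the caller needs the array intact, a final pass applying Math.abs to every element restores it.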
The simplest and cleanest solution with regards to the constraints (if we're unsure whether the original array is allowed to have values that violate the initial constraints) is either using an n-length array of booleans with k-1 as index or a bitset of length n to save even more space.
I like the idea of using an intrinsic component of numbers like negatives to store boolean values to basically treat the data structure you already have as a hashset (without the modulo and collision detection)
My idea is similar to the 3rd method - but using bitwise checks instead. Initialize an integer with binary length equal to the array length, and use each bit to represent the "seen" status of that position. E.g. 1000 = Only 1 of 4 has been seen; 0110 = 2 and 3 of 4 have been seen. When reading through the array, check the bit at that position. If 1, that's your first duplicate, else flip the bit to 1. Space required would be number of bits equal to length of list, not sure how that would compare to a hashset.
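A hedged sketch of this bitwise idea in Java (names are mine). It assumes the array has at most 64 elements so a single long can hold one "seen" bit per possible value; for larger inputs the same idea generalizes with a BitSet:

```java
class BitMaskDup {
    static int firstDuplicate(int[] a) {
        long seen = 0L;                  // bit k-1 set means value k was seen
        for (int v : a) {
            long bit = 1L << (v - 1);    // assumes 1 <= v <= 64
            if ((seen & bit) != 0) return v; // bit already set: first duplicate
            seen |= bit;                 // flip the bit to 1
        }
        return -1;
    }
}
```

Compared to a hashset, this is n bits of state rather than n boxed entries, at the cost of the 64-element limit.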
For anyone unable to get the third answer: Nick has another video where a similar question is solved with this exact approach. Check the one about returning duplicates in an integer array. There we return all the duplicates, and here we return the first one found. The third approach is really beautiful.
The last approach to solving the problem in O(n) is similar to finding the smallest missing positive number in an unsorted array. (The only difference is that after looping over all elements and marking numbers negative, you return the very first index+1 where a positive number is found; so if the first positive number, 5, is at index 3, then 3+1 = 4 is the smallest missing positive number in the unsorted array.)
Yea, basically: since the values of the array are between 1 and n, every value is a valid index if you subtract 1 from it, and if there are duplicates, they're always gonna reference the same index. It's just a matter of figuring out how to track which index was already referenced in constant space.
The thumbnail gave the third solution away! Also, I assumed the array would be unsigned because negative numbers don't make sense in it, and modifying your inputs is generally a bad practice. Plus, because of CPU architecture details (branch prediction and cache size) I suspect a solution using a bitset would actually be faster.
In a real project, things like the 3rd solution are what add terrible, terrible bugs to the code base... Either it's too complicated, so other devs won't understand it and will slowly break it, or it's straight-up buggy; e.g., what if the caller of that function didn't expect the array to get modified? Also, it doesn't even necessarily save memory, since you couldn't use a data type that stores only positive integers in the array. So that bit was wasted there instead of in a proper bitset.
The 2.5 solution is to use an array of booleans. If the boolean at the "value-1" index is already true, you're done! In some languages it's expensive (Java) unless you use BitSet (which maps your nth flag to the correct bit within a byte array). In other languages, you can use pointer arithmetic and some bit twiddling! (Doable in Java too, but why stray from an existing solution?)
Gonna be honest, I thought the fact that the array couldn't contain a value larger than its length was a red herring. I came up with the second solution (the first one seemed unnecessarily complex and I would never have thought of it) on the spot. But that third solution is really clever. The only possible downside is that you may need to go back over the array to convert the negatives back to positive (but given the scope of the problem presented, that wouldn't matter at all).
We don't need to use the contains method to check if an element is present in the set: the add method itself returns true or false for whether the element was added successfully. When we find the first false from add, we can skip the remaining elements.
Constant time is not the same as no time. It means the time to access an element does not increase as more items are added, but it will always take non-zero time.
This third solution is not as good as you may think. Either the caller has to be okay with the function mutating the array (which is rarely the case) or you have to copy it (but then you lose the only benefit of this approach over just using a hashset). Basically, it feels hacky and not useful. I think a good compromise would be to use the same principle the solution is using, and use a BitSet instead. It would still have O(N) space complexity, but it would use orders of magnitude less space than the set (depending on sizeof(int))
I came up with something similar to the second. With my current level of knowledge there's no way I would have thought of the third in any reasonable amount of time, and even after figuring out how it works I feel like I'd still have trouble actually implementing it. I looked at the solutions and even after writing the process on paper I didn't understand how/why it worked until I saw this video, so thanks. I gotta get on the grind to get familiar with the patterns and nuances...
Don't use a hash set when the problem clearly states that the input array has values greater than 0 and not greater than the length of the array. Just use an (N+1)-length array of booleans to keep track of whether the value n has already been seen or not, and iterate from left to right. Index 0 in found doesn't mean anything in this solution, but it keeps it clean.

def firstDuplicate(arr):
    found = [False for i in range(len(arr) + 1)]
    for n in arr:
        if found[n]:
            return n
        found[n] = True
    return -1
Isn’t the first solution n log n instead of n^2? The distance the second pointer needs to travel gets smaller with every movement of the first pointer. The second pointer isn’t traversing the entire list every time.
The pointer travels N - 1 indexes at first, then N - 2, then N - 3... So it traverses (N - 1)+(N - 2)+(N - 3)+...+1 indexes, which adds to (N^2 - N)/2, giving us O(N^2). Edit: I wrote multiplication of terms instead of sum of terms, thank you @Petar Jovovic for pointing it out
You normally see logs in algorithms that take a divide-and-conquer approach, such as merge sort. Since you're cutting your input in half with every iteration, the time complexity involves a log. It's like the opposite of n^2, where you double the iterations required.
Since the numbers are limited by the size of the array, I would make a second array of true/false values with the same length as the input array: storage_arr[input_arr[i] - 1] = true. I mean, it's not as good as your final solution, but it only adds a tiny bit of space complexity and should be just as fast.
Third solution is brilliant, but I'm not sure that would work in many languages as arrays are almost always pass by reference, which means you're really screwing up someone's array, so in order to unscrew it you'd have to go back and make everything positive again. Now, if they said it was some data structure like an arraylist or something, that'd be different because you can pass the list object by value in many languages.
Here's a number-theoretic algorithm for this problem (the algorithm is not very efficient and requires a pre-computed resource, but I think it's an interesting one):
[1] Let P(x) be a function that returns the x-th prime number. This function can be pre-computed, i.e. the dictionary with entries (x, P(x)) can be pre-computed.
[2] Let Product = 1. Let 'Array' be the array of numbers.
[3] Product = Product * P(Array[0]+1); // if Array[0] = x, multiply 'Product' by P(x+1), the (x+1)-th prime number.
[4] LOOP (for i from 1 to n-1):
    [4a] if (Product % P(Array[i]+1)) == 0: return Array[i]; // if Product is divisible by P(Array[i]+1), Product already has P(Array[i]+1) as a factor, which means Array[i] has been encountered before.
    [4b] else: Product = Product * P(Array[i]+1);
[5] return -1;
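A hedged Java sketch of this prime-product idea (class/method names are mine, and the prime table is computed naively rather than pre-computed). Product grows very fast, so BigInteger is used:

```java
import java.math.BigInteger;

class PrimeProductDup {
    // Naive prime table: p[0] = 2 is the 1st prime, p[1] = 3 the 2nd, etc.
    static int[] firstPrimes(int n) {
        int[] p = new int[n];
        int count = 0, cand = 2;
        while (count < n) {
            boolean prime = true;
            for (int d = 2; d * d <= cand; d++) {
                if (cand % d == 0) { prime = false; break; }
            }
            if (prime) p[count++] = cand;
            cand++;
        }
        return p;
    }

    static int firstDuplicate(int[] a) {
        // Values v are in [1, a.length]; map v to primes[v], the (v+1)-th prime.
        int[] primes = firstPrimes(a.length + 1);
        BigInteger product = BigInteger.ONE;
        for (int v : a) {
            BigInteger p = BigInteger.valueOf(primes[v]);
            if (product.mod(p).signum() == 0) return v; // factor already present
            product = product.multiply(p);
        }
        return -1;
    }
}
```

This relies on unique prime factorization: the product is divisible by P(v+1) iff v was multiplied in before. Neat, but the product needs O(n log n) bits, so it's strictly worse than a bitset in both time and space.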
For my solution I went through the array and tried to swap values into their proper place (index = value - 1) one by one, always holding onto the number we displaced and attempting to swap that number into its proper place. If that place already holds the proper value, the swap is not completed and we know we have a duplicate. It does use a temporary variable for the swapping, though, so that's O(1) extra space. The data values aren't changed, which might be helpful, idk. You do need to make sure you're skipping over numbers that are already in their spot at the start.
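A hedged Java sketch of this swap-into-place idea (names are mine). One caveat worth flagging: it finds *a* duplicate, not always the earliest second occurrence; e.g. on [3, 4, 4, 3] it returns 3, while the first duplicate is 4. It also reorders (though never changes) the values:

```java
class SwapDup {
    static int firstDuplicate(int[] a) {
        for (int i = 0; i < a.length; i++) {
            while (a[i] != i + 1) {              // a[i] is not in its proper slot
                int target = a[i] - 1;           // where a[i] belongs
                if (a[target] == a[i]) return a[i]; // slot already holds it: duplicate
                int tmp = a[target];             // swap a[i] into place
                a[target] = a[i];
                a[i] = tmp;
            }
        }
        return -1;
    }
}
```

Each swap puts one value into its final slot, so the total work is O(n) despite the nested while loop.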
That only works for a mutable array. If the language doesn't allow modifying the input parameter, it might be better to create a new array for the marking and checking.
It's like you're using the array itself as a boolean array of flags, one per integer, for having a duplicate or not. And that's okay because you can clean up by running abs() through the array.
I almost found the last solution by myself. I didn't think of using the sign of the integers; instead I swapped the numbers to their indexes, but only swapping to indexes smaller than the current index, while checking each time whether the number I was about to swap was the same as the current one; if it was, I returned it. In retrospect, it may be the best solution for the same problem with unsigned integers.
Haven't watched the video yet, but could it be possible to use a for loop to scan through the entire array and, while scanning, look for repeated values, then save the indexes of those values into a new array and compare them to see which index is lower, and let that be the final answer? For example, [2,1,3,5,3,2]: run the loop, int[] indexOfRepeated is [4, 5]; since 4 < 5, the value at index 4 (the 3) is the answer.
Brilliant! The third solution is essentially the second solution except you are creating the hash table. The key is the value in the array (n), the hash is n - 1, and the value is True or False (-/+).
Array slicing may come in handy for this problem. I agree the third solution is clever, but it’s not straightforward. Why not iterate through the list and for each number see if it exists in the subset of the list up to the previous element?
I did come up with the 3rd solution myself after 15 minutes of thinking, but the point is: just because you somehow saw through an unintuitive solution once doesn't mean you always will. Sometimes you feel smart, sometimes you feel frustrated.
Interesting 3rd approach, but in Swift a function parameter can't be changed, as it's a constant. So I would have to copy it into a new variable and therefore use extra space. I could have an inout solution, but then it would mutate the caller's array, and I'm not sure that would improve on the 2nd solution. But it's a good conversation one can bring up in an interview.
Hey Nick, in the second solution, when you use the seen.contains() method, doesn't that method also use a loop internally to determine the answer? If so, the time complexity would be O(n²), since seen.contains() is already inside a loop.
The last solution is easy if you've seen it before. To put it briefly: we're hacking the array into being the hash table itself. The index mapping acts as the hash function.
I spent around 30 min before solving it using the brute-force approach. After watching this video... Nick is a legend.
Clever! My initial instinct was to create a bool array (similar to the hashset solution, except I have more control over the space it uses) but I didn't think about using the extra bit they leave us by specifying that they use only half of the signed int space.
Doesn't the second solution still have complexity n^2? Because the contains() function must iterate through all the elements in the seen set, which over the length of the original array would result in n^2 / 2 element comparisons. Only the third solution is actually linear.
The contains() method of a hashset is O(1) in the average case, since it determines the position with the hashCode method. However, collisions are handled with a tree structure (according to the internet), so in the worst case it's O(log(n)), which makes the whole thing O(n log(n)).
The third solution makes a lot of sense if you think of the array as a function mapping whole numbers to integers. In this case, the domain and range of the function are basically the same. You want to prove that the function is not one-to-one.
Great walkthrough; I am wondering if mutability of the array is permitted? If it is, the third solution is for sure a great one; but if it's not, or you need to do "clean up" afterwards, the HashSet is probably your best bet.
I was thinking about this... Was trying to think of a way to use counting sort on a fresh array, but figured that the -ve trick might work given all numbers positive. Only question would be whether my method can mutate the array or not - if it should be read only we'll need O(n) space after all.
You could simplify the second solution a little bit by just doing this inside the for loop: if(!seen.add(a[i])) return a[i]; Set in java returns false if it already contains the value.
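A minimal sketch of the full simplified second solution using that add() return value (class and method names are mine):

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDup {
    static int firstDuplicate(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int v : a) {
            if (!seen.add(v)) return v; // add() returns false if already present
        }
        return -1;
    }
}
```

This does one hash lookup per element instead of the two that a contains-then-add pattern costs.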
I'm writing this without seeing the answer but it could be solved fairly easily. When we're going through the elements of the array, we push THE FIRST element we find in the given array to a STACK OR A VECTOR. Now for the next elements, we first check if the element exists in the vector already, if it does, then it's the answer. If it doesn't, we push it to the vector and continue doing it for all the elements. If no such duplicate is found in the vector, we can say there's no duplicate. This is a fairly efficient algorithm isn't it?
If it's a stack or simple vector, you'll have to iterate through every stored value each time you come across a new element in the array. On the second array element, you'll have to search one stack element; on the 27th array element, you'll have to search through 26 stack elements.

Size of array : stack searches
2 : 1
3 : 1 + 2
4 : 1 + 2 + 3
5 : 1 + 2 + 3 + 4
6 : 1 + 2 + 3 + 4 + 5
7 : 1 + 2 + 3 + 4 + 5 + 6

This is a recursive addition problem; its time complexity is (n^2 - n) / 2, so for sufficiently large n it's n^2 in time complexity. With a binary search tree and an array of truly random numbers, your BEST search complexity is log base 2 of n, since a binary search tree ideally eliminates half the possibilities with each comparison (see radix search). This assumes a completely balanced tree, not a degenerate one that basically looks like a linked list. You'd still need to search this tree n times for an array of size n, so you'd end up with n * log base 2 of n time complexity (n log n) in your best scenario.
True, the third solution does use the least memory, but the code is not self-explanatory and not futureproof. What if for some reason the code needs to process negative numbers too? I would not use #3 in my code.
The first thing that came to my mind was indeed a hashmap or hashset. Never would I have thought of that third solution. It actually cracked my brain a little.
You can do it without calling any math libraries with basic integer arrays too (this code is in C# but very easy to translate into Java). Passes all tests. int[] duplicates = new int[a.Length];
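The snippet above is cut off after its first line, so here's a hedged Java translation of where the counting-array idea seems to be going; everything past the array declaration is my assumption about how it continued:

```java
class CountArrayDup {
    static int firstDuplicate(int[] a) {
        int[] duplicates = new int[a.length + 1]; // one counter per possible value
        for (int v : a) {
            if (++duplicates[v] == 2) return v;   // second sighting: first duplicate
        }
        return -1;
    }
}
```

Same O(n) time and O(n) extra space as the boolean-array variants elsewhere in the thread, just with counters instead of flags.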
Explanation of the 3rd solution: basically we're putting a garbage value at the index location. If the index already holds garbage, the current element is the duplicate we're looking for. We use a negative number instead of a random garbage value because we don't want to lose the original value, which still has to act as an index.
While reading the problem statement I was like, there’s got to be some hint in why the values are always between 1 and length of the array. The third solution blew my mind!
One could come up with some sort of optimisation in that brute-force approach by making the observation that the program should terminate if i >= min_second_index.
When I heard the explanation of the problem I tried it myself in JavaScript. I found it extremely easy even though I just started learning about time complexity and algorithms 4 days ago. I used JavaScript objects for this because they're basically hash tables in disguise. So I added keys to the object with the value true. I looked up the current element of the array in the object: if it returned true, I would return the duplicate value and the index; if not, I would add the key to the object with a value of true. Here's the code:

//Finding duplicate
async function findFirstDuplicate(array) {
  const values = {}
  for (let i = 0; i < array.length; i++) {
    if (values[array[i]]) return [array[i], i]
    values[array[i]] = true
  }
  return -1
}
I did some research and apparently you can also use javascript Map and Set because, like objects, they also retrieve data at a time complexity of O(1).
So I'm really curious if you get these kind of questions during like the on-site interviews? Like these easy level questions seem really easy while the medium ones seem much harder.
The third solution is cool, though I expected something greater. Like something linear without saving the array, maybe with the usage of mod and div division, I dunno.
But isn't .contains() an O(log n) operation? As of Java 8 the hashset's buckets are backed by a tree structure, and traversing it to find whether the seen set contains your int value adds time complexity. So it might actually be O(n log n) time complexity.
If what you're saying is true, a simple fix would be to use a regular array, A[ ], of the same size with all values initialized to 0. For each value n you come across in the original array, check A[n-1]. If it's 0, store n there. If it's not zero, return n.
Seeing that we are searching for the first duplicate, why do you need a minimum index? You just return it as soon as you find it, and outside the for loops you return -1.
Just looking at the problem, my first thought was to use a simple array indexed from 0 to 9. Of course, it won't work if the values aren't single digits or if they're floats. In that case I would probably use a hashset first.
That's amazing problem solving! One question: won't the array be mutated to have negative values after you're done? So if you were to run the check again, wouldn't your answer be wrong because some of the values would already be negative? The only way I can think of to solve that is to create a duplicate of the array in memory, but then your space complexity goes up to O(2n), right? Sorry if I'm way off base here, I don't actually know Java that well...
I actually had this question in one of my phone screen interviews. I did the double for loop as a base solution, then the hash solution, then a third: sort the array first, then traverse once to find adjacent equal elements (O(N log N) but no extra space). Then the interviewer pushed for a better solution, and I came up with the third solution shown in the video, though at the end I forgot the minus one for indices starting at zero, so some test cases were wrong. I did point it out, but I ran out of time. I thought that was good enough, since I found all four solutions, with the last one just having a hiccup at the end from not having enough time to debug and fix, but I got the rejection nonetheless.
Wouldn't sorting it not necessarily return the right answer? In his example, if you sorted it you'd return the Lowest duplicate (2), but not the First duplicate (3)
My first solution is a set<int> of duplicates: for each element in the array, ask if the value is in the set. If the answer is true, that's the solution; else insert the value.
Cool! I basically came up with the second solution on my own but I didn't know what hash sets are so I used a boolean array instead. I'll have to look into hash sets...
Shouldn't we be able to exit the loop if i==min_second_index ? Seems unnecessary to continue looping, unless I'm missing something (it's late and I'm tired :P ) edit: oh, video's not over... there's more, lol
So, according to the internet, the HashSet contains() method actually has a time complexity of O(log(n)), because collisions are treated with a tree or something. Let's say mutating the array is permitted. Instead of using a hashset, we could simply create a second boolean array of size array1.length. Then, for each value in array1: if (array2[value-1] == true) return value; else array2[value-1] = true; We would still have a second array as big as the first, but the time complexity would be O(n), right? Somebody please correct me if I'm wrong.
How is that last solution not using extra space? It uses N signs, but it cheats by using the sign bits of the given array. If the problem asked to simply return one of the numbers that appears more than once, there is a legitimate O(N) time, O(1) memory solution, but that one is really really tricky.
Wouldn't they want you to make the array back into positive integers before you returned your result? You could mess something up elsewhere changing the numbers like that.
Before watching the solutions I would make a new array and basically make a copy array. As I run through the first array I add the values to the new array. I check each value to see if it's already in the array, if it's in the array it's the first duplicate.
That's basically what the second solution was. It's a decent solution, but you're adding space complexity to the algorithm because you're creating a new data structure to keep track of the values from the initial array. The third solution doesn't take up additional memory and instead just modifies the original array values in place.
The third solution is brilliant, doubt I’d ever come up with that under interview type pressure.
Dass Bah, I don't fully understand, but I think you're right.
me brain hurts
Damn it took me around 5min to understand what he was doing!
@@dassbah5316 But it depends on the caller whether you get a copy or not (and if it's not a copy, whether it's ok to change it in the first place!).
In the second solution the worst case is actually that the caller created a copy _and_ you create your set/bitmask.
Tbh, if I were the interviewer I wouldn't primarily be looking for a great solution to the task, I would be looking for the interviewee to ask good questions. Like "Are we optimizing for CPU cycles or memory usage?" Or "Do we expect large arrays, or a lot of calls to this function with small arrays?"
Because that third solution is dereferencing the array and calling Math.abs a whole lot, the first solution, despite being O(n^2), will actually run faster up to a certain array length because each individual iteration does less work. O-notation gives an indication of how the algorithm scales, not generally how fast it is.
Both you and the talking head in the video make a whole bunch of assumptions that I, as an interviewer, wouldn't be that happy about. Over-confidence is a way too common characteristic in programmers which I learned to watch out for.
I implemented it as the second way straight up. After watching the video to the end, the Third way is really great
@Thomas Kwok I just started; I implemented it the second way straight up. How do you know the time complexity?
@@younessalmy1997 The generally implemented sort is quicksort, for which the time complexity is O(n log n).
Third solution is amazing. You're using the logic used in a dictionary/hash table solution but saving space by using the same array. Wish I could come up with this kind of solution by myself haha.
Your video on "How to Crack Google Interview (Complete Guide)" is now private.
Can you please post it again. It would be really helpful.
5:24 Just looking at the problem I'd use a hashset and add elements to the set. The hashset only has to be as large as the array length so memory complexity of n and the time complexity would be worst case n as you iterate over the list and check if a value is already within the set if so return that value.
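That hashset approach is short to sketch, here in Python with a built-in set standing in for the hashset (function name is mine):

```python
def first_duplicate_set(a):
    seen = set()
    for value in a:
        if value in seen:   # average O(1) membership check
            return value
        seen.add(value)
    return -1


print(first_duplicate_set([2, 1, 3, 5, 3, 2]))
```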
That's true, but in Java the HashSet add() method returns true or false based on whether the element was actually added to the set or not.
A hashset is an unnecessary data structure here, when a simple array with a tally could accomplish the same task.
@@sfcs3743 that's literally the purpose of the hashset.
@@BrieoRobino Hashsets don't come for free when considering the overhead and O(1+load-factor) complexity. Worse yet, if the implementation doesn't use clever implementations like universal hashing (Java doesn't iirc), the complexity could be theoretically higher. The wording of the question has restricted the cardinality of the universe of keys to be exactly the same as N. Thus, you can use direct addressing with O(N+N) complexity with an array structure to simulate a hashmap/hashset. Model the index as the keys (input) and the content as a tally (ie +1 when seen). With this method you save some overhead that is associated with a typical implementation of a hashset.
@@sfcs3743 not everything is a complexity test. This isn't the 1970s where we have to deal with memory limits imposed by the technology. Any potential overhead from using a hash set is negligible at best. If I were doing the code review of someone that had a solution to this problem that was as over-engineered as his third solution I would suggest a hashset. sometimes readability and maintainability is far more important than saving a tiny amount of time and bits.
The third solution mutates the array, though. Maybe it doesn’t matter, but maybe it does.
Joe Bonez true. In essence it’s adding an N bool array
We can actually solve this by using a BitSet to keep track of whether an index is flagged or not. It's like an array of bools, except each bool is only a single bit in a small array of bytes. The added memory footprint is relatively small and the original array gets to go untouched!
@@otchigal6527 But that means you use linear memory, and the condition was constant memory. That trick method is stupid.
If the problem was not about first duplicate but any duplicate, then I think there's a way of doing it in O(n) time and O(1) space without destroying the input. Not sure if this is possible for finding the first duplicate though...
Given the requirements, it seems like a nasty side effect. A step should be added to reset the values to positive. This method is also not thread safe. Is that a requirement? Who knows. This is why I hate interview questions.
When you first explained the question I immediately knew that it had something to do with the indices, but the only way I could think of was to sort the array and compare each index with the value stored at that index. But that would give a wrong result if there were multiple duplicates. Your solution was amazing!
I kinda thought about the last solution in my head. There's an important drawback to this solution that ultimately makes it *O(n)* space complexity. This is because we need a bit to store that the element has been previously visited. In your case, well two's complement complicates things, but for simplicity's sake assume it has a bit for the sign.
Originally I also did this for unsigned integers; as a way to store the state, I simply added the size of the array to the element and used subtraction/modulo/division to recover the information. Both these methods require the maximum n (array size) to be less than half of the maximum value the container could hold. If it is a 32-bit int, then it has to be signed, or (unsigned max)/2.
2^31 (signed way) = (2^32)/2 (unsigned way)
Thus, there is a stipulation that n cannot exceed the half-value of the element's container making it O(1) for ∀n < (N/2) where N is max number of states where each container's element could hold. Conversely, if we redefined this N to be the size of the array; it would lead to assigning an extra bit for each element. The cost of this space just by itself is logarithmic but the key takeaway is that it has the potential to double the N. The calculation when considering this potential is:
total space:
*O(n)* = O(N) + O(N) Note: I did not add the logarithmic O (log32(n) for 32-bit container) because it would make this recursive.
Δspace (removing original array):
*O(n)* = O(N) + O(1)
since N (max size of element container) = n (array size)
*O(n)* = O(n)
Nick, you explained this so clearly, step by step, asking us to notice the minute details: that the elements in the array are between 1 and arr.length, then making use of that point. Great 🤘
Why is the line height of your comment so much shorter than the rest of the comments?
Mind blown, feeling liberated, some inception level shit. Hats off for clear-cut solution
You can solve it in linear time without extra space and without mutating the array with Floyd's tortoise and hare algorithm.
Floyd's wouldn't work for input [2, 1, 3, 3, 5, 2]
@@sfontesv Yeah, it'd find the 2nd duplicate, my bad. Was thinking of any duplicate.
A small comment on 9:25
Being O(N^2) doesn't mean that it takes you N^2 time to do it. It means that your growth rate is quadratic. The time taken could literally be anything, but the point is: if you double the input size, the time taken (whatever it was, even 0.000001 ms) will not double, it will go up by a factor of 2^2 = 4, i.e. quadruple.
Quadruple the input size and the time taken will be the original time times 16.
I went through a year or 2 of being taught this in college and never once understood, and it took not even 5 minutes to explain it in a way this easy to understand.
I've seen a few responses saying to just sort the array. Here's why you can't.
The problem requires you to find the 'first duplicate' in the array.
If you have arr = [3, 3, 1, 1, 1, 1]. you can obviously tell that the first duplicate is 3. If you sort the array [1, 1, 1, 1, 3, 3], the first duplicate is now 1. The problem doesn't want to know if there's ANY duplicate. It specifically wants to know which set of numbers in the original array produces a duplicate first.
Sorting the numbers completely disregards that requirement.
i really liked this way of explaining through drawing.Continue this style for leetcode problems as well.
Went form an external (hash) to an internal (1 to N values) mapping. Nice.
I think that in the first solution it would've been more optimal if the second pointer would loops only until the min_second_index, so i+1 < j < min_second_index , then if (a[i] == a[j] ) { min_second_index = j }
You're right, good observation
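A sketch of that tweak (in Python, names are mine): the inner scan stops at the best second index found so far, and we keep tracking the minimum because a later i can still produce an earlier second occurrence.

```python
def first_duplicate_bruteforce(a):
    n = len(a)
    min_second = n                     # index of earliest second occurrence so far
    for i in range(n):
        # No point scanning past the best second index already found.
        for j in range(i + 1, min_second):
            if a[i] == a[j]:
                min_second = j         # earlier second occurrence found
                break
    return a[min_second] if min_second < n else -1


print(first_duplicate_bruteforce([2, 1, 3, 5, 3, 2]))
```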
I would argue that making the numbers negative is actually using O(n) space. There's a much more intuitive solution that doesn't require modifying the elements -- only swapping them. Go through the array and put each item "in place". Unless it's already in place, if you try to put it in place and there's already the right element there, that's the answer.
The solution below is similar but a little different conceptually, and easier to understand: since the input list cannot have negative values (the elements are at least 1), keep making the elements negative as you move through the loop, and if you already find the negative of the current element, return it. (Solution below in Python.)
def first_duplicate(in_list):
    for i in range(len(in_list)):
        if -in_list[i] in in_list:  # caveat: 'in' scans the whole list, so this check is O(n)
            return in_list[i]
        else:
            in_list[i] = -in_list[i]
    return -1
You will not get around O(n) space complexity in either the second or third case, making the algorithm and analysis in the third case incorrect.
If you program this in a language with fixed-width integer types, the integers will be either signed or unsigned, so it could happen that you won't actually have the space to make a number negative (e.g. if you have unsigned 8-bit ints and an array that fits >127 integers, which is a valid input here). In that case you need to expand the data type to a bigger size (in this case 16 bit), and that would mean Θ(n) space complexity if you expand every integer. If you use a dynamically typed language it will probably expand it internally, leading to O(n) space complexity, and if you don't consider this, the algorithm (or your analysis) is incorrect.
In summary, you still need space to save the sign, and that would lead to incorrect results if the space is just not there (see the given example).
The reason they give you an array with entries smaller than or equal to the array size is exactly so that you don't have to hash the numbers to begin with; a HashSet is overkill. They probably just want you to create an array of size n where you use the number as an index to check whether you already encountered it, which again is Θ(n) space complexity.
I just made a comment of mine on this. Should have gone through the comments section before typing it.
The last one is called negative marking. This way, you integrate the "seen" check into the index itself. One caveat: here you do need to deduct one. The elements are in [1, n] and the array has exactly n slots (indices 0 to n-1), so you mark nums[value - 1] negative; if that slot is already negative, you realize you met that number before. (Only if the array had n+1 slots for values in [1, n] could you skip the -1 and mark nums[value] directly.)
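For reference, a minimal sketch of the negative-marking approach as the video presents it (keeping the -1 offset, since the array here has exactly n slots for values 1..n; the function name is mine):

```python
def first_duplicate_negate(a):
    for x in a:
        idx = abs(x) - 1       # map value -> slot (values are 1..n)
        if a[idx] < 0:
            return abs(x)      # slot already negative: value seen before
        a[idx] = -a[idx]       # mark the slot as seen
    return -1


print(first_duplicate_negate([2, 1, 3, 5, 3, 2]))
```

Note this mutates the input; callers may need the values restored with abs() afterwards.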
Before proceeding with the third solution, it's worth asking whether or not you are allowed to change the input array.
The simplest and cleanest solution with regards to the constraints (if we're unsure whether the original array is allowed to have values that violate the initial constraints) is either using an n-length array of booleans with k-1 as index or a bitset of length n to save even more space.
I like the idea of using an intrinsic component of numbers like negatives to store boolean values to basically treat the data structure you already have as a hashset (without the modulo and collision detection)
My idea is similar to the 3rd method - but using bitwise checks instead. Initialize an integer with binary length equal to the array length, and use each bit to represent the "seen" status of that position. E.g. 1000 = Only 1 of 4 has been seen; 0110 = 2 and 3 of 4 have been seen. When reading through the array, check the bit at that position. If 1, that's your first duplicate, else flip the bit to 1. Space required would be number of bits equal to length of list, not sure how that would compare to a hashset.
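A sketch of that bitwise idea in Python, where an arbitrary-precision int stands in for the fixed-width "seen" integer (function name is mine):

```python
def first_duplicate_bitmask(a):
    seen = 0                        # bit k set <=> value k+1 already seen
    for value in a:
        bit = 1 << (value - 1)
        if seen & bit:
            return value            # bit already set: first duplicate
        seen |= bit                 # flip the bit on for this value
    return -1


print(first_duplicate_bitmask([2, 1, 3, 5, 3, 2]))
```

In a language with fixed-width ints you'd use a BitSet or an array of words instead, but the space is the same idea: one bit per possible value.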
I was able to come up with the first and the second solution by myself but the third one simply blew my mind! Thank you so much! :D
Man keep it up. This is the single most useful content for tech professionals!
For anyone unable to get the third answer: Nick has another video where a similar question is solved with this exact approach. Check "return duplicates in integer array". There we return all the duplicates, and here we return the first one found. The third approach is really beautiful.
The last approach of solving the problem in O(n) is similar to finding the smallest missing positive number in an unsorted array. (The only difference is that there, after looping over all elements and marking numbers negative, you return the first index+1 where a positive number is found; e.g. if the first positive number is 5 at index 3, then 3+1 = 4 is the smallest missing positive number.)
It's the same in terms of time, but not in terms of space: it's not using any extra space.
I just want to say that I love your videos. I'm preparing for coding interviews now and this is so helpful. I'm going to buy your merch to support. =)
Yea basically, since the values of the array are between 1 to n, means that every value is a valid index if you subtract 1 from it, and if there are duplicates, they are always gonna reference the same index, it's just a matter of figuring out how to track which index was already referenced in constant space.
The thumbnail gave the third solution away!
Also, I assumed the array would be unsigned because negative numbers don't make sense in it, and modifying your inputs is generally a bad practice. Plus, because of CPU architecture details (branch prediction and cache size) I suspect a solution using a bitset would actually be faster.
I was also thinking about why we hadn't used the restriction of 1 <= a[i] <= a.length.
The last solution was amazing
and not made by him
Nobody is arguing that he came up with the solution. He is just helping us out to understand it. Noble cause.
@@PoulJulle-wb9iu why so bitter
@@ralexand56 lol
In a real project, things like the 3rd solution are what add terrible, terrible bugs to the code base... Either it's too complicated, so other devs won't understand it and will slowly break it, or it's straight up buggy; e.g. what if the caller of that function didn't expect the array to get modified? Also it doesn't even necessarily save memory, since you couldn't use a data type that stores only positive integers in the array. So that bit was wasted there instead of in a proper bitset.
The 2.5 solution is to use an array of booleans. If the boolean at the "value-1" index is already true, you're done!
In some languages it's expensive (Java) unless you use BitSet (which maps the nth bit to the correct bit within bytes).
In other languages you can use pointer arithmetic and some bit twiddling! (Doable in Java too, but why stray from the existing solution?)
Gonna be honest, I thought the fact that the array couldn't contain a value larger than its length was a red herring. I came up with the second solution (the first one seemed unnecessarily complex and I would never have thought of it) on the spot. But that third solution is really clever. The only possible downside is that you may need to go back over the array to convert the negatives back to positive (but given the scope of the problem presented, that wouldn't matter at all).
We don't need to use the contains() method to check if an element is present: the add() method itself returns true or false depending on whether the element was added successfully or not. When we get the first false from add(), we can skip the remaining elements.
Constant time is not the same as no time. It means the time to access an element does not increase as more items are added, but it will always take non-zero time.
This third solution is not as good as you may think. Either the caller has to be okay with the function mutating the array (which is rarely the case) or you have to copy it (but then you lose the only benefit of this approach over just using a hashset). Basically, it feels hacky and not useful.
I think a good compromise would be to use the same principle the solution is using, and use a BitSet instead. It would still have O(N) space complexity, but it would use orders of magnitude less space than the set (depending on sizeof(int))
i came up with something similar to the second. with my current level of knowledge there's no way i would have thought of the third in any reasonable amount of time, and even after figuring out how it works i feel like i'd still have trouble actually implementing it. i looked at the solutions and even after writing the process on paper i didn't understand how/why it worked until i saw this video, so thanks. i gotta get on the grind to get familiar with the patterns and nuances...
Don't use a hash set when the problem clearly states that the input array has values greater than 0 and not greater than the length of the array.
Just use an N+1 array of booleans to keep track of whether or not the value n at index n has already been seen or not and iterate from left to right.
Index 0 in found doesn't mean anything in this solution, but it keeps it clean.
def firstDuplicate(arr):
    found = [False for i in range(len(arr) + 1)]
    for n in arr:
        if found[n]:
            return n
        found[n] = True
    return -1
Isn’t the first solution n log n instead of n^2? The distance the second pointer needs to travel gets smaller with every movement of the first pointer. The second pointer isn’t traversing the entire list every time.
The pointer travels N - 1 indexes at first, then N - 2, then N - 3...
So it traverses (N - 1)+(N - 2)+(N - 3)+...+1 indexes, which adds to (N^2 - N)/2, giving us O(N^2).
Edit: I wrote multiplication of terms instead of sum of terms, thank you @Petar Jovovic for pointing it out
U normally see logs in algorithms that take a divide and conquer approach, such as merge sort. Since u are cutting your input in half with every iteration, the time complexity involves log. Its like the opposite of n^2 where u double the iterations required.
@@hazaelbatista3241 You mean (n-1) + (n-2)+.....1 right?
@@petarjovovic308 Wow, nice catch, I was kinda sleepy and made that huge mess :(
Will edit, thanks!
since numbers are limited by size of the array, I would do a second array of true/false values with the same length as the input array.
storage_arr[input_arr[i] - 1] = true
I mean it's not as good as your final solution, but it only adds a tiny bit of spatial complexity and should be just as fast.
Third solution is brilliant, but I'm not sure that would work in many languages as arrays are almost always pass by reference, which means you're really screwing up someone's array, so in order to unscrew it you'd have to go back and make everything positive again. Now, if they said it was some data structure like an arraylist or something, that'd be different because you can pass the list object by value in many languages.
really digging this style, very thorough but nicely presented
Bro your way of explaining logic is really cool. :)
I'm not really a programmer, but I was proud I came up with the exact second answer pretty quick!
you came up with a hashset nto being a coder? then you are in math, but whatever stupid
Bullshit
@@PoulJulle-wb9iu u dont need much experience to know hashsets. After just 1 or 2 weeks of learning programmation u can come up with this
@@falseee4445 no shit tard
@@PoulJulle-wb9iu ?
Here's a number-theoretic algorithm for this problem (the algorithm is not very efficient and requires a pre-computed resource but I think it's an interesting one):
[1] Let P(x) be a function that returns the x-th prime number. This function can be pre-computed. i.e. The dictionary with entries (x, P(x)) can be pre-computed.
[2] Let Product = 1. Let 'Array' be the array of numbers.
[3] Product = Product * P(Array[0]+1); // if Array[0] = x, multiply 'Product' with the P(x+1) which is the (x+1)-th prime number.
[4] LOOP (for i from 1 to n-1):
[4a] if (Product % P(Array[i]+1)) != 0: Product = Product * P(Array[i]+1);
[4b] If (Product % P(Array[i]+1)) == 0: return Array[i]; // If Product is divisible by P(Array[i]+1), then Product already has P(Array[i]+1) as a factor, which means Array[i] has been encountered before.
[5] return -1;
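A runnable sketch of those steps (function names are mine; the trial-division generator is a stand-in for the precomputed P(x) table, and steps [2]-[4] are folded into one loop):

```python
def primes_up_to_nth(k):
    # First k primes by trial division -- stand-in for the precomputed P(x) table.
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def first_duplicate_primes(a):
    P = primes_up_to_nth(max(a) + 1)   # P[x] is the (x+1)-th prime
    product = 1
    for value in a:
        p = P[value]                   # prime assigned to this value
        if product % p == 0:
            return value               # prime already a factor: seen before
        product *= p
    return -1


print(first_duplicate_primes([2, 1, 3, 5, 3, 2]))
```

As the comment says, not efficient (the product grows very fast), but a neat number-theoretic encoding of the "seen" set.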
For my solution I went through the array and tried to swap values into their proper place (index = value - 1) one by one, always holding onto the number we displaced and attempting to swap that number into its proper place. If that place already has the proper value, the swap is not completed and we know we have a duplicate. It does use a temporary variable for the swapping though, so that's O(1) extra space. The data values aren't changed though, which might be helpful, idk. You do need to make sure you're skipping over numbers that are already in their spot at the start.
Your solution probably breaks down for: [5, 1, 1, 6, 6, 6] where your algorithm would return 6 when it should be 1?
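A quick sketch of the swap idea (my own reconstruction, not necessarily the commenter's exact code) shows the failure mode on that input: the swapping can surface a later duplicate first.

```python
def first_duplicate_swap(a):
    i = 0
    while i < len(a):
        v = a[i]
        if v == i + 1:
            i += 1                              # already in place, move on
        elif a[v - 1] == v:
            return v                            # target slot already holds v: duplicate
        else:
            a[i], a[v - 1] = a[v - 1], a[i]     # put v in its place, retry slot i
    return -1


print(first_duplicate_swap([5, 1, 1, 6, 6, 6]))
```

On [5, 1, 1, 6, 6, 6] this returns 6, even though the first duplicate is 1, so it only answers "find any duplicate".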
That only works for a mutable array, though. If the language doesn't allow changing the input parameter, it might be better to create a new array for the marking and checking.
It's like you're using the array itself as a boolean array of flags, one per integer, marking whether it has been seen or not. And that's okay because you can clean up by running abs() over the array afterwards.
I almost found the last solution by myself. I didn't think of using the sign trick; instead I swapped the numbers to their indexes, but only swapping to indexes smaller than the current index, while checking each time whether the number I was about to swap was the same as the current one; if it was, return it.
in retrospect it may be the best solution for the same problem with unsigned integers
You could further improve the last solution by writing a[idx] *= -1 instead of looking the value up again.
Haven't watched the video yet, but would it be possible to use a for loop to scan through the entire array and, while scanning, look for repeated values, then save the indexes of those values into a new array and compare the indexes to see which is lower, and let that be the final answer? For example [2,1,3,5,3,2]: run the loop, int[] indexOfRepeated is [4, 5], if (4
Brilliant! The third solution is essentially the second solution except you are creating the hash table. The key is the value in the array (n), the hash is n - 1, and the value is True or False (-/+).
Array slicing may come in handy for this problem. I agree the third solution is clever, but it’s not straightforward. Why not iterate through the list and for each number see if it exists in the subset of the list up to the previous element?
LOVE your latest videos where you start drawing and explaining so much better! You're the BEST !!! Thank you
I really like your approach of explaining with 3 different methods
I did come up with 3rd solution myself with 15 minutes of thinking but the point is, just because you somehow saw though an unintuitive solution once doesn't mean you will always be able to.
Sometimes you feel smart, sometimes you feel frustrated.
Interesting 3rd approach, but in Swift a function parameter can't be changed, as it's a constant. So I would have to copy it into a new variable, which uses extra space. I could use an inout parameter, but then it would mutate the caller's array, and I'm not sure that improves on the 2nd solution. Still, it's a good conversation one can bring up in an interview.
Hey Nick,
In the second solution, when you use the seen.contains() method, doesn't that method also use a loop internally?
If so, the time complexity would be O(n²), since seen.contains() is already inside a loop.
HashSet contains() is O(1) on average; ArrayList contains() is O(n), which inside the loop would make it O(n²).
The last solution is easy if you have seen it before. In short: we are hacking the array to be the hash table itself, and the index mapping acts as the hash function.
I spent around 30 min before solving it with the brute-force approach. After watching this video... Nick is a legend.
Clever! My initial instinct was to create a bool array (similar to the hashset solution, except I have more control over the space it uses) but I didn't think about using the extra bit they leave us by specifying that they use only half of the signed int space.
I had the same idea, solution 3 is pretty clever though. It basically fakes a bool array by using negation.
Doesn't the second solution still have complexity n²? Because the contains() function must iterate through all the elements of the seen collection; over the length of the original array this would result in n²/2 element comparisons. Only the third solution is actually linear.
The contains() method of a HashSet is O(1) in the average case, since it determines the bucket position from the hashCode() method. However, collisions are handled with a tree (according to the internet), so in the worst case it's O(log(n)), which makes the whole thing O(n log(n)).
The third solution makes a lot of sense if you think of the array as a function mapping whole numbers to integers. In this case, the domain and range of the function are basically the same. You want to prove that the function is not one-to-one.
In the brute force approach, we should just return the element the moment we find a duplicate; why let the loop run the full n²?
The n² is the worst-case scenario: you do ~n² comparisons when there are no duplicates, or the duplicates are at the end of the array.
Could you please let me know what you are using to illustrate the algorithm using some arrows on board??
Photoshop
Great walk through; I am wondering if mutability of the array is permitted? If it is the third solution is for sure a great one; but if it's not or you need to do "clean up" afterwards the Hashset is probably your best bet.
The second solution is nice. More readable and fast enough.
I wish that he'd post more of these "coding interview question" videos and add them to a series and/or playlist.
Really good videos.
I was thinking about this... Was trying to think of a way to use counting sort on a fresh array, but figured that the -ve trick might work given all numbers positive. Only question would be whether my method can mutate the array or not - if it should be read only we'll need O(n) space after all.
You could simplify the second solution a little bit by just doing this inside the for loop:
if(!seen.add(a[i])) return a[i];
Set.add() in Java returns false if the set already contains the value.
I'm writing this without seeing the answer but it could be solved fairly easily.
When we're going through the elements of the array, we push THE FIRST element we find in the given array to a STACK OR A VECTOR. Now for the next elements, we first check if the element exists in the vector already, if it does, then it's the answer. If it doesn't, we push it to the vector and continue doing it for all the elements.
If no such duplicate is found in the vector, we can say there's no duplicate.
This is a fairly efficient algorithm isn't it?
If it's a stack or simple vector, you'll have to iterate through each value every time you come across a new element in the array.
On the second array element, you'll have to search one stack element. On the 27th array element, you'll have to search through 26 stack elements.
Size of array : stack searches
2 : 1
3 : 1 + 2
4 : 1 + 2 + 3
5 : 1 + 2 + 3 + 4
6 : 1 + 2 + 3 + 4 + 5
7 : 1 + 2 + 3 + 4 + 5 + 6
This is a triangular sum: it adds up to (n² − n)/2 comparisons. Therefore, for sufficiently large n, it's n² in time complexity.
With a binary search tree and an array of truly random numbers, your BEST time complexity is log base 2 of n since a binary search tree, ideally, eliminates half the possibilities with each comparison: see radix search. This assumes a completely balanced tree and not a naive one which basically looks like a queue.
You'd still need to search this tree n times for any array size n. Thus, you'd end up with n * log base 2 n time complexity (nlogn) in your best scenario.
True, the third solution does use the least memory, but the code is not self-explanatory and not future-proof. What if for some reason the code needs to process negative numbers as well? In my code I would not use #3.
First thing that came to my mind was indeed a hashmap or hashset. Never would have thought of that third solution. It actually cracked my brain a little.
Dude your explanation is fantastic
You can do it without calling any math libraries with basic integer arrays too (this code is in C# but very easy to translate into Java). Passes all tests.
int[] duplicates = new int[a.Length];
for (int i = 0; i < a.Length; i++)
{
    if (++duplicates[a[i] - 1] > 1)
    {
        return a[i];
    }
}
return -1;
Explanation of the 3rd solution: basically we are putting a garbage value at the index location. If the index already holds garbage, this means we've hit the desired location. We use a negative number instead of a random garbage value because we don't want to lose the original value, which still acts as an index.
While reading the problem statement I was like, there’s got to be some hint in why the values are always between 1 and length of the array. The third solution blew my mind!
One could come up with some sort of optimisation of that brute force approach by observing that the outer loop can terminate once i >= min_second_index.
When I heard the explanation of the problem I tried it myself in JavaScript. I found it extremely easy even though I just started learning about time complexity and algorithms 4 days ago. I used javascript Objects for this because they’re basically hash tables in disguise. So I added keys to the object with the value of true. I called the current iteration of the array from object, if it returned true, I would return the duplicate value and the index, if not, I would add the key to the object with a value of true.
Here’s the code:
//Finding duplicate
async function findFirstDuplicate(array) {
  const values = {}
  for (let i = 0; i < array.length; i++) {
    if (values[array[i]]) {
      return { value: array[i], index: i }
    }
    values[array[i]] = true
  }
  return -1
}
I did some research and apparently you can also use javascript Map and Set because, like objects, they also retrieve data at a time complexity of O(1).
Omg I just saw his last solution and it’s genius!
So I'm really curious if you get these kind of questions during like the on-site interviews? Like these easy level questions seem really easy while the medium ones seem much harder.
The third solution is cool, though I expected something greater. Like something linear without saving the array, maybe with the usage of mod and div division, I dunno.
what you talking bout lol?
@@PoulJulle-wb9iu He is talking about not modifying the original array, sometimes the input is not mutable.
But isn't .contains() an O(log n) operation? As of Java 8, a HashSet bucket with many collisions is backed by a tree structure; traversing it to find whether the seen set contains your int value adds time complexity. So it might actually be O(n log n) overall.
If what you're saying is true, a simple fix would be to use a regular array, A[ ], of the same size with all values initialized as 0. For each value, n, you come across in the original array, check the value at index A[n-1]. If it's 0, store n. If it's not zero, return n.
Seeing that we are searching for the first duplicate why do you need a minimum index? You just return it as soon as you find it and outside the for loops you return -1
Just looking at the problem my first thought was to use a simple array indexed from 0 to 9. Ofcourse it won't work if the values were not single digit or if they were floats. In that case I would probably use a hashset first.
That's amazing problem solving! One question, won't the array be mutated to have negative values after you're done? So, if you were to run the check again, won't your answer be wrong because some of the values will already be negative? The only way I could think to solve that is to create a duplicate of the array in memory, but then your space complexity goes up to O(2n), right? Sorry if I'm way off base here, I don't actually know Java that well…
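For concreteness, here is a sketch (in Java, with illustrative names) of the sign-marking approach under discussion, with an extra O(n) pass at the end that flips every value back to positive, which addresses the mutation concern without copying the array. It assumes the values are in the range 1..array.length:

```java
import java.util.Arrays;

public class FirstDuplicate {
    // Sign-marking approach: visiting value v flips the sign at index v-1.
    // A final pass restores the array, so the caller sees it unchanged.
    // Assumes all values are in the range 1..a.length.
    static int firstDuplicate(int[] a) {
        int result = -1;
        for (int i = 0; i < a.length; i++) {
            int value = Math.abs(a[i]);
            if (a[value - 1] < 0) {       // already visited: first duplicate found
                result = value;
                break;
            }
            a[value - 1] = -a[value - 1]; // mark as visited via the sign bit
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = Math.abs(a[i]);        // undo the sign marks
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {2, 1, 3, 5, 3, 2};
        System.out.println(firstDuplicate(a));  // 3
        System.out.println(Arrays.toString(a)); // [2, 1, 3, 5, 3, 2] (restored)
    }
}
```

The restore pass keeps the overall time at O(n) and the extra space at O(1), at the cost of one more sweep over the array.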
I actually had this question asked in one of my phone screen interviews. I did the double for loop as a base solution, then the hash solution, then a third one: sorting the array first, then traversing once to find the next equal element (N log N but takes no extra space). Then the interviewer pushed for a better solution, and I came up with the third solution as shown in the video. Though at the end I forgot the minus one for the index starting at zero, so some test cases were wrong. I did point it out, but I ran out of time. I thought that was good enough since I found all four solutions, with the last one just having a hiccup at the end from not having enough time to debug and fix, but I got the rejection nonetheless.
Wouldn't sorting it not necessarily return the right answer? In his example, if you sorted it you'd return the Lowest duplicate (2), but not the First duplicate (3)
@@QualifiedUmbrella Yes, sorry to clarify. In my particular problem, the version was mentioned as only one duplicate.
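A short Java sketch (illustrative names) of the sort-first variant, working on a copy so the input stays intact. It illustrates the caveat raised above: sorting yields the smallest duplicated value, which for {2, 1, 3, 5, 3, 2} is 2, even though the first duplicate in the original order is 3:

```java
import java.util.Arrays;

public class SortScan {
    // Sort a copy, then scan adjacent pairs. O(n log n) time.
    // Note: this finds the smallest duplicated value, which is not
    // necessarily the *first* duplicate in the original order.
    static int smallestDuplicate(int[] a) {
        int[] sorted = a.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] == sorted[i - 1]) return sorted[i];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(smallestDuplicate(new int[]{2, 1, 3, 5, 3, 2})); // 2
    }
}
```

So this variant only matches the video's problem when there is exactly one duplicated value, as the reply above clarifies.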
My first solution is a set<int> of duplicates: for each element in the array, check if the value is in the set; if the answer is true, that's the solution, else insert the value.
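That idea in Java might look like the sketch below (names are illustrative). HashSet.add returns false when the element is already present, which folds the membership check and the insert into one call:

```java
import java.util.HashSet;
import java.util.Set;

public class SetScan {
    // Walk the array once, returning the first value already seen.
    // O(n) average time, O(n) extra space for the set.
    static int firstDuplicate(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int value : a) {
            if (!seen.add(value)) return value; // add() is false if already present
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(new int[]{2, 1, 3, 5, 3, 2})); // 3
    }
}
```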
Cool! I basically came up with the second solution on my own but I didn't know what hash sets are so I used a boolean array instead. I'll have to look into hash sets...
"Constant time means it is not gonna take any time at all" LMAO 😂🤣 10:53
I mean, relatively speaking
I wonder if the second solution is really that linear, because in fact you still have to look each value up among the seen numbers.
But I don't know much about Java.
That's not about Java but about the hash set; it's the same in every language.
Shouldn't we be able to exit the loop if i==min_second_index ? Seems unnecessary to continue looping, unless I'm missing something (it's late and I'm tired :P )
edit: oh, video's not over... there's more, lol
Love your last solution. Very clever.
So according to the internet HashSets contains() method actually has a time complexity of O(log(n)) because collision is treated with a tree set or something.
Let's say mutating the array is permitted. Instead of using a hashset we could simply create a 2nd boolean array of size array1.length. Then we could check:

for (value in array1)
    if (array2[value - 1] == true)
        return value;
    else
        array2[value - 1] = true;

We would still have a 2nd array as big as the 1st one, but the time complexity would be O(n), right?
Somebody pls correct me if I'm wrong
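A runnable Java version of that sketch (illustrative names), assuming the values lie in the range 1..a.length so value - 1 is always a valid index. Note this does not actually need to mutate the input:

```java
public class BooleanMark {
    // A second boolean array serves as the "seen" set: O(n) time,
    // O(n) extra bits instead of a HashSet's boxed Integers.
    // Assumes all values are in the range 1..a.length.
    static int firstDuplicate(int[] a) {
        boolean[] seen = new boolean[a.length];
        for (int value : a) {
            if (seen[value - 1]) return value;
            seen[value - 1] = true;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(new int[]{2, 1, 3, 5, 3, 2})); // 3
    }
}
```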
How is that last solution not using extra space? It uses N signs, but it cheats by using the sign bits of the given array. If the problem asked to simply return one of the numbers that appears more than once, there is a legitimate O(N) time, O(1) memory solution, but that one is really really tricky.
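For reference, the tricky O(n)-time, O(1)-space solution alluded to here is usually Floyd's tortoise-and-hare cycle detection. A sketch (illustrative names), assuming the array holds n+1 values in the range 1..n so a duplicate must exist; it returns a repeated value, not necessarily the first duplicate, and does not modify the array:

```java
public class Floyd {
    // Treat a[i] as a pointer from index i to index a[i]; a repeated
    // value is the entrance of a cycle in that pointer graph.
    // Assumes n+1 values in the range 1..n. O(n) time, O(1) space,
    // array untouched; returns *a* repeated value.
    static int findDuplicate(int[] a) {
        int slow = a[0], fast = a[0];
        do {                       // phase 1: find a meeting point inside the cycle
            slow = a[slow];
            fast = a[a[fast]];
        } while (slow != fast);
        slow = a[0];               // phase 2: walk to the cycle entrance
        while (slow != fast) {
            slow = a[slow];
            fast = a[fast];
        }
        return slow;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicate(new int[]{1, 3, 4, 2, 2})); // 2
    }
}
```

It is indeed "really tricky": the correctness argument leans on the pigeonhole principle and on the cycle-entrance property of Floyd's algorithm.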
Wouldn't they want you to make the array back into positive integers before you returned your result? You could mess something up elsewhere changing the numbers like that.
Thank you Nick, this makes way more sense than the tortoise and hare method!!!
Before watching the solutions I would make a new array and basically make a copy array. As I run through the first array I add the values to the new array. I check each value to see if it's already in the array, if it's in the array it's the first duplicate.
That's basically what the second solution was. It's a decent solution, but you're adding space complexity to the algorithm because you're creating a new data structure to keep track of the values from the initial array. The third solution doesn't take up additional memory and instead just modifies the original array values in place.
@@TJHooper123 Is the space added because of the array the same amount of space as added by the data structure used in the video?
DragX Gaming I don't think so? Different data structures take up different amounts of memory. A single character takes up fewer bytes than a hash table.