for those wondering why we can't just use a single digit number to delimit and track the length of the following word, what if word length >9 and takes up double digits! (maybe only my slow brain was confused about this :) )
Great point. Since the constraints are word length < 200, you can also convert the size to a char (ascii 1-256 , greater than 200) then convert that back to an int when decoding
Some people say that LeetCode problems are "useless exercises that real coders never use in their day-to-day job", but, honestly, this one seems quite ueseful. Yes, we are probably never going to be implementing encoding\decoding ourselves, but knowing how it works is a great thing, I think.
great solution! just a note, the `encode` method is O(n**2) because string concatenation copies the entire string, so the result string is being copied over and over n times. To make it O(n), we can use a list to mock a string builder. Each element of the list will be the delimiter and the word. At the end of the method we will join the list into a string and return it: def encode(self, strs) -> str: # O(n) return ''.join( f'{len(string)}#{string}' for string in strs ) in the code instead of using a list we can just use a generator of the encoded strings
@@skyhappy in Python single quotes and double quotes are interchangeable. The for loop inside the join is a generator and comes from list comprehension and functional programming
@@skyhappy There is shortcut for the list creation (initialization, for loop then append) called a generator in python. return ''.join([ f'{len(st)}:{st}' for st in strs]) is the same as: res2 = [] for st in strs: res2.append(f'{len(st)}:{st}') res2 = ''.join(res2) return res2 the OP just forgot to type the square brackets and spaced out the generator syntax into mulltiple lines
This question is weird, but a good alternate solution would be to just store the comma separated length of items at the start of the string before a delimiter. Could also use base 16 numbers to save storage space.
Yeah, did this my first try. Kinda prefer it because it works as an index. If needed, it would be possible to jump to word N based on the sizes in the beginning
Just did it on my first time and found it weird as well. I used dash '-' as a separator and worked, didn't even keep track of anything, just a single for loop and a split.
Pretty damn good. If you think about it that is kind of how UTF8 encoding works. The byte representations always *start* with a particular pattern indicating what is the first byte of a representation. If a character is represented by 4 bytes than the the other 3 also have a distinct pattern to show they belong to the first byte together.
It won't cause an issue because ["12#", "hello"] would be encoded as "3#12#5#hello". The decode algorithm would take the 3 characters after the first "#" and then the 5 characters after the third "#", decoding it back to ["12#", "hello"]. You can add this as a test case to verify.
Just elaborating on what Greg A said but with case [ '12#hello', 'abc', 'weird4#'] because I had the same question but with slightly different words: original words: [ '12#hello', 'abc', 'weird4#'] encoded string : '8#12#hello3#abc7#weird4#' Decoding algorithm: First reads 8# and then takes in the next 8 chars as first word, add word to decoded set Decoded: ['12#hello'] Then reads 3# and then takes in next 3 chars as second word, add to decoded set Decoded: ['12#hello', 'abc'] Then reads 7# and then takes in next 7 chars as third word, add to decoded set Decoded: ['12#hello', 'abc', 'weird4#'] These are the notes I wrote down to clarify things for myself Just a sanity check to confirm having number followed by # in the original words is fine :D
@@misterimpulsismThank you for explaining this. I was thinking the same thing as the original commenter. But it appears these two algorithms in conjunction with one another are fool proof
I was under the impression at first that we could pass in non-encoded strings. In that case, it would break the decode. But that doesn't seem to be the case here.
No it wouldn't because the first 2 characters would be #. Then whem we are finish reading...we are guaranteed to be at a place to read the next #. So even if we have 3#, we will be in a reading state at that point
So I'm a little confused by this approach. Why would you go through the effort of saving the variable length in a number instead of just separating the words with a unique delimeter then using like ''.join(s for s in strs) and s.split('')?
Another approach is we put "#" between every word and then another "#" if we have "#" inside the word. When we're decoding, if we have a single "#", it means it separates the word, however if there's another "#" next to it, it means it was inside a word.
@@benjaminberlin that would turn into 1) neat#co####de which can be read as neat then co##de 2) neat#code## which can be read as neat then code# as well
I'm sure that on the actual interview this would not be a valid solution (or maybe?), but we can split / join by some non-printable ascii character, 0x01 is great class Solution: def encode(self, strs): return '\x01'.join(strs) def decode(self, string): return string.split('\x01')
Here's my solution I implemented before looking at Neetcode's coding part of the video, and just understanding his logic for implementing the algorithm (same TC, just differently written; but slightly worse SC as we store the string): def decode(self, str): i = 0 output = [] while i < len(str): currStrLen = "" currStr = "" while str[i] != "#": currStrLen+=str[i] i+=1 j = int(currStrLen) i+=1 while j != 0: currStr+=str[i] i+=1 j-=1 output.append(currStr) return output
Doesn't this solution depend on none of the strings containing multiple sequences of digit#? e.g. 3#3#3#3#? in which case you wouldn't know which is the delimeter.
No, your string is invalid anyway, you wouldn’t get it by using your encoding method. Remember there’s a reason why you are specifying the length though.
That’s why you are saying the length of the word, even if the word is tu3#tle it doesn’t matter because before it you’d already know the length of that word (having before the word a 5#), so you wouldn’t check the content of that word. You’d just jump into the next delimiter.
Waiting for your videos daily❤️ Can you also make a video on rat in a maze (but it is not on leetcode) as it also frequently asked question? Edited : After seeing your some of the backtracking videos I’ve solved that problem.
Done thanks! Start off the encoded string with an integer that represents the length of first word then a dilemeter. The delimeter is needed here at the beginning and after each word’s length because the length of a word can be > 9 and would need 2 or 3 digits. Example using # as deleimeter Input [“hi”, “testing”] Output “2#hi7#testing” This output string can then be parsed back as list
I did another solution. Instead of using the length of a character and a #, the length of the character can be converted to a single character as the character code. Signature of that function is (number) => character. In Javascript, we have the function String.fromCharCode(string.length). Saves on size
You could use instantaneous codes at the beginning of the encoding to write all the string widths, an easy example is unary code but you could use a Gamma or Delta to be more advanced and use less space. So the final encoding would be You don't need delimiters as the initial numbers are uniquely decodable. It's similar to your solution and maybe overkill, but you can do it.
Shouldn't space complexity be O(n) since we are building up our encoded string that is dependent on the input size linearly. Same thing with decode but we're building up an array linearly dependent on the input size. Smaller input size means a smaller array we're building up so we use less space.
Java version : public class Solution { /* * @param strs: a list of strings * @return: encodes a list of strings to a single string. */ public static String encode(List strs) { StringBuilder result = new StringBuilder(); for(String s : strs){ result.append(s.length()+"#"+s); } return result.toString(); } /* * @param str: A string * @return: dcodes a single string to a list of strings */ public static List decode(String str) { // Write your code here List ans = new ArrayList(); StringBuilder item = new StringBuilder(); int i = 0; int j; while (i < str.length()) { j = i; while(str.charAt(j) != '#'){ j++; } int wordLength = Integer.valueOf(str.substring(i,j)); int startIndex = j+1; // skipping number+# characters length String subStr = str.substring(startIndex,wordLength+startIndex); ans.add(subStr); i = wordLength+startIndex; } return ans; } }
I originally thought that the fixed-length md5(s) hash could be used as the delimiter, because the result of the hash and the possibility of repeated characters in srts are almost zero.
Yes, string concatenation is O(n) which makes the functions O(n^2). You should use a StringBuilder or a list in the loop, and build the string at the end.
why cant we store the index (where the 2 strings are to be seperated) inside an array. for example ["good","morning","neetcode"] after encodeing=>["goodmorningneetcode'] and [3,6,7] while decoding we keep a track of current index at i==3 , we insert d into the current string , push it inside the array , create a new empty string set index to 0 again in the next iteration i==0 , charecter m will be pushed in the earlier created empty string , this push back of charecters will continue untill i reaches 6 and then we push back string to array.
This would be a great solution if you could save an array anywhere, but the problem requires that the decode and encode functions work *without* sending any information to each other. So you can't save an array or any kind of information anywhere outside of the functions themselves.
I put in a similar solution but i put the length of the word after the delimiter. But I somehow run into an indexing error. Why is that? class Solution: """ @param: strs: a list of strings @return: encodes a list of strings to a single string. """ def encode(self, strs): # write your code here code = "" for s in strs: code += ":" + str(len(s)) + s return code """ @param: str: A string @return: dcodes a single string to a list of strings """ def decode(self, s): # write your code here res, i = [], 0 while i < len(s): j = i while s[j] != ":": j += 1 length = int(s[j+1]) res.append(s[j+2:j+2+length]) i = j + 2 + length return res
Hey thank you very much for your content! which is the time complexity for your encoder example? I came out with something like O(M.N) M being the number of words within encoded string and N being the average length of the string. is this correct?
@@JeffRagusa yes of course, endcoded string will be : ///$//$/$ And decoding will be done by escaping by / So first // will reduce to / Then /$ => $ Then // => / Then $ means breakpoint. string found is "/$/" Then /$ => $ End of string so second string is "$" It is well proven solution developed ages ago. Computer networking algorithm uses it at very core of its algorithm for byte stuffing and packets fragmentation and all.
Really interesting solution, my first thought was to use ordinals to map all the letters to their ordinal, then split each letter by a delimeter and each word by a delimeter, but this is a much faster solution lol.
How about encoding each string to base64 (that has no commas, for example) and using comma to separate, then using Split method (C#), then decoding all? That was my solution to it, and in my case looked really simple. IDK about the big O complexity though.
I'm confused. Why is this a medium? You can encode with "".join(strs) and encode with str.split(""). Even if you need to do this manually, unless I'm missing something about the time complexity or other constraints, this doesn't seem anywhere near a medium. NeetCode also mentioned there being info in the description about constraints an extra memory usage (he said "stateless") but I didn't see anything about that on LintCode. What am I missing?
i agree, your confusion is actually my thought too. So, i try to google this question's original description in LeeCode the notes say: - The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters. - Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless. I think these can help you~
Using join and split won't work in the general case because you'd need to pick a delimiter. For any delimiter you pick, there are inputs that break it.
A simple delimiter won't work, because a string that contains the delimiter will make the decoder fail. The difficulty is in coming up with a sophisticated enough pattern that *any* string will be encoded and decoded correctly.
If the sting length exceeds 9, without a delimiter you won't be able to tell if the second integer within the len of the string is part of the initial string or not
@@ayariahmed376 Nothing to do with my comment. My comment is about the string having number or not affecting the encoding and decoding process with this method.
@@howardlam6181 If the sting length exceeds 9, without a delimiter you won't be able to tell if the second integer within the len of the string is part of the initial string or not. It DOES have to do with your comment, now think
you would append the length of the string + # so the encoded string would be 9#abc15#df1 and when you decode the string you would know to read 9 characters passed the #
this Solution provides the linear time: def encode(self, strs: List[str]) -> str: if not strs: return "" result = [] for s in strs: result.append(str(len(s)) + "#" + s) return "".join(result)
How do we get the length of the string, if j stops at the delimiter?. Can somebody explain this line: length = int(str[i:j]), on how to compute the length. Thanks in advance.
A backtick ` delimiter works. Pretty easy: class Solution: def encode(self, strs: List[str]) -> str: if not strs: return [] if len(strs) == 1 and not strs[0]: return "`" x = "`".join(strs) print(x) return x def decode(self, s: str) -> List[str]: if not s: return [] elif s == "`": return [""] else: return s.split('`')
IMO it's much more intuitive to just build an "index" or "table of contents" if you will and append that to the end of the joined string. This is similar to other patterns like password hashing with a salt, etc. You could actually implement that here to guarantee the delimiter is not included in any substring but you'd have to pass a unique salt to the decoder somehow. JS example, O(n) for both encoder and decoder: // * encoder: function encode(arr) { const index = []; let string = '' for (el in arr) { index.push(arr[el].length) string += arr[el] } return string + ':index:' + JSON.stringify(index) } // * decoder: function decode(encoded) { const [string, indexString] = encoded.split(':index:') const index = JSON.parse(indexString) const charArr = [...string] const result = [] let currentWord = '' for (let i = index.length - 1; i >= 0; i--) { for (let j = index[i]; j > 0; j--) { if (currentWord.length < index[i]) { currentWord = charArr.pop() + currentWord } } result.unshift(currentWord) currentWord = '' } return result }
What if a string also has that [digit#] pattern? My solution doesn't depend on pattern of input strings. Example: ["a", "bb", "ccc"] => "1,2,3|abbccc".
Doesn't matter, if the word was "abc4#abc", when the beginning of the string is reached (8#) it will append the next 8 characters regardless of their content. The pointers also skip over the word once they're finished appending it to the answer list, so the content of the string (4# in abc4#abc) will never be seen.
Suppose [4#xx] this is your word => encoded 4#4#xx => now on decode you see 4# that means it is guaranteed that your next four char is your actual char. So you simply store without bothering any #symbol by that time.
Hey dude, could you please link to the lintcode URL for this problem on your website? It currently links to the leetcode one which requires premium to view.
I dont see why the hash is necessary. Eg. for an array ["he55o", "fri3ndly", "per2on"], with just integers as delimiters denoting the length of the upcoming string, we would have 5he5506fri3ndly5per2on. If the first integer (5) is correct, this would then let us break up the "he55o" string correctly while ignoring the other integers (55) - we simply count 5 chars along after the first "5", which takes us to the "o" of he55o. Then the next integer (6) denotes the length of the following string (fri3ndly). Am I missing an edge case here? I just cant see what advantage the hash actually provides.
class Solution { public String encode(List strs) { // write your code here
StringBuilder sb=new StringBuilder(); for(String s:strs){ sb.append(s.length()); sb.append("#"); sb.append(s); } return sb.toString(); } /* * @param str: A string * @return: decodes a single string to a list of strings */ public List decode(String str) { // write your code here List list = new ArrayList(); int i=0; while(i
Yes any character would work. Only thing is while encoding u hv to use the same character and have the len(string) before it. And while decoding also check for the same character
Not the ideal solution but a cheat code, you can actually just do convert the whole list to string by str(strs) and then get the list back by doing eval(s)
C# version public class Codec { // Encodes a list of strings to a single string. public string encode(IList strs) { StringBuilder builder = new StringBuilder(); foreach (string s in strs) { builder.Append(s.Length + "#" + s); } return builder.ToString(); } // Decodes a single string to a list of strings. public IList decode(string s) { IList lst = new List(); int i = 0; int j = 0; while (i < s.Length) { j = i; while (s[j] != '$') { j = j + 1; } int idx = s.IndexOf('#', i); int length = Convert.ToInt32(s.Substring(i, idx - i)); lst.Add(s.Substring(idx + 1, length)); i = idx + length + 1; } return lst; } }
Not sure if this has been mentioned before, but i think your solution here still has an issue where if the string includes whatever your hashing key is (eg. "123#"), then we'll still have an issue. to account for this, rather than placing the key in front of every word, you could instead just construct the string with the counts at the start of the string (eg. "1,2,3::word1word2") and split based on your delimiter (in this case it'd be the "::"). Then you know that your first instance is always going to be your word lengths no matter what's actually in the strings.
It will not cause an issue as the hashing key always comes before the content of the string. Once a string it found, it appends the content of the string to the answer list and does not read it
This is completely wrong. Because for each string we are just reading x # of characters, where x is the number before #. Even if the exact same "hashing key" is included in the string, it doesn't matter, because at that point we are just reading x # of characters.
Why not use a unique delimiter string that is mathematically unlikely to appear in the string? Such as a random number with 32 digits. You could just split on that delimiter string and the answer would be much simpler.
I don't see this one as a medium problem.. actually it doesn't seem related to A&DS. It's some kind of basic serialization, aka regular programmer work.
Why not just use Escape characters? We'll still be using one '#' for separate words, but all we'll introduce symbol '/' for mark actual symbols '#' in strings. In case when we'll have '/' in string we'll just encode them like "//". Yeah like escaping symbols in c, c++. When decoding and find symbol '/' we'll look a symbol next to it and it could be either '/' or '#' Examples: ["we", "love", "leet#code"] -> "we#love#leet/#code" ["/we/", "#love/#", "##/#leet/#/code"] -> "//we//#/#love///##/#/#///#leet///#//code"
Computer systems have a special character called null terminator or hex = 0x0 represented by the '\0', just use that. class Solution: def encode(self, strs: List[str]) -> str: res = "" for s in strs: s += '\0' res += s return res def decode(self, s: str) -> List[str]: res = [] buf = "" for c in s: if c == '\0': res.append(buf) buf = "" else: buf += c return res
@@thenosey The whole point of the null character is to represent the end of a string. Should look into how Computer Systems work. There is absolutely no logical case in which a single string contains a null character in between. You are missing the point of the null character.
@@JoseAguirre-hl2do I'm well familiar with how strings work, but your solution is flawed in more use cases than the video solution. A logical case of a string containing a null character would be user input such as a password.
I did this too, I also used the null terminator. Here's my code: class Solution: def encode(self, strs: List[str]) -> str: res = "" for s in strs: for char in s: res += chr(ord(char) + 3) res += chr(0) return res def decode(self, s: str) -> List[str]: res = [] cur = "" for char in s: if char == chr(0): res.append(cur) cur = "" continue decodedChar = chr(ord(char) - 3) cur += decodedChar return res
No, first string will be encoded as "5#A123#4#B2#C". Decoding will be: we find the first '#', we read the length, length is 5, we push the next 5 chars to our result (['A123#']). We jump to 5 + length of the delimiter (2 in this case '5#'). We keep going until the second # is found. We read the length, length is 4, we push the next 4 chars to our result (['A123#', 'B2#C']). Since we're putting our delimiter on front of each string and we don't read the decoded chars, we guarantee we will always skip potential delimeters inside words
here is a great for you encode with new line as the delimiter so what separates the strings from each other is then we can join and split and the code is reduced to two lines
but what if we have one string that reads "a14#tryme", then the decode would be problematic, also, what if i have a string "a4#1234" then its encode would be "6#a4#1234" unless there must be certain rule said that it has to be 6 counts until it is able to soak another number+# encode/decode sign
Suppose [4#xx] this is your word => encoded 4#4#xx => now on decode you see 4# that means it is guaranteed that your next four char is your actual char. So you simply store without bothering any #symbol by that time.
Still there's possibility of string itself having the same pattern into it, by which you're deciding the number of chars.. decode will break in those cases 🤔
Empty string becomes 0#. Wild. Good vid. Still prefer double escaping and replacing , or converting ascii to char codes and delimiting with any string. Depends on the interviewer and what they are looking for. This is the best for a wide set of possible character inputs but not as easy to understand on first look if the interviewer cares more about team work and readable code
That doesn't make any sense, you should've said in the condition that we could change the initial string however we want, but you said we could only added the delimiter (which could only be added between words, not at the beginning, by the definition).
I used '\0' as a delimiter and it got accepted, when I saw your solution I think what I did is considered cheating on the porblem 😂😂 what do you think? is it correct or should be some test cases that handle this case
for those wondering why we can't just use a single digit number to delimit and track the length of the following word, what if word length >9 and takes up double digits! (maybe only my slow brain was confused about this :) )
Great point. Since the constraints are word length < 200, you can also convert the size to a char (ascii 1-256 , greater than 200) then convert that back to an int when decoding
This is actually a really good point and answers my question of why we can't just do length = int(str[i]) at line 23 :D
Your brain could be slow but mervelous!
I coded without keeping double digits in mind, and now here i am
Thanks for the comments and now I felt my brain is even slower lol
Some people say that LeetCode problems are "useless exercises that real coders never use in their day-to-day job", but, honestly, this one seems quite ueseful. Yes, we are probably never going to be implementing encoding\decoding ourselves, but knowing how it works is a great thing, I think.
great solution! just a note, the `encode` method is O(n**2) because string concatenation copies the entire string, so the result string is being copied over and over n times.
To make it O(n), we can use a list to mock a string builder. Each element of the list will be the delimiter and the word. At the end of the method we will join the list into a string and return it:
def encode(self, strs) -> str:
# O(n)
return ''.join(
f'{len(string)}#{string}'
for string in strs
)
in the code instead of using a list we can just use a generator of the encoded strings
How is:
return ''.join(
valid syntax, it should be "" the the least, and how can your write "for string in strs" inside the join function
@@skyhappy in Python single quotes and double quotes are interchangeable. The for loop inside the join is a generator and comes from list comprehension and functional programming
@@skyhappy There is shortcut for the list creation (initialization, for loop then append) called a generator in python.
return ''.join([ f'{len(st)}:{st}' for st in strs]) is the same as:
res2 = []
for st in strs:
res2.append(f'{len(st)}:{st}')
res2 = ''.join(res2)
return res2
the OP just forgot to type the square brackets and spaced out the generator syntax into mulltiple lines
Dang nice spot!
If this is unintuitive, an alternate workaround is to just make an array and join at the end.
Thanks!
Thank you so much!
This question is weird, but a good alternate solution would be to just store the comma separated length of items at the start of the string before a delimiter. Could also use base 16 numbers to save storage space.
Came here to look for this answer! Also awesome idea with the base 16
Yeah, did this my first try. Kinda prefer it because it works as an index. If needed, it would be possible to jump to word N based on the sizes in the beginning
Just did it on my first time and found it weird as well. I used dash '-' as a separator and worked, didn't even keep track of anything, just a single for loop and a split.
This question is as weird as any real world programming task is)
Pretty damn good.
If you think about it that is kind of how UTF8 encoding works. The byte representations always *start* with a particular pattern indicating what is the first byte of a representation. If a character is represented by 4 bytes than the the other 3 also have a distinct pattern to show they belong to the first byte together.
What if one of the strings happens to have a number followed by the #. Wont that break the decode?
It won't cause an issue because ["12#", "hello"] would be encoded as "3#12#5#hello". The decode algorithm would take the 3 characters after the first "#" and then the 5 characters after the third "#", decoding it back to ["12#", "hello"]. You can add this as a test case to verify.
Just elaborating on what Greg A said but with case [ '12#hello', 'abc', 'weird4#'] because I had the same question but with slightly different words:
original words: [ '12#hello', 'abc', 'weird4#']
encoded string : '8#12#hello3#abc7#weird4#'
Decoding algorithm:
First reads 8# and then takes in the next 8 chars as first word, add word to decoded set
Decoded: ['12#hello']
Then reads 3# and then takes in next 3 chars as second word, add to decoded set
Decoded: ['12#hello', 'abc']
Then reads 7# and then takes in next 7 chars as third word, add to decoded set
Decoded: ['12#hello', 'abc', 'weird4#']
These are the notes I wrote down to clarify things for myself
Just a sanity check to confirm having number followed by # in the original words is fine :D
@@misterimpulsismThank you for explaining this. I was thinking the same thing as the original commenter. But it appears these two algorithms in conjunction with one another are fool proof
I was under the impression at first that we could pass in non-encoded strings. In that case, it would break the decode. But that doesn't seem to be the case here.
No it wouldn't because the first 2 characters would be #. Then whem we are finish reading...we are guaranteed to be at a place to read the next #. So even if we have 3#, we will be in a reading state at that point
So I'm a little confused by this approach. Why would you go through the effort of saving the variable length in a number instead of just separating the words with a unique delimeter then using like ''.join(s for s in strs) and s.split('')?
I guess it's not advisble to do it because the string could have your unique delimiter e.g. within it, we never know which test case may fail.
2:53 explained here
Another approach is we put "#" between every word and then another "#" if we have "#" inside the word. When we're decoding, if we have a single "#", it means it separates the word, however if there's another "#" next to it, it means it was inside a word.
exactly, it was pretty much implied in the second test case of lintcode, and that apparently made that problem easier than expected.
But it will be an extra operation to check if next char is also # (extra check) and then ignore it
["neat", "co##de"], or ["neat", "code#"]
@@benjaminberlin that would turn into
1) neat#co####de which can be read as neat then co##de
2) neat#code## which can be read as neat then code# as well
@@ramboguten1910 But I believe you don't have to do atoi in decode and calculate lengths in encode ?
I solved this qustion at random using | as a delimiter, it doesnt have any test cases for | char.
I'm sure that on the actual interview this would not be a valid solution (or maybe?), but we can split / join by some non-printable ascii character, 0x01 is great
class Solution:
def encode(self, strs): return '\x01'.join(strs)
def decode(self, string): return string.split('\x01')
arguably true, but def wouldn't be valid in an interview because it doesn't show you actually know anything about algorithms.
really? I actually think that 5# can be in the answer since its utf-8. But /x169 can't because it's not utf-8 so its better go with this way
Here's my solution I implemented before looking at Neetcode's coding part of the video, and just understanding his logic for implementing the algorithm (same TC, just differently written; but slightly worse SC as we store the string):
def decode(self, str):
i = 0
output = []
while i < len(str):
currStrLen = ""
currStr = ""
while str[i] != "#":
currStrLen+=str[i]
i+=1
j = int(currStrLen)
i+=1
while j != 0:
currStr+=str[i]
i+=1
j-=1
output.append(currStr)
return output
Convert string size to byte array and put it into the encoded string, you don't need separator and you will use less space. But that also works!
Doesn't this solution depend on none of the strings containing multiple sequences of digit#? e.g. 3#3#3#3#?
in which case you wouldn't know which is the delimeter.
No, your string is invalid anyway, you wouldn’t get it by using your encoding method. Remember there’s a reason why you are specifying the length though.
@@loon7181 no the original comment is correct, they've added test cases to prevent this
I don't understand why, in the case where you get an input like ['for', "tu3#tle", "jam"] you don't get an error.
That’s why you are saying the length of the word, even if the word is tu3#tle it doesn’t matter because before it you’d already know the length of that word (having before the word a 5#), so you wouldn’t check the content of that word. You’d just jump into the next delimiter.
Waiting for your videos daily❤️
Can you also make a video on rat in a maze (but it is not on leetcode) as it also frequently asked question?
Edited : After seeing your some of the backtracking videos I’ve solved that problem.
Done thanks!
Start off the encoded string with an integer that represents the length of first word then a dilemeter. The delimeter is needed here at the beginning and after each word’s length because the length of a word can be > 9 and would need 2 or 3 digits.
Example using # as deleimeter
Input [“hi”, “testing”]
Output “2#hi7#testing”
This output string can then be parsed back as list
I thought of escaping the delimter (like #) presnt within the string. but your solution is a simple and fast!
I did another solution.
Instead of using the length of a character and a #, the length of the character can be converted to a single character as the character code.
Signature of that function is (number) => character.
In Javascript, we have the function String.fromCharCode(string.length).
Saves on size
You could use instantaneous codes at the beginning of the encoding to write all the string widths, an easy example is unary code but you could use a Gamma or Delta to be more advanced and use less space.
So the final encoding would be
You don't need delimiters as the initial numbers are uniquely decodable.
It's similar to your solution and maybe overkill, but you can do it.
Shouldn't space complexity be O(n) since we are building up our encoded string that is dependent on the input size linearly. Same thing with decode but we're building up an array linearly dependent on the input size. Smaller input size means a smaller array we're building up so we use less space.
Wouldn't the time complexity for decode be O(nm) because we also slice a string?
you are the best, my friend
This man is a hero
why cant we solve this problem using JSON stringfy and JSON parse? its a 1 line solution
Could we not skip the delimiter using an escape character? If we found #, we could replace that with \#
Java version :
public class Solution {
/*
* @param strs: a list of strings
* @return: encodes a list of strings to a single string.
*/
public static String encode(List strs) {
StringBuilder result = new StringBuilder();
for(String s : strs){
result.append(s.length()+"#"+s);
}
return result.toString();
}
/*
* @param str: A string
* @return: dcodes a single string to a list of strings
*/
public static List decode(String str) {
// Write your code here
List ans = new ArrayList();
StringBuilder item = new StringBuilder();
int i = 0;
int j;
while (i < str.length()) {
j = i;
while(str.charAt(j) != '#'){
j++;
}
int wordLength = Integer.valueOf(str.substring(i,j));
int startIndex = j+1; // skipping number+# characters length
String subStr = str.substring(startIndex,wordLength+startIndex);
ans.add(subStr);
i = wordLength+startIndex;
}
return ans;
}
}
Love your contents..really helpful 👍
I originally thought that the fixed-length md5(s) hash could be used as the delimiter, because the result of the hash and the possibility of repeated characters in srts are almost zero.
great explanation, you really are the goat
I finally understand it! Thank you!
since words length is < 200, I add leading zero's to the padding, and always read 3 bytes each time.
This reminds me of my class on Networking. Bit data is sent with headers, to help the recipient make sense of the data.
Does string addition inside a for loop affect the complexity at all?
Yes, string concatenation is O(n) which makes the functions O(n^2). You should use a StringBuilder or a list in the loop, and build the string at the end.
Really admire your videos
So this is O(N) vs O(n*m) for using an escape character inside the strings themselves?
would be nice to get link to the excel with list of problems and comments
why cant we store the index (where the 2 strings are to be seperated) inside an array.
for example ["good","morning","neetcode"]
after encodeing=>["goodmorningneetcode'] and [3,6,7]
while decoding we keep a track of current index at i==3 , we insert d into the current string , push it inside the array , create a new empty string set index to 0 again
in the next iteration i==0 , charecter m will be pushed in the earlier created empty string , this push back of charecters will continue untill i reaches 6 and then we push back string to array.
This would be a great solution if you could save an array anywhere, but the problem requires that the decode and encode functions work *without* sending any information to each other.
So you can't save an array or any kind of information anywhere outside of the functions themselves.
I put in a similar solution but i put the length of the word after the delimiter. But I somehow run into an indexing error. Why is that?
class Solution:
"""
@param: strs: a list of strings
@return: encodes a list of strings to a single string.
"""
def encode(self, strs):
# write your code here
code = ""
for s in strs:
code += ":" + str(len(s)) + s
return code
"""
@param: str: A string
@return: dcodes a single string to a list of strings
"""
def decode(self, s):
# write your code here
res, i = [], 0
while i < len(s):
j = i
while s[j] != ":":
j += 1
length = int(s[j+1])
res.append(s[j+2:j+2+length])
i = j + 2 + length
return res
Hey thank you very much for your content! which is the time complexity for your encoder example? I came out with something like O(M.N) M being the number of words within encoded string and N being the average length of the string. is this correct?
can be done easily with, escape sequancing
choose $ as a delimiter, and escape any $ in word with /$ and escape any / with //
@@AmishRanpariya is it any easier than his solution?
@@aryanmobiny7340 yes
@@AmishRanpariya Would that work on ["/$/", "$"]?
@@JeffRagusa yes of course, endcoded string will be : ///$//$/$
And decoding will be done by escaping by /
So first // will reduce to /
Then /$ => $
Then // => /
Then $ means breakpoint. string found is "/$/"
Then /$ => $
End of string so second string is "$"
It is well proven solution developed ages ago. Computer networking algorithm uses it at very core of its algorithm for byte stuffing and packets fragmentation and all.
can use #4 for the first string and only int for the other strings?
Really interesting solution, my first thought was to use ordinals to map all the letters to their ordinal, then split each letter by a delimeter and each word by a delimeter, but this is a much faster solution lol.
How about encoding each string to base64 (that has no commas, for example) and using comma to separate, then using Split method (C#), then decoding all? That was my solution to it, and in my case looked really simple. IDK about the big O complexity though.
I'm confused. Why is this a medium? You can encode with "".join(strs) and encode with str.split(""). Even if you need to do this manually, unless I'm missing something about the time complexity or other constraints, this doesn't seem anywhere near a medium.
NeetCode also mentioned there being info in the description about constraints an extra memory usage (he said "stateless") but I didn't see anything about that on LintCode.
What am I missing?
i agree , in the video he said the encoding/decoding must be also stateless but he relies on the '#' as well .
i agree, your confusion is actually my thought too.
So, i try to google this question's original description in LeeCode
the notes say:
- The string may contain any possible characters out of 256 valid ascii characters. Your algorithm should be generalized enough to work on any possible characters.
- Do not use class member/global/static variables to store states. Your encode and decode algorithms should be stateless.
I think these can help you~
Using join and split won't work in the general case because you'd need to pick a delimiter. For any delimiter you pick, there are inputs that break it.
A simple delimiter won't work, because a string that contains the delimiter will make the decoder fail.
The difficulty is in coming up with a sophisticated enough pattern that *any* string will be encoded and decoded correctly.
@@saeedjinat9173 What does it mean to be stateless? And how does using # make it become not stateless?
It doesn't matter if there are numbers within the string because you'll jump over the string itself when you decode it.
If the sting length exceeds 9, without a delimiter you won't be able to tell if the second integer within the len of the string is part of the initial string or not
@@ayariahmed376 Nothing to do with my comment. My comment is about the string having number or not affecting the encoding and decoding process with this method.
@@howardlam6181 If the sting length exceeds 9, without a delimiter you won't be able to tell if the second integer within the len of the string is part of the initial string or not. It DOES have to do with your comment, now think
@@st1p426 Again, NOTHING to do with what I said. Think about what I actually said and come back again.
Where is the spreadsheet?
What if one of the strings is like "abc15#df1"
you would append the length of the string + # so the encoded string would be 9#abc15#df1 and when you decode the string you would know to read 9 characters passed the #
Interesting take on the problem, though I prefer escape symbols
I don't get the ending part, why are we adding length into i, shouldn't we just do i = j + 1; ?
j is at the position of the "#" so you have to add the length to go to the next integer. If you just do i = j + 1, i will be at the start of the word.
bc u want to move on to the next word
this Solution provides the linear time:
def encode(self, strs: List[str]) -> str:
if not strs:
return ""
result = []
for s in strs:
result.append(str(len(s)) + "#" + s)
return "".join(result)
I got this one in 30 seconds im so proud of myself 😭 i suck at 90% percent of these lol
why didn't you just use join and split? it's one line for both functions
@@Milan-vi1bq Doesn't work for all inputs.
@@Milan-vi1bq lol you did it wronhh
that should have been a red flag
@@chrischika7026 whatever this leetcode stuff got me nowhere lol
How do we get the length of the string, if j stops at the delimiter?. Can somebody explain this line: length = int(str[i:j]), on how to compute the length. Thanks in advance.
str[I:j] gets u the numbers that represent the length, 43#(insert some 43 char string) would have 43 as the str[I:j]
New channel NintCode 😁
This looks so much like turing machine tape to me!
what about using a non-ascii character as the delimeter?
A backtick ` delimiter works. Pretty easy:
class Solution:
def encode(self, strs: List[str]) -> str:
if not strs:
return []
if len(strs) == 1 and not strs[0]:
return "`"
x = "`".join(strs)
print(x)
return x
def decode(self, s: str) -> List[str]:
if not s:
return []
elif s == "`":
return [""]
else:
return s.split('`')
what if there's a string that has "5#" in it?
I don't have any wechat, how to create an account on LintCode?
You can enter your phone number and they send you a code which can be used to create an account.
@@dipankarpurecha5564 i don't get any code using indonesia phone number. Which country that works?
@@shalsteven It doesn't work for Spain too
IMO it's much more intuitive to just build an "index" or "table of contents" if you will and append that to the end of the joined string. This is similar to other patterns like password hashing with a salt, etc. You could actually implement that here to guarantee the delimiter is not included in any substring but you'd have to pass a unique salt to the decoder somehow. JS example, O(n) for both encoder and decoder:
// * encoder:
function encode(arr) {
const index = [];
let string = ''
for (el in arr) {
index.push(arr[el].length)
string += arr[el]
}
return string + ':index:' + JSON.stringify(index)
}
// * decoder:
function decode(encoded) {
const [string, indexString] = encoded.split(':index:')
const index = JSON.parse(indexString)
const charArr = [...string]
const result = []
let currentWord = ''
for (let i = index.length - 1; i >= 0; i--) {
for (let j = index[i]; j > 0; j--) {
if (currentWord.length < index[i]) {
currentWord = charArr.pop() + currentWord
}
}
result.unshift(currentWord)
currentWord = ''
}
return result
}
This breaks when :index: is included among the possible strings
@@izzya8132 LOL
class Solution {
public:
string encode(vector& strs) {
string fullline="";
for(string mover : strs){
fullline+=mover+";";
}
return fullline;
}
vector decode(string s) {
vector output;
string temp="";
for(int i=0 ; i
Are this 75 questions are enough to crack a software engineer interview in FAANG??
What if a string also has that [digit#] pattern?
My solution doesn't depend on pattern of input strings. Example: ["a", "bb", "ccc"] => "1,2,3|abbccc".
Doesn't matter, if the word was "abc4#abc", when the beginning of the string is reached (8#) it will append the next 8 characters regardless of their content. The pointers also skip over the word once they're finished appending it to the answer list, so the content of the string (4# in abc4#abc) will never be seen.
and how about if the word start with the same n# that you chose as delimiter?
Suppose [4#xx] this is your word => encoded 4#4#xx => now on decode you see 4# that means it is guaranteed that your next four char is your actual char. So you simply store without bothering any #symbol by that time.
This is the only source where I learn DSA from please keep making more the day I get into faang will pay you half my first salary.
😂
what if we use JSON to encode and decode?
can someone please tell the space complexity of the solutions
Hey dude, could you please link to the lintcode URL for this problem on your website? It currently links to the leetcode one which requires premium to view.
It’s in the description
I dont see why the hash is necessary. Eg. for an array ["he55o", "fri3ndly", "per2on"], with just integers as delimiters denoting the length of the upcoming string, we would have 5he5506fri3ndly5per2on.
If the first integer (5) is correct, this would then let us break up the "he55o" string correctly while ignoring the other integers (55) - we simply count 5 chars along after the first "5", which takes us to the "o" of he55o.
Then the next integer (6) denotes the length of the following string (fri3ndly). Am I missing an edge case here? I just cant see what advantage the hash actually provides.
What if the string is longer than 9 characters?
The solution seems confusing maybe because you forget to tell that there can be words with length above 10.
class Solution {
public String encode(List strs) {
// write your code here
StringBuilder sb=new StringBuilder();
for(String s:strs){
sb.append(s.length());
sb.append("#");
sb.append(s);
}
return sb.toString();
}
/*
* @param str: A string
* @return: decodes a single string to a list of strings
*/
public List decode(String str) {
// write your code here
List list = new ArrayList();
int i=0;
while(i
First thing I tried was JSON.stringify & JSON.parse and of course, passed :P
would it work if you used 'a' instead of '#' as the delimiter?
Yes any character would work. Only thing is while encoding u hv to use the same character and have the len(string) before it. And while decoding also check for the same character
Not the ideal solution but a cheat code, you can actually just do convert the whole list to string by str(strs) and then get the list back by doing eval(s)
how do you manage to sign up for lintcode? I cant get the verification code on my phone
Enter ur country code and ur phone number. You wont get a text message but a call from a telecaller telling you the OTP twice
Couldn't we search and find a character to use that is not in the input list and then set that as the delimiter? =)
C# version
public class Codec
{
// Encodes a list of strings to a single string.
public string encode(IList strs)
{
StringBuilder builder = new StringBuilder();
foreach (string s in strs)
{
builder.Append(s.Length + "#" + s);
}
return builder.ToString();
}
// Decodes a single string to a list of strings.
public IList decode(string s)
{
IList lst = new List();
int i = 0;
int j = 0;
while (i < s.Length)
{
j = i;
while (s[j] != '$')
{
j = j + 1;
}
int idx = s.IndexOf('#', i);
int length = Convert.ToInt32(s.Substring(i, idx - i));
lst.Add(s.Substring(idx + 1, length));
i = idx + length + 1;
}
return lst;
}
}
man, how can you post code online without running it on local?
while (s[j] != '$') should be
while (s[j] != '#')
Turing Machine Tapes Vibe
Not sure if this has been mentioned before, but i think your solution here still has an issue where if the string includes whatever your hashing key is (eg. "123#"), then we'll still have an issue.
to account for this, rather than placing the key in front of every word, you could instead just construct the string with the counts at the start of the string (eg. "1,2,3::word1word2") and split based on your delimiter (in this case it'd be the "::"). Then you know that your first instance is always going to be your word lengths no matter what's actually in the strings.
It will not cause an issue as the hashing key always comes before the content of the string. Once a string it found, it appends the content of the string to the answer list and does not read it
This is completely wrong. Because for each string we are just reading x # of characters, where x is the number before #. Even if the exact same "hashing key" is included in the string, it doesn't matter, because at that point we are just reading x # of characters.
@@konan91 Oh that's a good point. thanks for the correction!
Why not use a unique delimiter string that is mathematically unlikely to appear in the string? Such as a random number with 32 digits. You could just split on that delimiter string and the answer would be much simpler.
Very easy to understand! Thank you so much 🥰
what if the orginal string had a number and then pound like ["ab3#c", "abc"]
I don't see this one as a medium problem.. actually it doesn't seem related to A&DS.
It's some kind of basic serialization, aka regular programmer work.
Is it just me or did you guys also feel unsafe sharing data on lintcode while signing up ?
passed the lintcode with this beating 86%:
def encode(self, strs):
return ':;'.join(strs)
def decode(self, str):
return str.split(':;')
I did that too but it won't work if one of the strings contains ':;'. It will be treated as a separate string.
i legit returned the original ones and it passed XD
Why not just use Escape characters? We'll still be using one '#' for separate words, but all we'll introduce symbol '/' for mark actual symbols '#' in strings. In case when we'll have '/' in string we'll just encode them like "//". Yeah like escaping symbols in c, c++. When decoding and find symbol '/' we'll look a symbol next to it and it could be either '/' or '#'
Examples:
["we", "love", "leet#code"] -> "we#love#leet/#code"
["/we/", "#love/#", "##/#leet/#/code"] -> "//we//#/#love///##/#/#///#leet///#//code"
why not just use ( " ) as the separator ? so the encoded string for ["hello" , "world"] will be -> "hello" "world"
Computer systems have a special character called null terminator or hex = 0x0 represented by the '\0', just use that.
class Solution:
def encode(self, strs: List[str]) -> str:
res = ""
for s in strs:
s += '\0'
res += s
return res
def decode(self, s: str) -> List[str]:
res = []
buf = ""
for c in s:
if c == '\0':
res.append(buf)
buf = ""
else:
buf += c
return res
Not a complete solution because the individual strings can contain '\0'
@@thenosey The whole point of the null character is to represent the end of a string. Should look into how Computer Systems work. There is absolutely no logical case in which a single string contains a null character in between. You are missing the point of the null character.
@@JoseAguirre-hl2do I'm well familiar with how strings work, but your solution is flawed in more use cases than the video solution. A logical case of a string containing a null character would be user input such as a password.
I did this too, I also used the null terminator. Here's my code:
class Solution:
def encode(self, strs: List[str]) -> str:
res = ""
for s in strs:
for char in s:
res += chr(ord(char) + 3)
res += chr(0)
return res
def decode(self, s: str) -> List[str]:
res = []
cur = ""
for char in s:
if char == chr(0):
res.append(cur)
cur = ""
continue
decodedChar = chr(ord(char) - 3)
cur += decodedChar
return res
You have used two while loops, one inside of the other. That means the time complexity is: O(n^2)
this is bad solution
Same doubt. How is the time complexity o(n)?
Wont this break if the input is
'A123#', 'B2#C'
Yes, this question is impossible to solve with string delimiters.
No, first string will be encoded as "5#A123#4#B2#C". Decoding will be: we find the first '#', we read the length, length is 5, we push the next 5 chars to our result (['A123#']). We jump to 5 + length of the delimiter (2 in this case '5#'). We keep going until the second # is found. We read the length, length is 4, we push the next 4 chars to our result (['A123#', 'B2#C']).
Since we're putting our delimiter on front of each string and we don't read the decoded chars, we guarantee we will always skip potential delimeters inside words
Cant we just do string.split?
this problem would be easy to solve in lua using table.concat and string.split
here is a great for you encode with new line as the delimiter so what separates the strings from each other is
then we can join and split and the code is reduced to two lines
but what if we have one string that reads "a14#tryme", then the decode would be problematic, also, what if i have a string "a4#1234" then its encode would be "6#a4#1234" unless there must be certain rule said that it has to be 6 counts until it is able to soak another number+# encode/decode sign
But what if word will contain number and then "#"
Suppose [4#xx] this is your word => encoded 4#4#xx => now on decode you see 4# that means it is guaranteed that your next four char is your actual char. So you simply store without bothering any #symbol by that time.
Who else thought that they can do it without "#" symbol? :)
This question just feels vague tbh, doesn't even feel like a medium especially if you solve using a delimiter.
Still there's possibility of string itself having the same pattern into it, by which you're deciding the number of chars.. decode will break in those cases 🤔
Do you have an example test case for that?
I was thinking what if any of the string is "2#". I realised now that I was wrong, 😬 your solution will work in that case too. 🙂👍🏻
Empty string becomes 0#. Wild. Good vid. Still prefer double escaping and replacing , or converting ascii to char codes and delimiting with any string. Depends on the interviewer and what they are looking for. This is the best for a wide set of possible character inputs but not as easy to understand on first look if the interviewer cares more about team work and readable code
That doesn't make any sense, you should've said in the condition that we could change the initial string however we want, but you said we could only added the delimiter (which could only be added between words, not at the beginning, by the definition).
I used '\0' as a delimiter and it got accepted, when I saw your solution I think what I did is considered cheating on the porblem 😂😂 what do you think? is it correct or should be some test cases that handle this case