Big respect for your knowledge and how unselfishly you share it. No one else on YouTube does this in such detail. Keep doing what you are doing, man.
Yeah, true, man. @gauravSen and this guy are providing really valuable information. If you know anyone else like them on YouTube, reply back.
Hussein, my man, you also share your knowledge. Love you.
Nice work... any thoughts on this review ? medium.com/double-pointer/review-of-grokking-the-system-design-course-c8613c28f3a1
@@ujjwalbansal1070 Check out Hussein Nasser's videos, they are very informative and fun to watch.
Oh my god!! Life has come full circle for me. Finding a Hussein Nasser comment on Narendra L's video. You two are my idols!!
For anyone wondering, apparently a "lakh" or "lac" as he was saying at 29:57 "is a unit in the Indian numbering system equal to one hundred thousand" - Wikipedia
Thank you! I was scratching my head about what that means... :)
Yeah, Indian numbering doesn't group digits in threes (thousand, million, billion, etc.). It groups them as 3, 2, 2, ... e.g.:
1,000 -> Thousand
1,00,000 -> Lakh
1,00,00,000 -> Crore
...
At 20:10, he writes 1 million as 10,00,000, i.e. 10 lakhs (note the position of the commas).
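For anyone who wants to see the 3, 2, 2 grouping in code, here is a minimal sketch. The function name `format_indian` is made up for illustration; it simply keeps the last three digits together and splits everything before them into pairs.

```python
def format_indian(n: int) -> str:
    """Group digits the Indian way: the last three digits, then pairs."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    # Split the leading digits into pairs, working from the right.
    pairs = []
    while head:
        pairs.insert(0, head[-2:])
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(format_indian(100000))    # one lakh
print(format_indian(1000000))   # ten lakhs, i.e. one million
print(format_indian(10000000))  # one crore
```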
You should either stick to the metric system or use a global system that doesn't rely on localized counting systems. This would be a red flag for me.
Thanks a lot . I enabled subtitles, but no luck ☺️.
This was clearer than Ganga water in Gangotri :] Thanks again for the awesome explanation!
😂
Software engineer at Google here. I barely ever write comments, but you really did a good job making these impressive videos. I really appreciate your work; it inspired me a lot and I learned a ton from them.
You have the best Systems Design videos on youtube as far as I've seen. Well done! Subscribed.
Very well explained, covering the pros and cons of each implementation approach and finally landing on a robust and scalable solution. A URL shortener looks simple but is actually very hard, and I have rarely found anyone who comes close to your solution and way of explaining. One piece of advice: it would be even better if you included UML and ER diagrams as well.
One quick question. I wrote an implementation of a base62 encoder and converted a simple text to Base62. The number of characters in the output is proportional to the length of the input: as the input grows, the encoded string grows. In that case, how can you control the code to generate only 7 or 8 characters? Taking the first 7 chars might not be a good idea. What is the solution?
Good question. I think the cutoff might be the only solution, or we have to randomize until we reach 6 characters.
The way you explain it makes it seem like I could go back to my workstation and build it right now. Hats off to your explanation.
Great video. Maybe you could start with the API calls and the functional and non-functional requirements, to make it clear where you are going; that would help you define your writes and reads, and you could split the app servers to optimize your design. Also, I would like to know more about your approach to reassigning or cleaning up ranges, since those could run out fast with such a limited number of characters for the short URL. Reusability would be a factor, as your numbers will be used up completely within the first two years according to your analysis.
I am new to system design and find your explanations clear, thorough and insightful. Thanks!
This is brilliant .. I have seen ppl leave out the distributed part in system design but you nailed it bud 👍
@Tech Dummies How will it ensure that the tiny URL length is 7 chars? We are passing an integer, so what if we pass a big integer and the base62 method returns more than 7 chars?
I think this can be controlled in the code that generates the unique code.
generateUniqueCode(deci) {
const a = [0...9a...zA...Z];
while(deci>0 && len
You are great, Narendra. You explain everything so clearly and easily.
Thanks for this top-quality content and video. Doesn't ZooKeeper have the potential to be a single point of failure here?
I was thinking the same.
ZooKeeper can be configured as a single server or in quorum mode with multiple servers.
Thank you so much for the detailed explanation. No need to refer to any other sources once you go through this material.
I was asked this question yesterday. Now I have found the answer.
In a B.Tech-level interview?
@@ssssahil yes....so it's very very easy, isn't it?
Good solution! I just want to add that if the application server goes down, the URL that was being processed will be lost. To guard against that, we would need temporary storage (with checkpointing) for a single app server, or a distributed message queue for multiple app servers.
Hey, hi. Thank you for sharing your knowledge. Just one question: what if I shorten the same URL twice? It will generate 2 short URLs. Does that break a test case, or is it fine? I think we may also need to keep track of all the long URLs in a set, so that we can return the same short URL.
The same URL won't be converted to a short URL again; the existing one is returned from the DB. If the URL doesn't exist in the DB, a new short URL is generated.
@@saikishorer So does it have to check the database?
@@saikishorer But what if two users try to shorten the same URL at the same time? In that case there will be two entries for the same URL. How can we avoid that?
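One common way to handle that race is to make "check, then insert" a single atomic step. The sketch below is only a single-process illustration: the class `UrlStore`, its `get_or_create` method and the `code<N>` short codes are all hypothetical, and the lock stands in for whatever the real database offers (for example a unique constraint or a conditional write on the long URL).

```python
import threading


class UrlStore:
    """In-memory stand-in for a table keyed by long URL (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._long_to_short = {}
        self._next_id = 0

    def get_or_create(self, long_url: str) -> str:
        # The lookup and the insert happen under one lock, so two
        # concurrent requests for the same URL cannot both insert.
        with self._lock:
            existing = self._long_to_short.get(long_url)
            if existing is not None:
                return existing
            self._next_id += 1
            short = f"code{self._next_id}"  # placeholder for a real short code
            self._long_to_short[long_url] = short
            return short
```

With this shape, whichever request arrives first creates the mapping and every later (or simultaneous) request for the same long URL gets that same short code back.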
It is very intuitive and fascinating to learn how these systems work behind the scenes. I am watching this series as if I were watching the Discovery Channel. Thanks for sharing.
The student teacher relationship analogy was pretty clutch. Thank you!
oh oh 🙏
Your explanation of system design is very succinct and easy to grasp. Thank You.
This is a very nice explanation to the question that is frequently asked in interviews. cheers.
Thanks for the great tutorial. One question: I am wondering how to ensure we always generate 7 base62 characters with a random integer/long as input. I tried with the library I have, and the number of characters is not guaranteed.
I'm also wondering about this.
How about treating it as a digit system with 62 characters? That means we have abc...xyzABC...XYZ012...789 corresponding to 0...61, so input 0 gives output aaaaaaa, input 1 gives aaaaaab, and input 62 gives aaaaaba. For numbers from 0 to 62^7-1, we get exactly 7 characters.
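Here is a minimal sketch of that fixed-width idea, using the alphabet order from the comment above (a-z, A-Z, 0-9). The names `ALPHABET`, `WIDTH` and `encode_fixed` are made up for illustration. The point is that the number is always written with exactly 7 base-62 digits, with the "zero" digit `a` used as left padding, so nothing is ever truncated.

```python
import string

# Digit values 0..61 map to a-z, A-Z, 0-9, as in the comment above.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62
WIDTH = 7


def encode_fixed(n: int, width: int = WIDTH) -> str:
    """Write n as exactly `width` base-62 digits, left-padded with 'a'."""
    if not 0 <= n < BASE ** width:
        raise ValueError("counter does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


print(encode_fixed(0))   # aaaaaaa
print(encode_fixed(1))   # aaaaaab
print(encode_fixed(62))  # aaaaaba
```

Because every counter below 62^7 maps to a different 7-digit string, two different counters can never produce the same short code.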
Best system design playlist and that too in free, hats off...!!!
Thank you so much for explaining a seemingly simple system so well. It would be good to say a bit about purging URL data, and about the business requirements for how long a tiny URL stays valid.
In depth and clear explanation. Thanks Narendra!
you rock! a pleasure to watch!
Great video. Taking care of all the different things that should be known before designing is also necessary.
You explained it in an excellent way... thanks.
This was wayyyy tooo good. Fully detailed and amazing explanation. God bless you, sir.
one of the best videos for generating uniqueIDs, thank you
Awesome video! One thing that is still unclear, and that no 'Build TinyURL' video/article explains, is an example of actually generating the URL.
So we agree that we use the distributed counter and then base62 encode it. However when you encode a somewhat large number (much less than 3.5 trillion), you get a string that is longer than 7 characters!
Ex. 1234567 ==[Base62]=> MTIzNDU2Nw
That number is not even 1.5 million and we're way over the 7 character limit.
If we truncate that and only take the first 7 characters, then we have the exact same problem as truncating the MD5 hash: collisions!
Any help would be appreciated thanks!
Your conversion is probably incorrect. In my code, b62(3000000000000) = QODkgKc, which is seven characters.
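The string MTIzNDU2Nw in the question looks like what you get when the text "1234567" is Base64-encoded as bytes, rather than when the number 1234567 is converted to base 62. The sketch below shows both side by side. The `base62` helper and its 0-9, a-z, A-Z alphabet are assumptions for illustration; a different alphabet order changes the characters but not the length.

```python
import base64
import string

# Assumed digit order: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62(n: int) -> str:
    """Convert a non-negative integer to base 62 (no padding)."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, remainder = divmod(n, 62)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


# Encoding the *text* "1234567" grows with the number of characters.
as_text = base64.b64encode(b"1234567").decode().rstrip("=")
# Converting the *number* 1234567 to base 62 is much shorter.
as_number = base62(1234567)

print(as_text)    # MTIzNDU2Nw
print(as_number)  # 5ban
```

Any counter below 62^7 (about 3.5 trillion) converts to at most 7 base-62 digits, so no truncation is needed.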
Thank you, that was a great explanation! IMHO, the only missing thing worth mentioning is that the app should have thundering-herd protection, so that if, e.g., a tweet goes viral, we avoid a flood of simultaneous cache misses hitting the database.
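One way to get that protection is request coalescing (sometimes called "single flight"): when many requests miss the cache for the same key at once, only one of them loads from the database and the rest wait for its result. This is a minimal single-process sketch; `SingleFlightCache` and the `loader` callback are hypothetical names, not something from the video.

```python
import threading


class SingleFlightCache:
    """Cache where concurrent misses for one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader          # function: key -> value (e.g. a DB read)
        self._cache = {}
        self._key_locks = {}
        self._guard = threading.Lock() # protects the per-key lock table

    def get(self, key):
        value = self._cache.get(key)
        if value is not None:
            return value
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have loaded it while we waited.
            value = self._cache.get(key)
            if value is None:
                value = self._loader(key)
                self._cache[key] = value
            return value
```

A production cache would also need expiry and cleanup of the per-key locks, which are left out here to keep the idea visible.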
Great job! I think it could be useful to include one DB item with example values {id, short url, long url}. It seems explicit in the tutorial, but that could help some people understand better.
Incredible!!! Thanks for sharing, brother!!! My respects, greetings from Mexico.
Great Video. Very well explained. Thanks for making this video.
Very useful, if you don't know which type of questions are being asked in big tech companies.
This channel is like a gold mine
Thanks a ton, Narendra. The video looks a bit long, but everything is very clear after watching the whole thing.
Great video! Just one thing I don't understand. How can we be sure that base62 encoding will always produce a string of length seven?
Hi Narendra,
I was asked this question: if you are in the business of shortening URLs, how would you gain an edge over your competitors? What extra would you do? Also, if you had to shorten the URL to one or two characters, what approach would you take? Could you please help me with these questions?
great explanation.. u gave all answers to questions which came to mind while watching the video.. superb!! looks like u were reading the mind
One thing to note is that Base62 is a less popular choice than Base64 (as the conversion is not as straightforward). But we can't use Base64, as it has "/" as an encoding character.
Great Video.
Keep on making such videos, it is really very helpful
Great video! If collision is an issue, why can't we use a UUID, instead of ZooKeeper, as the number that gets fed into the base62 encoder? The chance of collisions with UUIDs is almost zero.
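One thing to weigh with the UUID idea is length. A UUID is a 128-bit number, and a quick check suggests its base-62 form needs up to 22 characters, far more than 7; truncating it would bring collisions back. The sketch below is illustrative only; `base62` and its alphabet order are assumptions.

```python
import string
import uuid

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62(n: int) -> str:
    """Convert a non-negative integer to base 62 (no padding)."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, remainder = divmod(n, 62)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


code = base62(uuid.uuid4().int)   # random 128-bit value
longest = base62(2 ** 128 - 1)    # largest possible 128-bit value

print(len(code), len(longest))
```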
What if 2 users input the same long URL at the same time and the requests are sent to different app servers? Also, does this design allow me to get the long URL back when I input the short URL? If so, what modifications are needed? BTW, your videos are amazing. I have watched every single one of them. Thank you for sharing. :)
Great analysis of the system design of a URL shortener. We hit several of these caveats, especially when building the defender application; however, our job is far from done.
Hi, when we give 1 L to Base62, it does not necessarily produce a tiny string that is 7 characters long. If we pick only the first 7 characters, there would again be a possibility of collision. Am I missing something here?
exactly what i was thinking
Even after hashing (MD5), nothing guarantees that different input counters won't end up with the same first 7 chars.
Great video 🙏
But I think one can write a simple random byte generator based on a combination of user IP, current timestamp and the provided URL, to generate random bytes unique to every URL.
Hmmm... nice thought and idea too. I also agree with this, as the timestamp will be different at each moment.
Very well explained. Keep up the good work.
It couldn't be better. You explain system design the way it should be done. Just one suggestion: HTTP status code 301 could have been mentioned.
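For context, the read path of a shortener usually ends in an HTTP redirect. A minimal sketch of that step, with a plain dict standing in for the database and a hypothetical `resolve` function:

```python
def resolve(short_code: str, mapping: dict) -> tuple:
    """Return (status, headers) for a request to /<short_code>."""
    long_url = mapping.get(short_code)
    if long_url is None:
        return 404, {}
    # 301 = Moved Permanently: clients may cache the redirect.
    return 301, {"Location": long_url}


urls = {"abc1234": "https://example.com/some/very/long/path"}
print(resolve("abc1234", urls))
print(resolve("missing", urls))
```

Because browsers may cache a 301, some designs prefer a 302 (temporary redirect) when every click has to reach the service, for example to count traffic.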
Thanks for explaining the problem through various approaches. One question: what if the ZooKeeper service is down? Won't it be a single point of failure?
One approach to collisions can be to use a counter along with a special symbol to act as counter. 1. In that case you have to make sure that the hash doesn't use the special symbol. 2. The hash portion of the url will serve as partition key and counter will serve as clustering key (Cassandra terminologies).
Can you explain a bit more about this?
We should use a queue.
Great, I didn't know a URL shortener could be this complex. Thanks for the video, sir.
How do the databases scale out? If the table keeps growing, lookup queries on cache misses will become a significant bottleneck. Replicas won't help you here. Even if you sharded instances, you'd still need a centralized table, because you would need to look up the specific instance in order to get the long URL for a given short URL. Thoughts?
Very good explanation; breaking the problem down into small pieces and solving each one works really well.
I see the majority of problems were solved here, but what about duplication? If the same long URL is shortened by multiple users, the system is going to generate multiple short URLs. This is redundant, isn't it??
How can we solve this? Do a read before we generate and write??
So, what would be the thing with the counter?
Because now you'll be avoiding collisions, but the URLs generated are sequential per partition. That means you'll have some boring URLs like 0000001 and stuff, and also that it's possible to guess URLs on your service.
I don't think that's super great.
If you start making them into strings and encode literally however many bytes to ensure you'll have a 7-char long string in B62, you'll end up with less than the original ranges, though. And you can still guess a URL.
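One possible way to keep the counter's no-collision property while making consecutive codes look less sequential is to pass the counter through a reversible mapping before encoding it. The sketch below multiplies by a constant modulo 62^7; since the constant shares no factor with 62^7, the mapping is a bijection. `MULTIPLIER` is an arbitrary value chosen for illustration, and this is obfuscation only: anyone who learns the constant can undo it, so it is not a security measure.

```python
import math

SPACE = 62 ** 7                 # number of possible 7-character codes
MULTIPLIER = 2_654_435_761      # arbitrary; must share no factor with 62 (2 or 31)
assert math.gcd(MULTIPLIER, SPACE) == 1

# Modular inverse (Python 3.8+), used to map a scrambled value back.
INVERSE = pow(MULTIPLIER, -1, SPACE)


def scramble(counter: int) -> int:
    """Map a counter to a different-looking value in the same range."""
    return (counter * MULTIPLIER) % SPACE


def unscramble(value: int) -> int:
    """Recover the original counter from a scrambled value."""
    return (value * INVERSE) % SPACE


print(scramble(1), scramble(2), scramble(3))
```

The scrambled value would then be fed to the base-62 encoder in place of the raw counter.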
This is a nice, informative video.
I implemented a scheduler server in the past, when ZooKeeper was in beta on AWS CloudFormation. It pretty much worked, but it is nice to use available services, since maintaining a separate code base is such a pain.
Hi Naren,
I really liked your videos. Such a detailed and simple explanation in every system design video. Thanks for this selfless effort, man.
Respect for you brother, great videos
Looking at the base62 algorithm, it seems that, given the same number, it will always produce the same output.
Also, how come the string generated from base62 is guaranteed to be 7 bytes long?
getting the same
Amazing explanation. But I have one question: what happens when all ranges in ZooKeeper are exhausted? Is there a way to do some sort of compaction on ranges to reclaim them?
Great video man! loved it! your way of teaching was phenomenal keep up the good work man
Just a question to consider.
How does it handle key recovery after a short URL expires? Won't we be wasting those keys?
nice explanation. many thanks. all the best. keep going !!!
such good quality content, Hats off
Hi Narendra, thanks for the explanation. I really like your videos. There is one problem with the counter approach. The counter has a range of 1,000,000 - 9,999,999, which can only generate approx. 9,000,000 keys. For key length 7, the total number of possible keys is 62^7, about 3.5 trillion. But with a counter range of 1,000,000 - 9,999,999, we can only use approx. 9,000,000 keys out of those 3.5 trillion, and if we change the counter range to numbers with fewer or more than 7 digits, e.g. 10000-90000, we will generate a key with length 6. How do we resolve this issue? Also, since we have to provide a unique random number every time to generate the base62 key, what's the point of this conversion in the first place? Why can't we just use that unique number as the key?
Hi Praful, I have the same question and I haven't found a solution yet. I am wondering whether we always need to provide a URL of exactly length 7. And what if we take a different counter range, as you already asked?
If you have found a solution, please reply back.
You are thinking of the counter as a base-10 counter. Think of it as a base-62 counter. So it goes 0, 1, 2, ..., 9, a, b, c, ..., Z, then 10, 11, 12, etc.
Praful, I think you're applying the 7-character constraint to 2 different things. The counter is an internal number, whereas the short URL is an external, user-facing string. Since the counter is internal, it can have more than 7 digits (e.g. 3.5 trillion is 13 characters if stored as a string), and you pass it into the base-62 algorithm to get the 7-character string that the user sees.
This video covered every possible case in a very easy manner. I just had one question.
Is the app server going to keep track of the counter, or is ZooKeeper going to pass the counter with every request?
Let's assume here we have only one app server for the sake of simplicity.
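As I understand the range-based approach, the coordination service only hands out whole ranges, and each app server counts through its own range locally, going back for a new range only when the current one is used up. The sketch below illustrates that split with an in-memory stand-in; `RangeAllocator` and `AppServer` are hypothetical names, and a real deployment would keep the "next range" pointer in ZooKeeper or another replicated store.

```python
import threading


class RangeAllocator:
    """In-memory stand-in for the coordination service."""

    def __init__(self, range_size: int):
        self._range_size = range_size
        self._next_start = 0
        self._lock = threading.Lock()

    def next_range(self) -> tuple:
        # Hand out half-open ranges [start, end) that never overlap.
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
            return start, start + self._range_size


class AppServer:
    """Counts through its own range without contacting the allocator."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._current, self._end = allocator.next_range()

    def next_id(self) -> int:
        if self._current >= self._end:
            # Range exhausted: only now go back to the allocator.
            self._current, self._end = self._allocator.next_range()
        value = self._current
        self._current += 1
        return value
```

With a range size of one million, an app server would talk to the allocator once per million short URLs rather than once per request.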
Great content. Super helpful. Thank you
Good video, but we could generate random IDs in the background whenever the number of available IDs falls below a threshold. That way we can use more complex hashing to generate IDs without delaying requests. Once a mapping expires, mark its ID as available and reuse it.
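A rough sketch of that pre-generated pool, under the assumption that everything lives in one process. `KeyPool`, `low_water` and `target` are made-up names, and `refill` is called inline here, whereas the comment suggests running it as a background job.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters
KEY_LENGTH = 7


class KeyPool:
    """Pool of unused random keys, refilled when it runs low."""

    def __init__(self, low_water: int, target: int):
        self._low_water = low_water
        self._target = target
        self._available = set()
        self._issued = set()
        self.refill()

    def refill(self) -> None:
        # Generate random keys until the pool holds `target` unused ones.
        while len(self._available) < self._target:
            key = "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))
            if key not in self._issued:
                self._available.add(key)

    def take(self) -> str:
        if len(self._available) <= self._low_water:
            self.refill()
        key = self._available.pop()
        self._issued.add(key)
        return key

    def release(self, key: str) -> None:
        # Called when a mapping expires, so the key can be reused.
        if key in self._issued:
            self._issued.remove(key)
            self._available.add(key)

    def available_count(self) -> int:
        return len(self._available)
```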
Thanks for the detailed explanation. Just wanted to know one thing, since we are planning to use sharding, what can be used as sharding criteria in this kind of example?
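One commonly used criterion is the short code itself, since every read already has it in hand: hash the short code and take it modulo the number of shards. The sketch below is illustrative; `shard_for` is a hypothetical helper, and MD5 is used only as a stable hash, not for security.

```python
import hashlib


def shard_for(short_code: str, num_shards: int) -> int:
    """Pick a shard index from the short code alone."""
    digest = hashlib.md5(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; stable across processes and runs.
    return int.from_bytes(digest[:8], "big") % num_shards


print(shard_for("abc1234", 4))
```

Because the shard is computed from the short code, a lookup can go straight to the right shard without a central routing table. The trade-off is that changing `num_shards` remaps most keys, which is the problem consistent hashing is usually brought in to soften.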
Amazing explanation.
Question: 1. How can spam and malicious links be handled?
2. How might we be able to track and display traffic stats to users?
The final approach will also produce multiple short URLs for the same long URL, since there is no check for whether the same long URL has already been mapped. How do you address this?
Well explained, good job. But the number is guessable, as it follows an incremental approach. So something else is also required to make it non-guessable.
Does ZooKeeper have its own database?
The point being: what happens in the case of a ZooKeeper failure?
Also, say I have a current counter value (numeric) to create a short link, but a DB exception occurs at that moment, for a second or so. Will that value be wasted forever, or how will ZooKeeper know about this?
Also, what would be the best approach to deal with the requirement of custom short link generation?
Thanks for the explanation, it was really useful and interesting! I would also be interested in how to set up Apache ZooKeeper.
Thank you so much! This video was very helpful.
Awesome, I have a question here:
What if we just make the first character fixed for each server, e.g. server_A has the prefix 1 or a, and let each server count through the whole remaining range (6 or 7 chars)?
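A small sketch of how that prefix scheme could look, assuming one prefix character per server followed by a 6-digit base-62 local counter. `short_code` and `encode_fixed` are hypothetical helpers, and the 0-9, a-z, A-Z digit order is an assumption.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
LOCAL_WIDTH = 6


def encode_fixed(n: int, width: int) -> str:
    """Write n as exactly `width` base-62 digits, left-padded with '0'."""
    if not 0 <= n < 62 ** width:
        raise ValueError("local counter exhausted")
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def short_code(server_prefix: str, local_counter: int) -> str:
    """Prefix identifies the server; the rest is that server's own counter."""
    return server_prefix + encode_fixed(local_counter, LOCAL_WIDTH)


print(short_code("a", 0))   # a000000
print(short_code("b", 0))   # b000000
```

No coordination is needed after the prefixes are assigned, but the scheme caps the fleet at 62 servers for a one-character prefix and gives each server 62^6 (about 56.8 billion) codes.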
Loved your video. Just out of curiosity, can't ZooKeeper be the single point of failure?
Can we run a replicated database (Cassandra or DynamoDB) to store the available ranges instead of using ZooKeeper?
Great video!! Thanks a lot for your efforts!!
I have some questions about the range from ZooKeeper.
Since our tiny URL string will be around 7-8 characters, using integers won't scale.
For example, if our tiny URL length is around 8 characters, then we can generate only 9,99,99,999 combinations, which is not sufficient.
The range (counter) from ZooKeeper is not the URL. It goes through the encoder to generate the URL, so the counter can go as high as the type you store it in allows.
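The arithmetic behind that reply, as a quick check: the number of distinct codes is the alphabet size raised to the code length, so 8 decimal digits and 8 base-62 characters are very different amounts. `capacity` is just an illustrative helper.

```python
def capacity(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


print(f"{capacity(10, 8):,}")  # 8 decimal digits:      100,000,000
print(f"{capacity(62, 7):,}")  # 7 base-62 characters:  3,521,614,606,208
print(f"{capacity(62, 8):,}")  # 8 base-62 characters:  218,340,105,584,896
```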
Thanks for the tutorial! I really appreciate your work. However, in my opinion, you should explain the zookeeper architecture instead of just saying "use it".
This is excellent!! Well explained and easy to understand!!
I think DynamoDB (key-value) provides putIfAbsent. You can use it, though I am not sure whether HBase has that functionality.
Amey Jain True, not every database will support that.
First of all, many thanks for creating this video. You make it quite easy to understand the design. Just one question regarding Base 62: what if, in the third approach, the counter value is 2, 64 or 126? In this case the resulting tiny URL will be the same.
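Those three values only collide if you keep just the remainder (`n % 62`). A full base-62 conversion keeps every digit, so it can be decoded back to the original number, which means two different counters cannot share a code. A minimal sketch with hypothetical `encode` and `decode` helpers and an assumed 0-9, a-z, A-Z digit order:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n: int) -> str:
    """Convert a non-negative integer to base 62, keeping every digit."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, remainder = divmod(n, 62)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


def decode(code: str) -> int:
    """Inverse of encode: read the string as a base-62 number."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


for counter in (2, 64, 126):
    print(counter, counter % 62, encode(counter))
# 2 2 2
# 64 2 12
# 126 2 22
```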
Thank you for this video 👍❤️
How does the counter approach using ZooKeeper give a no-collision guarantee while taking only the first 7 characters to store as the short URL? Can't those 7 chars again be the same for two different Base62-encoded strings? The same was done in the MD5 approach, where you mention at 17:03 that it would have collisions because taking only the first 7 characters might collide with a completely different MD5 string.
This approach creates the possibility of having more than one short URL for the same original URL, right? Say multiple users submit the same original URL, but the requests end up on different servers A1, A2, etc. Each server will then use a unique counter to generate a short URL, so each user will get a different short URL even though they gave the same URL.
Thanks for the video. How does the counter range affect short URL generation when it is applied at the app server level?
NoSQL DBs still provide row-level transaction support. So it is possible to use the first solution on horizontally scalable NoSQL DBs.
Also, in the final solution, you are not checking for collisions. What happens when a server crashes and reboots quickly, and ZooKeeper assigns it the same range? The rebooted server will start from the beginning of the range and will overwrite data. You need to cover this case too.
Your explanations are very good and have great detail. In this current design, if the same long URL is given twice, how does it return the same shortened URL, if that is a requirement?
What happens when you run out of ranges to assign? Is there some kind of recycling mechanism?
Wondering if this is an overcomplicated design. Why don't you just pre-create the short strings in advance?
Isn't B62 an encoding? Why is it treated as an algorithm and compared to MD5? I thought that to encode in B62, you still need to go through MD5 or SHA-256 first? Please correct me if I'm wrong, thanks!
Great explanation. Could you please make a video on how ZooKeeper won't be a single point of failure in the design that you have suggested in this video?
Hi, Great work, but I have a concern - If I give the same long URL to this system, it will generate different tiny URLs every time. But, I think - For the same long URL, the system must generate the same tiny URL every time.
Simply great!!! The way you explain is fabulous.
Can't we use SQL auto-increment and handle the conversion of the generated ID to base62 on the application side?
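For a single database that works, and it is easy to try out. Below is a minimal sketch using SQLite from the Python standard library; the table layout and the `shorten` function are made up for illustration. The usual caveat is that a single auto-increment column ties ID generation to one database instance, which is the scaling concern the range-based approaches try to get around.

```python
import sqlite3
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62(n: int) -> str:
    """Convert a non-negative integer to base 62 (no padding)."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, remainder = divmod(n, 62)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


def shorten(conn: sqlite3.Connection, long_url: str) -> str:
    # The database assigns the ID; the application only encodes it.
    cursor = conn.execute("INSERT INTO urls (long_url) VALUES (?)", (long_url,))
    row_id = cursor.lastrowid
    code = base62(row_id)
    conn.execute("UPDATE urls SET short_code = ? WHERE id = ?", (code, row_id))
    conn.commit()
    return code


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "long_url TEXT NOT NULL, "
    "short_code TEXT UNIQUE)"
)
first = shorten(conn, "https://example.com/a")
second = shorten(conn, "https://example.com/b")
print(first, second)
```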
Do we not consider ZooKeeper going down? Is it not a single point of failure?
big respect and good information gained