What bothers you more - monthly real estate rental or cloud cost bills? Plan your cloud infrastructure and optimize costs. Learn from experiences of enterprises and startups, on 26 November 2021 at hasgeek.com/rootconf/optimizing-costs-of-cloud-infrastructure/
Great!! I wish he had covered database scaling as well.
@5:05 The point about the homepage APIs taking a hit as soon as Dhoni gets out is INSANE. I wouldn't have imagined that scenario. Excellent talk.
That point is absolutely on point. Never thought about it in this way. Loved the talk.
Exactly, I never thought about this point as a developer 😳 Awesome
All these are usually observations from previous occasions.
All predictions are always based on previous results.
There will be a cache for home page content, so why would it cause a hit? Am I missing anything?
@@bhaveshssharma8826 You cannot use a cache, as you need to change the data for recommendations, last-watched videos, advertisements and so on!
Hence we cannot use a cache. It'll be a bad user experience!
I couldn't make my younger brother understand how big an achievement this is; he's still doing his engineering. I hope one day he'll understand what level of engineering is being done in India, specifically at Hotstar. Loved the talk
Hi Ajay, can you explain a bit about what is high-level about this? I am not an engineer, but would like to know. Also, how does it compare with the engineering of Hotstar's global competitors?
@@deep_z Concurrency is the big thing here: how to handle this many people accessing the same set of services without any interruptions. In other apps you'll see different sets of data being consumed by different sets of people; that's not true here.
@@ajaykumar-xy6pw YouTube and IRCTC are doing it all the time, aren't they?
@@arjunjmenon YouTube yes, IRCTC I don't think so, and overall YouTube is backed by big players like Google, which is one of the best tech companies in the world.
@@arjunjmenon irctc lol
And YouTube just decides one fine morning to recommend this video to all engineers!!
The project I work on has 2M in a month and these guys are handling 25.3M in a day. I hope to work on something as big as this.
Not even 1 day, it's 25.3M concurrently. Over a day it will be multiple magnitudes higher!
the product I work on has at max 64 concurrent transactions
@@karun4663 haha
Just now my application broke down at 100 concurrent users🤣🤣🤣🤣🤣
Ind Pak game, they reached 100M recently
Here after 5.4 million in the IND v NZ Semi-Final of 2023. Crazy to think that the new record is DOUBLE this!
What's interesting is they're not using CPU or networking for autoscaling. They're using concurrency and request rate. I think they have already scaled the infra before the big game. Autoscaling only kicks in when that baseline is exceeded
Yes, you need to scale on golden metrics! CPU-based autoscaling is not very useful! It should always be concurrency or latency!
Yep. Proactive scaling > reactive scaling
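A minimal sketch of what scaling on request rate and concurrency (instead of CPU) could look like. The per-instance limits, the headroom factor and the pre-scaled baseline are illustrative assumptions, not numbers or code from the talk.

```python
import math

def desired_instances(current_rps, current_concurrency,
                      rps_per_instance, concurrency_per_instance,
                      baseline=0, headroom=0.25):
    """Instance count for the observed traffic, never below the pre-scaled baseline.

    rps_per_instance and concurrency_per_instance are hypothetical limits
    that would come from load testing a single instance.
    """
    # Size for each metric separately, with some headroom on top.
    by_rps = current_rps * (1 + headroom) / rps_per_instance
    by_concurrency = current_concurrency * (1 + headroom) / concurrency_per_instance
    # The tighter metric decides; the baseline is the floor set before the event.
    return max(baseline, math.ceil(max(by_rps, by_concurrency)))
```

With a high baseline the function simply returns the baseline until traffic outgrows it, which matches the idea that autoscaling only kicks in once the pre-scaled capacity is exceeded.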
Excellent level of logging, monitoring and observability. Loved the talk even though I am not a devops/ops person
Man, same. I just felt this was interesting
This is not just for DevOps or SRE folks. This is what a core engineering team does
I've seen the concurrent viewers hitting up to 45M in the last ipl match. Science and technology never cease to amaze me.
That's lakhs bro not millions
I too thought the same and got confused😅
And the skill of Indian engineers
TLDR: Perform load testing to get instance limits (in terms of CPU/RAM/network) and then prelaunch enough instances to serve your expected number of concurrent users
I think the essence was well summarised. But the actual value lay in the process of the talk.
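The TLDR above can be sketched as simple capacity arithmetic. This is only an illustration under assumptions: the function names, the 20% buffer and the idea of expressing each limit as "users one instance can serve" are hypothetical, not taken from the talk.

```python
def instance_capacity(users_at_cpu_limit, users_at_ram_limit, users_at_network_limit):
    """Concurrent users one instance can serve: the tightest limit found in load tests."""
    return min(users_at_cpu_limit, users_at_ram_limit, users_at_network_limit)

def instances_to_prelaunch(expected_users, per_instance_users, buffer_percent=20):
    """Instances to launch ahead of the event, with a safety buffer on expected users."""
    padded_users = expected_users * (100 + buffer_percent)
    per_instance = 100 * per_instance_users
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-padded_users // per_instance)
```

For example, if load tests showed an instance saturating at 8,000 users on CPU, 5,000 on RAM and 12,000 on network, the usable capacity would be 5,000 users, and 25 million expected users with a 20% buffer would need 6,000 instances.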
Wonderful talk by Gaurav and it was such a pleasure to work with him.
This is amazing!
I did not move from my chair for these 46 minutes!
In those tense moments of IND vs NZ, I used to wonder how these guys were keeping up with 25M users! That question has been beautifully answered after half a year!
Thank you Gaurav! You and your team are the real heroes!
This is absolute pinnacle of software engineering. Kudos to the team. I am still a student learning cloud platform and all, the whole presentation kept me glued to it.
Wait...maybe I am not getting this...but isn't this all just configuring AWS?
@@asrajan55 Hi Soundar, as far as I reckon, scaling so many resources, along with the risk management and maintaining concurrency without breaking the APIs AT THIS scale, is itself a challenge. Not to mention the fragile nature of the viewership graph and working with a breakdown time of 30-60 seconds; all together this makes the task an absolute marvel.
They came up with their own autoscaling app - that's gangsta
I imagined how YouTube works perfectly with millions of calls per minute every day
more like per second
YouTube is more distributed; because of that you will most likely face buffering when watching videos from less popular YouTubers outside India.
Kudos to the Hotstar team for handling traffic at this level. Superb presentation to illustrate all the concepts.
This basic question is unanswered. 25 million concurrent viewers == 25 million TCP/UDP connections. Are you creating the ~25M or more TCP connections while performing the game day/load test?
Awesome explanation of how the Hotstar tackled the load problem to handle huge traffic. Liked it. Thanks for sharing such a good info.
Unfortunately Hotstar's management doesn't want to spend more and lets football streams run at 240p when it says Full HD.
Now I know why we don't get to watch the match at 4K 60fps at least. Hope we reach 4K 120fps soon with premium and 1080 60fps on regular plans.
Love love loved it❤...AMAAAAZING session...Every sentence is so full of substance and thought provoking...Gaurav absolutely killed it.
Wonderful presentation. Lots of insights into how you handle resiliency and load. Many thanks for uploading this.
Interesting review.
About using 8 AWS regions for distributing the bandwidth for pre-match load testing, how did it play out on match day since almost 20M of the 25M hits would have come from within India and hitting only 10 edge locations (2-3 regions) if I'm not wrong. Is that load testing strategy a good match for actual production load pattern?
have the exact same question ....
Just because all your traffic comes from India doesn't necessarily mean it would all be served from Indian AZ servers only. There are always resource limits, which can degrade performance when the entire traffic is served from Indian servers compared with scattered global servers. You would not want to serve everything from Indian servers at the cost of risking the system as a whole. That's why slightly slower responses and higher network bandwidth are accepted over pushing all the traffic to the nearest servers and risking system downtime. It's a trade-off. People will be okay with slightly slower loading but not with viewership disruptions. A lot of things come into play when you are dealing with such scale. *Engineering is beautiful*
I have an on-prem application which I built, based on an event-driven architecture. We handle around 14M msgs/day, and the max spike rate we have seen is 5M/hr, but our throttling and fault tolerance aren't as effective as this. Loved the details and especially the depth of the presenter.
Apart from code, its just configuring AWS, I wonder, how AWS boasts about their metrics.
yeah here after ind vs pak
Same bruh
@Sir Bradulkar Where have you seen commercial content at 60 fps, sir?
Came here after India vs Pakistan match when 2 crore people watched the match without buffering.
The way he explained 👌🔥
Woww, wonderful explanation with right techniques along with use cases. Really liked it. Thanks Gaurav.
The bitrate is still pretty terrible. I mean you can straight up see pixels on screen. Of all the streaming services, hotstar is poorest. Until they improve the bitrate and the fps, I will not be impressed.
This is insanely huge! How come this has not made news.
Because they are more concerned with Aryan khan's drug habits.
I think there should now be a part 2: Scaling Hotstar for 4.7 cr concurrent viewers.
This is so amazing. No claps from the audience? What kind of audience was there?
Nicely explained by Gaurav, there is a lot to learn from his talk
I've learnt soo much from this video. ❤️
He's Hero of Cloud 🫡 waiting for pak vs india match case study .
Great talk Gaurav. Some especially useful and practical tips to scale on public cloud.
Really good production level insights, thanks for sharing...
Saving this for my system design interview
Yes 😂
FYI, the speaker is not from any IIT, NIT, IIIT or BITS.
It was indeed a great video. Thanks Gaurav for that wonderful presentation. I have one more question: as you said, if Dhoni is the next batsman, we all know that the users are going to increase. But let's say a batsman who has not performed well throughout the series suddenly starts performing extremely well, and again there is a spike in traffic. What is the plan to deal with such situations?
Would love to hear something like this from Netflix which handled Money heist and all very smoothly. One key difference could be getting live data vs streaming from CDN. Nevertheless great job #hotstar team. Proud of you guys 🙏🙏
Read about Netflix Open Connect: they precalculate the demand and preload the content at ISPs
Really awesome engineering by Gaurav Kamboj.
I think after seeing this video Amazon will make some changes on autoscaling.
It's my pleasure to hear you
I want to know how they are managing the AWS bills, especially when he mentioned c5 9x large CPUs
All the cloud providers have corporate plans where they provide huge discounts to their potential customers.
This is mafia-level infrastructure management!! I handled 100k users and felt like a boss. Well, now I know what boss level looks like xd
what a great talk, with details.
would like to see these more in future.
My coding abilities are limited to printing "Hello World" on python yet this is at the top of my recommendations lmao
Its best if you start watching stuff like this,.. Makes you think a lot before writing a single piece of code.
@@godsonjoseph Strongly disagree. It'll only serve to demotivate the beginner, who will find all these keywords and abbreviations extremely complicated.
Maybe after programming for some time and reading some system design books one would be comfortable with this video.
We humans can seamlessly serve 25 million people over the internet, but when will we start making reliable projector connections???
Bro, I was desperately trying to upgrade to premium during India Vs Pakistan. That day there were more than 90 Lakh online viewers. I was unable to pay online , payment methods were not working due to high traffic.
Great talk… thinking in terms of RPS and concurrency to autoscale is out of the box… I am a bit disappointed to know they need to keep watching the event to scale; hope they’ve automated this now
Madness! The scale issues are insane! how did they even manage such spikes.
Hotstar: "Scaling for 25 million concurrent viewers"
YouTube:
There’s a difference between streaming live video and just doing playback of existing files
@@amanbhargava3164 even YouTube has live videos
YouTube is owned by Google, which has its own GCP.
@@techmad8204 but not with 25M
@@ruturajdharav4675 It's biased though. The number of viewers Hotstar gets, or CPUs they run at any moment, is positively dwarfed by YouTube's numbers. The fact that all of those CPUs are crunching the same livestream means very little.
The irony when you are talking about handling load for a streaming service and the screen says no signal (from the projector),
3-4 hrs watching rain on a screen!! That’s dope! 😂✊
they run highlights during this time, not rain drops falling.
This talk is amazing! Loved it.
And 1.2 Crores yesterday 🙌🏼🙌🏼🙌🏼
So the gist of the 45 min talk is to use a custom autoscaling metric like concurrent requests instead of the default CPU/memory metrics provided by AWS.
Why don't you guys open source the autoscaling tool that you developed?
Again Hotstar created another record of 1.3 billion viewers in the Ind vs Pak T20 match 2021. They handled it very well.
We are all here after the India vs Pak match to learn about its database.
I genuinely appreciate the stability of the Disney Hotstar platform during intense workloads, but when you stream at 720p @25fps to 20 million vs 2160p @50fps to 2 million in UK/Germany, it is not a big achievement
It's 1080p
@@kapilbhardwaj4680 Did you watch it in 1080p on Hotstar? Because I remember that the best quality was limited to 720p 25fps for the Ind vs NZ game
@@kapilbhardwaj4680 if you are a premium user only you will get 1080p! If you take hotstar mobile plan then only 720p resolution!
On TV app it was 1080p 22-30fps
@@sanjeetbisht since 2020 it is 1080p 25fps, i was specifically talking about the ind vs nz 2 day match, i am pretty sure that it was 720p 25 fps for premium users for live events
I'm a fresher, just placed. I'm happy I at least understand what he's talking about. Seriously, a lot of effort goes into this.
this is amazing but i am thinking about their AWS bill 🤔
Thanks youtube for recommendation.
It seems about 13,500 load generators (8 CPU cores, 16 GB RAM each) were used for load testing; spot instances would probably be better in this case.
Good talk, but most of heavy lifting is done by AWS. Hotstar is just configuring AWS services
So inspiring to see home made apps doing so great
This is awesome! I would like to know the current infrastructure also, they must have come up with new strategy.
I feel hotstar’s Home API calls have been comparatively slower than other platforms out there. Every time I go back to home it buffers for some time always
I think they have done it intentionally
This is insane. I would love to join Hotstar as an Android Developer one day
YouTube never fails to surprise me
What was the AWS bill?
Whatever the bill was hotstar with this kind of performance could have already earned it back
5-6 crore each day during peak work load
probably a million on a day like this
@@shashankshekhar8970 a million dollars?
$1 Million on an Average
12:15 Elastic Load Balancer but not really Elastic in nature!
To be fair no LB really is. The scaling by margin of 800 servers with high capacity is harder for AWS. They don't have hardware optimization GCP has. Although I do think this is hard for GCP too from networking perspective.
Mainly it's about the AWS scale-up configuration that they arrived at through load testing.
AWS should be the one boasting about its cloud platform and services.
liked the insights of ondemand scaling very much.
Simply superb ... Learnt a lot of things 🙏
"Dhoni is out" !! .. soldiers brace for the impact.
At 27:00, shouldn't we save personal preferences in a client-side cache, for better performance for the user?
25:36 Chaos Engineering topics are nice and fantastic.
Great. Very informative. Now I'm just wondering how autoscaling works for normal televisions in the traditional way through satellites
Autoscaling is not needed in EM communication
That is called broadcasting. Any listener can get the packet and process it. Even if there were no viewers, the satellite would still broadcast.
But in the case of the internet, bandwidth is limited.
Was going to go for a sleep. But then this happened. Always wondered how they scale it.
3:25
Scaling is not slow in nature since Kubernetes. If you follow MS arch, a pod can be up and running in about 5 seconds
This problem was solved long ago....
5:05, I'm a newbie dev, but can't we just temporarily cache the homepage once the user opens the app, and then show the same cached files when the user returns? Because most users won't be on the homepage consistently and some will leave, it won't put a huge load on the homepage. And because users open the app at different times, it's easy to cache the homepage at different times
You will still make API calls to return data from the cache.
Depending on what level you are caching at, you can shave CPU cycles but not the requests.
Every user has similar API calls, like fetching the video list and category data, but many are personalized and specific to a particular user, such as the user profile and recommendations. These are not that easy to cache and need to be sent differently. I guess this is where the backend system is most impacted. I'm not exactly sure, as I have also just started learning this...
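To illustrate the split described above, here is a rough sketch that caches only the part of the homepage that is the same for every user and still loads the personalized part per request. `TTLCache`, `build_home_page` and the loader callables are hypothetical names for illustration; this is not how Hotstar's backend is known to work.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive load
        value = loader()
        self._store[key] = (now + self._ttl, value)
        return value

def build_home_page(user_id, shared_cache, load_catalog, load_personal):
    """Every request still arrives here; only the shared work is saved."""
    catalog = shared_cache.get_or_load("catalog", load_catalog)  # same for all users
    personal = load_personal(user_id)  # recommendations, watch history: per user
    return {"catalog": catalog, "personal": personal}
```

This matches the reply above: the cache shaves the repeated work of building the shared content, but each user returning to the homepage still produces a request, and the personalized part still has to be computed or fetched.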
Watching again after 2 years.
And yet hotstar content moved to Hulu, Disney, ESPN in US
Has Disney bought a stake in it? Do you know?
How much is your infra costing? Is it hosted on EC2 with an ASG managed by ECS?
The hero who keeps the movies going from behind the curtains!
Amazing tech talk 🎉
Not so difficult when the app streams at 144p
This video needs CC, the accent is a bit heavy to fully comprehend.
Where are you from?
There are captions
Scaling was a great feat, but even after all these years the UI and UX of Hotstar are absolute garbage.
zee5 has worse
I never encountered this issue.
yes
It was more than 70 million during the IPL
Jio Cinema handled 2.2 cr (22 million) concurrent viewers for a normal IPL 2023 league match, CSK vs RR
Wonder what the live numbers will be in the IPL 2023 final
What next after solving these problems? 🤔🤔
This was 2 years ago! wow..
Hotstar will be the next company I'm going into.
Interesting!
In the Ind vs Pak match, the number of viewers was 1.1 cr. Being a full stack developer, I can relate
today our server crashed and this is what I am watching lol
Why is the channel moderator deleting my comments when I mention how bad the actual experience is with Hotstar live matches?
You can censor all you want, but the recent Play Store reviews will tell you how bad it actually is at streaming live matches in high quality. If it were not for the exclusivity deal for Indian matches, it would not have users
Absolutely loved it..❤️❤️❤️ Hail System engineers 👨💻
Come on, the cricket team is not our country. Just chill and celebrate your achievements; as it is, only IT nerds like us will care about this, apart from the broadcast partners and sponsors of course
22:00 is where the actual explanation starts