- 97 videos
- 442,264 views
BigData Thoughts
India
Joined Dec 7, 2015
Bigdata Thoughts is a channel focused on Bigdata technologies and cloud solutions.
I will be sharing my experience of building different use cases on the cloud, the challenges faced, and the different solutions. I will also talk about the technology options, architecture, and design principles for each solution.
What is AI and data science
What is AI
Evolution of AI
What is ML
What is Data Science
Real world application
50 views
Videos
All you need to know about Spark Monitoring
467 views · 2 months ago
All you need to know about Spark Monitoring - Ways to Monitor - WebUI - History Server - REST API - External Instrumentation
What is generative AI
229 views · 5 months ago
What is AI - What is generative AI - Large language models (LLMs) - Use cases - Challenges
Stream Processing Fundamentals
248 views · 5 months ago
Stream Processing Fundamentals - What is stream processing - Stream and batch combination - Benefits - Challenges - Design considerations
Evolution of Data Architectures in last 40 years
433 views · 6 months ago
Evolution of Data Architectures - The Landscape - RDBMS - Data warehouse - Data lake - Why data lakes? - Data lakehouse
Spark low level API Distributed variables
372 views · 9 months ago
Different APIs offered by Spark - What are low level APIs? - Why are they needed? - Types of low level API - What are distributed variables? - Distributed variable types - Broadcast variables - Why are broadcast variables better? - Accumulators
Spark low level API - RDDs
447 views · 9 months ago
Different APIs offered by Spark - What are low level APIs? - Why are they needed? - Types of low level API - What is RDD? - Internals of RDD - RDD API - Types of RDD - Creating RDDs - Transformations on RDD - Actions on RDD
Spark structured API - Dataframe and Datasets
908 views · 10 months ago
Spark structured API - Dataframe and Datasets - Structured and unstructured APIs - Dataframe and Datasets - Row Object - Schema - Column - Column as logical tree - Dataset - when to use Dataset
Spark structured API - Dataframe
857 views · 11 months ago
This video explains: - High level structured API DataFrame - How Spark executes user code - All the steps needed to create a DAG
Spark Architecture in Depth Part2
2.2K views · 11 months ago
Spark Architecture in Depth Part 2 - Spark Architecture - Spark APIs - Transformations vs actions with examples - End to end example to explain Spark execution
Spark Architecture in Depth Part1
3.8K views · 1 year ago
Spark Architecture in Depth - Driver - Executor - Cluster Manager - Data frame - Partition - Transformations - Narrow - Wide
Top 3 file formats frequently used in bigdata world
672 views · 1 year ago
Top 3 file formats frequently used in bigdata world
What are Metadata Driven Architectures ?
1.8K views · 1 year ago
What are Metadata Driven Architectures ?
How to crack Bigdata Engineer Interviews
1.8K views · 1 year ago
How to crack Bigdata Engineer Interviews
Wonderful explanation
Really appreciate your hard work. Thank you for the great explanation.
thanks
Hello Shreya, can you make a hands-on video on data ingestion into AWS S3?
Can a node/thread have more partitions than the number of executors? If yes, where is the partition count information stored?
Very good session, ma'am. If it were shown practically it would be even more useful. Thank you for your efforts.
Thanks
Thank you for this video.
Thanks
I'm getting confidence in Spark because of you. Thanks so so much!
Thanks
To summarize: what data marts are to a data warehouse, data mesh is to a data lake.
How did you make such a good visual explanation? Which tool did you use to draw the sketches? Please guide 🙏
Are data mesh and Snowflake the same? Are data mesh and Microsoft Fabric the same?
Thanks, appreciate it. Is there a plan to post practical videos around Spark performance tuning?
Thank you for sharing your thoughts.
This is not exactly an answer.
Please do videos with sample data sets so that it would help for hands on
Seems it is PaaS, as mentioned on the Microsoft website.
Really, thanks for the good and in-depth explanation.
Thanks
Let's say there is no change in the records the next day. Does the data get overwritten again with the same records?
No, we only take the new differential data when we do CDC.
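As a toy illustration of the CDC idea in the reply above (plain Python with hypothetical records and a hypothetical `id` key; the actual pipelines would use Spark or a CDC tool):

```python
# Toy change-data-capture: keep only the records that are new or changed
# compared to the previous day's snapshot, keyed by record id.
def differential(previous, current):
    prev_by_id = {r["id"]: r for r in previous}
    return [r for r in current if prev_by_id.get(r["id"]) != r]

yesterday = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
today = [{"id": 1, "val": "a"}, {"id": 2, "val": "B"}, {"id": 3, "val": "c"}]

print(differential(yesterday, today))      # changed id=2 and new id=3
print(differential(yesterday, yesterday))  # no changes -> []
```

When nothing changed, the differential is empty, so nothing is rewritten — which matches the answer above.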
This is excellent and valuable knowledge sharing. One can easily tell these trainings come from deep personal hands-on experience and not mere theory. Great work.
thanks
Thank you, please also post some practical videos around the same topic.
Thank you for sharing thoughts
First one to monitor the notification from you
Thanks
How do I join a small table with a big table when I want to fetch all the data from the small table? The small table has ~100k records and the large table has ~1 million records. `df = smalldf.join(largedf, smalldf.id == largedf.id, how='left_outer')` runs out of memory, and I can't broadcast the small df, I don't know why. What is the best approach here? Please help.
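Not an authoritative answer to the question above, but the broadcast-hash-join idea it touches on can be sketched in plain Python (hypothetical rows and column names; in PySpark the analogous hint is `pyspark.sql.functions.broadcast`):

```python
# Toy broadcast-style hash join with left-outer semantics: build a hash map
# from the small table (as if it were broadcast to every worker), then stream
# the large table once, never materializing it in memory.
def left_outer_join_small(small, large):
    small_by_id = {row["id"]: row for row in small}  # the "broadcast" side
    matches = {}  # small id -> list of joined rows
    for big_row in large:
        s = small_by_id.get(big_row["id"])
        if s is not None:
            matches.setdefault(s["id"], []).append({**s, "big_val": big_row["val"]})
    out = []
    for s in small:  # left-outer: keep unmatched small rows with a null value
        out.extend(matches.get(s["id"], [{**s, "big_val": None}]))
    return out

small = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
large = [{"id": 1, "val": 10}, {"id": 3, "val": 30}]
print(left_outer_join_small(small, large))
```

One possible reason the broadcast was refused (an assumption, not verified against the poster's job): in a left outer join the preserved (left) side generally cannot be the broadcast/build side, so hinting `broadcast(smalldf)` on the left of a left outer join may be ignored or fail.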
18/april/2024
your videos on spark are hidden gems
Thanks
❤
❤
Nice explanation
Thanks
At the start of the video I was so happy seeing all the diagrams, but later I got fully confused, it felt complicated, and I didn't understand well 😢
I wish I could give 1000 likes. You’re an excellent teacher!
Thanks
Nice explanation
Thanks
Found it helpful. You could go a bit slower though; I had to stop and rewind a few times.
What a wonderful explanation, to the point... thank you.
Thanks
Good playlist for Spark ruclips.net/p/PL1RS9FR9qIPEAtSWX3rKLVcRWoaBDqVBV
Thanks
Just wow, a very simple explanation of a complex cluster overview. Thanks.
Thanks
Best explanation I have ever come across on RUclips. Watching all the parts... Thank you for explaining it so smoothly.
Thank you for sharing thoughts!
That was very well explained. Thank you for putting this together. One question though: do you really think data modelling should be done on the Gold layer? I don't think so, because Gold datasets are just business-level aggregates suited to particular business consumption needs, whereas the Silver layer is the warehouse in the Lakehouse. That is where modelling should be done, if needed.
Thank you so much. All the videos are very clear and effective.
Thanks
Thank you for sharing your thoughts.
Thanks
Finally it became clear to me after reading here and there. Thank you.
One of the most helpful sessions!
Thanks
Nicely explained, thank you. Looking forward to learning more around this topic.
Thanks
well explained!!!
Well explained!!!
well explained!!
Thanks
well explained
Thanks
Nicely explained, thanks. It's helping a lot.
Kindly do a similar simple series for Dataproc and also BigQuery.
Thank you for the detailed explanation. However, the problem I faced with reading dates prior to 1900 does not resolve even after setting all the mentioned properties. Does anyone have a working example that solves the issue of reading dates prior to 1900? Below is the code I added, but it did not work:

```python
conf = sparkContext.getConf()
conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
conf.set("spark.sql.datetime.java8API.enabled", "true")
```
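Not a verified fix for the comment above, but one thing worth checking: `SparkContext.getConf()` returns a copy of the configuration, so `set` calls on it after the context exists have no effect on the running session. A sketch that applies the same properties when the session is built instead (path hypothetical, pyspark assumed installed; this is a config fragment, not a tested solution):

```python
from pyspark.sql import SparkSession

# Set the rebase confs at session-build time so they apply to every read,
# rather than mutating the copy returned by SparkContext.getConf().
spark = (
    SparkSession.builder
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
    .getOrCreate()
)
df = spark.read.parquet("path/to/pre-1900-dates.parquet")
```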
Very good information 🎉
Thanks