Functional Data Engineering - A Set of Best Practices | Lyft

Data Council

77K views


Published: Jul 11, 2024
Download slides: www.datacouncil.ai/talks/func...
ABOUT THE TALK:
Batch data processing (also known as ETL) is time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot.
In this talk, we’ll discuss the functional programming paradigm and explore how applying it to data engineering can bring a lot of clarity to the process. It helps solve some of the inherent problems of ETL, leads to more manageable and maintainable workloads, and helps implement reproducible and scalable practices. It empowers data teams to tackle larger problems and push the boundaries of what’s possible.
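As a rough illustration of the functional idea described above (not taken from the talk's slides), the sketch below treats an ETL task as a pure function whose output overwrites one whole partition, so rerunning it for the same day gives the same result. The in-memory `warehouse` dict, the `events` rows, and the names `compute_daily_revenue` and `run_task` are all hypothetical and exist only for this example.

```python
from datetime import date

# Hypothetical in-memory "warehouse": partition key (a date) -> list of rows.
Warehouse = dict[date, list[dict]]


def compute_daily_revenue(events: list[dict], ds: date) -> list[dict]:
    """Pure function: the output depends only on the arguments, no side effects."""
    totals: dict[str, float] = {}
    for event in events:
        if event["ds"] == ds:
            key = event["supplier_id"]
            totals[key] = totals.get(key, 0.0) + event["amount"]
    return [
        {"ds": ds, "supplier_id": supplier_id, "revenue": revenue}
        for supplier_id, revenue in sorted(totals.items())
    ]


def run_task(warehouse: Warehouse, events: list[dict], ds: date) -> None:
    """Overwrite the whole target partition instead of appending or updating rows."""
    warehouse[ds] = compute_daily_revenue(events, ds)


# Illustrative input rows (made up for this sketch).
events = [
    {"ds": date(2024, 7, 11), "supplier_id": "s1", "amount": 10.0},
    {"ds": date(2024, 7, 11), "supplier_id": "s1", "amount": 5.0},
    {"ds": date(2024, 7, 12), "supplier_id": "s2", "amount": 7.0},
]

warehouse: Warehouse = {}
run_task(warehouse, events, date(2024, 7, 11))
run_task(warehouse, events, date(2024, 7, 11))  # rerun: same partition, same content
```

Because the task replaces the partition rather than mutating rows in place, a retry or backfill does not duplicate data, which is one way to read the "reproducible" claim in the description.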
ABOUT THE SPEAKER:
Maxime Beauchemin works as a Senior Software Engineer at Lyft where he develops open source products that reduce friction and help generate insights from data. He is the creator and a lead maintainer of Apache Airflow [incubating], a data pipeline workflow engine; and Apache Superset [incubating], a data visualization platform; and is recognized as a thought leader in the data engineering field.
Before Lyft, Maxime worked at Airbnb on the "Analytics & Experimentation Products team". Previously, he worked at Facebook on computation frameworks powering engagement and growth analytics, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.
ABOUT DATA COUNCIL:
Data Council (www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: / datacouncilai
LinkedIn: / datacouncil-ai
Science

Comments • 24

@ChristopherSlattery1980 5 years ago ⁺¹¹
Interesting talk. Nice to see a tech video where they display the slides as well as the speaker.
@boringmanager9559 4 years ago ⁺²
wow, I started to understand those concepts so much better now
@moverecursus1337 1 year ago ⁺¹
Interesting approach to slowly changing dimensions. Data storage is cheap nowadays; engineering time and cost are a lot more expensive.
@hgiagiamou 6 years ago ⁺²
Great talk!
@VajoLukic 5 years ago ⁺⁶
It is such a fantastic talk! It summarizes all the good practices for modern data management and puts them into the right perspective so that "old school" BI/DW people can understand.
@striker865 6 years ago ⁺¹
Awesome talk! Thanks for taking the time to share some best practices from top talent!
@ariesykes1432 2 years ago ⁺⁶
Really interesting video! It definitely relates to what I am working on currently. I work in IT recruiting. I'm always looking for data engineers for my jobs, and this video helps me understand a little more about what data engineers have to face in their field. Thank you!
@nhuwlaftooi 1 year ago
Really great talk! It helped me understand many new concepts.
@cyclogenisis 2 years ago ⁺³
Overall not a bad presentation. However, I do not think using Presto with a 1 million per day snapshot on dimensions is viable (specifically out of the box). Presto likes to put all joins onto the main fact into memory, so modelling this way and using Presto isn't really in line with how it was meant to be used if you're joining all the records each day. But in general, I do like the shift toward keeping everything in dimensions if the technology allows for it.
Edit: The Q&A section spoke more to this; he adapted his answer to "apply common sense".
@chinmayarankalle4389 1 year ago
Great effort, thanks @maxime
@lambuth 4 years ago ⁺¹
Great talk. And quite the resemblance to the singer from 311.
@karangupta_DE 2 years ago
A persistent staging area might only be effective when there is one place where your raw data resides. For example, if you stage your data on AWS S3 and then copy the data again to Snowflake, you will end up with two places of storage holding the same raw data.
@qwaszx822 5 years ago ⁺³
Thanks a lot for the video. I have a question. For modern data warehouse solutions, have any other data models emerged apart from star schema and snowflake?
@dragonfly4484 5 years ago ⁺⁴
Data vault and the incremental approach are kind of emerging as norms in some quarters.
@lbb2rfarangkiinok 2 years ago
Talked about a lot of interesting stuff, but I had a hard time really figuring out what is meant by functional engineering. Each individual topic was very clear to me, but not the red thread linking it all together.
@sspaeti 6 years ago ⁺²
Thanks for sharing your great knowledge with us again. One question about snapshotting, let's say daily. What if multiple changes in dim_supplier happen within a day? Then you wouldn't catch all of them. Would that just be a small tradeoff you would accept, or would you have something else in mind to track that?
@mistercrunch 6 years ago ⁺¹⁰
If keeping track of intra-day changes is important, I'd recommend denormalizing the dimension attribute into the fact table. Conceptually, if some dimensional attributes are "flickering", meaning going back and forth and changing fairly often, it points towards that attribute being logically attached to the fact and not to the dimension.
@kyosungchoo1436 5 years ago
I think you would use a change history to catch these changes; then, as Maxime Beauchemin says, this change history should be in the fact table.
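The trade-off discussed in this thread can be sketched in a few lines. This is a minimal illustration under assumed data, not code from the talk: `dim_supplier` is modelled as one full snapshot per day, and `supplier_status` is a hypothetical attribute copied into the fact rows at event time. All values are made up.

```python
from datetime import date

# Hypothetical daily snapshots of dim_supplier: one full copy per day (ds).
dim_supplier = {
    date(2024, 7, 10): {"s1": {"status": "active"}},
    date(2024, 7, 11): {"s1": {"status": "suspended"}},
}

# Hypothetical fact rows. "supplier_status" is the denormalized attribute,
# captured when the event happened, so intra-day changes are preserved.
fact_orders = [
    {"ds": date(2024, 7, 11), "supplier_id": "s1", "amount": 10.0,
     "supplier_status": "active"},
    {"ds": date(2024, 7, 11), "supplier_id": "s1", "amount": 4.0,
     "supplier_status": "suspended"},
]


def status_from_snapshot(row: dict) -> str:
    """Join a fact row to the dimension snapshot taken for the same day."""
    return dim_supplier[row["ds"]][row["supplier_id"]]["status"]


# The daily snapshot only knows the state captured for that day.
snapshot_view = [status_from_snapshot(row) for row in fact_orders]

# The fact table keeps the value that was true at the time of each event.
fact_view = [row["supplier_status"] for row in fact_orders]
```

Here the snapshot join reports the same status for both orders, while the denormalized column shows that the supplier changed state between them, which is the intra-day detail the question was about.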
@allthatyouare 5 months ago
🤯
@marigoldx22 6 years ago
Maha Guru!
@borat_trades2860 3 years ago ⁺²
honestly did not get much from this talk - pretty technical
@santanubaishya8357 4 years ago
Great talk!
