Netflix Data
  • 48 videos
  • 388,598 views
Unbundling the Data Warehouse: The Case for Independent Storage
Speaker: Jason Reid (Co-founder & Head of Product at Tabular)
This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. Unbundling a data warehouse means splitting it into constituent and modular components that interact via open standard interfaces. In this talk, Jason Reid discusses the pros and cons of both data warehouse bundling and unbundling in terms of performance, governance, and flexibility, and he examines how the trend of data warehouse unbundling will impact the data engineering landscape in the next 5 years.
If you are interested in attending a future Data Engineering Open Forum, we highly recommend you join our Google Group (groups.google.com/g/data-enginee...
Views: 5,099

Videos

Reflections on Building a Data Platform From the Ground Up in a Post-GDPR World.
1.7K views · 6 months ago
Speaker: Jessica Larson (Data Engineer & Author of “Snowflake Access Control”) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. The requirements for creating a new data warehouse in the post-GDPR world are significantly different from those of the pre-GDPR world, such as the need to prioritize sensitive data protection and regulatory compliance over performance and c...
Data Productivity at Scale
1.8K views · 6 months ago
Speaker: Iaroslav Zeigerman (Co-Founder and Chief Architect at Tobiko Data) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. The development and evolution of data pipelines are hindered by outdated tooling compared to software development. Creating new development environments is cumbersome: Populating them with data is compute-intensive, and the deployment process i...
Automating the Data Architect: Generative AI for Enterprise Data Modeling
8K views · 6 months ago
Speaker: Jide Ogunjobi (Founder & CTO at Context Data) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. As organizations accumulate ever-larger stores of data across disparate systems, efficiently querying and gaining insights from enterprise data remain ongoing challenges. To address this, we propose developing an intelligent agent that can automatically discover, m...
Real-Time Delivery of Impressions at Scale
2.5K views · 6 months ago
Speaker: Tulika Bhatt (Senior Data Engineer at Netflix) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. Netflix generates approximately 18 billion impressions daily. These impressions significantly influence a viewer’s browsing experience, as they are essential for powering video ranker algorithms and computing adaptive pages. With the evolution of user interfaces t...
Welcome Address for the Data Engineering Open Forum 2024
1.1K views · 6 months ago
Max Schmeiser (Vice President of Studio and Content Data Science & Engineering) extends a warm welcome to all attendees, marking the beginning of our inaugural Data Engineering Open Forum. If you are interested in attending a future Data Engineering Open Forum, we highly recommend you join our Google Group (groups.google.com/g/data-engineering-open-forum) to stay tuned to event announcements.
Machine Learning Powered Auto Remediation in Netflix Data Platform
1.9K views · 6 months ago
Speakers: Stephanie Vezich Tamayo (Senior Machine Learning Engineer at Netflix) Binbing Hou (Senior Software Engineer at Netflix) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. At Netflix, hundreds of thousands of workflows and millions of jobs are running every day on our big data platform, but diagnosing and remediating job failures can impose considerable operat...
Data Quality Score: How We Evolved the Data Quality Strategy at Airbnb
3.6K views · 7 months ago
Speaker: Clark Wright (Staff Analytics Engineer at Airbnb) This tech talk is a part of the Data Engineering Open Forum at Netflix 2024. Recently, Airbnb published a post to their Tech Blog called Data Quality Score: The next chapter of data quality at Airbnb. In this talk, Clark Wright shares the narrative of how data practitioners at Airbnb recognized the need for higher-quality data and then ...
Netflix Data Engineering Tech Talks - Media Data for ML Studio Creative Production
3.4K views · 1 year ago
In the last 2 decades, Netflix has revolutionized the way video content is consumed; however, there is significant work to be done in revolutionizing how movies and TV shows are made. In this video, Sr. Data Engineers Amanual Kahsay and Dao Mi showcase how data and insights are being utilized to accomplish such a vision. #netflix #datascience #dataengineering #etl #bigdata
Netflix Data Engineering Tech Talks - Start Stop Continue for optimizing complex ETL jobs
3.8K views · 1 year ago
Judit Lantos, Data Engineer, Member Experience Data Engineering, shares a case study to demonstrate an effective approach for optimizing complex ETL jobs. #netflix #datascience #dataengineering #etl #bigdata
Netflix Data Engineering Tech Talks - Psyberg, An Incremental ETL Framework Using Iceberg
5K views · 1 year ago
Abhinaya Shetty and Bharath Mummadisetty, Data Engineers from Netflix’s Membership Data Engineering team, introduce Psyberg, an incremental ETL framework. Learn about how Psyberg leverages Iceberg metadata to handle late-arriving data, and improves data pipelines while simplifying on-call life! #netflix #datascience #dataengineering #etl #bigdata
Netflix Data Engineering Tech Talks - Knowledge Management - Leveraging Institutional Data
3.8K views · 1 year ago
Tristan Reid, software engineer, shares experiences about the Knowledge Management project at Netflix, which seeks to leverage language modeling techniques and metadata from internal systems to improve the impact of the more than 100,000 memos that circulate within the company. #netflix #datascience #dataengineering #etl #bigdata
Netflix Data Engineering Tech Talks - Building Reliable Data Pipelines
9K views · 1 year ago
Holden Karau, OSS Engineer, Data Platform Engineering, talks about the importance of reliable data pipelines and how to build them covering tools from testing to validation and auditing. The talk uses Apache Spark as an example, but the concepts generalize regardless of your specific tools. Some related projects include: github.com/holdenk/spark-testing-base github.com/unionai-oss/pandera githu...
Netflix Data Engineering Tech Talks - Streaming SQL on Data Mesh
7K views · 1 year ago
Mark Cho, Guil Pires and Sujay Jain, Engineers from Data Platform talk about how a managed Streaming SQL using Apache Flink can help unlock new Stream Processing use cases at Netflix. You can read more about Data Mesh, Netflix's next generation stream processing platform, here: netflixtechblog.com/data-mesh-a-data-movement-and-processing-platform-netflix-1288bcab2873 #netflix #datascience #data...
Netflix Data Engineering Tech Talks - Data Processing Patterns
13K views · 1 year ago
Lee Woodridge and Pallavi Phadnis, Data Engineers at Netflix, talk about how you can apply different processing strategies for your batch pipelines by implementing generic abstractions to help scale, be more efficient, handle late-arriving data, and be more fault tolerant. #netflix #datascience #dataengineering #etl #bigdata
Netflix Data Engineering Tech Talks - The Netflix Data Engineering Stack
36K views · 1 year ago
Welcome to the world of Data Engineers at Netflix
25K views · 3 years ago
2021 Apache Flink Meetup - Hosted by Netflix
6K views · 3 years ago
Flink Meetup at Netflix (Los Gatos) - January 28, 2020
1.1K views · 4 years ago
Apache Cassandra Meetup Hosted by Netflix
2.3K views · 5 years ago
Netflix Meetup - #SheRules in Big Data
1.3K views · 5 years ago
Women in Big Data Meetup - Hosted by Netflix (Chicago)
748 views · 6 years ago
Netflix hosts the Women in Big Data Organization in Los Gatos and Chicago
2K views · 6 years ago
Druid Meetup hosted by Netflix (11/14/2018)
4.1K views · 6 years ago
#Sherules with Machine Learning
1.2K views · 6 years ago
Netflix Research: Machine Learning Platform
3.8K views · 6 years ago
Netflix Research: Analytics
7K views · 6 years ago
Netflix Research: Experimentation & Causal Inference
8K views · 6 years ago
Netflix Research: Machine Learning
6K views · 6 years ago
Netflix Research: Recommendations
5K views · 6 years ago

Comments

  • @CoconutPete
    @CoconutPete 19 days ago

    Netflix hires H1B slave labor over Americans it seems

  • @StephenvanWijk
    @StephenvanWijk 1 month ago

    Qeeries? Queries.

  • @janesmith4833
    @janesmith4833 1 month ago

    Brilliant presentation.

  • @shuhuaxu4816
    @shuhuaxu4816 2 months ago

    stop acting like a clown, get to the point and make an efficient presentation.

  • @debpriyaseal3538
    @debpriyaseal3538 3 months ago

    I have a stupid question. If we write the data by processing time and not landing time. Then wouldn't it automatically get written in the right partition irrespective of which hour it got processed. Esp. in case of stateless eg: signup events pipeline.

  • @RaviMenu
    @RaviMenu 3 months ago

    I'm curious: instead of the handshake mechanism for finding the last updated time for a given profile, why not use a Redis cache for the last updatedTime?

  • @rembautimes8808
    @rembautimes8808 4 months ago

    Very good talk . Always useful to hear these talks from practitioners.

  • @rembautimes8808
    @rembautimes8808 4 months ago

    Interesting idea to adopt design patterns for data engineering

  • @rembautimes8808
    @rembautimes8808 4 months ago

    Quite interesting that a survey was performed so that the points were grounded by data

  • @harshasaitammineni8150
    @harshasaitammineni8150 5 months ago

    How can we connect with Betty Li? Any LinkedIn please?

  •  6 months ago

    Can you enable captioning of the videos?

  • @reachkarthikt
    @reachkarthikt 6 months ago

    What's the point of posting this without proper capture?

  • @TheImmaculate84
    @TheImmaculate84 6 months ago

    This is really cool. Great high level summary of what there is to know about modelling and AI. Thanks Jide

  • @Babulal32218
    @Babulal32218 6 months ago

    Thanks for sharing the internals at such a detailed level. QQ: per the initial design, it was mentioned that for a quick response the request goes to a key-value data store, and then towards the end it was mentioned that those requests are served by Cassandra. Also, nowhere in the actual design does the request go to the impression table as shown in the initial design.

  • @AlohaEru
    @AlohaEru 6 months ago

    Buffet of choices also confusing and fattening. There is beauty in a few lean options. Who likes a complex menu of this or that?

  • @AntonBryzgalov
    @AntonBryzgalov 6 months ago

    Too few details on how the model was actually trained. One of the slides says OpenAI -> Weaviate. How does it actually happen? Weaviate is just a database after all: how the queries towards it are built? A blogpost with details will be highly appreciated. The idea is great but some additional technical details have to be disclosed.

  • @jonassteinberg3779
    @jonassteinberg3779 6 months ago

    Exceedingly high level

  • @djangoworldwide7925
    @djangoworldwide7925 6 months ago

    I'd expect Netflix's channel to upload better resolution and size of the slides. :/

  • @jonassteinberg3779
    @jonassteinberg3779 6 months ago

    "my ducatti is current broken again" lollll

  • @VikrantVerma22
    @VikrantVerma22 6 months ago

    Can you share the paper/blog link please? thx.

  • @ak8376
    @ak8376 6 months ago

    Very detailed talk, extremely informative.. Great work Tulika!

  • @josenavio6445
    @josenavio6445 6 months ago

    amazing

  • @cstephens16
    @cstephens16 6 months ago

    awesome presentation and perfect timing for me. i have to give a presentation in a few days explaining all this new composability in data/database word and why developers should care about it.

  • @NeeruGautam-fp2ej
    @NeeruGautam-fp2ej 6 months ago

    Good job keep it up 👍

  • @madhuriroy4005
    @madhuriroy4005 6 months ago

    Veri nice 👍

  • @theukulelegod
    @theukulelegod 6 months ago

    Ahhh I wish we had the slides in this one 😢

    • @iaroslavzeigerman9876
      @iaroslavzeigerman9876 6 months ago

      The slides kick in around the 8th minute. So the viewers miss out on some memes, but the core parts of the talk are still there 😂

  • @bcroy8924
    @bcroy8924 6 months ago

    Very nice presentation. Keep it up.

  • @ishakaushal1390
    @ishakaushal1390 6 months ago

    Very informative, keep up 👍

  • @sangeetaprasad1879
    @sangeetaprasad1879 6 months ago

    Great

  • @suhaniahuja7631
    @suhaniahuja7631 6 months ago

    Great ! ❤

  • @musicalPartner
    @musicalPartner 6 months ago

    Great! Informative 👍👍

  • @labsanta
    @labsanta 6 months ago

    The Struggle of Enterprise Data Modeling: A Data Architect's Journey

    [03:27](ruclips.net/video/DtzIIVJq8wA/видео.html) Transitioning from data architect to generative AI expert
    - Discussing journey as a data professional over 16 years, focusing on data modeling and architecture roles at various companies
    - Detailing challenges faced as a data architect in managing data schemas, infrastructure, and collaborating with developers on data placement

    [06:54](ruclips.net/video/DtzIIVJq8wA/видео.html) Challenges with disparate data and maintaining consistency in large organizations
    - Data was duplicated and scattered across different teams, leading to difficulties in answering questions.
    - Complex processes of pulling and joining data from disparate systems and writing code for data consistency and unification.

    [10:21](ruclips.net/video/DtzIIVJq8wA/видео.html) Automating data discovery, mapping, and integrations for a unified and accessible data view
    - The AI agent automates data mapping, integrations across multiple organizations, and discovers data and relationships.
    - It also interprets metadata, infers data types and constraints, builds an ontological model, and continuously updates the model.

    [13:48](ruclips.net/video/DtzIIVJq8wA/видео.html) Automating data architecture through generative AI
    - Data modeling involves logical and physical perspectives including entity attributes, relationships, inventory, structure definition, and data population.
    - Data collection sources range from Postgres, S3, Data Lake, operational systems like Salesforce and Zendesk, involving querying, schema inference, and reverse engineering SQL code.

    [17:15](ruclips.net/video/DtzIIVJq8wA/видео.html) Using generative AI to enhance data modeling and querying
    - The process involved building a data ontology and pushing it into a vector database, specifically Weaviate, to enable querying and building multiple levels of relationships
    - The aim was to provide a user-friendly experience by enabling free text search without the need to build a separate model

    [20:42](ruclips.net/video/DtzIIVJq8wA/видео.html) Generative AI interprets queries for quick data access
    - Generative AI interprets user queries accurately
    - Feedback loop ensures data accuracy and user satisfaction

    [24:09](ruclips.net/video/DtzIIVJq8wA/видео.html) Automated model updates and data tracking for improved decision-making
    - Ensures agents can learn and adapt by monitoring ontology and updating models with new data sources.
    - Removes the need for explicit database specifications, enabling intuitive free text search for better decision-making.

    [27:35](ruclips.net/video/DtzIIVJq8wA/видео.html) Automating Data Architect with Generative AI
    - Implemented Snowflake data warehouse for executives to improve data queries and comparisons
    - Considering enhancing system with knowledge graph and OpenAI integration for better results

  • @autkarsh8830
    @autkarsh8830 6 months ago

    Quite elaborate insights into impressions 🎉

  • @tanushreebhatt6779
    @tanushreebhatt6779 6 months ago

    Informative, good job!

  • @agammishra9674
    @agammishra9674 8 months ago

    Great content, learnt a lot... I wanted to know (any viewer can answer as well if they got the answer) how they ensured that their SQS has no duplicates? Also, if batches are 10 mins apart, can't we use an HWM table in OLTP systems to ensure we are ACID compliant?

  • @Dom-zy1qy
    @Dom-zy1qy 8 months ago

    Hello netflix, i would like to be hired by you guys. I am a slightly below average software engineer. Maybe you guys could let me sweep the floors or something? I can be a FAANG janitor! Would just ask for maybe like $11 an hour plus a salad from the cafeteria maybe. I look forward to hearing back from you guys.

  • @grawss
    @grawss 9 months ago

    Wtf is this guy wearing?

  • @mahesh26sai
    @mahesh26sai 9 months ago

    Thanks for sharing this to public!

  • @joswinpinto360
    @joswinpinto360 9 months ago

    Gonna be in the team soon!!

  • @aditya_pawar
    @aditya_pawar 11 months ago

    Wow, Must see video for every reliable data engineer!

  • @aditya_pawar
    @aditya_pawar 11 months ago

    What is High play starts in the example for Context specific Audits @11:30

  • @svdfxd
    @svdfxd 11 months ago

    How I wish

  • @ed7470
    @ed7470 11 months ago

    Intro dope afff

  • @iirdna
    @iirdna 11 months ago

    How are you avoiding too high a tide of changes? Meaning: does any late-arriving data trigger Psyberg, even just a few thousand rows? Or are you accumulating changes at some sort of gates/elevators and processing once enough late data has accumulated to justify downstream reprocessing?

  • @elricofr
    @elricofr 11 months ago

    Thanks for sharing. For the comparison between Extractor pattern and DRY principle, it stands but it's not exactly the same driver: DRY principle in programming is to avoid to replicate the same logic - as code - multiple times (to avoid repetitions and incoherences). And this logic can be applied multiple times during the run. Here, the goal is to avoid repetitions for the run itself.

  • @mayjoec
    @mayjoec 11 months ago

    Is there any way you can enable transcript on the youtube video

  • @SergioSicre
    @SergioSicre 11 months ago

    will they open source Maestro like Airbnb/Airflow??

  • @vipinahuja2996
    @vipinahuja2996 11 months ago

    very smart, using new Acronyms for old Audit tables.

  • @TamilSelvanSS
    @TamilSelvanSS 1 year ago

    Straight up talk 👏

  • @Mario-yd3ht
    @Mario-yd3ht 1 year ago

    Where can I download this slide?