State Schema Evolution in PySpark using applyInPandasWithState - 2024.01.25

  • Published: Oct 14, 2024
  • In this video, Craig Lukasik, a Senior Specialist Solutions Architect at Databricks, covers state schema evolution in streaming. Delta Lake handles schema evolution for tables, but what if the state used by a stateful Structured Streaming query needs to evolve? This video explains the nuances of schemas in stateful Structured Streaming and presents a strategy for evolving the state schema. The focus is on PySpark and the applyInPandasWithState operator, which lets users run complex per-group logic in pandas while preserving state across micro-batches. This is invaluable when handling multiple records for the same key, including records from different streams. The video also walks through a detailed demo covering data generation, building pipelines with the medallion architecture, and using applyInPandasWithState. Craig shares plenty of tips along the way, so be sure to watch the video in its entirety!
    Target audience: PySpark Data Engineers
    ►[Documentation] Learn more about applyInPandas here - spark.apache.o...
    ►[Github] applyInPandas example code - github.com/cra...
    ►[Slides] Slides from the video - drive.google.c...
    ►[Documentation] Optimize stateful Structured Streaming queries - docs.databrick...
    ►[Blog] Python Arbitrary Stateful Processing in Structured Streaming - www.databricks...
    ►[Documentation] Performance Improvements for Stateful Pipelines in Apache Spark Structured Streaming - www.databricks...
    ►[Community feed] Scaling Pandas with Databricks - community.data...
    ►[Product] Learn more about Databricks here - www.databricks...
    ►Learn/connect with the speaker here - / clukasik
    ►Discover more about Databricks in the Skill Builder Series here - • Skill Builder for Data...
    #databricks #dataengineering #streaming #structuredstreaming #applyinpandas #statefulstreaming #data #lakehouse #medallionarchitecture #dataengineer #spark #delta
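
The description does not spell out the state schema evolution strategy itself. One common pattern, sketched below as an assumption and not necessarily the approach shown in the video, is to keep the state schema that Spark sees fixed (here a hypothetical "version int, payload string" pair) and let the evolving fields live inside a versioned JSON payload that the Python code upgrades when it reads old state. FakeGroupState is a hypothetical stand-in that mimics the exists/get/update members of PySpark's GroupState so the logic runs without a Spark session; CURRENT_VERSION, migrate, update_state, and the count/total fields are illustrative names only.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Version of the logical state layout stored inside the JSON payload.
# Hypothetical: the Spark-visible state schema stays "version int, payload string",
# so only the payload's contents change between releases of the job.
CURRENT_VERSION = 2


def migrate(version: int, data: Dict[str, Any]) -> Dict[str, Any]:
    """Upgrade a decoded payload, one version step at a time, to CURRENT_VERSION."""
    if version == 1:
        # Hypothetical v1 -> v2 change: v1 stored only a count; v2 adds a running total.
        data = {**data, "total": 0.0}
        version = 2
    if version != CURRENT_VERSION:
        raise ValueError(f"unknown state version: {version}")
    return data


class FakeGroupState:
    """Hypothetical stand-in mimicking the exists/get/update members of GroupState."""

    def __init__(self, value: Optional[Tuple[int, str]] = None) -> None:
        self._value = value

    @property
    def exists(self) -> bool:
        return self._value is not None

    @property
    def get(self) -> Tuple[int, str]:
        if self._value is None:
            raise ValueError("state does not exist")
        return self._value

    def update(self, value: Tuple[int, str]) -> None:
        self._value = value


def update_state(state: Any, amounts: List[float]) -> Dict[str, Any]:
    """Fold new amounts into the per-key state, migrating older payloads first."""
    if state.exists:
        version, payload = state.get
        data = migrate(version, json.loads(payload))
    else:
        data = {"count": 0, "total": 0.0}
    data["count"] += len(amounts)
    data["total"] += float(sum(amounts))
    # Always write back in the current layout, so old payloads are upgraded on first touch.
    state.update((CURRENT_VERSION, json.dumps(data)))
    return data


# Usage: state written by an older release of the job (v1 held only a count).
old = FakeGroupState((1, json.dumps({"count": 3})))
print(update_state(old, [10.0, 5.0]))  # {'count': 5, 'total': 15.0}
print(old.get[0])  # 2
```

In a real job, logic like update_state would sit inside the function passed to applyInPandasWithState, with stateStructType set to the fixed version/payload schema. Because Spark checks the state schema against what is recorded in the query's checkpoint, changing stateStructType directly between restarts can fail; keeping that schema stable and versioning the payload sidesteps the check, at the cost of serializing state as JSON.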
