07 Spark Streaming Read from Files | Flatten JSON data

  • Published: Dec 1, 2024

Comments • 12

  • @gautamKumar-dg3ss 7 months ago +2

    very informative, please make more projects on streaming

  • @revathinp4551 5 months ago +1

    For me clearSource and sourceArchive are not working. Files are not getting archived and the archive folder is not getting created. What could be the issue?

    • @easewithdata 5 months ago

      Please check this link to confirm you are setting all parameters as required: spark.apache.org/docs/latest/structured-streaming-programming-guide.html
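
A minimal sketch related to the question above. The Structured Streaming guide documents the file source options as `cleanSource` (values `archive`, `delete`, `off`) and `sourceArchiveDir`, and `archive` mode needs `sourceArchiveDir` as well. As far as I know, a misspelled option key is ignored rather than rejected, so a typo can look like archiving silently doing nothing. The helper below is a hypothetical pre-flight check, not part of Spark; its name and the partial option list are assumptions made for illustration.

```python
import difflib

# Partial list of generic file-source options (assumption: extend as needed).
KNOWN_FILE_SOURCE_OPTIONS = [
    "cleanSource",
    "sourceArchiveDir",
    "maxFilesPerTrigger",
    "latestFirst",
    "fileNameOnly",
    "maxFileAge",
]
VALID_CLEAN_SOURCE_VALUES = {"archive", "delete", "off"}

# Option keys are compared case-insensitively, so index them in lower case.
CANONICAL = {name.lower(): name for name in KNOWN_FILE_SOURCE_OPTIONS}


def check_file_source_options(options):
    """Return a list of likely problems in a dict of reader options."""
    problems = []
    lowered = {str(k).lower(): v for k, v in options.items()}

    # Flag keys that are not known but closely resemble a known option.
    for key in options:
        low = str(key).lower()
        if low in CANONICAL:
            continue
        guess = difflib.get_close_matches(low, list(CANONICAL), n=1)
        if guess:
            problems.append(
                f"option '{key}' looks like a misspelling of '{CANONICAL[guess[0]]}'"
            )

    # Check the cleanSource value and its required companion option.
    mode = lowered.get("cleansource")
    if mode is not None:
        mode = str(mode).lower()
        if mode not in VALID_CLEAN_SOURCE_VALUES:
            problems.append(f"cleanSource has unsupported value '{mode}'")
        elif mode == "archive" and "sourcearchivedir" not in lowered:
            problems.append("cleanSource=archive requires sourceArchiveDir")
    return problems
```

Calling `check_file_source_options({"clearSource": "archive", "sourceArchive": "archive_dir"})` flags both keys as likely misspellings, while a dict with `cleanSource`, `sourceArchiveDir` and `maxFilesPerTrigger` passes with no problems reported. Even with correct options, the guide describes archiving and deleting as best effort, so a failed move does not fail the query.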

  • @kamilstolarz7017 7 months ago

    Hi, in my example I had to set a schema for the streaming input file. I figured it out, but I'm wondering whether that was a mistake on my part, or whether your environment configuration was different and allows streaming without setting a schema?

    • @easewithdata 7 months ago

      We specify a schema for streaming data to make sure the events are not malformed. If you still want to infer the schema at run time, you can set spark.sql.streaming.schemaInference to true.

    • @kamilstolarz7017 7 months ago +1

      @easewithdata Oh sorry, I watched the video again and now I see your comment about schemaInference. Anyway, thanks for the reply and keep going because you are doing a good job!

    • @jayantmeshram7370 3 months ago +1

      # Infer the schema once from a static (batch) read of the same files
      static_df = spark.read.json("/home/jovyan/spark-streaming/data/input/device_files/")
      inferred_schema = static_df.schema
      # Print the inferred schema
      static_df.printSchema()
      # Alternative: let the stream infer its schema (not needed when .schema() is set below)
      spark.conf.set("spark.sql.streaming.schemaInference", True)
      streaming_df = (
          spark
          .readStream
          .schema(inferred_schema)
          .option("cleanSource", "archive")
          .option("sourceArchiveDir", "archive_der")
          .option("maxFilesPerTrigger", 1)
          .format("json")
          .load("/home/jovyan/spark-streaming/data/input/device_files/")
      )
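
The replies in this thread describe two ways to give a file stream its schema: pass one explicitly, or set `spark.sql.streaming.schemaInference` to true. A minimal sketch of both paths follows, assuming an existing SparkSession is passed in as `spark`; the helper name `read_json_stream` is hypothetical and not taken from the video.

```python
def read_json_stream(spark, path, schema=None):
    """Build a streaming DataFrame over JSON files in `path`.

    spark  : an existing SparkSession (created elsewhere)
    path   : input directory to monitor for new files
    schema : a StructType or DDL string; if None, rely on schema inference
    """
    reader = spark.readStream.format("json")
    if schema is None:
        # Without an explicit schema, inference must be enabled before load().
        spark.conf.set("spark.sql.streaming.schemaInference", "true")
    else:
        # An explicit schema keeps the stream's columns fixed and predictable.
        reader = reader.schema(schema)
    return reader.load(path)
```

With the names used in the comment above, `read_json_stream(spark, "/home/jovyan/spark-streaming/data/input/device_files/", schema=inferred_schema)` takes the explicit path, and omitting `schema` takes the inference path.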

  • @user-eg1ss7im6q 6 months ago

    Thanks very much for the clip, very helpful, but I have two questions. My Jupyter notebook didn't show the left panel with the directory, and the write stream appeared to take forever, even though it wrote to the CSV file. How can I solve this, please?

    • @easewithdata 5 months ago

      Thanks ❤️ If you like my content, please make sure to share it with your LinkedIn network 🛜
      As for the write stream taking forever, can you share the code?

  • @vishalalagh1031 6 months ago

    clearSource and sourceArchiveDir are not working. Files are not archived from the input folder and still sit there, and no archive folder is created when running the same code, though the streaming itself works perfectly fine. What could be the possible reasons for this?
    As for the content: really helpful and to the point, with an actual use case. Thanks for putting up such informative content.

    • @easewithdata 6 months ago

      If possible, can you paste your code here?