very informative, please make more projects on streaming
For me, cleanSource and sourceArchiveDir are not working: files are not getting archived and the archive folder is not getting created. What could be the issue?
Please check this link to verify you are setting all the parameters as required - spark.apache.org/docs/latest/structured-streaming-programming-guide.html
Hi, in my example I had to set a schema for the streaming input file. I figured it out, but I'm wondering if it was a mistake on my part, or if your environment configuration was different and allows streaming without setting a schema?
We specify a schema for streaming data to make sure the events are not malformed. But if you still want to infer the schema at runtime, you can set spark.sql.streaming.schemaInference to true.
@easewithdata Oh sorry, I watched the video again and now I see your comment about schemaInference. Anyway, thanks for the reply, and keep going, because you are doing a good job!
static_df = spark.read.json("/home/jovyan/spark-streaming/data/input/device_files/")
inferred_schema = static_df.schema

# Print the inferred schema
static_df.printSchema()

# Note: the config key is case-sensitive - "schemaInference", not "SchemaInference"
spark.conf.set("spark.sql.streaming.schemaInference", True)

streaming_df = (
    spark
    .readStream
    .schema(inferred_schema)
    .option("cleanSource", "archive")
    .option("sourceArchiveDir", "archive_der")
    .option("maxFilesPerTrigger", 1)
    .format("json")
    .load("/home/jovyan/spark-streaming/data/input/device_files/")
)
Thanks very much for the clip, very helpful, but I have two questions: my Jupyter notebook didn't show the left panel with the directory, and the write stream appeared to take forever, even though it wrote to the CSV file. How can I solve this, please?
Thanks ❤️ If you like my content, please make sure to share the same with your LinkedIn network 🛜
For the write stream taking forever, can you share the code?
cleanSource and sourceArchiveDir are not working: files are not archived from the input folder - they just stay there - and no archive folder is created when running the same code, though the streaming itself works perfectly fine. What could be the possible reasons for this?
For the content: really helpful, to the point, with an actual use case. Thanks for putting up such informative content!
If possible, can you paste your code here?