Thanks for the video! I've tested the entire workflow, but I'm encountering an issue with the section on creating a DynamicFrame from the target Redshift table in the AWS Glue Data Catalog and displaying its schema. While I can see the updated schema reflected in the Glue catalog table, the code you provided still prints the old schema.
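For reference, the step in question in my setup looks roughly like this (database, table, and bucket names are placeholders):

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the target Redshift table through the Glue Data Catalog and print
# the schema the DynamicFrame actually resolves.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",                   # placeholder catalog database
    table_name="public_target",               # placeholder catalog table
    redshift_tmp_dir="s3://my-bucket/temp/",  # S3 dir Glue uses to unload from Redshift
)
dyf.printSchema()
```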
Thanks for sharing your solution 😍😍
you're welcome!
Amazing tutorials! Truly haven't seen such drilled-down stuff in a while. Is there a native PySpark course, perhaps, in the making? :) I'm certain it would be much appreciated by many if such a course existed on this channel.
Thank you for your kind words! I have been slowly adding PySpark-related content, but I don't have a full course in the making. I wish I had more time!
Super video! Can you tell me how to incorporate the same thing if we want to process multiple tables at a time?
Great explanation 😀
Thanks Marium!
Good explanation with regards to staging.
Thanks Ashish!
Can you do a simple video on S3 to Oracle data migration?
Please make more such diverse videos with what-if scenarios.
Hi Vivek, can you provide me some examples of what you are thinking?
I have a file; when I do an upsert with the same file, the data keeps increasing. What should I do?
It sounds like you might have defined the primary key incorrectly. As a result, it is treating each row as a new record and inserting instead of updating.
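For context, the merge that runs in the postactions usually looks something like the sketch below; the table and column names are placeholders. If the column in the DELETE isn't the true primary key, nothing matches, so every run just re-inserts the same rows:

```python
# Placeholder tables "target"/"staging" and key column "id". If "id" is not
# the real primary key, the DELETE matches nothing and every run re-inserts
# the same rows, so the row count keeps growing.
post_actions = """
    BEGIN;
    DELETE FROM target USING staging WHERE target.id = staging.id;
    INSERT INTO target SELECT * FROM staging;
    DROP TABLE staging;
    END;
"""
```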
Hi Adriano, first of all thanks for the amazing series of tutorials. They are really clear and detailed.
I am trying to implement the UPSERT into Redshift using AWS Glue, but I am getting what seems to be an odd problem.
If I run my glue script from the notebook (it is actually a copy-paste from your notebook, with minor adaptations to make it work with my data and setup), when writing to Redshift the "preactions" and "postactions" are ignored, meaning that I end up with just a `staging` table that never gets deleted and to which data are simply appended. And no `target` table is ever created.
Have you ever had such a problem? I could not find any solution online, and I do not understand why your code would work for you and not in my case.
Thanks again!
Ciao Alberto, thanks!
Hmm, I think I might have had this happen to me before. Can you check to make sure you haven't misspelt any of the parameters? I think if there is an error in one of them, it would just ignore the preactions.
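For reference, here's a rough sketch of what the write should look like with the option keys spelled correctly; the connection, table, and bucket names are placeholders, and the merge SQL is just an illustrative pattern rather than the exact code from the video:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Illustrative merge SQL with placeholder table names and key column.
pre_actions = "DROP TABLE IF EXISTS staging;"
post_actions = (
    "BEGIN;"
    "DELETE FROM target USING staging WHERE target.id = staging.id;"
    "INSERT INTO target SELECT * FROM staging;"
    "DROP TABLE staging;"
    "END;"
)

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,  # assumes the DynamicFrame to load is already defined
    catalog_connection="redshift-connection",  # placeholder Glue connection name
    connection_options={
        "database": "dev",            # placeholder Redshift database
        "dbtable": "staging",         # rows land here; postactions merge them
        "preactions": pre_actions,    # note the exact lowercase keys --
        "postactions": post_actions,  # "preactions" / "postactions"
    },
    redshift_tmp_dir="s3://my-bucket/temp/",  # placeholder S3 staging dir
)
```

If one of those keys is misspelled, Glue treats it as an unknown option and skips it instead of raising an error, which would explain the behaviour you're seeing.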
Ciao Adriano (@DataEngUncomplicated)!
Thanks a lot for your reply. I also thought that might be the case, but it does not seem like it is.
I really did copy & paste your code. Moreover, the same thing happens with code generated by the Visual Editor, which I assume has the correct syntax.
I was wondering whether it could be related to the permissions of the role that is used to run the script, but I do not see why it would be allowed to write data to the table yet not run the SQL preactions ...
In the meantime, I really enjoyed your other video about local development; it really helps keep dev costs down and significantly speeds up the development cycle.
Did you check to make sure your user in the database has permissions to create and drop a table? Maybe your user only has read/write access?
I followed the same steps, but it's throwing an error like: the id column is an integer and what you are trying to query is character varying.
Hey, it sounds like you might have mixed data types in your column. You perhaps think it's an int, but there are actually some strings in there.
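If you want to confirm, a quick check along these lines should surface the offending rows; it assumes your DynamicFrame is called `dyf` and the column is `id`, both placeholders:

```python
from pyspark.sql import functions as F

# Assumes the DynamicFrame is called dyf and the column is "id" -- both
# placeholders. Rows whose value fails the int cast come back as NULL.
df = dyf.toDF()
bad_rows = df.filter(F.col("id").cast("int").isNull() & F.col("id").isNotNull())
bad_rows.show(truncate=False)
```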
Thanks for the video, Does this work for RDS oracle ?
I'm not sure. I haven't worked with RDS Oracle, but in theory it should.
@DataEngUncomplicated Thanks for the response👍
Hi, where are you running the code?
Hi, I'm running my code locally using an interactive Glue session.
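If it helps, the first cell of one of these interactive-session notebooks typically just sets a few session magics before any code runs; the values below are only examples:

```python
# Example first cell of a Glue interactive-session notebook: these magics
# configure the remote session before any code runs (values are examples).
%idle_timeout 30
%glue_version 4.0
%worker_type G.1X
%number_of_workers 2
```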
I am trying my hand at running the code locally. Can you create a video on how to run Glue jobs locally (notebook version), covering the setup and configuration?
I actually have many videos on this; for example, see this one: ruclips.net/video/__j-SyopVBs/видео.html. You can set up Docker to run Glue locally, or use interactive sessions, but interactive sessions will cost compute in AWS since you are just connecting to a cluster remotely. Either way, you can use a Jupyter notebook for this.
Yes, I have seen that video and had the same impression about the cost.
I am trying to set up a local environment where I can use local Spark (AWS Glue) with, say, a Jupyter notebook, and also connect to S3 and other services from there.
Do you recommend another way to work locally? And how can this setup be done? I have been trying for a long time without success.