AWS Glue PySpark: Upserting Records into a Redshift Table

  • Published: 6 Jan 2025

Comments • 29

  • @asfakmp7244
    @asfakmp7244 7 months ago

    Thanks for the video! I've tested the entire workflow, but I'm encountering an issue with the section on creating a DynamicFrame from the target Redshift table in the AWS Glue Data Catalog and displaying its schema. While I can see the updated schema reflected in the Glue catalog table, the code you provided still prints the old schema.

  • @tuankyou9158
    @tuankyou9158 1 year ago

    Thanks for sharing your solution 😍😍

  • @critical11creator
    @critical11creator 1 year ago +1

    Amazing tutorials! Truly haven't seen such drilled-down stuff in a while. Is there a native PySpark course, perhaps, in the making? :) I'm certain it would be much appreciated by many if such a course existed on this channel.

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Thank you for your kind words! I have been slowly adding PySpark-related content, but I don't have a full course in the making; I wish I had more time!

  • @sudhanshushekhar-gj4os
    @sudhanshushekhar-gj4os 2 months ago

    Super video! Can you tell me how to do the same thing if we want to process multiple tables at a time?

  • @mariumbegum7325
    @mariumbegum7325 1 year ago

    Great explanation 😀

  • @ashishsinha5338
    @ashishsinha5338 1 year ago

    Good explanation with regards to staging.

  • @muralikrishnavattikunta8466
    @muralikrishnavattikunta8466 5 months ago

    Can you do a simple video on S3 to Oracle data migration?

  • @vivek2319
    @vivek2319 1 year ago

    Please make more such diverse videos with what-if scenarios..

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Hi Vivek, can you provide me some examples of what you are thinking?

  • @JotibaAvatade-z8r
    @JotibaAvatade-z8r 16 days ago

    I have a file; when I do an upsert with the same file, the data keeps increasing. What should I do?

    • @DataEngUncomplicated
      @DataEngUncomplicated  16 days ago

      It sounds like you might have defined the primary key incorrectly. As a result, it is treating each row as a new record and inserting instead of updating.
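The effect described above is easy to reproduce with a toy model of a staged upsert: delete target rows whose key appears in the incoming batch, then insert the batch. This is a plain-Python sketch, not the video's Glue code, and the column names are hypothetical; it just shows why picking the wrong key column turns an upsert into an append.

```python
# Toy model of a staged upsert: delete target rows whose key matches the
# incoming batch, then insert the batch. Names are hypothetical.

def upsert(target, batch, key):
    """target/batch: lists of dict rows; key: the column used for matching."""
    batch_keys = {row[key] for row in batch}
    # Equivalent of: DELETE FROM target WHERE key IN (staging keys)
    kept = [row for row in target if row[key] not in batch_keys]
    # Equivalent of: INSERT INTO target SELECT * FROM staging
    return kept + batch

rows = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]

# Correct key: re-running the same batch leaves the row count unchanged.
t = upsert([], rows, key="id")
t = upsert(t, rows, key="id")
print(len(t))  # 2

# Wrong "key" (a column whose values change between runs): the delete
# matches nothing, so every run appends a fresh copy of the batch.
rows_run2 = [{"id": 1, "val": "a2"}, {"id": 2, "val": "b2"}]
t_bad = upsert([], rows, key="val")
t_bad = upsert(t_bad, rows_run2, key="val")
print(len(t_bad))  # 4: "a2"/"b2" match nothing in the target
```

With a true primary key the second run updates in place; with any column that does not uniquely and stably identify a row, the data keeps growing, which matches the symptom in the comment above.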

  • @datagufo
    @datagufo 1 year ago

    Hi Adriano, first of all thanks for the amazing series of tutorials. They are really clear and detailed.
    I am trying to implement the UPSERT into Redshift using AWS Glue, but I am getting what seems to be an odd problem.
    If I run my glue script from the notebook (it is actually a copy-paste from your notebook, with minor adaptations to make it work with my data and setup), when writing to Redshift the "preactions" and "postactions" are ignored, meaning that I end up with just a `staging` table that never gets deleted and to which data are simply appended. And no `target` table is ever created.
    Have you ever had such a problem? I could not find any solution online, and I do not understand why your code would work for you but not in my case.
    Thanks again!

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Ciao Alberto, thanks!
      Hmm, I think I might have had this happen to me before. Can you check to make sure you haven't misspelled any of the parameters? I think if there is an error, it will silently ignore the preactions.
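For reference, the upsert pattern under discussion wires the merge SQL into the "preactions" and "postactions" connection options of Glue's Redshift write. A minimal sketch of how those strings are usually assembled follows; the table, schema, and key names are hypothetical, and the actual write call is shown commented out because it needs a live Glue session. A typo in any of the option names, or a SQL error in either string, can leave data stranded in the staging table as described above.

```python
# Hypothetical table/key names. The SQL strings are handed to Glue's
# Redshift writer via the "preactions"/"postactions" connection options.

target = "public.customers"
staging = "public.customers_staging"
key = "customer_id"

# Runs before the DynamicFrame is written (the write itself lands in staging).
preactions = (
    f"DROP TABLE IF EXISTS {staging}; "
    f"CREATE TABLE {staging} (LIKE {target});"
)

# Runs after the write: delete matching rows, copy staging in, clean up.
postactions = (
    f"BEGIN; "
    f"DELETE FROM {target} USING {staging} "
    f"WHERE {target}.{key} = {staging}.{key}; "
    f"INSERT INTO {target} SELECT * FROM {staging}; "
    f"DROP TABLE {staging}; "
    f"END;"
)

connection_options = {
    "dbtable": staging,   # the frame is written to the staging table
    "database": "dev",
    "preactions": preactions,
    "postactions": postactions,
}

# In a real job (requires a Glue session; connection name is hypothetical):
# glueContext.write_dynamic_frame.from_jdbc_conf(
#     frame=dyf,
#     catalog_connection="redshift-conn",
#     connection_options=connection_options,
#     redshift_tmp_dir=args["TempDir"],
# )
print(connection_options["postactions"])
```

If the postactions never run, only the staging write happens, which is exactly the "data is simply appended to staging and no target table is ever created" symptom.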

    • @datagufo
      @datagufo 1 year ago

      Ciao Adriano (@@DataEngUncomplicated)!
      Thanks a lot for your reply. I also thought that might be the case, but it does not seem like it is.
      I really tried to copy and paste your code. Moreover, it also happens with code generated by the Visual Editor, which I assume has the correct syntax.
      I was wondering whether it could be related to the permissions of the role that is used to run the script, but I do not see why it would be allowed to write data to the table but not to run the SQL preaction ...
      In the meantime, I really enjoyed your other video about local development; it really helps to keep dev costs down and to significantly speed up the development cycle.

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Did you check to make sure your user in the database has permissions to create and drop a table? Maybe your user only has read/write access?

  • @mohammadfatha7740
    @mohammadfatha7740 1 year ago

    I followed the same steps, but it's throwing an error: the id column is an integer, but the value being compared is character varying.

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Hey, it sounds like you might have mixed data types in your column. You perhaps think it's an int, but there are actually some strings in there.
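A quick way to confirm this is to scan the column for values that won't parse as integers. This is a plain-Python sketch over a hypothetical sample of the column, not the job's code:

```python
# Hypothetical sample of an "id" column that is supposed to be integer.
ids = ["101", "102", "abc", "103", ""]

def non_integers(values):
    """Return the values that would fail an integer cast/comparison."""
    bad = []
    for v in values:
        try:
            int(v)
        except (TypeError, ValueError):
            bad.append(v)
    return bad

print(non_integers(ids))  # ['abc', '']
```

In PySpark the equivalent check is `df.filter(col("id").cast("int").isNull())`, since casting a non-numeric string to int yields null; that surfaces the offending rows directly in the DataFrame.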

  • @rambandi4330
    @rambandi4330 1 year ago

    Thanks for the video. Does this work for RDS Oracle?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      I'm not sure. I haven't worked with RDS Oracle, but in theory it should.

    • @rambandi4330
      @rambandi4330 1 year ago

      @@DataEngUncomplicated Thanks for the response👍

  • @NasimaKhatun-jb7qo
    @NasimaKhatun-jb7qo 10 months ago

    Hi, where are you running the code?

    • @DataEngUncomplicated
      @DataEngUncomplicated  10 months ago

      Hi, I'm running my code locally using an interactive Glue session.

    • @NasimaKhatun-jb7qo
      @NasimaKhatun-jb7qo 10 months ago

      I am trying my hand at running the code locally. Can you create a video on how to run Glue jobs locally (notebook version), covering setup and configuration?

    • @DataEngUncomplicated
      @DataEngUncomplicated  10 months ago

      I actually have many videos on this; for example, see this one: ruclips.net/video/__j-SyopVBs/видео.html. You can set up Docker to run Glue locally, or use interactive sessions, but those cost compute in AWS since you are just connecting to the cluster remotely. Either way, you can use a Jupyter notebook.
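For anyone else setting this up, the Docker route uses the public Glue image that AWS publishes. This is a sketch based on the AWS Glue local-development guide; verify the current image tag for your Glue version before relying on it, and note that the workspace mount path is just one common choice:

```shell
# Launch a local Glue 4.0 Jupyter environment from AWS's published image.
# Tag and script path follow the AWS local-development docs; confirm the
# current tag before use.
docker run -it --rm \
  -v ~/.aws:/home/glue_user/.aws \
  -v "$(pwd)":/home/glue_user/workspace/jupyter_workspace/ \
  -e AWS_PROFILE=default \
  -e DISABLE_SSL=true \
  -p 4040:4040 -p 18080:18080 -p 8888:8888 \
  --name glue_jupyter_lab \
  amazon/aws-glue-libs:glue_libs_4.0.0_image_01 \
  /home/glue_user/jupyter/jupyter_start.sh
```

Once the container is up, the notebook is served on localhost:8888, and the mounted `~/.aws` credentials let the local session reach S3 and other AWS services.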

    • @NasimaKhatun-jb7qo
      @NasimaKhatun-jb7qo 10 months ago

      Yes, I have seen that video and had the same impression about cost.
      I am trying to set up a local environment where I can use local Spark (AWS Glue) with, say, a Jupyter notebook, and also connect to S3 and other services locally.
      Do you recommend another way to work locally? And how can this setup be done? I have been trying for a long time without success.