Importing CSV files from S3 into Redshift with AWS Glue

Majestic.cloud

81K views

Published: Jan 1, 2025

Comments • 59

@anandd3081 4 years ago ⁺¹
Just amazing...! Can't put more words about this wonderful video... please keep adding more videos... thank you Sir.
@tarvinder91 4 years ago
This is amazing. You explained why we are using such a setup too. Great job.
@illiprontimusic9764 4 years ago ⁺¹
Geez.... Thanks, it works. Although this is complicated as hell. Using Azure ADF, this video would be 10 mins max... it is WAY easier!
@Leonquenix 2 years ago
This helped a lot, my friend. Thank you very much!
@joshvanathan1000 5 years ago ⁺⁷
Clear explanation from scratch 😉😁 .
Thank you !!
@vigneshjaisankar7087 2 years ago
Thanks for the content.
I'm new to AWS. If you could clarify these questions, that would be very helpful.
From my understanding:
S3 is the source, Redshift is the target.
First, we have to create a table in Redshift.
Second, we read the data from S3 and create a data store in Glue.
Third, we read the schema from Redshift and create a data store.
Then we create a job that connects the S3 data store and Redshift to move the data from S3 to Redshift.
When we run the job, the data gets copied from S3 to Redshift.
The connection is used to connect to the Redshift cluster to run the job.
Is my understanding correct?
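The flow described in this comment can be sketched roughly as two Glue crawlers, one per data store. This is only a sketch under stated assumptions: it builds the keyword arguments for boto3's `glue.create_crawler()` without calling AWS, and every crawler name, role name, bucket path, and catalog database name below is hypothetical. The connection name `myRedshiftConnection` is borrowed from an error message quoted later in the thread, and the include path `productdb/public/%` is inferred from the table name `productdb_public_taxables_csv`.

```python
def s3_crawler_args(name, role, catalog_db, s3_path):
    """Arguments for glue.create_crawler() that catalog the CSV files in S3 (the source)."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": catalog_db,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def redshift_crawler_args(name, role, catalog_db, connection_name, include_path):
    """Arguments for glue.create_crawler() that read the table schema from Redshift
    (the target) through an existing Glue connection."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": catalog_db,
        "Targets": {
            "JdbcTargets": [{"ConnectionName": connection_name, "Path": include_path}]
        },
    }


# All values below are hypothetical placeholders.
source_crawler = s3_crawler_args(
    "csv-source-crawler", "MyGlueRole", "demo_catalog", "s3://my-example-bucket/taxables/"
)
target_crawler = redshift_crawler_args(
    "redshift-target-crawler", "MyGlueRole", "demo_catalog",
    "myRedshiftConnection", "productdb/public/%",
)

# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**source_crawler)
#   glue.start_crawler(Name=source_crawler["Name"])
#   ...wait for the crawler to finish, then, for an existing job:
#   glue.start_job_run(JobName="my-hypothetical-job")
```

Crawlers run asynchronously, so a real script would presumably wait for each crawler to finish before starting the job.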
@johnfromireland7551 4 years ago ⁺¹
Advice for Majestic Cloud: pause your screen recorder earlier in the video while you are waiting for service provisioning to occur. Apart from that, you flashed through the various screens and the text is far too tiny to see.
@Majesticcloud 4 years ago
Thank you for the advice. I will take it into consideration for future videos and when I do a remake of this one.
@manojkumarchallagund 3 years ago
Hi, could you tell me how we take a Glue job to a higher environment? For example, taking an export copy of a job from the Development environment to the SIT environment? Is there any option for that?
@keshavamugulursrinivasiyen5502 3 years ago
Is it possible to execute this scenario in a Free Tier account?
@RaushanKumar_nitks 4 years ago
Great explanation from scratch. Thank you very much!!!
@crescentbabu7855 1 year ago ⁺¹
Great tutorial
@Majesticcloud 1 year ago
Glad you think so!
@misatran1107 3 years ago
Hello Mr., thanks for your video... hmm... I wonder why I should create a crawler from Redshift to a db....
I think creating a job to transfer from S3 to Redshift is enough.
@qs6190 4 years ago ⁺¹
Thank you ! Very nice and concise !
@Majesticcloud 4 years ago
Glad it was helpful!
@vamsikrishna4691 4 years ago
Good explanation, very clear ... Thank You!!
@anandd3081 4 years ago ⁺³
Thank you Sir... Very useful video, tried implementing it myself today. However, I am facing an issue when 'testing the connection' to Redshift. It prompts: "myRedshiftConnection failed. VPC S3 endpoint validation failed for SubnetId nnnnnnnn VPC: vpc-1234abcd21. Reason: Could not find S3 endpoint or NAT gateway for subnetId: subnet-1234abcd13 in Vpc vpc-1234abcd1".
@SunilBholaTrader 3 years ago
Create an endpoint in the VPC for S3.
@keshavamugulursrinivasiyen5502 3 years ago
@SunilBholaTrader Even I got the same error. I created the endpoint for S3 (interface as well as gateway), but still get the same error. Any suggestions would be appreciated.
@SunilBholaTrader 3 years ago
@keshavamugulursrinivasiyen5502 Create a self-referencing security group.
@keshavamugulursrinivasiyen5502 3 years ago
@SunilBholaTrader Thanks, will try and let you know.
@keshavamugulursrinivasiyen5502 3 years ago
@sunil Bhola, got it, i tried and it is working fine
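The two fixes suggested in this sub-thread (an S3 endpoint in the VPC and a self-referencing security group rule) might look roughly like this with boto3. This is a sketch, not a verified fix for the error above: the functions only build the keyword arguments for `ec2.create_vpc_endpoint()` and `ec2.authorize_security_group_ingress()`, and the VPC ID, region, route table ID, and security group ID are hypothetical placeholders.

```python
def s3_gateway_endpoint_args(vpc_id, region, route_table_ids):
    """Arguments for ec2.create_vpc_endpoint(): a gateway endpoint for S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def self_referencing_rule_args(security_group_id):
    """Arguments for ec2.authorize_security_group_ingress(): allow all TCP traffic
    whose source is the same security group (a self-referencing rule)."""
    return {
        "GroupId": security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 0,
                "ToPort": 65535,
                "UserIdGroupPairs": [{"GroupId": security_group_id}],
            }
        ],
    }


# Hypothetical identifiers, for illustration only.
endpoint_args = s3_gateway_endpoint_args("vpc-0example", "us-east-1", ["rtb-0example"])
rule_args = self_referencing_rule_args("sg-0example")

# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.create_vpc_endpoint(**endpoint_args)
#   ec2.authorize_security_group_ingress(**rule_args)
```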
@priyankasuneja4781 4 years ago
Please share an example of loading from CSV to a database where an existing record is updated when it changes and a new record is inserted. Job bookmarks just insert in the case of updates too... we want to update the existing record.
@SunilBholaTrader 3 years ago
Databases don't have a bookmark feature for updates... but files do.
@tusharmayekar9072 4 years ago
How can we encode columns and also select a dist key or sort key? Could you please share a video on it? This is a very nice explanation. Thanks.
@v1ct0rx24 4 years ago
How about from DynamoDB to S3? Please help.
@user-ix1ob1hr1b 4 years ago
Hi, do you need to have the target table created in Redshift before creating the job?
@BabAcademy 4 years ago
Yeah, most likely.
@SunilBholaTrader 3 years ago
The job can create the table, or if the table exists, choose it on the "target" screen.
@piyushsonigra1979 4 years ago
How can Glue overwrite existing data, or take only incremental data from the S3 file? No luck with bookmarks.
@Majesticcloud 4 years ago
Could you explain a bit more in detail what you're trying to accomplish ? I'm not sure I understood the question exactly.
@priyankasuneja4781 4 years ago
Hey Piyush, what approach did you use to accomplish the upsert task?
@vijayravi1189 4 years ago ⁺¹
@priyankasuneja4781 First create a staging table in Redshift with the same schema as the original table, but with no data. Then copy the data from S3 into the staging table. Delete the rows common to both the staging and original tables (say, you might be pulling data from the past 2 weeks). Then insert the data from the staging table into the original table. You will now see the extra new rows in the original table. This is a classic case of leveraging the transaction capability in Redshift.
merge_qry = """
begin ;
-- load the new file into the empty staging table
copy mysql_dwh_staging.orders from 's3://mysql-dwh-52820/current/orders.csv'
iam_role 'arn:aws:iam:::role/redshift-test-role'
CSV QUOTE '\"' DELIMITER ','
acceptinvchars;
-- remove target rows that are about to be replaced
delete
from
mysql_dwh.orders
using mysql_dwh_staging.orders
where mysql_dwh.orders.order_id = mysql_dwh_staging.orders.order_id ;
-- insert the fresh rows, then empty the staging table
-- (note: TRUNCATE commits the current transaction in Redshift)
insert into mysql_dwh.orders select * from mysql_dwh_staging.orders;
truncate table mysql_dwh_staging.orders;
end ;
"""
# db is assumed to be an already-open Redshift connection object
result = db.query(merge_qry)
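The same stage, delete, insert pattern can be tried locally before pointing it at a cluster. The sketch below uses Python's built-in `sqlite3` as a stand-in for Redshift, which is an assumption made purely so the example runs anywhere: SQLite has no `COPY` and no `DELETE ... USING`, so the load is a plain `INSERT` and the delete uses an `IN` subquery. The table layout (`order_id`, `status`) is hypothetical.

```python
import sqlite3


def upsert_via_staging(conn, rows):
    """Load rows into the staging table, then delete-and-insert into the target.
    The connection context manager commits on success and rolls back on error."""
    with conn:
        conn.executemany(
            "INSERT INTO orders_staging (order_id, status) VALUES (?, ?)", rows
        )
        # Remove target rows that the staging data is about to replace.
        conn.execute(
            "DELETE FROM orders WHERE order_id IN (SELECT order_id FROM orders_staging)"
        )
        # Insert both updated and brand-new rows, then empty the staging table.
        conn.execute("INSERT INTO orders SELECT order_id, status FROM orders_staging")
        conn.execute("DELETE FROM orders_staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE orders_staging (order_id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'new'), (2, 'new')")
conn.commit()

# Order 2 changes status, order 3 is new, order 1 is untouched.
upsert_via_staging(conn, [(2, "shipped"), (3, "new")])
final_rows = conn.execute(
    "SELECT order_id, status FROM orders ORDER BY order_id"
).fetchall()
staging_count = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0]
```

After the call, `final_rows` holds one row per order, with order 2 carrying its updated status.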
@SunilBholaTrader 3 years ago
Create a stage table in Redshift, populate it, and then update the main table.
@adityanjsg99 2 years ago
To the point!! No nonsense
@KhangNguyen-iz9pb 4 years ago
Thanks for the helpful video! I am wondering whether we can make the job store to Redshift without first running the crawler on Redshift?
@SunilBholaTrader 3 years ago
The crawler is there to populate the Glue catalog with table metadata; the job does all the ETL work.
@keshavamugulursrinivasiyen5502 3 years ago
@Majesticcloud
Can you or anyone help me load a Date-type column (input file format MM/DD/YYYY) into Redshift using Glue? I mean, how should I update the Glue script? Appreciate your help.
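One possible approach to the date question, offered as a sketch rather than a tested Glue recipe: convert each MM/DD/YYYY string to ISO format (YYYY-MM-DD) before the load. The conversion itself is plain Python; the column name `order_date` is hypothetical, and applying `convert_record` per record through Glue's `Map` transform is an assumption that has not been verified against a live job.

```python
from datetime import datetime


def us_date_to_iso(value):
    """Convert 'MM/DD/YYYY' to 'YYYY-MM-DD'; empty or missing values become None."""
    if value is None or not value.strip():
        return None
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()


def convert_record(record, column="order_date"):
    """Return a copy of the record with the given date column converted."""
    converted = dict(record)
    converted[column] = us_date_to_iso(record.get(column))
    return converted


example = convert_record({"id": 1, "order_date": "12/31/2020"})
```

A malformed date raises `ValueError` here, which surfaces bad input early; a real job might prefer to route such rows to an error location instead.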
@makrantim 4 years ago
If the CSV file has an integer column, is that handled by Glue and Redshift? I get an error. The video example has a double. Thanks.
@SunilBholaTrader 3 years ago
You have the option to modify the schema. That's why data cleansing comes first, before the load.
@advaitz 4 years ago
How do I create a Glue workflow without the console?
@SunilBholaTrader 3 years ago
The Glue workflow GUI is there; otherwise use the SDK.
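As a rough illustration of the SDK route, the sketch below builds the keyword arguments for boto3's `glue.create_workflow()` and `glue.create_trigger()` without calling AWS. The workflow, trigger, and job names are hypothetical, and the job is assumed to exist already.

```python
def workflow_args(name, description=""):
    """Arguments for glue.create_workflow()."""
    return {"Name": name, "Description": description}


def on_demand_trigger_args(trigger_name, workflow_name, job_names):
    """Arguments for glue.create_trigger(): an on-demand trigger that starts
    the given jobs as the first step of the workflow."""
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "ON_DEMAND",
        "Actions": [{"JobName": job} for job in job_names],
    }


# Hypothetical names, for illustration only.
wf = workflow_args("s3-to-redshift-workflow", "Load CSV files from S3 into Redshift")
trigger = on_demand_trigger_args(
    "start-s3-to-redshift", wf["Name"], ["s3-to-redshift-job"]
)

# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_workflow(**wf)
#   glue.create_trigger(**trigger)
#   glue.start_workflow_run(Name=wf["Name"])
```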
@FP-mg5qk 4 years ago
Is it possible to do this but with DynamoDB instead of Redshift?
@SunilBholaTrader 3 years ago
Anything with ODBC/JDBC, and many other options.
@timothyzee6592 5 years ago
Can I use MS mySQL to load the database into AWS?
@SunilBholaTrader 3 years ago
Export MySQL to a file and upload it to S3.
@mujtabahussain2293 4 years ago
Very useful. Thanks a lot.
@vsr1727 3 years ago ⁺¹
Thank you 👍🙏👌
@karennatally8409 2 years ago
Thank you so much!
@MrLangam 4 years ago ⁺¹
Thank you kind sir.
@jigri_pokhri 5 years ago
Legend
@dalwindersingh5902 4 years ago ⁺¹
LOOKS CONFUSING and WRONG. Why, at the end, are you showing data from the taxable_csv table (the ETL job settings show taxable_csv is the source)? The target is productdb_public_taxables_csv, so you should show the target data in productdb_public_taxables_csv. I don't understand the role of productdb_public_taxables_csv.
@sabanaar 4 years ago ⁺¹
Thanks a lot, very clear and useful explanations !!!
