One of the first AWS GLUE studio explanation which is very simple and can be followed - thank you for sharing
Glad it was helpful!
Great tutorial!! Really helpful for any AWS developer who wants to learn Glue.
Can you please create a video on AWS Data Pipeline with a comparison between the two services?
Hi all,
We are using AWS Glue + PySpark to perform ETL into a destination RDS PostgreSQL DB. The destination tables have primary and foreign key columns of the UUID data type, and we are failing to populate these UUID columns. How can we achieve this? Please suggest.
I am not sure what error you are getting. The ETL job has to respect the table-level column constraints; as long as you are doing that, there should not be a problem.
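One pattern that often works here (a rough sketch, not something covered in the video - the host, database, table, and credentials below are placeholders): keep the UUID values as plain strings in the frame and add stringtype=unspecified to the JDBC URL so PostgreSQL casts them into the uuid columns on insert.

df = dyf.toDF()  # dyf is the DynamicFrame produced by the earlier transforms
(df.write
   .format("jdbc")
   .option("url", "jdbc:postgresql://my-host:5432/mydb?stringtype=unspecified")
   .option("dbtable", "public.my_table")
   .option("user", "my_user")
   .option("password", "my_password")
   .mode("append")
   .save())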
Hi, I want to create a Glue Studio connection with Snowflake using a scripting approach. It can be created using the UI, but I want to create it using Terraform, CloudFormation, etc. Please help.
To be honest, I need to check the feasibility of it, especially because of Snowflake.
A very clean and concise video! Keep up the good work
Much appreciated!
Very nice tutorial, easy to follow and understand, thank you!
Glad to hear that!
Can you please make a video on moving Glue code to prod using CI/CD?
Nice tutorial!! What about the Spark SQL Transform?
Apologies for the late response due to my summer break.
I made one video about using SQL Transform in Glue Studio. Here is the link - ruclips.net/video/JoB6uarC0SE/видео.html
Hope it helps.
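For reference, the SQL Transform is roughly equivalent to registering the incoming data as a temporary view and running Spark SQL against it. A minimal sketch (the view name, columns, and the incoming DynamicFrame dyf are made up for illustration):

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

# Standard Glue job setup; dyf is assumed to come from the previous node.
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

dyf.toDF().createOrReplaceTempView("orders")
result_df = spark.sql(
    "SELECT customer_id, SUM(amount) AS total_amount FROM orders GROUP BY customer_id"
)
# Convert back to a DynamicFrame so downstream Glue nodes can consume it.
result_dyf = DynamicFrame.fromDF(result_df, glueContext, "result_dyf")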
Good content, framed so nicely. Thanks!
Glad you liked it!
Could you introduce the AWS Glue Spark UI with a Job and a Dev Endpoint (in SageMaker) for monitoring Spark processes? I want to know how to set up a Spark history server in AWS!
Sure - will work on it.
Is there a way we can write the code and have it create a workflow in the editor?
Can you please elaborate your question?
@@AWSTutorialsOnline Hi, currently a user can create a workflow using the built-in transformations and available connectors in the Glue editor, and the code is generated automatically. My question is: using AWS CDK / CloudFormation, can we write such code, deploy it so that a Glue workflow is created, and then open that workflow in the editor on the AWS console?
@@DineshKumar-cu3bg You can use CDK or CloudFormation for creating Glue Resources. Please check this - docs.aws.amazon.com/glue/latest/dg/populate-with-cloudformation-templates.html
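As a starting point, here is a rough CDK (Python) sketch that provisions a Glue job; the role ARN, script location, and names are placeholders, and as far as I know a job created this way opens in the script editor rather than as a Studio visual diagram:

from aws_cdk import Stack, aws_glue as glue
from constructs import Construct

class GlueJobStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Placeholder role ARN and script location.
        glue.CfnJob(
            self, "EtlJob",
            name="my-etl-job",
            role="arn:aws:iam::123456789012:role/MyGlueJobRole",
            command=glue.CfnJob.JobCommandProperty(
                name="glueetl",
                python_version="3",
                script_location="s3://my-bucket/scripts/etl_job.py",
            ),
            glue_version="3.0",
        )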
Very clear. Good job. Thanks a lot!
Maybe you can add a few steps to explain how the output data can be consumed with Athena and/or QuickSight.
Hi
Thanks for the feedback. I do have another video which talks about consuming data with Athena. Please have a look - ruclips.net/video/l5Hz2qkp4K0/видео.html
Please let me know if you want to cover anything else from the Athena point of view. Meanwhile, I have made a note of the QuickSight request and will come back with a demo for using QuickSight with the Data Lake.
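In the meantime, once the output data is catalogued (for example by a crawler), it can also be queried from code. A small sketch with boto3 - the database, table, and result location are placeholders:

import boto3

athena = boto3.client("athena")

# Run a query against the table the crawler created for the Glue job output.
response = athena.start_query_execution(
    QueryString="SELECT * FROM my_output_table LIMIT 10",
    QueryExecutionContext={"Database": "my_glue_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
print(response["QueryExecutionId"])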
Hi, I have two problems. 1) When I run the crawler on an S3 bucket where I've put the data (a CSV file with pipe '|' delimiter), it doesn't put the column names in the output schema, nor does it ask whether the first row is a header. So instead of the actual column names it creates col0, col1, and so on. How do I tackle this problem? 2) If a folder contains multiple CSV files with different kinds of data, the crawler creates only one table which appends all the CSV files' data into one. How do I control these?
Can I get the resolution to this??
Hello Amit
Thanks for the questions and apologies for the delayed response.
1) For the first part, you need to use the crawler with a custom classifier. Please check this video of mine which will help - ruclips.net/video/-3Itap4FPHI/видео.html
2) For the second part, you might want to combine similar files in different folders and make sure you check "Create a single schema for each S3 path" in the crawler. Please read the "How to Create a Single Schema for Each Amazon S3 Include Path" section in the link here - docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html?icmpid=docs_glue_console
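If you prefer to set this up from code, here is a rough boto3 sketch of both pieces - the classifier name, crawler name, role, database, and paths are placeholders:

import boto3

glue = boto3.client("glue")

# 1) Custom CSV classifier: pipe delimiter, first row treated as the header.
glue.create_classifier(
    CsvClassifier={
        "Name": "pipe-delimited-csv",
        "Delimiter": "|",
        "ContainsHeader": "PRESENT",
        "QuoteSymbol": '"',
    }
)

# 2) Crawler that uses the classifier and groups compatible files
#    into a single schema per include path.
glue.create_crawler(
    Name="my-crawler",
    Role="arn:aws:iam::123456789012:role/MyGlueCrawlerRole",
    DatabaseName="my_glue_database",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/input/"}]},
    Classifiers=["pipe-delimited-csv"],
    Configuration='{"Version": 1.0, "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"}}',
)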
@@AWSTutorialsOnline thanks sir. Let me try again
Hi Amit, did it help?
If you can, please share similar videos on Redshift and MySQL using Glue Studio.
Sure Sachin.
awesome bro
For snowflake, Glue provides custom connector. Please check this link - aws.amazon.com/blogs/big-data/performing-data-transformations-using-snowflake-and-aws-glue/
For data stream / clickstream data, you can use Amazon Kinesis or AWS MSK for data ingestion.
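For the Kinesis side, a Glue streaming job can read the stream directly. A rough sketch, assuming a JSON clickstream and a placeholder stream ARN (option names per the Glue streaming docs, so please verify against your Glue version):

# Assumes glueContext is already set up in a Glue streaming job.
clicks_df = glueContext.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "streamARN": "arn:aws:kinesis:eu-west-1:123456789012:stream/clickstream",
        "startingPosition": "TRIM_HORIZON",
        "classification": "json",
        "inferSchema": "true",
    },
)
# From here you would typically process micro-batches with glueContext.forEachBatch.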
@@AWSTutorialsOnline thank you . I will check on this
Can we migrate Informatica XML files to AWS Glue Studio?
Hi, no - there is no migration path from Informatica XML files to Glue Studio. You will have to re-engineer.
Is it possible to rename the target file in S3? Right now it defaults to run-DataSinkXXXXXXXXX.
It seems you cannot control the file name in the target nodes. There are two ways you can resolve the name -
1) When the file is written to the S3 bucket --> raise an event and call a Lambda function --> the Lambda function will rename the file
2) In the Glue job - use a Custom Transform node and write the output yourself with code like the following (the frame and path are placeholders), then rename the generated object
glueContext.write_dynamic_frame.from_options(
    frame=output_dyf,
    connection_type='s3',
    connection_options={'path': 's3://my-bucket/output/'},
    format='csv',
    format_options={'writeHeader': True})
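Either way, the rename step in S3 is really a copy plus delete. A small boto3 sketch - the bucket, prefix, and target file name are placeholders:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
prefix = "output/"   # prefix the Glue target node writes to

# Find the generated run-DataSink... object, copy it to the desired name,
# then delete the original.
objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)["Contents"]
generated_key = next(o["Key"] for o in objects if "run-" in o["Key"])
s3.copy_object(
    Bucket=bucket,
    Key=prefix + "report.csv",
    CopySource={"Bucket": bucket, "Key": generated_key},
)
s3.delete_object(Bucket=bucket, Key=generated_key)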
Thank you
You're welcome