Excellent, thank you very much for taking the time so that we can learn.
Thanks a lot. I literally once had to run an EMR job just to delete 10 million data objects. This AWS Glue approach is something I will definitely try out.
How do I provide the S3 path if there are multiple folders/subfolders within the bucket and I want to delete the contents of one specific folder?
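For anyone with the same question, a minimal sketch (my own, not from the video): purge_s3_path takes a full S3 path, so pointing it at a specific prefix only purges objects under that prefix. The bucket name "bucketname" and folder "logs/2023/" below are placeholders.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Only objects under the "logs/2023/" prefix are purged; the rest of the bucket is untouched.
glue_context.purge_s3_path(
    "s3://bucketname/logs/2023/",   # hypothetical bucket and folder
    {"retentionPeriod": 0}          # 0 hours = purge everything under the prefix
)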
This is a fantastic demo and it resolved one of my project's issues.
Thanks man, glad to hear that :)
I am dealing with the following issue. When I try to load only the data from the newly inserted files from S3 to Redshift using Job bookmarks, the Data Catalog tables contain duplicate values. How do I resolve that?
Note: The scenario is that I receive one file per day in S3, and this file contains the data from the old files plus the new data.
The issue with this is that it doesn't delete the folders of the objects. It also doesn't deal with versioning. Has anyone got an answer for that?
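Not from the video, but since purge_s3_path doesn't deal with versioning (and, as noted, leaves the zero-byte "folder" placeholder keys behind), one workaround sketch is to clean those up with boto3. "bucketname" and the "logs/" prefix below are placeholders.

import boto3

# Deletes every object version and delete marker under the prefix,
# which also removes the zero-byte "folder" placeholder keys.
bucket = boto3.resource("s3").Bucket("bucketname")
bucket.object_versions.filter(Prefix="logs/").delete()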
After 20 minutes I get: Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o98.purgeS3Path. Unable to execute HTTP request: The target server failed to respond. I have been trying to figure it out, but it happens every time after about 22 minutes. I was able to run the job earlier and it worked on 2 million objects.
Hi Soumil! Thanks for the video. I'd like to know if it's possible to use a Glue job to delete data from AWS Aurora MySQL, and how?
You are awesome. I needed this.
Thanks man
Excellent job. Is there a way to use this job to delete only old files, say files older than 30 days?
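Not an official answer, but the purge_s3_path options include a retentionPeriod key, which is measured in hours and defaults to 168 (7 days); files newer than the retention period are kept and everything older is purged. A minimal sketch for "older than 30 days", with a placeholder bucket name:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

glue_context.purge_s3_path(
    "s3://bucketname/",           # placeholder bucket
    {"retentionPeriod": 720}      # 30 days x 24 hours; anything newer is retained
)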
Nice, but for some reason it doesn't delete the "folders", just regular files.
Hey, I tried this method, but when the Glue job run completed it threw a "purge object access denied" error. I gave the IAM role S3 full access and it still shows the error. Am I missing anything? Thanks.
You are missing IAM permissions; try giving admin access.
Also, can this be done periodically with a cron job?
Yes
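A sketch of one way to do it (my own, not from the video): create a scheduled Glue trigger that runs the purge job on a cron expression. The trigger name "daily-purge-trigger" and job name "purge-s3-job" below are hypothetical.

import boto3

glue = boto3.client("glue")
glue.create_trigger(
    Name="daily-purge-trigger",             # hypothetical trigger name
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",           # every day at 02:00 UTC
    Actions=[{"JobName": "purge-s3-job"}],  # hypothetical existing Glue job
    StartOnCreation=True,
)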
@SoumilShah I'd like to know if it's possible to use a Glue job to delete data from AWS Aurora MySQL, and how?
I created this AWS Glue script and it shows succeeded after 2 minutes, but I still see that data inside the bucket. It's a bucket containing only logs, with over 25 million objects and 100+ GB. (I replaced the real bucket name with "bucketname".)

glueContext.purge_s3_path(
    "s3://bucketname/",
    {"retentionPeriod": 0,
     "excludeStorageClasses": ["STANDARD_IA"],
     "manifestFilePath": "s3://bucketname/"}
)
Running the script 6 times deleted all items in that bucket. Thanks.
@javascript_developer It didn't delete everything because you ran it 6 times; it was deleted by your first run only. It takes some time to purge that much data.
@chaitanyaashah1455 Thank you for your reply. It got deleted after a few days automatically.
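Not a definitive answer, but two things in that purge_s3_path call above could also explain leftover data: excludeStorageClasses means every object in the listed storage classes (here STANDARD_IA) is skipped and never deleted, and manifestFilePath writes the Success.csv/Failed.csv manifests into the very bucket being purged. A sketch with both adjusted, assuming a hypothetical separate bucket for the manifests:

glueContext.purge_s3_path(
    "s3://bucketname/",
    {"retentionPeriod": 0,
     "manifestFilePath": "s3://manifest-bucket/purge-manifests/"}   # hypothetical separate bucket
)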