How to Delete a Large Number of Objects from AWS S3 Using an AWS Glue Job

  • Published: 5 Jan 2025

Comments • 22

  • @CarlosMaldonado-q4s · 2 months ago

    Excellent, thank you very much for taking the time so that we can learn.

  • @ramnathjayachandran2774 · 2 years ago

    Thanks a lot. I once literally had to run an EMR job just to delete 10 million data objects. This upgrade to AWS Glue is something I will definitely try out.

  • @dhirendrabhattarai2333 · 1 year ago

    How do I provide the S3 path if there are multiple folders/subfolders within the bucket and I wish to delete the contents of only a specific folder?
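
    purge_s3_path accepts any S3 prefix, not just a bucket root, so pointing it at the subfolder's path deletes only that folder's contents. A minimal sketch, with placeholder bucket and folder names:

      from awsglue.context import GlueContext
      from pyspark.context import SparkContext

      glueContext = GlueContext(SparkContext.getOrCreate())

      # Pass the specific prefix, not just "s3://my-bucket/";
      # only keys under this prefix are purged.
      glueContext.purge_s3_path(
          "s3://my-bucket/some/specific/folder/",
          {"retentionPeriod": 0}  # 0 = delete regardless of age
      )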

  • @stevenzhang6585 · 2 years ago

    This is a fantastic demo, and it resolved one of my project's issues.

    • @SoumilShah · 2 years ago

      Thanks man, glad to hear that :)

  • @ΓιωργοςΚαλ · 2 years ago

    I am dealing with the following issue: when I try to load only the data from newly inserted files from S3 into Redshift using job bookmarks, the Data Catalog tables contain duplicate values. How do I resolve that?
    Note: the scenario is that I receive one file per day in S3, and this file contains the data from the old files plus the new data.
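
    One possible approach (an assumption, not something shown in the video): since each daily file re-ships the old rows, job bookmarks alone cannot prevent duplicates; de-duplicating on a business key after reading and before writing to Redshift would. A sketch with placeholder database, table, and key names:

      # Job bookmarks require a transformation_ctx on the read
      dyf = glueContext.create_dynamic_frame.from_catalog(
          database="my_db",
          table_name="my_table",
          transformation_ctx="read_daily_file",
      )

      # Each daily file repeats earlier rows, so keep one row per key
      deduped = dyf.toDF().dropDuplicates(["id"])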

  • @dreamingaboutouterspace3878 · 2 years ago

    The issue with this is that it doesn't delete the objects' folders. It also doesn't deal with versioning. Anyone got an answer for that?
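
    For the versioning part, one option (an assumption, not covered in the video) is to remove object versions and delete markers with boto3, since deletes on a versioned bucket typically just add delete markers. Bucket and prefix names are placeholders:

      import boto3

      bucket = boto3.resource("s3").Bucket("my-versioned-bucket")
      # Deletes every version and delete marker under the prefix
      bucket.object_versions.filter(Prefix="some/folder/").delete()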

  • @jean-pierrefortin3190 · 7 months ago

    After 20 minutes I get: Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o98.purgeS3Path. Unable to execute HTTP request: The target server failed to respond. I have been trying to figure it out, but it happens every time after 22 minutes. I was able to run the job earlier and it worked on 2 million objects.

  • @basilikpe643 · 2 years ago

    Hi Soumil! Thanks for the video. I'd like to know if it's possible to use a Glue job to delete data from AWS Aurora MySQL, and how?
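
    Glue can't delete rows through a DynamicFrame, but one option (an assumption, not covered in the video) is a Glue Python shell job that runs a DELETE over a MySQL connection. All connection details below are placeholders:

      import pymysql  # e.g. via the job's --additional-python-modules pymysql

      conn = pymysql.connect(
          host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
          user="admin",
          password="...",
          database="mydb",
      )
      with conn.cursor() as cur:
          # Example: drop rows older than 30 days
          cur.execute("DELETE FROM events WHERE created_at < NOW() - INTERVAL 30 DAY")
      conn.commit()
      conn.close()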

  • @deerich36 · 2 years ago

    You are awesome. I needed this.

  • @abhiganta · 2 years ago

    Excellent job! Is there a way to use this job to delete only old files, say files older than 30 days?
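
    Yes, via the retentionPeriod option, which is expressed in hours; purge_s3_path keeps anything newer than that. A sketch with a placeholder path:

      # 30 days = 720 hours; only objects older than this are deleted
      glueContext.purge_s3_path(
          "s3://my-bucket/logs/",
          {"retentionPeriod": 720}
      )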

  • @jarias1406 · 2 years ago

    Nice, but for some reason it doesn't delete the "folders", just regular files.
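
    The leftover "folders" are usually the zero-byte placeholder keys ending in "/" that the S3 console creates. A possible cleanup (an assumption, with placeholder bucket and prefix names) is to delete those markers with boto3:

      import boto3

      s3 = boto3.client("s3")
      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket="my-bucket", Prefix="logs/"):
          for obj in page.get("Contents", []):
              if obj["Key"].endswith("/"):  # zero-byte folder marker
                  s3.delete_object(Bucket="my-bucket", Key=obj["Key"])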

  • @shiroyashazoro · 2 years ago

    Hey, I tried this method, but when the Glue job run completed it threw a "purge object access denied" error. I gave the IAM role S3 full access and it's still showing the error. Is there anything I'm missing? Thanks

    • @SoumilShah · 2 years ago

      You're missing IAM permissions; try giving admin access.
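
      Rather than full admin, a least-privilege sketch (role, policy, and bucket names are placeholders; the exact set of actions purge_s3_path needs is an assumption):

        import boto3, json

        policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",  # for the manifest file, if used
                ],
                "Resource": [
                    "arn:aws:s3:::my-bucket",
                    "arn:aws:s3:::my-bucket/*",
                ],
            }],
        }
        boto3.client("iam").put_role_policy(
            RoleName="MyGlueJobRole",
            PolicyName="AllowS3Purge",
            PolicyDocument=json.dumps(policy),
        )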

  • @basilikpe643 · 2 years ago

    Also, can this be done periodically with a cron job?

    • @SoumilShah · 2 years ago

      Yes
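
      One way to schedule it (an assumption; job and trigger names are placeholders) is a scheduled Glue trigger with a cron expression:

        import boto3

        boto3.client("glue").create_trigger(
            Name="nightly-s3-purge",
            Type="SCHEDULED",
            Schedule="cron(0 2 * * ? *)",  # 02:00 UTC daily
            Actions=[{"JobName": "purge-s3-job"}],
            StartOnCreation=True,
        )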

    • @basilikpe643 · 2 years ago

      @@SoumilShah I'd like to know if it's possible to use a Glue job to delete data from AWS Aurora MySQL, and how?

  • @javascript_developer · 2 years ago

    I created this AWS Glue script and it showed succeeded after 2 minutes, but I still see the data inside that bucket. It's a bucket containing logs only, with over 25 million objects and 100+ GB. I updated the bucket name.

    glueContext.purge_s3_path(
        "s3://bucketname/",
        {"retentionPeriod": 0,
         "excludeStorageClasses": ["STANDARD_IA"],
         "manifestFilePath": "s3://bucketname/"}
    )

    • @javascript_developer · 2 years ago

      Running the script 6 times deleted all items in that bucket. Thanks.

    • @chaitanyaashah1455 · 1 year ago

      @@javascript_developer It didn't delete everything because you ran it 6 times; it was all deleted by your first run. It just takes some time to purge that much data.

    • @javascript_developer · 1 year ago

      @@chaitanyaashah1455 Thank you for your reply. It got deleted automatically after a few days.