Thanks for the explanation, man. I missed adding the correct role and it was driving me crazy. Now it's resolved.
Glad to hear your issue is resolved @surendharselvakumar7606! Happy Learning :-)
Same here 😂 I had been trying for the last 2-3 hours through GPT, but bro finally gave me the solution ❤
Amazing class! It was exactly what I needed. Thanks for your time and work!
Glad to hear this @pvshark! Happy Learning
Could you please share the files you uploaded to the S3 bucket? The one I'm uploading seems to throw an error somehow.
@ShreyaSingh-so6jg, FYR -- github.com/SatadruMukherjee/Dataset/blob/main/Setosa.csv
@@KnowledgeAmplifier1 Thankyou
You are welcome @@ShreyaSingh-so6jg ! Happy Learning
When should we use Lambda + Batch, and when should we use EventBridge Pipes + Batch? In this case, wouldn't a pipe solve the problem?
Great video, it was really helpful. Just curious: we can also use Glue instead of AWS Batch, right? That way we won't need any kind of Docker/EC2 setup here, and Glue jobs can run for more than 15 minutes.
Hey, can you elaborate a little? I'm currently focused on an automation task that is handled using S3 buckets and Lambda for now, but I've got more files for the data processing task, so I'm thinking of extending my current use case to AWS Batch as well as Glue (mainly to compare the two approaches). I'll greatly appreciate any help and/or insights. Thanks so much!
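Not from the video, but a rough sketch of what the hand-off from an S3-triggered Lambda could look like in each case; the queue, job-definition, and job names below are placeholders, not anything shown in the tutorial:

```python
import urllib.parse

import boto3

batch = boto3.client("batch")
glue = boto3.client("glue")

# Hypothetical resource names -- replace with your own
BATCH_JOB_QUEUE = "my-processing-queue"
BATCH_JOB_DEFINITION = "my-processing-job-def"  # backed by a Docker image you maintain
GLUE_JOB_NAME = "my-processing-glue-job"        # plain PySpark / Python shell script, no container


def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Option 1: AWS Batch -- no 15-minute limit, full control over the runtime,
    # but you own the Docker image and the compute environment.
    batch.submit_job(
        jobName="s3-file-processing",
        jobQueue=BATCH_JOB_QUEUE,
        jobDefinition=BATCH_JOB_DEFINITION,
        containerOverrides={
            "environment": [
                {"name": "S3_BUCKET", "value": bucket},
                {"name": "S3_KEY", "value": key},
            ]
        },
    )

    # Option 2: AWS Glue -- also no 15-minute limit, and nothing to containerize;
    # the file location is passed as job arguments.
    glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={"--s3_bucket": bucket, "--s3_key": key},
    )
```

In practice you would keep only one of the two calls; the trade-off is that Batch gives you full control over the container and its dependencies, while Glue removes the Docker/EC2 setup entirely.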
Good tutorial; however, you need to slow down a little. You talk way too fast.
@KnowledgeAmplifier1 Is there any chance we will see an Apache Flink tutorial in the future? I am looking for one.
Can you please let me know if any good documents or videos are available for this topic?
Thanks in advance.
Kindly make a video on AWS Kinesis.
Hello @kalpatarusahoo844, I have already created multiple videos on different concepts related to the AWS Kinesis family; you can refer to this playlist -- ruclips.net/p/PLjfRmoYoxpNrI6CuZt486uDjSvhhODleP&si=p0pIhlOAAq-Xl4ft
Stay tuned for more videos!
@@KnowledgeAmplifier1 I will check. Thanks!
No problem @@kalpatarusahoo844! Happy Learning
Thanks mate!
You're welcome @alagappan2214! Happy Learning :-)
Hi, this setup spiked my billing very high.
The setup was to build a Lambda function that reads the latest file from the S3 source dir, transforms it, and finally writes it to the S3 target dir.
This whole setup, with the Python script, was supposed to run once the S3 notification told the Lambda function that a file had just arrived in S3.
But it went into a loop and made the S3 and Lambda billing spike.
Let me know what the issue is; I did not notice it at first in this setup while running my Python script.
The issue you're experiencing is likely due to an infinite loop caused by your Lambda function writing an object back to the same S3 bucket, which then triggers the Lambda function again via a put event. This continuous cycle can significantly increase your S3 and Lambda usage, leading to higher billing.
To resolve this, please ensure the following:
Check the Bucket Configuration: Verify if your Lambda function is writing to the same S3 bucket that triggers the function on a put event. If it is, this is likely causing the infinite loop.
Separate Buckets or Folders:
Different Buckets: Consider writing the transformed data to a different S3 bucket.
Different Folders in the Same Bucket: Alternatively, you can configure two separate folders within the same bucket. Configure the S3 event notification to trigger the Lambda function only when a file is added to the specific source folder, and have the Lambda function write the transformed data to a different target folder. This is done by specifying the source folder name as the prefix in the S3 event notification setup.
By implementing one of these solutions, you can prevent the infinite loop and avoid unnecessary costs.
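Not part of the original reply, but a minimal sketch of the second option, assuming a single hypothetical bucket with incoming/ and transformed/ folders (all names below are placeholders): the S3 trigger is restricted to the source folder with a prefix filter, and the handler also skips anything outside that folder as a safety net, so writing the output can never re-trigger the function.

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical names -- adjust to your own setup
BUCKET = "my-data-bucket"
SOURCE_PREFIX = "incoming/"     # the S3 event notification only matches this folder
TARGET_PREFIX = "transformed/"  # the Lambda writes here, so its output never matches the trigger


def configure_trigger(lambda_arn):
    """One-time setup: fire the Lambda only for .csv objects created under incoming/."""
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET,
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [
                {
                    "LambdaFunctionArn": lambda_arn,
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {
                        "Key": {
                            "FilterRules": [
                                {"Name": "prefix", "Value": SOURCE_PREFIX},
                                {"Name": "suffix", "Value": ".csv"},
                            ]
                        }
                    },
                }
            ]
        },
    )


def lambda_handler(event, context):
    for record in event["Records"]:
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Safety net: ignore anything outside the source folder, so the function
        # can never loop on its own output even if the trigger filter is misconfigured.
        if not key.startswith(SOURCE_PREFIX):
            continue

        obj = s3.get_object(Bucket=BUCKET, Key=key)
        transformed = obj["Body"].read()  # placeholder for the real transformation

        target_key = TARGET_PREFIX + key[len(SOURCE_PREFIX):]
        s3.put_object(Bucket=BUCKET, Key=target_key, Body=transformed)
```

With the prefix filter plus the in-code guard, a put to transformed/ never produces an event that matches the trigger, so the loop cannot start.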
@@KnowledgeAmplifier1
Mostly the setup is the same as you explained: the source S3 dir is read for the latest .gz file, which is then transformed and saved to another S3 dir (truncated if it already exists), and this new transformed CSV is then merged into the master data CSV file in another S3 dir.
The only thing I would have missed here is assigning the prefix and suffix setup.
I would have tested it while it ran and then stopped it, but once I assigned the S3 notification to trigger the target-destination Lambda function, it did not stop until I deleted all my resources.