YouTube Data Analysis | END TO END DATA ENGINEERING PROJECT

  • Published: 21 Dec 2024

Comments

  • @kpyoutuber4671
    2 years ago · +72

    Thank you, Darshil Parmar!
    Please note that you deployed the Lambda function at 39:00 in the video, but this isn't mentioned in your explanation.
    If you don't deploy it, Lambda only runs the default code, which still succeeds and returns the hello-world message.

    • @DarshilParmar
      2 years ago · +18

      Yes, I made a mistake while editing the video; a lot of people ran into this.

    • @haziq7885
      1 year ago · +1

      Thanks for the solution! I was stuck here for a long time 😅
      Darshil Parmar, thanks so much for the video! Hopefully this solution can be pinned for others to refer to :)

    • @fahadbakshi5449
      1 year ago · +1

      @DarshilParmar Can you please give me a solution for the error at 39:00? I am getting this:
      {
        "statusCode": 200,
        "body": "\"Hello from Lambda!\""
      }

    • @DarshilParmar
      1 year ago · +8

      @fahadbakshi5449 It's not an error. You need to click the Deploy button.

    • @theniyota
      1 year ago

      This post needs to be pinned. I wasted today trying to figure out how to make the Lambda function work, until I decided to go through the comments.
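Why an undeployed function still "passes": a new Python function in the Lambda console starts from a small template handler. The sketch below is a from-memory reconstruction of that template, so treat its exact text as an assumption. It is not the project code. Clicking Test runs whatever was last deployed, so until you click Deploy you keep getting the 200 "Hello from Lambda!" response quoted in this thread.

```python
import json


def lambda_handler(event, context):
    # Console starter template (reconstructed): ignores the event and returns a
    # fixed response. Until you click Deploy, Test keeps running this, not your code.
    return {
        "statusCode": 200,
        "body": json.dumps("Hello from Lambda!"),
    }
```

Once your own code is deployed, the Test output should come from your handler instead of this fixed body.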

  • @DarshilParmar
    2 years ago · +364

    It takes a lot of effort and energy to execute the entire project and record it! I hope you find this useful, and make sure to like this video :)

    • @aritra1414
      2 years ago · +1

      In your opinion, what is the best resource for understanding Lambda in depth? I need help with that. I'm working on a big data project, but this isn't my domain; I'm learning new things and need to learn faster. Any leads would be helpful. Thanks in advance. Also, please keep producing such awesome content. Thanks a lot!!

    • @DarshilParmar
      2 years ago · +4

      @aritra1414 Check out the AWS re:Invent videos on Lambda on YouTube, and read the Lambda white paper to understand more.

    • @vivekpuurkayastha1580
      2 years ago

      Hi Darshil, great video as always, and yes, I did click Like 😀. Can you please make a video on how to create a project in a dev environment and then switch to a production environment in AWS? Basically, how to manage the code lifecycle in AWS from dev to production... or maybe you can point me to a resource. Thanks

    • @shashibhushansingh1628
      2 years ago

      By following these steps, am I able to build this project in Azure?

    • @tanmayshinde7853
      2 years ago · +1

      I respect your efforts, but to be honest I didn't understand anything. Please explain it in a simpler way, or break it down into smaller chunks if you can.

  • @shrirajpawar7817
    1 year ago · +76

    For people watching this tutorial now:
    AWS Data Wrangler has been renamed AWS SDK for pandas. The name has changed, but the core functionality remains the same.

    • @vishnuvardhan9082
      10 months ago · +1

      Thank you so much, man, I was going nuts over this! How did you know about it?

    • @TheAINoobxoxo
      9 months ago · +1

      Thanks man, much appreciated.

    • @nikitha_sirka
      9 months ago

      @vishnuvardhan9082 Hi,
      When I changed the layer to AWSSDKPandas and modified the code, I got the same error:
      {
        "errorMessage": "Unable to import module 'lambda_function': No module named 'AWSSDKPandas'",
        "errorType": "Runtime.ImportModuleError",
        "stackTrace": []
      }

    • @TanmayMeda
      9 months ago · +1

      Thank you very much

    • @iamayuv
      7 months ago

      Thanks, brother
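About the "No module named 'AWSSDKPandas'" error above: the rename applies to the project and to the Lambda layer (e.g. AWSSDKPandas-Python3.8). The Python package itself is still imported as awswrangler, conventionally `import awswrangler as wr`, so the handler's import line does not need to change. The helper below is a hypothetical diagnostic, not part of the project. It reports which candidate module names are importable in the current runtime, for example if you paste it temporarily into a test function.

```python
import importlib.util


def importable(module_names):
    """Map each candidate module name to whether it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # With the AWSSDKPandas layer attached, "awswrangler" should report True;
    # "awssdkpandas" is not the module name that layer provides.
    print(importable(["awswrangler", "awssdkpandas", "json"]))
```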

  • @snehakadam16
    1 year ago · +71

    Thank you, Darshil.
    This is for those who are facing issues:
    1) Replace awswrangler with awssdkpandas in the code. The rest of the code remains the same.
    2) Add layer: AWSDataWrangler-Python3.8 has been replaced with AWSSDKPandas-Python3.8, version 10.
    3) Create the db_youtube_cleaned database using Glue or Athena before running the code.
    4) For the "Task timed out" issue, increase the memory along with the timeout, e.g. timeout = 5 min, memory = 512 MB.
    Hope this helps :)
    Tip: guys, if you are stuck, please go through the comments. You will find a solution for sure.

    • @DarshilParmar
      1 year ago

      Thanks for putting this in one comment

    • @snehakadam16
      1 year ago

      Thank you for the amazing tutorial and for putting in so much effort, @DarshilParmar. Looking forward to more projects :)

    • @vamsivenna
      1 year ago

      @snehakadam16 @DarshilParmar I'm facing this issue:
      "errorMessage": "Unable to import module 'lambda_function': No module named 'awssdkpandas'",
      "errorType": "Runtime.ImportModuleError",
      "stackTrace": []
      Please help me out.

    • @vamsivenna
      1 year ago

      It worked. I think the database name should be "db_youtube_cleansed".
      Even awswrangler with AWSSDKPandas-Python3.8 version 10 and 256 MB of memory works fine for me. Thank you.
      But according to the video, the database should be created automatically.

    • @prathmeshsinha4705
      1 year ago

      @vamsivenna How did you solve this?

  • @bfkgod
    2 years ago · +27

    Darshil, you have made one of the most valuable DE learning channels on YouTube. Keep up the amazing work! Thank you.

    • @DarshilParmar
      2 years ago

      Thanks, will do!

    • @Punithan-rj5ng
      9 months ago

      @DarshilParmar Function logs:
      START RequestId: 2b020a60-532e-4b33-9933-7cc87b5406cc Version: $LATEST
      An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
      Error getting object youtube/raw_statistics_reference_data/CA_category_id.json from bucket de-on-youtube-raw-useast1dev. Make sure they exist and your bucket is in the same region as this function.
      LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html
      [ERROR] EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.

  • @ueeabhishekkrsahu
    1 year ago · +5

    After working on it for 2 days straight, I'm finally done with the project.

    • @sembrueldorinvil4167
      1 year ago

      How did you solve the runtime problem?

    • @ishan358
      1 year ago

      @sembrueldorinvil4167 Same question, bro.

    • @ishan358
      1 year ago

      How did you solve the Lambda runtime error?

    • @sembrueldorinvil4167
      1 year ago · +2

      Found the answer: assign more memory to your function. I think 500 MB should work.

    • @ishan358
      1 year ago

      @sembrueldorinvil4167 I still get the same error even then. It's so frustrating.

  • @arpansatpathi9645
    2 years ago · +15

    Thank you, Darshil, for this wonderful video. One thing that might help others following this tutorial: whenever you update your Lambda function, click Deploy first so that you actually test your changes. In my case I wasn't getting any errors, and I later realized that the default hello-world code was still running.

    • @DarshilParmar
      2 years ago · +4

      Yes, I might have made a mistake while editing the video. I did click Deploy, and a lot of people missed it.
      I will keep this in mind.
      Thank you for the feedback.

    • @arpansatpathi9645
      2 years ago · +2

      @DarshilParmar While we're at it, could you please give a solution for the EntityNotFoundException that somebody else also pointed out? I'm getting the same error and haven't been able to resolve it. I tried creating the cleansed database in Glue manually, but it still doesn't work. Hope to get a reply.
      Thanks in advance :)

    • @angelnadar1209
      2 years ago

      Thanks Ashutosh, I faced the same thing. Thanks for posting this comment; it's helpful.

    • @angelnadar1209
      2 years ago

      @arpansatpathi9645 Was it resolved? If so, could you post the solution?

    • @thesevenacoustics
      1 year ago

      @arpansatpathi9645 Rename 'db_youtube_cleaned' to 'de_youtube_cleaned' in the environment variable.
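About the "EntityNotFoundException ... Database db_youtube_cleaned not found" error discussed in this thread: the function looks up the database named in the glue_catalog_db_name environment variable, so that value has to match, exactly, a Glue database that already exists. The sketch below is a hypothetical helper (check_glue_db_name is an illustrative name, not part of the project). It compares the configured name against the existing ones and suggests the closest match, which catches a db_/de_ typo like the one above.

```python
import difflib


def check_glue_db_name(configured, existing):
    """Return None if the configured Glue database exists, else a diagnostic hint."""
    if configured in existing:
        return None
    # Suggest the closest existing name, e.g. a db_/de_ prefix typo.
    close = difflib.get_close_matches(configured, list(existing), n=1)
    if close:
        return f"Glue database '{configured}' not found. Did you mean '{close[0]}'?"
    return f"Glue database '{configured}' not found. Create it in Glue or Athena first."


# Usage sketch (needs boto3 and AWS credentials, so it is not run here):
#   import os
#   import boto3
#   glue = boto3.client("glue")
#   names = [db["Name"]
#            for page in glue.get_paginator("get_databases").paginate()
#            for db in page["DatabaseList"]]
#   print(check_glue_db_name(os.environ["glue_catalog_db_name"], names))
```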

  • @southafricangamer7174
    3 months ago · +2

    This project is insane, and I'm only halfway through. As someone who looked at AWS and went "HUH", I think you did a great job explaining it so far. :)

  • @ajtam05
    2 years ago · +6

    Just wanted to say thank you to Darshil Parmar for these projects. It's hard to find anything online that helps to this extent, end to end. This is great stuff! Cheers! :)

    • @mcaddit6802
      1 year ago

      I'm unable to proceed after clicking Test; I get this error:
      "errorMessage": "'s3_cleansed_layer'",
      "errorType": "KeyError"
      Can anyone please tell me what the problem is?

    • @vinaydhande9926
      1 year ago

      @mcaddit6802 Did you solve the error?
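About the KeyError: 's3_cleansed_layer' above: the handler reads os.environ['s3_cleansed_layer'], and no environment variable with exactly that key is set under Configuration > Environment variables. Keys are case-sensitive. A minimal sketch of reading every required key up front, so that a single error names all the missing ones. s3_cleansed_layer and glue_catalog_db_name appear in this thread. glue_catalog_table_name and write_data_operation are assumptions about the rest of the video's code, so match them to the os.environ[...] lookups in your own lambda_function.py.

```python
import os

# Keys the handler is expected to read. The last two names are assumptions;
# align them with the os.environ[...] lookups in your lambda_function.py.
REQUIRED_VARS = (
    "s3_cleansed_layer",
    "glue_catalog_db_name",
    "glue_catalog_table_name",
    "write_data_operation",
)


def load_config(env=None):
    """Return the required settings, or raise one error listing every missing/empty key."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_VARS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing Lambda environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_VARS}
```

Calling load_config() at the top of the handler turns a bare KeyError into a message that lists exactly which keys still need to be added.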

  • @Lapookie
    2 years ago · +25

    (edited) Important note on missing libraries:
    - AWSDataWrangler-Python3.8 is no longer available
    - I replaced it with AWSSDKPandas-Python3.8 version 1

    • @anupammathur918
      2 years ago · +3

      I'm getting an error that database db_youtube_cleaned is not found. Can you please check?

    • @soumyaranjandash3597
      2 years ago

      @anupammathur918 Same here.

    • @anupammathur918
      2 years ago · +2

      @soumyaranjandash3597 That's because it isn't created. Go to Athena and create a database with that name.

    • @mackshonayi943
      2 years ago · +1

      @anupammathur918 Thanks, this helped. I created the database in Glue and it worked.

    • @sahityamamillapalli6735
      2 years ago

      @anupammathur918 Can you please elaborate? In Athena there are data sources.

  • @fabriciomiriani
    2 years ago · +2

    Amazing job! I'm just starting to use AWS because I'd like to become a cloud engineer, and this is just incredible. Thanks a lot for your effort!!

  • @janwienke6479
    1 year ago · +6

    Great project documentation to try for yourself. One little thing to add would be a rough AWS cost estimate; that's definitely something I would look for if I were starting out.

  • @SankarJankoti
    2 years ago · +2

    Your content on data is pure! No match.

    • @DarshilParmar
      2 years ago

      Thank you

    • @meetpatel1873
      2 years ago

      Did you get charged while using AWS services under the free tier? 🤔

  • @prikshitbatta
    1 year ago · +5

    Hi Darshil, thanks for this project. I faced a lot of errors, and it took two days to complete the project, but in the end it was satisfying. 😀

    • @vanadin8009
      1 year ago

      Can you give the estimated cost of the AWS services used in this project? It would be a big help. Thank you.

    • @vasudevreddy3527
      1 year ago

      @vanadin8009 You can basically do it for free with a free-tier AWS account.

    • @prasadprojects
      3 months ago

      @vanadin8009 Can you answer now?

    • @ADESHKUMAR-yz2el
      2 months ago

      Hi, I need some help. I have a free-tier account. If I use Glue, will it cost me? If yes, how much for this project?
      Thank you :)

    • @prasadprojects
      2 months ago · +1

      @ADESHKUMAR-yz2el For this project, less than $5 overall in one year.

  • @mananyadav6401
    2 years ago · +8

    Amazing, @darshil! It's clearly visible how much effort you put into the PPT, video recording, storyboarding, and including the small nuances and errors that could potentially be faced.
    I can't express in words how valuable it is and how much information you are providing to the community. Really inspiring and motivating.
    Someone in another comment rightly said it is a pure gem on YouTube.

  • @vitoriagarcia9876
    1 year ago · +3

    These tutorials are so helpful for me! They also show how much production effort you put into them. Thank you so much, Darshil!

  • @rohanchoudhary672
    1 year ago · +1

    It took me 5 days to complete this video hands-on.
    But it was all worth it.
    - Complete noob me

    • @ishan358
      1 year ago

      @rohanchoudhary672 Bro, help me solve the runtime error.

  • @idhwanibhatt
    2 years ago · +7

    Thank you so much, Darshil, for this video. We need more project-based learning like this in data engineering instead of just clichéd theory. 😆

    • @DarshilParmar
      2 years ago · +2

      Yes, more videos like this are coming.

    • @mcaddit6802
      1 year ago

      I'm unable to proceed after clicking Test; I get this error:
      "errorMessage": "'s3_cleansed_layer'",
      "errorType": "KeyError"
      Can anyone please tell me what the problem is?

  • @jerichocruz29
    2 years ago · +3

    Darshil, this is an amazingly executed project and it was easy to follow. Thanks for taking the time to put this together. Great channel.

  • @mustafamujahid3964
    3 months ago

    THANKS A LOT, DARSHIL, FOR HELPING WITH MY FIRST DATA ENGINEERING PROJECT.

  • @ajitagalawe8028
    2 years ago

    Best video I have seen so far for the end to end project in big data. Thanks!

  • @teja_surya
    2 years ago · +1

    This project and you explaining it in a simple and elaborate way was awesome. Keep them coming!

  • @lloydwang8108
    2 years ago · +1

    Hey Darshil, I rarely comment, but I just wanted to say a big thank you. This helped me out a lot! Looking forward to more content like this in the future :D

  • @revathil8986
    8 months ago

    Excellent explanation. Each and every step is easy to follow and understandable.

  • @mackshonayi943
    2 years ago

    Thank you, Darshil, I completed this part successfully. Your content is invaluable; may God bless you.

    • @fahadbakshi5449
      1 year ago

      Bro, I got stuck at 30:00. Can you please help me?

    • @ishan358
      1 year ago

      How did you solve the runtime error?

  • @imsdengineer
    2 years ago

    Great stuff, you rock, boey! The entire video was very intuitive, and I must say without a shadow of a doubt that all the nitty-gritty of data is discussed in it. Heading for the second part now. Worth a ⌚

  • @Watson22j
    1 year ago · +1

    Hey Darshil, if you were a beginner, how would you write such long code in Lambda? I'm a beginner and wondering how I'll be able to write code like this.

  • @____prajwal____
    1 year ago · +8

    FYI: AWS Data Wrangler has now been renamed AWS SDK for pandas.

    • @reypaulobae4895
      1 year ago · +1

      Lifesaver! Thanks

    • @mayurkumar23
      10 months ago

      prajwal, I am getting this error:
      {
      "errorMessage": "Glue table does not exist in the catalog. Please pass the `path` argument to create it.",
      "errorType": "InvalidArgumentValue",
      "stackTrace": [
      " File \"/var/task/lambda_function.py\", line 40, in lambda_handler
      raise e
      ",
      " File \"/var/task/lambda_function.py\", line 27, in lambda_handler
      wr_response = wr.s3.to_parquet(
      ",
      " File \"/opt/python/awswrangler/_config.py\", line 735, in wrapper
      return function(**args)
      ",
      " File \"/opt/python/awswrangler/_utils.py\", line 178, in inner
      return func(*args, **kwargs)
      ",
      " File \"/opt/python/awswrangler/s3/_write_parquet.py\", line 719, in to_parquet
      return strategy.write(
      ",
      " File \"/opt/python/awswrangler/s3/_write.py\", line 313, in write
      raise exceptions.InvalidArgumentValue(
      "
      ]
      }
      Please help.

  • @ArnavMondal14
    1 year ago · +1

    Great video. I loved it, and it helped me build my resume. I'd love to do what you do today and freelance.

  • @aminearguig3114
    7 months ago · +1

    I have a question, and I hope you will answer:
    are these services free in the AWS free tier?

  • @fayssalelaazouziai1573
    1 year ago · +1

    Thank you so much, Darshil, for this video.
    I have a question: is the Glue table created automatically? I get a timeout, and I think the problem is related to that. Can you please provide more information, e.g. should I create a new crawler, or what should I run to create the cleaned table automatically?

    • @imenbenhassine9710
      1 year ago

      I faced the same issue and found out that you need to create the cleaned catalog_db in Glue; then the cleaned_table will be created automatically. For the timeout, try increasing the memory along with the time. Hope it helps.

  • @nardsmath3511
    2 years ago · +6

    Hi Darshil, great video. Could you please let me know how to save to the table db_youtube_cleaned? I am getting the error: "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found." Thanks in advance :)

    • @DarshilParmar
      2 years ago · +2

      Can you give me the timestamp in the video? Then I can explain it easily.

    • @nardsmath3511
      2 years ago

      @DarshilParmar Yes, the error occurs at 45:37, but I think it relates to 39:30, where the names are set.

    • @arpansatpathi9645
      2 years ago

      I'm facing the same issue

    • @dorothysilverman7660
      2 years ago

      I'm facing the same error as well.

    • @gauravverma7082
      2 years ago · +1

      @dorothysilverman7660 @Arpan Satpathi First create a database named 'db_youtube_cleaned' in Glue, then run the test.

  • @lihongzheng8216
    5 months ago · +1

    I clicked the Deploy button at 39:00 and changed the timeout to 15 min (the maximum), but it still times out. Does anybody know how to fix it?

  • @pankajchandel1000
    1 year ago

    Between this and the COVID project, which one should I try building first as a beginner?

  • @ogissgi7441
    2 years ago · +2

    Thank you so much, Darshil, for the video! I'm having an issue when trying to create a crawler; I get the error: "The following crawler failed to create: "name of the crawler"
    Here is the most recent error message: Account 'Number of account' is denied access." I checked the IAM roles I created, and deleted and recreated them, but I still receive the same message. Do you have any idea what the issue could be?

    • @rizbasamalah5326
      2 years ago · +2

      Me too, bro.
      Have you solved it already?

    • @Devine_9
      4 months ago

      I also got this error. Has anyone solved it?

  • @ahmedmohiuddin1866
    2 years ago

    WOWW. This is amazingggg. Thanks Darshil. I have just started watching the video and looking at the content got me excited.

    • @DarshilParmar
      2 years ago · +1

      This is the type of comment I wait for, thanks for supporting my work!

    • @ahmedmohiuddin1866
      2 years ago

      @DarshilParmar You're welcome

  • @jitendrasinghthakur6364
    2 years ago · +3

    Hey Darshil, everything is fine except that I'm getting an error that says:
    "errorMessage": "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.",
    "errorType": "EntityNotFoundException",
    How do I resolve it?

    • @ajtam05
      2 years ago · +3

      Yeah, I got the same issue. It has to do with the Lambda "Configuration" tab > "Environment variables" that we enter at 39:35. But I'm not entirely sure where the "Value" we enter for each "Key" comes from, or what it is associated with.

    • @naveenkonda395
      2 years ago

      @ajtam05 Have you solved this issue?

    • @carti8778
      2 years ago · +1

      @naveenkonda395 Just create a database in Athena by typing the SQL query:
      create database db_youtube_cleaned
      First you have to create the database; only then will Lambda update it with the data.

    • @prikshitbatta
      1 year ago · +1

      This is one of the biggest mistakes I found, and it should be corrected: we were never told to create a new database, and then the path turns out to be different. It took me 3 hours to debug this.
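The Athena route suggested in this thread can be sketched as follows. The helper (create_database_sql, a hypothetical name) builds an idempotent CREATE DATABASE IF NOT EXISTS statement, so rerunning it is harmless. It also rejects names outside Athena's allowed characters (lowercase letters, digits, underscores). The usage comment assumes boto3, credentials and a query-results bucket of your own. The bucket name there is a placeholder.

```python
import re

# Athena database names: lowercase letters, digits and underscores, max 255 chars.
_VALID_DB_NAME = re.compile(r"[a-z0-9_]{1,255}")


def create_database_sql(name):
    """Return an idempotent Athena DDL statement that creates the given database."""
    if not _VALID_DB_NAME.fullmatch(name):
        raise ValueError(f"invalid Athena database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {name}"


# Usage sketch (needs boto3 and credentials; the results bucket is a placeholder):
#   import boto3
#   boto3.client("athena").start_query_execution(
#       QueryString=create_database_sql("db_youtube_cleaned"),
#       ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},
#   )
```

Whichever name you create here must be the same string you put in the glue_catalog_db_name environment variable.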

  • @wizaasir
    3 months ago

    At 40:48, where are all these environment variable names coming from, and what are they even?

    • @DarshilParmar
      3 months ago

      These are used in the code as the DB name, path, etc.
      You can hard-code them directly, but best practice is to store them in environment variables.

  • @mairios521
    1 year ago · +1

    Hi Darshil! First of all, I would like to say "Thank you" for this tutorial.
    I need to mention something: I was following each step, but AWS is different now, and some options are no longer available or are very different.
    I can't believe the AWS platform changed so much in just one year.
    My question is: will you update this tutorial in the future?

    • @DarshilParmar
      1 year ago · +1

      Everything is the same; you just have to find the right options in the new UI.

    • @mairios521
      1 year ago

      @DarshilParmar Thanks for your quick response!!

  • @hassannasr7736
    8 months ago · +1

    If you are getting a runtime error when running the Lambda function, even after 3 minutes, make sure to add:
    import pandas as pd
    This will solve the issue, since AWS Data Wrangler changed to AWS SDK for pandas.

  • @ashutoshprakash9468
    2 years ago · +1

    Great work, Darshil!!! Next-level data engineering knowledge in this content. ✌️ Industry-level project.

  • @ataurrehman3664
    2 years ago · +1

    Thank you so much for making this video!!! This is the 6th or 7th video of yours that I've added to my playlist. Please post more project videos like this in different domains.

    • @DarshilParmar
      2 years ago · +2

      Thank you. Yes, I will try to post more videos like this.

  • @ShubhamKumar-fs9wi
    1 year ago

    Thank you, Darshil, for this amazing video; it was very helpful. I just completed the whole project, plus some extra work: moving data to Redshift with a Glue job, which involved creating a connection and enabling a VPC endpoint. :)

  • @youraverageguide
    2 years ago · +1

    Part1 done. It was really informative. Waiting for part 2!

    • @DarshilParmar
      2 years ago

      Check the link in the description for that.

    • @harshalshende69
      2 years ago

      Bro, can you please help me with this? I've been stuck in part 1 at the data catalog step for a week. If you could tell me my mistakes there, I could move forward 🙏 It would be a big help.

    • @ishan358
      1 year ago

      @harshalshende69 Did you find a solution?

  • @castlemonohunter3019
    2 years ago

    Thank you so much for doing this.... The only one on youtube with curated data related content ❤💫

    • @DarshilParmar
      2 years ago

      Thank you for your support and kind words

  • @LiaqatAli-tn9np
    1 year ago · +2

    Hi Darshil, great work. I ran into an "Access Denied" issue when creating the crawler. Please help me with this.

  • @rohitsaha08
    1 year ago

    this is a great project with your excellent guidance Darshil. Thank you!😀

  • @banarasi91
    1 year ago · +9

    Hello guys, if you are getting an error at the testing step, it's because the DB name hasn't been changed in the environment variable. Please take care: he forgot to change the DB name. If you notice, in Athena the database name is db_youtube_cleaned, but it should be de_youtube_cleaned, which causes the "Entity not found" error in the final Lambda test.

    • @geekyprogrammer4831
      1 year ago

      At 46:51, I get either a runtime error or a timeout error. Kindly help!

    • @21-lengocmai97
      1 year ago

      @geekyprogrammer4831 I have changed the DB name in the environment variable, but it still gives a runtime error. What should I do to fix this?

    • @saxdrakonis6456
      1 year ago · +1

      @geekyprogrammer4831 I'm facing the same issue. I see a Parquet file being generated in the gcs bucket, but the Lambda function is timing out. Were you able to fix it?

    • @Livecampingbgmi
      1 year ago

      @geekyprogrammer4831 Did you get the solution? Please let me know.

    • @Livecampingbgmi
      1 year ago

      @21-lengocmai97 Did you get the solution? Please let me know.

  • @aarshmehtani5468
    8 days ago

    How did we declare the environment variables at 40:49 in the video?

  • @paytmoffers7794
    1 year ago · +2

    I hope you have more such projects for us in your pipeline 😍 Please do it

  • @AV-bp3bc
    2 years ago · +1

    Can you do the AWS part with Google Cloud as well?

  • @ashishkumarg5
    8 months ago

    What is the similar service in GCP for AWS Glue Crawler ?

  • @aartimehta4807
    10 months ago · +2

    Hi @DarshilParmar, my Lambda function is timing out. I increased the timeout to 15 minutes, which is the maximum, but it still doesn't complete and throws an error. I followed the exact same steps shown in the video. Can you or someone tell me where I'm going wrong?

    • @oscarloterogiraldo460
      10 months ago · +2

      I'm stuck at the same part. My Lambda function does not execute; it always throws a timed-out error. @DarshilParmar

    • @DarshilParmar
      10 months ago · +3

      Try increasing memory too

    • @aartimehta4807
      10 months ago · +1

      @DarshilParmar Hey, I increased it to 2048 MB and it took barely 2 seconds to execute. Thank you for your help. Appreciate it.

    • @oscarloterogiraldo460
      10 months ago · +1

      I just increased it to 500MB and it also ran very fast! Thank you both very much!
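Why raising memory fixes the timeouts in this thread: Lambda allocates CPU in proportion to the configured memory. At the 128 MB default, the pandas/Parquet work can run past even the 900-second (15-minute) maximum timeout. A minimal sketch that builds the arguments for boto3's update_function_configuration call and checks them against Lambda's limits (timeout 1 to 900 s, memory 128 to 10,240 MB). The 300 s / 512 MB defaults mirror the values suggested in these comments. The function name in the usage comment is hypothetical.

```python
def lambda_config_update(function_name, timeout_s=300, memory_mb=512):
    """Build kwargs for boto3's Lambda update_function_configuration call."""
    # Lambda limits: timeout 1-900 s (15 min); memory 128-10,240 MB.
    if not 1 <= timeout_s <= 900:
        raise ValueError("timeout_s must be between 1 and 900 seconds")
    if not 128 <= memory_mb <= 10240:
        raise ValueError("memory_mb must be between 128 and 10240 MB")
    return {"FunctionName": function_name, "Timeout": timeout_s, "MemorySize": memory_mb}


# Usage sketch (needs boto3 and credentials; the function name is hypothetical):
#   import boto3
#   boto3.client("lambda").update_function_configuration(
#       **lambda_config_update("youtube-json-to-parquet", timeout_s=300, memory_mb=512))
```

The same two settings can be changed in the console under Configuration > General configuration.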

  • @kruthithunoli1
    19 days ago

    Hi Darshil,
    Can you explain the environment variable inputs in more detail?

  • @trishasingh8832
    2 years ago

    Great work Darshil! 🔥🔥

  • @ranjansrivastava9256
    11 months ago

    Dear Darshil, could you please tell us which architecture you used in the demo: Lambda architecture or Kappa architecture? I'd like to understand more from an architecture perspective. Please share your thoughts.

  • @shreeyajoshi9771
    2 years ago

    Thanks loads for this video, Darshil! Very much appreciated! 👏👏👏👏

  • @mandardeshpande2246
    1 year ago · +1

    Hi Darshil, while running the Athena query I'm getting this error:
    HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]
    This query ran against the "de_database_raw" database, unless qualified by the query.
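A likely cause of this HIVE_CURSOR_ERROR: Athena's JSON SerDe expects each record to be a complete JSON object on a single line, and a pretty-printed file spreads one object across many lines. That is one reason the project flattens the raw reference JSON with Lambda rather than querying it directly. A standard-library sketch that converts such a document into JSON Lines. It assumes the records sit in a top-level "items" list, as in the raw *_category_id.json files; verify that assumption against your own data.

```python
import json


def items_to_json_lines(raw_text, key="items"):
    """Turn a pretty-printed JSON document into one compact JSON object per line.

    Assumes the records live in a top-level list under `key` (an assumption
    about the raw *_category_id.json layout; check your own files).
    """
    doc = json.loads(raw_text)
    return "\n".join(json.dumps(item, separators=(",", ":")) for item in doc[key]) + "\n"
```

Each output line can then be read by the JSON SerDe as one row. Nested fields such as snippet stay nested, which is why the project flattens them further before querying.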

  • @HemantSharma-fw2gx
    2 years ago

    Thank you so much for this darshil!...keep up the good work🙌

  • @dntking40
    1 year ago

    Excellent work. Thank you for this amazing workshop.

  • @ritikkeshari1463
    2 years ago

    Thanks a lot, brother. I needed this kind of lecture; it really helped enhance my skills. Please make more videos like this.

  • @sankalpporwal9646
    2 years ago · +3

    Finally, when I run a SELECT query on cleaned_statistics_reference_data, I get HIVE_UNKNOWN_ERROR: Path is not absolute s3//
    Please help.

    • @anushakamath7374
      2 years ago · +2

      Hi, did you find a solution to this issue?

    • @Lapookie
      2 years ago · +1

      @anusha kamath I solved it by:
      - Deleting the data in the S3 bucket: youtube-cleaned-useast1-dev
      - Deleting the "db_youtube_cleaned" database in AWS Glue
      - Recreating the database in AWS Glue and naming it: db_youtube_clean
      - Updating the "glue_catalog_db_name" environment variable, renaming it to: db_youtube_clean
      - Updating the "s3_cleansed_layer" environment variable in the Lambda function by adding a / at the end of the path
      THEN
      - Refresh everything and re-execute the Lambda function.
      - Then run the SQL query in Athena.
      It worked like magic. I don't know what was wrong; sometimes you just have to force it: delete, re-upload, re-run :)
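On "Path is not absolute s3//": the missing colon in s3// suggests that the table location was built from a malformed prefix. That usually traces back to the s3_cleansed_layer value, and the fix above included adding a trailing /. A hypothetical helper (normalize_s3_prefix is an illustrative name) that checks the value before anything is written. It requires an s3:// scheme and a bucket name, and returns a folder-style prefix with exactly one trailing slash.

```python
def normalize_s3_prefix(path):
    """Return 's3://bucket/prefix/' from a user-supplied value, or raise if malformed."""
    path = path.strip()
    if not path.startswith("s3://"):
        raise ValueError(f"expected an absolute s3:// URI, got {path!r}")
    remainder = path[len("s3://"):]
    # The first path segment is the bucket; it must not be empty.
    if not remainder.split("/", 1)[0]:
        raise ValueError(f"missing bucket name in {path!r}")
    # Treat the value as a folder-style prefix with exactly one trailing slash.
    return "s3://" + remainder.rstrip("/") + "/"
```

Running the environment variable through a check like this at the start of the handler surfaces a typo such as s3// immediately, instead of it turning up later as an Athena error.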

  • @surrealsoupuniverse
    1 year ago

    Hello, what should I learn before doing this project? What are the prerequisites? Thanks

  • @penninahgathu7956
    2 years ago

    Thank you so much for teaching us such valuable content! Be blessed

  • @ishikapatel3318
    11 months ago

    Hello, thank you for this video. I'm having a problem while configuring: every time I configure, I get kicked out.

  • @davidaliaga4708
    7 months ago

    Do you have an end-to-end project that doesn't use AWS (e.g. with Hadoop or Spark)?

  • @guillermojastrzebski954
    2 years ago · +8

    Thanks for this great content.
    I'm getting errors with the Lambda function:
    1. The video doesn't show that you need to hit the "Deploy" button.
    2. After adding the layers, increasing the timeout, and granting permissions to the Lambda function, I still get this:
    Test Event Name
    lambdaTestEvent
    Response
    {
    "errorMessage": "An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.",
    "errorType": "EntityNotFoundException",
    "stackTrace": [
    " File \"/var/task/lambda_function.py\", line 39, in lambda_handler
    raise e
    ",
    " File \"/var/task/lambda_function.py\", line 26, in lambda_handler
    wr_response = wr.s3.to_parquet(
    ",
    " File \"/opt/python/awswrangler/_config.py\", line 450, in wrapper
    return function(**args)
    ",
    " File \"/opt/python/awswrangler/s3/_write_parquet.py\", line 666, in to_parquet
    catalog._create_parquet_table( # pylint: disable=protected-access
    ",
    " File \"/opt/python/awswrangler/catalog/_create.py\", line 301, in _create_parquet_table
    _create_table(
    ",
    " File \"/opt/python/awswrangler/catalog/_create.py\", line 152, in _create_table
    client_glue.create_table(**args)
    ",
    " File \"/var/runtime/botocore/client.py\", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
    ",
    " File \"/var/runtime/botocore/client.py\", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
    "
    ]
    }
    Function Logs
    START RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11 Version: $LATEST
    An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
    Error getting object youtube/raw_statistics_reference_data/US_category_id.json from bucket de-on-youtube-raw-useast1-7011-dev. Make sure they exist and your bucket is in the same region as this function.
    [ERROR] EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the CreateTable operation: Database db_youtube_cleaned not found.
    Traceback (most recent call last):
    File "/var/task/lambda_function.py", line 39, in lambda_handler
    raise e
    File "/var/task/lambda_function.py", line 26, in lambda_handler
    wr_response = wr.s3.to_parquet(
    File "/opt/python/awswrangler/_config.py", line 450, in wrapper
    return function(**args)
    File "/opt/python/awswrangler/s3/_write_parquet.py", line 666, in to_parquet
    catalog._create_parquet_table( # pylint: disable=protected-access
    File "/opt/python/awswrangler/catalog/_create.py", line 301, in _create_parquet_table
    _create_table(
    File "/opt/python/awswrangler/catalog/_create.py", line 152, in _create_table
    client_glue.create_table(**args)
    File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
    File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
    END RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11
    REPORT RequestId: e124c5fb-a734-417c-a227-f1ac36b93a11 Duration: 8167.89 ms Billed Duration: 8168 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 3626.32 ms
    Can you please take a look at it?
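About the log above: the real failure is the `EntityNotFoundException`. The "Error getting object ... Make sure they exist" line appears to come from a generic `except` block in the handler. Its wording matches AWS's S3 get-object blueprint, which prints that message for any exception, so it doesn't necessarily mean the object is missing.

The part of the handler that does depend on the test event is reading the bucket and key. Below is a minimal sketch of that step, assuming the standard S3 notification layout. `bucket_and_key` is a hypothetical helper name.

```python
import urllib.parse
from typing import Any, Dict, Tuple


def bucket_and_key(event: Dict[str, Any]) -> Tuple[str, str]:
    """Pull the bucket name and object key out of an S3 notification event.

    Assumes the standard layout: event["Records"][0]["s3"]["bucket"]["name"]
    and event["Records"][0]["s3"]["object"]["key"].
    """
    s3 = event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    # Object keys in S3 events are URL-encoded (a space arrives as '+'),
    # so decode before using the key in an S3 read call.
    key = urllib.parse.unquote_plus(s3["object"]["key"], encoding="utf-8")
    return bucket, key
```

When testing from the console, replace the placeholder bucket name and key in the S3 test event with an object that actually exists in your raw bucket.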

    • @DarshilParmar
      @DarshilParmar  2 years ago +2

      Hey,
      Thanks for sharing this.
      I made a mistake while editing and missed that part.
      For the error, I'd say just create the table in Athena directly.
      You can also join the Discord channel for future queries.

    • @guillermojastrzebski954
      @guillermojastrzebski954 2 years ago +11

      @@DarshilParmar Thank you. It worked.
      For reference, for anyone getting the same error: the video skips creating the database. To do so, go to Athena, open a new query with this SQL: "create database db_youtube_cleaned" and run it (the name must match the one in the error message). The Lambda function should work fine after that.
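The same database can also be created from Python instead of the Athena console. Below is a minimal sketch using boto3's Glue client (`create_database` with a `DatabaseInput` dict). `ensure_glue_database` is a hypothetical helper that takes the client as an argument. The database name is assumed to be whatever your `glue_catalog_db_name` variable holds (`db_youtube_cleaned` in the error above).

```python
def ensure_glue_database(glue_client, name: str) -> bool:
    """Create a Glue Data Catalog database unless it already exists.

    `glue_client` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns True if the database was created, False if it was already there.
    """
    try:
        glue_client.create_database(DatabaseInput={"Name": name})
        return True
    except glue_client.exceptions.AlreadyExistsException:
        # Safe to re-run: an existing database is left untouched.
        return False
```

Usage, with credentials for the same region as the function: `ensure_glue_database(boto3.client("glue"), "db_youtube_cleaned")`. If the AWS SDK for pandas layer is available, `wr.catalog.create_database("db_youtube_cleaned", exist_ok=True)` should do the same in one call.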

    • @satishmajji481
      @satishmajji481 2 years ago +1

      @@guillermojastrzebski954 This really helped me! Thanks :)

    • @sidharthaamperayani5965
      @sidharthaamperayani5965 2 years ago +1

      @@guillermojastrzebski954 Massive thanks for this!!
      I've been searching the internet for a couple of hours to fix this. I figured someone must have run into the same issue, so I checked the comments section. Saved me a lot of trouble!

    • @dorothysilverman7660
      @dorothysilverman7660 2 years ago +1

      I had to create the table in Glue, and then Lambda worked as well.

  • @ashutoshlonkar4338
    @ashutoshlonkar4338 9 months ago

    Hi Darshil, thank you for this beautiful and easy-to-understand concept, but while adding the Lambda layer, I can't find AWSDataWrangler for the Virginia region. I tried to deploy the existing AWSDataWrangler layer and then add it as a custom layer. That worked, but my data is not getting cleaned and stored in the cleaned S3 bucket.

  • @percyjackson1662
    @percyjackson1662 2 years ago +6

    For those trying it now:
    1. The AWSDataWrangler layer has been renamed to AWSSDKPandas (AWS SDK for pandas). Code-wise, the rest stays the same (it is still imported as awswrangler).
    2. You need to have the Glue database created beforehand; otherwise it throws an error.

    • @satyabratadey3898
      @satyabratadey3898 1 year ago

      Hi Percy, while trying to add AWS layers, I only get 3 options: AppConfig Extension, Lambda Insights Extension, and Parameters and Secrets Lambda Extension. Not sure what I am missing. Please help.

    • @satyabratadey3898
      @satyabratadey3898 1 year ago

      @Darshil

  • @huzaifa_2590
    @huzaifa_2590 1 year ago

    It was an amazing project. I learned a lot from this video. Thank you so much, I appreciate the work.

  • @trytrybutdontcry1-zf9yu
    @trytrybutdontcry1-zf9yu 7 months ago +1

    Great Video, Thank you so much!!

  • @shyam96105
    @shyam96105 2 years ago

    Hi Darshil, please make a video on how you deliver the project to your clients after completing it.

  • @incognitomato
    @incognitomato 13 days ago

    I'm facing a problem at the "aws s3 ls" step. I ran it in cmd, but nothing happened. I'm a Windows 10 user. Kindly help.

  • @danala5963
    @danala5963 2 years ago +1

    This was a great help. One question though: when you executed this project using different AWS services (S3, Athena, Glue, etc.), what was the approximate cost after the full project execution? Thanks

    • @DarshilParmar
      @DarshilParmar  2 years ago +1

      Most likely there won't be any charge if you are on the free tier, but even if they charge you, it will be $3-5 at most.
      You can raise a support ticket stating you were just trying to learn the service, and they won't charge you.

  • @thimirabandara679
    @thimirabandara679 1 year ago +1

    Does it cost money to use AWS services? Specifically Athena?

    • @DarshilParmar
      @DarshilParmar  1 year ago

      You can set up a billing alarm first, and also check whether it is included in the AWS Free Tier.

  • @thiagoduarte7207
    @thiagoduarte7207 1 year ago

    Are the AWS tools used in this project available in the free tier?

  • @raghuboyapati7311
    @raghuboyapati7311 2 years ago

    Great video, man.
    AWS has raised the maximum ephemeral storage (/tmp) of Lambda to 10 GB.

  • @Devine_9
    @Devine_9 4 months ago

    Thank you so much Darshil for the video! I am having an issue when trying to create a crawler. I'm getting the error: "The following crawler failed to create: "name of the crawler"
    Here is the most recent error message: Account 'Number of account' is denied access." I checked the IAM roles I created, deleted and recreated them, but I'm still receiving the same message. Do you have any idea what the issue could be?

  • @zendr0
    @zendr0 2 years ago +1

    Absolute gem! Thank you for making this video. Learned a lot today.
    And if possible (although I know you have your job), please try to make more content like this in the future.
    Lots of love💛💛

    • @DarshilParmar
      @DarshilParmar  2 years ago +1

      I will try my best to provide as much as I can

    • @mcaddit6802
      @mcaddit6802 1 year ago

      I am unable to proceed further. After clicking Test, I get this error: "errorMessage": "'s3_cleansed_layer'",
      "errorType": "KeyError". Can anyone please tell me what the problem is?

    • @ishan358
      @ishan358 1 year ago

      How do you solve the runtime error?

    • @drishtihingar2160
      @drishtihingar2160 1 year ago

      @@mcaddit6802 Yeah, I am also getting the same error. How did you solve it? Can you help me out?
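A `KeyError: 's3_cleansed_layer'` most likely means the function looked up an environment variable with that name (for example via `os.environ[...]`) and it wasn't defined under Configuration > Environment variables, or the name differs in spelling or case. Below is a minimal sketch that checks every required variable up front and reports all missing ones in a single message. The variable names come from this thread, and `require_env` is a hypothetical helper.

```python
import os
from typing import Dict, Iterable, Mapping


def require_env(names: Iterable[str], env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the named environment variables, or fail listing every missing one.

    A variable that is set to an empty string is treated as missing.
    """
    names = list(names)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Lambda environment variable(s): "
            + ", ".join(missing)
            + " (names are case-sensitive)"
        )
    return {name: env[name] for name in names}


# Names from this thread; add any other variables your handler reads.
REQUIRED = ("s3_cleansed_layer", "glue_catalog_db_name")
```

Calling `config = require_env(REQUIRED)` at the start of the handler turns a bare `KeyError` into a message that names exactly which variables to add.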

  • @nabeelasyed1034
    @nabeelasyed1034 7 months ago

    Hi Darshil,
    Amazing content. I have a question.
    I am not able to find the AWS Wrangler layer in the options. Could you provide a link to download it so that I can add it as a custom layer?

    • @DarshilParmar
      @DarshilParmar  7 months ago +1

      Go through the comments; you will find the solution.

  • @Bijuthtt
    @Bijuthtt 2 years ago

    Awesome tutorial project. I was able to complete this session.

  • @naraendrareddy273
    @naraendrareddy273 1 year ago

    I can't find the AWS Data Wrangler layer option for the Lambda function. I can't even find the right ARN for us-east-1. You did it at this timestamp: 45:08
    Edit: AWS Data Wrangler is now called AWS SDK for pandas.

  • @adityaanand835
    @adityaanand835 2 years ago

    Really appreciate the hard work you bring to the table!!!

    • @DarshilParmar
      @DarshilParmar  2 years ago

      Thank you for making my hard work pay off by watching the video.

  • @venkatsaiphanindraanagam
    @venkatsaiphanindraanagam 1 year ago

    Hi Darshil. Thank you for the amazing work. Right now AWS doesn't have the Data Wrangler Lambda layer, so I am not able to execute the function. Is there any other way to execute it?

    • @sambitkumar7621
      @sambitkumar7621 10 months ago +1

      Same, I am also unable to execute the Lambda function; I'm not able to test it.
      After the test, it shows
      {
      "statusCode": 200,
      "body": "\"Hello from Lambda!\""
      }
      in the response.
      Any suggestions?

    • @maryamnajimi6120
      @maryamnajimi6120 10 months ago

      @@sambitkumar7621 Did you get the answer? I am facing the same issue!

  • @rohitagarwal5319
    @rohitagarwal5319 1 year ago

    Hello Darshil, instead of Lambda, can we do the same sort of transformation using a Glue ETL job?

    • @DarshilParmar
      @DarshilParmar  1 year ago

      Yes

    • @rohitagarwal5319
      @rohitagarwal5319 1 year ago

      @DarshilParmar I tried using the Flatten transform in the ETL job, but it didn't work.
      Is it because the JSON contains an array?
      Can you suggest, in a few words, how to proceed with ETL so that I can work on it?
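On the flatten question: when a JSON file keeps its rows inside an array, flattening nested fields alone isn't enough. The array first has to be exploded into one row per element, and then each element's nested objects can be flattened.

Below is a plain-Python sketch of those two steps, with no Glue or pandas involved. It assumes the rows sit under a top-level `items` key, as in the category reference JSON used in the video. `flatten` and `explode_and_flatten` are hypothetical helpers, and the sample record is illustrative only.

```python
from typing import Any, Dict, List


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names, e.g. snippet.title."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def explode_and_flatten(doc: Dict[str, Any], array_key: str = "items") -> List[Dict[str, Any]]:
    """Emit one row per element of doc[array_key], each element flattened."""
    return [flatten(item) for item in doc.get(array_key, [])]


# Illustrative sample shaped like the category reference file.
sample = {
    "kind": "youtube#videoCategoryListResponse",
    "items": [{"id": "1", "snippet": {"title": "Film & Animation", "assignable": True}}],
}
print(explode_and_flatten(sample))
# [{'id': '1', 'snippet.title': 'Film & Animation', 'snippet.assignable': True}]
```

This roughly mirrors what `pd.json_normalize` does on the `items` column in the Lambda version. In a Spark-based Glue script, `pyspark.sql.functions.explode` on the array column is the usual first step before selecting the nested struct fields.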

  • @Likhitha_R
    @Likhitha_R 3 months ago

    Will the AWS account be free forever to use the above-mentioned features, or will it be free only for a few months?

  • @iamdare
    @iamdare 2 years ago

    Hi Darshil, good video, and thanks very much. I learned a lot. In your subsequent videos, please try to zoom in more often so we can see what you're doing on the screen. Thanks.

  • @Alexchow-s3q
    @Alexchow-s3q 3 months ago

    Thank you Darshil for this awesome video! I have issues viewing the cleaned data in Athena. I got "HIVE_UNKNOWN_ERROR: Path missing in file system location: [my path]
    This query ran against the "[cleaned db]" database, unless qualified by the query." I checked that the path names are correct, and I can access the Parquet file locally. Can anyone help with this issue?

  • @mohammedjouhar6363
    @mohammedjouhar6363 2 years ago

    Thank you, man. Keep up the good work!

  • @songjourney394
    @songjourney394 2 years ago

    Hi Darshil,
    When did you create the database "db_youtube_cleaned" in the video?

    • @anishkini6901
      @anishkini6901 2 years ago +1

      Do you get the "CreateTable operation: Database db_youtube_cleaned not found." error? Have you resolved it?

    • @gauravverma7082
      @gauravverma7082 2 years ago +1

      @@anishkini6901 First create a database named 'db_youtube_cleaned' in Glue, then run the test.

  • @arjitsharma2503
    @arjitsharma2503 1 year ago +1

    I'm not able to get rid of the timeout error.

  • @raunakkumar7004
    @raunakkumar7004 1 year ago

    Sir, I just want to know how you know which code to write in the Lambda service, and how you know that you need to use packages like os and so on. Is this written in the documentation or somewhere else?

    • @DarshilParmar
      @DarshilParmar  1 year ago

      Practice, practice, and practice.
      When you start, you will research, and the more you research, the more things you will find.

  • @shouryanagpal5813
    @shouryanagpal5813 11 months ago

    Hi Darshil, I always think of starting your project videos, but I always get stuck on whether AWS cloud services will be charged or are free, and whether there are any alternatives.

  • @intrepidm8753
    @intrepidm8753 2 years ago

    It's a great one, very useful and resourceful for aspirants like me👍🏼

  • @abhisheknakate9347
    @abhisheknakate9347 2 years ago

    Thank you Darshil, very informative content. Please upload the 2nd part of the video.

    • @DarshilParmar
      @DarshilParmar  2 years ago

      It is uploaded; check the link in the description.

  • @gentleman.editsss
    @gentleman.editsss 1 year ago +2

    I gave all the permissions to the role I created, but while creating the crawler, it says access denied. Please help!

    • @SajjanDivya
      @SajjanDivya 4 months ago

      Did you figure this out? I'm having the same issue and am not able to move forward.

    • @abrarsaeed5263
      @abrarsaeed5263 3 months ago

      @@SajjanDivya Did anyone figure it out??

    • @roselinamoven7986
      @roselinamoven7986 3 months ago

      Same issue.

  • @TheAINoobxoxo
    @TheAINoobxoxo 9 months ago

    Hi @darshil #Darshil, the cleansed Glue table is not getting created for me, but the Parquet file under the S3 cleansed bucket is getting created.
    I have followed the correct steps:
    the Lambda function code is unchanged
    the variables are correct
    the timeout is 3 min 3 sec
    the roles are assigned correctly with S3 and Glue full access
    I used the layer some people mentioned, AWS SDK for pandas
    The error is still a timeout, or no error at all.
    Where should I look for the reason the cleansed table isn't getting created?

    • @hassannasr7736
      @hassannasr7736 8 months ago

      If you are getting a runtime error when running the Lambda function, even with a 3-minute timeout, make sure to add
      import pandas as pd
      This will solve the issue, as AWS Data Wrangler changed to AWS SDK for pandas.

  • @yassaryelurkar3631
    @yassaryelurkar3631 2 years ago

    Hey, great video. I wanted to ask whether I will be charged for using AWS Athena, because it mentioned additional charges for Athena queries when I opened it. Thanks for the video.

  • @paulshobhik
    @paulshobhik 1 month ago

    Has anyone faced this issue? I deployed the code too, but when I click Test, nothing happens. No error, nothing. Am I missing anything?

  • @siddhideshmukh6424
    @siddhideshmukh6424 2 years ago

    Hey Darshil, I rarely comment but just wanted to tell you that you are awesome and your content is just amazing ❤️