ETL | Incremental Data Load from Amazon S3 Bucket to Amazon Redshift Using AWS Glue | Datawarehouse

  • Published: 20 Dec 2024

Comments • 146

  • @zzzzzzzmr9759
    @zzzzzzzmr9759 10 months ago +4

    Great video! I have two questions: 1. Why is the table in Redshift not in the same order as in the CSV file? 2. Why does the ETL job configuration choose the Data Catalog table as the S3 source type instead of the S3 location? Does that mean we could complete the incremental data load from S3 to Redshift by just choosing the S3 location and not using the crawler? Thanks in advance.

    • @cloudquicklabs
      @cloudquicklabs  10 months ago +1

      Thank you for watching my videos.
      1. It's purely down to how the ETL job extracts the data from the S3 bucket.
      2. You can use the S3 bucket directly as the source when you have a single CSV file, but when the bucket has multiple CSV files it's better to use the Data Catalog so that you can map the schema from source to destination.
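The reply above contrasts the two source options. Below is a minimal PySpark sketch of both reads in a Glue script; the database, table, and bucket names are placeholders, not the ones used in the video.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Option A: read via the Data Catalog table that the crawler created.
dyf_catalog = glue_context.create_dynamic_frame.from_catalog(
    database="etl_demo_db",        # hypothetical catalog database
    table_name="testreport_csv",   # hypothetical table created by the crawler
)

# Option B: read the CSV files straight from S3, skipping the crawler.
dyf_s3 = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-demo-bucket/input/"]},  # hypothetical path
    format="csv",
    format_options={"withHeader": True},
)
```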

    • @thelifehackerpro9943
      @thelifehackerpro9943 3 months ago

      @@cloudquicklabs Regarding point 2: even if I have multiple CSV files with the same schema, I don't think there is any reason to use a crawler, since we have a fixed schema in Redshift.

    • @beruski89
      @beruski89 28 days ago

      Can I use one crawler for one bucket containing data with multiple schemas?

  • @rahulsood81
    @rahulsood81 8 months ago +3

    Can you please explain why I need to run the crawler again if there are no changes to the file location or the fields/structure of the source (S3 file)?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago

      Thank you for watching my videos.
      Indeed, you don't need to re-run it if all files and folders have already been crawled.

  • @jnana1985
    @jnana1985 8 months ago +1

    For the incremental load to work, do we need to enable job bookmarks in Glue, or is that not required?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago +1

      Thank you for watching my videos.
      It's not required; what matters is choosing the right option when declaring the destination in the ETL job. The configuration shown should be fair enough here.
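For context on "the right option at the destination": when the Redshift target is configured to merge on a key, the job loads the batch into a staging table and then applies an upsert. A minimal hand-written sketch of that pattern is below; the connection name, table names, key, and columns are placeholders, and Glue Studio may generate a different but equivalent pattern (for example delete-then-insert instead of MERGE).

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# DynamicFrame read from the catalog source (placeholder names, as before).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="etl_demo_db", table_name="testreport_csv"
)

glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="redshift-connection",   # hypothetical Glue connection name
    connection_options={
        "database": "dev",
        "dbtable": "public.testreport_stage",   # staging table loaded first
        # After the stage load, upsert into the real table and drop the stage.
        "postactions": """
            MERGE INTO public.testreport
            USING public.testreport_stage s
            ON public.testreport.id = s.id
            WHEN MATCHED THEN UPDATE SET name = s.name, age = s.age
            WHEN NOT MATCHED THEN INSERT (id, name, age) VALUES (s.id, s.name, s.age);
            DROP TABLE public.testreport_stage;
        """,
    },
    redshift_tmp_dir="s3://my-demo-bucket/temp/",  # hypothetical temp directory
)
```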

  • @saikrishna8338
    @saikrishna8338 8 months ago +2

    Thanks for the valuable inputs. Small query: what if my incremental file lands in a different folder in the same bucket? How is the crawler going to handle that?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago +1

      Thank you for watching my videos.
      The crawler would still work (it scrapes all data underneath the defined folder).
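As a concrete illustration of "the defined folder", here is a small boto3 sketch that points a crawler at a prefix so that anything landing underneath it, including new sub-folders, gets picked up on the next run. The crawler name, role ARN, database, and path are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Everything under the given S3 prefix is crawled, including later sub-folders.
glue.create_crawler(
    Name="testreport-crawler",                               # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",   # hypothetical role
    DatabaseName="etl_demo_db",
    Targets={"S3Targets": [{"Path": "s3://my-demo-bucket/input/"}]},
)

glue.start_crawler(Name="testreport-crawler")
```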

  • @awaise92
    @awaise92 8 months ago +2

    Great content!
    I have a question on this: can we run the MERGE operation against an external database as well, like Oracle, Snowflake, etc.?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago +1

      Thank you for watching my videos.
      Yes, we can do this.
      I shall create a new video on this topic soon.

    • @jeffrey6124
      @jeffrey6124 1 month ago

      @@cloudquicklabs Hi, were you able to create this video already? Thanks

  • @SaiA-f6d
    @SaiA-f6d 4 days ago +1

    Insightful content. I am trying to load from S3 to PostgreSQL, but in the Visual ETL the PostgreSQL target does not show the table name to write to; the Connections part is connected to RDS and attached to the job. Please let me know what the issue could be. Thanks.

    • @cloudquicklabs
      @cloudquicklabs  3 days ago

      Thank you for watching my videos.
      I am working on this scenario in my lab; a video on this topic will be uploaded soon.

  • @Vijay-d9m6k
    @Vijay-d9m6k 23 days ago +1

    Thank you for the great explanation. I have one question:
    Q. What changes do I have to make in the above solution to maintain the history of records in Redshift? I mean I want to capture every change that happens for an ID, like SCD2. In that case, what should be my approach?

    • @cloudquicklabs
      @cloudquicklabs  22 days ago +1

      Thank you for watching my videos.
      Redshift doesn't natively support full CDC, but you can implement it using DMS, third-party ETL tools, or by building custom solutions with Kinesis, Lambda, or Redshift Spectrum. The choice depends on your specific requirements for latency, complexity, and integration needs.
      I shall explore this and try to make a video on it soon.
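If the goal is just SCD2 history on top of the staged load shown in the video, one hedged approach is to replace the merge with an "expire old row, insert new row" pair of statements. The sketch below runs them through the Redshift Data API; the workgroup, table, and column names (including start_date/end_date/is_current) are hypothetical, not from the video.

```python
import boto3

rsd = boto3.client("redshift-data")

# Close out the current version of any row present in the staging table,
# then insert the new version -- a basic SCD2 pattern.
scd2_sql = [
    """
    UPDATE public.customer_dim
    SET is_current = FALSE, end_date = CURRENT_DATE
    WHERE is_current = TRUE
      AND id IN (SELECT id FROM public.customer_stage);
    """,
    """
    INSERT INTO public.customer_dim (id, name, age, start_date, end_date, is_current)
    SELECT id, name, age, CURRENT_DATE, NULL, TRUE
    FROM public.customer_stage;
    """,
]

rsd.batch_execute_statement(
    WorkgroupName="etl-demo-workgroup",   # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sqls=scd2_sql,
)
```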

  • @nlopedebarrios
    @nlopedebarrios 11 months ago +1

    Why did you run the crawler after the Glue job finished and before running it for the second time? Is that required in order for the MERGE to succeed?

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      I wanted to show that when the source data grows, the ETL pipeline copies only the incremental data, not duplicates.

    • @nlopedebarrios
      @nlopedebarrios 11 months ago +1

      @@cloudquicklabs Yes, I understand that, but before running the job for the second time, to illustrate that the merge works, you ran the crawler. Is that needed/recommended?

    • @basavarajpn4801
      @basavarajpn4801 11 months ago +1

      It's to identify any change in the schema, so we run the crawler every time before running the job to pick up the latest changes from the source.

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Indeed, it's needed to load the data from the file. Note it's the same file in the same path.

    • @gauravrai4398
      @gauravrai4398 10 months ago

      But it is a data change, not a schema change... We can verify by running an Athena query on the catalog table. Anyway, nice use-case explanation.

  • @snakhil90
    @snakhil90 6 months ago +1

    In case of SCD, how can we define the SCD logic for the merge and load? Which option provides this?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago

      Thank you for watching my videos.
      In terms of SCD, I believe the process would remain the same as long as the schema of the table stays the same.

  • @thihakyaw6189
    @thihakyaw6189 11 months ago +1

    I want to know why row data shows up when you choose the Data Catalog table as the S3 source in your Glue job, because as far as I know the crawler only copies metadata from the S3 CSV to the Data Catalog, not the data itself, right?
    When I try the same in my case, it says there is no data to display because only schema information was updated in the Data Catalog table. Please let me know.

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      It's a default preview from Glue which displays some sample data when you are mapping tables.
      In your case you must be missing a step.
      Please watch the video again and follow the steps as shown.

  • @SriHarikakaruturi-d4k
    @SriHarikakaruturi-d4k 11 months ago +1

    This video helped a lot. Is there a way you can add trust relationships during IAM role creation in the repo? Thank you

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      Did you see the dedicated video here: ruclips.net/video/sw-a8nexTY8/видео.html

  • @RajYadav-eb6pp
    @RajYadav-eb6pp 7 months ago +1

    I have two questions:
    1. Why have you run the crawler twice?
    2. If files keep arriving continuously (with different names), then how can we use the Glue job for incremental load?

    • @cloudquicklabs
      @cloudquicklabs  7 months ago

      Thank you for watching my videos.
      I ran it twice to demo the incremental load on the second run. When the file names keep changing, you rely on the folder path of the files; also note that at least the columns should stay the same, and in the job you need to map them to the destination target tables.
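The column mapping the reply refers to is the ApplyMapping step of the Glue job. A minimal sketch is below, assuming the dyf_s3 DynamicFrame from the earlier sketch; the column names and types are placeholders.

```python
from awsglue.transforms import ApplyMapping

# Map the source columns (which stay the same across the differently named files)
# onto the Redshift target columns: (source_name, source_type, target_name, target_type).
mapped = ApplyMapping.apply(
    frame=dyf_s3,
    mappings=[
        ("id", "long", "id", "int"),
        ("name", "string", "name", "string"),
        ("age", "long", "age", "int"),
    ],
)
```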

  • @KwakuBoateng-l1f
    @KwakuBoateng-l1f 10 months ago +1

    Hello, please: do you hold classes anywhere, and do you provide project support?

    • @cloudquicklabs
      @cloudquicklabs  10 months ago +1

      Thank you for watching my videos.
      I don't provide any classes.
      But I do provide project support and project work.

    • @KwakuBoateng-l1f
      @KwakuBoateng-l1f 10 months ago

      @@cloudquicklabs How can I go about it - getting project help?

  • @jaideep1222
    @jaideep1222 6 months ago +1

    Do we need to run the crawler every time new data comes into S3?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago

      Thank you for watching my videos.
      Do you mean adding a Lambda trigger when objects are created in the S3 bucket, or just scheduling the Lambda?

    • @jaideep1222
      @jaideep1222 6 months ago +1

      @@cloudquicklabs In this video, at the 33-minute mark, the crawler was run again to fetch the incremental data. Do we really need to run the crawler if there is no schema change but only new data?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago

      Indeed, I ran it a second time to fetch the data from the source, but only the incremental data gets updated on the destination side.

  • @archanvyas4891
    @archanvyas4891 7 months ago +1

    Nice video. I have a question: I am sending my data from a Raspberry Pi to S3, and it updates whenever I run it on the Raspberry Pi. Now, after my job succeeds and the S3 data is updated, running the same job again gives me an error. What is the process?

    • @cloudquicklabs
      @cloudquicklabs  7 months ago

      Thank you for watching my videos.
      You should not get any error when you run it twice. Did you check the logs to see what the error message is?

    • @archanvyas4891
      @archanvyas4891 6 months ago

      @@cloudquicklabs How do I check its logs? In the CloudWatch log groups, when I watch the live tail, it says it is 100% displayed. How do I resolve it?

  • @lipi1004
    @lipi1004 1 year ago +1

    Is incremental loading available with SQL Server and Postgres as the target?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago +1

      Thank you for watching my videos.
      I shall create a new video on incremental loading from RDS (SQL Server/Postgres) to Amazon Redshift.
      Until then, we still have a workaround here: first load to the S3 bucket and then load to Amazon Redshift.

    • @rahulpanda9256
      @rahulpanda9256 10 months ago

      @@cloudquicklabs She is referring to RDS as the target, loading from S3.

  • @kakaji_cloud8106
    @kakaji_cloud8106 8 months ago +1

    What if the primary key is not in increasing or decreasing order in the data? Will the incremental data load work?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago

      Thank you for watching my videos.
      Indeed, it should load, as the primary key is unique in the tables and records are identified by it.

  • @sharpus20
    @sharpus20 20 days ago +1

    Can you please tell me how to add a filter like age > 25 in the Glue step before loading into Redshift?

    • @cloudquicklabs
      @cloudquicklabs  20 days ago

      Thank you for watching my videos.
      I did not get this question.
      Could you please provide more information here?

    • @sharpus20
      @sharpus20 19 days ago +1

      @cloudquicklabs Hi, I have a task to transfer data from S3 > AWS Glue > Redshift. In this I have to add a filter on the data from the S3 table; the filter is on the age column (age > 25), so only the rows that pass the filter should be transferred to Redshift.
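In Glue Studio this corresponds to adding a Filter transform node between the source and the Redshift target. A minimal script-level sketch is below, assuming the dyf_s3 DynamicFrame from the earlier sketch and a numeric age column; adjust the predicate to whichever side of 25 you actually want to keep.

```python
from awsglue.transforms import Filter

# Keep only the rows where age > 25 before the write to Redshift.
filtered = Filter.apply(
    frame=dyf_s3,
    f=lambda row: row["age"] is not None and row["age"] > 25,
)
```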

    • @sharpus20
      @sharpus20 19 days ago +1

      Also, can you please share a video link for the below type of task:
      Machine Learning: SageMaker -> Grab data from S3 -> Build the model -> Push
      the model data to S3

    • @cloudquicklabs
      @cloudquicklabs  19 days ago

      Thank you for watching my videos.
      Did you check my video ruclips.net/video/O0GZVsGfHdo/видео.html
      There I have explained how we can do advanced data transformation with an ETL job.

    • @cloudquicklabs
      @cloudquicklabs  19 days ago

      Thank you for providing this input. I shall create a video on this topic soon.

  • @KapilKumar-hk9xk
    @KapilKumar-hk9xk 1 month ago +1

    A few doubts. 1) In a real scenario TestReport.csv won't be updated in place, right? Suppose we receive TestReport_1.csv and TestReport_2.csv at particular intervals, and in these files we have a few old but updated records and a few new records. How do we handle such situations?
    Please redirect me if you have already explained such situations in any of your videos!!

    • @cloudquicklabs
      @cloudquicklabs  1 month ago +1

      Thank you for watching my videos.
      I haven't covered these scenarios in this video.
      I shall create a new version of this video soon where I demo these scenarios.

  • @Digvijay-10
    @Digvijay-10 2 days ago +1

    Appreciate your efforts 🙏

    • @cloudquicklabs
      @cloudquicklabs  1 day ago

      Thank you for watching my videos.
      Glad that it helped you.

  • @sapramprasannakumar8616
    @sapramprasannakumar8616 5 months ago +1

    Hello, the data loaded from S3 to Redshift is in a zig-zag order: the source data is in 1, 2, 3, 4 order, but the target comes out as 1, 4, 11 and so on. How do I get the data into Redshift in the original order?

    • @cloudquicklabs
      @cloudquicklabs  5 months ago

      Thank you for watching my videos.
      That should not be the case here; I believe only the columns may be reordered, while the row data should stay intact.
      Please watch this video again.

  • @kovirisiva3567
    @kovirisiva3567 2 months ago +1

    While testing the connection in Glue I encountered a few errors, like "StatusLogger unrecognized format specifier". Can you help me with how to tackle those problems?

    • @cloudquicklabs
      @cloudquicklabs  2 months ago

      Thank you for watching my videos.
      Have you followed the video's directions as shown?

  • @prabhajayashetty2297
    @prabhajayashetty2297 1 month ago +1

    Thank you for this video, I was able to load the incremental data.

    • @cloudquicklabs
      @cloudquicklabs  1 month ago

      Thank you for watching my videos.
      Glad that it helped you.

  • @SravyaPavithran-ze3ge
    @SravyaPavithran-ze3ge 17 days ago +2

    Hi. While creating the connection I am getting an error. In this video there is no option shown to select the IAM role, but when I tried it now we need to choose the IAM role while creating the connection. Can you help with which policies are required?

    • @mycwid
      @mycwid 12 days ago +1

      I am having the same question and issue when creating the connection: "Create connection failed during validating credentials. Please validate connection inputs and VPC connectivity to Security Token Service, Secrets Manager and REDSHIFT."

    • @cloudquicklabs
      @cloudquicklabs  12 days ago

      Thank you for watching my videos.
      I have an IAM role with admin access, trusted by the AWS Glue and Redshift services, but you could try a least-privileged setup like the one below.
      IAM role with the following policies:
      1. S3 bucket full access.
      2. AWS Glue full access.
      3. Amazon Redshift full access.
      4. Amazon CloudWatch full access.
      Trust policy:
      1. AWS Glue service.
      2. AWS Redshift service.
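For readers who prefer scripting it, here is a hedged boto3 sketch of a role matching the description above (broad managed policies, suitable for a lab rather than production). The role name is a placeholder, and the exact set of policies you need may differ.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy letting both Glue and Redshift assume the role, as described above.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": ["glue.amazonaws.com", "redshift.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="GlueRedshiftEtlRole",   # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the managed policies listed in the reply (full access, lab use only).
for arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess",
    "arn:aws:iam::aws:policy/AmazonRedshiftFullAccess",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
]:
    iam.attach_role_policy(RoleName="GlueRedshiftEtlRole", PolicyArn=arn)
```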

    • @mycwid
      @mycwid 12 days ago +1

      @@cloudquicklabs Thanks for the reply, but it does not seem to work. I have opened a support ticket after spending hours on this.

    • @cloudquicklabs
      @cloudquicklabs  12 days ago

      If it is not an IAM role issue, it must be a VPC or security group issue. Please check the VPC endpoints and the security group inbound rules, and check that one inbound rule has the security group itself as its source as well.

  • @satheeshambilaeti1894
    @satheeshambilaeti1894 2 months ago +1

    I'm not getting the endpoint, JDBC URL, or ODBC URL in the workgroup. How can we get them?

    • @cloudquicklabs
      @cloudquicklabs  2 months ago

      Thank you for watching my videos.
      Please check the workgroup details; it should be there.

  • @alphamaryfrancis862
    @alphamaryfrancis862 10 months ago +1

    Amazing content!!!
    When I'm trying to read data from S3 in AWS Glue it is giving me an error.
    Can you please guide me?

    • @cloudquicklabs
      @cloudquicklabs  10 months ago

      Thank you for watching my videos.
      Could you please tell me what error you are getting?

  • @AheleswaranE
    @AheleswaranE 21 days ago +1

    Can you please upload videos on the T15 Wings-1 hands-on for TCS?

    • @cloudquicklabs
      @cloudquicklabs  21 days ago

      Thank you for watching my videos.
      This channel covers only cloud hands-on content, especially Azure and AWS in general.
      With this knowledge you can crack any company's Cloud/DevOps roles.

  • @hoangng16
    @hoangng16 11 months ago +1

    I have multiple large tables in CSV format in AWS S3. Should I:
    1. load them from S3 to RDS (MySQL) => do my queries using MySQL Workbench => export the expected data to S3 => sync the expected data to SageMaker for visualization and other analysis using a Notebook instance?
    2. load them from S3 to Redshift => do the queries using Redshift; actually I'm not quite sure what to do next in this direction. The goal is to have some filtered data for visualization and analysis.
    Thank you

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      Is your requirement here just to visualize the data, or do you want to run ML on it (since you said you want to run SageMaker, which is meant for ML and is a costly tool)?
      You could check out Amazon QuickSight if you are looking for visualization.

    • @hoangng16
      @hoangng16 11 months ago

      Thank you, @@cloudquicklabs. I actually want to do some ML, but that part can be done on a local machine. The primary goal now is to load the data from S3 into something I can query to analyze the data better.

    • @SidharthanPV
      @SidharthanPV 3 months ago

      @@hoangng16 Once you catalog it you can access it from Athena as well.

  • @rahulpanda9256
    @rahulpanda9256 10 months ago +1

    How do we ensure the primary key sequence stays intact with the source?

    • @cloudquicklabs
      @cloudquicklabs  10 months ago

      Thank you for watching my videos.
      I believe you need to use Data Quality rules here to check whether your primary keys are in sequence. Maybe you should watch my latest video here: ruclips.net/video/DMQRFwbeYMc/видео.htmlsi=mu-t_cNUIvXzHIXv which might give you ideas.

  • @RupeshKumar-kw7zw
    @RupeshKumar-kw7zw 1 year ago +1

    Hi, I'm getting an invalid input exception error. Can you please help resolve it?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      Could you please watch the video again?

  • @tejpatta3641
    @tejpatta3641 7 months ago +1

    Thank you... very useful... great video 👍 A small query: why is the table in Redshift not in the same order as in the CSV file?

    • @cloudquicklabs
      @cloudquicklabs  7 months ago

      Thank you for watching my videos.
      Glad that it helped you.
      If you mean the column order of the table, then I would say it does not matter as long as the proper records are getting loaded into their respective columns.

  • @yugalbawankar9039
    @yugalbawankar9039 1 year ago +1

    Which IAM role is given to the Redshift workgroup?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      I am using admin permissions as this is a demo.
      Did you check the reference documents shared at: github.com/RekhuGopal/PythonHacks/tree/main/AWS_ETL_Increamental_Load_S3_to_RedShift

    • @yugalbawankar9039
      @yugalbawankar9039 1 year ago

      @@cloudquicklabs
      Which IAM role is given to the Redshift workgroup?
      Please create and upload a basic video on it.
      I want to build this project, but I don't understand which IAM role to give to the Redshift workgroup.

    • @kalyanishende618
      @kalyanishende618 11 months ago +1

      How much cost is expected if I try this with my personal AWS account?

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      All the resources involved in this solution are costly. Please use the AWS pricing calculator to estimate your cost here. As this was a lab session for me, I set everything up and cleaned it up once the lab was completed, so it cost me very little.

  • @saravananba748
    @saravananba748 25 days ago

    I am importing the S3 data into Redshift Serverless; while the COPY query is running it just keeps on loading/running. I am loading data that has only 100 rows. What is the problem?

    • @cloudquicklabs
      @cloudquicklabs  25 days ago

      Thank you for watching my videos.
      Did you check whether the connection to Amazon Redshift is working as expected?
      Also check whether the source path is correct in the ETL job.

  • @nitikjain993
    @nitikjain993 1 year ago +2

    Could you please make a video on how to build fact and dimension tables in Redshift? It would be great if you made that video too, S3 to Redshift using Glue.

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      I shall work on this concept and create a video.

  • @beruski89
    @beruski89 28 days ago +1

    Very good, but you could trim it to 15 minutes.

    • @cloudquicklabs
      @cloudquicklabs  28 days ago

      Thank you for watching my videos.
      Glad that it helped you.
      I shall take this input in my next videos.

  • @nitikjain993
    @nitikjain993 1 year ago +1

    Much-awaited video, thank you so much. Could you attach the code here that was generated after the Glue job configuration?

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      Glad that the video is useful to you.
      I have not copied out the Glue job code, but I have attached the relevant files in the description. Please use those and follow the steps; it should work for you.

  • @GaneshBabu-vr2lg
    @GaneshBabu-vr2lg 6 months ago +1

    So my question is: in this video, why did you not schedule it? It is incremental data and the data can land in S3 at any time, so the incremental load should be scheduled so that when data is uploaded the job runs and executes automatically. Why do you trigger the job run manually, and how many times do you have to run the crawler and the job?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago

      Thank you for watching my videos.
      Indeed, it should be scheduled; I have mentioned that in the video.
      As this is a demo, I have shown it via a manual trigger.
      I am working on v3 of this video, where I will cover the missing points.

    • @GaneshBabu-vr2lg
      @GaneshBabu-vr2lg 6 months ago +1

      @@cloudquicklabs OK, has this been uploaded?

    • @cloudquicklabs
      @cloudquicklabs  6 months ago +1

      Not yet, I am still working on it.

  • @akshaymuktiramrodge3233
    @akshaymuktiramrodge3233 1 year ago +1

    This is what I wanted... Thank you so much 🙏

    • @cloudquicklabs
      @cloudquicklabs  1 year ago

      Thank you for watching my videos.
      Glad that it helped you.

  • @mahir14_
    @mahir14_ 7 months ago +1

    Can you make an Oracle DB (source with CDC) --> S3 --> Glue (some transformation) --> Redshift warehouse end-to-end use case, please?

    • @cloudquicklabs
      @cloudquicklabs  7 months ago

      Thank you for watching my videos.
      Indeed, I shall add this to my to-do list and make a video on it.

    • @ranidalvi1064
      @ranidalvi1064 7 months ago +1

      Yes, I am also waiting for this type of project with e-commerce data.

  • @swapnil_jadhav1
    @swapnil_jadhav1 8 months ago +1

    "Target node is not supported"
    What should I do?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago

      Did you follow the video here?

    • @swapnil_jadhav1
      @swapnil_jadhav1 8 months ago

      @@cloudquicklabs yes

    • @swapnil_jadhav1
      @swapnil_jadhav1 8 months ago

      @@cloudquicklabs I am using DMS to transfer data from RDS MySQL to S3.
      Then, using Glue, I am transferring the data from S3 to Redshift, and in Glue I am getting the error.

  • @supriyakulkarni8378
    @supriyakulkarni8378 9 months ago +1

    Tried the Redshift test connection, but it failed with this error:
    ERROR StatusLogger Unrecognized format specifier [d]

    • @cloudquicklabs
      @cloudquicklabs  9 months ago

      Thank you for watching my videos.
      Are you getting the error while creating the connection or while executing the ETL pipeline?

    • @supriyakulkarni8378
      @supriyakulkarni8378 9 months ago

      While testing the connection to Redshift.

  • @liubrian6843
    @liubrian6843 2 months ago +1

    Nice video! Can I just use the crawler to crawl new folders only, and then in the Glue job just use a bookmark and append the data instead of merging? If the table is large, merging will be a very expensive operation, no?

    • @cloudquicklabs
      @cloudquicklabs  2 months ago

      Thank you for watching my videos.
      I have not created a video on dynamic folder structures in AWS Glue; I shall create one soon. There should not be any extra cost here.

  • @kalyanishende618
    @kalyanishende618 11 months ago +1

    Thank you so much, this was a good revision and gave me good ideas.

    • @cloudquicklabs
      @cloudquicklabs  11 months ago

      Thank you for watching my videos.
      Glad that it helped you.

  • @prabhathkota107
    @prabhathkota107 10 months ago +1

    Very useful, thanks... subscribed now for more interesting content.

    • @cloudquicklabs
      @cloudquicklabs  10 months ago

      Thank you for watching my videos.
      Happy learning.

    • @prabhathkota107
      @prabhathkota107 7 months ago

      @@cloudquicklabs Some issue with glueContext.write_dynamic_frame.from_catalog, whereas glueContext.write_dynamic_frame.from_jdbc_conf is working perfectly fine.
      Getting the below error while writing to the Redshift catalog table:
      Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o130.pyWriteDynamicFrame. Exception thrown in awaitResult:
      SQLException thrown while running COPY query; will attempt to retrieve more information by querying the STL_LOAD_ERRORS table
      Could you please guide?
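When the COPY behind the write fails like this, the error message itself points at Redshift's stl_load_errors system table. A hedged sketch of pulling the most recent rejects via the Redshift Data API is below; the workgroup and database names are placeholders, and on a provisioned cluster you would pass ClusterIdentifier instead of WorkgroupName.

```python
import time
import boto3

rsd = boto3.client("redshift-data")

# Ask Redshift for the most recent COPY failures (file, line, column, reason).
resp = rsd.execute_statement(
    WorkgroupName="etl-demo-workgroup",   # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql="""
        SELECT starttime, filename, line_number, colname, err_reason
        FROM stl_load_errors
        ORDER BY starttime DESC
        LIMIT 20;
    """,
)

# The Data API is asynchronous, so wait for the statement to finish.
status = rsd.describe_statement(Id=resp["Id"])["Status"]
while status not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)
    status = rsd.describe_statement(Id=resp["Id"])["Status"]

if status == "FINISHED":
    for record in rsd.get_statement_result(Id=resp["Id"])["Records"]:
        print(record)
```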

  • @AliMirfaisal
    @AliMirfaisal 8 months ago +1

    How do I load incremental data using a Python script?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago

      Thank you for watching my videos.
      I shall do one video on an #ETL pipeline using Python scripts.

  • @jeffrey6124
    @jeffrey6124 1 month ago +1

    Great video! 🤓

    • @cloudquicklabs
      @cloudquicklabs  1 month ago +1

      Thank you for watching my videos.
      Glad that it helped you.

  • @thelifehackerpro9943
    @thelifehackerpro9943 4 months ago +1

    It should automatically trigger based on an S3 event.

    • @cloudquicklabs
      @cloudquicklabs  4 months ago

      Thank you for watching my videos.
      I shall create a new version of this video where this will be considered.

  • @cromagaming4694
    @cromagaming4694 1 month ago +1

    Jai Sevalal Bhai

    • @cloudquicklabs
      @cloudquicklabs  1 month ago +1

      Thank you for watching my videos.
      Glad that it helped you.
      Jai Sevalal.!

  • @shakthimaan007
    @shakthimaan007 4 months ago +1

    Easy and smooth.

    • @cloudquicklabs
      @cloudquicklabs  4 months ago

      Thank you for watching my videos.
      Glad that it helped you.

  • @InnoCoreAnalyticsInc
    @InnoCoreAnalyticsInc 3 months ago +1

    It is extremely useless!

    • @cloudquicklabs
      @cloudquicklabs  3 months ago

      Thank you for watching my videos.
      Glad that it helped you.

    • @botjabber9187
      @botjabber9187 1 month ago

      What 😮 do you mean, extremely useful?

  • @prabhajayashetty2297
    @prabhajayashetty2297 2 months ago +1

    This is a great video!! Thank you for this :)
    My job failed with the error: Error Category: UNCLASSIFIED_ERROR; Failed Line Number: 20; An error occurred while calling o113.pyWriteDynamicFrame. Exception thrown in awaitResult::

    • @cloudquicklabs
      @cloudquicklabs  2 months ago

      Thank you for watching my videos.
      The error could be due to many reasons. 1. Check whether the data format is as expected on the source side; this kind of error can happen due to a syntax issue in the data.

  • @AliMirfaisal
    @AliMirfaisal 7 months ago +1

    How do I contact you?

    • @cloudquicklabs
      @cloudquicklabs  7 months ago

      You can reach me over email: vrchinnarathod@gmail.com

  • @saikrishna8338
    @saikrishna8338 8 months ago +1

    Thanks for the valuable inputs. Small query: what if my incremental file lands in a different folder in the same bucket? How is the crawler going to handle that?

    • @cloudquicklabs
      @cloudquicklabs  8 months ago

      Thank you for watching my videos.
      While defining the crawler you give it a path; choose the right path accordingly, and the crawler will scrape all data from all files and folders present underneath the defined folder.

    • @saikrishna8338
      @saikrishna8338 8 months ago

      @@cloudquicklabs Thanks for the reply. What if my folder structure is like the below?
      input_bucket/year={current_year}/month={current_month}/day={current_date}/file.txt
      How can I define my crawler to pick up the file based on the date and load the data incrementally rather than as a full refresh... any idea?
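One hedged way to handle a layout like that: point the crawler at the bucket root so the Hive-style year=/month=/day= folders are catalogued as partition keys, and then have the Glue job read only the current day's partition with a push-down predicate, as in the sketch below. The database and table names are placeholders, and the date string formatting must match how the folders are actually named (for example zero-padded or not).

```python
from datetime import date

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

today = date.today()

# Read only today's partition instead of re-reading the whole table.
dyf_today = glue_context.create_dynamic_frame.from_catalog(
    database="etl_demo_db",      # hypothetical catalog database
    table_name="input_bucket",   # hypothetical table crawled from the bucket root
    push_down_predicate=(
        f"year == '{today.year}' and "
        f"month == '{today.month:02d}' and "
        f"day == '{today.day:02d}'"
    ),
)
```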