AWS Tutorials - Streaming Data Ingestion in Amazon Redshift

  • Published: Jul 31, 2024
  • Code Snippet - github.com/aws-dojo/analytics...
    Video - Handling JSON Data in Redshift - • AWS Tutorials - Handli...
    Kinesis Data Streams is used to ingest real-time streaming data. That data can now be ingested into Amazon Redshift for real-time analytics. Learn how to integrate a Kinesis data stream with Amazon Redshift.
  • Science
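
For readers following along, the integration described above usually comes down to three SQL statements run in Redshift: an external schema that points at Kinesis, a materialized view over the stream, and a refresh. The sketch below only builds those statements as strings. The schema, view, stream, and role names are hypothetical, and the column names (`approximate_arrival_timestamp`, `kinesis_data`, and so on) are assumed from the generally available Redshift release, so the preview shown in the video may differ.

```python
def streaming_ingestion_sql(schema, view, stream, iam_role_arn):
    """Return, in order, the SQL statements for a Kinesis-backed materialized view.

    All identifiers passed in are illustrative; nothing here talks to AWS.
    """
    # 1. External schema that maps Redshift to Kinesis Data Streams.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema} FROM KINESIS "
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # 2. Materialized view that parses each record's JSON into one SUPER column
    #    (assumed column names; check them against your Redshift version).
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AS "
        "SELECT approximate_arrival_timestamp, partition_key, shard_id, "
        "sequence_number, JSON_PARSE(kinesis_data) AS payload "
        f'FROM {schema}."{stream}";'
    )
    # 3. Refresh pulls the records that arrived since the last refresh.
    refresh = f"REFRESH MATERIALIZED VIEW {view};"
    return [create_schema, create_view, refresh]


if __name__ == "__main__":
    for statement in streaming_ingestion_sql(
        "kds",
        "sensor_view",
        "sensor-stream",
        "arn:aws:iam::123456789012:role/example-redshift-role",
    ):
        print(statement)
```

The statements can be pasted into the Redshift query editor; the IAM role must allow Redshift to read from the stream.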

Comments • 31

  • @PrakashReddyK
    @PrakashReddyK 2 years ago +1

    Awesome, this came out last week. Thank you very much for your content.

  • @parthasarathibarman9862
    @parthasarathibarman9862 2 years ago +4

    Trust me you're doing an amazing job touching so many lives! Thank you for all your contributions

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago +1

      Thank you so much!

    • @varra19
      @varra19 1 year ago

      @AWSTutorialsOnline Awesome videos, thanks a lot.
      If possible, please make a video on creating an AWS MSK cluster and ingesting from it into Redshift.

  • @kashishjaiswal7907
    @kashishjaiswal7907 1 year ago

    Great work!

  • @varra19
    @varra19 1 year ago

    @AWS Tutorials Awesome videos, thanks a lot. If possible, please make a video on creating an AWS MSK cluster and ingesting from it into Redshift.

  • @user-hk8pf7et5w
    @user-hk8pf7et5w 2 years ago

    Thanks for the great video! Can you also show a demo for S3 --> AWS Kinesis Data Streams --> Redshift? Thanks in advance.

  • @subhamaybhattacharyya
    @subhamaybhattacharyya 2 years ago +1

    Thanks for the great content. However, I am unable to create the materialized view and get the following error:
    ERROR: Operations on local objects in external schema are not enabled. [ErrorId: 1-62f52b1e-5ea3ac1d18bf9a1234e86e4f]
    Am I missing a step? I followed all the steps in the tutorial.

  • @anderson.rsantana
    @anderson.rsantana 2 years ago +1

    Great, something that you barely see in the documentation. Thanks.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      Well, to be honest, it is documented, but it was published only a week ago, which is why you didn't come across it.

  • @Mustafa-yk8lk
    @Mustafa-yk8lk 2 years ago +1

    We now have Redshift as a consumer of the stream to optimize the pipeline, but I think it is still in preview!

  • @nehabhopale9694
    @nehabhopale9694 1 year ago

    Hi, I don't want the data in a single payload column. I want it pushed into individual columns (e.g. vibrations as one column). How can we achieve this?

  • @YEO19901
    @YEO19901 2 years ago +1

    Is it required to insert the data into another staging layer the moment we first query it from the data stream, or will the data persist in the data stream until we manually purge it?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      In Kinesis, the data is immutable. It remains there for the configured retention period. When you refresh the materialized view, it fetches the latest data from the Kinesis data stream.

  • @sr852008
    @sr852008 2 years ago +1

    So, sir, once data is read into the materialized view, does it vanish from the Kinesis data stream? Or can we read the same data with other AWS components/tools as well?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago +1

      Data in Kinesis is purged only after the configured retention period. Until then, any consumer can consume the data.
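
To see how long that window is for a given stream, the retention period can be read from the stream summary. This is a minimal sketch, assuming a boto3 Kinesis client is passed in; the stream name and the `still_retained` helper are hypothetical and only illustrate the rule stated above.

```python
def retention_hours(kinesis_client, stream_name):
    """Return the configured retention period of a stream, in hours.

    `kinesis_client` is expected to behave like boto3.client("kinesis").
    """
    response = kinesis_client.describe_stream_summary(StreamName=stream_name)
    return response["StreamDescriptionSummary"]["RetentionPeriodHours"]


def still_retained(record_age_hours, retention_period_hours):
    """True while a record of the given age can still be read by consumers."""
    return record_age_hours < retention_period_hours


# Example (requires AWS credentials and an existing stream):
#   import boto3
#   hours = retention_hours(boto3.client("kinesis"), "sensor-stream")
#   print(still_retained(record_age_hours=3, retention_period_hours=hours))
```

Reading a record does not remove it, so Redshift and any other consumer can each read the same data until the retention period expires.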

  • @dude0001tube
    @dude0001tube 2 years ago

    Is there a way to deserialize events in AWS MSK (Kafka) that are Avro serialized? Specifically when the schema is stored in AWS Glue Schema Registry?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      I need to check. Glue does support streaming data with Kafka, but I have not had a chance to work with it.

    • @dude0001tube
      @dude0001tube 2 years ago +1

      @AWSTutorialsOnline I got a response from AWS that MSK and Avro deserialization are on the roadmap for Redshift Streaming Ingestion. We already use Spark Streaming to write Avro-serialized MSK messages into an S3 data lake, so I can confirm that does indeed work. I was hoping to build a PoC that goes directly to Redshift for a materialized view, but I will have to wait until later this year. Thank you for the video and the reply!

  • @tracyding4906
    @tracyding4906 2 years ago +1

    I didn't see the ingestion part. Do you run the Python code on EC2?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      I run the ingestion code from a SageMaker notebook. Sample code is available via the link in the description. And yes, you can run the Python code from EC2. You need to set up authorization using an EC2 instance profile, an access/secret key profile, or Cognito.
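
The linked sample is the reference for what the video actually runs. As a rough illustration only, a producer that pushes one JSON reading into a stream might look like the sketch below. The stream name and the `device_id` and `vibrations` fields are hypothetical, and the client is passed in so the same function works from a notebook or from EC2 with whichever credentials are configured.

```python
import json


def send_reading(kinesis_client, stream_name, reading):
    """Serialize one reading as JSON and put it on the stream.

    `kinesis_client` is expected to behave like boto3.client("kinesis").
    The partition key is taken from the (hypothetical) device_id field.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(reading).encode("utf-8"),
        PartitionKey=str(reading["device_id"]),
    )


# Example (requires AWS credentials, e.g. from an EC2 instance profile):
#   import boto3
#   client = boto3.client("kinesis")
#   send_reading(client, "sensor-stream", {"device_id": "d1", "vibrations": 0.42})
```

With an instance profile attached, `boto3.client("kinesis")` picks up credentials automatically, so no keys need to appear in the code.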

  • @hsz7338
    @hsz7338 2 years ago +1

    Thank you for the video, it is great. I think this is a nice feature. My question is similar to @Gunjan Jain's, but it is not about the data purge. I am wondering whether Redshift persists the data automatically during the process, so that we don't have to rely on the Kinesis data stream for data retention.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago +1

      A materialized view does not store data but provides a way to query data from Kinesis in real time. So you need to process the data through the materialized view within the retention period of the Kinesis data stream; otherwise the data is no longer available for processing.

    • @YEO19901
      @YEO19901 2 years ago

      It seems ApproximateArrivalTimestamp is what we will use to move the data from the materialized view to another staging layer.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      @YEO19901 Only if you want to persist it. If all you want is to process the data within the retention period, you don't need to persist it.
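
If persisting is the goal, one possible shape for the timestamp-based move discussed in this thread is an incremental `INSERT ... SELECT` keyed on the arrival timestamp. The sketch below only builds that statement. The view and table names are hypothetical, the staging table is assumed to have the same columns as the view, and the `approximate_arrival_timestamp` column name is an assumption about the view definition.

```python
def incremental_copy_sql(view, staging_table):
    """Build an INSERT that copies only rows newer than what staging holds.

    Rows that share the newest timestamp already in staging are skipped,
    so this is a simplification rather than an exactly-once guarantee.
    """
    return (
        f"INSERT INTO {staging_table} "
        f"SELECT * FROM {view} "
        "WHERE approximate_arrival_timestamp > ("
        "SELECT COALESCE(MAX(approximate_arrival_timestamp), "
        "'1970-01-01'::timestamp) "
        f"FROM {staging_table});"
    )


if __name__ == "__main__":
    print(incremental_copy_sql("sensor_view", "sensor_staging"))
```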

    • @tracyding4906
      @tracyding4906 2 years ago +1

      @AWSTutorialsOnline A materialized view in SQL Server has the data stored.

    • @tracyding4906
      @tracyding4906 2 years ago

      @AWSTutorialsOnline Materialized views are especially useful for queries that are predictable and repeated over and over. Instead of performing resource-intensive queries on large tables, applications can query the pre-computed data stored in the materialized view. Data is stored in the materialized view.