Introduction to Datastream for BigQuery

  • Published: Oct 17, 2024

Comments • 35

  • @googlecloudtech
    @googlecloudtech  2 years ago +1

    What do you think about Datastream for your data capture and replication needs? Let us know in the comments below and subscribe for more Google Cloud tips and tricks → goo.gle/GoogleCloudTech

    • @Farshad-Ghorbani
      @Farshad-Ghorbani a year ago

      Is there any solution for partitioning data with Datastream?

  • @yunusarif6963
    @yunusarif6963 2 years ago +10

    I managed to play around with Datastream with BigQuery as the destination. The problem with this approach is that the tables it creates are not partitioned. Those of us who do incremental loads from the BigQuery replica into our reports always have to scan the whole table, which comes at a cost, compared to scanning and querying only the new data in the replica.

    • @terminalrecluse
      @terminalrecluse 2 years ago

      Perhaps CLI or programmatic access (not shown in the UI) will allow specifying a partition key

    • @Farshad-Ghorbani
      @Farshad-Ghorbani a year ago

      @@terminalrecluse I checked the CLI as well, but I couldn't find it there. Is there any solution for partitioning data? @googlecloudtech

    • @etaimargolin8449
      @etaimargolin8449 a year ago +2

      Currently partitions aren't leveraged for optimizing the size of the table scan. We are looking into implementing this as a future improvement.
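As the thread notes, Datastream-created tables are not partitioned and no partition key can be set in the UI or CLI. One possible workaround, sketched here with hypothetical dataset, table, and column names, is to periodically materialize a partitioned copy of the replica for downstream incremental queries:

```shell
# Sketch: build a date-partitioned copy of the Datastream replica so that
# incremental reporting queries can prune partitions instead of scanning
# the whole table. "mydataset.orders" and "updated_at" are placeholders.
bq query --use_legacy_sql=false '
CREATE OR REPLACE TABLE mydataset.orders_partitioned
PARTITION BY DATE(updated_at)
AS
SELECT * FROM mydataset.orders;
'
```

Scheduling this statement (for example with BigQuery scheduled queries) keeps the partitioned copy fresh, at the cost of one full scan of the replica per refresh.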

  • @vitamin2732
    @vitamin2732 a year ago

    If it works as described, it is really COOL, thanks a lot. @Gabe Weiss, some questions: 1. Are there any limitations for Datastream for BigQuery? 2. I am using Cloud SQL, so it would be great to have a tutorial for this combination. 3. It looks like an AlloyDB competitor, isn't it? What are the core differences? (I am considering AlloyDB for the new version of our project to avoid streaming analytics data to BigQuery.)

  • @dexterkarinyo9144
    @dexterkarinyo9144 5 months ago

    Hey! At 2:31, do you have an example of how you created or set up that connection? Thanks.

  • @danielcarter9666
    @danielcarter9666 a year ago

    Can I stream a subset of columns from my source? The CLI help (gcloud datastream streams create --help) suggests yes, but when I specify mysql_columns in the suggested format, gcloud errors out with: ERROR: gcloud crashed (ValidationError): Expected type for field column, found {} (type )

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Yes, a subset of columns is supported in both the UI and the API. There was a bug in gcloud around this capability; it should be fixed now.
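For reference, a hedged sketch of how a column subset can be expressed in the MySQL source config file passed to gcloud. All profile, stream, database, table, and column names below are placeholders, the field names follow the Datastream API's includeObjects structure, and the BigQuery destination config file is assumed to already exist:

```shell
# Sketch: replicate only the "id" and "status" columns of mydb.orders.
# Connection profile and stream names are placeholders.
cat > mysql_source_config.json <<'EOF'
{
  "includeObjects": {
    "mysqlDatabases": [
      {
        "database": "mydb",
        "mysqlTables": [
          {
            "table": "orders",
            "mysqlColumns": [
              {"column": "id"},
              {"column": "status"}
            ]
          }
        ]
      }
    ]
  }
}
EOF

gcloud datastream streams create my-stream \
  --location=us-central1 \
  --display-name=my-stream \
  --source=my-mysql-profile \
  --mysql-source-config=mysql_source_config.json \
  --destination=my-bq-profile \
  --bigquery-destination-config=bq_destination_config.json \
  --backfill-all
```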

  • @darshanparmar7961
    @darshanparmar7961 2 years ago

    What if records are updated or deleted in the source system (MySQL)? Does it also perform the update/delete in BigQuery, or does it work in append-only mode?

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Datastream replicates all UPDATE / INSERT / DELETE operations to the destination. Support for append-only mode is planned for a future release.

  • @FuyangLiu
    @FuyangLiu a year ago

    Is there a way to let a Cloud SQL IAM user (or a service account user) be accepted as a way to connect to the Cloud SQL db?

    • @gabeweiss
      @gabeweiss a year ago

      Not currently sadly, no. BUT, it's something we're thinking about. No promises on timeline, but it's definitely something we're working on adding.

  • @danielcarter9666
    @danielcarter9666 a year ago

    Through the GUI, when selecting the source objects to replicate, I can use wildcards such as "*.mytable". How do I do this with the CLI? When I describe a stream created through the GUI (gcloud datastream streams describe), the database field is simply missing, but when I try to create a new stream using the same format, gcloud bombs out with "ERROR: gcloud crashed (ParseError): Cannot parse YAML: missing key "database"."

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Yes, this is supported - you need to specify an empty database key ( "database": "" )
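A minimal sketch of what that looks like in the MySQL source config file (the table name comes from the question above; the empty database string acts as the wildcard):

```shell
# Sketch: match "mytable" in every database by leaving "database" empty.
cat > mysql_source_config.json <<'EOF'
{
  "includeObjects": {
    "mysqlDatabases": [
      {
        "database": "",
        "mysqlTables": [
          {"table": "mytable"}
        ]
      }
    ]
  }
}
EOF
```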

  • @HoaTran-rp4kf
    @HoaTran-rp4kf a year ago

    What happens if I accidentally delete the destination table in BigQuery? How can I restore the table?

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Datastream will recreate the table automatically, and the data that was deleted can be recovered by triggering a backfill from the source.

    • @HoaTran-rp4kf
      @HoaTran-rp4kf a year ago

      Hi @@etaimargolin8449, I found that some rows were duplicated in the destination table in BigQuery, and I cannot delete any rows from the table. How can I solve this?
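Since DML on the Datastream-managed table can be restricted, one possible workaround, sketched here with hypothetical dataset, table, and key names, is to materialize a de-duplicated copy that keeps the newest version of each row according to the source timestamp Datastream records in its metadata column:

```shell
# Sketch: write a de-duplicated copy of the replica, keeping one row per
# primary key ("id" is a placeholder), ordered by the source timestamp
# in the datastream_metadata column that Datastream adds to the table.
bq query --use_legacy_sql=false '
CREATE OR REPLACE TABLE mydataset.orders_dedup AS
SELECT * EXCEPT(rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY id
           ORDER BY datastream_metadata.source_timestamp DESC
         ) AS rn
  FROM mydataset.orders
)
WHERE rn = 1;
'
```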

  • @ShahnewazKhan
    @ShahnewazKhan 2 years ago +2

    When will postgres cloudsql datastream be available?

    • @david7482
      @david7482 2 years ago +3

      it’s also available now 🎉
      cloud.google.com/datastream/docs/sources-postgresql

    • @felipe.veloso
      @felipe.veloso 2 years ago +2

      Wow, great news!!!

  • @analyticshub499
    @analyticshub499 2 years ago

    Can MariaDB be used instead of MySQL as a source to stream data to BigQuery?

    • @gabeweiss
      @gabeweiss 2 years ago

      Yes it can! See here for the supported versions of MySQL: cloud.google.com/datastream/docs/faq#behavior-and-limitations

  • @felipe.veloso
    @felipe.veloso 2 years ago

    It's already in preview for Postgres??? 😮😮

  • @apvyas80
    @apvyas80 2 years ago

    Does it support customer-managed encryption keys?

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Datastream supports CMEK for data stored at rest. Support for CMEK on data loaded to BigQuery will be added shortly.

  • @abhinavtripathi970
    @abhinavtripathi970 a year ago

    Will it accept schema changes?

    • @etaimargolin8449
      @etaimargolin8449 a year ago

      Yes, many schema changes are automatically detected and supported, but some changes might not be detected automatically and may result in events from that table being dropped. In this case, Datastream will report the reason for the event(s) being dropped, and any missing data can be recovered using a backfill of the table.

  • @Ng6666yt
    @Ng6666yt 2 years ago +4

    Why

  • @rameshyadav1723
    @rameshyadav1723 a year ago

    Can this feature be used to load from BigQuery to Cloud SQL (Postgres) and have real-time streaming for operational purposes?
    @googlecloudtech

    • @vitamin2732
      @vitamin2732 a year ago

      Why do you need it?

    • @rameshyadav1723
      @rameshyadav1723 a year ago

      @@vitamin2732 There is already an established process where outputs are stored in BQ; now we need to send outputs to Cloud SQL for API consumption. We need both outputs: one stored in BQ for analytical reporting, and the other for real-time usage through an API.
      Hope that makes sense.
      So I'm wondering how to get real-time streaming from BQ to Cloud SQL tables, with an automatic CDC feature

    • @vitamin2732
      @vitamin2732 a year ago

      @@rameshyadav1723 That looks like the wrong architecture... normally you need to stream from Cloud SQL to BQ

    • @ariefhalim5425
      @ariefhalim5425 a year ago

      @@rameshyadav1723 I think Datastream's latency is too high to be used for a real-time transaction API