- Videos: 13
- Views: 100,236
Cloud 4 Data Science
Added: 16 May 2022
Introduction to Dataform in Google Cloud Platform
This tutorial gives an overview of the Dataform service in Google Cloud Platform.
Link to the GitHub repo with workflow: github.com/rafaello9472/dataform-demo/branches/stale
Dataform documentation: cloud.google.com/dataform/docs/overview
00:00 Introduction
00:42 What is Dataform?
01:10 Key Features and Benefits
02:30 Why Dataform?
02:43 Key Concepts
04:19 From SQL to SQLX
04:57 Version Control and Collaboration
06:03 Dependency Management
07:30 JavaScript in Dataform
09:50 Workflow Execution Scheduling Options
10:35 Creating Dataform Repository
11:41 Creating Necessary GitHub Assets
13:41 Adding Access Token to Secret Manager
14:18 Adding IAM Roles to Service Account
15:46 Creating Development Works...
Views: 31,886
Videos
Automate Python script execution on GCP
28K views · 1 year ago
This tutorial shows how to automate Python script execution on GCP with Cloud Functions, Pub/Sub and Cloud Scheduler. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Automate Python script execution on GCP 00:00 Introduction 00:21 Architecture overview 01:09 GUI - Pub/Sub 01:28 GUI - Cloud Functions 02:39 Python code walkthrough 05:18 GUI - Cloud Sch...
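For orientation, here is a minimal sketch of what such a Pub/Sub-triggered Cloud Function looks like in the 1st-gen Python runtime; the function name and message handling are illustrative, not necessarily identical to the repo's code:

```python
import base64

def hello_pubsub(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    event:   dict carrying the Pub/Sub payload; the message body is
             base64-encoded under the "data" key.
    context: metadata about the triggering event.
    """
    message = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Received message: {message}")
    # The actual work of the automated script goes here.
```

Cloud Scheduler then publishes to the topic on a cron schedule, which fires this function.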
Create Text Dataset in Vertex AI
3.7K views · 1 year ago
This tutorial shows how to create a Text Dataset in Vertex AI for single-label classification and sentiment analysis tasks. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Create text dataset in Vertex AI Kaggle Ecommerce Text Classification Dataset: www.kaggle.com/datasets/saurabhshahane/ecommerce-text-classification Kaggle Twitter and Reddit Sentim...
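As a rough sketch of the SDK call involved (using the Vertex AI Python SDK, google-cloud-aiplatform; project, region, and file paths below are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# The CSV/JSONL import file lists text items and their labels,
# prepared per the Vertex AI text-data documentation.
dataset = aiplatform.TextDataset.create(
    display_name="ecommerce-text",
    gcs_source="gs://your-bucket/text_import.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
)
print(dataset.resource_name)
```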
Predict with batch prediction in Vertex AI - Image Classification
2.7K views · 2 years ago
This tutorial shows how to make predictions on an image classification dataset with batch prediction in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Predict with batch prediction in Vertex AI - Image Classification Create model used in this tutorial: ruclips.net/video/dl-UNtgLC1s/видео.html&ab_channel=Cloud4DataScience Input data requireme...
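A hedged sketch of the batch-prediction call with the Vertex AI Python SDK (model ID, bucket, and file names are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/MODEL_ID"
)
batch_job = model.batch_predict(
    job_display_name="image-batch-prediction",
    gcs_source="gs://your-bucket/batch_input.jsonl",  # one image URI per line
    gcs_destination_prefix="gs://your-bucket/batch_output/",
    sync=True,  # block until the job completes
)
print(batch_job.state)
```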
Train AutoML Image Classification model in Vertex AI
1.2K views · 2 years ago
This tutorial shows how to train an AutoML image classification model in Vertex AI with the Python SDK. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Train AutoML model in Vertex AI - Image classification Create Image Dataset used in this tutorial: ruclips.net/video/39PxXRvo7qw/видео.html&ab_channel=Cloud4DataScience 00:00 Introduction 00:16 Dataset us...
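In SDK terms, the training job looks roughly like this sketch (dataset ID and display names are placeholders; 8,000 milli node hours is the documented minimum budget for image classification):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

dataset = aiplatform.ImageDataset(
    "projects/your-project-id/locations/us-central1/datasets/DATASET_ID"
)
job = aiplatform.AutoMLImageTrainingJob(
    display_name="lemon-quality-automl",
    prediction_type="classification",
    multi_label=False,
)
model = job.run(
    dataset=dataset,
    model_display_name="lemon-quality-model",
    budget_milli_node_hours=8000,  # 8 node hours
)
```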
Create Image Dataset in Vertex AI
2.6K views · 2 years ago
This tutorial shows how to create an Image Dataset for a single-label classification task in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Create image dataset in Vertex AI Kaggle Lemon Quality Dataset: www.kaggle.com/datasets/yusufemir/lemon-quality-dataset Prepare image training data for classification: cloud.google.com/vertex-ai/docs/...
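A minimal sketch of the dataset-creation call (bucket and file names are placeholders; the import CSV pairs gs:// image URIs with labels):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

dataset = aiplatform.ImageDataset.create(
    display_name="lemon-quality",
    gcs_source="gs://your-bucket/image_import.csv",
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
)
```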
Run custom training job with custom container in Vertex AI
6K views · 2 years ago
This tutorial shows how to run a custom training job with a custom container in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Run custom training job with custom container in Vertex AI Kaggle Stroke Prediction Dataset: www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset Environment variables for special Cloud Storage directorie...
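The SDK side of launching such a job, as a sketch (the Artifact Registry image path and machine type are placeholders; the container itself must already contain the training code):

```python
from google.cloud import aiplatform

aiplatform.init(
    project="your-project-id",
    location="us-central1",
    staging_bucket="gs://your-bucket",
)

job = aiplatform.CustomContainerTrainingJob(
    display_name="stroke-custom-container",
    container_uri="us-central1-docker.pkg.dev/your-project-id/your-repo/trainer:latest",
)
job.run(machine_type="n1-standard-4", replica_count=1)
```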
Run custom training job with pre-built container in Vertex AI
3.9K views · 2 years ago
This tutorial shows how to run a custom training job with a pre-built container in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Run custom training job with pre-built container in Vertex AI Kaggle Stroke Prediction Dataset: www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset Environment variables for special Cloud Storage dire...
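Inside the training script itself, those special Cloud Storage directories arrive as AIP_* environment variables injected by Vertex AI. A brief sketch of the consuming side (the model-saving step is illustrative):

```python
import os

# Vertex AI sets AIP_MODEL_DIR to the Cloud Storage path where
# the trained model artifacts should be written.
model_dir = os.environ.get("AIP_MODEL_DIR", "model/")

# ... train the model here, then write/upload the artifacts under model_dir
print(f"Saving model artifacts to: {model_dir}")
```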
Predict with online prediction in Vertex AI
4.4K views · 2 years ago
This tutorial shows how to make predictions on a tabular dataset with online prediction in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Predict with online prediction in Vertex AI Link to gcloud CLI authorization: cloud.google.com/sdk/docs/authorizing 00:00 Introduction 00:36 Endpoint setup 03:22 Online prediction - Python 08:32 Online pr...
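The two Python steps reduce to roughly this sketch (model resource name, machine type, and the feature dictionary are placeholders; AutoML tabular models expect feature values as strings):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Deploy the trained model to an endpoint
model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/MODEL_ID"
)
endpoint = model.deploy(machine_type="n1-standard-2")

# Each instance is a dict of feature name -> value
response = endpoint.predict(instances=[{"age": "67", "hypertension": "0"}])
print(response.predictions)
```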
Predict with batch prediction in Vertex AI
4.9K views · 2 years ago
This tutorial shows how to make predictions on a tabular dataset with batch prediction in Vertex AI. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Predict with batch prediction in Vertex AI 00:00 Introduction 00:33 Dataset discussion 01:49 Batch prediction job setup in Jupyter 03:31 Going through predictions in BigQuery 04:12 Transforming raw results...
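A sketch of the BigQuery-to-BigQuery batch prediction call (table and dataset names are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

model = aiplatform.Model(
    "projects/your-project-id/locations/us-central1/models/MODEL_ID"
)
batch_job = model.batch_predict(
    job_display_name="stroke-batch-prediction",
    bigquery_source="bq://your-project-id.your_dataset.input_table",
    bigquery_destination_prefix="bq://your-project-id.your_dataset",
    sync=True,
)
# Raw predictions land in a new table inside the destination dataset.
```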
Train AutoML model in Vertex AI
2.6K views · 2 years ago
This tutorial shows how to train an AutoML classification or regression model on a tabular dataset in Vertex AI. Google documentation describing tabular dataset preparation: cloud.google.com/vertex-ai/docs/datasets/prepare-tabular Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/blob/main/Train AutoML model in Vertex AI/classification.ipynb Kaggle Stroke Prediction ...
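The corresponding SDK sketch (dataset ID is a placeholder; "stroke" is the label column in the Kaggle stroke dataset):

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

dataset = aiplatform.TabularDataset(
    "projects/your-project-id/locations/us-central1/datasets/DATASET_ID"
)
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="stroke-automl",
    optimization_prediction_type="classification",  # or "regression"
)
model = job.run(
    dataset=dataset,
    target_column="stroke",  # the column to predict
    budget_milli_node_hours=1000,  # 1 node hour
)
```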
Create Tabular Dataset in Vertex AI
2.8K views · 2 years ago
This tutorial shows how to create a Tabular Dataset in Vertex AI from a BigQuery table, a Google Cloud Storage CSV file, or a Pandas DataFrame. Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Create tabular dataset in Vertex AI 0:00 Datasets creation options 0:42 Create from BigQuery table 2:58 Create from Google Cloud Storage CSV file 4:12 Create from Panda...
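The three creation paths, sketched with the Vertex AI Python SDK (all names and paths are placeholders; the DataFrame path stages the data through a BigQuery table):

```python
import pandas as pd
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# 1. From a BigQuery table
ds_bq = aiplatform.TabularDataset.create(
    display_name="from-bigquery",
    bq_source="bq://your-project-id.your_dataset.your_table",
)

# 2. From a CSV file in Google Cloud Storage
ds_gcs = aiplatform.TabularDataset.create(
    display_name="from-gcs",
    gcs_source="gs://your-bucket/data.csv",
)

# 3. From a Pandas DataFrame
df = pd.read_csv("local_data.csv")
ds_df = aiplatform.TabularDataset.create_from_dataframe(
    df_source=df,
    staging_path="bq://your-project-id.your_dataset.staging_table",
    display_name="from-dataframe",
)
```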
Connect Jupyter Notebook with Vertex AI
6K views · 2 years ago
This tutorial shows how to connect your Jupyter Notebook with Vertex AI from both GCP and the local environment. Link to the documentation describing the process of setting the environment variable: cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable Link to the GitHub repo with code from this tutorial: github.com/rafaello9472/c4ds/tree/main/Connect Jupyter Not...
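For the local-environment case, the connection boils down to pointing the client libraries at a service-account key and initializing the SDK; a sketch (the key path and project are placeholders):

```python
import os

# Must be set before the client libraries authenticate
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")
print(aiplatform.TabularDataset.list())  # quick check that authentication works
```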
It was a wonderful and fluent explanation, thank you. I am a bit confused, though: everything shown here can be done using standard SQL, so what is the benefit of Dataform? Thanks in advance.
Thanks for these wonderful insights
Hello, are you still active and available for questions? I would like to ask whether it's possible to have a function that runs and closes a VM within Google Cloud itself? Thanks
Thank you for the interesting explanation. Would it be possible to share the slide deck? That would be great
Sure, here you go docs.google.com/presentation/d/1w0dthQkz5Wo84vDqO4OWZde9W_JjrmqHxDonXEKIisM/edit?usp=sharing
@@cloud4datascience772 Thanks, I can use it to explain some more to my colleagues
Excellent tutorial! Do you have a video on pipelines with Vertex AI?
Hi, I love your videos so far! Can you make a video on Vertex pipelines with Cloud Scheduler for model training and deployment to an application? Looking forward to your reply!
Great video, very clear. Thank you
Hi, as far as I can see, there are no branches in the GitHub repo linked under the video? Is it because there are some code updates and upgrades coming? A very well-done presentation, and I should say that before watching the tutorial there were a few places that gave me hard stops... Now everything seems clearer. Thank you! Kind regards.
Hi, I am glad that you liked the video, thank you for the kind words! When it comes to the branches, GitHub moved them to stale branches, but you can still access them: github.com/rafaello9472/dataform-demo/branches/stale
@@cloud4datascience772 🙏🏻Thank you very much, I'll definitely do that ;)
Thank you! Great video
this is very helpful, thank you
This video was very helpful for me! Thank you 👍
I have a question: what if I had a script using the Selenium (web scraping) library? There would be a problem, right? Because as far as I understand, it needs a driver installed on your machine to work 😪
You can install libraries using requirements.txt, otherwise it could be a challenge if you want to customize the execution environment
Thanks a lot for sharing your knowledge!! Greetings from Mexico
Hi ... I wanted to take a moment to express my appreciation for your videos. They have been incredibly helpful as I learn about GCP, and your clear explanations make complex topics much more accessible. One question here: everything we are doing is manual, using the GUI. Alternatively, can we do the same process using the Python SDK, right from building the package and pushing it to a Cloud Storage bucket, through training setup, Model Registry, deployment, and endpoint creation? If yes, when you get time, can you please post a video on this?
Thank you for the kind words! You are right, the majority of these things can also be done using the Python SDK or the gcloud CLI tool. I always try to focus on the GUI approach first, as once you learn it, it's much easier to automate the entire process with code. Unfortunately, due to a lot of project work on my end recently, I am not planning any new videos on that topic in the near future. I might come back to it, but I don't know when that might happen. I hope this video will be a sufficient starting point for you!
I want to create a text dataset, but all of my text is in PDF form. How would I go about doing that?
Same here, did you find a resolution to this?
@@August-m8l I didn't create a text dataset, I just used the PDFs as-is. I created a GCS bucket with the PDFs, and I used Gemini multimodal to process the text within the files. Hope that helps!
@@ekhemka-x6q thanks!
What is the zip file? I didn't understand what it is about.
It's the Python source code
Thanks for sharing this brief and informative tutorial! Really helpful👍
Thanks, very informative video. You created the git repository in public mode; were you able to connect a private git repo too?
Hello, thanks for your helpful videos! Question: I want to perform batch prediction for a pre-built foundation model (Llama 3). I have downloaded the Llama 3 chat-8b model into the Vertex AI Model Registry. When I try to start a batch prediction job, I get the following issue: InternalServerError: 500 Unknown ModelSource source_type: MODEL_GARDEN model_garden_source { public_model_name: "publishers/meta/models/llama3" } for model projects/591244989428/locations/us-east1/models/llama3_chat_8b@1 Any idea what the issue is about? I didn't find any helpful resources on this. Appreciate any help!
Is it possible to use JavaScript parametrizable variables in your SQLX files for the queries?
How can I use Dataform to read BigQuery tables, apply some transformations, and write the results to other tables?
Great video :)
Thank you so much!! I have a question though, it was said that to link a third-party remote repository to a Dataform repository, you need to first authenticate it. Any thoughts on this?
Lol, it's not working. Can't deploy. What could be the problem? I did exactly what you did, except I couldn't create the same bucket, so I named the bucket c4ds1 and changed the code accordingly.
Please answer my question; I need to do the same thing as you did in this video. My Python script works just fine in Google Cloud Shell; however, I am still having trouble making it work as a Cloud Function. The purpose is to schedule the execution of the function, which consists of extracting data from a website and saving it to a Google Sheet. I was able to run it in Google Cloud Shell. Any clue?
Did you find a solution?
Can we do this using Google Compute with a GPU? How can we do that?
There is no easy, direct way to do it with Cloud Functions. For a GPU you would need to use Google Compute Engine and select a GPU machine type, but the process of automating script execution would be much different, and it's not covered in my tutorial.
Thanks for this video! I have no knowledge of AI/ML; I just want to use Vertex AI for my mobile application. I have a dataset on Kaggle, which I have downloaded, and it is all images. Now I want to use it to create a model and call the API from my mobile app. How do I do that in the Vertex AI GUI? There are only images, and since there isn't any CSV or JSON file, it's very time-consuming to upload hundreds of images and label every one, because I will be using image object detection. Is there any direct way to get the CSV or JSON from Kaggle, or can I get a trained model directly? What's the exact flow to get what I want?
Hi, you need to prepare either a CSV or JSONL file with image locations and labels; once you have it, use it to create an image dataset in Vertex AI. This, of course, requires some programming knowledge, as I don't believe there will be a ready-made file for that purpose on Kaggle. I'm showing the process of creating such a file from images I uploaded to Cloud Storage; hopefully it can be a good start for you. If you have any doubts regarding the file, you can always refer to the documentation => cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data
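For anyone stuck on this step, a minimal sketch of building such an import CSV with the google-cloud-storage client; it assumes, purely for illustration, that images sit under gs://your-bucket/images/<label>/... so the label can be read from the object path:

```python
import csv
from google.cloud import storage

client = storage.Client(project="your-project-id")

rows = []
for blob in client.list_blobs("your-bucket", prefix="images/"):
    if blob.name.lower().endswith((".jpg", ".jpeg", ".png")):
        label = blob.name.split("/")[1]  # folder name doubles as the label
        rows.append([f"gs://your-bucket/{blob.name}", label])

# Vertex AI single-label image import rows: gs://image-uri,label
with open("image_import.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```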
Great job! I followed your instructions, and everything started working smoothly for me. Your tutorial is fantastic - keep up the excellent work! You have the potential to reach 1 million subscribers. Keep pushing forward !!!
That is great to hear! Thank you for the kind words :)
Thank you! Your explanations are very clear!
Hey, what about model deployment? Can you make a video on it?
Where can I get the stroke data?
Here you go => www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
Thank you so much for such a wonderful tutorial. Perfectly designed and structured. And also, thank you so much for helping me discover such a powerful platform.
Thank you for the kind words!
Hi! First of all, great video! Really simple and intuitive. It worked for me when I called only one function inside hello_pubsub, but when I tried to call several others, from other .py files, it looked like the function ran perfectly but produced no results. Is there a way to make the Cloud Function wait for each function to finish before moving on to the next one? Thanks
In other words, I need the second function to get the result from the first, and so on. That's why I need it to wait.
I am so excited about the whole explanations
This is great
It’s very important to listen to this lecture and following same to be able to make online business accounts realized
This is awesomely exciting news about your business account
Depository platform is a very wide program to use in creating your own Database platform
I am so excited about your examples like Java print depository and all others you’ve mentioned
Thanks again and again for sharing this Database Depository code
I appreciate your interest in sharing this wonderful business opportunities online with your friends and family members who love sharing their business opportunities online with their businesses
Thanks so much for your concern about this wonderful program on Database form for businesses online
How do you make a batch prediction with a custom container? It would be nice to have a tutorial that uses a custom container for a training run, saves the model in the Model Registry, and then runs a batch prediction.
This video saved me. I was almost losing my patience while looking for these codes in the GCP platform, and I couldn't find them. Instead of using Jupyter Notebook, I am implementing them as a cloud function to automate the process of training every three months. THANK YOU.
Thanks for the demo. What if I have two versions of the model and I want to use the second one instead of the first?
Great video!!!!!! Could you please let me know how to use these managed datasets in a Kubeflow component, which will then be used to execute a Vertex AI pipeline?
best indian ever love u from morocco snor and chih and sk7k7 and hatim l7waa
Love you man
For me, project was not the project name but the project ID; that seems to be the solution to a 403 for people on Stack Overflow.
Is it possible to read a file directly from Google Cloud Storage using Dataform?