Much better than notebook-based Git control. That was a mess, especially when moving notebooks to other folders. Thanks.
DB Repos IS still a mess =(
Wow, you solved my biggest problem while collaborating with other people. Thank you.
Did exactly the same: I built a wrapper on top of the CLI to extract notebooks and add them into my source control. I use artifacts to add in my own custom classes and functions in PySpark.
This feature is very useful and I hadn't heard about it before. It meets my current needs :) Thanks for sharing your tips!
Thanks for the video. I am wondering what path we should use now for calling other notebooks (using %run). It can be different within Repos and when it is actually deployed to the workspace.
Great video 😊 Would be great to see another video or blog talking through the patterns you're using with Databricks Repos!
I revisited this in a recent video for "files in repos", which goes through the library development approach. Take a look and let me know if that covers your questions! ruclips.net/video/nN-NPnfJLNY/видео.html
Simon
Finally!
Thanks for this video, it's very useful! I've got two questions :)
1. When you set it up, you already had a project in DevOps and the notebooks got pushed from DevOps to Databricks?
Is it possible to do it the other way around? E.g. I have a bunch of notebooks in Databricks and I want to set up source control, and I want all of the notebooks in Databricks to be pushed to DevOps automatically when source control is set up. Can that be done, or do I have to add them manually and move them to the repo?
2. Also, when it comes to CI/CD in Azure DevOps, is there a simple way to not only deploy notebooks to different workspaces but also deploy actual Databricks objects such as tables? E.g. a script that checks if a table exists, creates it if not, and alters it if there are fields missing? (A rough sketch of that idea follows below.)
Thanks!
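To make that second question concrete, here is a minimal sketch of what such an idempotent table-deployment notebook could look like. It is just one possible approach, assuming Delta tables and using made-up database, table and column names; it is not anything shown in the video.

```python
# Minimal sketch: create the database and table if missing, then add any
# columns the deployed table lacks. All names here are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")

spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_db.customers (
        customer_id BIGINT,
        name        STRING
    ) USING DELTA
""")

# Columns the deployed table is expected to have (hypothetical)
expected_columns = {"customer_id": "BIGINT", "name": "STRING", "email": "STRING"}
existing_columns = {field.name for field in spark.table("demo_db.customers").schema.fields}

for column, data_type in expected_columns.items():
    if column not in existing_columns:
        spark.sql(f"ALTER TABLE demo_db.customers ADD COLUMNS ({column} {data_type})")
```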
Great video! Works for my case even though our repo is integrated with another code base, which is down to how we deploy code via IaC. It would be nice if I could point to a subfolder in the repo. Cleaner.
Thank you for sharing the video.....
Thank you for the video!
This is a wonderful video, but I have a question: does Databricks SQL Analytics work with Repos too? Is there a way to store all my queries and dashboards in a secure place?
Depressingly, no. There's no source control integration within Databricks SQL - it does maintain a history of your various query objects, but they're all held within Databricks itself, rather than integrating back to Git. I guess it's the same thing we've seen with other "business facing" BI tools, where the assumption is that data analysts don't "get" source control... would be nice to have the option though!
Thanks for this great video!
If I understand correctly, the Repos don't replace the Workspace?
Could you explain or share some resources on how to release the (main?) branch to the workspace?
Just as a short follow-up: I can see and access notebooks in ADF when I link the workspace as a source. Are there any objections to using notebooks in production like this? (I didn't try Autoloader yet...)
Great video, I'm also wondering about the %run notebook functionality. How would that work inside the Git repo? Will it work with a relative path but not an absolute path?
Hey - you can use either, but the absolute path will reference a path to the repo in DBFS, unlike the standard notebook paths, so it would change between deployments etc. - generally better to stick with relative if possible!
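For anyone else hitting this, a minimal sketch of the two styles from a notebook inside a repo. The repo layout and notebook names below are invented, and each %run would sit in its own cell.

```python
# Hypothetical layout: this notebook lives at /Repos/<user>/<repo>/silver/load_customers

# Relative path: resolved inside the repo, so it keeps working when the repo is
# cloned under another user or deployed to a different workspace
%run ../global/functions

# Absolute path: pinned to one specific Repos folder, so it breaks as soon as
# the repo is checked out or deployed anywhere else
%run /Repos/someone@example.com/my-repo/global/functions
```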
@AdvancingAnalytics So does that mean you can use this repo as if it were local on your PC, and import modules and such like a normal file structure?
Thanks for the video! In my project I have built a CI pipeline using databricks-cli & Azure DevOps. Using databricks-cli I can trigger a notebook that is present in my workspace (I have synced workspace notebooks with Azure Repos), but how can I trigger notebooks present in Databricks Repos?
Hi Aditya,
I don't know if my suggestion can help you, but you could create a global parameter in Azure Data Factory (ADF) pointing to the Repos folder where your notebook is.
For example:
A) Create a global parameter in ADF:
DBC_NOTEBOOK_PATH = /Repos/myemail@sparkfans.com/notebooks/
B) In the pipeline that runs this notebook, configure the Notebook path parameter of the Notebook activity with this value:
@concat(pipeline().globalParameters.DBC_NOTEBOOK_PATH, '/silver/my_notebook.dbc')
With these steps, you only need to change the DBC_NOTEBOOK_PATH value to switch between the Workspace and Repos.
C) One piece of advice: when you are working with Repos and the Workspace, be careful with your notebooks' %run commands. You need to use relative paths instead of absolute paths.
Example: rather than
%run "global/functions.dbc"
use
%run "./../../../functions.dbc"
(the number of "../" depends on where your "my_notebook.dbc" is located)
Best regards
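Coming back to the original question about kicking off a notebook that lives under /Repos from a CI pipeline: a minimal sketch against the Jobs runs-submit REST endpoint, assuming a personal access token and an existing cluster. The host, token, cluster id and notebook path below are placeholders, not values from this thread.

```python
# Minimal sketch: submit a one-time run of a notebook under /Repos.
# Host, token, cluster id and notebook path are placeholders.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<personal-access-token>"

payload = {
    "run_name": "ci-triggered-repos-notebook",
    "existing_cluster_id": "<cluster-id>",
    "notebook_task": {
        "notebook_path": "/Repos/myemail@sparkfans.com/notebooks/silver/my_notebook"
    },
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/jobs/runs/submit",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=payload,
)
response.raise_for_status()
print("Submitted run:", response.json()["run_id"])
```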
Can you also add non-notebook files (Python source files) to the repo and run them on the cluster?
You can now! You couldn't when I made this original video. Check out the video I put together on .py files in repos: ruclips.net/video/nN-NPnfJLNY/видео.html
Have you noticed that if you make a commit to a notebook in the Repo, when you look at the same notebook in the Workspace that change is ignored? I'm facing this big problem: the notebooks in the workspace always ignore my latest committed change...
Not seen it, but we've also changed the patterns we use. We no longer sync workspace notebooks with Git; instead we use a DevOps pipeline to push notebooks to the workspace version when changes we've made in a feature branch are pulled into our Dev branch. Had no problems with that pattern!
Simon
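A minimal sketch of that kind of pipeline step, assuming the notebooks are plain .py source files in the checked-out repo and that a token is available to the build agent. The host, token and folder paths below are placeholders rather than anything from the video.

```python
# Minimal sketch: push every notebook in a local folder into a fixed workspace
# folder via the Workspace API. Intended to run in a DevOps pipeline after checkout.
import base64
from pathlib import Path

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
DATABRICKS_TOKEN = "<personal-access-token>"
LOCAL_NOTEBOOK_DIR = Path("notebooks")      # folder in the checked-out repo
WORKSPACE_TARGET = "/Live/notebooks"        # workspace folder the jobs point at

headers = {"Authorization": f"Bearer {DATABRICKS_TOKEN}"}

for notebook in sorted(LOCAL_NOTEBOOK_DIR.rglob("*.py")):
    relative = notebook.relative_to(LOCAL_NOTEBOOK_DIR).with_suffix("")
    target_path = f"{WORKSPACE_TARGET}/{relative.as_posix()}"

    # Ensure the target folder exists, then overwrite the notebook in place
    requests.post(
        f"{DATABRICKS_HOST}/api/2.0/workspace/mkdirs",
        headers=headers,
        json={"path": target_path.rsplit("/", 1)[0]},
    ).raise_for_status()

    requests.post(
        f"{DATABRICKS_HOST}/api/2.0/workspace/import",
        headers=headers,
        json={
            "path": target_path,
            "format": "SOURCE",
            "language": "PYTHON",
            "content": base64.b64encode(notebook.read_bytes()).decode(),
            "overwrite": True,
        },
    ).raise_for_status()
```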
@AdvancingAnalytics Great tip, I'll do the same. Do you have any resource or Git repo with examples for that pipeline? I'm a Data Scientist and I'm new to these DevOps/MLOps things. I'd appreciate any book, youtube channel or online course you could share. Thanks again, you have a fantastic channel, I love it ❤
@AdvancingAnalytics Please share which Azure DevOps release pipeline step you use for pushing Git code into the ADB workspace.