Databricks CI/CD: Azure DevOps Pipeline + DABs

  • Published: Sep 15, 2024

Comments • 12

  • @gangadharneelam3107
    @gangadharneelam3107 12 days ago +1

    Hey Dustin,
    We're currently exploring DABs, and it feels like this was made just for us!😅
    Thanks a lot for sharing it!

  • @albertwang1134
    @albertwang1134 18 days ago

    I am learning DABs at the moment, so I'm lucky that I found this video. Thank you, @DustinVannoy. Do you mind if I ask a couple of questions?

    • @DustinVannoy
      @DustinVannoy  18 days ago

      Yes, ask away. I'll answer what I can.

    • @albertwang1134
      @albertwang1134 17 days ago

      Thank you, @DustinVannoy. I wonder whether the following development process makes sense, and whether there is anything we could improve.
      Background:
      (1) We have two Azure Databricks workspaces: one for development, one for production.
      (2) I am the only Data Engineer on our team, and we don't have a dedicated QA. I am responsible for development and testing. The people who consume the data do UAT.
      (3) We use Azure DevOps (repository and pipelines).
      Process:
      (1) Initialization
      (1.1) Create a new project using `databricks bundle init`
      (1.2) Push the new project to Azure DevOps
      (1.3) On the development DBR workspace, create a Git folder under `/Users/myname/` and link it to the Azure DevOps repository
      (2) Development
      (2.1) Create a feature branch on the DBR workspace
      (2.2) Do my development and manual testing
      (2.3) Create a unit test job and the scheduled daily job
      (2.4) Create a pull request from the feature branch to the main branch on the DBR workspace
      (3) CI
      (3.1) An Azure CI pipeline (build pipeline) is triggered after the pull request is created
      (3.2) The CI pipeline checks out the feature branch and runs `databricks bundle deploy` and `databricks bundle run --job the_unit_test_job` on the development DBR workspace using a Service Principal (see the sketch after this list)
      (3.3) The test results show on the pull request
      (4) CD
      (4.1) If everything looks good, the pull request is approved
      (4.2) Manually trigger an Azure CD pipeline (release pipeline). It checks out the main branch and runs `databricks bundle deploy` to the production DBR workspace using a Service Principal
      Explanation:
      (1) Because we are a small team and I am the only person who works on this, we do not have a `release` branch, to simplify the process
      (2) For the same reason, we also do not have a staging DBR workspace
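
      For illustration, a minimal sketch of the CI pipeline from step (3.2) as Azure DevOps YAML. The variable names, secret names, and the job key the_unit_test_job are placeholders, and the dev target assumes a matching entry in databricks.yml:

      # Build pipeline validating pull requests.
      # For Azure Repos, attach it to PRs via a branch policy on main.
      trigger: none

      pool:
        vmImage: ubuntu-latest

      steps:
        # Install the Databricks CLI using its published install script
        - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          displayName: Install Databricks CLI
        # Deploy the bundle to dev and run the unit test job as the Service Principal
        - script: |
            databricks bundle deploy -t dev
            databricks bundle run -t dev the_unit_test_job
          displayName: Deploy bundle and run unit tests
          env:
            DATABRICKS_HOST: $(DEV_DATABRICKS_HOST)
            ARM_TENANT_ID: $(ARM_TENANT_ID)
            ARM_CLIENT_ID: $(SP_CLIENT_ID)
            ARM_CLIENT_SECRET: $(SP_CLIENT_SECRET)   # secret pipeline variable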

    • @DustinVannoy
      @DustinVannoy  15 days ago +1

      The overall process is good. It's typical not to have a separate QA person. I try to use a YAML pipeline for the release step so the code looks pretty similar to what you use to automate deploys to dev. I recommend having unit tests you can easily run as you build, which is why I try to use Databricks Connect to run a few specific unit tests at a time. But running workflows on all-purpose or serverless compute isn't a bad option for quick testing as you develop.
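
      As a rough sketch, that YAML release step could look like the following; the stage layout, the production environment name, and the prod target are assumptions, not something shown in the video:

      stages:
        - stage: DeployProd
          jobs:
            - deployment: deploy_bundle
              environment: production   # attach a manual approval check to this environment
              strategy:
                runOnce:
                  deploy:
                    steps:
                      - checkout: self   # deployment jobs skip source checkout by default
                      - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                        displayName: Install Databricks CLI
                      - script: databricks bundle deploy -t prod
                        displayName: Deploy bundle to production
                        env:
                          DATABRICKS_HOST: $(PROD_DATABRICKS_HOST)
                          ARM_TENANT_ID: $(ARM_TENANT_ID)
                          ARM_CLIENT_ID: $(SP_CLIENT_ID)
                          ARM_CLIENT_SECRET: $(SP_CLIENT_SECRET)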

  • @benjamingeyer8907
    @benjamingeyer8907 19 days ago

    Now do it in Terraform ;)
    Great video as always!

    • @DustinVannoy
      @DustinVannoy  19 days ago +1

      🤣🤣 it may happen one day, but not today. I would probably need help from build5nines.com

  • @thusharr7787
    @thusharr7787 17 days ago

    Thanks, one question: I have some metadata files in the project folder, and I need to copy them to a volume in Unity Catalog. Is that possible through this deploy process?

    • @DustinVannoy
      @DustinVannoy  15 days ago

      Using the Databricks CLI, you can add a command that copies data up to the volume. Replace all the curly brace { } parts with your own values.
      databricks fs cp --overwrite {local_path} dbfs:/Volumes/{catalog}/{schema}/{volume_name}/{filename}
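
      If the copy should run as part of the deploy pipeline itself, it can be an extra step after the bundle deploy; the metadata/ folder and the catalog, schema, and volume names here are made-up examples:

        # Pipeline step copying local metadata files into a Unity Catalog volume
        - script: databricks fs cp --overwrite --recursive metadata/ dbfs:/Volumes/my_catalog/my_schema/metadata_vol/
          displayName: Copy metadata files to UC volume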

  • @albertwang1134
    @albertwang1134 7 days ago

    Hi Dustin, have you tried to configure and deploy a single-node cluster using a Databricks Asset Bundle?

    • @DustinVannoy
      @DustinVannoy  6 days ago

      Yes, it is possible. It looks something like this:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: m6gd.xlarge
            num_workers: 0
            data_security_mode: SINGLE_USER
            spark_conf:
              spark.master: local[*, 4]
              spark.databricks.cluster.profile: singleNode
            custom_tags: {"ResourceClass": "SingleNode"}
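
      One note for Azure workspaces: node_type_id: m6gd.xlarge is an AWS instance type, so on Azure Databricks you would swap in an Azure VM size such as Standard_DS3_v2.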

    • @albertwang1134
      @albertwang1134 4 days ago

      @DustinVannoy Thanks a lot! This cannot be found in the Databricks documentation.