How to Stage Data in Snowflake

  • Published: 30 Sep 2024
  • In this episode, we'll take a look at how to use Snowflake internal staging to process CSV, JSON, XML, or other text-based files and load their data into Snowflake tables. Often we have many fragmented data files of the same type and structure and need to get that data into our Snowflake tables. Creating a stage helps us process that data. We'll use the Snowflake connector with Python to create persistent and temporary stages, load data from our local environment, process it, then query and review the new data in our Snowflake table. Let's go! *For external data staging on Amazon and Azure, stay tuned.
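    The workflow the episode walks through (create an internal stage, PUT local files into it, then COPY the data into a table) can be sketched with the Snowflake Python connector. This is a minimal sketch: the stage, table, file path, and credentials are placeholders, not values from the video.

    ```python
    # Sketch of the internal-staging workflow: stage, table, path, and
    # credentials are all placeholders.
    def staging_commands(stage: str, table: str, local_path: str) -> list[str]:
        """Build the three SQL statements for one stage-and-load run."""
        return [
            # Define the stage once; its file format governs how CSVs are parsed.
            f"CREATE STAGE IF NOT EXISTS {stage} "
            f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
            # Upload the local file into the stage.
            f"PUT file://{local_path} @{stage} AUTO_COMPRESS = TRUE",
            # Load everything currently in the stage into the target table.
            f"COPY INTO {table} FROM @{stage}",
        ]

    def run(conn_params: dict, stage: str, table: str, local_path: str) -> None:
        """Execute the statements against a live Snowflake account."""
        import snowflake.connector  # pip install snowflake-connector-python
        with snowflake.connector.connect(**conn_params) as conn:
            cur = conn.cursor()
            for sql in staging_commands(stage, table, local_path):
                cur.execute(sql)

    if __name__ == "__main__":
        # Print the statements instead of connecting, so the sketch is
        # runnable without credentials.
        for sql in staging_commands("MY_STAGE", "MY_TABLE", "/tmp/data/orders.csv"):
            print(sql)
    ```

    A temporary stage works the same way with `CREATE TEMPORARY STAGE`; it disappears when the session ends.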
    Related Videos:
    How to Create a Scheduled Task in Snowflake Using Python
    • How to Create a Schedu...
    Python on Snowflake - How to Delete Records from One Table Using an ID Column from Another
    • Python on Snowflake - ...
    How to Connect Python to Snowflake using ODBC
    • How to Connect Python ...
    How to Stage Data in Snowflake
    You are watching this one now!
    Python on Snowflake: How to Get PyArrow Table Output From a Data Warehouse
    • Python on Snowflake: H...
    How to Use Flatten to Query JSON Tabular Array Data in Snowflake
    • How to Use Flatten to ...
    Join me on Patreon!
    / mackenziedataengineering
    Demo of my BZ RDP Cloaker:
    www.patreon.co...
    Check out some tech (and support my channel!):
    www.amazon.ca/...
    Want the code from this video?
    mackenziemacken...
    Interested in transforming your career or finding your next gig?
    system.billzon...
    Want my team to do a project for you? Let's get to it!
    system.billzon...
    Contact form you can find at www.mackenziema...
    Follow me on social media:
    / mackenziedataanalytics
    / seamacke
    / seamacke
    / seamacke
    / psmackenzie
    Get Microsoft Office including Access:
    click.linksyne...
    Got a YouTube Channel? I use TubeBuddy, it is awesome. Give it a try:
    www.tubebuddy....
    #snowflake #python

Comments • 4

  • @miles6875 · 1 year ago +1

    Interesting topic - trying to expand my horizons. Thanks for the bite-sized learning, Sean

    • @seanmackenziedataengineering · 1 year ago

      Staging is the gateway to Big Data solutions. If you learn this technique, it will open a whole new world! Cheers

  • @szechan8958 · 1 year ago

    Hello Sean, thank you for putting this video together! I have a few questions that I may have missed from this video: 1) Is there a reason I would stage a CSV file in Snowflake as opposed to uploading a CSV file that becomes a table in Snowflake? I don't fully understand the goal behind staging. 2) Say I have a CSV file with 5k rows that gets updated once a month, and each update adds 5 new rows. Do I stage the same CSV file every month? 3) Say I need to clean the data in the CSV file; do I do it before staging using Python/Pandas, or after staging? If the latter, what keywords/terms should I Google?

    • @seanmackenziedataengineering · 1 year ago

      Great questions. Each stage can have a particular design that belongs to it. I talked a little about this in the video (the topic is actually pretty big). For example, your CSV has no formatting or anything because it is just text, but in your stage you can really refine things like data types, date formats, and all kinds of other options. 1. So, when a CSV is uploaded to *that* stage, the design is already known, and it can be processed easily and accurately. 2. Sure, you could do that for one CSV, but what if you got thousands of the same kind of CSV on the same day each month? With staging, you can stage and process them together. 3. You can do data cleaning before or after you stage; the stage itself has many options for how to deal with empty values, date formatting, etc. If all of the CSV files follow the same format and have the same requirements, you can just run them through your stage and into your table. So, look at the Snowflake documentation for staging and you will see the (immense) number of options available to you when creating a stage. This video just scratches the surface!
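      The "design that belongs to the stage" can be made concrete with a named file format. A minimal sketch follows; the format name, stage name, and option values (NULL handling, date layout, quoting) are illustrative, not the exact settings from the video.

      ```python
      # Sketch: a named FILE FORMAT captures the parsing rules once, and the
      # stage references it, so every CSV pushed through the stage is
      # interpreted identically. All names and option values are illustrative.
      def csv_file_format_sql(name: str) -> str:
          return (
              f"CREATE OR REPLACE FILE FORMAT {name} "
              "TYPE = CSV "
              "SKIP_HEADER = 1 "
              "FIELD_OPTIONALLY_ENCLOSED_BY = '\"' "
              "NULL_IF = ('', 'NULL') "      # how empty values become NULLs
              "DATE_FORMAT = 'YYYY-MM-DD'"   # expected date layout in the files
          )

      def stage_sql(stage: str, file_format: str) -> str:
          # The stage inherits its parsing behavior from the named format.
          return f"CREATE STAGE IF NOT EXISTS {stage} FILE_FORMAT = {file_format}"

      if __name__ == "__main__":
          print(csv_file_format_sql("MY_CSV_FORMAT"))
          print(stage_sql("MY_STAGE", "MY_CSV_FORMAT"))
      ```

      With the format defined once, a batch of thousands of matching CSVs can be PUT into the stage and loaded in a single COPY INTO.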
      To compare: your 5k-row CSV could easily be handled with Pandas: ruclips.net/video/JMMp5zvWdxs/видео.html or you could use pd_writer ruclips.net/video/mJSWAj_D6f4/видео.html and just automate your process, but when you start getting to scale, you definitely want the capabilities of staging.
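      The small-scale pandas route mentioned above can be sketched with pd_writer from the connector's pandas helpers. The table name and the engine are placeholders (a real run needs a snowflake-sqlalchemy engine); this is a sketch of the approach, not the video's exact code.

      ```python
      # Sketch: push a DataFrame straight into a Snowflake table with
      # pd_writer. Table name and engine are placeholders.
      import pandas as pd

      def load_dataframe(df: pd.DataFrame, engine, table: str) -> None:
          # Requires: pip install "snowflake-connector-python[pandas]" snowflake-sqlalchemy
          from snowflake.connector.pandas_tools import pd_writer
          df.to_sql(table, engine, index=False, if_exists="append", method=pd_writer)

      if __name__ == "__main__":
          df = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.5, 7.25]})
          print(df.shape)  # a real run would pass a snowflake-sqlalchemy engine
      ```

      This is fine for a monthly 5k-row file; once you have many files arriving together, the stage-and-COPY path above scales better.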