Building ML Models in Snowflake Using Python UDFs and Snowpark | DEMO

  • Published: 22 Mar 2023
  • Learn how to build machine learning models in Snowflake in this demo by Sonny Rivera of ThoughtSpot and Chris Hastie of InterWorks. During the demo, they show how to use Snowpark to clean your data and perform feature engineering, build and train sales forecast models using Python in Snowflake, use Python UDFs to expose your predictive models, and present and analyze your models in ThoughtSpot.
    To access the code used in this demo, go to:
    github.com/ChrisHastieIW/Snow...
    To access the Quickstart guide for this topic, go to:
    github.com/thoughtspot/quicks...
    Learn more about Thoughtspot:
    Website: www.thoughtspot.com
    Twitter: @thoughtspot
    LinkedIn: www.linkedin.com/company/thoughtspot
    Learn more about InterWorks:
    Website: interworks.com
    Twitter: @interworks
    LinkedIn: www.linkedin.com/company/interworks
    To connect with the presenters:
    Sonny Rivera, Senior Analytics Evangelist, ThoughtSpot
    LinkedIn: /sonnyrivera
    Chris Hastie, Data Engineering and Analytics Consultant, InterWorks
    LinkedIn: /chris-hastie
    Learn how to build your application on Snowflake:
    developers.snowflake.com
    Continue the conversation by joining the Snowflake Community:
    community.snowflake.com
    ❄ Join our YouTube community ❄ bit.ly/3lzfeeB
    "

Comments • 11

  • @octo3010 · 10 months ago +1

    How would you solve this with a vectorized UDF?
    Is there a demo on the same?

    • @sonny.rivera · 10 months ago +2

      Chris and I did not vectorize the UDF. That's a great idea. I'll sync with Chris and see what we do. Thanks for the great suggestion.
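
      For reference, here is a minimal sketch (not from the demo) of how the prediction step could be exposed as a vectorized Python UDF via Snowpark. The connection parameters, column types, and the stand-in predict logic below are placeholders, not the code used in the video:

      import pandas as pd
      from snowflake.snowpark import Session
      from snowflake.snowpark.types import FloatType, PandasDataFrameType, PandasSeriesType

      # Hypothetical connection details -- replace with your own account/credentials.
      connection_parameters = {"account": "...", "user": "...", "password": "...",
                               "warehouse": "...", "database": "...", "schema": "..."}
      session = Session.builder.configs(connection_parameters).create()

      def predict_batch(features: pd.DataFrame) -> pd.Series:
          # In a vectorized UDF the rows arrive in batches as a single pandas
          # DataFrame, so a trained model's predict() would run once per batch
          # instead of once per row. Summing the columns is only a stand-in;
          # a real handler would unpickle the model from a stage and call predict().
          return features.sum(axis=1)

      # Registering with pandas input/return types is what makes the UDF vectorized.
      session.udf.register(
          predict_batch,
          name="PREDICT_SALES_VECTORIZED",
          return_type=PandasSeriesType(FloatType()),
          input_types=[PandasDataFrameType([FloatType(), FloatType()])],
          max_batch_size=1000,
          packages=["pandas"],
          replace=True,
      )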

    • @octo3010 · 10 months ago

      @sonny.rivera That would be very helpful.
      On a side note, is there any reference material on optimising costs for Snowflake compute resources?

    • @snowflakedevelopers · 10 months ago +1

      Here are a few resources to get you started:
      medium.com/snowflake/best-practices-to-optimize-snowflake-spend-73b8f66d16c1
      medium.com/snowflake/using-snowflakes-scale-to-zero-capabilities-for-fun-profit-f326a1d222d0
      medium.com/snowflake/deep-dive-into-managing-latency-throughput-and-cost-in-snowflake-2fa658164fa8
      medium.com/snowflake/improve-snowflake-price-performance-by-optimizing-storage-be9b5962decb
      medium.com/snowflake/compute-primitives-in-snowflake-and-best-practices-to-right-size-them-b3add53933a3

    • @octo3010 · 10 months ago

      Thank you!

  • @nagasai5029 · 10 months ago

    Where can I find the data set that is used in this video?

  • @tahabekmez5072 · 11 months ago

    When you run the ML, does it run on your local machine or within Snowflake?

    • @sonny.rivera · 11 months ago +1

      I often dev and test using VS Code/Python on my local machine, and then deploy the code to Snowflake and Snowpark, which runs in the cloud.
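
      In other words, a Snowpark script authored locally still executes its DataFrame operations inside a Snowflake virtual warehouse; only the client code runs on the laptop. Below is a minimal sketch of that push-down pattern; the connection details and the SALES table are hypothetical, not the demo's dataset:

      from snowflake.snowpark import Session
      from snowflake.snowpark.functions import col, sum as sum_

      # Hypothetical connection details -- replace with your own account/credentials.
      connection_parameters = {"account": "...", "user": "...", "password": "...",
                               "warehouse": "...", "database": "...", "schema": "..."}
      session = Session.builder.configs(connection_parameters).create()

      # Built lazily on the client; the generated SQL only runs in Snowflake
      # when an action such as show() or collect() is called.
      monthly_sales = (
          session.table("SALES")
          .group_by(col("MONTH"), col("CATEGORY"))
          .agg(sum_(col("AMOUNT")).alias("TOTAL_SALES"))
      )
      monthly_sales.show()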

  • @saeedrahman8362 · 1 year ago

    If we do the per-category training and predictions in the UDF generate_auto_arima_predictions via a pandas DataFrame, we wouldn't get any parallelization benefit, right? We would process all the categories sequentially.
    Shouldn't we use a UDTF for this kind of operation?

    • @snowflakedevelopers · 1 year ago

      Thanks for your comment! A UDTF would be a stronger option, as it could leverage parallel partitioning to perform these operations concurrently instead (as you mention). Check out the following two articles on training ARIMA models:
      interworks.com/blog/2022/11/22/a-definitive-guide-to-creating-python-udtfs-directly-within-the-snowflake-user-interface/
      interworks.com/blog/2022/11/29/a-definitive-guide-to-creating-python-udtfs-in-snowflake-using-snowpark/
      For some more information on UDTFs and how they work, see:
      interworks.com/blog/2022/11/15/an-introduction-to-python-udtfs-in-snowflake/
      Thanks!

    • @sonny.rivera · 11 months ago

      The models will run concurrently on the virtual warehouse. The UDTF is really just calling the 'predict' function. The model training is happening in the stored proc.
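
      To make the partitioning idea concrete, here is a minimal sketch (not the demo's code) of a Python UDTF registered through Snowpark and called with PARTITION BY, so each category's rows can be processed concurrently on the warehouse. The SALES table, column names, and the averaging "forecast" are hypothetical stand-ins for a real predict() call on a model trained elsewhere:

      from snowflake.snowpark import Session
      from snowflake.snowpark.types import FloatType, StringType, StructField, StructType

      # Hypothetical connection details -- replace with your own account/credentials.
      connection_parameters = {"account": "...", "user": "...", "password": "...",
                               "warehouse": "...", "database": "...", "schema": "..."}
      session = Session.builder.configs(connection_parameters).create()

      class ForecastPerCategory:
          def __init__(self):
              self.rows = []

          def process(self, category: str, amount: float):
              # Buffer the partition's rows; nothing is emitted until end_partition().
              self.rows.append((category, amount))

          def end_partition(self):
              if not self.rows:
                  return
              # Stand-in "prediction": the average sales for this partition's category.
              category = self.rows[0][0]
              forecast = sum(r[1] for r in self.rows) / len(self.rows)
              yield (category, forecast)

      session.udtf.register(
          ForecastPerCategory,
          name="FORECAST_PER_CATEGORY",
          output_schema=StructType([
              StructField("CATEGORY", StringType()),
              StructField("FORECAST", FloatType()),
          ]),
          input_types=[StringType(), FloatType()],
          replace=True,
      )

      # PARTITION BY lets Snowflake score each category in parallel.
      session.sql("""
          SELECT f.category, f.forecast
          FROM sales,
               TABLE(FORECAST_PER_CATEGORY(category, amount)
                     OVER (PARTITION BY category)) AS f
      """).show()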