Distributed Machine Learning with Apache Spark / PySpark MLlib

  • Published: Aug 17, 2022
  • The Kaggle housing.csv file: www.kaggle.com/datasets/camnu...
    The Colab Notebook: colab.research.google.com/dri...
    PySpark RDD Introduction: • Apache Spark / PySpark...
    PySpark SQL Introduction: • PySpark Tutorial: Spar...
    PySpark MLlib Docs: spark.apache.org/docs/latest/...
    Thank you for watching the video! You can learn Data Science FASTER at mlnow.ai/ :)
    Master Python at mlnow.ai/course-material/python/!
    Learn SQL & Relational Databases at mlnow.ai/course-material/sql/!
    Learn NumPy, Pandas, and Python for Data Science at mlnow.ai/course-material/data...!
    Become a Machine Learning Expert at mlnow.ai/course-material/ml/!
    Don't forget to subscribe if you enjoyed the video :D

Comments • 23

  • @GregHogg
    @GregHogg  10 months ago

    Take my courses at mlnow.ai/!

  • @nickie2222
    @nickie2222 10 months ago +3

    Thank you for the video on MLlib. I haven't watched much yet, but it looks promising.
    The machine learning stuff starts at about 12:00; the beginning is a warm-up to PySpark. Chapters/timestamps would have been helpful for a 40 min video (with each chapter being a different stage in the process or function).

  • @davtg8172
    @davtg8172 1 year ago

    Thanks, Greg! It's a very good PySpark tutorial, comprehensive and with a lot of examples.

  • @erint.4917
    @erint.4917 1 year ago +3

    Great tutorial, Greg - really appreciate how you distilled such a comprehensive overview into a single video. Would you consider doing a video showing how to create a complete ML pipeline -- i.e., using output from Imputer(), StringIndexer(), OneHotEncoderEstimator(), VectorAssembler(), and VectorIndexer() -- for a dataset with multiple categorical and numerical features?

  • @maximinmaster7511
    @maximinmaster7511 1 year ago

    Thank you for this tutorial on PySpark!

    • @GregHogg
      @GregHogg  1 year ago +1

      You're very welcome 🙂

  • @drjabirrahman
    @drjabirrahman 1 year ago

    Good information Greg! Thanks for sharing.

    • @GregHogg
      @GregHogg  1 year ago

      Glad to hear it! You're very welcome

  • @Value_Pilgrim
    @Value_Pilgrim 1 year ago

    Thanks. That was pretty comprehensive.

  • @suman14san
    @suman14san 11 months ago

    Fantastic tutorial.

  • @user-kv2mn8bo4z
    @user-kv2mn8bo4z 3 months ago

    This is really helpful.
    Thank you

    • @GregHogg
      @GregHogg  3 months ago

      Super glad to hear it, you're very welcome! Thanks so much for the support ❤️

  • @arsheyajain7055
    @arsheyajain7055 1 year ago

    Oh awesome thanks!

  • @GregThatcher
    @GregThatcher 1 month ago

    Thanks!

    • @GregHogg
      @GregHogg  1 month ago

      Greg! You're too nice hahaha

  • @ammaralhawashem560
    @ammaralhawashem560 2 months ago

    Thank you.
    I just found one thing confusing:
    you did the standard scaling AFTER merging the features into one column.
    Shouldn't you have done it for each column before the merging?

    • @GregHogg
      @GregHogg  2 months ago +1

      I don't remember, sorry. But you're probably right.

    • @ammaralhawashem560
      @ammaralhawashem560 2 months ago

      @GregHogg Thank you for your reply.
      I ran an experiment: to see the effect, I used two features with a large difference in values
      and 5 million rows.
      It seems that even if we merge all the features before applying the scaling, it still calculates the parameters (mean and standard deviation) for each feature separately.
      In summary, you did NOT make any mistake.
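
      The observation above can be illustrated with a minimal plain-Python sketch. It is not PySpark's implementation, only an illustration of what per-feature standardization of already-merged vectors means. The data and the helper names (fit_standard_scaler, transform) are hypothetical. The sketch uses the sample standard deviation and centers each feature on its mean; as far as I know, PySpark's StandardScaler only subtracts the mean when withMean=True is set.

```python
from statistics import mean, stdev


def fit_standard_scaler(rows):
    """Compute a (mean, sample std) pair for every position in the feature vectors."""
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    return [(mean(col), stdev(col)) for col in columns]


def transform(rows, params):
    """Standardize each value with the statistics of its own feature."""
    return [
        [(x - m) / s for x, (m, s) in zip(row, params)]
        for row in rows
    ]


# Hypothetical merged vectors: feature 0 is small, feature 1 is large.
assembled = [[1.0, 100000.0], [2.0, 200000.0], [3.0, 300000.0]]

params = fit_standard_scaler(assembled)  # one (mean, std) pair per feature
scaled = transform(assembled, params)

for row in scaled:
    print(row)
```

      Because the statistics are kept per position in the vector, both features end up on the same scale even though they were merged into one column first.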

  • @shiminglu3940
    @shiminglu3940 1 year ago +1

    I wish I had seen this when I took Econ 424 (ML) at UW 😂