Polars is the Pandas killer / Igor Mintz (Viz.ai)

  • Published: Oct 3, 2024
  • While pandas is the de facto dataframe solution for Python, Polars is competing head to head on scale, speed and ease of use.
  • Science

Comments • 22

  • @z.r.777 5 months ago +15

    Why did they keep interrupting??

  • @moose304 5 months ago +6

    I previously used Pandas, but the speed and syntax are so much better and more consistent in Polars (after the initial learning curve) that I only switch back to Pandas if I HAVE to (usually for GIS-related work). Also, one update since this talk was given: it was announced that Polars will be getting GPU support via RAPIDS (the same project that brought GPU support to Pandas).

    • @IgorMintz 5 months ago +3

      You're right! I've just seen the update regarding rapids-polars-GPU a few days after the lecture. Now I need to update the slides lol

  • @gorillaglued 5 months ago +4

    Lazy processing with scan_parquet and sink_parquet is pretty good; I use it all the time. I usually have tens, or even 100 GB, of CSVs. I just scan through them and dump to Parquet (still bigger than I could keep in memory), then break it apart with sink/scan and DuckDB.

  • @twentytwentyeight 5 months ago +1

    The Polars learning curve was tougher for me than Dask's, but performance is consistent 👍🏾

  • @Sean_neaS 5 months ago +3

    I was trying to learn pandas just as Polars was coming out, and I'm glad I switched to Polars. I was so frustrated with pandas' nonsensical quirks and inconsistencies. It didn't seem to follow any conventions I'd ever seen. Maybe there was some statistics or math standard that I'm not familiar with, but as a programmer, Polars has a beautiful API that does what I expect it to do.

  • @danielthompson2561 3 months ago

    I wish it could lazy scan from a database. The lazy frame function is excellent, but for data security reasons I'm working from a secure database and not Parquet files.

  • @sweealamak628 5 months ago +1

    I'll take Polars seriously when employers ask for certification. Till that happens, I see enterprises unable to uproot themselves from pandas.

    • @ringpolitiet 5 months ago +2

      Are they asking for pandas certification now?

    • @sweealamak628 5 months ago

      @ringpolitiet No, of course not. There is no pandas cert, but there are certs in ML or data analytics that use pandas.

  • @fburton8 5 months ago

    Sectarianism aplenty!

  • @KirillEgorov-p7y 5 months ago +2

    The video is very basic. If you have tried Polars already, don't waste your time.

  • @mokus603 5 months ago +8

    Polars might be faster but the syntax is so bad, it's extremely uncomfortable.

    • @davidmas26694 5 months ago +10

      Omg syntax in pandas is way worse

    • @marco_gorelli 5 months ago

      It's beautiful once you get used to it. I'd suggest reading the blog post "The Expressions API in Polars is Amazing"; if you search for it you'll find it.

    • @ringpolitiet 5 months ago +1

      Do you have an example of something you find uncomfortable in polars that is better in pandas?

    • @patericktran 5 months ago +1

      😂 i think u did not know sql

    • @chobblegobbler6671 5 months ago +1

      Better than pandas, same shit as pyspark lesser than sql
      Databricks, Spark Sql best

  • @BUY_YT_VIEWS_m044 5 months ago +2

    Wow, this is the kind of content that keeps me visiting YouTube.