PySpark Tutorial 17: PySpark Correlation Analysis | PySpark with Python

  • Published: 4 Nov 2024

Comments • 15

  • @kike1022 2 years ago

    How do you create a CSV file from the correlation results in your final data frame? Excellent video by the way, great job!

    • @StatsWire 2 years ago

      Thank you. You can convert it to a pandas DataFrame and then write it out as a CSV file.
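A minimal sketch of that suggestion, assuming the correlation results are available as a nested list (called `corrmatrix` elsewhere in this thread) together with a list of column names. The sample values, the column names, and the output file name are hypothetical placeholders, not data from the video.

```python
import pandas as pd

# Hypothetical stand-ins: in the notebook these would come from the
# correlation matrix (e.g. matrix.toArray().tolist()) and the column names.
corrmatrix = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
columns = ["price", "area", "age"]  # assumed column names

# Label both axes so the CSV is readable on its own.
corr_pdf = pd.DataFrame(corrmatrix, index=columns, columns=columns)

# Write the labelled matrix; the file name is an arbitrary choice.
out_path = "correlation_matrix.csv"
corr_pdf.to_csv(out_path)
```

If the results already live in a Spark DataFrame, `toPandas()` followed by `to_csv(...)` should give a similar file, only without the row labels.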

  • @tutorb6975 2 years ago

    I followed your video and it worked well, but I am trying to create a heatmap instead of the table/data frame.
    import matplotlib.pyplot as plt
    import seaborn as sns
    plt.figure(figsize=(20, 18))
    #corr = Tablematrix[cor_col]
    ax = sns.heatmap(dataset.correlation.corr(),
                     cmap='viridis', vmax=1.0, vmin=-1.0, linewidths=1,
                     annot=False, annot_kws={"size": 1}, square=True)
    # Tick labels belong to the Axes returned by sns.heatmap, not to plt
    ax.set_xticklabels(ax.get_xticklabels(), rotation=30)
    plt.xlabel("X-Axis")
    plt.ylabel("Y-Axis")
    plt.show()
    How can I do this without pandas?

    • @StatsWire 2 years ago

      Hello, it's very easy to create using seaborn. In the notebook, insert a new cell below the last cell of my code, copy and paste the lines below, and you will have the plot within a minute.
      import seaborn as sns
      sns.heatmap(corrmatrix, xticklabels=df_corr.columns,
                  yticklabels=df_corr.columns, annot=True)
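Since the question asks for a route without pandas, and seaborn itself depends on pandas, here is a rough alternative sketch that draws the heatmap with matplotlib and NumPy only. The helper name `plot_corr_heatmap`, the sample matrix, the labels, and the output file name are all hypothetical. It assumes the correlation matrix is already available as a nested list or array.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_corr_heatmap(corrmatrix, labels):
    """Draw a correlation heatmap from a nested list or NumPy array."""
    data = np.asarray(corrmatrix, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 5))

    # Fix the colour scale to the full correlation range [-1, 1].
    image = ax.imshow(data, cmap="viridis", vmin=-1.0, vmax=1.0)
    fig.colorbar(image, ax=ax)

    # One tick per column, labelled with the column name.
    ticks = np.arange(len(labels))
    ax.set_xticks(ticks)
    ax.set_yticks(ticks)
    ax.set_xticklabels(labels, rotation=30, ha="right")
    ax.set_yticklabels(labels)

    # Write each coefficient into its cell (like annot=True in seaborn).
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            ax.text(j, i, f"{data[i, j]:.2f}",
                    ha="center", va="center", color="white")

    fig.tight_layout()
    return fig, ax


# Hypothetical sample values standing in for the real correlation matrix.
corrmatrix = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
labels = ["price", "area", "age"]  # assumed column names

fig, ax = plot_corr_heatmap(corrmatrix, labels)
fig.savefig("correlation_heatmap.png")  # in a notebook, plt.show() also works
```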

  • @pritishbanerjee5517 3 years ago

    Hi Amir, in Colab the following line gives an error:
    matrix = Correlation.corr(df_vector, vector_col).collect()[0][0]
    I used your house_price.csv dataset.
    Everything up to that line executed properly.

    • @StatsWire 3 years ago

      Can you paste the error message here?

    • @StatsWire 3 years ago +1

      OK, you are getting this error because there are missing values in your data frame. First, you need to either remove the missing values or impute them with the mean or median.
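A hedged sketch of both options, assuming `df` is a Spark DataFrame and `cols` lists its numeric columns. The helper name `correlation_without_nulls` and its `strategy` parameter are hypothetical and not from the video. Rows with missing values are either dropped with `dropna()` or filled using `pyspark.ml.feature.Imputer` before the vector column is assembled.

```python
def correlation_without_nulls(df, cols, strategy="drop"):
    """Correlation matrix (NumPy array) of the numeric columns `cols`.

    strategy: "drop" removes rows with missing values;
              "mean" or "median" imputes them instead.
    """
    if strategy not in ("drop", "mean", "median"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Imported here so the argument check above works without Spark installed.
    from pyspark.ml.feature import Imputer, VectorAssembler
    from pyspark.ml.stat import Correlation

    cols = list(cols)
    numeric = df.select(*cols)

    if strategy == "drop":
        # Keep only rows where every selected column has a value.
        clean, feature_cols = numeric.dropna(), cols
    else:
        # Write imputed values to new columns to avoid name clashes.
        feature_cols = [f"{c}_imputed" for c in cols]
        imputer = Imputer(inputCols=cols, outputCols=feature_cols,
                          strategy=strategy)
        clean = imputer.fit(numeric).transform(numeric)

    # Correlation.corr expects a single vector column.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    df_vector = assembler.transform(clean).select("features")

    matrix = Correlation.corr(df_vector, "features").collect()[0][0]
    return matrix.toArray()
```

Dropping rows is the simpler choice when only a few values are missing. Imputation keeps every row but can pull the correlations toward zero if many values are filled in.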

    • @pritishbanerjee5517 3 years ago +1

      @StatsWire Great, I will try it, but I used the "house_price.csv" from your GitHub. Thanks a lot, I will do the same.

    • @StatsWire 3 years ago +1

      @pritishbanerjee5517 I might have edited that CSV. I will check and reply to you. Sorry for the inconvenience, Pritish.

  • @mahdisaid3755 1 year ago

    Could you please send me the link to the house.csv dataset?

    • @StatsWire 1 year ago +1

      Hi, please find the dataset link: github.com/siddiquiamir/Data

  • @mazharalamsiddiqui6904 3 years ago

    Nice

  • @kashyaprathore6644 3 years ago

    Bro, why are 9 unavailable videos hidden?

    • @StatsWire 3 years ago

      I have scheduled them; they will be published one by one.