How to Read and Write PySpark DataFrame | Big Data PySpark Tutorial

  • Published: Feb 2, 2025

Comments • 30

  • @ravi-y7b1d (1 month ago)

    I'm facing an error while saving the file.

  • @AwaisBinMukhtar (1 year ago, +1)

    Py4JJavaError: I'm facing this error. Kindly help.

    • @BOSS-AI-20 (1 year ago)

      If that also doesn't work, paste the same file into C:/Windows/System32 and it will work fine.

    • @AwaisBinMukhtar (1 year ago)

      I just switched to Databricks and all of the issues were resolved there. There is no need to set everything up. It's just like using Colab instead of a local Jupyter notebook, so in the same way you can use Databricks for PySpark as well.

    • @BOSS-AI-20 (1 year ago)

      @AwaisBinMukhtar Nice.
      By the way, are you using the Databricks community version?

    • @QuangHuy-is7jo (10 months ago)

      @BOSS-AI-20 Can you explain in more detail?

    • @villaloboscastanedagerman1171 (10 months ago)

      Same error.

  • @villaloboscastanedagerman1171 (10 months ago)

    Py4JJavaError: An error occurred while calling o62.save.
    : java.lang.UnsatisfiedLinkError: 'boolean

    • @sameerratnaparkhi8733 (9 months ago, +2)

      Download hadoop.dll and set the path. It fixed this issue for me.
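
A minimal sketch of the "set the path" step, done from Python before the Spark session is created. It assumes hadoop.dll (and winutils.exe) already sit in a `bin` folder under the directory you pass in. The function name `configure_hadoop_home` is hypothetical, not part of PySpark, and the example path in the comment is only an illustration.

```python
import os
from pathlib import Path


def configure_hadoop_home(hadoop_home: str) -> str:
    """Set HADOOP_HOME and put its bin folder first on PATH.

    Returns the bin folder that was added. This only changes the
    environment of the current Python process, so it has to run
    before the Spark session (and its JVM) is started.
    """
    bin_dir = str(Path(hadoop_home) / "bin")
    os.environ["HADOOP_HOME"] = hadoop_home

    # Prepend the bin folder only if it is not already on PATH.
    path_entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in path_entries:
        os.environ["PATH"] = os.pathsep.join([bin_dir] + path_entries)
    return bin_dir


# Example call (assumed location; adjust to your own machine):
# configure_hadoop_home("C:/hadoop")
```

Setting the same two variables permanently through the Windows environment-variable dialog should have the same effect; the sketch is just convenient inside a notebook.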

    • @indianintrovert281 (9 months ago)

      @sameerratnaparkhi8733 Thanks, bro, it worked (saved a lot of time).

    • @monikashinde7227 (6 months ago)

      @sameerratnaparkhi8733 Thank you, this also solved the issue I had been facing for a week.

  • @grandeur_82 (8 months ago)

    I got this error:
    Py4JJavaError Traceback (most recent call last)
    Cell In[20], line 5
    1 output.write\
    2 .format("csv").mode("overwrite")\
    3 .option("path", "file:///output/op/")\
    4 .partitionBy("age")\
    ----> 5 .save()

    • @patrickwheeler7107 (5 months ago)

      Curious, did you ever get this figured out?

    • @DEMON-jg3zl (5 months ago)

      @patrickwheeler7107 Same for me. Did you figure it out?

    • @patrickwheeler7107 (5 months ago)

      @DEMON-jg3zl I haven't yet. I did some research, but I haven't had time to deep dive due to work...

    • @DEMON-jg3zl (5 months ago)

      @patrickwheeler7107 Ah, thanks at least for replying, sir. In return, I'll make sure to give you the solution to this particular problem before Monday (my free ChatGPT quota is used up for today).

    • @patrickwheeler7107 (3 months ago)

      @DEMON-jg3zl I ended up finding out that I was missing the hadoop.dll file in my System32 folder. You can google it and pull it down from a Git repo.
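
A small sketch for checking whether the files mentioned in this thread are actually present before retrying the write. The two file names come from the comments here; the helper name `missing_native_files` is hypothetical, and the folder to check (System32 or your HADOOP_HOME bin folder) is up to you.

```python
from pathlib import Path
from typing import List

# File names taken from the comments in this thread.
REQUIRED_FILES = ("hadoop.dll", "winutils.exe")


def missing_native_files(directory: str) -> List[str]:
    """Return the required file names that are not present in `directory`."""
    folder = Path(directory)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Example call (assumed location; adjust to your own machine):
# print(missing_native_files("C:/Windows/System32"))
```

An empty list means both files were found in that folder; it does not say whether they match your Spark build.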

  • @sachindubey4315 (1 year ago)

    I'm trying to write a file but I'm getting this error: " java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249)
    at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454)"
    Is there any solution for this?

    • @albertopedro8632 (1 year ago)

      I've been facing the same error.

    • @BOSS-AI-20 (1 year ago)

      Check my comment above.

    • @xx-pn7it (1 year ago)

      Is there any solution for that?

    • @villaloboscastanedagerman1171 (10 months ago)

      I have the same error.

    • @tossthefeathers4135 (9 months ago)

      @villaloboscastanedagerman1171 You need to check which versions of winutils.exe and hadoop.dll are compatible with your Spark version. For example, my Spark version is 3.5.1, and for that the compatible Hadoop version is 3.3.4 or lower.
      To find the compatible Hadoop/winutils version for any Spark version, go to your Spark folder (the SPARK_HOME location) and open the RELEASE file in Notepad. You will see something like this (in my case):
      Spark 3.5.1 (git revision fd86f85e181) built for Hadoop 3.3.4
      Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver
      So you can see that Spark 3.5.1 was built for Hadoop 3.3.4.
      In this case, go to github.com/cdarlint/winutils/tree/master and download both files, hadoop.dll and winutils.exe, from the 3.2.2 version (there is no folder for 3.3.4 and the next lowest is 3.2.2, which works).
      Now paste both files into the bin folder of HADOOP_HOME, i.e. C:/hadoop/bin/
      I did the above and it worked for me. I guess the original problem was that the winutils.exe version was not compatible with the installed Spark version.
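
The RELEASE-file lookup described above can also be scripted. This is a rough sketch that assumes the first line of RELEASE has the same shape as the one quoted in the comment ("Spark X (git revision ...) built for Hadoop Y"). The function names are hypothetical, and other Spark builds may word that line differently.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches a line such as:
#   Spark 3.5.1 (git revision fd86f85e181) built for Hadoop 3.3.4
RELEASE_PATTERN = re.compile(r"Spark\s+(\S+).*?built for Hadoop\s+(\S+)")


def parse_release_text(text: str) -> Optional[Tuple[str, str]]:
    """Return (spark_version, hadoop_version), or None if no line matches."""
    match = RELEASE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def read_spark_release(spark_home: str) -> Optional[Tuple[str, str]]:
    """Read the RELEASE file in the Spark folder and parse the versions."""
    release_file = Path(spark_home) / "RELEASE"
    return parse_release_text(release_file.read_text(encoding="utf-8"))


sample = "Spark 3.5.1 (git revision fd86f85e181) built for Hadoop 3.3.4"
print(parse_release_text(sample))  # ('3.5.1', '3.3.4')
```

With the Hadoop version in hand, you can pick the matching (or nearest lower) folder in the winutils repository as the comment describes.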