Install Apache PySpark on Windows PC | Apache Spark Installation Guide

  • Published: 27 Dec 2024

Comments • 443

  • @nftmobilegameshindi8392
    @nftmobilegameshindi8392 9 months ago +10

    spark shell not working

  • @indianintrovert281
    @indianintrovert281 7 months ago +31

    Those who are facing problems like 'spark-shell' is not recognized as an internal or external command:
    On the command prompt write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin', using your own Spark file path (include bin too),
    and then write spark-shell or pyspark (it finally worked for me, hope it works for you too).
    If it worked, like this so that more people benefit from it.

    • @SharinH
      @SharinH 7 months ago +1

      It worked .. Thank you

    • @jagjodhsingh2358
      @jagjodhsingh2358 7 months ago +1

      It worked, thanks :)

    • @Manishamkapse
      @Manishamkapse 7 months ago +1

      Thank you 😊 so much it worked

    • @vishaltanwar2238
      @vishaltanwar2238 7 months ago +1

      why did we get this error?
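
A note on the thread above: the "'spark-shell' is not recognized" error simply means the folder containing spark-shell.cmd is not on PATH for that console window, which is why cd-ing into the bin folder works. A minimal Python sketch for checking what your session actually sees (nothing here is specific to this video's setup):

    import os
    import shutil

    # None means Windows cannot resolve spark-shell from PATH in this session:
    print(shutil.which("spark-shell"))

    # The variable the video sets, and a rough check for a ...spark...\bin entry on PATH:
    print(os.environ.get("SPARK_HOME"))
    print(any("spark" in p.lower() and p.lower().rstrip("\\").endswith("bin")
              for p in os.environ.get("PATH", "").split(os.pathsep)))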

  • @joshizic6917
    @joshizic6917 1 year ago +10

    how is your spark shell running from your users directory?
    it's not running for me

    • @Sai_naga
      @Sai_naga 4 months ago +2

      did it work for you now? facing the same issue here

  • @riptideking
    @riptideking 9 months ago +2

    'pyspark' is not recognized as an internal or external command,
    operable program or batch file.
    Getting this error; tried it for the whole day and same issue.

    • @srishtimadaan03
      @srishtimadaan03 7 months ago +1

      On the command prompt write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin', using your own Spark file path (include bin too),
      and then write spark-shell or pyspark (it finally worked for me, hope it works for you too).

    • @Sai_naga
      @Sai_naga 4 months ago

      @@srishtimadaan03 hello... but we added SPARK_HOME in environment variables, what is the point of running it from the exact location? Environment variables should help the system find the command.

  • @prateektripathi3834
    @prateektripathi3834 1 year ago +5

    Did everything as per the video, still getting this error on using spark-shell: The system cannot find the path specified.

    • @srishtimadaan03
      @srishtimadaan03 7 months ago +1

      On the command prompt write 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin', using your own Spark file path (include bin too),
      and then write spark-shell or pyspark (it finally worked for me, hope it works for you too).

    • @jaymanhire
      @jaymanhire 1 month ago

      @@srishtimadaan03 Yes!

    • @youknowwhatlol6628
      @youknowwhatlol6628 1 month ago

      @@srishtimadaan03 it doesn't work for me... what do I do??

  • @ismailcute1584
    @ismailcute1584 10 months ago +5

    Thank you so much for this video. Unfortunately, I couldn't complete this - getting this error: C:\Users\Ismahil>spark-shell
    'cmd' is not recognized as an internal or external command,
    operable program or batch file. Please help

    • @JesusSevillanoZamarreno-cu5hk
      @JesusSevillanoZamarreno-cu5hk 10 months ago +1

      execute as admin

    • @johnpaulmawa4808
      @johnpaulmawa4808 6 months ago +1

      @@JesusSevillanoZamarreno-cu5hk You are the bestest and sweetest in the world

    • @frankcastelo9987
      @frankcastelo9987 2 months ago +1

      I was having the same issue as you, and it turned out to work simply by doing what Jesus said (OMG!): "Run it as admin". Thanks everyone... Indeed, Jesus saves us!!

  • @sisterkeys
    @sisterkeys 1 year ago +3

    What I was doing in 2 days, you narrowed to 30 mins!! Thank you!!

    • @ampcode
      @ampcode  11 months ago

      Thank you so much! Subscribe for more content 😊

  • @donjuancapistrano2382
    @donjuancapistrano2382 1 month ago

    The best video on installing pyspark, even in 2024. Many thanks to the author!

    • @playtrip7528
      @playtrip7528 1 month ago +2

      which spark version did u download?

    • @donjuancapistrano2382
      @donjuancapistrano2382 1 month ago

      @playtrip7528 I downloaded 3.5.3, pre-built for Hadoop 3.3, with 3.0.0 winutils

  • @rayudusunkavalli2318
    @rayudusunkavalli2318 10 months ago +5

    i did every step you said, but spark is still not working

  • @meditationmellowmelodies7901
    @meditationmellowmelodies7901 8 months ago +2

    I followed all the steps but am getting the error
    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

    • @Mralbersan
      @Mralbersan 7 months ago +1

      the same happens to me

    • @indianintrovert281
      @indianintrovert281 7 months ago +1

      Facing the same error. Did you find any solution for it?

  • @anandbagate2347
    @anandbagate2347 2 months ago +2

    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

  • @ArtificialIntelligenceColombia
    @ArtificialIntelligenceColombia 4 months ago

    WHAT A PROCESS!! It worked for me just by running spark-shell in cmd as ADMIN. Thank you for the video!

  • @ramnisanthsimhadri3161
    @ramnisanthsimhadri3161 7 months ago +4

    I am not able to find the package type "pre-built for Apache Hadoop 2.7" in the drop-down. FYI - the Spark release versions I can see are 3.4.3 and 3.5.1.

    • @mindcoder4823
      @mindcoder4823 2 months ago

      How did you solve this? I am running into the same issue

  • @ipheiman3658
    @ipheiman3658 1 year ago +3

    This worked so well for me :-) The pace is great and your explanations are clear. I am so glad I came across this, thanks a million! 😄 I have subscribed to your channel!!

  • @arnoldochris5082
    @arnoldochris5082 1 year ago +13

    Ok guys, this is how to do it, in case you are having problems👇
    1.) I used the latest version 3.5.0 (pre-built for Apache Hadoop 3.3 or later) - downloaded it.
    2.) Extracted the zip file just as done in the video; the first time it gave me a file, not a folder - a .rar file which WinRAR could not unzip, so I used 7-Zip and it finally extracted to a folder that had the bins and all the other files.
    3.) In the system variables he forgot to edit the Path variable and add %SPARK_HOME%\bin.
    4.) Downloaded winutils.exe for Hadoop 3.0.0 from the link provided in the video.
    5.) Added it the same way, but as C:\Hadoop\bin\winutils.exe.
    6.) Then edit the user variables as done in the video, then do the same for the Path: %HADOOP_HOME%\bin.
    Reply for any parts you might have failed to understand🙂

    • @MANALROGUI
      @MANALROGUI 1 year ago

      What do you mean by the 3rd step?

    • @stay7485
      @stay7485 1 year ago

      Thanks

    • @ampcode
      @ampcode  11 months ago

      Thank you so much 😊

    • @sarahq6497
      @sarahq6497 7 months ago +3

      Hello, I had to use the latest version as well, but I'm not able to make it work. I followed the tutorial exactly :(

    • @Sai_naga
      @Sai_naga 4 months ago +1

      @@sarahq6497 me too... when I run the spark-shell command from the exact Spark location in cmd, it works... but when I run it right after opening cmd, it doesn't; it gives an error like spark-shell is not found
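
The checklist above matches what the launcher scripts expect: SPARK_HOME and HADOOP_HOME set, and both bin folders on Path (and a freshly opened cmd window, since existing windows keep the old environment). For use from plain Python or a notebook, the same wiring can be done per session; a minimal sketch, assuming a pip-installed pyspark whose version matches the extracted build, with illustrative paths:

    import os

    # Illustrative locations; substitute your own extract folders:
    os.environ["SPARK_HOME"] = r"C:\Spark\spark-3.5.0-bin-hadoop3"
    os.environ["HADOOP_HOME"] = r"C:\Hadoop"   # must contain bin\winutils.exe
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        os.environ["PATH"] += os.pathsep + os.path.join(os.environ[var], "bin")

    from pyspark.sql import SparkSession

    # Starts a local session; Java must still be installed and discoverable:
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    print(spark.version)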

  • @prashanthnm3406
    @prashanthnm3406 6 months ago +1

    Thanks bro, fixed it after struggling for 2 days, 2 nights, 2 hours and 9 mins.

    • @nickcheruiyot9069
      @nickcheruiyot9069 6 months ago

      Hello, I have been trying to install it for some days too. I keep getting a "command is not recognized" error when I try to run spark-shell. Any suggestions?

  • @BOSS-AI-20
    @BOSS-AI-20 1 year ago +4

    In cmd the command spark-shell runs only under the C:\Spark\spark-3.5.0-bin-hadoop3\bin directory, not globally;
    same for pyspark

    • @s_a_i5809
      @s_a_i5809 1 year ago +3

      yeah man, same for me.. did you find any fixes? If so, let me know :)

    • @BOSS-AI-20
      @BOSS-AI-20 1 year ago

      @@s_a_i5809 add your environment variables under System variables, not User variables.

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100 % working solution
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=lzXq4Ts7ywqG-vZg

    • @lucaswolff5504
      @lucaswolff5504 9 months ago

      I added C:\Program Files\spark\spark-3.5.1-bin-hadoop3\bin to the system variables and it worked

    • @BOSS-AI-20
      @BOSS-AI-20 9 months ago

      @@lucaswolff5504 yes

  • @laxman0457
    @laxman0457 1 year ago +3

    I have followed all your steps, still I'm facing an issue:
    'spark2-shell' is not recognized as an internal or external command

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables rather than User Variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step spark + PySpark in pycharm solution video
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=aaITbbN7ggnczQTc

  • @harshithareddy5087
    @harshithareddy5087 11 months ago +3

    I don't have the option for Hadoop 2.7; what to choose now???

    • @LLM_np
      @LLM_np 10 months ago

      did you get any solution?
      please let me know

    • @geetalimatta
      @geetalimatta 3 months ago

      @@LLM_np NO

  • @neeleshgaikwad6387
    @neeleshgaikwad6387 1 year ago +2

    Very helpful video. Just by following the steps you mentioned I could run Spark on my Windows laptop. Thanks a lot for making this video!!

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!😊

    • @iniyaninba489
      @iniyaninba489 1 year ago

      @@ampcode bro, I followed every step you said, but in CMD when I typed "spark-shell", it displayed "'spark-shell' is not recognized as an internal or external command,
      operable program or batch file." Do you know how to solve this?

    • @sssssshreyas
      @sssssshreyas 7 months ago

      @@iniyaninba489 add the same path in the User Variables Path also, just like how u added it in the System Variables Path

  • @rakesh.kandula
    @rakesh.kandula 1 year ago +3

    Hi, I followed the exact steps (installed Spark 3.2.4 as that is the only version available for Hadoop 2.7). The spark-shell command is working but pyspark is throwing errors.
    If anyone has a fix for this please help me.
    Thanks

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step solution
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=aaITbbN7ggnczQTc

  • @amitkumarpatel7762
    @amitkumarpatel7762 9 months ago +5

    I have followed the whole instruction, but when I run spark-shell it is not recognised

    • @JustinLi-y6q
      @JustinLi-y6q 2 months ago +2

      same here

    • @mindcoder4823
      @mindcoder4823 2 months ago +1

      @@JustinLi-y6q did you get it? I was having the same issue, but I downgraded my Java version to 17 and it is now working fine. Java 23 is not compatible with Spark 3.x, I think; it did not work for me.

  • @sibrajbanerjee6297
    @sibrajbanerjee6297 6 months ago +1

    I am getting a message that 'spark-version' is not recognized as an internal or external command,
    operable program or batch file. This is after setting up the path in environment variables for PYSPARK_HOME.

    • @Sai_naga
      @Sai_naga 4 months ago

      try running as administrator.

  • @saikrishnareddy3474
    @saikrishnareddy3474 1 year ago +3

    I'm a little confused about how to set up the PYTHONHOME environment variable

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=aaITbbN7ggnczQTc

  • @alireza2295
    @alireza2295 3 months ago

    Great. I followed the instructions and successfully installed spark. Thank you!

  • @anthonyuwaifo8605
    @anthonyuwaifo8605 1 year ago +2

    I got the below error while running Spyder even though I have added the PYTHONPATH.
    File ~\anaconda\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)
    File c:\users\justa\.spyder-py3\temp.py:26
    df = spark.createDataFrame(data = data, schema = columns)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1276 in createDataFrame
    return self._create_dataframe(
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:1318 in _create_dataframe
    rdd, struct = self._createFromLocal(map(prepare, data), schema)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:962 in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
    File ~\anaconda\lib\site-packages\pyspark\sql\session.py:834 in _inferSchemaFromList
    infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement()
    File ~\anaconda\lib\site-packages\py4j\java_gateway.py:1322 in __call__
    return_value = get_return_value(
    File ~\anaconda\lib\site-packages\pyspark\errors\exceptions\captured.py:169 in deco
    return f(*a, **kw)
    File ~\anaconda\lib\site-packages\py4j\protocol.py:330 in get_return_value
    raise Py4JError(
    Py4JError: An error occurred while calling o29.legacyInferArrayTypeFromFirstElement. Trace:
    py4j.Py4JException: Method legacyInferArrayTypeFromFirstElement([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1623)

    • @ampcode
      @ampcode  1 year ago

      Sorry for late response. Could you please check if you are able to run spark-submit using cmd?

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100 % working solution
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=lzXq4Ts7ywqG-vZg
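
For anyone else who lands on this Py4JError: "Method ... does not exist" is the classic signature of a version mismatch, i.e. the pip-installed pyspark inside Anaconda talking to the jars of a different Spark release via SPARK_HOME. A quick way to compare the two sides (spark-submit --version prints the jar side):

    # Python side of the mismatch check:
    import pyspark
    print(pyspark.__version__)   # compare against `spark-submit --version` in cmd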

  • @satishboddula4942
    @satishboddula4942 1 year ago +3

    I have done exactly as shown in the tutorial, but when I run the spark-shell command in cmd I get "spark-shell
    The system cannot find the path specified."

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +2

      yes, same error.. did you find out the solution?

    • @satishboddula4942
      @satishboddula4942 1 year ago +4

      @@ganeshkalaivani6250 yes, Spark doesn't work with the latest Java and Python versions; try Java 1.8, Python 3.7 and the Hadoop 2.7 build of Spark

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +1

      @@satishboddula4942 can you please share the Java 1.8 download link? The JDK page is showing only versions 18, 19 and 20

    • @ganeshkalaivani6250
      @ganeshkalaivani6250 1 year ago +1

      @@satishboddula4942 still getting the "system cannot find the path specified" error

    • @shashankkkk
      @shashankkkk 1 year ago +3

      add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the env var Path

  • @rahmaesam2732
    @rahmaesam2732 1 month ago

    hadoop is still not recognized; even with your installation it gives a warning message
    "unable to load native-hadoop library"

  • @nagarajgotur
    @nagarajgotur 1 year ago +2

    spark-shell is working for me, but pyspark is not working from the home directory; getting error 'C:\Users\Sana>pyspark
    '#' is not recognized as an internal or external command,
    operable program or batch file.'
    But when I go to the Python path and run the cmd, pyspark works. I have set up the SPARK_HOME and PYSPARK_HOME environment variables. Could you please help me. Thanks

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set PYSPARK_HOME to your python.exe path. I hope this will solve the issue😅👍

    • @bintujose1981
      @bintujose1981 1 year ago +1

      @@ampcode nope. Same error
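
One caveat on naming for this thread: as far as I can tell, the pyspark launcher reads PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) rather than PYSPARK_HOME, so pointing those at python.exe is the setting that matters. A sketch of the per-session equivalent (system-wide, set the same two variables in Environment Variables instead):

    import os
    import sys

    # Point Spark at a concrete interpreter, here the one running this script:
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
    print(os.environ["PYSPARK_PYTHON"])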

  • @priyankashekhawat6174
    @priyankashekhawat6174 2 months ago

    Very good and amazing content. You cannot find a better place than this video to set up pyspark (Y).

  • @sanchitabhattacharya353
    @sanchitabhattacharya353 10 months ago +1

    while launching the spark-shell I'm getting the following error, any idea??
    WARN jline: Failed to load history
    java.nio.file.AccessDeniedException: C:\Users\sanch\.scala_history_jline3

  • @AkshayNagendra
    @AkshayNagendra 1 year ago +1

    I followed all the steps but I'm getting this error
    'spark-shell' is not recognized as an internal or external command, operable program or batch file

    • @Karansingh-xw2ss
      @Karansingh-xw2ss 1 year ago

      Yeah I'm also facing this same issue

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100 % working solution
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=lzXq4Ts7ywqG-vZg

  • @abhinavtiwari6186
    @abhinavtiwari6186 1 year ago +2

    after 11:17 I am getting this error:
    'spark-shell' is not recognized as an internal or external command, operable program or batch file.
    I have checked the environment variables too.

    • @ampcode
      @ampcode  1 year ago +2

      Hello.. sorry for the late response... could you please navigate to the Spark bin folder, open CMD there and kick off the spark-shell command? If Spark works fine in the bin directory, then it is definitely an issue with the environment variables.
      Please let me know if any difficulties. :)

    • @abhinavtiwari6186
      @abhinavtiwari6186 1 year ago +3

      @@ampcode now this is the error I am getting after getting into the bin folder:
      C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin> spark-shell
      The system cannot find the path specified.

    • @abhinavtiwari6186
      @abhinavtiwari6186 1 year ago +5

      My problem finally got solved tonight... I needed to add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable Path

    • @ampcode
      @ampcode  1 year ago +1

      I'm very glad you solved your problem. Cheers!

    • @aswinjoseph
      @aswinjoseph 1 year ago

      @@abhinavtiwari6186 I tried this also, but same issue: "The system cannot find the path specified"

  • @kchavan67
    @kchavan67 1 year ago +1

    Hi, following all the steps given in the video, I am still getting the error "cannot recognize spark-shell as internal or external command" @Ampcode

    • @psychoticgoldphish5797
      @psychoticgoldphish5797 1 year ago

      I was having this issue as well; when I added %SPARK_HOME%\bin, %HADOOP_HOME%\bin and %JAVA_HOME%\bin to the User variables (top box; in the video he shows doing System, bottom box) it worked. Good luck.

    • @thedataguyfromB
      @thedataguyfromB 1 year ago

      Step by step spark + PySpark in pycharm solution video
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=aaITbbN7ggnczQTc

  • @Ohisthisyou
    @Ohisthisyou 9 days ago

    can someone help? I have downloaded Hadoop 3.3, which is the newest version, but it is not showing in the winutils GitHub repo. What to do?

  • @YohanTharakan
    @YohanTharakan 1 year ago +7

    Hi, I completed the process step by step and everything else is working, but when I run 'spark-shell' it shows: 'spark-shell' is not recognized as an internal or external command,
    operable program or batch file. Do you know what went wrong?

    • @viniciusfigueiredo6740
      @viniciusfigueiredo6740 1 year ago +1

      I'm having this same problem, the command only works if I run CMD as an administrator. Did you manage to solve it?

    • @hulkbaiyo8512
      @hulkbaiyo8512 1 year ago

      @@viniciusfigueiredo6740 same as you; running as administrator works

    • @shivamsrivastava4337
      @shivamsrivastava4337 1 year ago +1

      @@viniciusfigueiredo6740 same issue is happening with me

    • @RohitRajKodimala
      @RohitRajKodimala 1 year ago

      @@viniciusfigueiredo6740 same issue for me, did u fix it?

    • @santaw
      @santaw 1 year ago +2

      Anyone solved this?

  • @AbhiShek-m6s
    @AbhiShek-m6s 29 days ago

    I did everything up to the environment variables setup; still, when using cmd, spark-shell gives me "'spark-shell' is not recognized as an internal or external command,
    operable program or batch file."
    Versions I used -
    For Java:
    java version "11.0.24" 2024-07-16 LTS
    Java(TM) SE Runtime Environment 18.9 (build 11.0.24+7-LTS-271)
    Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.24+7-LTS-271, mixed mode)
    For Python:
    Python 3.11.0rc2
    For Spark:
    spark-3.5.3-bin-hadoop3
    For Hadoop: (file from the below location)
    winutils/hadoop-3.3.6/bin/winutils.exe

  • @yashusachdeva
    @yashusachdeva 10 months ago

    It worked, my friend. The instructions were concise and straightforward.

  • @Jerriehomie
    @Jerriehomie 1 year ago +2

    Getting this error: "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped." People have mentioned using the Python folder path, which I have, as you mentioned, but still.

    • @bukunmiadebanjo9684
      @bukunmiadebanjo9684 1 year ago +1

      I found a fix for this. Change your Python path to that of Anaconda (within the environment variable section of this video) and use your Anaconda command prompt instead. No errors will pop up again.

    • @ampcode
      @ampcode  1 year ago

      Sorry for late response. Could you please let me know if you are still facing this issue and also confirm if you’re able to open spark-shell?

    • @shivalipurwar7205
      @shivalipurwar7205 1 year ago +1

      @@bukunmiadebanjo9684 Hi Adebanjo, my error got resolved with your solution. Thanks for your help!

  • @geetakavalad8983
    @geetakavalad8983 1 month ago

    I have followed all the steps and added all the system variables, but at that time the winutils file was not present on my system

    • @geetakavalad8983
      @geetakavalad8983 1 month ago

      Now I have that file; how to make the changes, plz let me know

  • @gbs7212
    @gbs7212 1 month ago

    thank you so much, very helpful! The only error I got was running spark-shell, but from other comments I figured out that you can either run the command prompt as admin or cd into the spark folder and then call it

  • @nihalisahu3857
    @nihalisahu3857 3 months ago

    in CMD while running spark-shell I'm getting an error like ERROR SparkContext: Error initializing SparkContext.

  • @susmayonzon9198
    @susmayonzon9198 1 year ago +2

    Excellent! Thank you for making this helpful lecture! You relieved my headache, and I did not give up.

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

    • @moathmtour1798
      @moathmtour1798 1 year ago +1

      hey, which version of Hadoop did you install? Because 2.7 wasn't available

  • @coclegend715
    @coclegend715 1 year ago +1

    everything is working fine until I run "pyspark" in my command prompt, which shows an error: "ERROR: The process with PID 38016 (child process of PID 30404) could not be terminated.
    Reason: There is no running instance of the task.
    ERROR: The process with PID 30404 (child process of PID 7412) could not be terminated.
    Reason: There is no running instance of the task."

  • @anastariq1310
    @anastariq1310 1 year ago +1

    After entering pyspark in cmd it shows "The system cannot find the path specified. Files\Python310\python.exe was unexpected at this time". Please help me resolve it

    • @mahamudullah_yt
      @mahamudullah_yt 1 year ago

      I face the same problem. Is there any solution?

  • @Manoj-ed3lj
    @Manoj-ed3lj 6 months ago

    installed successfully, but when I check the hadoop version I get an error like "hadoop is not recognized as an internal or external command"

  • @cloudandsqlwithpython
    @cloudandsqlwithpython 1 year ago +1

    Great! Got SPARK working on Windows 10 -- good work!

    • @ampcode
      @ampcode  11 months ago

      Thank you so much! Subscribe for more content 😊

  • @AnuragPatel-y9j
    @AnuragPatel-y9j 1 year ago +1

    ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
    I am getting the above error while running a spark or pyspark session.
    I have ensured that the winutils file is present in C:\hadoop\bin

    • @ampcode
      @ampcode  1 year ago

      Could you please let me know if all your env variables are set properly?

  • @badnaambalak364
    @badnaambalak364 11 months ago +1

    I followed the steps & installed JDK 17, Spark 3.5 and Python 3.12. When I try to use the map function I get a Py4JJavaError: "An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe". Please someone help me

  • @somanathking4694
    @somanathking4694 8 months ago

    This works as smooth as butter. Be patient, that's it! Once setup is done, no looking back.

    • @SUDARSANCHAKRADHARAkula
      @SUDARSANCHAKRADHARAkula 8 months ago

      Bro, which version of spark & winutils did you download? I took 3.5.1 and hadoop-3.0.0/bin/winutils but it didn't work

    • @meriemmouzai2147
      @meriemmouzai2147 7 months ago

      @@SUDARSANCHAKRADHARAkula same for me!

  • @AmreenKhan-dd3lf
    @AmreenKhan-dd3lf 5 months ago

    The Apache Hadoop 2.7 option is not available during the Spark download. Can we choose "Apache Hadoop 3.3 and later (Scala 2.13)" as the package type during download?

  • @shankarikarunamoorthy4391
    @shankarikarunamoorthy4391 7 months ago

    sir, the Spark version is available with Hadoop 3.0 only. spark-shell is not recognized as an internal or external command. Please do help.

  • @khushboojain3883
    @khushboojain3883 1 year ago +1

    Hi, I have installed Hadoop 3.3 (the latest one) as 2.7 was not available. But while downloading winutils, there isn't one for Hadoop 3.3 in the repository. Where do I get it from?

    • @sriram_L
      @sriram_L 1 year ago

      Same here. Did u get it now?

    • @khushboojain3883
      @khushboojain3883 1 year ago

      @@sriram_L yes, u can get it directly from Google by simply mentioning the Hadoop version for which u want winutils. I hope this helps.

    • @hritwikbhaumik5622
      @hritwikbhaumik5622 1 year ago

      @@sriram_L it's still not working for me though

  • @karthikeyinikarthikeyini380
    @karthikeyinikarthikeyini380 1 year ago +1

    the hadoop 2.7 tar file is not available at the link

    • @ankitgupta5446
      @ankitgupta5446 1 year ago

      100 % working solution
      ruclips.net/video/jO9wZGEsPRo/видео.htmlsi=lzXq4Ts7ywqG-vZg

  • @nagalakshmip8725
    @nagalakshmip8725 8 months ago

    I'm getting "spark-shell is not recognised as an internal or external command, operable program or batch file"

  • @Kartik-vy1rh
    @Kartik-vy1rh 1 year ago +1

    The video is very helpful. Thanks for sharing

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

  • @edu_tech7594
    @edu_tech7594 1 year ago +1

    the Apache Hadoop I downloaded previously is version 3.3.4, even though I should choose pre-built for Apache Hadoop 2.7?

    • @sriram_L
      @sriram_L 1 year ago

      Same doubt bro.
      Did u install it now?

  • @sriramsivaraman4100
    @sriramsivaraman4100 1 year ago +2

    Hello, when I try to run the command spark-shell as a local user it's not working (not recognized as an internal or external command), and it only works if I use it as an administrator. Can you please help me solve this? Thanks.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please try running the same command from the spark/bin directory and let me know. I guess there might be some issues with your environment variables🤔

    • @dishantgupta1489
      @dishantgupta1489 1 year ago

      @@ampcode followed each and every step of the video, still getting the "not recognised as an internal or external command" error

    • @ayonbanerjee1969
      @ayonbanerjee1969 1 year ago

      @@dishantgupta1489 open a fresh cmd prompt window and try after you save the environment variables

    • @obulureddy7519
      @obulureddy7519 1 year ago

      In Environment Variables, put the paths in the User variables for Admin, NOT in System variables

  • @ashwinnair2325
    @ashwinnair2325 6 months ago

    thanks a lot, pyspark is opening, but when executing the df.show() command on a dataframe I get the below error:
    Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    Is there any way to rectify it?
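
This one has a well-known cause: on Windows the worker processes try to launch an interpreter named python3, which usually does not exist, so df.show() dies the moment work actually executes. A minimal sketch of the usual fix, assuming a pip-installed pyspark:

    import os
    import sys

    # Make workers use the same interpreter as the driver, before the session starts:
    os.environ["PYSPARK_PYTHON"] = sys.executable

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"]).show()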

  • @Mralbersan
    @Mralbersan 7 months ago

    I can't see "Pre-built for Apache Hadoop 2.7" on the Spark website

    • @meriemmouzai2147
      @meriemmouzai2147 7 months ago

      same problem for me! I tried the "3.3 and later" version with the "winutils/hadoop-3.0.0/bin", but it didn't work

  • @prasadbarla7215
    @prasadbarla7215 22 days ago

    Spark runs only on Java 8 or 11; it doesn't work with the latest version, I've tried it

  • @chinmayapallai8452
    @chinmayapallai8452 1 year ago +1

    I have followed exactly what you did while you explained; I observed and did the same thing, but both spark-shell and pyspark are not working. Can you please help me resolve the issue? After opening cmd and typing spark-shell, it shows "spark-shell is not recognised as an internal or external command", and the same for spark also. Please help me overcome this 🙏🙏🙏🙏🙏🙏🙏

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables rather than User Variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

  • @alpha_ak-p3h
    @alpha_ak-p3h 3 months ago

    not getting the UI; it says: docker refused to connect

  • @prajakta-dh7fc
    @prajakta-dh7fc 7 months ago

    'spark' is not recognized as an internal or external command,
    operable program or batch file. It's not working for me; I have followed all the steps but it's still not working. Waiting for a solution

  • @SupravaMishra-e4d
    @SupravaMishra-e4d 29 days ago

    I am getting errors continuously after doing the same procedure as well. Please reply to me.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    can anyone please help... for the last two days I've tried to install Spark and gave the correct variable path, but I'm still getting "system path not specified"

    • @ampcode
      @ampcode  1 year ago

      Sorry for late reply. Could you please check if your spark-shell is running properly from the bin folder. If yes I guess there are some issues with your env variables only. Please let me know.

  • @touhidalam4825
    @touhidalam4825 3 months ago

    I'm getting a "bad constant pool index" error. Please help

  • @viniciusfigueiredo6740
    @viniciusfigueiredo6740 1 year ago +1

    I followed the step by step, and when I run spark-shell at the command prompt I come across the message ('spark-shell' is not recognized as a built-in command or external, an operable program or a batch file). I installed Windows on another HD and did everything right; there are more people with this problem, can you help us? I've been trying since January to use pyspark on Windows

    • @letsexplorewithzak3614
      @letsexplorewithzak3614 1 year ago +1

      Need to edit the Path at the bottom; add this to the env var Path:
      C:\Spark\spark-3.3.1-bin-hadoop2\bin\

    • @kiranmore29
      @kiranmore29 1 year ago

      @@letsexplorewithzak3614 Thanks, worked for me

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables rather than User Variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

    • @jayakrishnayashwanth7358
      @jayakrishnayashwanth7358 1 year ago

      @@nayanagrawal9878 Even I'm facing the same issue; can you tell in more detail what to add in System variables? As we already added Java, Hadoop, Spark and PYSPARK_HOME in the User variables as said in the video.

    • @penninahgathu7956
      @penninahgathu7956 10 months ago

      @@nayanagrawal9878 thank you!!! I did this and it solved my problem

  • @ankushv2642
    @ankushv2642 1 year ago

    Did not work for me. At last, when I typed pyspark in the command prompt, it did not work.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    FileNotFoundError: [WinError 2] The system cannot find the file specified - getting this error even though I have installed everything required

    • @ampcode
      @ampcode  1 year ago

      Sorry for late reply. I hope your issue is resolved. If not we can have a connect and discuss further on it!

  • @nikhilupmanyu8804
    @nikhilupmanyu8804 10 months ago

    Hi, thanks for the steps. I am unable to see the Web UI after installing pyspark. It gives "This URL can't be reached". Kindly help

  • @chinmaymishra6381
    @chinmaymishra6381 1 year ago +1

    the winutils file is not downloading from that GitHub link

    • @sriram_L
      @sriram_L 1 year ago

      Yes brother. Did u get it now from anywhere?

  • @infamousprince88
    @infamousprince88 5 months ago

    I'm still unable to get this to work. I've been trying to solve this problem for nearly 2 weeks

  • @pulkitdikshit9474
    @pulkitdikshit9474 8 months ago

    hi, I installed it, but when I restarted my PC it is no longer running from cmd. What might be the issue?

  • @Bujdil-y8z
    @Bujdil-y8z 1 year ago

    not working for me; I set up everything, except the Hadoop version came with 3.0

  • @basanthaider3238
    @basanthaider3238 1 year ago

    I have an issue with pyspark: it's not working, and it's related to a Java class. I can't really understand what is wrong???

  • @alulatafere6008
    @alulatafere6008 6 months ago

    Thank you! It is clear and very helpful!! From Ethiopia

  • @itsshehri
    @itsshehri 1 year ago +1

    hey, pyspark isn't working on my pc. I did everything as you showed. Can you help please

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set the PYSPARK_HOME env variable to the python.exe path. I guess this'll do the trick😅👍

  • @theefullstackdev
    @theefullstackdev 1 year ago

    and when downloading Spark, a set of files came to download, not the tar file

  • @user-zk4hm2cy8l
    @user-zk4hm2cy8l 3 months ago

    If you tried all the steps mentioned above and it still does not work, try to add "C:\Windows\System32" to system variable "path". It fixed the error after 2 days of struggling

  • @nuzairmohamed5345
    @nuzairmohamed5345 1 year ago +1

    I get a ModuleNotFoundError saying pyspark does not contain the numpy module. I followed all the steps. Can you please help??

    • @ampcode
      @ampcode  1 year ago

      Hello, Are you trying to use numpy in your code. If so, have you installed pandas package? Please let me know so we can solve this issue😃

    • @nuzairmohamed5345
      @nuzairmohamed5345 1 year ago +1

      ​@@ampcode how to install pandas in pyspark

    • @ampcode
      @ampcode  1 year ago

      @@nuzairmohamed5345 you can run the command as below:
      pip install pandas
      Please let me know if any issues.
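
To round off this thread: pandas is installed into Python with pip as above, not into Spark itself; PySpark then picks it up for conversions. A small sketch, assuming pyspark and pandas are both installed:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    pdf = spark.range(5).toPandas()   # Spark DataFrame -> pandas DataFrame
    print(type(pdf), len(pdf))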

  • @nikhilchavan7741
    @nikhilchavan7741 1 year ago

    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file. -- Getting this error

    • @nayanagrawal9878
      @nayanagrawal9878 1 year ago

      Do everything he said, but in System variables rather than User Variables. I was facing the same problem, but then I did the same in System variables and my Spark started running.

  • @Analystmate
    @Analystmate 1 year ago

    C:\Users\lavdeepk>spark-shell
    'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.
    Not working

    • @syamprasad8295
      @syamprasad8295 1 year ago

      which winutils file did u download? Is it the Hadoop 2.7 or a later version?

  • @bramhanaskari3152
    @bramhanaskari3152 1 year ago +1

    you haven't given a solution for that WARN ProcfsMetricsGetter exception; is there any solution for that?

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. This happens on Windows only and can be safely ignored. Could you please confirm whether you're able to kick off spark-shell and pyspark?
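
For readers who want the console quieter rather than just ignoring it, the WARN chatter can be turned down once a session exists; a one-line sketch (inside the pyspark shell the spark object is already created for you):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")   # hides WARNs such as ProcfsMetricsGetter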

  • @Nathisri
    @Nathisri 1 year ago +1

    I have some issues launching python & pyspark. I need some help. Can you pls help me?

  • @manikantaperumalla2197
    @manikantaperumalla2197 6 months ago

    Should java, python and spark be in the same directory?

  • @theefullstackdev
    @theefullstackdev 1 year ago

    I have followed all these steps, installed those 3 and created the paths too, but when I go to check in the command prompt... it's not working.. an error came... can anyone help me please to correct this

  • @anuraggupta5665
    @anuraggupta5665 1 month ago

    Hi @AmpCode
    Thanks for the great tutorial.
    I followed each step and Spark is working fine.
    But when I'm executing some of my pyspark scripts, I'm getting the below Hadoop error:
    ERROR SparkContext: Error initializing SparkContext.
    java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
    Can you please help me on this urgently..
    I have set all the paths as you showed in the video but I'm not able to solve this error.
    Please help.

  • @antonstsezhkin6578
    @antonstsezhkin6578 1 year ago +8

    Excellent tutorial! I followed along and nothing worked in the end :)
    StackOverflow told me that "C:\Windows\system32" is also required in the PATH variable for spark to work. I added it and spark started working.

    • @Manojprapagar
      @Manojprapagar 1 year ago +1

      helped

    • @antonstsezhkin6578
      @antonstsezhkin6578 1 year ago +2

      @@Manojprapagar happy to hear it!

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

    • @conroybless
      @conroybless 3 months ago

      This was the game changer. Also check that the extracted spark folder isn't a folder inside another folder (3 clicks to see the files). It should just be the spark folder you created, and inside that folder another folder with the extracted spark files (2 clicks to see the files).
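
The System32 tip above most likely also explains the "'cmd' is not recognized" reports earlier in the comments: spark-shell.cmd and pyspark.cmd delegate to cmd.exe, so a PATH missing C:\Windows\System32 breaks them before Spark even starts. A quick Python check that the directory is still on PATH:

    import os

    # True if some ...\Windows\System32 entry is still on PATH:
    print(any(p.rstrip("\\").lower().endswith(r"windows\system32")
              for p in os.environ.get("PATH", "").split(os.pathsep)))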

  • @gosmart_always
    @gosmart_always 1 year ago

    Every now and then we receive an alert from Oracle to upgrade the JDK. Do we need to upgrade our JDK version? If we upgrade, will it impact the running of spark?

  • @shahrahul5872
    @shahrahul5872 1 year ago +1

    on Apache Spark's installation page, under "choose a package type", the 2.7 version seems to no longer be an option as of 04/28/2023. What to do?

    • @shahrahul5872
      @shahrahul5872 1 year ago +2

      I was able to get around this by manually copying the URL of the page you are taken to after selecting the 2.7 version from the dropdown. Seems like they have archived it.

    • @ampcode
      @ampcode  1 year ago

      Sorry for late reply. I hope your issue is resolved. If not we can discuss further on it!

  • @chittardhar8861
    @chittardhar8861 1 year ago +1

    my spark-shell command works when opened from the bin folder, but it's not working in normal cmd; please help

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Then this might be an issue with your environment variables. Could you please verify that they are set correctly and let me know.

    • @chittardhar8861
      @chittardhar8861 1 year ago +1

      @@ampcode yup, I had to add 1 more environment variable, which I got to know from other comments. Your video is great. Thank you so much.

    • @ampcode
      @ampcode  1 year ago

      @@chittardhar8861 Thank you so much😊

    • @UManfromEarth
      @UManfromEarth 1 year ago

      @@chittardhar8861 Hi, did you add it in the System variables or User variables? (Speaking about adding C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin to the environment variable Path, right?) So frustrating that it is not working @AmpCode

    • @chittardhar8861
      @chittardhar8861 1 year ago

      @@UManfromEarth I did it in the System variable Path.

  • @moathmtour1798
    @moathmtour1798 1 year ago +1

    hello, which Hadoop version should I install since 2.7 is not available anymore? thanks in advance

    • @ampcode
      @ampcode  1 year ago

      You can go ahead and install the latest one as well. No issues!

    • @venkatramnagarajan2302
      @venkatramnagarajan2302 1 year ago

      @@ampcode Will the winutils file still be the 2.7 version?

  • @richardalphonse2680
    @richardalphonse2680 9 months ago

    Bro, while executing spark-shell I'm getting an error:
    ReplGlobal.abort: bad constant pool index: 0 at pos: 49180
    [init] error:
    bad constant pool index: 0 at pos: 49180
    while compiling:
    during phase: globalPhase=, enteringPhase=
    library version: version 2.12.17
    compiler version: version 2.12.17
    reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
    epl-3fc51940-943d-416d-ab37-074575e4ad8d
    last tree to typer: EmptyTree
    tree position:
    tree tpe:
    symbol: null
    call site: in
    == Source file context for tree position ==
    Exception in thread "main" scala.reflect.internal.FatalError:
    bad constant pool index: 0 at pos: 49180
    while compiling:
    during phase: globalPhase=, enteringPhase=
    library version: version 2.12.17
    compiler version: version 2.12.17
    reconstructed args: -classpath -Yrepl-class-based -Yrepl-outdir C:\Users\HP\AppData\Local\Temp\spark-f4a4c1ed-e79a-4179-9492-a41e66431c1b
    epl-3fc51940-943d-416d-ab37-074575e4ad8d

  • @abhinavtiwari6186
    @abhinavtiwari6186 1 year ago +1

    where is that git repository link? It's not there in the description box below

    • @ampcode
      @ampcode  1 year ago +1

      Extremely sorry for that. I have added it in the description as well as pasting it here.
      GitHUB: github.com/steveloughran/winutils
      Hope this is helpful! :)
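
After downloading from that repo, placement is the usual stumbling block: the file has to end up exactly at %HADOOP_HOME%\bin\winutils.exe. A quick check, using the C:\Hadoop location from the comments above as the fallback:

    import os

    hadoop_home = os.environ.get("HADOOP_HOME", r"C:\Hadoop")
    print(os.path.exists(os.path.join(hadoop_home, "bin", "winutils.exe")))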

  • @ДаниилСидоров-ж3и
    @ДаниилСидоров-ж3и 2 months ago

    Man, I love you.
    Thank you for this video!!!

  • @Adhikash015
    @Adhikash015 1 year ago +1

    Bhai, bro, Brother, Thank you so much for this video

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

  • @ashwinkumar5223
    @ashwinkumar5223 1 year ago +1

    Getting "spark-shell is not recognized as an internal or external command"

    • @shashankkkk
      @shashankkkk 1 year ago +1

      add C:\Apache Spark\spark-3.3.1-bin-hadoop2\bin\ to the env var path

    • @ampcode
      @ampcode  1 year ago

      Sorry for late reply. I hope your issue is resolved. If not we can have a connect and discuss further on it!

  • @syafiq3420
    @syafiq3420 1 year ago +1

    how did you download Apache Spark as a zipped file? mine was downloaded as a tgz file

    • @ampcode
      @ampcode  1 year ago

      Sorry for late response. You’ll get both options on their official website. Could you please check if you are using the right link?

    • @georgematies2521
      @georgematies2521 1 year ago

      @@ampcode There is no way now to download the zip file, only tgz.

  • @SupravaMishra-e4d
    @SupravaMishra-e4d 29 days ago

    Spark-Shell is not running