Multiprocessing in Python

  • Published: Jan 8, 2025

Comments • 122

  • @AecapA
    @AecapA 2 years ago +133

    TUTORIAL IS WRONG!
    What is wrong:
    1. When the sub-process is created, all the names (variables) from the original process are copied, but they are NOT shared with the original process. When you append to a list in the sub-process, the list in the original process stays empty.
    This style of creating sub-processes in the video is very similar to C++ and its library. But in C++, to get the results of a calculation you pass a function a pointer to a specific memory address to save the results to, which is NOT the case in Python!
    Basically, results_a in the sub-process is not the same object as results_a in the original process.
    If you don't believe me, go ahead and try printing the results_a list after the calculations in the sub-process! It will be empty.
    2. After starting the sub-process p1, you must JOIN it with the p1.join() command. This command waits for the p1 sub-process to finish.
    In the tutorial, the program just starts the sub-processes, but they may or may not be finished by the time the original process ends (if it ends before the sub-processes finish, it will throw an error, I believe). And the time that you measured is the time it took to CREATE all the sub-processes, not to finish them.
    3. As many commenters have said, temp_a = results_a does NOT make a copy of the list! After this line of code, temp_a and results_a "point" to the same list, and changing results_a will change temp_a. To check this, try print(results_a is temp_a): this will print True! Therefore print(results_a == temp_a) will always be True, even if the calculations in the sub-process are wrong (which they are).
    I have watched some videos from this channel before (for example, the video "Progress Bars in Python Terminal" was very useful), but some videos are too short to cover all the nuances (which is very important in my opinion) and some are just catchpenny, botched and harmful for the viewer (like this one)! I really hope it is not intentional.
    I wish the author good luck, and I hope he will be more responsible when creating videos.
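
    Point 1 can be checked with a minimal sketch. It assumes a worker that appends squares to a module-level list; the video's actual workload is not shown in this thread, and make_calculation is a made-up name. The child appends to its own copy of results_a, so the parent still sees an empty list after join().

```python
import multiprocessing as mp

results_a = []  # this list lives in the parent process


def make_calculation(numbers):
    # Hypothetical worker: in a child process this appends to the
    # child's own copy of results_a, not to the parent's list.
    for number in numbers:
        results_a.append(number * number)


if __name__ == "__main__":
    p1 = mp.Process(target=make_calculation, args=([1, 2, 3],))
    p1.start()
    p1.join()          # wait for the child to finish
    print(results_a)   # prints [] because the parent's list was never touched
```

    Calling make_calculation([1, 2, 3]) directly in the parent does fill the list, which is why the sequential half of such a comparison looks fine.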

    • @MrJdcirbo
      @MrJdcirbo 2 years ago +2

      So, how would you code so that the subprocesses actually append the calculations to the appropriate lists?
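
      One possible answer, as a hedged sketch rather than the video's code: let each worker return its results and collect them in the parent through concurrent.futures.ProcessPoolExecutor. The function name square_all and the squaring workload are stand-ins invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def square_all(numbers):
    # Return the results instead of appending to a global list,
    # so they can be sent back to the parent process.
    return [number * number for number in numbers]


if __name__ == "__main__":
    inputs = [range(0, 5), range(5, 10), range(10, 15)]
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(square_all, list(chunk)) for chunk in inputs]
        # result() blocks until the corresponding child has finished.
        results_a, results_b, results_c = [f.result() for f in futures]
    print(results_a, results_b, results_c)
```

      Because result() waits for each task, no separate join() is needed here, and any timing taken after the with block covers the finished work.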

    • @sandman9601
      @sandman9601 1 year ago +8

      I appreciate the video and learned a fair amount, but I was going to post something similar to Aecapa. The fact that the video split the work into 3 threads but saw a 7x speedup should have been a red flag.

    • @MrManshenoy
      @MrManshenoy 18 days ago

      Thanks for pointing out this issue, even I thought the same.

  • @ThomasRStevenson
    @ThomasRStevenson 2 years ago +127

    Don't you need a p1.join() to wait for all of the processes to finish? Isn't the time you are recording just the time for the processes to start, but not to finish (without the p1.join())?
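
    A small sketch of the effect, assuming a stand-in task that just sleeps (slow_task and timed_run are hypothetical names, not from the video). Without join() the stopwatch stops as soon as the processes have been launched; with join() it also covers the work itself.

```python
import multiprocessing as mp
import time


def slow_task(seconds):
    time.sleep(seconds)  # stand-in for a real calculation


def timed_run(wait_for_children):
    processes = [mp.Process(target=slow_task, args=(0.5,)) for _ in range(3)]
    start = time.perf_counter()
    for p in processes:
        p.start()
    if wait_for_children:
        for p in processes:
            p.join()  # block until each child has finished
    elapsed = time.perf_counter() - start
    for p in processes:
        p.join()  # always clean up, even in the "no join" measurement
    return elapsed


if __name__ == "__main__":
    print(f"without join: {timed_run(False):.3f} s")  # launch cost only
    print(f"with join:    {timed_run(True):.3f} s")   # at least 0.5 s
```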

  • @alierencelik2188
    @alierencelik2188 3 years ago +33

    There is an issue with the list comparisons. You don't copy the elements of the list; rather, you create another variable pointing at the same list. No matter what calculations are made, they will always be equal. Thanks for the video btw.

  • @__wouks__
    @__wouks__ 3 years ago +31

    I am not sure if I am correct on this, but you declare your results_a, results_b and results_c lists and then append your calculations to these lists. Then you basically point to the same lists with your temp_a, temp_b and temp_c variables and append again to results_a, results_b and results_c. The thing is, as I understand it, temp_a and results_a refer to the exact same list and therefore you are appending to both, which means temp_a and results_a will always be the same. If you use the "is" keyword instead of "==" it will also show True. They will also have the same id.

    • @quiagonjin108
      @quiagonjin108 3 years ago +1

      I was just about to write this. Thanks!

    • @chriswyble1860
      @chriswyble1860 2 years ago +2

      temp_a = results_a.copy() and so on should be able to remedy this issue.
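
      The difference between a second name and a copy can be seen in a few lines. This is an illustrative sketch with made-up values; alias and snapshot are names chosen here, not taken from the video.

```python
results_a = [1, 4, 9]

alias = results_a             # a second name for the same list object
snapshot = results_a.copy()   # a new list holding the same elements

results_a.append(16)

print(alias is results_a)     # True: both names refer to one object
print(alias == results_a)     # True: so the comparison can never fail
print(snapshot is results_a)  # False: a separate object
print(snapshot == results_a)  # False: snapshot is still [1, 4, 9]
```

      Note that list.copy() is a shallow copy; for nested lists, copy.deepcopy would be needed to get fully independent data.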

  • @davidpaipa1751
    @davidpaipa1751 3 years ago +4

    I discovered this channel a couple of weeks ago and I love it !

  • @ItsCloudHub
    @ItsCloudHub 1 year ago +3

    After starting the processes, I think we have to check that all of them have completed, using .join(), I guess,
    whereas you're printing the end time immediately after starting the processes, so I'm not sure the overall logic is correct

  • @richard_noblockhit
    @richard_noblockhit 3 years ago +3

    9:35 Uhh, idk if im stupid, but those lists are mutable, you would need to deepcopy them to actually compare them and be sure that they end up with the same result. I trust you tho, this is supposed to work. otherwise well done!

  • @jairoantoniomeloflorez306
    @jairoantoniomeloflorez306 3 years ago +9

    Man, thanks a lot for this mini-lesson :) It was very useful for overcoming an issue that I had with a current project.

  • @boazfortuijn4774
    @boazfortuijn4774 1 year ago

    I watched like two videos on this, but this one was by far the clearest. Thanks!!

  • @I77AGIC
    @I77AGIC 2 years ago +3

    This needs an edit or even a full reupload. It doesn't make sense to say that there is a 7x speedup when you are only using 3 processes. At 100% efficiency you would finish 3 times faster, and even that won't really happen. You're just measuring the time it takes to launch the processes.

  • @shwekhine5836
    @shwekhine5836 2 months ago

    Thank you for this tutorial. Your demo was clear, short and to the point.

    • @NeuralNine
      @NeuralNine 2 months ago

      Thanks for watching!

  • @InspektorDreyfus
    @InspektorDreyfus 2 years ago +5

    There should be a waiting point until the three parallel processes are finished before taking the end time stamp.

  • @ottomarjo7395
    @ottomarjo7395 1 year ago

    This video finally helped me understand how multiprocessing works. Thank you!

  • @walnut7137
    @walnut7137 1 year ago

    Tysm for this video! It’s super useful for developing my physics sandbox in python. It’s found a great use!
    Incredible video overall!

  • @RickyMoore-g7l
    @RickyMoore-g7l 1 year ago +1

    As always, it was very helpful.

  • @jossec1344
    @jossec1344 10 months ago

    thanks, that was straight to the point, love your work!

  • @anonlegion9300
    @anonlegion9300 3 years ago +4

    how do you differentiate multi-threading vs multiprocessing? like which use case is good for multi-threading vs multiprocessing? which is faster?
    Also, why aren't you developing on a linux distro?

    • @phantombeing3015
      @phantombeing3015 2 years ago +1

      Processes and threads are OS terms. When you launch a program, a process is spawned. It has a thread. You can add threads to a process. Now, when you do multiprocessing, you are essentially launching a new program which does its own job and has its own thread
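
      A short sketch of the thread half of that description (record_pid and run_in_thread are names invented here). A thread runs inside the launching process, so it reports the same process ID and its changes to a shared list are visible afterwards. The same target run through multiprocessing.Process would record a different ID in its own copy of the list.

```python
import os
import threading

shared = []


def record_pid():
    # Store the ID of the process this function is running in.
    shared.append(os.getpid())


def run_in_thread():
    t = threading.Thread(target=record_pid)
    t.start()
    t.join()
    return list(shared)


if __name__ == "__main__":
    # The recorded PID matches the parent's PID, and the append is visible.
    print(os.getpid(), run_in_thread())
```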

  • @muthurubant5812
    @muthurubant5812 3 years ago +1

    You are 😳😳😳.. I don't have words to appreciate you bro.. 🔥🔥🔥🔥

  • @sf2998
    @sf2998 2 years ago +1

    Asserting that Python is slow is actually incorrect. Python program speed may differ significantly depending on many factors such as how you structure your code, and which code editor you choose to run your programs on...etc...etc

  • @panconqueso9195
    @panconqueso9195 1 year ago

    wow, this is amazing. After implementing multiprocessing my code went from 20 minutes to 30 seconds.

    • @MuhammdBilalAzaad
      @MuhammdBilalAzaad 1 year ago +1

      You should read @AecapA comment before calculating run time

  • @Tntpker
    @Tntpker 3 years ago +1

    Do you need to join() at the end?

  • @ИванИванов-м4л6ц
    @ИванИванов-м4л6ц 4 months ago

    Hello! I use Threading with Selenium, download 2 profiles from the temple! through threadings, I pass the link and --profile-directory to args. The problem is that on one thread selenium follows the link, on the other the browser opens and that's it... does not click on the link..... and if I remove the path to the profiles and their directories, everything works fine! thank you in advance!!!

  • @XPxp2012xpXP
    @XPxp2012xpXP 2 years ago +1

    But when I print out the a,b, and c after the mp calculation, they are empty lists.

  • @dinohsu1019
    @dinohsu1019 10 months ago

    It seems multiprocessing doesn't directly work with Jupyterlab (or IPython, not sure), one workaround is to have the worker function in a separate .py file and imported. Also, I am not sure about the high level concurrent module, but I think it's important to let new multiprocessing learners know about this at the beginning, thanks.

  • @lightgrid
    @lightgrid 1 year ago +1

    You were correct the first time. It's pronounced processes not processeeez

  • @atharvasrivastava5281
    @atharvasrivastava5281 3 years ago +1

    Loved this new video, was actually searching on how to use multiprocessing to speed up factorial and Fibonacci computations!

    • @chamuditharavindu1659
      @chamuditharavindu1659 3 years ago +1

      Lol me too 😅

    • @Xaminn
      @Xaminn 3 years ago +1

      @lru_cache is your buddy, friend. :) www.geeksforgeeks.org/python-functools-lru_cache/
      tl;dr:
      from functools import lru_cache
      Basically just toss @lru_cache(maxsize=N), with N an int, as a decorator on your function.
      Your shit is about to fly lol. (Especially if you're using multiprocessing as well.)
      # Function that computes Fibonacci
      # numbers with lru_cache
      @lru_cache(maxsize=128)
      def fib_with_cache(n):
          if n < 2:
              return n
          return fib_with_cache(n-1) + fib_with_cache(n-2)

    • @fernandomagnabosco
      @fernandomagnabosco 3 years ago +2

      The faster way to solve the Fibonacci sequence isn't by using recursive functions but instead by implementing the general formula of the sequence.
      (Sorry if I misspelled, not a native English speaker)
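
      Assuming "the general formula" means Binet's closed form, F(n) = round(phi^n / sqrt(5)), a sketch could look like this. The function names are made up for illustration. The closed form relies on floating-point arithmetic, so it is only exact for moderately small n (roughly up to n = 70 with double precision); the iterative version stays exact for any n.

```python
import math


def fib_closed_form(n):
    # Binet's formula, rounded to the nearest integer.
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)


def fib_iterative(n):
    # Exact for any n because Python integers do not overflow.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(fib_closed_form(10), fib_iterative(10))  # 55 55
```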

    • @Xaminn
      @Xaminn 3 years ago +2

      @@fernandomagnabosco I believe you are correct. The general formula will outpace recursion. Especially as 'n' increases. Fibonacci sequence is one of the slowest things to compute though. (Excluding BOGO sort of course :) ). Your English was flawless by the way. Cheers.

    • @atharvasrivastava5281
      @atharvasrivastava5281 3 years ago

      @@fernandomagnabosco Do you mean the golden ratio recurrence relation?

  • @huanranchen
    @huanranchen 2 years ago +1

    so how to get the return value of these functions..

  • @amrihidayatulloh6883
    @amrihidayatulloh6883 2 years ago

    Thank you man! it was very useful for my project, thanks for the tutorial

  • @nintendo6269
    @nintendo6269 2 years ago +1

    Hi! Thanks for a starting point. There are some things missing.
    1- The processes are never joined with join().
    2- You never reset the results lists, so, technically, you append the same results again.
    3- Taking 2 as true, how in the world are the temps equal to the results lists? Check the same code with some small changes:
    import multiprocessing as mp
    import time

    results_a = 0
    results_b = 0
    results_c = 0

    def make_calc_1(numbers):
        global results_a
        for number in numbers:
            results_a += (number * 2)

    def make_calc_2(numbers):
        global results_b
        for number in numbers:
            results_b += (number * 2)

    def make_calc_3(numbers):
        global results_c
        for number in numbers:
            results_c += (number * 2)

    if __name__ == '__main__':
        numbers = list(range(2))   # up to 1 : Result = (1 * 2) = 2
        numbers2 = list(range(3))  # up to 2 : Result = (1 * 2) + (2 * 2) = 6
        numbers3 = list(range(4))  # up to 3 : Result = (1 * 2) + (2 * 2) + (3 * 2) = 12
        tini = time.time()
        make_calc_1(numbers)
        make_calc_2(numbers2)
        make_calc_3(numbers3)
        tfin = time.time()
        print(" Sequential time : ", (tfin - tini))
        print(" A : ", results_a, "\n B : ", results_b, "\n C : ", results_c)
        tini = tfin
        # Reset results values to zero (0)
        results_a = 0
        results_b = 0
        results_c = 0

        # target must be the function itself (not its name as a string)
        # and args must be a tuple, hence the trailing comma
        p1 = mp.Process(target=make_calc_1, args=(numbers,))
        p2 = mp.Process(target=make_calc_2, args=(numbers2,))
        p3 = mp.Process(target=make_calc_3, args=(numbers3,))
        p1.start()
        p2.start()
        p3.start()
        p1.join()
        p2.join()
        p3.join()
        # Whatever you choose ("terminate()" or "kill()") produces the same results
        p1.terminate()
        p2.terminate()
        p3.terminate()

        tfin = time.time()
        print("\n Multiprocess time : ", (tfin - tini))
        # Prints 0, 0, 0: the children changed their own copies of the globals
        print(" A : ", results_a, "\n B : ", results_b, "\n C : ", results_c)
    Anyway, thanks for the references and for trying to keep it simple.

    • @Alex-yo8ph
      @Alex-yo8ph 1 year ago

      @Nin Tendo Hi, when numbers, numbers2 and numbers3 are set big enough (1,000,000, 2,000,000, 3,000,000), the Sequential time I get is 0.5 seconds and the Multiprocess time is 1 second. Could you explain why the 1st part is faster than the 2nd one?

    • @seekii777
      @seekii777 1 year ago

      I only noticed 2-, was looking to find a comment like this :) regarding 3-: I think the result lists were neither global nor returned by the functions? that would mean basically comparing empty lists.

  • @suryaravikumar4210
    @suryaravikumar4210 3 years ago +3

    yeah buddy, thanks this is what i expected from you

  • @official_anik10
    @official_anik10 1 year ago

    I am learning this multiprocessing thing and have been searching the internet. Suppose I have two lists of samples and I want to add those lists element-wise: how do I do it with multiprocessing? I asked ChatGPT, but the solution it gives produces an error. The main problem is: suppose I have a sample dict with params like 'a', 'b', 'c', each with 1000 samples. Now I want to find the value of a function for each sample point (like sample['a'][i] + sample['b'][i] + sample['c'][i]; the actual function is more complicated). How do I make the calculation parallel over the 1000 sample points?

  • @Gametime00789
    @Gametime00789 7 months ago

    I'm deploying a Python API file on Amazon EC2 where one function uses multiprocessing to parallelize some data processing, with 2 CPU cores assigned.
    Now if this API is deployed and receives 100 concurrent calls, will it fail because it exceeded the processor capacity, or for any other reason?

  • @chriskeo392
    @chriskeo392 2 years ago

    any news when gil will be removed or any speed ups?

  • @MuffineousMuffin
    @MuffineousMuffin 3 years ago +1

    what keyboard do you use?

  • @alejozen3457
    @alejozen3457 3 years ago +4

    Nice content. But I think there is a mistake when you try to compare the results. I think you are not actually getting the results from the parallel computation.

    • @hftautobot4246
      @hftautobot4246 2 years ago

      yeah, the results of the parallel computation are empty. However, if you print the elements at the end of each method, you can see it is computing. Also the time for multiprocessing is not accurate since the threads are still running and doing the computation.

  • @chenwu9788
    @chenwu9788 2 years ago

    Do you have example using numpy multidimension array?

  • @Matt-Wolf
    @Matt-Wolf 9 months ago

    executing this crashed my RAM, i have 24 cores, but only 16g ram memory, maybe that has something to do with it?

  • @frodddd
    @frodddd 1 year ago

    Can you use the await keyword to yield the stack until the process ends?

  • @RicoNNect82nd
    @RicoNNect82nd 3 years ago

    From this example, how can I achieve that function one repeats itself until function two finishes? Do I handle it in the subprocess itself or in the part where it's declared/started?

    • @pokemettilp8872
      @pokemettilp8872 3 years ago +1

      I am not really sure, but seeing that he used global variables (results_a, results_b, ...), you could probably have a global variable for that too. Something like:
      func2_finished = False
      def func1():
          while not func2_finished:
              pass  # Your code here
      def func2():
          global func2_finished
          # Your code here
          func2_finished = True
      I am not too sure though, but maybe you could try this.
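
      One caveat: a plain global is copied into each process, so a flag set in one process is not seen by the other. A hedged alternative is multiprocessing.Event, which is designed to be shared. The sketch below keeps the names func1, func2 and func2_finished from the comment; the sleep calls are placeholders for real work.

```python
import multiprocessing as mp
import time


def func1(func2_finished):
    rounds = 0
    # Repeat until the other process signals that it is done.
    while not func2_finished.is_set():
        rounds += 1          # your repeated work here
        time.sleep(0.05)
    print(f"func1 repeated {rounds} times")
    return rounds


def func2(func2_finished):
    time.sleep(0.3)          # your work here
    func2_finished.set()     # this change is visible to the other process


if __name__ == "__main__":
    func2_finished = mp.Event()
    p1 = mp.Process(target=func1, args=(func2_finished,))
    p2 = mp.Process(target=func2, args=(func2_finished,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```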

  • @kn9987
    @kn9987 2 years ago

    Hi, is it possible to start multi instances of particular process with single object at the same time without finishing 1st process

  • @poutineausyropderable7108
    @poutineausyropderable7108 2 years ago

    Can you have multiprocessing if its code being imported?

  • @adamfatyga7977
    @adamfatyga7977 1 year ago

    How do I check how many processes I can run at one time?

  • @Wavy_Cat
    @Wavy_Cat 2 years ago +2

    Btw. all results and temp variables are empty. So technically the code doesn't work

    • @Itamar960
      @Itamar960 2 years ago

      u right. did you fix it somehow?

    • @Wavy_Cat
      @Wavy_Cat 2 years ago

      @@Itamar960 no i didnt fix it

  • @rverm1000
    @rverm1000 3 years ago

    yes, that was very good. can you do a video where you're running the same strategy on multiple stocks?

  • @robertzedric8954
    @robertzedric8954 2 years ago

    Super helpful, thanks!

  • @sujatabasu282
    @sujatabasu282 2 years ago

    Only 3 processes are used here, but if I want to run my code on 50 processors then I have to write the process creation 50 times, which is impractical. How can this be solved??

  • @smanzoli
    @smanzoli 2 years ago +1

    Where are the 3 join()??????

  • @Itamar960
    @Itamar960 2 years ago

    TNX it's really helpful!
    I got some error: "File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()" do u know why?
    also, the array is empty

  • @tonymiri7010
    @tonymiri7010 3 years ago

    I bet you could really take advantage of multi-processing by using it with a JIT compiler like Numba

  • @lsnlst1725
    @lsnlst1725 1 year ago

    Hi, nice video. It could be very useful if you add another video explaining how to extract the return value when you use Process and some error handling notions. The most typical ones.

  • @nicecubin
    @nicecubin 3 years ago

    literally was just learning about this

  • @ahmedeveloper
    @ahmedeveloper 8 months ago

    AttributeError: module 'multiprocessing' has no attribute 'Process'

  • @pokemettilp8872
    @pokemettilp8872 3 years ago

    How could you get a 7-times improvement if you only spawned three processes though?

    • @faz7799
      @faz7799 2 years ago

      20.9 divided by 2.9 equals 7 (rounded down from 7.138). If you multiply 2.9 by 7 you should get around 20.9. Hence 7x faster. You can also say 600% faster than using only one processor.
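
      The arithmetic is not really the issue; the question is whether the ratio is plausible. As a rough sanity check, using the rounded timings quoted in this reply and the three processes mentioned elsewhere in the thread, the reported speedup can be compared with the ideal bound of one-times-per-process (variable names are made up here):

```python
sequential_seconds = 20.9  # timings as quoted in the reply above
parallel_seconds = 2.9
processes = 3              # number of processes started in the video

speedup = sequential_seconds / parallel_seconds
print(f"reported speedup: {speedup:.1f}x")
print(f"ideal upper bound with {processes} processes: about {processes}x")

if speedup > processes:
    # A ratio above the process count usually means the timer stopped
    # before the children had finished (no join() before the end time).
    print("suspicious: was the end time taken before join()?")
```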

  • @MuffineousMuffin
    @MuffineousMuffin 3 years ago

    Also, is there a limit on how many processes I can run at the same time? Is it based on the number of cores my cpu has?

    • @ethan7930
      @ethan7930 3 years ago

      yes it is

    • @pokemettilp8872
      @pokemettilp8872 3 years ago

      No, the number of processes is not really limited, if there is a limit, it is set by the operating system. Just Windows alone has over 100 processes running at startup on my dual core. Using more processes than you have cores doesn't speed your program up though.
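
      For CPU-bound work, a common starting point is one worker per logical core, which the standard library reports through os.cpu_count(). A minimal sketch (suggested_workers is a made-up helper name):

```python
import os


def suggested_workers():
    # os.cpu_count() returns the number of logical CPUs,
    # or None if it cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"logical CPUs available: {suggested_workers()}")
```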

    • @sujatabasu282
      @sujatabasu282 2 years ago

      Only 3 processes are used here, but if I want to run my code on 50 processors then I have to write the process creation 50 times, which is impractical. How can this be solved??

    • @NewLondonMarshall
      @NewLondonMarshall 1 year ago

      @@sujatabasu282 there are multiprocess pools that can be used for this :)
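
      A sketch of the pool idea, using concurrent.futures.ProcessPoolExecutor from the standard library (multiprocessing.Pool offers a similar map() method). The pool creates the worker processes once and distributes the inputs among them, so nothing has to be written 50 times. make_calculation is a stand-in function for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def make_calculation(number):
    return number * number  # stand-in for the real work


if __name__ == "__main__":
    numbers = range(50)
    # With no max_workers argument the pool size defaults to a value
    # based on the number of CPUs on the machine.
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(make_calculation, numbers))
    print(results[:5])  # [0, 1, 4, 9, 16]; map() preserves input order
```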

  • @krishgarg2806
    @krishgarg2806 3 years ago

    VIM with pycharm tutorial?

  • @chakradharcholleti6722
    @chakradharcholleti6722 3 years ago

    Name of intro music?

  • @1Aditya1
    @1Aditya1 3 years ago

    How do you center a list of words?

    • @1Aditya1
      @1Aditya1 3 years ago

      like
      list = ["a", "b" , "c"]
      list2 = [" a ", " b ", " c "]

    • @pw5687
      @pw5687 3 years ago

      @@1Aditya1 that is adding whitespace - you would have to do:
      list = ["a", "b" , "c"]
      final_list = []
      for item in list:
          final_list.append(' '+item+' ')

    • @pw5687
      @pw5687 3 years ago

      btw I haven't tested this but I'm pretty sure it would work

    • @1Aditya1
      @1Aditya1 3 years ago

      @@pw5687
      list = [["a", "b"], ["1", "2"], ["c", "d"]]
      center_list = [2, 3]
      output_list = [[" a ", " b "], [" 1 "," 2 "], [" c ", " d "]] 
      For this one?

  • @sameerdubal
    @sameerdubal 1 year ago

    super video .. nice !!

  • @mrmuranga
    @mrmuranga 3 years ago

    Well explained

  • @simssim262
    @simssim262 3 years ago

    Bro please make a tutorial on a seq2seq chat bot it would be a great project please consider it

  • @marwantronx682
    @marwantronx682 1 year ago

    please share code with us in next videos.

  • @yyhhttcccyyhhttccc6694
    @yyhhttcccyyhhttccc6694 8 days ago

    i keep getting errors bruh

  • @aghoribgmi4355
    @aghoribgmi4355 2 years ago +2

    The result of time comparison is wrong. When you are starting the process and not joining it before moving forward then it gives a wrong value.
    I am not a full time coder but I can spot these things. Shut down your channel 😡

  • @arnavmeena525
    @arnavmeena525 3 years ago

    I liked the old intro...

  • @Unprotected1232
    @Unprotected1232 1 year ago

    Methods like join() are so poorly defined in the documentation. Like bloody hell who wrote the damn thing? I miss Java...

  • @kishanbhargav7655
    @kishanbhargav7655 2 years ago

    thanks bro

  • @shashwatm4332
    @shashwatm4332 3 years ago +1

    Nice

  • @cavalorthpoe
    @cavalorthpoe 1 year ago

    why do you even talk about comments when you just ignore all of them anyway...

  • @karinxpw
    @karinxpw 3 years ago +1

    Hiii

  • @Python-r9q
    @Python-r9q 4 months ago

    Not up to the standard, NeuralNine

  • @b391i
    @b391i 3 years ago +1

    Threading And Multi-Processing Are The Same As I Guess 🤔

    • @SparePlayss
      @SparePlayss 3 years ago

      you didn't watch the full video I guess..

    • @necaton
      @necaton 3 years ago

      its basically the same

    • @b391i
      @b391i 3 years ago

      @@SparePlayss There are two modules in Python with the same job: one named "threading" and the second named "multiprocessing".

    • @pw5687
      @pw5687 3 years ago +3

      @@b391i they are very different. mp runs each task in a separate process (core) but mt shares the threads of a core, and it isn't parallel (unlike mp, which is)

    • @Xaminn
      @Xaminn 3 years ago +1

      If you're truly interested:
      Detailed explanation: blog.floydhub.com/multiprocessing-vs-threading-in-python-what-every-data-scientist-needs-to-know/
      Quick explanation: www.python-engineer.com/courses/advancedpython/15-thread-vs-process/
      TL;DR:
      Threading: 1 Savant Dutch knitter who can thread faster than anyone... But she only has two hands so the knitting is only about 1000 knots-per-minute...
      Multiprocessing: 100 Teletubby knitters who are questionable... At best. But still 100 of them.
      Jk, about the TL;DR... Look into the links.

  • @samsonwong5505
    @samsonwong5505 2 years ago

    I have been learning Python for only a few months and even I can tell that this tutorial is wrong. Please be responsible and remove this video, or at least correct yourself and re-post this topic.

  • @royler8848
    @royler8848 2 years ago

    I can't even begin to stress how many flaws and mistakes this video has

  • @HelmutQ
    @HelmutQ 3 months ago

    Really an inspiring video, however, with a few caveats pointed out below. The list comparison makes no sense; the child processes don't change the variables in the parent process. A time advantage exceeding the number of employed processors is unreal. I would really consider taking down this video because it risks overshadowing the otherwise excellent quality of your channel.

  • @MrFredazo
    @MrFredazo 1 year ago

    Beware, many things are so wrong here.

  • @joaco_crous
    @joaco_crous 3 years ago

    420 Likes.

  • @tatia-6232
    @tatia-6232 1 year ago

    cap

  • @joshdheda8776
    @joshdheda8776 1 year ago

    i don't get it... you guys on youtube doing these tutorials are LITERALLY IT/CompSci/etc. people, yet most of you don't have the common sense to upload your 720p and above resolution in 24 fps. WHY use 60 fps? WHY? why such a huge bandwidth jump if we want to watch in 720p rather than 480p? take my criticism constructively, not everyone has infinite data or high speed connections

  • @IzUrBoiKK
    @IzUrBoiKK 3 years ago +1

    !

  • @andrej2321
    @andrej2321 1 year ago

    Horrible. As others have mentioned, buggy and outright incorrect. Please do not adopt these examples for your code.

  • @milosmiljkovic5975
    @milosmiljkovic5975 9 months ago

    Please don't make guides on something you're clueless about.
