Haskell for Imperative Programmers #33 - Parallelism

  • Published: 28 Dec 2024

Comments • 28

  • @sirnawaz
    @sirnawaz 4 years ago +2

    At around 44:40, why did you use `parBuffer 100 rseq` and not `parListChunk 100 rseq`? What is the difference between them, BTW? I'm implementing a ray tracer in Haskell, and for an image of size `100x56` I'm computing a chunk of pixels in parallel. So I tried both `parBuffer 64 rseq` and `parListChunk 64 rseq`, and the former creates `5600` sparks (which does not make sense to me) whereas the latter creates only `88` sparks (which actually makes sense: 100 * 56 / 64 ≈ 88, after all)

    • @philipphagenlocher
      @philipphagenlocher  4 years ago +4

      parListChunk evaluates the whole list in parallel by first dividing it into chunks which are then evaluated. parBuffer buffers the next n-1 elements (in parallel) on the evaluation of the head of the list. It only computes the next elements if they are needed. In general, if the size of the list is known and the whole list has to be evaluated then parListChunk is fine. parBuffer has the benefit of working on lazy lists (even infinite lists which would not work with parListChunk). That's why I wanted to demonstrate parBuffer!
      Regarding your example: parListChunk splits the list into chunks which are then evaluated by the sparks (one chunk per spark), so your calculation of 88 sparks is correct. parBuffer creates one spark for every single element, but only in "lazy" steps of the specified size. Since you have 100x56 = 5600 elements, that's where the number of sparks is coming from. So in your example, your list is split up into "chunks" of 64 elements that are then being evaluated elementwise in parallel.
      I shouldn't have said that parBuffer is doing chunking, because that's not really what is happening. It builds a "rolling buffer" of a certain size, evaluates that buffer in parallel, and only shifts the buffer once it is needed.
      I'm pinning your comment so hopefully this will clear up some confusion!
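      The difference can be sketched in a few lines (a minimal example, assuming the `parallel` package; `pixel`, `pixels`, `chunked`, and `buffered` are illustrative names standing in for the ray tracer's per-pixel work, not code from the video):

      ```haskell
      import Control.Parallel.Strategies (parBuffer, parListChunk, rseq, using)

      -- Toy per-element computation standing in for a pixel's ray-tracing work.
      pixel :: Int -> Int
      pixel n = sum [1 .. n `mod` 100]

      pixels :: [Int]
      pixels = map pixel [1 .. 5600]

      -- parListChunk: the list is split into chunks of 64 and one spark is
      -- created per chunk, so 5600 elements yield ceil(5600 / 64) = 88 sparks.
      chunked :: [Int]
      chunked = pixels `using` parListChunk 64 rseq

      -- parBuffer: one spark per element, created in a rolling window of 64
      -- elements ahead of the consumer, so 5600 sparks in total. Unlike
      -- parListChunk it also works on lazy (even infinite) lists.
      buffered :: [Int]
      buffered = pixels `using` parBuffer 64 rseq

      main :: IO ()
      main = print (chunked == buffered)  -- True: only scheduling differs
      ```

      Both strategies compute the same list; the choice only changes how the work is carved into sparks.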

    • @sirnawaz
      @sirnawaz 4 years ago

      @@philipphagenlocher the "rolling buffer" analogy makes sense, and that is what I had imagined. However, it seems that `parBuffer` is badly designed, because IF it waits for the buffer (of size 64) to fill before it evaluates, then it could have used chunks of size 8 for each spark (I have 8 cores, so 64/8 = 8-sized chunks). Or, IF it does not wait for the buffer to fill, then the argument `64` is not very useful and `parBuffer` could avoid taking this argument from the user.
      Or maybe, there is something that I'm still missing. ::wondering::

    • @philipphagenlocher
      @philipphagenlocher  4 years ago +2

      @@sirnawaz "parBuffer n" is not really waiting for the buffer to fill. It waits for the evaluation of the head. Once the head is being evaluated, n-1 new sparks are created (one spark per element) and parBuffer waits for the next unevaluated element to be evaluated at some point. The reason parBuffer is implemented that way is to facilitate parallel computation for lazy (possibly infinite) lists. If you want to use additional chunking within the rolling buffer, you probably have to implement it yourself.
      According to the documentation chunking might change in the future, so maybe we'll get a better system at some point! ;)
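      The lazy-list point is easy to demonstrate (a minimal sketch, assuming the `parallel` package; `doubled` is an illustrative name):

      ```haskell
      import Control.Parallel.Strategies (parBuffer, rseq, using)

      -- parBuffer on an infinite list: sparks are only created for the rolling
      -- window of 100 elements ahead of whatever the consumer demands, so
      -- taking a finite prefix terminates. parListChunk would diverge here,
      -- because it walks the whole spine of the list up front.
      doubled :: [Int]
      doubled = map (* 2) [1 ..] `using` parBuffer 100 rseq

      main :: IO ()
      main = print (take 10 doubled)  -- [2,4,6,8,10,12,14,16,18,20]
      ```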

  • @lucianobarletta8283
    @lucianobarletta8283 4 years ago +4

    Almost an hour, that's a big jump! Still gonna watch it, obviously

  • @SriKadimisetty
    @SriKadimisetty 4 years ago

    I was studying the same thing, so this is great: it saved me time with the proper CLI & GHC options and with getting the hang of ThreadScope as well. Thank you much!

  • @nadaxiong
    @nadaxiong 4 years ago +2

    Hi, thx again for the great video! May I know where you learned all this? From the GHC source?

    • @philipphagenlocher
      @philipphagenlocher  4 years ago +2

      My main sources are the Haskell Wiki, "Parallel and Concurrent Programming in Haskell" by Simon Marlow and "Haskell Programming From First Principles" by Christopher Allen and Julie Moronuki.
      Thanks for watching!

  • @valcron-1000
    @valcron-1000 4 years ago +8

    You're doing God's work! Thanks for this

  • @sirnawaz
    @sirnawaz 4 years ago

    Awesome, as usual!
    However, at 22:40, one spark is fizzled and you tried to explain it, but the explanation lacks proper reasoning. You said it is executed by the main thread. Why exactly? After all, we used `rpar` on both, and then `rseq` on both, which is supposed to wait for completion BEFORE it returns. So why would the main thread execute it?

    • @philipphagenlocher
      @philipphagenlocher  4 years ago +1

      That was poorly worded on my part. When thinking about evaluation of sparks there really aren't "threads" that do the work, but HECs. A HEC can create a spark and put it into its pool. Other HECs can do work stealing and start evaluation of sparks in spark pools other than their own. Of course, one of the HECs has to be the one that is executing the main thread. So it is possible for a HEC to execute the main thread and (in the case of rseq) start evaluating sparks. The HEC executing the main thread is not waiting for the return, but actively participating in the evaluation of the sparks.
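      For reference, this is the rpar/rpar/rseq/rseq pattern the timestamp is about (a minimal sketch, assuming the `parallel` package; `fib` and `pair` are illustrative names, and real workloads would use larger arguments):

      ```haskell
      import Control.Parallel.Strategies (Eval, rpar, rseq, runEval)

      -- Naive Fibonacci as a stand-in workload.
      fib :: Int -> Integer
      fib n = if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

      -- Both thunks are sparked, then we wait for both. A spark "fizzles"
      -- when the thunk it points to gets evaluated by the program itself
      -- (e.g. by the HEC running the main thread) before any HEC picks the
      -- spark up from the pool.
      pair :: Eval (Integer, Integer)
      pair = do
        a <- rpar (fib 20)
        b <- rpar (fib 21)
        _ <- rseq a
        _ <- rseq b
        return (a, b)

      main :: IO ()
      main = print (runEval pair)  -- (6765,10946)
      ```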

    • @sirnawaz
      @sirnawaz 4 years ago

      @@philipphagenlocher That makes perfect sense now. Thanks a lot. :-)

  • @fuag155555
    @fuag155555 4 years ago +1

    This is awesome, I'd love to hear your description of FRP, either reactive-banana or wires or something like that. I'm also curious about custom DSLs in Haskell, programming at the type level, type families, or Template Haskell 🙂

  • @maxreuv
    @maxreuv 4 years ago +1

    Another treasure vid, thank you!

  • @sirnawaz
    @sirnawaz 4 years ago

    One last doubt (hopefully) in this awesome video: at around 21:50, we noticed this little program seems to allocate 17 GB of memory, and that value always seems scarily high, even though `top` (or `htop`) on the machine reports a much lower value (often in MB). Why this discrepancy? I understand this video isn't about that; not sure if you have explained it in any of your other videos. This answer on Stack Overflow suggests ignoring that value, without explaining what it actually is: stackoverflow.com/a/61670171/415784.

    • @philipphagenlocher
      @philipphagenlocher  4 years ago +1

      That's a really good question and I must admit that I don't fully understand why the value for allocated memory is so comically high. When looking at chapter 5.4.4 in the GHC user guide ( downloads.haskell.org/~ghc/7.2.1/docs/html/users_guide/prof-heap.html ) we find some reasons:
      "-There is an overhead of profiling itself, which is subtracted from the residency figures by the profiler. This overhead goes away when compiling without profiling support, of course. The space overhead is currently 2 extra words per heap object, which probably results in about a 30% overhead.
      -Garbage collection requires more memory than the actual residency. The factor depends on the kind of garbage collection algorithm in use: a major GC in the standard generation copying collector will usually require 3L bytes of memory, where L is the amount of live data. This is because by default (see the +RTS -F option) we allow the old generation to grow to twice its size (2L) before collecting it, and we require additionally L bytes to copy the live data into. When using compacting collection (see the +RTS -c option), this is reduced to 2L, and can further be reduced by tweaking the -F option. Also add the size of the allocation area (currently a fixed 512Kb)."
      The generational GC seems to be the main issue which is explained in detail by Simon Marlow here: stackoverflow.com/questions/3171922/ghcs-rts-options-for-garbage-collection/3172704#3172704
      So I think the main problem with the measurement is the overapproximation of what the GC actually needs. Since the example program in the video holds live data (in the form of thunks and sparks), this seems to inflate the allocation for the GC. When adding the 30% overhead needed for profiling, the measurement seems possible.

    • @sirnawaz
      @sirnawaz 4 years ago

      @@philipphagenlocher Thanks for your response, with some pointers which I can explore further, at least for two weekends. And if I find anything more concrete and interesting, or how to deal with it, I'll share with you. :-)

    • @pedrovasconcelos8260
      @pedrovasconcelos8260 4 years ago

      Functional languages (particularly lazy ones) often perform a lot more allocations than imperative ones. For example, due to laziness, Haskell integers are heap allocated by default. Compiling with -O2 enables some unboxing optimizations that should improve these numbers significantly. Note, however, that even the unoptimized version does not require 17 GB, because most allocations are short-lived (i.e. residency is still low), and the GC can recover memory as the program runs.
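      A quick way to see the allocated-vs-resident distinction for yourself (a minimal sketch; assumes GHC is installed and the file is called Alloc.hs, which is an illustrative name):

      ```haskell
      -- Compile with RTS options enabled and run with runtime statistics:
      --
      --   ghc -O2 -rtsopts Alloc.hs
      --   ./Alloc +RTS -s
      --
      -- "bytes allocated in the heap" counts every allocation ever made, so
      -- it grows without bound for allocation-heavy programs; "maximum
      -- residency" is the live data at the worst point, which is roughly
      -- what top/htop report.
      main :: IO ()
      main = print (sum [1 .. 10000000 :: Integer])
      ```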

  • @mtcomscxstart
    @mtcomscxstart 4 years ago +2

    Thank you for the videos. They are absolutely great. You deserve many more subscribers than you have right now. Nevertheless, there is a lot of room for improvement that could make your videos much better. For example, use higher resolutions like 1440p or even 4K, especially for the textual information, because video compression algorithms do not work well for text. Try different fonts and color schemes for the code.
    And again, please, don't get me wrong. I really like your videos :) Just want them to be even better!

    • @philipphagenlocher
      @philipphagenlocher  4 years ago

      Thank you so much!
      I'm working on improving the video quality! Bigger resolutions are a bit of a problem for me since I am restricted by the size of my monitor when recording the screen but I'll see what I can do!

  • @pineiden
    @pineiden 3 years ago

    What is the program you use to check the execution profile?

  • @3PicPe0ple
    @3PicPe0ple 4 years ago +1

    Your middle name isn't Michael, is it?

  • @cierre2955
    @cierre2955 2 years ago

    30:00

  • @alexveremis4608
    @alexveremis4608 4 years ago

    Great vid, thanks a lot

  • @obinator9065
    @obinator9065 2 years ago

    0:11 "Squirrel" is also pretty annoying for Germans :'D