Brilliant tutorial.
There's no '#pragma omp parallel' around the Fibonacci code. Does that mean it runs more like serial code that just uses tasks?
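For reference, in the usual form of this example the parallel region lives in main(): fib() itself only creates tasks, and those tasks are executed by the whole team, so it is not serial code. A minimal sketch of that structure (assuming the standard task-based fib, which may differ slightly from the lecture's exact code):

```c
#include <stdio.h>
#include <omp.h>

/* Task-based Fibonacci: each recursive call becomes a task, and the
   taskwait makes sure x and y are ready before they are combined. */
int fib(int n) {
    int x, y;
    if (n < 2) return n;

    #pragma omp task shared(x)
    x = fib(n - 1);

    #pragma omp task shared(y)
    y = fib(n - 2);

    #pragma omp taskwait   /* wait for the two child tasks */
    return x + y;
}

int main(void) {
    int result;

    #pragma omp parallel    /* the team of threads is created here  */
    {
        #pragma omp single  /* one thread starts the recursion; the */
        result = fib(30);   /* tasks it spawns are executed by the  */
    }                       /* whole team                           */

    printf("fib(30) = %d\n", result);
    return 0;
}
```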
lastprivate would work as well for x and y, right? Or can that only be used in a for construct?
I love these tutorials. But at this point, other async programming constructs would be much better to use. Namely, I can see that it's really easy to accidentally use uninitialized variables, and it can be hard to tell whether a computed variable has been added to a task queue and is being computed asynchronously. Might as well use safer async constructs, like `Ivars/Defers` from a statically typed functional programming language like Haskell or OCaml.
Instead of shared x and y, can I use lastprivate x and y?
In this case x and y won't be initialised, so you would actually need to declare x and y first and then make them lastprivate.
Tasks made my code way slower. I think the #pragma omp taskwait is causing the bottleneck.
I was stuck on this for about 45 minutes, but finally figured it out: the issue is _not_ the taskwait. Rather, the key is that the task creation needs to sit under a #pragma omp single nowait. Without the single, every thread creates and runs its own copy of each task, which obviously kills performance. Without the nowait, the implicit barrier at the end of the single kills performance too.
As an aside, the examples at 5:50 and 6:00 in this lecture are pretty suboptimal and really don't show how to use task correctly.
There is no need for a taskwait, I think. The parallel block has a barrier at the end, so all the tasks will have been executed by the time you leave the block. You can also add a nowait to the single block, because otherwise you would have two barriers right after each other.
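Putting the last two replies together, the pattern they describe looks roughly like this (a sketch that uses a linked-list traversal as the workload rather than the lecture's exact example; node_t and process() are made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

typedef struct node {
    int value;
    struct node *next;
} node_t;

static void process(node_t *p) {   /* placeholder for the real per-node work */
    p->value *= 2;
}

void traverse(node_t *head) {
    #pragma omp parallel
    {
        /* single: exactly ONE thread walks the list and creates the tasks;
           without it, every thread in the team would create its own copy of
           every task.  nowait: drops the barrier at the end of the single
           block; the implicit barrier at the end of the parallel region
           already guarantees all tasks have finished, so no taskwait is
           needed either. */
        #pragma omp single nowait
        for (node_t *p = head; p != NULL; p = p->next) {
            #pragma omp task firstprivate(p)
            process(p);
        }
    }   /* implicit barrier: every task has completed here */
}

int main(void) {
    /* build a small list 0..9 */
    node_t *head = NULL;
    for (int i = 9; i >= 0; i--) {
        node_t *n = malloc(sizeof *n);
        n->value = i;
        n->next = head;
        head = n;
    }

    traverse(head);

    for (node_t *p = head; p != NULL; p = p->next)
        printf("%d ", p->value);
    printf("\n");
    return 0;
}
```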