A&DS S04E14. Parallel Algorithms

Pavel Mavrin

Просмотров 9 тыс.

Добавить в
- Мой плейлист
- Посмотреть позже
Поделиться

HTML-код

Размер видео:

Показать панель управления

Автовоспроизведение

Автоповтор

Опубликовано: 2 июл 2024
Algorithms and data structures. Semester 4. Lecture 14
In the final lecture, we discussed parallel algorithms.
ITMO University, 2022

Комментарии • 21

@MehbubulHasanAlQuvi Год назад ⁺¹²
This was an emotional moment :) Finally the end of a long, fully FREE course on A&DS. This is the best course there is! Pavel, we want more from you. This time contents about CP will be appreciated. Also, AtCoder evening streams are great. Please continue those whenever you get time.
@madhukiranattivilli2321 Год назад ⁺²
I really loved this video. I will try to implement all algos in this lecture, and see the performance improvement of the parallel algos. This is very important as almost every mainstream application is multi-threaded
@madhukiranattivilli2321 Год назад
This is a fantastic lesson
Thanks Pavel :)
I watched 2nd time now, after 9 months gap -- time just flies. I thought I watched just 3-4 months ago
@madhukiranattivilli2321 Год назад
Parallel Merge Sort
Hi Pavel,
In order to merge a pair of logn sized blocks, we've to first get the starting indices of all blocks for both arrays. We already know 1 + n/logn indices (1 is for index 0) for each array. But through binary search, we get n/logn more indices for each array, which we've to insert amongst the already known indices. So, 1 + 2 * (n / logn) indices max, associated with each array = O(n/logn) indices
If we insert the new index normally into the array of indices (i.e. do linear search to get position to insert, and move few indices to the right, and finally insert index), linear search takes O(n/logn) time, and n/logn indices to insert, and logn levels -- making W = O(n²/logn)
But, if we use SBBST to store the indices associated with each of the 2 arrays (ex: TreeSet in java) -- search runs in O(log(n/logn)) = O(logn) time (assuming log(logn) is close to 1; a constant value; O(1)), and insertion of each new index runs in O(1) time. So, W = “O(logn) to add each index to SBBST” * “n/logn indices” * “logn levels” = O(nlogn)
Only with above improvements the overall W=nlogn and D=log²n is feasible.
So, we should get T(W,D) = O(nlogn, log²n) in each aspects :
(1) binary search on sorted array... to find appropriate index positions in one of the 2 arrays for few values in the other array (n/logn values)
(2) insert into SBBST... to insert the new indices (found in (1) above) into an SBBST associated with each of the 2 arrays
(3) merge logn-sized segment pairs
Could you please review and lemme know if I've complicated things, if there is a more easier way to implement? Thanks :)
@madhukiranattivilli2321 Год назад
Filter parallel OP
Hi Pavel,
I understood ur style, where we first apply the filter to build the binary array (0 for odd item, 1 for even item), and then build the prefix-sum array, and using the 2 array combination add even items at appropriate index positions in result array.
So, 1st OP of filtering : W, D = O(n), O(1)
2nd OP of building segment tree : W, D = O(n), O(logn)
3rd OP of building prefix-sum array : W, D = O(n), O(logn)
How is below approach? :
1st OP of filtering : W, D = O(n), O(1)
Instead of filling 2nd array with 0s and 1s, add empty list for odd item, and list with even item index if even item
2nd OP of building segment tree : W, D = O(n), O(logn)
Instead of adding 2nd array items 2 at a time, concatenate the 2 lists. When this 2nd OP ends, at the final level of segment tree having single node, we will have the list of all index positions of even items from the original 1st array
And finally build the result array with the even items at those index positions.
Time complexity is same in both approaches, but we avoid building the prefix-sum array (3rd OP) in my style.
Is there any defect? Please check and advise. Thanks :)
@madhukiranattivilli2321 Год назад
Parallel Merge Sort
Hi Pavel,
I've implemented 2 variations of this algo -- a fake implementation -- i.e. a single threaded program only :
(1) merge sort using additional memory
(2) inline merge sort, with no extra memory
Both worked gr8
But I realized that lot of corner cases have to be carefully handled. In comparison, even though the concept of FFT algo is more complex, FFT algo implementation is "relatively" easier than "Parallel Merge Sort" :)
@HasanAhmed-jn8zs 2 года назад ⁺³
Will u discuss combinatorics ?
@Agent-kt7sv Год назад
Добрый день,Павел,скажите,пожалуйста,в каком порядке корректен просмотр серии DSA?Благодарю Вас.
@madhukiranattivilli2321 Год назад
Parallel 2-3 BST build() algo
Hi Pavel,
About the "Pipelining" improvisation…
When the node at level k among those logm levels of input recursively calls parallel_add() twice, for it's left and right segments, in 2 parallel threads... each of those 2 parallel threads at levels k+1... while trying to move their items from level j to level j-1 among those logn levels of 2-3 BST, should proceed only if their parent thread at level k (of input) had already moved it's item to level j-2 or lower (in 2-3 BST), else be in wait() state, and come out of wait() when their parent does notify() to it's 2 child threads
This might not be too straight forward to implement!
Am I correct? Please clarify :)
Code :
parallel_add(left_index, right index, parent_thr_id) :
middle_index = (left_index + right_index) / 2
run parallel_add(left_index, middle_index + 1, current_thr_id) on thread #1
run parallel_add(left_index, middle_index + 1, current_thr_id) on thread #2
add_23BST(sorted_array[middle_index], parent_thr_id, current_thr_id)
add_23BST(item, parent_thr_id, current_thr_id) :
init next_level = logn
while true loop
if next_level == -1
-- i.e. current level of 0 i.e. root level has been handled already by current_thr_id
break
if parent_thr_id at level ≥ next_level
wait()
-- comes out of wait() when parent_thr_id runs notify()
else if parent_thr_id at level < next_level
if next_level == logn
get greatest leaf item item0 ≤ item
let parent0 be item0's parent
add item as parent0's parent, and immediate right sibling to item0
if item's parent has > 3 kids
split parent into 2 parents
split kids into 2 groups of atmost 3 kids each
assign left kids-group to left parent
assign right kids-group to right parent
item = item's parent
--next_level
notify()
-- to bring any child_thr_ids out of wait() state
endloop
@jashdkj4902 8 месяцев назад
Здравствуйте, Павел Юрьевич, хотел узнать можно ли у вас взять персональные занятия, если да, то подскажите куда лучше написать что бы обсудить. Я работаю уже разработчиком и хочу повысить уровень по АИСД. Чувствую что не хватает знаний )
@floppa-fy2qh Год назад
Здравствуйте, Павел Юрьевич, будут ли у Вас лекции по пространственным структурам данным ? И в идеале ещё cache oblivious структуры данных )
@pavelmavrin Год назад
А пространственные это какие? Cache oblivious знаю, но они не поместились в курс(
@floppa-fy2qh Год назад
@@pavelmavrin Вообще, может быть, "пространственные" это жаргонизм, просто я где-то однажды увидел такое и сам стал так называть :)
Это структуры данных где используются алгоритмы т.н. "spatial subdivision" для их построения, например: квадро/окто дерево, AABB дерево, k-d дерево, R-дерево и т.д.
@pavelmavrin Год назад
@@floppa-fy2qh о, вот это я плохо знаю кстати, надо почитать
@floppa-fy2qh Год назад
@@pavelmavrin на самом деле, в моём понимании, они являются обобщением обычного бинарного дерева поиска с операциями lower/upper bound, только в случае бинарного дерева поиска n (количество измерений) = 1, а в пространственных структурах n > 1, а смысл операций тот же :)
@gattuchandrashekar5963 Год назад
It will be helpful if you could post all the videos in english
@madhukiranattivilli2321 Год назад ⁺¹
Speed of light
Hi Pavel,
You say 3GHz is upperlimit, and I understood ur logic. But I googled and there is the latest Intel processor of 5.5GHz !!
Nothing can be faster than speed of light is the rule. If 5.5GHz is feasible, shouldn't it mean the physical speed of anything within the processor to be able to do 5.5 billion operations per second is still below the speed of light? -- like (obviously) the speed of fan, speed of signal transfer within the chip, etc
But, I'm not confident with my logic above!
Cud u please clarify?
@pavelmavrin Год назад ⁺¹
Yes, of course, 5-6 GHz is still possible. I didn’t say 3GHz is the limit, i said it’s very close to the limit, meaning that it’s very unlikely to have processors much faster.
@madhukiranattivilli2321 Год назад
@@pavelmavrin ok
@luckytima2315 Год назад ⁺¹
Видео новые будут ? :( Можете на русском снимать ? Привет из Астаны
@nikhilgupta8941 Год назад
Use English

Следующие

Автовоспроизведение

АиСД S04E12. Быстрое преобразование Фурье