Thank you for being a protagonist of open education. These videos help a lot
The legend of Deep Learning! Thank you Professor Andrew Ng for sharing your light with the world 🕯️ and for teaching us this awesome new field 😀 Forever grateful!
You’re a fantastic teacher. Thank you
If the first two words were "Andrew" and "Ng", the next two words would be "is" and "best".
explained extremely well!
Andrew Ng is the ML G.O.A.T!
quite tough
what is the computational complexity of the above method?
is the beam width only used in the first iteration?
The beam width is used in every iteration. At each stage we evaluate every possibility for the 3 beams we carried over from the last stage (this produces 30,000 new possibilities in the example) and then we reduce it to just 3 (our beam width) before moving on to the next step.
It looks like the computational complexity for a search of a sequence of k words from a dictionary of n possible words, with a beam width b, would be as follows: n steps for the first word of the sequence, then b*n for each of the additional k-1 steps in the sequence, giving n + (b*n*(k-1)), which for simplicity's sake could just be considered b*n*k.
For a sequence of 10 words, a dictionary with 10,000 words, and a beam size of 3, it would take 3*10*10000, or 300,000 operations. Beam search is just a reduction of breadth-first search, so if the beam were infinite, it would be identical to BFS. The complexity of BFS would be the size of the dictionary to the power of the length of the sequence (10000^10 in our example). For even modest dictionary sizes and sequence lengths this quickly becomes infeasible, which is why we need beam search to narrow the possibilities while still giving us a high likelihood of finding the optimal result.
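To make the counting concrete, here is a minimal Python sketch of the loop described above. The next_word_log_probs function is just a placeholder assumption standing in for whatever model actually scores the next word; none of this is code from the course, only an illustration of the expand-then-prune pattern and the operation count.

import math

def next_word_log_probs(prefix, vocab):
    # Placeholder scorer (assumption): a real decoder would use the network's
    # softmax output conditioned on `prefix`. Here every word is equally likely.
    return {w: math.log(1.0 / len(vocab)) for w in vocab}

def beam_search(vocab, k, b):
    # Returns the best sequence of length k and how many candidates were scored.
    scored = 0
    # First step: score all n words once.
    first = next_word_log_probs((), vocab)
    scored += len(vocab)
    beams = sorted(([lp, (w,)] for w, lp in first.items()),
                   key=lambda x: x[0], reverse=True)[:b]
    # Each remaining step: expand b beams * n words, then prune back to b.
    for _ in range(k - 1):
        candidates = []
        for lp, seq in beams:
            probs = next_word_log_probs(seq, vocab)
            scored += len(vocab)
            candidates.extend([lp + nlp, seq + (w,)] for w, nlp in probs.items())
        beams = sorted(candidates, key=lambda x: x[0], reverse=True)[:b]
    return beams[0], scored

vocab = ["word%d" % i for i in range(10000)]
best, scored = beam_search(vocab, k=10, b=3)
print(scored)  # 10000 + 3*10000*9 = 280,000, close to the b*n*k = 300,000 estimate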
how is the algorithm in terms of memory usage? is it better than greedy search?
One question on this one: if we increase the beam width, could "September" come up as a candidate among the first 3 words? Maybe the original sentence was literally translated as "September is the best time for Jane to visit Africa".
Sure! The beam search candidate sequences he used in his example were just that -- examples. I don't believe they were taken from a real neural network. They were merely meant to motivate the algorithm intuitively.
is there anything different about training?
thank you!
Hi, Thank you for your effort. I find your videos and explanations very instructive and detailed. But I was wondering if you could make a video about tree-to-string machine translation using tree transducers. It is something that I can't quite capture yet.
Thank you
Best!
So helpful, thank you
u r best
Nice accent and vid
During the decoding process, could beam search be replaced with something like MCTS?
Can you please help me out with beam search? I'm not getting how to implement it using Keras.
is getting a decent microphone an issue these days in the tech community or wtf
idk man, it gives it a very raw feel, it's kinda nice to my ears
can u please help me with coding local beam search in Unity
did you get the solution for beam search?
what a terrible, unhelpful video
Thank you