great❤❤❤
Cool format, Stanford is quickly becoming my favorite blogger lol
47:00 Does anyone know the paper that suggests that learning to route isn't any better than random routing?
6:22 Here “xq_LQH = wq(x_LD).view(L, N, H)” should be “xq_LQH = wq(x_LD).view(L, Q, H)”, right?
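For anyone puzzling over this: a minimal sketch of why view(L, Q, H) matches the variable name, assuming the suffix convention where the letters in a name spell out its tensor dimensions (the concrete sizes and the wq = nn.Linear(D, Q * H) projection below are illustrative guesses, not taken from the talk):

import torch
import torch.nn as nn

# Illustrative sizes (assumed, not from the slides): L = sequence length,
# D = model dim, Q = number of query heads, H = head dim.
L, D, Q, H = 16, 512, 8, 64

wq = nn.Linear(D, Q * H, bias=False)   # query projection: D -> Q*H
x_LD = torch.randn(L, D)               # token embeddings, shape (L, D)

# wq(x_LD) has shape (L, Q*H); splitting the last dimension into heads gives
# (L, Q, H), which is exactly what the name xq_LQH encodes, so view(L, Q, H)
# is the shape-consistent reshape.
xq_LQH = wq(x_LD).view(L, Q, H)
assert xq_LQH.shape == (L, Q, H)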
Where can I get the slides?
Dear Stanford Online, I recently completed the Product Management course from Stanford Online, but I haven't received the certificate. Please help me: how will I get the certificate?
Wait, you will get it.
- Ethan from Stanford
Gemini 1.5 Pro: The video is about demystifying Mixture of Experts (MoE) and Sparse Mixture of Experts (SMoE) models.
The speaker, Albert Jang, a PhD student at the University of Cambridge and a scientist at Mistral AI, first introduces the dense Transformer architecture and then dives into the details of SMoEs. He explains that SMoEs are a neural network architecture that can be more efficient than standard Transformers because a gating network routes each token to only a subset of experts, which is useful for training very large models with billions of parameters.
Here are the key points from the talk:
* Mixture of Experts (MoE) is a neural network architecture that uses a gating network to route tokens to a subset of experts.
* Sparse Mixture of Experts (SMoE) is a type of MoE that can be more efficient than a standard dense Transformer.
* SMoEs use a gating network to route each token to a subset of experts, so only part of the model runs per token, which can be more efficient than running a single large dense model in full.
* SMoEs are well-suited for training very large models with billions of parameters.
The speaker also discusses some of the challenges of interpreting SMoEs and the potential for future research in this area. Overall, the talk provides a good introduction to SMoEs and their potential benefits for training large language models (a minimal sketch of the top-k routing idea follows below).
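To make the routing idea above concrete, here is a minimal, self-contained sketch of a sparse MoE layer with top-k gating. The expert count, k = 2, and layer sizes are illustrative assumptions, not the architecture described in the talk or used by Mistral:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    # Toy sparse mixture-of-experts layer: a gating network scores the experts
    # for each token, keeps the top-k, and mixes their outputs using the
    # renormalized gate weights.
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)   # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores, idx = self.gate(x).topk(self.k, dim=-1)    # top-k experts per token
        weights = F.softmax(scores, dim=-1)                # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # only k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 512)       # 10 tokens of width 512
mixed = SparseMoE()(tokens)         # each token is processed by only 2 of the 8 experts
print(mixed.shape)                  # torch.Size([10, 512])

Because only k of the experts execute per token, the per-token compute stays close to that of a much smaller dense model even though the total parameter count is large, which is the efficiency argument summarized above.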