- 119 videos
- 28,510 views
meanxai
South Korea
Joined 14 Jul 2023
This channel covers AI-related technologies. We will discuss the mathematical theory of AI algorithms and implement practical models using Python, TensorFlow, etc.
The source codes can be found at:
github.com/meanxai/machine_learning
github.com/meanxai/deep_learning
The topics we will cover are:
1. Machine Learning
2. Deep Learning
3. Recommendation System
4. Natural Language Processing
5. Reinforcement Learning
All videos are produced in Korean and translated into English, and the audio is generated by AI text-to-speech, so there may be some grammatical errors or awkward expressions.
[MXDL-11-07] Attention Networks [7/7] - Stock price prediction using a Transformer model
Before I end this series, I would like to write some code to predict stock prices using the Transformer model.
Since stock prices are also time series, we can apply all the Seq2Seq, Attention, and Transformer models we have looked at in this series to try to predict stock prices. In this video, we will use the Transformer model to predict stock prices.
Stock prices are difficult to predict because they behave like non-stationary stochastic processes, that is, random walks. Future stock prices are determined not only by past memories but also by future events, information shocks, etc. Past memories can be analyzed technically, but future events cannot. Theref...
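As a rough sketch of the preprocessing such an experiment needs (my own illustration, not code from the video), converting prices to log returns mitigates the non-stationarity before windowing; the synthetic data and window length below are assumptions:

```python
import numpy as np

# Synthetic random-walk "prices" as a stand-in for real stock data (assumption).
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=1000)))

# Log returns are closer to stationary than the raw price series.
log_returns = np.diff(np.log(prices))

# Sliding windows: the past n_steps returns predict the next return.
n_steps = 50
X = np.array([log_returns[i:i + n_steps] for i in range(len(log_returns) - n_steps)])
y = log_returns[n_steps:]
print(X.shape, y.shape)  # (949, 50) (949,)
```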
75 views
Videos
[MXDL-11-06] Attention Networks [6/7] - Time series forecasting using a Transformer model
103 views · 1 day ago
In the last two chapters, we predicted time series using sequence-to-sequence based models. In this video, we will predict time series using a Transformer model that only uses attention, rather than a sequence model. Instead of writing our own Transformer code from scratch, we'll use the code posted at github.com/suyash/transformer. Since this code is for natural language processing, ...
[MXDL-11-05] Attention Networks [5/7] - Transformer model
78 views · 14 days ago
In the previous chapter, we looked at a sequence-to-sequence based attention model. In this video, we'll look at the Transformer model, which uses only attention rather than a sequence model. In 2017, eight data scientists working at Google published a groundbreaking research paper in the field of natural language processing called "Attention Is All You Need," which introduced the Transformer model. Transfo...
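For reference, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer; the shapes are illustrative assumptions, not the video's settings:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of the values

Q = np.random.randn(4, 8)    # 4 query positions, d_k = 8 (assumed)
K = np.random.randn(6, 8)    # 6 key positions
V = np.random.randn(6, 16)   # values, d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```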
[MXDL-11-04] Attention Networks [4/7] - Seq2Seq-Attention model using input-feeding method
127 views · 21 days ago
In the last video, we implemented a simple Seq2Seq-Attention model to predict time series. In this video, we will add a feature called the input-feeding method to the existing Seq2Seq-Attention model. The input-feeding approach is presented in Section 3.3 of Luong's 2015 paper. In the existing attention model we looked at in the previous video, attention decisions are made independently, which is s...
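Below is a toy NumPy loop sketching the input-feeding idea under assumed dimensions (not the video's code): the previous attentional vector is concatenated with the current input before it enters the recurrent cell.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wx = rng.normal(0, 0.1, (d_in + 2 * d_h, d_h))  # maps [x_t; h~_{t-1}; h_{t-1}] -> h_t
Wc = rng.normal(0, 0.1, (2 * d_h, d_h))         # maps [context; h_t] -> attentional vector

enc_outputs = rng.normal(size=(10, d_h))        # encoder hidden states to attend over
h = np.zeros(d_h)                               # decoder hidden state
h_tilde = np.zeros(d_h)                         # previous attentional vector h~

for x_t in rng.normal(size=(5, d_in)):          # five decoding steps (toy inputs)
    # Input-feeding: h~ from the previous step is part of the current input.
    h = np.tanh(np.concatenate([x_t, h_tilde, h]) @ Wx)
    scores = enc_outputs @ h                    # dot-product attention scores
    a = np.exp(scores - scores.max()); a /= a.sum()        # softmax -> attention weights
    context = a @ enc_outputs                   # attention value (context vector)
    h_tilde = np.tanh(np.concatenate([context, h]) @ Wc)   # Luong's attentional vector
```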
[MXDL-11-03] Attention Networks [3/7] - Seq2Seq-Attention model for time series prediction
77 views · 21 days ago
In the last video, we implemented a sequence-to-sequence model to predict time series. In this video, we're going to add a feature called Attention to our sequence-to-sequence model. Let's take a look at the architecture of the Seq2Seq-Attention model and how to find attention scores and attention values. And let's implement this model with Keras and predict a time series. There are many papers...
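As a quick illustration of attention scores and attention values (a hedged sketch with made-up sizes, not the video's code):

```python
import numpy as np

enc_states = np.random.randn(10, 8)   # 10 encoder time steps, hidden size 8 (assumed)
dec_state = np.random.randn(8)        # current decoder hidden state

scores = enc_states @ dec_state       # dot-product attention scores, one per encoder step
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax turns scores into attention weights
context = weights @ enc_states        # attention value: weighted sum of encoder states
print(context.shape)                  # (8,)
```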
[MXDL-11-02] Attention Networks [2/7] - Implementing a Seq2Seq model for time series forecasting
82 views · 1 month ago
In the last video, we looked at how a sequence-to-sequence model works and how to create a dataset for time series prediction. In this video, we will implement this model using Keras and predict a time series. #AttentionNetworks #Seq2Seq #SequenceToSequence #TeacherForcing #TimeSeriesForecasting
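A minimal Keras sketch of such a Seq2Seq model with teacher forcing is shown below; the layer sizes and sequence lengths are assumptions, not the settings used in the video:

```python
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_in, n_out, n_hidden = 20, 5, 32                      # assumed lengths and size

enc_in = Input(shape=(n_in, 1))
_, h, c = LSTM(n_hidden, return_state=True)(enc_in)    # keep only the final encoder states

dec_in = Input(shape=(n_out, 1))                       # teacher forcing: shifted targets
dec_seq = LSTM(n_hidden, return_sequences=True)(dec_in, initial_state=[h, c])
dec_out = Dense(1)(dec_seq)                            # one prediction per decoder step

model = Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="mse")
```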
[MXDL-11-01] Attention Networks [1/7] - Sequence-to-Sequence Networks (Seq2Seq)
88 views · 1 month ago
Starting from this video, we will look at attention networks, the eleventh topic of deep learning. This topic covers Sequence-to-Sequence, Attention, and Transformer networks. These are mainly used in natural language processing (NLP), but in this tutorial we will apply them to time series forecasting. In later NLP courses, we will explore these in more detail and use them to creat...
[MXDL-10-08] Recurrent Neural Networks (RNN) [8/8] - Multi-layer and Bi-directional RNN
188 views · 1 month ago
So far, we have implemented many-to-one and many-to-many models. We can also implement single-layer and multi-layer models, unidirectional and bidirectional models, and models that combine all of these, so recurrent neural networks can take many different forms. For example, we can implement a two-layered, unidirectional many-to-one model, or a two-...
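For example (a sketch under assumed sizes, not the video's code), a two-layer bidirectional many-to-one model in Keras could look like this:

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense
from tensorflow.keras.models import Model

x = Input(shape=(20, 1))                                # 20 time steps, 1 feature (assumed)
h = Bidirectional(LSTM(32, return_sequences=True))(x)   # layer 1: pass full sequence up
h = Bidirectional(LSTM(32))(h)                          # layer 2: keep last output only
y = Dense(1)(h)                                         # many-to-one prediction

model = Model(x, y)
model.compile(optimizer="adam", loss="mse")
```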
[MXDL-10-07] Recurrent Neural Networks (RNN) [7/8] - Gated Recurrent Unit (GRU)
99 views · 1 month ago
In this video, we'll look at one of the many variations of the LSTM: the Gated Recurrent Unit (GRU). We will implement many-to-many GRU models and apply them to a time series prediction problem. Let's implement a custom GRU layer and see how the output of each gate is computed and how information from the previous time step is propagated to the next. And let's implement a many-to-many GRU ...
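To make the gate computations concrete, here is a NumPy sketch of a single GRU step under one common formulation; the dimensions and weight layout are my assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def gru_step(x, h_prev, Wz, Wr, Wh):
    """One GRU step on the concatenated input [x; h_prev] (one common convention)."""
    xh = np.concatenate([x, h_prev])
    z = sigmoid(xh @ Wz)                                    # update gate
    r = sigmoid(xh @ Wr)                                    # reset gate
    h_cand = np.tanh(np.concatenate([x, r * h_prev]) @ Wh)  # candidate state
    return (1 - z) * h_prev + z * h_cand                    # blend old and new information

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
Wz, Wr, Wh = (rng.normal(0, 0.1, (d_in + d_h, d_h)) for _ in range(3))
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), Wz, Wr, Wh)
print(h.shape)  # (5,)
```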
[MXDL-10-06] Recurrent Neural Networks (RNN) [6/8]- Peephole LSTM models and time series forecasting
76 views · 1 month ago
In this video, we will look at the structure of Peephole LSTM and implement many-to-one and many-to-many models for time series forecasting. Felix A. Gers et al. proposed Peephole LSTM, which adds peephole connections to traditional LSTM, in their 2000 paper "Recurrent Nets that Time and Count" and their 2002 paper "Learning Precise Timing with LSTM Recurrent Networks". Peephole connections all...
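A NumPy sketch of one peephole LSTM step (my illustration of the idea, not the video's code); the weight layout and sizes are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def peephole_lstm_step(x, h_prev, c_prev, W, P):
    """One peephole LSTM step: the gates also look at the cell state."""
    xh = np.concatenate([x, h_prev])
    f = sigmoid(xh @ W["f"] + P["f"] * c_prev)   # forget gate peeks at c_{t-1}
    i = sigmoid(xh @ W["i"] + P["i"] * c_prev)   # input gate peeks at c_{t-1}
    c = f * c_prev + i * np.tanh(xh @ W["g"])    # new cell state
    o = sigmoid(xh @ W["o"] + P["o"] * c)        # output gate peeks at the new c_t
    return o * np.tanh(c), c

rng = np.random.default_rng(0)
d_in, d_h = 3, 5
W = {k: rng.normal(0, 0.1, (d_in + d_h, d_h)) for k in "fgio"}
P = {k: rng.normal(0, 0.1, d_h) for k in "fio"}   # diagonal peephole weights
h, c = peephole_lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, P)
```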
[MXDL-10-05] Recurrent Neural Networks (RNN) [5/8] - Build LSTM models for time series forecasting
186 views · 1 month ago
In this video, we will implement LSTM models for time series forecasting. In the last video, we looked at the structure of many-to-one LSTM. In this video, we will implement it in code. First, let's implement an LSTM cell using Keras' custom layer and implement a many-to-one LSTM model to predict time series. Next, let's implement many-to-one and many-to-many LSTM models using Keras' LSTM class...
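As a rough sketch of the difference (assumed sizes, not the video's exact code), return_sequences switches a Keras LSTM between many-to-one and many-to-many output:

```python
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

x = Input(shape=(20, 1))                        # 20 time steps, 1 feature (assumed)

# Many-to-one: only the last hidden state reaches the Dense layer.
m2o = Model(x, Dense(1)(LSTM(32)(x)))

# Many-to-many: return_sequences=True yields a hidden state at every step.
m2m = Model(x, TimeDistributed(Dense(1))(LSTM(32, return_sequences=True)(x)))

print(m2o.output_shape, m2m.output_shape)       # (None, 1) (None, 20, 1)
```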
[MXDL-10-04] Recurrent Neural Networks (RNN) [4/8] - Long Short-Term Memory (LSTM)
86 views · 1 month ago
In the last three videos, we looked at the Simple RNN. It struggles to learn long-term dependencies because of vanishing or exploding gradients; that is, it is difficult to learn information from the distant past simply by increasing the number of time steps. The LSTM was designed to address this problem. In this video, we will briefly look at the history and key papers of the LSTM, and then take a closer l...
[MXDL-10-03] Recurrent Neural Networks (RNN) [3/8] - Build RNN models for time series forecasting
98 views · 1 month ago
In this video, we will implement RNN models for time series forecasting. In the previous video, we looked at the structure of many-to-one and many-to-many RNN models. In this video, we will implement them in code. First, let's implement a recurrent layer using Keras' custom layer and implement a many-to-one RNN model to predict time series. Next, let's implement a many-to-one RNN model using Ke...
[MXDL-10-02] Recurrent Neural Networks (RNN) [2/8] - Backpropagation Through Time (BPTT)
120 views · 1 month ago
In this tutorial, we will look at the basic types of recurrent neural networks, many-to-one and many-to-many, and at backpropagation through time (BPTT). We will also see what problems arise during backpropagation in recurrent neural networks. #RecurrentNeuralNetwork #RNN #SimpleRNN #BPTT #BackpropagationThroughTime
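As a small numerical illustration of the vanishing-gradient problem (my own sketch; the sizes and weight scale are assumptions), the gradient through T steps contains a product of T Jacobians whose norm can shrink geometrically:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.3, (8, 8))        # small recurrent weight matrix (assumed scale)
h = rng.normal(size=8)

J = np.eye(8)                         # accumulates the Jacobian d h_t / d h_0
for t in range(30):
    h = np.tanh(W @ h)
    J = np.diag(1 - h**2) @ W @ J     # chain rule through one tanh step
    if (t + 1) % 10 == 0:
        print(t + 1, np.linalg.norm(J))   # the norm decays toward zero
```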
[MXDL-10-01] Recurrent Neural Networks (RNN) [1/8] - Basics of RNNs and their data structures.
233 views · 2 months ago
Starting from this video, we will look at recurrent neural networks (RNNs), the tenth topic of deep learning. This video is part 1, covering the basics of RNNs and their data structures. Let's look at the full table of contents. In Chapter 1, we will look at the Simple RNN, the basic model of recurrent neural networks. An RNN is an artificial neural network that is useful for analyzing time s...
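For instance, a univariate series can be shaped into the (samples, time steps, features) tensor that RNN layers expect; this is a hedged sketch with an assumed toy series and window length, not the video's code:

```python
import numpy as np

series = np.sin(np.linspace(0, 20, 500))           # toy time series (assumption)
n_steps = 20                                       # window length (assumption)

X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
y = series[n_steps:]                               # next value after each window
X = X[..., np.newaxis]                             # add the feature dimension
print(X.shape, y.shape)                            # (480, 20, 1) (480,)
```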
[MXDL-9-01] Highway Networks [1/1] - Shortcut connections, implementing highway networks using Keras
80 views · 2 months ago
[MXDL-8-03] Weights Initialization [3/3] - Kaiming He Initializer
70 views · 2 months ago
[MXDL-8-02] Weights Initialization [2/3] - Xavier Glorot Initializer
70 views · 2 months ago
[MXDL-8-01] Weights Initialization [1/3] - Observation of the outputs of a hidden layer
70 views · 2 months ago
[MXDL-7-02] Batch Normalization [2/2] - Custom Batch Normalization layer using Keras
375 views · 2 months ago
[MXDL-7-01] Batch Normalization [1/2] - Training and Prediction stage
350 views · 2 months ago
[MXDL-6-02] Dropout [2/2] - Scale-down and Scale-up
88 views · 2 months ago
[MXDL-6-01] Dropout [1/2] - Zero-out step in dropout
67 views · 2 months ago
[MXDL-5-02] Regularization [2/2] - Activity (or Activation) Regularization
122 views · 3 months ago
[MXDL-5-01] Regularization [1/2] - Weights and Biases Regularization
85 views · 3 months ago
[MXDL-4-02] TensorFlow & Keras [2/2] - Build neural networks with Keras
75 views · 3 months ago
[MXDL-4-01] TensorFlow & Keras [1/2] - Build neural networks with TensorFlow
130 views · 3 months ago
[MXDL-3-03] Backpropagation [3/3] - Automatic Differentiation
64 views · 3 months ago
[MXDL-3-02] Backpropagation [2/3] - Error Backpropagation along multiple paths
93 views · 3 months ago
[MXDL-3-01] Backpropagation [1/3] - Error Backpropagation along single path
119 views · 3 months ago
Thanks.
Thanks
Thanks
Very nice. Thanks
The codes can be found at github.com/meanxai/deep_learning
This is a really clear video. After watching it, I have a deeper understanding of LightGBM. Thanks!
Thanks for your comment. I am glad it was helpful!
very good video!
Do all weak learners have to come from the same family? Meaning, must all weak learners be DTs or SVMs, or can they be different?
Same family. To keep the way we measure epsilon consistent across rounds, it makes sense to use the same type of weak learner in every round.
@@meanxai Okay. Thank you!
How many examples are taken from the training data during subsampling? Is it like random forests, where the sample size equals the number of training examples?
Typically, the sampled subset is the same size as the original data set and contains repeated data points. However, if your original data set is too large, you can generate smaller subsets. This is called "boosting-by-filtering". In this case, you need to consider a lower bound on the sample size that the model has to use in order to guarantee that the final hypothesis has an error smaller than epsilon. The lower bound is presented in Theorem 3.2 of Yoav Freund's 1995 paper. Unfortunately, I haven't figured this out.
@@meanxai Oh okay. Thanks, will check that out!
This is what I got from NotebookLM:

### Understanding Lambda, Gamma, and the Derivation of 'm'

The sub-sample size 'm' plays a pivotal role in boosting algorithms. Here's an explanation of lambda (λ), gamma (γ), and how the lower bound for 'm' is derived:

* **Lambda (λ) represents the reliability parameter.** This signifies the desired probability with which the final hypothesis outputted by the boosting algorithm should achieve the target accuracy (ε). In simpler terms, it's the confidence level that the final hypothesis will have the desired error rate.
* **Gamma (γ) is a measure of the weak learner's advantage over random guessing.** The boosting algorithm utilizes a weak learning algorithm, denoted as **WeakLearn**, that is assumed to perform slightly better than random guessing. Gamma quantifies this advantage.
* The sources provide a detailed analysis of a boosting algorithm that employs a "majority-vote game" analogy to illustrate its functionality. This game involves two players: a "chooser" who selects a weighting scheme for a set of points and an "adversary" who attempts to maximize the "loss" based on the chooser's strategy.
* The derivation of the lower bound for 'm' stems from the requirement that **the hypotheses generated by WeakLearn should have a high probability of having an error smaller than 1/2 - γ.** This condition ensures that the weak learner is indeed performing better than random chance.
* The sources state that the lower bound for 'm' is calculated using a formula that incorporates ε, λ, and other factors related to the weak learner's performance. This formula ensures that the sample size is sufficient to achieve the desired accuracy (ε) with the specified reliability (λ).
* **The sources provide a specific formula for calculating 'm':** **m' >= (2/γ^2) * ln(m/(λ * ε)).** This formula guarantees that the probability of the final hypothesis having an error larger than ε is at most λ.
* The choice of 'm' directly influences the number of training examples needed and the computational complexity of the boosting algorithm. A larger 'm' typically leads to higher accuracy but increases computational cost.

In summary, lambda and gamma are essential parameters in determining the sub-sample size 'm', which plays a crucial role in the performance of boosting algorithms. The lower bound for 'm' is derived to ensure that the weak learner's performance surpasses random guessing and the desired accuracy and reliability are achieved.

### Selecting Gamma and Lambda

Yes, **gamma (γ) and lambda (λ) are parameters that are typically chosen by the user.** There are factors that can help guide the selection of these parameters:

* **Desired Accuracy (ε):** The user's desired accuracy for the final hypothesis plays a significant role in choosing gamma and lambda. A higher desired accuracy might necessitate a smaller gamma, indicating a requirement for a weak learner with a greater advantage over random guessing. This, in turn, might influence the choice of lambda to maintain a balance between accuracy and reliability.
* **Computational Constraints:** The choice of gamma and lambda indirectly affects the computational complexity of the boosting algorithm through its influence on the sub-sample size 'm'. Users need to consider the available computational resources. A smaller gamma or a higher lambda might lead to a larger 'm', potentially increasing the computational cost.
* **Nature of the Weak Learner:** The characteristics of the weak learner employed in the boosting algorithm can provide insights into selecting appropriate values for gamma and lambda. If the weak learner is known to have a significant advantage over random guessing, a larger gamma might be suitable. Conversely, a weaker learner might require a smaller gamma.
* **Trade-off between Accuracy and Reliability:** The selection of gamma and lambda involves a trade-off between the desired accuracy and the reliability with which that accuracy should be achieved. A smaller gamma generally leads to better accuracy but might require a smaller lambda to maintain the desired reliability, potentially increasing the computational cost.

The sources provided do not offer specific guidelines or formulas for choosing gamma and lambda. These parameters depend on the specific application, the characteristics of the weak learner, and the user's priorities regarding accuracy, reliability, and computational resources. **It's important to note that the information beyond what is stated in the sources about choosing gamma and lambda is not from the sources and may need to be independently verified.**
@@KushJuvekar-j3f Thanks for the useful information. It helps me a lot to understand the lower bound of m.
Thanks. Very nice.
thanks.
Thanks. Keep going please.
👍👍👍❤️
Thanks
Interesting that the dot product is used as a measure of similarity instead of cosine similarity. For example, if we compare [2,1] to [4,2] and [4,8], the resulting dot products are 10 and 16 respectively, which is counterintuitive: the angle between [2,1] and [4,2] is zero, hence they must be pretty similar, but the dot product of [2,1] and [4,8] is higher because of the larger values in the latter vector...
I totally agree with you. As in your example, dot product similarity does not make intuitive sense. We cannot say that dot product similarity is better than cosine similarity. However, we also cannot say that the latter is always better than the former. Cosine similarity only cares about the angle difference, while the dot product cares about both the angle and the magnitude. In deep learning, the magnitude of a vector may actually contain information we are interested in, so there is no need to remove it. Dot product similarity is said to be especially useful for high-dimensional vectors.
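A quick NumPy check of the numbers discussed in this thread:

```python
import numpy as np

# Dot product vs cosine similarity for the vectors from the comment above.
a, b, c = np.array([2., 1.]), np.array([4., 2.]), np.array([4., 8.])

def cos_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(a @ b, a @ c)                  # 10.0 16.0 -> dot product prefers c
print(cos_sim(a, b), cos_sim(a, c))  # 1.0 0.8  -> cosine prefers b (angle is zero)
```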
Great job!!!! Amazing tutorials!!!!
Thanks for your comment.
Great. Thanks
Thanks. amazing.
Thanks for your comment.
The codes can be found at github.com/meanxai/deep_learning
Nice. Thank you.
❤❤
The codes can be found at github.com/meanxai/deep_learning
Best. Thank you.
Thank u so much.❤❤
Thanks
The codes can be found at github.com/meanxai/deep_learning
Thanks
Are these slides available for reference?
Sorry, the slides (PDF) are not public.
Please make a video on PyTorch.
Sorry, I am not familiar with PyTorch.
thank you for a good explanation
Thanks for your comment.
Can you provide the code please?
The code can be found at github.com/meanxai/deep_learning Thanks.
@@meanxai Thank you for sharing, but I typed the whole thing 🤣 and I also used the CIFAR-10 dataset, and the val_accuracy wasn't good: only about 52, while the training accuracy was more than 90.
@@zoraizelya3975 I also got 52% accuracy, which is too low. I think this is because our highway network consists of basic feedforward networks, which are not suitable for challenging image datasets. Thanks for your comment.
Sorry for bothering you. Can I ask about the formula you use in the video on histogram-based split finding for the interval, and the formula for the score? Where did you find those formulas? Are they from a paper? Thank you.
The formulas used in this video, such as the score and gain, come from XGBoost. More details on those formulas can be found here:
[MXML-11-04] XGBoost (4/9) ruclips.net/video/ud7kJv5csxw/видео.html
[MXML-11-05] XGBoost (5/9) ruclips.net/video/e_1TJD8tHgE/видео.html
Thanks.
Waiting for your RNN module...
Sorry, what is Batch Normalization? Why do we want to do that?!
Very interesting. Thank you. Under which hypothesis is the confidence interval formula calculated?
I didn't quite understand your question. Could you explain it a bit more?
Wow, the graphs and content are really educational and beneficial. Thanks so much. Please keep sharing knowledge.
Thanks for your comment.
Whoops, looks like you uploaded without sound
Really? The sound is fine; I tested it on Microsoft Edge, Google Chrome, and a Samsung tablet. Please let me know if you are still facing the problem.
When will the next video be uploaded? Can you increase the upload frequency? I need the NLP playlist, sir.
Thank you for your interest in my videos, but I can only make 2 videos a week at the moment. I don't think I can make more than that.
Good explanation for hist binning. 👍
Next video? Would you kindly provide some additional reading material, sir, so that we can fully grasp the subject?
OK. From the next video onwards, I will try to provide more links if we need additional reading material.
@@meanxai Thanks sir
thanks
So good! Thanks for the series!
Excellent, thanks
Also, please make a video on how to build this in PyTorch.