meanxai
  • Videos: 119
  • Views: 28,510
[MXDL-11-07] Attention Networks [7/7] - Stock price prediction using a Transformer model
Before I end this series, I would like to write some code to predict stock prices using the Transformer model.
Since stock prices are also time series, we can apply all the Seq2Seq, Attention, and Transformer models we have looked at in this series to try to predict stock prices. In this video, we will use the Transformer model to predict stock prices.
Stock prices are difficult to predict because they behave like non-stationary stochastic processes, that is, random walks. Future stock prices are determined not only by past memories but also by future events, information shocks, and so on. Past memories can be analyzed technically, but future events cannot. Theref...
Views: 75
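Roughly, the data preparation looks like the sketch below (the actual code is in the channel's GitHub repository; the window length, horizon, and normalization here are illustrative assumptions, not the video's exact settings): the price series is turned into sliding windows of past values mapped to future values.

```python
import numpy as np

def make_windows(prices, n_input=50, n_output=10):
    # Slide a window over the (normalized) price series:
    # each sample maps n_input past steps to n_output future steps.
    X, y = [], []
    for i in range(len(prices) - n_input - n_output + 1):
        X.append(prices[i : i + n_input])
        y.append(prices[i + n_input : i + n_input + n_output])
    # Shape (samples, time steps, features=1), as Keras sequence layers expect.
    return np.array(X)[..., None], np.array(y)[..., None]

# Toy example with a synthetic random-walk "price" series.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=1000))            # random walk
prices = (prices - prices.mean()) / prices.std()     # simple z-score normalization
X, y = make_windows(prices)
print(X.shape, y.shape)   # (941, 50, 1) (941, 10, 1)
```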

Videos

[MXDL-11-06] Attention Networks [6/7] - Time series forecasting using a Transformer model
Views: 103 · 1 day ago
In the last two chapters, we predicted time series using sequence-to-sequence based models. In this video, we will predict time series using a Transformer model that uses only attention, rather than a sequence model. Instead of writing our own Transformer code from scratch, we'll use the code posted at github.com/suyash/transformer. Since this code is for natural language processing, ...
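Since that repository is written for token sequences, one common adaptation for continuous time series, sketched below as an assumption rather than the video's exact changes, is to replace the embedding lookup with a Dense projection into the model dimension.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hedged sketch: project continuous time-series values into the model
# dimension instead of using a token embedding lookup. d_model, n_input,
# and the layer names are illustrative, not taken from github.com/suyash/transformer.
d_model = 64
n_input = 50   # past time steps fed to the encoder

inputs = layers.Input(shape=(n_input, 1))     # continuous values, not token ids
x = layers.Dense(d_model)(inputs)             # replaces the embedding layer
# ... positional encoding and the Transformer encoder/decoder blocks go here ...
outputs = layers.Dense(1)(x)                  # predict a value per time step

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```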
[MXDL-11-05] Attention Networks [5/7] - Transformer model
Views: 78 · 14 days ago
In the previous chapter, we looked at a sequence-to-sequence based attention model. In this video, we'll look at a Transformer model that uses only attention, rather than a sequence model. Eight data scientists working at Google published a groundbreaking research paper in the field of natural language processing in 2017 called “Attention is all you need.” This is the Transformer model. Transfo...
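For reference, the core operation of that paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (shapes are illustrative assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

# Toy example: 3 query positions, 4 key/value positions, d_k = d_v = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)   # (3, 8) (3, 4)
```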
[MXDL-11-04] Attention Networks [4/7] - Seq2Seq-Attention model using input-feeding method
Views: 127 · 21 days ago
In the last video, we implemented a simple Seq2Seq-Attention model to predict time series. In this video, we will add a feature called input-feeding method to the existing Seq2Seq-Attention model. The input-feeding approach is presented in section 3.3 of Luong's 2015 paper. In the existing Attention model we looked at in the previous video, Attention decisions are made independently, which is s...
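A rough NumPy sketch of the idea, with toy random weights standing in for the trained layers (the dimensions and names are assumptions, not the video's code): the previous step's attentional vector h̃ is concatenated with the current decoder input, so each attention decision can depend on the previous ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, T_enc, T_dec = 8, 1, 12, 4    # illustrative sizes

# Toy stand-ins for trained layers (random weights, just to show the data flow).
W_cell = rng.normal(size=(d_h, d_x + 2 * d_h)) * 0.1   # recurrent step on [x_t; h_tilde; h]
W_c    = rng.normal(size=(d_h, 2 * d_h)) * 0.1          # combines [context; h] into h_tilde

def attention(h, enc_outputs):
    # Dot-product attention: score each encoder state, softmax, weighted sum.
    scores = enc_outputs @ h
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return enc_outputs.T @ w   # context vector (attention value)

enc_outputs = rng.normal(size=(T_enc, d_h))   # pretend encoder states
dec_inputs  = rng.normal(size=(T_dec, d_x))   # decoder inputs (teacher forcing)

h, h_tilde = np.zeros(d_h), np.zeros(d_h)
for x_t in dec_inputs:
    # Input feeding: the previous attentional vector h_tilde is fed in together
    # with the current decoder input, so attention decisions are no longer
    # made independently at each step.
    h = np.tanh(W_cell @ np.concatenate([x_t, h_tilde, h]))
    context = attention(h, enc_outputs)
    h_tilde = np.tanh(W_c @ np.concatenate([context, h]))
print(h_tilde.shape)   # (8,)
```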
[MXDL-11-03] Attention Networks [3/7] - Seq2Seq-Attention model for time series prediction
Views: 77 · 21 days ago
In the last video, we implemented a sequence-to-sequence model to predict time series. In this video, we're going to add a feature called Attention to our sequence-to-sequence model. Let's take a look at the architecture of the Seq2Seq-Attention model and how to find attention scores and attention values. And let's implement this model with Keras and predict a time series. There are many papers...
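As one hedged illustration of how attention scores and attention values can be computed between decoder and encoder states in Keras (layer sizes and shapes here are assumptions, not the video's exact model):

```python
import tensorflow as tf
from tensorflow.keras import layers

T_enc, T_dec, d_h = 12, 4, 8   # illustrative sizes

enc_out = layers.Input(shape=(T_enc, d_h))   # all encoder hidden states
dec_out = layers.Input(shape=(T_dec, d_h))   # all decoder hidden states

# Attention scores: dot product of every decoder state with every encoder state,
# then softmax over the encoder time axis.
scores = layers.Dot(axes=(2, 2))([dec_out, enc_out])      # (batch, T_dec, T_enc)
weights = layers.Softmax(axis=-1)(scores)
# Attention values (context vectors): weighted sum of encoder states.
context = layers.Dot(axes=(2, 1))([weights, enc_out])     # (batch, T_dec, d_h)
combined = layers.Concatenate()([context, dec_out])
y_hat = layers.TimeDistributed(layers.Dense(1))(combined) # one prediction per decoder step

attn_demo = tf.keras.Model([enc_out, dec_out], y_hat)
attn_demo.summary()
```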
[MXDL-11-02] Attention Networks [2/7] - Implementing a Seq2Seq model for time series forecasting
Views: 82 · 1 month ago
In the last video, we looked at how a sequence-to-sequence model works and how to create a dataset for time series prediction. In this video, we will implement this model using Keras and predict a time series. #AttentionNetworks #Seq2Seq #SequenceToSequence #TeacherForcing #TimeSeriesForecasting
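A minimal hedged sketch of such a Keras Seq2Seq model with teacher forcing; the GRU cell, layer sizes, and window lengths are illustrative assumptions rather than the video's exact choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

n_input, n_output, n_hidden = 50, 10, 64   # illustrative sizes

# Encoder: consume the past window and keep only its final state.
enc_in = layers.Input(shape=(n_input, 1))
_, enc_state = layers.GRU(n_hidden, return_state=True)(enc_in)

# Decoder: during training it receives the shifted target sequence
# (teacher forcing) and starts from the encoder's final state.
dec_in = layers.Input(shape=(n_output, 1))
dec_seq = layers.GRU(n_hidden, return_sequences=True)(dec_in, initial_state=enc_state)
y_hat = layers.TimeDistributed(layers.Dense(1))(dec_seq)

seq2seq = tf.keras.Model([enc_in, dec_in], y_hat)
seq2seq.compile(optimizer="adam", loss="mse")
seq2seq.summary()
```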
[MXDL-11-01] Attention Networks [1/7] - Sequence-to-Sequence Networks (Seq2Seq)
Views: 88 · 1 month ago
Starting from this video, we will look at attention networks, the eleventh topic of deep learning. This topic covers Sequence-to-Sequence, Attention, and Transformer networks. These are mainly used in natural language processing (NLP), but in this tutorial we will apply them to time series forecasting. In later NLP courses, we will explore these in more detail and use them to creat...
[MXDL-10-08] Recurrent Neural Networks (RNN) [8/8] - Multi-layer and Bi-directional RNN
Views: 188 · 1 month ago
So far, we have implemented many-to-one and many-to-many models. In other forms, we can implement single-layer and multi-layer models, and unidirectional and bidirectional models. We can also implement models that combine all of these. Then we can implement recurrent neural networks in many different forms. For example, we can implement a two-layered, unidirectional many-to-one model, or a two-...
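For example, a two-layer, bidirectional, many-to-one model might be sketched in Keras like this (units and window length are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

n_input, n_hidden = 50, 32   # illustrative sizes

model = tf.keras.Sequential([
    layers.Input(shape=(n_input, 1)),
    # The first layer must return the full sequence so the second layer
    # receives one vector per time step.
    layers.Bidirectional(layers.LSTM(n_hidden, return_sequences=True)),
    # The second layer returns only its last output: many-to-one.
    layers.Bidirectional(layers.LSTM(n_hidden)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```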
[MXDL-10-07] Recurrent Neural Networks (RNN) [7/8] - Gated Recurrent Unit (GRU)
Views: 99 · 1 month ago
In this video, we'll look at one of the many variations of LSTM: Gated Recurrent Unit (GRU). And let's implement many-to-many GRU models, and apply them to a time series prediction problem. Let's implement a custom GRU layer and see how the output of each gate is computed and how information from the previous time step is propagated to the next time step. And let's implement a many-to-many GRU ...
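As a reference for the gate computations, here is a minimal NumPy sketch of one GRU step (the blending convention in the last line follows Keras; weight shapes are illustrative assumptions):

```python
import numpy as np

def gru_step(x_t, h_prev, Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh):
    # One GRU time step: update gate z, reset gate r, candidate state h_tilde.
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)              # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)              # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde               # Keras-style blending

# Toy dimensions, just to show how information flows from h_prev to h.
rng = np.random.default_rng(0)
d_x, d_h = 1, 4
Wz, Wr, Wh = (rng.normal(size=(d_h, d_x)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(d_h, d_h)) for _ in range(3))
bz, br, bh = (np.zeros(d_h) for _ in range(3))
h = gru_step(rng.normal(size=d_x), np.zeros(d_h), Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh)
print(h)
```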
[MXDL-10-06] Recurrent Neural Networks (RNN) [6/8] - Peephole LSTM models and time series forecasting
Views: 76 · 1 month ago
In this video, we will look at the structure of Peephole LSTM and implement many-to-one and many-to-many models for time series forecasting. Felix A. Gers et al. proposed Peephole LSTM, which adds peephole connections to traditional LSTM, in their 2000 paper "Recurrent Nets that Time and Count" and their 2002 paper "Learning Precise Timing with LSTM Recurrent Networks". Peephole connections all...
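As a sketch of the structure only (weight shapes and names are illustrative assumptions), one Peephole LSTM step lets the gates look at the cell state: the forget and input gates peek at the previous cell state, and the output gate peeks at the new one.

```python
import numpy as np

def peephole_lstm_step(x_t, h_prev, c_prev, W, U, P, b):
    # One Peephole LSTM step: the gates can also "peep" at the cell state
    # through elementwise (diagonal) peephole weights P.
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + P["f"] * c_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + P["i"] * c_prev + b["i"])  # input gate
    c = f * c_prev + i * np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # new cell state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + P["o"] * c + b["o"])       # output gate peeks at c_t
    h = o * np.tanh(c)
    return h, c

# Toy dimensions, just to show the computation.
rng = np.random.default_rng(0)
d_x, d_h = 1, 4
W = {k: rng.normal(size=(d_h, d_x)) for k in "fico"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "fico"}
P = {k: rng.normal(size=d_h) for k in "fio"}
b = {k: np.zeros(d_h) for k in "fico"}
h, c = peephole_lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), W, U, P, b)
print(h, c)
```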
[MXDL-10-05] Recurrent Neural Networks (RNN) [5/8] - Build LSTM models for time series forecasting
Views: 186 · 1 month ago
In this video, we will implement LSTM models for time series forecasting. In the last video, we looked at the structure of many-to-one LSTM. In this video, we will implement it in code. First, let's implement an LSTM cell using Keras' custom layer and implement a many-to-one LSTM model to predict time series. Next, let's implement many-to-one and many-to-many LSTM models using Keras' LSTM class...
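A hedged sketch of the two model shapes built with Keras' LSTM class (layer sizes and window length are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

n_input, n_hidden = 50, 64   # illustrative sizes

# Many-to-one: the LSTM returns only its final output, and a single
# next value is predicted from it.
many_to_one = tf.keras.Sequential([
    layers.Input(shape=(n_input, 1)),
    layers.LSTM(n_hidden),                 # return_sequences=False (default)
    layers.Dense(1),
])

# Many-to-many: return_sequences=True gives one output per time step,
# and TimeDistributed applies the same Dense layer at every step.
many_to_many = tf.keras.Sequential([
    layers.Input(shape=(n_input, 1)),
    layers.LSTM(n_hidden, return_sequences=True),
    layers.TimeDistributed(layers.Dense(1)),
])

many_to_one.summary()
many_to_many.summary()
```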
[MXDL-10-04] Recurrent Neural Networks (RNN) [4/8] - Long Short-Term Memory (LSTM)
Views: 86 · 1 month ago
In the last three videos, we looked at Simple RNN. It struggles to learn long-term dependencies because of vanishing or exploding gradients. That is, it is difficult to learn information from the distant past simply by increasing the number of time steps. The LSTM was designed to address the problem. In this video, we will briefly look at the history and papers of LSTM, and then take a closer l...
[MXDL-10-03] Recurrent Neural Networks (RNN) [3/8] - Build RNN models for time series forecasting
Views: 98 · 1 month ago
In this video, we will implement RNN models for time series forecasting. In the previous video, we looked at the structure of many-to-one and many-to-many RNN models. In this video, we will implement them in code. First, let's implement a recurrent layer using Keras' custom layer and implement a many-to-one RNN model to predict time series. Next, let's implement a many-to-one RNN model using Ke...
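The recurrence that such a custom layer implements can be sketched in a few lines of NumPy (weight shapes are illustrative assumptions):

```python
import numpy as np

def simple_rnn_forward(X, Wx, Wh, b):
    # Simple RNN recurrence: h_t = tanh(Wx x_t + Wh h_{t-1} + b).
    # X has shape (time steps, features); returns all hidden states.
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.stack(states)

# Toy many-to-one use: the prediction is read off the final hidden state.
rng = np.random.default_rng(0)
d_x, d_h, T = 1, 4, 10
states = simple_rnn_forward(rng.normal(size=(T, d_x)),
                            rng.normal(size=(d_h, d_x)),
                            rng.normal(size=(d_h, d_h)),
                            np.zeros(d_h))
print(states.shape, states[-1])   # (10, 4) and the last hidden state
```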
[MXDL-10-02] Recurrent Neural Networks (RNN) [2/8] - Backpropagation Through Time (BPTT)
Views: 120 · 1 month ago
In this tutorial, we will look at the basic types of recurrent neural networks, many-to-one and many-to-many. We will also look at backpropagation through time, and see what problems arise during backpropagation in recurrent neural networks. #RecurrentNeuralNetwork #RNN #SimpleRNN #BPTT #BackpropagationThroughTime
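The source of those problems can be seen in the BPTT gradient: for the simple RNN h_t = tanh(W_x x_t + W_h h_{t-1} + b), the loss at step T reaches an earlier hidden state through a product of Jacobians (a standard derivation, sketched here):

```latex
\frac{\partial L_T}{\partial h_k}
  = \frac{\partial L_T}{\partial h_T}\,
    \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\frac{\partial h_t}{\partial h_{t-1}}
  = \operatorname{diag}\!\left(1 - h_t \odot h_t\right) W_h .
```

Because the factor W_h is repeated once per time step, the product tends to shrink toward zero (vanishing gradients) or blow up (exploding gradients) as the number of time steps grows.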
[MXDL-10-01] Recurrent Neural Networks (RNN) [1/8] - Basics of RNNs and their data structures.
Views: 233 · 2 months ago
Starting from this video, we will look at recurrent neural networks, RNNs, which is the tenth topic of deep learning. This video is part 1, providing the basics of RNNs and their data structures. Let's look at the full table of contents. In Chapter 1, we will look at Simple RNN, the basic model of recurrent neural networks. RNN is an artificial neural network that is useful for analyzing time s...
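The key data structure is the 3-D input array that Keras recurrent layers expect, (samples, time steps, features). A tiny sketch with illustrative sizes:

```python
import numpy as np
import tensorflow as tf

# Keras recurrent layers expect 3-D input: (samples, time steps, features).
# Toy example: 32 samples, 10 time steps, 1 feature per step.
X = np.random.rand(32, 10, 1).astype("float32")

rnn = tf.keras.layers.SimpleRNN(4)   # many-to-one: returns the last hidden state
print(rnn(X).shape)                  # (32, 4)
```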
[MXDL-9-01] Highway Networks [1/1] - Shortcut connections, implementing highway networks using Keras
Views: 80 · 2 months ago
[MXDL-8-03] Weights Initialization [3/3] - Kaiming He Initializer
Views: 70 · 2 months ago
[MXDL-8-02] Weights Initialization [2/3] - Xavier Glorot Initializer
Views: 70 · 2 months ago
[MXDL-8-01] Weights Initialization [1/3] - Observation of the outputs of a hidden layer
Views: 70 · 2 months ago
[MXDL-7-02] Batch Normalization [2/2] - Custom Batch Normalization layer using Keras
Views: 375 · 2 months ago
[MXDL-7-01] Batch Normalization [1/2] - Training and Prediction stage
Views: 350 · 2 months ago
[MXDL-6-02] Dropout [2/2] - Scale-down and Scale-up
Views: 88 · 2 months ago
[MXDL-6-01] Dropout [1/2] - Zero-out step in dropout
Views: 67 · 2 months ago
[MXDL-5-02] Regularization [2/2] - Activity (or Activation) Regularization
Views: 122 · 3 months ago
[MXDL-5-01] Regularization [1/2] - Weights and Biases Regularization
Views: 85 · 3 months ago
[MXDL-4-02] TensorFlow & Keras [2/2] - Build neural networks with Keras
Views: 75 · 3 months ago
[MXDL-4-01] TensorFlow & Keras [1/2] - Build neural networks with TensorFlow
Views: 130 · 3 months ago
[MXDL-3-03] Backpropagation [3/3] - Automatic Differentiation
Views: 64 · 3 months ago
[MXDL-3-02] Backpropagation [2/3] - Error Backpropagation along multiple paths
Views: 93 · 3 months ago
[MXDL-3-01] Backpropagation [1/3] - Error Backpropagation along single path
Views: 119 · 3 months ago

Comments

  • @ati43888
    @ati43888 3 hours ago

    Thanks.

  • @ati43888
    @ati43888 1 day ago

    Thanks

  • @ati43888
    @ati43888 1 day ago

    Thanks

  • @ati43888
    @ati43888 8 days ago

    Very nice. Thanks

  • @meanxai
    @meanxai 10 days ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 10 days ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 10 days ago

    The codes can be found at github.com/meanxai/deep_learning

  • @yang-z1m
    @yang-z1m 11 days ago

    This is a really clear video. After watching it, I have a deeper understanding of LightGBM. Thanks!

    • @meanxai
      @meanxai 11 days ago

      Thanks for your comment. I am glad it was helpful!

  • @yang-z1m
    @yang-z1m 11 days ago

    very good video!

  • @KushJuvekar-j3f
    @KushJuvekar-j3f 14 days ago

    Do all weak learners have to come from the same family? Meaning, must all weak learners be DTs or SVMs, or can they be different?

    • @meanxai
      @meanxai 14 days ago

      Same family. To keep the way we measure the epsilon consistent across rounds, it makes sense to use the same weak learner across rounds.

    • @KushJuvekar-j3f
      @KushJuvekar-j3f 14 days ago

      @@meanxai Okay. Thank you!

  • @KushJuvekar-j3f
    @KushJuvekar-j3f 14 days ago

    How many examples are taken from the training data when subsampling? Is it like random forests, equal to the number of training examples?

    • @meanxai
      @meanxai 14 days ago

      Typically, the sampled subset is the same size as the original data set and contains repeated data points. However, if your original data set is too large, you can generate smaller subsets. This is called "boosting-by-filtering". In this case, you need to consider a lower bound on the sample size that the model has to use in order to guarantee that the final hypothesis has error smaller than epsilon. The lower bound is presented in Theorem 3.2 of Yoav Freund's 1995 paper. Unfortunately, I haven't figured this out.

    • @KushJuvekar-j3f
      @KushJuvekar-j3f 14 days ago

      @@meanxai Oh okay. Thanks, will check that out!

    • @KushJuvekar-j3f
      @KushJuvekar-j3f 14 days ago

      This is what I got from NotebookLM:

      ### Understanding Lambda, Gamma, and the Derivation of 'm'

      The sub-sample size 'm' plays a pivotal role in boosting algorithms. Here's an explanation of lambda (λ), gamma (γ), and how the lower bound for 'm' is derived:

      * **Lambda (λ) represents the reliability parameter.** This signifies the desired probability with which the final hypothesis outputted by the boosting algorithm should achieve the target accuracy (ε). In simpler terms, it's the confidence level that the final hypothesis will have the desired error rate.
      * **Gamma (γ) is a measure of the weak learner's advantage over random guessing.** The boosting algorithm utilizes a weak learning algorithm, denoted as **WeakLearn**, that is assumed to perform slightly better than random guessing. Gamma quantifies this advantage.
      * The sources provide a detailed analysis of a boosting algorithm that employs a "majority-vote game" analogy to illustrate its functionality. This game involves two players: a "chooser" who selects a weighting scheme for a set of points and an "adversary" who attempts to maximize the "loss" based on the chooser's strategy.
      * The derivation of the lower bound for 'm' stems from the requirement that **the hypotheses generated by WeakLearn should have a high probability of having an error smaller than 1/2 - γ.** This condition ensures that the weak learner is indeed performing better than random chance.
      * The sources state that the lower bound for 'm' is calculated using a formula that incorporates ε, λ, and other factors related to the weak learner's performance. This formula ensures that the sample size is sufficient to achieve the desired accuracy (ε) with the specified reliability (λ).
      * **The sources provide a specific formula for calculating 'm':** **m' >= (2/γ^2) * ln (m/(λ * ε)).** This formula guarantees that the probability of the final hypothesis having an error larger than ε is at most λ.
      * The choice of 'm' directly influences the number of training examples needed and the computational complexity of the boosting algorithm. A larger 'm' typically leads to higher accuracy but increases computational cost.

      In summary, lambda and gamma are essential parameters in determining the sub-sample size 'm', which plays a crucial role in the performance of boosting algorithms. The lower bound for 'm' is derived to ensure that the weak learner's performance surpasses random guessing and the desired accuracy and reliability are achieved.

      ### Selecting Gamma and Lambda

      Yes, **gamma (γ) and lambda (λ) are parameters that are typically chosen by the user.** There are factors that can help guide the selection of these parameters:

      * **Desired Accuracy (ε):** The user's desired accuracy for the final hypothesis plays a significant role in choosing gamma and lambda. A higher desired accuracy might necessitate a smaller gamma, indicating a requirement for a weak learner with a greater advantage over random guessing. This, in turn, might influence the choice of lambda to maintain a balance between accuracy and reliability.
      * **Computational Constraints:** The choice of gamma and lambda indirectly affects the computational complexity of the boosting algorithm through its influence on the sub-sample size 'm'. Users need to consider the available computational resources. A smaller gamma or a higher lambda might lead to a larger 'm', potentially increasing the computational cost.
      * **Nature of the Weak Learner:** The characteristics of the weak learner employed in the boosting algorithm can provide insights into selecting appropriate values for gamma and lambda. If the weak learner is known to have a significant advantage over random guessing, a larger gamma might be suitable. Conversely, a weaker learner might require a smaller gamma.
      * **Trade-off between Accuracy and Reliability:** The selection of gamma and lambda involves a trade-off between the desired accuracy and the reliability with which that accuracy should be achieved. A smaller gamma generally leads to better accuracy but might require a smaller lambda to maintain the desired reliability, potentially increasing the computational cost.

      The sources provided do not offer specific guidelines or formulas for choosing gamma and lambda. These parameters depend on the specific application, the characteristics of the weak learner, and the user's priorities regarding accuracy, reliability, and computational resources. **It's important to note that the information beyond what is stated in the sources about choosing gamma and lambda is not from the sources and may need to be independently verified.**

    • @meanxai
      @meanxai 13 days ago

      @@KushJuvekar-j3f Thanks for the useful information. It helps me a lot to understand the lower bound of m.

  • @ati43888
    @ati43888 18 days ago

    Thanks. Very nice.

  • @ati43888
    @ati43888 19 days ago

    thanks.

  • @ati43888
    @ati43888 19 days ago

    Thanks. Keep going please.

  • @jirapolottpobukadee1139
    @jirapolottpobukadee1139 19 days ago

    👍👍👍❤️

  • @ati43888
    @ati43888 20 days ago

    Thanks

  • @radionnazmiev546
    @radionnazmiev546 21 days ago

    Interesting why the dot product is used as a measure of similarity over cosine similarity. Because if, for example, we compare [2,1] to [4,2] and [4,8], the resulting dot products would be 10 and 16 respectively, which is counterintuitive: the angle between [2,1] and [4,2] is zero, hence they must be pretty similar, but the dot product of [2,1] and [4,8] is higher because of the larger values in the latter vector...

    • @meanxai
      @meanxai 21 days ago

      I totally agree with you. As in your example, dot product similarity does not make intuitive sense. We cannot say that dot product similarity is better than cosine similarity. However, we also cannot say that the latter is always better than the former. Cosine similarity only cares about the angle difference, while the dot product cares about both the angle and the magnitude. In deep learning, the magnitude of a vector may actually contain information we are interested in, so there is no need to remove it. Dot product similarity is said to be especially useful for high-dimensional vectors.
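To make the trade-off in this exchange concrete, here is a tiny sketch with the commenter's vectors (purely illustrative):

```python
import numpy as np

a, b, c = np.array([2, 1]), np.array([4, 2]), np.array([4, 8])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Dot products: a·b = 10, a·c = 16  (the magnitude of c inflates the score)
print(a @ b, a @ c)
# Cosine similarities: cos(a, b) = 1.0 (same direction), cos(a, c) = 0.8
print(cosine(a, b), cosine(a, c))
```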

  • @radionnazmiev546
    @radionnazmiev546 21 days ago

    Great job!!!! Amazing tutorials!!!!

    • @meanxai
      @meanxai 21 days ago

      Thanks for your comment.

  • @ati43888
    @ati43888 23 days ago

    Great. Thanks

  • @ati43888
    @ati43888 24 days ago

    Thanks. Amazing.

    • @meanxai
      @meanxai 24 days ago

      Thanks for your comment.

  • @meanxai
    @meanxai 25 days ago

    The codes can be found at github.com/meanxai/deep_learning

  • @ati43888
    @ati43888 28 days ago

    Nice. Thank you.

  • @jirapolottpobukadee1139
    @jirapolottpobukadee1139 1 month ago

    ❤❤

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @ati43888
    @ati43888 1 month ago

    Best. Thank you.

  • @jirapolottpobukadee1139
    @jirapolottpobukadee1139 1 month ago

    Thank u so much.❤❤

  • @ati43888
    @ati43888 1 month ago

    Thanks

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @meanxai
    @meanxai 1 month ago

    The codes can be found at github.com/meanxai/deep_learning

  • @ati43888
    @ati43888 1 month ago

    Thanks

  • @ujjwalarora4159
    @ujjwalarora4159 1 month ago

    Are these slides available for reference?

    • @meanxai
      @meanxai 1 month ago

      Sorry, the slides (PDF) are not public.

  • @IkhukumarHazarika
    @IkhukumarHazarika 1 month ago

    Please make a video on PyTorch

    • @meanxai
      @meanxai 1 month ago

      Sorry, I am not familiar with PyTorch.

  • @hopelesssuprem1867
    @hopelesssuprem1867 1 month ago

    thank you for a good explanation

    • @meanxai
      @meanxai 1 month ago

      Thanks for your comment.

  • @zoraizelya3975
    @zoraizelya3975 1 month ago

    Can you provide the code please?

    • @meanxai
      @meanxai 1 month ago

      The code can be found at github.com/meanxai/deep_learning Thanks.

    • @zoraizelya3975
      @zoraizelya3975 1 month ago

      @@meanxai Thank you for sharing, but I had already typed the whole thing 🤣. I also used the CIFAR-10 dataset, but the val_accuracy wasn't good: only about 52, while the training accuracy was more than 90.

    • @meanxai
      @meanxai 1 month ago

      @@zoraizelya3975 I also got 52% accuracy, which is too low. I think this is because our highway network consists of basic feedforward networks, which are not suitable for challenging image datasets. Thanks for your comment.

  • @suttanariiq2401
    @suttanariiq2401 2 months ago

    Sorry for bothering you. Can I ask about the formula you use in the video for histogram-based split finding (for the interval), and the formula for the Score? Where did you find those formulas? Are they from a paper or an article? Thank you.

    • @meanxai
      @meanxai 2 months ago

      The formulas used in this video, such as the score and gain, came from XGBoost. More details on those formulas can be found here: [MXML-11-04] XGBoost (4/9) ruclips.net/video/ud7kJv5csxw/видео.html and [MXML-11-05] XGBoost (5/9) ruclips.net/video/e_1TJD8tHgE/видео.html. Thanks.

  • @RahulKumar-ez6vw
    @RahulKumar-ez6vw 2 months ago

    Waiting for your RNN module...

  • @karterel4562
    @karterel4562 2 months ago

    Sorry, what is Batch Normalization? Why do we want to do that?!

  • @sageagat3796
    @sageagat3796 2 months ago

    Very interesting. Thank you. Under which hypothesis is the formula for the confidence interval calculated?

    • @meanxai
      @meanxai 2 months ago

      I didn't quite understand your question. Could you explain it a bit more?

  • @bilalviewing
    @bilalviewing 2 months ago

    Wow, the graphs and content are really educational and beneficial. Thanks so much, please keep sharing knowledge.

    • @meanxai
      @meanxai 2 months ago

      Thanks for your comment.

  • @cornevanzyl5880
    @cornevanzyl5880 2 months ago

    Whoops, looks like you uploaded without sound

    • @meanxai
      @meanxai 2 months ago

      Really? The sound is fine; I tested it on Microsoft Edge, Google Chrome, and a Samsung tablet. Please let me know whether you are still facing the problem.

  • @RahulKumar-ez6vw
    @RahulKumar-ez6vw 2 months ago

    When will the next video be uploaded? Can you upload videos more frequently? I need an NLP playlist, sir.

    • @meanxai
      @meanxai 2 months ago

      Thank you for your interest in my videos, but I can only make 2 videos a week at the moment. I don't think I can make more than that.

  • @SameerShah-r7k
    @SameerShah-r7k 2 months ago

    Good explanation for hist binning. 👍

  • @RahulKumar-ez6vw
    @RahulKumar-ez6vw 3 months ago

    Next video? Would you kindly provide some additional reading material, sir, so that we can fully grasp the subject?

    • @meanxai
      @meanxai 2 months ago

      OK. From the next video onwards, I will try to provide more links if we need additional reading material.

    • @RahulKumar-ez6vw
      @RahulKumar-ez6vw 2 months ago

      @@meanxai Thanks sir

  • @NO_NAME-fe6dj
    @NO_NAME-fe6dj 3 months ago

    thanks

  • @renanaoki714
    @renanaoki714 3 months ago

    So good! Thanks for the series!

  • @estadisticaparatodos6070
    @estadisticaparatodos6070 3 months ago

    Excellent, thanks

  • @IkhukumarHazarika
    @IkhukumarHazarika 3 months ago

    Also, please make a video on how to build this in PyTorch.