Logistic Regression in Python - Predicting if the stock market is going Up or Down

  • Published: 30 Jan 2021
  • This video shows how machine learning can be used in the stock market: a logistic regression is used to predict whether the market, specifically the S&P 500, is going Up or Down.
    The logistic regression is implemented in Python using statsmodels. I have also done this with the sklearn library, so if you need any support with that, kindly let me know.
    Get the Notebook/Source code by becoming a Tier-2 Channel member:
    / @algovibes
    I found this prediction in the book An Introduction to Statistical Learning with Applications in R (ISLR), as I am learning R right now. I can highly recommend this book.
    This video is for educational and entertainment purposes. It is not investment advice!
    If anything is unclear please drop me a comment. I am happy to help!
    #Python #LogisticRegression #MachineLearning #Stockmarket

Comments • 89

  • @tudatostrader 2 years ago +1

    Thank you! Seriously love your work!

    • @Algovibes 2 years ago

      Thanks a lot for your support man.

  • @rajeevmenon1975 2 years ago +1

    Nice content all the way through. Been watching all your videos. Very immersive and pleasant to watch. Keep up the good work mate!!!

    • @Algovibes 2 years ago

      Thank you man. Happy to read that!

  • @user-ey8vl6ko8c 3 years ago +7

    Howdy! I just found your channel on YouTube, and really love what you are doing there! I like how clear and detailed your explanations are and the depth of knowledge you have on Python! Your content really stands out and you've put so much thought into your videos. Since I run a tech education channel as well, I love to see fellow content creators sharing, educating, and inspiring a large global audience. I wish you the best of luck on your YouTube journey, cannot wait to see you succeed!
    Cheers :-)

    • @Algovibes 3 years ago

      Thanks a lot for your kind comment mate. Really appreciate it.
      Good luck to you as well. You have awesome content! :-)

  • @PMHijes 3 years ago +1

    Very interesting, thank you!

    • @Algovibes 3 years ago

      Thanks for watching my friend :-)

  • @rsbenari 3 years ago +2

    Thanks for this. Nicely done. Since accuracy is improved by looking at just the nearest-term prior lags, I wonder whether a comparison of this technique with a Markov analysis would be worthwhile. The data prep is similar (skipping the lags), and the Markov transition matrix could be built from the 'Direction' column. So not a heavy lift. Thanks for thinking on this.

    • @Algovibes 3 years ago +1

      Thank you so much for your valuable comment and sharing your thoughts! It would indeed be an interesting comparison.
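
A minimal sketch of the Markov comparison suggested above, assuming a 'Direction' column that holds "Up"/"Down" labels as described in the comment. The function name `transition_matrix` and the sample labels are illustrative assumptions and are not taken from the video's notebook.

```python
from collections import Counter


def transition_matrix(directions):
    """Estimate first-order transition probabilities between states.

    Returns a nested dict: matrix[previous_state][next_state] = probability.
    """
    states = sorted(set(directions))
    # Count every consecutive (previous, next) pair in the sequence.
    counts = Counter(zip(directions, directions[1:]))
    matrix = {}
    for prev in states:
        total = sum(counts[(prev, nxt)] for nxt in states)
        matrix[prev] = {
            nxt: (counts[(prev, nxt)] / total if total else 0.0)
            for nxt in states
        }
    return matrix


# Hypothetical 'Direction' series, made up for illustration only.
direction = ["Up", "Up", "Down", "Up", "Down", "Down", "Up", "Up", "Up", "Down"]
P = transition_matrix(direction)
for prev, row in P.items():
    print(prev, row)
```

Each row of `P` gives the estimated chance of the next day's direction given today's, which could then be compared with the logistic model's accuracy on the same period.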

  • @jollyguo5154 2 years ago +1

    Very, very helpful, clear and straightforward explanation

    • @Algovibes 2 years ago

      Thank you very much for your feedback. Appreciate it :-)

  • @maiconreis9276 2 years ago +1

    Congrats on your video. I like this very much.

    • @Algovibes 2 years ago

      Thank you for your feedback buddy. Happy you like it!

  • @username8i 3 years ago +1

    Nice video, and it is in sync with the book. It would be nice if you could have shown the matrix for a couple of stocks [maybe in the next video with scikit-learn]. Cheers.

    • @Algovibes 3 years ago

      Thanks for watching and for your comment. Which matrix are you referring to? If I am getting you right, you mean running a logistic regression on multiple stocks and comparing where it works best? That would indeed be a nice idea, and we could use sklearn this time. Please do me a favor and correct me if this is not what you were talking about. Thanks in advance :-)

    • @username8i 3 years ago +1

      @@Algovibes Thanks. Yes, I'm talking about the confusion matrix you created to evaluate the outcome.
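
A rough sketch of what a per-stock confusion matrix comparison might look like. The ticker names and label lists are hypothetical placeholders, and the helper functions are written in plain Python for illustration rather than copied from the video.

```python
def confusion_matrix(actual, predicted, labels=("Down", "Up")):
    """Count (actual, predicted) pairs; outer key is actual, inner key is predicted."""
    table = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


def accuracy(table):
    """Share of observations on the diagonal (actual == predicted)."""
    correct = sum(table[label][label] for label in table)
    total = sum(sum(row.values()) for row in table.values())
    return correct / total


# Hypothetical (actual, predicted) labels per stock, for illustration only.
results = {
    "TICKER_A": (["Up", "Down", "Up", "Up"], ["Up", "Up", "Up", "Down"]),
    "TICKER_B": (["Down", "Down", "Up", "Up"], ["Down", "Up", "Up", "Up"]),
}
tables = {name: confusion_matrix(a, p) for name, (a, p) in results.items()}
for name, table in tables.items():
    print(name, table, accuracy(table))
```

Looking at the full table rather than accuracy alone shows whether a model simply predicts "Up" almost every day.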

  • @roym1444 3 years ago +1

    Hi, I am curious about the work you do, as I trade myself and back-test indicators manually on historical data to optimize my indicators and build strategies. Do you use these models in live trading scenarios?

    • @Algovibes 3 years ago

      Hi, first of all thank you very much for watching. What exactly do you want to know? What I am doing for a living or how do I make trading decisions?
      Best regards

    • @ai.simplified.. 2 years ago

      @@Algovibes how do u make trading decisions?

  • @jamestsang1061 2 years ago +1

    So cool!

    • @Algovibes 2 years ago

      Happy to read. Thanks a lot for watching :-)

  • @emmadshahid3031 2 months ago

    Hi, I am getting a separation error when I run the model. Can I somehow share my data sheet with you? I would appreciate it if you could help me with it.

  • @aaronsarinana1654 2 years ago +1

    Nice video, just starting to play with ML. And yeah, the p-values are pretty high. Multicollinearity is a central concern in Logistic Regression. I'm gonna try other ML algorithms less sensitive to multicollinearity, maybe one based on decision trees. Maybe PCA can help too. Thanks!

    • @Algovibes 2 years ago +1

      Hey man, thanks a lot for your comment! Let me know your results :-)

    • @stevenpham6734 1 year ago

      Yes, define the number n of lags by PCA to reduce the number of independent variables.

  • @yes9748 2 years ago +1

    Hey man where can we get the dataset from? Could you provide a link to it?

    • @Algovibes 2 years ago

      Hi buddy,
      the dataset is free to download as shown at around minute 3:30.
      I am using the yfinance library to pull the data.

  • @fsulitskiy 2 years ago +1

    Why do you use the lags as independent variables? Sorry if it is a stupid question, I'm quite new to ML.

    • @Algovibes 2 years ago +1

      The idea behind that is that past returns "could" have an impact on future returns. So the independent variables are the past returns, which predict the next day's return.
      y -> next day's return
      x1,x2,x3 -> prior returns

    • @fsulitskiy 2 years ago +1

      @@Algovibes understood. Thank you for your reply! :)
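
The y / x1,x2,x3 setup described above can be sketched in plain Python as follows. The function name `make_lagged_rows` and the return values are made up for illustration; the video itself builds the lags with pandas, which is not reproduced here.

```python
def make_lagged_rows(returns, n_lags=3):
    """Pair each day's return (y) with the n_lags returns that preceded it (x)."""
    rows = []
    for t in range(n_lags, len(returns)):
        # Lag1 is the most recent prior return, Lag2 the one before it, and so on.
        lags = [returns[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags, returns[t]))
    return rows


# Hypothetical daily percentage returns, oldest first.
returns = [0.4, -0.2, 0.1, 0.3, -0.5]
rows = make_lagged_rows(returns, n_lags=3)
for x, y in rows:
    print("x (Lag1..Lag3):", x, "-> y:", y)
```

The first `n_lags` days are dropped because they do not have a full set of prior returns.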

  • @collint7375 1 year ago

    Why is the predicted value compared against 0.5 whereas the actual is compared against 0?
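
One plausible reading of this question, offered as an assumption since the notebook is not shown here: a logistic regression outputs a probability between 0 and 1, so 0.5 is the natural cutoff for calling a day "Up", while the actual outcome is a return, whose sign is what separates "Up" from "Down", so it is compared against 0. The function names and numbers below are illustrative only.

```python
def label_from_probability(p_up, cutoff=0.5):
    """Predicted side: the model output is a probability of 'Up'."""
    return "Up" if p_up > cutoff else "Down"


def label_from_return(ret):
    """Actual side: the observed value is a return, so its sign decides."""
    return "Up" if ret > 0 else "Down"


# Hypothetical model probabilities and realised returns for three days.
predicted_prob = [0.62, 0.48, 0.51]
actual_return = [0.8, -0.3, -0.1]

pred_labels = [label_from_probability(p) for p in predicted_prob]
actual_labels = [label_from_return(r) for r in actual_return]
hits = sum(p == a for p, a in zip(pred_labels, actual_labels))
print(pred_labels, actual_labels, hits)
```

Both comparisons produce the same kind of label, which is what allows predictions and outcomes to be lined up in a confusion matrix.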

  • @benjamintreitz1647 2 years ago +1

    "We're not discussing statistics here, we want to predict the market" - what a guy.

    • @Algovibes 2 years ago +1

      :D You need it to do so tho. I was just kidding.

    • @benjamintreitz1647 2 years ago

      @@Algovibes no prob I just like the way you express yourself :D

  • @dhirajp4677 3 years ago +1

    Curse of dimensionality demonstrated..thank you..subscribed

    • @Algovibes 3 years ago

      Thank you very much for subscribing. Much appreciated :-)
      Regarding the Curse of Dimensionality: Nice thought!
      But I didn't mean to demonstrate it, haha :-D

    • @markk4203 2 years ago

      I might be wrong, but don't you have that backwards? If he found a significant result given more inputs (and NS for fewer) then it might be the curse of dimensionality were there not enough data points to support a large sample size for each permutation of the higher-dimensional space.

  • @ihorhurnyak7666 3 years ago +1

    Very interesting example ... but a serious question about this pseudo R squared ... I tried many times to increase it to the dreamed-of level of 0.2 - 0.4 ... without much success ... Even with your example I cannot get more than 0.15 ... In my opinion this 0.2 - 0.4 rule is more like a wish ... not more ... I don't see the necessity of such a level ... What do you think?

    • @Algovibes 3 years ago +2

      Thanks for watching and your comment :-)
      You are right. The goal should be to have a higher (btw McFadden's) pseudo R squared. You already pointed out the relevant things: a good fit is a value somewhere around 0.2-0.4. And yes, it is an incredibly good fit if you achieve that.
      But let me ask: Did you achieve a 0.15 when taking only Lag1 and Lag2? Didn't take a look at the stats. Would be awesome.
      Thanks in advance!

    • @ihorhurnyak7666 3 years ago +1

      @@Algovibes no ... I increased quantity of variables and sample .... only then I can get bigger R
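
For reference, McFadden's pseudo R squared discussed in this thread is defined as 1 - ln L(model) / ln L(null), where the null model is an intercept-only logistic regression that predicts the base rate for every observation. The sketch below computes it by hand; the outcomes and probabilities are invented for illustration and are not fitted values from the video (statsmodels reports this quantity as the pseudo R-squared of a Logit fit).

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R squared: 1 - lnL(model) / lnL(null).

    Assumes y contains both classes, so the base rate lies strictly between 0 and 1.
    """
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1 - ll_model / ll_null


# Hypothetical outcomes (1 = Up, 0 = Down) and model probabilities of 'Up'.
y = [1, 0, 1, 1, 0, 1]
p_model = [0.7, 0.6, 0.7, 0.7, 0.6, 0.7]
r2 = mcfadden_r2(y, p_model)
print(round(r2, 4))
```

Because the value is a ratio of log-likelihoods, probabilities that barely move away from the base rate, as is typical for daily return direction, give values close to zero.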

  • @gamersclipsandlifestyle7587 2 years ago +1

    Can you do this with sklearn? It would be a great help.

    • @Algovibes 2 years ago +1

      Thanks for the suggestion. Appreciate it :-)
      Will see what I can do!

  • @colorful_face5472 3 years ago +2

    Hi, nice content! It would be nice if you created a GitHub repository for your code, or something similar. Thanks for your effort! Btw. I'm also into ML, Python and finance, so if you would like to exchange some knowledge, just let me know (I think the German community is still quite small).

    • @Algovibes 3 years ago +1

      Hi :) Thanks a lot for your comment.
      Regarding the Repository: I have not yet decided whether I am setting up a GH Repository or my own website, so I am kindly asking for your patience until then.
      Regarding exchange: I am always happy to exchange with people and I would love to exchange with you as well. BTW thanks a lot for your subscription. Really appreciate it.

  • @MathaGoram 1 year ago +1

    Volume seems to lower the model validity. Counterintuitive?

    • @Algovibes 1 year ago

      Could you elaborate? Quite some time ago since I recorded this 😁

    • @MathaGoram 1 year ago

      @@Algovibes I did parametric runs with your dataset. In every model, when I removed Volume as an independent variable (and kept the other set (of Lag_n) independent variables), the percentage went up. Thanks for your detailed example which permitted me to make the parametric runs. BTW, don't short change yourself. Your understanding of pandas (slicing/dicing) as demonstrated in these videos efficiently is brilliant, IMHO.

  • @teenspirit1 2 years ago +1

    This is similar to sports betting. Taking the last three wins between two teams will let you predict the next win most of the time, since most teams are not evenly matched. The same goes for SPY, which has been bullish since the beginning of time, so predicting that it will go up the next day is pretty easy if you think about it. And people put that into an index fund and make money off of it.

    • @Algovibes 2 years ago

      Thanks for sharing your thoughts man! I do not fully agree but you definitely got a point.

  • @Anzeljaeg 2 years ago +1

    I will run it on a Pi 4 for sure

  • @Binancian 2 years ago +4

    Nice video, obviously including lag1 up to lag5 into the regression model is a bad idea usually due to very strong autocorrelation of the time series data.

    • @Algovibes 2 years ago

      Thanks for your comment and sharing your thoughts. Really appreciate it!
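
The autocorrelation concern raised in this thread can be checked directly rather than assumed. The sketch below is a plain-Python sample autocorrelation; the price series is made up for illustration. As a general observation, price levels tend to be strongly autocorrelated while daily returns are often much less so, but that should be measured on the actual data.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag (biased estimator, bounded by 1)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


# Hypothetical closing prices, oldest first.
prices = [100.0, 101.0, 102.5, 102.0, 103.5, 105.0, 104.0, 106.0]
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

print("lag-1 autocorrelation of prices: ", autocorrelation(prices))
print("lag-1 autocorrelation of returns:", autocorrelation(returns))
```

Running the same function for lags 1 to 5 on the return series would show how much the Lag1..Lag5 inputs overlap in information.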

  • @markk4203 2 years ago +1

    No significant results given six inputs but significant (technically we don't know without checking p-values but please ignore that detail) results with two (arbitrarily or not? We don't know) inputs selected: what comments would you make with regard to curve fitting here?

    • @Algovibes 2 years ago +1

      Can you elaborate on curve fitting? What do you want to achieve?

    • @markk4203 2 years ago +1

      @@Algovibes The results were poor and the author removed all but two inputs and got more encouraging results. I'm asking if you think this might be overfitting (i.e. curve fitting), which is bad because future results are less likely to be as good.

    • @Algovibes 2 years ago +1

      @@markk4203 Ah I got it! Yea I would definitely take another time horizon into account or do out of sample tests in general.
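
A minimal sketch of the out-of-sample idea mentioned in this thread: choose the inputs on an earlier period and judge them only on a later, untouched period. The function name `chronological_split` and the 80/20 fraction are assumptions for illustration, not settings from the video.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into an earlier training part and a later holdout.

    No shuffling: shuffling would leak future observations into the training set.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Stand-in for ten time-ordered observations (oldest first).
rows = list(range(10))
train, holdout = chronological_split(rows)
print("train:", train)
print("holdout:", holdout)
```

If the two-lag model only looks good on the period used to select the two lags and not on the holdout, that would point to the overfitting described above.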

  • @kevinalejandro3121 3 years ago +2

    I think we can enhance the prediction accuracy by adding to the dataframe several trading indicators.

    • @Algovibes 3 years ago

      Thanks a lot for watching :-)
      Very interesting thoughts!

  • @francesco.4533 2 years ago +1

    Random guessing is 50% only if your y is balanced (i.e. there is the same number of occurrences of "Up" and "Down" in your data). In your case the frequency of "Up" in your data is most likely 0.5272, which is why you are getting that accuracy for your model... If a simple logistic model were truly able to achieve better-than-random performance on the direction of the S&P 500, you would be a multi-billionaire by now :)

    • @Algovibes 2 years ago

      Good comment! Thanks a lot for pointing that out.

    • @stanislav6668 10 months ago

      All of algotrading (except HFT) is based on "guessing", and it is inappropriate to treat it as a business or earning model. The only profitable algotrading strategy is to teach somebody algotrading :) So consider it just a hobby, not more. I watch that guy only for some educational Python things in a couple of videos.
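
The base-rate point made at the top of this thread can be sketched as a comparison against an "always predict the majority class" baseline. The label counts below are hypothetical, chosen only to give an "Up" frequency near the one quoted in the comment, and 0.56 is the accuracy figure mentioned elsewhere in these comments rather than a verified result.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label counts for illustration.
labels = ["Up"] * 527 + ["Down"] * 473
baseline = majority_baseline_accuracy(labels)

model_accuracy = 0.56  # assumed figure, see lead-in
edge = model_accuracy - baseline
print("baseline:", baseline, "edge over baseline:", edge)
```

The meaningful number is the edge over this baseline, not the distance from 50%.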

  • @SP-db6sh 3 years ago

    It's a curtain raiser for the machine learning approach to trading. An LSTM or transformer can also be used for better accuracy if proficiency meets expectations.

    • @Algovibes 3 years ago

      Well, I am happy to have raised the curtain 🧖🏽 Thanks for watching :-)

  • @andreasw665 1 year ago +1

    So the p value was important after all ;)

  • @user-livelife 3 years ago

    So basically you are predicting the trend of stocks? Right?
    Thank you

    • @Algovibes 3 years ago

      Is this a question or a statement? :D

    • @user-livelife 3 years ago +1

      @@Algovibes it is a question, I swear hahaha
      I have a project about predicting stock trends using ML. I am new to all of this, that's why I am asking 💙

  • @simplySunny9 2 years ago +1

    70% of traders predict the direction correctly and very early. So for ML to be worth using, it should have an accuracy of 80%. 56% is not at all encouraging in any way. Also, out of that 80%, many moves wouldn't be profitable because of only a slight change in prices.

    • @Algovibes 2 years ago +1

      Well, I would need proof for this claim :D Anyhow, thanks a lot for sharing your thoughts.

    • @simplySunny9 2 years ago

      @@Algovibes it's already documented by so many traders. I have traded the market for many years and read many books.

    • @Claudineionficinal 2 years ago +1

      @@simplySunny9 I don't think you have a source on this

    • @juansb1509 2 years ago

      70% of traders?? More like 0.7%

    • @simplySunny9 2 years ago

      @@juansb1509 nope...it's above 70.

  • @ahmadz113 2 years ago

    Thanks for the video. However, I think you made a mistake by using Volume data to predict the past, not the future.

    • @Algovibes 2 years ago

      Thanks for watching :-)
      Disagreed, why do you think so?

    • @ahmadz113 2 years ago +1

      @@Algovibes sorry, I am wrong :)

    • @Algovibes 2 years ago

      @@ahmadz113 No worries man, thanks for the feedback tho!

    • @ahmadz113 2 years ago +1

      @@Algovibes thanks. As the results are not sufficient for practical trading, can we have another video with more variables and features included? It's an interesting topic and needs more study.