Predict The Stock Market With Machine Learning And Python
- Published: May 16, 2024
- In this tutorial, we'll learn how to predict tomorrow's S&P 500 index price using historical data. We'll also learn how to avoid common issues that make most stock price models overfit in the real world.
We'll start by downloading S&P 500 prices using a package called yfinance. Then, we'll clean up the data with pandas, and get it ready for machine learning.
We'll train a random forest model and make predictions using backtesting. Then, we'll improve the model by adding predictors. We'll end with next steps you can use to improve the model on your own.
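As a rough illustration of the "setting up our target" step, here is a minimal sketch on made-up prices. The column names `Tomorrow` and `Target` follow the names used in the comments below; the price values are hypothetical and stand in for the downloaded S&P 500 history.

```python
import pandas as pd

# Hypothetical closing prices standing in for the downloaded S&P 500 history.
prices = pd.DataFrame(
    {"Close": [100.0, 101.5, 101.0, 103.0, 102.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="B"),
)

# Tomorrow's close is the Close column shifted up by one row.
prices["Tomorrow"] = prices["Close"].shift(-1)

# 1 if the price rises the next day, 0 otherwise.
prices["Target"] = (prices["Tomorrow"] > prices["Close"]).astype(int)

# The last row has no known next-day close yet, so it is dropped before training.
labeled = prices.dropna(subset=["Tomorrow"])
print(labeled)
```

Predicting the direction (up or down) rather than the price itself turns the problem into binary classification, which is what a random forest classifier expects.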
You can find an overview of the project and the code here - github.com/dataquestio/projec... .
If you enjoyed this tutorial, check out this link bit.ly/3O8MDef for free courses that will help you master data skills.
Chapters
00:00 - Introduction
01:28 - Downloading S&P 500 price data
03:30 - Cleaning and visualizing our stock market data
04:29 - Setting up our target for machine learning
08:19 - Training an initial machine learning model
17:01 - Building a backtesting system
23:05 - Adding additional predictors to our model
28:45 - Improving our model
33:37 - Summary and next steps with the model
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef
Hi everyone! You can find the code for this tutorial here - github.com/dataquestio/project-walkthroughs/tree/master/sp_500 .
Thanks Vik!
Thanks Vik. However, your F1 score is at 0.5. How does that factor in?
Thanks, but it's incomplete.
Hey Vik. You should have used DataFrame.dropna(inplace=True).
Great video. Will you or can you provide additional information on other useful classifiers and also how to merge other data sources like news and sentiment into this code?
Clear and to the point. I hate super long videos full of things that don't provide much value. This one was great. I like that he walked through general data science/machine learning steps, in particular the data cleansing, which many skip over but is actually an important step. Also, a pet peeve of mine is audio quality. In this video you can hear the presenter clearly and he doesn't sound like he is working from a tin can.
Excellent. This tutorial corrects an error that pretty much every other video from others that I have seen has made. Don't seek MSE precision in your target as your goal. That's not what practitioners are looking for. Do what this educator has done instead. This model gets it right as used in the real world. Solid base to work with. Well done!
I’m new to coding but have always been an avid market watcher, looking for opportunities. Best video I’ve seen since I started scouring the depths of YouTube for this content last week. Thank you sir!
Thanks for your great video. I'm curious to read more about the whole issue of predicting actual prices versus only the direction. Do you have a good source on this? I can see why the latter is more robust, but once you start accounting for transaction costs, the magnitude of the move is also important. Curious to get your thoughts on this too.
I cannot thank you enough! It's very straight to point and I've learned more in this video than in n online courses and articles.
Searched & watched a LOT of videos. This is the best. Well done man.
have you tried them? do they work on real data?
This was an amazing walkthrough. I have learned so much!
My man is doing noble work. Kudos!
Very thorough and loved it sir. Thanks for the video lesson.
Thank you so much for the video. I have taken various courses in different places, and your video and teaching style are certainly the best!
Great video. Thank you for the insights. Going to be tuning into more of your work.
Excellent video, thank you for sharing this. Hopefully I can see more ML related videos going forward.
Watched up to 2:26 and I already know this is going to be excellent.
Clear and concise explanation from the start and you know this is going to be more than your ordinary YT tutorial
It's not excellent, you can't beat the market as a regular person. You basically compete with Harvard graduates with math, computer science, etc. degrees. Again, one YouTube video won't make you beat the market.
@@alang.2054 someone had to break this kid's dreams of being rich off a YouTube vid
@alang.2054 Where'd you get that she said she would beat the market from her comment?
I read an observation just stating that this video is higher quality than most YT videos that claim to teach you something specific yet just give you fluff.
Vik, I echo the compliments on the excellent video. I was able to use my own bespoke weekly market timing signals aligned with weekly S&P closes to finally get a grounded statistical "opinion" on the predictability of forward returns - as only my second Python exercise! Thanks!
Sir your explaining skills are top notch
That's a really good video and it seems you really know what you are talking about. Thanks!
Great video. Really clear and at a pace that allowed me to follow it easily and learn some new and simple techniques in how to manipulate data.
What a great framework to ML time-series data for prediction. Thanks for sharing!
This is awesome. Instead of showing what you need to learn or try, it shows how to actually build a model. This is very useful. Thank you!
Could we get a similar video but featuring a deep learning model instead?
What are you talking about? Do you really think this guy would show you real ways to make money? In the market you compete with professionals in multi-billion hedge funds with degrees. You can't beat them with a YouTube video.
Explaining is on top. Thank you!
Very useful man, thanks for showing us the way!
How would you use the volume column?
Not sure how to use the volume, can we build some relative volume indicator? Can you give a hint, or maybe a link to a video, where you use volume somehow to improve your model?
Volume should influence the model significantly.
DUDE THIS IS SO HELPFUL
Thanks so much, you're a blessing
Thank you very much for this! Truly found this useful for my first ML Project. However, a bit confused by the 'combined' graph - how did you get it? :) (I had to do mine using the train_test_split import.)
This was very well delivered. Thank you for sharing.
I will consider the suggestions you made and see how this works.
Very exciting with a bit of 😅.....
Great tutorial!
Congratulations on your explanation, it was very clear. I would like to suggest that you prepare a video that brings news about the stock into this model. Thanks
I'm hoping you can do a follow up video to this. Would be great to see how you would incorporate macro data into your model, such as news or interest rates.
Thank you, thank you!! This is great, subscribed :)
Incredible video! This helped me a whole lot, I really do appreciate it! I just liked and subscribed!
Vik thank you for this video! Greetings from Poland. Please explain to me how to connect the model so that operating on a virtual server bought and sold instruments? How do you combine it?
Thank you for your videos. But what if I have multiple stocks to predict, and when I parse one stock id in, I want to get the specific prediction for that id only. will it be feasible?
Great job! I used the majority of your code but for a specific company. My personal view is that this "result" is a bit messy. Do you have any tips on how we could make a clear graph towards the end with "predicted values"? I tried graphing "Tomorrow" with respect to "Close", but it made no difference. Part of the reason could be the wide X-axis.
Thanks again, looking forward to your answer! / Alexander
Great video , I hope to see more tutorials like this in the future.
Wow, the concept of predicting the stock market using machine learning and Python is such a fascinating topic! The blend of finance and technology is always an area ripe for innovative approaches. It's impressive how machine learning can analyze vast amounts of data to find patterns that might not be obvious at first glance. Python, with its extensive libraries and community support, is an excellent choice for such complex computations. It's exciting to think about how these tools can provide insights into market trends and possibly even predict future movements. The intersection of machine learning and finance is definitely a space to watch! 📈💡🤖
Great tutorial 🙏
Hi, great lesson,
I have a question.
I'm still new to data science.
But why didn't you use the data as a predictor?
I'm asking because, say we want to predict what happens the next day.
How do I pass it to the model when I didn't train with it?
Hi Vik. Thank you very much. Is it possible to predict two days in advance instead of just tomorrow?
Thank you so much. I’m learning to build and plot models. I basically copied your code and tried to understand it.
What’s your advice for learning how to do it yourself?
Super helpful - Thank You !!!
These are great for practice Keep em coming
Glad you like them, Prathamesh! -Vik
Very good explanation, thanks.
Hi, I wonder how reliable this would be if I predicted 10, 20, or more candles into the future with an accuracy of 75 to 90 percent. Do you think it would be useful in the financial markets? I ask because I created features that predict prices with an accuracy of 85 percent.
Good and clear explanation :) Although there are other factors to be considered like bid offer spread and commissions. Also, when the market goes against you, do you wait before the end of day to close the losing position? Maybe setting a stop loss and including it in the model and back testing can help. Thanks.
how would commissions help? lol
@@Mike-fm3km In the back testing of the model, it may seem profitable but after considering the commissions/transaction fees, it might be unprofitable instead.
Great video. It seems that the yfinance API is no longer functioning. Could you please do an updated video using a different method to collect the data? Thanks.
Thank you ❤❤
Actually you forgot to measure the expectancy of a trade in the case where it has a precision of 42%. What makes a strategy profitable is not the win rate but rather the expectancy of the trades. Still, it is a great video and a good tutorial about programming. Thanks and keep up the good work.
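To make the expectancy point concrete, here is a small sketch. It treats the model's precision as the win rate, which is a simplification, and all the numbers (average win, average loss, per-trade cost) are hypothetical rather than measured from the tutorial's model.

```python
def expectancy(win_rate, avg_win, avg_loss, cost_per_trade=0.0):
    """Average profit per trade, in the same units as avg_win and avg_loss.

    avg_loss is passed as a positive number. cost_per_trade covers
    commissions and spread and is paid on every trade, win or lose.
    """
    loss_rate = 1.0 - win_rate
    return win_rate * avg_win - loss_rate * avg_loss - cost_per_trade


# Hypothetical numbers: a 42% win rate can still have positive expectancy
# when the average winner is twice the size of the average loser.
before_costs = expectancy(0.42, avg_win=2.0, avg_loss=1.0)

# The same strategy once a per-trade cost is included.
after_costs = expectancy(0.42, avg_win=2.0, avg_loss=1.0, cost_per_trade=0.3)

print(before_costs, after_costs)
```

The second call also illustrates the earlier point about commissions: a backtest that looks profitable can turn negative once transaction costs are subtracted from every trade.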
The features used for the random forest should not be the raw high, close, low, and open values without any transformation, because what the model is then essentially doing is overfitting non-linear decisions to certain price ranges. It is basically memorizing that when the close was above value X and the open below value Y, it should predict 1 or 0. You need to normalize the predictors in some way so that the model can use them independently of how high the stock's price is and truly create generalizable rules. Ratios are good since they use percentages instead of absolute values and allow the model to use information from multiple candles as well.
Quite important comment.
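The normalization idea in the comment above can be sketched with ratio features. This is one possible approach, not necessarily the exact features used in the video; the function name `add_ratio_features` and the horizons are assumptions made for illustration.

```python
import pandas as pd


def add_ratio_features(data, horizons=(2, 5)):
    """Add scale-free predictors: the close divided by its rolling mean."""
    out = data.copy()
    for horizon in horizons:
        rolling_mean = out["Close"].rolling(horizon).mean()
        out[f"Close_Ratio_{horizon}"] = out["Close"] / rolling_mean
    return out


# Two hypothetical series with the same shape but a 10x different price level.
cheap = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]})
expensive = cheap * 10

cheap_features = add_ratio_features(cheap)
expensive_features = add_ratio_features(expensive)

# The ratio columns match even though the raw prices differ by a factor of 10.
print(cheap_features.tail(2))
print(expensive_features.tail(2))
```

Because the ratio only depends on the price relative to its recent average, a rule learned at one price level can still apply after the index has doubled.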
Excellent video. I think you have a great teaching ability. I'm surprised you did not start with the usual "THIS IS NOT FINANCIAL ADVICE..." disclaimer 😇
Hello Vik, Thanks for the great tutorial, really informative. Do you know how to add lorentzian classification to the model in your example?
Cool Video! Thank you!!
Hey man, how did you get into this kind of work? I'm so keen to find some work doing what you did but am finding limited possibilities.
Thanks, Vic.
Cool, went through the whole process on Miniconda.
Great video, thank you!
Excellent Video. Thank you for sharing. Question, how can we compare the 'influence' from another stock in the same industry, ie, two retail stocks, or two energy stocks?
correlation maybe.
Hello! Why the column "Tomorrow" wasn't used for training? 🤔
Brilliant video Vik! Towards the end, you mentioned adding news to the model. Could you share how one could integrate that?
Thanks!
Hi Jeevan - the easiest way to do it is to scrape daily headlines from say the new york times, and create a "sentiment" model to indicate confidence in the market. The output of that model could then be a predictor column. Of course, you could get a lot more complicated than this :)
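A minimal sketch of the last step in that reply, assuming a sentiment score per day already exists. The scores, the `Sentiment` column name, and the neutral fill value are all hypothetical; scraping headlines and building the sentiment model are not shown.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="B")

# Hypothetical price data indexed by trading day.
prices = pd.DataFrame({"Close": [100.0, 101.0, 99.5, 102.0]}, index=dates)

# Hypothetical daily sentiment scores in [-1, 1], e.g. the output of a
# separate headline model. One day is missing on purpose.
sentiment = pd.Series([0.2, -0.5, 0.1], index=dates[[0, 1, 3]], name="Sentiment")

# Align on the date index; days without headlines get a neutral score of 0.
combined = prices.join(sentiment)
combined["Sentiment"] = combined["Sentiment"].fillna(0.0)

# The new column can then be listed alongside the existing predictors.
predictors = ["Close", "Sentiment"]
print(combined[predictors])
```

One design point to watch: the score for a given day should only use headlines available before that day's prediction is made, otherwise the model sees information from the future.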
Excellent video!
Which platform are you coding on? Is it Google Colab or Jupyter?
Great stuff!
Hi Vik - thank you for the great video
This could be a dumb question: in the "Improving Our Model" section, why didn't you change Predictors to "NEW_Predictors" when you copied and pasted the function definition?
Does this matter?
Thank you,
AL
"NEW_Predictors" was passed while calling backtest function which calls predict function with "New_Predictors". Hence New_Predictors was used for modelling
great channel, will try to get some of my time to get to do something meaningful with the help of dataquest
Amazing video!! Have you looked at the performance of other ML techniques, e.g. MLPRegressor?
I suggest you google the semi strong efficient market hypothesis. Would save a lot of time.
Hey Real Quick Question, Can We Get Predictions For Each Single Stock?
Hint: on a recent macbook you can use all its cores by:
import joblib
N_CORES = joblib.cpu_count(only_physical_cores=True)
...
model = RandomForestClassifier(n_estimators='your value', min_samples_split='your other value', random_state=1, n_jobs=N_CORES)
The speedup is amazing
you don't need any information about the system to do this, n_jobs = -1 will use all the available cores with no imports or extra lines :)
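For reference, the `n_jobs=-1` variant from the reply above looks like this. The hyperparameter values are placeholders chosen for the sketch, not tuned settings.

```python
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 asks scikit-learn to use all available processors.
# n_estimators and min_samples_split are placeholder values.
model = RandomForestClassifier(
    n_estimators=100,
    min_samples_split=100,
    random_state=1,
    n_jobs=-1,
)
print(model.get_params()["n_jobs"])
```

The trees in a random forest are built independently, which is why training parallelizes well across cores.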
How do you add additional columns that will display information from Yahoo Finance such as P/E ratio, dividends, and so on?
Amazing work! Although I have a few doubts. I selected 18 features - from global stock indices, currency, and commodity - to predict daily directional changes in Nifty 50.
1. I'm not using the closing price for input variables; rather, I'm using the difference between the previous close and the current close. Is this a correct approach?
2. Also, can I split the target variable into 5 categories (Up, Down, Neutral, Extended Up, Extended Down)?
1) wouldn’t that be the same as using closing values?
2) Interesting idea, but it will probably reduce the overall effectiveness of the model because it reduces the amount of training data that falls into each of the 5 categories vs 2. I don’t know about Indian exchanges, but in the US, for example, Fidelity charges a $0 trade fee and keeps $0 from market makers for order flow. It all goes to the customer as price improvement. This is an extreme case, but my point is that in 2023, there should be markets you can trade for little to no cost. The brokers want your limit orders because it provides their other customers more liquidity without having to execute through a market maker. Also, they sell the limit order data to hedge funds that use that extra level of info to have an edge on the markets.
Excellent video and you are above average by all means. You made things easier for me, someone who is new to Python. At 65 years old I tried to work through your script and it worked beautifully. So I tried it with the TSLA ticker and it gave me a "No objects to concatenate" error, and I have no idea how to fix that error.
Hey
I'm facing the similar issue.
You got any solutions?
@@bvspa Did not work for me yet
@@SuperVIN786 you have to alter the start and step count as per the dataset
@@bvspa Thanks I will try that
@@bvspa Finally ran it after I made start and step number change, watched the video again which helped. Thanks
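A sketch of why the start and step values matter in this thread. It assumes the backtest walks forward from row `start` in chunks of `step` rows and concatenates one prediction frame per chunk; the helper `backtest_windows` and the row counts are hypothetical, and no model is trained here.

```python
import pandas as pd


def backtest_windows(n_rows, start, step):
    """Return (train_end, test_end) row positions for each backtest fold."""
    return [(i, min(i + step, n_rows)) for i in range(start, n_rows, step)]


# Hypothetical sizes: a long index history versus a younger single stock.
print(backtest_windows(n_rows=3000, start=2500, step=250))
# → [(2500, 2750), (2750, 3000)]
print(backtest_windows(n_rows=2000, start=2500, step=250))
# → []

# With no folds there are no prediction frames to combine,
# and pd.concat raises a ValueError on the empty list.
try:
    pd.concat([])
except ValueError as err:
    print(err)
```

If a ticker has fewer rows than `start`, lowering `start` (and possibly `step`) so that at least one fold exists avoids the error, at the cost of training the first folds on less history.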
great video, I hope to see your works into real trading platform. It would be great to see your P&L.
Great video. The backtest code can be improved. Using a vectorized backtest instead of doing it in a loop would greatly improve the backtest efficiency.
Can you elaborate on this? The backtest for me takes about 100 seconds
thank you
Hi, how do I predict the next , for instance in a new data.
Isn’t there leakage in the ‘trend’ feature, considering it is a function of future values (‘target’)?
Hi can we use this for Indian stock markets?❤
Thank you so much for the tutorial and for taking the time to explain each piece of code in such a clear manner. I have two quick questions: 1.) What is the purpose of the .csv file ? 2.) Broadly speaking, what would be the steps to using a different API? Thanks !!
If you can fit the data from the API into a data-frame it would be very easy.
@@FlisB thanks for replying. Would you by any chance know how to get (in addition to 1 or 0 when proba > .6) a column with the actual probability?
@@adamfrench4587 You need to save the result of model.predict_proba to another variable. Add probs = preds.copy() before changing "preds" with the 0.6 condition (a plain probs = preds would be overwritten along with preds if the thresholding is done in place). Then add "probs" to the list inside pd.concat.
Legend, thank you so much!
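The suggestion in this thread can be sketched as a variant of a `predict` helper that returns the probability next to the thresholded prediction. The function signature, the `Probability` column name, and the synthetic data are assumptions made for illustration; only the 0.6 threshold and `predict_proba` come from the thread.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def predict(train, test, predictors, model, threshold=0.6):
    """Fit on train, then return target, class prediction and probability."""
    model.fit(train[predictors], train["Target"])
    # Column 1 of predict_proba is the probability of class 1 (price up).
    probs = model.predict_proba(test[predictors])[:, 1]
    # Build a new array for the 0/1 decision so probs is left untouched.
    preds = (probs >= threshold).astype(int)
    return pd.DataFrame(
        {"Target": test["Target"], "Predictions": preds, "Probability": probs},
        index=test.index,
    )


# Synthetic data purely to exercise the function.
rng = np.random.default_rng(0)
data = pd.DataFrame({"Feature": rng.normal(size=200)})
data["Target"] = (data["Feature"] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, min_samples_split=10, random_state=1)
result = predict(data.iloc[:150], data.iloc[150:], ["Feature"], model)
print(result.head())
```

Keeping the raw probability makes it possible to try other thresholds later without retraining.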
hello sir , can this be used for day trading , in indian market for options trading of bank nifty and nifty in a 5 minutes candle time frame during market hours and feeding real time data?
I’m trying to figure out which kind of career gets you working on this kind of prediction on a daily basis. I’m on the fence about whether to go the BBA route and major in finance or to major in computer science. I know I’ll need both, but I’m unsure which area is more important. Does a financial analyst do this primarily, or is it a data scientist's job?
Pure math. Eventually physics. Companies like people who know how to think and have good problem solving skills that could be applied to anything. Also in math uni you learn a lot of programming these days
A BS in Data Science, not CS or SE
How has the model done this year? Does it show a topping formation?
When you split the data into the training and testing datasets, you are actually performing what is called simple random sampling, which causes the training data to have the same characteristics as the testing dataset. If you were to calculate the mean of each predictor variable in the testing and training datasets, they would be roughly the same due to random sampling. The point I am trying to make is that you cannot claim the model has not "seen" the testing data when it has managed to capture the majority of its properties through simple random sampling. How about you train the model using the first 70% of rows, then leave the remaining 30% at the bottom for predictions? That way the model does not have any idea what's happening in the remaining 30% (though there is an argument one can put forward about this). I think that approach would be the most realistic. I have used simple random sampling before and gotten results that seemed accurate; it was not until I used the method I am suggesting that I obtained somewhat higher errors.
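The 70/30 chronological split proposed in the comment above can be sketched as follows. The helper name `chronological_split` and the toy data are assumptions; the sketch only shows the split, not the modelling.

```python
import pandas as pd


def chronological_split(data, train_fraction=0.7):
    """Split time-ordered rows: earliest part for training, the rest for testing."""
    cutoff = int(len(data) * train_fraction)
    return data.iloc[:cutoff], data.iloc[cutoff:]


# Hypothetical daily data, already sorted by date.
data = pd.DataFrame(
    {"Close": range(100, 110)},
    index=pd.date_range("2024-01-01", periods=10, freq="B"),
)

train, test = chronological_split(data)

# Every training row is earlier than every test row, so nothing from the
# test period is available while fitting.
print(train.index.max(), test.index.min())
```

A walk-forward backtest repeats this idea several times with a growing training window, which gives more than one out-of-sample period to evaluate on.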
Hi @Everyone, I am getting this following error when trying to get the predictions for the second time on the new_predictors
Code: predictions = backtest(nifty50, model, new_predictors) FYI, I am using Nifty50 dataset.
ValueError: Length of values (1) does not match length of index (250)
Can anyone guide me through this error, I am not getting it. Any help would be much appreciated.
Hi, the way you explain is much better than any other. But I have a [ ValueError: No objects to concatenate ] after
" predictions = backtest(AXIS, model, predictors) ".
How to solve it?
You don't have enough data to concatenate.
Super 👏👏💪
What are the Profits after trading every day compared to SPY performance as the benchmark.
What is the definition of you can get pretty far? 51, 60, 70, 80, 90 % accuracy?
I didn't get the point of shift(1) in the trend column. Why shift 1 forward?
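A small sketch of what the shift does, assuming the trend feature is a rolling sum of the `Target` column over some horizon. The data and the horizon of 3 are made up for illustration.

```python
import pandas as pd

# Hypothetical Target column: 1 if the next day's close was higher.
data = pd.DataFrame({"Target": [1, 1, 1, 0, 0, 1]})

horizon = 3

# Without the shift, the window ending on row t includes Target[t],
# which is the very answer the model is asked to predict for that row.
leaky = data["Target"].rolling(horizon).sum()

# Shifting by one first means the window for row t only covers rows before t.
trend = data["Target"].shift(1).rolling(horizon).sum()

print(pd.DataFrame({"Target": data["Target"], "leaky": leaky, "trend": trend}))
```

With the shift, row t only uses targets up to row t-1, and the target for row t-1 depends on the close of day t, which is already known when predicting day t+1. Under that assumption the shifted feature does not look into the future.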
Hey Vik! You mentioned at 9:35 if someone could find a linear correlation in stock/financial market they’d do pretty well. Is this because linear relationships are easier to analyze than non-linear relationships?
Mathematically speaking, a linear relationship is one where a change in one thing comes with a proportional change in something else. So if you found a linear relationship between something and the stock market, then whenever your indicator did something you would be able to tell exactly what the stock market would do next. Finding this would make you an infallible trader.
I request that you create a video on a prediction model that integrates fundamental analysis and news, since that is what is happening behind the scenes to change the values. It's just a request, if possible.
Do we have any latest updates to this model? Adding extended logic for improvements?
Which software is used to run this code?
I'll take a note: the model is built without hyperparameter tuning. If hyperparameter tuning is done first, we no longer need to look for the best parameters when backtesting, in contrast to cross-validation, which requires more tuning.
Very nice tutorial !!!
Question: How do I actually print the prediction for tomorrow (Up or Down)?
Let's say the end of the trading day today is 7/13 and I have all the open/high/low/close data for 7/13 how do I print tomorrow's 7/14 prediction???
Never mind, I think I got it, the Target & Prediction columns are actually tomorrow's forecast
Yup, that's right! -Vik
@@Dataquestio what is the difference between Target and Prediction and how could we predict the 21st day in the future ?
Hello, thank you very much for the video, I am new to ML, I would like to know how to use the model? How do I see the prediction for the next day? thanks and greetings
The last row regenerated by the backtest is 2022-05-17. If the latest existing close is 2022-05-18 (assuming we’re in the morning of 2022-05-19), how is it we can predict the close of 2022-05-19?
I suppose this has something to do with dropping rows with NaN…
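A sketch of the situation described in this thread, using the dates from the question and made-up prices. It assumes the target is built with `shift(-1)`, so the newest row has no label and is dropped from the backtest data; that row can still be passed to a fitted model for a forecast.

```python
import pandas as pd

# Dates taken from the question above; the closing prices are hypothetical.
prices = pd.DataFrame(
    {"Close": [100.0, 101.0, 100.5, 102.0]},
    index=pd.to_datetime(["2022-05-13", "2022-05-16", "2022-05-17", "2022-05-18"]),
)
prices["Tomorrow"] = prices["Close"].shift(-1)
# Note: on the newest row Tomorrow is NaN, so Target is a meaningless 0 there.
prices["Target"] = (prices["Tomorrow"] > prices["Close"]).astype(int)

# Rows with a known next-day close can be used for training and backtesting.
history = prices.dropna(subset=["Tomorrow"])

# The newest row has predictors but no label yet. It is the row to pass to
# model.predict to get a forecast for the next trading day.
latest = prices.iloc[[-1]].drop(columns=["Tomorrow", "Target"])

print(history.index.max(), latest.index[0])
```

So the backtest output ending on 2022-05-17 is consistent with having data through 2022-05-18: the prediction attached to the 2022-05-17 row is about 2022-05-18, and a forecast for 2022-05-19 needs the unlabeled 2022-05-18 row.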