"Optimizing Trading Strategies without Overfitting" by Dr. Ernest Chan - QuantCon 2018

  • Published: 31 May 2024
  • Optimizing parameters of a trading strategy via backtesting has one major problem: there are typically not enough historical trades to achieve statistical significance. This talk will discuss a variety of methods of overcoming that, including stochastic control theory and simulations. Simulations may involve either linear or nonlinear time series models such as recurrent neural networks.
    About the Speaker:
    Dr. Ernest Chan is the Managing Member of QTS Capital Management, LLC, a commodity pool operator and trading advisor. He began his career as a machine learning researcher at IBM’s Human Language Technologies Group, and later joined Morgan Stanley’s Data Mining Group. He was also a quantitative researcher and proprietary trader for Credit Suisse. Ernie is the author of “Machine Trading”, “Algorithmic Trading”, and “Quantitative Trading”, all published by Wiley, and a popular financial blogger at epchan.blogspot.com. He also teaches in the Master of Science in Predictive Analytics program at Northwestern University. He received his Ph.D. in theoretical physics from Cornell University.
    The slides to this presentation can be found at bit.ly/2HWCFxC.
    To learn more about Quantopian, visit www.quantopian.com.
    Disclaimer
    Quantopian provides this presentation to help people write trading algorithms - it is not intended to provide investment advice.
    More specifically, the material is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory or other services by Quantopian.
    In addition, the content neither constitutes investment advice nor offers any opinion with respect to the suitability of any security or any specific investment. Quantopian makes no guarantees as to accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Comments • 84

  • @saron2523 3 years ago +2

    Ernest makes the path clear! Thanks!

  • @injeel_ahmed 4 years ago +5

    Very informative

  • @RasoulMojtahedzadeh 3 years ago +11

    Time series models must also be able to model major economic events somehow. AR(p), ARIMA, GARCH, etc. won't lead to a reliable model of forex or stock market price fluctuations.

  • @randomerplays5335 2 years ago +1

    Will put this into practice on my EagleFx account

  • @ivanklful 3 years ago +3

    Great video! I followed it with the greatest possible attention, and it needs to be watched a couple of times. I would like to check my understanding of it with the AUDCAD example, ok?
    1. AUDCAD is a forex pair with historically quite low volatility and can therefore be assumed to be stationary over the long run. To check its stationarity we perform an augmented Dickey-Fuller test for a unit root and hopefully reject the null hypothesis that a unit root is present.
    2. Set up an autoregressive model AR(1) in which the previous period's price predicts today's price?!
    3. Divide the time series into two parts, a training set and a test set, and fit the model on the training set using maximum likelihood estimation (MLE)?!
    4. Test the model out of sample thereafter.
    I understood almost everything except how to choose an optimal rate of mean reversion (kappa, or k). So how do you pick an optimal k? (A code sketch of these steps follows this thread.)

    • @patryknextdoor 2 years ago +1

      Better to focus on the t-value instead.

    • @joseneto6558 2 years ago +2

      He just simulated 10,000 time series with the same properties as his training sample set. Then he evaluated many different AR models with different k's fitted to this training set and selected the one that returned the maximum PnL. After that, he applied the AR model with the optimized k to the out-of-sample test set and found that even if you optimize parameters on a longer (simulated) time series, that may not reflect the actual out-of-sample market structure, so there might still be overfitting.
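
A minimal Python sketch (not the speaker's code) of the procedure described in this thread: test an AUDCAD-style price series for a unit root, fit an AR(1) on a training window, simulate many series from the fitted parameters, pick the trading parameter with the best average simulated PnL, and then check it out of sample. The data here are a synthetic stand-in, and the threshold rule and all parameter choices are illustrative assumptions; it assumes numpy and statsmodels are installed.

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)

# Stand-in for historical AUDCAD log prices (mean-reverting by construction).
n = 2000
kappa_true, mu_true, sigma_true = 0.05, 0.0, 0.01
p = np.zeros(n)
for t in range(1, n):
    p[t] = p[t - 1] + kappa_true * (mu_true - p[t - 1]) + sigma_true * rng.standard_normal()

train, test = p[:1500], p[1500:]

# Step 1: augmented Dickey-Fuller test; rejecting the unit-root null supports mean reversion.
adf_stat, pvalue = adfuller(train)[:2]
print(f"ADF statistic {adf_stat:.2f}, p-value {pvalue:.4f}")

# Steps 2-3: fit AR(1) on the training set, p[t] = a + b * p[t-1] + eps.
# With Gaussian errors, conditional MLE of an AR(1) reduces to OLS on the lagged price.
X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
a, b = np.linalg.lstsq(X, train[1:], rcond=None)[0]
resid_sd = np.std(train[1:] - X @ np.array([a, b]))
kappa_hat = 1.0 - b            # implied rate of mean reversion
mu_hat = a / kappa_hat
print(f"kappa_hat = {kappa_hat:.4f}, mu_hat = {mu_hat:.4f}")

# Simulate many series from the fitted model (the talk used 10,000) and pick the
# entry threshold that maximizes the average PnL of a toy mean-reversion rule.
def simulate(n_steps):
    s = np.empty(n_steps)
    s[0] = mu_hat
    for t in range(1, n_steps):
        s[t] = s[t - 1] + kappa_hat * (mu_hat - s[t - 1]) + resid_sd * rng.standard_normal()
    return s

def pnl(series, threshold):
    # Toy rule: short 1 unit when price > mu + threshold, long 1 unit when price < mu - threshold.
    pos = np.where(series > mu_hat + threshold, -1.0,
          np.where(series < mu_hat - threshold, 1.0, 0.0))
    return np.sum(pos[:-1] * np.diff(series))

sims = [simulate(1500) for _ in range(1000)]
thresholds = np.linspace(0.005, 0.05, 10)
best = max(thresholds, key=lambda th: np.mean([pnl(s, th) for s in sims]))

# Step 4: out-of-sample check on the held-out segment of the original series.
print(f"best threshold {best:.3f}, out-of-sample PnL {pnl(test, best):.4f}")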

  • @Kochos 2 years ago +1

    Is there any software for optimizing strategies that has overfitting/over-optimization avoidance built in?

  • @frenchmarty7446 2 years ago +3

    I don't understand the part about generating fake data to fix overfitting.
    How are you generating fake data? With a random distribution?
    I've searched and searched and I can't find anything close to what he is talking about. I've seen the bias-variance trade-off, out-of-sample testing, guidelines for model selection, etc. Nothing I've seen suggests you can fix overfitting with fake data.

  • @mitesh8utube 4 years ago +11

    7.3K views and no comments?

    • @livethumos 3 years ago +12

      Everyone was taking notes

    • @leocracker9450 3 years ago

      @@livethumos lol

    • @15chris45chris 2 years ago

      Just like in class, we slept through it, woke up, and just started clapping with everyone else. 🤣
      This is a joke.

  • @gerardomoscatelli9035 4 years ago +3

    What if log(x) isn't normally distributed?

    • @A.I.- 3 years ago +1

      Then they're all screwed!!! Cause and effect >>> they lose their money for being idiots.
      When are these (supposedly "intelligent") people going to learn that their "linear solutions" don't apply to a "non-linear problem"???
      It's like programming a robot to catch a chicken by mean reversion: because the chicken ran too far to the left, it is now time for the chicken to revert back to the mean. F@ck me!!!

    • @arfa9601 3 years ago +5

      @@A.I.- obviously you have missed the point of this presentation.

    • @arfa9601 3 years ago +2

      Interestingly enough, it's not! There are signature "specs" observed in such groups of distributions which lead to a variety of interesting emergent phenomena.

  • @alrey72 2 years ago +4

    I truly don't understand. If your model is fitted on data from 2000 to 2015 and then works well when backtested from 2016 to 2021, it will certainly have handled a lot of different cases and should net an income.

    • @China4x4 11 months ago

      You can refer to the financial market memory effect to understand this.

    • @alrey72 11 months ago

      @@China4x4 I don't know if I read the correct paper, but it mentioned that more than 50% of the time the price bounces off support/resistance.

  • @Corpsecreate 2 years ago +6

    It all makes sense, except for synthesizing fake time series data. How can you generate new data without understanding the underlying market dynamics that drive the movement of the time series? From what I gathered, Ernest was suggesting fitting some assumed probability distribution of returns based on a parametric model (like a Gaussian with mu and sd) to the historical data, then using this model to synthesize as much training data as you want. But... doing this means that your fake data is RANDOM, by definition. So this means your training data is random, making any further analysis useless. Am I missing something?

    • @frenchmarty7446 2 years ago +4

      Yeah this is my question too.
      How are you getting this fake (interpolated) data? And you're using fake data to generate models...? Isn't it the other way around? What model gives you the fake data in the first place?

    • @azy789 2 years ago +4

      The methodology introduces randomness in an attempt to avoid overfitting parameters to a single price series. If you have an underlying pattern plus some degree of randomness (captured by the large number of simulated series you generate), the trading model parameters trained on this set of simulated price series will be "looser" and thus more generalizable to out-of-sample test data whose underlying price patterns are consistent with the true historical price series you generated your simulations from. That's a big assumption, though, and it requires market knowledge to judge whether or not it can be made.

    • @frenchmarty7446 2 years ago +2

      @@azy789 That doesn't answer the question in the slightest.
      If you take random data points and add noise, how have you changed anything? If your random noise is unbiased, how do the new points not cancel each other out? Any features in the original time series will still be present (including noise) and will be more significant than any added noise.
      If I take one time series and generate 1000 new time series based on the original but with random noise, they will still average out to the original time series, noise and all.
      If anything, aren't you just fooling yourself by artificially increasing the sample size? What was obviously insignificant noise before now looks like significant information when scaled to 1000 samples. Compared to the fake noise, the original noise is now a signal, because it is "coincidentally" in most of your samples by design (when in reality it's just in one real sample).
      Your model will only seem to have a "looser" fit relative to the fake noise. But relative to the original data (signal and noise) you will massively overfit.
      You can test this for yourself. Take, say, ten noisy data points with a linear relationship. Now overfit a nonparametric curve model. Now generate 1000 fake data points (100 from each of the ten, +/- some noise). Notice how you can still overfit the same model (except now you've made the fit look more realistic). (A sketch of this experiment follows at the end of this thread.)

    • @Corpsecreate 2 years ago +3

      @@frenchmarty7446 Yes, everything you say is correct and is the source of my confusion with simulated data

    • @frenchmarty7446 2 years ago +1

      @@Corpsecreate I have to assume the speaker was referring to some kind of oversampling of interesting minority features in the data. Which is strange, because the problem oversampling solves isn't overfitting.
      The only robust solutions to overfitting that I'm aware of are out-of-sample testing (such as cross-validation) and some form of Bayesian weighting (rare features in the data don't move the needle as much as recurring features). Preferably the former, if not both.
      I've never heard of making modifications to the data itself to change the signal-to-noise ratio. To me that sounds like trying to get something from nothing (better data from mediocre data), or even circular reasoning (how do you even know what is signal and what is noise?).
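
A quick Python sketch of the experiment proposed in the reply above, intended only as an illustration of the commenter's argument rather than anything from the talk. The linear ground truth, the 9th-degree polynomial as the deliberately flexible model, and the jitter size are all arbitrary assumptions; it only requires numpy.

import numpy as np

rng = np.random.default_rng(1)

# Ten noisy observations of a true linear relationship y = 2x + noise.
x = np.linspace(0, 1, 10)
y = 2 * x + 0.3 * rng.standard_normal(10)

# Overfit: a 9th-degree polynomial passes (nearly) through every point.
overfit_coef = np.polyfit(x, y, deg=9)

# "Augment" the data: 100 jittered copies of each original point. The jitter is
# small and unbiased, so the flexible fit typically still tracks the original noise.
x_fake = np.repeat(x, 100) + 0.01 * rng.standard_normal(1000)
y_fake = np.repeat(y, 100) + 0.01 * rng.standard_normal(1000)
fake_coef = np.polyfit(x_fake, y_fake, deg=9)

# Compare both flexible fits, plus a plain linear fit, against the noiseless truth.
grid = np.linspace(0, 1, 200)
truth = 2 * grid
err_overfit = np.mean((np.polyval(overfit_coef, grid) - truth) ** 2)
err_fake = np.mean((np.polyval(fake_coef, grid) - truth) ** 2)
err_line = np.mean((np.polyval(np.polyfit(x, y, deg=1), grid) - truth) ** 2)

print(f"MSE vs truth: degree-9 on 10 points   {err_overfit:.4f}")
print(f"MSE vs truth: degree-9 on 1000 fakes  {err_fake:.4f}")
print(f"MSE vs truth: degree-1 on 10 points   {err_line:.4f}")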

  • @baleaweb4021 1 year ago +3

    I wasted my time watching this and I don't know why.