Machine Learning Mini-Project: Predicting Stock Closing Prices from Intraday Stock Prices on the NIFTY index.
ROCKET vs. Time Series Forest vs. Temporal Convolutional Networks vs. XGBoost
--
If you dabble in stock trading, as I do, you might wonder how to tell where a stock will be by the closing bell: is it going to close above where it started, or not? There are intraday patterns, surely. People always tell you that trading activity comes in “waves”, that things tend to slow down a bit during the lunch hours, and that there is a “power hour” towards the end when big moves can happen.
For this project (Google Colab notebook publicly available here), I am using the NIFTY index (India) and looking at minute-by-minute data. Each day's time series is normalized with respect to its opening price, so each point is just the difference between that minute's price and the opening price. The Indian market is open for about 6 hours and 15 minutes, meaning that there should be 375 one-minute data points per day. I used data from 2018–2019 and dropped any day with fewer than 372 data points (there were only 1 or 2 such days). Then the question becomes: how much of a historical window do we need to predict where the index ends up? Can you tell after the first hour? Or can a machine learn a pattern after 3 of the 6.25 hours have passed?
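The preprocessing described above can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's actual code: the function name `make_sample`, the constant `MIN_POINTS`, and the toy price series are all made up here, and the binary label (1 if the day closes above its open) is my reading of the "close above where it started, or not" question.

```python
MIN_POINTS = 372  # days with fewer minute bars are dropped, as described in the text


def make_sample(prices, window):
    """Turn one day's minute prices into (features, label), or None if the day is too short.

    `prices` is a list of minute-by-minute prices for a single trading day and
    `window` is how many minutes from the open the model is allowed to see.
    Both names are hypothetical, chosen for this sketch.
    """
    if len(prices) < MIN_POINTS:
        return None
    open_price = prices[0]
    # Each point becomes its difference from the opening price.
    normalized = [p - open_price for p in prices]
    features = normalized[:window]      # the part of the day the model sees
    label = int(normalized[-1] > 0)     # assumed target: 1 if the close is above the open
    return features, label


# Toy day: 375 synthetic minute prices drifting upward from 100.0 (made-up data).
day = [100.0 + 0.01 * i for i in range(375)]
features, label = make_sample(day, window=60)   # "can you tell after the first hour?"
short_day = make_sample(day[:300], window=60)   # too few points, so the day is dropped
```

Changing `window` from 60 to 180 corresponds to moving from "the first hour" to "3 of the 6.25 hours".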
I will attempt to answer this question for the NIFTY using sktime, a time series library, as well as XGBoost and keras-TCN, a library for temporal convolutional networks. From sktime, the two I will be focusing on are the ROCKET transform and the Time Series Forest Classifier. sktime actually has tons of interesting time series classifiers, many of which are of the symbolic representation sort (representing a time series as a sequence of letters or symbols, like DNA). I found that most of them aren’t too competitive on this particular time series, so I’m focusing on the 2 that are decent enough to be something you might deploy in real life.
DATA
The data is from this Kaggle page, and from it we are using NIFTY, not BANK NIFTY, as our index of choice. Also, we are training with just the years 2018–2019, splitting this set 80/20 without any shuffling…
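An unshuffled 80/20 split on day-level samples might look like the sketch below. It assumes the samples are already in chronological order and that the first 80% is the training portion. The helper name `chronological_split` and the placeholder list of days are hypothetical, not taken from the notebook.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into (train, test) without shuffling."""
    cut = int(len(samples) * train_fraction)
    # Everything before the cut is training data; everything after is held out.
    return samples[:cut], samples[cut:]


# Placeholder: 500 trading days represented by their index (made-up count).
days = list(range(500))
train_days, test_days = chronological_split(days)
```

Keeping the order intact means every test day comes after every training day, so the model is never evaluated on days that precede the ones it learned from.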