Machine Learning Mini-Project: Predicting Stock Closing Prices from Intraday Stock Prices on the NIFTY index.
ROCKET vs. Time Series Forest vs. Temporal Convolutional Networks vs. XGBoost
If you dabble in stock trading, as I do, you might wonder whether you can tell how a stock is going to do by the closing bell: is it going to close above where it started, or not? Surely there are intraday patterns. People always tell you that trading activity comes in “waves”, that things tend to slow down a bit during the lunch hours, and that there is a power hour toward the end of the day when big moves can happen.
For this project (Google Colab notebook publicly available here), I am using the NIFTY index (India) with minute-by-minute data. I normalize each day's time series with respect to its opening price, so each point is just the difference between that minute's price and the opening price. The Indian market is open for 6 hours and 15 minutes, so there should be 375 one-minute data points per day. I used data from 2018–2019 and dropped any day with fewer than 372 data points (there were only one or two such days). Then the question becomes: how much of a historical window do we need to predict where the index ends up? Can you tell after the first hour? Or can a machine learn a pattern after 3 of the 6.25 hours have passed?
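Reshaping minute bars into one row per trading day and dropping short days can be sketched in pandas as follows. This assumes the raw data has been loaded into a DataFrame with hypothetical `timestamp` (datetime) and `close` columns; the real column names may differ. It positions each bar by its order within the day rather than by clock time, so a missing minute shifts the later bars left by one.

```python
import numpy as np
import pandas as pd

MIN_POINTS = 372  # days with fewer minute bars than this are dropped


def daily_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per trading day, one column per minute.

    Assumes columns 'timestamp' (datetime64) and 'close'.
    """
    df = df.sort_values("timestamp")
    day = df["timestamp"].dt.date
    # Position of each bar within its own day: 0, 1, 2, ...
    minute_idx = df.groupby(day).cumcount()
    wide = df.assign(day=day, minute=minute_idx).pivot(
        index="day", columns="minute", values="close"
    )
    # Count the bars actually present per day and keep only near-complete days
    counts = wide.notna().sum(axis=1)
    return wide[counts >= MIN_POINTS]
```

Days with 372 to 374 bars keep NaNs in their last columns, which only matters if the window reaches the end of the day.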
I will attempt to answer this question for the NIFTY using sktime, a time series library, as well as XGBoost and keras-tcn, a library for temporal convolutional networks. From sktime, the ones I will be focusing on are the ROCKET transform and the Time Series Forest classifier. sktime has tons of interesting time series classifiers, many of which are of the symbolic representation sort (representing time series as sequences of letters or symbols, like DNA). I found that most of them aren't very competitive on this data, so I'm focusing on the two that are decent enough to be something you might deploy in real life.
The data is from this Kaggle page, and of the indices it provides, I use NIFTY rather than BANK NIFTY. I use only the years 2018–2019 and split this set 80/20 in chronological order, without any shuffling, so we can see how something trained on the past generalizes to the future, i.e., whether there is some kind of concept drift going on.
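A chronological 80/20 split is just slicing by date order. Here is a minimal sketch, assuming the rows are already sorted from oldest to newest day; `chronological_split` is a hypothetical helper name.

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8):
    """Hypothetical helper: earliest train_frac of rows train, the rest test (no shuffling)."""
    n_train = int(len(X) * train_frac)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]
```

scikit-learn's `train_test_split(X, y, test_size=0.2, shuffle=False)` does the same job; either way, every test day comes after every training day.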
Preprocessing the data: subtract the first value from the rest, so that the first value equals 0, and then drop that column. Take the first X hours as your input window. I started with 4 hours, i.e. the first 240 minutes, which leaves 239 points in time once the zeroed opening value is dropped. Then scale the numbers by dividing by 100, which puts most values roughly in the [-1, 1] range (they can be negative, since the index can trade below its open). To create a binary target variable, compare the closing price to the opening price: if the closing price is higher, code it as 1, otherwise 0. You might also want to try smoothing the time series a bit with tsmoothie's LOWESS. It doesn't change much in the big picture. Here's the plot of one daily time series alongside its smoothed version:
sktime classifiers require the data to be stored in an unusual format: a pandas DataFrame, but instead of one column for each time stamp (239 features, i.e. an array of shape (N, 239)), you have a single column in which each element is itself a pandas Series. That makes it an (N, 1) table whose single feature is the entire 239-element series.
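Putting the preprocessing steps and the nested format together, here is a minimal NumPy/pandas sketch. It assumes the days are already stacked into a complete (n_days, 375) price array with column 0 holding the open, the last column holding the close, and no gaps; `make_window` and `to_nested` are hypothetical helper names, and `dim_0` is just an illustrative column name. Recent sktime releases also accept a 3D NumPy array of shape (N, 1, 239), so check what your version expects before converting.

```python
import numpy as np
import pandas as pd


def make_window(day_prices, hours=4.0, scale=100.0):
    """Hypothetical helper: (n_days, n_minutes) prices -> (features, labels).

    Assumes column 0 is the open, the last column is the close, no NaNs.
    """
    n_minutes = int(round(hours * 60))             # 4 hours -> 240 minutes
    opens = day_prices[:, [0]]
    window = day_prices[:, :n_minutes] - opens     # opening value becomes 0
    X = window[:, 1:] / scale                      # drop the zero column, then scale
    y = (day_prices[:, -1] > day_prices[:, 0]).astype(int)  # 1 if close > open
    return X, y


def to_nested(X, col="dim_0"):
    """Hypothetical helper: (N, T) array -> (N, 1) DataFrame of length-T pd.Series cells."""
    nested = pd.DataFrame(index=range(len(X)))
    nested[col] = [pd.Series(row) for row in X]
    return nested
```

For example, `Xtrain_sktime = to_nested(X_train)` would produce the kind of input the sktime snippets below expect, while XGBoost can take the flat `X_train` array directly.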
Here are the models I used and how they were configured.
ROCKET: this one is based on random convolution kernels. It's basically like a shallow, single-layer convolutional neural network with no nonlinear activations or hidden layers, whose kernels (random lengths, weights, biases and dilations) are never trained; each kernel's output is summarized into a couple of features, and a linear classifier does the actual learning. A really good explanation of the ROCKET algorithms can be found here. It's considered fast and state of the art, and it does work pretty well. By default, ROCKET uses 10,000 kernels. Technically, I am using MINIROCKET, which generates the features, but you still have to choose a classifier to learn from those features. For that, the authors recommend a ridge classifier or logistic regression. I found that RidgeClassifierCV gives decent performance and is faster than LogisticRegressionCV. The code looks something like this:
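import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sktime.transformations.panel.rocket import MiniRocket

rocket = MiniRocket(random_state=2468)  # random convolution kernels -> features
trainx_transform = rocket.fit_transform(Xtrain_sktime)
valx_transform = rocket.transform(Xtest_sktime)

# RidgeClassifierCV lost its `normalize` argument in scikit-learn 1.2, so scale explicitly
clf = make_pipeline(StandardScaler(with_mean=False),
                    RidgeClassifierCV(alphas=np.logspace(-4, 4, num=100)))
clf.fit(trainx_transform, ytrain_sktime)
predicted = clf.predict(valx_transform)
print("Accuracy with Rocket: %2.3f" % accuracy_score(ytest_sktime, predicted))
print("Matthews CC: %2.3f" % matthews_corrcoef(ytest_sktime, predicted))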
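To make the random-kernel idea concrete, here is a heavily simplified ROCKET-style transform in plain NumPy. It is a sketch, not sktime's implementation and not MINIROCKET (which uses a fixed set of 84 length-9 kernels with weights of -1 and 2 and keeps only the proportion-of-positive-values feature). It samples kernel lengths, mean-centred weights, biases and exponentially distributed dilations roughly as the ROCKET paper describes, skips padding, and returns the proportion of positive values and the maximum of each convolution. The function names are hypothetical.

```python
import numpy as np


def random_kernel(rng, series_len):
    """Sample one ROCKET-style kernel: length, mean-centred weights, bias, dilation."""
    length = int(rng.choice([7, 9, 11]))
    weights = rng.normal(0.0, 1.0, length)
    weights -= weights.mean()
    bias = rng.uniform(-1.0, 1.0)
    # Exponentially distributed dilation, capped so the dilated kernel fits the series
    max_exp = np.log2((series_len - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exp))
    return weights, bias, dilation


def kernel_features(x, weights, bias, dilation):
    """Convolve without padding; return [proportion of positive values, max]."""
    span = (len(weights) - 1) * dilation
    out = np.array([x[i:i + span + 1:dilation] @ weights + bias
                    for i in range(len(x) - span)])
    return [(out > 0).mean(), out.max()]


def rocket_like_transform(X, n_kernels=100, seed=0):
    """(N, T) array -> (N, 2 * n_kernels) features. Assumes T >= 11."""
    rng = np.random.default_rng(seed)
    kernels = [random_kernel(rng, X.shape[1]) for _ in range(n_kernels)]
    return np.array([[f for k in kernels for f in kernel_features(x, *k)]
                     for x in X])
```

The output plays the same role as `trainx_transform` in the snippet above: the kernels are never fitted, and only the linear classifier on top learns anything.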
Time Series Forest: this one is interesting. Instead of taking each time stamp as a feature and throwing that at a tree-based classifier, it takes intervals of the time series (the number of intervals is a hyperparameter of the model), computes summary statistics such as the mean, standard deviation, and slope of each interval, and uses those as features. This preserves the order of the time stamps within each interval, whereas if you treat each time stamp as an independent feature, the algorithm doesn't care what order they are listed in. These features are then handed to a DecisionTreeClassifier; the forest builds many such trees, each on its own random intervals, and combines their votes. The code looks something like this:
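import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
# TimeSeriesForestClassifier, RandomIntervalFeatureExtractor and _slope come from sktime;
# their import paths have changed between sktime releases, so match them to your version.

steps = [
    # random intervals -> mean, standard deviation and slope of each interval
    ("extract", RandomIntervalFeatureExtractor(n_intervals="sqrt",
                                               features=[np.mean, np.std, _slope])),
    ("clf", DecisionTreeClassifier()),
]
time_series_tree = Pipeline(steps)
tsf = TimeSeriesForestClassifier(estimator=time_series_tree,
                                 n_estimators=100,
                                 criterion="entropy",
                                 random_state=2222,
                                 n_jobs=-1)
tsf.fit(Xtrain_sktime, ytrain_sktime)
predicted = tsf.predict(Xtest_sktime)  # predict once, reuse for both metrics
print("Accuracy: ", accuracy_score(ytest_sktime, predicted))
print("MCC: ", matthews_corrcoef(ytest_sktime, predicted))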
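The interval idea itself fits in a few lines of NumPy plus scikit-learn's DecisionTreeClassifier. This is a stripped-down sketch, not sktime's implementation: it assumes 0/1 labels, draws sqrt(T) random intervals per tree, uses mean, standard deviation and least-squares slope as features, skips bootstrapping, and predicts by majority vote. All names (`random_intervals`, `interval_features`, `TinyTimeSeriesForest`) are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def slope(y):
    """Least-squares slope of y against its time index."""
    return np.polyfit(np.arange(len(y)), y, 1)[0]


def random_intervals(rng, series_len, min_len=3):
    """Draw sqrt(series_len) random [start, end) intervals of length >= min_len."""
    n = int(np.sqrt(series_len))
    starts = rng.integers(0, series_len - min_len, size=n)
    return [(s, int(rng.integers(s + min_len, series_len + 1))) for s in starts]


def interval_features(X, intervals):
    """Mean, standard deviation and slope of every interval, for every series."""
    return np.array([[f(x[s:e]) for s, e in intervals for f in (np.mean, np.std, slope)]
                     for x in X])


class TinyTimeSeriesForest:
    """Each tree gets its own random intervals; prediction is a majority vote."""

    def __init__(self, n_estimators=25, seed=0):
        self.n_estimators = n_estimators
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.members_ = []
        for _ in range(self.n_estimators):
            intervals = random_intervals(rng, X.shape[1])
            tree = DecisionTreeClassifier(criterion="entropy",
                                          random_state=int(rng.integers(2**31 - 1)))
            tree.fit(interval_features(X, intervals), y)
            self.members_.append((intervals, tree))
        return self

    def predict(self, X):
        votes = np.mean([tree.predict(interval_features(X, iv))
                         for iv, tree in self.members_], axis=0)
        return (votes >= 0.5).astype(int)
```

Because each tree only sees contiguous stretches of the day, features like "slope over minutes 40 to 90" survive, which a model fed independent time stamps would have to rediscover.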
XGBoost — I also trained a model with XGBClassifier(), using each time stamp as a feature.
Temporal Convolutional Networks: for simplicity, I use the keras/tensorflow based library keras-tcn, which stacks dilated causal convolutions. I didn't change any of the TCN's default settings; I just added a single sigmoid output unit and trained with log loss (binary cross-entropy). The code looks something like this:
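from tcn import TCN  # keras-tcn
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# trainx has shape (n_days, n_timesteps, 1)
i = Input(shape=(trainx.shape[-2], 1))
m = TCN()(i)  # default keras-tcn settings
m = Dense(1, activation='sigmoid')(m)  # probability that the day closes up
model = Model(inputs=[i], outputs=[m])
model.summary()

early_stopping = EarlyStopping(patience=50, restore_best_weights=True, min_delta=0.0)
reduceLR = ReduceLROnPlateau(factor=0.5, patience=5, min_delta=0.01)

# `lr` is deprecated; newer TensorFlow versions require `learning_rate`
model.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=1e-3))
model.fit(trainx, trainy,
          validation_data=(valx, y_test),
          shuffle=True,
          callbacks=[early_stopping, reduceLR],
          batch_size=64,
          epochs=200)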
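The dilations matter because they set how far back the TCN's final output can see. A back-of-the-envelope check, assuming keras-tcn's default nb_stacks=1 and dilations=(1, 2, 4, 8, 16, 32) and the receptive-field formula from its README (two dilated convolutions per residual block). The default kernel_size has varied between keras-tcn releases, so both 2 and 3 are shown; `tcn_receptive_field` is a hypothetical helper, and the window lengths follow the 4- and 5-hour setups above.

```python
def tcn_receptive_field(kernel_size=3, nb_stacks=1, dilations=(1, 2, 4, 8, 16, 32)):
    """Receptive field of a keras-tcn style TCN (two dilated convs per residual block)."""
    return 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)


for k in (2, 3):
    rf = tcn_receptive_field(kernel_size=k)
    for hours in (4, 5):
        window = hours * 60 - 1  # time steps per day after dropping the zeroed open
        print(f"kernel_size={k}, {hours}h window ({window} steps): "
              f"receptive field {rf}, covers window: {rf >= window}")
```

Under these assumptions the receptive field is 127 steps with kernel_size=2 and 253 with kernel_size=3, so depending on the installed version the default model's prediction may depend only on the most recent part of the window; passing a larger kernel_size or a longer dilations tuple to TCN() would widen it.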
MODEL EVALUATION: 4-HOUR WINDOW
Here are the results for the 4-hour window. The TCN takes the longest to train, even with early stopping, and has over 90k parameters. By comparison, ROCKET is done in the blink of an eye.
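The snippets above report both accuracy and the Matthews correlation coefficient (MCC). MCC is worth having here because a model that always predicts the more common direction can post a respectable accuracy while carrying no information, and MCC scores such a constant predictor at 0. A minimal version for 0/1 labels follows; `mcc` is a hypothetical helper, and the snippets themselves use scikit-learn's `matthews_corrcoef`. The labels below are made up for illustration.

```python
import math

import numpy as np


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Same convention as scikit-learn: 0 when the denominator is 0
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1]
always_up = [1] * len(y_true)
accuracy = np.mean(np.asarray(y_true) == np.asarray(always_up))
print(accuracy, mcc(y_true, always_up))  # 0.6 0.0
```

So an accuracy figure is best read next to the share of "up" days in the test period, or next to MCC.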
MODEL EVALUATION: 5-HOUR WINDOW
You would expect this to get better results, since the uncertainty is just what happens in the last 1.25 hours of the day. Here are the results, using the same learners, same parameters, etc.
I don’t know how much this would help a day trader. With this system, if you wait 5 hours, you have a good chance of knowing whether the stock will end up above or below its open. If it is down at the 5-hour mark and you are otherwise feeling pessimistic about it, it might be good to know it could still close UP, so you hold on to it rather than making the rash decision to dump it. On the other hand, maybe you are feeling optimistic because you are ahead after 5 hours, but the model might tell you that there is a good chance it will close down, and you might want to “curb your enthusiasm” for that very reason. It might be interesting to see how much a model can learn from just 1–2 hours of data; if you do some thorough experiments, let me know.