Using SAX-VSM for Financial Time Series Classification/Stock Prediction

Peijin Chen
Apr 3, 2019

SAX-VSM is one of a few time series transformation techniques that discretize a series of real numbers into ‘words’ of a fixed length over a fixed alphabet. For example, you could take a time series of length 100 and transform it into 10 words, each composed of the letters A, B, or C. You can then use tf-idf methods from natural language processing to vectorize and eventually classify the series, not unlike the way we classify emails as spam or not. How well does this work?
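To make the discretization concrete, here is a minimal sketch using pyts on toy random-walk data; the split into 10 windows and the 3-letter alphabet simply mirror the example numbers above, and the uniform binning strategy is an arbitrary choice for illustration.

import numpy as np
from pyts.approximation import SymbolicAggregateApproximation

rng = np.random.RandomState(0)
series = rng.randn(100).cumsum()           # one toy 'price' series, length 100

windows = series.reshape(10, 10)           # 10 non-overlapping windows of 10 points
sax = SymbolicAggregateApproximation(n_bins=3, strategy='uniform')
letters = sax.fit_transform(windows)       # one letter per point, alphabet {a, b, c}
words = [''.join(row) for row in letters]  # each window becomes one 'word'
print(words)

Each window is binned independently, so a ‘word’ describes the shape of the price inside that window rather than its absolute level.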

Here’s the problem. Given 20 past observations of a stock’s (closing) price, can you classify where the stock’s price will be another 5 days later, relative to the first day? Let’s say you bought the stock on day 1 and have held it until day 20, and it has maybe gone down or up a bit. If it has gone down, should you hold it another 5 days and hope for a turnaround? Or should you consider selling it because, at least for the next 5 days, there is no appreciable improvement in sight?

What you need

  1. Install pyts, pyentrp, keras, tensorflow, sklearn, numpy, pandas. The SAXVSM classifier implementation I am using is from pyts; saxpy also has an implementation, though its classifier part hasn’t been completely finished.
  2. Get some stock data. Pick a stock.
  3. Start by preprocessing the data, using the first 80% as historical data to predict what happens in the remaining 20% of ‘future’ data. Z-score normalize the training data and reuse the same mean/std for the future data; you can’t let the future leak in when computing the statistics of this distribution. The stock I used was Facebook (FB).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from pyentrp import entropy as ent

fb = np.array(dat["NASDAQ.FB"])  # closing prices from the loaded DataFrame

# Chronological 80/20 split: no shuffling, so the future stays "future"
train, val = train_test_split(fb, test_size=0.2, shuffle=False)
print(train.shape, val.shape)

# Fit the scaler on the training data only, then reuse it for validation
scaler1 = StandardScaler()
train = scaler1.fit_transform(train.reshape(-1, 1)).ravel()
print(train.shape)

# Slide a length-25 window over the series: days 1-20 are inputs, day 25 is the target
train = ent.util_pattern_space(train, lag=1, dim=25)
trainX = train[:, :20]
trainY = train[:, -1]

val = scaler1.transform(val.reshape(-1, 1)).ravel()
val = ent.util_pattern_space(val, lag=1, dim=25)
valX = val[:, :20]
valY = val[:, -1]

Let’s start with binary classification — if the stock is higher on the 25th day than it was on the 1st day, then it is labeled 1 and if not, it is labeled 0.

train_targs = list()
for i in range(len(trainX)):
    if trainY[i] >= trainX[i, 0]:  # comparing Y to the first X (day 1)
        s = 1
    else:
        s = 0
    train_targs.append(s)
print(train_targs)

val_targs = list()
for i in range(len(valX)):
    if valY[i] >= valX[i, 0]:  # comparing Y to the first X (day 1)
        s = 1
    else:
        s = 0
    val_targs.append(s)
print(val_targs)
print(np.unique(val_targs))

At this point, you can use the pyts library’s SAXVSM implementation to test out its classification chops.

from pyts.classification import SAXVSM

saxvsm = SAXVSM(n_bins=3, window_size=2,
                use_idf=True,
                smooth_idf=True,
                sublinear_tf=True,
                strategy='uniform')
saxvsm.fit(trainX, train_targs)

Hyperparameters: n_bins, window_size, etc. I’ve found, through trial and error, that it is good to start with n_bins = 2 and window_size = 2 and move them up one at a time. I didn’t do a grid search; I was just working on this by hand (see the sketch below for how you could automate it). I used the idf options, though they didn’t seem to change the performance much. I found that ‘strategy’ was important: you can choose between ‘uniform’, ‘normal’, and ‘quantile’, with radically different results.
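SAXVSM follows the scikit-learn estimator API, so if you did want to automate that hand search, something like the following sketch should work; the parameter ranges are illustrative, and because the windows overlap, adjacent samples share data and plain cross-validation scores will be optimistic.

from sklearn.model_selection import GridSearchCV
from pyts.classification import SAXVSM

param_grid = {
    'n_bins': [2, 3, 4],
    'window_size': [2, 3, 4],
    'strategy': ['uniform', 'normal', 'quantile'],
}
# Overlapping windows leak across folds; TimeSeriesSplit would be a
# safer cv choice than the default stratified folds.
search = GridSearchCV(SAXVSM(), param_grid, scoring='f1_macro', cv=3)
search.fit(trainX, train_targs)
print(search.best_params_, search.best_score_)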

What my settings above show is that I used 3 letters (A, B, C) to create 2-letter words. That means that each 20-day window was represented by 10 ‘words’. Looking at the fitted tf-idf vectors shows that there are some discrepancies in how the words are ‘valued’ in each of the 2 classes. This is what allows us to do the classification.
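If you want to inspect those vectors yourself, the fitted model exposes them as attributes; this sketch assumes pyts’s vocabulary_ (index-to-word mapping) and tfidf_ (one tf-idf vector per class) attributes and ranks the most class-discriminative words.

# Rank words by the gap between their tf-idf weights in the two classes
words = [saxvsm.vocabulary_[j] for j in range(len(saxvsm.vocabulary_))]
gap = np.abs(saxvsm.tfidf_[1] - saxvsm.tfidf_[0])
for j in np.argsort(gap)[::-1][:5]:  # 5 most class-discriminative words
    print(words[j], saxvsm.tfidf_[:, j])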

After fitting, which is fast on this data set, you can then calculate the F1-score and AUC_ROC scores:

from sklearn import metrics
from sklearn.metrics import roc_curve, auc, roc_auc_score

y_val_classes = val_targs
val_classes = saxvsm.predict(valX)

br_f1 = metrics.f1_score(y_val_classes, val_classes, average='macro')
print(br_f1)
false_positive_rate, true_positive_rate, thresholds = roc_curve(val_targs, val_classes)
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(val_targs, val_classes))

0.8152268099406929 0.8147921582180276 0.8147921582180276

For comparison, here’s a simple MLP (feed-forward neural network) model:

import keras
from keras.layers import Input, Dense
from keras.models import Model
from keras.regularizers import l2

i = Input((20,))
o = Dense(64, activation='sigmoid', kernel_regularizer=l2(1e-6))(i)
o = Dense(2, activation='softmax', kernel_regularizer=l2(1e-6))(o)
model = Model(inputs=[i], outputs=[o])
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.Adam(lr=1e-3),
              metrics=['acc'])
model.summary()
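The training call isn’t shown above; here is a sketch of how it would look. The two-unit softmax needs one-hot targets, hence to_categorical, and the epochs and batch size are illustrative guesses rather than the settings behind the numbers below.

from keras.utils import to_categorical

# One-hot encode the 0/1 labels for the two-unit softmax output
model.fit(trainX, to_categorical(train_targs),
          validation_data=(valX, to_categorical(val_targs)),
          epochs=50, batch_size=32, verbose=0)
mlp_classes = model.predict(valX).argmax(axis=1)  # back to 0/1 labels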

The F1 and AUC_ROC scores come out to be:

0.7781739435666272 0.7855812074848072 0.7855812074848072

In fact, a Logistic Regression model does better than both of them:

from sklearn.linear_model import LogisticRegression

est = LogisticRegression(class_weight='balanced')
est.fit(trainX, train_targs)
val_classes = est.predict(valX)
y_val_classes = val_targs

br_f1 = metrics.f1_score(y_val_classes, val_classes, average='macro')
print(br_f1)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_val_classes, val_classes)
print(auc(false_positive_rate, true_positive_rate))
print(roc_auc_score(y_val_classes, val_classes))

0.8304662161298251 0.8308803764874539 0.8308803764874539

This is modeling a situation where you have held a stock for 20 days and are wondering what’s going to happen over the next few days. It would be different if you had bought on the 20th day, after seeing 20 past observations, and wanted to see what would happen 5 days later (e.g., buy on Monday at the bell and sell on Friday at the bell). You don’t care where the price started on day 1 since you didn’t buy on that day; you just care how that price history helps determine the price movement over the 5 days that you are planning on holding the stock. I’ll save that analysis for the next post, but the relabeling itself is a one-line change, sketched below.
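For reference, a sketch of the alternative labeling: compare day 25 to day 20 (the last observed point, column -1) instead of day 1.

# Label 1 if the price 5 days out is at or above the day-20 price
train_targs_alt = [int(trainY[i] >= trainX[i, -1]) for i in range(len(trainX))]
val_targs_alt = [int(valY[i] >= valX[i, -1]) for i in range(len(valX))]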
