Time Series Forecasting Using Empirical Mode Decomposition and (Dilated) Convolutional Networks (2).
The time series data and I/O plan.
Let’s start with the Ikeda time series, a synthetic, nonlinear chaotic time series. The series that I downloaded has 60k data points. As others have pointed out, there are many examples in the literature where the EMD process has been performed on the entire data set, crucially PRIOR to splitting it into training and testing data. That’s got to be a big no-no, and unfortunately, I spent months doing that without realizing it. Why? Because you don’t want information from the future seeping into the training data (cue more Terminator references).
When we observe and collect data, we usually do so with a sensor, let’s say, that gives us a univariate time series. So the idea must be that we use our knowledge of the past (the training data) to create IMFs that are as accurate as we can make them. Then, when we decide in the present to forecast the future, we collect some data, a lag or lookback period of n time steps, and based on that try to forecast one step ahead or, if we want, m time steps ahead. We can do this either directly, by predicting an m-tuple sequence, or recursively, by predicting 1 time step ahead, shifting our window to include this new prediction, and making yet another prediction, until we reach our desired m-step-ahead forecast. For now, I am going to stick with the 1-step-ahead prediction.
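To make the lookback window and the recursive option concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function names `make_windows` and `recursive_forecast` are made up, and `linear_extrapolation` is only a toy stand-in for a trained network.

```python
import numpy as np

def make_windows(series, n_lag):
    """Pair every run of n_lag consecutive values with the value that follows it."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lag
    # Row i of idx holds the positions i, i+1, ..., i+n_lag-1.
    idx = np.arange(n_lag)[None, :] + np.arange(n_samples)[:, None]
    return series[idx], series[n_lag:]

def recursive_forecast(predict_one, history, n_lag, m_steps):
    """Forecast m_steps ahead by feeding each 1-step prediction back into the window."""
    window = [float(v) for v in history[-n_lag:]]
    forecasts = []
    for _ in range(m_steps):
        nxt = float(predict_one(np.array(window)))
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the prediction
    return np.array(forecasts)

def linear_extrapolation(window):
    # Hypothetical stand-in for a trained model: continue the last slope.
    return 2.0 * window[-1] - window[-2]

series = np.arange(10.0)  # toy data 0, 1, ..., 9
X, y = make_windows(series, n_lag=3)
forecast = recursive_forecast(linear_extrapolation, series, n_lag=3, m_steps=4)
print(X[0], y[0])  # first window [0, 1, 2] and its target 3
print(forecast)    # 10, 11, 12, 13
```

Note that in the recursive scheme every step after the first is built partly on predicted values rather than observed ones, so errors can compound; that is one reason to start with the 1-step-ahead case.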
Here’s the plan: split the data into 59k points for training and 1k for testing, and don’t touch that testing data until you need to. In particular, use whatever package you need to do the EMD on the training set only. In my case, you can see that I ended up with 16 IMFs and a residue. Now, the natural question is, are all of these IMFs equally useful in understanding and predicting the series? Probably not. Definitely not. And there are many proposed methods for determining which IMFs are useful. Some people discard the first one, because it’s usually high-frequency noise. Others check the cross-correlation between each IMF and the original time series, discarding those that fall too far below a threshold. You can also use various statistical tests to see whether each IMF contains anything significant or whether it’s indistinguishable from random noise in its particular frequency band. I’m going to ignore that for now and use all the IMFs and the residue. This is the point of having awesome computers, right? They should learn the relative weights of each IMF, and they can, using attention mechanisms, learn which part of the input they ought to care about more. And if that hypothesis turns out to be bullsh*t, well, at least I’m a little wiser for the wear.
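For reference, the split-first rule and the cross-correlation screen mentioned above could look roughly like the sketch below. It is not the procedure used in this post (I keep every IMF). The helper names, the 0.2 threshold, and the three synthetic components standing in for the output of an EMD package are all assumptions.

```python
import numpy as np

def split_train_test(series, n_test):
    """Hold out the last n_test points; nothing downstream should see them."""
    series = np.asarray(series, dtype=float)
    return series[:-n_test], series[-n_test:]

def screen_components(components, signal, threshold):
    """Keep components whose |Pearson correlation| with the signal reaches the threshold."""
    corrs = np.array([abs(np.corrcoef(c, signal)[0, 1]) for c in components])
    return np.flatnonzero(corrs >= threshold), corrs

# Toy series: weak noise plus a fast and a slow oscillation.
rng = np.random.default_rng(0)
t = np.arange(6000)
noise = 0.05 * rng.normal(size=t.size)
fast = np.sin(2 * np.pi * t / 20)
slow = np.sin(2 * np.pi * t / 500)
series = noise + fast + slow

# Split FIRST, then work only with the training part.
train, test = split_train_test(series, n_test=100)

# Hypothetical stand-in for "run your EMD package on train": the known
# components, restricted to the training range.
components = np.vstack([noise, fast, slow])[:, :len(train)]

keep, corrs = screen_components(components, train, threshold=0.2)
print(keep)  # indices of the components that pass the screen
```

In this toy case the noise-like first component falls below the threshold and the two oscillations survive. On real IMFs the right threshold is a judgment call.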
Back to the plan: use the IMFs produced from the training data, all of which have the same length as the series they were decomposed from, and let the machine learn how this decomposition is done. In my case, I settled on this particular variation: use a lag of 20 values (I’ll explain later why 20) from the original time series to predict the next value in the series, not as a single value but as a tuple of j values, where j is the number of IMFs (including the residue). Since the original time series can be reconstructed as the sum of the IMFs and the residue, having those IMF/residue components is as good as having the next value.
That means that you have a 20-tuple input leading to a 17-tuple output — and that means that if, in the future, you observe/collect 20 data points, the machine should be able to tell you 1. what the constituent 17 parts are and 2. what their sum, the next value, ought to be. OK great, but how do you know that the data really has 16 IMFs and a residue?
Believe me, I tried using the same EMD package on the remaining 1k data points (the future) and it didn’t give me the same number of IMFs. It couldn’t: that would be like collecting temperature data every minute for a day and then trying to see the annual or centuries-long trends. It should be noted that since the EMD essentially acts as a dyadic filter bank, you can calculate the average peak-to-peak distance in each IMF as a ballpark guess for its period, and you find that, very roughly, the period doubles from one IMF to the next. Well, 2¹⁶ is about 65k, which is just a little more than the data we have.
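The peak-to-peak estimate is easy to sketch. The function name `mean_peak_distance` is made up, and the pure sine waves below are idealized stand-ins for IMFs (real IMFs only double their period very roughly), so treat this as an illustration of the bookkeeping rather than of real EMD output.

```python
import numpy as np

def mean_peak_distance(x):
    """Average spacing (in samples) between consecutive local maxima."""
    x = np.asarray(x, dtype=float)
    # Interior points higher than the left neighbour and not lower than the right one.
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    if len(peaks) < 2:
        return float("nan")  # too few peaks to define a period
    return float(np.mean(np.diff(peaks)))

t = np.arange(4096)
periods = [8 * 2 ** k for k in range(5)]  # 8, 16, 32, 64, 128 samples
modes = [np.sin(2 * np.pi * t / p) for p in periods]

estimates = [mean_peak_distance(m) for m in modes]
ratios = [b / a for a, b in zip(estimates, estimates[1:])]
print(estimates)  # recovers the periods of the toy modes
print(ratios)     # each ratio is 2 for these idealized modes
print(2 ** 16)    # 65536, the scale that 16 doublings reach
```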
So 16 it is. So now just teach the computer to do the job of predicting what each IMF value in the testing/future sample should be, and then sum them if you want to know the actual prediction. And of course, you could also predict the IMFs into the future, if you’re interested in seeing how the dynamics play out across those resolutions over that time horizon.
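The input/output layout described above (20 lagged values in, 17 components out, summed to get the prediction) can be sketched as follows. The function name `make_io_pairs` is an assumption, and the random array only stands in for the 16 IMFs plus residue that an EMD package would return on the training set.

```python
import numpy as np

def make_io_pairs(series, components, n_lag):
    """Inputs: n_lag past values of the series. Targets: every component at the next step."""
    series = np.asarray(series, dtype=float)
    components = np.asarray(components, dtype=float)  # shape (j, N)
    n_samples = len(series) - n_lag
    idx = np.arange(n_lag)[None, :] + np.arange(n_samples)[:, None]
    X = series[idx]                  # shape (n_samples, n_lag)
    Y = components[:, n_lag:].T      # shape (n_samples, j)
    return X, Y

# Hypothetical stand-in for 16 IMFs + 1 residue of a 500-point training series.
rng = np.random.default_rng(1)
components = rng.normal(size=(17, 500))
series = components.sum(axis=0)      # the series is the sum of its components

X, Y = make_io_pairs(series, components, n_lag=20)
next_values = Y.sum(axis=1)          # summing a target row recovers the next value

print(X.shape, Y.shape)
print(np.allclose(next_values, series[20:]))
```

A model trained on these pairs outputs 17 numbers per window; summing them gives the 1-step-ahead forecast of the series itself.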
Oh yeah, and here is a place where you can see a plot of the two-dimensional Ikeda map and change the parameters and initial point to study how this dynamical system changes.
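If you would rather generate the data yourself, a minimal sketch of the standard two-dimensional Ikeda map is below. The parameter u = 0.9, the initial point, and the choice of the x-coordinate as the univariate series are assumptions; the series I downloaded may have been produced with different settings.

```python
import numpy as np

def ikeda_series(n_points, u=0.9, x0=0.1, y0=0.1):
    """Iterate the Ikeda map and return the trajectory as an (n_points, 2) array."""
    traj = np.empty((n_points, 2))
    x, y = x0, y0
    for i in range(n_points):
        traj[i] = (x, y)
        t = 0.4 - 6.0 / (1.0 + x * x + y * y)
        # Rotate (x, y) by the angle t, scale by u, and shift x by 1.
        x, y = (1.0 + u * (x * np.cos(t) - y * np.sin(t)),
                u * (x * np.sin(t) + y * np.cos(t)))
    return traj

traj = ikeda_series(1000)
x_series = traj[:, 0]  # one coordinate, used as a univariate time series
print(x_series[:5])
```

Changing `u`, `x0`, or `y0` in the call is the code equivalent of moving the sliders on an interactive plot.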