Predicting Future Stock Prices Using Machine Learning: An Introduction to LSTM Networks in Finance Using PyTorch

Mark Bentivegna
May 18, 2021
Figure 1 — Finance in machine learning. Photo by M.B.M. on Unsplash

Machine Learning in Finance

The world of finance has been turned upside down in recent years as the amount of data at our disposal has grown exponentially. Long gone are the days of manually looking at charts and earnings reports to identify market trends and opportunities; today’s private equity firms go as far as feeding satellite images into machine learning models to detect foot traffic at retail locations and predict economic activity.

In this article I will introduce the use of a Long Short-Term Memory (LSTM) model to predict future stock prices for certain equities. The choice of this model is designed to be an introduction to the applications of machine learning in exciting financial use cases. Can machine learning be used to predict price movements and enhance returns? Absolutely. Can it be done with the techniques in this article? No; industry experts are using far more sophisticated models with far more data at their disposal. To truly enhance returns through stock price prediction, a data scientist would need to leverage data and strategies that competitors don’t have. Rather, this example serves as an introduction to Recurrent Neural Networks (RNNs) and their use cases in finance, as well as a proof of concept that machine learning can be used to predict market activity. For a more detailed dive into advanced machine learning techniques in finance, see Stefan Jansen’s book on the subject. Only a small subset of the code is included directly in this article; for the entire codebase, please see the GitHub repository.

We leverage Facebook’s PyTorch framework to build and implement our model. The decision between Google’s TensorFlow and Facebook’s PyTorch ultimately came down to the fact that there are fewer tutorials available for PyTorch. Implementing this model in TensorFlow is arguably simpler than in PyTorch, and it makes a fun challenge for anyone trying to learn and become more proficient with RNNs.

Recurrent Neural Networks

Figure 2 — Bi-directional RNN architecture as seen in Dive into Deep Learning. Printed with permission from the authors

RNNs are a type of neural network with the added benefit of supporting sequential data. In our use case, “sequential” data refers to stock prices changing over time. The stock price data (X) we will be looking at is an example of time series data. As seen in the architecture diagram above, the model predicts an output (O) at each given day and updates its hidden state (H) parameters. Here, state variables store information about the inputs from each step. Each given day, or “step”, has a state value that may influence the value of tomorrow’s stock price. Our model has the added benefit of being bi-directional, meaning it processes each sequence both forward and backward, so every output can draw on context from both earlier and later steps in the window. A more detailed analysis of RNNs can be found in this chapter of Dive into Deep Learning.
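To make these shapes concrete, here is a minimal PyTorch sketch that runs a toy bi-directional RNN over a 50-day window with five features per day. The sizes are illustrative only, not our final model’s dimensions.

```python
import torch
import torch.nn as nn

# Toy bi-directional RNN: 5 features per day, 8 hidden units, batch of 1.
rnn = nn.RNN(input_size=5, hidden_size=8, batch_first=True, bidirectional=True)

x = torch.randn(1, 50, 5)        # 50 days of 5 features each
outputs, hidden = rnn(x)

print(outputs.shape)  # torch.Size([1, 50, 16]) - an output (O) per day, both directions concatenated
print(hidden.shape)   # torch.Size([2, 1, 8])   - final hidden state (H) for each direction
```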

Long Short-Term Memory Networks

Figure 3 — LSTM architecture as seen in Dive into Deep Learning. Printed with permission from the authors

LSTM networks are a subclass of RNNs with the added benefit of maintaining a “memory” (C) in each cell to better capture long-term trends. For longer sequences of data it is important to remember values seen early in the sequence. To support this, LSTMs use memory cells to track the hidden state (H) and additional information. These memory cells are built on three gates: forget (F), input (I), and output (O). This gives the model the ability to decide what to remember and what to forget from previous values in the sequence. Each gate is computed from the new data (X) in the sequence and the previous hidden state. The forget gate decides how much of the existing memory to discard, while the input gate decides how much of a candidate update to write into the memory; the memory is then updated by combining these two contributions. Finally, the output gate controls how much of the updated memory is exposed as the new hidden state, which is fed into the next cell in the sequence. For a more detailed analysis of the mathematical model, see this section in Dive into Deep Learning.
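To make the gate arithmetic concrete, below is a minimal sketch of a single LSTM step written out by hand. The weight names and toy sizes are hypothetical; in practice, PyTorch’s nn.LSTM performs this same computation internally and learns the weights for you.

```python
import torch

n_in, n_hidden = 5, 8   # toy sizes: 5 features per day, 8 hidden units

# Hypothetical random weights; nn.LSTM learns equivalent parameters internally.
params = {name: torch.randn(n_in if name.startswith("W_x") else n_hidden, n_hidden) * 0.1
          for name in ["W_xi", "W_hi", "W_xf", "W_hf", "W_xo", "W_ho", "W_xc", "W_hc"]}
params.update({name: torch.zeros(n_hidden) for name in ["b_i", "b_f", "b_o", "b_c"]})

def lstm_cell_step(x, h_prev, c_prev):
    i = torch.sigmoid(x @ params["W_xi"] + h_prev @ params["W_hi"] + params["b_i"])   # input gate
    f = torch.sigmoid(x @ params["W_xf"] + h_prev @ params["W_hf"] + params["b_f"])   # forget gate
    o = torch.sigmoid(x @ params["W_xo"] + h_prev @ params["W_ho"] + params["b_o"])   # output gate
    c_cand = torch.tanh(x @ params["W_xc"] + h_prev @ params["W_hc"] + params["b_c"]) # candidate memory
    c = f * c_prev + i * c_cand     # discard some old memory, write some new
    h = o * torch.tanh(c)           # output gate exposes part of the memory as H
    return h, c

h, c = torch.zeros(n_hidden), torch.zeros(n_hidden)
for x in torch.randn(50, n_in):     # walk through a 50-day sequence
    h, c = lstm_cell_step(x, h, c)
```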

Data Cleaning

Vastly more important than the model itself is the data we feed into it. Financial data is very valuable, and fortunately it has gotten much easier to acquire in recent years. Many APIs provide expansive financial data for a subscription fee, but several offer more limited data for free. One source of free data is yfinance. For years Yahoo! Finance provided historical data, but this service was decommissioned, so yfinance was built to provide developers with the same functionality.

For our example, we will use daily historical pricing data for Coca-Cola. We chose Coca-Cola because it is a very well-known international beverage company and a personal favorite of legendary investor Warren Buffett. Coca-Cola, as opposed to cryptocurrencies or technology companies, is not very volatile, making it a more straightforward example for our model to predict. The data contains the opening, closing, high, and low prices as well as total volume for every trading day going back about 8 years. After fetching the data from yfinance, we scale and split it, leaving the most recent 20% of trading days unseen during training so they can be used for testing. Since we are using time series data, we need to choose a sequence size; we chose 50 for this scenario, meaning each prediction directly takes into account the past 50 days’ worth of trading data. Our label is a column called “Future”, which holds the following day’s closing price: the value we are trying to predict.
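A minimal sketch of this pipeline might look as follows. The variable names and the decision to scale before splitting are our simplifications, not necessarily the repository’s code; a stricter setup would fit the scaler on the training portion only, and depending on your yfinance version you may need to flatten the returned columns.

```python
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler

# Fetch daily Coca-Cola prices (ticker KO) back to 2013.
df = yf.download("KO", start="2013-01-01")[["Open", "High", "Low", "Close", "Volume"]]
df["Future"] = df["Close"].shift(-1)   # label: the next day's closing price
df = df.dropna()

# Scale all columns to [0, 1]; done before splitting here for simplicity.
scaled = MinMaxScaler().fit_transform(df.values)

# Build windows of 50 trading days; each label is the day after the window.
SEQ_LEN = 50
X = np.array([scaled[i:i + SEQ_LEN, :-1] for i in range(len(scaled) - SEQ_LEN + 1)])
y = scaled[SEQ_LEN - 1:, -1]

# Hold out the most recent 20% of windows for testing.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```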

Stock Price Prediction Model

After we have our data formatted and ready, we design and implement our model. We will use a hidden state size of 512 and a single-layer bi-directional LSTM network. See below for an architecture overview of our model generated using TensorBoard.

Figure 4 — High level architecture overview designed using TensorBoard. Image by the author
Figure 5 — More detailed connectivity graph designed using TensorBoard. Image by the author

Our model’s hidden state is initialized with a 3-dimensional tensor of all zeroes with shape (2 * number of layers, batch size, hidden state size). We multiply the number of layers by two because our model is bi-directional, so we need twice as many state values. We found properly initializing our model’s state dimensions to be one of the most challenging aspects of this project. After converting our data to PyTorch tensors and setting our loss function and optimizer, we begin training. We trained our model in Google Colab using GPUs over ten epochs, which took several minutes.
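The sketch below shows roughly how such a model, its zero-initialized states, and a bare-bones training loop can look in PyTorch. The class name, learning rate, and full-batch loop are illustrative choices rather than the repository’s exact code; it reuses X_train and y_train from the data sketch above, and real training would use DataLoader mini-batches on the GPU.

```python
import torch
import torch.nn as nn

class PricePredictor(nn.Module):
    def __init__(self, n_features=5, hidden_size=512, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        # Both directions are concatenated, hence 2 * hidden_size inputs.
        self.fc = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        # Zero-initialized states with shape (2 * num_layers, batch, hidden).
        h0 = torch.zeros(2 * self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(2 * self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])   # predict from the last step of the window

model = PricePredictor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Bare-bones full-batch training over ten epochs.
X_t = torch.tensor(X_train, dtype=torch.float32)
y_t = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(X_t), y_t)
    loss.backward()
    optimizer.step()
```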

Final Results

After training completes, we want to take a look at our results visually. Our graphs plot the number of days from the starting point of our training and testing data, respectively. For reference, the training data starts in 2013 and the testing data ranges from late 2019 to mid-2021.
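A plot along these lines can be produced with matplotlib. This sketch assumes the model and held-out arrays from the earlier snippets, and it plots scaled prices rather than dollar values.

```python
import torch
import matplotlib.pyplot as plt

# Evaluate on the held-out windows from the data sketch above.
model.eval()
with torch.no_grad():
    preds = model(torch.tensor(X_test, dtype=torch.float32)).squeeze().numpy()

plt.plot(y_test, label="Actual (scaled)")
plt.plot(preds, label="Predicted (scaled)")
plt.xlabel("Trading days since start of test data")
plt.ylabel("Scaled closing price")
plt.legend()
plt.show()
```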

Figure 6 — Training data performance plotting predicted price versus actual price. Image by the author

Above, we see the predictions on the previously seen training data. Our model does a good job of capturing the trend line. We are only predicting the next day’s stock price, and there generally aren’t many large price swings over such short periods, so this is expected. If we tried something more volatile, like the recent GameStop saga, we would see a massive disparity between predicted and actual prices. We see a few periods of higher volatility in the training data, about 200 and 600 steps after the start of the sequence, where our model’s performance dips, but it clearly captures the bigger-picture price trends.

Figure 7 — Testing data performance plotting predicted price versus actual price. Image by the author

Legendary boxer Mike Tyson once said, “Everybody has a plan until they get punched in the mouth.” In general, machine learning models only hold value if they perform well against unseen data. We have nearly two years’ worth of stock pricing data that our model hasn’t seen, so we plug in these sequences to see how it performs in real-world circumstances. We see that our model does a very good job of predicting tomorrow’s stock prices. When the price starts changing direction there is a slight delay before our predicted prices catch up with the actual price, but the model generally captures both the magnitude and direction of the actual price movement.

Conclusion

As mentioned earlier in the article, this model is very limited in its ability to serve as an effective trading strategy. It is intended as an introduction to LSTM networks in PyTorch and proof that machine learning can be applied to the world of finance. In our experiments, decreasing the number of layers and increasing the hidden layer size yielded the best results. With this simple model, there are many other fun areas to explore. How does this model perform on different stocks? How does changing the model’s shape and size impact price predictions? Can it be used on other asset classes besides equities? Let us know what you find!

Contributors

The following individuals contributed to this article:

Project Lead — Mark Bentivegna, Cloudreach Data Engineering Manager

Advisor — Dr. Ioannis Koutis, NJIT Professor

Contributor — Gene Eagle

Contributor — David Apolinar

Contributor — Rohit Tikyani

Contributor — James Jardinella

Architecture diagrams in this article were used with permission from the authors of Dive into Deep Learning. This extremely insightful book is available online for free (Yes, you read that correctly. It is actually free!) for everyone to use. It is a great starting point for anyone interested in getting more proficient in the field of deep learning and is used by universities around the globe. We would like to thank the book’s authors:

Dr. Aston Zhang — Amazon Senior Scientist

Dr. Zack C. Lipton — Amazon Scientist, CMU Assistant Professor

Mu Li — Amazon Senior Principal Scientist

Dr. Alex J. Smola — Amazon VP/Distinguished Scientist

References

Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. Dive into Deep Learning. 2020.

Stefan Jansen. Machine Learning for Algorithmic Trading: Predictive Models to Extract Signals from Market and Alternative Data for Systematic Trading Strategies with Python, 2nd Edition. 2020.

