riccardo

My Reinforcement Learning options trader

Updated: Jan 30, 2021

#statisticallyRight #reinforcementLearning #machineLearning #optionsTrading



About ten years ago I worked on my thesis project, which involved designing, programming, and implementing a robotic platform for rehabilitation at the pediatric hospital of Rome, Italy (picture 1).

Picture 1: 2D induction motor platform for diagnosis and rehabilitation

Even though artificial intelligence was already many decades old at the time and part of academic programs, average engineers like me were focused mainly on deterministic programming and, maybe, some stochastic processes.

Fast forward a few years and I am close to the investment management field. While I focus more on Private Equity, Venture Capital and their operating companies, I am in touch with financial markets as well. Besides managing an equity portfolio, I also use equity options and other financial derivatives. The real reason behind both my investing (stocks) and my trading (options) is my interest in decision-making through probability and statistics. My feeling is that, while stocks help shape the character of a professional, financial derivatives allow that professional to be statistically right. I therefore strive to leverage those concepts in almost any decision I make, in any domain I am involved in.

In the last few years, as machine learning got packaged into simple coding libraries and commercial machines became able to run average algorithms, I combined my engineering experience with my options-trading experience and started applying tech to financial markets.

I started by feeding a neural network with prices and volatilities, training it on some examples, and asking it to forecast future stock movements – we will call this my Deep Learning approach, even though the networks were neither that deep nor that large. It is like an algorithm that extrapolates repeating patterns from our browsing history to show us the “most relevant” ads.

After my first implementations, I was not satisfied with the results I was obtaining. While I had never thought predicting financial markets would be easy, I was not expecting my code to try to fool me. Figure 2 shows an example of my early results (the x-axis shows days and the y-axis the normalized returns of a stock; the x-axis must be read from right to left, with the most recent values on the left of the picture).

Figure 2: Normalized stock’s return – prediction (green) vs actual (red)

The graph above may seem to have potential, considering that the green line (forecast) tracks the actual stock return fairly well across 200 days – it also gives a prediction for the next few days on the left of the picture. The truth is, the code is cheating. Even though I have not fed the algorithm the exact indicator to track, it is simply minimizing the error of its predictions (mean squared error) by constantly trailing the market by a few days. It is basically predicting future returns as being equal to the most recent ones – that is why the green line is always a bit to the left (delayed) of the actual red line. Overall it looks like a good visual result but, if we drew a vertical line at any instant, ignored the part of the picture to the left of that line (future events), and placed a bet based on what the green line suggests, we would likely lose money.
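One way to catch this kind of "cheating" is to compare a forecast against a naive persistence baseline and to check at which delay the forecast lines up best with the actual series. The snippet below is only a minimal sketch of that check, not the code behind Figure 2: the function names are hypothetical, and it assumes the series are stored in chronological order (oldest value first), unlike the right-to-left figure.

```python
import numpy as np

def persistence_mse(returns, lag=1):
    """MSE of the naive forecast 'next value = value seen `lag` steps ago'."""
    returns = np.asarray(returns, dtype=float)
    naive_pred = returns[:-lag]   # what the naive forecast would say
    target = returns[lag:]        # what actually happened
    return float(np.mean((naive_pred - target) ** 2))

def best_lag(pred, actual, max_lag=10):
    """Delay k (in steps) at which pred[t] correlates best with actual[t - k].

    A result of 0 means the forecast is aligned with the present;
    a result k > 0 suggests the forecast is trailing the market by k steps.
    """
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    scores = []
    for k in range(max_lag + 1):
        shifted_pred = pred[k:]                    # forecasts from step k onward
        past_actual = actual[:len(actual) - k]     # actual values k steps earlier
        scores.append(np.corrcoef(shifted_pred, past_actual)[0, 1])
    return int(np.argmax(scores))
```

If `best_lag` returns a value above zero and the model's error is no lower than `persistence_mse`, the forecast is probably just echoing recent returns, as described above.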

We may argue that innovation rarely happens overnight, being often the result of many baby-steps compounding over time and with a final catalytic event – sometimes, a successful commercial application. Even though I did not “innovate” on anything, I like to think a similar argument applies to my algorithms. After an initial phase in which I wanted to find the Holy Grail of FinTech, I realized it would be better to start over by trying to build simpler and more useful supporting tools for my trading activity, still based on common probability & statistics.

My robotics experience was very important in that phase of marginal improvements. I moved from my initial Deep Learning approach (defined above) toward Reinforcement Learning (discussed below and named my R-Learning approach). To me, the line separating the two approaches almost vanishes in some cases; however, they do present different technicalities – I include a technical parenthesis at the end of the post for the interested reader.

With my new approach, I was using neural networks to train a virtual trader (agent) rather than to learn market patterns [if you will]. We can think about R-Learning as something more related to a robot entering a controlled space and learning from its interactions with the environment. In Deep Learning I was feeding the algorithm a bunch of indicators while also providing the resulting labels (references to fit the model and learn). In Reinforcement Learning I was still feeding the code with similar indicators, but I was letting it learn by autonomously placing simulated trades and experiencing the rewards (figure 3).

Figure 3: the agent – environment interaction in Reinforcement Learning (Source: Sutton – Barto)
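The loop in Figure 3 can be sketched in a few lines of code. This is a toy illustration under loud assumptions, not my actual trader: the `ToyMarketEnv` class, the two-value state (was the last return positive?), the two actions (flat or long), and the tabular Q-learning update are all hypothetical choices made to keep the example small.

```python
import numpy as np

class ToyMarketEnv:
    """Hypothetical environment that replays a fixed series of returns."""

    def __init__(self, returns):
        self.returns = np.asarray(returns, dtype=float)
        self.t = 0

    def _state(self):
        # State: 1 if the latest observed return was positive, else 0.
        return int(self.returns[self.t] > 0)

    def reset(self):
        self.t = 0
        return self._state()

    def step(self, action):
        # action: 0 = stay flat, 1 = hold a long position over the next step.
        self.t += 1
        reward = action * self.returns[self.t]
        done = self.t == len(self.returns) - 1
        return self._state(), reward, done

def train(env, episodes=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed setup)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((2, 2))  # q[state, action]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Agent picks an action: explore at random or exploit current estimates.
            if rng.random() < epsilon:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(q[state]))
            # Environment answers with the next state and a reward.
            next_state, reward, done = env.step(action)
            # Agent updates its estimate from this single experience.
            target = reward + (0.0 if done else gamma * np.max(q[next_state]))
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
    return q
```

Note that no labels are provided anywhere: the agent only sees states and the rewards produced by its own simulated trades.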

While future posts may explore the real reasons why I moved toward Reinforcement Learning – mainly related to transparency and explainability of my A.I. and more direct applicability to robotic systems – I want to discuss here one of the most important results I have obtained so far:

Through my new approach [my R-Learning one] I was focused on finding those baby-steps that could eventually allow me to extrapolate big takeaways. In time, I noticed some consistencies among apparently different results. Figure 4 shows two distributions: the one on the right with high kurtosis (please allow me to simplify and say “fat tails” - impactful rare events) and the one on the left with low kurtosis (please allow me to simplify and say “thin tails” - non-impactful and less rare events) - the skewness is intrinsic because of the nature of the underlying volatilities. Feeding my algorithms with those two different datasets, I noticed quite different outcomes, which I had expected but had not thought I would be able to quantify. Repeating similar experiments with several stocks, I noticed some consistencies in those differences. By trying to understand how those opposing fundamental statistics were affecting my algorithms, I started to better understand general quantitative dynamics. To simplify, I was improving my overall grasp of Probability and Statistics. My code was basically able to find the same takeaways an expert risk manager would get from the same data.

Once the algorithms suggested where to look, I would then leverage those findings in my traditional execution of trades [options]. In detail, a few months ago I coded an options scraper in Python that searches the internet and returns the options contracts I find interesting. Those contracts reflect some quantitative characteristics I have identified as optimal for my trading strategy – they also depend on particular market conditions and may therefore change over time.

Figure 4: Volatility comparison between two stocks – low excess kurtosis (-0.26) left vs high excess kurtosis (8.36) right
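For readers who want to run the same kind of preliminary check on their own data, excess kurtosis can be computed directly from the second and fourth central moments. The function below is a minimal sketch using the plain (biased) moment estimator; I am not claiming it is the exact estimator behind the numbers in the caption above, and the function name is hypothetical.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: fourth central moment over squared variance, minus 3.

    A normal distribution scores about 0; positive values point to fatter
    tails, negative values to thinner tails.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)   # second central moment (variance)
    m4 = np.mean(centered ** 4)   # fourth central moment
    return float(m4 / m2 ** 2 - 3.0)
```

Running it on each input series before training makes it easy to see which datasets are dominated by rare, impactful events.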

Even if my experience had not given me exact tips for building my options scraper, it would at least have reminded me how important a preliminary analysis of the statistics of the data is for machine learning applications – and in general for any analysis. Professionals and researchers usually know what they are looking at – and I’d love to learn from many of them – but end-users (purchasers of the technology) and even non-technical executives at technology companies often do not focus enough on the main statistics of the data and how they can affect results. This discussion could be quite long, since the details differ with the type of data (e.g. image recognition vs financial prediction), but the point is likely to stay valid to a large extent.

Considering the current attention to tech, we may tend to think algorithms make the whole difference; however, data are often the determining factor in making or breaking an initiative. I can only talk about implementations similar to mine in complexity, but the performance of those types of algorithms still depends dramatically on the nature and quality of the data, and on the operations applied to them.

My feeling is that, at my level and similar ones, a different and better pre-processing of the input may be orders of magnitude more effective than going for the cutting-edge network architecture or fitting algorithm. Ideally, however, the two should be well blended together.


Final technical parenthesis for those interested:

I still use neural networks in some Reinforcement Learning algorithms to approximate the space, which is arguably infinite [infinite market states and possible trading actions]; however, there could be big differences in how the solution converges between my initial Deep Learning approach and the subsequent Reinforcement one. Even with similar gradient-descent methods, R-Learning usually proceeds one step at a time, while Deep Learning usually fits batches of cases. Moreover, I am trying to figure out how much the stability of the R-approach is affected by my financial application: starting from a specific state (e.g. market level), the next state has very little or no dependence on the action the agent undertakes – this also depends on the reward the coder structures for the agent. Conversely, if a robot is exploring a physical room, its next position is almost completely determined by its current position and the movement it decides to make (maybe I should say that it better reflects a Markov process). I may discuss possible findings in subsequent posts.
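In symbols, the two points above could be written as follows. This is only a sketch under assumptions: I use a Q-learning style one-step update as the example (the post does not tie itself to a specific algorithm), with learning rate \alpha, discount factor \gamma, state s_t, action a_t, and reward r_{t+1}; the function f is a hypothetical deterministic motion model.

```latex
% One-step update, applied after every single transition (assumed Q-learning form):
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% Trading as a small participant: the next market state barely depends on the action,
P(s_{t+1} \mid s_t, a_t) \approx P(s_{t+1} \mid s_t)

% Robot in a room: the action largely determines the next state,
s_{t+1} \approx f(s_t, a_t)
```

In the trading case the action mostly enters through the reward r_{t+1}, which is why the way the reward is structured matters so much for what the agent can learn.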

