riccardo
Game Theory in my Behavioral Finance failure
Updated: May 2, 2021

Our modern world is arguably built on increasing collaboration among people. As famous historians have explained better than I can, this connectedness is part of the reason behind our growing focus on services and specialized activities. This is a major trend in our society, and it favors those capable of collaborating well with others. While it is not my business to say how we are faring in terms of interactions among people, a quantitative experience I had a few years ago taught me how to avoid a fundamental and critical mistake. The lesson comes from Game Theory, and it could dramatically improve our negotiations, our decisions, and almost any daily endeavor.
The problem - the "zero-sum bias"
We might not like to admit it, but we often strategize and act as if our environments were zero-sum games (our win implies the other party’s loss). Even if that applies to the reader only sporadically, it is worth remarking that such approaches are likely to result in negative payoffs. Actual zero-sum games, reduced here to situations where our win requires the other party’s loss, are rarer than we think. This is what is called the “zero-sum bias”, and it is probably aggravated when the other party's loss is seen not only as a way to increase our reward (because rewards and losses are supposed to sum to 0) but as a required condition of the game.
Framing the problem through my experience - the “Tit For Tat” algorithm
Years ago, I attended one of the most interesting classes of my Master in Financial Engineering: Behavioral Finance. At the end of the semester, the whole class participated in a financial trading competition. Each student had to code a robo-trader for a simplified trading game. All the algorithms would participate in a round-robin tournament (each player is matched in turn against every other player), and each would have to decide whether to split or steal the jackpot at each round (say, a fixed $1 payoff per round). Two algorithms would play each other for n rounds before moving on to the next opponent. Both algorithms would receive half the jackpot (say, 50 cents) if they both chose to split; they would both receive $0 if they both chose to steal; and if one chose to steal while the other chose to split, the stealer would receive the entire pot. Please note that the game has a characteristic we often find in our daily endeavors: the result depends not on a single round but on n rounds. Let us focus on the n rounds between two given algorithms.
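The rules above can be sketched in a few lines of Python. This is only an illustration under the stated $1 pot; the names round_payoff and round_robin_pairs are hypothetical and are not taken from the code used in the class.

```python
from itertools import combinations

SPLIT, STEAL = "split", "steal"

def round_payoff(move_a, move_b, pot=1.0):
    """Return (payoff_a, payoff_b) for a single round."""
    if move_a == SPLIT and move_b == SPLIT:
        return pot / 2, pot / 2   # both split: half the pot each
    if move_a == STEAL and move_b == STEAL:
        return 0.0, 0.0           # both steal: nobody gets anything
    if move_a == STEAL:
        return pot, 0.0           # only A steals: A takes the whole pot
    return 0.0, pot               # only B steals: B takes the whole pot

def round_robin_pairs(players):
    """Every player is matched once against every other player."""
    return list(combinations(players, 2))

print(round_payoff(SPLIT, SPLIT))          # (0.5, 0.5)
print(round_payoff(STEAL, SPLIT))          # (1.0, 0.0)
print(round_robin_pairs(["a", "b", "c"]))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Note that the two payoffs add up to the pot in every case except a double steal, where the whole pot is lost to both players.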
If the reader is thinking about situations where that last point does not apply, s/he will probably recognize that in those situations the loss is not limited. In the tournament discussed above, a single algorithm can lose at most $1 per round against an opponent, which is much less than the potential cumulative payoff over the n rounds between two algorithms (i.e. $1 x n, or ½ of that if we want to discount it somehow). That is in part why it is worth learning and adjusting over time. So we will be covering cases where a single event cannot ruin the player. While that may sound like a limitation, it is meant as a suggestion: we must always position ourselves so as to limit the potential loss from a single event. It might sound trivial, but I continuously see bad implementations, and I often find myself not prioritizing it. There is plenty of public material on that principle.
The first thing I did was spend a couple of hours strategizing for the best algorithm, one able to identify and exploit any weak spot of any opposing trader. My algorithm should always try to steal when possible and split otherwise. While that sounds simple and intuitive, it is tough to implement, even impossible in practice. For the sake of simplicity, I can immediately say that my initial approach was completely wrong. Here, “wrong” is not meant in a closed-form or mathematically exact sense; a solution of that type does not exist at the moment for this kind of game. I am calling that approach “wrong” based on my specific experience and, more importantly, on the experience of professionals in the field whom I will mention below. The critical mistake I was making was to strategize as if the game were in large part a zero-sum one, where for me to win, my opponent had to be ruined.
Please note: the specific payoffs of similar games can make the game a zero-sum one. However, since we are not discussing this in rigorous mathematical terms, I am confident the reader will frame the discussion as intended. Moreover, even in zero-sum games, optimal approaches could be collaborative ones, not necessarily aiming at the opponent’s ruin.
We might have already heard that a highly effective way to approach confrontations with others is to look for ways to make all parties involved reasonably happy, which is possible in non-zero-sum situations and, to a different degree, even in zero-sum ones. Especially in the former, we should always think about proposing collaboration, and one of the best starting points is probably the Tit-For-Tat algorithm. It is a simple strategy that plays at each round what the opponent played at the previous one: if the opponent splits, we split at the next round (without knowing whether the opponent will split again); if it steals, we steal next (again, without knowing whether it will steal again). This type of algorithm was the winner of the tournament of my Behavioral Finance class.
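As a minimal Python sketch of that strategy, assuming the $1 pot from the tournament and an opening split on the first round: the names tit_for_tat, always_steal, and play_match are illustrative only, and always_steal is a hypothetical opponent added just for comparison.

```python
SPLIT, STEAL = "split", "steal"

# (my_move, opponent_move) -> my payoff in dollars, for an assumed $1 pot.
PAYOFF = {
    (SPLIT, SPLIT): 0.5,
    (SPLIT, STEAL): 0.0,
    (STEAL, SPLIT): 1.0,
    (STEAL, STEAL): 0.0,
}

def tit_for_tat(opponent_history):
    """Open with a split, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else SPLIT

def always_steal(opponent_history):
    """Hypothetical unresponsive opponent: steals whatever we do."""
    return STEAL

def play_match(strategy_a, strategy_b, n_rounds):
    """Play n_rounds and return the total payoff of each strategy."""
    moves_a, moves_b = [], []
    total_a = total_b = 0.0
    for _ in range(n_rounds):
        move_a = strategy_a(moves_b)  # each side only sees the other's past moves
        move_b = strategy_b(moves_a)
        total_a += PAYOFF[(move_a, move_b)]
        total_b += PAYOFF[(move_b, move_a)]
        moves_a.append(move_a)
        moves_b.append(move_b)
    return total_a, total_b

print(play_match(tit_for_tat, tit_for_tat, 10))   # (5.0, 5.0)
print(play_match(tit_for_tat, always_steal, 10))  # (0.0, 1.0)
```

Against a steady stealer, Tit-For-Tat gives up a single round and then matches every steal, so the stealer collects one pot in total while two Tit-For-Tat players collect half a pot each at every round.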
Tit-For-Tat was the winning algorithm in many instances of similar games played for research purposes in the field of Game Theory. That allows us to summarize the main concepts that make up a good strategy in general. The following were confirmed by my experience, but I was able to identify them clearly thanks to a resource I recommend to the interested reader: Metamagical Themas by Douglas Hofstadter. Here are the main concepts and how they would apply to the financial tournament above:
Always offer cooperation: in our example, we should always respond with a split after the opponent offered a split the round before. We should also always start the game with a split.
Forgive: if the opponent tried to steal (successfully or not) but then offers to split, take the opportunity to go back to collaborative terms by copying the split at the next round, as Tit-For-Tat does.
Retaliate immediately: yes, be forgiving, but respond with a steal at the very next round if the opponent tried to steal the round before. “Set the tone”.
Three points are always nice to present and easy to remember. Unfortunately, we need a couple more considerations. While the principles above are a well-rounded starting point, if we want to go a step further, we should put considerable effort into making our algorithm “intelligent” enough to spot an unresponsive opponent. There is no point in offering cooperation, being forgiving, and retaliating if the opponent does not care what we do. Therefore, it is especially important to spot an algorithm [or a person] that decides without any consideration of our behavior. I am leaving aside opponents that decide randomly because, while that is a critical case, it is very tough to spot a random algorithm (assuming the other side can build something truly random).
Readers might have already perceived that, since our opponent is probably thinking the same way we do, both parties must communicate their intent clearly, so as not to appear unresponsive and push the opposing player toward aggressive strategies. The “Metamagical” book mentioned above even examines cases where a player tries to steal a bit more than its offer of collaboration would warrant, something related to “Tit For Two Tat”: say it steals twice, no matter what, after the opponent tried to steal only once. Any such attempt appears to have negative consequences because it generates what are called “negative echoes”, lasting several rounds: the second hard-coded steal might coincide with an offer of collaboration from the opponent, which will then interpret the stolen round as a refusal of its offer and respond with a steal. The first algorithm, by now back to the original Tit-For-Tat behavior, will respond to that steal with a steal, and then with a second one … the game will take time to go back to cooperative terms.
Takeaways
The last few comments summarize exactly the mistakes I made in my algorithm. I tried to be smarter than others and I made it too complicated (not even just complex), which only resulted in my algorithm being unclear to opponents, being tagged negatively by some of them, and even generating negative echoes with others. In the end, my algorithm did not even participate in the final real tournament (the one after some trial rounds). I had been so busy coding the winning algorithm that I did not make sure my executable file was compatible with the master program the professor would run for the final round-robin tournament.
The important takeaway is that the tournament was won by Tit-For-Tat; there was more than one such algorithm among the entries. The same principles underlying Tit-For-Tat could be extremely effective in many situations involving strategic decision-making, and in simpler situations too. Even without mathematical exactness, the strategy outlined above does seem to be a statistical winner in similar situations. The interested reader can verify that Tit-For-Tat is of course not free of defects. Even in a simple resource such as Wikipedia, one can read how two such algorithms could spiral into a split-steal war: one starts by splitting while the other starts by stealing, each then copies the other's previous move, and the two might never fall into step. I mention this particular issue of the Tit-For-Tat strategy because it allows me to suggest a general principle that my experiences have usually confirmed: always embed a bit of randomness or flickering into a strategy. This is very different from saying “build a random algorithm or strategy”; it is closer to epsilon-greedy algorithms, where we let the strategy deviate from its strict path and make a random decision at sporadic instants (e.g. 1% of the time). Reinforcement Learning people, and anyone interested, can relate this to "exploration vs exploitation". Such implementations could be extremely effective in adapting to unusual situations … let the odds play their part.
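To make the split-steal war and the "bit of randomness" concrete, here is a small Python sketch. It is an assumption-laden illustration, not the algorithm from the class: make_tit_for_tat and play are hypothetical names, and epsilon is the probability of ignoring the Tit-For-Tat rule and picking a move at random.

```python
import random

SPLIT, STEAL = "split", "steal"

def make_tit_for_tat(first_move=SPLIT, epsilon=0.0, rng=None):
    """Build a Tit-For-Tat strategy that deviates at random with probability epsilon."""
    rng = rng or random.Random()

    def strategy(opponent_history):
        # Sporadic random decision, in the spirit of epsilon-greedy exploration.
        if epsilon > 0 and rng.random() < epsilon:
            return rng.choice([SPLIT, STEAL])
        # Otherwise plain Tit-For-Tat: copy the opponent's previous move.
        return opponent_history[-1] if opponent_history else first_move

    return strategy

def play(strategy_a, strategy_b, n_rounds):
    """Return the list of (move_a, move_b) pairs played over n_rounds."""
    moves_a, moves_b = [], []
    for _ in range(n_rounds):
        move_a, move_b = strategy_a(moves_b), strategy_b(moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
    return list(zip(moves_a, moves_b))

# Two strict Tit-For-Tat players that open differently never fall into step.
print(play(make_tit_for_tat(SPLIT), make_tit_for_tat(STEAL), 4))
# [('split', 'steal'), ('steal', 'split'), ('split', 'steal'), ('steal', 'split')]

# Same pairing, but each side makes a random move about 1% of the time.
noisy_a = make_tit_for_tat(SPLIT, epsilon=0.01, rng=random.Random(1))
noisy_b = make_tit_for_tat(STEAL, epsilon=0.01, rng=random.Random(2))
history = play(noisy_a, noisy_b, 1000)
print(history[-5:])
```

A single random deviation is enough to break the alternation, but which way it breaks depends on the move that gets flipped: the pair can land on mutual splitting or on mutual stealing, and it stays there until the next deviation. The sketch is therefore a way to experiment with the idea rather than evidence that randomness always helps.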