Extracting patterns and insights from data is a skill that’s arguably as much art as science. It is a delicate balancing act, with any number of often subtle missteps or misjudgments likely to lead to flawed results.
The ability to recognize patterns is critical to how humans learn and make decisions, and likely offered an evolutionary advantage in spotting predators and prey. It is such an important trait that the brain has developed a knack for finding patterns even where none exist – a phenomenon known as apophenia.
Common examples are seeing animals in clouds and hearing voices in white noise. Apophenia is the basis of superstitions, conspiracy theories and paranormal experiences. It is not uncommon among gamblers and investors, often resulting in bad bets and investments.
Data science and machine learning (ML) algorithms may seem like ideal tools to help overcome apophenia by bringing rationality and analytical rigor to the investment decision-making process. But if used to test too many variables within large data sets, they, too, are prone to find connections between unrelated things.
“If you look enough times, you will see signal, but it’s actually noise,” explained Mark Ainsworth, a senior data science consultant and former Head of Data Insights at Schroders.
For instance, if a statistical test is conducted on two variables, x and y, at a typical 5% significance level, the probability of finding a relationship where none exists is 5%. But if tests were conducted on x and 100 different y’s, then on average “five of them would come up as a significant relationship, because of the way we defined what significant means,” said Ainsworth. “There’s a one in 20 chance of seeing this relationship just by accident.”
This is an example of the multiple comparisons problem. “If you torture the data long enough, it will confess to anything,” said influential British economist Ronald Coase. And one can uncover no end of spurious correlations among random variables, such as between the divorce rate in Maine and per capita consumption of margarine (see Figure 1).
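The arithmetic behind Ainsworth's point can be checked with a small simulation. The sketch below is illustrative only: the function names, the sample size of 200 observations and the use of a Fisher z-transform as an approximate 5% significance test are all assumptions, not anything described by the interviewees. It generates one random x and 100 unrelated random y's, then counts how many pairs look "significant" anyway.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


def is_significant(xs, ys, z_crit=1.96):
    """Approximate two-sided 5% test using the Fisher z-transform."""
    z = math.atanh(pearson_r(xs, ys)) * math.sqrt(len(xs) - 3)
    return abs(z) > z_crit


def count_false_positives(n_tests=100, n_obs=200, seed=0):
    """Test one random x against n_tests unrelated random y's."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_tests):
        ys = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise
        hits += is_significant(xs, ys)
    return hits


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests is a fluke."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    print("spurious 'relationships' found:", count_false_positives())
    print("chance of at least one:", prob_at_least_one_false_positive(100))
```

Across repeated runs with different seeds, the count should hover around five, and the chance of at least one false positive in 100 independent tests works out to more than 99%.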
The plausibility trap
The trouble is, spurious correlations are not always easy to dismiss. Sometimes, they can seem quite plausible.
Ainsworth gave the example of an investment professional with a reasonable hunch that there is a relationship between weather patterns and prices of wheat. Following up on that hunch, a data scientist might look at “different rolling averages of the price and different rolling averages of the weather in different parts of the world, and different ways of blending together those variables.” In the process, “you discover these phantom relationships just because you tried so many ideas.”
Further complicating things, humans are also “quite adept at telling stories about patterns, whether they exist or not,” as science writer Michael Shermer, the founding publisher of Skeptic magazine, observed.
John Elder, founder of data science consultancy Elder Research, recounted the story of a friend who, as a master’s student, had analyzed medical data on heart strength in ill children. During a key meeting, the friend projected 2D scatterplots for the head doctor and nurse, who excitedly interpreted each result and took notes.
However, the x-axes on the transparencies weren’t labeled, and the friend suddenly realized that he had been showing them backwards – so the actual relationship between the variables was the opposite of the apparent one. As he apologized, started over, and flipped the first transparency, the head doctor immediately exclaimed, “That makes sense too!”
In light of this, Elder urges forming your hypotheses of likely relationships prior to testing, because “after the fact, virtually everything can and will be plausibly interpreted.”
Elder also advised looking beyond the data itself and bringing outside experience and information to bear. Often, this can expose a perceived causal relationship as mere correlation.
Getting data you can trust
A crucial first step before running analysis or training ML models is internal validation of data (see Figure 2). “Reliable analysis depends so strongly on the quality of the data that internal inconsistencies can hobble your work, or they can be clues to problems with the flow of information within the company and reveal a key process obstacle,” said Elder.
The main data risks derive from biases in training data and a mismatch between training data and actual data used during operations; incomplete, outdated, or irrelevant data; insufficiently large and diverse sample size; and inappropriate data collection techniques.
Mitigating these data risks is more important than ever, said Augustine Backer, Vice President, Lead Analyst of Investment Portfolio at Wells Fargo.
“Before, data was just one part of the decision-making process, but now, it has become central to it. That makes having good, clean, trustworthy data paramount,” he said.
Failing to address these data risks, or even just leaving ML models to train for too long on a sample set of data, can lead to overfitting (see Figure 3), whereby the model makes accurate predictions on the training data, but not for new data.
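A toy comparison makes the gap between training and new-data accuracy concrete. This is a minimal sketch under stated assumptions: the data-generating rule (y = 2x plus noise), the function names, and the choice of a nearest-neighbour "memorizer" as a stand-in for an overfitted model are all hypothetical. It is not a model of over-long training specifically, only of a model that has learned the noise in its training sample.

```python
import random


def make_data(n, rng, noise=1.0):
    """Toy data: y = 2x + Gaussian noise (an assumption for illustration)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares straight line: a simple, constrained model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: returns the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(seed=0):
    """Return {model name: (training MSE, new-data MSE)}."""
    rng = random.Random(seed)
    train = make_data(200, rng)
    new = make_data(200, rng)
    results = {}
    for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
        model = fit(*train)
        results[name] = (mse(model, *train), mse(model, *new))
    return results


if __name__ == "__main__":
    for name, (train_mse, new_mse) in compare().items():
        print(f"{name}: training MSE={train_mse:.2f}, new-data MSE={new_mse:.2f}")
```

The memorizer scores a perfect zero error on the data it was trained on, because it simply replays it, yet it does worse on fresh data than the plain straight line. Holding back data the model has never seen is what exposes the difference.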
It is also important to implement appropriate security measures to prevent internal or external threat actors from gaining access to input data, algorithm design or output decisions and manipulating them to deliberately bring about flawed outcomes.
Keeping an eye on concept drift
Even if data risks are thoroughly considered and managed, however, the predictive power of ML models could still decay over time because of a change in market conditions.
A model trained during the height of the pandemic, for example, would not be applicable today, explained Sreekanth Mallikarjun, Head of AI Innovation at Reorg. “Many companies shut down just because of the pandemic situation, even though they did nothing wrong. This is known as concept drift. We don’t want to use that model today, which assumes the world is naturally unfair.”
It is therefore necessary to set up a process to detect concept drift and deal with it by either retraining and updating the model or creating a new one.
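One simple way such a detection process might look is to track the model's recent live errors and compare them with the error level seen when the model was validated. The sketch below assumes exactly that. The class name `DriftMonitor`, the window size and the 1.5x tolerance are hypothetical choices for illustration, and production systems typically use more formal statistical tests.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags possible concept drift when recent errors exceed a baseline.

    All names and thresholds are hypothetical, chosen for illustration.
    """

    def __init__(self, baseline_error, window=50, tolerance=1.5):
        self.baseline_error = baseline_error  # mean abs error at validation
        self.window = window                  # number of live outcomes to track
        self.tolerance = tolerance            # allowed multiple of the baseline
        self.recent_errors = deque(maxlen=window)

    def update(self, prediction, actual):
        """Record one live outcome; return True if drift is suspected."""
        self.recent_errors.append(abs(prediction - actual))
        if len(self.recent_errors) < self.window:
            return False  # not enough live data yet
        return mean(self.recent_errors) > self.tolerance * self.baseline_error


if __name__ == "__main__":
    monitor = DriftMonitor(baseline_error=1.0, window=5)
    # Five predictions in line with the baseline, then five that miss badly.
    for actual in [11.0] * 5 + [13.0] * 5:
        if monitor.update(prediction=10.0, actual=actual):
            print("Drift suspected: retrain or replace the model")
            break
```

A flag from a monitor like this is a prompt for human review rather than proof of drift, since a run of bad luck can trip the same threshold.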
More broadly, a host of other issues related to algorithm design can rob data of its analytical and predictive ability. These include biased logic, flawed assumptions or judgments, inappropriate modelling techniques and coding errors.
These issues could partly be addressed by setting appropriate project goals and metrics to get the computer to “feel” about the project like you do, suggested Elder.
To illustrate that point, Elder offered the example of a model that forecast a stock’s price would rise from $10 to $11, when it actually rose to $14.
Since the gain forecasted would cause one to buy, the action is right and this error is really a positive surprise. But an algorithm based on squared error (like most are) would just see the error of $3 between the forecast and the truth, and square that to a penalty of 9.
If, on the other hand, the price fell to $9, the forecast would be too high by $2 and the penalty would only be 4. Using squared error – the most common loss function – the algorithm would thus have “preferred” the loss scenario (by more than 2:1). An alternative criterion that punishes negative errors (losses) much more than positive ones (gains) would better reflect an investment manager’s objectives.
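Elder's numbers can be reproduced in a few lines. The `squared_error` function below follows the example directly. The `asymmetric_loss` function is only one hypothetical form of the alternative criterion he describes, and its `downside_weight` of 5 is an arbitrary illustrative value, not a recommendation.

```python
def squared_error(forecast, actual):
    """Standard squared-error penalty: direction of the miss is ignored."""
    return (actual - forecast) ** 2


def asymmetric_loss(forecast, actual, downside_weight=5.0):
    """Squared error that weights shortfalls more heavily.

    Hypothetical criterion: the weight of 5 is illustrative only.
    """
    error = actual - forecast
    weight = downside_weight if error < 0 else 1.0  # actual below forecast
    return weight * error ** 2


if __name__ == "__main__":
    forecast = 11.0
    for label, actual in (("rose to $14", 14.0), ("fell to $9", 9.0)):
        print(label,
              "| squared error:", squared_error(forecast, actual),
              "| asymmetric:", asymmetric_loss(forecast, actual))
```

Under squared error, the pleasant surprise is penalized at 9 and the loss at only 4. Under the asymmetric version, the penalties become 9 and 20, so the ranking now matches how an investment manager would feel about the two outcomes.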
A lot at stake
Other problems to be aware of when developing and deploying ML algorithms are catastrophic forgetting and the disjoint union problem. The former sees AI systems forgetting information from previous tasks while learning new ones, and an example of the latter is having to “assess two companies with conflicting attributes which are both good investments,” said Dan Philps, an AI researcher and Head of Rothko Investment Strategies.
There is plenty of incentive to prevent ML algorithms from generating sub-optimal predictions and recommendations in the investment industry. When a streaming company’s algorithm occasionally suggests an unsuitable video to one of its customers, the consequences are hardly catastrophic. But with billions of dollars on the line, asset owners would be far less forgiving of bad investment decisions.
There are fixes for most of the issues outlined above, but it can often be hard to even know that they need fixing. That is where human judgment proves indispensable.
“There are always things that can go wrong, and it's always messy and complicated,” said Ainsworth. “If you operate under the assumption that you've probably got it wrong, your job is to satisfy yourself that you’ve tested and sense-checked that.”
Ideally, that process is woven into an organization’s culture.
“You have to be in a team where you want your colleague to try to break your model, because breaking it in the lab will be a lot cheaper than breaking it in the real world,” said Elder.
“The earlier you can break it, the better. And that takes a lot of humility – to say, ‘help me figure out what might be wrong with this.’”