
The good, bad and ugly of bias in AI

Published 25 Apr 2024

Machine learning could either help remedy or magnify our inherent biases, depending on how it is designed and the data it is trained on. Human judgment and oversight will therefore remain vital to realizing AI’s promise to improve the investment decision-making process.

Bias is defined in the dictionary as an inclination of temperament or outlook. Though it is often associated with faulty judgments and unconscionable prejudice among humans, as well as systematic errors in machine learning (ML), it is not necessarily all bad.

Bias provides people a shortcut for making quicker decisions, which could favor normatively positive outcomes in an investment management context — a preference for socially beneficial investments, for example, or an aversion to extreme risk. Inductive bias is also integral to ML models, allowing them to prioritize certain properties so that they can make generalizations from training data.

But bias in the investment process poses real problems. Investment managers are prone to a variety of common emotional and behavioral biases that can lead to poor decisions.

Figure 1: Types of behavioral biases

Emotional biases

  • Loss aversion
  • Overconfidence
  • Self-control
  • Status quo
  • Endowment
  • Regret aversion

Cognitive biases 
(Belief perseverance errors)

  • Conservatism
  • Confirmation
  • Representativeness
  • Illusion of control
  • Hindsight

Cognitive biases 
(Information-processing errors)

  • Anchoring and adjustment
  • Mental accounting
  • Framing
  • Availability

Emotional biases, influenced by feelings like fear, greed or remorse, are arguably the hardest to overcome because they often derive from a deep-seated impulse. These include loss aversion, overconfidence and a lack of self-control.

Cognitive biases, of which there are more than 175 types, generally stem from some type of miscalculation or misinterpretation. The main cognitive biases afflicting investment managers can be categorized as belief perseverance or processing errors. The former includes confirmation bias, which is the tendency to favor information that supports existing beliefs, and an illusion of control over outcomes. One of the most common forms of processing error is anchoring bias, whereby people tend to be more influenced by information they gleaned early in the decision-making process.

Although cognitive biases tend to be easier to resolve than emotional biases once the faulty logic underpinning them is highlighted, ML can provide a powerful means of correcting for both. ML can even be used to identify biased trading decisions by other investors, potentially creating opportunities to buy or sell mispriced securities.

And while machines are susceptible to biases of their own, mitigating them is almost always easier than remedying those held by people. After all, many human biases operate on an unconscious level, whereas a lack of fairness in artificial intelligence (AI) can be systematized and quantified.

Sources of AI bias

There are several ways bias can creep into machine learning systems, including biased training data, poor data handling, inappropriate model selection or incorrect algorithm design. These need to be addressed by implementing a comprehensive bias pipeline that adheres to clear principles laid out in a decision framework for the development and deployment of responsible AI applications. By striving towards ethical goals such as fairness, accountability and transparency (see Figure 2), firms will go a long way towards tackling bias in AI.

Figure 2: Five focus areas for tackling bias in AI

  • Accountability: AI designers and developers are responsible for considering AI design, development, decision processes, and outcomes.
  • Value alignment: AI should be designed to align with the norms and values of your user group in mind.
  • Explainability: AI should be designed for humans to easily perceive, detect, and understand its decision process.
  • Fairness: AI must be designed to minimize bias and promote inclusive representation.
  • User data rights: AI must be designed to protect user data and preserve the user's power over access and uses.

Source: IBM.

One often-cited example of algorithmic discrimination in the financial sector pertains to credit decisions, where automated systems end up magnifying historical trends or excluding certain demographic groups because of the data they were trained on. 

In the investment industry, this pattern could extend to client onboarding and credit risk estimation. If model developers try to sort groups of clients or prospective clients into risk categories based on limited data history that underrepresents certain groups, those groups could end up inappropriately classified.

This is “a particularly pernicious phenomenon,” noted Mark Ainsworth, a senior data science consultant and former Head of Data Insights at Schroders. “Since algorithmic systems and automation are often deployed as a cheap way to serve less affluent groups, this will tend to systematically punish them.”

There are several methods of mitigating data bias, including correcting or augmenting the data to better represent previously disadvantaged groups. Another way is to redefine the algorithm’s objectives.

“It's really important that the people who are creating those algorithms don't think that their only job is to predict whether someone will default on their loan as accurately as possible. That's a narrow definition of what success looks like,” explained Ainsworth. “They should recognize that those algorithms exist in a wider social framework and that the data they’re training it on manifests bias in society.”
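One way to picture the first of these mitigation methods, rebalancing the data so that an underrepresented group is not drowned out, is to give each record a sample weight. The sketch below is illustrative only: the group labels and the `balancing_weights` helper are hypothetical, and equal total weight per group is just one of several possible rebalancing targets.

```python
from collections import Counter

def balancing_weights(groups):
    """Return one weight per record so that every group carries equal total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    # Records from smaller groups receive proportionally larger weights.
    return [total / (n_groups * counts[g]) for g in groups]

# Hypothetical training set in which group "B" is underrepresented.
groups = ["A", "A", "A", "B"]
weights = balancing_weights(groups)

for group, weight in zip(groups, weights):
    print(group, round(weight, 3))
```

Weights like these could then be passed to any training routine that accepts per-sample weights. Reweighting does not correct labels that are themselves biased, so it complements rather than replaces a rethink of the algorithm's objectives.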

Data bias hurts returns

Aside from issues of fairness and equality, biased data sets can also dilute the predictive capability of machine learning models.

Two of the most common biases that manifest when using historical financial data to train ML models in the investment industry and backtest investment strategies are survivorship bias and look-ahead bias.

Survivorship bias is famously illustrated using an example from World War II, when the US military examined damage sustained by bombers returning from missions to conclude that armor should be added to the areas that were most frequently hit. Statistician Abraham Wald pointed out that any aircraft that had been shot down would be excluded from the data set. He suggested that armor should instead be added to areas which had not been hit, since the bullet holes in the returning aircraft likely represented areas where they could take damage and still make it back.

For investment managers, survivorship bias generally takes the form of ignoring companies that no longer appear in a given index data set, whether because they were delisted, went bankrupt or were the subject of a corporate action such as a merger or acquisition. Of the nearly 3,000 constituents of the Russell 3000 when the index was created in 1986, fewer than 20% have survived until today.

By failing to keep data on the companies that did not survive, “you’ve scrubbed it, so all your model sees is success,” explained Julia Bonafede, Co-Founder of Rosetta Analytics.

"If you could know what the members of an index are going to be decades in the future, what a wonderful strategy that would be,” remarked John Elder, Founder of data science consultancy Elder Research.

Survivorship bias can lead to overfitting, which describes a situation in which the ML model gives accurate predictions for training data, but not for new data.
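The effect of scrubbing out failed companies can be seen in a toy backtest. The sketch below uses made-up tickers and returns, and a hypothetical `average_return` helper, to compare the average return of the full starting universe with that of the survivors alone.

```python
from statistics import mean

# Hypothetical records: (ticker, total return over the backtest window, still listed at the end?)
universe = [
    ("AAA", 0.40, True),
    ("BBB", 0.10, True),
    ("CCC", -1.00, False),  # went bankrupt during the window
    ("DDD", -0.30, False),  # delisted during the window
]

def average_return(records, survivors_only):
    """Average return across the universe, optionally dropping non-survivors."""
    returns = [ret for _, ret, alive in records if alive or not survivors_only]
    return mean(returns)

biased = average_return(universe, survivors_only=True)
unbiased = average_return(universe, survivors_only=False)

print(f"survivors only: {biased:.2f}")
print(f"full universe:  {unbiased:.2f}")
```

In this invented example the survivors-only figure is strongly positive while the full-universe figure is negative, which is the gap a model trained only on surviving companies would never see.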

Look-ahead bias, which arises when a model uses data that would not have been known or available during the period analyzed, can lead to overfitting in the same way. Elder described this error as “leaks from the future.”

For example, first quarter results might be recorded in a database as having been available on April 1st, even though they may not have been released until May 15th. This would lead the model to assume it had access to data that would not actually be available for another six weeks.
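A common guard against this is to store the actual release date alongside each record and filter on it, rather than on the period the record describes. The sketch below assumes a hypothetical `reports` list and `known_as_of` helper, with invented figures; it is one simple way to enforce a point-in-time view, not a complete data pipeline.

```python
from datetime import date

# Hypothetical records: the period each report covers and the date it was actually published.
reports = [
    {"period_end": date(2023, 12, 31), "released": date(2024, 2, 15), "eps": 1.10},
    {"period_end": date(2024, 3, 31), "released": date(2024, 5, 15), "eps": 1.25},
]

def known_as_of(records, as_of):
    """Keep only the records that had been published on or before `as_of`."""
    return [r for r in records if r["released"] <= as_of]

# On April 1st the first-quarter results exist but have not been released,
# so a backtest run as of that date should not see them.
visible = known_as_of(reports, date(2024, 4, 1))

for report in visible:
    print(report["period_end"], report["eps"])
```

Filtering on `released` instead of `period_end` means the first-quarter figures only become visible to the backtest from May 15th onward.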

"You have to construct your data set so that the model can’t see into the future,” said Bonafede.

Data bias is also now prevalent in the use of natural language processing to analyze alternative and unstructured data from sources such as social media and news feeds. Bias can arise in the way that algorithms filter content or from the use of programming choices that disadvantage some dialects or languages.

“The world is now awash in Transformers which power models like ChatGPT,” added Bonafede. “These scrape data from the web and we don't actually know what they've been trained on. All sorts of data can creep in that may not be screened or cleaned properly.”

This makes it difficult to abide by the principles of ethical AI, which require professionals and data scientists to ensure the source data used to train their ML algorithms are free from bias, or that they limit bias as far as possible by understanding the data’s properties and using careful sampling approaches.

Behavioral bias among model developers

Data scientists also need to be cognizant of several behavioral biases that make it more likely that the ML models they create are bad, said Samuel Livingstone, Head of Data Science & Data Engineering at Jupiter Asset Management. He suggested being aware of four biases in particular.

In addition to the aforementioned anchoring bias, these include recency bias, “where the latest information you receive is given an undue weight relative to something you learned a while ago,” he said.

Also prevalent in the industry is expertise bias, defined as an inclination to treat the opinion of people considered experts as incontrovertible and trustworthy, even though their actual level of expertise on the subject may not have been verified, explained Livingstone.

And finally, Livingstone singled out the disposition effect, which is a tendency for investors to “sell winners too fast while holding on to losers for too long because they don’t want to be wrong.”

“I've seen a version of that transcend the financial markets into data science, with people spending so much time trying to make an inaccurate model correct,” he said.

“They will spend loads of time re-running, re-modelling, feature engineering, trying different hyperparameters.” Instead, they should just go back to basics and build a straightforward model, “because if that can’t capture it, making it more complex won’t,” continued Livingstone. “It’s really the propensity for people to spend too much time on losers.”

The best way for data scientists to overcome such propensities is to first be aware of them, said Livingstone.

“I think every data scientist should have a component of their learning that focuses on those behaviors and how they can personally mitigate them. And I think most of them can be mitigated by having more than one person in the process,” he said.

Mutual man-machine support

In the future, that vital second-guess could also come from a machine, with better data, analytics and AI increasingly being applied to examine human biases. Some organizations already run algorithms alongside human decision-makers, comparing results and looking for explanations of the differences.

This could create a virtuous feedback loop, in which AI helps to improve human judgment and oversight, which is necessary to guide and ensure the accuracy, safety and fairness of AI systems.

So, while there is a good chance that AI can lay the foundations upon which more rational, effective and equitable investment decision-making can be built, for the foreseeable future at least, human input is likely to remain not only indispensable, but also value accretive.
