Sentiment Analysis for Finance: an Oxymoron
Sentiment analysis is mostly wrong right now, especially when it comes to finance. More specifically, the way sentiment analysis is used to categorize language in financial text is completely unhelpful. Fleets of machine learning models comb through Reddit every day trying to give their stakeholders an edge, but they are unlikely to yield meaningful results. Why? Let’s see.
What is Sentiment Analysis?
The name “sentiment analysis” describes a large group of methods used to determine whether text is positive, negative, or somewhere in between. They range from simple methods that just look for certain words all the way up to monstrous machine learning models trained on thousands of human-scored sentences. Some of these models can achieve 99%+ accuracy in real-world scenarios, but unfortunately even that won’t help you find the next Tesla. Even perfect accuracy in sentiment analysis won’t give you the edge you need to make better decisions.
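To make the “simple methods that just look for certain words” end of that range concrete, here is a minimal sketch of a word-list scorer in Python. The word lists and the names `lexicon_score` and `lexicon_label` are hypothetical, chosen only for illustration; real lexicon-based tools use much larger curated dictionaries. It is run on the two example passages discussed later in this post.

```python
import re

# Hypothetical toy word lists for illustration only; real lexicon-based
# scorers rely on much larger, curated dictionaries.
POSITIVE = {"beat", "gain", "spike", "strong", "surge"}
NEGATIVE = {"fall", "concerns", "overblown", "steep", "drop", "loss"}


def lexicon_score(text: str) -> int:
    """Return the net count: +1 per positive word, -1 per negative word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def lexicon_label(text: str) -> str:
    """Map the net word count to a coarse positive/negative/neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


TESLA = "Tesla shares spike on surprise earnings beat"
HOLOGIC = ("By some estimates, the Covid-19 diagnostics business could fall "
           "by 35% in the next year, which helps explain Hologic's steep "
           "discount. But Jefferies analyst Raj Denhoy thinks those concerns "
           "are overblown.")

print(lexicon_score(TESLA), lexicon_label(TESLA))      # 2 positive
print(lexicon_score(HOLOGIC), lexicon_label(HOLOGIC))  # -4 negative
```

Nothing in a score like this says whether the news is already priced in, or which side of a point/counter-point argument the writer comes down on.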
The Problem
When you read the news about a stock or some company event, you likely get a sense of whether the article is positive or negative overall. As humans, we take in the whole argument being made and the evidence the author provides. Nice! But we can’t read everything out there, so what if we take a model that knows what we mean by positive or negative, run it on every article, and see what we get? Article quality aside, you’ll get back some interesting results… but what does any of it mean? This is where the positive/negative breakdown becomes really unhelpful. Say you investigate $TSLA and find that 60% of the articles about it were mostly positive. Does that mean you should buy shares? Backtests of even the best models out there have shown almost no edge from trading on sentiment scored this way, because positive doesn’t equal bullish. Take this headline, for example:
“Tesla shares spike on surprise earnings beat”
The sentiment is aggressively positive, but investing based on this sentence is likely to bring you to the party too late; the earnings beat is now priced in. The articles you analyzed are riddled with sentences like this, and they were given scores that don’t actually help you decide what to do next. Another example:
“By some estimates, the Covid-19 diagnostics business could fall by 35% in the next year, which helps explain Hologic’s steep discount. But Jefferies analyst Raj Denhoy thinks those concerns are overblown.”
This pair of sentences, from the article “This Covid Testing Stock Is Cheap,” demonstrates how we actually reason in writing: point and counter-point. A conventional sentiment model would probably flag it as negative, but the passage is actually making the bull case for $HOLX. These are just two examples, but they illustrate why traditional positive/negative sentiment does not work in the financial domain.
How We’re Fixing It
The main reason current approaches aren’t working is that they make no attempt to categorize chronology or mood. In the real world there are past events and future projections, each with its own analyst outlook. At babbl, we analyze financial language along these lines, an approach that extracts the views the text actually expresses. If we instead classify sentences on two axes (reactive vs. speculative, optimistic vs. pessimistic), we can get more insight from text at scale. For example, take the sentence from before:
“Tesla shares spike on surprise earnings beat”
This sentence would easily be classified as reactive (it reacts to something that has already happened) and optimistic (obvs), which is much closer to its actual meaning. And the other pair of sentences:
“By some estimates, the Covid-19 diagnostics business could fall by 35% in the next year, which helps explain Hologic’s steep discount. But Jefferies analyst Raj Denhoy thinks those concerns are overblown.”
These would be classified as speculative (concerned with future events, as signaled by “estimates” and “could fall”) and optimistic (thanks to “steep discount”).
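As a rough illustration of classifying along those two axes, here is a toy Python sketch. It is not babbl’s algorithm, which this post doesn’t describe: the cue-word sets, the `TwoAxisLabel` type, and the `classify` function are all hypothetical, with cues loosely taken from the reasoning above (“estimates” and “could” signal speculation, “discount” signals optimism).

```python
import re
from dataclasses import dataclass

# Hypothetical cue words for illustration only, loosely drawn from the
# reasoning in this post; they are NOT babbl's actual classifier.
SPECULATIVE_CUES = {"could", "would", "may", "might", "will", "estimates",
                    "expects", "forecast", "projected"}
OPTIMISTIC_CUES = {"spike", "beat", "surge", "upgrade", "cheap",
                   "discount", "overblown"}
PESSIMISTIC_CUES = {"fall", "miss", "plunge", "downgrade", "overvalued"}


@dataclass
class TwoAxisLabel:
    chronology: str  # "reactive" or "speculative"
    mood: str        # "optimistic", "pessimistic", or "neutral"


def classify(text: str) -> TwoAxisLabel:
    tokens = re.findall(r"[a-z']+", text.lower())
    # Chronology axis: any forward-looking cue marks the text as speculative;
    # otherwise assume it reacts to something that has already happened.
    if any(t in SPECULATIVE_CUES for t in tokens):
        chronology = "speculative"
    else:
        chronology = "reactive"
    # Mood axis: net count of optimistic minus pessimistic cues.
    net = sum((t in OPTIMISTIC_CUES) - (t in PESSIMISTIC_CUES) for t in tokens)
    if net > 0:
        mood = "optimistic"
    elif net < 0:
        mood = "pessimistic"
    else:
        mood = "neutral"
    return TwoAxisLabel(chronology, mood)


TESLA = "Tesla shares spike on surprise earnings beat"
HOLOGIC = ("By some estimates, the Covid-19 diagnostics business could fall "
           "by 35% in the next year, which helps explain Hologic's steep "
           "discount. But Jefferies analyst Raj Denhoy thinks those concerns "
           "are overblown.")

print(classify(TESLA))    # TwoAxisLabel(chronology='reactive', mood='optimistic')
print(classify(HOLOGIC))  # TwoAxisLabel(chronology='speculative', mood='optimistic')
```

Fixed word lists like these are brittle: they were picked to fit these two examples and cannot tell, for instance, that “overblown” is dismissing the earlier concerns. Doing this reliably at scale would likely require a model trained on labeled financial sentences rather than hand-picked cues.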
The Takeaway
We need finance-specific text analysis; anything else is likely to be unhelpful. If you’d like to get more out of your finance text, we’re here to set you up. We have a Chrome extension and a personalized dashboard running now, with Reddit and Twitter support coming soon. It’s time we all did a better job researching, but you don’t have to do it alone. Post an article in the comments and I’ll show you what babbl’s algorithms would tell you about the text!
Thanks for reading! You can reach me at cartford@hey.com with comments, questions, concerns, or quinoa.