Each day, oceans of new financial data are being generated by the internet, smartphones
and social media.
This data is often referred to as “big data”.
The term has generated excitement in the investing world because of its
potential to vastly improve the decision-making process of investors and financial
Modern businesses have begun to store data on everything, from what a consumer
likes, to how they interact with a particular product, to the amazing restaurant
that opened in Italy last weekend.
As more data is recorded every year, the possibilities of what can be done with so much
raw data continue to grow.
At the same time, advancements in information technology have led to the emergence of
new tools such as machine learning and sentiment analysis.
Investors are looking to leverage those new tools and technologies to gain advantages
over market players that rely purely on traditional data.
In the past, computers could only analyze structured data, that is, data that is easily
quantifiable and organized in a set format.
However, about 80% of all generated data is unstructured, meaning it is expressed in a
format that is not easily quantified.
An example would be textual news data, which often does not come naturally mapped to a
particular company or topic.
The mapping process for such data sets is complex and resource-intensive, especially for
"hidden" concepts and topics that do not appear explicitly as keywords in the
Financial analysts face a major problem when working with unstructured textual data:
How can a machine understand and interpret text like a human? How can a system understand
and apply domain-specific knowledge?
About 90% of
report that unstructured data is their
primary big data
As a result, most investors do not yet use textual data to gain new insights.
Over the last few years, there has been an exponential increase in data that is available to
Thousands of news items and opinions are shared on social media every second, and an
overwhelming amount of metadata can be derived from them.
However, not all data is relevant to a particular use case or useful for financial analysis.
Choosing the right technological tools is critical.
In addition, analysts usually encounter data quality issues such as missing, incomplete or
duplicated data, even when obtaining data from reliable sources.
Therefore, big data must be "cleaned" and filtered before it can be further processed to
improve investment strategies.
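As a rough illustration of what such a cleaning step can look like, the following Python sketch drops incomplete records and duplicated articles. It is not StockBrain's production code: the record schema (`source`, `published`, `text`), the function name `clean_articles` and the sample records are assumptions made up for this example.

```python
from typing import Iterable

# Hypothetical schema: every article record must carry these fields.
REQUIRED_FIELDS = ("source", "published", "text")


def clean_articles(articles: Iterable[dict]) -> list[dict]:
    """Drop incomplete records and duplicates (compared by normalized text)."""
    seen: set[str] = set()
    cleaned: list[dict] = []
    for article in articles:
        # Skip records where a required field is missing, None or blank.
        if any(not str(article.get(field) or "").strip() for field in REQUIRED_FIELDS):
            continue
        # Normalize case and whitespace so trivially different copies match.
        key = " ".join(article["text"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(article)
    return cleaned


# Made-up sample records, for illustration only.
raw = [
    {"source": "wire-a", "published": "2020-01-02", "text": "Acme beats forecasts."},
    {"source": "wire-b", "published": "2020-01-02", "text": "acme  beats forecasts."},
    {"source": "wire-c", "published": "", "text": "Missing date."},
    {"source": "wire-d", "published": "2020-01-03", "text": None},
]
cleaned = clean_articles(raw)
```

Only the first record survives here: the second is a duplicate after normalization, and the last two are incomplete. Real pipelines typically also need near-duplicate detection, which this sketch does not attempt.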
Investor sentiment can be described as a belief about future stock performance and
investment risks that is not justified by the facts at hand.
has shown that the question is no longer whether investor sentiment
affects stock prices, but how to measure investor sentiment and quantify its
in Natural Language Processing (NLP) have yielded promising results when analyzing
social media posts or news articles to predict stock prices.
StockBrain builds on top of current research to deliver an accurate "stock sentiment"
indicator, which captures whether investors are likely to increase or withdraw
their investments in a particular stock.
The following paragraph will present a broad overview of how StockBrain's system empowers
Every day, StockBrain collects thousands of news articles from over 500 sources in 6
All articles are passed through several quality filters to remove unrelated content or spam
from our processing pipeline.
StockBrain extracts key concepts and phrases from each news story and uses this information
to form relationships between articles and concepts.
To achieve this, we use a combined Vector Space Model (VSM) and
Latent Dirichlet Allocation (LDA) approach.
In addition, we use curated dictionaries to identify concepts that relate to
investment, business and economic topics.
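To give an idea of how a Vector Space Model relates articles to one another, here is a minimal Python sketch that represents each text as a TF-IDF vector and compares vectors by cosine similarity. It covers only the VSM part (LDA is not shown), uses a common smoothed IDF variant chosen for this example, and is not StockBrain's actual implementation. The function names and headlines are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up headlines, for illustration only.
docs = [
    "Central bank raises interest rates to curb inflation",
    "Inflation fears grow as central bank signals higher interest rates",
    "Carmaker unveils new electric pickup truck",
]
vectors = tfidf_vectors(docs)
related = cosine(vectors[0], vectors[1])    # shares several terms
unrelated = cosine(vectors[0], vectors[2])  # shares no terms
```

The two monetary policy headlines score higher than the unrelated pair, which shares no terms and therefore has a similarity of zero. A purely term-based model like this cannot link articles that discuss the same topic in different words, which is the kind of gap topic models such as LDA are meant to close.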
As a result, StockBrain can track which news topics were dominating the headlines when a particular stock price
To determine the sentiment of an article, StockBrain uses a mix of stochastic and
statistical methods that are based on state-of-the-art sentiment dictionaries such as
In addition, we increase the precision of existing approaches by also considering
the topic knowledge extracted earlier (see previous paragraph).
This allows us to not only generate one overall sentiment value for a news story, but also
to measure and weight the sentiment of each sentence.
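The idea of scoring and weighting individual sentences can be sketched in a few lines of Python. This is a simplified illustration, not StockBrain's method: the tiny `LEXICON`, the `boost` factor applied to sentences that mention a known topic term, and all function names are assumptions introduced for this example. A real system would load a published sentiment dictionary instead.

```python
import re

# Toy lexicon for illustration; not a real sentiment dictionary.
LEXICON = {
    "gain": 1.0, "strong": 1.0, "beat": 1.0,
    "loss": -1.0, "weak": -1.0, "lawsuit": -1.0,
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_score(sentence: str) -> float:
    """Mean lexicon value of the matched words, or 0.0 if none match."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    hits = [LEXICON[token] for token in tokens if token in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def article_sentiment(text: str, topic_terms: set[str], boost: float = 2.0):
    """Weighted mean of sentence scores.

    Sentences that mention a topic term get weight `boost`, all others 1.0.
    Returns the overall score and the (sentence, score, weight) triples.
    """
    scored = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        weight = boost if tokens & topic_terms else 1.0
        scored.append((sentence, sentence_score(sentence), weight))
    total_weight = sum(weight for _, _, weight in scored)
    if total_weight == 0:
        return 0.0, scored
    overall = sum(score * weight for _, score, weight in scored) / total_weight
    return overall, scored


# Made-up article text, for illustration only.
text = "Acme reported strong earnings. The lawsuit against a supplier continues."
overall, scored = article_sentiment(text, topic_terms={"earnings"})
```

In this example the positive sentence mentions the topic term "earnings" and therefore counts twice as much as the negative one, so the overall value comes out mildly positive rather than neutral.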
All sentiment information that relates to a particular company is combined into a single
metric that ranges from -1 (most pessimistic) to 1 (most optimistic).
In order to provide actionable insights, we also display which topics or news stories
contribute to the current investor sentiment.
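One simple way to obtain a bounded company-level metric together with per-story contributions is a weighted average, sketched below in Python. This is an assumption for illustration rather than a description of StockBrain's formula: the function name `company_sentiment`, the weights and the sample values are all hypothetical.

```python
def company_sentiment(items: list[tuple[str, float, float]]):
    """Combine per-story sentiment into one score in [-1, 1].

    `items` holds (label, sentiment in [-1, 1], weight >= 0) triples.
    Returns the score and each story's contribution to it, with the
    largest absolute contribution first.
    """
    total = sum(weight for _, _, weight in items)
    if total == 0:
        return 0.0, []
    contributions = [
        (label, sentiment * weight / total) for label, sentiment, weight in items
    ]
    score = sum(value for _, value in contributions)
    # Guard against floating-point drift outside the documented range.
    score = max(-1.0, min(1.0, score))
    contributions.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return score, contributions


# Made-up stories and weights, for illustration only.
items = [
    ("earnings report", 0.8, 3.0),
    ("supplier lawsuit", -0.5, 1.0),
    ("analyst note", 0.2, 1.0),
]
score, contributions = company_sentiment(items)
```

Because every input lies in [-1, 1] and the weights are non-negative, the weighted average stays in the same range, and the contributions add up to the final score, which makes it easy to show which stories drive it.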