Background Information

Data Sources

The ad-hoc announcements and the stock price data are retrieved from publicly available websites belonging to the EQS Group, the largest provider of ad-hoc announcements in the German market. We use only announcements that are available in English. Because the announcements are released by all publicly traded firms on German exchanges, we compare the sentiment to the Composite DAX (CDAX) price index, which covers the broadest universe of German stocks.

Cleaning & Preprocessing

After retrieving the announcements, we carry out the following steps:

Remove html tags, email addresses & phone numbers
Remove customized stop words
Replace percentages with a term indicating the range
Replace dates and remaining numbers with placeholders
Replace financial synonyms
Tokenize announcement into single words
Stem tokens with Porter (1980) algorithm
Transform to Document-Term-Matrix (DTM)
Remove sparse terms, e.g. terms that are contained in fewer than 0.5% of all announcements
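The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used: the stop-word list, the synonym map, the percentage bucket edges, the placeholder tokens and the regular expressions are all assumptions made for the example, and stemming is left as a pluggable function (an existing Porter implementation could be passed in as `stem`).

```python
import re
from collections import Counter

# Hypothetical examples; the real stop-word and synonym lists are not given in the text.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "or"}
SYNONYMS = {"turnover": "revenue", "earnings": "profit"}


def percent_bucket(match):
    """Map a percentage to a coarse range token (bucket edges are assumptions)."""
    value = float(match.group(1).replace(",", "."))
    if value < 5:
        return " pct_low "
    if value < 20:
        return " pct_mid "
    return " pct_high "


def preprocess(text, stem=lambda token: token):
    """Clean one announcement and return its list of tokens."""
    text = re.sub(r"<[^>]+>", " ", text)                      # HTML tags
    text = re.sub(r"\S+@\S+", " ", text)                      # email addresses
    text = re.sub(r"\+\d[\d ()/-]{6,}\d", " ", text)          # phone numbers with a leading "+"
    text = re.sub(r"(\d+(?:[.,]\d+)?)\s*%", percent_bucket, text)
    text = re.sub(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b", " date ", text)
    text = re.sub(r"\d+(?:[.,]\d+)*", " num ", text)          # remaining numbers
    tokens = re.findall(r"[a-z_]+", text.lower())             # tokenize into single words
    tokens = [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]


def document_term_matrix(token_lists, min_doc_share=0.005):
    """Build a count matrix and drop terms found in fewer than min_doc_share of documents."""
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    min_docs = min_doc_share * len(token_lists)
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_docs)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows
```

Calling `document_term_matrix([preprocess(t) for t in texts])` on a list of raw announcement texts returns the retained vocabulary and one row of term counts per announcement.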

Sentiment Metric

[Figure: sentiment metric]

Sentiment Update Process

[Figure: sentiment update process]

Static dictionary

The static sentiment is based on the Loughran & McDonald (2011) dictionary, with a weight of -1 for every negative word and +1 for every positive word. We chose this dictionary as the foundation for our static sentiment because it is aimed at the financial domain and is derived from regulatory filings (e.g. U.S. 10-K filings) that are similar to our announcements.
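As an illustration of the -1/+1 weighting, the sketch below scores a list of stemmed tokens. The word sets are tiny hypothetical stand-ins for the real dictionary, and the normalization (dividing by the number of matched words) is an assumption, since the exact sentiment metric is not spelled out in the text.

```python
# Hypothetical stemmed entries; the actual dictionary is far larger.
POSITIVE = {"gain", "improv", "strong"}
NEGATIVE = {"loss", "declin", "impair"}


def static_polarity(tokens):
    """Sum of +1/-1 word weights divided by the number of matched words.

    The normalization is an assumption made for this sketch.
    """
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    matched = sum(1 for t in tokens if t in POSITIVE or t in NEGATIVE)
    return score / matched if matched else 0.0
```

An announcement with two positive words and one negative word would score 1/3 under this assumed normalization, and an announcement without any dictionary words would score 0.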

Dynamic Dictionary Generation

After the announcement times are adjusted for business days and holidays, the open and close prices are used to calculate the announcement return as in Hagenau et al. (2013b). To reduce the potential for illiquid stocks to bias the coefficients one-sidedly, all returns above 100% are excluded. The generation of the dynamic dictionary is based on the approach by Pröllochs et al. (2014).
[Figure: dynamic dictionary generation]
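A minimal sketch of this step follows, under stated assumptions: the announcement return is taken as the simple open-to-close return, and the word weights are taken to be ridge-regression coefficients of announcement returns on the term counts. The exact return definition of Hagenau et al. (2013b) and the estimation details of Pröllochs et al. (2014) are not reproduced here; the function names and the penalty `lam` are hypothetical.

```python
def announcement_return(open_price, close_price):
    """Simple open-to-close return (assumed definition)."""
    return close_price / open_price - 1.0


def drop_extreme_returns(rows, returns, cap=1.0):
    """Exclude observations whose return is above 100% (cap = 1.0)."""
    kept = [(x, r) for x, r in zip(rows, returns) if r <= cap]
    return [x for x, _ in kept], [r for _, r in kept]


def ridge_weights(X, y, lam=1.0):
    """Solve (X'X + lam * I) w = X'y by Gauss-Jordan elimination.

    For lam > 0 the system matrix is symmetric positive definite,
    so no row pivoting is needed.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for i in range(p):
        pivot = A[i][i]
        A[i] = [v / pivot for v in A[i]]
        b[i] /= pivot
        for k in range(p):
            if k != i:
                factor = A[k][i]
                A[k] = [vk - factor * vi for vk, vi in zip(A[k], A[i])]
                b[k] -= factor * b[i]
    return b
```

Given a vocabulary `vocab` and the matching count rows, `dict(zip(vocab, ridge_weights(rows, returns)))` would then serve as a dynamic dictionary that maps each term to its estimated weight.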

Dynamic Dictionary Initialization & Update Process

As the initial training set for the dynamic dictionary, we used only the first 5 years of our dataset with available announcement returns. This avoids look-ahead bias in the theoretical analysis.

Since January 2017, the dynamic ridge dictionary has been generated from the announcement returns of all announcements published over the preceding 5 years. The dictionary is updated every six months; further research is needed to determine the optimal update frequency.
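The rolling scheme can be sketched as follows. The helper names are hypothetical, update dates are assumed to fall on the first day of a month, and each announcement is assumed to be a tuple whose first element is its publication date.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def update_dates(start, end, every_months=6):
    """Dates on which the dictionary is re-estimated, from start up to end."""
    dates, current = [], start
    while current <= end:
        dates.append(current)
        current = add_months(current, every_months)
    return dates


def training_window(announcements, as_of, years=5):
    """Announcements published in the `years` years before `as_of`."""
    window_start = add_months(as_of, -12 * years)
    return [a for a in announcements if window_start <= a[0] < as_of]
```

For each date returned by `update_dates(date(2017, 1, 1), end)`, the dictionary would be re-estimated on `training_window(announcements, update_date)`, so that only information available before the update enters the weights.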

Interactive Sentiment Index

The aggregation methodology is a modified version of the approach used in Hagenau et al. (2013a) and proceeds as follows:

Take the average Polarity of all announcements in a week (month)
Take the sum of average Polarity over the last n weeks (months)
Standardize the sum of average Polarity for each sentiment indicator using its historic sample median and median absolute deviation (MAD), which are robust to outliers
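The three aggregation steps can be sketched as below. The function names are hypothetical, periods without announcements are assigned an average of 0.0 as an assumption, and the MAD is used unscaled because the text does not mention a consistency constant.

```python
from statistics import mean, median


def period_averages(polarities_by_period):
    """Step 1: average Polarity of all announcements in each week (month)."""
    return [mean(p) if p else 0.0 for p in polarities_by_period]


def rolling_sums(values, n):
    """Step 2: sum of the period averages over the last n periods."""
    return [sum(values[i - n + 1:i + 1]) for i in range(n - 1, len(values))]


def robust_standardize(values):
    """Step 3: centre on the sample median and scale by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the series cannot be standardized")
    return [(v - med) / mad for v in values]
```

Chaining the three functions, `robust_standardize(rolling_sums(period_averages(weekly), n))`, yields one indicator series for a given frequency and aggregation window `n`.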

Therefore, a sentiment indicator value above (below) zero implies positive (negative) sentiment relative to the historic median sentiment level. Indicators with shorter aggregation windows react more quickly to changes in the underlying news flow and are therefore better able to explain short-term market movements. Indicators with longer aggregation windows are better suited to display the overall level of 'animal spirits' in the broad stock market. In particular, abrupt and sustained improvements (deteriorations) seem to lead trend reversals in the CDAX, with a lead time that varies with the current stock market regime. Because the sentiment indicators capture different effects depending on the combination of frequency and aggregation window, our final result is an Interactive Sentiment Index in which viewers can choose the number of aggregation periods, the time frequency, and the type of underlying dictionary, and so explore for themselves which combination best meets their needs.

Hagenau, M., Hauser, M., Liebmann, M., and Neumann, D. 2013a. “Reading All the News at the Same Time: Predicting Mid-term Stock Price Developments Based on News Momentum,” in 2013 46th Hawaii International Conference on System Sciences (HICSS 2013), R. H. Sprague (ed.), Wailea, HI, USA, 7-10 January 2013, Piscataway, NJ: IEEE, pp. 1279–1288.

Hagenau, M., Liebmann, M., and Neumann, D. 2013b. “Automated news reading: Stock price prediction based on financial news using context-capturing features,” Decision Support Systems (55:3), pp. 685–697.

Loughran, T., and McDonald, B. 2011. “When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks,” The Journal of Finance (66:1), pp. 35–65.

Porter, M. F. 1980. “An algorithm for suffix stripping,” Program (14:3), pp. 130–137.

Pröllochs, N., Feuerriegel, S., and Neumann, D. 2014. “Generating Domain-Specific Dictionaries Using Bayesian Learning,” SSRN Electronic Journal.