
Datasets

 

SentimentDictionaries

This library provides domain-specific dictionaries for sentiment analysis. Each dictionary consists of words that statistically feature a positive or negative polarity in movie reviews or financial filings. The dictionaries are extracted from two different corpora, namely IMDb movie reviews and U.S. regulated Form 8-K filings. Details are available in the following reference.

  • Pröllochs, Feuerriegel and Neumann (2017): Language That Matters: Statistical Inferences for Polarity Identification in Natural Language, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details 

This library contains the following dictionary resources in CSV format. 

  • Movie reviews dictionary: This dictionary contains words that feature a positive or negative connotation in IMDb movie reviews (DictionaryIMDB.csv).
  • Financial filings dictionary: This dictionary contains words that feature a positive or negative connotation in U.S. regulated 8-K filings (Dictionary8K.csv). 

The individual columns of each dictionary are as follows: 

  • Words: This column lists the individual dictionary entries. We provide stems instead of complete words as stemming is part of the document preprocessing.
  • Scores: This column denotes the polarity score of each entry.
  • Idf: This column denotes the inverse document frequency (idf) of each entry. 
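As a rough illustration of how these columns might be used, the following base R sketch reads a dictionary and sums the polarity scores of the stems found in a document. The three dictionary rows, the helper name score_document, the exact header names and the comma separator are assumptions made for this example; they are not taken from the published files, and the summation is a simple stand-in rather than the scoring procedure of the referenced paper.

```r
# Toy stand-in for DictionaryIMDB.csv. The three rows are invented for
# illustration and are NOT taken from the published dictionary.
csv_text <- "Words,Scores,Idf
excel,0.05,2.1
bore,-0.07,1.8
wast,-0.09,2.4"

# For the real file, pass its path instead of `text = csv_text`
# (assuming a comma-separated file with a header row).
dict <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: sums the polarity scores of all stems in a document
# that occur in the dictionary. Stems without an entry contribute nothing.
score_document <- function(stems, dict) {
  hits <- match(stems, dict$Words)  # NA where a stem is not in the dictionary
  sum(dict$Scores[hits], na.rm = TRUE)
}

# A pre-stemmed toy document. The stemming has to match the preprocessing
# that was used to build the dictionary.
doc <- c("the", "plot", "bore", "me", "and", "wast", "my", "time")
doc_score <- score_document(doc, dict)
```

A negative doc_score indicates that the matched stems lean negative overall. The Idf column is loaded but not used here; whether and how to weight by it depends on the application.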

Usage in R 

We also provide both dictionaries in the form of a package for the statistical software R. You can install SentimentDictionaries from GitHub with:

 

# install.packages("devtools")
devtools::install_github("nproellochs/SentimentDictionaries", subdir = "R-package")

 

Both dictionaries can easily be used in combination with the SentimentAnalysis R package.

SentimentDictionaries on GitHub: https://github.com/nproellochs/SentimentDictionaries

 

NegatedSentences

This repository provides annotations of negation scopes for 500 sentences from IMDb movie reviews. The dataset was labeled manually by two external annotators (Annotator A and Annotator B). Each sentence contains at least one explicit negation phrase from the list of Jia et al. (2009). The labeled sentences can, for example, be used to train machine learning models that learn accurate negation scopes for sentiment analysis. Details are available in the following reference.

  • Pröllochs, Feuerriegel and Neumann (2017): Understanding Negations in Information Processing: Learning from Replicating Human Behavior, Working Paper, Chair for Information Systems Research, University of Freiburg, Germany.

Details

This repository contains the following resources in CSV format.

  • Negation Labels Annotator A: This file contains the annotations from Annotator A (sentences_annotator_a.csv).
  • Negation Labels Annotator B: This file contains the annotations from Annotator B (sentences_annotator_b.csv).

The individual columns of each resource are as follows:

  • Id: This column assigns a unique Id to each sentence.

  • Sentence: This column contains the sentences that are labeled by the two human annotators.

  • IsNegated: This column contains the negation pattern for each sentence. The value T denotes that a word is marked as negated by the human annotator, whereas F denotes that the word is marked as not negated.
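The following base R sketch shows one way to align the IsNegated pattern with the words of a sentence and to compare the two annotators. The toy sentence and labels are invented. The space-separated label format, the whitespace tokenization and the helper name parse_labels are assumptions for this example and should be checked against the actual files.

```r
# Toy stand-ins for sentences_annotator_a.csv and sentences_annotator_b.csv.
# The sentence and labels are invented; one space-separated T/F label per
# word is an ASSUMPTION about the file format.
annot_a <- data.frame(
  Id = 1,
  Sentence = "this movie is not very good",
  IsNegated = "F F F F T T",
  stringsAsFactors = FALSE
)
annot_b <- data.frame(
  Id = 1,
  Sentence = "this movie is not very good",
  IsNegated = "F F F F F T",
  stringsAsFactors = FALSE
)

# Hypothetical helper: turns a label string into a logical vector
# (TRUE = word marked as negated).
parse_labels <- function(pattern) {
  strsplit(pattern, " ", fixed = TRUE)[[1]] == "T"
}

words    <- strsplit(annot_a$Sentence[1], " ", fixed = TRUE)[[1]]
labels_a <- parse_labels(annot_a$IsNegated[1])
labels_b <- parse_labels(annot_b$IsNegated[1])

# Every word needs exactly one label from each annotator.
stopifnot(length(words) == length(labels_a),
          length(words) == length(labels_b))

# Words inside the negation scope according to Annotator A.
negated_words_a <- words[labels_a]

# Share of words on which both annotators assign the same label.
agreement <- mean(labels_a == labels_b)
```

Raw agreement is only a first check. A chance-corrected measure such as Cohen's kappa may be more informative when most words are labeled F.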

NegatedSentences on GitHub: https://github.com/nproellochs/NegationDataset