
Analysis of sentiments for Twitter, Instagram and beyond

This function performs sentiment analysis, also called opinion mining. It analyzes a text and determines whether its sentiment is neutral, positive or negative.
It works best on social media content such as tweets, comments on Instagram posts and other very short texts in English or French. In a comparison with 23 alternatives, this tool was found to be the best one for sentiment analysis on social media.
Created in 2012, the tool is under continuous development.

Import your data, get the results: an example

A wide variety of options are available to import the texts to analyze:


[Image: list of options to import texts]

[Image: how to import a text and do sentiment analysis on it]

Fast and performant

The tool is written in Java, whose performance is similar to, or better than, that of Python. It does not rely on part-of-speech tagging (POS tagging), which makes it even faster. Parallel computing is used in some subfunctions, and great care has been given to performance.

Model - short description

The function examines each term of the text and applies a series of rules (in this context, is the term positive or negative?). It also considers emojis, punctuation, hashtags, capitalized words... to determine the sentiment.
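To make the idea concrete, here is a minimal sketch of a term-by-term, rule-based classifier. It is not the tool's actual code: the word and symbol lists, the function name classify and the "majority of hits wins" decision are all assumptions made for illustration, and the real rules are richer than this.

```python
# Hypothetical mini-lexicons, for illustration only; not the tool's real lists.
POSITIVE_TERMS = {"wonderful", "exciting", "great", "love"}
NEGATIVE_TERMS = {"horrible", "awful", "hate"}
POSITIVE_SYMBOLS = {":)", ":-)", "😀", "❤"}
NEGATIVE_SYMBOLS = {":(", ":-(", "😡"}


def classify(text: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for a short text."""
    positive_hits = 0
    negative_hits = 0
    for token in text.split():
        # Emoticons and emojis are matched on the raw token.
        if token in POSITIVE_SYMBOLS:
            positive_hits += 1
            continue
        if token in NEGATIVE_SYMBOLS:
            negative_hits += 1
            continue
        # Words are compared lowercased, without surrounding punctuation.
        word = token.strip(".,;:!?\"'()#").lower()
        if word in POSITIVE_TERMS:
            positive_hits += 1
        elif word in NEGATIVE_TERMS:
            negative_hits += 1
    # Assumed decision rule: the side with more hits wins, ties are neutral.
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("War is horrible :("))
```

A text without any subjective marker, such as "This country is at war", produces no hits in this sketch and therefore comes out as neutral.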

Model - long description: a series of heuristics

The principles followed by the tool are described in this academic publication about Umigon, published in the anthology of the Association for Computational Linguistics. The tool follows these steps:

  1. check on the length of the text: if it is only a couple of words long, it will not be classified
  2. removal of URLs, removal of content in quotes, normalization of apostrophes
  3. two versions of the text are established: one where all the accents and special characters are removed, and one where they are retained. All the following steps apply to both versions of the text.
  4. check on emojis, emoticons and onomatopoeia. Do not preprocess your text to remove them, they provide useful information on sentiment!
  5. check on hashtags, if any
  6. decomposition of the text into n-grams, up to four-grams
  7. for each n-gram:
    • skip it if it belongs to a pre-established list of stop words
    • check for repeated characters and remove them as necessary (yeeeahhh! becomes yeah!)
    • check if the n-gram is contained in the pre-established list of potentially positive words. If so, the corresponding rule is applied. Most often the rule is as simple as "a positive word gives a positive sentiment to the text", but more complex rules are also included of course.
    • check if the n-gram is contained in the pre-established list of potentially negative words. If so, the corresponding rule is applied. Most often the rule is as simple as "a negative word gives a negative sentiment to the text", but more complex rules are also included of course.
  8. final checks: detection of moderators in the sentence ("but", "however", "even if", etc.). Positive or negative values placed before or after these moderators are kept or get deleted.
  9. final decision, based on the results of the previous steps.
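
Some of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's Java implementation: the function names, the regular expressions and the choice to collapse repeated characters down to a single character are all hypothetical. The sketch covers step 2 (cleaning), step 3 (accent-free version), step 6 (n-grams) and the repeated-character check of step 7.

```python
import re
import unicodedata
from typing import List

# Hypothetical patterns; the tool's own rules are not given in this text.
URL_PATTERN = re.compile(r"https?://\S+")
QUOTED_PATTERN = re.compile(r'"[^"]*"')
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")


def clean(text: str) -> str:
    """Step 2: drop URLs and quoted content, normalize apostrophes."""
    text = URL_PATTERN.sub(" ", text)
    text = QUOTED_PATTERN.sub(" ", text)
    text = text.replace("\u2019", "'").replace("`", "'")
    return " ".join(text.split())  # collapse leftover whitespace


def strip_accents(text: str) -> str:
    """Step 3: build the version of the text without accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ngrams(text: str, max_n: int = 4) -> List[str]:
    """Step 6: all n-grams of the text, from unigrams up to four-grams."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def collapse_repeats(term: str) -> str:
    """Step 7: reduce runs of three or more identical characters to one.

    Crude simplification: "goood" would become "god", so a real
    implementation needs a dictionary check to collapse only "as necessary".
    """
    return REPEAT_PATTERN.sub(r"\1", term)


if __name__ == "__main__":
    cleaned = clean("yeeeahhh! see https://example.com")
    print([collapse_repeats(g) for g in ngrams(cleaned)])
```

Each n-gram produced this way would then be looked up in the lists of stop words and of potentially positive or negative terms, once for the accented version of the text and once for the accent-free version.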

Scoring, percentage and emotions?

Some tools provide a score to express the strength of the sentiment: a high score for a very positive sentiment, and a very low score for a negative sentiment. Zero represents a neutral sentiment. In my experience, these scores are not very reliable, except in the obvious cases: "horrible" will score really low, and "wonderful" will score really high. In between, things are less straightforward and the scores are much harder to interpret, so I would not advise relying on them. A more promising road would be to introduce emotions to tease out the finer nuances of sentiment, beyond the positive / neutral / negative categories. Drop an email at admin@clementlevallois.net if you are interested in this direction of research.

Positive, neutral and negative sentiments: further considerations

This tool identifies subjective markers of sentiment, NOT positive or negative factual statements. To give an example:


  • "This country is at war" -> it is classified as NEUTRAL, even if a country at war is "objectively" or "factually" not a positive thing.
  • "War is horrible :(" -> it is classified as NEGATIVE because the term "horrible" and the emoji are subjective markers of a negative sentiment.
  • "War of the sexes is an exciting research topic!" -> it is classified as POSITIVE because the term "exciting" is a subjective marker of a positive sentiment.

We believe this approach makes sentiment analysis reliable and really unique in the landscape of tools for opinion mining and sentiment analysis. Ready to try it?

