😊 😬 Free sentiment analysis tool for social media in English, French and Spanish

Read below about what it does and how it works:

Based on scientific research

Cite this paper if you use Umigon in your work:
"Levallois, C. (2024). Umigon-lexicon: rule-based model for interpretable sentiment analysis and factuality categorization. Language Resources and Evaluation, 1-18."

Analysis of sentiments for Twitter, Instagram and beyond

This function performs sentiment analysis, also called opinion mining. It analyzes the text and determines whether the sentiment is neutral, positive or negative.
It works best on social media content such as tweets, comments on Instagram posts, and other very short texts in English or French. In a comparison with 23 alternatives, it was found to be the best tool for sentiment analysis on social media.
Born in 2012, this tool is under continuous development.

If you use this function in an academic context (research or studies), you must reference it in your bibliography:

Levallois, Clement. "Umigon: Sentiment analysis on Tweets based on terms lists and heuristics". Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval), 2013, Atlanta, Georgia.

Import your data, get the results: an example

Fast and efficient

The function is programmed in Java. Its code is freely accessible on GitHub. It is fast because it does not rely on part-of-speech tagging (POS tagging). Concurrent computing techniques are used in sub-functions, and great attention has been paid to overall performance.

The model - short description

The function examines each term of the text and applies a series of rules to determine the effect on the sentiment. It also systematically considers emojis, punctuation, hashtags, variations in spelling, capitalized words... to determine the sentiment.
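To make the idea of "one rule per term" concrete, here is a minimal sketch in Java. It is an illustration, not Umigon's actual code: the class name TermRuleSketch, the two tiny word lists and the "majority wins" decision are all assumptions made for this example. The real tool uses much larger lexicons and more elaborate rules.

```java
import java.util.Locale;
import java.util.Set;

// Toy illustration only: the word lists and the decision rule are assumptions,
// not Umigon's actual lexicon or heuristics.
class TermRuleSketch {

    enum Sentiment { POSITIVE, NEGATIVE, NEUTRAL }

    // Hypothetical mini-lexicons of subjective markers (words and emoticons).
    private static final Set<String> POSITIVE_TERMS = Set.of("exciting", "wonderful", ":)");
    private static final Set<String> NEGATIVE_TERMS = Set.of("horrible", "awful", ":(");

    static Sentiment classify(String text) {
        int positives = 0;
        int negatives = 0;
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            // Try the raw token first so emoticons such as ":(" survive,
            // then the token with surrounding punctuation stripped ("topic!" -> "topic").
            String stripped = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (POSITIVE_TERMS.contains(token) || POSITIVE_TERMS.contains(stripped)) {
                positives++;
            } else if (NEGATIVE_TERMS.contains(token) || NEGATIVE_TERMS.contains(stripped)) {
                negatives++;
            }
        }
        // Simplistic final decision: whichever kind of marker is more frequent wins.
        if (positives > negatives) return Sentiment.POSITIVE;
        if (negatives > positives) return Sentiment.NEGATIVE;
        return Sentiment.NEUTRAL;
    }

    public static void main(String[] args) {
        System.out.println(classify("War is horrible :("));      // NEGATIVE
        System.out.println(classify("This country is at war"));  // NEUTRAL
        System.out.println(classify("What an exciting topic!")); // POSITIVE
    }
}
```

Note that "war" is deliberately absent from both lists: the sketch only reacts to subjective markers, so a purely factual sentence comes out as neutral.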

The model - long description

The principles followed by the tool are described in this academic publication about Umigon, published in the anthology of the Association for Computational Linguistics. The tool follows these steps:

check on the length of the text: if it is only a couple of words long, it will not be classified
removal of URLs, removal of content in quotes, normalization of apostrophes
two versions of the text are established: one where all the accents and special characters are removed, and one where they are retained. All the following steps will apply to both versions of the text.
check on emojis, emoticons and onomatopoeia. Do not preprocess your text to remove them: they provide useful information on sentiment!
check on hashtags, if any
decomposition of the text into n-grams, up to four-grams
for each n-gram:
- skip it if it belongs to a pre-established list of stop words
- check for repeated characters and remove them as necessary (yeeeahhh! becomes yeah!)
- check if the n-gram is contained in the pre-established list of potentially positive words. If so, the corresponding rule is applied. Most often the rule is as simple as "a positive word gives a positive sentiment to the text", but more complex rules are also included of course.
- check if the n-gram is contained in the pre-established list of potentially negative words. The same logic applies.
final checks: detection of moderators in the sentence ("but", "however", "even if", etc.). Positive or negative values placed before or after these moderators are either kept or deleted.
final decision following complex rules, based on the results of the previous steps.
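A few of the preparation steps listed above (URL removal, apostrophe normalization, accent stripping, repeated characters, n-grams up to four-grams) can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the tool's actual implementation: the class name PreprocessingSketch, the regular expressions and the whitespace tokenization are all made up for this example, and the repeated-character rule here is cruder than a lexicon-aware one would be.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: regexes and tokenization are assumptions, not Umigon's code.
class PreprocessingSketch {

    // Remove http(s) URLs, then collapse the whitespace left behind.
    static String removeUrls(String text) {
        return text.replaceAll("https?://\\S+", " ").replaceAll("\\s+", " ").trim();
    }

    // Replace the typographic apostrophe with the plain ASCII one.
    static String normalizeApostrophes(String text) {
        return text.replace('\u2019', '\'');
    }

    // Build the accent-free version of the text: decompose, then drop combining marks.
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Collapse any run of three or more identical characters to a single one.
    // Crude on purpose: "cooool" becomes "col", which a real tool would need to avoid.
    static String collapseRepeats(String term) {
        return term.replaceAll("(.)\\1{2,}", "$1");
    }

    // All n-grams of whitespace-separated tokens, from unigrams up to maxN-grams.
    static List<String> ngrams(String text, int maxN) {
        List<String> result = new ArrayList<>();
        if (text.trim().isEmpty()) {
            return result;
        }
        String[] tokens = text.trim().split("\\s+");
        for (int n = 1; n <= maxN; n++) {
            for (int start = 0; start + n <= tokens.length; start++) {
                result.add(String.join(" ", Arrays.copyOfRange(tokens, start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String cleaned = collapseRepeats(removeUrls("Yeeeahhh! Best day ever https://example.com"));
        System.out.println(cleaned);            // Yeah! Best day ever
        System.out.println(ngrams(cleaned, 4)); // 10 n-grams: 4 unigrams, 3 bigrams, 2 trigrams, 1 four-gram
    }
}
```

Each n-gram produced this way would then be looked up in the stop word list and in the positive and negative word lists, as described in the steps above.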

Scoring, percentage and emotions?

Some tools provide a score to express the strength of the sentiment: a large value for a very positive sentiment, and a very low value for a negative sentiment. Zero represents a neutral sentiment. In my experience, these scores are not very reliable, except for the obvious cases. "Horrible" will score really low, and "wonderful" will score really high. But in the middle, things are less straightforward and the scores are much harder to interpret, so I would not advise relying on them. A more promising road would be to introduce emotions to tease out the finer nuances of sentiment, beyond the positive / neutral / negative categories. Drop an email at analysis@exploreyourdata.com if you are interested in this direction of research.

Positive, Neutral, and Negative Feelings: Additional Considerations

This tool identifies subjective markers of sentiment, NOT positive or negative factual statements. To give an example:

"This country is at war" -> it is classified as NEUTRAL, even if a country at war is "objectively" or "factually" not a positive thing.
"War is horrible :(" -> it is classified as NEGATIVE because the term "horrible" and the emoji are subjective markers of a negative sentiment.
"War of the sexes is an exciting research topic!" -> it is classified as POSITIVE because the term "exciting" is a subjective marker of a positive sentiment.

We believe this approach makes sentiment analysis reliable and really unique in the landscape of tools for opinion mining and sentiment analysis. Ready to try it?
