Detecting the voice of the customer

A new function is now available: identify the voice of the customer

Filtering out content that has a “sponsored / commercial” tone, to identify the voice of the customer

At the very beginning of my work on sentiment analysis, I was struck by the fact that most of the positive tweets I collected were simply promoted content written by brands or communication agencies.

So I thought: surely analysts and academic researchers have designed procedures to remove this content before they measure the sentiment of a corpus of tweets? Otherwise, what is measured is not the “genuine” voice of the customer (in a business context) but a signal heavily tainted by stakeholders who publish positive things only to push an agenda serving their (commercial) interests.

As it turns out, there is no library or service that I know of which performs this preprocessing / cleaning step. So I built a prototype of such a function.

How does it work?

First, you can try it on the homepage of nocodefunctions, or on the dedicated page for the detection of promoted content. Let me know what you think!

It works in the simplest way: the premise is that promoted content doesn’t sound natural. Fans of the TV series Friends may remember the episode where Phoebe works at a massage chain. The receptionist, addressing first Rachel and then Phoebe, keeps adopting a formal, commercial tone with them, which sounds weird given the relatively informal setting.

Similarly, a piece of text written by a corporate agent, or by someone tasked with promoting a service or product, will probably be phrased in a way that doesn’t feel natural. These unnatural phrasings are captured through simple lists of expressions. That’s it! The lists are public: see the one for English and the one for French.
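A minimal sketch of this list-based idea, in Python, assuming case-insensitive whole-phrase matching. The phrases below are hypothetical stand-ins, not entries from the public English list, and `detect_promoted` and `Verdict` are illustrative names, not part of the nocodefunctions code base. Returning the matched expressions together with the verdict is what keeps the decision explainable.

```python
import re
from dataclasses import dataclass

# Hypothetical sample entries: the real public lists are different and much longer.
PROMO_PHRASES_EN = [
    "limited time offer",
    "don't miss out",
    "shop now",
    "exclusive deal",
    "click the link",
]


@dataclass
class Verdict:
    is_promoted: bool
    matches: list[str]  # the expressions that triggered the verdict

    def explain(self) -> str:
        # Plain-language explanation built only from the matched expressions.
        if not self.is_promoted:
            return "No promotional expression was found."
        quoted = ", ".join(f'"{m}"' for m in self.matches)
        return f"Flagged as promoted because it contains: {quoted}."


def detect_promoted(text: str, phrases: list[str]) -> Verdict:
    """Flag a text if it contains any listed phrase as a whole expression."""
    matches = [
        phrase
        for phrase in phrases
        # \b anchors avoid matching inside longer words; IGNORECASE handles capitals.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text, flags=re.IGNORECASE)
    ]
    return Verdict(is_promoted=bool(matches), matches=matches)
```

For instance, `detect_promoted("Limited time offer! Shop now at our store", PROMO_PHRASES_EN).explain()` reports both "limited time offer" and "shop now", while a plain complaint about a late delivery is not flagged.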

See it in action:

As you can see, the approach we chose makes the underlying logic entirely transparent and interpretable: the decision is fully explained, in plain language.

Next steps

The function is at an embryonic stage. If you are interested in it, get in touch (see below) to participate! To perform adequately, it needs much bigger lists of expressions signaling a “non-natural” discourse. Because the function uses the same underlying logic as the sentiment analysis function, the terms in these lists can be supplemented with heuristics: simple rules assessing basic facts about the context in which a term is used. See the full list of available conditions / heuristics there.
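To illustrate how a listed term can carry conditions, here is a hypothetical Python sketch in which an expression only counts when all of its context rules pass. The two rules (no negation among the few preceding words, presence of a link somewhere in the text) and every name below are assumptions made for the example; they are not the conditions actually offered by nocodefunctions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# A condition receives the full text and the offset where the term was found,
# and returns True if this occurrence should count. (Hypothetical design.)
Condition = Callable[[str, int], bool]

NEGATIONS = {"not", "no", "never", "don't", "isn't"}


def not_preceded_by_negation(window: int = 3) -> Condition:
    """Build a rule rejecting a match if one of the `window` previous words is a negation."""
    def check(text: str, start: int) -> bool:
        previous = [w.strip(".,!?;:").lower() for w in text[:start].split()[-window:]]
        return not any(w in NEGATIONS for w in previous)
    return check


def text_contains_link(text: str, start: int) -> bool:
    """Accept a match only if the text also contains a URL (offset unused)."""
    return re.search(r"https?://", text) is not None


@dataclass
class Term:
    expression: str
    conditions: list[Condition] = field(default_factory=list)


def matching_terms(text: str, terms: list[Term]) -> list[str]:
    """Return the expressions that occur in `text` with all their conditions satisfied."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term.expression) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if all(cond(text, m.start()) for cond in term.conditions):
                hits.append(term.expression)
                break  # one valid occurrence is enough for this term
    return hits


TERMS = [
    Term("best price", [not_preceded_by_negation()]),
    Term("book now", [text_contains_link]),
]
```

With these rules, "Best price in town, book now: https://example.com" matches both terms, whereas "This is not the best price I have seen" matches none: the negation blocks "best price", so an ordinary customer opinion is not mistaken for promotion.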

About me

I am a professor at emlyon business school, where I conduct research in Natural Language Processing and network analysis applied to the social sciences and the humanities. I teach about the impact of digital technologies on business and society. I also build nocode functions 🔎, a point-and-click web app to explore texts and networks. It is fully open source. Try it and give me some feedback, I would appreciate it!

my email: analysis@exploreyourdata.com 📧
or on Twitter: @seinecle 📱
you can also read the other articles on this blog 👓, where I write about the process of developing the app.

Date: July 6, 2023