Developing a free web app for Best-Worst Scaling (BWS, aka MaxDiff)

Advantages of Best-Worst Scaling
Project: a web app to facilitate running BWS tasks
Design of the BWS task: special attention to make it engaging to human annotators
Management of datasets and human annotators (coders)
Your contribution
Where to find the BWS task when it goes live?

Update: Best-Worst Scaling is now available on Nocode!

Best-Worst Scaling, also known as MaxDiff, is a fantastic, yet still relatively underused, tool to collect data on how people compare and evaluate a list of items. Use cases can be as diverse as:

ordering a list of items from least preferred to most preferred (candidates, brands, places…)
ordering a list of features from least to most important (political issues, product characteristics…)
classifying statements or single terms based on how “relevant” they are on a given dimension

In the last few years, use cases have expanded from market research to UX design and, even more recently, to machine learning, where Best-Worst Scaling (BWS) is used to create high-quality labelled datasets. I am confident these two more recent use cases for BWS / maxdiff will keep expanding.

So far, a BWS choice task can be set up:

via free programming packages (with an R package or Python scripts by Geoff Hollis)
by hiring specialized research agencies and consultants
by using commercial software.

These solutions have obvious benefits, but a free, point-and-click solution would be useful, too. For this reason, I am developing a free web app to run BWS / maxdiff tasks. Read further, or jump to the bottom of this page if you’d like to get in touch and start a discussion!

Advantages of Best-Worst Scaling

There are many well-established scales to have users rate things, such as this linear rating scale:

Best-Worst Scaling is different and introduces a smart twist. With BWS, the respondent is shown several options at once and is simply asked to select the best and the worst option:

This slight change in the design of the questions has two important benefits:

contrary to rating scales, there is no trouble guessing what the respondents meant with their scoring, or correcting for possible biases (such as a tendency to choose scores near the middle of the scale).
faster collection of data - many items are scored at once. This reduces the number of questions to be asked.
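
To make the mechanics concrete, here is a minimal sketch of the simple counting approach often used to turn best / worst picks into item scores: for each item, the number of times it was chosen as best, minus the number of times it was chosen as worst, divided by the number of times it was shown. The function name, the shape of the response tuples and the example words are assumptions made for illustration. This is not necessarily how the web app computes its results.

```python
from collections import Counter


def bws_counting_scores(responses):
    """Compute (best - worst) / appearances for every item.

    `responses` is an iterable of (shown_items, best, worst) tuples,
    a hypothetical format chosen for this sketch.
    Scores range from -1 (always picked as worst) to +1 (always picked as best).
    """
    shown = Counter()
    best = Counter()
    worst = Counter()
    for items, best_pick, worst_pick in responses:
        if best_pick not in items or worst_pick not in items:
            raise ValueError("best and worst must be among the shown items")
        if best_pick == worst_pick:
            raise ValueError("best and worst must be different items")
        shown.update(items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Three blocks of four words, each answered with one best and one worst pick.
responses = [
    (("happy", "table", "depressed", "calm"), "happy", "depressed"),
    (("happy", "table", "angry", "calm"), "happy", "angry"),
    (("calm", "table", "depressed", "angry"), "calm", "depressed"),
]
scores = bws_counting_scores(responses)
for item, score in sorted(scores.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{item}: {score:+.2f}")
```

Every answered block updates the counts of all the items it displays, which is why fewer questions are needed than with one rating per item.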

On the first point, a recent study(¹) speaks volumes: Best-Worst Scaling elicits judgements which spread smoothly and “naturally” over the space of all possibilities. The study used BWS, pairwise comparison and a rating scale to ask respondents how they value words in terms of their positive / negative “valence” (“happy” is a term with a positive valence, “depressed” is a term with a negative valence).

You would expect that respondents would find that most of the words have a neutral or quasi neutral valence. The more we go to the extreme on the positive or negative side, the fewer words there should probably be. That is indeed the case, see the graphic below. But look at the differences between the three methods: BWS (green line, very smooth) and rating scale / pairwise comparison (blue and red lines, pretty irregular):

BWS is the only method which produced a smooth ranking, as should be expected. The two other methods show big ups and downs which stem from the specifics of their design: clearly artefacts of measurement.

Project: a web app to facilitate running BWS tasks

The project is to develop a web tool to design and run such BWS tasks (“Case 1” scenario, which corresponds to the example above). The idea for this project comes from my need to run many of these tasks for a research study where students will be the respondents. I need the BWS task to be super convenient to set up and usable with dozens of students, accessible on the web from their laptop or mobile.

Hence the goal is to develop this web app for Best-Worst Scaling and to open it for anyone to use for their own BWS tasks. The advantages will be:

a web application: makes it easy to share it with the respondents / human annotators
responsive and mobile-friendly, including on touch screens, to allow for the greatest engagement of human annotators
free tool, for any use (academic and commercial)
very easy to use: no complex registration, not a full catalog of all the decision tasks that exist. Just set up a BWS task and run it
designed to foster the engagement of the human annotators (see below)
following the best academic standards (e.g., in the design of the blocks of choice)
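
As an illustration of what “design of the blocks of choice” involves, here is a minimal sketch of one simple way to build blocks: shuffle the full list of items several times and cut each shuffled list into blocks, so that every item is shown the same number of times and never twice in the same block. The function name and its parameters are hypothetical, and the sketch assumes the number of items is divisible by the block size. It does not balance how often pairs of items appear together, which more rigorous designs from the academic literature also aim for.

```python
import random


def make_bws_blocks(items, block_size=4, appearances=3, seed=None):
    """Build randomized blocks in which each item appears `appearances` times.

    Simplifying assumption of this sketch: len(items) must be divisible
    by block_size, so each shuffled pass splits into full blocks.
    """
    items = list(items)
    if len(set(items)) != len(items):
        raise ValueError("items must be unique")
    if block_size < 2 or len(items) % block_size != 0:
        raise ValueError("len(items) must be a multiple of block_size (>= 2)")
    rng = random.Random(seed)
    blocks = []
    for _ in range(appearances):
        order = items[:]
        rng.shuffle(order)
        # Consecutive slices of one shuffled pass never repeat an item.
        for start in range(0, len(order), block_size):
            blocks.append(order[start:start + block_size])
    return blocks


items = ["A", "B", "C", "D", "E", "F", "G", "H"]
blocks = make_bws_blocks(items, block_size=4, appearances=3, seed=42)
for block in blocks:
    print(block)
```

With 8 items, blocks of 4 and 3 appearances per item, this yields 6 blocks to show to a respondent.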

Design of the BWS task: special attention to make it engaging to human annotators

My feeling is that the type of user interface used in a BWS task has an impact on the quality of the responses and on the engagement of the human annotators recruited for the task. The radio buttons or check boxes that are often used in BWS studies (see the screenshot above) just do not feel “cognitively aligned” with the task.

Alternatively, I propose to design a user interface for BWS that allows the user to drag-and-drop the items in a block of items under review. The user would drag the best option to the top of the list, and drag the worst option to the bottom of the list, like so:

[Figure: bws_ordered_list, a block of items reordered by drag-and-drop]

This format of user input, I believe, has the benefit of making the respondent “act” and “embody” (even with a simple mouse movement) the decisions they make, more than clicking a radio button does. Moving the preferred option to the top (and the worst to the bottom) adds a symbolic weight to the decision and offers meaningful visual feedback on the decision being taken. It might well make the respondent more reflective about the choice they are making, making the final outcome more valuable.
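
On the data side, a drag-and-drop answer reduces to the same information as two radio buttons. The sketch below, with a hypothetical function name and return format, reads the final order of a block and keeps the top item as “best” and the bottom item as “worst”. It assumes the order of the items in the middle carries no meaning, since the respondent is only asked to move two items.

```python
def response_from_order(final_order):
    """Turn the final on-screen order of a block into a best/worst answer.

    Only the first and last positions are interpreted; the middle of
    the list is ignored (an assumption of this sketch).
    """
    final_order = list(final_order)
    if len(final_order) < 2:
        raise ValueError("a block needs at least two items")
    if len(set(final_order)) != len(final_order):
        raise ValueError("items in a block must be unique")
    return {
        "best": final_order[0],
        "worst": final_order[-1],
        "shown": tuple(sorted(final_order)),
    }


# The respondent dragged "happy" to the top and "depressed" to the bottom.
answer = response_from_order(["happy", "table", "calm", "depressed"])
print(answer["best"], answer["worst"])
```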

Management of datasets and human annotators (coders)

While designing the BWS task as a free web app is what matters most to me, I realize that such a task needs supporting services to be fully useful. The task designer must be able to upload their dataset, recruit coders, and add “gold questions” and other methods of quality control. I’ll go about it in two ways:

remain as minimalist as possible. The goal is to make the BWS task as easy as possible to design and run.
if possible, connect with existing software for the management of datasets and coders. Discovertext is a great web app for the annotation of textual datasets, and I am exploring the possibility that the BWS web app I develop can connect to it. This would save on development time and ensure a best-in-class management of coders and datasets.
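
As a rough idea of what quality control with “gold questions” could look like, the sketch below assumes that a gold question is a block whose best and worst answers are known in advance, and computes the share of gold questions a coder answered correctly. The function name, the data format and the coder's answers are hypothetical and only illustrate the principle.

```python
def gold_accuracy(answers, gold):
    """Share of gold questions answered with the expected (best, worst) pair.

    `answers` and `gold` both map a question id to a (best, worst) tuple.
    Returns None when the coder has not answered any gold question yet.
    """
    checked = [question_id for question_id in gold if question_id in answers]
    if not checked:
        return None
    correct = sum(answers[question_id] == gold[question_id] for question_id in checked)
    return correct / len(checked)


gold = {
    "q1": ("happy", "depressed"),
    "q7": ("calm", "angry"),
}
coder_answers = {
    "q1": ("happy", "depressed"),  # matches the gold answer
    "q2": ("table", "angry"),      # regular question, not checked
    "q7": ("calm", "table"),       # worst pick differs from the gold answer
}
accuracy = gold_accuracy(coder_answers, gold)
print(accuracy)
```

A task designer could then flag coders whose accuracy falls below a threshold of their choosing.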

Your contribution

Do you need such an app? Any suggestions or remarks? I’d love to hear from you. If you have feature requests, I’ll do my best to add them! Email me or say hi on Twitter!

Where to find the BWS task when it goes live?

It will be hosted on Nocode functions, which is the platform I develop for all my research apps. I hope to have it live by January 2022.

And discover all the other functions of the nocode functions web app: https://nocodefunctions.com/

(¹) Luna De Bruyne et al., to appear in the Dec 2021 issue of the journal Language Resources and Evaluation.

Date: November 11, 2021