January 18, 2017

How Sentiment Analysis Can Add Value to Numeric Ratings in Review Systems

We started deemly to help people trust each other when they interact in online P2P transactions. To that end, we give each deemly user a score that sums up their aggregated reputation from reviews and ratings across the web. Naturally, we are eager to look further into what this data means.

Sentiment analysis is a process that computationally identifies and categorizes opinions expressed in a text. Its main focus is to determine whether the writer’s attitude towards a particular topic, product, etc., is positive, negative or neutral.

Nowadays, there is a clear tendency to take online reviews and ratings into account, especially when we have to decide which products to buy, which services to use or, in the case of the sharing economy, whom to trust.

Numeric ratings are different from written reviews

The rating that consumers see is a quantitative representation of qualitative inputs. In practice, we take the time to write our opinion and then assign a numeric value, or the other way around. Either way, the written text will often be more information-dense than a simple star rating.

Moreover, additional rating parameters, like those we see on Airbnb, complement the written text. They encourage the rating party to take a more nuanced view.

Polarized numeric ratings

Reviewers tend to rate either very high or very low when there is a numeric scale, often 1-5 or 1-10 stars. Smallbiztrends.com wrote a post about this in 2015, using Yelp reviews as a case. To summarize, they compared reviews from 2005 and 2014 and, as we can see below, 1-star and 5-star reviews made up 55.4% of the total in 2014, compared to 44.4% in 2005. This matters because visitors usually glance at the aggregated rating without reading the written reviews.


From http://minimaxir.com/2014/09/one-star-five-stars/
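The polarization figure quoted above is simply the share of reviews that sit at the two ends of the scale. A minimal Python sketch of that arithmetic follows; the function name `polarized_share` and the star counts are hypothetical and invented for illustration, they are not the Yelp numbers.

```python
def polarized_share(counts, low=1, high=5):
    """Return the fraction of reviews rated at either end of the scale.

    counts maps a star value to the number of reviews with that value.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reviews to analyze")
    # Missing keys count as zero reviews at that end of the scale.
    extremes = counts.get(low, 0) + counts.get(high, 0)
    return extremes / total


# Hypothetical star counts, invented for illustration only.
example_counts = {1: 120, 2: 60, 3: 90, 4: 230, 5: 500}
share = polarized_share(example_counts)
print(f"{share:.1%} of reviews are 1 or 5 stars")
```

Running the same function on two snapshots of a platform, taken at different points in time, would show whether its ratings are drifting towards the extremes.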


We use the Google Cloud Natural Language API

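We feed the API a sample of our stored reviews and let Google crunch the data with their powerful machine learning models. The API returns two values for the sentiment analysis: a score, which ranges from -1.0 (negative) to 1.0 (positive), and a magnitude, which indicates the overall strength of emotion in the text, from 0 upwards.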

Google provides some examples of values and what they indicate:

[Image: Google sentiment analysis scores example]
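To make the two values more concrete, here is a minimal Python sketch of how a score and magnitude pair could be turned into a coarse label, together with one possible way to request the values for a single review. It is not necessarily how we call the API: `analyze_review` assumes the current `google-cloud-language` Python client and configured credentials, and the cutoffs in `interpret_sentiment` are illustrative assumptions rather than values prescribed by Google.

```python
def interpret_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Turn a (score, magnitude) pair into a coarse label.

    The cutoffs are illustrative assumptions and should be tuned per data set.
    """
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # A score near zero with a lot of emotion suggests mixed feelings;
    # a score near zero with little emotion suggests a neutral text.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


def analyze_review(text):
    """Send one review to the API and return (score, magnitude).

    Assumes the google-cloud-language package is installed and that
    Google Cloud credentials are configured in the environment.
    """
    # Imported lazily so interpret_sentiment can be used without the package.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return sentiment.score, sentiment.magnitude


# Illustrative pairs only; no API call is made here.
for score, magnitude in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(score, magnitude, interpret_sentiment(score, magnitude))
```

Keeping the interpretation separate from the API call makes it easy to adjust the cutoffs later without re-analyzing any reviews.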


So, what’s in store?

As time goes by, we collect a growing number of reviews from our user platforms, which we try to analyze. We work with the “raw” review data, which is stripped of any user relation. We then run the reviews through the sentiment analysis and, finally, look deeper into our findings.

Beyond that, there are some interesting questions to raise. How do a review’s score and magnitude reflect its numeric rating? Are there differences between platforms over time? How is the correlation between the Google-calculated score and the users’ rating affected by a platform’s rating scheme (some use stars, others smileys and so on)?
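One way to approach the correlation question is to map every platform's ratings onto the same range as the sentiment score and then compute a Pearson correlation. The Python sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the linear mapping onto [-1, 1] is our own choice for illustration, and the sample data is invented rather than taken from our findings.

```python
from math import sqrt


def normalize_rating(rating, scale_min, scale_max):
    """Map a rating on [scale_min, scale_max] linearly onto [-1, 1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return 2 * (rating - scale_min) / (scale_max - scale_min) - 1


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical sample, invented for illustration: star ratings on a 1-5
# scale and the sentiment scores of the matching written reviews.
star_ratings = [5, 4, 1, 3, 5, 2]
sentiment_scores = [0.8, 0.4, -0.7, 0.1, 0.9, -0.3]

normalized = [normalize_rating(r, 1, 5) for r in star_ratings]
r = pearson(normalized, sentiment_scores)
print(f"Pearson r = {r:.2f}")
```

Within a single platform the normalization does not change the correlation, since Pearson's coefficient is unaffected by linear rescaling. It becomes useful when ratings from platforms with different schemes (for example 1-5 stars and 1-10 stars) are pooled or compared side by side.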

This is the first post in a series of thoughts and findings behind our work. You are welcome to follow or connect with us. Do you have any thoughts or ideas surrounding this subject? We would love to hear from you!