This post originally appeared on the Baymard Institute blog.
In both the Baymard Institute’s qualitative and quantitative ecommerce tests, findings show that online shoppers expect customer ratings sorting to function differently from how it’s currently implemented on 86% of major ecommerce sites. This mismatch was observed to cause great user frustration and to curtail the subjects’ ability to find what they considered highly rated products.
Below, we’ll outline why users expect customer rating sorting to function differently, show how you can align your sorting logic with user expectations, and provide examples from leading ecommerce sites that already have this sorting logic implemented.
Test observation: Users don’t trust averages based on 1-4 ratings
The typical mismatch between how users expect customer ratings to function and how it’s implemented comes from the intent users have when applying the customer rating sort type. From the test sessions, it’s clear that most users rely on customer ratings as a way to quickly tap into the wisdom of the crowd – the collective opinion and experiences of other shoppers.
During testing, the customer rating sort type was used most frequently when the subjects were browsing for products where they had little domain knowledge and therefore sought to rely on the insights and experiences of others to make an otherwise difficult decision, and to reduce the risk of purchasing an inadequate product.
However, when benchmarking the product list experience of 50 major ecommerce sites, we found that on 86% of those sites, customer ratings sorting is implemented as a naive rating average sorted in descending order, where a 5-star average rated product will be placed before a 4.8-star-average product regardless of how many ratings those averages are based on.
“This one only has a single rating, so that isn’t trustworthy at all,” a subject noted after realizing several of the products positioned first when sorting by customer ratings only had 1-2 ratings. She and all of the other subjects who sorted by customer ratings at REI found this inadequate, and instead favored the products with a 4.0-4.5 star average based on 5+ ratings.
“But then again, I can see it’s only a single review. That’s of course not so.. so.. this could be fake,” a subject speculated after having clicked on the first few products in the list, which he had sorted by “top rated,” continuing, “It could just as well be the manufacturer who was in here and posted a good review.”
When sorting by customer reviews, most sites will position a product with a single 5-star rating before a product with a 4.8-star average based on 18 votes. Technically this is correct, as the former product has the higher average. Yet it is a naive implementation that doesn’t take the sample size into account, and nearly all users will find the latter product a much better indicator of what the crowd recommends when looking to make a product selection.
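To make the problem concrete, here is a minimal Python sketch of the naive logic described above. The `Product` class, the product names, and the `naive_sort` function are hypothetical and for illustration only; they are not taken from any site’s actual code.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str        # hypothetical label, for illustration only
    average: float   # mean star rating
    votes: int       # number of ratings behind that average


def naive_sort(items):
    """Sort by rating average alone, highest first, ignoring sample size."""
    return sorted(items, key=lambda p: p.average, reverse=True)


products = [
    Product("4.8 stars from 18 votes", 4.8, 18),
    Product("5.0 stars from 1 vote", 5.0, 1),
]

ranked = naive_sort(products)
for p in ranked:
    print(p.name)
```

Because the sort key is the average alone, the product with a single 5-star rating is listed first, ahead of the product backed by 18 votes.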
So, while it may be mathematically correct to place the 5-star average first, it fails to account for the reliability of the average. A sample size of 1 is obviously flawed, a fact that wasn’t lost on the test subjects. They assumed that a handful of perfect reviews was usually either a coincidence (a couple of fanboys) or the work of the manufacturer or site representatives, and they often found such ratings highly questionable.
Meanwhile, the reliability of customer review averages based on several votes was never called into question by the test subjects. In practice, skepticism began to drop when the average was based on 5+ votes. This high level of skepticism toward a low number of perfect ratings has been confirmed during prior checkout and mobile ecommerce usability studies as well.
Survey: Users prefer higher number of reviews despite slightly lower average
To get a more quantitative understanding of users’ bias against fully trusting a 5-star rating average based on just a few reviews, we tested three different rating averages with 2,250 people.
Methodology: In total, three surveys were conducted with a total of 2,250 respondents (split evenly across the three surveys), testing different rating averages versus number of votes. Each survey showed the respondent two list items (shown in the result graphs) and asked them to pick which one they would purchase. To avoid sequencing bias, the display order of the two answers was randomized for each respondent.
For two otherwise identical products, where one product has a 5-star average based on 2 reviews, and the other has a 4.5-star average based on 12 reviews, 62% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average is based on only a few ratings, users will often prefer other products with a slightly lower average but a higher number of ratings.
As noted in the institute’s earlier investigation of users’ perception of product ratings, product ratings essentially function as a type of social proof for users, letting them tap into the wisdom of the crowd, using good reviews as a proxy for high quality or value for money. The thinking goes that if a lot of other users are happy with a product it means that it must be a bargain or of high quality, or both. This is why users lacking domain knowledge or experience with the product find product ratings particularly useful. It allows them to rely on the domain knowledge and product experience of other customers.
Solution: Sorting logic should account for both the number of reviews and their average
To better match the user’s expectations and intent behind sorting by customer reviews, a site’s sorting logic has to take the number of ratings into account as well, and not rely solely on the average score. In essence, when a user decides to sort by customer ratings, the products with a 5-star average based on just 1-4 ratings should not be placed before any products with a 4.5+ star average based on 50+ ratings.
Home Depot has a sorting logic for customer ratings that takes both the rating average and number of votes into account when determining display sequence for “Avg. Customer Rating” sorting. Notice how products with lower averages but more votes are placed above 5 star rated products with only a few ratings.
The sorting logic should instead be weighted to account for the combination of rating average and the total number of ratings. This aligns much better with the intent the vast majority of users have when they sort by customer ratings (i.e., “Show me what other users think are the best products”). For instance, notice in the Home Depot example above how products with a 4.5-star average based on 50 and 36 ratings respectively are placed before the two products with a 5.0-star average based on only 6 ratings.
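One common way to combine the rating average with the number of ratings is a Bayesian (damped) average, which pulls each product’s average toward a site-wide prior until enough votes accumulate. The Python sketch below is only an illustration of that general technique. It is not the formula Home Depot or any other site mentioned here is known to use, and the names `PRIOR_MEAN` and `PRIOR_WEIGHT`, along with their values, are assumptions picked for the example.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str        # hypothetical label, for illustration only
    average: float   # mean star rating
    votes: int       # number of ratings behind that average


# Assumed tuning values, not taken from any real site.
PRIOR_MEAN = 4.0     # assumed typical rating average across the catalog
PRIOR_WEIGHT = 10    # how many "phantom" votes the prior counts for


def weighted_score(p, prior_mean=PRIOR_MEAN, prior_weight=PRIOR_WEIGHT):
    """Bayesian average: blend the product's own ratings with the prior.

    With few votes the score stays near prior_mean; with many votes it
    approaches the product's own average.
    """
    total = p.votes * p.average + prior_weight * prior_mean
    return total / (p.votes + prior_weight)


def weighted_sort(items):
    """Sort by the weighted score, highest first."""
    return sorted(items, key=weighted_score, reverse=True)


products = [
    Product("5.0 stars, 6 ratings", 5.0, 6),
    Product("4.5 stars, 36 ratings", 4.5, 36),
    Product("4.5 stars, 50 ratings", 4.5, 50),
]

ranked = weighted_sort(products)
for p in ranked:
    print(p.name, round(weighted_score(p), 3))
```

With these assumed values, the two 4.5-star products with 50 and 36 ratings rank ahead of the 5.0-star product with 6 ratings, the same ordering seen in the Home Depot example. Raising `PRIOR_WEIGHT` makes the sort more skeptical of small samples; lowering it moves the result back toward a plain average sort.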
Now, a simpler 5-vote “cutoff,” which simply excludes (i.e., doesn’t calculate an average for) any product with fewer than 5 votes, could also be adopted. However, this is of course a much less sophisticated solution and obviously won’t work well for smaller sites and in categories with few user ratings.
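The cutoff approach can be sketched in a few lines of Python. This is one possible reading of it, under the assumption that products below the cutoff are treated as unrated and listed after the rated ones; `MIN_VOTES`, `cutoff_sort`, and the sample products are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str        # hypothetical label, for illustration only
    average: float   # mean star rating
    votes: int       # number of ratings behind that average


MIN_VOTES = 5  # cutoff from the text; products below it count as unrated


def cutoff_sort(items, min_votes=MIN_VOTES):
    """Sort rated products by average; append under-cutoff products after them."""
    rated = [p for p in items if p.votes >= min_votes]
    unrated = [p for p in items if p.votes < min_votes]  # original order kept
    rated.sort(key=lambda p: p.average, reverse=True)
    return rated + unrated


products = [
    Product("5.0 stars, 1 vote", 5.0, 1),
    Product("4.2 stars, 7 votes", 4.2, 7),
    Product("4.8 stars, 18 votes", 4.8, 18),
    Product("5.0 stars, 3 votes", 5.0, 3),
]

ranked = cutoff_sort(products)
for p in ranked:
    print(p.name)
```

The weakness noted above shows up directly: in a category where every product has fewer than 5 votes, `rated` is empty and the sort has no effect at all.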
While it’s true that the weighted sorting method makes the actual sorting logic less transparent to the user (as it changes from a simple high-to-low logic to a more complex equation), during testing, this issue proved to be far less severe than the issues caused by listing products with 5-star averages first, even if their average was only based on a handful of ratings. Without a weighted logic, the most trusted products with 4.5+ averages based on dozens or hundreds of ratings will be scattered across several pages of results, making it very difficult for users to find the products which are recommended by the crowd.
The exact weighting between averages and number of reviews will likely vary based on site context and audience, and may require ongoing tweaking and A/B split-testing. For inspiration, here are the few major ecommerce sites identified that currently have a weighted sorting logic for their customer ratings: Overstock, Amazon, Crutchfield, Best Buy, Home Depot, and Lowe’s.