On Review Scores

I can remember the "good old days" of videogame (print) magazines; well, to be more accurate (and giving away perhaps too much about my age) "computer game" magazines. They had a whimsical yet cynical style and their reviews made me laugh as often as they made me think about the games they reviewed.

And then, to sum up their review, they gave you a nice, incredibly specific percentage score. None of this thumbs-up, 4-star, 8-out-of-10 nonsense we tolerate today: a full gamut of cents ensuring that I was absolutely aware that this game was worth 74%, and not a fraction more, nor less. Controversial in those days meant being out by a single digit, not a whole star, or 1/10 of a score. Now that's pressure!

However, and it's barely worth saying, I did notice that they only ever seemed to use about 30% of the available scale, which, if I were being critical, I might say rendered the whole percentage scheme somewhat redundant in the first place. That, and to be honest, if a game got 90% my silly brain would see that it "only" got 90%. An entire super-rating scheme emerged on top of the base mechanic whereby, in practice, anything below 90% was effectively awful, and you then reset the scale using only the 1-10 points scored above 90 (though of course nothing ever got 100%, so maybe it's only 1-7 or 8).

Excuse me, but...

... what on earth has all that got to do with technology and consumer electronics? Well, actually, this whole train of thought was sparked after reading through Josh's excellent review of the Galaxy Nexus. I was happily absorbed by the review and thinking "wow, this sounds like a really great phone, perhaps even the best-smartphone-in-the-world-eva(tm)". Then I hit the score at the end and instantly thought "hmmm, only 8.6", and this genuinely soured my impression of the phone (for a moment at least), even though I'd spent the last 10 minutes reading a glowing review and knew that 8.6 is a very good score.

It's not that I really thought the score was bad; it was the specificity of the score that struck me, as it reminded me very much of those old-school PC Gamer 91% scores, i.e. good, hmmm, but not that good. If it'd been a straight 8/10 I don't think I'd have had the same response.

Technology changes so quickly, and varies so dramatically between products of the same type, that using scores as the main method for comparison between them is not always (ever) the most sensible approach, especially given that we can now do fancy direct spec comparisons via the website to get into the weeds of a set of products. Also, unlike videogames, there is a meaningful price difference between products of the same type, so using scores as the basis for comparison is a bit unfair on intentionally cheaper items (and on the people who aren't fortunate enough to be able to buy the top-of-the-range model).

So then I wondered...

What is the point of a review score?

Well, what I want in a review (and what I get from the excellent in-depth reviews here) is an understanding of what the product offers, whether or not this product is good in its own right, and how it compares in specific (important) ways to its competition. I can then use that information to work out if it's right for me.

But, if the whole of the review provides that information, what really is the score at the end telling me?

If I were being cruel I might think that the score is there to tell the people who can't be bothered to read the review whether the product is any good. But surely that doesn't apply here :-)

Now I don't mean to imply that scores are useless, as I actually think they offer a simple and relatively clear indication of the overall experience offered by a product and having a neat, concise conclusion in the form of a score is very useful.

But if a review score is meant to be a quick-hit encapsulation of the opinion of an entire review of hundreds or thousands of words, then perhaps we should acknowledge its inherent fuzziness and not pretend that it's a clinical, objective, to-the-decimal-point-accurate representation of the product. The review score shouldn't be the end of any conversation about a product, far from it.

So...

If giving one game 94% and another 95% and believing this adequately describes their differences is actually pretty dumb, I think using a similar scale for reviews on The Verge might also be a bit too much.

A simpler 1-10 overall review score might make more sense (as is used for the individual components of the review score). I think a 1-10 scale provides enough range to differentiate between good and bad products (as long as the whole scale is deployed), but isn't so precise that it pretends to offer more meaning than it possibly could.

I think a woolier scale helps to open up a conversation about products rather than closing one down. If you see that two phones have a score of, say, 8/10 you're unlikely to think them identical, and will hopefully look into the reviews and try to understand the difference between them. If one phone has 8.5 and another 8.6 then clearly the latter is better, job done.

And what does it really mean to score one phone 8.5 and another 8.6, anyway? ;-)

In Conclusion

6.3: The pixel density of this forum post is terrible!

I'd be very interested to hear other people's thoughts on review scores, scales, etc.