The Wisdom of the Crowd and Wine Critic Ratings

I think one of the more interesting phenomena in the world of wine that has arisen in the past five years is the emergence of a group known as the American Association of Wine Economists. I must admit, when I first saw the announcement of their existence, I assumed it was just some economists having a good time and looking for an excuse to drink some more wine.

But then the papers started to be published, and now the AAWE has made it clear that they're quite serious in trying to apply the dark arts of statistics and economics to the world of wine -- a world that can increasingly be quantified and examined thanks to rafts of data available online.

The association's publications are often met with controversy or criticism, and that's putting it a little mildly. I've heard one wine writer refer to their work as utterly fraudulent, and read many a lambasting blog post aiming to criticize the mathematics behind their work.

I didn't pay nearly enough attention in my college statistics class to be able to offer a judgement about the quality of the work done by the various authors of the 91 papers available through the association, but I can say that I find quite fascinating what this group of academic wine lovers is doing.

Their latest paper is a case in point. Entitled "The Buyer's Dilemma - Whose Rating Should a Wine Drinker Pay Attention to?" (232k PDF), this paper looked at the relationships between the scores from major wine critics and the scores found on CellarTracker for around 100 Bordeaux wines.

As with most of their papers, I have a bit of a hard time decoding their numerical voodoo. Things like "We run a two sample t-test (with unequal variances) to check if the 1.6 points difference between community and RP [Robert Parker scores] is statistically significant. The t-statistics is 4.58 and the critical t-value is 1.97; therefore we reject the null hypothesis that there is no difference between community and RP average scores" make my head spin a bit.
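For anyone who wants to see what that sentence is doing mechanically, here is a minimal sketch of a two-sample t-test with unequal variances (often called Welch's test). The scores are made up for illustration and are not the paper's data, and the `welch_t` helper is a hypothetical name. The idea is simply to divide the gap between the two averages by a measure of how noisy those averages are, then compare the result against a critical value such as the 1.97 the paper quotes.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 denominator); the two groups may differ in spread.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_sq_a, se_sq_b = var_a / n_a, var_b / n_b
    # Gap between the averages, scaled by the combined standard error.
    t = (mean_a - mean_b) / math.sqrt(se_sq_a + se_sq_b)
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Made-up scores for illustration only; these are NOT the paper's data.
critic_scores = [93, 95, 90, 96, 92, 94, 91, 97]
community_scores = [91, 92, 90, 93, 91, 92, 90, 93]

t_stat, dof = welch_t(critic_scores, community_scores)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

If the absolute value of the t statistic exceeds the critical value for the chosen significance level and degrees of freedom, the "no difference" hypothesis is rejected. That is the comparison of 4.58 against 1.97 in the quoted passage.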

But here's essentially what these folks are claiming that the data support:

1. There's "significant" variance between the scores of major critics (Parker, Tanzer, and the Spectator) on the same wine
2. There's "significant" disagreement between community scores on CellarTracker and Robert Parker in particular (but also Wine Spectator)
3. There's considerably more "agreement" between Stephen Tanzer's ratings and those of the CellarTracker community than the other critics
4. There is greater correlation between the community score and the price of the wine than the critics' scores and the price of the wine.

Of the findings I see the third as most interesting just as a simple fact.

I do have some questions about the findings that perhaps my readers with better math skills than I might be able to answer.

In particular I'm not sure whether what the world of statistics considers to be a "significant" difference actually translates into the wine world. There may be a significant difference (in pure mathematical terms) between a rating of 92.5 and 96 for a wine by two different critics, but really, those two critics entirely agree on the quality of the wine in every sense that really matters for the wine world.

I was glad to see the paper's authors offer a hypothesis that the last point above may be skewed by psychology: namely that consumers have a tendency to rate more expensive wines higher simply because they (subconsciously) think that more expensive wines ought to be better. This sounds quite plausible to me.

The authors also offer another interesting hypothesis which I find less likely -- that the relative divergence between CellarTracker scores and Robert Parker's scores is somewhat deliberate -- a backlash against Parker in which consumers "resent Robert Parker's influence - or shall we say hegemony over the wine community - and systematically challenge his ratings by either giving higher scores to the wines with low RP ratings and lower scores to the wines with high RP ratings."

I just find it hard to believe that consumers are aware enough of the specific Parker rating of the wine they are drinking, and when they are rating it, to facilitate this sort of behavior in a broad sense.

The one thing that the authors of the paper don't happen to devote much attention to, which I think is at least as interesting as all their other findings, is that in general, the greater wine drinking population doesn't think wines are as good as the major critics do. If I am understanding the data correctly, in all cases the community rating was below all of the critics' ratings.

This is sort of surprising, given claims by some commentators in the wine industry that most people can't tell the difference between an $18 bottle of wine and a $90 bottle of wine. As a result you would think that the scores that the broader population assigned to wines might skew towards either end of the spectrum in a kind of "yum/yuck" or "love it / hate it" volatility.

Of course, CellarTracker is not necessarily representative of the broader population, a point which the paper's authors seem to acknowledge, but it isn't clear just how much they want to address this bias. Of course, it is really the only broad and deep set of consumer wine evaluation data that is publicly available, so one can hardly fault them for using it.

In any case, the paper is worth a read, and it's pretty easy to skim the technical parts. Take a gander and then tell me what you think.

Comments (9)

Eric LeVine wrote:
09.15.11 at 8:14 AM

I have been getting emailed links to those American Association of Wine Economists (AAWE) papers for a while, and I have to admit this was the first one I have read in full. Suffice it to say I am neither an economist nor a statistician, but I think they are putting way too much emphasis on CT scores versus actually trying to read and analyze the text. Of course for purposes of the analysis they want to do I understand that the scores are there, are ample and are tempting. And at some level I am just totally tickled that I have given them some fun data to play with!

Thanks, Eric

Phil wrote:
09.15.11 at 8:28 AM

On your question Alder, the finding of statistically significant difference is done with the entire data set, not just one piece of data. The threshold is dependent on the amount of data you have: the more you have, the smaller a difference needs to be to be statistically significant.
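A rough sketch of that sample-size effect: the 1.6-point gap is the figure quoted from the paper in the post, but the 3-point standard deviation and the group sizes are assumptions chosen only for illustration, and `t_for_equal_groups` is a hypothetical helper for the simplified case where both groups have the same size and spread.

```python
import math


def t_for_equal_groups(mean_diff, std_dev, n_per_group):
    """t statistic when both groups share the same spread and size."""
    standard_error = std_dev * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error


# 1.6 points is the gap quoted in the post; the standard deviation of
# 3 points and the group sizes are assumptions, not values from the paper.
for n in (5, 20, 100):
    print(n, round(t_for_equal_groups(1.6, 3.0, n), 2))
```

The gap never changes, but the t statistic grows with the square root of the number of wines, so the same 1.6 points can fall short of a critical value near 1.97 with 20 wines per group and clear it comfortably with 100.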

The word "significant" in this context is purely a statistical term, the difference between Parker and CT is small and may not mean much in the real world (as you point out). The other important thing is that there is positive correlation between the two, so wines that critics rate higher than others are rated in the same order by the community. This tends to actually lend credence to critics, you may have to subtract a point or two but the order is correct (if you are assuming that CT is "correct", a big assumption in my opinion but that's another story). The point the study is making about Parker is that while the order is the same, the community is actually giving slightly higher scores to lower scored Parker wines and then the two are switching as Parker scores get higher (but critically the order remains the same). This is a bunching phenomenon that is not uncommon when you are dealing with large groups of people vs. one person: highs are not as high and lows are not as low. Of course it doesn't occur with WS or Tanzer.
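To make the "different averages, same order" point concrete, here is a small sketch with invented numbers (not the paper's data). The community scores are built by compressing the critic's scores toward the middle, which is one simple way to mimic the bunching described above. The `pearson` function is a plain hand-written Pearson correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented critic scores, for illustration only.
critic = [88, 90, 92, 94, 96, 98]
# Hypothetical "bunched" community: half the spread, centered one point lower.
community = [92 + 0.5 * (score - 93) for score in critic]

r = pearson(critic, community)
mean_gap = sum(critic) / len(critic) - sum(community) / len(community)
print(f"correlation = {r:.3f}, average gap = {mean_gap:.1f} points")
```

In this toy setup the community scores the two lowest wines above the critic and the four highest wines below, the averages differ by a point, and yet the ranking of the wines is identical and the correlation is essentially perfect.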

I'd also note that the statistical significance test is extremely close for Tanzer, although it would still be far behind Parker and WS.

The final important thing to note is sort of glossed over and it's about price. We would expect to find strong positive correlation between price and ratings because in general, more expensive wines are better than less expensive wines (there are obvious exceptions, I know about the studies that show Joe Schmo off the street can't distinguish in a lab setting, and there is no doubt that you can find really good wine at lower prices). But the CT crowd takes this to an extreme that indicates to me that their scores may not be as reliable as the critics, at least when it comes to Bordeaux. It's also a blow to the argument that Parker is overly influenced by prestige by not tasting blind (if you take price to be an indicator of prestige) since he has the lowest correlation by far. Tanzer isn't helped by this either, since he comes close to matching CT.

One conclusion you could draw, or at least try a regression to see if it holds water, is that part of the difference between Parker and WS is that they are not as tied to price as CT and Tanzer and that is why the first two have statistically significant differences with CT. In this scenario Parker and WS are more reliable than CT/Tanzer because they aren't as tied to the price of the wine and the statistically significant difference between them is a good thing.

Another possibility to examine is if score inflation is causing the relative lack of correlation. In other words, Parker and WS rate everything more highly, including the less expensive wines, and the lower correlation is not because some expensive wines are being rated lower and some less expensive wines are being rated higher, it's because in general less expensive wines are being rated higher. In this scenario Tanzer and CT are rating these less expensive wines lower while staying in the same neighborhood on the more expensive wines because the scale has a cap of 100 and Parker and WS cannot go any higher. In other words they are using more of the scale. This seems unlikely to me, at least when it comes to Parker, because it was established that on the lower end of the rating scale the community was actually rating wines slightly higher than Parker was, difficult to do if we're trying to say that Parker is rating the less expensive (lower rated wines) too highly. It is also not supported by the variance data, which shows Parker and WS with larger variance (the amount between the lowest score and the highest score). So on the whole, this theory seems unlikely to me (and must have to the writers as well as they don't mention it).

I'd also want to know more about the difference between Parker and WS. On the whole, I'd say the report leans more towards: "The Wisdom of Crowds?" although the writers don't say so explicitly.

Alan Goldfarb wrote:
09.15.11 at 1:39 PM

To make an analogy to sports, perhaps this group is doing what they've been doing in baseball (the most stats-centric sport of all) for years -- parsing stats in ways (sometimes good, sometimes not) that eventually bring more meaningful perspective; i.e. are wins a true indicator of a pitcher's success? Read and then see "Moneyball", the story of how the Oakland A's began approaching the game, with some short term success.
One can only hope that breaking down wine statistics might bring more clarity and efficacy.

Dan Hennessy wrote:
09.15.11 at 10:24 PM

CellarTracker users automatically see Tanzer's wine ratings when they select a particular wine. I think that the most likely explanation for the lack of statistical significance between CT scores and ST scores is that the Tanzer score is an anchor point for CT users. That is, they are (consciously or unconsciously) influenced by the Tanzer rating.

CLD wrote:
09.16.11 at 4:03 AM

Wine is an experience, a feeling, a perception. The drinker has his/her own preferences simply because the first tingle and the first whiff triggers unique memories and feelings. How do the raters and authorities wish to rate and standardize those feelings?

Tone Kelly wrote:
09.16.11 at 5:06 AM

The answer to the question is really an individual matter. I ran an experiment with the local American Wine Society where we served wines that had large differences in the scores that individual critics gave to the wines. The wines were served knowing what they were but the critic's scores were hidden from the tasters until after they had voted on which ones they preferred. There was no general conclusion on which critic was "right". The only takeaway was that each taster had to find a critic whose "preferences in taste" came closest to their own. Since there is no true, right or correct answer to what the rating/tasting notes should be for a wine, individual tasters should find a critic that is the closest to their preferences. This could vary by region or grape - such as trusting Parker for Bordeaux or Tanzer for Spain or Italy.

Lindsay M wrote:
09.16.11 at 4:13 PM

The meaning of "significant" in statistics is that the probability of being wrong in your conclusion is below some previously chosen value, usually 5% or 1%.

So the conclusion that Parker and CT are significantly different in their ratings means that the differences seen have less than a 5% probability (or 1%, if that is the level of significance chosen) of being due to random chance.
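One hedged way to see that idea in action is a permutation test, sketched below with invented scores. This is not the t-test the paper used. It asks the question more directly: if the labels "critic" and "community" meant nothing, how often would shuffling them produce a gap in averages at least as large as the one observed? The `permutation_p_value` helper, the data, and the shuffle count are all assumptions for illustration.

```python
import random


def permutation_p_value(group_a, group_b, shuffles=10_000, seed=0):
    """Estimate how often random relabeling gives a gap >= the observed one."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_large = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:size_a], pooled[size_a:]
        gap = abs(sum(shuffled_a) / len(shuffled_a) - sum(shuffled_b) / len(shuffled_b))
        if gap >= observed:
            at_least_as_large += 1
    return at_least_as_large / shuffles


# Invented scores for illustration only; NOT the paper's data.
critic_scores = [94, 95, 96, 93, 97, 95, 96, 94]
community_scores = [90, 91, 92, 89, 91, 90, 92, 91]

p_value = permutation_p_value(critic_scores, community_scores)
print(f"estimated p-value = {p_value:.4f}")
```

If the estimated fraction falls below the chosen level (5% or 1%), the gap is called statistically significant. Strictly speaking, this number is the chance of seeing such a gap when there is no real difference, which is a slightly narrower statement than the chance that the conclusion is wrong.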

Jim Seder wrote:
12.02.11 at 5:25 PM

The concept of wine ratings was founded, I believe, in the best spirit of helping wine consumers establish a relationship between quality and price. And it really wasn't until the great 2000 Bordeaux vintage of the century that ratings became such a craze (not to mention the 2003, 2005, 2009 and 2010)! And no wonder, with Bordeaux and Burgundy prices skyrocketing, consumers were determined to purchase only the best according to a given expert.
I believe that wine ratings have their place in the larger spectrum of decision making but never as the single overriding factor. What's far more valuable, I've found, is deciphering the overall opinion of a wine as revealed by several experienced tasters. If the consensus is consistent (and if you're lucky enough to know or trust someone who has sampled that wine), that carries far more weight than does one single view.
Finally, if your palate style and preferences don't agree with the masses, don't be swayed by rating numbers. It's YOUR palate that counts as well as your $$.

02.24.12 at 6:42 PM

I completely believe in the wisdom of the crowd's ability to judge the quality of a wine much much more than the wine critics. A wine critic is a human being and has his or her own tastes just as much as a 21 year old wine newbie tasting wine for the first time.

If you have 100 people's opinion about a given wine vs. Robert Parker's opinion about that same wine I'd take the 100 person crowd's opinion every time.
