Reviewer's Re-Review: Artic Wolf Analysis 2012
Some years ago it became apparent that we all were at the mercy of any rummie come lately who had enough computer skills to toss up a website and hold forth as a "reviewer". Some even had the moose balls to name their own "methods" - before they'd had any experience, and worse yet, seemingly possessed of a remarkable, hypertaster's bitter palate.
The Artic Wolf was first reviewed in January, 2011, and summarized by this distribution of scores...
As should be obvious, there was a considerable bias toward the higher scores, with only three rums scoring below "average", but with nearly half his reviews receiving very high scores. This was hardly realistic, but to be fair, only 55 rums were scored. Although this is still a reasonable sample, it is barely so. What to do?
Wait, and wait. Which we did, and now this amazing, self-proclaimed hobbyist (who once denied he had any commercial intentions) has clearly gone to the dark side and has now reviewed an amazing 371 spirits, not to mention an estimated 500 or 600 drinks recipes, in little more than a year. No wonder he has a good layer of winter fat.
Let's see what happened. Did the Wolf become more or less biased?
Survey sez: More biased. Seriously, check the two charts - his shift to the right, to the very highest scores, should be obvious. This time I superimposed what a normal distribution would look like. Here are the actual numbers:
50: 0 spirits
55: 0
60: 0
65: 3
70: 11
75: 38
80: 85
85: 119
90: 93
95: 22
_______
Total spirits: 371
Remember, a score of "75" should be both the median and the average score in a normal distribution. In this case only 14 rums score below average, while 319 score above! The Wolf's median falls in the "85" bracket, in the upper 80's - a range in which most top reviewers would give 4 stars (out of 5) for "great" spirits. Mr. Wolf places 204 of his reviews in this 4 Star category. Worse yet, in the 90 to 100 category, what top reviewers would call 5 Star spirits, Mr. Wolf has awarded this top honor to an unbelievable 115 spirits. Holy Mooseshit, Batman! That's a total of 319 rums in the "great" or "superlative" classes.
As for "Average" spirits, there are only, gulp, 49 spirits, and for "below Average"? Only 3.
What does this mean?
If a gatherer of data, in this case a reviewer, is unbiased, the "Average" or 3 Star category should be the largest. The number of spirits below average should match the number above average. In Mr. Wolf's case, the number below is 14, while the number above is 319! Talk about imbalance. The same is true for the top and bottom ratings - there should be very few of each (typically about 2.1% of the data). In Mr. Wolf's case the number of bottom rated spirits is zero - none - while at the tippy, tippy top ("95+") he reports a stunning 22.
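For anyone who wants to check the arithmetic, here is a minimal Python sketch that re-tallies the itemized counts from the list above. It assumes each label marks the bottom of a five-point bracket (so "85" covers 85 up to 89). That reading, along with the variable and function names, is an assumption made for illustration only.

```python
# Score histogram as itemized in the post: bracket label -> number of spirits.
# Assumption: each label is the lower edge of a five-point bracket.
counts = {50: 0, 55: 0, 60: 0, 65: 3, 70: 11,
          75: 38, 80: 85, 85: 119, 90: 93, 95: 22}

total = sum(counts.values())

# Spirits strictly below and strictly above the "75" bracket.
below_75 = sum(n for score, n in counts.items() if score < 75)
above_75 = sum(n for score, n in counts.items() if score > 75)

# The 80's ("4 Star") and 90's ("5 Star") groupings used in the post.
four_star = counts[80] + counts[85]
five_star = counts[90] + counts[95]


def median_bracket(hist):
    """Return the bracket label that contains the middle review."""
    middle = (sum(hist.values()) + 1) / 2
    running = 0
    for score in sorted(hist):
        running += hist[score]
        if running >= middle:
            return score
    return None


print("total reviews:", total)
print("below 75:", below_75, "| above 75:", above_75)
print("80's:", four_star, "| 90's:", five_star)
print("median bracket:", median_bracket(counts))
```

Run as-is, this only re-adds the numbers already listed. It says nothing about why the scores fall where they do.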
Let's put this in further perspective by comparing these numbers to a normal bell curve and distribution:
Normal Distribution:
50-59: 8 spirits
60-69: 50
70-79: 253
80-89: 50
90-99: 8
Mr. Wolf's Distribution:
50-59: 0 spirits
60-69: 3
70-79: 49
80-89: 204
90-99: 115
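The "normal" column above looks consistent with the familiar 68-95-99.7 rule applied to 371 reviews: roughly 68% in the middle band, about 13.6% in each neighboring band, and about 2.1% in each outer band. The sketch below reproduces that under the assumption that the band edges sit at whole standard deviations from the mean. That mapping of score bands to standard deviations is a guess at how the column was built, not something stated in the post, and the helper names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_band_counts(total):
    """Expected counts for five bands with edges at -3, -2, -1, +1, +2, +3 sigma.

    Assumption: the middle band spans -1 to +1 sigma, as in the 68-95-99.7 rule.
    """
    edges = [-3, -2, -1, 1, 2, 3]
    return [total * (normal_cdf(hi) - normal_cdf(lo))
            for lo, hi in zip(edges, edges[1:])]


bands = ["50-59", "60-69", "70-79", "80-89", "90-99"]

# Observed counts grouped from the itemized list:
# 60's = 3, 70's = 11 + 38, 80's = 85 + 119, 90's = 93 + 22.
observed = [0, 3, 49, 204, 115]

expected = expected_band_counts(371)

for band, exp, obs in zip(bands, expected, observed):
    print(f"{band}: expected {exp:6.1f}, observed {obs}")
```

The expected counts do not add up to exactly 371, because the thin tails beyond three standard deviations are left out.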
Folks, this can't be blamed on sample size, which is more than adequate. Nor can this be attributed to the common response that "...I only pick the best spirits". First, Mr. Wolf appears to review anything that is gifted to him for review, and second, it doesn't matter anyway. Here's why...
It's like throwing darts. It matters not where you aim at the target - your darts will accumulate in a normal fashion - with a few hitting the aiming point, a few missing wildly, and most clustered at an "average" distance away. This pattern is common and applies to data collected for any endeavor, from shoe size, to human height or weight, to the number of tomatoes on a plant, to spirits reviews. That is simply life. Trust me, the top reviewers (think Dave Broom) show a perfectly normal distribution of scores.
So the question is before us - was Mr. Wolf biased, and has it changed for the better or worse? You decide...
Reviewer's Re-Review: Artic Wolf
- Capn Jimbo
- Rum Evangelisti and Compleat Idiot
- Posts: 3550
- Joined: Mon Dec 11, 2006 3:53 pm
- Location: Paradise: Fort Lauderdale of course...
- Contact:
Last edited by Capn Jimbo on Wed Oct 10, 2012 11:00 am, edited 1 time in total.
Yet another view...
Here's another view, this time with numbers of reviews. Even this gives Mr. Wolf the benefit of the doubt, as the superimposed "bell curve" is on the squat side. While going through these 371 reviews, particularly those of rum, it seemed to me that a score of 85 to 87 was the most typical, frequent and predictable. Curious, I ran a spreadsheet and found the average score was just over 86.
What does this mean?
It's important to note that Mr. Wolf has a very unusual rating scale. As a matter of history, the notion of scoring really emerged at the hands of Robert Parker who - following the lead of years of scholastic grading (A to F) - established what has now become standard for rating almost anything.
Parker established a 100 point scale, with no scores below 50. In effect this provided 5 ranges: poor, below average, average, good and excellent. Other reviewers used the old familiar letter grades: F - D - C - B - A or the well known five star system: 1 - 2 - 3 - 4 and 5 stars.
Thus, an "average" wine, beer or spirit would earn 3 Stars, a "C", or a score in the 70's.
Not Mr. Wolf
His "scale"...
0-25 A spirit with a rating this low would actually kill you.
26-49 Depending upon your fortitude you might actually survive this.
50-59 You are safe to drink this…but you shouldn’t.
60-69 Substandard swill which you may offer to people you do not want to see again.
70-74 Now we have a fair mixing rum or whisky. Accept this but make sure it is mixed into a cocktail.
75-79 You may begin to serve this to friends, again probably still cocktail territory.
80-84 We begin to enjoy this spirit neat or on the rocks. (I will still primarily mix cocktails)
85-89 Excellent for sipping or for mixing!
90-94 Definitely a primary sipping spirit, in fact you may want to hoard this for yourself.
95-97.5 The Cream of the Crop
98+ I haven’t met this bottle yet…but I want to.
Keep in mind that unlike most reviewers Mr. Wolf actually scores the bottle (which can change a rating by a substantial 5 points), and favors the palate (taste) far more than the nose - giving taste an amazing 60 points, while the nose (which most reviewers favor) gets just 10 points. Most reviewers score only the totality of the experience - harmony, roundness, integration, complexity and representation of the style - while Mr. Wolf prefers to score only the components.
Bizarre? Let's continue.
Another issue is that mixing spirits are rated right along with "sippers". The effect of this inclusion is that what Mr. Wolf regards as primarily sipping spirits automatically earn scores in the 90's, while what he wants to call "mixers" end up the 70's. This is the equivalent of rating sports cars and pickup trucks using the same scale.
It can't work. Does Mr. Wolf actually understand his own scale? A poster at his site posted this very telling question:
Emil: "...But I do have a question. Wines are normally scored using scales of 50 to 100, E to A, or the five star system. From bottom to top these are poor, below average, average, good, and excellent. For example an average wine would get a “C”, 3 stars or a score in the 70's. How would you equate your scores to what we in the wine world are used to for these five categories?"
A good, fair and honest question. The poster simply wants to know which of his scores are the equivalent of the now standard ratings of "poor", "below average", "average", "good" and "excellent". Really, a simple question. And Mr. Wolf's answer?
Mr. Wolf: "Hi Emil and welcome to my website. Since I am unfamiliar with the five star system used for wines I will not attempt to superimpose that system upon mine or vica versa.
My scoring system is explained in detail below the review."
He avoids this very simple question, and for good reason. His convoluted "system" simply doesn't relate to the standard ratings we have come to know and understand. It simply doesn't work for those of us who'd like to know how his scores compare with the many other reviewers we follow.
Does his "90" compare to F. Paul Pacult's 90? Or to Dave Broom's 5 Stars? Or to others' "A" ratings? Survey sez... No. Some would therefore call Mr. Wolf's "90" less than useful. But not so fast. It is very, very useful - not to drinkers...
But to distillers.
You see, Mr. Wolf is highly dependent on freebies. I don't know the average Canadian cost of his 371 review bottles, but I don't think $12,000 would be far off. I can tell you from experience that you don't get many freebies unless your reviews are sufficiently and favorably predictable. When most spirits considered primarily as "sippers" automatically score in the 90's, the distillers simply can't supply you fast enough. The answer is simple:
The public expects that all scores in the "90's" represent the relatively few, really excellent, top-rated spirits. A "93" from Pacult, Jackson, Parker, Broom, BTI, the Malt Maniacs or Ralfy - or The Rum Project for that matter - really means something. Such scores are rare. But as for Mr. Wolf?
Perhaps not so much. You decide...
Last edited by Capn Jimbo on Sun Oct 07, 2012 1:20 pm, edited 1 time in total.
It gets worser and worser...
The more you examine Mr. Wolf's unique system and results, the worse it gets. Keep in mind that he scores both "mixers" and "sippers" together, using the same scale (like rating sports cars against pickup trucks). As he puts it in his guide to ratings, spirits rated from 70 to 80 are mixers, and Mr. Wolf considers even those up to 84 to be in the mixing category.
His "sippers" don't really begin until the upper 80's, and what he considers "primarily sippers" are 90 and above. Consider these facts and the distribution of scores is even more skewed. Now even his "mixers" are mostly "above average" and his sippers are mostly in the 90's. Can it get any more biased than that?
You decide.