
Building Web 2.0 Reputation Systems: Ratings Bias Effects

  • Amy Jo Kim · 4 months ago
    Great analysis, love this - can't wait for the book!
  • Nina Simon · 4 months ago
    I used to spend a lot of time as a slam poet. Poetry slams, like custom autos, are an example of user content judged by other community members. But despite an Olympic-style 0-10 scoring scale with one decimal place (and the fact that every poem must be scored), slam scores tend to fall in the 7-10 range, and as the night goes on and the excitement builds, score creep can shrink the range to 9.7-10. The social pressure to rate people highly is effective.

    Unlike in the custom auto community, slams aren't judged by other poets but by audience members who are not poets and ideally have never attended a slam before. This is meant to avoid bias and to focus on poetry that appeals "to the people." But the result is a scoring mechanism that rewards novelty over subtlety, and poets often feel frustrated by the system. It seems to me that one reason the custom auto ratings were so healthy is that users rated according to their own relational sense of what is good, as active participants, rather than a novice's or an absolutist's sense of what is good.
  • Claire · 4 months ago
    Interesting. I notice the least sharp J-curve was for movies. I would expect books to show a less sharp J-curve as well. Most of us will only purchase or use something we have some expectation of liking. I won't go see a slasher flick because I have no expectation of liking it, so I'm not going to be rating one. Professional movie critics, who are paid to view a wide variety of movies, will go and rate it accordingly. I've noticed this phenomenon with movie reviews: moviegoer ratings are almost always higher than critics' ratings. But movies are also viewed in a social context, such as dates. I might go see a movie I would ordinarily avoid because I'm going along with one or more friends.

    Books are interesting because within any genre the variety and number of titles published is fairly large. I might be a mystery fan, but that covers a wide range. SF&F, romance, and mystery are all large, amorphous sets. So I might buy a top-rated or new book that, in retrospect, I consider a complete waste of money. Also, "fans" are notoriously brutal when their expectations aren't met, and it's in genre fiction (and movies) that you find some of the most intensely loyal fans. I'm curious how that affects ratings.

    When your choices within a set are numerous and ill-defined, how does that affect your willingness to select a member of it? And if I never select it, I'm not going to rate it. Also, what kinds of personalities are more likely to rate a product? Many more people will purchase a product than will rate it. What does that tell us about ratings? How does the intensity of my feelings affect my willingness to express an opinion? My guess is that there are far more 2- and 3-star opinions that are not showing up as ratings. How likely am I to rate something I'm lukewarm about?

    Just some thoughts.
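Claire's question about lukewarm raters can be made concrete with a small calculation. The sketch below is a hypothetical model, not data from the post: it assumes an underlying opinion distribution (`TRUE_SHARE`) and a per-opinion probability of actually posting a rating (`RATE_PROB`), both invented for illustration, and shows how multiplying the two can turn a middle-heavy set of opinions into a J-shaped histogram of posted ratings.

```python
# Hypothetical model: how self-selection can turn a middle-heavy opinion
# distribution into a J-curve of *posted* ratings.

def observed_distribution(true_share, rate_prob):
    """Return the share of posted ratings at each star level.

    true_share: assumed fraction of all users holding each opinion (1..5 stars)
    rate_prob:  assumed probability that a user with that opinion posts a rating
    """
    posted = {s: true_share[s] * rate_prob[s] for s in true_share}
    total = sum(posted.values())
    return {s: posted[s] / total for s in posted}

# Assumed inputs, for illustration only:
# buyers mostly expect to like what they chose, so opinions lean positive...
TRUE_SHARE = {1: 0.05, 2: 0.15, 3: 0.30, 4: 0.30, 5: 0.20}
# ...and strong feelings (delight or disappointment) are far likelier
# to be posted than lukewarm ones.
RATE_PROB = {1: 0.30, 2: 0.08, 3: 0.03, 4: 0.10, 5: 0.40}

if __name__ == "__main__":
    for stars, share in observed_distribution(TRUE_SHARE, RATE_PROB).items():
        print(f"{stars} stars: {share:6.1%}")
```

With these made-up numbers, 3-star opinions are 30% of all users but only about 6% of posted ratings, while 5-star ratings make up over half of what gets posted.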
  • Mekin · 4 months ago
    Thanks for the graphs & some neat analysis!

    I am not convinced that, looking at this aggregate distribution alone, we can conclude that a 5-point rating scale is needless and that a simple favorite/not-favorite would do instead. Rating volume will always be concentrated on the best-liked items (Harry Potter, for books), and those items alone can skew this curve.

    For other entities, the distribution might be more evenly spread, which would make the 5-star scale meaningful.
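Mekin's point about a few high-volume titles dominating the aggregate curve can be checked by computing the distribution two ways. This sketch uses invented counts (a hypothetical `"blockbuster"` item and two small `"niche"` items): `pooled` weights every rating equally, so the blockbuster dominates, while `per_item_mean` averages each item's own distribution so that every item counts equally.

```python
from collections import Counter

STARS = range(1, 6)

def pooled(items):
    """Share of all ratings at each star level; high-volume items dominate."""
    total = Counter()
    for counts in items.values():
        total.update(counts)  # adds this item's per-star counts to the total
    n = sum(total.values())
    return {s: total[s] / n for s in STARS}

def per_item_mean(items):
    """Average of each item's own star distribution; every item counts equally."""
    acc = Counter()
    for counts in items.values():
        n = sum(counts.values())
        for s in STARS:
            acc[s] += counts.get(s, 0) / n
    return {s: acc[s] / len(items) for s in STARS}

# Hypothetical star counts per item, for illustration only.
ITEMS = {
    "blockbuster": {5: 900, 4: 80, 1: 20},
    "niche_a": {1: 2, 2: 4, 3: 4, 4: 4, 5: 6},
    "niche_b": {1: 2, 2: 4, 3: 4, 4: 4, 5: 6},
}

if __name__ == "__main__":
    print("pooled:       ", {s: round(p, 3) for s, p in pooled(ITEMS).items()})
    print("per-item mean:", {s: round(p, 3) for s, p in per_item_mean(ITEMS).items()})
```

Here 5-star ratings make up roughly 88% of the pooled histogram but only 50% of the per-item average, so which view a designer looks at can change how "needless" the middle of the scale appears.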
  • frandallfarmer · 4 months ago
    Mekin, the post only suggests considering that the 5-point scale may not be the best choice. I'm certain that it is used in many places where a simpler scheme would be superior, just as I know that there are many places where a 2-point scale (thumbs-up and thumbs-down) is used when a vote-to-promote model would be better...

    You'd be surprised how many product designers just immediately assume that a 5-point scale will generate a straight-line distribution of scores. The data clearly show that assumption is faulty. In fact, I have never, ever seen such a distribution.
  • xian · 4 months ago
    'Twas ever thus. The rich get richer: "They search, and the results with the highest ratings appear first and if the user has experienced that object, they may well also rate it - if it is easy to do so - and most likely will give 5 stars..."
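The feedback loop in that quote can be sketched as a toy simulation. Everything here is an assumption for illustration: results are ranked by average rating, the item at rank r receives `attention[r]` new ratings per round (the hypothetical `attention` list models how much exposure each position gets), and, following the quote, every new rating is 5 stars. The three hypothetical items start within 0.2 stars of each other.

```python
def simulate(stars, counts, attention, rounds):
    """Toy rich-get-richer loop.

    stars[i]  -- sum of all star values item i has received
    counts[i] -- number of ratings item i has received (assumed >= 1)
    attention -- new ratings per round for each rank position, best first
    """
    stars, counts = list(stars), list(counts)
    for _ in range(rounds):
        # Rank by average rating, highest first (ties broken by rating count).
        order = sorted(range(len(counts)),
                       key=lambda i: (stars[i] / counts[i], counts[i]),
                       reverse=True)
        for rank, item in enumerate(order):
            new = attention[rank] if rank < len(attention) else 0
            counts[item] += new
            stars[item] += 5 * new  # the quote's assumption: new raters give 5 stars
    return stars, counts

if __name__ == "__main__":
    # Hypothetical start: averages of 4.6, 4.5 and 4.4 from 10 ratings each.
    stars, counts = simulate([46, 45, 44], [10, 10, 10],
                             attention=[10, 3, 1], rounds=20)
    for s, n in zip(stars, counts):
        print(f"{n:4d} ratings, average {s / n:.2f}")
```

Under these assumptions the ranking never changes, so after 20 rounds the first item has 210 ratings to the third item's 30, even though the items started almost tied.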
  • Jeffrey Henning · 3 months ago
    I think this analysis neglects the distinction between random sampling and convenience sampling; the latter is used by almost all reputation systems. As a result, such ratings are not representative of the much larger group of site visitors who haven't given a rating.

    I would think the ideal rating system would have maximum dispersion, for instance 20% at each star rating: a flat line rather than a curve. Wouldn't that be preferable?
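One hypothetical way to make "maximum dispersion" concrete is normalized Shannon entropy: the entropy of the star-rating shares divided by its largest possible value, log k for k star levels. The function name and the example J-curve shares below are assumptions for illustration, not taken from the post's data.

```python
import math

def normalized_entropy(shares):
    """Entropy of a rating distribution divided by its maximum, log(k).

    Returns 1.0 for a perfectly flat histogram (20% per level on a 5-point
    scale) and values near 0 when almost all ratings land on one level.
    """
    h = -sum(p * math.log(p) for p in shares if p > 0)
    return h / math.log(len(shares))

FLAT = [0.20, 0.20, 0.20, 0.20, 0.20]
J_CURVE = [0.10, 0.05, 0.05, 0.15, 0.65]  # hypothetical shares, for illustration

if __name__ == "__main__":
    print(f"flat:    {normalized_entropy(FLAT):.2f}")
    print(f"J-curve: {normalized_entropy(J_CURVE):.2f}")
```

A score like this only measures spread; it says nothing about whether a spread-out distribution is actually the more desirable outcome.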
  • frandallfarmer · 3 months ago
    There are two different main points in your comment, Jeffrey:

    1) The star ratings have self-selection bias and are not randomly sampled. This is correct as far as it goes. As to the statement that almost all reputation systems use randomness, I must disagree. By definition, the internet introduces self-selection bias: each system is limited to people who have internet-connected computers, who are using a particular application on a particular site, and who opt in to participation. Randomness does play several useful roles even in this context, as detailed in our book at http://buildingreputation.com/doku.php?id=chapt...

    2) I don't see any reason to conclude that a flat distribution is always the most desirable. People are not random. People have tastes and opinions, and there is no data to suggest that taste or opinion is evenly distributed. Don't properly randomly sampled polls usually show an uneven distribution of results? Otherwise we wouldn't have so many polls. :-)

    Randy