e.g. the modern "6/10 = bad" syndrome
I also saw a solution that normalizes every person's scores to combat this bias. It's used in the Criticker system (a movie recommendation site):
criticker.com/critickers-algorithm-explained/
Once you've got enough ratings, Criticker normalizes your scores. We do this because of the wide variety in the way people rate things. For some people, a score of "80" is close to a masterpiece, while for others "80" is middling. Some try to spread their scores equally across the whole scale, while others adhere to the system they learned in school, when a "60" was real bad. Like, if you came home with a "60" on your math test, you were probably going to get grounded.
Currently, I’m investigating whether this is something my recommendation algorithm needs, so maybe I will implement some kind of score normalization on Gamescovery. We will see.
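If I do, the basic shape would probably be per-user percentile normalization, roughly like this sketch (the function name and midrank convention here are just illustrative; Criticker hasn't published their exact formula):

```python
from bisect import bisect_left, bisect_right

def percentile_score(user_ratings, score):
    """Map one of a user's raw scores to its percentile within that
    user's own rating distribution (midrank convention for ties).

    Illustrative sketch only; not Criticker's published formula."""
    ranked = sorted(user_ratings)
    lo = bisect_left(ranked, score)   # ratings strictly below `score`
    hi = bisect_right(ranked, score)  # ...plus the ties
    midrank = (lo + hi) / 2
    return 100 * midrank / len(ranked)

# A user who clusters their ratings high: their "70" lands low on the
# normalized scale even if they meant it as "pretty good".
ratings = [60, 70, 70, 80, 80, 85, 90, 90, 95, 100]
print(percentile_score(ratings, 70))  # ~20.0
```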
TyrianMollusk@infosec.pub 6 hours ago
Sounds good. Doesn’t actually work :/ Sure, if everyone gave a statistically valid spread of data covering every rating point, then you could probably normalize the scales so it doesn’t matter what numbers an individual used. But people don’t do that. Maybe someone only rates 8-10, but is that because they like everything, because they don’t rate anything they didn’t like, because they think an 8 is bad, because they lump everything they don’t like into an “8 or below” group, or some other random thing? Criticker can’t know. And what about the obvious fact that almost everyone watches more movies they’d rate good than bad, so ratings have a huge implicit skew in their distribution? They can’t know that either, but they scale the ratings anyway, and that’s part of why the normalized scores don’t really work if you get down to it. The rest is just that their analysis concept is broken.
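To make that skew concrete, here's a toy demonstration with invented numbers, using plain z-score normalization as a stand-in for whatever Criticker actually does:

```python
from statistics import mean, stdev

def zscores(ratings):
    """Standard per-user z-score normalization: (x - mean) / stdev."""
    m, s = mean(ratings), stdev(ratings)
    return [(x - m) / s for x in ratings]

# Two invented users who both mean "8 = good, 9 = great, 10 = favorite":
only_rates_likes = [8, 8, 9, 9, 9, 10]      # skips anything they disliked
rates_everything = [2, 4, 5, 6, 8, 9, 10]   # logs the duds too

# After normalization, an 8 from the first user comes out *below average*
# (negative z-score), while the same 8 from the second is clearly above
# average; the math can't tell selection bias from harshness.
print(zscores(only_rates_likes))
print(zscores(rates_everything))
```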
I actually use Criticker for my movie ratings, and it doesn’t really do me any good (but it’d be a pain to move everything, so I haven’t :). Their system still falls prey to the usual issues, just not as obviously as, say, Steam, which basically always throws out the most popular candidate it can shoehorn into a rec. If you have weird taste, you get grouped with rating profiles that happen to agree enough on something, but that have no real connection to your taste. E.g., if I like some movies everyone likes (and let’s face it, we pretty much all have some close-to-universally appreciated likes), my “close rating” users will be people who also liked those movies, and a lot of meaningful stuff becomes noise, yet one’s taste lives much more in that noise than in the big obvious strokes. Alternatively, if I watch and like some fringe thing no one sees, suddenly anyone else who did is closer to me, mainly because there’s so little data in common between us to go on.
Criticker is convinced I love esoteric foreign drama (I really don’t), because I scour deep into horror during part of the year and occasionally find a gem that gets a good rating, often from some dark corner of Asia. They also think my 50 is the 77th percentile, probably for the same reason (i.e. I do have a lot of low ratings, because I watch things just because they’re horror). A 50 is where I put “pretty decent/not really that good” stuff, which seems a lot lower than 77 to me, but I can’t tell Criticker that because of their “helpful” scaling. After my partner (who watches basically everything I do and has very similar taste), the next closest TCI (their metric for how close your normalized ratings are to someone else’s, and the basis for their rating predictions) comes in at thirty. That basically says they’re useless, which is more accurate than any given rating prediction they generate for me, with my mere 1,845 ratings to go by ;)
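For what it's worth, you can picture a TCI-style number as the average gap between two people's normalized scores over the films they've both rated, lower meaning closer. That's a guess at the shape of the metric, not their actual formula, and the names and data below are invented:

```python
def tci_like(a, b):
    """Average absolute difference between two users' normalized scores
    over co-rated items (0 = identical taste, higher = more distant).

    Hypothetical reconstruction; Criticker's real TCI formula
    isn't public, but it behaves roughly like this."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # no co-rated items, no basis for comparison
    return sum(abs(a[m] - b[m]) for m in shared) / len(shared)

# Invented normalized (0-100) scores keyed by film title:
me      = {"Alien": 90, "Suspiria": 85, "The Thing": 95}
partner = {"Alien": 88, "Suspiria": 80, "The Thing": 97}
other   = {"Alien": 55, "The Thing": 60}

print(tci_like(me, partner))  # small gap -> close taste
print(tci_like(me, other))    # large gap -> distant
```

Note how few shared items it takes to get a number at all, which is exactly the fringe-film problem: with two films in common, one lucky agreement swings the whole score.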
I really think one needs to find and minimize the “common” elements to focus on the uncommon, both in rating analysis and in prediction. E.g., if people tend to like X but I don’t, that actually says a lot more about my taste than if I also like X. And recommending I rush out to watch The Godfather (thanks again, Criticker, never heard of that one…) doesn’t do me any good, because everyone already knows it. It’s an “easy” rec, but it’s not a good rec.
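The standard way to chase that idea is to compare each rating to the item's overall average, so agreeing on a universally loved film counts for little while agreeing against the crowd counts for a lot. A rough sketch with made-up numbers (this is the usual mean-centering trick from collaborative filtering, not anything Criticker is known to do):

```python
def deviations(user, item_means):
    """Replace raw scores with how far the user sits from the crowd's
    average on each item: the 'uncommon' part of their taste."""
    return {m: s - item_means[m] for m, s in user.items() if m in item_means}

def similarity(a, b, item_means):
    """Compare two users on their *deviations* from consensus, so jointly
    loving The Godfather (which everyone loves) counts for little, while
    jointly loving something the crowd is lukewarm on counts for a lot."""
    da, db = deviations(a, item_means), deviations(b, item_means)
    shared = da.keys() & db.keys()
    if not shared:
        return None
    # Dot product of deviations: agreeing against the crowd scores high.
    return sum(da[m] * db[m] for m in shared) / len(shared)

# Invented numbers: everyone loves The Godfather (mean 92); opinion is
# split on some obscure horror film (mean 50).
item_means = {"The Godfather": 92, "Obscure Horror": 50}
me  = {"The Godfather": 90, "Obscure Horror": 85}
you = {"The Godfather": 91, "Obscure Horror": 80}

print(similarity(me, you, item_means))  # dominated by the horror film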
If Criticker used the 3-4-3 system for their ratings instead of telling us it will all just work out, that would lead me to apply my numbers differently, which on its own says something about how to improve their data. I didn’t make up the 3-4-3 thing, BTW. I was working on a related web/database project, and it was passed on to me as studied and statistically well-proven for producing better survey results (by someone from an industry that definitely cares about that). It does make a lot of sense, though. It’s nice when something has a clear right answer like that… except you get a little frustrated seeing nothing actually use it ;)
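Assuming “3-4-3” means splitting a 10-point scale into three labeled bands (3 negative values, 4 middling, 3 positive), which is my reading of the comment rather than a documented spec, the mapping is trivial:

```python
def band(score):
    """Label a 1-10 rating under a 3-4-3 split: 1-3 negative,
    4-7 middling, 8-10 positive.

    This partition is one reading of "3-4-3" above, not a
    documented standard."""
    if not 1 <= score <= 10:
        raise ValueError("score must be 1-10")
    if score <= 3:
        return "negative"
    if score <= 7:
        return "middling"
    return "positive"

print([band(s) for s in range(1, 11)])
```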