Comment on Why do we use average instead of mode?

Because they mean very different things. Imagine you tallied the spending of 5 people in your restaurant:

10 15 30 100 150

First of all, that distribution has no mode, so let’s check the next 2 customers.

10 15 20 30 100 150 150

Cool, now that we’ve checked 7 people, we can say the following.

The mode is to spend 150. Almost no one does this, but that is the mode regardless.

The median is 30. This tells you that half the people spend more than this and half spend less. However, it doesn’t give you an accurate picture, because the people who spend less spend close to it, while the people who spend more spend way more. So a guy who spends 35 would look like a high spender, when in fact he probably belongs in the low-spending category.

The average (mean) is 67.86. No one spent this amount, but it tells you that if a person spends more than that he’s a high spender, so if someone came in and spent 35 you would know he’s not one of your high-spending customers.
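
If you want to check these three numbers yourself, here is a minimal sketch using Python's standard `statistics` module. The variable names are just mine for illustration, and "middle value" here means the median of the sorted list.

```python
import statistics

# The seven restaurant bills from the example above
spending = [10, 15, 20, 30, 100, 150, 150]

mode_value = statistics.mode(spending)      # most frequent value
median_value = statistics.median(spending)  # middle value of the sorted list
mean_value = statistics.mean(spending)      # sum divided by the count

print("mode:", mode_value)
print("middle value (median):", median_value)
print("average (mean):", round(mean_value, 2))
```

Rounded to two decimals the average comes out as 67.86.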

Now let’s see how good each of those numbers is at predicting how much 7 customers would spend in total. Take the same values, where the customers spent 475 altogether. The mode tells you that they’ll spend 7 × 150 = 1050, which is absolutely wrong. The median (the middle value, 30) tells you that they’ll spend 7 × 30 = 210, which is also very wrong. The average, however, tells you that they’ll spend 475, which is the exact number.
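
The comparison above can be sketched in a few lines of Python. This is only an illustration on the same seven bills; the dictionary layout is my own choice, and the results are rounded because the average is a floating-point number.

```python
import statistics

spending = [10, 15, 20, 30, 100, 150, 150]
n = len(spending)
actual_total = sum(spending)

# Predict the group's total as "typical value" times number of customers
predictions = {
    "mode": round(statistics.mode(spending) * n),
    "median": round(statistics.median(spending) * n),
    "mean": round(statistics.mean(spending) * n),
}

for name, predicted in predictions.items():
    print(f"{name}: predicted {predicted}, off by {predicted - actual_total}")
```

Only the mean lands on the real total, and that is true by construction: the mean is the total divided by the count, so multiplying back always recovers the total for the data it was computed from.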

The same goes for any other statistic. Even if it doesn’t make sense to say that people have an average of 2.3 kids, if you were planning on receiving 10 random families, they would probably have about 23 kids in total. The average is good at predicting large groups, and that’s the information we usually care about when we’re trying to express a large group in a single number.

If you want a second number, the obvious choice is the standard deviation. In the example above the (sample) standard deviation is 63.76, which gives you an idea of how accurate your average is at predicting: in that case, not very accurate at all. But if we imagine that the number of kids had a standard deviation of 0.2 per family, then, taking the spread of the 10-family total to be 10 × 0.2 = 2 and assuming a roughly normal distribution, you can be about 68% certain that the 10 families will have between 21 and 25 kids, 95% certain that they will have between 19 and 27, and 99.7% certain that they will have between 17 and 29.

Attaching a level of confidence to a prediction lets you judge how much you can rely on it. If you only knew that the median was 2 kids or that the mode was 1 kid, you couldn’t predict things with any accuracy.
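
Here is a rough sketch of those numbers in Python, in case it helps. The assumptions are mine: the 63.76 figure is the sample standard deviation (dividing by n - 1), the totals are roughly normally distributed, and the spread of the 10-family total is taken as 2, as in the text. If the families are independent, the spread of the total would be smaller (0.2 × √10), so the ranges here are on the cautious side.

```python
import math
import statistics

spending = [10, 15, 20, 30, 100, 150, 150]

# Sample standard deviation of the restaurant bills (divides by n - 1)
sample_sd = statistics.stdev(spending)
print("standard deviation of spending:", round(sample_sd, 2))

# Kids example: 10 families, expected total of 23 kids,
# spread of the total taken as 2 (assumption carried over from the text)
expected_total = 23
spread_total = 2

# 1, 2 and 3 standard deviations around the expected total
intervals = {
    label: (expected_total - k * spread_total, expected_total + k * spread_total)
    for label, k in (("68%", 1), ("95%", 2), ("99.7%", 3))
}
for label, (low, high) in intervals.items():
    print(f"{label}: between {low} and {high} kids")

# For comparison: spread of the total if the 10 families are independent
independent_spread = 0.2 * math.sqrt(10)
print("independent-families spread:", round(independent_spread, 2))
```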


poprocks@lemmy.world 11 months ago
Mean is the average. Median is the middle value.

- Nibodhika@lemmy.world 11 months ago
  Oops, sorry, English is not my first language. You’re correct, I’ll edit my post.
  
- my_hat_stinks@programming.dev 11 months ago
  They’re all averages. Mean is the sum divided by how many numbers there are.
  
Tramort@programming.dev 11 months ago
Outstanding response

fastandcurious@lemmy.world 11 months ago
This makes the most sense! I’ll add, though, that over a large data set I still think the mode gives you a better idea of what you should expect. The mean makes more sense if you are talking solely about stats and numbers and want to make a decision based on a ‘trend’.

- Nibodhika@lemmy.world 11 months ago
  Not really, it depends on the extremes. Imagine you have 1001 couples: 400 have 0 kids, 201 have 1 kid, 100 have 2 kids, 100 have 3 kids, 50 have 4, 50 have 5, 30 have 6, 30 have 7, 20 have 8, and 20 have 9. The mode is 0, the median is 1, and the average is 1.88.
  
  In this case you get two extremes: a lot of people with 0 kids, and people with lots of kids who pull the average up.
  
derpgon@programming.dev 11 months ago
Mode would probably work great for the # of kids statistic.

Just to add, mode works best for data sets with a small number of distinct values (number of kids is usually 1-3). It completely breaks down with a large number of distinct values (like $ spent).

- Nibodhika@lemmy.world 11 months ago
  Yes, it works best for small integer numbers, but it doesn’t provide any meaningful degree of confidence in the number of kids, because 0,1,2,2,2,3,5 and 1,2,2,2,3,5,6 have the same mode but describe very different groups.
  