
# Mean/Average Values in R

#### Part of Mike's Big Data, Data Mining, and Analytics Tutorial

The mean or average of a set of data values is defined as the sum of the values divided by the count of values.

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

There are a few guidelines for using the mean:

- The mean is a measure of center for data that is measured on a **continuous scale** (Review data classification here).
- The mean is **not appropriate for ordinal/nominal scale data**. Using the mean leads to meaningless statements of the form:
    - “The average gender in the world is somewhere between male and female (1.2)”
    - “The average satisfaction was 2.3”
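To make the second guideline concrete, here is a minimal sketch using made-up nominal data (the `gender` vector is hypothetical, not from the tutorial). It suggests that counting categories and reporting the most frequent one is a more sensible summary than a mean:

```r
# Hypothetical nominal-scale responses (illustrative data only)
gender <- factor(c("female", "male", "female", "female", "male"))

# Counting how often each category occurs is meaningful for nominal data
counts <- table(gender)
counts

# The most frequent category (the mode) is a reasonable summary
most_common <- names(counts)[which.max(counts)]
most_common

# mean() on a factor is not meaningful: R returns NA (with a warning)
factor_mean <- suppressWarnings(mean(gender))
factor_mean
```

Coding the categories as numbers (1 = male, 2 = female, say) would let `mean` return a value, but that value would depend entirely on the arbitrary coding.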

```
#Draw 10 values uniformly from [1, 20] and round them to integers
#(rounding makes the endpoints 1 and 20 half as likely as the other integers)
x<-round(runif(10,1,20))
#sort and display the values
x<-sort(x)
x
```

`## [1] 1 3 3 10 11 14 15 16 19 19`

These values can be summarized as frequencies of individual values (frequency referring to the number [count] of times each individual value appears in the set):

`table(x)`

```
## x
##  1  3 10 11 14 15 16 19
##  1  2  1  1  1  1  1  2
```

This table of values can be visualized in a histogram (a bar chart that shows the frequency of each value or a summarization within ranges of values [called bins]). In the first chart, the red line is drawn at the mean of the values:

[Figure: histogram of x with one bar per value and a red vertical line at the mean]

The second chart shows the same information, but using R’s default binning/summarization algorithm:

[Figure: histogram of x using R’s default bins]
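The plotting code for these charts is not included here. A minimal sketch of how similar charts might be drawn with base R follows; the `breaks`, titles, and line width are assumptions, not the author's original settings, and `x` is re-entered by hand so the block runs on its own:

```r
# The ten sorted values from the example above
x <- c(1, 3, 3, 10, 11, 14, 15, 16, 19, 19)

# One bin per integer from 1 to 20, so each bar is the count of a single value
h1 <- hist(x, breaks = seq(0.5, 20.5, by = 1),
           main = "Frequency of each value", xlab = "x")
abline(v = mean(x), col = "red", lwd = 2)  # red line at the mean

# Same data with R's default binning
h2 <- hist(x, main = "Default bins", xlab = "x")
abline(v = mean(x), col = "red", lwd = 2)
```

`hist` returns the bin boundaries and counts invisibly, so `h1$counts` and `h2$counts` can be inspected to see exactly how the values were grouped.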

The mean of this set is:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

\[ \bar{x} = \frac{1 + 3 + 3 + 10 + 11 + 14 + 15 + 16 + 19 + 19}{10} \]

\[ \bar{x} = \frac{111}{10} \]

\[ \bar{x} = 11.1 \]
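The same arithmetic can also be organized around the frequency table shown earlier: multiply each distinct value by its count, add the products, and divide by the total count. The sketch below assumes the ten values from the example; the variable names `freq`, `values`, and `counts` are illustrative:

```r
# The ten values from the example above
x <- c(1, 3, 3, 10, 11, 14, 15, 16, 19, 19)

freq   <- table(x)                # frequency of each distinct value
values <- as.numeric(names(freq)) # the distinct values: 1, 3, 10, ...
counts <- as.vector(freq)         # how often each one occurs

# Sum of value * count, divided by the total count
mean_from_table <- sum(values * counts) / sum(counts)
mean_from_table

# Base R's weighted.mean() performs the same calculation
mean_weighted <- weighted.mean(values, counts)
mean_weighted
```

Both results should agree with the hand calculation above, since 1·1 + 3·2 + 10 + 11 + 14 + 15 + 16 + 19·2 = 111.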

In R, it is easy to find the average of a set of numbers using the built-in mean function:

`mean(x)`

`## [1] 11.1`
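The built-in `mean` also takes optional arguments that are worth knowing about. The short sketch below uses made-up vectors `y` and `z` (not part of the tutorial's data) to show `na.rm`, which drops missing values, and `trim`, which discards a fraction of the observations from each end before averaging:

```r
# Illustrative data with a missing value
y <- c(4, 8, NA, 12)

mean_with_na <- mean(y)                # NA: a missing value propagates
mean_no_na   <- mean(y, na.rm = TRUE)  # missing value dropped first

# Illustrative data with one extreme value
z <- c(1, 2, 3, 4, 100)
mean_plain   <- mean(z)                # pulled upward by 100
mean_trimmed <- mean(z, trim = 0.2)    # drops the lowest and highest value

mean_with_na
mean_no_na
mean_plain
mean_trimmed
```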

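It is also possible to write (mostly) equivalent functions that compute the mean/average in R, though without the built-in function's extras such as `na.rm` and `trim`:

```
#mean as the sum of the values divided by their count
average<-function(x) {
  sum(x)/length(x)
}
average(x)
```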

`## [1] 11.1`

Or, with even worse performance but demonstrating the mechanics of the for loop:

```
#accumulate the sum and the count one element at a time
average<-function(x) {
  sum_x<-0
  count_x<-0
  for (i in seq_along(x)) { #seq_along is safe even if x is empty
    sum_x<-sum_x+x[i]
    count_x<-count_x + 1
  }
  sum_x/count_x
}
average(x)
```

`## [1] 11.1`

In most cases there is no good reason to write your own function to calculate the mean, but you may occasionally have a special reason for doing so.

Back to Mike's Big Data, Data Mining, and Analytics Tutorial
