Statistics is the science of data: collecting, classifying, organizing, analyzing, and interpreting it.

A variable is a characteristic that differs or varies from one observation to the next.  Quantitative data are data that consist of numbers.  Categorical data are data that do not consist of numbers.

The number of M&M’s in a small bag is a piece of quantitative data.

The color of an M&M is a piece of categorical data.

 

Describing Quantitative Data Numerically

 

For a given set S, $\sum x$ is a shorthand notation for the sum of all data in set S.

example: Let S = {3, -5, 2, 1, 8, -6}. Then $\sum x$ = 3 + (-5) + 2 + 1 + 8 + (-6) = 3.

 


The mean (or average) of a set of n observations is $\bar{x} = \frac{\sum x}{n}$.

example: The average of set S is 3/6 = 0.5.
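As a quick check of the sum and the mean above, here is a minimal Python sketch. The variable names are illustrative choices and are not part of the notes.

```python
# Set S from the examples above, stored as a list.
S = [3, -5, 2, 1, 8, -6]

total = sum(S)      # the sum of all data in S
n = len(S)          # the number of observations
mean = total / n    # mean = (sum of the data) / n

print(total, mean)  # 3 0.5
```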

 

The median in a set of n observations that are ordered from smallest to largest is the middle observation (if n is odd) or the mean of the two middle observations (if n is even).

example: Let  S1 = {1, 4, 6, 9, 10}. The median of S1 is 6.

 

example: Let S = {3, -5, 2, 1, 8, -6}. In order to find the median, we must first order the set S from smallest to largest. In doing so we see that S = {-6, -5, 1, 2, 3, 8}. The two middle observations are 1 and 2. The average of these two middle observations is 1.5. Thus, the median of S is 1.5.
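The two cases in the definition (n odd, n even) can be sketched in Python as follows. The function name `median` is our own choice here; Python's standard `statistics.median` should give the same results on these sets.

```python
def median(data):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(data)      # order from smallest to largest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:              # n odd: a single middle observation
        return ordered[mid]
    # n even: average the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

S1 = [1, 4, 6, 9, 10]
S = [3, -5, 2, 1, 8, -6]

print(median(S1))  # 6
print(median(S))   # 1.5
```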

 

For any data set, about 50% of the data falls below the median and about 50% falls above it.

 

 

For a given set of data, the pth percentile is a number x such that p% of the data falls below x. Consequently, (100-p)% falls above x.

example: The median is P50, the 50th percentile.

The lower quartile, QL (Q1 on the TI-83), is P25, the 25th percentile.

The upper quartile, QU (Q3 on the TI-83), is P75,  the 75th percentile.
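To make the percentile definition concrete, the sketch below computes the percentage of a data set that falls below a given number x. The helper name `percent_below` is hypothetical and is introduced only for illustration. For small data sets many values of x can satisfy the definition, and textbooks and calculators use differing conventions to pick a single one, so this checks a candidate x rather than computing "the" percentile.

```python
def percent_below(data, x):
    """Percentage of the observations that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

S = [3, -5, 2, 1, 8, -6]

# 1.5 is the median of S found earlier; half of the data lies below it.
print(percent_below(S, 1.5))  # 50.0
```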

 

Which score would you prefer to own on the next test, P10 or P80?  Explain.

 

 

The mode of a data set is the value that occurs with the greatest frequency. Neither set S nor set S1 has a mode. Consider the set S2 = {-1, 3, 5, -1, 8, 9, -1}. The mode of S2 is -1. A set may have more than one mode. Consider S3 = {1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2, 6, ...}. The set S3 has two modes, 1 and 2. If a set has two modes, then the set is said to be bimodal. Caution must be used when considering the mode for inclusion as a summary statistic.

Section 01   Section 02   Section 03   Section 04   Section 05
100          100          100          100          200 scores of 100
98           100          100          100          194 scores of 50
96           100          100          100
92           100          100          100
90           100          100          100
88           100          100          100
86           100          100          100
86           100          100          100
82           100          100          100
81           100          100          100
80           100          50           100
79           100          50           50
75           92           50           50
73           85           50           50
68           85           50           50
54           63           50           50
32           34           50           50
                          50           50
                          50           50
                          50           50
                                       50
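The caution about the mode can be seen by computing it for a few of these sets. The following is a minimal sketch; the helper name `modes` is hypothetical, and returning an empty list when no value repeats is a choice made to match the statement that S and S1 have no mode. The Section 03 and Section 04 lists are read from the table above (ten and eleven scores of 100, respectively, each followed by ten scores of 50).

```python
from collections import Counter

def modes(data):
    """Values that occur most often; an empty list if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode, as with S and S1
    return sorted(value for value, count in counts.items() if count == highest)

S2 = [-1, 3, 5, -1, 8, 9, -1]
section_03 = [100] * 10 + [50] * 10   # ten scores of 100, ten scores of 50
section_04 = [100] * 11 + [50] * 10   # eleven scores of 100, ten scores of 50

print(modes(S2))          # [-1]
print(modes(section_03))  # [50, 100]
print(modes(section_04))  # [100]
print(sum(section_03) / len(section_03))  # 75.0
```

A single extra score of 100 changes Section 04 from bimodal to having the one mode 100, even though its mean (1600/21, about 76.2) is almost the same as the Section 03 mean of 75.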

 

 

 

The range of a set of data is the minimum value subtracted from the maximum value.
Let S2 ={-1, 3, 5, -1, 8, 9, -1}. The range of S2 is 9 - (-1) = 10.

The interquartile range, IQR, is the upper quartile minus the lower quartile: QU - QL (Q3 - Q1 on the TI-83).
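Both measures of spread can be sketched in Python. The quartile rule used here (Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle observation when n is odd) is an assumption: it is one common convention, and other textbooks, calculators, and software may give slightly different quartiles. The function names are illustrative.

```python
from statistics import median

def data_range(data):
    """Maximum value minus minimum value."""
    return max(data) - min(data)

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (needs n >= 2)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]      # excludes the middle value when n is odd
    upper = ordered[-half:]
    return median(lower), median(upper)

S2 = [-1, 3, 5, -1, 8, 9, -1]
q1, q3 = quartiles(S2)

print(data_range(S2))   # 10
print(q1, q3, q3 - q1)  # -1 8 9
```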

 

Let A = {0, 1, 2, 3, 4, 5, 500, 500, 995, 996,  997, 998, 999, 1000} 

and

let B = {0, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 1000}.

Compute the mean, median, mode and range for both sets A and B.

 

Suppose boxes A and B contain slips of paper with the entries listed above. A student selects a box, randomly draws a slip of paper from it, and receives the value written on it in dollars. Which box would you select to play: A or B?

 

 

The statistics above do not capture how differently the values in sets A and B are spread out, so we need another measure: the standard deviation. The standard deviation measures how far the data in a set typically lie from the mean. The sample standard deviation s is computed from the formula

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}.$$
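As a rough illustration, the sketch below computes the sample standard deviation for sets A and B, assuming the usual formula with n - 1 in the denominator: the square root of the sum of squared deviations from the mean, divided by n - 1. The function name `sample_std` is our own; Python's standard `statistics.stdev` uses the same n - 1 definition and serves as a cross-check.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation, dividing by n - 1 (assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))

A = [0, 1, 2, 3, 4, 5, 500, 500, 995, 996, 997, 998, 999, 1000]
B = [0] + [500] * 12 + [1000]   # 0, twelve entries of 500, then 1000

s_a = sample_std(A)
s_b = sample_std(B)

print(round(s_a, 1), round(s_b, 1))   # roughly 478.0 and 196.1
print(math.isclose(s_a, statistics.stdev(A)))
```

The much larger value for A reflects that its entries sit far from the mean, while most entries of B sit exactly at it.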

 

The TI-83/84 will quickly and easily compute many of these summary statistics.