Friday 23 March 2012


Quartile


In descriptive statistics, the quartiles of a set of values are the three points that divide the data set into four equal groups, each representing a fourth of the population being sampled. A quartile is a type of quantile.
In epidemiology, sociology and finance, the quartiles of a population are the four subpopulations defined by classifying individuals according to whether the value concerned falls into one of the four ranges defined by the three values discussed above. Thus an individual item might be described as being "in the upper quartile".

Definitions

first quartile (designated Q1) = lower quartile = splits lowest 25% of data = 25th percentile
second quartile (designated Q2) = median = cuts data set in half = 50th percentile
third quartile (designated Q3) = upper quartile = splits highest 25% of data, or lowest 75% = 75th percentile
The difference between the upper and lower quartiles is called the interquartile range.

If a data set of values is arranged in ascending order of magnitude, then:


The interquartile range is a more useful measure of spread than the range as it describes the middle 50% of the data values.
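As a rough illustration of why the interquartile range is less sensitive to extreme values than the range, the following Python sketch computes both for a small data set and for the same data with one value made extreme. The data values and the helper name spread are assumptions made for illustration. The quartiles come from Python's statistics.quantiles with its default "exclusive" method, which is only one of several conventions.

```python
from statistics import quantiles

def spread(values):
    """Return (range, interquartile range) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4)  # default method="exclusive"
    return max(values) - min(values), q3 - q1

typical = [2, 3, 4, 5, 6, 7, 9]
with_outlier = [2, 3, 4, 5, 6, 7, 90]  # hypothetical extreme value

print(spread(typical))       # range and IQR of the ordinary data
print(spread(with_outlier))  # the range grows a lot, the IQR does not
```

Only the largest value changes between the two lists, so the middle 50% of the data, and therefore the interquartile range, stays the same.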

Computing methods

There is no universal agreement on choosing the quartile values.[1]
One standard formula for locating the position of the observation at a given percentile, y, with n data points sorted in ascending order is:[2]

$L_y = n \cdot \frac{y}{100}$

Case 1: If L is a whole number, then the value will be found halfway between positions L and L+1.
Case 2: If L is a fraction, round to the nearest whole number (for example, L = 1.2 becomes 1).

Examples:

Method 1
Use the median to divide the ordered data set into two halves. Do not include the median in either half.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.

Method 2
Use the median to divide the ordered data set into two halves. If the median is a datum (as opposed to being the average of the middle two data), include the median in both halves.
The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.
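The two methods above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names are assumptions, and the test series is the seven-value series 3, 5, 2, 7, 6, 4, 9 that appears later in these notes.

```python
from statistics import median

def quartiles_method1(values):
    """Method 1: the median is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle value (the median) is skipped.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def quartiles_method2(values):
    """Method 2: if the median is a datum, it is included in both halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2:  # odd n: the median is an actual data value
        lower, upper = ordered[:half + 1], ordered[half:]
    else:      # even n: the median is an average, so split cleanly
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(ordered), median(upper)

series = [3, 5, 2, 7, 6, 4, 9]
print(quartiles_method1(series))  # Q1, Q2, Q3 under Method 1
print(quartiles_method2(series))  # Q1, Q2, Q3 under Method 2
```

For a data set with an even number of values the two methods agree, because the median is then not one of the data values.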












Example 6

Given the series:
3, 5, 2, 7, 6, 4, 9.
3, 5, 2, 7, 6, 4, 9, 1.
Calculate:
The mode, median and mean.
The average deviation, variance and standard deviation.
The quartiles 1 and 3.
The deciles 2 and 7.
The percentiles 32 and 85.


3, 5, 2, 7, 6, 4, 9.
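The first part of the exercise can be checked with a short Python sketch. It assumes that the variance and standard deviation are the population versions (dividing by the number of values) and that the average deviation is the mean absolute deviation from the mean. The helper name describe is hypothetical.

```python
from statistics import mean, median, multimode, pstdev, pvariance

def describe(series):
    """Basic descriptive statistics for a list of numbers."""
    m = mean(series)
    return {
        # multimode returns every most-frequent value; if all values
        # occur once, it returns all of them (there is no single mode).
        "modes": multimode(series),
        "median": median(series),
        "mean": m,
        "average_deviation": sum(abs(x - m) for x in series) / len(series),
        "variance": pvariance(series),  # population variance (assumption)
        "std_dev": pstdev(series),      # population standard deviation
    }

first = describe([3, 5, 2, 7, 6, 4, 9])
second = describe([3, 5, 2, 7, 6, 4, 9, 1])
for name, value in first.items():
    print(name, value)
```

Because every value in each series occurs exactly once, neither series has a single mode.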






References:

http://www.vitutor.com/statistics/descriptive/a_15.html
http://www.vitutor.com/statistics/descriptive/deciles.html
http://www.yourdictionary.com/decile
http://www.maplesoft.com/support/help/Maple/view.aspx?path=Statistics/Decile
http://www.mathsteacher.com.au/year9/ch17_statistics/06_quartiles/quartiles.htm
http://en.wikipedia.org/wiki/Quartile



Decile

A decile is one of ten equal groups into which a large set of values or statistics is divided.

It is any one of the numbers or values in a series dividing the distribution of the individuals in the series into ten groups of equal frequency.
The deciles are the nine values of the variable that divide an ordered data set into ten equal parts.
The deciles determine the values for 10%, 20%... and 90% of the data.
D5 coincides with the median.
In Maple, the Decile function computes the specified decile of the specified random variable or data set.
The first parameter can be a data set (represented as an Array), a distribution, a random variable, or an algebraic expression involving random variables.

The second parameter d is a decile or list of deciles.
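For readers without Maple, the nine deciles can be sketched with Python's standard library. This is not the Maple Decile function described above: it uses statistics.quantiles with n=10 and the "exclusive" interpolation method, which is one convention among several. The helper name deciles and the sample data (the integers 1 to 99) are assumptions chosen so the cut points are easy to read.

```python
from statistics import median, quantiles

def deciles(data):
    """Return the nine cut points D1..D9 that split data into ten parts."""
    return quantiles(data, n=10, method="exclusive")

scores = list(range(1, 100))  # hypothetical data: 1, 2, ..., 99
d = deciles(scores)
print(d)                       # nine values, D1 through D9
print(d[4] == median(scores))  # D5 coincides with the median
```

With very small data sets this method can extrapolate slightly beyond the smallest or largest value for the outer deciles, so the results are best read as estimates.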





Example 3:

Given the series:
3, 5, 2, 7, 6, 4, 9.
3, 5, 2, 7, 6, 4, 9, 1.
Calculate:
The mode, median and mean.
The average deviation, variance and standard deviation.
The quartiles 1 and 3.
The deciles 2 and 7.
The percentiles 32 and 85.


3, 5, 2, 7, 6, 4, 9.






References:

http://www.wordnik.com/words/decile
http://www.math.unb.ca/~knight/BasicStat/quartilx.htm
http://www.vitutor.com/statistics/descriptive/a_15.html
http://www.vitutor.com/statistics/descriptive/deciles.html
http://www.yourdictionary.com/decile
http://www.maplesoft.com/support/help/Maple/view.aspx?path=Statistics/Decile


Percentiles

In statistics, a percentile (or centile) is the value of a variable below which a certain percent of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found. The term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced tests.
The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as the third quartile (Q3).
There is no universally accepted definition of a percentile. Using the 65th percentile as an example, the 65th percentile can be defined as the lowest score that is greater than 65% of the scores. This is the way we defined it above and we will call this "Definition 1". The 65th percentile can also be defined as the smallest score that is greater than or equal to 65% of the scores. This we will call "Definition 2". Unfortunately, these two definitions can lead to dramatically different results, especially when there is relatively little data. Moreover, neither of these definitions is explicit about how to handle rounding. For instance, what score is required to be higher than 65% of the scores when the total number of scores is 50? This is tricky because 65% of 50 is 32.5. How do we find the lowest number that is higher than 32.5 of the scores? A third way to compute percentiles (presented below), is a weighted average of the percentiles computed according to the first two definitions. This third definition handles rounding more gracefully than the other two and has the advantage that it allows the median (discussed later) to be defined conveniently as the 50th percentile.
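The difference between Definition 1 and Definition 2 can be made concrete with a small Python sketch. The function names and the eight scores are assumptions for illustration, and the code reads "greater than P% of the scores" as "at least P% of the scores lie strictly below", which is one possible interpretation. Integer arithmetic is used to avoid rounding problems in the comparison.

```python
def percentile_def1(scores, p):
    """Definition 1: lowest score that is greater than p% of the scores."""
    n = len(scores)
    for x in sorted(scores):
        below = sum(1 for s in scores if s < x)
        if below * 100 >= p * n:
            return x
    return None  # no score is greater than p% of the scores

def percentile_def2(scores, p):
    """Definition 2: smallest score greater than or equal to p% of the scores."""
    n = len(scores)
    for x in sorted(scores):
        at_or_below = sum(1 for s in scores if s <= x)
        if at_or_below * 100 >= p * n:
            return x
    return None

scores = [3, 5, 7, 8, 9, 11, 13, 15]  # hypothetical data set
print(percentile_def1(scores, 25))
print(percentile_def2(scores, 25))
```

With only eight scores, the two definitions already return different values for the 25th percentile.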


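One common definition computes the rank $n = \frac{P}{100} \times N + \frac{1}{2}$, rounds it to the nearest integer, and takes the value with that rank. For a sorted list of N = 5 values whose second value is 20 and whose third value is 35, the rank of the 40th percentile is $n = \frac{40}{100} \times 5 + \frac{1}{2} = 2.5$, so the 40th percentile would be the third number (since 2.5 rounds up to 3), or 35.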
The 100th percentile is defined to be the largest value. (In this case we do not use the above definition with P=100, because the rank n would be greater than the number N of values in the original list.)
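A nearest-rank percentile of this kind can be sketched in Python. The sketch assumes the rank formula n = P/100 × N + 1/2 with halves rounded up, and it treats the 100th percentile as the largest value, as stated above. The function name and the five-value list are assumptions. Only the values 20 and 35 at ranks 2 and 3 come from the text.

```python
def nearest_rank_percentile(values, p):
    """Nearest-rank percentile, assuming rank = P/100 * N + 1/2, rounded."""
    ordered = sorted(values)
    n_values = len(ordered)
    if p >= 100:
        return ordered[-1]  # the 100th percentile is the largest value
    # Rounding P*N/100 + 1/2 half-up equals floor(P*N/100) + 1.
    rank = int(p * n_values // 100) + 1
    return ordered[rank - 1]  # ranks start at 1, list indexes at 0

numbers = [15, 20, 35, 40, 50]  # hypothetical list
print(nearest_rank_percentile(numbers, 40))
print(nearest_rank_percentile(numbers, 100))
```

Python's built-in round is avoided here because it rounds halves to the nearest even integer, which would turn a rank of 2.5 into 2 rather than 3.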

Linear interpolation between closest ranks

An alternative to rounding used in many applications is to use linear interpolation between the two nearest ranks.
In particular, given the N sorted values $v_1 \le v_2 \le \dots \le v_N$, we define the percent rank corresponding to the nth value as:

$p_n = \frac{100}{N}\left(n - \frac{1}{2}\right)$


The interpolated 40th percentile of the example list, 27.5, is halfway between 20 and 35, which one would expect since the rank was calculated above as 2.5.
It is readily confirmed that the 50th percentile of any list of values according to this definition of the P-th percentile is just the sample median.
Moreover, when N is even the 25th percentile according to this definition of the P-th percentile is the median of the first N/2 values (i.e., the median of the lower half of the data).
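Linear interpolation between closest ranks can be sketched in Python as follows. The sketch assumes the percent rank of the nth sorted value is 100/N × (n − 1/2), and that percentiles below the first or above the last percent rank are clamped to the smallest or largest value. Both are assumptions of this sketch, as are the function name and the five-value list (only 20 and 35 at ranks 2 and 3 come from the text).

```python
def percentile_linear(values, p):
    """Percentile by linear interpolation between the two closest ranks."""
    v = sorted(values)
    n = len(v)
    # Percent rank of each sorted value: p_k = 100/N * (k - 1/2), k = 1..N.
    ranks = [100 / n * (k - 0.5) for k in range(1, n + 1)]
    if p <= ranks[0]:
        return v[0]   # clamp below the first percent rank (assumption)
    if p >= ranks[-1]:
        return v[-1]  # clamp above the last percent rank (assumption)
    for k in range(n - 1):
        if ranks[k] <= p <= ranks[k + 1]:
            fraction = (p - ranks[k]) / (ranks[k + 1] - ranks[k])
            return v[k] + fraction * (v[k + 1] - v[k])

numbers = [15, 20, 35, 40, 50]  # hypothetical list
print(percentile_linear(numbers, 40))        # between 20 and 35
print(percentile_linear(numbers, 50))        # the sample median
print(percentile_linear([2, 3, 5, 9], 25))   # median of the lower half
```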

Weighted percentile

In addition to the percentile function, there is also a weighted percentile, where the percentage of the total weight is counted instead of the percentage of the total number of values. There is no standard function for a weighted percentile. One method extends the above approach in a natural way.



Applications

When ISPs bill "burstable" internet bandwidth, the 95th or 98th percentile usually cuts off the top 5% or 2% of bandwidth peaks in each month, and then bills at the nearest rate. In this way infrequent peaks are ignored, and the customer is charged in a fairer way. The reason this statistic is so useful in measuring data throughput is that it gives a very accurate picture of the cost of the bandwidth. The 95th percentile says that 95% of the time, the usage is below this amount. Just the same, the remaining 5% of the time, the usage is above that amount.
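The billing idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any particular provider's method: the function name, the sample values, and the choice to discard exactly the top 5% of samples (rounded down) and bill at the highest remaining sample are all assumptions.

```python
def burstable_rate(samples_mbps, percentile=95):
    """Highest usage sample left after discarding the top (100 - percentile)%."""
    if not samples_mbps:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(samples_mbps)
    # Number of peak samples to ignore, rounded down (assumption).
    n_drop = len(ordered) * (100 - percentile) // 100
    kept = ordered[:len(ordered) - n_drop]
    return kept[-1]

# Hypothetical month: steady 10 Mbps usage with a few short bursts.
month = [10] * 95 + [1000] * 5
print(burstable_rate(month))      # the bursts are ignored
print(burstable_rate(month, 98))  # a 98th-percentile cutoff includes some bursts
```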
Physicians will often use infant and children's weight and height percentile to assess their growth in comparison to national averages.

The normal curve and percentiles

The methods given above are approximations for use in small-sample statistics. In general terms, for very large populations percentiles may often be represented by reference to a normal curve plot. The normal curve is plotted along an axis scaled to standard deviation, or sigma, units. Mathematically, the normal curve extends to negative infinity on the left and positive infinity on the right. Note, however, that a very small portion of individuals in a population will fall outside the −3 to +3 range.

In humans, for example, a small portion of all people can be expected to fall above the +3 sigma height level.

Percentiles represent the area under the normal curve, increasing from left to right. Each standard deviation represents a fixed percentile. Thus, rounding to two decimal places, −3 is the 0.13th percentile, −2 the 2.28th percentile, −1 the 15.87th percentile, 0 the 50th percentile (both the mean and median of the distribution), +1 the 84.13th percentile, +2 the 97.72nd percentile, and +3 the 99.87th percentile. Note that the 0th percentile falls at negative infinity and the 100th percentile at positive infinity.
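The percentiles quoted above can be reproduced with Python's standard library, which provides the normal cumulative distribution function through statistics.NormalDist. The helper name sigma_to_percentile is an assumption made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def sigma_to_percentile(z):
    """Percentage of a normal population lying below z standard deviations."""
    return 100 * standard_normal.cdf(z)

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{z:+d} sigma -> {sigma_to_percentile(z):.2f}th percentile")
```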

Examples:

EXAMPLE 1

Consider the 25th percentile for the 8 numbers in the table. Notice the numbers are given ranks ranging from 1 for the lowest number to 8 for the highest number.


The first step is to compute the rank (R) of the 25th percentile. This is done using the following formula:

$R = \frac{P}{100} \times (N + 1)$

where P is the desired percentile (25 in this case) and N is the number of numbers (8 in this case). Therefore,

$R = \frac{25}{100} \times (8 + 1) = \frac{9}{4} = 2.25$

If R were an integer, the Pth percentile would be the number with rank R. When R is not an integer, we compute the Pth percentile by interpolation as follows:

Define IR as the integer portion of R (the number to the left of the decimal point). For this example, IR=2.

Define FR as the fractional portion of R. For this example, FR=0.25.

Find the scores with Rank IR and with Rank IR+1. For this example, this means the score with Rank 2 and the score with Rank 3. The scores are 5 and 7.

Interpolate by multiplying the difference between the scores by FR and adding the result to the lower score. For these data, this is 0.25×(7−5)+5=5.5.

Therefore, the 25th percentile is 5.5. If we had used the first definition (the smallest score greater than 25% of the scores), the 25th percentile would have been 7. If we had used the second definition (the smallest score greater than or equal to 25% of the scores), the 25th percentile would have been 5.

EXAMPLE 2

For a second example, consider the 20 quiz scores in the table.


We will compute the 25th and the 85th percentiles. For the 25th,

$R = \frac{25}{100} \times (20 + 1) = \frac{21}{4} = 5.25$

IR=5

FR=0.25

Since the score with a rank of IR (which is 5) and the score with a rank of IR+1 (which is 6) are both equal to 5, the 25th percentile is 5. In terms of the formula:
The 25th percentile equals

0.25×(5−5)+5=5

For the 85th percentile,

$R = \frac{85}{100} \times (20 + 1) = 17.85$

IR=17

FR=0.85

CAUTION:

FR does not generally equal the percentile to be computed as it does here.

The score with a rank of 17 is 9 and the score with a rank of 18 is 10. Therefore, the 85th percentile is:

0.85×(10−9)+9=9.85

Let's consider the 50th percentile of the numbers 2, 3, 5, 9.

$R = \frac{50}{100} \times (4 + 1) = 2.5$

IR=2

FR=0.5

The score with a rank of IR is 3 and the score with a rank of IR+1 is 5. Therefore, the 50th percentile is:

0.5×(5−3)+3=4


EXAMPLE 3:

Finally, consider the 50th percentile of the numbers 2, 3, 5, 9, 11.

$R = \frac{50}{100} \times (5 + 1) = 3$

IR=3

FR=0

Whenever FR=0, you simply find the number with rank IR. In this case, the third number is equal to 5, so the 50th percentile is 5. You will also get the right answer if you apply the general formula:

The 50th percentile equals

0.00×(9−5)+5=5
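The rank-and-interpolate procedure used in Examples 1 to 3 can be sketched in Python as follows. The function name is an assumption, as is the handling of ranks that fall below 1 or at or above N (the sketch simply returns the smallest or largest value). The eight-value list is hypothetical and was chosen only so that the scores with ranks 2 and 3 are 5 and 7, as in Example 1.

```python
import math

def percentile_interpolated(values, p):
    """Percentile via R = P/100 * (N + 1), interpolating between ranks."""
    ordered = sorted(values)
    n = len(ordered)
    r = p / 100 * (n + 1)  # rank of the desired percentile
    ir = math.floor(r)     # integer portion of R
    fr = r - ir            # fractional portion of R
    if ir < 1:
        return ordered[0]   # rank below the first value (assumption)
    if ir >= n:
        return ordered[-1]  # rank at or beyond the last value (assumption)
    lower = ordered[ir - 1]  # score with rank IR
    upper = ordered[ir]      # score with rank IR + 1
    return lower + fr * (upper - lower)

print(percentile_interpolated([3, 5, 7, 8, 9, 11, 13, 15], 25))  # hypothetical data
print(percentile_interpolated([2, 3, 5, 9], 50))
print(percentile_interpolated([2, 3, 5, 9, 11], 50))
```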

Example 4:

The width of a suitcase handle that fits 99% of the adult population follows from the 99th percentile of hand breadth:

P99 hand breadth = 83 + 2.33 × 6.9 = 99 mm

An extra 2 cm also leaves some margin for the largest hands. That makes 12 cm.


To calculate other percentiles, you can look up the corresponding Z-value in a Z-table. First, search for the desired percentile among the numbers in the body of the table. The bold numbers on the outside then give the Z-value.


Example 5:

Percentile 17 of hip breadth.

In the Z-table, 17.11 is the entry closest to 17. The corresponding Z-value is then −0.95.

P17 = 387 − 0.95 × 35 = 354 mm
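Instead of a printed Z-table, the Z-value for a percentile can be obtained from the inverse normal distribution function in Python's standard library. The sketch below assumes the body measurements are normally distributed and reads the numbers in Examples 4 and 5 as a mean and a standard deviation in millimetres (83 and 6.9 for hand breadth, 387 and 35 for hip breadth). The function name is hypothetical.

```python
from statistics import NormalDist

def value_at_percentile(mean_mm, sd_mm, percentile):
    """Measurement below which the given percentage of the population falls."""
    z = NormalDist().inv_cdf(percentile / 100)  # Z-value for the percentile
    return mean_mm + z * sd_mm

print(round(value_at_percentile(83, 6.9, 99)))  # P99 hand breadth in mm
print(round(value_at_percentile(387, 35, 17)))  # P17 hip breadth in mm
```

The computed Z-values are slightly more precise than the two-decimal table entries, but the results agree with the worked examples after rounding to whole millimetres.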


Example 6:

A man with a body height of 1.92 m has the following Z-value:

Z = (1920 − 1706) / 94 = +2.28

In the Z-table, the row for 2.2 and the column for 0.08 give the percentile 98.87. This means that 98.87% of the population is shorter.

In a kitchen with a 90 cm high counter, the lowest point of the sink is 75 cm above the floor.

The percentile of the corresponding fist height determines how many adults will have to bend over.

Z = (750 − 766) / 43 = −0.37

This Z-value corresponds to the 36th percentile of fist height. This means that everyone taller than that, 64% of adults, will wash up at a height lower than their fist and will have to bend forward.
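The reverse lookup, from a measurement to its percentile, can be sketched the same way. The sketch assumes normally distributed measurements and rounds the Z-value to two decimals to mimic a printed Z-table. The function name is hypothetical, and the means and standard deviations are the ones used in the calculations above.

```python
from statistics import NormalDist

def percentile_of_value(value_mm, mean_mm, sd_mm):
    """Return (Z-value, percentile) for a measurement in millimetres."""
    # Two decimals, matching the resolution of a printed Z-table.
    z = round((value_mm - mean_mm) / sd_mm, 2)
    return z, 100 * NormalDist().cdf(z)

z_height, pct_height = percentile_of_value(1920, 1706, 94)
z_fist, pct_fist = percentile_of_value(750, 766, 43)
print(z_height, round(pct_height, 2))  # body height of 1.92 m
print(z_fist, round(pct_fist))         # fist height of 75 cm
```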

References:

http://www.dinbelg.be/formulas.htm
http://cnx.org/content/m10805/latest/
http://en.wikipedia.org/wiki/Percentile


Research done by Cyril Vance Litonjua

Supervised by Professor Crisencio M. Paner