18. Deviation and Standard Deviation.

In the previous post, we briefly touched upon the concepts of deviation and standard deviation. This post is dedicated to learning these two parameters in greater detail.


DEVIATION

This is a simple concept to understand. Deviation tells us how far an individual value in a given set lies from the average (mean) value of that set. The heights of five girls (4, 5, 4, 6 and 5 feet) are shown in the figure below.

Mean = (4+5+4+6+5)/5 = 4.8
∴ The average (mean) height is 4.8 feet.
So, how much does Girl 4 deviate from the mean? The height of the fourth girl is 6 feet.
∴ Deviation = 6 – 4.8 = 1.2 feet. So, Girl 4 is 1.2 feet taller than the average girl in this group. This is how deviation tells us how far a value lies from the average: a positive deviation means the value is above the average, and a negative deviation means it is below.
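The same arithmetic can be repeated for every girl. Below is a minimal Python sketch (the variable names are my own, chosen for illustration) that computes the mean of the five heights and each girl's deviation from it.

```python
# Heights (in feet) of the five girls from the example above
heights = [4, 5, 4, 6, 5]

# Mean = sum of the values divided by how many values there are
mean = sum(heights) / len(heights)

# Deviation of each value = value - mean
# (positive: above average, negative: below average);
# rounded to 2 decimals to hide floating-point noise
deviations = [round(h - mean, 2) for h in heights]

print(f"Mean = {mean}")
for girl, (h, d) in enumerate(zip(heights, deviations), start=1):
    print(f"Girl {girl}: height = {h} ft, deviation = {d:+} ft")
```

Girl 4's deviation comes out as +1.2 ft, matching the calculation above. Notice that the positive and negative deviations cancel out and add up to zero, which is one reason the standard deviation squares them before averaging.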

We use this concept in our daily lives too. Consider a student scoring 89% on his exams. 89 percentile is absolutely a very good score! However, if the average score in that class is 98% then 89 doesn’t seem to be a very good number as it deviates from the average score in the negative direction!

Conversely, if a student scores 67% in a class with an average of 40%, his deviation is 67 – 40 = +27 points, so this student seems to be doing pretty well for himself! Together, the mean and the deviation help us put a value in the right perspective.


STANDARD DEVIATION

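This is an extremely useful concept, widely used by professionals in many fields. In the earlier post it was loosely described as the ‘mean of the mean’; more precisely, the standard deviation measures how far the values typically lie from the mean (roughly, it is the square root of the average squared deviation). To understand this concept, we need to revise what we learnt earlier about the NORMAL DISTRIBUTION CURVE in post 14.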

Standard deviation is a quantity that tells us how closely our values are clustered around the mean, or how widely they are spread away (dispersed) from it. If most values are near the mean, we get a tall, steep bell-shaped curve. If the values are spread out, we get a flatter, wider curve, as follows –

We have already studied the formula for Standard deviation (s/σ) in the earlier post.

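$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
where x̄ is the mean of the n values in the set and x_i is the i-th value.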

Consider three sets of values given below –

SET 1 ⇒ {0, 13, 15, 0}    Mean = 7    Standard Deviation = 8.124 ≈ 8.1
SET 2 ⇒ {0, 8, 15, 5}     Mean = 7    Standard Deviation = 6.272 ≈ 6.3
SET 3 ⇒ {6, 6, 8, 8}      Mean = 7    Standard Deviation = 1.155 ≈ 1.2

(Note – I have simply plugged the respective values into the formula for the standard deviation (s) above.)
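To see the "plugging in" step explicitly, here is a minimal Python sketch of the formula above. `sample_std` is a hypothetical helper written for this post; it divides by n – 1 (the sample standard deviation), which is what reproduces the values listed for the three sets, and it cross-checks each result against `statistics.stdev` from Python's standard library.

```python
import math
import statistics

# The three example sets from the text
sets = {
    "SET 1": [0, 13, 15, 0],
    "SET 2": [0, 8, 15, 5],
    "SET 3": [6, 6, 8, 8],
}


def sample_std(values):
    """Sample standard deviation s: square root of the sum of squared
    deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


for name, values in sets.items():
    s = sample_std(values)
    # Cross-check against the standard library's sample standard deviation
    assert math.isclose(s, statistics.stdev(values))
    print(f"{name}: mean = {statistics.mean(values)}, s = {s:.3f}")
```

It should print standard deviations of about 8.124, 6.272 and 1.155, each with a mean of 7. Dividing by n instead of n – 1 (as `statistics.pstdev` does) gives the population standard deviation σ, which is slightly smaller for sets this small.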

If we look at the values above, we see that although the mean of all three sets is the same, their standard deviations differ. The mean tells us where the centre of the set of values is, and the standard deviation tells us how the values in the set are spread around that mean.

In the first set, all the values are far away from the mean (7), so the standard deviation is large. In the second set, the values are, on the whole, closer to 7, so the standard deviation is smaller. In the third set, all the values are close to the mean, so the standard deviation is very small (just 1.15). We should therefore expect a tall bell-shaped curve for SET 3 (low standard deviation) and a spread-out one (high standard deviation) for SET 1.
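We can put rough numbers on "tall" versus "spread out". The sketch below assumes, purely for illustration, that SET 1 and SET 3 each came from a normal distribution with mean 7 and the standard deviation found above (four values are far too few to actually establish that), and uses Python's `statistics.NormalDist` (Python 3.8+) to compare the heights of the two bell curves.

```python
from statistics import NormalDist

# Illustrative assumption: treat each set as if it came from a normal
# distribution with mean 7 and the standard deviation computed above.
curves = {
    "SET 1 (s = 8.124)": NormalDist(mu=7, sigma=8.124),
    "SET 3 (s = 1.155)": NormalDist(mu=7, sigma=1.155),
}

for name, curve in curves.items():
    peak = curve.pdf(7)   # height of the bell curve at the mean
    tail = curve.pdf(15)  # height 8 units above the mean
    print(f"{name}: height at mean = {peak:.4f}, height at 15 = {tail:.4f}")
```

The SET 3 curve comes out about seven times taller at the mean (the peak height of a normal curve is inversely proportional to its standard deviation, and 8.124 / 1.155 ≈ 7), while the SET 1 curve keeps far more height well away from the mean.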

To understand this better, see the curve below –

[Figure: Gaussian curve]

In the above curve –

The purple region covers values within one standard deviation of the mean (mean ± 1σ). This is where most values in the set lie, so it shows where the majority of the population lies. These values are close to the mean.

The green region covers values between one and two standard deviations from the mean (out to mean ± 2σ). Values here lie somewhat farther from the mean. For a normal distribution curve, this area is smaller than the purple area.

The blue region covers values between two and three standard deviations from the mean (out to mean ± 3σ). Values here are very far from the mean, so this region represents extreme cases, which are rare. Its area is therefore very small (such values occur with very low frequency).

Area under the purple region ≈ 68%, i.e. for normally distributed data about 68% of the values lie within one standard deviation of the mean. These values are considered to be normal/average values.
Area under the purple + green regions ≈ 95%, i.e. about 95% of the values lie within two standard deviations of the mean.
Area under the purple + green + blue regions ≈ 99.7%, i.e. almost all the values lie within three standard deviations of the mean. The blue region alone therefore holds only a few percent of the values (roughly 99.7% – 95%), and these depart strongly from the average; the remaining ≈ 0.3% lie even farther out, beyond three standard deviations.
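These percentages (often called the 68–95–99.7 rule) can be checked with a short Python sketch using `statistics.NormalDist`. It assumes a perfectly normal distribution; the colour names simply follow the figure description above.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
# Any normal curve gives the same percentages once distances are measured in σ.
standard_normal = NormalDist(mu=0, sigma=1)

previous = 0.0
for k, colour in ((1, "purple"), (2, "green"), (3, "blue")):
    # Fraction of values lying within k standard deviations of the mean
    within = standard_normal.cdf(k) - standard_normal.cdf(-k)
    # Fraction lying in the band between (k - 1)σ and kσ, on both sides
    band = within - previous
    print(f"within mean ± {k}σ: {within:.2%}  ({colour} band alone: {band:.2%})")
    previous = within
```

It should report roughly 68.27%, 95.45% and 99.73%, with the green and blue bands alone holding about 27% and about 4% of the values.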

Do you know how the discovery of the Higgs boson was announced at CERN? It was as follows –

“We have observed a new boson with a mass of 125.3 ± 0.6 GeV at 4.9 σ significance!”
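To get a feel for what "4.9 σ significance" means, the sketch below computes how likely it is for a normally distributed quantity to land at least that many standard deviations above its mean purely by chance. It assumes the one-sided convention used in particle physics, and it shows 5 σ for comparison because that is the conventional threshold for claiming a discovery in that field.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided tail probability: chance of a value lying at least z standard
# deviations ABOVE the mean. By symmetry this equals cdf(-z), which avoids
# the rounding loss of computing 1 - cdf(z) for large z.
for z in (3.0, 4.9, 5.0):
    p = standard_normal.cdf(-z)
    print(f"{z} σ: p ≈ {p:.2e}  (about 1 in {1 / p:,.0f})")
```

For 4.9 σ this comes out at roughly 5 × 10⁻⁷, i.e. about a one-in-two-million chance of such an excess appearing by chance alone.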

This is how the concept of standard deviation is used in the scientific world. I hope this post has made it clear. In my next post, we will start discussing a new concept: significant numbers. Till then,

Be a perpetual student of life and keep learning …

Good day!

