Methods for Detecting the Outliers in Large Samples
Dr. Ramnath Takiar
Abstract
Outliers are observations that lie at an abnormal distance from the other observations. The presence of outliers in data may often result in a skewed distribution, altered kurtosis, and an inflated mean and standard deviation. In any statistical data analysis, the presence of outliers often poses a problem.
For the present study, all observations lying outside the range of a sample are taken as outliers. This is almost equivalent to claiming that all observations lying beyond Mean±3SD are outliers. In each sample, the minimum and maximum values are replaced by still lower and higher values, respectively, and these replacements are treated as outliers. Overall, 200 outliers are introduced, spread over four samples, five sample sizes and five trials. The five sample sizes chosen are 40, 60, 100, 200 and 300.
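The replacement step described above can be sketched as follows. This is a minimal illustration, not the author's procedure: the abstract does not say how much lower or higher the replacement values are, so the function name `inject_outliers` and the `shift` parameter are hypothetical. If each sample receives two such outliers, then 4 x 5 x 5 = 100 samples would account for the 200 outliers reported, but that reading is an inference from the abstract.

```python
def inject_outliers(sample, shift):
    """Return a copy of `sample` whose minimum is lowered and whose
    maximum is raised by `shift`.

    `shift` is a hypothetical parameter: the abstract does not state
    how far the replacement values lie from the original extremes.
    """
    out = list(sample)                 # leave the caller's data untouched
    lo_idx = out.index(min(out))       # position of the first minimum
    hi_idx = out.index(max(out))       # position of the first maximum
    out[lo_idx] = out[lo_idx] - shift  # still lower value
    out[hi_idx] = out[hi_idx] + shift  # still higher value
    return out


original = [5, 1, 9, 3]
modified = inject_outliers(original, shift=10)
print(modified)
```

Both replaced values lie outside the range of the original sample, which matches the definition of an outlier used in this study.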
For the present study, three methods are used for the detection of outliers. For each method, the Lower Fence (LF) and Higher Fence (HF) values are defined as follows, where n is the sample size:
IQR-Old method: LF = Q1 - 1.5*IQR and HF = Q3 + 1.5*IQR.
IQR-Takiar method: LF = Q1 - IQR*[0.25*ln(n) + 0.20] and HF = Q3 + IQR*[0.25*ln(n) + 0.20].
SD-Range-Takiar method: LF = Mean - SD*(0.37*ln(n) + 0.86) and HF = Mean + SD*(0.37*ln(n) + 0.86).
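The three fence definitions can be computed with a short Python sketch. It rests on assumptions the abstract does not settle: quartiles come from `statistics.quantiles` with its default "exclusive" method, SD is the sample standard deviation (n - 1 denominator), and the names `fence_multipliers`, `fences` and `detect_outliers` are illustrative only.

```python
import math
import statistics


def fence_multipliers(n):
    """Multiplier applied to IQR (or SD) by each method for sample size n."""
    return {
        "iqr_old": 1.5,
        "iqr_takiar": 0.25 * math.log(n) + 0.20,       # math.log is ln
        "sd_range_takiar": 0.37 * math.log(n) + 0.86,
    }


def fences(sample):
    """Return {method: (LF, HF)} for the three methods."""
    n = len(sample)
    # Assumption: default 'exclusive' quartile method; the paper's
    # quartile convention is not stated in the abstract.
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # assumption: sample SD (n - 1)
    k = fence_multipliers(n)
    return {
        "iqr_old": (q1 - k["iqr_old"] * iqr, q3 + k["iqr_old"] * iqr),
        "iqr_takiar": (q1 - k["iqr_takiar"] * iqr,
                       q3 + k["iqr_takiar"] * iqr),
        "sd_range_takiar": (mean - k["sd_range_takiar"] * sd,
                            mean + k["sd_range_takiar"] * sd),
    }


def detect_outliers(sample, method):
    """Return the observations lying below LF or above HF."""
    lf, hf = fences(sample)[method]
    return [x for x in sample if x < lf or x > hf]


# Illustrative data only: 1..39 plus one extreme value (n = 40).
sample = list(range(1, 40)) + [1000]
for name in ("iqr_old", "iqr_takiar", "sd_range_takiar"):
    print(name, detect_outliers(sample, name))
```

The IQR-Takiar multiplier grows with ln(n): it is below the fixed 1.5 of the IQR-Old method at n = 40 and stays below it up to about n = 180, so over that range its fences are narrower; beyond that point the multiplier exceeds 1.5 and the fences become wider.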
Out of the 200 outliers introduced, the IQR-Old method detected only 91 (45.5%) and the SD-Range-Takiar method detected 69 (34.5%), while the IQR-Takiar method correctly detected 122 (61.0%). Based on these results, the IQR-Takiar method is judged superior to the other two methods for detecting outliers in large samples and is recommended for that purpose.



