Methods for Detecting the Outliers in Large Samples
Dr. Ramnath Takiar
Abstract

Outliers are observations that lie at an abnormal distance from the other observations. Their presence in data may result in a skewed distribution, altered kurtosis, and an inflated mean and standard deviation. In any statistical data analysis, the presence of outliers often poses a problem.

For the present study, all observations lying outside the range of a sample are taken as outliers. This is almost equivalent to claiming that all observations lying beyond Mean ± 3SD are outliers. In a sample, the minimum and maximum values are replaced by still lower and higher values, respectively, and treated as outliers. Overall, 200 outliers are introduced, spread over four samples, five sample sizes and five trials. The five sample sizes chosen are 40, 60, 100, 200 and 300.

For the present study, three methods are used for the detection of outliers. Each method defines a Lower Fence (LF) and a Higher Fence (HF) as follows, where n is the sample size. IQR-Old method: LF = Q1 - 1.5*IQR and HF = Q3 + 1.5*IQR. IQR-Takiar method: LF = Q1 - IQR*[0.25*ln(n) + 0.20] and HF = Q3 + IQR*[0.25*ln(n) + 0.20]. SD-Range-Takiar method: LF = Mean - SD*[0.37*ln(n) + 0.86] and HF = Mean + SD*[0.37*ln(n) + 0.86].
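The three fence definitions can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names are hypothetical, the quartiles use the default ("exclusive") convention of `statistics.quantiles`, and SD is taken as the sample standard deviation, since the abstract specifies neither convention.

```python
import math
import statistics


def quartiles(data):
    """Return (Q1, Q3). Assumption: Python's default 'exclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q1, q3


def iqr_old_fences(data):
    """IQR-Old method: fixed multiplier of 1.5 on the IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_takiar_multiplier(n):
    """Sample-size-dependent IQR multiplier: 0.25*ln(n) + 0.20."""
    return 0.25 * math.log(n) + 0.20


def iqr_takiar_fences(data):
    """IQR-Takiar method: the IQR multiplier grows with ln(n)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    k = iqr_takiar_multiplier(len(data))
    return q1 - k * iqr, q3 + k * iqr


def sd_range_takiar_multiplier(n):
    """Sample-size-dependent SD multiplier: 0.37*ln(n) + 0.86."""
    return 0.37 * math.log(n) + 0.86


def sd_range_takiar_fences(data):
    """SD-Range-Takiar method. Assumption: SD is the sample (n-1) standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    k = sd_range_takiar_multiplier(len(data))
    return mean - k * sd, mean + k * sd


def find_outliers(data, fences):
    """Return the observations lying strictly outside (LF, HF)."""
    lf, hf = fences
    return [x for x in data if x < lf or x > hf]


if __name__ == "__main__":
    # Toy sample of size 40 whose maximum has been replaced by a higher value.
    sample = list(range(1, 40)) + [200]
    for name, fn in [("IQR-Old", iqr_old_fences),
                     ("IQR-Takiar", iqr_takiar_fences),
                     ("SD-Range-Takiar", sd_range_takiar_fences)]:
        lf, hf = fn(sample)
        print(f"{name}: LF={lf:.2f}, HF={hf:.2f}, "
              f"outliers={find_outliers(sample, (lf, hf))}")
```

Under these formulas the IQR-Takiar multiplier equals 1.5 at n of about 181, so its fences are narrower than the IQR-Old fences for sample sizes 40, 60 and 100, and slightly wider for 200 and 300.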

Out of the 200 outliers introduced, the IQR-Old method detected only 91 (45.5%) and the SD-Range Takiar method detected 69 (34.5%), while the IQR-Takiar method correctly detected 122 (61.0%). Based on these results, the IQR-Takiar method is adjudged superior to the other two methods and is recommended for the detection of outliers in large samples.

Key Words: IQR-Old method, IQR-Takiar method, SD-Range Takiar method, Large samples, Outliers, Outlier detection rate
