Basic Stats for EDA (Exploratory Data Analysis)

Anjani Suman
2 min read · Jan 20, 2022
  1. Mean
  2. Median
  3. Variance and Standard Deviation
  4. Percentile and Quantile
  5. IQR (Inter Quartile Range) and MAD (Median Absolute Deviation)

Let’s understand all of them in detail:

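  1. Mean: The mean gives us information about central tendency (the average value). It is the sum of all values divided by the number of values.

The problem with the mean is that it is sensitive to outliers: even a single extreme value in the data can shift it noticeably.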
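
To make the outlier problem concrete, here is a minimal sketch using Python's standard `statistics` module. The height values are made up for illustration and are not taken from any real dataset.

```python
from statistics import mean

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]

# The same data with one unrealistic value added as an outlier.
heights_with_outlier = heights + [250]

mean_clean = mean(heights)
mean_outlier = mean(heights_with_outlier)

# A single extreme value pulls the mean well above every typical height.
print("mean without outlier:", mean_clean)
print("mean with outlier:   ", mean_outlier)
```

In this toy data, the mean with the outlier ends up larger than all five of the original heights, so it no longer describes a "typical" student.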

2. Median: The median gives us the middle value of a sample or population once the values are sorted.

It is not affected by outliers unless roughly 50% of the data consists of outliers, which is unlikely to happen.
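
As a rough illustration of that robustness, the sketch below compares the median of a small sample before and after adding an outlier. The heights are hypothetical values chosen for the example.

```python
from statistics import median

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]
heights_with_outlier = heights + [250]

# For an odd number of values the median is the middle one;
# for an even number it is the average of the two middle ones.
median_clean = median(heights)
median_outlier = median(heights_with_outlier)

print("median without outlier:", median_clean)
print("median with outlier:   ", median_outlier)
```

Here the median moves only slightly when the outlier is added, while the mean of the same data would jump by more than 15 cm.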

3. Variance and Standard Deviation:

Variance: It tells us how far the data deviates from the mean, measured as the average squared distance of the data points from the mean. We square the distances so that negative and positive deviations do not cancel out. We could use the absolute distance instead, but that gives a different measure (the mean absolute deviation), not the variance.

Why do we need standard deviation if we already have variance?

To express the spread in the same unit as the data.

Example:

If the height of students is in cm, then the mean is in cm, but the variance is in cm² because the formula squares the distances. The standard deviation is the square root of the variance, so it is in cm again. The mean and the standard deviation are therefore on the same scale.
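
The units argument can be checked with a short sketch. It assumes the population formulas (dividing by N), computed with `pvariance` and `pstdev` from Python's `statistics` module; the sample versions `variance` and `stdev` divide by N - 1 instead. The heights are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]

mu = mean(heights)  # in cm

# Variance by hand: average of the squared distances from the mean (cm^2).
squared_distances = [(h - mu) ** 2 for h in heights]
variance_manual = sum(squared_distances) / len(heights)

# The same quantities from the standard library.
variance_lib = pvariance(heights)  # cm^2
std_lib = pstdev(heights)          # cm, same unit as the mean

print("mean (cm):", mu)
print("variance (cm^2):", variance_manual, variance_lib)
print("std dev (cm):", sqrt(variance_manual), std_lib)
```

Because the standard deviation is back in cm, a statement like "most heights lie within a few cm of the mean" can be read directly from it, which is not possible with the variance.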

4. Percentile and Quantile:

Percentile:

The value below which a given percentage of the data falls. For example, the 90th percentile is the value below which 90% of the data falls.

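Quantile:

A quantile is a cut point that splits the sorted data into equal-sized groups. Quartiles are the quantiles that split the data into four quarters.

Each quartile therefore marks off another 25% of the data, so: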

  • Quartile 1 (Q1) can be called the 25th percentile
  • Quartile 2 (Q2) can be called the 50th percentile
  • Quartile 3 (Q3) can be called the 75th percentile
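
A small sketch of the quartiles, using `statistics.quantiles` from the Python standard library (available from Python 3.8). It assumes the `"inclusive"` method, which interpolates linearly between sorted data points; other methods and libraries can return slightly different cut points for the same data. The heights are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(heights, n=4, method="inclusive")

print("Q1 (25th percentile):", q1)
print("Q2 (50th percentile):", q2)
print("Q3 (75th percentile):", q3)

# Q2 is the same value as the median.
print("median:", median(heights))
```

Passing `n=100` instead of `n=4` would return the 99 percentile cut points in the same way.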

5. IQR (Inter Quartile Range) and MAD (Median Absolute Deviation)

IQR (Inter Quartile Range):

It is the difference between the 75th percentile and the 25th percentile: IQR = Q3 - Q1. It describes the spread of the middle 50% of the data.
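
As a sketch, the IQR can be computed from the quartiles returned by `statistics.quantiles`. The helper name `iqr` is a hypothetical name chosen for this example, and the `"inclusive"` method is an assumption; other quantile methods can give slightly different values.

```python
from statistics import quantiles

def iqr(data):
    """Inter Quartile Range: 75th percentile minus 25th percentile."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]
heights_with_outlier = heights + [250]

iqr_clean = iqr(heights)
iqr_outlier = iqr(heights_with_outlier)

# The full range (max - min) explodes with the outlier; the IQR barely moves.
print("range:", max(heights) - min(heights),
      "->", max(heights_with_outlier) - min(heights_with_outlier))
print("IQR:  ", iqr_clean, "->", iqr_outlier)
```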

MAD (Median Absolute Deviation):

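It plays a role similar to the variance, but it is built on the median instead of the mean: we take the absolute deviation of each data point from the median, and then take the median of those deviations. Like the median itself, it is robust to outliers.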
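
A minimal sketch of MAD next to the standard deviation, to show how differently the two react to an outlier. The helper name `mad` is hypothetical, the heights are made up, and the population standard deviation (`pstdev`) is used for the comparison.

```python
from statistics import median, pstdev

def mad(data):
    """Median Absolute Deviation: median of |x - median(data)|."""
    center = median(data)
    return median(abs(x - center) for x in data)

# Hypothetical student heights in cm (illustrative data only).
heights = [150, 152, 155, 158, 160]
heights_with_outlier = heights + [250]

mad_clean = mad(heights)
mad_outlier = mad(heights_with_outlier)

# The standard deviation is inflated by the outlier; MAD changes only a little.
print("std dev:", pstdev(heights), "->", pstdev(heights_with_outlier))
print("MAD:    ", mad_clean, "->", mad_outlier)
```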
