Python libraries such as SciPy and Pandas already provide off-the-shelf tools for calculating descriptive statistics, but behind the scenes they rely on NumPy functionality.
This post is just a way to learn more about NumPy and its great arsenal of tools for dealing with data.


Descriptive statistics means calculating summaries that describe the data, in order to explore it and get some intuition about what we have in our hands. It is part of a bigger process called Exploratory Data Analysis (EDA).
Descriptive statistics include:

  • Mean
  • Median
  • Mode
  • Variance, together with related measures of spread and shape:
    • Kurtosis
    • Skewness
    • Standard Deviation
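
To make the items in this list concrete, here is a minimal sketch that computes each of them with plain NumPy. The array `x` is a made-up toy sample (not the iris data), chosen so the results are easy to check by hand. Skewness and kurtosis are computed here as the standardized third and fourth central moments, which is one common definition; other libraries may apply bias corrections or report "excess" kurtosis instead.

```python
import numpy as np

# Hypothetical toy sample, chosen only so the numbers are easy to verify by hand.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = x.mean()
median = np.median(x)

# Mode: the most frequent value. np.unique returns the sorted unique values
# and, with return_counts=True, how often each one occurs.
values, counts = np.unique(x, return_counts=True)
mode = values[np.argmax(counts)]

# Population variance and standard deviation (NumPy's default, ddof=0).
variance = x.var()
std = x.std()

# Skewness and kurtosis as standardized third and fourth central moments.
z = (x - mean) / std
skewness = np.mean(z ** 3)
kurtosis = np.mean(z ** 4)  # subtract 3 to get "excess" kurtosis

print(mean, median, mode, variance, std, skewness, kurtosis)
```

A positive skewness, as in this sample, means the right tail is longer than the left one.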

Python Code

Let us jump to the code.
First, let us load the data we will do the calculations on: the iris dataset, which ships with the Scikit-learn library.

from sklearn import datasets

iris = datasets.load_iris()

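# iris.data is a 150 x 4 array: one row per flower, one column per feature
sepal_length = iris.data[:, [0]].flatten()
sepal_width = iris.data[:, [1]].flatten()
petal_length = iris.data[:, [2]].flatten()
petal_width = iris.data[:, [3]].flatten()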
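
As a side note on the `[:, [0]]` followed by `.flatten()` pattern: indexing a column with a list keeps the result two-dimensional, which is why the flatten step is needed. The sketch below illustrates this on a small made-up array (`data` is a hypothetical stand-in for `iris.data`, so it runs without Scikit-learn) and compares it with plain integer indexing, which returns a one-dimensional column directly.

```python
import numpy as np

# Hypothetical stand-in for iris.data: 3 rows x 4 columns.
data = np.arange(12.0).reshape(3, 4)

# Indexing with a list keeps the column axis, so the result is 2-D.
col_2d = data[:, [0]]
print(col_2d.shape)      # (3, 1)

# flatten() turns it into a 1-D copy.
col_flat = col_2d.flatten()
print(col_flat.shape)    # (3,)

# Indexing with a plain integer drops the axis and gives 1-D directly.
col_direct = data[:, 0]
print(col_direct.shape)  # (3,)

# Both approaches hold the same values.
print(np.array_equal(col_flat, col_direct))  # True
```

One difference to keep in mind: `data[:, 0]` is a view onto the original array, while the list-index-plus-flatten version is an independent copy.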

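import numpy as np

# Find the mean
np.mean(sepal_length)
# Find the standard deviation
np.std(sepal_length)
# Find the median
np.median(sepal_length)
# Find the max
np.max(sepal_length)
# Find the min
np.min(sepal_length)
# Find the range (max - min)
np.ptp(sepal_length)
# Find the variance
np.var(sepal_length)
# Percentile of 50% (same as the median)
np.percentile(sepal_length, 50)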
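
Two details are worth a quick sketch. First, NumPy reductions accept an `axis` argument, so the same statistics can be computed for every column in one call instead of one variable at a time. Second, `np.std` and `np.var` divide by n by default (`ddof=0`), whereas Pandas defaults to the sample version that divides by n - 1, so the two libraries can give slightly different numbers for the same data. The array below is a small hypothetical stand-in for `iris.data`, used so the example runs on its own.

```python
import numpy as np

# Hypothetical stand-in for iris.data: 5 rows (samples) x 2 columns (features).
data = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [4.7, 3.2],
    [4.6, 3.1],
    [5.0, 3.6],
])

# axis=0 reduces over the rows, giving one value per column.
col_means = data.mean(axis=0)
col_ranges = np.ptp(data, axis=0)

# ddof=0 (NumPy default) divides by n; ddof=1 divides by n - 1.
pop_std = data.std(axis=0)
sample_std = data.std(axis=0, ddof=1)

# The 50th percentile is the median, column by column.
p50 = np.percentile(data, 50, axis=0)

print(col_means, col_ranges, pop_std, sample_std, p50)
```

With only a handful of rows the gap between `pop_std` and `sample_std` is noticeable; with 150 rows, as in iris, it becomes small.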