The Python libraries SciPy and Pandas already ship off-the-shelf tools for calculating descriptive statistics, but behind the scenes they rely largely on NumPy functionality.
This post is an excuse to learn more about NumPy and its great arsenal of tools for dealing with data.


Introduction

Descriptive statistics means calculating summaries that describe the data, so that we can explore it and build some intuition about what we have in our hands. It is part of a larger process called Exploratory Data Analysis (EDA).
Descriptive statistics include:

  • Mean
  • Median
  • Mode
  • Spread and shape of the distribution, which can be described using:
    • Variance
    • Standard Deviation
    • Skewness
    • Kurtosis
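
As a quick illustration of why more than one summary is useful, here is a minimal sketch on a made-up sample (the values are invented for illustration and are not from the iris data). NumPy has no built-in mode function, so the sketch counts distinct values with np.unique, which is one possible approach rather than the only one.

```python
import numpy as np

# Made-up sample with one large value, purely for illustration.
sample = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

mean = sample.mean()        # pulled upward by the large value
median = np.median(sample)  # middle value of the sorted sample

# Count how often each distinct value occurs and keep the most frequent one.
values, counts = np.unique(sample, return_counts=True)
mode = values[counts.argmax()]  # on ties, this keeps the smallest such value

print(mean, median, mode)
```

The single large value drags the mean above both the median and the mode, which is the kind of asymmetry the shape measures are meant to capture.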

Python Code

Let us jump to the code.
First, let us load the data we will run the calculations on: the iris dataset, which ships with the Scikit-learn library.

import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Each column of iris.data holds one feature; pull each out as a 1-D array
sepal_length = iris.data[:, 0]
sepal_width = iris.data[:, 1]
petal_length = iris.data[:, 2]
petal_width = iris.data[:, 3]

# Find the mean
sepal_length.mean()
# Find the standard deviation (population form, ddof=0 by default)
sepal_length.std()
# Find the median
np.median(sepal_length)
# Find the max
sepal_length.max()
# Find the min
sepal_length.min()
# Find the range (max minus min, "peak to peak")
np.ptp(sepal_length)
# Find the variance (population form, ddof=0 by default)
np.var(sepal_length)
# Find the 50th percentile (equal to the median)
np.percentile(sepal_length, 50)
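
The calls above do not cover the skewness and kurtosis listed in the introduction, and NumPy has no dedicated functions for them. The sketch below computes both from central moments. The helper name shape_stats is hypothetical, the input is a small made-up array rather than the iris data, and the formulas are the population (biased) versions with excess kurtosis, which is an assumption about the convention you want.

```python
import numpy as np

def shape_stats(x):
    """Return (skewness, excess kurtosis) of a 1-D array, population form.

    Hypothetical helper for illustration; not part of NumPy.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    std = x.std()  # population standard deviation (ddof=0)
    skewness = np.mean(centered ** 3) / std ** 3
    excess_kurtosis = np.mean(centered ** 4) / std ** 4 - 3.0
    return skewness, excess_kurtosis

# Made-up symmetric sample, so the skewness should come out as zero.
symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
skew, kurt = shape_stats(symmetric)
print(skew, kurt)

# np.var and np.std divide by n by default; pass ddof=1 for the sample form.
population_var = np.var(symmetric)
sample_var = np.var(symmetric, ddof=1)
print(population_var, sample_var)
```

The same function can be called on sepal_length or any of the other iris columns. Keep the ddof default in mind when comparing against Pandas, whose var and std methods use ddof=1 by default.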