Python libraries such as SciPy and Pandas already offer off-the-shelf tools for calculating descriptive statistics, but behind the scenes they call NumPy functionality.
This post is just a way to learn more about NumPy and its great arsenal for dealing with data.
Descriptive statistics means calculating summaries that describe the data, so that we can explore it and get some intuition about what we have in our hands. It is part of a bigger process called Exploratory Data Analysis (EDA).
Descriptive statistics include:
- Variance, which can be described using:
  - Standard Deviation
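To make the link between these two measures concrete, here is a minimal sketch that computes the variance by hand and then takes its square root to get the standard deviation. The array values is made-up illustration data, not part of the iris dataset. Note that NumPy divides by n by default (ddof=0, the population variance); passing ddof=1 divides by n - 1 and gives the sample variance.

```python
import numpy as np

# Made-up values, used only for illustration
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Variance: the mean of the squared deviations from the mean
mean = values.mean()
variance = ((values - mean) ** 2).mean()

# Standard deviation: the square root of the variance
std_dev = np.sqrt(variance)

# NumPy's built-ins agree with the manual calculation (ddof=0 by default)
print(np.isclose(variance, np.var(values)))
print(np.isclose(std_dev, np.std(values)))

# ddof=1 divides by n - 1 instead of n (sample variance)
sample_variance = np.var(values, ddof=1)
print(sample_variance)
```

The standard deviation is usually easier to interpret than the variance because it is expressed in the same units as the data.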
Let us jump to the code.
First, let us load the data we will do the calculations on: the iris dataset, which ships as part of scikit-learn.
import numpy as np
from sklearn import datasets

# Load the iris dataset; the columns of iris.data are, in order:
# sepal length, sepal width, petal length, petal width (all in cm)
iris = datasets.load_iris()
sepal_length = iris.data[:, 0]
sepal_width = iris.data[:, 1]
petal_length = iris.data[:, 2]
petal_width = iris.data[:, 3]

# Find the mean
sepal_length.mean()

# Find the standard deviation
sepal_length.std()

# Find the median
np.median(sepal_length)

# Find the max
sepal_length.max()

# Find the min
sepal_length.min()

# Find the range (max - min)
np.ptp(sepal_length)

# Find the variance
np.var(sepal_length)

# Find the 50th percentile (the same value as the median)
np.percentile(sepal_length, 50)
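The same NumPy functions accept an axis argument, so every column can be summarized in one call instead of one array at a time. The sketch below is only an illustration: it uses a small made-up 2-D array standing in for iris.data (rows are samples, columns are features), and the helper name describe is hypothetical, not a NumPy function.

```python
import numpy as np

def describe(data):
    """Return per-column descriptive statistics for a 2-D array (hypothetical helper)."""
    return {
        "mean": data.mean(axis=0),
        "std": data.std(axis=0),
        "median": np.median(data, axis=0),
        "min": data.min(axis=0),
        "max": data.max(axis=0),
        "range": np.ptp(data, axis=0),
        "variance": data.var(axis=0),
        "q1": np.percentile(data, 25, axis=0),
        "q3": np.percentile(data, 75, axis=0),
    }

# Made-up stand-in for iris.data: 4 samples (rows) x 2 features (columns)
data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

summary = describe(data)
for name, value in summary.items():
    print(name, value)
```

With the real dataset, calling describe(iris.data) would return one value per feature for each statistic, in the same column order as the dataset.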