Correlation is one of the pillars of data science. This post describes correlation, the terms related to it, and why it is essential to data science.


Introduction

One of the major tasks in data science is finding relationships and associations between variables. It is safe to say that if you don’t know how to measure correlation, you will not get much done in data science.
Let us clarify some terms in the field:


Terms

  • Correlation analysis:
    Explores the association between two or more variables and makes inferences about the strength of the relationship.
  • Correlation coefficient:
    Measures the strength and direction of the association between two variables.
  • Correlation matrix:
    A table that holds the correlation coefficient for every pair of variables in a set.
  • Association:
    It is common to use the terms correlation and association interchangeably. Technically, association refers to any relationship between two variables, whereas correlation is often used to refer only to a linear relationship between two variables.
  • Inferences about association:
    Inferences about the strength of association between variables are made using a random bivariate sample of data drawn from the population of interest.
  • Linear relationships:
    Two variables have a linear relationship if their values change at a constant rate with respect to each other.
  • Nonlinear relationships:
    Two correlated variables have a nonlinear relationship if their values do not change at a constant rate with respect to each other.
  • Positive correlation:
    Two variables are positively correlated when they tend to move in the same direction: as one variable’s value increases (or decreases), the other variable’s value also increases (or decreases).
  • Negative correlation:
    The opposite of positive correlation: as one variable’s value increases (or decreases), the other variable’s value decreases (or increases).
  • Covariance:
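    A measure of how much two variables change together. Unlike a correlation coefficient, it is not normalized, so its magnitude depends on the units of the variables.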
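
To make these terms concrete, here is a minimal Python sketch that computes a sample covariance, a correlation coefficient, and a small correlation matrix using only the standard library. It assumes Pearson's r as the coefficient, which captures linear association only. The data (hours, score, errors) and the helper names are made up for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations.

    Assumes neither variable is constant (otherwise the denominator is zero).
    """
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def correlation_matrix(columns):
    """Pearson r for every pair of named columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data: study hours, test score, and number of mistakes.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
errors = [9, 8, 6, 3, 2]

print("covariance(hours, score):", covariance(hours, score))
print("r(hours, score): ", round(pearson_r(hours, score), 3))   # positive correlation
print("r(hours, errors):", round(pearson_r(hours, errors), 3))  # negative correlation

matrix = correlation_matrix({"hours": hours, "score": score, "errors": errors})
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

The sign of r tells you the direction of the relationship, and its magnitude (between 0 and 1) tells you how close the points are to a straight line. The diagonal of the matrix is always 1, because every variable is perfectly correlated with itself.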


How to measure correlation


1 - Correlation coefficient intuition:

The first step is to draw a scatter plot of the two variables. A scatter plot shows the relationship in a form that can be read intuitively.
From the scatter plot, we can judge the following:

  1. Is there a correlation or not?
  2. Is it positive or negative?
  3. If the correlation is linear, we can estimate the line that best fits the plot.
  4. From that line, we can get a rough idea of its slope, x-intercept, and y-intercept.
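
The best-fit line in points 3 and 4 can also be computed instead of estimated by eye. The sketch below assumes ordinary least squares as the fitting method (the post does not name one) and uses made-up data; `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(x, y):
    """Return (slope, y_intercept) of the least-squares line y = slope * x + y_intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical points that lie close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.9, 5.1, 7.0, 9.2, 10.8]

slope, y_intercept = fit_line(x, y)
# The x-intercept is where the line crosses y = 0; this assumes a non-zero slope.
x_intercept = -y_intercept / slope

print(f"slope={slope:.2f}, y-intercept={y_intercept:.2f}, x-intercept={x_intercept:.2f}")
```

A positive slope corresponds to a positive correlation and a negative slope to a negative one, which matches what the scatter plot shows visually.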

[Figure: scatter plot]