Why can it be difficult to interpret a correlation between two variables?

Main Answer

Why is correlation only defined between two variables?

Your professor likely meant Pearson's correlation as presented in the standard material you are required to learn. It is a definition used in the context of conventional [or at least introductory] statistics, and is certainly defined between only two random variables in that context. But let us explore this question beyond your course.

Ancillary Answer

Like many things, there is a tapestry of historical events that will never be fully uncovered. Auguste Bravais was the first person I am aware of to calculate what we would now think of as Pearson's correlation on a sample, but to him these were merely cosines of the angles between error vectors. A little later came Francis Galton, who laboured over the intuition of "co-relation" and his tabular calculations [he was not a skilled mathematician, purportedly] and was an inspiration to Karl Pearson, who developed the formalism we recognize today. Why did these men consider correlation only between pairs of variables? I don't know. Even more restrictively, Francis Galton appears to have considered only positive correlation, which may have been due to his interest in biometrics, which [by chance] were positively correlated [e.g. height and weight].

Here are my speculations [not to be confused with fact].

  • Pairwise comparisons may be more intuitive for many people [though perhaps not all].
  • Related to the first point, much of mathematics is riddled with binary operations. While multiary algebras and other fascinating creatures live in the world of universal algebra, they are not on most people's radar. People develop tools from what they know.
  • Pearson's correlation interoperates nicely with linear algebra, and linear algebra is itself computationally feasible on modest problems in a pre-PC era.
  • A computational sea monster awaits those who stray too far from pairwise decomposability: exponential complexity. Once you have posited some multiary function $f$, you quickly run into the problem that for $n$ possible operands there are $2^n$ possible inputs to the function [if you include inputting an empty set of variables, sometimes taken to be $f[\emptyset]=0$ or $f[\emptyset]=1$ depending on context]. Making a number of statistical estimates that grows exponentially with the number of variables quickly gets out of hand, and it can take considerably more effort to think about which of the subsets of your variables are actually needed for your problem.
  • The multivariate normal distribution is widely applicable, and in that context pairwise comparisons are enough. Want a multivariate normal distribution but only have a vector of IID standard normal variables? No problem: just apply a linear transformation. Plus, Isserlis' theorem tells us that the higher mixed moments are either zero or decomposable into pairwise mixed moments [see the worked example just after this list]. Combine this with Proposition 7.1.3 from Athreya and Lahiri 2006 and you'll realize that correlation tells you everything about the statistical dependence of jointly normal variables.
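
For concreteness [this worked example is my addition, not part of the original argument], Isserlis' theorem applied to four zero-mean jointly normal variables gives

$$\mathbb{E}[X_1 X_2 X_3 X_4] = \mathbb{E}[X_1 X_2]\,\mathbb{E}[X_3 X_4] + \mathbb{E}[X_1 X_3]\,\mathbb{E}[X_2 X_4] + \mathbb{E}[X_1 X_4]\,\mathbb{E}[X_2 X_3],$$

while odd-order mixed moments such as $\mathbb{E}[X_1 X_2 X_3]$ vanish. Every term on the right is a pairwise covariance, hence recoverable from the correlation matrix and the marginal variances.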

To an extent we can also deny the claim that we don't consider correlations of multiple variables, although I will sidestep the argument about what the word "correlation" ought to denote. As exemplified by other answers here [+1 to all I have seen so far], we do in fact consider multiary functions of random variables that might be regarded [in some sense] as "co-relation". Clearly these functions are different, and so selecting among them should be informed by what it is you want to quantify.

Examples

One thing I like to think about is what data sets minimize or maximize such statistics. And one way to do that is to create examples. Let me share some with you.

While the math in many cases is not restricted to three variables, I will keep to three for now just so that 3D scatterplots can be used. The following examples were found using gradient-based minimization [RMSProp]; a sketch of that setup is given below. I did not check the functions for the existence of local minima, and I only reported the value of the objective to four decimal places.
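
As a rough sketch of what such a search can look like [my own reconstruction, not the actual code behind these examples; I assume PyTorch is available and use coskewness, defined in a later section, as the example objective], one can treat the point coordinates themselves as the free parameters and let RMSProp drive them toward an extremum of the statistic:

```python
import torch

def coskewness(points):
    """Standardized mixed third moment E[XYZ] of the three standardized columns."""
    z = (points - points.mean(dim=0)) / points.std(dim=0)
    return (z[:, 0] * z[:, 1] * z[:, 2]).mean()

torch.manual_seed(0)
points = torch.randn(200, 3, requires_grad=True)   # 200 free points in 3D
optimizer = torch.optim.RMSprop([points], lr=1e-2)

for step in range(2000):
    optimizer.zero_grad()
    objective = coskewness(points)   # negate this to search for a maximum instead
    objective.backward()
    optimizer.step()

print(f"objective after optimization: {coskewness(points).item():.4f}")
```

Swapping in any of the statistics below only changes the objective function; the [unchecked] caveat about local minima applies equally to all of them.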

Three-Point Correlation

This notion is from msuzen's post, which was new to me [+1].

The minimal data set appears to be 'nearly' a line, but with some jittering. The maximal data set is a line.

Coskewness

The coskewness is the standardized mixed product moment of three variables. It is a trilinear extension of the bilinear Pearson product-moment correlation coefficient. This notion was explicitly mentioned by Sextus, and is a standardized cumulant of the sort shown by Count. The tensorial aspect is nicely shown by whuber.
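
Written out explicitly [this formula is my addition for concreteness], the standardized mixed product moment is

$$\operatorname{coskew}[X, Y, Z] \triangleq \frac{\mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y)(Z - \mu_Z) \right]}{\sigma_X \sigma_Y \sigma_Z},$$

which has the same shape as the Pearson product-moment coefficient $\mathbb{E}[(X - \mu_X)(Y - \mu_Y)] / [\sigma_X \sigma_Y]$, only with a third standardized factor in the product.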

Multiple lines stretch out from the centroid into specific octants depending on the signum of $x_i y_i z_i$. The reflected octants [where the product flips to $-x_i y_i z_i$] are avoided because of the odd symmetry: a data set symmetric under reflection would force $\mathbb{E}[XYZ] = -\mathbb{E}[XYZ] = 0$. This will always be true for odd mixed moments.

Partial Correlation

With partial correlation we are computing the correlation between the residuals of $X$ as a function of $Z$ and the residuals of $Y$ as a function of $Z$. I noticed Tim also mentioned this statistic.

While in some of my own projects I consider the correlations of residuals of non-linear functions to be "partial correlations", it seems this is idiosyncratic to me. So for convention and clarity, let us use the following linear equations:

$$Y = Z - 2$$ $$X = -3Z + 4$$
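
Here is a minimal numerical sketch of the residual-based computation [my own illustration; the sample size, noise scales, and NumPy usage are assumptions, not part of the original answer]:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=500)
y = z - 2 + rng.normal(scale=0.5, size=500)       # Y = Z - 2 plus noise
x = -3 * z + 4 + rng.normal(scale=0.5, size=500)  # X = -3Z + 4 plus noise

def residuals(a, b):
    """Residuals of a linear least-squares fit of a on b (with intercept)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Partial correlation of X and Y given Z: correlate the two residual series.
partial_corr = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]
print(f"partial correlation: {partial_corr:.4f}")
```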

Appearances can be deceiving. In both plots the data superficially appear to be spherical blobs, but in fact all of the data sit approximately on a plane. Viewed edge-on to these planes, the data would appear to follow a line.

Taylor's Multi-Way Correlation Coefficient

Taylor 2020 suggested the coefficient

$$\operatorname{mcor}[\vec x_1, \cdots, \vec x_d] \triangleq \frac{1}{\sqrt{d}} \sqrt{\frac{1}{d-1} \sum_{i=1}^d [\lambda_i - \bar \lambda]^2}$$

where $\lambda_i$ is the $i$th eigenvalue of the correlation matrix on $\vec x_1, \cdots, \vec x_d$ and $\bar \lambda$ is the mean eigenvalue of the same correlation matrix. This approach echoes Sextus in considering the eigenvalues of a covariance matrix.
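
A small sketch of this computation [my illustration, assuming NumPy; the example data are arbitrary]:

```python
import numpy as np

def mcor(data):
    """Taylor's multi-way correlation coefficient for an (N, d) data matrix."""
    d = data.shape[1]
    lam = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))  # eigenvalues of the correlation matrix
    return np.sqrt(np.sum((lam - lam.mean()) ** 2) / (d - 1)) / np.sqrt(d)

rng = np.random.default_rng(0)
z = rng.normal(size=1000)
collinear = np.column_stack([z, 2 * z, -z + 0.01 * rng.normal(size=1000)])
print(f"mcor of nearly collinear data: {mcor(collinear):.4f}")   # close to 1
```

When all $d$ variables are perfectly linearly related, the eigenvalues are $[d, 0, \cdots, 0]$ and the coefficient comes out to exactly 1, consistent with the line being the maximal configuration; when the correlation matrix is the identity, every eigenvalue equals $\bar\lambda$ and the coefficient is 0.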

The minimal data set is a spherical cloud of points. The maximal data set is a line.

Wang-Zheng's Unsigned Correlation Coefficient

Wang & Zheng 2014 proposed

$$\operatorname{UCC}[X_1, \cdots, X_n] \triangleq 1 - \det R_{\vec x \vec x}$$

where $R_{\vec x \vec x}$ is the correlation matrix on those variables.
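
In code [again a sketch of my own, assuming NumPy]:

```python
import numpy as np

def ucc(data):
    """Wang-Zheng unsigned correlation coefficient: one minus the determinant of the correlation matrix."""
    return 1.0 - np.linalg.det(np.corrcoef(data, rowvar=False))
```

For mutually uncorrelated variables the correlation matrix is the identity, so $\det R_{\vec x \vec x} = 1$ and the UCC is 0; as the variables approach linear dependence the determinant approaches 0 and the UCC approaches 1.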

The minimal data set appears to be a spherical cloud of points. The maximal cloud appears to be nearly a plane, except for a couple of deviating points.

Coefficient of Multiple Correlation [$R^2$]

Sextus mentioned the multiple correlation coefficient. As noted by dipetkov, we must assume something asymmetric: one variable, say $Y$, is singled out as the response and predicted linearly from $X$ and $Z$.
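
A sketch of the computation with $Y$ as the response [my own illustration, assuming NumPy]:

```python
import numpy as np

def multiple_correlation_r2(y, X):
    """R^2 of a linear least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - (y - A @ coef).var() / y.var()
```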

The minimal data set is a spherical point cloud. The maximal data set sits on a plane and appears somewhat bimodal in distribution.

Variance Inflation Factor

Sextus mentioned the variance inflation factor. As with multiple correlation, we assume that $Y$ is predicted linearly from $X$ and $Z$; the VIF is then $1/[1 - R^2]$ for that regression.
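
The corresponding sketch [my illustration, assuming NumPy; it just wraps the same least-squares fit as above]:

```python
import numpy as np

def vif(y, X):
    """Variance inflation factor 1 / (1 - R^2), with R^2 from a linear fit of y on X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - (y - A @ coef).var() / y.var()
    return 1.0 / (1.0 - r2)
```

As the fit becomes perfect, $R^2 \to 1$ and the VIF diverges, which is why the "maximal" case below is unbounded.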

The minimal data set is a spherical point cloud. In the maximal case there is no true maximum: the statistic is unbounded. In this unbounded case the data appear to approach a distribution similar to the one that maximized the multiple correlation coefficient, but I stopped before numerical instability set in.

User1865345's Suggestion

User1865345 suggested we consider something of the form:

$$R[X,Y]R[X,Z]$$

Both the minimal and maximal cases are data sets in the form of a line with a little bit of jitter.

What are some possible problems or limitations of correlation?

Limitations to Correlation and Regression:

  • Only linear relationships are considered.
  • r and least-squares regression are not resistant to outliers.
  • There may be variables other than x which are not studied, yet do influence the response variable.
  • A strong correlation does not imply a cause-and-effect relationship.

Why is correlation sometimes misleading?

This misleading correlation is often caused by a third factor that is not apparent at the time of examination, sometimes called a confounding factor. When two random variables track each other closely on a graph, it is easy to suspect causation, i.e. that a change in one variable causes a change in the other.

What is a major limitation of correlation analysis?

Correlation is not and cannot be taken to imply causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

What is the problem with correlation?

For observational data, correlations can't confirm causation... Correlations between variables show us that there is a pattern in the data: that the variables we have tend to move together. However, correlations alone don't show us whether or not the data are moving together because one variable causes the other.
