# Data Mining with the Maximal Information Coefficient

### by Ben Lorica

**UPDATE (2/21/2013)**: In light of a recent paper by Simon and Tibshirani, I'm recommending the distance correlation instead of the MIC. Simon and Tibshirani looked at the Maximal Information Coefficient and conclude that:

> ... the recently proposed distance correlation measure of Szekely & Rizzo (2009) is a more powerful technique that is simple, easy to compute and should be considered for general use.

In response, the creators of MIC published the following rejoinder:

> While a method with better power is always preferable if all other things are equal, the desiderata of a statistic depend on the problem it is being used to solve, and distance correlation's lack of equitability makes it ill-suited for the data exploration setting posed in Reshef et al. [2011]. MIC is a more appropriate measure of dependence in a situation in which there are likely to be an overwhelming number of significant relationships in a data set, and there is a need to automatically find the strongest ones.

Finally, Kinney and Atwal propose using mutual information instead:

> ... it was claimed that MIC possesses a desirable mathematical property called equitability that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim, in fact, is incorrect.
>
> ... We are aware of two other critiques of the work by Reshef et al., one by Simon and Tibshirani [28] and one by Gorfine et al. [29]. These do not address the issues of equitability discussed above, but rather focus on the statistical power of MIC. Through the analysis of simulated data, both critiques found MIC to be less powerful than a recently developed statistic called distance correlation (dCor) [30]. Gorfine et al. [29] also recommend a different statistic, HHG [31]. Like mutual information, both dCor and HHG can detect arbitrary relationships between vector-valued variables. Moreover, both dCor and HHG are plug-in statistics that can be easily computed directly from data, i.e. they do not require an approximate estimation procedure like mutual information and MIC do. However, neither dCor[x; y] nor HHG[x; y] is invariant under invertible transformations of x and y. As a result, both statistics violate self-equitability as well as DPI. We therefore suggest that dCor or HHG may be useful in situations when estimating mutual information is either impractical, due to computational cost or under-sampling, or does not provide sufficient statistical power.
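Since the update recommends distance correlation, a minimal sketch of the sample (plug-in) distance correlation of Szekely & Rizzo may be useful. This is my own illustration of the statistic, not code from any of the papers, and the function name `dist_corr` is invented:

```python
import numpy as np

def dist_corr(x, y):
    """Sample distance correlation (Szekely & Rizzo), plug-in V-statistic."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def doubly_centered_distances(z):
        # Pairwise Euclidean distances, then double-center the matrix.
        d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = doubly_centered_distances(x), doubly_centered_distances(y)
    dcov2 = (A * B).mean()       # squared distance covariance
    dvar2x = (A * A).mean()      # squared distance variances
    dvar2y = (B * B).mean()
    if dvar2x * dvar2y == 0.0:   # a constant sample carries no dependence
        return 0.0
    return float(np.sqrt(dcov2 / np.sqrt(dvar2x * dvar2y)))
```

Unlike Pearson's r, the population distance correlation is zero only under independence, which is why it also picks up the kind of non-linear associations discussed in the original post below.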

**(last updated Dec/2011)**

__Original post__:
The recently published paper by a team from Harvard and M.I.T. introduces a metric that may soon become a standard technique in every data miner's toolkit. Loosely described as a "measure of dependence for two-variable relationships", the *Maximal Information Coefficient* (MIC) gives an analyst faced with hundreds of variables a quick screen during exploratory data analysis. Like the correlation coefficient between two variables, the MIC has the advantage of being easy to interpret. A common task in data modeling and analysis is to find good explanatory variables for a given dependent variable. The MIC uncovers variables that have (nonlinear) functional relationships with a target variable, as well as variables that are statistically independent of it. There are other techniques that provide *partial* results along these lines, but based on my understanding of the MIC, it may become an efficient starting point for data mining and predictive modeling.

Rather than defining^{1} the MIC, let me enumerate some of its key properties. Actually these properties (which the researchers prove and numerically demonstrate in their paper) are what have me excited about the MIC:

1. 0 ≤ MIC ≤ 1

(The authors conduct a host of numerical experiments to demonstrate these properties, and along the way they show that the MIC compares favorably with a host of other measures of association, including the Pearson/Spearman correlations and mutual information.)

With probability approaching 1 (as the sample size grows):

2. "MIC assigns scores that tend to 1 to all never-constant noiseless functional relationships"

3. "MIC assigns scores that tend to 1 for a larger class of noiseless relationships (including superpositions of noiseless functional relationships)"

4. "MIC assigns scores that tend to 0 to statistically independent variables"

Furthermore, for a pair of random variables X and Y:

5. "if Y is a function of X that is not constant on any open interval, then data drawn from (X,Y) will receive an MIC tending to 1 with probability 1 as sample size grows"

6. "if the support of (X,Y) is described by a finite union of differentiable curves of the form c(t) = [x(t),y(t)] for t in [0,1], then data drawn from (X,Y) will receive an MIC tending to 1 with probability one as sample size grows, provided that dx/dt and dy/dt are each zero on finitely many points"

7. "the MIC of data drawn from (X,Y) converges to zero in probability as sample size grows if and only if X and Y are statistically independent"

8. the MIC of a noisy functional relationship is bounded from below by a function of its R^{2}

9. NON-LINEARITY: "a natural measure of nonlinearity is MIC - r^{2}, where r denotes the Pearson product-moment correlation coefficient, a measure of linear dependence. The statistic MIC - r^{2} is near 0 for linear relationships and large for nonlinear relationships with high values of MIC."
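For readers who want to experiment with these properties, below is a toy, pure-Python sketch of an MIC-style score. It is explicitly not the authors' algorithm: the real MIC searches over all grid partitions (with a dynamic-programming heuristic), while this simplified version only tries equal-frequency bins, so it will typically underestimate the true MIC. Only the normalization by log(min(kx, ky)) and the n^{0.6} grid-size budget mirror the paper's setup; `toy_mic` and `mutual_info_bits` are invented names:

```python
import math

def mutual_info_bits(x, y, kx, ky):
    """Mutual information (in bits) after equal-frequency binning of x
    into kx columns and y into ky rows."""
    n = len(x)

    def bin_labels(v, k):
        order = sorted(range(n), key=lambda i: v[i])
        labels = [0] * n
        for rank, i in enumerate(order):
            labels[i] = rank * k // n   # ranks 0..n-1 -> bins 0..k-1
        return labels

    bx, by = bin_labels(x, kx), bin_labels(y, ky)
    joint, px, py = {}, [0] * kx, [0] * ky
    for a, b in zip(bx, by):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] += 1
        py[b] += 1
    mi = 0.0
    for (a, b), c in joint.items():
        p = c / n
        mi += p * math.log2(p / ((px[a] / n) * (py[b] / n)))
    return mi

def toy_mic(x, y):
    """Maximize normalized MI over grid shapes with kx * ky <= n^0.6."""
    n = len(x)
    budget = int(n ** 0.6)   # the paper's default bound on grid size
    best = 0.0
    for kx in range(2, budget + 1):
        for ky in range(2, budget // kx + 1):
            score = mutual_info_bits(x, y, kx, ky) / math.log2(min(kx, ky))
            best = max(best, score)
    return best
```

Even this crude version illustrates properties (2) and (4): a noiseless functional sample scores near 1, while an independent sample stays near 0.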

### An Example from Finance

Once I read the paper I wanted to see the MIC applied to an actual data set. Luckily the authors have a web site and sample code in Java/R. It was easy to run the Java code against my data set, consisting of 15 Hedge Fund Indices. I wanted to see if MIC would detect associations between the different Hedge Fund trading styles. The data set consists of 15 time-series -- monthly rates of return -- over three different time periods: 1994-2011, 2000-2005, and 2006-2011.
Why examine 3 different time periods? It has been noted that financial assets have become *more* correlated since the financial crisis of 2007/2008^{2}. I wanted to use the MIC to quickly see if associations^{3} are also increasing between the different Hedge Fund trading styles. In the charts below, I constructed a heat map of the MIC matrix for the three time periods (a darker shade of green means a higher MIC).
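The heat-map construction itself is just a pairwise sweep over the series. Here is a sketch of how such a matrix can be assembled, generic over the dependence measure; since the actual MIC values here came from the authors' Java code, I use |Pearson r| as a stand-in, and the names `pairwise_matrix` and `abs_pearson` are my own:

```python
import math

def abs_pearson(x, y):
    """|Pearson r|; assumes non-constant series (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))

def pairwise_matrix(series, assoc):
    """Association matrix over named series. `assoc` is any symmetric
    dependence measure on two equal-length lists (e.g. an MIC score)."""
    names = sorted(series)
    return {a: {b: assoc(series[a], series[b]) for b in names} for a in names}
```

Swapping `abs_pearson` for an MIC implementation reproduces the kind of matrix plotted below; the nested-dict output maps directly onto a heat map.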

Comparing the chart for 2006-2011 with that for 2000-2005, one can quickly see a slight increase in MIC for a family of trading styles. Broadly speaking, the rise in MIC can be seen among funds that follow Event-Driven & Relative Value trading strategies. In particular, for the 2006-2011 period the MIC has risen between Distressed, Multi-strategy, Event-driven, Long/Short Equity and Convertible Arbitrage funds.

**Maximal Information Coefficient**: heat maps of the MIC matrix, selectable by time period (1994-2011, 2000-2005, 2006-2011).

### Summary

The Maximal Information Coefficient is a tool that I plan to use more often in the future. It provides a quick way to evaluate (non-linear) associations between lots of variables. In particular, in the course of building predictive models, I can see using it to evaluate potential predictors. When it comes to capturing non-linear relationships, the MIC is superior to the correlation coefficient and mutual information. On the other hand, it provides comparable results when faced with linear associations.

Related resources:

Hedge Fund profits flow mostly to Industry Insiders

Hedge Fund Performance: My semi-regular blog posts on the Dow Jones / Credit Suisse Hedge Fund indices.

(1) The formal definition and proofs can be found in the original paper:

*Detecting Novel Associations in Large Data Sets* by Reshef et al. Also see this feature article and accompanying video from the M.I.T. press office.

(2) Here I'm most interested in co-movements between different asset classes. It has also been the case that within equity markets, "correlations between stocks are currently at the highest level in recent history".

(3) Following properties (1) - (9) listed above, an MIC closer to 1 suggests a functional relationship exists between two time series.


**NOTE**: Reproduction & reuse allowed under Creative Commons Attribution.