Conference Proceedings

A framework to adjust dependency measure estimates for chance

S Romano, XV Nguyen, J Bailey, K Verspoor

16th SIAM International Conference on Data Mining 2016, SDM 2016 | Society for Industrial and Applied Mathematics | Published: 2016

Abstract

Copyright © by SIAM. Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example, non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC), and categorical variables that are dependent on the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, the interpretability of their quantification and the accuracy when ranking dependencies become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on..
