“If you torture the data long enough, it will confess to anything.” - Ronald Coase, British economist
If you thought data was only “mined” and “extracted” for analysis, take a look at this frequently used method of “data dredging”.
As we move from traditional eyeballing of statistical data to machine-based techniques that dig deeper, the entire process of data extraction becomes more technique-driven.
One such data extraction practice is the analysis of large volumes of data in the quest for ANY possible relationships. An example would be “fishing” in very large datasets to analyse crime clusters without understanding causation, or “snooping” into an App user’s habits to find correlations. That is, combing data for patterns without pre-established hypotheses or objectives. This sounds absurd, but it may actually throw up significant unseen relationships (what does the App user do at lunchtime when in the vicinity of Connaught Place, New Delhi?).
With the evolution of Big Data, a fundamentally different practice of experimental design has emerged. Formerly, the project and the questions asked would decide what data to collect and analyse. Now, the low cost of data storage has caused a rethink: all kinds of data are collected first and then searched for significant patterns.
This practice of “data dredging” differs from traditional Data Mining practices.
False positives arise when the sample is not truly representative, when there is “confounding” or “selection bias”, or when too many hypotheses are tested against a single dataset. In such cases some variables may appear highly correlated and statistically significant even though there is no real effect between them. At a significance level of 0.05 (5%), roughly one in twenty tests of a non-existent effect will come out “significant” by chance alone. This is a typical case of “data dredging”: false positive findings that result from looking at too many possible associations. One way to limit the errors of “data dredging” is to be more stringent with “significance” levels, moving to P<0.001 or beyond.
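To make the false-positive problem concrete, here is a minimal Python sketch. It is an illustration only, under assumptions that are not from the article: 1,000 hypotheses are tested, each comparing two groups of 50 values drawn from the same normal distribution (so no real effect exists anywhere), and a simple z-test with known variance stands in for whatever test a real study would use. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def null_p_value(rng, n=50):
    """Two-sided z-test p-value for two groups drawn from the SAME N(0, 1)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Both groups have known unit variance, so the standard error of the
    # difference in means is sqrt(1/n + 1/n) = sqrt(2/n).
    z = (mean(a) - mean(b)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def dredge(num_tests=1000, seed=0):
    """Run many tests where no effect exists and count 'significant' hits."""
    rng = random.Random(seed)
    p_values = [null_p_value(rng) for _ in range(num_tests)]
    return {
        "p<0.05": sum(p < 0.05 for p in p_values),
        "p<0.001": sum(p < 0.001 for p in p_values),
    }


def chance_of_any_false_positive(alpha, num_tests):
    """Probability that at least one of num_tests independent null tests
    falls below the significance level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    print(dredge())
    print(chance_of_any_false_positive(0.05, 20))
```

Every hit counted by `dredge` is a false positive by construction. With 1,000 tests one should expect around 50 hits at the 5% level and only about one at P<0.001, which is why a stricter threshold helps. The second function shows how quickly the risk compounds: with just 20 independent tests at the 5% level, the chance of at least one spurious “finding” is already about 64%.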
Applications of Data Dredging
When does Data Dredging occur?
So the next time you read research findings such as “Teens who eat lots of chocolate tend to be slimmer”, take them with a pinch of salt. Better still, look at them as a possible consequence of distorted “data dredging”!