Data Mining Fool’s Gold

Researcher ORCID Identifier

0000-0002-5173-2741

Document Type

Article

Publication Date

Spring 5-11-2020

Abstract

The scientific method is based on the rigorous testing of falsifiable conjectures. Data mining, in contrast, puts data before theory by searching for statistical patterns without being constrained by pre-specified hypotheses. Artificial intelligence and machine learning systems, for example, often rely on data-mining algorithms to construct models with little or no human guidance.

However, a plethora of patterns are inevitable in large data sets, and computer algorithms have no effective way of assessing whether the patterns they unearth are truly useful or meaningless coincidences. While data mining sometimes discovers useful relationships, the data deluge has caused the number of possible patterns that can be discovered, relative to the number that are genuinely useful, to grow exponentially, which makes it increasingly likely that what data mining unearths is fool’s gold.
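
The claim that patterns are inevitable in large data sets can be illustrated with a small simulation. The sketch below is not taken from the article; it is a minimal illustration under assumed settings (50 observations, Gaussian noise, a fixed seed, and the hypothetical function names `pearson` and `best_spurious_correlation`). It searches a growing pool of purely random candidate variables for the one most correlated with a purely random target.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_correlation(n_candidates, n_obs=50, seed=0):
    """Largest |correlation| between a noise target and noise candidates.

    Every series is independent random noise (an assumption of this
    sketch), so any correlation found is a coincidence by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(candidate, target)))
    return best


if __name__ == "__main__":
    # The more candidates are mined, the stronger the best coincidence looks.
    for k in (10, 100, 1000):
        print(k, round(best_spurious_correlation(k), 3))
```

Because the same seed is reused, each larger pool contains the smaller one, so the best correlation found can only stay the same or rise as more candidates are searched. None of these correlations reflects a real relationship, which is the sense in which a data-mined pattern may be fool’s gold.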

