Short Tutorial (2h) on "Data Mining"

JOSÉ HERNÁNDEZ-ORALLO
Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia


Abstract

The talk will introduce the most important issues in data mining, as well as the context and reasons for its emergence and increasing popularity. The motivation for this new discipline is introduced by relating it to other database technologies, such as data warehousing and OLAP, and by clearly stating the key differences between data mining and the other database exploitation tools that, taken together, are usually referred to as business intelligence. Some assorted examples will pave the way to a clearer view and definition of data mining and of its role in the context of knowledge discovery from databases (KDD). The KDD process will be illustrated through its main stages: data integration, data preparation, data mining, model evaluation and interpretation, knowledge deployment and model monitoring. Some predictive and descriptive data mining techniques will be briefly discussed: decision trees, rule learning, neural networks, linear and non-linear regression, Bayesian methods, frequent itemset algorithms, clustering techniques, etc. Finally, some ideas on data mining methodologies, such as CRISP-DM, together with demos and examples of data mining packages, will give a more direct flavour of what applying data mining really means.



Slides

PART I (PDF)

PART II (PDF)

PART III (PDF)

PART IV (PDF)



Data Mining Book (in Spanish)



© 2005 José Hernández Orallo.