Mining cohort data to meet medical research demands
Data mining methods are increasingly used in medical applications, for example to analyze electronic health records (EHRs). Can EHRs be used for medical research, and if so, how? I start this talk with one example application, in which millions of EHRs were used for a retrospective study of a rare disease. Then, I turn to the more traditional contexts of data analysis in medical research, starting with learning on epidemiological data.
Epidemiological studies encompass a modest number of participants, for whom a very large number of variables are recorded. Such data are painstakingly collected, often over several years. They are properly prepared and excellently described. Starting with the simple task of classifying cohort participants with respect to a specific outcome, I elaborate on the emerging challenges and on the increasingly complex mining solutions that become part of the knowledge discovery workflow. I also present first results on constraint-based learning for dimensionality reduction and classification.