Introduction to Machine Learning Algorithms
Thursday-Friday, November 23-24, 2017, 9am - 6pm
This course covers the basics of analysing datasets with machine learning algorithms and data mining techniques in order to understand the foundations of learning from large quantities of data. It starts with general methods for data analysis in order to understand clustering, classification, and regression. This includes a thorough discussion of the training, validation, and test datasets required to learn from data with high accuracy. Simple application examples will reinforce the theoretical course elements and illustrate problems like overfitting, followed by mechanisms such as validation and regularisation that prevent such problems.
The course will start from a very simple application example in order to teach foundations like the role of features in data, linear separability, and decision boundaries for machine learning models. In particular, this course will point to key challenges in analysing large quantities of data (aka ‘big data’) in order to motivate the use of parallel and scalable machine learning algorithms. Hands-on exercises allow the participants to immediately put the newly acquired skills into practice. After this course, participants will have a general understanding of how to approach data analysis problems in a systematic way, including knowledge of where parallel computing provides benefits.
About the lecturer: Prof. Dr.-Ing. Morris Riedel received his PhD from the Karlsruhe Institute of Technology (KIT) and is the head of the ‘high productivity data processing’ research group of the Juelich Supercomputing Centre (JSC) in Germany. As an adjunct associate professor at the School of Natural Sciences and Engineering of the University of Iceland, he teaches ‘High Performance Computing’, ‘Cloud Computing and Big Data’, and ‘Statistical Data Mining’, all of which are at the intersection of parallel computing and machine learning. He has given tutorials like the course above on numerous occasions, for example at the Barcelona Supercomputing Centre, the Smart Data Innovation Conference, and the PRACE Spring School in Cyprus. His research interests are parallel and scalable machine learning and data science. (More info at http://www.morrisriedel.de )
Participants should be able to work on the Unix/Linux command line, have a basic level of understanding of batch scripts required for HPC application submissions, and have a minimal knowledge of probability, statistics, and linear algebra.