Advanced Analytics in Business [D0S07a]
Big Data Platforms & Technologies [D0S06a]
Course Introduction
Lecturer
- Prof. dr. Seppe vanden Broucke
Studied at KU Leuven (Belgium)
PhD in Applied Economics at KU Leuven, Belgium in 2014
PhD thesis: Advances in Process Mining: Artificial Negative Events and Other Techniques
Assistant professor at UGent and lecturer at KU Leuven, Belgium
Research: data mining and analytics, process mining, fraud analytics
Brussels Airport, FEDNOT research chair holder
Co-academic organizer of the postgraduate studies in Big Data and Analytics
Contact: seppe.vandenbroucke@kuleuven.be or seppe.net
Goals of the course
At the end of the course students will:
Have insight into how advanced analytics can be used to optimize business decisions, e.g. in marketing, finance, logistics, HR, etc.
Have insight into issues related to the storage and processing of large datasets
Be able to indicate which technologies and approaches are applicable for different types of datasets (including MapReduce, Hadoop, stream processing, etc.)
Basically: all the solid fundamentals plus pathways to expand your knowledge so you’re ready to become a (better) data scientist!
“Information is easy to find, motivation is not”
Practicalities
Course from 1pm-4pm at HIW1 00.16
Questions? During, before and after the course, during the break, or by e-mail
HOG 03.124: by appointment
Slides will be made available on Toledo (http://toledo.kuleuven.be/) before each lecture
Background material, frequent questions, etc. will also be posted on Toledo
Note: all materials are posted in “Advanced Analytics in Business [D0S07a]”
Alternative: http://seppe.net/aa
Course recordings are posted after the course (but do still try to go to class)
Course material consists primarily of what has been taught during lectures!
Schedule (check online for changes)
Date | Course Topic | Track
11.02 | General introduction; The data science process: introduction to supervised and unsupervised modelling | Analytics
18.02 | Preprocessing and feature engineering; Assignment 1 made available | Analytics
25.02 | Supervised modelling: k-NN, (logistic) regression and decision trees | Analytics
03.03 | Model evaluation; Assignment 2 made available | Analytics
10.03 | Data science tools and platforms; Ensemble modeling: bagging and boosting | Big data, Analytics
17.03 | Unsupervised modelling: clustering, association rules, anomaly detection | Analytics
24.03 | Advanced techniques: artificial neural networks, deep learning (conceptual), q-learning | Analytics
31.03 | Introduction to Hadoop and MapReduce; Spark and SparkSQL | Big data
07.04 | No course (Easter break); Assignment 3 made available |
14.04 | No course (Easter break) |
21.04 | Streaming analytics and other big data analytics trends | Big data
28.04 | Text mining | Analytics
05.05 | Social network mining; NoSQL, Neo4j and Cypher; Assignment 4 made available | Analytics, Big data
12.05 | Wrap-up: Security, Ethics, where to go from here? |
Prerequisites
Basic knowledge of statistics and analytics
Basic operating systems skills in Windows or Linux
Programming in Python or R, Java
Motivation and willingness to work!
Expectations
What is expected of you:
1 study point = 25-30 hours of work (1h of lecture = 3h of student work)
So, 4 study points = 100-120 study hours
Attend lectures and pay attention, keep up with material: read course text, check background information
Assignments (!)
What can you expect:
High quality lecturing
Answers to your questions (if related to course topic…)
  Typically, by email within 5 working days
  Right before and after class, during breaks
  Upon appointment by sending email to lecturer
Up to date content and lots of background information for those who’re serious about data science!
Exam
The evaluation consists of a lab report (50% of the marks) and a closed-book written exam with both multiple-choice and open questions (50% of the marks)
Lab report:
Groups of 4-5 students
4 assignments:
  Research paper discussion
  Predictive model competition using R or Python
  Text mining with Spark streaming assignment
  Social network analytics assignment
For each assignment, you describe your results (screenshots, numbers, approach)
Deadline for completed report (all assignments): Sunday May 31st
We’ll start by forming groups on Toledo after the first (this) lesson: follow up on this as soon as possible!