The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019)

The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019). Reporter: Zhenya Huang. Date: 2019.11.04. Anhui Province Key Laboratory of Big Data Analysis and Application.

SLIDE 1

Anhui Province Key Laboratory of Big Data Analysis and Application

The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019)

Reporter: Zhenya Huang
Date: 2019.11.04

SLIDE 2

Outline

1. Background
2. Problem Definition
3. Framework
4. Experiment
5. Conclusion & Future work

SLIDE 3

Background

- Online education systems are becoming more and more popular
  - Abundant learning materials
    - E.g., exercises, courses, videos
  - Personalized learning services
    - Students can learn at their own pace
  - Various platforms
    - MOOC
    - Intelligent Tutoring System
    - Online Judging System

SLIDE 4

Recommendation

- Recommender systems
  - Suggest suitable exercises instead of leaving students to search on their own
  - Interactive system between agent and student
- Key problem
  - Design an optimal strategy (algorithm) that can recommend the best exercise for each student at the right time

[Figure: the agent sends a recommendation to the student; the student returns feedback.]

SLIDE 5

Related work

- Traditional recommendation for online learning
  - Basic idea:
    - Try to discover students' weaknesses
    - Recommend the exercises that students may not have learned well
- Existing methods
  - Educational psychology
    - Cognitive diagnosis studies
    - Traditional Q-learning algorithm
  - Data-driven algorithms
    - Content-based methods
    - Collaborative filtering
    - Deep neural networks

SLIDE 6

Related work

- Limitation
  - Single objective
    - Target specific concepts with repeated exercising
    - Recommending non-mastered exercises
      - Always too hard
      - Students lose interest in learning

[Figure: four recommended exercises, each labeled "Function".]

What kinds of objectives should we consider in exercise recommendation?

SLIDE 7

Exercise Recommendation

- Multiple objectives
  - Review & Explore
    - Review non-mastered concepts vs. seek new knowledge
  - Smoothness
    - The difficulty levels of consecutive recommendations should not vary dramatically
  - Engagement
    - Keep learning
    - Some exercises are challenging, but some are "gifts"

SLIDE 8

Exercise Recommendation

- Challenges
  - How to define multiple objectives?
    - Review & Explore
    - Smoothness
    - Engagement
  - How to enable flexible recommendations that consider the above objectives simultaneously?
    - How to track students' learning states
    - How to quantify the objectives
    - Large space of exercise candidates

SLIDE 9

Outline

1. Background
2. Problem Definition
3. Framework
4. Experiment
5. Conclusion & Future work

SLIDE 10

Problem Definition

- Given:
  - Student: exercising record
  - Exercise: triplet
    - Content: c is a word sequence
    - Knowledge (concept): e.g., Function
    - Difficulty level: d is the error rate, i.e., the percentage of students who answer exercise e incorrectly
- Markov Decision Process (MDP)
  - State $s_t$: the exercising history of the student
  - Action $a_t$: recommend an exercise $e_{t+1}$ based on state $s_t$
  - Reward $r(s_t, a_t)$: considers multiple objectives based on the performance feedback
  - Transition $T$: a function $S \times A \to S$, mapping state $s_t$ to state $s_{t+1}$
- Goal:
  - Find an optimal policy $\pi : S \to A$ for recommending exercises to students that maximizes the multi-objective rewards.
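
The problem definition can be illustrated with a minimal Python sketch. The class and function names (`Exercise`, `error_rate`, `transition`) are hypothetical, and passing the observed score into the transition is an assumption about how the student's feedback enters the state; the slide only states that the state is the exercising history.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exercise:
    """Exercise triplet: content, knowledge concept, difficulty."""
    content: Tuple[str, ...]  # word sequence c
    concept: str              # knowledge concept, e.g. "Function"
    difficulty: float         # error rate d in [0, 1]


def error_rate(answers: List[int]) -> float:
    """Difficulty d: fraction of wrong answers (1 = right, 0 = wrong)."""
    if not answers:
        raise ValueError("need at least one answer to estimate difficulty")
    return sum(1 for a in answers if a == 0) / len(answers)


# A state is the exercising history: (exercise, score) pairs seen so far.
State = Tuple[Tuple[Exercise, int], ...]


def transition(state: State, action: Exercise, score: int) -> State:
    """T: S x A -> S. Append the recommended exercise and its observed score."""
    return state + ((action, score),)
```

Representing the state as an immutable tuple keeps each step's history intact, which is convenient when transitions are later stored for training.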

SLIDE 11

Outline

1. Background
2. Problem Definition
3. Framework
4. Experiment
5. Conclusion & Future work

SLIDE 12

DRE framework

- At a glance
  - Deep reinforcement learning (Q-learning) framework
  - Exercise Q-network (EQN)
    - Estimate Q-values, generate exercise recommendations (taking actions)
    - Track student learning states
    - Extract exercise semantics
    - Two implementations
      - EQNM with the Markov property
      - EQNR in a recurrent manner
  - Multi-objective rewards
    - Review & Explore
    - Smoothness
    - Engagement
  - Off-policy training

SLIDE 13

DRE framework

- Optimization objective
  - Future rewards of state-action pair (s, a):
  - Optimal action-value function
  - Computing the Q-values for all a′ ∈ A is infeasible
    - Estimate and store all state-action pairs (large exercise candidates)
    - Update all Q-values (student practices very few exercises)
- Solution
  - Exercise Q-Network: a network approximator with parameters θ
  - Minimize the objective function to estimate this network.
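
The formulas on this slide did not survive extraction. As a hedged reference, the textbook deep Q-learning forms that correspond to the three items (future rewards, optimal action-value function, objective function) are sketched below. The return symbol $R_t$, discount factor $\gamma$, horizon $T$, and target-network parameters $\theta^{-}$ are assumptions taken from the standard DQN formulation, not from the slide.

```latex
% Discounted future reward from step t (gamma and T are assumed symbols)
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'}

% Optimal action-value function (Bellman optimality equation)
Q^{*}(s, a) = \mathbb{E}_{s'}\!\left[\, r + \gamma \max_{a' \in A} Q^{*}(s', a') \,\middle|\, s, a \right]

% Objective minimized to fit the network approximator with parameters theta
L(\theta) = \mathbb{E}_{s, a, r, s'}\!\left[ \big( y - Q(s, a; \theta) \big)^{2} \right],
\qquad y = r + \gamma \max_{a' \in A} Q(s', a'; \theta^{-})
```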

SLIDE 14

DRE framework

- Exercise Q-Network
  - Goal: estimate the Q-value Q(s, a) of taking action a at state s
  - Implement the network approximator
  - Key points:
    - Learn the semantics of each exercise
      - Exercise Module
    - Learn the student knowledge states at each step
      - EQNM: Markov property
      - EQNR: recurrent manner

SLIDE 15

Exercise Q-Network

- Exercise Module
  - Goal: learn the semantics of each exercise
  - Combines knowledge, content, and difficulty

[Figure: Exercise Module, with knowledge embedding and content embedding.]

SLIDE 16

Exercise Q-Network

- Two implementations
  - Goal: learn the student knowledge states at each step
  - Estimate the Q-value Q(s, a) of taking an action at step t
  - EQNM: only observes the current state
  - EQNR: considers historical state trajectories

[Figure: current state embedding followed by n fully-connected layers.]

SLIDE 17

Multi-objective rewards

- Review & Explore
  - Intuition: review non-mastered concepts vs. seek new knowledge
  - Review factor: review what the student has not learned well: punishment (factor < 0)
  - Explore factor: suggest seeking diverse concepts: stimulation (factor > 0)
- Smoothness
  - Intuition: the difficulty levels of two consecutive recommendations should not vary dramatically
  - Negative squared loss

SLIDE 18

Multi-objective rewards

- Engagement
  - Intuition: keep learning (interest), avoiding exercises that are too hard or too easy all the time
  - Make some recommendations challenging while others seem like "gifts"
  - Learning goal g
  - Average of N historical performances
- Balance the multi-objective rewards
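
The three rewards can be sketched in Python as follows. Only the signs of the review and explore factors, the negative squared loss, the learning goal g, and the N-performance average come from the slides. The trigger conditions in `review_explore_reward`, the engagement form `1 - |g - average|`, the default factor values, and the weighted sum in `total_reward` are assumptions made for illustration, and all function and parameter names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def review_explore_reward(concept: str, prev_concept: str, prev_score: int,
                          seen_concepts: Set[str],
                          review_penalty: float = -1.0,
                          explore_bonus: float = 1.0) -> float:
    """Assumed rule: punish skipping review after a wrong answer,
    reward a concept the student has not seen before."""
    if prev_score == 0 and concept != prev_concept:
        return review_penalty  # punishment, must be < 0
    if concept not in seen_concepts:
        return explore_bonus   # stimulation, must be > 0
    return 0.0


def smoothness_reward(d_next: float, d_prev: float) -> float:
    """Negative squared difference between consecutive difficulty levels."""
    return -((d_next - d_prev) ** 2)


def engagement_reward(goal: float, scores: Sequence[int], n: int) -> float:
    """Assumed form: 1 - |g - average of the last n scores|."""
    window = list(scores)[-n:]
    if not window:
        return 0.0
    return 1.0 - abs(goal - sum(window) / len(window))


def total_reward(r1: float, r2: float, r3: float,
                 weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Balance the three objectives with (assumed) scalar weights."""
    a1, a2, a3 = weights
    return a1 * r1 + a2 * r2 + a3 * r3
```

Keeping each objective as its own function makes it easy to change the balance by adjusting only the weights.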

SLIDE 19

Off-policy training

- Training with offline logs
  - Experience replay
  - Two separate networks
  - Learn from another agent's policy
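
A minimal sketch of experience replay with a separate target network, in the style of standard DQN training, is shown below. It is an illustration under assumptions, not the authors' implementation: `ReplayBuffer`, `td_targets`, the `gamma` default, and the callable `target_q` standing in for the second network are all hypothetical, and terminal states are ignored for brevity.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# (state, action, reward, next_state), e.g. read from offline logs
Transition = Tuple[tuple, int, float, tuple]


class ReplayBuffer:
    """Fixed-size store of logged transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)  # oldest item is dropped when full

    def sample(self, batch_size: int) -> List[Transition]:
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def td_targets(batch: Sequence[Transition],
               target_q: Callable[[tuple, int], float],
               actions: Sequence[int],
               gamma: float = 0.9) -> List[float]:
    """Targets r + gamma * max_a' Q_target(s', a'), from the target network."""
    return [r + gamma * max(target_q(s_next, a) for a in actions)
            for (_, _, r, s_next) in batch]
```

Because the targets come from logged transitions and a frozen second network, the policy being learned does not have to be the one that generated the data.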

SLIDE 20

Outline

1. Background
2. Problem Definition
3. Framework
4. Experiment
5. Conclusion & Future work

SLIDE 21

Experiment

- Datasets
  - MATH dataset (high school level)
  - PROGRAM dataset (OJ platform)
- Data analysis
  - Learning session
    - If the interval between timestamps exceeds 24 (10) hours, split the record into two sessions
  - Longer sessions have larger concept coverage
  - Longer sessions contain more samples with smaller difficulty differences
  - Longer sessions have exercises with medium difficulty on average
- https://base.ustc.edu.cn/data/DRE/
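
The session rule can be sketched as a small Python helper. The function name `split_sessions`, the use of timestamps in seconds, and the assumption that the input is already sorted are illustrative choices; the threshold is passed in rather than fixed, since the slide gives both 24 and 10 hours.

```python
from typing import List


def split_sessions(timestamps: List[float],
                   max_gap_hours: float = 24.0) -> List[List[float]]:
    """Split sorted timestamps (in seconds) wherever the gap to the
    previous record is longer than max_gap_hours."""
    max_gap_seconds = max_gap_hours * 3600
    sessions: List[List[float]] = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= max_gap_seconds:
            sessions[-1].append(ts)  # same session: gap is short enough
        else:
            sessions.append([ts])    # first record, or gap too long
    return sessions
```

For example, `split_sessions(ts, max_gap_hours=10)` applies the shorter threshold to the same record.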

SLIDE 22

Experiment

- Offline evaluation (point-wise recommendation)
  - We evaluate methods on logged data
    - Static
      - Only contains the student-exercise performance pairs that were recorded
      - Only students' final scores on exercises are known
  - Ranking problem
    - For a student: rank an exercise list at a particular time
    - Based on performance: from bad to good
  - Data partition: for each sequence, 70% training, 30% testing
  - DRE framework: DREM, DRER
  - Baselines:
    - Cognitive diagnosis: IRT
    - Recommender systems: PMF, FM
    - Deep learning: DKT, DKVMN
    - Reinforcement learning: DQN
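
The offline protocol can be sketched as two small helpers. Splitting each sequence chronologically (first 70% for training) is an assumption, as the slide gives only the ratio; `split_sequence`, `rank_bad_to_good`, and the dictionary of predicted scores are hypothetical names for illustration.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sequence(records: Sequence[T],
                   train_percent: int = 70) -> Tuple[List[T], List[T]]:
    """Split one student's sequence: the first train_percent% for training,
    the rest for testing. Integer arithmetic avoids float rounding."""
    cut = len(records) * train_percent // 100
    return list(records[:cut]), list(records[cut:])


def rank_bad_to_good(predicted_scores: Dict[str, float]) -> List[str]:
    """Rank candidate exercises by predicted performance, worst first."""
    return sorted(predicted_scores, key=lambda ex: predicted_scores[ex])
```

Ranking worst-first places the exercises a student is expected to struggle with at the top of the list, matching the "from bad to good" ordering.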

SLIDE 23

Experiment

- Offline evaluation (point-wise recommendation)
  - DRER and DREM generate accurate recommendations
  - EQN > DQN: EQN captures students' state representations well
  - DRER > DREM: EQNR can track long-term dependencies

SLIDE 24

Experiment

- Online evaluation (sequence-wise recommendation)
  - We evaluate methods in a simulated environment
    - Implement a student simulator
    - Real-time interaction
  - Sequential recommendation scenario
    - For a student: provide the best exercise step by step
    - Evaluate the effectiveness on three rewards (multiple objectives)
  - Preliminaries
    - Student simulator: EERNN (state-of-the-art)
    - Data partition: 50% for training the simulator, 50% for training the DRE framework

SLIDE 25

Experiment

- Online evaluation (sequence-wise recommendation)
  - Review & Explore
  - Smoothness vs. Engagement

✓ DRE with a larger explore factor has a faster coverage growth speed
✓ The difficulty levels of recommendations do not vary dramatically in most cases
✓ If we set the learning goal g to a lower value (0.2), DRE recommends more difficult exercises

SLIDE 26

Outline

1. Background
2. Problem Definition
3. Framework
4. Experiment
5. Conclusion & Future work

SLIDE 27

Conclusion & Future work

- Conclusion
  - Deep reinforcement learning framework for exercise recommendation
  - Two Exercise Q-Networks (EQN) to select exercise recommendations following different mechanisms (Markov, recurrent)
  - Three domain-specific rewards designed to find the optimal recommendation strategy
    - Review & Explore, Smoothness, and Engagement
- Future work
  - Seek more ways to learn the reward settings automatically
    - Behaviors: if the student solves exercises very quickly, set g to a lower value
  - Develop a system and apply the DRE framework online
    - Get and test real-world feedback
    - Find a more direct method to evaluate students' satisfaction
  - Extend to more general domains
    - Online shopping, e-commerce, POI services, etc.

SLIDE 28


Thanks for listening!

Welcome to our poster tonight for more details.
huangzhy@mail.ustc.edu.cn