The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019): PowerPoint Presentation



SLIDE 1

Anhui Province Key Laboratory of Big Data Analysis and Application


The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019)

Reporter: Zhenya Huang
Date: 2019.11.04

SLIDE 2

Outline

1 Background
2 Problem Definition
3 Framework
4 Experiment
5 Conclusion & Future work

SLIDE 3

Background

• Online education systems are becoming increasingly popular
  • Abundant learning materials
    • E.g., exercises, courses, videos
  • Personalized learning services
    • Students can learn at their own pace
  • Various platforms
    • MOOCs
    • Intelligent Tutoring Systems
    • Online Judging Systems

SLIDE 4

Recommendation

• Recommender systems
  • Suggest suitable exercises instead of leaving students to search for them on their own
  • Interactive systems between an agent and a student

• Key problem
  • Design an optimal strategy (algorithm) that recommends the best exercise to each student at the right time

[Figure: the agent sends a recommendation to the student, and the student returns feedback to the agent]

SLIDE 5

Related work

• Traditional recommendation for online learning
  • Basic idea:
    • Discover students' weaknesses
    • Recommend exercises on material that students may not have learned well

• Existing methods
  • Educational psychology
    • Cognitive diagnosis studies
    • Traditional Q-learning algorithms
  • Data-driven algorithms
    • Content-based methods
    • Collaborative filtering
    • Deep neural networks

SLIDE 6

Related work

• Limitations
  • Single objective
    • Targets specific concepts with repeated exercising
  • Recommends non-mastered exercises
    • Always too hard
    • Students lose interest in learning

[Figure: a recommendation sequence in which every exercise covers the same concept, Function]

Which objectives should exercise recommendation take into account?

SLIDE 7

Exercise Recommendation

• Multiple objectives
  • Review & Explore
    • Review non-mastered concepts vs. seek new knowledge
  • Smoothness
    • The difficulty levels of consecutive recommendations should not vary dramatically
  • Engagement
    • Keep students learning
    • Some exercises are challenging, while others are “gifts”

SLIDE 8

Exercise Recommendation

• Challenges
  • How to define multiple objectives?
    • Review & Explore
    • Smoothness
    • Engagement
  • How to enable flexible recommendations that consider all of the above objectives simultaneously?
    • How to track students' learning states
    • How to quantify the objectives
    • How to handle the large space of exercise candidates

SLIDE 9

Outline

1 Background
2 Problem Definition
3 Framework
4 Experiment
5 Conclusion & Future work

SLIDE 10

Problem Definition

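• Given:
  • Student: an exercising record (the exercises the student has practiced and the performance on each)
  • Exercise: a triplet of content, knowledge, and difficulty
    • Content: c is a word sequence
    • Knowledge (concept): e.g., Function
    • Difficulty level: d is the error rate, i.e., the percentage of students who answer exercise e incorrectly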
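A minimal Python sketch of the exercise triplet and of difficulty as an error rate, assuming the logs are (student, exercise, correct) tuples with one answer per student per exercise. The `Exercise` class, its `exercise_id` field, and `error_rates` are illustrative names, not taken from the DRE code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    """One exercise as the (content, knowledge, difficulty) triplet."""
    exercise_id: str            # hypothetical identifier, not part of the triplet
    content: tuple[str, ...]    # c: the word sequence of the exercise text
    concepts: frozenset[str]    # knowledge concept(s), e.g. frozenset({"Function"})
    difficulty: float           # d: error rate in [0, 1]


def error_rates(logs):
    """Difficulty d of each exercise: the fraction of logged answers that are wrong.

    `logs` is an iterable of (student_id, exercise_id, correct) tuples with
    correct = 1 for a right answer and 0 for a wrong one. Each student is
    assumed to contribute one answer per exercise, so this equals the share
    of students who answered the exercise wrong.
    """
    wrong = defaultdict(int)
    total = defaultdict(int)
    for _student_id, exercise_id, correct in logs:
        total[exercise_id] += 1
        if not correct:
            wrong[exercise_id] += 1
    return {e: wrong[e] / total[e] for e in total}


logs = [("s1", "e1", 1), ("s2", "e1", 0), ("s3", "e1", 0), ("s4", "e1", 1), ("s1", "e2", 1)]
d = error_rates(logs)
ex = Exercise("e1", ("Let", "f", "x"), frozenset({"Function"}), d["e1"])
```

Here two of the four students miss `e1`, so its difficulty is 0.5, while `e2` gets 0.0.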

• Markov Decision Process (MDP)
  • State s_t: the exercising history of the student
  • Action a_t: recommend an exercise e_{t+1} based on state s_t
  • Reward r(s_t, a_t): considers multiple objectives based on the performance feedback
  • Transition T: S × A → S, mapping state s_t to state s_{t+1}
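The MDP's state and transition can be sketched as follows, assuming the state is simply the tuple of (exercise, correctness) steps seen so far; `Step`, `State`, and `transition` are hypothetical names. The transition is deterministic once the answer is known; the uncertainty comes from the student's response.

```python
from typing import NamedTuple


class Step(NamedTuple):
    exercise_id: str
    correct: int  # performance feedback: 1 = right, 0 = wrong


# State s_t: the student's exercising history up to step t, kept immutable.
State = tuple[Step, ...]


def transition(state: State, action: str, correct: int) -> State:
    """T: S x A -> S.

    Recommending exercise `action` (a_t, i.e. e_{t+1}) and observing the
    student's answer appends one step, giving the next state s_{t+1}.
    """
    return state + (Step(action, correct),)


s0: State = ()
s1 = transition(s0, "e7", 0)
s2 = transition(s1, "e3", 1)
```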

• Goal:
  • Find an optimal policy π: S → A for recommending exercises to students that maximizes the multi-objective rewards.

SLIDE 11

Outline

1 Background
2 Problem Definition
3 Framework
4 Experiment
5 Conclusion & Future work

SLIDE 12

DRE framework

• At a glance
  • Deep reinforcement learning (Q-learning) framework
  • Exercise Q-Network (EQN)
    • Estimates Q-values and generates exercise recommendations (takes actions)
    • Tracks students' learning states
    • Extracts exercise semantics
    • Two implementations:
      • EQNM, with the Markov property
      • EQNR, in a recurrent manner
  • Multi-objective rewards
    • Review & Explore
    • Smoothness
    • Engagement
  • Off-policy training

SLIDE 13

DRE framework

• Optimization objective
  • Future rewards of a state-action pair (s, a)
  • Optimal action-value function
  • Computing the Q-values for all a′ ∈ A is infeasible:
    • Estimating and storing all state-action pairs is impractical (large set of exercise candidates)
    • Updating all Q-values is impractical (each student practices very few exercises)
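The formulas on this slide did not survive extraction. For reference, the standard Q-learning quantities these bullets name can be written as below; this is a sketch in conventional notation (the discount factor γ ∈ [0, 1) and horizon T are assumptions here), not necessarily the deck's exact formulation.

```latex
R_t = \sum_{k=t}^{T} \gamma^{\,k-t}\, r(s_k, a_k)

Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[ R_t \mid s_t = s,\ a_t = a,\ \pi \right]
            = \mathbb{E}_{s'}\left[ r(s, a) + \gamma \max_{a' \in A} Q^{*}(s', a') \,\middle|\, s, a \right]
```

The max over a′ ∈ A is what makes exact computation infeasible here: a tabular method would need a separate entry for every (state, exercise) pair, and most of those pairs are never observed for any individual student.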

• Solution
  • Exercise Q-Network: a network approximator with parameters θ
  • Minimize the objective function to estimate this network
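A minimal sketch of an objective of this kind, assuming the standard DQN squared temporal-difference loss with a separate target network θ⁻ (the deck does not show its exact loss). Plain callables stand in for the Exercise Q-Network and its target copy, and `candidates` is a hypothetical helper listing the exercises that may be recommended next.

```python
def td_target(reward, next_q_values, gamma=0.9, done=False):
    """y = r + gamma * max_a' Q(s', a'; theta_minus), or just r at the end of an episode."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(batch, q, q_target, candidates, gamma=0.9):
    """Mean squared TD error over (s, a, r, s_next, done) transitions.

    q(state, action) and q_target(state, action) stand in for the Exercise
    Q-Network and its target copy; candidates(state) returns the exercises
    that may be recommended in that state.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        next_qs = [q_target(s_next, a2) for a2 in candidates(s_next)]
        y = td_target(r, next_qs, gamma, done)
        total += (y - q(s, a)) ** 2
    return total / len(batch)
```

In training, θ would be updated by gradient descent on this loss while the target y is treated as a constant.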

SLIDE 14

DRE framework

• Exercise Q-Network
  • Goal: estimate the Q-value Q(s, a) of taking action a at state s
  • Implemented as a network approximator
  • Key points:
    • Learn the semantics of each exercise: Exercise Module
    • Learn the student's knowledge state at each step:
      • EQNM: Markov property
      • EQNR: recurrent manner

SLIDE 15

Exercise Q-Network

• Exercise Module
  • Goal: learn the semantics of each exercise
  • Combines knowledge, content, and difficulty

[Figure: the Exercise Module, built from a knowledge embedding and a content embedding]
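The module's inputs can be sketched as below, assuming word-level content embeddings are mean-pooled and the three parts are concatenated. The deck only says the module combines knowledge, content, and difficulty, so the pooling, the concatenation, and all function names here are hypothetical stand-ins.

```python
import random


def make_table(tokens, dim, seed=0):
    """Hypothetical embedding lookup table; random here, learned in a real model."""
    rng = random.Random(seed)
    return {t: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for t in tokens}


def mean_pool(vectors, dim):
    """Average a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def exercise_vector(words, concepts, difficulty, word_table, concept_table, dim):
    """x_e = [content embedding ; knowledge embedding ; difficulty].

    Mean-pooled word vectors stand in for a learned text encoder, and
    concatenation stands in for however the module actually combines the parts.
    """
    content = mean_pool([word_table[w] for w in words if w in word_table], dim)
    knowledge = mean_pool([concept_table[k] for k in concepts], dim)
    return content + knowledge + [difficulty]


dim = 4
word_table = make_table(["solve", "the", "function"], dim, seed=0)
concept_table = make_table(["Function"], dim, seed=1)
x = exercise_vector(["solve", "the", "function"], ["Function"], 0.3,
                    word_table, concept_table, dim)
```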

SLIDE 16

Exercise Q-Network

• Two implementations
  • Goal: learn the student's knowledge state at each step
  • Estimate the Q-value Q(s, a) of taking an action at step t
  • EQNM: observes only the current state
  • EQNR: considers the historical state trajectory

[Figure: current-state embedding followed by n fully-connected layers]
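To contrast the two state encoders, here is a toy scorer, assuming a plain tanh RNN cell for the recurrent variant and two fully-connected layers on top of [state ; candidate exercise vector]. The class `TinyQNet` and its parameters are hypothetical and untrained; EQNM and EQNR in the paper may use different layers and recurrent units.

```python
import math
import random


def dense(x, w, b):
    """Fully-connected layer: w has one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyQNet:
    """Toy Q(s, a) scorer contrasting the two state encoders.

    markov=True  (EQNM-like): the state is encoded from the latest step only.
    markov=False (EQNR-like): a plain tanh RNN runs over the whole history.
    Each step vector could be an exercise vector plus the correctness bit.
    """

    def __init__(self, step_dim, ex_dim, hidden=8, markov=True, seed=0):
        rng = random.Random(seed)
        self.markov = markov
        self.zeros = [0.0] * hidden
        self.w_in = rand_matrix(hidden, step_dim, rng)       # step -> hidden
        self.w_rec = rand_matrix(hidden, hidden, rng)        # hidden -> hidden, EQNR only
        self.w1 = rand_matrix(hidden, hidden + ex_dim, rng)  # first FC layer
        self.w2 = rand_matrix(1, hidden, rng)                # second FC layer -> scalar Q

    def state(self, history):
        if self.markov:
            return [math.tanh(v) for v in dense(history[-1], self.w_in, self.zeros)]
        h = self.zeros
        for x in history:
            pre = [a + b for a, b in zip(dense(x, self.w_in, self.zeros),
                                         dense(h, self.w_rec, self.zeros))]
            h = [math.tanh(v) for v in pre]
        return h

    def q_value(self, history, exercise_vec):
        """Q(s_t, a): FC layers over [state embedding ; candidate exercise vector]."""
        z = [max(0.0, v) for v in dense(self.state(history) + exercise_vec, self.w1, self.zeros)]
        return dense(z, self.w2, [0.0])[0]


hist = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
eqnm = TinyQNet(step_dim=3, ex_dim=2, markov=True)
eqnr = TinyQNet(step_dim=3, ex_dim=2, markov=False)
```

The Markov variant gives the same state whether or not earlier steps are present, while the recurrent variant's state changes when the history before the current step changes.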

SLIDE 17

Multi-objective rewards

• Review & Explore
  • Intuition: review non-mastered concepts vs. seek new knowledge
  • Review factor: review what the student has not learned well; a punishment (negative factor, < 0)
  • Explore factor: encourage seeking diverse concepts; a stimulation (positive factor, > 0)
• Smoothness
  • Intuition: the difficulty levels of two consecutive recommendations should not vary dramatically
  • Negative squared loss of the difference between the two difficulty levels
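A sketch of the two rewards on this slide. Smoothness follows the slide directly as the negative squared difference between consecutive difficulty levels. The exact Review & Explore rule is not shown, so `review_explore_reward` encodes one hypothetical reading: punish moving on after a wrong answer without revisiting its concepts, and reward recommending a concept not practised before. Both factor values are placeholders.

```python
def review_explore_reward(history, next_concepts, review_factor=-1.0, explore_factor=1.0):
    """One hypothetical reading of the Review & Explore reward.

    history: list of (concepts, correct) pairs, oldest first; concepts are sets.
    Review: if the last answer was wrong and the next exercise revisits none of
    its concepts, add the punishment review_factor (< 0).
    Explore: if the next exercise has a concept never practised before, add the
    stimulation explore_factor (> 0).
    """
    reward = 0.0
    if history:
        last_concepts, last_correct = history[-1]
        if not last_correct and not (last_concepts & next_concepts):
            reward += review_factor
    seen = set().union(*(concepts for concepts, _ in history))
    if next_concepts - seen:
        reward += explore_factor
    return reward


def smoothness_reward(prev_difficulty, next_difficulty):
    """Negative squared difference between consecutive difficulty levels."""
    return -((next_difficulty - prev_difficulty) ** 2)
```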

SLIDE 18

Multi-objective rewards

• Engagement
  • Intuition: keep students learning (and interested) by avoiding exercises that are too hard or too easy all the time
  • Make some recommendations challenging and others seem like “gifts”
  • Learning goal g
  • The student's average performance over N historical exercises
• Balance the multi-objective rewards
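The engagement term and the balancing step can be sketched as below, assuming the reward is highest when the average of the last N performance values matches the learning goal g, and that the three objectives are combined by a weighted sum. Both the functional form and the weights are assumptions, not the deck's formulas.

```python
def engagement_reward(performance, goal, n):
    """Hypothetical form: 1 - |g - average of the last n performance values|.

    performance: list of 0/1 correctness values, oldest first; n > 0.
    """
    window = performance[-n:]
    if not window:
        return 0.0
    return 1.0 - abs(goal - sum(window) / len(window))


def combined_reward(r_review_explore, r_smoothness, r_engagement, weights=(1.0, 1.0, 1.0)):
    """Balance the three objectives with a weighted sum (weights are hypothetical)."""
    w1, w2, w3 = weights
    return w1 * r_review_explore + w2 * r_smoothness + w3 * r_engagement
```

Under this form, a goal of g = 0.2 makes the reward peak when about 20% of recent answers are correct, which would push an agent toward harder exercises.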

SLIDE 19

Off-policy training

• Training with offline logs
  • Experience replay
  • Two separate networks
  • Learning from another agent's policy
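A minimal sketch of the first two ingredients, assuming transitions replayed from the offline logs as (s, a, r, s_next, done) tuples and network parameters held in a plain dict; `ReplayBuffer` and `sync_target` are hypothetical names. Because the Q-learning target takes a max over next actions rather than following the logged action, it can learn from transitions generated by another policy, which is what makes this training off-policy.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions from offline logs."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks up the correlation between consecutive log entries.
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params):
    """Two separate networks: the target parameters are a frozen copy of the
    online parameters, refreshed only every few updates."""
    return copy.deepcopy(online_params)
```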

SLIDE 20

Outline

1 Background
2 Problem Definition
3 Framework
4 Experiment
5 Conclusion & Future work

SLIDE 21

Experiment

• Datasets
  • MATH dataset (high school level)
  • PROGRAM dataset (online judging (OJ) platform)
• Data analysis
  • Learning session: when the interval between consecutive timestamps exceeds 24 (10) hours, the sequence is split into two sessions
  • Longer sessions have larger concept coverage
  • Longer sessions contain more samples with smaller difficulty differences
  • Longer sessions have exercises of medium difficulty on average
• Data: https://base.ustc.edu.cn/data/DRE/
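The session split can be sketched as follows, assuming each student's records are (timestamp, exercise_id, correct) tuples already sorted by time; `split_sessions` is an illustrative name, and `gap_hours` plays the role of the 24- or 10-hour threshold from the slide.

```python
from datetime import datetime, timedelta


def split_sessions(records, gap_hours=24):
    """Split one student's time-ordered records into learning sessions.

    records: list of (timestamp, exercise_id, correct), sorted by timestamp.
    A new session starts whenever two consecutive timestamps are more than
    gap_hours apart.
    """
    sessions = []
    gap = timedelta(hours=gap_hours)
    for rec in records:
        if sessions and rec[0] - sessions[-1][-1][0] <= gap:
            sessions[-1].append(rec)   # close enough to the previous record
        else:
            sessions.append([rec])     # first record, or the gap was too long
    return sessions


t0 = datetime(2019, 1, 1, 8)
records = [
    (t0, "e1", 1),
    (t0 + timedelta(hours=2), "e2", 0),
    (t0 + timedelta(hours=30), "e3", 1),
]
sessions = split_sessions(records)
```

With a 24-hour threshold, the first two records form one session and the third starts a new one.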

SLIDE 22

Experiment

• Offline evaluation (point-wise recommendation)
  • We evaluate methods on logged data
    • Static: contains only the student-exercise performance pairs that were recorded
    • We only know students' final scores on exercises
  • Ranking problem
    • For each student: rank an exercise list at a particular time
    • Based on performance: from bad to good
  • Data partition: for each sequence, 70% for training and 30% for testing
  • DRE framework: DREM, DRER
  • Baselines:
    • Cognitive diagnosis: IRT
    • Recommender systems: PMF, FM
    • Deep learning: DKT, DKVMN
    • Reinforcement learning: DQN
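The offline protocol can be sketched as a per-sequence split plus a ranking of candidates by predicted performance. The function names are hypothetical, and taking the first 70% of each sequence (rather than a random 70%) is an assumption.

```python
def split_sequence(records, train_percent=70):
    """Per-sequence partition: the first train_percent% of one student's records
    for training, the rest for testing (chronological order is an assumption)."""
    cut = len(records) * train_percent // 100  # integer arithmetic avoids float rounding
    return records[:cut], records[cut:]


def rank_bad_to_good(predicted):
    """Order candidate exercises from the lowest predicted performance to the
    highest, so the exercises the student is expected to do worst on come first."""
    return sorted(predicted, key=predicted.get)
```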

SLIDE 23

Experiment

• Offline evaluation (point-wise recommendation)
  • DRER and DREM generate accurate recommendations
  • EQN > DQN: EQN captures the state representations of students well
  • DRER > DREM: EQNR can track long-term dependencies

SLIDE 24

Experiment

• Online evaluation (sequence-wise recommendation)
  • We evaluate methods in a simulated environment
    • Implement a student simulator
    • Real-time interaction
  • Sequential recommendation scenario
    • For each student: provide the best exercise step by step
    • Evaluate effectiveness on the three rewards (multiple objectives)
  • Preliminaries
    • Student simulator: EERNN (state of the art)
    • Data partition: 50% for training the simulator, 50% for training the DRE framework
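The step-by-step interaction can be sketched as below. The toy simulator (correct with probability 1 - d) is only a placeholder for EERNN, and `run_episode`, `make_toy_simulator`, and the policy signature are hypothetical.

```python
import random


def make_toy_simulator(difficulty):
    """Hypothetical stand-in for the EERNN simulator: a student answers exercise e
    correctly with probability 1 - d(e), ignoring history."""
    def simulate(history, exercise_id, rng):
        return 1 if rng.random() >= difficulty[exercise_id] else 0
    return simulate


def run_episode(policy, simulator, steps, seed=0):
    """Sequence-wise evaluation: at each step the policy recommends one exercise
    from the history so far, the simulator answers it in real time, and the
    (exercise, correct) pair is appended to the history."""
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        exercise_id = policy(history)
        history.append((exercise_id, simulator(history, exercise_id, rng)))
    return history
```

A trained DRE agent would take the place of `policy`, and the rewards from the framework slides would be accumulated over the returned history.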

SLIDE 25

Experiment

• Online evaluation (sequence-wise recommendation)
  • Review & Explore
  • Smoothness vs. Engagement

✓ DRE with a larger Review & Explore factor value shows faster growth in concept coverage
✓ The difficulty levels of recommendations do not vary dramatically in most cases
✓ If the learning goal g is set to a lower value (0.2), DRE recommends more difficult exercises
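Two simple statistics of the kind these observations describe can be sketched as follows, assuming the concepts of each recommended exercise are known: the concept-coverage curve over steps, and the share of consecutive recommendations whose difficulty changes by at most a threshold. The metric definitions and the 0.1 threshold are illustrative assumptions.

```python
def coverage_curve(recommended_concepts, total_concepts):
    """Fraction of all concepts covered after each recommendation step."""
    seen = set()
    curve = []
    for concepts in recommended_concepts:
        seen |= set(concepts)
        curve.append(len(seen) / total_concepts)
    return curve


def smooth_share(difficulties, max_change=0.1):
    """Share of consecutive recommendations whose difficulty changes by at most max_change."""
    pairs = list(zip(difficulties, difficulties[1:]))
    if not pairs:
        return 1.0
    return sum(abs(b - a) <= max_change for a, b in pairs) / len(pairs)
```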

SLIDE 26

Outline

1 Background
2 Problem Definition
3 Framework
4 Experiment
5 Conclusion & Future work

SLIDE 27

Conclusion & Future work

• Conclusion
  • A Deep Reinforcement learning framework for Exercise recommendation (DRE)
  • Two Exercise Q-Networks (EQNs) that select exercise recommendations following different mechanisms (Markov, recurrent)
  • Three domain-specific rewards designed to find the optimal recommendation strategy: Review & Explore, Smoothness, and Engagement

• Future work
  • Seek more ways to learn the reward settings automatically
    • From behaviors: e.g., if the student solves exercises very quickly, set g to a lower value
  • Develop a system and apply the DRE framework online
    • Obtain and test real-world feedback
    • Find more direct methods to evaluate students' satisfaction
  • Extend to more general domains
    • Online shopping, e-commerce, POI services, etc.

SLIDE 28

The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019)

Thank you for listening!

You are welcome to visit our poster tonight for more details: huangzhy@mail.ustc.edu.cn