Outline
Morning program: Preliminaries, Modeling user behavior, Semantic matching, Learning to rank
Afternoon program: Entities, Generating responses, Recommender systems (Items and Users, Matrix factorization, Matrix factorization as a network, Side information, Richer models, Other tasks, Wrap-up), Industry insights, Q&A
Recommender systems
Recommender systems – The task
- Build a model that estimates how much a user will like an item.
- A typical recommendation setup has:
  - a matrix with users and items
  - plus ratings of users for items, reflecting past/known preferences
  and it tries to predict future preferences.
- This is not about rating prediction.
[Karatzoglou and Hidasi, Deep Learning for Recommender Systems, RecSys ’17, 2017]
Recommender systems
Approaches to recommender systems
- Collaborative filtering
  - Based on analyzing users' behavior and preferences, such as ratings given to movies or books
- Content-based filtering
  - Based on matching the descriptions of items against users' profiles
  - Users' profiles are typically constructed from their previous purchases/ratings, their submitted queries to search engines, and so on
- A hybrid approach combining the two
Recommender systems
Warm, cold
- Cold-start problem
  - User cold-start problem – generating recommendations for a new user / a user for whom very few preferences are known
  - Item cold-start problem – recommending items that are new / for which very few users have shared ratings or preferences
- Cold items/users vs. warm items/users
Recommender systems
Matrix factorization
- The recommender system's workhorse
[Figure: the rating matrix R approximated by the product of low-rank user and item factor matrices]
Recommender systems
Matrix factorization
- Discover the latent features underlying the interactions between users and items
- Don't rely on imputation to fill in missing ratings and make the matrix dense
- Instead, model the observed ratings directly and avoid overfitting through a regularized model
- Minimize the regularized squared error on the set of known ratings:

$$\min_{u,v} \sum_{(i,j) \in R} \left(r_{i,j} - u_i^\top v_j\right)^2 + \lambda\left(\|u_i\|^2 + \|v_j\|^2\right)$$

- Popular methods for minimizing this objective include stochastic gradient descent and alternating least squares
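As a concrete sketch of the SGD route, here is a minimal NumPy implementation of this objective; the function name, toy ratings, and hyperparameters are illustrative choices, not from the tutorial:

```python
import numpy as np

def mf_sgd(ratings, n_factors=8, lr=0.02, reg=0.05, n_epochs=500, seed=0):
    """Factorize observed ratings {(i, j, r)} into user/item factors by
    minimizing the regularized squared error with plain SGD."""
    rng = np.random.default_rng(seed)
    n_users = max(i for i, _, _ in ratings) + 1
    n_items = max(j for _, j, _ in ratings) + 1
    U = 0.1 * rng.standard_normal((n_users, n_factors))
    V = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(n_epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]                   # error on one observed rating
            U[i] += lr * (err * V[j] - reg * U[i])  # gradient step on user factors
            V[j] += lr * (err * U[i] - reg * V[j])  # gradient step on item factors
    return U, V

# Toy example: 3 users, 3 items, 6 observed ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = mf_sgd(ratings)
print(U[0] @ V[0])  # should land close to the observed rating 5.0
```

Alternating least squares would instead fix `V`, solve for `U` in closed form, and alternate.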
Recommender systems
A feed-forward neural network view
[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]
Recommender systems
A deeper view
Recommender systems
Matrix factorization vs. feed-forward network
- The two models are very similar
  - Embeddings, MSE loss, gradient-based optimization
- A feed-forward net can learn different embedding combinations than a dot product
- Capturing pairwise interactions through a feed-forward net requires a huge amount of data
- This approach is not superior to a properly tuned traditional matrix factorization approach
Recommender systems
Great escape . . .
- Side information
- Richer models
- Other tasks
Recommender systems
Side information for recommendation
Recommender systems
Side information for recommendation
- Textual side information
  - Product descriptions, reviews, etc.
  - Extraction: RNNs, one-dimensional CNNs, word embeddings, paragraph vectors
  - Applications: news, products, books, publications
- Images
  - Product pictures, video thumbnails
  - Extraction: CNNs
  - Applications: fashion, video
- Music/audio
  - Extraction: CNNs and RNNs
  - Applications: music
Recommender systems
Textual side information
- Content2vec [Nedelec et al., 2016]
- Using associated textual information for recommendations [Bansal et al., 2016]
Recommender systems
Textual information for improving recommendations
- Task: paper recommendation
- Item representation:
  - Text representation: RNN-based
  - Item-specific embeddings created using MF
  - Final representation: item + text embeddings
Recommender systems
Images in recommendation
Visual Bayesian Personalized Ranking (BPR) [He and McAuley, 2016]
- Bias terms
- MF model
- Visual part:
  - Pretrained CNN features
  - Dimension reduction through embeddings
- BPR loss
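The BPR loss itself is simple to state in code. This is a hedged NumPy sketch of the pairwise objective only: the visual features are omitted, the scores come from a plain dot product, and all names and numbers are illustrative:

```python
import numpy as np

def bpr_loss(score_pos, score_neg):
    """BPR loss for one (user, positive item, negative item) triple:
    -log sigmoid(x_ui - x_uj), pushing the observed item above the
    sampled unobserved one."""
    x_uij = score_pos - score_neg
    return -np.log(1.0 / (1.0 + np.exp(-x_uij)))

# With MF scores x_ui = u . v_i: one user vector and two item vectors.
u = np.array([0.5, 1.0])
v_pos = np.array([1.0, 0.5])   # item the user interacted with
v_neg = np.array([-0.5, 0.2])  # sampled unobserved item
print(bpr_loss(u @ v_pos, u @ v_neg))  # small, since the positive already ranks higher
```

Visual BPR simply adds a CNN-feature term to each score before taking the same pairwise loss.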
Recommender systems
Alternative models
- Restricted Boltzmann Machines [Salakhutdinov et al., 2007]
- Auto-encoders [Wu et al., 2016]
- Prod2vec [Grbovic et al., 2015]
- Wide + Deep models [Cheng et al., 2016]
Recommender systems
Restricted Boltzmann Machines – RBM
- Generative stochastic neural network
- Visible and hidden units connected by weights
- Activation probabilities:

$$p(h_j = 1 \mid v) = \sigma\Big(b^h_j + \sum_{i=1}^{m} w_{i,j}\, v_i\Big)$$
$$p(v_i = 1 \mid h) = \sigma\Big(b^v_i + \sum_{j=1}^{n} w_{i,j}\, h_j\Big)$$

- Training:
  - Set the visible units based on the data, sample the hidden units, then sample the visible units
  - Modify the weights so that the configuration of the visible units approaches the data
- In recommendation:
  - Visible units: ratings on the movies
  - A vector of length 5 (one element per rating value) in each unit
  - Units corresponding to users who have not rated the movie are ignored
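The two activation probabilities above amount to one visible-to-hidden-to-visible Gibbs step; a minimal NumPy sketch for a binary RBM follows (names and layer sizes are my own, and the weight-update rule of contrastive divergence is left out):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One v -> h -> v sampling step of a binary RBM, following the
    activation probabilities on the slide."""
    p_h = sigmoid(b_h + v @ W)                       # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)  # sample hidden units
    p_v = sigmoid(b_v + h @ W.T)                     # p(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new, p_v

rng = np.random.default_rng(0)
m, n = 6, 4                                  # visible and hidden units
W = 0.1 * rng.standard_normal((m, n))
b_h, b_v = np.zeros(n), np.zeros(m)
v = rng.integers(0, 2, size=m).astype(float)  # a binary "ratings" vector
h, v_new, p_v = gibbs_step(v, W, b_h, b_v, rng)
```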
Recommender systems
Auto-encoders
- One hidden layer
- Same number of input and output units
- Try to reconstruct the input on the output
- Hidden layer: a compressed representation of the data

Constraining the model improves generalization:
- Sparse auto-encoders: the activations of the units are limited
- Denoising auto-encoders: corrupt the input
Recommender systems
Auto-encoders for recommendation
Reconstruct corrupted user interaction vectors [Wu et al., 2016]
- Collaborative Denoising Auto-Encoder (CDAE)
- The links between nodes are associated with different weights
- The links in red are user-specific
- The other weights are shared across all users
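A minimal NumPy sketch of the CDAE idea: corrupt a user's interaction vector, then reconstruct it through a shared encoder/decoder plus a per-user vector added to the hidden layer (the user-specific links). The activation functions, sizes, and names are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, drop_prob, rng):
    """Denoising-style corruption: randomly zero out observed interactions."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def forward(x_tilde, user_vec, W_enc, W_dec, b_enc, b_dec):
    """One CDAE-like forward pass: shared weights plus a user-specific
    vector injected into the hidden layer."""
    h = np.tanh(x_tilde @ W_enc + user_vec + b_enc)   # hidden representation
    return 1.0 / (1.0 + np.exp(-(h @ W_dec + b_dec)))  # reconstruction probs

n_items, n_hidden = 10, 4
x = (rng.random(n_items) < 0.4).astype(float)       # binary interaction vector
W_enc = 0.1 * rng.standard_normal((n_items, n_hidden))
W_dec = 0.1 * rng.standard_normal((n_hidden, n_items))
user_vec = 0.1 * rng.standard_normal(n_hidden)      # the user-specific ("red") weights
x_tilde = corrupt(x, drop_prob=0.5, rng=rng)
x_hat = forward(x_tilde, user_vec, W_enc, W_dec, np.zeros(n_hidden), np.zeros(n_items))
```

Training would minimize the reconstruction loss between `x_hat` and the uncorrupted `x`.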
Recommender systems
Prod2vec and Item2vec
- Prod2vec and item2vec: item–item co-occurrence factorization
- User2vec: user–user co-occurrence factorization
- The two approaches can be combined [Liang et al., 2016]
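The statistic these models implicitly factorize (via skip-gram with negative sampling) is item–item co-occurrence; a toy illustration with market baskets (names and data are mine):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(baskets):
    """Count how often each unordered item pair appears in the same basket
    (or session) -- the co-occurrence signal prod2vec/item2vec learn from."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

baskets = [["milk", "bread", "eggs"], ["milk", "bread"], ["eggs", "beer"]]
print(cooccurrence_counts(baskets)[("bread", "milk")])  # → 2
```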
Recommender systems
Wide + Deep models
- Combination of two models [Cheng et al., 2016]
- Deep neural network
  - On embedded item features
  - In charge of generalization
- Linear model
  - On embedded item features
  - And cross products of item features
  - In charge of memorization of binarized features
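The "memorization" half rests on cross-product transformations of binary features, which let the linear model latch onto specific feature co-occurrences. A small sketch (the helper name and feature strings are mine, not from Cheng et al.):

```python
from itertools import combinations

def cross_products(features):
    """'Wide' part of Wide+Deep: augment a set of active binary features
    with their pairwise cross products, so a linear model can memorize
    specific co-occurrences."""
    feats = sorted(features)
    return features | {a + " AND " + b for a, b in combinations(feats, 2)}

print(sorted(cross_products({"genre=comedy", "country=US"})))
```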
Recommender systems
Other tasks
- Session-based recommendation
- Contextual sequence prediction
- Time-sensitive sequence prediction
- Causality in recommendations
- Recommendation as question answering
- Deep reinforcement learning for recommendations
Recommender systems
Session-based recommendation
- Treat recommendation as a sequence classification problem
- Input: a sequence of user actions (purchases/ratings of items)
- Output: the next action
- Disjoint sessions (instead of a consistent user history)
Recommender systems
GRU4Rec
Network structure [Hidasi et al., 2016]
- Input: one-hot encoded item ID
- Output: scores over all items
- Goal: predicting the next item in the session

Adapting GRUs to session-based recommendation:
- Session-parallel mini-batching: to handle sessions of (very) different lengths and lots of short sessions
- Sampling on the output: to handle large numbers of items
Recommender systems
GRU4Rec
Session-parallel mini-batches
- The mini-batch is defined over sessions

Output sampling
- Computing scores for all items (100K–1M) in every step is slow
- Use one positive item (the target) + several sampled negatives
- Fast solution: compute scores only on the mini-batch targets
- The targets of the other examples in the mini-batch serve as negative samples for the current example
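The session-parallel scheduling can be sketched in plain Python. This is a simplified illustration under my own naming: the real GRU4Rec also resets the GRU hidden state whenever a lane switches to a new session, and handles the tail of unfinished sessions, both omitted here:

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets) the GRU4Rec way: each of the batch_size
    'lanes' walks through one session step by step; when a session ends,
    the next session takes over that lane. Within a batch, the targets of
    the other lanes can double as negative samples."""
    next_session = iter(sessions)
    lanes = [list(next(next_session)) for _ in range(batch_size)]
    pos = [0] * batch_size
    while True:
        batch_in, batch_out = [], []
        for k in range(batch_size):
            # refill a lane whose session is exhausted (skipping 1-item sessions)
            while pos[k] >= len(lanes[k]) - 1:
                try:
                    lanes[k], pos[k] = list(next(next_session)), 0
                except StopIteration:
                    return  # out of sessions; drop the partial batch
            batch_in.append(lanes[k][pos[k]])
            batch_out.append(lanes[k][pos[k] + 1])
            pos[k] += 1
        yield batch_in, batch_out

# Items are IDs; three toy sessions, two parallel lanes.
for inputs, targets in session_parallel_batches([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 2):
    print(inputs, targets)
```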
Recommender systems
Contextual sequence prediction
- Input: a sequence of contextual user actions, plus the current context
- Output: probability of the next action
- E.g. "Given all the actions a user has taken so far, what's the most likely video they're going to play right now?" [Beutel et al., 2018]
Recommender systems
Time-sensitive sequence prediction
- Recommendations are actions at a moment in time
- Proper modeling of time and system dynamics is critical
- Experiment on an internal Netflix dataset
- Context:
  - Discrete time – day-of-week (Sunday, Monday, . . . ), hour-of-day
  - Continuous time (timestamp)
- Predict the next play (temporally split data)
[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]
And now, for some speculative tasks in the recommender systems space. The answers are not yet clear, but there is good potential for follow-up research.
Recommender systems
Causality in recommendations – [Schnabel et al., 2016]
- Virtually all data for training recommender systems is subject to selection biases
- In movie recommendation, users typically watch and rate movies they like, and rarely movies they do not like
- View recommendation from a causal inference perspective – exposing a user to an item is an intervention, analogous to exposing a patient to a treatment in a medical study
- Propensity-weighted MF method – propensities act as weights on the loss terms:

$$\min_{u,v} \sum_{(i,j) \in R} \frac{1}{P_{i,j}}\left(r_{i,j} - u_i^\top v_j\right)^2 + \lambda\left(\|u_i\|^2 + \|v_j\|^2\right)$$

- Performance: MF vs. propensity-weighted MF (as α → 0, the data is increasingly missing not at random, only revealing top-rated items)
- How can this be incorporated into neural networks for recommender systems?
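One way to read the propensity-weighted objective in code, as an illustrative NumPy sketch (function name, toy data, and the placement of the regularizer outside the sum are my own simplifications):

```python
import numpy as np

def ips_mf_loss(ratings, propensities, U, V, lam=0.1):
    """Inverse-propensity-scored MF objective: each observed rating's
    squared error is weighted by 1 / P(it was observed)."""
    loss = 0.0
    for (i, j, r), p in zip(ratings, propensities):
        loss += (r - U[i] @ V[j]) ** 2 / p
    loss += lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return loss

rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((3, 4))
V = 0.1 * rng.standard_normal((3, 4))
ratings = [(0, 0, 5.0), (1, 2, 1.0)]
# Over-exposed (e.g. highly rated) items have a high observation propensity
# and so get a LOW weight; rarely observed ratings get up-weighted.
propensities = [0.9, 0.1]
print(ips_mf_loss(ratings, propensities, U, V))
```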
Recommender systems
Recommendations as question answering – [Dodge et al., 2015]
- Conversational recommendation agent
- Four tasks: (1) question answering (QA), (2) recommendation, (3) a mix of recommendation and QA, and (4) general dialog about the topic (chit-chat)
- A memory network jointly trained on all four tasks performs best
- It incorporates short- and long-term memory and can use local context and knowledge bases of facts
- Performance on QA needs a real boost
- Performance degraded rather than improved when training on all four tasks at once
Recommender systems
Deep reinforcement learning for recommendations – [Zhao et al., 2017]
- MDP-based formulations of recommender systems go back to the early 2000s
- Using reinforcement learning has two advantages:
  1. strategies can be continuously updated during interactions
  2. it can learn a strategy that maximizes the long-term cumulative reward from users
- A list-wise recommendation framework, which can be applied in scenarios with large and dynamic item spaces
- Uses an Actor-Critic network
- Integrates multiple orders – positional order, temporal order
- Needs proper evaluation in a live environment
Recommender systems
Wrap-up
- Current directions:
  - Item/user embeddings
  - Deep collaborative filtering
  - Feature extraction from content
  - Session- and context-based recommendation
  - Fairness, accuracy, confidentiality, and transparency (FACT)
- Deep learning can work well for recommendations
- Matrix factorization and deep learning are very similar in the classic recommendation setup
- Lots of areas left to explore