

SLIDE 1

Outline

Morning program
- Preliminaries
- Modeling user behavior
- Semantic matching
- Learning to rank

Afternoon program
- Entities
- Generating responses
- Recommender systems
- Industry insights
- Q & A

SLIDE 2

Outline

Morning program
- Preliminaries
- Modeling user behavior
- Semantic matching
- Learning to rank

Afternoon program
- Entities
- Generating responses
- Recommender systems
  - Items and Users
  - Matrix factorization
  - Matrix factorization as a network
  - Side information
  - Richer models
  - Other tasks
  - Wrap-up
- Industry insights
- Q & A

SLIDE 3

Recommender systems

Recommender systems – The task

- Build a model that estimates how much a user will like an item.
- A typical recommendation setup has
  - a matrix with users and items
  - plus ratings of users for items, reflecting past/known preferences,
  and tries to predict future preferences.
- This is not about rating prediction.

[Karatzoglou and Hidasi, Deep Learning for Recommender Systems, RecSys '17, 2017]

SLIDE 4

Recommender systems

Approaches to recommender systems

- Collaborative filtering
  - Based on analyzing users' behavior and preferences, such as ratings given to movies or books
- Content-based filtering
  - Based on matching the descriptions of items and users' profiles
  - Users' profiles are typically constructed using their previous purchases/ratings, their submitted queries to search engines, and so on
- A hybrid approach

SLIDE 5

Recommender systems

Warm, cold

- Cold start problem
  - User cold-start problem: generating recommendations for a new user / a user for whom very few preferences are known
  - Item cold-start problem: recommending items that are new / for which very few users have shared ratings or preferences
- Cold items/users vs. warm items/users


SLIDE 7

Recommender systems

Matrix factorization

- The recommender system's workhorse

(figure: the ratings matrix R approximated by a product of low-rank user and item factor matrices)

SLIDE 8

Recommender systems

Matrix factorization

- Discover the latent features underlying the interactions between users and items
- Don't rely on imputation to fill in missing ratings and make the matrix dense
- Instead, model observed ratings directly; avoid overfitting through a regularized model
- Minimize the regularized squared error on the set of known ratings:

  \min_{U,V} \sum_{(i,j) \in R} (r_{i,j} - u_i^\top v_j)^2 + \lambda (\|u_i\|^2 + \|v_j\|^2)

- Popular methods for minimizing this objective include stochastic gradient descent and alternating least squares
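The regularized squared-error objective lends itself to stochastic gradient descent. A minimal numpy sketch, with hyperparameters chosen for illustration rather than taken from the slides:

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=8, lr=0.03, lam=0.01, epochs=500, seed=0):
    """Minimize the regularized squared error on observed ratings with SGD.

    ratings: iterable of (user, item, rating) triples (observed entries only).
    Returns the user factor matrix U and the item factor matrix V.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]                    # prediction error on one entry
            U[i] += lr * (err * V[j] - lam * U[i])   # gradient step, L2-regularized
            V[j] += lr * (err * U[i] - lam * V[j])
    return U, V
```

Alternating least squares would instead fix V, solve for U in closed form, and alternate; SGD is the simpler drop-in for the per-rating loss above.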


SLIDE 10

Recommender systems

A feed-forward neural network view

[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]

SLIDE 11

Recommender systems

A deeper view

SLIDE 12

Recommender systems

Matrix factorization vs. feed-forward network

- The two models are very similar
  - Embeddings, MSE loss, gradient-based optimization
- A feed-forward net can learn different embedding combinations than a dot product
- Capturing pairwise interactions through a feed-forward net requires a huge amount of data
- This approach is not superior to a properly tuned traditional matrix factorization approach
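The contrast can be made concrete with a single forward pass: the same pair of embeddings scored once by a fixed dot product and once by a hypothetical, untrained feed-forward layer (the layer sizes here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
u = rng.standard_normal(k)                 # a user embedding
v = rng.standard_normal(k)                 # an item embedding

# Matrix factorization combines the embeddings with a fixed dot product:
score_mf = float(u @ v)

# A feed-forward net instead learns the combination; here one illustrative
# ReLU hidden layer over the concatenated embeddings:
W1 = rng.standard_normal((8, 2 * k))       # hidden-layer weights
w2 = rng.standard_normal(8)                # output weights
hidden = np.maximum(0.0, W1 @ np.concatenate([u, v]))
score_nn = float(w2 @ hidden)
```

The network can represent combinations a dot product cannot, but it has to learn them from data, which is why it needs far more interactions to recover simple pairwise effects.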

SLIDE 13

Recommender systems

Great escape . . .

- Side information
- Richer models
- Other tasks


SLIDE 15

Recommender systems

Side information for recommendation


SLIDE 16

Recommender systems

Side information for recommendation

- Textual side information
  - Product descriptions, reviews, etc.
  - Extraction: RNNs, one-dimensional CNNs, word embeddings, paragraph vectors
  - Applications: news, products, books, publications
- Images
  - Product pictures, video thumbnails
  - Extraction: CNNs
  - Applications: fashion, video
- Music/audio
  - Extraction: CNNs and RNNs
  - Applications: music

SLIDE 17

Recommender systems

Textual side information

- Content2vec [Nedelec et al., 2016]
- Using associated textual information for recommendations [Bansal et al., 2016]

SLIDE 18

Recommender systems

Textual information for improving recommendations

- Task: paper recommendation
- Item representation
  - Text representation: RNN-based
  - Item-specific embeddings created using MF
  - Final representation: item + text embeddings

SLIDE 19

Recommender systems

Images in recommendation

Visual Bayesian Personalized Ranking (BPR) [He and McAuley, 2016]

- Bias terms
- MF model
- Visual part:
  - Pretrained CNN features
  - Dimension reduction through embeddings
- BPR loss
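The BPR loss used here can be sketched in a few lines of numpy. The vectors and the helper name are illustrative; in the visual variant the item vectors would simply be extended with the reduced CNN features:

```python
import numpy as np

def bpr_loss(u, v_pos, v_neg):
    """BPR pairwise loss: -log sigma(score(pos) - score(neg)).

    u, v_pos, v_neg are illustrative user / positive-item / negative-item
    latent vectors scored by dot products.
    """
    x = u @ v_pos - u @ v_neg                    # preference score difference
    return float(-np.log(1.0 / (1.0 + np.exp(-x))))
```

The loss shrinks toward zero as the positive item outranks the sampled negative, so minimizing it pushes observed items above unobserved ones.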


SLIDE 21

Recommender systems

Alternative models

- Restricted Boltzmann Machines [Salakhutdinov et al., 2007]
- Auto-encoders [Wu et al., 2016]
- Prod2vec [Grbovic et al., 2015]
- Wide + Deep models [Cheng et al., 2016]

SLIDE 22

Recommender systems

Restricted Boltzmann Machines – RBM

- Generative stochastic neural network
- Visible and hidden units connected by weights
- Activation probabilities:

  p(h_j = 1 \mid v) = \sigma(b^h_j + \sum_{i=1}^{m} w_{i,j} v_i)
  p(v_i = 1 \mid h) = \sigma(b^v_i + \sum_{j=1}^{n} w_{i,j} h_j)

- Training
  - Set visible units based on data, sample hidden units, then sample visible units
  - Modify weights to bring the configuration of the visible units closer to the data
- In recommendation:
  - Visible units: ratings on the movies
  - A vector of length 5 (one entry per rating value) in each unit
  - Units corresponding to users who have not rated the movie are ignored
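The two activation probabilities define one Gibbs sampling step. A sketch for a plain binary RBM (shapes and names are assumptions, not from the slide; the rating variant would use length-5 softmax units per movie):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One visible -> hidden -> visible sampling pass of a binary RBM.

    W has shape (n_visible, n_hidden); v is a binary visible vector.
    """
    p_h = sigmoid(b_h + v @ W)                        # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden units
    p_v = sigmoid(b_v + W @ h)                        # p(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new
```

Contrastive-divergence training compares statistics of the data-driven pass against those of the reconstruction to update W.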

SLIDE 23

Recommender systems

Auto-encoders

- One hidden layer
- Same number of input and output units
- Try to reconstruct the input on the output
- Hidden layer: compressed representation of the data

Constraining the model improves generalization:

- Sparse auto-encoders: the activation of units is limited
- Denoising auto-encoders: corrupt the input

SLIDE 24

Recommender systems

Auto-encoders for recommendation

Reconstruct corrupted user interaction vectors [Wu et al., 2016]

- Collaborative Denoising Auto-Encoder (CDAE)
- The links between nodes are associated with different weights
  - The red links are user specific
  - The other weights are shared across all users
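A rough numpy sketch of the CDAE idea: shared encode/decode weights plus a user-specific term, with input corruption. All names and sizes are illustrative, not the paper's implementation:

```python
import numpy as np

def cdae_forward(x, user_embedding, W_in, W_out, drop=0.5, rng=None):
    """Illustrative CDAE forward pass.

    x: a user's binary item-interaction vector. The user embedding plays the
    role of the user-specific (red) links; W_in / W_out are shared weights.
    """
    rng = rng or np.random.default_rng(0)
    x_tilde = x * (rng.random(x.shape) >= drop)      # corrupt the input (dropout)
    h = np.tanh(W_in @ x_tilde + user_embedding)     # hidden representation
    return 1.0 / (1.0 + np.exp(-(W_out @ h)))        # reconstructed interaction probs
```

Training would minimize the reconstruction loss against the uncorrupted x, so the model learns to fill in interactions that were dropped, which is exactly the recommendation query.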

SLIDE 25

Recommender systems

Prod2vec and Item2vec

- Prod2vec and item2vec: item–item co-occurrence factorization
- User2vec: user–user co-occurrence factorization
- The two approaches can be combined [Liang et al., 2016]

SLIDE 26

Recommender systems

Wide + Deep models [Cheng et al., 2016]

- Combination of two models
- Deep neural network
  - On embedded item features
  - In charge of generalization
- Linear model
  - On embedded item features
  - And cross products of item features
  - In charge of memorization on binarized features
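A minimal sketch of the wide + deep combination, assuming a single ReLU hidden layer and illustrative feature vectors and weights (not the published architecture):

```python
import numpy as np

def wide_deep_score(x_wide, x_deep, w_wide, W1, w2):
    """Sum a linear (wide) logit and a one-hidden-layer MLP (deep) logit.

    x_wide: binarized / cross-product features for memorization.
    x_deep: dense embedded features for generalization.
    """
    deep_logit = w2 @ np.maximum(0.0, W1 @ x_deep)   # ReLU hidden layer
    logit = w_wide @ x_wide + deep_logit             # wide + deep combination
    return 1.0 / (1.0 + np.exp(-logit))              # probability of e.g. a click
```

Both parts feed one logit, so the linear part can memorize exact feature crosses while the network part generalizes to unseen combinations.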


SLIDE 28

Recommender systems

Other tasks

- Session-based recommendation
- Contextual sequence prediction
- Time-sensitive sequence prediction
- Causality in recommendations
- Recommendation as question answering
- Deep reinforcement learning for recommendations

SLIDE 29

Recommender systems

Session-based recommendation

- Treat recommendation as a sequence classification problem
- Input: a sequence of user actions (purchases/ratings of items)
- Output: the next action
- Disjoint sessions (instead of a consistent user history)

SLIDE 30

Recommender systems

GRU4Rec

Network structure [Hidasi et al., 2016]

- Input: one-hot encoded item ID
- Output: scores over all items
- Goal: predicting the next item in the session

Adapting GRU to session-based recommendations

- Session-parallel mini-batching: to handle sessions of (very) different lengths and lots of short sessions
- Sampling on the output: to handle lots of items

SLIDE 31

Recommender systems

GRU4Rec

Session-parallel mini-batches

- The mini-batch is defined over sessions

Output sampling

- Computing scores for all items (100K–1M) in every step is slow
- One positive item (the target) + several negative samples
- Fast solution: compute scores only on the mini-batch's targets
  - The items of the other examples in the mini-batch serve as negative samples for the current one
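The in-batch negative sampling trick can be sketched independently of the GRU itself; `in_batch_scores` is a hypothetical helper, not GRU4Rec's API:

```python
import numpy as np

def in_batch_scores(session_states, item_embeddings, target_ids):
    """Score each session only against the mini-batch's target items.

    session_states: (batch, dim) hidden states, one per parallel session.
    target_ids: the next item of each session. Row i's diagonal entry is the
    positive score; the other entries of the row act as negative samples.
    """
    targets = item_embeddings[target_ids]        # (batch, dim) target embeddings
    return session_states @ targets.T            # (batch, batch) score matrix
```

A pairwise loss such as TOP1 or BPR then compares each row's diagonal (positive) against the rest of the row, so no scores over the full catalog are ever computed.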

SLIDE 32

Recommender systems

Contextual sequence prediction

- Input: a sequence of contextual user actions, plus the current context
- Output: probability of the next action
- E.g. "Given all the actions a user has taken so far, what's the most likely video they're going to play right now?" [Beutel et al., 2018]

SLIDE 33

Recommender systems

Time-sensitive sequence prediction

- Recommendations are actions at a moment in time
- Proper modeling of time and system dynamics is critical
- Experiment on a Netflix internal dataset
- Context:
  - Discrete time: day-of-week (Sunday, Monday, . . . ), hour-of-day
  - Continuous time (timestamp)
- Predict the next play (temporally split data)

[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]

SLIDE 34

And now, for some speculative tasks in the recommender systems space. The answers are not yet clear, but there is good potential for follow-up research.


SLIDE 35

Recommender systems

Causality in recommendations – [Schnabel et al., 2016]

- Virtually all data for training recommender systems is subject to selection biases
- In movie recommendation, users typically watch and rate movies they like, and rarely movies they do not like
- View recommendation from a causal inference perspective: exposing a user to an item is an intervention, analogous to exposing a patient to a treatment in a medical study
- Propensity-weighted MF method: propensities act as weights on the loss terms

  \min_{U,V} \sum_{(i,j) \in R} \frac{1}{P_{i,j}} (r_{i,j} - u_i^\top v_j)^2 + \lambda (\|u_i\|^2 + \|v_j\|^2)

- Performance: MF vs. propensity-weighted MF (as α → 0, data is increasingly missing not at random, revealing only top-rated items)
- How can this be incorporated into neural networks for recommender systems?
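The propensity-weighted objective can be evaluated directly. A sketch with illustrative names, where `propensities` holds the observation probabilities P_ij of the observed entries:

```python
import numpy as np

def ips_mf_loss(ratings, propensities, U, V, lam=0.0):
    """Propensity-weighted MF objective: each observed rating's squared error
    is divided by its propensity (probability of being observed)."""
    loss = 0.0
    for (i, j, r), p in zip(ratings, propensities):
        err = r - U[i] @ V[j]
        loss += (err ** 2) / p                        # inverse-propensity weight
        loss += lam * (U[i] @ U[i] + V[j] @ V[j])     # L2 regularization
    return float(loss)
```

Entries that were unlikely to be observed get up-weighted, which makes the empirical loss an unbiased estimate of the loss over all user–item pairs.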

SLIDE 36

Recommender systems

Recommendations as question answering – [Dodge et al., 2015]

- Conversational recommendation agent
- Four tasks: (1) question answering (QA), (2) recommendation, (3) a mix of recommendation and QA, and (4) general dialog about the topic (chit-chat)
- A memory network jointly trained on all four tasks performs best
- It incorporates short- and long-term memory and can use local context and knowledge bases of facts
- Performance on QA needs a real boost
- Performance degraded rather than improved when training on all four tasks at once
SLIDE 37

Recommender systems

Deep reinforcement learning for recommendations – [Zhao et al., 2017]

- MDP-based formulations of recommender systems go back to the early 2000s
- Using reinforcement learning has two advantages:
  1. it can continuously update strategies during interactions
  2. it can learn a strategy that maximizes the long-term cumulative reward from users
- A list-wise recommendation framework, which can be applied in scenarios with large and dynamic item spaces
- Uses an Actor-Critic network
- Integrates multiple orders: positional order, temporal order
- Needs proper evaluation in a live environment


SLIDE 39

Recommender systems

Wrap-up

- Current directions
  - Item/user embedding
  - Deep collaborative filtering
  - Feature extraction from content
  - Session- and context-based recommendation
  - Fairness, accuracy, confidentiality, and transparency (FACT)
- Deep learning can work well for recommendations
- Matrix factorization and deep learning are very similar in the classic recommendation setup
- Lots of areas to explore

SLIDE 40

Recommender systems

Resources

- RankSys: https://github.com/RankSys/RankSys
- LibRec: https://www.librec.net
- LensKit: http://lenskit.org
- LibMF: https://www.csie.ntu.edu.tw/%7Ecjlin/libmf/
- proNet-core: https://github.com/cnclabs/proNet-core
- rival: http://rival.recommenders.net
- TensorRec: https://github.com/jfkirk/tensorrec