Outline
Morning program: Preliminaries, Modeling user behavior, Semantic matching, Learning to rank
Afternoon program: Entities, Generating responses, Recommender systems (Items and Users, Matrix factorization, Matrix factorization as a network, Side information, Richer models, Other tasks, Wrap-up), Industry insights, Q&A
Recommender systems
Recommender systems – The task
- Build a model that estimates how much a user will like an item.
- A typical recommendation setup has:
  - a matrix with users and items
  - plus ratings of users for items, reflecting past/known preferences
  and it tries to predict future preferences.
- This is not about rating prediction.
[Karatzoglou and Hidasi, Deep Learning for Recommender Systems, RecSys ’17, 2017]
Recommender systems
Approaches to recommender systems
- Collaborative filtering
  - Based on analyzing users' behavior and preferences, such as ratings given to movies or books
- Content-based filtering
  - Based on matching the descriptions of items against users' profiles
  - Users' profiles are typically constructed from their previous purchases/ratings, their submitted queries to search engines, and so on
- A hybrid approach combining the two
Recommender systems
Warm, cold
- Cold-start problem
  - User cold-start problem – generating recommendations for a new user / a user for whom very few preferences are known
  - Item cold-start problem – recommending items that are new / for which very few users have shared ratings or preferences
- Cold items/users vs. warm items/users
Recommender systems
Matrix factorization
- The recommender system's workhorse
[Figure: the rating matrix R approximated by the product of low-rank user and item factor matrices]
Recommender systems
Matrix factorization
- Discover the latent features underlying the interactions between users and items
- Don't rely on imputation to fill in missing ratings and make the matrix dense
- Instead, model the observed ratings directly and avoid overfitting through a regularized model
- Minimize the regularized squared error on the set of known ratings:

$$\min_{u,v} \sum_{(i,j) \in R} \left(r_{i,j} - u_i^\top v_j\right)^2 + \lambda\left(\|u_i\|^2 + \|v_j\|^2\right)$$

- Popular methods for minimizing this objective include stochastic gradient descent and alternating least squares
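As a concrete sketch of the SGD route, here is a minimal NumPy implementation of this objective; the function name, toy ratings, and hyperparameters are illustrative choices, not from the tutorial:

```python
import numpy as np

def mf_sgd(ratings, n_factors=8, lr=0.02, reg=0.05, n_epochs=500, seed=0):
    """Factorize observed ratings {(i, j, r)} into user/item factors by
    minimizing the regularized squared error with plain SGD."""
    rng = np.random.default_rng(seed)
    n_users = max(i for i, _, _ in ratings) + 1
    n_items = max(j for _, j, _ in ratings) + 1
    U = 0.1 * rng.standard_normal((n_users, n_factors))
    V = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(n_epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]                   # error on one observed rating
            U[i] += lr * (err * V[j] - reg * U[i])  # gradient step on user factors
            V[j] += lr * (err * U[i] - reg * V[j])  # gradient step on item factors
    return U, V

# Toy example: 3 users, 3 items, 6 observed ratings.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = mf_sgd(ratings)
print(U[0] @ V[0])  # should land close to the observed rating 5.0
```

Alternating least squares would instead fix `V`, solve for `U` in closed form, and alternate.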
Recommender systems
A feed-forward neural network view
[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]
Recommender systems
A deeper view
Recommender systems
Matrix factorization vs. feed-forward network
- The two models are very similar
  - Embeddings, MSE loss, gradient-based optimization
- A feed-forward net can learn different embedding combinations than a dot product
- Capturing pairwise interactions through a feed-forward net requires a huge amount of data
- This approach is not superior to a properly tuned traditional matrix factorization approach
Recommender systems
Great escape . . .
- Side information
- Richer models
- Other tasks
Recommender systems
Side information for recommendation
Recommender systems
Side information for recommendation
- Textual side information
  - Product descriptions, reviews, etc.
  - Extraction: RNNs, one-dimensional CNNs, word embeddings, paragraph vectors
  - Applications: news, products, books, publications
- Images
  - Product pictures, video thumbnails
  - Extraction: CNNs
  - Applications: fashion, video
- Music/audio
  - Extraction: CNNs and RNNs
  - Applications: music
Recommender systems
Textual side information
- Content2vec [Nedelec et al., 2016]
- Using associated textual information for recommendations [Bansal et al., 2016]
Recommender systems
Textual information for improving recommendations
- Task: paper recommendation
- Item representation:
  - Text representation: RNN-based
  - Item-specific embeddings created using MF
  - Final representation: item + text embeddings
Recommender systems
Images in recommendation
Visual Bayesian Personalized Ranking (BPR) [He and McAuley, 2016]
- Bias terms
- MF model
- Visual part:
  - Pretrained CNN features
  - Dimension reduction through embeddings
- BPR loss
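The BPR loss itself is simple to state in code. This is a hedged NumPy sketch of the pairwise objective only: the visual features are omitted, the scores come from a plain dot product, and all names and numbers are illustrative:

```python
import numpy as np

def bpr_loss(score_pos, score_neg):
    """BPR loss for one (user, positive item, negative item) triple:
    -log sigmoid(x_ui - x_uj), pushing the observed item above the
    sampled unobserved one."""
    x_uij = score_pos - score_neg
    return -np.log(1.0 / (1.0 + np.exp(-x_uij)))

# With MF scores x_ui = u . v_i: one user vector and two item vectors.
u = np.array([0.5, 1.0])
v_pos = np.array([1.0, 0.5])   # item the user interacted with
v_neg = np.array([-0.5, 0.2])  # sampled unobserved item
print(bpr_loss(u @ v_pos, u @ v_neg))  # small, since the positive already ranks higher
```

Visual BPR simply adds a CNN-feature term to each score before taking the same pairwise loss.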
Recommender systems
Alternative models
- Restricted Boltzmann Machines [Salakhutdinov et al., 2007]
- Auto-encoders [Wu et al., 2016]
- Prod2vec [Grbovic et al., 2015]
- Wide + Deep models [Cheng et al., 2016]
Recommender systems
Restricted Boltzmann Machines – RBM
- Generative stochastic neural network
- Visible and hidden units connected by weights
- Activation probabilities:

$$p(h_j = 1 \mid v) = \sigma\Big(b^h_j + \sum_{i=1}^{m} w_{i,j}\, v_i\Big)$$
$$p(v_i = 1 \mid h) = \sigma\Big(b^v_i + \sum_{j=1}^{n} w_{i,j}\, h_j\Big)$$

- Training:
  - Set the visible units based on the data, sample the hidden units, then sample the visible units
  - Modify the weights so that the configuration of the visible units approaches the data
- In recommendation:
  - Visible units: ratings on the movies
  - A vector of length 5 (one element per rating value) in each unit
  - Units corresponding to users who have not rated the movie are ignored
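The two activation probabilities above amount to one visible-to-hidden-to-visible Gibbs step; a minimal NumPy sketch for a binary RBM follows (names and layer sizes are my own, and the weight-update rule of contrastive divergence is left out):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_h, b_v, rng):
    """One v -> h -> v sampling step of a binary RBM, following the
    activation probabilities on the slide."""
    p_h = sigmoid(b_h + v @ W)                       # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)  # sample hidden units
    p_v = sigmoid(b_v + h @ W.T)                     # p(v_i = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(float)
    return h, v_new, p_v

rng = np.random.default_rng(0)
m, n = 6, 4                                  # visible and hidden units
W = 0.1 * rng.standard_normal((m, n))
b_h, b_v = np.zeros(n), np.zeros(m)
v = rng.integers(0, 2, size=m).astype(float)  # a binary "ratings" vector
h, v_new, p_v = gibbs_step(v, W, b_h, b_v, rng)
```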
Recommender systems
Auto-encoders
- One hidden layer
- Same number of input and output units
- Try to reconstruct the input on the output
- Hidden layer: a compressed representation of the data

Constraining the model improves generalization:
- Sparse auto-encoders: the activations of the units are limited
- Denoising auto-encoders: corrupt the input
Recommender systems
Auto-encoders for recommendation
Reconstruct corrupted user interaction vectors [Wu et al., 2016]
- Collaborative Denoising Auto-Encoder (CDAE)
- The links between nodes are associated with different weights
- The links in red are user-specific
- The other weights are shared across all users
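A minimal NumPy sketch of the CDAE idea: corrupt a user's interaction vector, then reconstruct it through a shared encoder/decoder plus a per-user vector added to the hidden layer (the user-specific links). The activation functions, sizes, and names are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, drop_prob, rng):
    """Denoising-style corruption: randomly zero out observed interactions."""
    mask = rng.random(x.shape) >= drop_prob
    return x * mask

def forward(x_tilde, user_vec, W_enc, W_dec, b_enc, b_dec):
    """One CDAE-like forward pass: shared weights plus a user-specific
    vector injected into the hidden layer."""
    h = np.tanh(x_tilde @ W_enc + user_vec + b_enc)   # hidden representation
    return 1.0 / (1.0 + np.exp(-(h @ W_dec + b_dec)))  # reconstruction probs

n_items, n_hidden = 10, 4
x = (rng.random(n_items) < 0.4).astype(float)       # binary interaction vector
W_enc = 0.1 * rng.standard_normal((n_items, n_hidden))
W_dec = 0.1 * rng.standard_normal((n_hidden, n_items))
user_vec = 0.1 * rng.standard_normal(n_hidden)      # the user-specific ("red") weights
x_tilde = corrupt(x, drop_prob=0.5, rng=rng)
x_hat = forward(x_tilde, user_vec, W_enc, W_dec, np.zeros(n_hidden), np.zeros(n_items))
```

Training would minimize the reconstruction loss between `x_hat` and the uncorrupted `x`.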
Recommender systems
Prod2vec and Item2vec
- Prod2vec and item2vec: item–item co-occurrence factorization
- User2vec: user–user co-occurrence factorization
- The two approaches can be combined [Liang et al., 2016]
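The statistic these models implicitly factorize (via skip-gram with negative sampling) is item–item co-occurrence; a toy illustration with market baskets (names and data are mine):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(baskets):
    """Count how often each unordered item pair appears in the same basket
    (or session) -- the co-occurrence signal prod2vec/item2vec learn from."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

baskets = [["milk", "bread", "eggs"], ["milk", "bread"], ["eggs", "beer"]]
print(cooccurrence_counts(baskets)[("bread", "milk")])  # → 2
```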
Recommender systems
Wide + Deep models
- Combination of two models [Cheng et al., 2016]
- Deep neural network
  - On embedded item features
  - In charge of generalization
- Linear model
  - On embedded item features
  - And cross products of item features
  - In charge of memorization of binarized features
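The "memorization" half rests on cross-product transformations of binary features, which let the linear model latch onto specific feature co-occurrences. A small sketch (the helper name and feature strings are mine, not from Cheng et al.):

```python
from itertools import combinations

def cross_products(features):
    """'Wide' part of Wide+Deep: augment a set of active binary features
    with their pairwise cross products, so a linear model can memorize
    specific co-occurrences."""
    feats = sorted(features)
    return features | {a + " AND " + b for a, b in combinations(feats, 2)}

print(sorted(cross_products({"genre=comedy", "country=US"})))
```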
Recommender systems
Other tasks
- Session-based recommendation
- Contextual sequence prediction
- Time-sensitive sequence prediction
- Causality in recommendations
- Recommendation as question answering
- Deep reinforcement learning for recommendations
Recommender systems
Session-based recommendation
- Treat recommendation as a sequence classification problem
- Input: a sequence of user actions (purchases/ratings of items)
- Output: the next action
- Disjoint sessions (instead of a consistent user history)
Recommender systems
GRU4Rec
Network structure [Hidasi et al., 2016]
- Input: one-hot encoded item ID
- Output: scores over all items
- Goal: predicting the next item in the session

Adapting GRUs to session-based recommendation:
- Session-parallel mini-batching: to handle sessions of (very) different lengths and lots of short sessions
- Sampling on the output: to handle large numbers of items
Recommender systems
GRU4Rec
Session-parallel mini-batches
- The mini-batch is defined over sessions

Output sampling
- Computing scores for all items (100K–1M) in every step is slow
- Use one positive item (the target) + several sampled negatives
- Fast solution: compute scores only on the mini-batch targets
- The targets of the other examples in the mini-batch serve as negative samples for the current example
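The session-parallel scheduling can be sketched in plain Python. This is a simplified illustration under my own naming: the real GRU4Rec also resets the GRU hidden state whenever a lane switches to a new session, and handles the tail of unfinished sessions, both omitted here:

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets) the GRU4Rec way: each of the batch_size
    'lanes' walks through one session step by step; when a session ends,
    the next session takes over that lane. Within a batch, the targets of
    the other lanes can double as negative samples."""
    next_session = iter(sessions)
    lanes = [list(next(next_session)) for _ in range(batch_size)]
    pos = [0] * batch_size
    while True:
        batch_in, batch_out = [], []
        for k in range(batch_size):
            # refill a lane whose session is exhausted (skipping 1-item sessions)
            while pos[k] >= len(lanes[k]) - 1:
                try:
                    lanes[k], pos[k] = list(next(next_session)), 0
                except StopIteration:
                    return  # out of sessions; drop the partial batch
            batch_in.append(lanes[k][pos[k]])
            batch_out.append(lanes[k][pos[k] + 1])
            pos[k] += 1
        yield batch_in, batch_out

# Items are IDs; three toy sessions, two parallel lanes.
for inputs, targets in session_parallel_batches([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 2):
    print(inputs, targets)
```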
Recommender systems
Contextual sequence prediction
- Input: a sequence of contextual user actions, plus the current context
- Output: probability of the next action
- E.g. "Given all the actions a user has taken so far, what's the most likely video they're going to play right now?" [Beutel et al., 2018]
Recommender systems
Time-sensitive sequence prediction
- Recommendations are actions at a moment in time
- Proper modeling of time and system dynamics is critical
- Experiment on an internal Netflix dataset
- Context:
  - Discrete time – day-of-week (Sunday, Monday, . . . ), hour-of-day
  - Continuous time (timestamp)
- Predict the next play (temporally split data)
[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]
And now, for some speculative tasks in the recommender systems space. The answers are not yet clear, but there is good potential for follow-up research.
Recommender systems
Causality in recommendations – [Schnabel et al., 2016]
- Virtually all data for training recommender systems is subject to selection biases
- In movie recommendation, users typically watch and rate movies they like, and rarely movies they do not like
- View recommendation from a causal inference perspective – exposing a user to an item is an intervention, analogous to exposing a patient to a treatment in a medical study
- Propensity-weighted MF method – propensities act as weights on the loss terms:

$$\min_{u,v} \sum_{(i,j) \in R} \frac{1}{P_{i,j}}\left(r_{i,j} - u_i^\top v_j\right)^2 + \lambda\left(\|u_i\|^2 + \|v_j\|^2\right)$$

- Performance: MF vs. propensity-weighted MF (as α → 0, the data is increasingly missing not at random, only revealing top-rated items)
- How can this be incorporated into neural networks for recommender systems?
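One way to read the propensity-weighted objective in code, as an illustrative NumPy sketch (function name, toy data, and the placement of the regularizer outside the sum are my own simplifications):

```python
import numpy as np

def ips_mf_loss(ratings, propensities, U, V, lam=0.1):
    """Inverse-propensity-scored MF objective: each observed rating's
    squared error is weighted by 1 / P(it was observed)."""
    loss = 0.0
    for (i, j, r), p in zip(ratings, propensities):
        loss += (r - U[i] @ V[j]) ** 2 / p
    loss += lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return loss

rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((3, 4))
V = 0.1 * rng.standard_normal((3, 4))
ratings = [(0, 0, 5.0), (1, 2, 1.0)]
# Over-exposed (e.g. highly rated) items have a high observation propensity
# and so get a LOW weight; rarely observed ratings get up-weighted.
propensities = [0.9, 0.1]
print(ips_mf_loss(ratings, propensities, U, V))
```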
Recommender systems
Recommendations as question answering – [Dodge et al., 2015]
- Conversational recommendation agent
- Four tasks: (1) question answering (QA), (2) recommendation, (3) a mix of recommendation and QA, and (4) general dialog about the topic (chit-chat)
- A memory network jointly trained on all four tasks performs best
- It incorporates short- and long-term memory and can use local context and knowledge bases of facts
- Performance on QA needs a real boost
- Performance degraded rather than improved when training on all four tasks at once
Recommender systems
Deep reinforcement learning for recommendations – [Zhao et al., 2017]
- MDP-based formulations of recommender systems go back to the early 2000s
- Using reinforcement learning has two advantages:
  1. strategies can be continuously updated during interactions
  2. it can learn a strategy that maximizes the long-term cumulative reward from users
- A list-wise recommendation framework, which can be applied in scenarios with large and dynamic item spaces
- Uses an Actor-Critic network
- Integrates multiple orders – positional order, temporal order
- Needs proper evaluation in a live environment
Recommender systems
Wrap-up
- Current directions:
  - Item/user embeddings
  - Deep collaborative filtering
  - Feature extraction from content
  - Session- and context-based recommendation
  - Fairness, accuracy, confidentiality, and transparency (FACT)
- Deep learning can work well for recommendations
- Matrix factorization and deep learning are very similar in the classic recommendation setup
- Lots of areas left to explore