The BigChaos Solution to the Netflix Prize
Presented by: Chinfeng Wu
Saturday, April 10, 2010
Outline:
- The Netflix Prize
- The team "BigChaos"
- Algorithms
- Details in selected algorithms
- End-Game
The Netflix Prize: an open competition for the best collaborative filtering algorithm.
Teams submitted predictions on the "Qualifying Data" (could submit multiple times, limit of once/day).
Grand Prize: $1,000,000 for improving the RMSE of Netflix's current system (Cinematch) by 10%.
A $50,000 Progress Prize was awarded for the best improvement each year.
The dataset: about 100 million ratings given by 480k users x 17.7k movies between Oct 1998 and Dec 2005.
The qualifying data was split into a quiz set (public leaderboard) and a hidden test set used for determining the final winner.
Prediction task: predict each user's rating for each movie.
The contest started in 2006.
SVD (matrix factorization): each user and each movie is represented by a feature vector.
Gradient descent is used to find parameters that lead to a local minimum of the RMSE; the features are trained to minimize the error function.
The prediction is the dot product of the feature vectors of the user and the movie.
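Such a factorization can be trained as sketched below. This is a minimal illustration, assuming squared error with L2 regularization and stochastic gradient descent; all hyperparameters and the toy data are illustrative, not the values used by BigChaos.

```python
import numpy as np

def train_svd(ratings, n_users, n_items, k=10, lr=0.01, reg=0.02,
              epochs=200, seed=0):
    """Fit user/movie feature vectors so that P[u] @ Q[i] approximates r."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user feature vectors
    Q = 0.1 * rng.standard_normal((n_items, k))  # movie feature vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]                  # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy data: (user, movie, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = train_svd(ratings, n_users=2, n_items=3)
pred = P[0] @ Q[0]  # predicted rating of user 0 for movie 0
```

The dot-product prediction and gradient-descent training match the description above; the regularization term simply keeps the feature vectors small.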
Time effects: remove rating biases on the basis of weekday means. Calculate weekday averages per user, per movie, and overall.
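Computing such weekday averages can be sketched as follows; the data layout and field order are assumptions made for illustration.

```python
from collections import defaultdict

def weekday_means(ratings):
    """Average rating per (user, weekday); weekday is 0..6.

    ratings: list of (user, movie, weekday, rating) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for u, m, wd, r in ratings:
        acc = sums[(u, wd)]
        acc[0] += r   # running sum of ratings
        acc[1] += 1   # count of ratings
    return {key: s / n for key, (s, n) in sums.items()}

means = weekday_means([(0, 1, 5, 4.0), (0, 2, 5, 2.0), (1, 1, 0, 5.0)])
# means[(0, 5)] averages user 0's two Saturday ratings
```

The same accumulation, keyed on (movie, weekday) or on weekday alone, yields the per-movie and global variants; the resulting means can be subtracted from the ratings as a baseline.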
SVD with Alternating Least Squares (SVD-ALS): both methods are from BellKor and are not discussed further here.
The correlations c_ij between movies are precomputed.
α ranges from 200 to 9000, set by APT1 (automatic parameter tuning).
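The role of α here is consistent with a support-based shrinkage of the correlations; a sketch under that assumption (the formula and values are illustrative, not taken from the slide):

```python
def shrink(c_ij, n_ij, alpha):
    """Shrink a raw correlation toward zero based on its support.

    n_ij: number of users who rated both movies i and j.
    A large alpha suppresses correlations with little support.
    """
    return c_ij * n_ij / (n_ij + alpha)

weak = shrink(0.8, 200, 200)     # only 200 common raters: halved to 0.4
strong = shrink(0.8, 9000, 200)  # well supported: barely shrunk
```

With α between 200 and 9000, only correlations backed by many common raters keep their full magnitude, which is why the parameter is worth tuning automatically.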
The simplest form of a KNN model: weight the K best-correlating neighbors based on their correlations c_ij.
An extension of the basic model uses a sigmoid function to rescale the correlations c_ij to achieve a lower RMSE.
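An item-based KNN prediction of this kind can be sketched as below; the correlation values and the exact weighting are illustrative assumptions, with the sigmoid rescaling shown as an optional step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def knn_predict(user_ratings, corr, i, k=2, rescale=False):
    """Correlation-weighted average over the K best-correlating neighbors.

    user_ratings: {item: rating} for one user.
    corr: {(i, j): c_ij} precomputed item-item correlations.
    """
    neighbors = [(corr[(i, j)], r)
                 for j, r in user_ratings.items() if (i, j) in corr]
    neighbors.sort(key=lambda t: abs(t[0]), reverse=True)
    top = neighbors[:k]
    if rescale:                       # optional sigmoid rescaling of c_ij
        top = [(sigmoid(c), r) for c, r in top]
    num = sum(c * r for c, r in top)
    den = sum(abs(c) for c, r in top)
    return num / den if den else None

corr = {(0, 1): 0.9, (0, 2): 0.4, (0, 3): 0.1}
user_ratings = {1: 5.0, 2: 3.0, 3: 1.0}
pred = knn_predict(user_ratings, corr, i=0, k=2)  # uses items 1 and 2 only
```

With k=2 the weakly correlated item 3 is ignored, so the prediction leans toward the rating of the best-correlating neighbor.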
Time-dependent variant, basic idea: give recent ratings a higher weight than old ones.
Another variant uses neither Pearson nor set correlations; instead, the length of the common substring of the movie titles and the production year are used to derive the weighting coefficients.
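The "recent ratings weigh more" idea can be sketched with an exponential decay; the slide only states the idea, so the decay function and the half-life below are illustrative assumptions.

```python
def time_weight(age_days, half_life=180.0):
    """Weight 1.0 for a rating made today, 0.5 after `half_life` days."""
    return 0.5 ** (age_days / half_life)

def weighted_average(ratings_with_age):
    """Time-decayed mean over (rating, age_in_days) pairs."""
    num = sum(r * time_weight(a) for r, a in ratings_with_age)
    den = sum(time_weight(a) for _, a in ratings_with_age)
    return num / den

pred = weighted_average([(5.0, 0), (1.0, 180)])  # recent 5 counts twice as much
```

The same weights can multiply the neighbor correlations in a KNN model, so that an old rating contributes less to the prediction than a fresh one.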
A limitation of item neighborhood models: the true correlations between items are generally not known and have to be estimated from the data.
every item/user
A neuron takes the dot product of an input vector p and a weight vector w (sometimes adding a bias value b), then feeds that dot product into an activation function to produce the output.
Many neurons are combined to compute the prediction; each neuron has to be trained to obtain a better weight vector and bias.
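A single neuron as described above can be sketched in a few lines; the logistic activation and the toy weights are assumptions for illustration.

```python
import math

def neuron(p, w, b):
    """output = activation(w . p + b) for one neuron."""
    z = sum(wi * pi for wi, pi in zip(w, p)) + b  # dot product plus bias
    return 1.0 / (1.0 + math.exp(-z))             # sigmoid activation

out = neuron(p=[1.0, 2.0], w=[0.5, -0.25], b=0.0)  # z = 0.0, so output 0.5
```

Training adjusts w and b for every neuron so that the network's output moves toward the target rating.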
In a multilayer network, the output of each layer is the input of the next layer.
The activation of the output unit is typically linear, while the hidden units use non-linear activations.
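A two-layer forward pass, assuming sigmoid hidden units feeding a linear output unit (sizes and weights below are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(p, W_hidden, b_hidden, w_out, b_out):
    """Each layer's output is the next layer's input."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, p)) + b)
              for row, b in zip(W_hidden, b_hidden)]          # hidden layer
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out  # linear output

y = forward(p=[1.0, -1.0],
            W_hidden=[[0.5, 0.5], [1.0, -1.0]],
            b_hidden=[0.0, 0.0],
            w_out=[2.0, -1.0],
            b_out=0.5)
```

Because the output unit is linear, y can take any real value, which suits a rating-regression target better than a bounded sigmoid output.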
Restricted Boltzmann Machines (RBM): the stochastic dynamics let the network settle around the global minimum of its energy function.
Units are stochastic (they show probabilistic behavior when activated).
There are no connections between units in the same layer.
Connections between the visible and hidden layers are bidirectional and symmetric (weights are the same in both directions).
RBM for collaborative filtering: ratings are represented by the visible units.
For each user, softmax visible units are used for the movies that the user has rated.
The model uses a conditional multinomial (softmax) distribution for the visible ratings matrix V and a conditional Bernoulli distribution for the hidden user features h:

p(v_i^k = 1 | h) = exp(b_i^k + Σ_j h_j W_ij^k) / Σ_{l=1}^K exp(b_i^l + Σ_j h_j W_ij^l)
p(h_j = 1 | V) = σ(b_j + Σ_i Σ_k v_i^k W_ij^k)

with σ(x) = 1/(1 + e^(-x)) the logistic function (Salakhutdinov et al., ICML 2007).
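These two conditionals can be computed as sketched below; the tensor shapes and the random toy parameters are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(V, W, b_h):
    """p(h_j = 1 | V) = sigmoid(b_j + sum_{i,k} v_i^k W_ij^k)."""
    return sigmoid(b_h + np.einsum("ik,ikj->j", V, W))

def p_v_given_h(h, W, b_v):
    """p(v_i^k = 1 | h): softmax over the K rating values per movie."""
    logits = b_v + np.einsum("ikj,j->ik", W, h)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_movies, K, n_hidden = 3, 5, 4
W = 0.1 * rng.standard_normal((n_movies, K, n_hidden))
V = np.eye(K)[[4, 2, 0]]          # one user rated the 3 movies 5, 3 and 1
ph = p_h_given_v(V, W, np.zeros(n_hidden))
pv = p_v_given_h((ph > 0.5).astype(float), W, np.zeros((n_movies, K)))
```

V is a one-hot matrix (movies x K rating values), matching the softmax visible units described above; alternating between the two conditionals is the basis of contrastive-divergence training.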
The team “BellKorPragmaticChaos” submitted the first result more than 10% better than Cinematch, triggering the 30-day “last call” period.
The remaining leading competitors formed a new team, combined their models, and quickly also passed the 10% mark.
Both sides kept monitoring the leaderboard, optimizing their algorithms, and submitting results once a day.
“BellKor” submitted a little early, 40 minutes before the deadline; “Ensemble” submitted 20 minutes later.
Both teams then delivered their code and documentation (mid-August).
Netflix notified the winners that they had won the $1 million prize (late August).
Training and optimizing predictors individually is not optimal: the whole ensemble needs the right tradeoff between diversity and accuracy (as with greedy methods, a local optimum is not a global optimum).
Collaboration among participants is good, and combining models works surprisingly well (but the final 10% improvement could probably have been achieved by combining about 10 models rather than 1000's).
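Why combining models works can be illustrated with a linear blend whose weights are fitted by least squares on held-out data; the synthetic predictions below are an assumption built so that the two models' errors cancel.

```python
import numpy as np

def fit_blend(pred_matrix, targets):
    """Least-squares blending weights, one per model.

    pred_matrix: (n_samples, n_models) predictions on a probe set.
    """
    w, *_ = np.linalg.lstsq(pred_matrix, targets, rcond=None)
    return w

truth = np.array([5.0, 3.0, 1.0, 4.0])
model_a = truth + np.array([0.5, -0.5, 0.5, -0.5])  # biased one way
model_b = truth + np.array([-0.5, 0.5, -0.5, 0.5])  # biased the other way
X = np.column_stack([model_a, model_b])
w = fit_blend(X, truth)
blend = X @ w  # the opposite errors of the two models cancel in the blend
```

Each individual model has an RMSE of 0.5 on this toy probe set, yet the fitted blend recovers the targets exactly; diverse errors, not just accurate models, are what make the ensemble strong.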
References:
A. Töscher and M. Jahrer. The BigChaos Solution to the Netflix Prize 2008.
A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos Solution to the Netflix Grand Prize. 2009.
R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In ICML, pages 791-798, 2007.
A. Töscher, M. Jahrer, and R. Legenstein. Improved neighborhood-based algorithms for large-scale recommender systems. In KDD Workshop at SIGKDD 08, August 2008.
P. Smyth. Netflix overview, slides for CS 277: Data Mining, download from http://www.ics.uci.edu/~smyth/courses/cs277/slides/netflix_overview.pdf