Comparing Recommendation Algorithms for Social Bookmarking

Toine Bogers
Royal School of Library and Information Science
Copenhagen, Denmark
About me
- Ph.D. from Tilburg University
- “Recommender Systems for Social Bookmarking”
- Promotor: Prof. dr. Antal van den Bosch
- Currently @ RSLIS (Copenhagen, DK)
- Research assistant on retrieval fusion project
- Research interests
- Recommender systems
- Social bookmarking
- Expert search
- Information retrieval
Outline
- 1. Introduction
- 2. Collaborative filtering
- 3. Content‐based filtering
- 4. Recommender systems fusion
- 5. Conclusions
Social bookmarking
- Way of storing, organizing, and managing bookmarks of Web pages, scientific articles, books, etc.
- All done online
- Can be made public or kept private
- Allow users to tag (= label) their items
- Many different websites available (e.g., Delicious, CiteULike, BibSonomy)
Social bookmarking
- Different domains
- Web pages
- Scientific articles
- Books
- Strong growth in popularity
- Millions of users, items, and tags
- For example: Delicious
- 140,000+ posts/day on average in 2008 (Keller, 2009)
- 7,000,000+ posts/month in 2008 (Wetzker et al., 2009)
Content overload
- Problems with this growth
- Content overload
- Increasing ambiguity
- How can we deal with this?
- Browsing
- Search
- (Both can become less effective as content increases!)
- A possible solution
- Take a more active role: recommendation
Recommendation tasks

[Figure: overview of recommendation tasks in a folksonomy of USERs, ITEMs, and TAGs, including item recommendation, tag suggestion, personalized search, and finding domain experts or like-minded users]
Item recommendation
- Our focus: item recommendation
- Identify sets of items that are likely to be of interest to a certain user
- Return a ranked list of items
- ‘Find Good Items’ task (Herlocker et al., 2004)
- Based on different information sources
- Transaction patterns (usage data, purchase information)
- Explicit ratings
- Implicit feedback
- Metadata
- Tags
Related work
- Work on social bookmarking mostly focused on
- Improving browsing experience
- clustering, dealing with ambiguity
- Incorporating tags in search algorithms
- Tag recommendation
- Problems with work on item recommendation
- Different data sets
- Different evaluation metrics
- No comparison of algorithms under controlled conditions
- Hardly ever publicly available data sets
- No user-based evaluation
Collecting data
- Four data sets from two different domains
- Web bookmarks
- Delicious
- BibSonomy
- Scientific articles
- CiteULike
- BibSonomy
~78% of users posted only one type of content (bookmarks or scientific articles)
What did we collect?
- Usage data
- User-item-tag triples with timestamps
- Metadata
- Varies with the domain
Scientific articles
- Item‐intrinsic
- TITLE, DESCRIPTION, JOURNAL, AUTHOR, TAGS, URL, etc.
- Item‐extrinsic
- CHAPTER, DAY, EDITION, YEAR, INSTITUTION, etc.
Web bookmarks
- TITLE, DESCRIPTION, TAGS, URL
Filtering
- Why?
- To reduce noise in our data sets
- Common procedure in recommender systems research
- How?
- ≥ 20 items per user
- ≥ 2 users per item (no hapax legomena items)
- No untagged posts
- Compared to related work
- Stricter filtering
- More realistic
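As a rough illustration of how such a filtering pass could be implemented (the post representation and function name below are assumptions for this sketch, not the code used in the study):

```python
from collections import Counter

def filter_posts(posts, min_items_per_user=20, min_users_per_item=2):
    """Iteratively drop untagged posts, hapax items, and sparse users.

    `posts` is assumed to be a list of (user, item, tags) tuples; the
    thresholds mirror the filtering criteria listed above.
    """
    # Drop untagged posts once, up front.
    posts = [(u, i, t) for u, i, t in posts if t]
    while True:
        item_counts = Counter(i for _, i, _ in posts)
        user_counts = Counter(u for u, _, _ in posts)
        kept = [
            (u, i, t) for u, i, t in posts
            if item_counts[i] >= min_users_per_item
            and user_counts[u] >= min_items_per_user
        ]
        if len(kept) == len(posts):  # fixed point: nothing more to remove
            return kept
        posts = kept
```

The loop repeats until a fixed point is reached, because removing rare items can push users back below the 20-item threshold.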
Data sets
|         | Delicious (bookmarks) | BibSonomy (bookmarks) | CiteULike (articles) | BibSonomy (articles) |
|---------|----------------------:|----------------------:|---------------------:|---------------------:|
| # users | 1,243   | 192    | 1,322  | 167    |
| # items | 152,698 | 11,165 | 38,419 | 12,982 |
| # tags  | 42,820  | 13,233 | 28,312 | 5,165  |
| # posts | 238,070 | 29,096 | 84,637 | 29,720 |
Experimental setup
- Backtesting
- Withhold randomly selected items from test users
- Use remaining material for training the recommender system
- Success is measured by how well the recommender predicts the user’s interest in his/her withheld items
- Details
- Overall 90%-10% split on users
- Withhold 10 randomly selected items from each test user
- Parameter optimization
- Used 10-fold cross-validation
- 90-10 splits
- 10 withheld items
- Macro-averaging of evaluation scores
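A minimal sketch of this backtesting split, assuming `user_items` maps each user to the items he/she posted (the function name and fixed seed are illustrative):

```python
import random

def backtest_split(user_items, n_withheld=10, test_fraction=0.1, seed=42):
    """90%-10% split on users; withhold `n_withheld` random items per test user.

    After the filtering above, every user has at least 20 items,
    so sampling 10 items is always possible.
    """
    rng = random.Random(seed)
    users = list(user_items)
    rng.shuffle(users)
    test_users = set(users[:max(1, int(len(users) * test_fraction))])

    training, withheld = {}, {}
    for user, items in user_items.items():
        if user in test_users:
            withheld[user] = set(rng.sample(items, n_withheld))
            training[user] = [i for i in items if i not in withheld[user]]
        else:
            training[user] = list(items)
    return training, withheld
```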
Evaluation
- ‘Find Good Items’ task returns a ranked list
- Need a metric that takes the ranking of items into account
- Precision‐oriented metric
- Mean Average Precision (MAP)
- Average Precision (AP) is average of precision values at each relevant,
retrieved item
- MAP is AP averaged over all users
- “single figure measure of quality across recall levels” (Manning, 2009)
- Tested different metrics
- All precision‐oriented metrics showed the same picture
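For concreteness, AP and MAP could be computed along these lines (a sketch; here AP is normalised by the total number of withheld items, the usual IR convention):

```python
def average_precision(ranking, relevant):
    """AP: average of the precision values at each relevant, retrieved item."""
    relevant = set(relevant)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(per_user_runs):
    """MAP: AP macro-averaged over all test users.

    `per_user_runs` is a list of (ranking, withheld_items) pairs.
    """
    return sum(average_precision(r, w) for r, w in per_user_runs) / len(per_user_runs)
```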
Collaborative filtering
- Question
- How can we use the information in the folksonomy to generate better recommendations?
- Users
- Items
- Tags
- Collaborative filtering (CF)
- Attempts to automate “word-of-mouth” recommendations
- Recommend items based on how like-minded users rated those items
- Similarity based on
- Usage data (usage patterns)
- Tagging data
Collaborative filtering
- Model-based CF
- ‘Eager’ recommendation algorithms
- Train a predictive model of the recommendation task
- Quick to apply to generate recommendations
- Memory-based CF
- ‘Lazy’ recommendation algorithms
- Simply store all patterns in memory
- Defer prediction effort to when the user requests recommendations
Related work
- Model‐based
- Hybrid PLSA‐based approach (Wetzker et al., 2009)
- Tensor decomposition (Symeonidis et al., 2008)
- Memory‐based
- Tag-aware fusion (Tso-Sutter et al., 2008)
- Graph‐based
- FolkRank (Hotho et al., 2006)
- Random walk (Clements et al., 2008)
Algorithms
- User‐based k‐NN algorithm
- Calculate similarity between the active user and all other users
- Determine the top k nearest neighbors
- I.e., the most similar users
- Unseen items from nearest neighbors are scored by the similarity between the neighbor and the active user
- Item-based k-NN algorithm
- Calculate similarity between the active user’s items and all other items
- Determine the top k nearest neighbors
- I.e., the most similar items for each of the active user’s items
- Unseen neighboring items are scored by the similarity between the neighbor and the active user’s item
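Purely as an illustration of the user-based variant (the data structures, function names, and neighbourhood size below are assumptions; the item-based variant is symmetric, using the neighbours of each of the active user's items instead):

```python
import heapq
from collections import defaultdict

def user_based_scores(active_user, profiles, similarity, k=10):
    """Score the active user's unseen items by summed neighbour similarity.

    `profiles` maps each user to a set of items; `similarity` is any function
    of two profiles (e.g. cosine over usage vectors or tag overlap).
    """
    neighbours = heapq.nlargest(
        k,
        ((similarity(profiles[active_user], profiles[u]), u)
         for u in profiles if u != active_user),
        key=lambda pair: pair[0],        # rank neighbours by similarity only
    )
    scores = defaultdict(float)
    for sim, neighbour in neighbours:
        for item in profiles[neighbour] - profiles[active_user]:
            scores[item] += sim          # unseen item inherits neighbour similarity
    return sorted(scores, key=scores.get, reverse=True)
```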
Usage data
- Baseline: CF using usage data
- Profile vectors
- User profiles
- Item profiles
- No explicit ratings available
- Only binary information (1 or 0)
- Or rather: unary!
- Similarity metric
- Cosine similarity
- 10-fold cross-validation to optimize k

[Figure: the user-item matrix UI (users × items)]
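With unary profiles, cosine similarity reduces to set overlap: the dot product is the number of shared items and each norm is the square root of the profile size. A minimal sketch (function name is illustrative):

```python
import math

def unary_cosine(profile_a, profile_b):
    """Cosine similarity between two unary (0/1) profiles stored as sets."""
    if not profile_a or not profile_b:
        return 0.0
    return len(profile_a & profile_b) / math.sqrt(len(profile_a) * len(profile_b))
```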
Results (usage data)
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
Tagging data
- Tags are short topical descriptions of an item (or user)
- Profile vectors
- User tag profiles
- Item tag profiles
- Similarity metrics
- Cosine similarity
- Jaccard overlap
- Dice’s coefficient

[Figure: the user-tag matrix UT (users × tags) and the item-tag matrix IT (items × tags)]
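The two set-based metrics are straightforward to sketch on tag sets (cosine similarity can additionally weight tags by how often they were used); the function names are illustrative:

```python
def jaccard(tags_a, tags_b):
    """Jaccard overlap between two tag sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def dice(tags_a, tags_b):
    """Dice's coefficient between two tag sets."""
    total = len(tags_a) + len(tags_b)
    return 2 * len(tags_a & tags_b) / total if total else 0.0
```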
Results (tagging data)
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| UBCF + tagging data | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + tagging data | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
Findings (tagging data)
- CF with tag overlap
- User‐based CF performs significantly worse
- Item-based CF performs much better
- Often statistically significant improvements
- Except on CiteULike: CF without tags better
- Similarity metric relatively unimportant
- Cosine similarity slightly better
Comparison to related work
- Random walk model (Clements et al., 2008)
- Create transition matrix based on tripartite folksonomy graph
- Similar to FolkRank, but no walks of infinite length
- Walk length n is a parameter
- Tag-aware fusion (Tso-Sutter et al., 2008)
- Fusion of algorithms and data representations
- Usage data and tagging data
- User-based CF: extend the UI matrix with tags as extra items
- Item-based CF: extend the UI matrix with tags as extra users (see the sketch after this list)
- Fuse together the predictions of user-based CF and item-based CF
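A rough sketch of that matrix-extension step, under the simplifying assumption that profiles are stored as sets rather than an explicit matrix (the prefixed tag keys just keep tags from colliding with real item or user ids; this is an illustration, not Tso-Sutter et al.'s code):

```python
def extend_user_profiles_with_tags(user_items, user_tags):
    """User-based variant: add each tag to the user's profile as a pseudo-item."""
    return {
        user: user_items.get(user, set())
              | {("tag", t) for t in user_tags.get(user, set())}
        for user in set(user_items) | set(user_tags)
    }

def extend_item_profiles_with_tags(item_users, item_tags):
    """Item-based variant: add each tag to the item's profile as a pseudo-user."""
    return {
        item: item_users.get(item, set())
              | {("tag", t) for t in item_tags.get(item, set())}
        for item in set(item_users) | set(item_tags)
    }
```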
Comparison to related work
[Figure: the user-item matrix extended with tags, illustrating the user-based and item-based variants of tag-aware fusion over USERs, ITEMs, and TAGs]
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| UBCF + tagging data | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + tagging data | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
| UBCF + fused data | 0.0303 | 0.0057 | 0.0829 | 0.0739 |
| IBCF + fused data | 0.0468 | 0.0125 | 0.1280 | 0.1212 |
| Tag-aware fusion | 0.0474 | 0.0166 | 0.1297 | 0.1268 |
| Random walk model | 0.0182 | 0.0003 | 0.0608 | 0.0536 |
Metadata-based recommendation
- Question
- How can we use the metadata to generate (better) item recommendations?
- Content-based filtering
- Build representations of the content in a system
- Learn a profile of the user’s interests
- Match content representations against the user’s profile
Reminder: what did we collect?
- Two types of metadata
- Intrinsic metadata, i.e., directly relating to the content
- E.g., <TITLE>, <DESCRIPTION>, <JOURNAL>, <AUTHOR>, ...
- Extrinsic metadata, i.e., administrative information
- E.g., <PAGES>, <MONTH>, <EDITION>, …
Related work
- Common approaches
- Information retrieval
- Machine learning
- Examples
- TF·IDF weighting (Lang, 1995; Whitman & Lawrence, 2002)
- Personal information agents (Balabanovic, 1998; Joachims et al., 1997; Chirita et al., 2006)
- Naive Bayes (Mooney et al., 2000; De Gemmis et al., 2008)
- Linear regression (Alspector et al., 1997)
- Nothing applied to social bookmarking so far!
Profile-centric matching
- Take an IR approach: profile-centric matching
- Build representations of the content in a system
- All metadata assigned to an item → item profile
- Learn a profile of the user’s interests
- Collate all of user’s metadata into a user profile
- Match and rank item profiles to user profiles
- Language modeling with Jelinek‐Mercer smoothing
- Stopword filtering, no stemming
[Figure: similarity matching between the active user’s metadata profile and the training item profiles, producing a ranked list of items]
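A minimal sketch of the retrieval step, assuming query-likelihood scoring where the user profile acts as the query and each item profile as a document; the smoothing weight, tokenisation, and data structures are illustrative assumptions:

```python
import math
from collections import Counter

def jelinek_mercer_score(query_terms, doc_terms, collection_counts,
                         collection_size, lam=0.5):
    """log P(query | document) with Jelinek-Mercer smoothing:
    P(t | d) = (1 - lam) * tf(t, d) / |d|  +  lam * cf(t) / |C|
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms) or 1
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len
        p_coll = collection_counts.get(term, 0) / collection_size
        p = (1 - lam) * p_doc + lam * p_coll
        score += math.log(p) if p > 0 else math.log(1e-12)  # floor unseen terms
    return score
```

Ranking all item profiles by this score for the active user, after stopword filtering, would yield the profile-centric recommendation list.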
Post-centric matching
- Problem
- Big user profile will match nearly anything
- Sacrificing precision for recall
- Different level of granularity: post-centric matching
- Construct metadata representations of each post
- Match each of the user’s posts against all other posts
- Match, rank, and aggregate all retrieved posts
[Figure: each of the active user’s posts is matched against all training posts via similarity matching; the retrieved posts are ranked and aggregated into item scores]
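The aggregation step might look roughly like this sketch, which simply sums the retrieval scores of matching posts per item; the slides do not spell out the actual aggregation function, so `match_posts` and the summation are assumptions:

```python
from collections import defaultdict

def post_centric_scores(user_posts, match_posts):
    """Match each of the active user's posts against all training posts and
    aggregate the retrieved posts' scores into item scores.

    `match_posts(post)` is a hypothetical retrieval call returning
    (item, score) pairs for the best-matching training posts.
    """
    item_scores = defaultdict(float)
    for post in user_posts:
        for item, score in match_posts(post):
            item_scores[item] += score   # simple sum; other aggregations possible
    return sorted(item_scores, key=item_scores.get, reverse=True)
```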
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
- Problem with post‐centric matching: data sparseness
Hybrid filtering
- Similarity between users and items based on metadata
- Plug these similarities into the standard k-NN CF approach!
- User-based CF with metadata-based similarities
- Textual similarity between user profiles
- Item-based CF with metadata-based similarities
- Textual similarity between item profiles
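Conceptually, the hybrid variant only swaps the similarity function used by the k-NN algorithms. A sketch of a bag-of-words cosine over metadata profiles (an illustrative stand-in for the textual similarity actually used):

```python
import math

def metadata_cosine(profile_a, profile_b):
    """Cosine similarity between two bag-of-words metadata profiles (term -> count)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The k-NN machinery stays the same; only the neighbour similarity now comes from textual profiles instead of usage or tag overlap.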
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
Results (comparison)

| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
| Best CF run | 0.0370 | 0.0101 | 0.1100 | 0.0887 |
| Tag-aware fusion | 0.0474 | 0.0166 | 0.1297 | 0.1268 |
Findings
- Content‐based filtering
- Profile-level matching better than post-level
- Hybrid filtering
- Item-based CF with metadata similarities works best
- No clear winner over all data sets
Data fusion
- Question
- Can we improve performance by combining different recommendation algorithms?
- Tentative answer: yes!
- Data fusion used in different fields
- Machine learning
- Information retrieval
- Collection fusion
- Results fusion
Combination taxonomy
- Burke (2002) defines seven different techniques
- 1. Mixed (all shown together, interleaved)
- 2. Switching (pick one, depending on the situation)
- 3. Feature combination (combine sources for a single algorithm)
- 4. Cascade (output of algorithm 1 is input of algorithm 2)
- 5. Feature augmentation (output of alg. 1 is an input feature for alg. 2)
- 6. Meta-level (model of alg. 1 is input for alg. 2)
- 7. Weighted combination (output combination of ≥ 2 algorithms)
- Same as results fusion in IR
Why does data fusion work?
- Problem
- Recommendation is a complex task
- A single algorithm can never capture it completely
- Solution
- Combine different algorithms and data representations
- Each highlights a different aspect of the task
- Overlap between the individual runs is evidence of relevance
How do we combine?
- Score‐based fusion
- Different algorithms have different score distributions
- Score normalization into the [0, 1] range
- Six standard combination techniques from IR
- CombMAX (max score per item)
- CombMIN (min score per item)
- CombMED (median score per item)
- CombSUM (sum of scores per item)
- CombMNZ (sum of scores per item × no. of retrieving runs)
- CombANZ (sum of scores per item ÷ no. of retrieving runs)
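The six techniques are easy to state in code. The sketch below assumes min-max normalisation into [0, 1] (the exact normalisation scheme is not specified on the slide) and runs represented as item-to-score dictionaries:

```python
import statistics

def normalise(run):
    """Min-max normalise one run's scores into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0
    return {item: (score - lo) / span for item, score in run.items()}

def comb_fuse(runs, method="CombSUM"):
    """Fuse several {item: score} runs with one of the standard IR rules."""
    runs = [normalise(run) for run in runs]
    fused = {}
    for item in set().union(*runs):
        scores = [run[item] for run in runs if item in run]
        n = len(scores)                      # number of runs retrieving the item
        fused[item] = {
            "CombMAX": max(scores),
            "CombMIN": min(scores),
            "CombMED": statistics.median(scores),
            "CombSUM": sum(scores),
            "CombMNZ": sum(scores) * n,
            "CombANZ": sum(scores) / n,
        }[method]
    return sorted(fused, key=fused.get, reverse=True)
```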
How do we combine?
- Unweighted vs. weighted combination
- “Not all recommendation algorithms are created equal!”
- Linear weighting of individual runs
- Weight optimization using random-restart hill climbing
- Steps of 0.1
- 100 iterations
- Using 10-fold cross-validation
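A sketch of the weighted combination and the weight search; mapping “100 iterations” to 100 random restarts, the step size of 0.1, and the `evaluate` callback (e.g. MAP on a held-out fold) are assumptions of this illustration:

```python
import random

def weighted_combsum(runs, weights):
    """Linear, weighted CombSUM over normalised {item: score} runs."""
    fused = {}
    for run, weight in zip(runs, weights):
        for item, score in run.items():
            fused[item] = fused.get(item, 0.0) + weight * score
    return sorted(fused, key=fused.get, reverse=True)

def hill_climb_weights(runs, evaluate, n_restarts=100, step=0.1, seed=42):
    """Random-restart hill climbing over run weights in steps of 0.1.

    `evaluate(ranking)` is assumed to return the evaluation score
    (e.g. MAP) of a fused ranking on a held-out fold.
    """
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_restarts):
        weights = [rng.randrange(0, 11) * step for _ in runs]
        score = evaluate(weighted_combsum(runs, weights))
        improved = True
        while improved:
            improved = False
            for i in range(len(weights)):
                for delta in (-step, step):
                    candidate = list(weights)
                    candidate[i] = min(1.0, max(0.0, candidate[i] + delta))
                    candidate_score = evaluate(weighted_combsum(runs, candidate))
                    if candidate_score > score:
                        weights, score, improved = candidate, candidate_score, True
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```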
What do we combine?
- What aspects of the task can we vary?
- Algorithms
- User‐based CF
- Item‐based CF
- Content‐based filtering (profile‐ and post‐centric matching)
- Hybrid filtering (CF with metadata overlap)
- Data representation
- Usage data
- Tags
- Metadata
- Number of runs combined
- Can vary from two to eight
What do we combine?

| Run ID | # runs | Description |
|---|---:|---|
| Fusion A | 2 | Best UBCF and IBCF runs with usage data |
| Fusion B | 2 | Best UBCF and IBCF runs with tagging data |
| Fusion C | 2 | Best CF runs with usage and/or tagging data (A + B) |
| Fusion D | 2 | Best profile-centric and post-centric matching runs |
| Fusion E | 2 | Best UBCF and IBCF runs with metadata similarity |
| Fusion F | 2 | Best metadata-based runs (D + E) |
| Fusion G | 2 | Best folksonomic and best metadata-based run (C + F) |
| Fusion H | 4 | All four best CF runs with usage and/or tagging data (A + B) |
| Fusion I | 4 | All four best metadata-based runs (D + E) |
| Fusion J | 8 | All eight best runs (A + B + D + E) |
Results
| Run ID | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Fusion A | 0.0362 | 0.0065 | 0.1017 | 0.0949 |
| Fusion B | 0.0434 | 0.0105 | 0.1196 | 0.0952 |
| Fusion C | 0.0482 | 0.0115 | 0.1593 | 0.1278 |
| Fusion D | 0.0388 | 0.0038 | 0.1303 | 0.1008 |
| Fusion E | 0.0514 | 0.0051 | 0.1596 | 0.0945 |
| Fusion F | 0.0494 | 0.0056 | 0.1600 | 0.1136 |
| Fusion G | 0.0539 | 0.0109 | 0.1539 | 0.1556 |
| Fusion H | 0.0619 | 0.0092 | 0.1671 | 0.1286 |
| Fusion I | 0.0565 | 0.0065 | 0.1749 | 0.1188 |
| Fusion J | 0.0695 | 0.0090 | 0.1983 | 0.1531 |
Comparison
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage | 0.0277 | 0.0046 | 0.0865 | 0.0757 |
| UBCF + tags | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + usage | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| IBCF + tags | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
| Content-based + profile | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Content-based + post | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
| Best fusion run | 0.0695 | 0.0115 | 0.1983 | 0.1556 |
| % Improvement | +72.9% | +13.9% | +31.3% | +57.6% |
Findings
- Fusion works! But what works best?
- Weighted fusion
- Combining different algorithms
- Combining different data representations
- Combining a higher number of runs
- CombMNZ and CombSUM
- Additional analyses showed that
- Improvements are mostly a precision-enhancing effect
- Due to better ranking of documents
- New question: where is the sweet spot?
- Performance vs. computation
Overall findings
- Using tag overlap in item-based CF works well
- Easy to implement/adapt
- Metadata-based recommendation often better than CF
- Not significantly
- No clear winning algorithm
- Easiest to implement using an existing search engine
- Recommender fusion is promising
- Combine runs that cover different aspects
- Weighted fusion works best
- Combining more (but different) runs works better
Future work
- Large-scale comparison of algorithms
- Online, user-based evaluation of algorithms
- Exploring other recommendation tasks

Questions?
Metadata findings
- What did we test in terms of metadata fields?
- Individual intrinsic fields
- All intrinsic fields combined
- All intrinsic fields + all extrinsic fields combined
- Metadata
- All intrinsic metadata combined works best
- Best fields: TAGS, TITLE, AUTHOR, URL, ABSTRACT
- Extrinsic metadata contributes little