Comparing Recommendation Algorithms for Social Bookmarking

Toine Bogers
Royal School of Library and Information Science
Copenhagen, Denmark
About me
- Ph.D. from Tilburg University
- “Recommender Systems for Social Bookmarking”
- Promotor: Prof. dr. Antal van den Bosch
- Currently @ RSLIS (Copenhagen, DK)
- Research assistant on retrieval fusion project
- Research interests
- Recommender systems
- Social bookmarking
- Expert search
- Information retrieval
Outline
- 1. Introduction
- 2. Collaborative filtering
- 3. Content‐based filtering
- 4. Recommender systems fusion
- 5. Conclusions
Social bookmarking
- Way of storing, organizing, and managing bookmarks of Web pages, scientific articles, books, etc.
- All done online
- Can be made public or kept private
- Allow users to tag (= label) their items
- Many different websites available (e.g., Delicious, CiteULike, BibSonomy)
Social bookmarking
- Different domains
- Web pages
- Scientific articles
- Books
- Strong growth in popularity
- Millions of users, items, and tags
- For example: Delicious
- 140,000+ posts/day on average in 2008 (Keller, 2009)
- 7,000,000+ posts/month in 2008 (Wetzker et al., 2009)
Content overload
- Problems with this growth
- Content overload
- Increasing ambiguity
- How can we deal with this?
- Browsing
- Search
- (Both can become less effective as content increases!)
- A possible solution
- Take a more active role: recommendation
Recommendation tasks

[Figure: overview of recommendation tasks in a folksonomy of USERs, ITEMs, and TAGs, including item recommendation, tag suggestion, personalized search, and finding domain experts or like-minded users]
Item recommendation
- Our focus: item recommendation
- Identify sets of items that are likely to be of interest to a certain user
- Return a ranked list of items
- ‘Find Good Items’ task (Herlocker et al., 2004)
- Based on different information sources
- Transaction patterns (usage data, purchase information)
- Explicit ratings
- Implicit feedback
- Metadata
- Tags
Related work
- Work on social bookmarking mostly focused on
- Improving browsing experience
- clustering, dealing with ambiguity
- Incorporating tags in search algorithms
- Tag recommendation
- Problems with work on item recommendation
- Different data sets
- Different evaluation metrics
- No comparison of algorithms under controlled conditions
- Hardly ever publicly available data sets
- No user-based evaluation
Collecting data
- Four data sets from two different domains
- Web bookmarks
- Delicious
- BibSonomy
- Scientific articles
- CiteULike
- BibSonomy
~78% of users posted only one type of content (bookmarks or scientific articles)
What did we collect?
- Usage data
- User-item-tag triples with timestamps
- Metadata
- Varies with the domain
Scientific articles
- Item‐intrinsic
- TITLE, DESCRIPTION, JOURNAL, AUTHOR, TAGS, URL, etc.
- Item‐extrinsic
- CHAPTER, DAY, EDITION, YEAR, INSTITUTION, etc.
Web bookmarks
- TITLE, DESCRIPTION, TAGS, URL
Filtering
- Why?
- To reduce noise in our data sets
- Common procedure in recommender systems research
- How?
- ≥ 20 items per user
- ≥ 2 users per item (no hapax legomena items)
- No untagged posts
- Compared to related work
- Stricter filtering
- More realistic
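As a rough illustration of how such a filtering pass could be implemented (the post representation and function name below are assumptions for this sketch, not the code used in the study):

```python
from collections import Counter

def filter_posts(posts, min_items_per_user=20, min_users_per_item=2):
    """Iteratively drop untagged posts, hapax items, and sparse users.

    `posts` is assumed to be a list of (user, item, tags) tuples; the
    thresholds mirror the filtering criteria listed above.
    """
    # Drop untagged posts once, up front.
    posts = [(u, i, t) for u, i, t in posts if t]
    while True:
        item_counts = Counter(i for _, i, _ in posts)
        user_counts = Counter(u for u, _, _ in posts)
        kept = [
            (u, i, t) for u, i, t in posts
            if item_counts[i] >= min_users_per_item
            and user_counts[u] >= min_items_per_user
        ]
        if len(kept) == len(posts):  # fixed point: nothing more to remove
            return kept
        posts = kept
```

The loop repeats until a fixed point is reached, because removing rare items can push users back below the 20-item threshold.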
Data sets
|         | Delicious (bookmarks) | BibSonomy (bookmarks) | CiteULike (articles) | BibSonomy (articles) |
|---------|----------------------:|----------------------:|---------------------:|---------------------:|
| # users | 1,243   | 192    | 1,322  | 167    |
| # items | 152,698 | 11,165 | 38,419 | 12,982 |
| # tags  | 42,820  | 13,233 | 28,312 | 5,165  |
| # posts | 238,070 | 29,096 | 84,637 | 29,720 |
Experimental setup
- Backtesting
- Withhold randomly selected items from test users
- Use remaining material for training the recommender system
- Success is measured by how well the recommender predicts the user’s interest in his/her withheld items
- Details
- Overall 90%-10% split on users
- Withhold 10 randomly selected items from each test user
- Parameter optimization
- Used 10-fold cross-validation
- 90-10 splits
- 10 withheld items
- Macro-averaging of evaluation scores
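A minimal sketch of this backtesting split, assuming `user_items` maps each user to the items he/she posted (the function name and fixed seed are illustrative):

```python
import random

def backtest_split(user_items, n_withheld=10, test_fraction=0.1, seed=42):
    """90%-10% split on users; withhold `n_withheld` random items per test user.

    After the filtering above, every user has at least 20 items,
    so sampling 10 items is always possible.
    """
    rng = random.Random(seed)
    users = list(user_items)
    rng.shuffle(users)
    test_users = set(users[:max(1, int(len(users) * test_fraction))])

    training, withheld = {}, {}
    for user, items in user_items.items():
        if user in test_users:
            withheld[user] = set(rng.sample(items, n_withheld))
            training[user] = [i for i in items if i not in withheld[user]]
        else:
            training[user] = list(items)
    return training, withheld
```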
Evaluation
- ‘Find Good Items’ task returns a ranked list
- Need a metric that takes the ranking of items into account
- Precision‐oriented metric
- Mean Average Precision (MAP)
- Average Precision (AP) is average of precision values at each relevant,
retrieved item
- MAP is AP averaged over all users
- “single figure measure of quality across recall levels” (Manning, 2009)
- Tested different metrics
- All precision‐oriented metrics showed the same picture
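For concreteness, AP and MAP could be computed along these lines (a sketch; here AP is normalised by the total number of withheld items, the usual IR convention):

```python
def average_precision(ranking, relevant):
    """AP: average of the precision values at each relevant, retrieved item."""
    relevant = set(relevant)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(per_user_runs):
    """MAP: AP macro-averaged over all test users.

    `per_user_runs` is a list of (ranking, withheld_items) pairs.
    """
    return sum(average_precision(r, w) for r, w in per_user_runs) / len(per_user_runs)
```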
Collaborative filtering
- Question
- How can we use the information in the folksonomy to generate better recommendations?
- Users
- Items
- Tags
- Collaborative filtering (CF)
- Attempts to automate “word-of-mouth” recommendations
- Recommend items based on how like-minded users rated those items
- Similarity based on
- Usage data (usage patterns)
- Tagging data
Collaborative filtering
- Model-based CF
- ‘Eager’ recommendation algorithms
- Train a predictive model of the recommendation task
- Quick to apply to generate recommendations
- Memory-based CF
- ‘Lazy’ recommendation algorithms
- Simply store all patterns in memory
- Defer prediction effort to when the user requests recommendations
Related work
- Model‐based
- Hybrid PLSA‐based approach (Wetzker et al., 2009)
- Tensor decomposition (Symeonidis et al., 2008)
- Memory‐based
- Tag-aware fusion (Tso-Sutter et al., 2008)
- Graph‐based
- FolkRank (Hotho et al., 2006)
- Random walk (Clements et al., 2008)
Algorithms
- User‐based k‐NN algorithm
- Calculate similarity between the active user and all other users
- Determine the top k nearest neighbors
- I.e., the most similar users
- Unseen items from nearest neighbors are scored by the similarity between the neighbor and the active user
- Item-based k-NN algorithm
- Calculate similarity between the active user’s items and all other items
- Determine the top k nearest neighbors
- I.e., the most similar items for each of the active user’s items
- Unseen neighboring items are scored by the similarity between the neighbor and the active user’s item
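Purely as an illustration of the user-based variant (the data structures, function names, and neighbourhood size below are assumptions; the item-based variant is symmetric, using the neighbours of each of the active user's items instead):

```python
import heapq
from collections import defaultdict

def user_based_scores(active_user, profiles, similarity, k=10):
    """Score the active user's unseen items by summed neighbour similarity.

    `profiles` maps each user to a set of items; `similarity` is any function
    of two profiles (e.g. cosine over usage vectors or tag overlap).
    """
    neighbours = heapq.nlargest(
        k,
        ((similarity(profiles[active_user], profiles[u]), u)
         for u in profiles if u != active_user),
        key=lambda pair: pair[0],        # rank neighbours by similarity only
    )
    scores = defaultdict(float)
    for sim, neighbour in neighbours:
        for item in profiles[neighbour] - profiles[active_user]:
            scores[item] += sim          # unseen item inherits neighbour similarity
    return sorted(scores, key=scores.get, reverse=True)
```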
Usage data
- Baseline: CF using usage data
- Profile vectors
- User profiles
- Item profiles
- No explicit ratings available
- Only binary information (1 or 0)
- Or rather: unary!
- Similarity metric
- Cosine similarity
- 10-fold cross-validation to optimize k

[Figure: the user-item matrix UI (users × items)]
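With unary profiles, cosine similarity reduces to set overlap: the dot product is the number of shared items and each norm is the square root of the profile size. A minimal sketch (function name is illustrative):

```python
import math

def unary_cosine(profile_a, profile_b):
    """Cosine similarity between two unary (0/1) profiles stored as sets."""
    if not profile_a or not profile_b:
        return 0.0
    return len(profile_a & profile_b) / math.sqrt(len(profile_a) * len(profile_b))
```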
Results (usage data)
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
Tagging data
- Tags are short topical descriptions of an item (or user)
- Profile vectors
- User tag profiles
- Item tag profiles
- Similarity metrics
- Cosine similarity
- Jaccard overlap
- Dice’s coefficient

[Figure: the user-tag matrix UT (users × tags) and the item-tag matrix IT (items × tags)]
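The two set-based metrics are straightforward to sketch on tag sets (cosine similarity can additionally weight tags by how often they were used); the function names are illustrative:

```python
def jaccard(tags_a, tags_b):
    """Jaccard overlap between two tag sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def dice(tags_a, tags_b):
    """Dice's coefficient between two tag sets."""
    total = len(tags_a) + len(tags_b)
    return 2 * len(tags_a & tags_b) / total if total else 0.0
```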
Results (tagging data)
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| UBCF + tagging data | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + tagging data | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
Findings (tagging data)
- CF with tag overlap
- User‐based CF performs significantly worse
- Item-based CF performs much better
- Often statistically significant improvements
- Except on CiteULike: CF without tags better
- Similarity metric relatively unimportant
- Cosine similarity slightly better
Comparison to related work
- Random walk model (Clements et al., 2008)
- Create transition matrix based on tripartite folksonomy graph
- Similar to FolkRank, but no walks of infinite length
- Walk length n is a parameter
- Tag-aware fusion (Tso-Sutter et al., 2008)
- Fusion of algorithms and data representations
- Usage data and tagging data
- User-based CF: extend the UI matrix with tags as extra items
- Item-based CF: extend the UI matrix with tags as extra users (see the sketch after this list)
- Fuse together the predictions of user-based CF and item-based CF
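A rough sketch of that matrix-extension step, under the simplifying assumption that profiles are stored as sets rather than an explicit matrix (the prefixed tag keys just keep tags from colliding with real item or user ids; this is an illustration, not Tso-Sutter et al.'s code):

```python
def extend_user_profiles_with_tags(user_items, user_tags):
    """User-based variant: add each tag to the user's profile as a pseudo-item."""
    return {
        user: user_items.get(user, set())
              | {("tag", t) for t in user_tags.get(user, set())}
        for user in set(user_items) | set(user_tags)
    }

def extend_item_profiles_with_tags(item_users, item_tags):
    """Item-based variant: add each tag to the item's profile as a pseudo-user."""
    return {
        item: item_users.get(item, set())
              | {("tag", t) for t in item_tags.get(item, set())}
        for item in set(item_users) | set(item_tags)
    }
```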
Comparison to related work
[Figure: the user-item matrix extended with tags, illustrating the user-based and item-based variants of tag-aware fusion over USERs, ITEMs, and TAGs]
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage data | 0.0277 | 0.0046 | 0.0865 | 0.0746 |
| IBCF + usage data | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| UBCF + tagging data | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + tagging data | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
| UBCF + fused data | 0.0303 | 0.0057 | 0.0829 | 0.0739 |
| IBCF + fused data | 0.0468 | 0.0125 | 0.1280 | 0.1212 |
| Tag-aware fusion | 0.0474 | 0.0166 | 0.1297 | 0.1268 |
| Random walk model | 0.0182 | 0.0003 | 0.0608 | 0.0536 |
Metadata-based recommendation
- Question
- How can we use the metadata to generate (better) item recommendations?
- Content-based filtering
- Build representations of the content in a system
- Learn a profile of the user’s interests
- Match content representations against the user’s profile
Reminder: what did we collect?
- Two types of metadata
- Intrinsic metadata, i.e., directly relating to the content
- E.g., <TITLE>, <DESCRIPTION>, <JOURNAL>, <AUTHOR>, ...
- Extrinsic metadata, i.e., administrative information
- E.g., <PAGES>, <MONTH>, <EDITION>, …
Related work
- Common approaches
- Information retrieval
- Machine learning
- Examples
- TF·IDF weighting (Lang, 1995; Whitman & Lawrence, 2002)
- Personal information agents (Balabanovic, 1998; Joachims et al., 1997; Chirita et al., 2006)
- Naive Bayes (Mooney et al., 2000; De Gemmis et al., 2008)
- Linear regression (Alspector et al., 1997)
- Nothing applied to social bookmarking so far!
Profile-centric matching
- Take an IR approach: profile-centric matching
- Build representations of the content in a system
- All metadata assigned to an item → item profile
- Learn a profile of the user’s interests
- Collate all of user’s metadata into a user profile
- Match and rank item profiles to user profiles
- Language modeling with Jelinek‐Mercer smoothing
- Stopword filtering, no stemming
[Figure: similarity matching between the active user’s metadata profile and the training item profiles, producing a ranked list of items]
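A minimal sketch of the retrieval step, assuming query-likelihood scoring where the user profile acts as the query and each item profile as a document; the smoothing weight, tokenisation, and data structures are illustrative assumptions:

```python
import math
from collections import Counter

def jelinek_mercer_score(query_terms, doc_terms, collection_counts,
                         collection_size, lam=0.5):
    """log P(query | document) with Jelinek-Mercer smoothing:
    P(t | d) = (1 - lam) * tf(t, d) / |d|  +  lam * cf(t) / |C|
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms) or 1
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len
        p_coll = collection_counts.get(term, 0) / collection_size
        p = (1 - lam) * p_doc + lam * p_coll
        score += math.log(p) if p > 0 else math.log(1e-12)  # floor unseen terms
    return score
```

Ranking all item profiles by this score for the active user, after stopword filtering, would yield the profile-centric recommendation list.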
Post-centric matching
- Problem
- Big user profile will match nearly anything
- Sacrificing precision for recall
- Different level of granularity: post-centric matching
- Construct metadata representations of each post
- Match each of the user’s posts against all other posts
- Match, rank, and aggregate all retrieved posts
[Figure: each of the active user’s posts is matched against all training posts via similarity matching; the retrieved posts are ranked and aggregated into item scores]
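The aggregation step might look roughly like this sketch, which simply sums the retrieval scores of matching posts per item; the slides do not spell out the actual aggregation function, so `match_posts` and the summation are assumptions:

```python
from collections import defaultdict

def post_centric_scores(user_posts, match_posts):
    """Match each of the active user's posts against all training posts and
    aggregate the retrieved posts' scores into item scores.

    `match_posts(post)` is a hypothetical retrieval call returning
    (item, score) pairs for the best-matching training posts.
    """
    item_scores = defaultdict(float)
    for post in user_posts:
        for item, score in match_posts(post):
            item_scores[item] += score   # simple sum; other aggregations possible
    return sorted(item_scores, key=item_scores.get, reverse=True)
```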
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
- Problem with post‐centric matching: data sparseness
Hybrid filtering
- Similarity between users and items based on metadata
- Plug these similarities into the standard k-NN CF approach!
- User-based CF with metadata-based similarities
- Textual similarity between user profiles
- Item-based CF with metadata-based similarities
- Textual similarity between item profiles
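Conceptually, the hybrid variant only swaps the similarity function used by the k-NN algorithms. A sketch of a bag-of-words cosine over metadata profiles (an illustrative stand-in for the textual similarity actually used):

```python
import math

def metadata_cosine(profile_a, profile_b):
    """Cosine similarity between two bag-of-words metadata profiles (term -> count)."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[t] * profile_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

The k-NN machinery stays the same; only the neighbour similarity now comes from textual profiles instead of usage or tag overlap.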
Results
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
Results (comparison)

| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Profile-centric matching | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Post-centric matching | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
| Best CF run | 0.0370 | 0.0101 | 0.1100 | 0.0887 |
| Tag-aware fusion | 0.0474 | 0.0166 | 0.1297 | 0.1268 |
Findings
- Content‐based filtering
- Profile-level matching better than post-level
- Hybrid filtering
- Item-based CF with metadata similarities works best
- No clear winner over all data sets
Data fusion
- Question
- Can we improve performance by combining different recommendation algorithms?
- Tentative answer: yes!
- Data fusion used in different fields
- Machine learning
- Information retrieval
- Collection fusion
- Results fusion
Combination taxonomy
- Burke (2002) defines seven different techniques
- 1. Mixed (all shown together, interleaved)
- 2. Switching (pick one, depending on the situation)
- 3. Feature combination (combine sources for a single algorithm)
- 4. Cascade (output of algorithm 1 is input of algorithm 2)
- 5. Feature augmentation (output of alg. 1 is an input feature for alg. 2)
- 6. Meta-level (model of alg. 1 is input for alg. 2)
- 7. Weighted combination (output combination of ≥ 2 algorithms)
- Same as results fusion in IR
Why does data fusion work?
- Problem
- Recommendation is a complex task
- A single algorithm can never capture it completely
- Solution
- Combine different algorithms and data representations
- Each highlights a different aspect of the task
- Overlap between the individual runs is evidence of relevance
How do we combine?
- Score‐based fusion
- Different algorithms have different score distributions
- Score normalization into the [0, 1] range
- Six standard combination techniques from IR
- CombMAX (max score per item)
- CombMIN (min score per item)
- CombMED (median score per item)
- CombSUM (sum of scores per item)
- CombMNZ (sum of scores per item × no. of retrieving runs)
- CombANZ (sum of scores per item ÷ no. of retrieving runs)
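The six techniques are easy to state in code. The sketch below assumes min-max normalisation into [0, 1] (the exact normalisation scheme is not specified on the slide) and runs represented as item-to-score dictionaries:

```python
import statistics

def normalise(run):
    """Min-max normalise one run's scores into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0
    return {item: (score - lo) / span for item, score in run.items()}

def comb_fuse(runs, method="CombSUM"):
    """Fuse several {item: score} runs with one of the standard IR rules."""
    runs = [normalise(run) for run in runs]
    fused = {}
    for item in set().union(*runs):
        scores = [run[item] for run in runs if item in run]
        n = len(scores)                      # number of runs retrieving the item
        fused[item] = {
            "CombMAX": max(scores),
            "CombMIN": min(scores),
            "CombMED": statistics.median(scores),
            "CombSUM": sum(scores),
            "CombMNZ": sum(scores) * n,
            "CombANZ": sum(scores) / n,
        }[method]
    return sorted(fused, key=fused.get, reverse=True)
```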
How do we combine?
- Unweighted vs. weighted combination
- “Not all recommendation algorithms are created equal!”
- Linear weighting of individual runs
- Weight optimization using random-restart hill climbing
- Steps of 0.1
- 100 iterations
- Using 10-fold cross-validation
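A sketch of the weighted combination and the weight search; mapping “100 iterations” to 100 random restarts, the step size of 0.1, and the `evaluate` callback (e.g. MAP on a held-out fold) are assumptions of this illustration:

```python
import random

def weighted_combsum(runs, weights):
    """Linear, weighted CombSUM over normalised {item: score} runs."""
    fused = {}
    for run, weight in zip(runs, weights):
        for item, score in run.items():
            fused[item] = fused.get(item, 0.0) + weight * score
    return sorted(fused, key=fused.get, reverse=True)

def hill_climb_weights(runs, evaluate, n_restarts=100, step=0.1, seed=42):
    """Random-restart hill climbing over run weights in steps of 0.1.

    `evaluate(ranking)` is assumed to return the evaluation score
    (e.g. MAP) of a fused ranking on a held-out fold.
    """
    rng = random.Random(seed)
    best_weights, best_score = None, float("-inf")
    for _ in range(n_restarts):
        weights = [rng.randrange(0, 11) * step for _ in runs]
        score = evaluate(weighted_combsum(runs, weights))
        improved = True
        while improved:
            improved = False
            for i in range(len(weights)):
                for delta in (-step, step):
                    candidate = list(weights)
                    candidate[i] = min(1.0, max(0.0, candidate[i] + delta))
                    candidate_score = evaluate(weighted_combsum(runs, candidate))
                    if candidate_score > score:
                        weights, score, improved = candidate, candidate_score, True
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```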
What do we combine?
- What aspects of the task can we vary?
- Algorithms
- User‐based CF
- Item‐based CF
- Content‐based filtering (profile‐ and post‐centric matching)
- Hybrid filtering (CF with metadata overlap)
- Data representation
- Usage data
- Tags
- Metadata
- Number of runs combined
- Can vary from two to eight
What do we combine?

| Run ID | # runs | Description |
|---|---:|---|
| Fusion A | 2 | Best UBCF and IBCF runs with usage data |
| Fusion B | 2 | Best UBCF and IBCF runs with tagging data |
| Fusion C | 2 | Best CF runs with usage and/or tagging data (A + B) |
| Fusion D | 2 | Best profile-centric and post-centric matching runs |
| Fusion E | 2 | Best UBCF and IBCF runs with metadata similarity |
| Fusion F | 2 | Best metadata-based runs (D + E) |
| Fusion G | 2 | Best folksonomic and best metadata-based run (C + F) |
| Fusion H | 4 | All four best CF runs with usage and/or tagging data (A + B) |
| Fusion I | 4 | All four best metadata-based runs (D + E) |
| Fusion J | 8 | All eight best runs (A + B + D + E) |
Results
| Run ID | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| Fusion A | 0.0362 | 0.0065 | 0.1017 | 0.0949 |
| Fusion B | 0.0434 | 0.0105 | 0.1196 | 0.0952 |
| Fusion C | 0.0482 | 0.0115 | 0.1593 | 0.1278 |
| Fusion D | 0.0388 | 0.0038 | 0.1303 | 0.1008 |
| Fusion E | 0.0514 | 0.0051 | 0.1596 | 0.0945 |
| Fusion F | 0.0494 | 0.0056 | 0.1600 | 0.1136 |
| Fusion G | 0.0539 | 0.0109 | 0.1539 | 0.1556 |
| Fusion H | 0.0619 | 0.0092 | 0.1671 | 0.1286 |
| Fusion I | 0.0565 | 0.0065 | 0.1749 | 0.1188 |
| Fusion J | 0.0695 | 0.0090 | 0.1983 | 0.1531 |
Comparison
| Run | BibSonomy (bookmarks) | Delicious (bookmarks) | BibSonomy (articles) | CiteULike (articles) |
|---|---:|---:|---:|---:|
| UBCF + usage | 0.0277 | 0.0046 | 0.0865 | 0.0757 |
| UBCF + tags | 0.0102 | 0.0017 | 0.0459 | 0.0449 |
| IBCF + usage | 0.0244 | 0.0027 | 0.0737 | 0.0887 |
| IBCF + tags | 0.0370 | 0.0101 | 0.1100 | 0.0814 |
| Content-based + profile | 0.0402 | 0.0014 | 0.1279 | 0.0987 |
| Content-based + post | 0.0259 | 0.0036 | 0.1190 | 0.0455 |
| Hybrid (UBCF + metadata) | 0.0218 | 0.0039 | 0.0410 | 0.0608 |
| Hybrid (IBCF + metadata) | 0.0399 | 0.0017 | 0.1510 | 0.0746 |
| Best fusion run | 0.0695 | 0.0115 | 0.1983 | 0.1556 |
| % Improvement | +72.9% | +13.9% | +31.3% | +57.6% |
Findings
- Fusion works! But what works best?
- Weighted fusion
- Combining different algorithms
- Combining different data representations
- Combining a higher number of runs
- CombMNZ and CombSUM
- Additional analyses showed that
- Improvements are mostly a precision-enhancing effect
- Due to better ranking of documents
- New question: where is the sweet spot?
- Performance vs. computation
Overall findings
- Using tag overlap in item-based CF works well
- Easy to implement/adapt
- Metadata-based recommendation often better than CF
- Not significantly
- No clear winning algorithm
- Easiest to implement using an existing search engine
- Recommender fusion is promising
- Combine runs that cover different aspects
- Weighted fusion works best
- Combining more (but different) runs works better
Future work
- Large-scale comparison of algorithms
- Online, user-based evaluation of algorithms
- Exploring other recommendation tasks

Questions?
Metadata findings
- What did we test in terms of metadata fields?
- Individual intrinsic fields
- All intrinsic fields combined
- All intrinsic fields + all extrinsic fields combined
- Metadata
- All intrinsic metadata combined works best
- Best fields: TAGS, TITLE, AUTHOR, URL, ABSTRACT
- Extrinsic metadata contributes little