SLIDE 1

Similarity Ranking in Large-Scale Bipartite Graphs

Alessandro Epasto

Brown University, 20th March 2014

SLIDE 2

Joint work with J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni [WWW 2014].
SLIDE 3

AdWords

(Figure: a search results page with ad placements.)
SLIDE 4

Our Goal

  • Analyze AdWords data to automatically identify, for each advertiser, its main competitors, and to suggest relevant queries to each advertiser.

  • Goals:
  • Useful business information.
  • Improve advertising.
  • More relevant performance benchmarks.
SLIDE 5

The Data

Large advertisers (e.g., Amazon, Ask.com) compete in several market segments with very different advertisers.

Query Information:

  • Nike store New York — Market Segment: Retailer; Geo: NY, USA; Stats: 10 clicks
  • Soccer shoes — Market Segment: Apparel; Geo: London, UK; Stats: 4 clicks
  • Soccer ball — Market Segment: Equipment; Geo: San Francisco, USA; Stats: 5 clicks

…. millions of other queries ….
SLIDE 6

Modeling the Data as a Bipartite Graph

Millions of advertisers, billions of queries, hundreds of labels.
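A toy Python sketch of this data model (the advertiser names and the dictionary layout are invented for illustration; this is not the paper's data format):

```python
# Toy illustration of the bipartite data model: advertisers on one side,
# queries on the other; each query carries a label (market segment) and
# edges are weighted, e.g., by click counts. Names and layout are invented.
queries = {
    "nike store new york": {"label": "Retailer",  "geo": "NY, USA"},
    "soccer shoes":        {"label": "Apparel",   "geo": "London, UK"},
    "soccer ball":         {"label": "Equipment", "geo": "San Francisco, USA"},
}

# Weighted edges advertiser -> query (e.g., number of clicks).
edges = {
    "bigshoestore.example": {"nike store new york": 10, "soccer shoes": 4},
    "soccer-gear.example":  {"soccer shoes": 4, "soccer ball": 5},
}
```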

SLIDE 7

Other Applications

  • The general approach is applicable to several other contexts:
  • Users, Movies, Categories: find similar users and suggest movies.
  • Authors, Papers, Conferences: find related authors and suggest papers to read.
  • Generally these bipartite graphs are lopsided: we want algorithms whose complexity depends on the smaller side.
SLIDE 8

Semi-Formal Problem Definition

(Figure, built up over slides 8-11: a bipartite graph with advertisers on one side and queries on the other; each query carries a label; a seed advertiser A is highlighted.)

Goal: Find the nodes most “similar” to A.
SLIDE 12

How to Define Similarity?

  • We address the computation of several node similarity measures:
  • Neighborhood based: common neighbors, Jaccard coefficient, Adamic-Adar (see the sketch after this list).
  • Path based: Katz.
  • Random walk based: Personalized PageRank.
  • What is the accuracy?
  • Can they scale to huge graphs?
  • Can they be computed in real time?
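For reference, a minimal Python sketch of the neighborhood-based measures (the `{advertiser: set of queries}` input format and the function names are assumptions for illustration, not the paper's code):

```python
# Neighborhood-based similarities between two advertisers x and y, given
# nbrs = {advertiser: set of queries it advertises on}. Illustrative sketch.
import math

def common_neighbors(nbrs, x, y):
    return len(nbrs[x] & nbrs[y])

def jaccard(nbrs, x, y):
    union = nbrs[x] | nbrs[y]
    return len(nbrs[x] & nbrs[y]) / len(union) if union else 0.0

def adamic_adar(nbrs, x, y, query_degree):
    # query_degree(z): number of advertisers on query z; rarer queries weigh more.
    return sum(1.0 / math.log(query_degree(z))
               for z in nbrs[x] & nbrs[y] if query_degree(z) > 1)
```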
SLIDE 13

Our Contribution

  • Reduce and Aggregate: a general approach to induce real-time similarity rankings in multi-categorical bipartite graphs, which we apply to several similarity measures.
  • Theoretical guarantees for the precision of the algorithms.
  • Experimental evaluation with real-world data.
SLIDE 14

Personalized PageRank

  • For a node v (the seed) and a restart probability alpha, the walk jumps back to v with probability alpha at every step, and otherwise follows a random outgoing edge (sketched below).
  • The stationary distribution of this walk assigns a similarity score to each node in the graph w.r.t. node v.
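A minimal sketch of this random walk computed by power iteration (the adjacency-dict format, function name, and tolerance are illustrative assumptions, not the paper's implementation):

```python
# Personalized PageRank by power iteration: p <- alpha * e_seed + (1 - alpha) * p P,
# on a weighted graph given as {node: {neighbor: weight}}. Illustrative sketch.
def personalized_pagerank(graph, seed, alpha=0.15, iters=200, tol=1e-12):
    nodes = list(graph)
    ppr = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            if not graph[u]:
                nxt[seed] += ppr[u]  # dangling node: send its mass back to the seed
                continue
            out = sum(graph[u].values())
            for v, w in graph[u].items():
                # With probability (1 - alpha), follow an edge proportionally to its weight.
                nxt[v] += (1.0 - alpha) * ppr[u] * w / out
        nxt[seed] += alpha  # restart: all jump mass returns to the seed
        done = sum(abs(nxt[u] - ppr[u]) for u in nodes) < tol
        ppr = nxt
        if done:
            break
    return ppr
```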
SLIDE 15

Personalized PageRank

  • Extensive algorithmic literature.
  • Very good accuracy in our experimental evaluation compared to other similarities (Jaccard, intersection, etc.).
  • Efficient MapReduce algorithms scale to large graphs (hundreds of millions of nodes).

However…
SLIDE 16

Personalized PageRank

  • Our graphs are too big (billions of nodes) even for large-scale systems.
  • MapReduce is not real-time.
  • We cannot pre-compute the rankings for each subset of labels.
SLIDE 17

Reduce and Aggregate

  • Reduce: Given the bipartite graph and a category, construct a graph on the A-side nodes only that preserves the ranking on the entire graph.
  • Aggregate: Given a node v in A and the reduced graphs of the categories of interest, determine the ranking for v.
SLIDE 18

In practice

  • First stage: large-scale (but feasible) MapReduce pre-computation of the reduced graph of each individual category.
  • Second stage: fast real-time aggregation algorithm.
SLIDE 19

Reduce for Personalized PageRank

  • Markov chain state aggregation theory (Simon and Ando, '61; Meyer, '89; etc.).
  • 750x reduction in the number of nodes while correctly preserving the PPR distribution on the entire graph.

(Figure: the bipartite graph over Side A and Side B is collapsed onto a graph over Side A only.)
SLIDE 20

Stochastic Complementation

  • Partition the states into categories $C_1, \dots, C_k$ and write the transition matrix $P$ in block form:

$$P = \begin{pmatrix} P_{11} & \cdots & P_{1i} & \cdots & P_{1k} \\ \vdots & & \vdots & & \vdots \\ P_{i1} & \cdots & P_{ii} & \cdots & P_{ik} \\ \vdots & & \vdots & & \vdots \\ P_{k1} & \cdots & P_{ki} & \cdots & P_{kk} \end{pmatrix}$$

  • The stochastic complement of $C_i$ is the $|C_i| \times |C_i|$ matrix

$$S_i = P_{ii} + P_{i\star}\,(I - P_i)^{-1}\,P_{\star i},$$

where $P_i$ is $P$ with the $i$-th block row and column removed, $P_{i\star}$ is the $i$-th block row without $P_{ii}$, and $P_{\star i}$ is the $i$-th block column without $P_{ii}$.
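For intuition, a small numpy sketch of the stochastic complement of one block of a toy transition matrix (the helper name, block choice, and example matrix are assumptions, not from the paper):

```python
# Stochastic complement S_i = P_ii + P_i* (I - P_i)^(-1) P_*i for one block.
import numpy as np

def stochastic_complement(P, block):
    """Return the stochastic complement of the given block of indices."""
    idx = np.asarray(block)
    rest = np.setdiff1d(np.arange(P.shape[0]), idx)
    P_ii = P[np.ix_(idx, idx)]      # transitions inside the block
    P_is = P[np.ix_(idx, rest)]     # block -> rest
    P_si = P[np.ix_(rest, idx)]     # rest -> block
    P_ss = P[np.ix_(rest, rest)]    # transitions inside the rest
    I = np.eye(len(rest))
    return P_ii + P_is @ np.linalg.solve(I - P_ss, P_si)

# Example: a small 4-state chain split into blocks {0, 1} and {2, 3}.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.4, 0.0, 0.3, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.1, 0.4, 0.5, 0.0]])
S = stochastic_complement(P, [0, 1])
print(S.sum(axis=1))  # each row of a stochastic complement sums to 1
```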

SLIDE 25

Stochastic Complementation

Theorem [Meyer '89]: For every irreducible aperiodic Markov chain,

$$\pi_i = t_i\, s_i,$$

where $\pi_i$ is the stationary distribution restricted to the nodes in $C_i$, $s_i$ is the stationary distribution of the stochastic complement $S_i$, and $t_i$ is the total stationary probability of $C_i$ (the $i$-th entry of the stationary distribution of the coupling matrix between the blocks).
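A quick numerical sanity check of the theorem on the same toy chain, reusing the illustrative stochastic_complement helper from the previous sketch (the example is an assumption, not from the paper):

```python
# Meyer '89: pi restricted to block C_i equals t_i * s_i, where t_i is the total
# stationary mass of C_i and s_i is the stationary distribution of S_i.
import numpy as np

def stationary(M):
    # Left Perron eigenvector of a stochastic matrix, normalized to sum to 1.
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.4, 0.0, 0.3, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.1, 0.4, 0.5, 0.0]])
pi = stationary(P)
block = [0, 1]
s_i = stationary(stochastic_complement(P, block))  # helper from the earlier sketch
t_i = pi[block].sum()                              # total mass of block C_i
print(np.allclose(pi[block], t_i * s_i))           # expected: True
```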

SLIDE 26

Stochastic Complementation

  • Computing the stochastic complements is infeasible in general for large matrices (it requires a matrix inversion).
  • In our case we can exploit the properties of random walks on bipartite graphs to invert the matrix analytically.
SLIDE 27

Reduce for PPR

(Figure: two Side-A nodes x and y connected to a Side-B node z, with edge weights w(x, z) and w(y, z).)
SLIDE 28

Reduce for PPR

(Figure: the same x, y, z; the reduced graph connects x and y directly with weight w(x, y), defined as)

$$w(x, y) \;=\; \sum_{z \in N(x) \cup N(y)} \frac{w(x, z)\, w(y, z)}{\sum_{h \in N(z)} w(z, h)}$$
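A minimal Python sketch of this Reduce step, transcribing the formula above directly (the `{advertiser: {query: weight}}` input format and function name are assumptions; the paper's actual implementation is a MapReduce computation):

```python
# Collapse a weighted bipartite graph onto its A side: for each B-side node z,
# every pair of its A-side neighbors (x, y) gets weight w(x,z) * w(y,z) / deg_w(z).
from collections import defaultdict

def reduce_to_side_a(edges):
    """edges: {a_node: {b_node: weight}}; returns {(x, y): w(x, y)} on the A side."""
    b_neighbors = defaultdict(dict)
    for x, nbrs in edges.items():
        for z, w in nbrs.items():
            b_neighbors[z][x] = w
    reduced = defaultdict(float)
    for z, a_nbrs in b_neighbors.items():
        deg_z = sum(a_nbrs.values())  # weighted degree of the B-side node z
        for x, wxz in a_nbrs.items():
            for y, wyz in a_nbrs.items():
                # One walk x -> z -> y in the bipartite graph becomes one edge x -> y.
                reduced[(x, y)] += wxz * wyz / deg_z
    return dict(reduced)
```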

SLIDE 29

Reduce for PPR

One step in the reduced graph is equivalent to two steps in the bipartite graph.
SLIDE 30

Properties of the Reduced Graph

Lemma 1: $\mathrm{PPR}(G, \alpha, a)[A] = \frac{1}{2-\alpha}\,\mathrm{PPR}(\hat{G},\, 2\alpha - \alpha^2,\, a)$

Proof sketch:

  • Every path between nodes in A has even length.
  • $(1-\alpha)^2$ is the probability of not jumping for two consecutive steps, which gives the restart probability $2\alpha - \alpha^2$ in the reduced graph.
  • The probability of being on the A side at stationarity does not depend on the graph.
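A small numerical check of Lemma 1 on an invented toy bipartite graph, reusing the illustrative personalized_pagerank and reduce_to_side_a sketches from earlier slides:

```python
# Check: PPR on G restricted to the A side should equal
# 1/(2 - alpha) * PPR on the reduced graph G_hat with restart 2*alpha - alpha^2.
from collections import defaultdict

alpha = 0.15
bipartite = {"a1": {"q1": 1.0, "q2": 2.0}, "a2": {"q2": 1.0, "q3": 1.0}}

# Undirected view of the bipartite graph for the full random walk.
full = defaultdict(dict)
for a, qs in bipartite.items():
    for q, w in qs.items():
        full[a][q] = w
        full[q][a] = w

# Reduced A-side graph G_hat (includes self-loops, as the formula prescribes).
g_hat = defaultdict(dict)
for (x, y), w in reduce_to_side_a(bipartite).items():
    g_hat[x][y] = w

ppr_full = personalized_pagerank(full, "a1", alpha)
ppr_hat = personalized_pagerank(g_hat, "a1", 2 * alpha - alpha ** 2)
for node in sorted(g_hat):
    # The two columns should agree up to the power-iteration tolerance.
    print(node, round(ppr_full[node], 6), round(ppr_hat[node] / (2 - alpha), 6))
```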

SLIDE 31

Properties of the Reduced Graph

Lemma 2: $\mathrm{PPR}(G, \alpha, a)[B] = \frac{1-\alpha}{2-\alpha} \sum_{b \in N(a)} w(a, b)\,\mathrm{PPR}(\hat{G}_B,\, 2\alpha - \alpha^2,\, b)$

Similarly, we can reduce the process to a graph with B-side nodes only. The stationary distribution of either side uniquely determines that of the other side.
SLIDE 32

Koury et al. Aggregation-Disaggregation Algorithm

Step 1: Partition the Markov chain into disjoint subsets (here, A and B).
SLIDE 33

Koury et al. Aggregation-Disaggregation Algorithm

Step 2: Approximate the stationary distribution on each subset independently ($\pi_A$ and $\pi_B$).
SLIDE 34

Koury et al. Aggregation-Disaggregation Algorithm

Step 3: Compute the k × k approximated transition matrix T between the subsets (from the blocks $P_{AA}$, $P_{AB}$, $P_{BA}$, $P_{BB}$).
SLIDE 35

Koury et al. Aggregation-Disaggregation Algorithm

Step 4: Compute the stationary distribution ($t_A$, $t_B$) of T.
SLIDE 36

Koury et al. Aggregation-Disaggregation Algorithm

Step 5: Based on the stationary distribution of T, improve the estimates $\pi'_A$ and $\pi'_B$ of $\pi_A$ and $\pi_B$. Repeat until convergence (see the sketch below).
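A compact numpy sketch of this aggregation/disaggregation loop for a two-block partition (the toy matrix, block choice, and the plain power-step refinement are simplifying assumptions; the actual Koury et al. algorithm and the paper's variant differ in details):

```python
# Aggregation/disaggregation: build the coupling matrix T from the current
# guess, solve for its stationary distribution t, rescale each block to mass
# t_i, refine with one power step, and repeat.
import numpy as np

def aggregation_disaggregation(P, blocks, iters=50):
    n = P.shape[0]
    k = len(blocks)
    pi = np.full(n, 1.0 / n)                 # initial guess (Step 2)
    for _ in range(iters):
        T = np.zeros((k, k))                 # Step 3: k x k coupling matrix
        for i, Bi in enumerate(blocks):
            wi = pi[Bi] / pi[Bi].sum()       # conditional distribution inside block i
            for j, Bj in enumerate(blocks):
                T[i, j] = wi @ P[np.ix_(Bi, Bj)].sum(axis=1)
        # Step 4: stationary distribution t of T (left Perron eigenvector).
        vals, vecs = np.linalg.eig(T.T)
        t = np.real(vecs[:, np.argmax(np.real(vals))])
        t = t / t.sum()
        # Step 5: disaggregate (rescale each block to mass t_i) and refine.
        for i, Bi in enumerate(blocks):
            pi[Bi] = t[i] * pi[Bi] / pi[Bi].sum()
        pi = pi @ P
    return pi

# Toy 4-state chain split into blocks A = {0, 1} and B = {2, 3}.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.4, 0.0, 0.3, 0.3],
              [0.2, 0.3, 0.0, 0.5],
              [0.1, 0.4, 0.5, 0.0]])
print(aggregation_disaggregation(P, [np.array([0, 1]), np.array([2, 3])]))
```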

SLIDE 37

Aggregation in PPR

Precompute the stationary distributions of the two subsets individually ($\pi_A$ and $\pi_B$).
SLIDE 39

Aggregation in PPR

The two subsets are not disjoint!

SLIDE 40

Reduction to the Query Side

This is the larger side of the graph.
SLIDE 42

Our Approach

  • We exploit the bijective relationship between the stationary distributions of the two sides.
  • The algorithm is based only on the reduced graphs with advertiser-side nodes.
  • The aggregation algorithm is scalable and converges to the correct distribution.
SLIDE 43

Experimental Evaluation

  • We experimented with publicly available and proprietary datasets:
  • Query-Ads graph from Google AdWords: > 1.5 billion nodes, > 5 billion edges.
  • DBLP Author-Paper and Patent Inventor-Invention graphs.
  • Ground-truth clusters of competitors in Google AdWords.
SLIDE 44

Patent Graph

(Figure: precision vs. recall on the Patent graph for Intersection, Jaccard, Adamic-Adar, Katz, and PPR.)
SLIDE 45

Google AdWords

(Figure: precision vs. recall on the Google AdWords graph.)
SLIDE 46

Convergence after One Iteration

SLIDE 47

Convergence

(Figure: approximation error (1 − cosine similarity) vs. number of iterations for the DBLP and Patent graphs; the error axis spans $10^{-6}$ to $10^{-3}$ over 2 to 20 iterations.)
SLIDE 48

Conclusions and Future Work

  • Good accuracy and fast convergence.
  • The framework can be applied to other problems and similarity measures.
  • Future work: the relevant case where categories are not disjoint.
SLIDE 49

Thank you for your attention