Growing a Graph Matching from a Handful of Seeds
Ehsan KAZEMI¹, S. Hamed HASSANI², and Matthias GROSSGLAUSER¹
¹ School of Computer and Communication Sciences, EPFL; ² Department of Computer Science, ETHZ
September 1, 2015
Example 1: network de-anonymization
x@epfl.ch y@epfl.ch z@epfl.ch
Anonymized e-mail network
Hamed@epfl.ch Matthias@epfl.ch Ehsan@epfl.ch
LinkedIn connections
Example 2: protein-protein interaction network alignment
Q07890 P55957 P04637 Q92934 P06436 P62805 P00742 Q8WUU5 Q9Y365 O60271
Human network
P46108 P01127 P58391 P62806 O88947 Q920S3 Q9JMD3 Q58A65
Mouse network
Graph matching (also known as network reconciliation or network alignment) is studied in many fields:
Network analysis: matching networks in similar domains for friend suggestion and personalized advertisements
Bioinformatics: alignment of protein-protein interaction networks
Document and image processing: OCR and handwriting recognition
Biometric identification: face authentication and recognition
Image databases: matching graph segments of two scenes
Matching graph segments of scenes [Lazebnik et al., 2006]
Goal: find the unknown matching (bijection) between the nodes in the intersection of the two graphs G1(V1, E1) and G2(V2, E2), where the presence of edges between the same nodes in the two graphs is correlated
Questions:
When is it possible to align?
How to align? Graph matching algorithms
Is it possible to use only the graph structures to establish the true matching between the nodes?
Algorithm: percolation graph matching [Yartseva and Grossglauser, 2013; Chiasserini et al., 2014; Korula and Lattanzi, 2014]
Model: a random bigraph generator [Pedarsani and Grossglauser, 2011; Kazemi et al., 2015]
Performance guarantee: theory of bootstrap percolation over random graphs [Janson et al., 2010]
Start from an initial candidate set of seed pairs
Every unmatched pair with at least r neighbouring seed pairs is matched and becomes a new seed
[Figure: size of the final matching vs. number of initial seeds]
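The percolation rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `percolation_graph_matching`, the adjacency-dictionary input format and the order in which pairs are processed are all assumptions made for the example.

```python
from collections import defaultdict


def percolation_graph_matching(adj1, adj2, seeds, r=2):
    """Match nodes of two graphs by percolating from seed pairs.

    adj1, adj2: dict mapping node -> set of neighbours (undirected graphs).
    seeds: iterable of (i, j) pairs assumed to be correct.
    r: number of marks a pair needs before it is matched.
    """
    matched = {}              # node of graph 1 -> node of graph 2
    used2 = set()             # nodes of graph 2 that are already matched
    marks = defaultdict(int)  # marks collected by each candidate pair
    queue = []                # matched pairs that have not spread marks yet

    for i, j in seeds:
        if i not in matched and j not in used2:
            matched[i] = j
            used2.add(j)
            queue.append((i, j))

    while queue:
        i, j = queue.pop()
        # The neighbouring pairs of [i, j] are all [u, v] with u ~ i and v ~ j.
        for u in adj1[i]:
            if u in matched:
                continue
            for v in adj2[j]:
                if v in used2:
                    continue
                marks[(u, v)] += 1
                if marks[(u, v)] >= r:
                    matched[u] = v
                    used2.add(v)
                    queue.append((u, v))
                    break  # u is matched now; stop scanning its candidates
    return matched
```

With two seeds on a small graph the process reaches every node, whereas a single seed with r = 2 never produces a pair with two marks and the matching stays at the seed.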
Bi(G; t, s) is a random bigraph model that generates two correlated graphs from a generator graph:
G(V, E) → node sampling → edge sampling → G1(V1, E1), G2(V2, E2)
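A sampler for this model might look as follows. The sketch assumes that each node is kept independently with probability t and each surviving edge with probability s, separately for the two copies; this reading is consistent with the overlap graph G(nt², ps²) used later in the deck, but the function name `sample_bigraph` and its signature are hypothetical.

```python
import random


def sample_bigraph(nodes, edges, t, s, rng=None):
    """Draw two correlated graphs from a generator graph G(V, E).

    Assumption: every node is kept with probability t and every edge between
    two kept nodes with probability s, independently for the two copies.
    Returns two (node_set, edge_set) tuples.
    """
    if rng is None:
        rng = random.Random()
    graphs = []
    for _ in range(2):
        kept_nodes = {v for v in nodes if rng.random() < t}
        kept_edges = {
            (u, v)
            for (u, v) in edges
            if u in kept_nodes and v in kept_nodes and rng.random() < s
        }
        graphs.append((kept_nodes, kept_edges))
    return graphs[0], graphs[1]
```

Passing a seeded `random.Random` instance makes a draw reproducible; with t = s = 1 both outputs are copies of the generator graph.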
Marks are spread over the tensor product of the two graphs:
Green nodes are correct pairs
Red nodes are wrong pairs
Green nodes are more connected
Correct pairs ($n$ nodes): (u1, u1) (u2, u2) (u3, u3) (u4, u4)
Wrong pairs ($n^2 - n$ nodes): (u1, u2) (u1, u3) (u1, u4) (u2, u1) (u2, u3) (u2, u4) (u3, u1) (u3, u2) (u3, u4) (u4, u1) (u4, u2) (u4, u3)
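To see why correct pairs end up better connected, one can count the marks each pair receives when a set of pairs spreads marks over the tensor product. The helper below is an illustrative sketch; `spread_marks` and the toy path graph are assumptions, not part of the original slides.

```python
from collections import Counter


def spread_marks(adj1, adj2, pairs):
    """Return a Counter mapping each pair (u, v) to the marks it receives.

    Every pair (i, j) in `pairs` gives one mark to each neighbouring pair,
    i.e. to every (u, v) with u adjacent to i in graph 1 and v adjacent to j
    in graph 2.
    """
    marks = Counter()
    for i, j in pairs:
        for u in adj1[i]:
            for v in adj2[j]:
                marks[(u, v)] += 1
    return marks


# Toy example: both graphs are the same path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
marks = spread_marks(path, path, [(k, k) for k in path])
correct = {pair: m for pair, m in marks.items() if pair[0] == pair[1]}
wrong = {pair: m for pair, m in marks.items() if pair[0] != pair[1]}
```

In this toy case every correct pair collects as many marks as its node has neighbours, while a wrong pair collects only one mark per common neighbour of its two nodes.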
Supercritical regime: percolates to the whole network
[Diagram labels: Seed set, PGM, Matched set]
Subcritical regime: dies young
[Diagram labels: Seed set, PGM, Matched set]
State-of-the-art PGM algorithms need many seeds: with even a moderate number of seeds, percolation gets stuck in the early steps
Finding many seeds is difficult and expensive
Observation: PGM is robust to noise
[Figure: the pair graph with $n$ correct pairs and $n^2 - n$ wrong pairs; a noisy seed set (u1, u2), (u1, u4), (u2, u4), (u3, u4), (u1, u1), (u3, u3) still leads to the correct pairs (u1, u1), (u2, u2), (u3, u3), (u4, u4)]
Adding many wrong pairs to the initial candidate set has a negligible effect on the performance of NoisySeeds
[Diagram labels: Seed set, Expand, Expanded noisy seed set, NoisySeeds, Matched set]
Theorem (Performance guarantee over Bi(G(n, p); t, s))
For Bi(G(n, p); t, s) with fixed s and t, assume $n^{-1} \ll p \le n^{-\frac{5}{6} - \epsilon}$. Provided a seed set of
$$a_{t,s,r} = \left(1 - \frac{1}{r}\right) \left( \frac{(r-1)!}{n t^2 (p s^2)^r} \right)^{\frac{1}{r-1}}$$
correct pairs, with high probability NoisySeeds percolates and outputs $n t^2 \pm o(n)$ correct pairs.
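The seed threshold in the theorem, $a_{t,s,r} = (1 - 1/r)\,\big((r-1)! / (n t^2 (p s^2)^r)\big)^{1/(r-1)}$, is easy to evaluate numerically. The helper below is a small sketch; the name `seed_threshold` is hypothetical, and the choice r = 2 in the example is an assumption (it matches the "score at least 2" rule in the ExpandWhenStuck pseudocode).

```python
import math


def seed_threshold(n, p, t, s, r):
    """Evaluate a_{t,s,r} = (1 - 1/r) * ((r-1)! / (n t^2 (p s^2)^r))^(1/(r-1))."""
    if r < 2:
        raise ValueError("r must be at least 2")
    base = math.factorial(r - 1) / (n * t**2 * (p * s**2) ** r)
    return (1 - 1 / r) * base ** (1 / (r - 1))


# Example with the parameters used in the experiments: n = 10^6, p = 20/n, t^2 = 1.
n = 10**6
thresholds = {
    s2: math.ceil(seed_threshold(n, 20 / n, 1.0, math.sqrt(s2), r=2))
    for s2 in (0.81, 0.64, 0.49)
}
```

Under these assumptions the rounded-up values are 1906, 3052 and 5207, which agree with the seed counts quoted for PercolateMatched in the comparison on synthetic graphs.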
A heuristic based on the idea of robustness to noisy pairs
The percolation process is stuck
Node u is matched (correctly)
Unmatched neighbouring pairs of the node pair [u, u] are new candidate pairs
The two graphs are correlated: among the new candidate pairs a small fraction is correct, e.g., [u1, u1]
PGM is robust to noise in the candidate pairs
[Figure: the matched node u in G1 and G2, with neighbours u1, u2, u3 in one graph and u1, u2, u3, u4, u5 in the other]
Expand the candidate pairs by many noisy pairs whenever the percolation process is stuck
[Diagram labels: Seed set, NoisySeeds, Matched set, Expand, Expanded noisy candidate set]
ExpandWhenStuck vs. PercolateMatched [Yartseva and Grossglauser, 2013] over Bi(G(n, p); t, s) with $n = 10^6$, $p = 20/n$ and $t^2 = 1.0$
[Figure: total number of matched pairs (0 to 1e+06) vs. number of seeds (5 to 50) for ExpandWhenStuck with $s^2 = 0.81$, $s^2 = 0.64$ and $s^2 = 0.49$]
PercolateMatched: 1906 seeds for $s^2 = 0.81$, 3052 seeds for $s^2 = 0.64$, 5207 seeds for $s^2 = 0.49$
238 times improvement for $s^2 = 0.81$
ExpandWhenStuck vs. PercolateMatched [Yartseva and Grossglauser, 2013] and User-Matching [Korula and Lattanzi, 2014] over the Gowalla network.
[Figure: F1-score vs. number of seeds (20 to 200) for ExpandWhenStuck, PercolateMatched and User-Matching]
ExpandWhenStuck vs. state-of-the-art PPI network alignment algorithms*
[Figure: ICS score for ExpandWhenStuck, IsoRank, SPINAL I, SPINAL II and PINALOG]
Access: http://proper.epfl.ch
* E. Kazemi, S. H. Hassani, H. Pezeshgi Modarres and M. Grossglauser. "ProPer: Global Protein-Protein Interaction Network Alignment with Percolation Graph-Matching." Submitted to Bioinformatics.
Graph matching has applications in many fields
Percolation graph matching
ExpandWhenStuck: a fast and accurate algorithm
MapReduce implementation: a variant of ExpandWhenStuck
Analysis of a simplified version of ExpandWhenStuck: a phase transition result
Input: G1(V1, E1), G2(V2, E2), seed set A0 of correct pairs
Output: the set of matched pairs M
A ← A0 is the initial set of seed pairs, M ← A0;    (initial candidate set)
Z ← ∅ is the set of used pairs;
while |A| > 0 do
    for all pairs [i, j] ∈ A do    (spread marks from the candidate set)
        add the pair [i, j] to Z and add one mark to all of its neighbouring pairs;
    while there exists an unmatched pair with score at least 2 do    (percolation graph matching)
        among the pairs with the highest score select the unmatched pair [i, j] with the minimum |d_{1,i} − d_{2,j}|;
        add [i, j] to the set M;
        if [i, j] ∉ Z then
            add one mark to all of its neighbouring pairs and add the pair [i, j] to Z;
    A ← all neighbouring pairs [i, j] of matched pairs M s.t. [i, j] ∉ Z, i ∉ V1(M) and j ∉ V2(M);    (expand when stuck: fuel the percolation process)
return M;
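The ExpandWhenStuck pseudocode translates fairly directly into Python. The version below is a readability-first sketch under several assumptions: graphs are given as adjacency dictionaries, node labels are comparable (they are used only to break remaining ties deterministically), and the best pair is found by a linear scan rather than a priority queue, so it is not tuned for large graphs.

```python
from collections import defaultdict


def expand_when_stuck(adj1, adj2, seeds):
    """adj1, adj2: dict node -> set of neighbours; seeds: list of (i, j)."""
    matched = dict(seeds)          # M, stored as node of G1 -> node of G2
    used2 = set(matched.values())  # V2(M)
    used = set()                   # Z: pairs that have already spread marks
    score = defaultdict(int)
    candidates = set(seeds)        # A

    def spread(i, j):
        """Add [i, j] to Z and give one mark to each neighbouring pair."""
        used.add((i, j))
        for u in adj1[i]:
            for v in adj2[j]:
                score[(u, v)] += 1

    def best_unmatched_pair():
        """Highest score first, then smallest degree difference."""
        best, best_key = None, None
        for (u, v), marks in score.items():
            if marks < 2 or u in matched or v in used2:
                continue
            key = (-marks, abs(len(adj1[u]) - len(adj2[v])), u, v)
            if best_key is None or key < best_key:
                best, best_key = (u, v), key
        return best

    while candidates:
        for i, j in candidates:
            spread(i, j)
        while (pair := best_unmatched_pair()) is not None:
            i, j = pair
            matched[i] = j
            used2.add(j)
            if pair not in used:
                spread(i, j)
        # Stuck: expand to all unused neighbouring pairs of matched pairs.
        candidates = {
            (u, v)
            for i, j in matched.items()
            for u in adj1[i] if u not in matched
            for v in adj2[j] if v not in used2 and (u, v) not in used
        }
    return matched
```

On a five-node toy graph with edges 0-1, 0-2, 1-2, 1-3, 2-3, 2-4 and 3-4 matched against itself, a single seed (0, 0) is enough for this sketch to recover all five pairs, because the noisy candidates around the seed give the correct pair (3, 3) the highest score.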
A parallelized variant of ExpandWhenStuck
Inputs: candidate pairs, graphs, seed set
Job1: spread
Job2: filter out the node pairs
Is percolation stuck?
no → Job3: spread out marks from new matched pairs
yes → Job4: expand to generate new candidate pairs
1. In the beginning of NoisySeeds, all the pairs in the noisy seed set spread out marks to their neighbouring pairs.
Lemma: At the completion time of the initial phase, the expected number of wrongly matched pairs is $o(a_{t,s,r})$.
2. The percolation graph matching process continues for at most $\min(|V_1|, |V_2|)$ more steps.
Lemma: When PGM stops, the expected number of wrongly matched pairs is $o(a_{t,s,r})$.
3. Using Markov's inequality, we find an upper bound for the number of wrongly matched pairs.
Lemma: With high probability, the total number of wrongly matched pairs at any time step is $o(a_{t,s,r})$.
Idea: apply the theory of bootstrap percolation in $G(nt^2, ps^2)$ [Janson et al., 2010] over Bi(G(n, p); t, s) graphs.
Bi(G(n, p); t, s) with $n = 10^6$ and $p = 20/n$
[Figure: total number of correct matched pairs (2e+05 to 1e+06) vs. number of correct seeds $\Lambda(A_0) / a_{t,s,r}$ (0.8 to 1.4), for all combinations of $t^2 \in \{1.0, 0.81, 0.64\}$ and $s^2 \in \{1.0, 0.81, 0.64\}$]
ExpandWhenStuck over Bi(Slashdot network; t, s) when the number of seeds is 10.
Combine precision and recall in one metric:
$$\text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
[Figure: heat map of the F1-score (0.0 to 1.0) as a function of the node overlap probability $t^2$ and the edge overlap probability $s^2$, each ranging from 0.5 to 1.0]
For results regarding random graph models such as Barabási-Albert, Chung-Lu and Erdős-Rényi, please refer to the paper.
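For reference, the F1-score of a matching can be computed as sketched below. The definitions used here are assumptions for illustration: precision is the fraction of output pairs that are correct, and recall is the fraction of ground-truth pairs that were recovered. The name `matching_scores` is hypothetical.

```python
def matching_scores(matched, truth):
    """Return (precision, recall, F1-score) of a matching.

    matched: dict node of G1 -> node of G2 produced by an algorithm.
    truth: dict with the ground-truth correspondence for identifiable nodes.
    """
    correct = sum(1 for i, j in matched.items() if truth.get(i) == j)
    precision = correct / len(matched) if matched else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, an output with three pairs of which two are correct, against a ground truth of four pairs, has precision 2/3, recall 1/2 and F1-score 4/7.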
ExpandWhenStuck vs. PercolateMatched over EPFL e-mail exchange network.