Deep Learning on Graphs
- Prof. Kuan-Ting Lai
National Taipei University of Technology 2019/11/27
Outline:
− Graphs (Networks): ubiquitous in our life, e.g., the Internet, social networks, protein-interaction networks
− Graph + Deep Learning
− Graph Terminology
Image source: Marshall Shepherd, https://slideplayer.com/slide/7806012/
Zhang et al., “Deep Learning on Graphs: A Survey,” 2018
− DeepWalk (2014), LINE (2015), node2vec (2016), DRNE (2018),...
− Bruna et al. (2014), Atwood & Towsley (2016), Niepert et al. (2016), Defferrard et al. (2016), Kipf & Welling (2017),…
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. "Distributed representations of words and phrases and their compositionality." In Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
https://towardsdatascience.com/mapping-word-embeddings-with-word2vec-99a799dc9695
− Micro-F1
− Macro-F1
− Use the d smallest eigenvectors of the normalized graph Laplacian of G
− Assumes that graph cuts are useful for classification
− Select the top-d eigenvectors of modular graph partitions of G
− Assumes that modular graph partitions are useful for classification
− Use k-means to cluster the adjacency matrix of G
− Weighted-vote Relational Neighbor
− The most frequent label
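The k-means baseline above clusters the rows of the adjacency matrix of G. A minimal NumPy sketch on a toy two-community graph (the graph and the deterministic "two most distant rows" initialization are illustrative assumptions):

```python
import numpy as np

def kmeans_2(X, iters=10):
    """Plain 2-means on the rows of X (here: rows of an adjacency matrix).

    Toy deterministic init: start from the two most distant rows.
    """
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    centers = X[[i, j]].astype(float)
    for _ in range(iters):
        # assign every node (row) to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # recompute each center as the mean of its assigned rows
        for c in range(2):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Toy graph with two obvious communities: nodes 0-2 and nodes 3-5
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[u, v] = A[v, u] = 1
labels = kmeans_2(A)
print(labels)  # nodes 0-2 land in one cluster, nodes 3-5 in the other
```

Because nodes in the same community have nearly identical adjacency rows, clustering the rows recovers the communities without any learned embedding.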
Dataset statistics (PPI = Protein-Protein Interactions):

                 BlogCatalog   PPI      Wikipedia
Vertices         10,312        3,890    4,777
Edges            333,983       76,584   184,812
Groups (Labels)  39            50       40
distance from the source nodes.
− Vertices 6 and 7 should be embedded closely, as they are connected via a strong tie.
− Vertices 5 and 6 should also be placed closely, as they share similar neighbors.
Tu, K.; Cui, P.; Wang, X.; Yu, P. S.; and Zhu, W., "Deep Recursive Network Embedding with Regular Equivalence," KDD, 2018
− Example: a graph of cities of 4 countries and flights between them, where nodes have centralities of 1, 1, 2, and 5
− N: number of nodes
− D: dimension of input features
− Ã = A + I (adjacency matrix with self-loops)
− D̃^(−1/2) Ã D̃^(−1/2): symmetrically normalized adjacency, where D̃ is the degree matrix of Ã
− Layer-wise propagation: H^(m+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(m) W^(m))
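The layer-wise propagation rule cited here (Kipf & Welling, 2017) can be sketched in NumPy; the toy graph, features, and weights below are made-up illustrations, not parameters from any of the cited papers:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU( D^-1/2 (A + I) D^-1/2 H W )."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^-1/2
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # normalized adjacency
    return np.maximum(0, A_hat @ H @ W)         # ReLU activation

# Toy graph: 3 nodes in a path (0-1-2), with 2-dim input features
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.array([[0.5, -0.2],
              [0.1, 0.3]])
H1 = gcn_layer(A, X, W)
print(H1.shape)  # (3, 2)
```

Stacking such layers lets each node aggregate features from progressively larger neighborhoods.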
2019.10.28  Student: Yu-Chi Chen (Judy)  Advisors: Prof. Ming-Syan Chen, Prof. Kuan-Ting Lai
− A hashtag categorizes the accompanying text
Most popular Instagram hashtags (millions of posts):
#love 1,165; #instagood 659.6; #photooftheday 458.5; #fashion 426.9; #beautiful 424; #happy 396.5; #tbt 389.5; #like4like 389.3; #cute 389.3; #followme 360.5; #picoftheday 344.5; #follow 344.3; #me 334.1; #selfie 319.4; #summer 318.2
Latest stats: izea.com/2018/06/07/top-instagram-hashtags-2018
#ootn #ootd #tbt
#FromWereIStand
#Selfie #EvaChenPose
− Classify test classes Z with zero labeled data (Zero-shot!)
− Use attributes (semantic features)
Denton et al., "User Conditional Hashtag Prediction for Images," ACM SIGKDD, 2015 (Facebook)
− Embedding model
− User-biased model
− User-multiplicative model
− Given information of IG posts, including images and texts, the goal is to recommend corresponding hashtags.
− Use multiple types of input and apply a graph convolutional network for hashtag recommendation.
− Every post has some attributes: post_id, words, hashtags, user_id, images.
Average posts of a user.
Related Work: based on images / based on text / based on multimodal data
− Statistical tagging patterns: Sigurbjörnsson, B., and Van Zwol, R. 2008. Flickr tag recommendation based on collective knowledge. In WWW, 327–336.
− Probabilistic ranking method: Liu, D.; Hua, X.-S.; Yang, L.; Wang, M.; and Zhang, H.-J. 2009. Tag ranking. In WWW, 351–360.
− Images + user-multiplicative tensor model: Denton, E.; Weston, J.; Paluri, M.; Bourdev, L.; and Fergus, R. 2015. User conditional hashtag prediction for images. In ACM SIGKDD.
− End-to-end model: Wang, J.; Yang, Y.; Mao, J.; Huang, Z.; Huang, C.; and Xu, W. 2016. CNN-RNN: A unified framework for multi-label image classification. In CVPR, 2285–2294.
− Attention mechanism in CNNs: Gong, Y., and Zhang, Q. 2016. Hashtag recommendation using attention-based convolutional neural network. In IJCAI, 2782–2788.
About Hashtag Recommendation
− External memory unit; parallel co-attention mechanism
− Images (40G)
− Context information
− learning, and open-set recognition in one integrated algorithm
− Dynamic meta-embedding; modulated attention
3.1 Model Overview
[Overview diagram] Three modules:
− Pairwise Relationship Generating: image similarity and user information → relation matrix
− Post Feature Generating: image vector + text vector → double attention → post vector
− Tag Propagation Learning: propagation over the relation matrix with a multi-label loss
3.2 Pairwise Relationship Generating
− Extract image features with a pre-trained VGG-16
− Calculate cosine similarity between images
− Relation map of images → relation matrix
− As 𝛼 approaches 1, the model considers the relations between images more heavily
− Relation map of users
− 𝜷 = 1 has the best performance
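The image-similarity step can be sketched as follows: pairwise cosine similarity between image feature vectors, binarized with a threshold to form an adjacency-style relation map. The feature vectors and threshold value here are illustrative assumptions; the real model would use VGG-16 features.

```python
import numpy as np

def image_adjacency(feats, threshold=0.5):
    """Cosine similarity between image features, binarized by a threshold."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / norms                      # L2-normalize each feature vector
    sim = unit @ unit.T                       # pairwise cosine similarity
    A = (sim >= threshold).astype(float)      # binarize: edge if similar enough
    np.fill_diagonal(A, 0)                    # no self-loops in the relation map
    return A

# Three toy "image feature" vectors: the first two are nearly parallel
feats = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
A_img = image_adjacency(feats, threshold=0.5)
print(A_img)
```

Here the first two images exceed the threshold and get an edge, while the third stays isolated; sweeping the threshold is exactly the ablation reported in Section 4.4.2.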
3.3 Post Feature Generating
− Image: pre-trained VGG-16 → (7, 7, 512); reshape to (7×7, 512); fully connected layer to (7×7, 300) → image features (i_vec)
− Text: embed each word to dim = 300 → LSTM → text features (t_vec)
− Double attention (ATT) fuses the image vector and text vector into the post vector
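A rough NumPy sketch of double-attention fusion of image regions and word vectors into a post vector. The dimensions follow the slides (7×7 = 49 regions, 300-d features), but the dot-product attention and additive fusion are assumptions, not the thesis's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, items):
    """Dot-product attention: weight item vectors by similarity to the query."""
    weights = softmax(items @ query)   # one normalized weight per item
    return weights @ items             # weighted sum -> single vector

rng = np.random.default_rng(0)
img_regions = rng.normal(size=(49, 300))  # VGG-16 7x7 regions projected to 300-d
word_vecs = rng.normal(size=(12, 300))    # LSTM outputs for a 12-word caption

# Double attention: text guides image attention, image guides text attention
i_vec = attend(word_vecs.mean(axis=0), img_regions)
t_vec = attend(img_regions.mean(axis=0), word_vecs)
post_vec = i_vec + t_vec                  # fused post vector
print(post_vec.shape)  # (300,)
```

The point of attending in both directions is that salient image regions can highlight relevant words and vice versa before the two modalities are combined.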
3.4 Tag Propagation Learning
− GCN pipeline: Input → GCN Layer 1 → Dropout → ReLU → Dropout → GCN Layer 2 → Multi-label Loss
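The pipeline above can be sketched with NumPy. The toy tag graph and layer dimensions are assumptions; dropout is applied in the order the slide shows (after layer 1, before and after ReLU):

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def dropout(X, rate):
    mask = rng.random(X.shape) >= rate
    return X * mask / (1.0 - rate)        # inverted dropout (training mode)

# Toy symmetric tag graph: 4 tags, 8-d input features, 5 output scores
M = (rng.random((4, 4)) < 0.5).astype(float)
A_hat = normalize(np.maximum(M, M.T))
X = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 5))

H = A_hat @ X @ W1                        # GCN layer 1
H = dropout(H, 0.5)
H = np.maximum(0, H)                      # ReLU
H = dropout(H, 0.5)
logits = A_hat @ H @ W2                   # GCN layer 2 -> per-tag scores
print(logits.shape)  # (4, 5)
```

The logits would then feed the multi-label loss described in Section 3.5.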
3.5 Training
− Multi-label loss over the training set
− Notation: a post p_i and its corresponding hashtag set; z, a hashtag in the hashtag set; the softmax probability of choosing tag z for input post p_i
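The slide defines the softmax probability of choosing tag z for input post p_i. A common multi-label softmax cross-entropy under that notation looks like the following; the exact loss form is not shown on the slide, so this is a hedged reconstruction with made-up scores:

```python
import numpy as np

def multilabel_softmax_loss(logits, tag_sets):
    """Average over posts of -sum over true tags z of log softmax(scores)[z]."""
    total = 0.0
    for scores, tags in zip(logits, tag_sets):
        e = np.exp(scores - scores.max())
        probs = e / e.sum()               # softmax over all candidate tags
        total -= np.sum(np.log(probs[tags]))
    return total / len(tag_sets)

# Two toy posts scored against 4 candidate hashtags
logits = np.array([[2.0, 0.5, -1.0, 0.0],
                   [0.0, 1.0, 1.0, -2.0]])
tag_sets = [[0], [1, 2]]                  # ground-truth hashtag indices per post
loss = multilabel_softmax_loss(logits, tag_sets)
print(loss)
```

Minimizing this pushes up the softmax probability of every hashtag actually attached to the post.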
4.1 Evaluation Metrics
− Precision (P), Recall (R), F1-score (F1)
− 10 hashtags are recommended for each post, which is important for this performance evaluation.
4.2 Implementation Details
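Since the tables report P@10, R@10, and F1@10, the per-post metrics can be sketched as follows (the recommended and ground-truth hashtag lists are hypothetical):

```python
def precision_recall_f1_at_k(recommended, ground_truth, k=10):
    """P@k, R@k, F1@k for one post's ranked hashtag recommendations."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(ground_truth))   # correctly recommended tags
    p = hits / k                                 # precision at k
    r = hits / len(ground_truth)                 # recall at k
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

rec = ["love", "instagood", "summer", "tbt", "selfie"]
truth = ["love", "summer", "beach"]
p, r, f1 = precision_recall_f1_at_k(rec, truth, k=5)
print(p, r, f1)
```

Dataset-level scores average these per-post values over all test posts.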
4.3 Dataset

                    Sub-1    Sub-2     Sub-3
Node number         11,607   25,259    58,665
Edge number         68,029   165,392   165,238
Tag frequency       Top 50   Top 100   Top 200
Hashtags per post   7~10     7~10      5~8
4.4 Experimental Results
4.4.1 Comparisons with State-of-the-Arts

Method                     | Sub-1 (11,607 posts)  | Sub-2 (25,259 posts)  | Sub-3 (58,665 posts)
                           | P@10  R@10  F1@10     | P@10  R@10  F1@10     | P@10  R@10  F1@10
1-layer DNN (image + text) | 0.326 0.409 0.362     | 0.439 0.537 0.481     | TBD   TBD   TBD
Co-Attention (CoA)         | TBD   TBD   TBD       | TBD   TBD   TBD       | TBD   TBD   TBD
MaCon (ATT + user habit)   | 0.325 0.413 0.363     | 0.218 0.267 0.239     | 0.103 0.168 0.127
ATT (my ATT) + GCN         | 0.357 0.448 0.396     | 0.453 0.554 0.496     | 0.259 0.416 0.317
4.4.2 Ablation Studies: Effects of the Attention and GCN Modules

Method (dataset: 11,607 posts) | P@10  | R@10  | F1@10
GCN only                       | 0.328 | 0.409 | 0.363
ATT only                       | 0.289 | 0.361 | 0.320
ATT (my ATT) + GCN             | 0.357 | 0.448 | 0.396
4.4.2 Ablation Studies: Effects of different threshold values 𝜐 (used to binarize the image-similarity adjacency matrix)

Threshold 𝜐 (dataset: 11,607 posts) | P@10  | R@10  | F1@10
0.3                                 | 0.351 | 0.439 | 0.389
0.4                                 | 0.350 | 0.439 | 0.388
0.5                                 | 0.357 | 0.448 | 0.396
0.6                                 | 0.350 | 0.438 | 0.387
0.7                                 | 0.351 | 0.440 | 0.389
4.4.2 Ablation Studies: Effects of different 𝛽 for the final relation matrix

[Adding user information]
𝛽 (dataset: 11,607 posts) | P@10  | R@10  | F1@10
0.5                       | 0.348 | 0.436 | 0.386
0.9                       | 0.353 | 0.443 | 0.391
1                         | 0.357 | 0.448 | 0.396

[Adding word information]
𝛽 (dataset: 11,607 posts) | P@10  | R@10  | F1@10
0.5                       | 0.350 | 0.438 | 0.388
0.8                       | 0.351 | 0.440 | 0.389
1                         | 0.357 | 0.448 | 0.396

𝜷 = 1 has the best performance.