SLIDE 1

Deep Text Mining of Instagram Data Without Strong Supervision

WI 2018 Santiago | International Conference on Web Intelligence. Kim Hammar, Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin

KTH Royal Institute of Technology kimham@kth.se

December 4, 2018

Kim Hammar (KTH) Text Mining in Social Media December 4, 2018 1 / 19

SLIDE 2

Key enabler for Deep Learning: Data growth

[Chart: Annual Size of the Global Datasphere, in zettabytes, 2009–2026. Source: IDC.]

SLIDE 3

Key enabler for Deep Learning: Data growth

(Datasphere chart repeated from Slide 2. Source: IDC.)

But what about Labeled Data?

SLIDE 4

[Diagram: a feed-forward neural network with inputs x, biases b, and output prediction ŷ.]

Supervised learning: iteratively minimize the loss function L(ŷ, y) between the prediction ŷ and the ground truth y.

Labeled Training Data is Still a Bottleneck
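A minimal sketch (invented here, not from the slides) of the loop this slide describes: predict ŷ, measure the loss L(ŷ, y) against the ground truth y, and step the weights to reduce it. Logistic regression on a four-point toy dataset stands in for the network in the diagram:

```python
import math

# Hypothetical illustration: supervised learning as iterative minimization
# of a loss L(y_hat, y), here logistic regression trained with batch
# gradient descent on toy data.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, steps=500):
    w = [0.0] * len(X[0])
    for _ in range(steps):
        # Prediction y_hat for every example under the current weights.
        y_hat = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in X]
        # Gradient of the cross-entropy loss, averaged over the batch.
        grad = [sum((p - t) * x[j] for x, p, t in zip(X, y_hat, y)) / len(y)
                for j in range(len(w))]
        # Step downhill on the loss surface.
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
y = [1.0, 1.0, 0.0, 0.0]                     # ground-truth labels
w = train(X, y)
preds = [1.0 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0.0
         for x in X]
```

With enough steps the learned weights separate the toy classes, illustrating why the whole procedure hinges on having the labels y at all.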

SLIDE 5

Research Problem: Clothing Prediction on Instagram

[Diagram: an Instagram post is fed to a text model and an image model (neural networks), which output a clothing prediction vector, e.g. (dress = 0, coat = 1, …, skirt = 0).]

SLIDE 6

This Paper: Text Classification Without Labeled Data

[Diagram: pipeline from Instagram posts (post1, …, postn) through text mining with word embeddings and neural networks to analytics, e.g. trend detection (mentions of brand “foo” per month, 04.2017–03.2018) and user recommendations.]

SLIDE 7

Example Instagram Post

SLIDE 8

Challenge: Noisy Text and No Labels

A case study of a corpus with 143 fashion accounts, 200K posts, and 9M comments

Challenge 1: Noisy Text with a Long-Tail Distribution

[Log-log plot: frequency of text per post (comments and words), showing a long-tail distribution; note the mass of posts with 0 comments and with 0 words (comments + caption + tags).]

Text Statistic      Fraction of corpus   Average/post
Emojis              0.15                 48.63
Hashtags            0.03                 9.14
User-handles        0.06                 18.62
Google-OOV words    0.46                 145.02
Aspell-OOV words    0.47                 147.61
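Statistics like these can be gathered per post roughly as follows. This is a sketch: the regexes and the tiny vocabulary are invented stand-ins, not the paper's preprocessing or the actual Google/Aspell word lists:

```python
import re

# Hypothetical sketch: counting hashtags, user-handles, and out-of-vocabulary
# (OOV) words in one post. Regexes and vocabulary are illustrative only.

def post_stats(text, vocabulary):
    tokens = text.split()
    hashtags = [t for t in tokens if re.fullmatch(r"#\w+", t)]
    handles = [t for t in tokens if re.fullmatch(r"@\w+", t)]
    words = [t for t in tokens if re.fullmatch(r"[A-Za-z]+", t)]
    oov = [w for w in words if w.lower() not in vocabulary]
    return {"hashtags": len(hashtags), "handles": len(handles),
            "words": len(words), "oov": len(oov)}

vocab = {"i", "love", "the", "is", "it", "bag"}
stats = post_stats("I love the baaag Is it #gucci @username", vocab)
```

On this toy post the misspelled "baaag" is the one OOV word, mirroring how elongated spellings inflate the OOV fractions in the table.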

SLIDE 9

Challenge: Noisy Text and No Labels (recap of Slide 8)

Challenge 2: Labeled Training Data Is Expensive and Scarce

[Diagram: turning raw Instagram text into training data requires human annotations.]

SLIDE 10

Alternative Sources of Supervision That Are Cheap but Weak

Strong supervision: manual annotation by an expert.
Weak supervision: a signal that does not have full coverage or perfect accuracy.

Sources of Weak Supervision

[Diagram: sources of weak supervision (domain heuristics, databases, APIs, crowdworkers) feed into a combiner that approximates strong supervision.]


SLIDE 13

Weak Supervision in the Fashion Domain

Open APIs: pre-trained clothing classification models, e.g. DeepDetect1.

A text mining system based on a fashion ontology and word embeddings:

Example Instagram post p ∈ P:
Caption: “Happy Monday! Here is my outfit of the day #streetstyle #me #canada #goals #chic #denim”
Tags: Zalando, user1, user2
Comments: “I love the bag! Is it Gucci? #goals”, “@username I #want the #baaag Wow! The #jeans”, “You are suclh an inspirationn, can you follow me back?”

Candidate terms are matched against an ontology O (Brands, Items, Patterns, Materials, Styles; backed by ProBase) and ranked by a linear combination of signals: word embeddings V, edit distance, and tfidf(wi, p, P), computed per text type t ∈ {caption, comment, user-tag, hashtag}.

Output, ranked noisy labels r:
Items: (bag, 0.63), (jeans, 0.3), (top, 0.1)
Brands: (Gucci, 0.8), (Zalando, 0.3)
Material: (Denim, 1.0), …

1https://github.com/jolibrain/deepdetect
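As a rough sketch of one signal in that linear combination, tf-idf scoring of ontology terms against a post might look like this. The corpus, tokenization, and smoothing are toy assumptions; the embedding and edit-distance signals are omitted:

```python
import math
from collections import Counter

# Hypothetical sketch of tfidf(w_i, p, P): one signal used when ranking
# ontology terms for a post. Corpus and smoothing are illustrative choices.

def tfidf(term, post_tokens, corpus):
    tf = Counter(post_tokens)[term] / max(len(post_tokens), 1)
    df = sum(term in doc for doc in corpus)     # document frequency in P
    idf = math.log(len(corpus) / (1 + df))      # smoothed inverse doc frequency
    return tf * idf

corpus = [["love", "the", "bag"], ["jeans", "today"], ["denim", "and", "jeans"]]
post = ["love", "the", "bag", "bag"]
score_bag = tfidf("bag", post, corpus)          # frequent in the post, rarer in the corpus
score_jeans = tfidf("jeans", post, corpus)      # absent from the post
```

Terms that dominate one post but not the whole corpus score highest, which is what pushes (bag, 0.63) above (top, 0.1) in a ranking like the one above.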

SLIDE 14

How To Combine Several Sources Of Weak Supervision?

Simplest way to combine many weak signals: majority vote.
Recent research on combining weak signals: data programming2.

2Alexander J. Ratner et al. “Data Programming: Creating Large Training Sets, Quickly”. In: Advances in Neural Information Processing Systems 29, 2016, pp. 3567–3575. URL: http://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly.pdf
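The majority-vote baseline is only a few lines (a sketch; representing an abstaining labeler with None is an assumption made here):

```python
from collections import Counter

# Minimal majority-vote combiner for weak labels. None denotes an abstaining
# labeler and is ignored; ties resolve to the label counted first.

def majority_vote(votes):
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None          # every labeler abstained
    return counts.most_common(1)[0][0]

label = majority_vote(["coat", "coat", "jeans", None])
```

Its weakness, which data programming addresses, is that every labeler's vote counts the same regardless of how accurate that labeler is.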

SLIDE 15

Model Weak Supervision With Generative Model

[Diagram: unlabeled data → labeling functions λ1 … λn → weak labels → generative model πα,β(Λ, Y) → combined labels.]

Model weak supervision as labeling functions λi:

λi(unlabeled data) → label

Learn a generative model πα,β(Λ, Y) over the labeling process:
Based on conflicts between the labeling functions, assign each function an estimated accuracy αi.
Based on empirical coverage, assign each function a coverage βi.

Given α and β for each labeling function, the generative model combines the weak labels into a single probabilistic label:
High-accuracy functions get more weight.
A lot of disagreement → low-probability label.
All labeling functions agree → high-probability label.
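Concretely, labeling functions might look like the following sketch. The rules and names are invented for illustration; empirical coverage βi is simply the fraction of inputs a function does not abstain on:

```python
# Hypothetical labeling functions: each maps an unlabeled post to a label
# or abstains (None). The keyword rules below are invented for illustration.

def lf_keyword_coat(post):
    return "coat" if "coat" in post.lower() else None

def lf_hashtag_winter(post):
    return "coat" if "#parka" in post.lower() or "#winter" in post.lower() else None

def lf_keyword_jeans(post):
    return "jeans" if "jeans" in post.lower() or "denim" in post.lower() else None

def coverage(lf, posts):
    # Empirical coverage beta_i: fraction of posts the function labels at all.
    return sum(lf(p) is not None for p in posts) / len(posts)

posts = ["My new coat #winter", "Fresh denim today", "Sunset in Santiago"]
weak_labels = [lf(posts[0])
               for lf in (lf_keyword_coat, lf_hashtag_winter, lf_keyword_jeans)]
cov = coverage(lf_keyword_jeans, posts)
```

The accuracies αi cannot be computed this directly, since there is no ground truth; that is exactly what the generative model estimates from the conflicts between functions.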

SLIDE 16

Data Programming Intuition

[Diagram: several low-accuracy labeling functions vote “it is not a coat”; one high-accuracy labeling function votes “it is a coat”.]

Probabilistic label: 0.6 probability that it is a coat.
Majority vote: 1.0 probability that it is not a coat.
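This intuition can be made numeric. Assuming independent labeling functions with known accuracies αi, binary votes combine by summed log-odds, so one accurate function can outweigh two weak ones. This is a sketch in the spirit of the generative model, with made-up accuracies, not the paper's implementation or the 0.6 figure above:

```python
import math

# Hypothetical sketch: combine binary votes v_i in {-1, +1} (0 = abstain)
# weighted by each function's accuracy a_i via summed log-odds, yielding
# P(y = +1 | votes) under an independence assumption.

def probabilistic_label(votes, accuracies):
    log_odds = sum(v * math.log(a / (1.0 - a))
                   for v, a in zip(votes, accuracies) if v != 0)
    return 1.0 / (1.0 + math.exp(-log_odds))

# One high-accuracy function says "coat" (+1); two low-accuracy ones say
# "not a coat" (-1). A plain majority vote would output "not a coat".
p_coat = probabilistic_label([+1, -1, -1], [0.9, 0.55, 0.55])
```

Here the combined label still leans toward "coat", because the two near-chance functions contribute little evidence either way.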

SLIDE 17

Extension of Data Programming to Multi-Label Classification

Problem: data programming is only defined for binary classification in the original paper.

To make it work in the multi-class setting: model a labeling function as λi → ki ∈ {0, …, N} instead of λi → ki ∈ {−1, 0, 1}.

Idea 1 for multi-label: model a labeling function as λi → ki = {v0, …, vn} with vj ∈ {−1, 0, 1}.

Idea 2 for multi-label: learn a separate generative model for each class, and let each labeling function give a binary output per class, λi,j → ki,j ∈ {−1, 0, 1}.
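Idea 2 can be sketched as a decomposition step. The class list and the item-set interface below are invented for illustration:

```python
# Hypothetical sketch of "Idea 2": reduce a multi-label output to one binary
# vote per class, so a separate binary generative model can be learned for
# each class independently.

CLASSES = ["coat", "jeans", "dress"]

def to_per_class_votes(predicted_items):
    """+1 = class present, -1 = absent, 0 = the function abstained entirely."""
    if predicted_items is None:
        return {c: 0 for c in CLASSES}
    return {c: (1 if c in predicted_items else -1) for c in CLASSES}

votes = to_per_class_votes({"coat", "jeans"})
abstain = to_per_class_votes(None)
```

Each class's stream of ±1/0 votes is then exactly the binary setting the original data programming formulation handles.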

SLIDE 18

Trained Generative Models: Labeling Functions’ Accuracy Differ Between Classes

[Bar chart: predicted accuracy (roughly 0.4–1.0) in the generative model for each labeling function (Clarifai, Deepomatic, DeepDetect, Google Cloud Vision, SemCluster, KeywordSyntactic, KeywordSemantic) across the classes accessories, bags, blouses, coats, dresses, jackets, jeans, cardigans, shoes, skirts, tights, tops, and trousers.]

Figure: Multiple generative models can capture a different accuracy for the labeling functions in each class.

SLIDE 21

Putting Everything Together

1. Apply weak supervision to unlabeled data (open APIs, pre-trained models, domain heuristics, etc.)

2. Combine the labels using majority voting or generative modelling (data programming).

3. Use the combined labels to train a discriminative model with supervised machine learning.
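The three steps above, end to end on toy data. The posts and heuristics are invented, and step 3's discriminative model is stubbed with a most-frequent-label "classifier" for brevity, where the paper trains a CNN:

```python
from collections import Counter

# Hypothetical end-to-end sketch of the three steps on toy data.

posts = ["love this coat #winter", "new jeans #denim", "coat weather again"]

# Step 1: weak supervision from simple domain heuristics.
def lf_coat(p):   return "coat" if "coat" in p else None
def lf_jeans(p):  return "jeans" if "denim" in p or "jeans" in p else None
def lf_winter(p): return "coat" if "#winter" in p else None
lfs = [lf_coat, lf_jeans, lf_winter]

# Step 2: combine the weak labels per post (majority vote for simplicity).
def combine(post):
    counts = Counter(v for v in (lf(post) for lf in lfs) if v is not None)
    return counts.most_common(1)[0][0] if counts else None

train_set = [(p, combine(p)) for p in posts if combine(p) is not None]

# Step 3: "train" a discriminative model on the combined labels (stub:
# a most-frequent-label classifier stands in for the paper's CNN).
majority_class = Counter(lbl for _, lbl in train_set).most_common(1)[0][0]
```

The point of step 3 is generalization: unlike the labeling functions, a trained discriminative model can also label posts that none of the heuristics fire on.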

SLIDE 22

Pipeline for Weakly Supervised Classification in Instagram

Problem: a multi-class, multi-label classification problem with 13 output classes (dresses, coats, blouses, jeans, …)

[Diagram of the pipeline:]

Input post: “Here is my outfit of the day #streetstyle #coat #parka #chic #winter”

Labeling functions λi (SemCluster, KeywordSyntactic, KeywordSemantic, DeepDetect, …) cast votes vi, e.g.: (jacket, jeans), (jeans, coat), (jeans, shoes), nil, (coat, jeans), (coat), (coat).

The generative model πα,β(Λ, Y) combines the votes v1 … v13 from λ1 … λ7, and the combined labels train a discriminative model d, a CNN for text classification, which outputs the label vector (dress = 0, coat = 1, …, skirt = 0).

Figure: A pipeline for weakly supervised text classification of Instagram posts.

SLIDE 23

Data Programming Beats Majority Voting

Results

Data programming gives a 6-point micro-F1 improvement over majority vote3, achieving a micro-F1 score of 0.616 (on par with human performance).

Model                 Accuracy       Precision      Recall         Micro-F1       Macro-F1       Hamming Loss
CNN-DataProgramming   0.797 ± 0.01   0.566 ± 0.05   0.678 ± 0.04   0.616 ± 0.02   0.535 ± 0.01   0.195 ± 0.02
CNN-MajorityVote      0.739 ± 0.02   0.470 ± 0.06   0.686 ± 0.05   0.555 ± 0.03   0.465 ± 0.05   0.261 ± 0.03
DomainExpert          0.807          0.704          0.529          0.604          0.534          0.184

Main cause of error: data sparsity (a clothing item cannot be extracted from the text if it is never mentioned there).

3A smaller dataset, hand-labeled by experts, was used for evaluation.
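For reference, the multi-label metrics reported above can be computed from binary indicator vectors as in this sketch (toy vectors, not the paper's data):

```python
# Sketch of two reported metrics, micro-F1 and Hamming loss, over
# multi-label indicator vectors (one row per post, one column per class).

def micro_f1(y_true, y_pred):
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def hamming_loss(y_true, y_pred):
    # Fraction of individual (post, class) decisions that are wrong.
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    return sum(1 for t, p in pairs if t != p) / len(pairs)

y_true = [[1, 0, 1], [0, 1, 0]]   # two posts, three classes
y_pred = [[1, 0, 0], [0, 1, 0]]   # one missed label
f1 = micro_f1(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
```

Micro-averaging pools all (post, class) decisions before computing precision and recall, which is why it is less sensitive to rare classes than the macro-F1 column.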

SLIDE 24

Conclusion

Instagram text is just as noisy as Twitter text, has a long-tail distribution, and is multi-lingual.

In shifting data domains where accurately labeled data is a rarity, like social media, weak supervision is a viable alternative.

Combining weak labels with generative modeling beats majority voting.

To extend data programming to the multi-label scenario, a collection of generative models can be used to incorporate per-class accuracy.

SLIDE 25

Thank you

All code and most of the data are open source: https://github.com/shatha2014/FashionRec

Questions?
