SLIDE 1

Distributional Embedding Approach for Relational Knowledge Representation

Dissertation Proposal

A. Aziz Altowayan
Supervisor: Dr. Tao

Pace University, March 2017

SLIDE 2

Contents overview

Part 1: Brief introduction

◮ Topic, Issue, and Solution idea

Part 2: Details

◮ Methods, Related work, and Proposed work

SLIDE 3

PART ONE

SLIDE 4

Introduction

Overview

◮ Relational Learning through Knowledge Base Representation
◮ Relational Knowledge Representation

◮ Knowledge ≈ entities + their relationships

Motivation and importance

◮ Relationships between entities are a rich source of information

SLIDE 5

Knowledge Bases (KBs)

Web-scale extracted KBs provide a structured representation of world knowledge

◮ Large quantities of knowledge are publicly available in relational form, interlinked across different domains

The ability to learn from relational data has a significant impact on many applications

SLIDE 6

Applications and use cases

World Wide Web and Semantic Web

◮ linkage of related documents, and semantically structured data

Bioinformatics and Molecular Biology

◮ gene-disease association

Social Networks

◮ relationships between persons

Question Answering

◮ link prediction in knowledge base queries

SLIDE 7

Example KB: FreeBase1

1 Info source: Deep Learning Lectures (Bordes, 2015)

SLIDE 8

Example KB: WordNet2

2 Info source: Deep Learning Lectures (Bordes, 2015)

SLIDE 9

Problem

Collectively, KBs have over 60 billion published facts and growing (Nickel, 2013). KBs have large dimensions, and thus they are ...

◮ Hard to manipulate
◮ Sparse (with few valid links)
◮ Noisy and/or incomplete

Tackling these issues is key to automatically understanding and utilizing the structure of large-scale knowledge bases

SLIDE 10

Idea

Modeling Knowledge Bases

◮ KBs Embeddings (inspired by Word Embeddings)

How?

  • 1. Encode (embed) KBs into a low-dimensional vector space s.t. similar entities/relations are represented by similar “nearby” vectors

  • 2. Use these representations:

◮ to complete/visualize KBs
◮ as KB data in text applications

(Bordes et al., 2013)

SLIDE 11

Example use case: link prediction

◮ Question Answering Systems
◮ Assess the validity of results from Information Retrieval Systems

An example fragment of a KB.

SLIDE 12

Word Embeddings

The most successful approach to representing word meanings (now standard)

Two main components . . .

  • 1. Neural Language Modeling “NLM”

◮ neural networks approach for text representations

  • 2. Distributional Semantics Hypothesis3

◮ words which are similar in meaning occur in similar contexts

(Rubenstein and Goodenough, 1965)

3 One of the most successful ideas of modern statistical NLP. As described in: Deep Learning for NLP (Socher, 2016)

SLIDE 13

Relations Embeddings

Current state-of-the-art in relation embeddings exploits only Neural Language Modeling, but not the Distributional Semantics Hypothesis. As a result,

The performance of current KB modeling methods is far from being useful to leverage in real-world applications

SLIDE 14

Current Approach

Current relation embeddings approaches are not making use of distributional similarities over KBs relations

SLIDE 15

Proposed Approach

The proposed approach (inspired by word embeddings) brings the entire experience of word representations to relation embeddings by incorporating Distributional Similarity

SLIDE 16

PART TWO

SLIDE 17

Knowledge Base

What is a Knowledge Base?

Knowledge bases (KBs) store factual information about the real world in the form of binary relations between entities (e.g. FreeBase, NELL, WordNet, YAGO). In KBs, facts are expressed as triplets, i.e. “binary relations” between entities: a triplet of tokens (subject, verb, object) with the values (entity_i, relation_k, entity_j)

SLIDE 18

Sample fragments of KBs

Table 1: Sample KB triplets for “molecule” entity from WordNet18 (Miller, 1995)

head                    relation                   tail
__radical_NN_1          _part_of                   __molecule_NN_1
__physics_NN_1          _member_of_domain_topic    __molecule_NN_1
__molecule_NN_1         _has_part                  __atom_NN_1
__unit_NN_5             _hyponym                   __molecule_NN_1
__chemical_chain_NN_1   _part_of                   __molecule_NN_1
__molecule_NN_1         _hypernym                  __unit_NN_5

Table 2: Sample triplets from NELL4 KB

head                 relation                 tail
action_movies        is_a                     movie
action_movies        is_a_generalization_of   die_hard
leonardo_dicaprio    is_an                    actor
akiva_goldsman       directedMovie            batman_forever
leonard_nimoy        StarredIn                star_trek
motorola             acquiredBy               google
david_beckham        playSport                soccer

4 http://rtw.ml.cmu.edu/rtw/kbbrowser/

SLIDE 19

Introduction: Knowledge Graphs

Knowledge Graphs are graph structured knowledge bases (KBs).

◮ The multi-relational data (of KBs) can form directed graphs (of knowledge) whose nodes correspond to entities and edges correspond to relations between entities.

◮ Multigraph Structure

Entity = Node
Relation Type = Edge type
Fact = Edge
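To make the multigraph view concrete, here is a minimal sketch that loads a few of the NELL-style triplets from Table 2 into a directed multigraph; the use of networkx is an assumed tooling choice for illustration, not something the slides prescribe.

```python
# Minimal sketch: a KB as a directed multigraph (entity = node,
# relation type = edge type, fact = edge). networkx is an assumed
# tooling choice; any graph library would do.
import networkx as nx

triplets = [
    ("leonardo_dicaprio", "is_an", "actor"),
    ("akiva_goldsman", "directedMovie", "batman_forever"),
    ("motorola", "acquiredBy", "google"),
]

G = nx.MultiDiGraph()
for head, relation, tail in triplets:
    G.add_edge(head, tail, key=relation)  # one edge per fact

print(G.number_of_nodes(), "entities,", G.number_of_edges(), "facts")
for h, t, r in G.edges(keys=True):        # each fact is one typed edge
    print(f"({h}, {r}, {t})")
```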

SLIDE 20

Word Embeddings: Semantic Theory

Distributional similarity representation

◮ Distributional Hypothesis

“You shall know a word by the company it keeps.” (Firth, 1957)

Examples 5

◮ It was found in the banks of the Amoy River ..
◮ I was seated in my office at the bank when a card . . .
◮ with a plunge, like the swimmer who leaves the bank . . .
◮ through the issue of bank notes, the money capital . . .
◮ settlements were on the north bank of the Ohio River . . .

5 https://youtu.be/T1O3ikmTEdA?t=16m29s

SLIDE 21

Word Embeddings: Neural Language Model

Vector Space Models (VSMs)

Distributed representations of words to solve the dimensionality problem.

VSM approaches:

  • 1. count-based: Latent Semantic Analysis “LSA” (see the sketch after this list)
  • 2. prediction-based: Neural probabilistic language models (Bengio et al., 2003)
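A minimal sketch of the count-based route: factor a toy word-context co-occurrence matrix with truncated SVD, as LSA does. The words, contexts, and counts below are hypothetical.

```python
# LSA-style embeddings: truncated SVD of a (hypothetical) word-context
# co-occurrence matrix.
import numpy as np

words = ["dog", "cat", "car"]
contexts = ["pet", "fur", "road"]
X = np.array([[10, 7, 0],      # co-occurrence counts, words x contexts
              [9,  8, 1],
              [0,  1, 12]], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                          # keep the top-k latent dimensions
W = U[:, :k] * S[:k]           # k-dimensional word embeddings

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(W[0], W[1]))  # dog vs. cat: high (shared contexts)
print(cos(W[0], W[2]))  # dog vs. car: low (disjoint contexts)
```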

SLIDE 22

Word Embeddings: Sparse Representations

The sparsity of Symbolic Representations makes them suffer from the “curse of dimensionality”

◮ Loses word order
◮ No account for semantics

Hypothetical example: Symbolic Representations of the terms Dog and Cat

SLIDE 23

Word Embeddings: Distributed Representation

Distributed Representations can address the sparsity issue

◮ Low-dimensional
◮ Induce a rich similarity space

Hypothetical example: Distributed Representations of the terms Dog and Cat
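To make the two hypothetical Dog/Cat examples concrete, the sketch below contrasts one-hot (symbolic) vectors with small dense vectors; all numbers are made up for illustration.

```python
# One-hot (symbolic) vs. distributed vectors for "dog" and "cat".
# All values are hypothetical.
import numpy as np

# Symbolic: one dimension per vocabulary word -> sparse and orthogonal
dog_onehot = np.array([1, 0, 0, 0, 0])
cat_onehot = np.array([0, 1, 0, 0, 0])

# Distributed: low-dimensional dense vectors in a similarity space
dog_dense = np.array([0.8, 0.3, -0.1])
cat_dense = np.array([0.7, 0.4, -0.2])

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(dog_onehot, cat_onehot))  # 0.0: one-hot vectors see no similarity
print(cos(dog_dense, cat_dense))    # ~0.98: dense vectors capture relatedness
```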

Question: How can we generate such rich vector representations?

SLIDE 24

Word Embeddings: Neural Word Embedding

word2vec (Mikolov et al., 2013): the most successful example for modeling semantic (and syntactic) similarities of words.

It trains (generates) word vectors (embeddings) by leveraging the Distributional Hypothesis to predict the following/previous words in a given sequence.

SLIDE 25

Word Embeddings: Word2vec

In word2vec’s skip-gram model, the goal is to maximize the sum of the log-likelihood over all target words in the training corpus:

$$\sum_{t=1}^{T} \sum_{c \in C_t} \log p(w_c \mid w_t)$$

where

$$p(w_c \mid w_t) = \frac{\exp(w_t^\top w_c)}{\sum_{w_i \in V} \exp(w_t^\top w_i)}$$
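A minimal numpy sketch of this objective, under assumed toy data (tiny vocabulary, random vectors, window size 1); a real implementation would maximize it by gradient ascent with tricks such as negative sampling.

```python
# Skip-gram objective on a toy corpus: sum over targets w_t of
# log p(w_c | w_t) for every context word w_c in the window.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
V, dim = len(vocab), 8
idx = {w: i for i, w in enumerate(vocab)}
W = rng.normal(scale=0.1, size=(V, dim))   # target-word vectors w_t
C = rng.normal(scale=0.1, size=(V, dim))   # context-word vectors w_c

def log_p(context, target):
    # log p(w_c | w_t) = w_t . w_c - log sum_i exp(w_t . w_i)
    scores = C @ W[idx[target]]
    scores -= scores.max()                 # numerical stability
    return scores[idx[context]] - np.log(np.exp(scores).sum())

corpus = ["the", "cat", "sat", "on", "the", "mat"]
total = sum(log_p(corpus[t + o], corpus[t])     # window size 1
            for t in range(len(corpus))
            for o in (-1, 1) if 0 <= t + o < len(corpus))
print(f"sum log-likelihood: {total:.3f}")       # quantity to maximize
```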

SLIDE 26

Word Embeddings: Example architecture6

Word2vec leverages the Distributional Hypothesis (“contexts”) to estimate word embeddings. The probability of a target word is estimated based on its context words.

6 http://bit.ly/2eIMHR7

SLIDE 27

Word Embeddings: Example VSM

Words Represented in Vector Space7

7 http://projector.tensorflow.org/

SLIDE 28

Word Embeddings: Example Usage8

Representing words as vectors allows easy computation of similarity:

Spain is to Madrid as Italy is to ?

Arithmetic operations can be performed on word embeddings, e.g. to find similar words (see the sketch below)
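A hedged sketch of the analogy query using gensim’s KeyedVectors; the path “vectors.bin” is a hypothetical pretrained word2vec file (e.g. the GoogleNews binary), not something provided by the slides.

```python
# Word-vector arithmetic: Madrid - Spain + Italy ~ Rome.
# "vectors.bin" is a hypothetical path to pretrained word2vec vectors.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Spain is to Madrid as Italy is to ?
result = vectors.most_similar(positive=["Madrid", "Italy"],
                              negative=["Spain"], topn=1)
print(result)  # expected to rank "Rome" (or a close variant) first
```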

8https://www.tensorflow.org/tutorials/word2vec

SLIDE 29

Relation Embeddings: Related Work

TransE

State-of-the-art: Translating Embeddings for Modeling Multi-relational Data “TransE” (Bordes et al., 2013)

◮ Learning objective: h + l ≈ t when (h, l, t) holds.

In other words, $\text{score}(R_l(h, t)) = -\text{dist}(h + l, t)$, where dist is the L1-norm or L2-norm and $h, l, t \in \mathbb{R}^k$
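A minimal sketch of the TransE score with assumed toy embeddings; real TransE learns h, l, t by minimizing a margin-based ranking loss over corrupted triplets, which is omitted here.

```python
# TransE scoring: score(R_l(h, t)) = -dist(h + l, t).
# Embeddings below are random stand-ins, not learned.
import numpy as np

rng = np.random.default_rng(1)
k = 16
h = rng.normal(size=k)                       # head entity embedding
l = rng.normal(size=k)                       # relation as a translation
t = h + l + rng.normal(scale=0.01, size=k)   # tail ~ h + l for a true fact

def score(h, l, t, norm=2):
    # -dist(h + l, t) with the L1 or L2 norm
    return -np.linalg.norm(h + l - t, ord=norm)

print(score(h, l, t))                   # near 0: the triplet likely holds
print(score(h, l, rng.normal(size=k)))  # much lower: a corrupted tail
```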

SLIDE 30

Proposed approach: example scenario

Table 3: An example training dataset.

training examples (triplets):
(e3, r1, e2)
(e1, r2, e3)
(e1, r3, e4)
(e2, r2, e5)
(e6, r2, e4)
(e3, r1, e6)
(e5, r3, e3)

We have E = (e1, e2, e3, e4, e5, e6) and R = (r1, r2, r3). Assuming the current training target is (e1, r2, e3) and the window size is 1, the target’s context would be the triplets (e6, r2, e4) and (e2, r2, e5)

SLIDE 31

Proposed approach: model

With that being said, a triplet in our approach is treated just like a word in the word2vec model:

$$\frac{1}{R} \sum_{i=1}^{R} \sum_{j \in C} \log p(t^r_j \mid t^r_i)$$

where a triplet t is a compositional vector $t^r = d(e_h, r_l, e_t)$, and

$$p(t^r_j \mid t^r_i) = \frac{\exp(t^r_i \cdot t^r_j)}{\sum_{t^r_k \in R} \exp(t^r_i \cdot t^r_k)}$$
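A hedged sketch of this objective on the Table 3 dataset. The composition function d(·) is left unspecified by the proposal; plain concatenation is used below purely as an assumed placeholder, and the entity/relation vectors are random stand-ins.

```python
# Proposed model sketch: treat each triplet like a "word" and maximize
# log p(t^r_j | t^r_i) over context triplets, as in skip-gram.
import numpy as np

rng = np.random.default_rng(2)
dim = 4
E = {e: rng.normal(size=dim) for e in ["e1", "e2", "e3", "e4", "e5", "e6"]}
R = {r: rng.normal(size=dim) for r in ["r1", "r2", "r3"]}
triplets = [("e3", "r1", "e2"), ("e1", "r2", "e3"), ("e1", "r3", "e4"),
            ("e2", "r2", "e5"), ("e6", "r2", "e4"), ("e3", "r1", "e6"),
            ("e5", "r3", "e3")]

def d(h, r, t):
    # Compositional triplet vector t^r = d(e_h, r_l, e_t);
    # concatenation is an assumed choice, not fixed by the proposal.
    return np.concatenate([E[h], R[r], E[t]])

T = np.stack([d(*tr) for tr in triplets])   # one vector per triplet

def log_p(j, i):
    # log p(t^r_j | t^r_i), softmax over all triplet vectors
    scores = T @ T[i]
    scores -= scores.max()
    return scores[j] - np.log(np.exp(scores).sum())

# Target (e1, r2, e3) = triplets[1]; its context per the slide above:
# (e6, r2, e4) = triplets[4] and (e2, r2, e5) = triplets[3].
print(log_p(4, 1) + log_p(3, 1))  # context log-likelihood to maximize
```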
SLIDE 32

Plan of Action

Rough estimate of the planned/implemented tasks throughout the entire PhD program:

SLIDE 33

Timeline

Date                     Work
Summer 2014              Web-technology tools
Fall 2014/Spring 2015    Semantic Web methods
Fall 2015                Shift focus to future-proof solution
Spring 2016              ML/AI and data collection
Fall 2016                Re-produce/test related work models
Spring/Summer 2017       Build and evaluate proposed model
Summer/Fall 2017         Write up the dissertation

SLIDE 34

References I

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137–1155, 2003.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795, 2013.

John R. Firth. A synopsis of linguistic theory, 1930-1955. 1957.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv.org, January 2013.

George A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

Herbert Rubenstein and John B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.