SLIDE 1 Distributional Embedding Approach for Relational Knowledge Representation
Dissertation Proposal Supervisor: Dr. Tao
Pace University March 2017
SLIDE 2
Contents overview
Part 1: Brief introduction
◮ Topic, Issue, and Solution idea
Part 2: Details
◮ Methods, Related work, and Proposed work
SLIDE 3
PART ONE
SLIDE 4 Introduction
Overview
◮ Relational Learning through Knowledge Base Representation
◮ Relational Knowledge Representation
◮ Knowledge ≈ entities + their relationships
Motivation and importance
◮ Relationships between entities are a rich source of information
SLIDE 5
Knowledge Bases (KBs)
Web-scale extracted KBs provide a structured representation of world knowledge
◮ Large quantities of knowledge are publicly available in relational form, interlinked across different domains
The ability to learn from relational data has a significant impact on many applications
SLIDE 6
Applications and use cases
World Wide Web and Semantic Web
◮ linkage of related documents, and semantically structured data
Bioinformatics and Molecular Biology
◮ gene-disease association
Social Networks
◮ relationships between persons
Question Answering
◮ link prediction in knowledge base queries
SLIDE 7 Example KB: FreeBase1
1info source: Deep Learning Lectures (Bordes, 2015)
SLIDE 8 Example KB: WordNet2
2info source: Deep Learning Lectures (Bordes, 2015)
SLIDE 9
Problem
Collectively, KBs contain over 60 billion published facts, and they keep growing (Nickel, 2013)
KBs have very large dimensions, and thus they are ...
◮ Hard to manipulate
◮ Sparse (with few valid links)
◮ Noisy and/or incomplete
Tackling these issues is key to automatically understanding and utilizing the structure of large-scale knowledge bases
SLIDE 10 Idea
Modeling Knowledge Bases
◮ KBs Embeddings (inspired by Word Embeddings)
How ?
- 1. Encode (embed) KBs into a low-dimensional vector space s.t. similar entities/relations are represented by similar "nearby" vectors
- 2. Use these representations:
◮ to complete/visualize KBs
◮ as KB data in text applications
(Bordes et al., 2013)
SLIDE 11
Example use case: link prediction
◮ Question Answering Systems
◮ Assessing the validity of results from Information Retrieval Systems
An example fragment of a KB.
SLIDE 12 Word Embeddings
The most successful approach to representing word meanings (now standard)
Two main components . . .
- 1. Neural Language Modeling “NLM”
◮ neural networks approach for text representations
- 2. Distributional Semantics Hypothesis3
◮ words which are similar in meaning occur in similar contexts
(Rubenstein and Goodenough, 1965)
3 One of the most successful ideas of modern statistical NLP. As described in: Deep Learning for NLP (Socher, 2016)
SLIDE 13
Relations Embeddings
The current state of the art in relation embeddings exploits (only): Neural Language Modeling
However, it does not exploit: the Distributional Semantics Hypothesis
As a result,
The performance of current KB modeling methods is far from being useful to leverage in real-world applications
SLIDE 14
Current Approach
Current relation embedding approaches do not make use of distributional similarities over KB relations
SLIDE 15
Proposal Approach
The proposed approach (inspired by word embeddings) brings the entire experience of word representations to relation embeddings by incorporating Distributional Similarity
SLIDE 16
PART TWO
SLIDE 17
Knowledge Base
What is a Knowledge Base?
Knowledge bases (KBs) store factual information about the real world in the form of binary relations between entities (e.g., FreeBase, NELL, WordNet, YAGO).
In KBs, facts are expressed as triplets ("binary relations" between entities): a triplet of tokens (subject, verb, object) with the values (entity_i, relation_k, entity_j)
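For illustration only, a minimal Python sketch (all names and facts are taken from the sample tables that follow; the helper is hypothetical) of how such triplets can be stored and queried:

    # A toy in-memory triple store: facts are (head, relation, tail) tuples.
    facts = [
        ("radical", "part_of", "molecule"),
        ("molecule", "has_part", "atom"),
        ("motorola", "acquiredBy", "google"),
    ]

    def query(head=None, relation=None, tail=None):
        """Return all triplets matching the given (possibly partial) pattern."""
        return [(h, r, t) for (h, r, t) in facts
                if (head is None or h == head)
                and (relation is None or r == relation)
                and (tail is None or t == tail)]

    print(query(relation="part_of"))   # -> [('radical', 'part_of', 'molecule')]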
SLIDE 18 Sample fragments of KBs
Table 1: Sample KB triplets for “molecule” entity from WordNet18 (Miller, 1995)
head                    relation                   tail
__radical_NN_1          _part_of                   __molecule_NN_1
__physics_NN_1          _member_of_domain_topic    __molecule_NN_1
__molecule_NN_1         _has_part                  __atom_NN_1
__unit_NN_5             _hyponym                   __molecule_NN_1
__chemical_chain_NN_1   _part_of                   __molecule_NN_1
__molecule_NN_1         _hypernym                  __unit_NN_5
Table 2: Sample triplets from NELL4 KB
head                relation                  tail
action_movies       is_a                      movie
action_movies       is_a_generalization_of    die_hard
leonardo_dicaprio   is_an                     actor
akiva_goldsman      directedMovie             batman_forever
leonard_nimoy       StarredIn                 star_trek
motorola            acquiredBy                google
david_beckham       playSport                 soccer
4 http://rtw.ml.cmu.edu/rtw/kbbrowser/
SLIDE 19
Introduction: Knowledge Graphs
Knowledge Graphs are graph structured knowledge bases (KBs).
◮ The multi-relational data (of KBs) can form directed graphs (of knowledge) whose nodes correspond to entities and whose edges correspond to relations between entities.
◮ Multigraph Structure (see the sketch below)
Entity = Node
Relation Type = Edge Type
Fact = Edge
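A minimal sketch of this multigraph view, here using the networkx library purely as one possible illustration (the facts are from the NELL sample above):

    import networkx as nx

    # Entities are nodes; each fact (head, relation, tail) is a directed edge
    # whose key is the relation type, so parallel edges (a multigraph) are allowed.
    kg = nx.MultiDiGraph()
    kg.add_edge("leonardo_dicaprio", "actor", key="is_an")
    kg.add_edge("akiva_goldsman", "batman_forever", key="directedMovie")
    kg.add_edge("motorola", "google", key="acquiredBy")

    # All outgoing relations of an entity:
    print(list(kg.out_edges("motorola", keys=True)))
    # -> [('motorola', 'google', 'acquiredBy')]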
SLIDE 20 Word Embeddings: Semantic Theory
Distributional similarity representation
◮ Distributional Hypothesis
“You shall know a word by the company it keeps.” (Firth, 1957)
Examples 5
◮ It was found in the banks of the Amoy River ..
◮ I was seated in my office at the bank when a card . . .
◮ with a plunge, like the swimmer who leaves the bank . . .
◮ through the issue of bank notes, the money capital . . .
◮ settlements were on the north bank of the Ohio River . . .
5https://youtu.be/T1O3ikmTEdA?t=16m29s
SLIDE 21 Word Embeddings: Neural Language Model
Vector Space Models (VSMs)
Distributed representations of words to address the dimensionality problem.
VSM Approaches:
- 1. count-based: Latent Semantic Analysis “LSA”
- 2. prediction-based: Neural probabilistic language models (Bengio et al., 2003)
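A rough sketch of the count-based route (toy corpus and window definition are invented for illustration): build a word-context co-occurrence matrix, then reduce it with a truncated SVD, as in LSA.

    import numpy as np
    from itertools import combinations

    corpus = [["dog", "barks", "loudly"], ["cat", "meows", "loudly"], ["dog", "chases", "cat"]]
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Count co-occurrences within each sentence (a crude "context window").
    M = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for w1, w2 in combinations(sent, 2):
            M[index[w1], index[w2]] += 1
            M[index[w2], index[w1]] += 1

    # LSA-style dimensionality reduction via truncated SVD.
    U, S, _ = np.linalg.svd(M)
    embeddings = U[:, :2] * S[:2]          # 2-dimensional word vectors
    print(embeddings[index["dog"]])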
SLIDE 22
Word Embeddings: Sparse Representations
The sparsity of Symbolic Representations makes them suffer from the "curse of dimensionality"
◮ Word order is lost
◮ No account of semantics
Hypothetical example: Symbolic Representations of the terms Dog and Cat
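For concreteness, a toy sketch of such symbolic (one-hot) vectors over a hypothetical four-word vocabulary: every word gets its own dimension, so "dog" and "cat" are orthogonal and their similarity is zero.

    import numpy as np

    vocab = ["dog", "cat", "car", "tree"]            # hypothetical vocabulary
    one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

    print(one_hot["dog"])                   # [1. 0. 0. 0.]
    print(one_hot["cat"])                   # [0. 1. 0. 0.]
    print(one_hot["dog"] @ one_hot["cat"])  # 0.0 -- no notion of similarity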
SLIDE 23
Word Embeddings: Distributed Representation
Distributed Representations can address the sparsity issue
◮ Low-dimensional
◮ Induce a rich similarity space
Hypothetical example: Distributed Representations of the terms Dog and Cat
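A toy sketch with made-up dense vectors (the values are invented, not learned): related words get nearby vectors, so cosine similarity becomes meaningful.

    import numpy as np

    # Hypothetical low-dimensional embeddings, for illustration only.
    dog = np.array([0.8, 0.3, -0.1])
    cat = np.array([0.7, 0.4, -0.2])
    car = np.array([-0.5, 0.9, 0.6])

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cosine(dog, cat))   # high: similar animals
    print(cosine(dog, car))   # low: unrelated concepts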
Question: How can we generate such rich vector representations ?
SLIDE 24
Word Embeddings: Neural Word Embedding
word2vec (Mikolov et al., 2013): The most successful example of modeling semantic (and syntactic) similarities of words.
It trains (generates) word vectors (embeddings) by leveraging the Distributional Hypothesis to predict the following/previous words in a given sequence.
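As a side illustration (not part of the proposal), the gensim library can train such vectors from raw sentences; the corpus is a toy one and the parameter values are arbitrary (parameter names follow recent gensim releases).

    from gensim.models import Word2Vec

    sentences = [["the", "dog", "barks"], ["the", "cat", "meows"]]   # toy corpus

    # sg=1 selects the skip-gram model; vector_size/window are hypothetical choices.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
    print(model.wv["dog"])                     # the learned 50-dimensional vector
    print(model.wv.similarity("dog", "cat"))   # cosine similarity between embeddings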
SLIDE 25 Word Embeddings: Word2vec
In word2vec's skip-gram model, the goal is to maximize the average log-likelihood over all T training words taken as targets:
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)

where

p(w_c \mid w_t) = \frac{\exp(w_t^{\top} w_c)}{\sum_{i=1}^{V} \exp(w_t^{\top} w_i)}
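A numeric sketch of the softmax above (toy vocabulary size, random vectors, all hypothetical): the probability of a context word is the normalized exponentiated dot product between the target and context vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    V, k = 10, 4                          # toy vocabulary size and embedding dimension
    W_target = rng.normal(size=(V, k))    # "input" vectors w_t
    W_context = rng.normal(size=(V, k))   # "output" vectors w_c

    def p_context_given_target(c, t):
        scores = W_context @ W_target[t]          # dot products with every word
        scores -= scores.max()                    # numerical stability
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[c]

    print(p_context_given_target(c=3, t=7))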
SLIDE 26 Word Embeddings: Example architecture6
Word2vec leverages the Distributional Hypothesis ("contexts") to estimate word embeddings
The probability of a target word is estimated based on its context words
6http://bit.ly/2eIMHR7
SLIDE 27 Word Embeddings: Example VSM
Words Represented in Vector Space7
7http://projector.tensorflow.org/
SLIDE 28 Word Embeddings: Example Usage8
Representing words as vectors allows easy computation of similarity
Spain is to Madrid as Italy is to ?
Arithmetic operations can be performed on word embeddings (e.g. to find similar words)
8https://www.tensorflow.org/tutorials/word2vec
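A toy sketch of the analogy arithmetic with invented 2-d vectors: the offset Madrid - Spain approximates a "capital-of" direction, so adding it to Italy lands near Rome.

    import numpy as np

    # Hypothetical 2-d embeddings, chosen only to illustrate the offset idea.
    vec = {
        "Spain":  np.array([2.0, 1.0]),
        "Madrid": np.array([2.0, 3.0]),
        "Italy":  np.array([4.0, 1.1]),
        "Rome":   np.array([4.0, 3.1]),
        "Paris":  np.array([1.0, 3.2]),
    }

    query = vec["Madrid"] - vec["Spain"] + vec["Italy"]
    answer = min((w for w in vec if w not in {"Madrid", "Spain", "Italy"}),
                 key=lambda w: np.linalg.norm(vec[w] - query))
    print(answer)   # -> "Rome"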
SLIDE 29
Relation Embeddings: Related Work
TransE
State of the art: Translating Embeddings for Modeling Multi-relational Data "TransE" (Bordes et al., 2013)
◮ Learning objective: h + l ≈ t when (h, l, t) holds.
In other words, score(R_l(h, t)) = -dist(h + l, t), where dist is the L1-norm or L2-norm and h, l, t ∈ R^k
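A minimal numpy sketch of the TransE scoring function (the embedding values are made up for illustration): a triplet scores close to zero when the translated head h + l lies near the tail t.

    import numpy as np

    # Made-up embeddings in R^k (here k = 3); values are for illustration only.
    h = np.array([0.2, 0.5, -0.1])   # head entity
    l = np.array([0.3, -0.2, 0.4])   # relation
    t = np.array([0.5, 0.3, 0.3])    # tail entity

    def transe_score(h, l, t, norm=1):
        """score(R_l(h, t)) = -dist(h + l, t) with dist the L1 (or L2) norm."""
        return -np.linalg.norm(h + l - t, ord=norm)

    print(transe_score(h, l, t))   # 0.0: h + l lands exactly on t, a plausible fact
    print(transe_score(t, l, h))   # negative: a corrupted triplet scores worse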
SLIDE 30
Proposed approach: example scenario
Table 3: An example training dataset.
training examples - triplets
(e3, r1, e2)
(e1, r2, e3)
(e1, r3, e4)
(e2, r2, e5)
(e6, r2, e4)
(e3, r1, e6)
(e5, r3, e3)
We have E = (e1, e2, e3, e4, e5, e6) and R = (r1, r2, r3). Assuming the current training target is (e1, r2, e3) and the window size is 1, the target's context would be the triplets (e6, r2, e4) and (e2, r2, e5).
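A rough sketch of how this example's context could be computed, assuming (as the example suggests) that a triplet's context consists of nearby triplets sharing the same relation; the helper and window handling are hypothetical.

    # Training triplets from Table 3 (in order).
    triplets = [("e3", "r1", "e2"), ("e1", "r2", "e3"), ("e1", "r3", "e4"),
                ("e2", "r2", "e5"), ("e6", "r2", "e4"), ("e3", "r1", "e6"),
                ("e5", "r3", "e3")]

    def context(target, window=1):
        """Triplets sharing the target's relation (assumed notion of context)."""
        same_rel = [t for t in triplets if t[1] == target[1] and t != target]
        return same_rel[:2 * window]

    print(context(("e1", "r2", "e3")))
    # -> [('e2', 'r2', 'e5'), ('e6', 'r2', 'e4')]  -- the two context triplets above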
SLIDE 31 Proposed approach: model
With that being said, a triplet in our approach is treated just like a word in the word2vec model:

\frac{1}{R} \sum_{r=1}^{R} \log p(t^r_j \mid t^r_i)

where a triplet t is a compositional vector t^r = d(e_h, r_l, e_t), and

p(t^r_j \mid t^r_i) = \frac{\exp(t^r_i \cdot t^r_j)}{\sum_{k} \exp(t^r_i \cdot t^r_k)}
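A toy sketch of the probability above (the composition function d and all vectors are placeholders; summation is only one possible choice for d): a triplet vector is composed from its entity and relation vectors, and context-triplet probabilities follow a word2vec-style softmax over dot products.

    import numpy as np

    rng = np.random.default_rng(1)
    k = 4
    entity = {e: rng.normal(size=k) for e in ["e1", "e2", "e3", "e4", "e5", "e6"]}
    relation = {"r2": rng.normal(size=k)}

    def d(h, r, t):
        """Placeholder composition d(e_h, r_l, e_t); a sum is just one option."""
        return entity[h] + relation[r] + entity[t]

    triplets = [("e1", "r2", "e3"), ("e2", "r2", "e5"), ("e6", "r2", "e4")]

    def p(tj, ti):
        """p(t_j | t_i): softmax of dot products over the candidate triplets."""
        vi = d(*ti)
        scores = np.array([d(*t) @ vi for t in triplets])
        scores -= scores.max()                    # numerical stability
        probs = np.exp(scores) / np.exp(scores).sum()
        return probs[triplets.index(tj)]

    print(p(("e6", "r2", "e4"), ("e1", "r2", "e3")))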
SLIDE 32
Plan of Action
Rough estimate of the planned/implemented tasks throughout the entire PhD program:
SLIDE 33
Timeline
Date                      Work
Summer 2014               Web-technology tools
Fall 2014/Spring 2015     Semantic Web methods
Fall 2015                 Shift focus to future-proof solution
Spring 2016               ML/AI and data collection
Fall 2016                 Re-produce/test related work models
Spring/Summer 2017        Build and evaluate proposed model
Summer/Fall 2017          Write up the dissertation
SLIDE 34 References I
Yoshua Bengio, Rejean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb):1137-1155, 2003.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787-2795, 2013.
John R. Firth. A synopsis of linguistic theory, 1930-1955. 1957.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv.org, January 2013.
George A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
Herbert Rubenstein and John B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627-633, 1965.