SLIDE 1

Towards a Deep and Unified Understanding of Deep Neural Models in NLP

Chaoyu Guan*2, Xiting Wang*2, Quanshi Zhang1, Runjin Chen1, Di He2, Xing Xie2

*Equal Contribution

1 John Hopcroft Center and the MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

2 Microsoft Research Asia, Beijing, China

SLIDE 2

Introduction

A key task in explainable AI is to associate latent representations with input units by quantifying the layerwise information discarding of the inputs. Most existing explanation methods (e.g., DNN visualization) suffer from coherency and generality issues:

  • Coherency requires that a method generate consistent explanations across different neurons, layers, and models.
  • Generality: existing measures are usually defined under certain restrictions on model architectures or tasks.


SLIDE 3

Our Solution

We consider both coherency and generality:

  • A unified information-based measure: quantifies the information of each input word that is encoded in an intermediate layer of a deep NLP model.
  • The information-based measure serves as a tool for evaluating different explanation methods and for explaining different deep NLP models.
  • This measure enriches our capability of explaining DNNs.


SLIDE 4

Problem

  • Quantification of sentence-level information discarding: quantify the information of an entire sentence $\mathbf{y}$ that is encoded in $\mathbf{t}$.
  • Quantification of word-level information discarding: quantify the information of each specific word $\mathbf{y}_j$ that is encoded in $\mathbf{t}$.
  • Fine-grained analysis of word attributes: analyze the fine-grained reason why $\mathbf{t}$ uses the information of $\mathbf{y}_j$.

Notation:
  • $\mathbf{y} = [\mathbf{y}_1, \ldots, \mathbf{y}_o] \in \mathbf{Y}$: input sentence
  • $\mathbf{y}_j$: embedding of the $j$-th word
  • $\mathbf{t} = \Phi(\mathbf{y}) \in \mathbf{T}$: hidden state
  • $\Phi(\cdot)$: function of the intermediate layer

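To make this setup concrete, here is a minimal PyTorch sketch (not from the paper; the toy encoder and the choice of layer are placeholder assumptions) of extracting a hidden state $\mathbf{t} = \Phi(\mathbf{y})$ from an intermediate layer via a forward hook:

    import torch
    import torch.nn as nn

    # Toy stand-in for a deep NLP model; each Linear applies per word position.
    encoder = nn.Sequential(
        nn.Linear(300, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),   # output of this layer is taken as t
        nn.Linear(256, 2),
    )

    captured = {}
    encoder[3].register_forward_hook(
        lambda module, inputs, output: captured.update(t=output.detach())
    )

    y = torch.randn(7, 300)   # input sentence: o = 7 word embeddings, d = 300
    encoder(y)                # forward pass triggers the hook
    t = captured["t"]         # hidden state t = Phi(y), here of shape (7, 256)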

SLIDE 5

Word Information Quantification

Multi-Level Quantification

Corpus level:  $MI(\mathbf{Y}; \mathbf{T}) = I(\mathbf{Y}) - I(\mathbf{Y} \mid \mathbf{T})$, where $I(\mathbf{Y} \mid \mathbf{T}) = \int_{\mathbf{t} \in \mathbf{T}} q(\mathbf{t}) \, I(\mathbf{Y} \mid \mathbf{t}) \, d\mathbf{t}$

Sentence level:  $I(\mathbf{Y} \mid \mathbf{t}) = -\int_{\mathbf{y}' \in \mathbf{Y}} q(\mathbf{y}' \mid \mathbf{t}) \log q(\mathbf{y}' \mid \mathbf{t}) \, d\mathbf{y}'$

Word level:  $I(\mathbf{Y} \mid \mathbf{t}) \overset{*}{=} \sum_j I(\mathbf{Y}_j \mid \mathbf{t})$, where $I(\mathbf{Y}_j \mid \mathbf{t}) = -\int_{\mathbf{y}_j' \in \mathbf{Y}_j} q(\mathbf{y}_j' \mid \mathbf{t}) \log q(\mathbf{y}_j' \mid \mathbf{t}) \, d\mathbf{y}_j'$

* Assuming the words in one sentence are independent.

$I(\mathbf{Y}_j \mid \mathbf{t} = \Phi(\mathbf{y}))$ reflects how much information from word $\mathbf{y}_j$ is discarded by $\mathbf{t}$ during forward propagation.

SLIDE 6

Word Information Quantification

Perturbation-based Approximation: we use $I(\tilde{\mathbf{Y}}_j \mid \mathbf{t})$ to approximate $I(\mathbf{Y}_j \mid \mathbf{t})$ by minimizing the following loss:

$L(\boldsymbol{\tau}) = \mathbb{E}_{\boldsymbol{\epsilon}} \left[ \left\| \Phi(\tilde{\mathbf{y}}) - \mathbf{t} \right\|^2 \right] - \mu \sum_{j=1}^{o} I(\tilde{\mathbf{Y}}_j \mid \mathbf{t}), \qquad \boldsymbol{\epsilon}_j \sim \mathcal{N}(\mathbf{0}, \tau_j^2 \mathbf{I})$

where $\tilde{\mathbf{y}}_j = \mathbf{y}_j + \boldsymbol{\epsilon}_j$ is the perturbed embedding of the $j$-th word. The first term keeps the perturbed hidden state close to $\mathbf{t}$; the second encourages as much perturbation (i.e., as high an entropy) as possible.
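A minimal PyTorch sketch of this optimization, using the closed-form Gaussian entropy from the previous slide; the function name, hyperparameter values, and the requirement that phi broadcast over a leading sample dimension are illustrative assumptions, not the authors' released code:

    import math
    import torch

    def estimate_word_information(phi, y, mu=0.1, steps=500, lr=0.01, n_samples=8):
        """Learn per-word noise scales tau_j by minimizing L(tau).
        A larger learned tau_j means t retains less information about word y_j."""
        o, d = y.shape
        t = phi(y).detach()                            # target hidden state t = Phi(y)
        log_tau = torch.zeros(o, requires_grad=True)   # optimize log(tau_j) for positivity
        opt = torch.optim.Adam([log_tau], lr=lr)
        for _ in range(steps):
            tau = log_tau.exp()
            eps = torch.randn(n_samples, o, d)         # epsilon_j ~ N(0, I), reparameterized
            y_tilde = y + eps * tau.view(1, o, 1)      # perturbed embeddings y~ = y + eps
            diff = phi(y_tilde) - t
            recon = diff.pow(2).mean(dim=0).sum()      # Monte Carlo E_eps ||Phi(y~) - t||^2
            # Closed-form entropy of N(y_j, tau_j^2 I): (d/2) log(2*pi*e*tau_j^2)
            entropy = 0.5 * d * (math.log(2 * math.pi * math.e) + 2 * log_tau)
            loss = recon - mu * entropy.sum()          # L(tau) from this slide
            opt.zero_grad(); loss.backward(); opt.step()
        return log_tau.exp().detach()                  # learned tau_j, one per word

Words whose learned $\tau_j$ stays small are the ones whose information $\mathbf{t}$ preserves.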


SLIDE 7

Fine-Grained Analysis of Word Attributes

𝐡𝑗 = log π‘ž(𝐲𝑗|𝐭) βˆ’ 𝔽𝐲𝑗

β€²βˆˆπ˜π‘— log π‘ž(𝐲𝑗

β€²|𝐭)

𝐡𝐝 = 𝔽𝐲𝑗

β€²βˆˆπ˜π log π‘ž(𝐲𝑗

β€²|𝐭) βˆ’ 𝔽𝐲𝑗

β€²βˆˆπ˜π‘— log π‘ž(𝐲𝑗

β€²|𝐭)

Disentangle the information of a common concept 𝐝 away from each word 𝐲𝑗 𝑠

𝑗,𝐝 = 𝐡𝑗 βˆ’ 𝐡𝐝 indicates the remaining information of the word 𝐲𝑗 when we

remove the information of the common attribute 𝐝 from the word.

Importance of the i-th word concerning random words Importance of the common concept c w.r.t. random words
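The disentanglement itself is simple arithmetic once the log-densities are available; a small sketch, assuming the $\log q(\cdot \mid \mathbf{t})$ values have already been estimated (all array names are illustrative):

    import numpy as np

    def concept_score(log_q_word, log_q_random, log_q_concept):
        """s_{j,d} = B_j - B_d: information left in word j after removing concept d.
        log_q_word:    scalar log q(y_j | t) for the actual j-th word
        log_q_random:  array of log q(y'_j | t) over random replacement words (Y_j)
        log_q_concept: array of log q(y'_j | t) over words sharing concept d (Y_d)
        """
        baseline = np.mean(log_q_random)          # E_{y' in Y_j} log q(y' | t)
        B_j = log_q_word - baseline               # importance of the j-th word
        B_d = np.mean(log_q_concept) - baseline   # importance of the common concept d
        return B_j - B_d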


SLIDE 8

Comparative Study

  • Three baselines: LRP, gradient-based methods, and perturbation methods.
  • Conclusion: our method provides the most faithful explanations in across-timestamp, across-layer, and across-model analyses.

Our method clearly shows that the model gradually focuses on the most important parts of the sentence.


SLIDE 9

Understanding Neural Models in NLP

We explain four NLP models (BERT, Transformer, LSTM, and CNN):

  • What information is leveraged for prediction?
  • How does the information flow through layers?
  • How do the models evolve during training?


SLIDE 10
Understanding Neural Models in NLP

  • BERT and Transformer use individual words for prediction, while LSTM and CNN use subsequences of the sentence for prediction.
  • Different models process the input sentence in different manners.


SLIDE 11

Please visit our poster at #62!

Towards a Deep and Unified Understanding of Deep Neural Models in NLP