
SLIDE 1

Two term-layers: An Alternative topology for representing term relationships in the Bayesian Network Retrieval Model

Luis M. de Campos, Juan M. Fernández-Luna†, & Juan F. Huete

Departamento de Ciencias de la Computación e Inteligencia Artificial. Universidad de Granada (Spain).

† Departamento de Informática. Universidad de Jaén (Spain).


SLIDE 2

Outline

1. Introduction
2. Preliminaries
3. The Bayesian Network Retrieval Model
4. An alternative representation for term relationships: a topology with two term layers
5. Experiments and results
6. Concluding remarks

Online World Conference on Soft Computing in Industrial Applications. – p.2/31

SLIDE 3

Introduction (I)

We present a modification of the Bayesian Network Retrieval Model (BNRM) that aims to improve its efficiency. This model is composed of two subnetworks:

- The document subnetwork, which stores the documents of the collection.
- The term subnetwork, which stores the terms occurring in the documents and their relationships.

Capturing term-to-term relationships within a collection yields a more accurate representation of the collection, which improves the effectiveness of the IR system.


SLIDE 4

Introduction (II)

In the original model, term relationships are represented by means of an automatically constructed polytree. The topology proposed in this paper contains a term subnetwork in which:

- The collection terms are duplicated and placed in a second layer.
- Arcs are established from terms in one layer to terms in the other.

Propagation in this bipartite graph can be carried out efficiently by evaluating a probability function.


SLIDE 5

Preliminaries (I) - IR

Representation of documents and queries in an IR system is usually based on term-weight vectors. The most common weighting schemes try to highlight the importance of each term, either within a given document or within the entire collection:

- Term frequency (within-document frequency), $tf_{ij}$: the number of times that the $i$-th term appears in the $j$-th document.
- Inverse document frequency of the $i$-th term in the collection: $idf_i = \log(m / n_i)$, with $m$ the number of documents and $n_i$ the number of documents that contain the $i$-th term.

The combination of both weights, $tf_{ij} \cdot idf_i$, is …
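A minimal Python sketch of these two weights, assuming documents are already tokenized into lists of terms and using the natural logarithm for $idf_i$ (the base only rescales the weights). The function name `tf_idf` is illustrative, not taken from the slides.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (weights, idf) for a list of tokenized documents.

    weights[j][term] = tf_ij * idf_i, where tf_ij counts the occurrences of the
    term in document j and idf_i = log(m / n_i) (natural log assumed here).
    """
    m = len(docs)  # number of documents in the collection
    # n_i: number of documents containing each term (set() ignores repeats)
    n = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(m / n_i) for term, n_i in n.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # within-document frequencies tf_ij
        weights.append({term: freq * idf[term] for term, freq in tf.items()})
    return weights, idf


docs = [["bayes", "network", "bayes"], ["network", "retrieval"], ["retrieval", "model"]]
weights, idf = tf_idf(docs)
print(weights[0]["bayes"])  # equals 2 * log(3): "bayes" occurs twice and in 1 of 3 documents
```

A term that occurs in every document gets $idf_i = 0$, so its tf-idf weight vanishes regardless of its frequency.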

SLIDE 6

Preliminaries (II) - IR

Evaluation, for a given query:

- Recall (R): the proportion of relevant documents that are retrieved.
- Precision (P): the proportion of retrieved documents that are relevant.

Computing the precision at a number of recall values yields a recall-precision plot. The average precision over all the recall values considered may be used as a single measure.

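As a sketch of the single-number measure, the function below averages precision over the eleven recall levels 0.0, 0.1, …, 1.0. It assumes the common interpolation convention (precision at level r is the maximum precision observed at any recall ≥ r), which the slide does not spell out; the function names are hypothetical.

```python
def recall_precision_points(ranking, relevant):
    """(recall, precision) after each position k of a ranked list of document ids."""
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points


def average_precision_11pt(ranking, relevant):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0.

    Interpolated precision at level r = max precision at any recall >= r,
    or 0.0 if that recall level is never reached (assumed convention).
    """
    points = recall_precision_points(ranking, relevant)
    levels = [i / 10 for i in range(11)]
    total = 0.0
    for r in levels:
        total += max((p for rec, p in points if rec >= r), default=0.0)
    return total / len(levels)


ap = average_precision_11pt(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

In this example the interpolated precision is 1 for the six levels up to recall 0.5 and 2/3 for the remaining five, so the average is 28/33.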

SLIDE 7

Preliminaries (III) - BN

A Bayesian network $G = (V, E)$ is a Directed Acyclic Graph (DAG), where:

- Nodes in $V$: variables from the problem.
- Arcs in $E$: dependence relationships among the variables.

The knowledge is represented in two ways:

- Qualitatively, showing the (in)dependencies between the variables.
- Quantitatively, by means of a set of conditional probability distributions measuring the strength of the relationships ($p(x_i \mid \pi(x_i))$, where $\pi(x_i)$ is a configuration of the parents of $X_i$).


SLIDE 8

Preliminaries (IV) - BN

The joint distribution can be recovered by:

$$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \pi(x_i))$$

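To make the factorization concrete, here is a toy three-variable network (a made-up example, not part of the retrieval model) in which the joint probability of a full assignment is the product of each node's conditional probability given its parents.

```python
from itertools import product

# Hypothetical 3-node network (not from the slides): Rain -> Wet <- Sprinkler.
parents = {"rain": [], "sprinkler": [], "wet": ["rain", "sprinkler"]}
# cpt[node][parent configuration] = p(node = True | parent configuration)
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {(False, False): 0.0, (False, True): 0.9,
            (True, False): 0.8, (True, True): 0.99},
}


def joint(assignment):
    """p(x_1, ..., x_n) = prod_i p(x_i | pi(x_i)) for a full True/False assignment."""
    prob = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def total_probability():
    """Sum of the joint over all 2^n assignments; it should be 1."""
    nodes = list(parents)
    return sum(joint(dict(zip(nodes, values)))
               for values in product([False, True], repeat=len(nodes)))


print(joint({"rain": True, "sprinkler": False, "wet": True}))  # 0.2 * 0.9 * 0.8
```

Only the local tables are stored (1 + 1 + 4 numbers here) instead of the 2^3 - 1 free entries of the full joint table.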

SLIDE 9

The Bayesian Net. Ret. Model (I)

Two sets of variables: $\mathcal{T} = \{T_1, \ldots, T_M\}$ (terms) and $\mathcal{D} = \{D_1, \ldots, D_N\}$ (documents).

The topology of the network is determined by the following guidelines:

- There is a link joining each term node $T_i \in \mathcal{T}$ and each document node $D_j \in \mathcal{D}$ whenever $T_i$ belongs to $D_j$.
- There are no links joining any two document nodes $D_j$ and $D_k$.
- Any document $D_j$ is conditionally independent of any other document $D_k$ when we know for sure the (ir)relevance values for all the terms indexing $D_j$.


SLIDE 10

The Bayesian Net. Ret. Model (II)

These three assumptions partly determine the network structure:

- The links joining term and document nodes have to be directed from terms to documents; moreover, the parent set of a document node $D_j$ is the set of term nodes that belong to $D_j$, i.e., $Pa(D_j) = \{T_i \in \mathcal{T} \mid T_i \in D_j\}$.
- Inclusion of dependences between terms: an automatic learning algorithm takes the set of documents as input and generates a polytree of terms as output.

Reasons for using a polytree: there are efficient learning algorithms for polytrees, as well as exact and efficient inference algorithms.

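A small sketch of the document subnetwork implied by these guidelines, assuming a representation that maps each document to its indexing terms (the function names are hypothetical): every arc goes from a term to a document, and no arc ever joins two documents.

```python
def document_parent_sets(index):
    """Pa(D_j) for every document: the set of terms that index D_j.

    index: D_j -> iterable of the terms T_i that belong to D_j.
    """
    return {doc: frozenset(terms) for doc, terms in index.items()}


def subnetwork_arcs(parent_sets):
    """All arcs T_i -> D_j, sorted; documents never appear as arc sources."""
    return sorted((t, d) for d, pa in parent_sets.items() for t in pa)


pa = document_parent_sets({"D1": ["t1", "t2"], "D2": ["t2"]})
arcs = subnetwork_arcs(pa)  # [("t1", "D1"), ("t2", "D1"), ("t2", "D2")]
```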

SLIDE 11

The Bayesian Net. Ret. Model (III)

Graphically, the retrieval model is represented by the following graph:


SLIDE 12

The Bayesian Net. Ret. Model (IV)

Probability distributions ($t_i$ and $\bar t_i$ denote that $T_i$ is relevant and not relevant, respectively):

- Term nodes without parents:

$$p(t_i) = \frac{1}{M}, \qquad p(\bar t_i) = 1 - p(t_i) \qquad (1)$$

- Term nodes with parents:

$$p(\bar t_i \mid \pi(t_i)) = \ldots, \qquad p(t_i \mid \pi(t_i)) = 1 - p(\bar t_i \mid \pi(t_i)) \qquad (2)$$


SLIDE 13

The Bayesian Net. Ret. Model (V)

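Document nodes: because of efficiency problems, the model uses a probability function that returns the required probability when called. For a configuration $pa(D_j)$ of the parent set $Pa(D_j)$:

$$p(d_j \mid pa(D_j)) = \sum_{T_i \in Pa(D_j),\; t_i \in pa(D_j)} w_{ij} \qquad (3)$$

i.e., the sum of the weights of the parents instantiated as relevant, where $w_{ij} \geq 0$, $\sum_{T_i \in Pa(D_j)} w_{ij} \leq 1$, and

$$w_{ij} = \alpha \, \frac{tf_{ij} \cdot idf_i^2}{\sum_{T_k \in D_j} tf_{kj} \cdot idf_k^2} \qquad (4)$$

$\alpha$ being a normalizing constant (to ensure that $\sum_{T_i \in Pa(D_j)} w_{ij} \leq 1$ for every $D_j \in \mathcal{D}$).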

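The weights of eq. (4) and the probability function of eq. (3) can be sketched as follows for a single document, assuming precomputed `tf` and `idf` dictionaries. The normalizing constant is called `alpha` here and defaults to 1.0 (an assumption), in which case the weights of each document add up exactly to 1; the function names are hypothetical.

```python
def document_weights(tf, idf, alpha=1.0):
    """Weights w_ij of one document D_j (eq. 4).

    tf: term -> tf_ij for the terms of D_j; idf: term -> idf_i.
    w_ij = alpha * tf_ij * idf_i^2 / sum_k tf_kj * idf_k^2, so the weights
    of D_j add up to alpha.
    """
    norm = sum(freq * idf[t] ** 2 for t, freq in tf.items())
    if norm == 0:  # every term of D_j has idf 0 (it occurs in all documents)
        return {t: 0.0 for t in tf}
    return {t: alpha * freq * idf[t] ** 2 / norm for t, freq in tf.items()}


def p_doc_given_config(weights, relevant_parents):
    """p(d_j | pa(D_j)) (eq. 3): sum of w_ij over parents instantiated as relevant."""
    return sum(w for t, w in weights.items() if t in relevant_parents)


w = document_weights({"bayes": 2, "network": 1}, {"bayes": 1.0, "network": 2.0})
# w == {"bayes": 1/3, "network": 2/3}: the rarer term dominates through idf^2
```

Because the function only sums the weights of the relevant parents, it never needs a table with one entry per parent configuration, which is what makes documents with many terms tractable.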

SLIDE 14

The Bayesian Net. Ret. Model (VI)

Given a query $Q$ submitted to our system:

1. Place the evidence in the term subnetwork: each term $T_i \in Q$ is instantiated as relevant ($t_i$).
2. Run the inference process, obtaining $p(d_j \mid Q)$ for every $D_j \in \mathcal{D}$.
3. Sort the documents in decreasing order of posterior probability to carry out the evaluation process.

Given the topology of the model, general-purpose inference algorithms cannot be applied for efficiency reasons, so a specific inference method has been developed: propagation + evaluation.


SLIDE 15

The Bayesian Net. Ret. Model (VII)

1. An exact propagation in the term subnetwork, which yields $p(t_i \mid Q)$ for every $T_i \in \mathcal{T}$.
2. An evaluation of the probability function used to estimate the conditional probabilities of the document nodes, using the results of the previous propagation to compute the probability of relevance of each document:

$$p(d_j \mid Q) = \sum_{T_i \in Pa(D_j)} w_{ij} \, p(t_i \mid Q) \qquad (5)$$

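Once the propagation has produced $p(t_i \mid Q)$, the evaluation step of eq. (5) is a weighted sum per document followed by a sort. A sketch, assuming the weights $w_{ij}$ and the term posteriors are given as dictionaries (hypothetical names):

```python
def rank_documents(doc_weights, term_posteriors):
    """Evaluate eq. (5) for every document and sort by decreasing p(d_j | Q).

    doc_weights: D_j -> {T_i: w_ij}; term_posteriors: T_i -> p(t_i | Q).
    """
    scores = {
        doc: sum(w * term_posteriors[t] for t, w in weights.items())
        for doc, weights in doc_weights.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_documents(
    {"D1": {"a": 0.5, "b": 0.5}, "D2": {"b": 1.0}},
    {"a": 1.0, "b": 0.2},
)
# D1 scores 0.5 * 1.0 + 0.5 * 0.2 = 0.6 and is ranked above D2 (0.2)
```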

SLIDE 16

An alternative: 2 term layers (I)

If the graph contains many terms and arcs, the propagation process can become slow. We therefore look for an alternative topology that provides:

- an accurate representation of the term relationships in the graph;
- an efficient propagation scheme in the underlying graph to compute the posterior probability of each term node.


SLIDE 17

An alternative: 2 term layers (II)

The new topology:

- Let $R(T_i)$ be the set of the $p$ terms most closely related to $T_i$, according to some measure of relatedness.
- Two layers of nodes represent the term subnetwork: each term node $T_j \in \mathcal{T}$ in the original layer is duplicated to obtain another term node $T'_j$, forming a new term layer, $\mathcal{T}'$.
- The arcs go from each $T'_j \in R(T_i)$ to $T_i$. The parent set of any original term node $T_i \in \mathcal{T}$ is defined as $Pa(T_i) = R(T_i)$.

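One way to picture the resulting structure is to tag every node with its layer ($\mathcal{T}'$, $\mathcal{T}$ or $\mathcal{D}$) and list the arcs. This sketch uses hypothetical names and assumes the set of related terms of each $T_i$ (written $R(T_i)$ here) is given as a list of term ids; it also checks that every arc joins two consecutive layers, so no arc stays within a layer.

```python
# Layer index of each node type: duplicates T', original terms T, documents D.
LAYER = {"T'": 0, "T": 1, "D": 2}


def two_layer_arcs(related, doc_terms):
    """Arcs of the extended network.

    related: T_i -> term ids j whose duplicates T'_j form R(T_i) (arcs T'_j -> T_i).
    doc_terms: D_k -> indexing terms T_i (arcs T_i -> D_k, as in the original model).
    Nodes are (layer, id) pairs.
    """
    arcs = [(("T'", tj), ("T", ti)) for ti, rel in related.items() for tj in rel]
    arcs += [(("T", ti), ("D", dk)) for dk, terms in doc_terms.items() for ti in terms]
    return arcs


def is_layered(arcs):
    """True if every arc goes from one layer to the next, never within a layer."""
    return all(LAYER[dst[0]] - LAYER[src[0]] == 1 for src, dst in arcs)


arcs = two_layer_arcs({"a": ["a", "b"], "b": ["b"]}, {"D1": ["a", "b"]})
```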

SLIDE 18

An alternative: 2 term layers (III)

Advantages:

- We do not have to redefine the conditional probabilities associated with the document nodes in $\mathcal{D}$.
- We deal with a bipartite graph, a specific topology that allows us to use a very fast propagation algorithm.
- The new topology contains three simple layers, without connections between nodes in the same layer, and this fact is essential for the efficiency of the inference process.

SLIDE 19

An alternative: 2 term layers (IV)

The new extended Bayesian network:


SLIDE 20

An alternative: 2 term layers (V)

The probability distribution for term nodes:

$$p(t_i \mid pa(T_i)) = \sum_{T'_j \in Pa(T_i),\; t'_j \in pa(T_i)} v_{ji} \qquad (6)$$

where the sum runs over the duplicated terms instantiated as relevant in the configuration $pa(T_i)$, and $v_{ji}$ is the weight of the arc $T'_j \to T_i$.

Learning term relationships: measuring dependences with the Kullback-Leibler cross entropy takes into account both positive and negative dependences among terms. It would be better to include only the strength of positively correlated terms. A solution is to use an approach based on the frequencies of co-occurrence of pairs of terms.

SLIDE 21

An alternative: 2 term layers (VI)

Given $T_i$, we build the following contingency table for each term $T_j$:

                $\bar t_j$                $t_j$                 
  $\bar t_i$    $n_{\bar t_i \bar t_j}$   $n_{\bar t_i t_j}$    $n_{\bar t_i}$
  $t_i$         $n_{t_i \bar t_j}$        $n_{t_i t_j}$         $n_{t_i}$
                $n_{\bar t_j}$            $n_{t_j}$             $m$

- $t_i$ means "$T_i$ occurs"; $\bar t_i$, "$T_i$ does not occur" (and respectively for $T_j$);
- $n_{\bar t_i \bar t_j}$ is the number of documents in which neither $T_i$ nor $T_j$ occurs;
- $n_{t_i t_j}$ is the number of documents in which both terms occur;
- $n_{\bar t_i t_j}$ and $n_{t_i \bar t_j}$ are the numbers of documents in which only one of the two terms occurs.

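The four cells of the table can be counted in one pass over the collection. A sketch assuming each document is a list of terms; the dictionary keys `(T_i occurs, T_j occurs)` are an illustrative encoding, and the function names are hypothetical.

```python
def contingency_table(docs, ti, tj):
    """2x2 co-occurrence counts for terms ti and tj over a list of documents.

    Keys are (T_i occurs, T_j occurs): counts[(True, True)] is n_{t_i t_j}
    and counts[(False, False)] is n_{tbar_i tbar_j}. The four cells add up to m.
    """
    counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
    for doc in docs:
        terms = set(doc)
        counts[(ti in terms, tj in terms)] += 1
    return counts


def marginal_ti(counts):
    """n_{t_i}: documents containing T_i (sum of the t_i row)."""
    return counts[(True, True)] + counts[(True, False)]


docs = [["a", "b"], ["a"], ["b"], ["c"], ["a", "b", "c"]]
table = contingency_table(docs, "a", "b")
```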

SLIDE 22

An alternative: 2 term layers (VII)

For a fixed $T_i$, the strength of its co-occurrence relationship with $T_j$ is computed by:

$$C(T_j \mid T_i) = \frac{n_{t_i t_j}}{n_{t_i}} \qquad (7)$$

The function is then modified to avoid some problems:

$$C'(T_j \mid T_i) = \begin{cases} \ldots & \text{if } n_{t_i t_j} = \ldots \\ C(T_j \mid T_i) & \text{otherwise} \end{cases} \qquad (8)$$

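A sketch of the basic strength measure, assuming eq. (7) is the ratio $n_{t_i t_j} / n_{t_i}$, i.e. the fraction of documents containing $T_i$ that also contain $T_j$. It assumes tokenized documents and does not attempt the refinement of eq. (8), whose exact form is not reproduced here; the function name is hypothetical.

```python
from collections import Counter


def cooccurrence_strength(docs, ti):
    """C(T_j | T_i) = n_{t_i t_j} / n_{t_i} for every term T_j != T_i.

    Terms that never co-occur with T_i are omitted (their strength is 0).
    """
    docs_with_ti = [set(doc) for doc in docs if ti in doc]
    n_ti = len(docs_with_ti)
    if n_ti == 0:
        return {}
    n_joint = Counter(t for terms in docs_with_ti for t in terms if t != ti)
    return {tj: n / n_ti for tj, n in n_joint.items()}


docs = [["a", "b"], ["a", "b", "c"], ["a"], ["b", "c"]]
strength = cooccurrence_strength(docs, "a")  # b: 2/3, c: 1/3
```

Unlike a cross-entropy score, this ratio only grows when the two terms appear together, which is the positive-dependence behavior the slides ask for. It is also asymmetric: the strength of $T_j$ given $T_i$ generally differs from that of $T_i$ given $T_j$.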

SLIDE 23

An alternative: 2 term layers (VIII)

For each $T_i$, the measure $C'(T_j \mid T_i)$ is computed for every other term $T_j$. The duplicates of the $p$ terms with the highest values are selected as elements of $R(T_i)$. $T'_i$ is always included in $R(T_i)$ (a term is related to itself, and if it is instantiated, the posterior probability of that term would be very close to 1).

Assigning values to the weights $v_{ji}$:

$$v_{ji} = \begin{cases} \beta & \text{if } j = i \\ (1 - \beta)\, \dfrac{C'(T_j \mid T_i)}{S_i} & \text{if } j \neq i \end{cases} \qquad (9)$$

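A sketch of how the parent set could be selected, assuming the strengths for a fixed $T_i$ are given as a dictionary and that a duplicate $T'_j$ is represented by the term id $j$. The tie-breaking rule and the choice to add $T'_i$ on top of the $p$ best terms (rather than counting it among them) are assumptions, since the slide does not settle them; the function name is hypothetical.

```python
def select_related(strength, ti, p):
    """R(T_i): the duplicate of T_i itself plus the p terms with the highest strength.

    strength: T_j -> C'(T_j | T_i). Ties are broken alphabetically (assumption).
    """
    candidates = sorted((tj for tj in strength if tj != ti),
                        key=lambda tj: (-strength[tj], tj))
    return [ti] + candidates[:p]


related = select_related({"b": 0.9, "c": 0.5, "d": 0.7}, "a", 2)  # ["a", "b", "d"]
```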

SLIDE 24

An alternative: 2 term layers (IX)

Where $S_i = \sum_{T'_k \in R(T_i),\; k \neq i} C'(T_k \mid T_i)$ and $\beta$ is a parameter, $0 \leq \beta \leq 1$. $p(t_i \mid Q)$ can be computed as follows:

$$p(t_i \mid Q) = \sum_{T'_j \in R(T_i)} v_{ji} \, p(t'_j \mid Q) \qquad (10)$$

The final expression for the calculation of $p(t_i \mid Q)$ is:

$$p(t_i \mid Q) = \beta \, p(t'_i \mid Q) + \frac{1 - \beta}{S_i} \sum_{T'_j \in R(T_i),\; j \neq i} C'(T_j \mid T_i) \, p(t'_j \mid Q)$$

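A sketch of the final computation under explicitly assumed weights: the duplicate $T'_i$ receives weight $\beta$ (the parameter in $[0, 1]$), and the remaining $1 - \beta$ is shared among the other parents in proportion to their co-occurrence strength, normalized by the sum $S_i$ of those strengths. It also assumes that $p(t'_j \mid Q) = 1$ for query terms and the prior $1/M$ otherwise. All names are hypothetical.

```python
def term_posterior(ti, related, strength, query, beta, num_terms):
    """p(t_i | Q) = sum over T'_j in R(T_i) of v_ji * p(t'_j | Q)  (eq. 10).

    Assumed weights: v_ii = beta for T'_i, and v_ji = (1 - beta) * C'(T_j|T_i) / S_i
    for the other parents, with S_i the sum of their strengths.
    Assumed evidence: p(t'_j | Q) = 1 if T_j is in the query, else 1 / num_terms.
    """
    def p_dup(tj):
        return 1.0 if tj in query else 1.0 / num_terms

    others = [tj for tj in related if tj != ti]
    s_i = sum(strength[tj] for tj in others)
    posterior = beta * p_dup(ti)
    if s_i > 0:
        posterior += (1.0 - beta) / s_i * sum(strength[tj] * p_dup(tj) for tj in others)
    return posterior


p_a = term_posterior("a", ["a", "b", "c"], {"b": 0.6, "c": 0.2}, {"a", "b"}, 0.7, 100)
```

Here $T_a$ is itself a query term and shares its strongest neighbour with the query, so its posterior is 0.7 + 0.375 × 0.602 = 0.92575. Each term needs only a sum over its $p + 1$ parents, so the cost of the whole propagation grows linearly with the number of arcs.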

SLIDE 25

Experiments and results (I)

To test the new Bayesian network topology, we have run several retrieval experiments with three medium-size standard collections: ADI, CISI, and CRANFIELD. Our aim is to compare the effectiveness of the original model, whose term subnetwork is a polytree, with that of the new topology, which represents term relationships by means of two layers of terms.

- Parameters: number of parents $p \in \{5, 10, 15\}$; $\beta \in \{0.6, 0.7, 0.8\}$.
- AV-11p: average precision for the eleven standard values of recall.
- %C: percentage of change of the average precision of the new model with respect to the original model.


SLIDE 26

Experiments and results (II)

p = 5               ADI      CISI     CRANFIELD
AP-11 (BNRM)        0.4130   0.2007   0.4314
β = 0.6, AV-11p     0.4524   0.216    0.4314
β = 0.6, %C         9.54     7.62     0.00
β = 0.7, AV-11p     0.4547   0.2212   0.4332
β = 0.7, %C         10.10    10.21    0.42
β = 0.8, AV-11p     0.4676   0.2207   0.4316
β = 0.8, %C         13.22    9.97     0.05


SLIDE 27

Experiments and results (III)

p = 10              ADI      CISI     CRANFIELD
AP-11 (BNRM)        0.4130   0.2007   0.4314
β = 0.6, AV-11p     0.4587   0.2182   0.4334
β = 0.6, %C         11.07    8.72     0.46
β = 0.7, AV-11p     0.4681   0.22     0.4347
β = 0.7, %C         13.34    9.62     0.76
β = 0.8, AV-11p     0.4695   0.221    0.4331
β = 0.8, %C         13.68    10.11    0.39


SLIDE 28

Experiments and results (IV)

p = 15              ADI      CISI     CRANFIELD
AP-11 (BNRM)        0.4130   0.2007   0.4314
β = 0.6, AV-11p     0.4678   0.2211   0.4332
β = 0.6, %C         13.27    10.16    0.42
β = 0.7, AV-11p     0.4651   0.2203   0.434
β = 0.7, %C         12.62    9.77     0.60
β = 0.8, AV-11p     0.468    0.2208   0.4329
β = 0.8, %C         13.32    10.01    0.35

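The %C rows of these tables are the relative change of average precision with respect to the original model, in percent. A one-line sketch (hypothetical function name) reproduces them from the AP values:

```python
def percent_change(new_ap, baseline_ap):
    """%C: change of average precision relative to the original BNRM, in percent."""
    return round(100.0 * (new_ap - baseline_ap) / baseline_ap, 2)


print(percent_change(0.4524, 0.4130))  # ADI, p = 5, beta = 0.6 -> 9.54
```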

SLIDE 29

Experiments and results (V)

The results are sensitive to the values of the two parameters, $p$ and $\beta$, but they do not vary greatly. We believe that neither the number of parents, $p$, nor the value of $\beta$ should be low. In terms of retrieval effectiveness, the new model is at least as good as the polytree-based model in every configuration tested: average precision changes by between 9.54% and 13.68% on ADI, between 7.62% and 10.21% on CISI, and between 0.00% and 0.76% on CRANFIELD. This is a welcome side effect, because our initial goal was to increase efficiency without degrading effectiveness.


SLIDE 30

Concluding Remarks (I)

In this paper an alternative topology for representing term relationships has been presented:

- Instead of using a polytree for the term subnetwork, we have designed a bipartite graph that stores the strongest relationships among terms.
- The main advantage is that the exact propagation in the original polytree is replaced by an efficient evaluation of a probability function.
- The main application will be the retrieval of TREC documents, where we expect the model to be competitive and efficient.
- The new model performs better than the original model, although the size of the improvement depends on the collection being tested.


SLIDE 31

Concluding Remarks (II)

Several points in which the model may be modified to improve its performance:

- Design of more accurate ways of determining the strength of the relationships among terms, reflecting only positive dependences.
- Development of a method to select the best terms.
- Selection of a variable number of parents depending on the term.
- Design of a new probability function to be evaluated in the original layer of terms, removing the $\beta$ parameter.
