
SLIDE 1

Top-k Queries over Uncertain Scores

Qing Liu, Debabrota Basu, Talel Abdessalem, Stéphane Bressan

CoopIS 2016 Qing Liu et.al. Top-k Queries over Uncertain Scores 1 / 19

SLIDE 5

Top-k Queries over Uncertain Scores Introduction

Introduction

◮ Modern recommendation systems leverage some form of collaborative, user-sourced (crowdsourced) collection of information.

◮ Crowdsourcing Platforms

◮ easily announce their needs to the crowd / get access to the information they need

◮ choose the highest quality / most competitively priced

◮ Examples: TripAdvisor

◮ collaborative user or crowdsourced collection of information, e.g., user-generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.

SLIDE 6

Top-k Queries over Uncertain Scores Introduction

Introduction

◮ Crowdsourcing and Collaborative Economy:

◮ communities or crowds rent, share, sell products or services

SLIDE 14

Top-k Queries over Uncertain Scores Introduction

Introduction

◮ Independent collection of information → uncertainty and diversity.

◮ Objects (services, vacation rentals, restaurants, ...) have uncertain scores (quality, price, ...).

SLIDE 17

Top-k Queries over Uncertain Scores Introduction

Introduction

◮ Ranking is one of the building blocks of recommendation.

◮ A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.

◮ Price of the apartments.

[Figure: distributions of apartment prices; horizontal axis from 2000 to 6000, vertical axis from 0.2 to 1.4 (×10^−3)]

◮ With uncertain scores, a top-k query can only return an uncertain result.

SLIDE 19

Top-k Queries over Uncertain Scores Related Work

Related Work

◮ Soliman, Ilyas and Ben-David [Soliman and Ilyas, 2009] study top-k queries over objects with uncertain scores given as probability distributions.

◮ In this paper, we consider probabilistic top-k queries under the top-k semantics as in [Soliman and Ilyas, 2009].

SLIDE 26

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

◮ O: a set of n objects;

◮ s(oi): the score of an object oi ∈ O;

◮ Xi: a random variable equal to s(oi);

◮ fi: bounded continuous probability density function of Xi;

◮ π(k) = [o1, · · · , ok]: a sequence of k objects in O;

◮ Pr(π(k)): probability that π(k) is the top-k sequence;

\[
\Pr(\pi^{(k)}) = \int_{-\infty}^{\infty} \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f_1(x_1) \cdots f_n(x_n) \, dx_n \cdots dx_1 \tag{1}
\]

◮ (Objective:) Probabilistic top-k sequence: the π(k) that maximizes Pr(π(k)).
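
As a rough illustration of equation (1), the probability of a given sequence can be approximated by sampling scores instead of integrating. The sketch below assumes independent uniform scores on intervals [li, ui]; the function name `estimate_sequence_probability`, the sample count and the seed are hypothetical choices, and this Monte Carlo estimate is not claimed to be how the authors evaluate the integral.

```python
import random


def estimate_sequence_probability(intervals, sequence, samples=20000, seed=0):
    """Estimate Pr(sequence is the top-k sequence) by sampling scores.

    intervals: list of (low, high); object i has a score drawn from U[low, high]
               (an assumption made for this sketch).
    sequence:  tuple of object indices, highest score first.
    """
    rng = random.Random(seed)
    k = len(sequence)
    hits = 0
    for _ in range(samples):
        # Draw one score per object, independently.
        scores = [rng.uniform(low, high) for low, high in intervals]
        # Rank all objects by decreasing score and keep the first k.
        ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if tuple(ranking[:k]) == tuple(sequence):
            hits += 1
    return hits / samples
```

With disjoint intervals the ranking is certain, so the estimate is exactly 1 for the correct sequence and 0 for any other; with overlapping intervals it is only an approximation whose error shrinks as `samples` grows.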

SLIDE 30

Top-k Queries over Uncertain Scores Solutions

Solutions

◮ Naive: calculate Pr(π(k)) for every possible sequence π(k) and return the π(k) with the highest Pr(π(k)).

◮ n!/(n−k)! possible sequences to examine.

◮ Branch-and-Bound [Soliman et al., 2010]: prunes some of the candidate sequences π(k).

◮ Worst case: n!/(n−k)! possible sequences to examine.

◮ Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.

◮ In this paper, we explore variants of Markov chain Monte Carlo algorithms.
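
To make the size of the naive search space concrete, here is a minimal sketch of exhaustive enumeration. The names `count_sequences` and `naive_top_k` are hypothetical, and `sequence_probability` stands for any caller-supplied function that returns Pr(π(k)) for a candidate sequence; it is not part of the original slides.

```python
import itertools
import math


def count_sequences(n, k):
    """Number of ordered sequences of k distinct objects out of n: n!/(n-k)!."""
    return math.factorial(n) // math.factorial(n - k)


def naive_top_k(objects, k, sequence_probability):
    """Score every ordered k-sequence and return the most probable one.

    sequence_probability is a hypothetical callback mapping a tuple of
    objects to its probability of being the top-k sequence.
    """
    return max(itertools.permutations(objects, k), key=sequence_probability)
```

For n = 7 and k = 3 this already means 210 candidate sequences, and the count grows quickly with n and k, which is why the slides turn to sampling-based search.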

SLIDE 33

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

◮ Soliman’s Algorithm

◮ Initial state: a rank over the n objects

◮ Candidate State:

[Figure: two examples of candidate generation on the ranking o1 o2 o3 o4 o5 o6 o7, with the top-k prefix marked. First: Pr(o2 < o3) leads to o1 o3 o2 o4 o5 o6 o7, followed by Pr(o2 < o4). Second: Pr(o6 > o5) leads to o1 o2 o3 o4 o6 o5 o7, followed by Pr(o6 > o4).]

◮ Acceptance Probability:

\[
\alpha = \min\left( \frac{\Pr(\pi^{(k)}_{t+1}) \cdot \Pr(\pi_t \mid \pi_{t+1})}{\Pr(\pi^{(k)}_{t}) \cdot \Pr(\pi_{t+1} \mid \pi_t)},\ 1 \right)
\]
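
The acceptance probability above is the usual Metropolis-Hastings ratio. The following generic loop is only a sketch of that scheme, not the authors' implementation: `target`, `propose` and `proposal_prob` are hypothetical callbacks, where `target(state)` plays the role of Pr(π(k)), `propose(state, rng)` draws a candidate, and `proposal_prob(a, b)` returns the probability of proposing `a` from `b`. Accepting unconditionally when the denominator is zero is an assumption of this sketch.

```python
import random


def metropolis_hastings(initial, target, propose, proposal_prob, steps, seed=0):
    """Run a Metropolis-Hastings chain and return the best state visited."""
    rng = random.Random(seed)
    state = initial
    best = initial
    for _ in range(steps):
        candidate = propose(state, rng)
        # Numerator: target of the candidate times probability of moving back.
        numerator = target(candidate) * proposal_prob(state, candidate)
        # Denominator: target of the current state times probability of the move.
        denominator = target(state) * proposal_prob(candidate, state)
        # Assumption: always leave a state whose denominator is zero.
        alpha = 1.0 if denominator == 0.0 else min(1.0, numerator / denominator)
        if rng.random() < alpha:
            state = candidate
        if target(state) > target(best):
            best = state
    return best
```

When the proposal is symmetric, `proposal_prob` cancels and the ratio reduces to `target(candidate) / target(state)`.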

SLIDE 37

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

◮ Swap and SwapEXP Algorithm

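◮ Initial state: a rank over the n objects

◮ Candidate State:

[Figure: the ranking o1 o2 o3 o4 o5 o6 o7, with the top-k prefix marked, becomes o1 o5 o3 o4 o2 o6 o7 after o2 and o5 are swapped.]

◮ Acceptance Probability:

\[
\text{Swap:}\quad \alpha = \min\left( \frac{\Pr(\pi^{(k)}_{t+1}) \cdot \frac{1}{kn}}{\Pr(\pi^{(k)}_{t}) \cdot \frac{1}{kn}} = \frac{\Pr(\pi^{(k)}_{t+1})}{\Pr(\pi^{(k)}_{t})},\ 1 \right)
\]

\[
\text{SwapEXP:}\quad \alpha = \min\left( \frac{\widetilde{\Pr}(\pi^{(k)}_{t+1})}{\widetilde{\Pr}(\pi^{(k)}_{t})} = \exp\left(\beta\left(\Pr(\pi^{(k)}_{t+1}) - \Pr(\pi^{(k)}_{t})\right)\right),\ 1 \right), \qquad \widetilde{\Pr}(\pi^{(k)}) = C_\beta^{-1} \exp\left(\beta \Pr(\pi^{(k)})\right)
\]

◮ SwapEXP is more likely to reject the “worse” candidate state.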
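
A minimal sketch of the swap proposal and the two acceptance rules follows. It assumes, based on the 1/(kn) factor in the Swap formula, that a candidate is produced by swapping one uniformly chosen top-k position with one uniformly chosen position of the whole ranking; the function names and the handling of a zero-probability current state are hypothetical.

```python
import math
import random


def propose_swap(ranking, k, rng):
    """Swap a random top-k position with a random position (k * n choices)."""
    i = rng.randrange(k)
    j = rng.randrange(len(ranking))
    candidate = list(ranking)  # copy, so the current state is left untouched
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate


def swap_acceptance(p_current, p_candidate):
    """Swap: min(Pr(candidate) / Pr(current), 1)."""
    if p_current == 0.0:
        return 1.0  # assumption: always leave a zero-probability state
    return min(1.0, p_candidate / p_current)


def swap_exp_acceptance(p_current, p_candidate, beta):
    """SwapEXP: min(exp(beta * (Pr(candidate) - Pr(current))), 1)."""
    difference = p_candidate - p_current
    if difference >= 0.0:
        return 1.0  # avoids overflow in exp for large beta
    return math.exp(beta * difference)
```

For example, moving from probability 0.2 to 0.1 is accepted with probability 0.5 under Swap and about 0.37 under SwapEXP with β = 10; how much more strictly SwapEXP rejects depends on the chosen β.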

SLIDE 40

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

◮ ReSample and ReSampleEXP Algorithm

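◮ Initial state: a rank over the n objects

◮ Candidate State:

[Figure: the ranking with scores o1: 9, o2: 8, o3: 6, o4: 5, o5: 4, o6: 3, o7: 2, with the top-k prefix marked, becomes o1: 9, o2: 8, o5: 7, o3: 6, o4: 5, o6: 3, o7: 2 after the score of o5 changes from 4 to 7.]

◮ Acceptance Probability:

\[
\text{ReSample:}\quad \alpha = \min\left( \frac{\Pr(\pi^{(k)}_{t+1}) \cdot \Pr(\pi_t \mid \pi_{t+1})}{\Pr(\pi^{(k)}_{t}) \cdot \Pr(\pi_{t+1} \mid \pi_t)},\ 1 \right)
\]

\[
\text{ReSampleEXP:}\quad \alpha = \min\left( \frac{\Pr(\pi_t \mid \pi_{t+1})}{\Pr(\pi_{t+1} \mid \pi_t)} \cdot \exp\left(\beta\left(\Pr(\pi^{(k)}_{t+1}) - \Pr(\pi^{(k)}_{t})\right)\right),\ 1 \right)
\]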
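
The candidate state in the example can be reproduced with a single linear pass: one object receives a new score and is moved to its new position. This is a sketch only; objects are written o1 to o7, the function name `rerank_after_resample` is hypothetical, and the new score is passed in explicitly here, whereas in a sampler it would presumably be drawn from the object's score distribution fi.

```python
def rerank_after_resample(ranking, scores, obj, new_score):
    """Give obj a new score and return the updated (ranking, scores).

    ranking: list of objects, highest score first.
    scores:  dict mapping each object to its current score.
    """
    new_scores = dict(scores)  # copy, so the current state is left untouched
    new_scores[obj] = new_score
    # Remove the object from the ranking: one O(n) pass.
    rest = [o for o in ranking if o != obj]
    # Find the first position whose score is not higher: another O(n) scan.
    position = 0
    while position < len(rest) and new_scores[rest[position]] > new_score:
        position += 1
    return rest[:position] + [obj] + rest[position:], new_scores


scores = {"o1": 9, "o2": 8, "o3": 6, "o4": 5, "o5": 4, "o6": 3, "o7": 2}
ranking = ["o1", "o2", "o3", "o4", "o5", "o6", "o7"]
new_ranking, new_scores = rerank_after_resample(ranking, scores, "o5", 7)
```

Both passes are linear in n, which is in line with the O(n) cost listed for ReSample in the efficiency table.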

SLIDE 43

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

◮ ReSampleAll Algorithm

◮ Initial state: a rank over the n objects

◮ Candidate State:

[Figure: the ranking with scores o1: 9, o2: 8, o3: 6, o4: 5, o5: 4, o6: 3, o7: 2, with the top-k prefix marked, becomes o3: 10, o2: 9, o5: 8, o1: 6, o7: 4, o6: 3, o4: 2 after all scores are resampled.]

◮ Acceptance Probability: ReSampleAll: α = 1
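
Because every candidate is accepted, each step amounts to drawing a fresh score for every object and reading off the top-k prefix. The sketch below assumes independent uniform scores on intervals [li, ui]; reporting the most frequently visited prefix is an assumption of this example rather than something stated on the slides, and the function names are hypothetical.

```python
import heapq
import random
from collections import Counter


def resample_all_prefix(intervals, k, rng):
    """Draw a new score for every object and return the top-k prefix."""
    scores = [rng.uniform(low, high) for low, high in intervals]
    # nlargest returns the k best indices, highest score first, in O(n log k).
    return tuple(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))


def most_frequent_prefix(intervals, k, steps=5000, seed=0):
    """Run the chain with alpha = 1 and return the most visited top-k prefix."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(steps):
        visits[resample_all_prefix(intervals, k, rng)] += 1
    return visits.most_common(1)[0][0]
```

Using `heapq.nlargest` keeps the per-step cost at O(n log k), which matches the complexity listed for ReSampleAll.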

SLIDE 49

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

◮ Datasets: synthetic datasets

Table: Distributions

               Setting 1       Setting 2      Setting 3
median score   G(0.5, 0.05)    G(0.5, 0.2)    U[0, 1]
width          G(0.5, 0.05)    G(0.5, 0.2)    U[0, 1]

[Figure: three plots, one per setting; axis ticks from 0.1 to 1 and from 1 to 8]

◮ default: uniform score distributions, median score of oi: (li + ui)/2, width: ui − li

◮ Metrics

◮ Probability of the probabilistic top-k sequence (higher → better)

◮ Convergence of the Markov chains (Gelman-Rubin convergence diagnostic)

◮ Efficiency (complexity and runtime)
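
For readers unfamiliar with the convergence metric, here is a minimal sketch of the standard Gelman-Rubin statistic computed from several chains of equal length. Which scalar the authors track per chain is not stated on the slides; the sketch assumes each chain is a list of numbers (for instance the probability of the current top-k sequence), and the function name is hypothetical.

```python
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n.

    Assumes at least two chains, at least two values per chain, and a
    non-zero within-chain variance.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5
```

Values close to 1 suggest that the chains have mixed; values well above 1 indicate that they still disagree.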

SLIDE 50

Top-k Queries over Uncertain Scores Performance Evaluation Effectiveness of Six Algorithms

Effectiveness of Six Algorithms (Probability)

[Figure: Probability (logarithmic axis, 10^−8 to 10^−5) against Chain Length (ticks at 5 and 10, ×10^4) for Soliman, Swap, SwapEXP, ReSample, ReSampleEXP and ReSampleAll. Panels: (a) Dataset5, (b) Dataset21, (c) Dataset5, (d) Dataset21.]

SLIDE 51

Top-k Queries over Uncertain Scores Performance Evaluation Convergence of Six Algorithms

Convergence of the Markov Chains

[Figure: Gelman-Rubin Diagnostic (ticks from 2 to 10) against Chain Length (ticks at 5 and 10, ×10^4) for Soliman, Swap, SwapEXP, ReSample, ReSampleEXP and ReSampleAll. Panels: (e) Dataset5, (f) Dataset21, (g) Dataset5, (h) Dataset21.]

SLIDE 52

Top-k Queries over Uncertain Scores Performance Evaluation Efficiency

Efficiency

Table: Worst Case Time Complexity of Generating Next State

                  Soliman   Swap(EXP)   ReSample(EXP)   ReSampleAll
Time Complexity   O(nk)     O(1)        O(n)            O(n log k)

Table: Runtime Per Step of the Algorithms (seconds)

                   Soliman   Swap     SwapEXP   ReSample   ReSampleEXP   ReSampleAll
Runtime Per Step   0.0058    1.9128   0.1163    0.0523     0.0071        0.9056

SLIDE 55

Top-k Queries over Uncertain Scores Conclusion

Conclusion

◮ We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.

◮ We verify through extensive experiments that the proposed algorithms are more effective than the state-of-the-art approach.

◮ ReSampleAll is the best, since it samples directly from the target distribution instead of depending on “local” information.

SLIDE 56

Top-k Queries over Uncertain Scores Q/A

Thank you! Questions?

Top-k Queries, Uncertain Scores, MCMC

liuqing@u.nus.edu

SLIDE 57

Top-k Queries over Uncertain Scores References

References I

Soliman, M. A. and Ilyas, I. F. (2009). Ranking with uncertain scores. In ICDE, pages 317–328.

Soliman, M. A., Ilyas, I. F., and Ben-David, S. (2010). Supporting ranking queries on uncertain and incomplete data. The VLDB Journal, 19(4):477–501.