thanks to Mausam Peng Dai Chris Lin NSF, ONR, Google Powerset Built - - PDF document



slide-1
SLIDE 1

11/28/2012 1

Crowdsourcing

Thanks to: Mausam, Peng Dai, Chris Lin; NSF, ONR, Google

63,000 FTE Daily Workers (at 8h/day)

Built in 1770 by Wolfgang von Kempelen

Powerset

Your sentence is: The term silver dollar is often used for any large white metal coin issued by the United States with a face value of one dollar ; although purists insist that a dollar is not silver unless it contains some of that metal . Enter one term per box.

$0.05

Fast & Cheap, but is it Good?

[Snow et al. EMNLP‐08]

slide-2
SLIDE 2


Need for Workflows

  • Challenges

– Reliability & skill of individual workers vary
– Small work units

  • Therefore

– Use workflow to aggregate results & ensure quality
– Manage workers with (unreliable) workers
– Accomplish large tasks from small contributions
– Eg, TurKit: programming language for workflows

Complex Jobs

  • Casting Words
  • TurKit

Iterative Improvement

Clowder

  • Declarative language to specify workflows

– HTNs

AI, ML & Decision Theory … for Crowdsourcing

  • Shared models for common tasks

– Eg, voting, discrete choices, content improvement

  • Integrated modeling of workers
  • Comprehensive decision‐theoretic control
Quality Control & Reputation

  • Probably the most important deterrent to wide adoption of Mechanical Turk
  • Recently: more spammers than usual
  • Necessitates

– automatic detection of spammers
– automatic rewarding of diligent workers
– automatically achieving quality goals

slide-3
SLIDE 3


Quality Control

  • Simple tasks

– Majority vote
– Quality‐corrected vote based on worker parameters

  • assumptions on worker independence

– Learning worker parameters using gold standard
– Joint estimation of votes and worker parameters

  • Complex tasks: workflows

– Decision‐theoretic control of workflows

Majority Voting

[Sheng et al, 2008; Snow et al, 2008]


Majority vote of 8 Turkers better than expert labeling
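As a minimal sketch of majority voting over worker ballots (the function name and the tie-breaking rule are assumptions for illustration, not taken from the cited papers):

```python
from collections import Counter


def majority_vote(ballots):
    """Return the label that appears most often among the ballots.

    Ties are broken in favour of the label that was seen first,
    which is an arbitrary choice made for this sketch.
    """
    if not ballots:
        raise ValueError("need at least one ballot")
    counts = Counter(ballots)
    return counts.most_common(1)[0][0]


# Three hypothetical Turkers label the same item.
print(majority_vote(["person", "location", "location"]))
```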

Quality‐Corrected Voting

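[Clemen and Winkler, 1990]

Assumption: workers independent of each other

[Graphical model: true value ν, ballot b_i, worker `laziness' γ_i; plate over n = # workers]

$$P(\nu \mid b_1,\dots,b_n,\gamma_1,\dots,\gamma_n) \propto P(b_1,\dots,b_n \mid \nu,\gamma_1,\dots,\gamma_n)\,P(\nu) = P(\nu)\prod_{i=1}^{n} P(b_i \mid \nu,\gamma_i)$$

Outperforms majority vote

Are workers really independent?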
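The posterior above can be sketched for a yes/no question. This is an illustrative simplification: each worker is summarized by a single accuracy value standing in for P(b_i | ν, γ_i), and the function and parameter names are hypothetical.

```python
def quality_corrected_posterior(ballots, accuracies, prior_true=0.5):
    """Return P(v = True | ballots), assuming independent workers.

    ballots[i] is worker i's True/False answer; accuracies[i] is the
    assumed probability that worker i reports the true value.
    """
    like_true = prior_true          # running value of P(v=True) * prod P(b_i | v=True)
    like_false = 1.0 - prior_true   # same for v=False
    for ballot, acc in zip(ballots, accuracies):
        like_true *= acc if ballot else (1.0 - acc)
        like_false *= (1.0 - acc) if ballot else acc
    return like_true / (like_true + like_false)


# One reliable worker says True, two weak workers say False.
p = quality_corrected_posterior([True, False, False], [0.9, 0.6, 0.6])
print(round(p, 3))
```

In this made-up example the single reliable worker outweighs the two weak ones, so the quality-corrected answer differs from the plain majority.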

Quality‐Corrected Voting 2

[Whitehill et al, 2009; Dai et al, 2010]

Intrinsic difficulty (d) measures how hard the problem is (scale: easy to hard)

Conditional Independence

– workers independent given intrinsic difficulty

Probability of a Correct Answer

$$\mathrm{accuracy}_w(d) = \tfrac{1}{2}\left[1 + (1-d)^{\gamma_w}\right]$$

γ = inverse diligence (curves shown for an accurate voter and a poor voter)

Assume: no malevolence
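A small sketch of the accuracy model, reading γ_w as an exponent on (1 − d), which is an assumption about the flattened slide formula. The sample values of d and γ are made up.

```python
def accuracy(d, gamma):
    """Probability that a worker with parameter gamma answers correctly.

    d is the intrinsic difficulty in [0, 1]; gamma > 0 is the worker's
    inverse diligence (smaller gamma means a more accurate worker).
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("difficulty d must lie in [0, 1]")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return 0.5 * (1.0 + (1.0 - d) ** gamma)


for gamma in (0.5, 1.0, 2.0):  # hypothetical workers
    print(gamma, [round(accuracy(d, gamma), 3) for d in (0.0, 0.5, 1.0)])
```

Under this reading an easy problem (d = 0) is always answered correctly, and the hardest problem (d = 1) reduces every worker to a coin flip, which matches the "no malevolence" assumption.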

slide-4
SLIDE 4


Probabilistic Model

[Plot: accuracy (%) vs. number of ballot answers (1 to 11), Ballot Model vs. Majority Vote; labeled value: 79.3]

Over 50% money saving

math very similar

Unsupervised Learning

[Dawid and Skene, 1979; Whitehill et al, 2009; Lin et al 2012; etc]

  • No labeled data
  • Joint estimation of all parameters: EM algorithm

[Graphical model: true value, ballot, worker laziness, difficulty d; plates over # workers and # questions (n, m)]

  • Intuitions for EM algo

– one who commonly disagrees with others: ~spammer
– one who usually agrees with others: ~good worker
– as we identify some good workers, we trust them more…
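The EM intuition can be sketched with a deliberately simplified "one accuracy per worker" model for binary labels. This is not the model from the cited papers: it ignores question difficulty, and the data layout, clamping bounds and iteration count are assumptions.

```python
def em_one_coin(ballots, n_iter=20):
    """Jointly estimate answers and worker accuracies without gold labels.

    ballots is a list of (worker, question, label) with label in {0, 1}.
    Returns (post, acc): post[q] = P(answer to q is 1), acc[w] = accuracy.
    """
    workers = {w for w, _, _ in ballots}
    questions = {q for _, q, _ in ballots}
    # Start from the soft majority vote on each question.
    post = {}
    for q in questions:
        labels = [l for _, qq, l in ballots if qq == q]
        post[q] = sum(labels) / len(labels)
    acc = {}
    for _ in range(n_iter):
        # M-step: accuracy = expected fraction of ballots agreeing with the truth.
        for w in workers:
            mine = [(q, l) for ww, q, l in ballots if ww == w]
            agree = sum(post[q] if l == 1 else 1.0 - post[q] for q, l in mine)
            acc[w] = min(max(agree / len(mine), 0.01), 0.99)  # keep away from 0 and 1
        # E-step: recompute each answer's posterior, trusting good workers more.
        for q in questions:
            p1 = p0 = 0.5
            for w, qq, l in ballots:
                if qq != q:
                    continue
                p1 *= acc[w] if l == 1 else (1.0 - acc[w])
                p0 *= (1.0 - acc[w]) if l == 1 else acc[w]
            post[q] = p1 / (p1 + p0)
    return post, acc


# Workers A and B answer carefully; C always answers 1 (made-up data).
truth = [1, 0, 1, 0]
data = [(w, q, t) for w in "AB" for q, t in enumerate(truth)]
data += [("C", q, 1) for q in range(len(truth))]
post, acc = em_one_coin(data)
print({q: round(p, 2) for q, p in sorted(post.items())})
print({w: round(a, 2) for w, a in sorted(acc.items())})
```

In this toy run the worker who often disagrees with the others ends up with a lower estimated accuracy, mirroring the "~spammer" intuition.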

Supervised vs. Unsupervised

[Ipeirotis Blog]

  • Is supervised always better than unsupervised?

– few labels (<20) per worker
– average worker quality really poor
– class distribution uneven
– subjective task?

  • Typical scenarios: gold standard data not required!
  • This experiment: model with independence assumption

– we still need to test observations for complex models

Quality Control

  • Simple tasks

– Majority vote
– Quality‐corrected vote based on worker parameters

  • assumptions on worker independence

– Learning worker parameters using gold standard
– Joint estimation of votes and worker parameters

  • Complex tasks: workflows

– Decision‐theoretic control of workflows
– [Dai et al, 2010; Dai et al, 2011]

Workflows Change the Game

  • Dividing a complex task into smaller jobs

– information flow between these jobs

  • Examples

– audio transcription (CastingWords proprietary)
– generating articles (Iterative improvement)
– handwriting recognition (Iterative improvement)
– Soylent: intelligent word processor (Find‐Fix‐Verify)
– …

slide-5
SLIDE 5


Iterative Improvement

[Little et al, 2010]


Clowder

[Architecture diagram: HTN library, DT planner, task models, user models, renderer, rendered job, learner, executor, worker marketplace]

POMDPs at the core

  • Belief states = distribution over world states
  • Actions = probabilistic transitions
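A belief state and its update can be sketched with the standard POMDP Bayes filter. The state names, the "ballot" action and all probabilities below are hypothetical and only illustrate the idea; they are not Clowder's actual models.

```python
def belief_update(belief, transition, observation_prob, action, observation):
    """Standard POMDP belief update.

    belief[s]                       = P(s)
    transition[action][s][s2]       = P(s2 | s, action)
    observation_prob[action][s2][o] = P(o | s2, action)
    """
    new_belief = {}
    for s2 in belief:
        predicted = sum(belief[s] * transition[action][s][s2] for s in belief)
        new_belief[s2] = observation_prob[action][s2][observation] * predicted
    total = sum(new_belief.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in new_belief.items()}


# Hypothetical world: was the artifact improved or not?
states = ("improved", "not_improved")
belief = {"improved": 0.5, "not_improved": 0.5}
# Asking for a ballot does not change the world state.
transition = {"ballot": {s: {s2: 1.0 if s == s2 else 0.0 for s2 in states} for s in states}}
# A worker votes "yes" more often when the artifact really was improved.
observation_prob = {"ballot": {"improved": {"yes": 0.8, "no": 0.2},
                               "not_improved": {"yes": 0.3, "no": 0.7}}}
print(belief_update(belief, transition, observation_prob, "ballot", "yes"))
```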

Decision‐Theoretic Execution Control


TurKontrol of Iterative Improvement

[Flowchart: Improvement needed? (Y/N) → Generate improvement HIT → Generate ballot HIT → Update quality estimates → More voting needed? (Y/N)]
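The "More voting needed?" decision can be illustrated with a much simpler stand-in than decision-theoretic planning: keep requesting ballots until the belief that the improved artifact is better crosses a fixed confidence threshold. The threshold rule, the single shared worker accuracy and all names here are assumptions for this sketch, not TurKontrol's actual policy.

```python
def vote_until_confident(ballots, worker_accuracy=0.7, threshold=0.9, max_ballots=11):
    """Consume ballots one at a time until the belief is confident enough.

    Each ballot is True if the worker prefers the improved artifact.
    Returns (p_better, ballots_used).
    """
    p_better = 0.5  # belief that the improved artifact beats the original
    used = 0
    ballots = iter(ballots)
    while used < max_ballots and max(p_better, 1.0 - p_better) < threshold:
        try:
            says_better = next(ballots)
        except StopIteration:
            break  # no more workers available
        like_better = worker_accuracy if says_better else 1.0 - worker_accuracy
        like_worse = 1.0 - worker_accuracy if says_better else worker_accuracy
        # Bayes update of the quality estimate after this ballot.
        p_better = like_better * p_better / (
            like_better * p_better + like_worse * (1.0 - p_better))
        used += 1
    return p_better, used


print(vote_until_confident([True] * 10))                        # average workers
print(vote_until_confident([True] * 10, worker_accuracy=0.95))  # a trusted worker
```

With these made-up numbers, a more accurate worker lets the loop stop after fewer ballots, which is the kind of saving an adaptive controller is after.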


slide-6
SLIDE 6


Cost (equal quality)

[Bar chart: TurKontrol vs. Static; annotation: 28.7% more money]

Anecdotal Observations

  • 7 images: TurKontrol fewer iterations than static

– 6 of those resulted in higher quality!!

  • once: TurKontrol trusted the first vote

– the worker was known to be higher quality

  • intelligent ballot use

Observation: Ballot Use

[Plot: ballot use, TurKontrol vs. HandCoded]

slide-7
SLIDE 7


Clowder


Hierarchical Task Networks

ala [Shahaf & Horvitz AAAI‐10]

  • Partially‐ordered set of tasks

→ Parallel execution

  • Recursive expansion

– Preconditions & resources
– Eg, availability of workers with required skills

[HTN diagram nodes: Translate AB, HumanTr AB, Choose, MachineTr AB, ProofRd, Translate AC, Translate CB, Find, Fix, Verify]

Synergy from Switching Models

  • Can be better to use 'worse' model
  • Insight from [Grier HCOMP‐11]

Example Task: Named‐Entity Recognition

Only two states ‐‐ Vermont and Washington ‐‐ this year joined five others requiring private employers to grant leaves of absence to employees with newborn or adopted infants

Which of the following sets of tags best describes the word "Washington" in the way it is used in the above sentence?

– location us_county
– location citytown

Which of the following Wikipedia articles defines the word "Washington" in exactly the way it is used in the above sentence?

– Washington http://en.wikipedia.org/wiki/Washington
Washington, D.C., formally the District of Columbia and commonly referred to as Washington, "the District", or simply D.C., is the capital of the United States....
– Washington (state) http://en.wikipedia.org/wiki/Washington_(state)
Washington () is a state in the Pacific Northwest region of the United States located north of Oregon, west of Idaho and south of the Canadian province of British Columbia, on the coast of the Pacific Ocean....

DT Planner Control

[Flowchart: START task → choose the best next workflow k → plan workflow k using original DT Planner → workers return ballots → update estimates about workflow difficulty and correct answer → ready to submit? (N: choose the next workflow; Y: submit most likely answer)]

New Worker Model

[Graphical model: variables v, b, γ, d, δ; plates labeled W (# workers), T (# tasks (questions)) and K (# workflows)]

slide-8
SLIDE 8


Experiments

  • Training Data:

– 50 NER Tasks

  • 40 Wikipedia jobs and 40 direct tagging jobs
  • 1000 simulations
  • 106 NER Tasks using Mechanical Turk

Research Agenda

  • every workflow needs AI support

– optimal pricing
– optimal parameter estimation
– optimal control
– comparison between multiple workflows for a task

  • designing a generalized workflow optimizer

– HTN language: express a workflow in the language
– automatically optimize parameters and control