Crowdsourcing with Diverse Groups of Users, Sara Cohen and Moran Yashinski: PowerPoint Presentation

SLIDE 1

Crowdsourcing with Diverse Groups of Users

Sara Cohen, Moran Yashinski

SLIDE 2

Team Formation Problem

  • Example: Forming an education board
  • Required skills:
    • School Principal (SP)
    • High School teacher (HS)
    • Elementary School teacher (ES)

[Figure: six candidates (Denise, Jack, Sharon, Alice, Chris, Bob), each labeled with the skills they hold: SP, ES; ES; SP; HS, ES; HS; HS]

SLIDE 3

Team Formation Problem with Communication Cost

  • Goal: Find a team that has all required skills, while minimizing communication cost
  • Examples of communication costs:
    • Distance in the social network
    • (The inverse of) the number of papers each pair of experts published together

[Figure: the same six candidates (Bob, Alice, Chris, Jack, Sharon, Denise) connected in a social network, with numeric communication costs on the edges]
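
The communication-cost objective can be illustrated with a minimal brute-force sketch. It assumes the cost of a team is the sum of pairwise distances between its distinct members (the "sum of distances" measure); the helpers `comm_cost` and `best_team`, the candidate names, their skills and the distances are all hypothetical, not the figure's data.

```python
from itertools import combinations, product


def comm_cost(team, dist):
    """Assumed measure: sum of pairwise distances between distinct team members."""
    members = sorted(set(team))
    return sum(dist[frozenset(pair)] for pair in combinations(members, 2))


def best_team(required, skills_of, dist):
    """Brute force: try one holder per required skill, keep the cheapest team."""
    holders = [[c for c, s in skills_of.items() if skill in s] for skill in required]
    best = None
    for team in product(*holders):
        cost = comm_cost(team, dist)
        if best is None or cost < best[0]:
            best = (cost, sorted(set(team)))
    return best


# Hypothetical toy instance (not the figure's candidates or edge weights).
skills_of = {"Ann": {"SP"}, "Ben": {"HS", "ES"}, "Cal": {"ES"}, "Dee": {"HS"}}
dist = {frozenset(pair): d for pair, d in [
    (("Ann", "Ben"), 1), (("Ann", "Cal"), 2), (("Ann", "Dee"), 3),
    (("Ben", "Cal"), 4), (("Ben", "Dee"), 2), (("Cal", "Dee"), 1),
]}
print(best_team(["SP", "HS", "ES"], skills_of, dist))
```

Here Ben covers both HS and ES, so the two-person team (Ann, Ben) with cost 1 beats every three-person team.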

SLIDE 4

Research Question

  • What if we wanted to define diversity based on the experts' properties?
    • Gender, Income, Age, Religion, Location, etc.
  • We would like to define a target diversity function for the different experts' properties
  • Goal: Efficiently find a team that has all required skills and is as close as possible to the desired target diversity

SLIDE 5

Team Formation with Target Diversity Constraint

  • Target Diversity based on Properties
  • Goal: Efficiently find a team that has all required skills and is as close as possible to the desired target diversity
  • $\text{Distribution Cost} = \lvert \text{Team Diversity} - \text{Target Diversity} \rvert$
  • Example:
    • Gender target diversity: $(\text{Male}, \text{Female}) = (1/3,\ 2/3)$
    • Income target diversity: $(\text{High}, \text{Medium}, \text{Low}) = (1/2,\ 1/4,\ 1/4)$

[Figure: the six candidates with their skills (SP, ES; ES; SP; HS, ES; HS; HS) and properties (Male, High income; Female, Low; Female, High; Male, Middle; Female, High; Male, Middle)]
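
A minimal sketch of the distribution cost, under stated assumptions: the slide does not say how |Team Diversity − Target Diversity| is measured, so the sketch assumes team diversity is the fraction of members holding each property value, takes the L1 distance (sum of absolute differences) per property, and sums over properties. The targets are the slide's example; the three-member team and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def team_diversity(team, prop):
    """Fraction of team members holding each value of one property."""
    counts = Counter(member[prop] for member in team)
    return {value: Fraction(n, len(team)) for value, n in counts.items()}


def distribution_cost(team, targets):
    """Assumed reading of |Team Diversity - Target Diversity|:
    L1 distance per property, summed over all properties."""
    cost = Fraction(0)
    for prop, target in targets.items():
        actual = team_diversity(team, prop)
        for value in set(target) | set(actual):
            cost += abs(actual.get(value, 0) - target.get(value, 0))
    return cost


# Target diversities from the slide's example.
targets = {
    "gender": {"Male": Fraction(1, 3), "Female": Fraction(2, 3)},
    "income": {"High": Fraction(1, 2), "Medium": Fraction(1, 4), "Low": Fraction(1, 4)},
}
# Hypothetical three-member team.
team = [
    {"gender": "Male", "income": "High"},
    {"gender": "Female", "income": "Low"},
    {"gender": "Female", "income": "High"},
]
print(distribution_cost(team, targets))
```

With three members the gender target is met exactly, but the income target (in quarters) cannot be, so the whole cost of 1/2 comes from income.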

SLIDE 6

What Are We Going to Discuss?

  • Research question: diversity based on personal properties ✓
  • Advantages of diversity (or... why is it interesting?)
  • Related work
  • Algorithms and computational considerations
    • Fixed-Parameter Tractable (Optimal) Algorithm
    • Greedy Approximation Algorithm
  • Experimental results
  • Conclusions

SLIDE 7

Advantages of Diversity (or... Why Is It Interesting?)

  • Advantages in the workplace
    • Increased productivity and creativity (innovative solutions)
    • Higher morale in the workplace
    • Positive reputation, attracting quality human resources
  • When crowdsourcing, it is important to consider different points of view
  • Defining the diversity of a team, e.g. for:
    • Program committees
    • Adopting affirmative action

SLIDE 8

Related Work

  • Team formation with communication cost
    • Goal: Find a team that has all required skills while minimizing communication cost (e.g. sum of distances, diameter)
  • Diversity in terms of social influence
    • Depends on the social influence between candidates
    • Low social influence is correlated with high productivity
  • Diversity in query answering
    • The goal is to maximize the diversity of the results
    • Diversity based on different criteria (e.g. content, novelty and coverage)

SLIDE 9

What Have We Achieved?

  • Notation: $C$ = candidates, $S$ = required skills, $P$ = properties
  • Finding an optimal solution is NP-complete
  • Naïve algorithm
    • Checks all possible teams and finds the optimal solution
    • Time complexity: $O(|C|^{|S|} \cdot |P|)$
    • Intractable in practice, as $|C|$ might be huge
  • Fixed-Parameter Tractable (Optimal) Algorithm
    • Finds an optimal solution in time $\mathrm{poly}(|C|) \cdot \exp(|S|, |P|)$
  • Greedy Approximation Algorithm
    • Time complexity: $\mathrm{poly}(|S|, |C|)$
    • Guaranteed to return a 1/2-approximation of the optimal solution
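
The naïve algorithm can be sketched as exhaustive enumeration with `itertools.product`: one candidate pool per required skill, so up to $|C|^{|S|}$ teams are scored. The gender-only L1 cost, the helper names and the toy candidates are hypothetical illustrations; the sketch also lets one candidate fill several skills, which the slides do not specify.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gender_cost(team, target):
    """Assumed cost: L1 distance between the team's gender fractions and the target."""
    counts = Counter(member["gender"] for member in team)
    return sum(abs(Fraction(counts[v], len(team)) - t) for v, t in target.items())


def naive_best_team(required, candidates, cost):
    """Score every team with one candidate per required skill (|C|^|S| at most)."""
    pools = [[c for c in candidates if skill in c["skills"]] for skill in required]
    best_cost, best_team = None, None
    for team in product(*pools):
        score = cost(team)
        if best_cost is None or score < best_cost:
            best_cost, best_team = score, [m["name"] for m in team]
    return best_cost, best_team


# Hypothetical candidates and target, for illustration only.
candidates = [
    {"name": "Ann", "skills": {"SP"}, "gender": "Female"},
    {"name": "Ben", "skills": {"SP", "HS"}, "gender": "Male"},
    {"name": "Cal", "skills": {"HS"}, "gender": "Female"},
    {"name": "Dee", "skills": {"ES"}, "gender": "Male"},
    {"name": "Eve", "skills": {"ES"}, "gender": "Female"},
]
target = {"Male": Fraction(1, 3), "Female": Fraction(2, 3)}
result = naive_best_team(["SP", "HS", "ES"], candidates,
                         lambda team: gender_cost(team, target))
print(result)
```

The team (Ann, Ben, Eve) meets the hypothetical target exactly; the exponential number of teams in the worst case is what makes this approach intractable for large $|C|$.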

SLIDE 10

Fixed-Parameter Tractable (Optimal) Algorithm

  • Finds an optimal solution
  • Time complexity: $\mathrm{poly}(|C|) \cdot \exp(|S|, |P|)$
  • Uses preprocessed data structures to improve runtime performance
  • Uses the notions of Abstract (Optimal) Templates and Concrete Templates

SLIDE 11

Abstract (Optimal) Templates, Concrete Templates: Example

  • One property (Gender), with a target diversity over (Male, Female)

  • $S = \{SP, HS, ES\}$
  • Abstract Optimal Template
    • Achieves minimum distribution cost
    • There could be many Abstract Optimal Templates
  • Abstract Template (non-optimal)
  • Concrete Templates:
    • gender(SP) = F, gender(HS) = M, gender(ES) = M
    • gender(SP) = M, gender(HS) = F, gender(ES) = M
    • gender(SP) = M, gender(HS) = M, gender(ES) = F

[Figure: abstract templates drawn as Male/Female count tables, e.g. Male = 2, Female = 1]
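
The relationship between abstract and concrete templates for one property can be sketched as follows. An abstract template is a count vector over property values; a concrete template assigns a value to each required skill. Because the slide's target values are not legible, the sketch assumes a hypothetical gender target of (2/3, 1/3) and an L1 cost; `abstract_templates`, `template_cost` and `concrete_templates` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import permutations


def abstract_templates(values, size):
    """All count vectors over `values` summing to `size` (one abstract template each)."""
    if len(values) == 1:
        return [(size,)]
    result = []
    for k in range(size + 1):
        for rest in abstract_templates(values[1:], size - k):
            result.append((k,) + rest)
    return result


def template_cost(counts, target, size):
    """Assumed cost: L1 distance between count fractions and target fractions."""
    return sum(abs(Fraction(n, size) - t) for n, t in zip(counts, target))


def concrete_templates(skills, values, counts):
    """Every distinct assignment of property values to skills matching `counts`."""
    pool = [v for v, n in zip(values, counts) for _ in range(n)]
    return sorted({tuple(zip(skills, perm)) for perm in permutations(pool)})


skills = ["SP", "HS", "ES"]
values = ["M", "F"]
target = [Fraction(2, 3), Fraction(1, 3)]  # hypothetical target, for illustration only
templates = abstract_templates(values, len(skills))
best = min(template_cost(c, target, len(skills)) for c in templates)
optimal = [c for c in templates if template_cost(c, target, len(skills)) == best]
print(optimal)
for concrete in concrete_templates(skills, values, optimal[0]):
    print(dict(concrete))
```

Under this assumed target the only abstract optimal template is (Male = 2, Female = 1), and expanding it yields exactly the three concrete templates listed on the slide.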

SLIDE 12

FPT Optimal Algorithm: Data Structures

  • Used to optimize runtime performance
  • Hashset ℍ to hold all the abstract templates
    • Avoids evaluating an abstract template more than once (very costly)
  • minHeap 𝕄 to efficiently return the abstract template with the minimum cost
  • Structure 𝕊ℙℂ (Skills → Properties → Candidates)
    • Calculated offline

[Figure: the 𝕊ℙℂ structure, which indexes candidates by skill (ES, HS, SP) and then by property value (M, F); its leaves are candidate lists (Bob, Jack, Chris, Denise, Sharon, Alice)]
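
The 𝕊ℙℂ structure can be approximated by a nested dictionary built once, offline: skill → tuple of property values → candidate names, so checking a concrete template costs one dictionary lookup per skill. This is a sketch under assumptions; `build_spc`, `lookup` and the candidates are hypothetical.

```python
from collections import defaultdict


def build_spc(candidates, props):
    """Offline index: skill -> tuple of property values -> candidate names."""
    spc = defaultdict(lambda: defaultdict(list))
    for cand in candidates:
        key = tuple(cand[p] for p in props)
        for skill in sorted(cand["skills"]):
            spc[skill][key].append(cand["name"])
    return spc


def lookup(spc, skill, values):
    """Candidates that hold `skill` and have exactly these property values."""
    return spc.get(skill, {}).get(values, [])


# Hypothetical candidates with one property (gender).
candidates = [
    {"name": "Ann", "skills": {"SP", "ES"}, "gender": "F"},
    {"name": "Ben", "skills": {"HS"}, "gender": "M"},
    {"name": "Cal", "skills": {"ES"}, "gender": "M"},
    {"name": "Dee", "skills": {"ES"}, "gender": "M"},
]
spc = build_spc(candidates, ["gender"])
print(lookup(spc, "ES", ("M",)))
```

Using `.get` for lookups (rather than indexing) avoids inserting empty entries into the `defaultdict` when a skill or value combination is absent.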

SLIDE 13

FPT Optimal Algorithm: Workflow

  • Initialization: calculate the optimal abstract templates and insert them into ℍ and 𝕄
  • Extract an abstract template A from 𝕄
  • Create the NEXT abstract templates from A; insert each one not already in ℍ into ℍ and 𝕄
  • Create the concrete templates from A
  • Check in 𝕊ℙℂ for candidates that satisfy all concrete templates (for all properties)
  • If found, STOP and return; otherwise, extract the next abstract template from 𝕄
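
The workflow can be sketched as a best-first search. The slides do not define how NEXT templates are derived, so this sketch assumes one concrete scheme: each property's abstract templates are pre-sorted by cost, and a NEXT template advances a single property to its next-cheapest count vector. Costs then never decrease as templates leave the heap, so the first template with matching candidates is optimal. `heapq` plays the role of the minHeap 𝕄, a Python set the role of ℍ, and a nested dict the role of 𝕊ℙℂ. The L1 cost, the helper names and the data are hypothetical, and the sketch does not require distinct candidates per skill.

```python
import heapq
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product


def count_vectors(k, size):
    """Abstract templates for one property: ways to split `size` slots over k values."""
    if k == 1:
        return [(size,)]
    return [(n,) + rest for n in range(size + 1) for rest in count_vectors(k - 1, size - n)]


def l1_cost(counts, target, size):
    """Assumed cost: L1 distance between count fractions and target fractions."""
    return sum(abs(Fraction(n, size) - t) for n, t in zip(counts, target))


def assignments(values, counts):
    """Concrete templates: distinct value sequences (one value per skill) matching counts."""
    pool = [v for v, n in zip(values, counts) for _ in range(n)]
    return sorted(set(permutations(pool)))


def fpt_search(skills, props, candidates):
    """props maps property -> (values, target fractions). Returns (cost, {skill: name})."""
    names, size = list(props), len(skills)
    # Offline index in the spirit of 𝕊ℙℂ: skill -> property-value tuple -> candidates.
    spc = defaultdict(lambda: defaultdict(list))
    for cand in candidates:
        for skill in cand["skills"]:
            spc[skill][tuple(cand[p] for p in names)].append(cand["name"])
    # Per-property abstract templates, cheapest first.
    ranked = [sorted((l1_cost(v, props[p][1], size), v)
                     for v in count_vectors(len(props[p][0]), size)) for p in names]
    start = (0,) * len(names)  # combination of each property's cheapest template
    heap = [(sum(r[0][0] for r in ranked), start)]  # minHeap 𝕄
    seen = {start}  # hashset ℍ of templates already generated
    while heap:
        cost, idx = heapq.heappop(heap)
        # NEXT templates (assumed scheme): advance one property by one rank.
        for i in range(len(idx)):
            nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
            if idx[i] + 1 < len(ranked[i]) and nxt not in seen:
                seen.add(nxt)
                nxt_cost = sum(ranked[j][k][0] for j, k in enumerate(nxt))
                heapq.heappush(heap, (nxt_cost, nxt))
        # Concrete templates for every property, checked against the index per skill.
        per_prop = [assignments(props[p][0], ranked[i][idx[i]][1]) for i, p in enumerate(names)]
        for combo in product(*per_prop):
            matches = [spc[skill].get(tuple(a[s] for a in combo))
                       for s, skill in enumerate(skills)]
            if all(matches):
                return cost, {skill: m[0] for skill, m in zip(skills, matches)}
    return None


# Hypothetical instance: one property (gender), target 1/3 male, 2/3 female.
skills = ["SP", "HS", "ES"]
props = {"gender": (("M", "F"), (Fraction(1, 3), Fraction(2, 3)))}
candidates = [
    {"name": "Ann", "skills": {"SP"}, "gender": "M"},
    {"name": "Ben", "skills": {"HS"}, "gender": "M"},
    {"name": "Cal", "skills": {"ES"}, "gender": "M"},
    {"name": "Dee", "skills": {"ES"}, "gender": "F"},
]
print(fpt_search(skills, props, candidates))
```

In this toy instance the optimal template (one male, two females) has no matching candidates, so the search moves on to cost-2/3 templates and returns Ann, Ben and Dee.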

SLIDE 14

Greedy Approximation Algorithm

  • Time complexity: $\mathrm{poly}(|S|, |C|)$
  • Uses a set of candidates per skill
  • Greedy solution: in each step, chooses an unchosen skill and a candidate with that skill that (locally) minimizes the distribution cost

[Figure: candidate sets per skill (SP, HS, ES), containing Bob, Chris, Jack, Denise and Alice]
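
A minimal sketch of the greedy step, assuming a single property (gender) and scoring each partial team by the L1 distance between its current value fractions and the target. This stands in for the benefit function, which the slides do not spell out; `greedy_team`, `l1_cost` and the data are hypothetical. Each round scans all (unchosen skill, candidate) pairs, giving $O(|S|^2 \cdot |C|)$ cost evaluations, consistent with $\mathrm{poly}(|S|, |C|)$.

```python
from collections import Counter
from fractions import Fraction


def l1_cost(team, target):
    """Assumed cost: L1 distance between the team's gender fractions and the target."""
    counts = Counter(member["gender"] for member in team)
    return sum(abs(Fraction(counts[v], len(team)) - t) for v, t in target.items())


def greedy_team(skills, candidates, target):
    """Each round, add the (unchosen skill, candidate) pair with the lowest cost."""
    by_name = {c["name"]: c for c in candidates}
    remaining, team, chosen = set(skills), [], {}
    while remaining:
        options = [(l1_cost(team + [c], target), skill, c["name"])
                   for skill in remaining for c in candidates if skill in c["skills"]]
        if not options:
            return None  # no candidate holds any remaining skill
        _, skill, name = min(options)
        chosen[skill] = name
        team.append(by_name[name])
        remaining.discard(skill)
    return l1_cost(team, target), chosen


# Hypothetical candidates and target (1/3 male, 2/3 female).
candidates = [
    {"name": "Ann", "skills": {"SP"}, "gender": "M"},
    {"name": "Cal", "skills": {"ES"}, "gender": "M"},
    {"name": "Dee", "skills": {"ES"}, "gender": "F"},
    {"name": "Eve", "skills": {"HS"}, "gender": "F"},
]
target = {"M": Fraction(1, 3), "F": Fraction(2, 3)}
result = greedy_team(["SP", "HS", "ES"], candidates, target)
print(result)
```

Ties are broken by skill name and then candidate name (tuple comparison in `min`), which keeps the result deterministic.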

SLIDE 15

Greedy Approximation Algorithm (cont.)

  • Optimizes a function called benefit, which is inversely proportional to the distribution cost
  • The benefit function is monotone and submodular; since one candidate is chosen per skill (a partition matroid constraint), the greedy algorithm is guaranteed to return a 1/2-approximation of the optimal solution

SLIDE 16

Experimentation

  • Tested scalability as a function of $|C|$, $|S|$, $|P|$ and the property range
  • Default values: $|S| = 8$, $|P| = 5$, $|C| = 100\text{K}$, property range $= 4$
  • Types of synthetic datasets:
    • TC1 (random assignment)
      • Property values: assigned randomly using a uniform distribution
      • Skills per candidate: between 1 and $|S|$ skills, chosen randomly for each candidate
    • TC2 (random assignment with 1 skill)
      • Property values: assigned randomly using a uniform distribution
      • Skills per candidate: each candidate is given 1 random skill
    • TC3 (skewed distribution with 2 skills)
      • Property values and skills (2 skills per candidate) are assigned using a skewed distribution
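
The three synthetic test cases can be mimicked with a small generator. The slides do not give the exact skewed distribution, so TC3 here assumes Zipf-like weights (the i-th item has weight 1/(i+1)), and "property range" is read as the number of distinct values per property. `make_dataset` and the naming scheme are hypothetical, and TC3 needs at least two skills.

```python
import random


def make_dataset(kind, n, skills, props, seed=0):
    """Hypothetical TC1/TC2/TC3-style generator; props maps property -> list of values."""
    rng = random.Random(seed)

    def skewed(items):
        # Assumed skew: the i-th item gets weight 1/(i+1) (Zipf-like).
        return rng.choices(items, weights=[1 / (i + 1) for i in range(len(items))])[0]

    data = []
    for i in range(n):
        if kind == "TC1":  # uniform values, 1..|S| random skills
            cand_skills = set(rng.sample(skills, rng.randint(1, len(skills))))
            values = {p: rng.choice(v) for p, v in props.items()}
        elif kind == "TC2":  # uniform values, exactly 1 random skill
            cand_skills = {rng.choice(skills)}
            values = {p: rng.choice(v) for p, v in props.items()}
        elif kind == "TC3":  # skewed values, 2 distinct skills drawn with skew
            cand_skills = set()
            while len(cand_skills) < 2:
                cand_skills.add(skewed(skills))
            values = {p: skewed(v) for p, v in props.items()}
        else:
            raise ValueError(f"unknown test case: {kind}")
        data.append({"name": f"c{i}", "skills": cand_skills, **values})
    return data


# Slide defaults, except a smaller |C|: |S| = 8, |P| = 5, property range = 4.
skills = [f"skill{j}" for j in range(8)]
props = {f"prop{j}": list(range(4)) for j in range(5)}
datasets = {kind: make_dataset(kind, 1000, skills, props) for kind in ("TC1", "TC2", "TC3")}
```

A fixed `seed` makes each dataset reproducible, which matters when comparing FPT and Greedy runs on the same input.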

SLIDE 17

Experimentation: Varying the Number of Skills

[Figure: running time (log scale, 0.001 to 1000) vs. number of skills (2 to 10), for FPT and Greedy on TC1, TC2 and TC3]

SLIDE 18

Experimentation: Varying the Number of Properties

[Figure: running time (log scale, 0.001 to 1000) vs. number of properties (3 to 7), for FPT and Greedy on TC1, TC2 and TC3]

SLIDE 19

Experimentation: Varying the Number of Candidates

[Figure: running time (log scale, 0.001 to 1000) vs. number of candidates (50,000 to 500,000), for FPT and Greedy on TC1, TC2 and TC3]

SLIDE 20

Experimentation: Varying the Property Range

[Figure: running time (log scale, 0.001 to 1000) vs. property range (3 to 6), for FPT and Greedy on TC1, TC2 and TC3]

SLIDE 21

Experimentation: Quality of Results (Greedy vs. FPT)

Test cases: TC1, TC2, TC3
  • Max diff: 0.25, 0.5
  • Average over all test cases: 0.01, 0.11
  • Average over test cases in which Greedy didn't return the optimal result: 0.25, 0.29

SLIDE 22

Conclusions

  • FPT Optimal Algorithm
    • Always returns an optimal result
    • Running time increases exponentially with the number of skills, the number of properties and the property range
    • Increasing the number of candidates doesn't affect running time (except when the data is skewed)
    • May take a long time to find the optimal solution (especially when the data is skewed)
    • Outperforms the Greedy Algorithm when there is little skew in the data
  • Greedy Approximation Algorithm
    • Performs well on all types of data
    • Returns results close to optimal

SLIDE 23

Questions?
