Online Collaborative Prediction of Regional Vote Results

Vincent Etter, Emtiyaz Khan, Matthias Grossglauser, Patrick Thiran. DSAA, October 17, 2016, Montréal, Canada.


SLIDE 1

Online Collaborative Prediction of Regional Vote Results

Vincent Etter, Emtiyaz Khan, Matthias Grossglauser, Patrick Thiran
DSAA, October 17, 2016, Montréal, Canada

SLIDE 2

Data Opportunity

  • Many countries adopt open government initiatives
  • Several datasets published
      • Demographics
      • State affairs
      • Votes and elections
  • Unique opportunity
      • Get a better understanding
      • Build tools useful to others

SLIDE 3

Voting Data

  • News agencies, political parties, and polling institutes are all interested in understanding voting behaviors

  • Will the next vote pass easily?
  • What makes two regions vote similarly?
  • Where should we focus our efforts?

SLIDE 4

Dataset

  • Vote results from Switzerland
  • Issue votes between 1981 and 2014
  • Outcome (% of “yes”) at the municipality level
  • 281 votes
      • 13 features: voting recommendations of the main parties
  • 2352 regions
      • 25 features: languages spoken, demographics, etc.

Data available at http://vincent.etter.io/dsaa16

SLIDE 5

Similarities Between Results

SLIDE 6

Online Predictions

  • On the day of the vote, regional results are released in sequence

  • Use published results to predict others
  • … and refine the prediction as more results are published?
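The idea of predicting unpublished regions from published ones can be illustrated with a small sketch. This is not the model from the talk: it is a plain ridge regression that expresses the current vote as a combination of past votes, fitted only on the regions published so far. The function name `predict_from_published`, the penalty `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def predict_from_published(Y_past, y_obs, observed_idx, lam=1.0):
    """Predict the current vote in every region from the published regions.

    Y_past:       (regions x past votes) outcomes of earlier votes, in % of "yes".
    y_obs:        current-vote outcomes in the regions published so far.
    observed_idx: row indices of those published regions.
    lam:          ridge penalty (hypothetical value, not taken from the slides).
    """
    n_regions = Y_past.shape[0]
    # Past votes plus an intercept column act as the regression features.
    X = np.hstack([Y_past, np.ones((n_regions, 1))])
    X_obs = X[observed_idx]
    # Ridge regression fitted on the published regions only.
    w = np.linalg.solve(X_obs.T @ X_obs + lam * np.eye(X.shape[1]), X_obs.T @ y_obs)
    return X @ w

# Synthetic low-rank data standing in for real vote results (an assumption).
rng = np.random.default_rng(0)
D, N, K = 200, 30, 3
V = rng.normal(size=(D, K))
Y_past = 50 + V @ rng.normal(scale=5, size=(K, N)) + rng.normal(size=(D, N))
y_new = 50 + V @ rng.normal(scale=5, size=K) + rng.normal(size=D)

# Reveal regions in a random order and refit as more results come in.
order = rng.permutation(D)
for k in (5, 20, 100):
    published, hidden = order[:k], order[k:]
    pred = predict_from_published(Y_past, y_new[published], published)
    rmse = np.sqrt(np.mean((pred[hidden] - y_new[hidden]) ** 2))
    print(f"{k:3d} regions published: RMSE on the rest = {rmse:.2f}")
```

Refitting from scratch at each step keeps the sketch short; an incremental update would be the natural next step if the refit became a bottleneck.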

SLIDE 7

Our Approach

  • Use a matrix-factorization model to capture the bi-clustering
  • Add region and vote features
      • Reduce the cold-start problem
      • More interpretable
  • Build the model incrementally to assess the effect of each component

SLIDE 8

Our Model

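$y_{dn} = z_{dn} + \epsilon$

$z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^\top u_n$

  • $\mu_n$: bias
  • $f_n(x_d)$: regression on region features
  • $f_d(w_n)$: regression on vote features
  • $v_d^\top u_n$: matrix factorization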
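To make the four terms concrete, the sketch below evaluates the prediction for every (region, vote) pair at once. It assumes both regressions are linear, with per-vote weights `beta` on region features and per-region weights `gamma` on vote features. The array names, the latent dimension `K`, and the random parameters are illustrative assumptions; only the shapes 2352, 281, 25 and 13 come from the dataset slide.

```python
import numpy as np

def predict_outcomes(mu, X, W, beta, gamma, V, U):
    """Evaluate z[d, n] = mu[n] + beta[n].X[d] + gamma[d].W[n] + V[d].U[n].

    mu:    (N,)   per-vote bias
    X:     (D, P) region features x_d
    W:     (N, Q) vote features w_n
    beta:  (N, P) per-vote weights on region features (linear f_n, an assumption)
    gamma: (D, Q) per-region weights on vote features (linear f_d, an assumption)
    V:     (D, K) region latent factors v_d
    U:     (N, K) vote latent factors u_n
    """
    bias = mu[None, :]             # broadcast over regions
    region_term = X @ beta.T       # (D, N)
    vote_term = gamma @ W.T        # (D, N)
    mf_term = V @ U.T              # (D, N)
    return bias + region_term + vote_term + mf_term

# Shapes from the dataset slide; K and all parameter values are made up.
rng = np.random.default_rng(0)
D, N, P, Q, K = 2352, 281, 25, 13, 10
z = predict_outcomes(
    mu=rng.normal(size=N),
    X=rng.normal(size=(D, P)),
    W=rng.normal(size=(N, Q)),
    beta=rng.normal(size=(N, P)),
    gamma=rng.normal(size=(D, Q)),
    V=rng.normal(size=(D, K)),
    U=rng.normal(size=(N, K)),
)
print(z.shape)  # (2352, 281)
```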

SLIDE 9

Our Models

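General form: $z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^\top u_n$

  • LIN(v): $z_{dn} = \mu_n + \gamma_d^\top w_n$
  • LIN(r): $z_{dn} = \mu_n + \beta_n^\top x_d$
  • LIN(r) + LIN(v): $z_{dn} = \mu_n + \beta_n^\top x_d + \gamma_d^\top w_n$
  • MF: $z_{dn} = \mu_n + v_d^\top u_n$
  • MF + LIN(r): $z_{dn} = \mu_n + v_d^\top u_n + \beta_n^\top x_d$
  • MF + GP(r): $z_{dn} = \mu_n + v_d^\top u_n + \mathrm{GP}(x_d)$
  • MF + GP(r) + LIN(v): $z_{dn} = \mu_n + v_d^\top u_n + \mathrm{GP}(x_d) + \gamma_d^\top w_n$

Hyperparameters: $\lambda_\beta, \lambda_\gamma, \lambda_u, \lambda_v$; $\theta, \sigma_s, \lambda_\gamma$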
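As an illustration of the MF variant, the sketch below fits the factors with alternating ridge regressions and then estimates the bias and factor of a new vote from a handful of published regions. This is not the authors' implementation: the slides do not state the training procedure, and treating `lam_u` and `lam_v` as L2 penalties is an assumption. The function names `fit_mf` and `predict_new_vote`, the rank, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_mf(Y, rank=2, lam_u=1.0, lam_v=1.0, n_iters=20, seed=0):
    """Fit Y[d, n] ~ mu[n] + V[d] @ U[n] by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n_regions, _ = Y.shape
    mu = Y.mean(axis=0)                  # per-vote bias mu_n
    R = Y - mu                           # residual left for the factorization
    V = rng.normal(scale=0.1, size=(n_regions, rank))
    eye = np.eye(rank)
    for _ in range(n_iters):
        # Fix V: ridge regression for all vote factors u_n at once.
        U = np.linalg.solve(V.T @ V + lam_u * eye, V.T @ R).T
        # Fix U: ridge regression for all region factors v_d at once.
        V = np.linalg.solve(U.T @ U + lam_v * eye, U.T @ R.T).T
    return mu, V, U

def predict_new_vote(V, y_obs, observed_idx, lam_u=1.0):
    """Estimate mu_n and u_n from published regions, then predict every region.

    Requires at least one published region.
    """
    k, rank = len(observed_idx), V.shape[1]
    A = np.hstack([np.ones((k, 1)), V[observed_idx]])   # intercept plus factors
    penalty = lam_u * np.eye(rank + 1)
    penalty[0, 0] = 0.0                                  # do not shrink the bias
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y_obs)
    mu_n, u_n = coef[0], coef[1:]
    return mu_n + V @ u_n

# Synthetic rank-2 data standing in for past vote results (an assumption).
rng = np.random.default_rng(0)
Y_train = 50 + 5 * rng.normal(size=(300, 2)) @ rng.normal(size=(2, 40))
mu, V, U = fit_mf(Y_train, rank=2)
y_new = 45 + V @ np.array([3.0, -1.0])                   # a new vote in the same model
published = rng.permutation(300)[:15]
pred = predict_new_vote(V, y_new[published], published)
print(np.sqrt(np.mean((pred - y_new) ** 2)))             # RMSE over all regions
```

Only the vote-side parameters are re-estimated on the day of the vote; the region factors `V` stay fixed from training, which is what lets a few published regions inform all the others.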

SLIDE 10

Performance Evaluation

  • Last 50 votes as test data
  • Simulate 500 random reveal orders
      • Last 10% of regions as test regions
      • Observe an increasing number of regions
      • Predict the results of the test regions
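The protocol above can be sketched for a single test vote as follows. Several details are assumptions rather than facts from the slides: that "last 10% of regions" means the last regions of each random order, that the reported figure is the per-order RMSE averaged over orders, and the checkpoint values. `evaluate_reveal_orders` and `mean_predictor` are hypothetical names, and the mean predictor is only a placeholder for a real model.

```python
import numpy as np

def mean_predictor(y_obs, observed_idx, test_idx):
    """Placeholder model: predict the mean of the published results everywhere."""
    return np.full(len(test_idx), y_obs.mean())

def evaluate_reveal_orders(y, predict, n_orders=500, test_frac=0.1,
                           checkpoints=(1, 10, 100), seed=0):
    """Average test RMSE after k revealed regions, over random reveal orders.

    y:       (D,) outcomes of one test vote, in % of "yes".
    predict: callable (y_obs, observed_idx, test_idx) -> predictions for test_idx.
    """
    rng = np.random.default_rng(seed)
    n_regions = y.shape[0]
    n_test = max(1, int(round(test_frac * n_regions)))
    rmse = {k: [] for k in checkpoints}
    for _ in range(n_orders):
        order = rng.permutation(n_regions)
        test_idx = order[-n_test:]          # last regions of the order are held out
        reveal = order[:-n_test]
        for k in checkpoints:
            if k > len(reveal):
                continue                    # not enough regions to reveal
            observed = reveal[:k]
            pred = predict(y[observed], observed, test_idx)
            rmse[k].append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
    return {k: float(np.mean(v)) for k, v in rmse.items() if v}

# Synthetic outcomes standing in for one test vote (an assumption).
rng = np.random.default_rng(1)
y_vote = np.clip(50 + 10 * rng.normal(size=500), 0, 100)
for k, err in evaluate_reveal_orders(y_vote, mean_predictor, n_orders=50).items():
    print(f"{k:4d} observed regions: RMSE = {err:.2f}")
```

Passing the model as a callable keeps the protocol independent of the predictor, so the same loop can score each model variant.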

SLIDE 11

Results

[Figure: RMSE on the last 10% of regions (in %, y-axis from 5 to 13) against the number of observed regions (log scale, 10^0 to 10^3). Successive builds of the slide add the curves LIN(r), MF, and MF + LIN(r).]

SLIDE 12

Bayesian vs. Non-Bayesian

[Figure: RMSE on the last 10% of regions (in %, y-axis from 5 to 13) against the number of observed regions (log scale, 10^0 to 10^3). Successive builds of the slide show MF + LIN(r), then add MF + GP(r).]

SLIDE 13

Final Model

[Figure: RMSE on the last 10% of regions (in %, y-axis from 5 to 13) against the number of observed regions (log scale, 10^0 to 10^3). Successive builds of the slide show LIN(v) and MF + GP(r), then add MF + GP(r) + LIN(v).]

SLIDE 14

Interpretation

[Figure: bar chart of the relative importance (0.0 to 1.0) of the 25 region features, listed in the order shown on the slide: Speaks Italian, Speaks German, Population, Jobs, Speaks Romansh, Population density, Speaks French, Age 65+, Social aid, Foreigners, Election PEV, Election FDP, Election GL, Election Greens, Age 0-19, Election SP, Elevation, Election PST, Election other right, Age 20-64, Election SVP, Election BDP, Election CVP, y, x. Annotation on the slide: Röstigraben.]

SLIDE 15

Summary

  • Individual models have different strengths
      • Vote-feature regression for cold start
      • Region features and bi-clustering when more observations are available
  • Bayesian methods are useful
      • Proper hyperparameter setting
  • Accurate and interpretable results

SLIDE 16

Thank you!

Code and data available at http://vincent.etter.io/dsaa16

Any questions?
