Using Social Media for Health Studies, Ingmar Weber (PowerPoint presentation)

slide-1
SLIDE 1

Using Social Media for Health Studies

Ingmar Weber
Social Computing, Qatar Computing Research Institute
@ingmarweber

slide-2
SLIDE 2

My Journey

slide-3
SLIDE 3

My Journey

slide-4
SLIDE 4

My Journey

slide-5
SLIDE 5

My Journey

slide-6
SLIDE 6

My Journey

slide-7
SLIDE 7

My Journey

slide-8
SLIDE 8

My Journey

slide-9
SLIDE 9

My Journey

slide-10
SLIDE 10

My Journey

Treat all correlations in this presentation with caution

slide-11
SLIDE 11

Social Media and Healthcare

slide-12
SLIDE 12

Social Media and Healthcare

slide-13
SLIDE 13

Social Media and Healthcare

slide-14
SLIDE 14

Social Media and Healthcare

slide-15
SLIDE 15

Social Media and Healthcare

Using Social Media as a Communication Channel

slide-16
SLIDE 16

Social Media as a Data Source

  • Part 1: Three Example Studies

– Twitter Flu Trend
– Lifestyle and Correlates of Health
– Studying Obesity Through Food Tweets

  • Part 2: Opportunities and Challenges

– Image Analysis
– Network Influence
– Social Media Meets Quantified Self
– Interventions for Individual Health

slide-17
SLIDE 17

Classification of Health Research

Columns: Acute condition (short-term concerns) | Chronic condition (long-term concerns)

Public health (population-centric; campaigns + policies)

– Acute: influenza tracking, flu trends, disease outbreaks, …
– Chronic: obesity trends, diabetes, alcohol consumption, HIV, …

Individual health (individual-centric; treatment + therapies)

– Acute: Nothing?
– Chronic: SM forums/messages as interventions

slide-22
SLIDE 22

Why Bother with Social Media?

  • Lots of it

– Often also across countries

  • Cheap to collect

– Keyword/geographic-based collection standard

  • (Semi-)Longitudinal data

– Last 3,200 tweets, more for money

  • Social network data

– Usually not part of surveys

  • Lifestyle data

– Lifestyle diseases, public health

Later: Not

slide-23
SLIDE 23

Example 1:

National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic

David Broniatowski, Michael Paul, Mark Dredze. PLOS ONE, Dec 2013

slide-24
SLIDE 24

Using Google to Track Flu Epidemics

slide-25
SLIDE 25

Using Google to Track Flu Epidemics

slide-26
SLIDE 26

Using Google to Track Flu Epidemics

slide-27
SLIDE 27

Using Google to Track Flu Epidemics

Can Twitter give a

  • more transparent prediction?
  • more robust prediction (re context)?
slide-28
SLIDE 28

Can We Do it (Better?) With Twitter?

  • Many people have tried

– 40+ papers on the topic

  • Typically a straightforward setup

– Collect Twitter data for a set of keywords (fever, …)
– Do some post-filtering (Saturday Night Fever)
– Show temporal correlation/predictive power

  • Major weaknesses

– Only work with a single flu season
– Done in retrospect (hard to get historical data)
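
The typical setup above can be sketched in a few lines. This is only an illustration, not the method of any particular paper: the keyword set, the post-filter phrase list, and the function names are all assumptions made for the example.

```python
from math import sqrt

# Assumed keyword list and post-filter phrases (illustrative only).
FLU_KEYWORDS = {"flu", "fever", "influenza"}
FALSE_POSITIVE_PHRASES = ("saturday night fever",)


def is_flu_tweet(text):
    """Keyword match followed by a simple phrase-based post-filter."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in FALSE_POSITIVE_PHRASES):
        return False
    tokens = {tok.strip(".,!?#\"'") for tok in lowered.split()}
    return bool(FLU_KEYWORDS & tokens)


def weekly_counts(tweets, n_weeks):
    """Count matching tweets per week; tweets is an iterable of (week_index, text)."""
    counts = [0] * n_weeks
    for week, text in tweets:
        if is_flu_tweet(text):
            counts[week] += 1
    return counts


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

The weekly counts would then be compared against an official surveillance series with pearson_r(counts, official_rates), where official_rates is a placeholder for whatever ground-truth series is available.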

slide-29
SLIDE 29

Recent Breakthrough?

slide-30
SLIDE 30

How It Works

slide-31
SLIDE 31

How It Works

Tokens + SVM

slide-32
SLIDE 32

How It Works

Tokens + SVM

  • Word classes (noun, …)
  • RT, @, emoticons
  • Part-of-speech tagging
  • Verb phrases
  • Pairs with pronouns
  • Verb-noun pairs
  • …

Log-linear w/ L2 regularization

slide-35
SLIDE 35

How It Works

Tokens + SVM

  • Word classes (noun, …)
  • RT, @, emoticons
  • Part-of-speech tagging
  • Verb phrases
  • Pairs with pronouns
  • Verb-noun pairs
  • …

Log-linear w/ L2 regularization

US-level: r = 0.93, p < .001
NYC-level: r = 0.88, p < .001

slide-36
SLIDE 36

Example 2:

Modeling the Impact of Lifestyle on Health at Scale

Adam Sadilek, Henry Kautz. WSDM’13

slide-37
SLIDE 37

Geo-Tagged “Sick” Tweets from NYC

slide-38
SLIDE 38

Geo-Tagged “Sick” Tweets from NYC

What determines how healthy/sick a person is?

  • Socio-economic variables?
  • Social status?
  • Mobility patterns?
slide-39
SLIDE 39

Data Collection

  • May 19 – June 19, 2010
  • Periodically queried Twitter for tweets within r = 100 km of NYC

– Re Twitter streaming API?

  • 16 million tweets, 630k unique users
  • 6,237 users with 100+ geo-tagged tweets
slide-40
SLIDE 40

Sick-or-Not SVM Classifier

  • Cast to lower case & basic “cleaning”
  • Extract uni-, bi- and tri-grams
  • 5 MT workers label “sick” or “other”
  • Train an SVM
  • .98 precision, .97 recall (class distribution?)
  • Convert SVM output to probability (Platt?)
  • Probability of u’s message being “sick”
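
A minimal sketch of the feature extraction and the probability step, assuming one plausible reading of “basic cleaning” and assuming Platt scaling (the slide itself only guesses at Platt). The function names and the parameters a and b are hypothetical; in practice a and b are fitted on held-out SVM scores.

```python
import math
import re


def clean(text):
    """Lower-case, drop URLs and punctuation (an assumed notion of 'basic cleaning')."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z0-9@#' ]+", " ", text)
    return text.split()


def ngrams(tokens, max_n=3):
    """Return all uni-, bi- and tri-grams as space-joined strings."""
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def platt_probability(score, a, b):
    """Platt scaling: P(sick | score) = 1 / (1 + exp(a * score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))
```

With a negative a, larger SVM scores map to higher “sick” probabilities, e.g. platt_probability(2.0, -1.0, 0.0) is above 0.5.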
slide-41
SLIDE 41

Discriminative Features

slide-42
SLIDE 42

Variables to Study

  • “Physical encounters”

– <100 m within 1, 4, 24 hours

  • Sick friends (mutual following)
  • 25k Google Places

– Bars, night clubs, transit stations, parks, gyms
– Tweeting within 100 m of venue

  • Pollution
  • Socio-economic indicators

Predict PS using these variables
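
The “physical encounter” variable can be illustrated as follows. This is a sketch under assumptions: the tuple layout (unix seconds, latitude, longitude), the function names, and the use of the haversine formula are choices made for the example, not details taken from the paper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    h = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def is_encounter(a, b, max_dist_m=100, max_hours=1):
    """a and b are geo-tagged tweets as (unix_seconds, lat, lon) tuples.

    Two tweets count as an encounter if they are less than max_dist_m apart
    and were posted within max_hours of each other.
    """
    close_in_time = abs(a[0] - b[0]) <= max_hours * 3600
    close_in_space = haversine_m(a[1], a[2], b[1], b[2]) < max_dist_m
    return close_in_time and close_in_space
```

Calling is_encounter with max_hours set to 1, 4, or 24 gives the three time windows listed on the slide.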

slide-43
SLIDE 43

Correlation With Health (-PS)

slide-44
SLIDE 44

Grouped by Variable Class

slide-45
SLIDE 45

Example 3:

You Tweet What You Eat: Studying Food Consumption Through Twitter

Sofiane Abbar, Yelena Mejova, Ingmar Weber. CHI’15

slide-46
SLIDE 46

“Pointless Babble” == Great Data!

“Twitter Study Reveals Interesting Results About Usage: 40% is Pointless Babble” (Pear Analytics, 2009)

slide-47
SLIDE 47

“Pointless Babble” == Great Data!

“Twitter Study Reveals Interesting Results About Usage: 40% is Pointless Babble” (Pear Analytics, 2009)

Can we use food tweets to study obesity patterns?

slide-48
SLIDE 48

Data Collection

  • Streaming API filter for “eat”, “cook”, “lunch”, …
  • Collect 50M tweets during Nov 2013
  • 892K geo-tagged tweets from 400K users

– Use (lat, long) to map to ZIP and census data
– Get data for 210K random user subset

  • 3,200 public tweets, profile, friends, followers
  • 503M tweets, 32M distinct friends
  • Label eat-co-occurring terms as “is food”

– 460 uni- and bigrams with mapping to calories
– Pizza 478, fruit salad 99, … [link]

  • Average calories for users
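
A rough sketch of the per-user average. Only the two calorie values shown on the slide (pizza 478, fruit salad 99) are used; the matching logic (bigrams tried before unigrams) and the function names are assumptions for illustration.

```python
# Calorie table: the two entries come from the slide; the real table has 460 terms.
FOOD_CALORIES = {"pizza": 478, "fruit salad": 99}


def food_mentions(text, table=FOOD_CALORIES):
    """Return the food terms found in a tweet, trying bigrams before unigrams."""
    tokens = [tok.strip(".,!?#") for tok in text.lower().split()]
    found = []
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in table:
            found.append(bigram)
            i += 2
        elif tokens[i] in table:
            found.append(tokens[i])
            i += 1
        else:
            i += 1
    return found


def average_calories(tweets, table=FOOD_CALORIES):
    """Mean calorie value over all food mentions in a user's tweets, or None if there are none."""
    values = [table[food] for text in tweets for food in food_mentions(text, table)]
    return sum(values) / len(values) if values else None
```

For a user whose tweets mention pizza once and fruit salad once, the average is (478 + 99) / 2.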
slide-49
SLIDE 49

Calories vs. Obesity

slide-50
SLIDE 50

Calories vs. Obesity

slide-51
SLIDE 51

Zooming-In to Counties

  • Try to predict county-level obesity

– avCal
– Food names
– LIWC categories (re Culotta’14)
– Demographic

  • Ridge regression with 5-fold cross validation
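
To make the evaluation setup concrete, here is a simplified sketch of ridge regression with 5-fold cross validation. It assumes a single feature (for example avCal) so the closed form stays short; the study uses many features, and the fold assignment by index, the function names, and the lam parameter are illustrative choices.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for one feature with an unpenalised intercept.

    Minimises sum((w * x + b - y)^2) + lam * w^2.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / (s_xx + lam)
    return w, mean_y - w * mean_x


def cross_validated_mse(xs, ys, lam, k=5):
    """Average held-out mean squared error over k folds (fold = index mod k)."""
    pairs = list(zip(xs, ys))
    fold_errors = []
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        w, b = fit_ridge_1d([x for x, _ in train], [y for _, y in train], lam)
        fold_errors.append(sum((w * x + b - y) ** 2 for x, y in test) / len(test))
    return sum(fold_errors) / k
```

In practice lam would be chosen by comparing cross_validated_mse across a grid of candidate values.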
slide-52
SLIDE 52

Prediction Performance

slide-53
SLIDE 53

Social Network Effects

  • Call a user in predicted top 10% “active”
slide-54
SLIDE 54

Example n:

Lots of Studies

Lots of People

Lots of Venues

slide-55
SLIDE 55

More Example Domains

  • Finding Adverse Drug Reactions (ADRs)
  • Tracking mental health
  • Dedicated social media such as forums
  • Social media for health communication
slide-56
SLIDE 56

Research Opportunities And Challenges

slide-57
SLIDE 57

Opportunity 1: Mining Social Media Images

slide-58
SLIDE 58

Opportunity 1: Mining Social Media Images

slide-59
SLIDE 59

Opportunity 1: Mining Social Media Images

slide-60
SLIDE 60

Opportunity 1: Mining Social Media Images

slide-61
SLIDE 61

Opportunity 1: Mining Social Media Images

slide-62
SLIDE 62

Opportunity 1: Mining Social Media Images

  • Helps to model variation in “excessive drinking”

– Contact me for submission (under review)

slide-63
SLIDE 63

Opportunity 2: Network Influence

slide-64
SLIDE 64

Opportunity 2: Network Influence

A person's chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval. Among pairs of adult siblings, if one sibling became obese, the chance that the other would become obese increased by 40% (95% CI, 21 to 60). If one spouse became obese, the likelihood that the other spouse would become obese increased by 37% (95% CI, 7 to 73).

slide-66
SLIDE 66

Opportunity 2: Network Influence

slide-67
SLIDE 67

Opportunity 2: Network Influence

slide-68
SLIDE 68

Opportunity 2: Network Influence

slide-69
SLIDE 69

Opportunity 2: Network Influence

At the heart of the dispute is an old conundrum in social science: How certain can anyone be about conclusions based on observations of how people behave?

slide-70
SLIDE 70

Opportunity 2: Network Influence

  • No randomized controlled trial (RCT)

– Only observational data

  • Hard to tease apart

– Homophily: friends are similar to you
– Environment: friends are exposed to similar factors
– Social influence: friends make you similar

  • Possible solution: Natural experiments

– Weather?
– Local campaigns?

slide-71
SLIDE 71

Opportunity 3: Social Media Meets Quantified Self

slide-72
SLIDE 72

Opportunity 3: Social Media Meets Quantified Self

slide-73
SLIDE 73

Opportunity 3: Social Media Meets Quantified Self

slide-74
SLIDE 74

Opportunity 3: Social Media Meets Quantified Self

slide-75
SLIDE 75

Opportunity 3: Social Media Meets Quantified Self

slide-76
SLIDE 76

Opportunity 3: Social Media Meets Quantified Self

slide-77
SLIDE 77

Opportunity 4: Information for Individual Health

slide-78
SLIDE 78

Opportunity 4: Information for Individual Health

slide-79
SLIDE 79

Opportunity 4: Information for Individual Health

slide-80
SLIDE 80

Opportunity 4: Information for Individual Health

slide-81
SLIDE 81

Challenges

  • Ethical

– Big Brother
– “Informed” Consent

  • Attitudinal

– Getting medical doctors to listen
– “Social Media Cures Cancer”

  • Data quality

– Selection bias: Who’s on Social Media? Who’s using QS?
– Reporting bias: Who tweets about food? About STDs?

  • Lack of individual level ground truth

– Who has the flu? Who is obese? Who is smoking?

  • Having interventions

– So far only communication-based interventions
– A/B testing on the “inside”

slide-82
SLIDE 82

Twitter For Sociological Studies

slide-83
SLIDE 83

Interested? - We’re hiring!

  • Interns (all year round)
  • Postdocs
  • Scientists
  • Engineers

Talk to me about “life in the desert”.

slide-84
SLIDE 84

Tack! Thanks!