We will start at 2:05 pm! Thanks for coming early!



slide-1
SLIDE 1

We will start at 2:05 pm! Thanks for coming early!

slide-2
SLIDE 2
  • 1. Value of visualization
  • 2. Design principles
  • 3. Graphical perception

Fundamental

Yesterday

slide-3
SLIDE 3

Record Information

slide-4
SLIDE 4

Support Analytical Reasoning

slide-5
SLIDE 5

Communicate Information to Others

slide-6
SLIDE 6
  • 1. Value of visualization
  • 2. Design principles
  • 3. Graphical perception

Fundamental

Yesterday

slide-7
SLIDE 7

Bar chart baselines should start at 0!

34 35% 39.6%

Graphical Integrity

slide-8
SLIDE 8

Lie Factor = (Size of effect shown in graphic) / (Size of effect in data)
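A minimal sketch of the lie factor computation, assuming "size of effect" means the relative change between two values. The function names and the chart numbers are hypothetical and do not come from the slides.

```python
def relative_change(first: float, second: float) -> float:
    """Relative size of an effect: |second - first| / |first|."""
    if first == 0:
        raise ValueError("relative change is undefined when the first value is 0")
    return abs(second - first) / abs(first)


def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return relative_change(shown_first, shown_second) / relative_change(data_first, data_second)


# Hypothetical bar chart: the data go from 100 to 110 (a 10% increase),
# but a truncated baseline draws the bars 20 px and 60 px tall (a 200% increase).
factor = lie_factor(shown_first=20, shown_second=60, data_first=100, data_second=110)
print(f"Lie factor: {factor:.1f}")  # Lie factor: 20.0
```

A lie factor near 1 means the graphic shows the effect at about its true size; values far from 1 mean the effect is exaggerated or understated.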

slide-9
SLIDE 9

Maximize Data-Ink Ratio

slide-10
SLIDE 10

Useful chart junks?

slide-11
SLIDE 11

Problem with Pie Charts

slide-12
SLIDE 12

World’s Most Accurate Pie Chart

slide-13
SLIDE 13

Problem with Rainbow Colormap

39% 71% 10.2 sec/region 5.6 sec/region

[M. Borkin et al 2011]

slide-14
SLIDE 14

Problem with 3D Charts

71% 91% 2.4 sec/region 5.6 sec/region

[M. Borkin et al 2011]

slide-15
SLIDE 15
  • 1. Value of visualization
  • 2. Design principles
  • 3. Graphical perception

Fundamental

Yesterday

slide-16
SLIDE 16

Signal Detection

Which is brighter? A B

slide-17
SLIDE 17

Magnitude Estimation

A B

slide-18
SLIDE 18

Pre-attentive processing

1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686

How Many 3’s?

slide-19
SLIDE 19

Gestalt Principles

Color Similarity Connection lines

slide-20
SLIDE 20

Separability vs. Integrality

What we perceive:

Position + Hue (Color): fully separable (2 groups each)
Size + Hue (Color): some interference (2 groups each)
Width + Height: some/significant interference (3 groups total: integral area)
Red + Green: major interference (4 groups total: integral hue)

[Tamara Munzner 14]

slide-21
SLIDE 21

Change Blindness

http://www.psych.ubc.ca/~rensink/flicker/download/

slide-22
SLIDE 22
  • 1. Data model and visual encoding
  • 2. Exploratory data analysis
  • 3. Storytelling with data
  • 4. Advanced visualizations

Practical

Today

slide-23
SLIDE 23

Data Model & Visual Encoding

Nam Wook Kim Mini-Courses — January @ GSAS 2018

slide-24
SLIDE 24

Goal

Learn how data is mapped to image

slide-25
SLIDE 25

The Big Picture

  • Analysis task: identify, compare, summarize
  • Data: conceptual model, data model
  • Domain: goals, questions, assumptions
  • Visual encoding: mapping from data to image
  • Image: marks & channels
  • Processing algorithms: data transformation

[Slides from J. Heer]

slide-26
SLIDE 26

Topics

  • Data Models
  • Image Models
  • Visual Encoding
  • Formalizing Design
slide-27
SLIDE 27

Data Models

slide-28
SLIDE 28

Data Models/Conceptual Models

  • Conceptual Models are mental constructions of the domain.
    They include semantics and support reasoning.
  • Data Models are formal descriptions of the data.
    They derive from a conceptual model and include dimensions & measures.
  • Examples (data vs. conceptual)
    Decimal number vs. temperature
    Longitude, latitude vs. geographic location

slide-29
SLIDE 29

Taxonomy of Datasets

  • 1D (sets and sequences)
  • Temporal
  • 2D (maps)
  • 3D (shapes)
  • nD (relational)
  • Trees (hierarchies)
  • Networks (graphs)
  • and combinations…

[Shneiderman 96]

slide-30
SLIDE 30

Data (Measurement) Scales

N: Nominal
O: Ordinal
Q: Quantitative

slide-31
SLIDE 31

Data Scales

N: Nominal (labels or categories)
    Fruits: apples, oranges, ...

slide-32
SLIDE 32

Data Scales

N: Nominal (labels or categories)
    Fruits: apples, oranges, ...
O: Ordinal
    Rankings: 1st, 2nd, 3rd…

slide-33
SLIDE 33

Data Scales

N: Nominal (labels or categories)
    Fruits: apples, oranges, ...
O: Ordinal
    Rankings: 1st, 2nd, 3rd…
Q: Quantitative
    Interval (location of zero arbitrary)
        Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
        Only differences (i.e. intervals) are compared

slide-34
SLIDE 34

Data Scales

N: Nominal (labels or categories)
    Fruits: apples, oranges, ...
O: Ordinal
    Rankings: 1st, 2nd, 3rd…
Q: Quantitative
    Interval (location of zero arbitrary)
        Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
        Only differences (i.e. intervals) are compared
    Ratio (zero fixed)
        Physical measurement: length, amounts, counts
        Allow direct comparisons like twice as long

slide-35
SLIDE 35

Data Scales

Operations on nominal (N) data: =, ≠

slide-36
SLIDE 36

Data Scales

Operations on ordinal (O) data: =, ≠, <, >

slide-37
SLIDE 37

Data Scales

Operations on interval (Q) data: =, ≠, <, >, −

Can measure distances or spans

slide-38
SLIDE 38

Data Scales

N: Nominal (labels or categories)
    Fruits: apples, oranges, ...
    Operations: =, ≠
O: Ordinal
    Rankings: 1st, 2nd, 3rd…
    Operations: =, ≠, <, >
Q: Quantitative
    Interval (location of zero arbitrary)
        Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
        Only differences (i.e. intervals) are compared
        Operations: =, ≠, <, >, −
        Can measure distances or spans
    Ratio (zero fixed)
        Physical measurement: length, amounts, counts
        Allow direct comparisons like twice as long
        Operations: =, ≠, <, >, −, / (%)
        Can measure ratios or proportions
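The scale-to-operation mapping built up over the last few slides can be written down as a lookup table. This is only an illustrative sketch: the names `SCALE_OPERATIONS` and `supports` are hypothetical, and ASCII "-" and "/" stand in for the minus and division symbols.

```python
# Operations that are meaningful on each measurement scale.
# Each scale inherits the operations of the one before it.
SCALE_OPERATIONS = {
    "nominal": {"=", "≠"},
    "ordinal": {"=", "≠", "<", ">"},
    "interval": {"=", "≠", "<", ">", "-"},
    "ratio": {"=", "≠", "<", ">", "-", "/"},
}


def supports(scale: str, operation: str) -> bool:
    """Return True if the operation is meaningful on the given scale."""
    return operation in SCALE_OPERATIONS[scale]


# Dates are interval data: differences make sense, ratios do not.
print(supports("interval", "-"))  # True
print(supports("interval", "/"))  # False
```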

slide-39
SLIDE 39

Example

Conceptual Model: Temperature (°C)
Data Model: 32.5, 54.0, -17.3, ... (decimal numbers)
Data Scales:
    Temperature Value (Q)
    Burned vs. Not-Burned (N), derived
    Hot, Warm, Cold (O), derived
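A small sketch of how the derived scales in this example might be computed from the quantitative values. The thresholds (10, 40, and 50 °C) and the function names are assumptions made for illustration; the slide does not specify them.

```python
def to_ordinal(celsius: float, cold_below: float = 10.0, hot_above: float = 40.0) -> str:
    """Derive an ordinal label (Cold < Warm < Hot) from a temperature.

    The cut-off values are hypothetical.
    """
    if celsius < cold_below:
        return "Cold"
    if celsius > hot_above:
        return "Hot"
    return "Warm"


def to_nominal(celsius: float, burn_threshold: float = 50.0) -> str:
    """Derive a nominal label from a temperature (hypothetical threshold)."""
    return "Burned" if celsius >= burn_threshold else "Not-Burned"


temperatures = [32.5, 54.0, -17.3]                    # Q: the data model
ordinal = [to_ordinal(t) for t in temperatures]       # O: derived
nominal = [to_nominal(t) for t in temperatures]       # N: derived
print(ordinal)
print(nominal)
```

Each derivation discards information: the ordinal labels keep only the order, and the nominal labels keep only category membership.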

slide-40
SLIDE 40

Dimensions & Measures

Dimensions (~ independent variables)
    Often discrete variables describing data (N, O)
    Categories, dates, binned quantities
Measures (~ dependent variables)
    Continuous values that can be aggregated (Q)
    Numbers to be analyzed
    Aggregate as sum, count, average, std. dev…
Not a strict distinction. The same variable may be treated either way depending on the task (e.g. Year: 2001, 2002 …).
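To make the distinction concrete, here is a minimal sketch that groups records by a dimension and aggregates a measure. The records and the `aggregate` helper are hypothetical. The last call treats Year as a measure instead of a dimension.

```python
from collections import defaultdict

# Hypothetical records: year and region act as dimensions, sales as a measure.
records = [
    {"year": 2001, "region": "East", "sales": 120},
    {"year": 2001, "region": "West", "sales": 80},
    {"year": 2002, "region": "East", "sales": 150},
    {"year": 2002, "region": "West", "sales": 90},
]


def aggregate(rows, dimension, measure, how=sum):
    """Group rows by a dimension and reduce the measure within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: how(values) for key, values in groups.items()}


print(aggregate(records, "year", "sales"))             # Year as a dimension
print(aggregate(records, "region", "sales", how=len))  # count per region
print(aggregate(records, "region", "year", how=max))   # Year as a measure
```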

slide-41
SLIDE 41

Example: U.S. Census Data

slide-42
SLIDE 42

Year: 1850 – 2000 (every decade)
Age: 0 – 90+
Marital Status: Single, Married, Divorced, …
Sex: Male, Female
People Count: # of people in group

2,348 data points

U.S. Census Data

slide-43
SLIDE 43

U.S. Census Data

Year: Q-Interval (O)
Age: Q-Ratio (O)
Marital Status: N
Sex: N
People Count: Q-Ratio

slide-44
SLIDE 44

U.S. Census Data

Year: Depends!
Age: Depends!
Marital Status: Dimension
Sex: Dimension
People Count: Measure

slide-45
SLIDE 45

Image Models

slide-46
SLIDE 46

Visual Language is a Sign System

Images perceived as a set of signs
Sender encodes information in signs
Receiver decodes information from signs

Jacques Bertin, cartographer [1918-2010]
Semiology of Graphics, 1967

slide-47
SLIDE 47

Image Models

Marks: basic graphical elements in an image; represent information
    Points, Lines, Areas
Channels (visual variables): control the appearance of marks; encode information
    Position, Size, Value, Texture, Color, Orientation, Shape

slide-48
SLIDE 48

Coding Information in Position

  • 1. A, B, C are distinguishable
  • 2. B is between A and C.
  • 3. BC is twice as long as AB.

∴ Encode quantitative variables (Q)

“Resemblance, order and proportion are the three signifieds in graphics.” (Bertin)

slide-49
SLIDE 49

Coding Information in Color and Value

Value (lightness) is perceived as ordered
    ∴ Encode ordinal variables (O) [better]
    ∴ Encode continuous variables (Q)
Hue is normally perceived as unordered
    ∴ Encode nominal variables (N)

slide-50
SLIDE 50

Bertin’s Levels of Organization

Position: N O Q
Size: N O Q
Value: N O Q
Texture: N
Color: N
Orientation: N
Shape: N

N = Nominal, O = Ordinal, Q = Quantitative
Note: Q ⊂ O ⊂ N
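The table on this slide can be treated as a lookup from channel to the data types it can carry. The sketch below transcribes the levels as they appear in this transcript; the names `BERTIN_LEVELS` and `channels_for` are hypothetical.

```python
# Levels of organization per visual variable, as transcribed from the slide.
BERTIN_LEVELS = {
    "position": "NOQ",
    "size": "NOQ",
    "value": "NOQ",
    "texture": "N",
    "color": "N",
    "orientation": "N",
    "shape": "N",
}


def channels_for(data_type: str) -> list[str]:
    """List the channels whose level of organization covers the data type."""
    return [channel for channel, levels in BERTIN_LEVELS.items() if data_type in levels]


print(channels_for("Q"))  # ['position', 'size', 'value']
print(channels_for("N"))  # every channel can distinguish categories
```

The note Q ⊂ O ⊂ N shows up in the output: any channel that supports Q also supports O and N.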

slide-51
SLIDE 51

Mackinlay’s Ranking

Expanded Bertin’s variables and conjectured effectiveness of encodings by data type.

Jock D. Mackinlay Vice President Tableau Software

[Mackinlay 86]

slide-52
SLIDE 52

Effectiveness Rankings

QUANTITATIVE (most to least effective)
Position, Length, Angle, Slope, Area (Size), Volume, Density (Value), Color Sat, Color Hue, Texture, Connection, Containment, Shape

ORDINAL (most to least effective)
Position, Density (Value), Color Sat, Color Hue, Texture, Connection, Containment, Length, Angle, Slope, Area (Size), Volume, Shape

NOMINAL (most to least effective)
Position, Color Hue, Texture, Connection, Containment, Density (Value), Color Sat, Shape, Length, Angle, Slope, Area, Volume

[Mackinlay 86]
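One way to use these rankings is as ordered lists from which to pick the first channel that is still free. The sketch below copies the three orderings from the slide; the `best_channel` helper is a hypothetical illustration, not part of Mackinlay's paper.

```python
# Channel orderings from the slide, most effective first.
RANKINGS = {
    "Q": ["Position", "Length", "Angle", "Slope", "Area", "Volume", "Density",
          "Color Sat", "Color Hue", "Texture", "Connection", "Containment", "Shape"],
    "O": ["Position", "Density", "Color Sat", "Color Hue", "Texture", "Connection",
          "Containment", "Length", "Angle", "Slope", "Area", "Volume", "Shape"],
    "N": ["Position", "Color Hue", "Texture", "Connection", "Containment", "Density",
          "Color Sat", "Shape", "Length", "Angle", "Slope", "Area", "Volume"],
}


def best_channel(data_type, taken=()):
    """Return the highest-ranked channel for the data type that is not taken."""
    for channel in RANKINGS[data_type]:
        if channel not in taken:
            return channel
    return None  # every channel is already in use


print(best_channel("Q"))                      # Position
print(best_channel("N", taken={"Position"}))  # Color Hue
print(best_channel("O", taken={"Position"}))  # Density
```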

slide-53
SLIDE 53

Effectiveness Rankings

[Mackinlay 86]

slide-54
SLIDE 54

Effectiveness Rankings

[Mackinlay 86]
slide-55
SLIDE 55

Gene Expression Time-Series [Meyer et al ’11]

Color Encoding Position Encoding

slide-56
SLIDE 56

Example: Deconstructions

slide-57
SLIDE 57

William Playfair, 1786

slide-58
SLIDE 58

William Playfair, 1786

X-axis: Year (Q)
Y-axis: Currency (Q)
Color: Imports/exports (N)

slide-59
SLIDE 59

Wattenberg’s Map of the Market

slide-60
SLIDE 60

Rectangle Area: market cap (Q)
Rectangle Position: market sector (N), market cap (Q)
Color Hue: loss vs. gain (N)
Color Value: magnitude of loss or gain (Q)

slide-61
SLIDE 61

Minard 1869: Napoleon’s March

slide-62
SLIDE 62

Minard 1869: Napoleon’s March

slide-63
SLIDE 63

Minard 1869: Napoleon’s March

X-axis: longitude (Q) / time (O)
Y-axis: temperature (Q)

slide-64
SLIDE 64

Minard 1869: Napoleon’s March

X-axis: longitude (Q)
Y-axis: latitude (Q)
Width: army size (Q)
Color: march / return

slide-65
SLIDE 65

Example: Encoding Data

slide-66
SLIDE 66

Example: Coffee Sales

Sales figures for a fictional coffee chain

Sales: Q-Ratio
Profit: Q-Ratio
Marketing: Q-Ratio
Product Type: N {Coffee, Espresso, Herbal Tea, Tea}
Market: N {Central, East, South, West}

slide-67
SLIDE 67

Encode “Sales” (Q) — X-Position Encode “Profit” (Q) — Y-Position

slide-68
SLIDE 68

Encode “Product Type” (N) — Hue (Color)

slide-69
SLIDE 69

Encode “Market” (N) — Shape

slide-70
SLIDE 70

Encode “Marketing” (Q) —Size

slide-71
SLIDE 71

Encode “Marketing” (Q) —Size

Are you satisfied with this chart?

slide-72
SLIDE 72

Avoid over-encoding

Use trellis plots (small multiples/facets) that subdivide space to enable comparison across multiple plots.
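A minimal sketch of the small-multiples idea, assuming the coffee sales example: instead of adding Market as a shape channel, the records are split into one panel per market and each panel reuses the same x, y, and color encodings. The rows, the `ENCODING` dictionary, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from the fictional coffee chain.
rows = [
    {"Market": "Central", "Product Type": "Coffee", "Sales": 100, "Profit": 30},
    {"Market": "Central", "Product Type": "Tea", "Sales": 60, "Profit": 10},
    {"Market": "East", "Product Type": "Coffee", "Sales": 90, "Profit": 25},
    {"Market": "East", "Product Type": "Espresso", "Sales": 70, "Profit": 20},
]

# Shared encoding: every panel maps the same fields to the same channels.
ENCODING = {"x": "Sales", "y": "Profit", "color": "Product Type"}


def facet(rows, by):
    """Split rows into one group per distinct value of the facet field."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)


def panel_points(rows, encoding):
    """Translate data rows into channel values for one panel."""
    return [{channel: row[field] for channel, field in encoding.items()} for row in rows]


panels = {market: panel_points(group, ENCODING) for market, group in facet(rows, "Market").items()}
for market, points in panels.items():
    print(market, points)
```

Each panel can then be drawn with any plotting library; keeping the axes shared across panels is what makes the comparison between markets direct.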

slide-73
SLIDE 73

Formalizing Design

slide-74
SLIDE 74

Choosing visual encodings

Assume k visual channels and n data attributes. We would like to pick the “best” encoding among a combinatorial set of possibilities of size (n+1)^k
slide-75
SLIDE 75

Choosing visual encodings

Assume k visual encodings and n data attributes. We would like to pick the “best” encoding among a combinatorial set of possibilities of size (n+1)^k

Principle of Consistency
The properties of the image (visual variables) should match the properties of the data.

Principle of Importance Ordering
Encode the most important information in the most effective way.
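The size of the design space follows from giving each of the k channels one of n attributes or leaving it unused, which is n+1 choices per channel. The sketch below enumerates that space for a small case; the channel and attribute names are placeholders, and `enumerate_encodings` is a hypothetical helper.

```python
from itertools import product


def enumerate_encodings(channels, attributes):
    """Yield every assignment of an attribute (or None) to each channel."""
    options = [None, *attributes]  # None means the channel is left unused
    for assignment in product(options, repeat=len(channels)):
        yield dict(zip(channels, assignment))


channels = ["x", "y", "color"]   # k = 3
attributes = ["Sales", "Profit"]  # n = 2
designs = list(enumerate_encodings(channels, attributes))

print(len(designs))                                   # 27
print((len(attributes) + 1) ** len(channels))         # 27
```

The count includes designs that repeat an attribute on several channels or leave attributes out, which is why criteria such as expressiveness and effectiveness are needed to prune the space.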

slide-76
SLIDE 76

Design Criteria [Mackinlay 86]

Expressiveness Effectiveness

slide-77
SLIDE 77

Design Criteria

Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

Effectiveness

[Mackinlay 86]

slide-78
SLIDE 78

Design Criteria Translated

Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)

slide-79
SLIDE 79

Cannot express the facts

A multivariate relation may be inexpressive in a single horizontal dot plot because multiple records are mapped to the same position.

Single horizontal dot plot

slide-80
SLIDE 80

Cannot express the facts

A multivariate relation may be inexpressive in a single horizontal dot plot because multiple records are mapped to the same position.

Single horizontal dot plot
Categories in different positions
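The failure described here can be checked mechanically: if two records land on the same position, the plot cannot express both. A minimal sketch, using hypothetical car records and a hypothetical helper named `overplotted_positions`:

```python
from collections import Counter

# Hypothetical records: two cars share the same price.
cars = [
    {"model": "A", "price": 4000},
    {"model": "B", "price": 4000},
    {"model": "C", "price": 6000},
]


def overplotted_positions(rows, *position_fields):
    """Return positions that more than one record is mapped to."""
    counts = Counter(tuple(row[field] for field in position_fields) for row in rows)
    return {position: n for position, n in counts.items() if n > 1}


# Single horizontal dot plot: only price is mapped to position.
print(overplotted_positions(cars, "price"))           # {(4000,): 2}

# Categories in different positions: model on y, price on x.
print(overplotted_positions(cars, "price", "model"))  # {}
```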

slide-81
SLIDE 81

Expresses facts not in the data

A length is interpreted as a quantitative value.

slide-82
SLIDE 82

Design Criteria

Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

Effectiveness

[Mackinlay 86]

slide-83
SLIDE 83

Design Criteria

Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

Effectiveness
A visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.

[Mackinlay 86]

slide-84
SLIDE 84

Design Criteria Translated

Tell the truth and nothing but the truth (don’t lie, and don’t lie by omission)
Use encodings that people decode better (where better = faster and/or more accurate)

slide-85
SLIDE 85

Mackinlay’s Design Algorithm

APT - “A Presentation Tool”, 1986

User formally specifies data model and type
Input: ordered list of data variables to show
APT searches over design space:
  • Test expressiveness of each visual encoding
  • Generate encodings that pass test
  • Rank by perceptual effectiveness criteria
Output the “most effective” visualization
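The loop described on this slide can be sketched as a greedy assignment: take the variables in importance order, keep only the channels that pass an expressiveness test, and pick the most effective one still free. This is a heavily simplified illustration and not Mackinlay's actual algorithm, which searches over and composes full designs. The channel set, the `EXPRESSES` table, the abbreviated rankings, and the data types of the example variables are all assumptions.

```python
# Assumed expressiveness test: which data types each channel may encode.
# For example, size is excluded for N because a length or area would be read as a quantity.
EXPRESSES = {
    "x": {"N", "O", "Q"},
    "y": {"N", "O", "Q"},
    "size": {"O", "Q"},
    "value": {"O", "Q"},
    "hue": {"N"},
    "shape": {"N"},
}

# Abbreviated effectiveness rankings per data type, most effective first.
EFFECTIVENESS = {
    "Q": ["x", "y", "size", "value", "hue", "shape"],
    "O": ["x", "y", "value", "hue", "size", "shape"],
    "N": ["x", "y", "hue", "value", "shape", "size"],
}


def design(variables):
    """Assign each (name, type) pair, in importance order, to a channel."""
    used, result = set(), {}
    for name, data_type in variables:
        candidates = [
            channel for channel in EFFECTIVENESS[data_type]
            if channel not in used and data_type in EXPRESSES[channel]
        ]
        if not candidates:
            raise ValueError(f"no expressive channel left for {name}")
        result[name] = candidates[0]
        used.add(candidates[0])
    return result


# Ordered input; the data types here are assumed for illustration.
print(design([("Price", "Q"), ("Mileage", "Q"), ("Repair", "O"), ("Weight", "Q")]))
```

Because the input list is ordered, the most important variables claim position first, which is the Principle of Importance Ordering in miniature.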

slide-86
SLIDE 86

APT

Automatically generate chart for input variables:

  • 1. Price
  • 2. Mileage
  • 3. Repair
  • 4. Weight
slide-87
SLIDE 87

Polaris

[Stolte et al 2002]

slide-88
SLIDE 88

Tableau (founded 2003)

slide-89
SLIDE 89

Take away: Visual Encoding Design

  • Use expressive and effective encodings
  • Avoid over-encoding
  • Reduce the problem space
  • Use space and small multiples intelligently
  • Use interaction to generate relevant views

Rarely does a single visualization answer all questions. Instead, the ability to generate appropriate visualizations quickly is critical!

slide-90
SLIDE 90

Exploratory Data Analysis

Next

Tableau H-1B petitions filed in each state

slide-91
SLIDE 91

10 min break

Download Tableau & H-1B petition data