[PPT] - What Can We Learn About Innovation From the Theories That Drive Artificial Intelligence?

SLIDE 1

What Can We Learn About Innovation From the Theories That Drive Artificial Intelligence?

Christopher J. Hazard, PhD

SLIDE 2

[Figure: Reinforcement Learning, Optimization, Supervised Learning, and Unsupervised Learning arranged along two axes]
Goal Oriented (Measure Goodness) vs. Accuracy Oriented (Measure Accuracy)
Exploration (Discover New Things) vs. Exploitation (Utilizing Existing Information)

SLIDE 3

Example Domain: Food

[Figure axes: Nutrition Density vs. Awesomeness]

SLIDE 4

Supervised Learning

Given the other data, figure out if this is a Meal or a Snack

[Figure: foods plotted by Nutrition Density vs. Awesomeness; legend: Meal, Snack, Unknown]

SLIDE 5

Supervised Learning: Universal Function Approximators

Data
Model A: Low Variance
Model B: Low Bias
Model C: Good Model

SLIDE 6

Unsupervised Learning

Find anomalies
Given food, come up with categories

[Figure axes: Nutrition vs. Awesomeness]

SLIDE 7

Unsupervised Learning: Clustering and Anomaly Detection

[Figure: clusters labeled Group 1, Group 2, and Group 3, with two points labeled Outlier]

SLIDE 8

Reinforcement Learning

Objective: eat a highly nutritious meal

After getting the first guess right, it gets two wrong, is corrected, learns from its mistakes, and decides how to learn next

[Figure: foods plotted by Nutrition Density vs. Awesomeness; legend: Meal, Snack, Unknown; guesses numbered 1 to 4]

SLIDE 9

Reinforcement Learning: Seeking Rewards, filling in Unknowns

Maximize Awesomeness & Nutrition

[Figure: tree of options; ??? marks nodes still unknown]
Savory? 50% Nutritious, 40% Awesome
Green? 90% Nutritious, 5% Awesome
???
Yellow? 50% Nutritious, 50% Awesome
???
???
Salty? 70% Nutritious, 70% Awesome
???
Sweet? 10% Nutritious, 90% Awesome
Sour? 40% Nutritious, 50% Awesome
Orange: 100% Nutritious, 70% Awesome
Tart Candy: 0% Nutritious, 90% Awesome
???
???

SLIDE 10

Optimization

Find the “best” meal
Found the best meal

[Figure: foods plotted by Nutrition Density vs. Awesomeness; legend: Meal, Snack, Unknown]

SLIDE 11

Optimization: Finding the Best

SLIDE 12

Innovation & Creativity To make new and valuable things and ideas

SLIDE 13

Innovation & Creativity To make new and valuable things and ideas Maximize Surprisal Maximize Effectiveness Minimize Complexity Minimize Expense …using feedback

SLIDE 14

Filament Material | Voltage (Volts) | Power (Watts) | Thickness (Inches) | Length (Inches) | Gas | Pressure (Atm) | Lumens | Cost | Lifespan
Platinum | 220 | 60 | .0025 | 30 | Air | .0005 | 400 | $$$$ | 200 hours
Carbonized Bamboo | 120 | 55 | .0027 | 23.5 | Air | .0002 | 250 | $ | 1200 hours
Tungsten | 120 | 100 | .0018 | 22.8 | Nitrogen | .7 | 1700 | $ | 1000 hours
… | … | … | … | … | … | … | … | … | …

SLIDE 15

[Figure labels: 4, 4, 1, $\sqrt{2} - 1$]

SLIDE 16

[Figure labels: 4, 4, $\sqrt{3} - 1$, 1]

SLIDE 17

Dimensions | Diameter of Inner Sphere
1 | $2(\sqrt{1} - 1) = 0$
4 | $2(\sqrt{4} - 1) = 2$
9 | $2(\sqrt{9} - 1) = \mathbf{4}$
16 | $2(\sqrt{16} - 1) = 6$
64 | $2(\sqrt{64} - 1) = 14$
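
The pattern in the table can be checked numerically. This is a minimal sketch, assuming the construction the figure labels suggest: a box of side 4 with unit spheres centered one unit in from each corner, so the central sphere that touches them has radius sqrt(d) - 1 in d dimensions. The function name and the "fits / pokes outside" wording are illustrative, not from the slides.

```python
import math

BOX_SIDE = 4.0  # side length of the enclosing box, taken from the slide labels


def inner_sphere_diameter(dimensions: int) -> float:
    """Diameter of the central sphere that touches the unit corner spheres."""
    # Each corner sphere center has every coordinate at +/-1, so it sits at
    # distance sqrt(d) from the box center; removing its radius (1) leaves
    # the radius of the central sphere.
    return 2.0 * (math.sqrt(dimensions) - 1.0)


for d in (1, 4, 9, 16, 64):
    diameter = inner_sphere_diameter(d)
    note = "pokes outside the box" if diameter > BOX_SIDE else "fits inside the box"
    print(f"{d:>2} dimensions: diameter {diameter:g} ({note})")
```

At 9 dimensions the central sphere is exactly as wide as the box, and beyond that it extends past the box walls, which is the counterintuitive point of the table.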

SLIDE 18

Original image by Waldyrious on Wikipedia

𝑀, Space / Minkowski Distance: A new 𝑀- “Norm”:

Hazard et al., DP TR 2019

SLIDE 19

A Slower Speed of Light. Kortemeyer et al., FDG 2013

SLIDE 20

Henry Hinnefeld: http://hinnefe2.github.io/python/tools/2015/09/21/mario-kart.html

Nintendo: Mario Kart 8

SLIDE 21

Goodness Landscape (projected to one dimension)

[Figure axes: Goodness vs. State]

SLIDE 22

Sampling Goodness

[Figure axes: Goodness vs. State]

SLIDE 23

How Are Functions Fooled?

Exploit spurious correlations in random features
200 coin flips: 6 in a row
Exploit irregular boundaries
Incorrect margins
Incorrect slope
Irregular shape
Simpson’s Paradox / Wrong Features
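
The coin-flip point can be made concrete: long streaks are nearly guaranteed in random data, so a streak alone is weak evidence of a pattern. The sketch below is an illustration under stated assumptions (a fair coin, a streak of either heads or tails, and a hypothetical function name); it computes the exact probability with a small dynamic program rather than by simulation.

```python
def prob_streak_at_least(n_flips: int, streak_len: int) -> float:
    """Probability that n fair coin flips contain a run of at least
    streak_len identical outcomes (heads or tails)."""
    if n_flips < 1:
        return 0.0
    if streak_len <= 1:
        return 1.0
    # alive[r] = probability that no qualifying streak has occurred yet
    # and the current run of identical outcomes has length r + 1.
    alive = [0.0] * (streak_len - 1)
    alive[0] = 1.0  # after the first flip the run length is 1
    for _ in range(n_flips - 1):
        nxt = [0.0] * (streak_len - 1)
        nxt[0] = 0.5 * sum(alive)  # next flip differs: run resets to 1
        for r in range(streak_len - 2):
            nxt[r + 1] = 0.5 * alive[r]  # next flip matches: run grows
        # A match from the longest tracked run completes the streak,
        # so that probability mass is dropped from `alive`.
        alive = nxt
    return 1.0 - sum(alive)


print(f"{prob_streak_at_least(200, 6):.3f}")
```

A model that treats such a streak as signal is fitting noise, which is the sense in which a function can be fooled by random features.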

Goodfellow et al., ICMR 2015

SLIDE 24

Data vs Games

Wheat Genome
Google Image Labeler
INMAST – Hazardous Software, 2017
Starcraft 2 – Blizzard
Calvinball/Nomic with Hazard

SLIDE 25

What Are you Optimizing For?

Goal | Example Technique | Requires | Benefits | Drawbacks
Maximize expected value | MCTS | Data | Great results without adversary | Not strong vs formidable / creative adversary
Minimize expected regret | MCCRM | Knowledge of causality and uncertainty | Unlikely to lose or lose by much, will do well vs adversary | Need to codify what are and are not rules / causal
Minimize maximum loss (minmax) | Nash Equilibrium (or other solution concept) | Knowledge of causality and uncertainty fully characterized | Won’t lose except by chance | Often higher computational complexity, will not take advantage of weak adversaries
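
To see how the choice of objective changes the decision, here is a minimal sketch contrasting two of the goals above on a toy payoff table. The payoffs, the belief over what the other side does, and all names are hypothetical. The worst-case rule shown is a simple pure-strategy version of minimizing maximum loss, not a full Nash equilibrium computation.

```python
# Hypothetical payoffs: rows are our options, columns are what the other side does.
PAYOFFS = {
    "aggressive": [10.0, -8.0],
    "balanced": [4.0, 1.0],
    "defensive": [2.0, 2.0],
}
BELIEF = [0.7, 0.3]  # assumed probability of each column


def expected_value(row, belief):
    """Probability-weighted average payoff of one option."""
    return sum(p * v for p, v in zip(belief, row))


def best_by_expected_value(payoffs, belief):
    """Maximize expected value: trust the belief about the other side."""
    return max(payoffs, key=lambda name: expected_value(payoffs[name], belief))


def best_by_worst_case(payoffs):
    """Minimize maximum loss: pick the option whose worst outcome is least bad."""
    return max(payoffs, key=lambda name: min(payoffs[name]))


print(best_by_expected_value(PAYOFFS, BELIEF))  # aggressive
print(best_by_worst_case(PAYOFFS))              # defensive
```

The expected-value choice does well if the belief is right but loses badly against an adversary who deliberately plays the second column; the worst-case choice cannot be exploited but gives up the upside against a weak opponent.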

SLIDE 26

Data vs Game: Resources Spent on Defense

~20-30%
~3-8% (increasing?)
~1%

brainmaps.org Volker Brinkmann

SLIDE 27

SLIDE 28

Measuring discount factor by choice

Hazard & Singh, TKDE, 2010

SLIDE 29

Time Preference and Switching Cost

Why do some technologies get adopted? E.g., TCP and UDP dominate even though more capable technologies, such as SCTP, exist.

Time preference, switching costs, and trend following scale the number of early adopters required.

[Figure axes: Num Total Adopters, Num Early Adopters, Convergence Time]

Hazard & Wurman, ICEC, 2007

SLIDE 30

Minority Game: The Path Less Taken

El Farol Bar problem
Hard to find valuable unknowns in large population of smart agents
Related to No Free Lunch Theorem: know the data

Esteban & Moro, ’04
Challet et al., Oxford Press, 2005

SLIDE 31

Inputs Classification Representation Generalization →

Yosinski et al., ICML DL 2015

SLIDE 32

Neurons Output Weights Input Scale Inputs Input Softmax

What if we flatten a neural network? Memorization without generalization

Lin, Tegmark, Rolnick, J Stat Physics, 2017

Logical conjunction: need a value for each combination

f values (exponential!)

SLIDE 33

Desirability Index

Multicriteria optimization for innovating in chemistry, and chemical and mechanical engineering
Gaming and strategy

Trautmann, Drug Design Workshop, 2009
Harrington, IQC, 1965
Point Recon, Hazardous Software, 2013

SLIDE 34

Generalized Diversity Index & Generalized Mean
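
Only the slide title survives here, so the following is a sketch of the standard definitions rather than of the slide's own content. It assumes the usual Hill-number form of the diversity index and shows why the two ideas in the title belong together: the diversity of order q is the reciprocal of the generalized (power) mean of the proportions, with exponent q - 1 and the proportions themselves as weights. Function names are hypothetical.

```python
import math


def generalized_mean(values, weights, exponent):
    """Weighted power mean; exponent 0 is the geometric mean (the limit case)."""
    total = sum(weights)
    if exponent == 0:
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)
    return (sum(w * v ** exponent for v, w in zip(values, weights)) / total) ** (1 / exponent)


def diversity_index(proportions, order):
    """Effective number of categories (Hill number) of the given order."""
    present = [p for p in proportions if p > 0]  # absent categories add nothing
    return 1.0 / generalized_mean(present, present, order - 1)


shares = [0.5, 0.25, 0.25]
print(diversity_index(shares, 0))  # order 0: counts categories (richness)
print(diversity_index(shares, 1))  # order 1: exponential of Shannon entropy
print(diversity_index(shares, 2))  # order 2: inverse Simpson index
```

Higher orders weight common categories more heavily, so the index falls from the raw category count toward the number of dominant categories.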

SLIDE 35

Surprisal & Shannon Information

Self-information: information of outcome of random event
Surprisal: -log2 P(xi)
Information: Expected surprisal
Information gain, KL-divergence, cross-entropy
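
The first three definitions translate directly into code. A minimal sketch, with hypothetical function names, measuring everything in bits:

```python
import math


def surprisal_bits(probability: float) -> float:
    """Self-information of one outcome: -log2 P(x)."""
    return -math.log2(probability)


def entropy_bits(distribution) -> float:
    """Shannon information: the expected surprisal over all outcomes."""
    return sum(p * surprisal_bits(p) for p in distribution if p > 0)


print(surprisal_bits(0.5))              # 1.0
print(entropy_bits([0.5, 0.5]))         # 1.0
print(entropy_bits([0.25, 0.25, 0.5]))  # 1.5
```

Rare outcomes carry more surprisal, and a distribution's information is just the average of those values weighted by how often each outcome occurs.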

[Figure axes: surprisal vs. probability]

SLIDE 36

[Figures: Prior and Posterior distributions; axes: Probability vs. State]

SLIDE 37

Corpse Party Chapter 1 Infirmary

SLIDE 38

Corpse Party Chapter 1 Infirmary

SLIDE 39

Infirmary Flow

[Flow diagram nodes: take match from furnace, try door, try door, try match, try match, get rubbing alcohol, try door, exit]

Actual branching factor: 12
Perceived branching factor: 11
Exaggerated expectation [Hilbert, PSYCHOL BULL '12]
P(progress | revisit item) higher than anticipated

SLIDE 40

Infirmary Surprisal

Player unsure of what to do, so assume uniform distribution over new possibilities: Q(X) ≈ 1/11, Q(Repeat) ≈ 0 => ~3.5 bits

Correct distribution over possibilities, minimizing assumptions: P(X) = 1/12

Q(repeat) ≈ 0 means 1/12 * log( (1/12) / 0) = 1/12 * log(∞) = ∞

Massive surprisal if assume no repeat actions advance game
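
The numbers on this slide can be reproduced with a few lines. This is a sketch with a hypothetical function name; the `softened` distribution is an added assumption, included only to show that giving the repeat action even a small probability makes the divergence finite.

```python
import math


def kl_divergence_bits(p, q):
    """D_KL(P || Q) in bits; infinite if Q rules out something P allows."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # outcomes P never produces contribute nothing
        if q_i == 0:
            return math.inf  # Q said "impossible" but it can happen
        total += p_i * math.log2(p_i / q_i)
    return total


# 11 new actions plus 1 repeated action, as on the slide.
actual = [1 / 12] * 12               # P: every action equally likely to advance
perceived = [1 / 11] * 11 + [0.0]    # Q: repeating is assumed never to help
softened = [0.99 / 11] * 11 + [0.01]  # assumption: a small allowance for repeats

print(round(-math.log2(1 / 11), 2))            # 3.46, the slide's "~3.5 bits"
print(kl_divergence_bits(actual, perceived))   # inf
print(kl_divergence_bits(actual, softened))    # small and finite
```

The infinite result is the formal version of the player's experience: an outcome they had ruled out entirely turns out to be the way forward.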

SLIDE 41

Measuring Complexity By Decision Information Rate

[Figure: decision paths; X marks a failed path, 1 marks a surviving path]
3 out of 6 paths fail
No loss, no information
Average 1 bit of information
Average 0.5 bits of information
1.5 bits of total information to succeed
1.5 bits / 2 steps = 0.75 bits per step to succeed

SLIDE 42

Combining Information Theory & Game Theory

Maximum Entropy Correlated Equilibria (Ortiz et al., 2007)
Measure information gain between player strategy and optimal
Just add stochasticity!
Rock, Paper, Scissors:
1/3 rock, 1/3 paper, 1/3 scissors
1/4 rock, 1/4 paper, 1/2 scissors
The value of soothsayers and randomness
Robust sampling (e.g., Bayesian Optimization, MCCFR)
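
The two Rock, Paper, Scissors mixes above can be compared directly. The sketch below is illustrative, with hypothetical names and a standard +1 / 0 / -1 payoff. It measures how far each mix is from the uniform (optimal) strategy in bits, and how much an opponent gains by best-responding to it.

```python
import math

MOVES = ("rock", "paper", "scissors")
# Payoff to the player choosing the outer key against the inner key.
PAYOFF = {
    "rock": {"rock": 0, "paper": -1, "scissors": 1},
    "paper": {"rock": 1, "paper": 0, "scissors": -1},
    "scissors": {"rock": -1, "paper": 1, "scissors": 0},
}


def best_response(opponent_mix):
    """Best pure reply to a mixed strategy, with its expected payoff."""
    def value(move):
        return sum(opponent_mix[o] * PAYOFF[move][o] for o in MOVES)
    best = max(MOVES, key=value)
    return best, value(best)


def information_gain_bits(strategy, reference):
    """KL divergence of a strategy from a reference strategy, in bits."""
    return sum(
        strategy[m] * math.log2(strategy[m] / reference[m])
        for m in MOVES
        if strategy[m] > 0
    )


uniform = {m: 1 / 3 for m in MOVES}
skewed = {"rock": 0.25, "paper": 0.25, "scissors": 0.5}

print(best_response(uniform))  # no reply earns more than 0
print(best_response(skewed))   # rock exploits the extra scissors
print(round(information_gain_bits(skewed, uniform), 3))
```

The skewed mix leaks only a fraction of a bit relative to uniform, yet that is enough for an opponent to earn a positive expected payoff every round.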

SLIDE 43

Peoples of the Steppe

SLIDE 44

Ambiguity of Strategy Via Information Theory: Maximum Difficulty

Fortification Honeypot Sampling Adaption

Pavlovic, Proc 2011 ACM New Sec Paradigms Workshop

Nomads → Pirates → Intellectual Property (Industrial Revolution) → Illicit Networks & Well-funded Startups

SLIDE 45

History Is Generalized & Compressed

~1420, Taccola
1490, da Vinci

SLIDE 46

A Formula for Measuring Creativity of a Solution

$$C(x, A, v_1, \ldots, v_n) = \min_{a \in A} \; D_{\mathrm{KL}}(x \,\|\, a) - \bigl(I(x) - I(a)\bigr) + \frac{1}{n} \sum_{i=1}^{n} \bigl(\ln v_i(x) - \ln v_i(a)\bigr)$$

x : configuration
A : set of known configurations
v_i : value function

Term labels: Relative Novelty (compare to closest), Relative Desirability, Relative Complexity

SLIDE 47

Thanks!