

SLIDE 1

Copulas for neural and behavioral parallel systems Hans Colonius

Department für Psychologie, Universität Oldenburg. Purdue Winer Memorial Lectures: Probability and Contextuality, November 2018

1 / 75

SLIDE 2

Outline

◮ Preliminaries
  ◮ Coupling: Definition
  ◮ Example: Coupling Bernoulli Random Variables
  ◮ Quantile Coupling
◮ Theory of Copulas
  ◮ Defining the copula
  ◮ Sklar's Theorem
  ◮ Copula examples
  ◮ Fréchet-Hoeffding bounds and copulas
  ◮ Copulas and order statistics
◮ Application to parallel systems
  ◮ The multisensory paradigm (behavioral (RT) version)
  ◮ The multisensory paradigm (neural (spike counts) version)
  ◮ Response inhibition: stop signal paradigm
  ◮ The independent race model
  ◮ The paradox
  ◮ The race model with perfect negative dependence

2 / 75

SLIDE 3

Intro remarks: Coupling and Copulas

Coupling and copula are two important concepts of probability theory. Their origins can be traced back to the 1950s (coupling: the 1930s), but interest in them waxed and waned for a long time. Very loosely, coupling means the construction of a joint distribution for two or more (previously unrelated) random variables, whereas copulas are functions that join multivariate distribution functions to their one-dimensional margins. Copulas have recently become a very active area in statistics.

3 / 75

SLIDE 4

Preliminaries: Equality in distribution

◮ Let X be a random variable; X̂ is a copy (or representation) of X if it has the same distribution as X, denoted by X̂ =_D X.

4 / 75

SLIDE 5

First definition: Coupling

◮ A coupling of a collection of random variables Xi, i ∈ I (I some index set), is a family of random variables (X̂i : i ∈ I) such that X̂i =_D Xi, i ∈ I.
Note: The collection Xi need not be defined on a common probability space and may not have a joint distribution; the family (X̂i : i ∈ I) has a joint distribution whose marginals equal the distributions of the Xi variables.

5 / 75

SLIDE 6

Example 1: Coupling two Bernoulli random variables

Let Xp be a Bernoulli random variable, i.e., P(Xp = 1) = p and P(Xp = 0) = 1 − p. Assume p < q; we can couple Xp and Xq as follows. Let U be a uniform random variable on [0, 1], i.e., for 0 ≤ a < b ≤ 1, P(a < U ≤ b) = b − a. Define

X̂p = 1 if 0 < U ≤ p,  X̂p = 0 if p < U ≤ 1;
X̂q = 1 if 0 < U ≤ q,  X̂q = 0 if q < U ≤ 1.

Then U serves as a common source of randomness for both X̂p and X̂q. Moreover, X̂p =_D Xp and X̂q =_D Xq.

6 / 75
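The construction above is easy to check by simulation. The sketch below (the values p = 0.3, q = 0.6 and the sample size are my own choices for illustration) draws a single uniform value per trial and thresholds it at p and at q; the marginal frequencies and the covariance p(1 − q) of the coupled pair should come out within Monte Carlo error.

```python
import random

def coupled_bernoulli(p, q, u):
    """One draw of the coupled pair (Xp-hat, Xq-hat) from a single uniform value u."""
    return (1 if u <= p else 0), (1 if u <= q else 0)

random.seed(1)
p, q, n = 0.3, 0.6, 200_000            # p < q, chosen for illustration
draws = [coupled_bernoulli(p, q, random.random()) for _ in range(n)]

mean_p = sum(xp for xp, _ in draws) / n
mean_q = sum(xq for _, xq in draws) / n
cov = sum(xp * xq for xp, xq in draws) / n - mean_p * mean_q

assert abs(mean_p - p) < 0.01          # Xp-hat is Bernoulli(p)
assert abs(mean_q - q) < 0.01          # Xq-hat is Bernoulli(q)
assert abs(cov - p * (1 - q)) < 0.01   # cov of the coupled pair is p(1 - q)
```

Because {U ≤ p} ⊂ {U ≤ q}, both indicators equal 1 with probability p, which is what makes the covariance p − pq = p(1 − q).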


SLIDE 9

Example 1 (cont'ed): Coupling two Bernoulli random variables

◮ The joint distribution of (X̂p, X̂q) is

  P(X̂p = 1, X̂q = 1) = p,      P(X̂p = 1, X̂q = 0) = 0,
  P(X̂p = 0, X̂q = 1) = q − p,  P(X̂p = 0, X̂q = 0) = 1 − q,

  with margins p, 1 − p for X̂p and q, 1 − q for X̂q,

◮ and cov(X̂p, X̂q) = p(1 − q).

7 / 75


SLIDE 12

Preliminaries: The Quantile Function

Let X be a real-valued random variable with distribution function F(x) which is continuous from the right. Then, the quantile function (or, generalized inverse) Q(u) of X is defined as Q(u) ≡ F −1(u) = inf{x : F(x) ≥ u}, 0 ≤ u ≤ 1.

8 / 75

SLIDE 13

Example 2: Quantile Coupling

Let X be a random variable with distribution function F, that is, P(X ≤ x) = F(x), x ∈ ℜ. Let U be a uniform random variable on [0, 1]. Then, for the random variable X̂ = F⁻¹(U),

P(X̂ ≤ x) = P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x), x ∈ ℜ,

that is, X̂ is a copy of X: X̂ =_D X. Thus, letting F run over the class of all distribution functions (using the same U) yields a coupling of all differently distributed random variables: the quantile coupling.

9 / 75
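A minimal numerical sketch of quantile coupling, assuming exponential distribution functions with rates 1 and 2 (my choice, not from the slide): both copies are built from the same uniform U, so each has the right margin while the pair is deterministically related.

```python
import math
import random

def exp_quantile(u, rate):
    """Quantile function F^{-1}(u) = -ln(1 - u) / rate of an Exponential(rate)."""
    return -math.log(1.0 - u) / rate

random.seed(2)
n = 100_000
us = [random.random() for _ in range(n)]
# the SAME uniform value feeds every quantile function -> quantile coupling
pairs = [(exp_quantile(u, 1.0), exp_quantile(u, 2.0)) for u in us]

mean1 = sum(x for x, _ in pairs) / n    # close to 1.0, the Exp(1) mean
mean2 = sum(y for _, y in pairs) / n    # close to 0.5, the Exp(2) mean
assert abs(mean1 - 1.0) < 0.02
assert abs(mean2 - 0.5) < 0.01
# for these two margins the coupling is even deterministic: X1-hat = 2 * X2-hat
assert all(abs(x - 2.0 * y) < 1e-9 for x, y in pairs)
```

The quantile coupling is the maximally positively dependent ("comonotone") coupling; this is the same construction that underlies the upper Fréchet-Hoeffding bound later in the talk.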


SLIDE 16

Theory of Copulas

10 / 75

SLIDE 17

Copula: Example

Let (X, Y) be a pair of random variables with joint distribution function F(x, y) and marginal distributions FX(x) and FY(y). To each pair of real numbers (x, y) we can associate three numbers: FX(x), FY(y), and F(x, y). Note that each of these numbers lies in the interval [0, 1]. In other words, each pair (x, y) of real numbers leads to a point (FX(x), FY(y)) in the unit square [0, 1] × [0, 1], and this ordered pair in turn corresponds to a number F(x, y) in [0, 1]:

(x, y) → (FX(x), FY(y)) → F(x, y) = C(FX(x), FY(y)),
ℜ × ℜ → [0, 1] × [0, 1] → [0, 1].

This correspondence C is a copula.

11 / 75


SLIDE 20

Defining the copula

Definition

An n-copula is an n-variate distribution function with univariate margins uniformly distributed on [0, 1].

◮ There are many equivalent definitions of copula.
◮ One of them, in the case n = 2, is the following:

Definition

A function C : [0, 1] × [0, 1] → [0, 1] is a 2-copula if, and only if, it satisfies

  • 1. C(0, t) = C(t, 0) = 0, for every t ∈ [0, 1] (groundedness);
  • 2. C(1, t) = C(t, 1) = t, for every t ∈ [0, 1] (uniform marginals);
  • 3. for all a1, a2, b1, b2 ∈ [0, 1], with a1 ≤ b1 and a2 ≤ b2,

(∗) C(a1, a2) − C(a1, b2) − C(b1, a2) + C(b1, b2) ≥ 0.

Property (∗) is called 2-increasing (supermodularity).

12 / 75


SLIDE 24

Sklar's Theorem

Theorem (Sklar's Theorem, 1959)

Let F(x1, ..., xn) be an n-variate distribution function with margins F1(x1), ..., Fn(xn); then there exists an n-copula C : [0, 1]ⁿ → [0, 1] that satisfies

F(x1, ..., xn) = C(F1(x1), ..., Fn(xn)),  (x1, ..., xn) ∈ ℜⁿ.

If all univariate margins F1, ..., Fn are continuous, then the copula is unique. If F1⁻¹, ..., Fn⁻¹ are the quantile functions of the margins, then for any (u1, ..., un) ∈ [0, 1]ⁿ,

C(u1, ..., un) = F(F1⁻¹(u1), ..., Fn⁻¹(un)).

Note: The copula can be considered 'independent' of the univariate margins.

13 / 75
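A toy instance of the inversion formula, assuming two independent exponential margins (the rates 1 and 2 are arbitrary choices of mine): plugging the quantile functions into the joint distribution function must return the independence copula Π(u, v) = uv.

```python
import math

def F1(x): return 1.0 - math.exp(-1.0 * x) if x >= 0 else 0.0   # Exp(1) CDF
def F2(y): return 1.0 - math.exp(-2.0 * y) if y >= 0 else 0.0   # Exp(2) CDF
def F1_inv(u): return -math.log(1.0 - u) / 1.0                  # quantile functions
def F2_inv(v): return -math.log(1.0 - v) / 2.0

def H(x, y):
    """Joint CDF of the two independent exponentials."""
    return F1(x) * F2(y)

def C(u, v):
    """Copula extracted via the inversion part of Sklar's theorem."""
    return H(F1_inv(u), F2_inv(v))

# independence of the margins yields the product copula, whatever the rates
for u in (0.1, 0.5, 0.9):
    for v in (0.2, 0.7):
        assert abs(C(u, v) - u * v) < 1e-12
```

The rates drop out entirely, which illustrates the closing note: the copula carries the dependence structure and is 'independent' of the margins.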


SLIDE 26

How to prove that a function is a copula

◮ Find a suitable probabilistic model (i.e., a random vector) whose distribution function is concentrated on [0, 1]ⁿ and has uniform marginals; or,
◮ prove that properties (1) groundedness, (2) uniform marginals, and (3) n-increasing (supermodularity) are satisfied.

14 / 75


SLIDE 28

Properties of copulas

Sklar’s theorem shows that copulas remain invariant under strictly increasing transformations of the underlying random variables. It is possible to construct a wide range of multivariate distributions by choosing the marginal distributions and a suitable copula.

15 / 75

SLIDE 29

Example 1: Gumbel's bivariate exponential copula

Let Hθ be the joint distribution function given by

Hθ(x, y) = 1 − e^{−x} − e^{−y} + e^{−(x+y+θxy)}  if x ≥ 0, y ≥ 0;
Hθ(x, y) = 0  otherwise;

where θ is a parameter in [0, 1]. Then the marginals are (unit-rate) exponentials, with quantile functions F⁻¹(u) = −ln(1 − u) and G⁻¹(v) = −ln(1 − v) for u, v ∈ [0, 1]. The corresponding copula is

Cθ(u, v) = u + v − 1 + (1 − u)(1 − v) e^{−θ ln(1−u) ln(1−v)}.

16 / 75
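The stated copula can be verified numerically: composing Cθ with the exponential margins must reproduce Hθ exactly. A short sketch (θ and the test points are arbitrary choices of mine):

```python
import math

def H(x, y, theta):
    """Gumbel's bivariate exponential distribution function."""
    if x < 0 or y < 0:
        return 0.0
    return 1.0 - math.exp(-x) - math.exp(-y) + math.exp(-(x + y + theta * x * y))

def C(u, v, theta):
    """The copula stated on the slide."""
    return (u + v - 1.0
            + (1.0 - u) * (1.0 - v)
            * math.exp(-theta * math.log(1.0 - u) * math.log(1.0 - v)))

theta = 0.5
for x in (0.1, 1.0, 2.5):
    for y in (0.3, 1.7):
        u, v = 1.0 - math.exp(-x), 1.0 - math.exp(-y)  # exponential margins F(x), G(y)
        assert abs(C(u, v, theta) - H(x, y, theta)) < 1e-12
```

With u = 1 − e^{−x}, the factor e^{−θ ln(1−u) ln(1−v)} is exactly e^{−θxy}, which is how the copula reassembles the joint distribution.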


SLIDE 32

Example 2: Bivariate extreme value distribution

Let X and Y be random variables with joint distribution function

Hθ(x, y) = exp[−(e^{−θx} + e^{−θy})^{1/θ}]  for all x, y ∈ ℜ̄,

where θ ≥ 1. The corresponding Gumbel-Hougaard copula is given by

Cθ(u, v) = exp{−[(−ln u)^θ + (−ln v)^θ]^{1/θ}}.

17 / 75
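Two quick sanity checks of the Gumbel-Hougaard family (the test points are arbitrary choices of mine): θ = 1 reduces it to the independence copula, and composing it with the standard Gumbel margin F(x) = exp(−e^{−x}) recovers Hθ.

```python
import math

def gumbel_hougaard(u, v, theta):
    return math.exp(-(((-math.log(u)) ** theta
                       + (-math.log(v)) ** theta) ** (1.0 / theta)))

# theta = 1 gives the independence copula uv
for u in (0.2, 0.5, 0.9):
    for v in (0.1, 0.7):
        assert abs(gumbel_hougaard(u, v, 1.0) - u * v) < 1e-12

def H(x, y, theta):
    """Bivariate extreme value distribution from the slide."""
    return math.exp(-((math.exp(-theta * x) + math.exp(-theta * y)) ** (1.0 / theta)))

# composing with the standard Gumbel margin recovers H_theta:
# for u = exp(-e^{-x}), we get (-ln u)^theta = e^{-theta x}
theta = 2.0
for x in (-1.0, 0.0, 1.5):
    for y in (-0.5, 2.0):
        u, v = math.exp(-math.exp(-x)), math.exp(-math.exp(-y))
        assert abs(gumbel_hougaard(u, v, theta) - H(x, y, theta)) < 1e-12
```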


SLIDE 34

Plotting Gumbel-Hougaard copula with different marginals

[Figure: two panels, Gumbel-Hougaard copula with UniformDistribution[0, 1] and ExponentialDistribution[2] marginals.]

18 / 75

SLIDE 35

Plotting Gumbel-Hougaard copula with different marginals

[Figure: two panels, Gumbel-Hougaard copula with NormalDistribution[0, 1] and LaplaceDistribution[0, 1] marginals.]

19 / 75

SLIDE 36

Plotting Gumbel-Hougaard copula with different marginals

[Figure: two panels, Gumbel-Hougaard copula with GumbelDistribution[1, 2] and WeibullDistribution[2, 1] marginals.]

20 / 75

SLIDE 37

Fréchet-Hoeffding bounds and copulas

These will be central topics in the applications presented here.

21 / 75

SLIDE 38

Fréchet-Hoeffding copulas

Let C(u, v) be a 2-copula; then, for u, v ∈ [0, 1],

W(u, v) ≡ max{u + v − 1, 0} ≤ C(u, v) ≤ min{u, v} ≡ M(u, v),

and M and W are themselves copulas, the upper and lower Fréchet-Hoeffding copulas. The dependence properties translate into: for W, P(U + V = 1) = 1, and for M, P(U = V) = 1.

22 / 75
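The bounds are easy to check numerically for any concrete family. The sketch below does so for the Gumbel-Hougaard copula (my choice; any 2-copula would serve) on an interior grid of the unit square.

```python
import math

def W(u, v): return max(u + v - 1.0, 0.0)   # lower Frechet-Hoeffding copula
def M(u, v): return min(u, v)               # upper Frechet-Hoeffding copula

def gumbel_hougaard(u, v, theta):
    return math.exp(-(((-math.log(u)) ** theta
                       + (-math.log(v)) ** theta) ** (1.0 / theta)))

grid = [i / 50 for i in range(1, 50)]       # interior points only (avoid log 0)
for theta in (1.0, 2.0, 8.0):
    for u in grid:
        for v in grid:
            c = gumbel_hougaard(u, v, theta)
            assert W(u, v) - 1e-12 <= c <= M(u, v) + 1e-12
```

As θ grows, the Gumbel-Hougaard copula climbs toward the upper bound M, i.e., toward perfect positive dependence.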


SLIDE 40

Copulas and order statistics: diagonal section

Maximum

Let U1, U2, ..., Un be random variables defined on the same probability space, each uniformly distributed on (0, 1), with C as their joint distribution function; then, for every t ∈ [0, 1],

P(max{U1, U2, ..., Un} ≤ t) = P(∩_{j=1}^{n} {Uj ≤ t}) = C(t, t, ..., t).

δC(t) := C(t, t, ..., t) is called the diagonal section of the copula C.

23 / 75


SLIDE 42

Copulas and order statistics: diagonal section

Minimum

For n = 2, it follows that

P(min{U, V} ≤ t) = P(U ≤ t) + P(V ≤ t) − P(U ≤ t, V ≤ t) = 2t − δC(t).

Thus, determining a bivariate copula C with prescribed diagonal section δ is equivalent to determining a random vector (U, V) such that (U, V) ∼ C and the marginal distribution functions of the order statistics of (U, V) are known. Note: the diagonal of a copula does not uniquely determine the underlying copula.

24 / 75
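Both diagonal-section identities can be checked by simulation. The sketch below uses the independence copula (my choice), whose diagonal is δ(t) = t², so the maximum has distribution t² and the minimum has distribution 2t − t².

```python
import random

# independence copula: U, V i.i.d. uniform, diagonal section delta(t) = t^2
random.seed(3)
n, t = 200_000, 0.6
count_max = count_min = 0
for _ in range(n):
    u, v = random.random(), random.random()
    if max(u, v) <= t:
        count_max += 1
    if min(u, v) <= t:
        count_min += 1

assert abs(count_max / n - t * t) < 0.01            # P(max <= t) = delta(t)
assert abs(count_min / n - (2 * t - t * t)) < 0.01  # P(min <= t) = 2t - delta(t)
```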


SLIDE 45

Application to parallel systems

We loosely define a parallel system as the "existence of one or more multivariate distributions, on possibly different probability spaces, whose random variables relate to certain behavioral or neural processes".

25 / 75

SLIDE 46

Copulas for neural and behavioral parallel systems

(1) Multisensory integration: reaction times
(2) Multisensory integration: spike (impulse) counts
(3) Response inhibition: stop signal paradigm

In (1) and (2), we present a new measure of multisensory integration that unifies behavioral and neural data. In (3), we modify a classic model of inhibitory behavior so as to resolve a paradox between behavioral and neural data.

26 / 75


SLIDE 49

The multisensory paradigm (behavioral version)

◮ Unimodal condition: a stimulus of a single modality (visual, auditory, tactile) is presented, and the participant is asked to respond (by button press or eye movement) as quickly as possible upon detecting the stimulus (reaction time, RT, task).
◮ Bi- or trimodal condition: stimuli from two or three modalities are presented (nearly) simultaneously, and the participant is asked to respond as quickly as possible upon detecting a stimulus of any modality (redundant signals task).

27 / 75

SLIDE 50

The multisensory paradigm (behavioral version)

◮ We refer to V, A, T as the unimodal contexts in which visual, auditory, or tactile stimuli are presented, respectively. Similarly, VA denotes a bimodal (visual-auditory) context, etc.
◮ For each stimulus, or stimulus combination, we observe samples from a random variable representing the reaction time measured in any given trial. Let FV(t), FA(t), FVA(t), ... denote the distribution functions of reaction time in a unimodal visual, auditory, or a bimodal visual-auditory context, etc., when a specific stimulus (combination) is presented.

28 / 75


SLIDE 52

Existence of couplings

◮ Each context V, A, VA refers to a different event space (σ-algebra), so, from an empirical point of view, no coupling between the reaction-time random variables in these different conditions necessarily exists.
◮ A common assumption, often not stated explicitly, is that there exists a coupling between visual and auditory RT, for example, such that the margins of the bivariate distribution HVA are equal to FV and FA.
◮ Given that a coupling exists, an assumption on how HVA is related to its margins FV and FA is required (⇒ copula?).
◮ In principle, this could be tested empirically. However, the margins of HVA are not observable; only the distribution function of RTs in the bimodal context, FVA, is.

29 / 75


SLIDE 56

The (possibly non-independent) race model

The model studied most often is the (possibly non-independent) race model. Let V and A be the random reaction times in unimodal conditions V and A, with distribution functions FV(t) and FA(t), respectively.

◮ Assume bimodal RT is determined by the "winner" of the race between the modalities: FVA(t) = P(V ≤ t or A ≤ t).
◮ Then FVA(t) = FV(t) + FA(t) − HVA(t, t), with HVA(t, t) = P(V ≤ t and A ≤ t).

30 / 75


SLIDE 59

The (possibly non-independent) race model

◮ From the Fréchet bounds, max{FV(t) + FA(t) − 1, 0} ≤ HVA(t, t) ≤ min{FV(t), FA(t)}.
◮ Inserting and rearranging yields

max{FV(t), FA(t)} ≤ FVA(t) ≤ min{FV(t) + FA(t), 1},  t ≥ 0

('race model inequality', Miller 1982).
◮ The upper bound corresponds to maximal negative dependence between V and A, the lower bound to maximal positive dependence.
◮ Empirical violation of the upper bound (occurring only for small enough t) is interpreted as evidence against the race mechanism ("bimodal RT faster than predictable from unimodal conditions").

31 / 75
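A sketch of the race-model band with hypothetical exponential unimodal RT distributions (means 0.10 s and 0.15 s, chosen for illustration only, not from the talk): any race model, e.g. the independent one, must stay between the two bounds.

```python
import math

# hypothetical exponential unimodal RT distributions (means 0.10 s and 0.15 s)
def F_V(t): return 1.0 - math.exp(-t / 0.10)
def F_A(t): return 1.0 - math.exp(-t / 0.15)

def race_bounds(t):
    lower = max(F_V(t), F_A(t))        # maximal positive dependence
    upper = min(F_V(t) + F_A(t), 1.0)  # Miller bound (maximal negative dependence)
    return lower, upper

# any race model must satisfy lower <= F_VA(t) <= upper; e.g. the independent race:
def F_VA_indep(t):
    return F_V(t) + F_A(t) - F_V(t) * F_A(t)

for t in (0.02, 0.05, 0.10, 0.20, 0.50):
    lo, hi = race_bounds(t)
    assert lo - 1e-12 <= F_VA_indep(t) <= hi + 1e-12
```

An empirical test compares the observed bimodal distribution FVA against `upper`; only values above it count as violations of the race model inequality.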


SLIDE 63

Race Model Inequality Test

Gondan & Minakata 2016

The grey area between the upper bound FA + FV and the bimodal RT distribution FVA is taken as a measure of the amount of violation of the race model inequality. This area is the (sample estimate of the) expected value of min(V, A) under maximal negative dependence between V and A, E(−)[min(V, A)] (Colonius & Diederich, PsyRev 2006). Estimation is straightforward.

32 / 75

SLIDE 64

Crossmodal response enhancement (CRE)

◮ We will use maximal negative probability summation to define a new measure of crossmodal response enhancement for RT.
◮ Response enhancement in RT means "faster average responses":

CRERT = (min{E[RTV], E[RTA]} − E[RTVA]) / min{E[RTV], E[RTA]} × 100

33 / 75


SLIDE 66

The new measure of CRE in RT

  • 1. Replace the traditional

CRERT = (min{E[RTV], E[RTA]} − E[RTVA]) / min{E[RTV], E[RTA]} × 100

by

CRERT(−) = (E(−)[min(V, A)] − E[RTVA]) / E(−)[min(V, A)] × 100.

  • 2. Note that CRERT(−) ≤ CRERT.

34 / 75
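The two measures can be compared on stylized numbers (all values below are hypothetical, chosen only for illustration): E(−)[min(V, A)] is obtained from the countermonotone coupling (V, A) = (q_V(U), q_A(1 − U)), computed here by midpoint integration over u.

```python
import math

# hypothetical exponential quantile functions (mean RTs 0.10 s and 0.15 s)
def q_V(u): return -0.10 * math.log(1.0 - u)
def q_A(u): return -0.15 * math.log(1.0 - u)

# E(-)[min(V, A)]: expected minimum under maximal negative dependence,
# realised by the countermonotone coupling (V, A) = (q_V(U), q_A(1 - U));
# computed here by midpoint integration over u in (0, 1)
n = 200_000
e_min_neg = sum(min(q_V((i + 0.5) / n), q_A(1.0 - (i + 0.5) / n))
                for i in range(n)) / n

E_V, E_A = 0.10, 0.15
E_VA = 0.03                          # hypothetical observed bimodal mean RT

cre_traditional = (min(E_V, E_A) - E_VA) / min(E_V, E_A) * 100
cre_neg = (e_min_neg - E_VA) / e_min_neg * 100

assert e_min_neg <= min(E_V, E_A)    # the negative-dependence baseline is smaller,
assert cre_neg <= cre_traditional    # so CRE_RT(-) is the more conservative measure
```

Because E(−)[min(V, A)] can never exceed min{E[RTV], E[RTA]}, the new measure only credits speed-ups beyond what maximal negative probability summation already predicts.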

SLIDE 67

The multisensory paradigm (neural version)

35 / 75

SLIDE 68

CRE at the level of a single neuron

Response strength: the absolute number of impulses (spikes) registered within a fixed time interval after stimulus presentation (Stein et al., Nat Rev Neurosci 2014).

CRE = (CMVA − max{UMV, UMA}) / max{UMV, UMA} × 100;

in the illustrated example, CRE = 123%.

36 / 75


slide-72
SLIDE 72

CRE at the level of a single neuron

◮ this measure is a very useful tool, but it is purely descriptive
◮ no theoretical foundation in terms of the possible operations SC neurons may perform
⇒ Being responsive to multiple sensory modalities does not guarantee that a neuron has actually engaged in integrating its multiple sensory inputs rather than simply responding to the most salient stimulus. The multisensory computations performed by SC neurons are still not fully understood.

37 / 75

slide-76
SLIDE 76

CRE for single neuron data

◮ Task: Develop a measure of crossmodal enhancement for single neuron data, in analogy to CRERT(−).
◮ NV, NA, and NVA denote the random numbers of impulses emitted by a neuron following unisensory (visual, auditory) and crossmodal stimulation. The traditional index is:

CREMAX = [E[NVA] − max{E[NV], E[NA]}] / max{E[NV], E[NA]} × 100.

38 / 75

slide-77
SLIDE 77

Towards a new index of CRE in neural data

Let GV(m) = PV(NV > m) and GA(m) = PA(NA > m), m = 0, 1, . . ., be the survivor functions of NV and NA. In analogy to the race model inequality, the Fréchet-Hoeffding bounds yield

max{GV(m), GA(m)} ≤ P(max{NV, NA} > m) ≤ min{GV(m) + GA(m), 1}

for m = 0, 1, . . . Again, the upper and lower bounds are themselves distribution functions (survivor functions) for the random variable max{NV, NA}! Summing over m and applying Jensen's inequality, we obtain

39 / 75
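These bounds can be checked numerically. A small sketch with made-up Poisson spike-count marginals (the distributional choice is purely illustrative); since the bounds hold for any coupling, an independently sampled pair must respect them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
nv = rng.poisson(8.0, size=n)    # hypothetical unisensory (visual) counts
na = rng.poisson(11.0, size=n)   # hypothetical unisensory (auditory) counts

m_grid = np.arange(0, 40)
g_v = np.array([(nv > m).mean() for m in m_grid])  # survivor function of N_V
g_a = np.array([(na > m).mean() for m in m_grid])  # survivor function of N_A

lower = np.maximum(g_v, g_a)         # attained by the comonotone coupling
upper = np.minimum(g_v + g_a, 1.0)   # attained by the countermonotone coupling

# P(max{N_V, N_A} > m) for the (independent) coupling actually sampled:
p_max = np.array([(np.maximum(nv, na) > m).mean() for m in m_grid])

bounds_hold = bool(np.all(lower <= p_max) and np.all(p_max <= upper))
```

Summing the upper bound over m approximates E(−)[max{NV, NA}], the largest expected maximum over all couplings.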

slide-78
SLIDE 78

Proposition on Coupling NV and NA

Proposition

For any coupling of the univariate response random variables NV and NA with expected value E[max{NV , NA}], max{E[NV ], E[NA]} ≤ E[max{NV , NA}] ≤ E(−)[max{NV , NA}], where E(−)[max{NV , NA}] is the expected value under maximal negative dependence between NV and NA.

40 / 75

slide-79
SLIDE 79

A new index of CRE in neural data

Replace the traditional index

CREMAX = [E[NVA] − max{E[NV], E[NA]}] / max{E[NV], E[NA]} × 100

by

CRE(−)MAX = [E[NVA] − E(−)[max{NV, NA}]] / E(−)[max{NV, NA}] × 100.

CRE(−)MAX compares the observed bimodal response E[NVA] with the largest bimodal response achievable by coupling the unisensory responses via negative stochastic dependence.

41 / 75
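For equal-sized samples, E(−)[max{NV, NA}] has a simple plug-in estimate: pair the sorted NV sample with the reverse-sorted NA sample (the countermonotone coupling of the empirical distributions). A sketch with toy spike counts (illustrative numbers, not the real data):

```python
import numpy as np

def e_neg_max(x, y):
    """Plug-in estimate of E(-)[max{X, Y}]: couple the empirical quantiles
    countermonotonically (ascending X against descending Y)."""
    x_up = np.sort(np.asarray(x, dtype=float))
    y_down = np.sort(np.asarray(y, dtype=float))[::-1]
    return float(np.maximum(x_up, y_down).mean())

def cre_neg_max(nv, na, nva):
    """CRE(-)_MAX = (E[N_VA] - E(-)[max{N_V,N_A}]) / E(-)[max{N_V,N_A}] * 100."""
    base = e_neg_max(nv, na)
    return (np.mean(nva) - base) / base * 100.0

# Toy counts, equal numbers of trials per condition (made up for illustration):
nv  = np.array([2, 3, 5, 5, 8, 9])
na  = np.array([1, 4, 4, 6, 7, 10])
nva = np.array([9, 10, 12, 13, 15, 16])

base_trad = max(nv.mean(), na.mean())
cre_trad = (nva.mean() - base_trad) / base_trad * 100.0
cre_neg = cre_neg_max(nv, na, nva)
```

Because the countermonotone baseline dominates max{E[NV], E[NA]}, the new index is smaller than the traditional one.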

slide-80
SLIDE 80

An Important Consequence

◮ CRE(−)MAX ≤ CREMAX
⇒ Thus, a neuron labeled as “multisensory” under CREMAX may lose that property under CRE(−)MAX.

42 / 75

slide-81
SLIDE 81

Illustrative Example: Single SC Neuron Data

* Data set from Mark Wallace (Vanderbilt U.): 84 sessions, with 15 trials each, from 3 superior colliculus neurons (cat) (Results in Colonius & Diederich, Scientific Reports, 2017.)

43 / 75

slide-82
SLIDE 82

Intermediate summary

◮ We suggest replacing the traditional multisensory index CREMAX by the new one, CRE(−)MAX.
◮ The new index has a theoretical foundation: it measures the degree by which a neuron’s observed multisensory response surpasses the level obtainable by optimally combining the unisensory responses (assuming that the neuron simply reacts to the more salient modality in any given cross-modal trial).
◮ CRE(−)MAX is easy to compute and does not require any specific assumption about the distribution of the spikes.
◮ It is in straightforward analogy to a measure well-established in the domain of reaction times (the “race model inequality”).
◮ It is sensitive to the variability of the data set.

44 / 75

slide-87
SLIDE 87

Copulas for neural and behavioral parallel systems

(1) Multisensory integration: reaction times
(2) Multisensory integration: spike numbers (impulse numbers)
(3) Response inhibition: stop signal paradigm

In (1) and (2), we present a new measure of multisensory integration that unifies behavioral and neural data. In (3), we modify a classic model of inhibitory behavior in a way that resolves a paradox between behavioral and neural data.

45 / 75

slide-92
SLIDE 92

Response inhibition: stop signal paradigm

46 / 75

slide-93
SLIDE 93

Stop signal paradigm

  • Subjects are instructed to make a response as quickly as possible to a go signal (no-stop-signal trial)
  • On a minority of trials, a stop signal is presented and subjects have to inhibit the previously planned response (stop-signal trial)

47 / 75

slide-94
SLIDE 94

Stop signal paradigm: inhibition functions

  • Inhibition functions of three subjects (Logan & Cowan, 1984)
  • The inhibition function is determined by stop-signal delay, but it also depends strongly on RT in the go task; the probability of responding given a stop signal is lower the longer the go RT

48 / 75

slide-95
SLIDE 95

RT distributions with and without stop signal

  • Observed response time distributions for no-stop-signal trials and signal-respond trials with stop-signal delays (SSDs) of 153, 241, and 329 ms (from Logan et al. 2014):
  • RTsignal-respond < RTgo
  • faster for shorter stop-signal delays than for longer ones.

49 / 75

slide-97
SLIDE 97

The general race model

50 / 75

slide-98
SLIDE 98

The general race model (1)

◮ Distinguish two different experimental conditions termed context GO, where only a go signal is presented, and context STOP, where a stop signal is presented in addition.
◮ In STOP, let Tgo and Tstop denote the random processing times for the go and the stop signal, respectively, with (unobservable!) bivariate distribution function H(s, t) = Pr(Tgo ≤ s, Tstop ≤ t), for all s, t ≥ 0.
◮ The marginal distributions of H(s, t) are denoted as
Fgo(s) = Pr(Tgo ≤ s, Tstop < ∞)
Fstop(t) = Pr(Tgo < ∞, Tstop ≤ t).

51 / 75

slide-101
SLIDE 101

The general race model (2)

NOTE: The distribution of Tgo could be different in context GO and in context STOP. However, the general race model rules this out by adding the important

Context invariance assumption: The distribution of go signal processing time Tgo is the same in context GO and context STOP.

Race assumption: Probability of a response despite a stop signal at delay td:
pr(td) = Pr(Tgo < Tstop + td) (1)

Assume H(s, t) = Pr(Tgo ≤ s, Tstop ≤ t) is absolutely continuous, so that density functions for the marginals exist, denoted fgo(s) and fstop(t).

52 / 75

slide-104
SLIDE 104

The general race model (3)

The RT distribution of responses given a stop signal at delay td (the signal-response distribution) is
Fsr(t | td) = Pr(Tgo ≤ t | Tgo < Tstop + td)
Goal: Estimate the stop-signal processing distribution Fstop(t)

53 / 75

slide-105
SLIDE 105

The independent race model

54 / 75

slide-106
SLIDE 106

The independent race model (1)

Logan & Cowan (1984) suggested the independent race model assuming stochastic independence between Tgo and Tstop:

Stochastic independence: for all s, t
H(s, t) = Pr(Tgo ≤ s) Pr(Tstop ≤ t) = Fgo(s) Fstop(t)

Then

pr(td) = Pr(Tgo < Tstop + td) = ∫₀^∞ fgo(t) [1 − Fstop(t − td)] dt. (2)

55 / 75
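Equation (2) can be evaluated numerically. A sketch assuming exponential marginals with made-up rates, where pr(td) also has the closed form 1 − e^(−λgo·td)·λstop/(λstop + λgo):

```python
import numpy as np

lam_go, lam_stop, td = 1.0 / 300.0, 1.0 / 150.0, 100.0  # illustrative rates (1/ms)

# Trapezoidal evaluation of pr(td) = ∫ f_go(t) [1 - F_stop(t - td)] dt,
# with F_stop(x) = 0 for x < 0 (the stop process cannot finish before onset):
t = np.linspace(0.0, 6000.0, 240_001)
integrand = lam_go * np.exp(-lam_go * t) * np.where(
    t > td, np.exp(-lam_stop * (t - td)), 1.0)
dt = t[1] - t[0]
pr_numeric = float(np.sum((integrand[:-1] + integrand[1:]) * 0.5) * dt)

# Closed form for two independent exponentials:
pr_exact = 1.0 - np.exp(-lam_go * td) * lam_stop / (lam_stop + lam_go)
```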

slide-108
SLIDE 108

The independent race model (2)

The density of the signal-response time distribution Fsr(t | td), for t > td, is
fsr(t | td) = fgo(t) [1 − Fstop(t − td)] / pr(td).
Solving for Fstop(t − td),
Fstop(t − td) = 1 − fsr(t | td) pr(td) / fgo(t), (3)
known as the “Colonius” method (Colonius 1990). Unfortunately, obtaining reliable estimates of the stop signal distribution using Equation (3) requires unrealistically large numbers of observations in practice (Band et al. 2003; Matzke et al. 2013).

56 / 75

slide-110
SLIDE 110

The independent race model (3): Integration method

Figure from Schall et al. 2017

Assume constant stop signal processing time: Tstop = SSRT. Then

pr(td) = ∫₀^(SSRT+td) fgo(t) dt

  • Because estimates of both pr(td) and fgo(t) are available, this allows estimation of the mean stop signal processing time SSRT.

57 / 75
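Under the constant-SSRT assumption the integral inverts directly: pr(td) = Fgo(SSRT + td), so SSRT = Fgo⁻¹(pr(td)) − td. A simulation sketch with a made-up gamma go-RT distribution (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
go_rt = rng.gamma(shape=4.0, scale=80.0, size=100_000)  # illustrative go RTs (ms)

true_ssrt, td = 220.0, 150.0
# With a constant stop latency, the response probability is pr(td) = F_go(SSRT + td):
pr_td = float(np.mean(go_rt < true_ssrt + td))

# Integration method: invert the (here: empirical) go distribution at pr(td)
ssrt_hat = float(np.quantile(go_rt, pr_td)) - td
```

The recovered value matches the SSRT built into the simulation up to sampling error in the empirical quantile.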

slide-111
SLIDE 111

The mean method

pr(td) = Pr(Tgo < Tstop + td) = Pr(Tgo − Tstop < td) ≡ Pr(Td < td)
⇒ E[Td] = E[Tgo] − E[Tstop]
Independent race model: Var[Td] = Var[Tgo] + Var[Tstop]
⇒ Var[Tstop] = Var[Td] − Var[Tgo]

58 / 75
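A quick simulation of the mean method under an independent coupling (both marginals are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
t_go = rng.gamma(4.0, 80.0, n)     # E = 320, Var = 25600 (illustrative)
t_stop = rng.gamma(9.0, 25.0, n)   # E = 225, Var = 5625; independent of t_go
t_d = t_go - t_stop                # Td = Tgo - Tstop, the "observable" difference

mean_stop_hat = t_go.mean() - t_d.mean()   # E[Tstop] = E[Tgo] - E[Td]
var_stop_hat = t_d.var() - t_go.var()      # IND: Var[Tstop] = Var[Td] - Var[Tgo]
```

Both identities recover the true stop-signal moments up to Monte Carlo error; the variance identity is exactly the one that breaks down once Cov[Tgo, Tstop] ≠ 0.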

slide-117
SLIDE 117

The paradox

59 / 75

slide-118
SLIDE 118

The paradox

Studying saccade countermanding in monkeys, Hanes and colleagues (Hanes & Schall 1995, Hanes et al. 1998) recorded from frontal and supplementary eye fields. They found neurons involved in gaze-shifting and gaze-holding that modulate on stop-signal trials, just before SSRT, when the monkey stopped successfully. (Figure from Schall & Logan 2017)

60 / 75

slide-119
SLIDE 119

The paradox

◮ The paradox: How can interacting circuits of mutually inhibitory gaze-holding and gaze-shifting neurons instantiate STOP and GO processes with independent finishing times? ◮ Proposed solution: interactive race model (Boucher et al. 2007):

61 / 75

slide-120
SLIDE 120

The paradox

◮ The paradox: How can interacting circuits of mutually inhibitory gaze-holding and gaze-shifting neurons instantiate STOP and GO processes with independent finishing times?
◮ Proposed solutions: interactive race model (Boucher et al. 2007), blocked-input model (Logan et al. 2015)
◮ These use stochastic differential equations; no longer non-parametric.

62 / 75

slide-121
SLIDE 121

The race model with perfect negative dependence

63 / 75

slide-122
SLIDE 122

The Fréchet-Hoeffding bounds

◮ For any bivariate distribution function H(s, t) = Pr(Tgo ≤ s, Tstop ≤ t) the following inequality holds:
H−(s, t) ≤ H(s, t), with H−(s, t) = max{Fgo(s) + Fstop(t) − 1, 0}
◮ H−(s, t) is a distribution function. Specifically, it corresponds to perfect negative dependence between Tgo and Tstop.

64 / 75

slide-123
SLIDE 123

Perfect negative dependence

What does it mean?
H−(s, t) = max{Fgo(s) + Fstop(t) − 1, 0} (4)
for all s, t ≥ 0. The marginal distributions of H−(s, t) are the same as before, that is, Fgo(s) and Fstop(t). Note that this perfect negative stochastic dependence (PND) model is parameter-free just like the IND race model, that is, we do not assume any specific parametric distribution.

65 / 75
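The PND coupling is easy to construct explicitly from a single uniform variable: Tgo = Fgo⁻¹(U), Tstop = Fstop⁻¹(1 − U). A sketch with illustrative exponential marginals (means made up), checking that the marginals are as prescribed while the dependence is strongly negative:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(1e-12, 1.0, size=300_000)   # shared uniform driving both times

mean_go, mean_stop = 300.0, 150.0           # made-up exponential means (ms)
t_go = -mean_go * np.log1p(-u)              # F_go^{-1}(u)
t_stop = -mean_stop * np.log(u)             # F_stop^{-1}(1 - u): countermonotone

# Only the coupling is negative; each marginal keeps its own distribution:
corr = float(np.corrcoef(t_go, t_stop)[0, 1])
```

By construction Fstop(Tstop) = 1 − Fgo(Tgo) holds for every sample pair, which is exactly the key property on the next slide.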

slide-126
SLIDE 126

Perfect negative dependence: the key property

Fstop(Tstop) = 1 − Fgo(Tgo) (5)
holds “almost surely”, that is, with probability 1.
◮ Thus, for any Fgo percentile we immediately obtain the corresponding Fstop percentile as the complementary probability, and vice versa, which expresses perfect negative dependence between Tgo and Tstop.
◮ The relation in Equation (5) is also interpretable as “Tstop is (almost surely) a decreasing function of Tgo”.
◮ It constitutes the most direct implementation of the notion of “mutual inhibition” observed in neural data: any increase of inhibitory activity (speed-up of Tstop) elicits a corresponding decrease in “go” activity (slow-down of Tgo) and vice versa.

66 / 75

slide-129
SLIDE 129

Predictions from perfect negative dependence

Do we have to throw away all measures obtained using the independent model, like estimates of the mean Tstop ≡ SSRT?
No! Because the (marginal) distribution of Tstop is the same under independence and perfect negative dependence. Thus
E[Tstop | IND] = E[Tstop | PND]
◮ But for the variance,
Var[Tstop] = Var[Td] − Var[Tgo] + 2 Cov[Tgo, Tstop]. (6)

67 / 75

slide-131
SLIDE 131

Can we test for PND ?

“Fan effect”:

68 / 75

slide-132
SLIDE 132

Can we test for PND ?

dashed = IND; solid = PND
* Tgo, Tstop: exponential distributions
* simulation: copBasic package in R

69 / 75
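The comparison of inhibition functions under the two couplings can be reproduced with a small simulation. A sketch in plain NumPy rather than the copBasic package used on the slide, with illustrative exponential marginals and made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
u = rng.uniform(1e-12, 1.0, size=n)
mean_go, mean_stop = 300.0, 150.0   # illustrative exponential means (ms)

t_go = -mean_go * np.log1p(-u)                           # shared go times
stop_ind = -mean_stop * np.log1p(-rng.uniform(size=n))   # independent coupling
stop_pnd = -mean_stop * np.log(u)                        # countermonotone coupling

# Inhibition functions pr(td) = P(Tgo < Tstop + td) under both couplings:
delays = np.array([0.0, 100.0, 200.0, 300.0])
pr_ind = np.array([(t_go < stop_ind + d).mean() for d in delays])
pr_pnd = np.array([(t_go < stop_pnd + d).mean() for d in delays])
```

Both curves increase with stop-signal delay, but they visibly differ, so the shape of the inhibition function carries information about the coupling.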

slide-133
SLIDE 133

Is there a paradox?

70 / 75

slide-134
SLIDE 134

Is there a paradox?

◮ We need to distinguish different levels of description, behavioral vs. neural. The race model (whether independent or dependent) does not describe the neural processes underlying stopping behavior.
◮ “Linking propositions” are theories about how the specific, observable aspects of the neuroscientific data should be related to specific, but often latent, aspects of the formal models.
◮ However: “Linking propositions” are affected by the behavioral model and can go astray: “In short, the interaction of the STOP with the GO unit must be late and potent – late to preserve the independence of the GO and STOP processes through SSRT and potent because it must be late.” (Schall et al. 2017)
◮ Therefore, it DOES matter what kind of dependency is assumed in the race model.

71 / 75

slide-138
SLIDE 138

Summary and Conclusions

◮ Behavioral and neural models have distinct levels of description based on different types of data, and either level can lead to insights into the cognitive process. ◮ The goal of “closing the gap” may be approached by “mutual inspiration”. ◮ Linking hypotheses are an important part of closing the gap, but their usefulness depends on how good the models are (in both areas).

72 / 75

slide-142
SLIDE 142

Acknowledgment

◮ Supported by DFG (German Science Foundation) SFB/TRR-31 (Project B4, HC), DFG Cluster of Excellence EXC 1077/1 Hearing4all (HC) and DFG Grant DI 506/12-1 (AD)

73 / 75

slide-143
SLIDE 143

Acknowledgment

◮ Collaborator: Prof. Adele Diederich, Jacobs University Bremen ◮ Student assistants: Lisa Wergen, Sarah Blum, Felix Wolff ◮ Funding: DFG (German Science Foundation)

74 / 75

slide-144
SLIDE 144

Bibliography

◮ Colonius H, Diederich A (2018). Paradox resolved: stop signal race model with negative dependence. Psychological Review, doi:10.1037/rev0000127.
◮ Colonius H (2017). Selected concepts from probability. Chapter 1 in WH Batchelder et al. (Eds.), New Handbook of Mathematical Psychology, Cambridge U. Press.
◮ Colonius H, Diederich A (2017). Measuring multisensory integration: from reaction times to spike counts. Scientific Reports, 7:3023. doi:10.1038/s41598-017-03219-5. rdcu.be/tjo1.
◮ Colonius H (2016). An invitation to coupling and copulas: with applications to multisensory modeling. Journal of Mathematical Psychology, 74, 2–10.
◮ Colonius H, Diederich A (2006). Race model inequality: Interpreting a geometric measure of the amount of violation. Psychological Review, 113(1), 148–154.
◮ Colonius H (1990). A note on the stop-signal paradigm, or how to observe the unobservable. Psychological Review, 97(2), 309–312.

75 / 75