-
Notifications
You must be signed in to change notification settings - Fork 10
/
sf_patterns.Rmd
executable file
·444 lines (365 loc) · 16.4 KB
/
sf_patterns.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
---
title: "Quantify distribution differences using the shift function"
author: "Guillaume A. Rousselet"
date: "`r Sys.Date()`"
output:
github_document:
html_preview: yes
toc: yes
toc_depth: 2
---
In the README file, we considered the case of 2 independent groups that differ in spread. To get a better understanding of the shift function, here we consider other situations in which there is no clear difference, a difference in mean, a difference in skewness. We also look at how the shift function can quantify different patterns that can be observed in skewed observations such as reaction times.
```{r, include = FALSE}
# library(rogme)
devtools::load_all()
```
```{r, message=FALSE}
library(cowplot)
library(ggplot2)
```
# No clear difference
```{r, message = FALSE, fig.width = 7, fig.height = 10}
n = 300 #> number of observations per group
set.seed(6)
g1 <- rnorm(n)
g2 <- rnorm(n)
ks(g1,g2) #> Kolmogorov-Smirnov test
t.test(g1,g2) #> regular Welsh t-test
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> scatterplots alone
ps <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = 1,
shape = 21,
colour = "grey10",
fill = "grey90")
ps <- ps + coord_flip()
ps <- ps + labs(title = "No clear difference") +
theme(plot.title = element_text(face = "bold", size = 20))
#> ps
#> compute shift function
sf <- shifthd(data = df, formula = obs ~ gr, nboot = 200)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2)
#> change axis labels
psf[[1]] <- psf[[1]] +
labs(x = "Group 1 quantiles of scores (a.u.)",
y = "Group 1 - group 2 \nquantile differences (a.u.)")
#> add labels for deciles 1 & 9
psf <- add_sf_lab(psf, sf, y_lab_nudge = .1, labres = 1, text_size = 4)
#> psf
#> -------------------------------------------------------
#> scatterplot + deciles + colour coded decile differences
p <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = .3,
shape = 21,
colour = "grey10",
fill = "grey90")
p <- plot_hd_links(p, sf[[1]],
q_size = 1,
md_size = 1.5,
add_rect = TRUE,
rect_alpha = 0.1,
rect_col = "grey50",
add_lab = TRUE) #> superimposed deciles + rectangle
p <- p + coord_flip() #> flip axes
#> p
cowplot::plot_grid(ps, p, psf[[1]], labels=c("A", "B", "C"), ncol = 1, nrow = 3,
rel_heights = c(1, 1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
The figure above illustrates two large samples drawn from a standard normal population. In that case, a t-test on means in inconclusive (*t* = -0.45, *P* = 0.65), and so is Kolmogorov-Smirnov test (*KS statistic* = 0.05, *P* = 0.82). As expected, the shift function shows only weak differences at all the deciles. This allows us to suggest more comfortably that the two distributions are similar, which cannot be done with a t-test because it considers only a very limited aspect of the data.
The shift function is not perfectly flat, as expected from random sampling of a limited sample size. The samples are both n = 300, so for smaller samples even more uneven shift functions can be expected by chance.
# Mean difference
```{r, message = FALSE, fig.width = 7, fig.height = 10}
set.seed(21)
g1 <- rnorm(n) + 6
g2 <- rnorm(n) + 6.5
ks(g1,g2) #> Kolmogorov-Smirnov test
t.test(g1,g2) #> regular Welsh t-test
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> scatterplots alone
ps <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = 1,
shape = 21,
colour = "grey10",
fill = "grey90")
ps <- ps + coord_flip()
ps <- ps + labs(title = "Mean difference") +
theme(plot.title = element_text(face = "bold", size = 20),
axis.text.y = element_blank(),
axis.title.y = element_blank())
#> ps
#> compute shift function
sf <- shifthd(data = df, formula = obs ~ gr, nboot = 200)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2)
#> change axis labels
psf[[1]] <- psf[[1]] +
labs(x = "Group 1 quantiles of scores (a.u.)",
y = "Group 1 - group 2 \nquantile differences (a.u.)")
#> theme(axis.title.y = element_blank())
#> add labels for deciles 1 & 9
psf <- add_sf_lab(psf, sf, y_lab_nudge = .2, labres = 1, text_size = 4)
#> psf
#> -------------------------------------------------------
#> scatterplot + deciles + colour coded decile differences
p <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = .3,
shape = 21,
colour = "grey10",
fill = "grey90") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
p <- plot_hd_links(p, sf[[1]],
q_size = 1,
md_size = 1.5,
add_rect = TRUE,
rect_alpha = 0.1,
rect_col = "grey50",
add_lab = TRUE) #> superimposed deciles + rectangle
p <- p + coord_flip() #> flip axes
#> p
#> combine plots into one figure
#> library(cowplot)
cowplot::plot_grid(ps, p, psf[[1]], labels=c("A", "B", "C"), ncol = 1, nrow = 3,
rel_heights = c(1, 1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
In the figure above, the two distributions differ in central tendency: in that case, a t-test on means returns a large negative t value (*t* = -7.56, *P* < 0.0001), but this is not the full story. The shift function shows that all the differences between deciles are negative and around 0.6. That all the deciles show an effect in the same direction is the hallmark of a completely effective method or experimental intervention. This consistent shift can also be described as first-order stochastic ordering, in which one distribution stochastically dominates another (Speckman et al., 2008).
# Skewness difference
```{r, message = FALSE, fig.width = 7, fig.height = 10}
set.seed(4)
g1 <- rgamma(n, shape = 7, scale = 2)
g2 <- rgamma(n, shape = 7, scale = 2.1)
md.g2 <- hd(g2) #> median
#> g2plot(g1,g2,op=4);abline(v=md.g2)
g2[g2>md.g2] <- sort(g2[g2>md.g2]) * seq(1,1.3,length.out=sum(g2>md.g2))
g1 <- g1*4 + 300
g2 <- g2*4 + 300
ks(g1,g2) #> Kolmogorov-Smirnov test
t.test(g1,g2) #> regular Welsh t-test
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> scatterplots alone
ps <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = 1,
shape = 21,
colour = "grey10",
fill = "grey90")
ps <- ps + coord_flip()
ps <- ps + labs(title = "Skewness difference") +
theme(plot.title = element_text(face = "bold", size = 20),
axis.text.y = element_blank(),
axis.title.y = element_blank())
#> ps
#> compute shift function
sf <- shifthd(data = df, formula = obs ~ gr, nboot = 200)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2)
#> change axis labels
psf[[1]] <- psf[[1]] +
labs(x = "Group 1 quantiles of scores (a.u.)",
y = "Group 1 - group 2 \nquantile differences (a.u.)")
#> theme(axis.title.y = element_blank())
#> add labels for deciles 1 & 9
psf <- add_sf_lab(psf, sf, y_lab_nudge = 4, labres = 1, text_size = 4)
#> psf
#> -------------------------------------------------------
#> scatterplot + deciles + colour coded decile differences
p <- plot_scat2(df,
xlabel = "",
ylabel = "Scores (a.u.)",
alpha = .3,
shape = 21,
colour = "grey10",
fill = "grey90") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
p <- plot_hd_links(p, sf[[1]],
q_size = 1,
md_size = 1.5,
add_rect = TRUE,
rect_alpha = 0.1,
rect_col = "grey50",
add_lab = TRUE) #> superimposed deciles + rectangle
p <- p + coord_flip() #> flip axes
# p
cowplot::plot_grid(ps, p, psf[[1]], labels=c("A", "B", "C"), ncol = 1, nrow = 3,
rel_heights = c(1, 1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
In this third figure, similarly to our second example, a t-test on means also returns a large t value (*t* = -3.74, *P* = 0.0002). However, the way the two distributions differ is very different from our previous example: the first five deciles are near zero and follow almost a horizontal line, and from deciles 5 to 9, differences increase linearly. The confidence intervals also increase as we move from the left to the right of the distributions: there is growing uncertainty about the size of the group difference in the right tails of the distributions.
More generally, the shift function is well suited to investigate how skewed distributions differ. The figures in the next section illustrate reaction time data in which a manipulation:
- affects most strongly slow behavioural responses, but with limited effects on fast responses;
- affects all responses, fast and slow, similarly;
- has stronger effects on fast responses, and weaker ones for slow responses.
# Reaction time differences
The detailed dissociations presented below have been reported in the literature, and provide much stronger constraints on theories than comparisons limited to say the median reaction times across participants (Ridderinkhof et al., 2005; Pratte et al., 2010).
```{r}
library(retimes)
Nb = 1000 #> number of bootstrap samples
nobs <- 1000 #> number of observations per group
```
## Weak early differences, then increasing differences
**Panel A** = violin plots
**Panel B** = shift function
```{r, message = FALSE, fig.width = 7, fig.height = 8}
set.seed(3)
g1 <- rexgauss(nobs, mu=300, sigma=10, tau=30)
g2 <- rexgauss(nobs, mu=300, sigma=17, tau=70)
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> violinplots
ps <- ggplot(df, aes(x = gr, y = obs, fill = gr, colour = gr, shape = gr)) +
geom_violin(fill = "grey95", colour = "black", size = 1.25) +
theme_bw() +
theme(legend.position = "none") +
theme(axis.title.x = element_text(size = 16, face = "bold"),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16)) +
scale_y_continuous(limits=c(250, 700),breaks=seq(200,800,100)) +
scale_x_discrete(labels = c("g1", "g2"), expand = c(0.02, 0.02)) +
labs(title = "Increasing differences", x = "", y = "Response latencies in ms") +
theme(plot.title = element_text(face = "bold", size = 20)) +
coord_flip()
#> -------------------------------------------------------
#> compute shift function - deciles
#> sparse: q=c(.25,.5,.75)
#> dense: q=seq(.05,.95,0.05)
set.seed(4)
sf <- shifthd_pbci(data = df, formula = obs ~ gr, q=seq(.1,.9,.1), nboot = Nb)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2, symb_size = 4)[[1]] +
scale_y_continuous(breaks = seq(-160, 20, 20), limits = c(-160, 20)) +
labs(y = "Decile differences")
#> psf
#> combine kernel density plots + shift functions
cowplot::plot_grid(ps, psf,
labels=c("A", "B"),
ncol = 1,
nrow = 2,
rel_heights = c(1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
## Complete shift
```{r, message = FALSE, fig.width = 7, fig.height = 8}
set.seed(3)
g1<-rexgauss(nobs, mu=300, sigma=10, tau=50)
g2<-rexgauss(nobs, mu=300, sigma=10, tau=50) + 50
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> violinplots
ps <- ggplot(df, aes(x = gr, y = obs, fill = gr, colour = gr, shape = gr)) +
geom_violin(fill = "grey95", colour = "black", size = 1.25) +
theme_bw() +
theme(legend.position = "none") +
theme(axis.title.x = element_text(size = 16, face = "bold"),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16)) +
scale_y_continuous(limits=c(250, 700),breaks=seq(200,800,100)) +
scale_x_discrete(labels = c("g1", "g2"), expand = c(0.02, 0.02)) +
labs(title = "Complete shift", x = "", y = "Response latencies in ms") +
theme(plot.title = element_text(face = "bold", size = 20)) +
coord_flip()
#> -------------------------------------------------------
#> compute shift function - deciles
set.seed(4)
sf <- shifthd_pbci(data = df, formula = obs ~ gr, q=seq(.1,.9,.1), nboot = Nb)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2, symb_size = 4)[[1]] +
scale_y_continuous(breaks = seq(-100, 20, 20), limits = c(-100, 20)) +
labs(y = "Decile differences")
#> combine kernel density plots + shift functions
cowplot::plot_grid(ps, psf,
labels=c("A", "B"),
ncol = 1,
nrow = 2,
rel_heights = c(1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
## Early differences, then decreasing differences
```{r, message = FALSE, fig.width = 7, fig.height = 8}
set.seed(1)
g1<-rexgauss(nobs, mu=400, sigma=20, tau=50)
g2<-rexgauss(nobs, mu=370, sigma=20, tau=70)
#> make tibble
df <- mkt2(g1,g2)
#> -------------------------------------------------------
#> violinplots
ps <- ggplot(df, aes(x = gr, y = obs, fill = gr, colour = gr, shape = gr)) +
geom_violin(fill = "grey95", colour = "black", size = 1.25) +
theme_bw() +
theme(legend.position = "none") +
theme(axis.title.x = element_text(size = 16, face = "bold"),
axis.text.x = element_text(size = 16),
axis.text.y = element_text(size = 16)) +
scale_y_continuous(limits=c(200, 700),breaks=seq(200,800,100)) +
scale_x_discrete(labels = c("g1", "g2"), expand = c(0.02, 0.02)) +
labs(title = "Early differences", x = "", y = "Response latencies in ms") +
theme(plot.title = element_text(face = "bold", size = 20)) +
coord_flip()
#> -------------------------------------------------------
#> compute shift function - deciles
set.seed(4)
sf <- shifthd_pbci(data = df, formula = obs ~ gr, q=seq(.1,.9,.1), nboot = Nb)
#> plot shift function
psf <- plot_sf(sf, plot_theme = 2, symb_size = 4)[[1]] +
scale_y_continuous(breaks = seq(-70, 30, 10), limits = c(-70, 30)) +
labs(y = "Decile differences")
#> combine kernel density plots + shift functions
cowplot::plot_grid(ps, psf,
labels=c("A", "B"),
ncol = 1,
nrow = 2,
rel_heights = c(1, 1),
label_size = 20,
hjust = -0.5,
scale=.95,
align = "v")
```
# References
Pratte, M.S., Rouder, J.N., Morey, R.D. & Feng, C.N. (2010)
**Exploring the differences in distributional properties between Stroop and Simon effects using delta plots.**
Atten Percept Psycho, 72, 2013-2025.
Ridderinkhof, K.R., Scheres, A., Oosterlaan, J. & Sergeant, J.A. (2005)
**Delta plots in the study of individual differences: New tools reveal response inhibition deficits in AD/HD that are eliminated by methylphenidate treatment.**
J Abnorm Psychol, 114, 197-215.
Rousselet, G.A., Pernet, C.R. & Wilcox, R.R. (2017)
**Beyond differences in means: robust graphical methods to compare two groups in neuroscience.**
The European journal of neuroscience, 46, 1738-1748.
[[article](https://onlinelibrary.wiley.com/doi/abs/10.1111/ejn.13610)] [[preprint](https://www.biorxiv.org/content/early/2017/05/16/121079)] [[reproducibility package](https://figshare.com/articles/Modern_graphical_methods_to_compare_two_groups_of_observations/4055970)]
Speckman, P.L., Rouder, J.N., Morey, R.D. & Pratte, M.S. (2008)
**Delta plots and coherent distribution ordering.**
Am Stat, 62, 262-266.