library(tidyverse)
set.seed(126)p-hacking demo
Simulation
Part I
Imagine if we ran the demo experiment and found the average response time for orange and blue stimuli for 30 participants.
This code simulates the data, note: the null hypothesis is TRUE (we can do this in a simulation), \(\mu = 120\) for both orange and blue stimuli.
N <- 30
mu <- 120
sd <- 30
# rnorm() produces random samples from a normal distribution
orangeStimulus <- rnorm(N, mu, sd)
blueStimulus <- rnorm(N, mu, sd)A t-test can test if there is a significant effect of colour.
pValues <- c(t.test(orangeStimulus, blueStimulus, paired=TRUE)$p.value)Since \(p =\) 0.637, we would correctly fail to reject the null hypothesis.
fig1 <- ggplot() +
geom_density(aes(orangeStimulus), color = "orange") +
geom_density(aes(blueStimulus), color = "blue") +
geom_vline(xintercept = mean(orangeStimulus), color="orange", linetype=2) +
geom_vline(xintercept = mean(blueStimulus), color="blue", linetype=2) +
ggtitle("Distribution of response times for Orange and Blue stimuli") +
xlab("Response time (ms)")
fig1Part II
What happens if we add participants, 1 by 1, and run a t-test.
This code ‘runs’ 220 more participants (and records each p-value):
nAdded <- N
for (i in 1:(250-N)) {
orangeStimulus <- append(orangeStimulus, rnorm(1, mu, sd))
blueStimulus <- append(blueStimulus, rnorm(1, mu, sd))
pValues <- append(pValues, t.test(orangeStimulus, blueStimulus, paired=TRUE)$p.value)
nAdded <- nAdded+1
}What is the p-value with 100 participants?
pValues[100-N][1] 0.02239644
\(p = 0.022\), this test incorrectly rejects the null hypothesis!
What about 125 participants?
pValues[125-N][1] 0.06750731
How do you interpret this p-value?
Analysis
Here are the distributions of responses for 250 participants.
Here are all of the p-values plotted for 30–250 participants.
fig2 <- ggplot() + geom_line(aes(x=N:nAdded, y=pValues)) +
geom_hline(yintercept=0.05, color = "red") +
ggtitle(paste("Simulated p-values for", nAdded-N, "two-sample t-tests (H0 is TRUE!)")) +
xlab("Number of participants (N)") +
scale_x_continuous(limits=c(N, nAdded)) +
ylab("p-value")
fig2ggsave("images/13-pvalues.png", fig2, width = 8, height = 5)