> # perfectly independent matrix of 161 observations; standard "small-n statistics"
> # (rows have different sums but are all in 4:2:1 ratio)
> tbl <- matrix(c(4, 2, 1,
... 48, 24, 12,
... 40, 20, 10), ncol=3)
> chisq.test(tbl)$p.value
[1] 1
Warning message:
In chisq.test(tbl) : Chi-squared approximation may be incorrect
> # one more observation, still independent
> tbl[3,3] <- tbl[3,3] + 1
> print(tbl)
     [,1] [,2] [,3]
[1,]    4   48   40
[2,]    2   24   20
[3,]    1   12   11
> chisq.test(tbl)$p.value
[1] 0.99974
Warning message:
In chisq.test(tbl) : Chi-squared approximation may be incorrect
> # Ten times more data in the same ratio is still independent
> chisq.test(tbl*10)$p.value
[1] 0.97722
> # A hundred times more data in the same ratio is less independent
> chisq.test(tbl*100)$p.value
[1] 0.33017
> # A thousand times more data fails independence (and way below p<0.05)
> chisq.test(tbl*1000)$p.value
[1] 0.0000000023942
> print(tbl*1000) #(still basically all 4:2:1)
     [,1]  [,2]  [,3]
[1,] 4000 48000 40000
[2,] 2000 24000 20000
[3,] 1000 12000 11000
All the matrices maintain a near-perfect 4:2:1 ratio in the rows; only the single extra observation in cell [3,3] breaks it. But when the data grow from 162 to 162,000 observations, p falls from above 0.99 (indistinguishable from theoretical independence) to below 0.00000001.
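Why this happens can be checked directly. The Pearson statistic is the sum of (observed - expected)^2 / expected over the cells, so multiplying every count by k multiplies the statistic by k while the degrees of freedom stay at 4. The sketch below is only an illustration built on the table from the session above; the names `scales`, `results` and `statistic_per_obs` are mine, not part of the original code. It relies on `chisq.test` applying no continuity correction to tables larger than 2x2.

```r
# Modified 3x3 table from the session above (162 observations).
tbl <- matrix(c(4, 2, 1,
                48, 24, 12,
                40, 20, 11), ncol = 3)

# Scale the counts and record the test statistic and p-value at each size.
scales <- c(1, 10, 100, 1000)
results <- t(sapply(scales, function(k) {
  # suppressWarnings() hides the "approximation may be incorrect" warning
  # that the small expected counts trigger for k = 1.
  test <- suppressWarnings(chisq.test(tbl * k))
  n <- sum(tbl * k)
  c(n = n,
    statistic = unname(test$statistic),
    # statistic divided by sample size: depends only on the proportions
    statistic_per_obs = unname(test$statistic) / n,
    p_value = test$p.value)
}))
print(results)
```

The `statistic` column should grow in step with `n`, while `statistic_per_obs` should come out the same in every row: the departure from independence is identical at each scale, and only the sample size changes the p-value.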
The problem with chi-squared tests in particular is actually an old one: Berkson (1938) pointed it out. The first solution came right after, with Hotelling's (1939) volume test. It amounts to an endorsement of what we do today: for big data, use data-driven statistics, not small-n statistics. Small-n statistics were developed for small n.
https://www.tandfonline.com/doi/pdf/10.1080/01621459.1938.10502329
https://www.jstor.org/stable/2371512
Here's the code:
# perfectly independent matrix of 161 observations; standard “small-n statistics”
# (rows have different sums but are all in 4:2:1 ratio)
tbl <- matrix(c(4, 2, 1,
48, 24, 12,
40, 20, 10), ncol=3)
chisq.test(tbl)$p.value
# one more observation, still independent
tbl[3,3] <- tbl[3,3] + 1
print(tbl)
chisq.test(tbl)$p.value
# Ten times more data in the same ratio is still independent
chisq.test(tbl*10)$p.value
# A hundred times more data in the same ratio is less independent
chisq.test(tbl*100)$p.value
# A thousand times more data fails independence
chisq.test(tbl*1000)$p.value
print(tbl*1000)