Use correction for small-sample bias in all Chisq effect size #588

Open
@mattansb

Description

For $\phi$, the small-sample bias corrected estimate is:

$$ \widetilde{\phi} = \sqrt{\hat{\phi}^2 - \frac{df}{N-1}} $$

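This comes from the non-central $\chi^2$ distribution, where $E[\hat{\chi}^2] = df + \phi^2 \times N$, which implies $E[\hat{\phi}^2] = \phi^2 + df / N$. Note that the correction above divides by $N - 1$ rather than $N$; the two agree for large $N$.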
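The expectation above can be checked with a quick simulation. The sketch below is only an illustration for the goodness-of-fit case under the null (true $\phi = 0$), where $E[\hat{\chi}^2] = df$. The probabilities `p0`, the sample size, and the number of replications are arbitrary choices of mine, not anything taken from effectsize.

```r
set.seed(1)

p0 <- c(0.2, 0.3, 0.5)  # assumed null probabilities (illustrative)
N <- 20                 # small sample size
n_sim <- 20000          # number of simulated samples

# Each column is one simulated vector of observed counts
counts <- rmultinom(n_sim, size = N, prob = p0)
expected <- N * p0

# Pearson chi-squared per simulated sample
# (expected is recycled down each column of counts)
chisq <- colSums((counts - expected)^2 / expected)

phi2_hat <- chisq / N
df <- length(p0) - 1

# The true phi^2 is 0, so the mean of phi2_hat estimates the bias
c(simulated = mean(phi2_hat), theoretical = df / N)
```

Even though the true effect is exactly zero, the average $\hat{\phi}^2$ sits near $df / N$ rather than 0, which is the bias the correction is meant to remove.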

This is used in effectsize for:

  • phi(adjust = TRUE)
  • cramers_v(adjust = TRUE)
  • tschuprows_t(adjust = TRUE)

(The latter two also have a weird scaling factor from Bergsma (2013).)

This correction can be applied to all $\phi$-like effect sizes:

  • cohens_w() - makes the most sense as it applies the same transformation on $\chi^2$ as $\phi$ does.
  • pearsons_c() - can be seen as a transformed Cohen's w ( $C = \sqrt{W^2 / (W^2 + 1)}$ ) so using an adjusted w would "adjust" C as well.
  • fei() - same reasoning, although the additional scaling factor ( $1/\min(p_E) - 1$ ) might have to be adjusted in the same way that the scaling factors of V and T are. (See next section.)
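As a rough sketch of what this could look like for w and C: the helper `adjust_phi2()` below is hypothetical (it is not an effectsize function), truncating at 0 is my assumption to keep the square root defined, and the table is made up. C is computed from w using the usual relation $C = \sqrt{W^2 / (W^2 + 1)}$.

```r
# Hypothetical helper (not part of effectsize): bias-corrected phi^2,
# truncated at 0 so that the square root is always defined.
adjust_phi2 <- function(chisq, N, df) {
  max(0, chisq / N - df / (N - 1))
}

# Arbitrary 2 x 3 table of counts, for illustration only
tab <- matrix(c(12, 5, 8, 9, 14, 7), nrow = 2)

res <- chisq.test(tab, correct = FALSE)
chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(tab)

# Unadjusted and adjusted Cohen's w
w <- sqrt(chisq / N)
w_adj <- sqrt(adjust_phi2(chisq, N, df))

# Pearson's C as a transformation of w, so adjusting w adjusts C too
pearson_c <- sqrt(w^2 / (w^2 + 1))
pearson_c_adj <- sqrt(w_adj^2 / (w_adj^2 + 1))

c(w = w, w_adj = w_adj, C = pearson_c, C_adj = pearson_c_adj)
```

Because C is a monotone function of w, shrinking w necessarily shrinks C, so no separate correction term would be needed for C under this approach.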

Some of my thoughts...

Bergsma (2013) suggested changing the scaling factors of V and T in such a way that when (the true) $T=1$, RMSE would be 0 because (regardless of sample size) the estimated T would also be 1.

To achieve this with פ:

$$ \widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{\min(p_E)} - 1 - \frac{df}{N-1}}} $$

I'm not sure this is the way to go, because it also means that a sample in which פ=1 will produce an estimate of 1, even when the sample size is arbitrarily small. For example:

```r
O <- c(2, 0)
E <- c(0.35, 0.65)

res <- chisq.test(O, p = E, correct = FALSE)

chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(O)

phi2_adj <- chisq / N - df / (N - 1)

# adjusted Fei
sqrt(phi2_adj / 
       (1 / min(E) - 1 - df / (N - 1)))
#> [1] 1

# unadjusted Fei
effectsize::fei(O, p = E, ci = NULL)
#> Fei 
#> ----
#> 1.00
#> 
#> - Adjusted for uniform expected probabilities.
```

This is also true for T (by design):

```r
mat <- diag(2)
mat[1,1] <- 2
mat
#>      [,1] [,2]
#> [1,]    2    0
#> [2,]    0    1

effectsize::tschuprows_t(mat, ci = NULL)
#> Tschuprow's T (adj.)
#> --------------------
#> 1.00
```

From what I can see, small-sample bias adjustments almost always shrink the estimate, even when the sample estimate is perfect (e.g., $R^2_{adj}$, $\omega^2$, $\epsilon^2$). So I think having:

$$ \widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{\min(p_E)} - 1}} $$

(which uses the regular scaling factor) makes the most sense to me. It would also make פ consistent with w in the uniform-binary case, but inconsistent with the adjusted V and T.
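To make the trade-off concrete, here is a minimal sketch comparing the two scaling options on a "perfect" sample. The function `fei_adj()` and its `scaling` argument are hypothetical names for illustration only (not effectsize API), and truncating $\widetilde{\phi^2}$ at 0 is my assumption.

```r
# Hypothetical helper (not part of effectsize): adjusted Fei under two
# candidate scaling factors, for observed counts O and expected probabilities E.
fei_adj <- function(O, E, scaling = c("regular", "bergsma_like")) {
  scaling <- match.arg(scaling)
  N <- sum(O)
  df <- length(O) - 1

  # Pearson chi-squared for goodness of fit
  chisq <- sum((O - N * E)^2 / (N * E))

  # Bias-corrected phi^2, truncated at 0 (an assumption of this sketch)
  phi2_adj <- max(0, chisq / N - df / (N - 1))

  # Regular scaling factor, optionally shrunk by the same correction term
  denom <- 1 / min(E) - 1
  if (scaling == "bergsma_like") denom <- denom - df / (N - 1)

  sqrt(phi2_adj / denom)
}

E <- c(0.35, 0.65)

fei_adj(c(2, 0), E, "bergsma_like")  # stays at 1 even with N = 2
fei_adj(c(2, 0), E, "regular")       # shrinks to about 0.68
fei_adj(c(200, 0), E, "regular")     # approaches 1 as N grows
```

With the regular scaling factor the estimate for a perfect sample is pulled below 1 when N is tiny and converges to 1 as N increases, which mirrors how $R^2_{adj}$ and similar corrections behave.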
