Use correction for small-sample bias in all Chisq effect sizes

For $\phi$, the small-sample bias corrected estimate is:

$$
\widetilde{\phi} = \sqrt{\hat{\phi}^2 - \frac{df}{N-1}}
$$

This comes from the non-central $\chi^2$ distribution, where $E[\hat{\chi}^2] = df + N \phi^2$, which implies $E[\hat{\phi}^2] = \phi^2 + df / N$.
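As a rough illustration of the correction, here is a minimal base-R sketch that computes the raw and bias-corrected $\phi$ by hand. The counts in `tab` are made up, and truncating the corrected $\phi^2$ at zero is my assumption (it keeps the square root defined when the correction exceeds the raw estimate).

```R
# Made-up 2x2 table of counts (illustration only)
tab <- matrix(c(12, 5, 7, 16), nrow = 2)

# Pearson chi-square without the continuity correction
res   <- chisq.test(tab, correct = FALSE)
chisq <- unname(res$statistic)
df    <- unname(res$parameter)
N     <- sum(tab)

# Raw phi^2, then subtract the small-sample bias term df / (N - 1)
phi2     <- chisq / N
phi2_adj <- max(0, phi2 - df / (N - 1))  # truncation at 0 is an assumption

phi_est <- c(phi = sqrt(phi2), phi_adj = sqrt(phi2_adj))
phi_est
```

The corrected value can never exceed the raw one, since a non-negative term is subtracted before taking the square root.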

This is used in `effectsize` for:

- [x] `phi(adjust = TRUE)`
- [x] `cramers_v(adjust = TRUE)`
- [x] `tschuprows_t(adjust = TRUE)`

(The latter two also have a weird scaling factor from [Bergsma (2013)](https://doi.org/10.1016/j.jkss.2012.10.002).)

This correction can be applied to all $\phi$-like effect sizes:
- [ ] `cohens_w()` - makes the most sense as it applies the same transformation on $\chi^2$ as $\phi$ does.
- [ ] `pearsons_c()` - can be seen as a transformed Cohen's *w* ( $C = \sqrt{W^2 / (W^2 + 1)}$ ) so using an adjusted *w* would "adjust" *C* as well.
- [ ] `fei()` - same reasoning, although the additional scaling factor ( $1/\min(p_E) - 1$ ) might have to be adjusted in a manner similar to that of *V* and *T*. (See next section.)
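
To make the proposal for *w* and *C* concrete, here is a minimal sketch. `adjusted_w_c()` is a hypothetical helper (not part of `effectsize`), the counts are made up, and truncating at zero is my assumption.

```R
# Hypothetical helper: bias-corrected Cohen's w and Pearson's C
# from a chi-square statistic, its df, and the sample size
adjusted_w_c <- function(chisq, df, N) {
  w2_adj <- max(0, chisq / N - df / (N - 1))  # same correction as for phi
  c(
    w_adj = sqrt(w2_adj),
    C_adj = sqrt(w2_adj / (w2_adj + 1))       # C as a transformation of w
  )
}

# Made-up goodness-of-fit example
O <- c(40, 45, 15)
E <- c(0.25, 0.50, 0.25)
res <- chisq.test(O, p = E)

out <- adjusted_w_c(unname(res$statistic), unname(res$parameter), N = sum(O))
out
```

Because *C* is a monotone transformation of *w*, the adjusted *C* is always below the adjusted *w* whenever the latter is positive.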

---

Some of my thoughts...

[Bergsma (2013)](https://doi.org/10.1016/j.jkss.2012.10.002) suggested changing the scaling factors of *V* and *T* in such a way that when (the true) $T=1$, RMSE would be 0 because (regardless of sample size) the estimated *T* would also be 1.

To achieve this with פ:

$$
\widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{\min(p_E)} - 1 - \frac{df}{N-1}}}
$$

I'm not sure this is the way to go, because it *also* means that a sample in which *פ=1* will produce an estimate of 1, even when the sample size is arbitrarily small. For example:

```R
O <- c(2, 0)
E <- c(0.35, 0.65)

res <- chisq.test(O, p = E, correct = FALSE)

chisq <- unname(res$statistic)
df <- unname(res$parameter)
N <- sum(O)

phi2_adj <- chisq / N - df / (N - 1)

# adjusted Fei
sqrt(phi2_adj / 
       (1 / min(E) - 1 - df / (N - 1)))
#> [1] 1

# unadjusted Fei
effectsize::fei(O, p = E, ci = NULL)
#> Fei 
#> ----
#> 1.00
#> 
#> - Adjusted for uniform expected probabilities.
```

This is also true for *T* (by design):

```R
mat <- diag(2)
mat[1,1] <- 2
mat
#>      [,1] [,2]
#> [1,]    2    0
#> [2,]    0    1

effectsize::tschuprows_t(mat, ci = NULL)
#> Tschuprow's T (adj.)
#> --------------------
#> 1.00
```
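
The same value can be reproduced by hand. This is a sketch based on my reading of the Bergsma (2013) adjustment (bias-corrected $\phi^2$, plus row and column counts shrunk by $(r-1)^2/(N-1)$ and $(c-1)^2/(N-1)$); it is not necessarily how `effectsize` computes it internally.

```R
mat <- diag(2)
mat[1, 1] <- 2

# Tiny counts trigger an approximation warning; silence it for the demo
res   <- suppressWarnings(chisq.test(mat, correct = FALSE))
chisq <- unname(res$statistic)
N     <- sum(mat)
n_row <- nrow(mat)
n_col <- ncol(mat)

# Bias-corrected phi^2 (truncated at 0)
phi2_adj <- max(0, chisq / N - (n_row - 1) * (n_col - 1) / (N - 1))

# Shrunken numbers of rows and columns (assumed form of the adjustment)
row_adj <- n_row - (n_row - 1)^2 / (N - 1)
col_adj <- n_col - (n_col - 1)^2 / (N - 1)

T_adj <- sqrt(phi2_adj / sqrt((row_adj - 1) * (col_adj - 1)))
T_adj
```

Here the numerator and the scaling factor shrink by the same amount, which is why the estimate stays at 1 even with $N = 3$.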

From what I can see, small-sample bias adjustments almost always shrink the estimate, even when the sample estimate is perfect (e.g., $R^2_{adj}$, $\omega^2$, $\epsilon^2$). So I think having:

$$
\widetilde{פ} = \sqrt{\frac{\widetilde{\phi^2}}{\frac{1}{\min(p_E)} - 1}}
$$

(which uses the regular scaling factor) makes the most sense to me. It would also be consistent with *w* in the uniform-binary case, but inconsistent with the adjusted *V* and *T*.
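
For comparison, here is a sketch that applies both candidate scalings to the small example from earlier (`O <- c(2, 0)`). The names `scale_regular` and `scale_shrunk` are mine, for illustration only.

```R
O <- c(2, 0)
E <- c(0.35, 0.65)

# Tiny counts trigger an approximation warning; silence it for the demo
res   <- suppressWarnings(chisq.test(O, p = E, correct = FALSE))
chisq <- unname(res$statistic)
df    <- unname(res$parameter)
N     <- sum(O)

phi2_adj <- max(0, chisq / N - df / (N - 1))

scale_regular <- 1 / min(E) - 1                # regular Fei scaling
scale_shrunk  <- scale_regular - df / (N - 1)  # Bergsma-style shrunken scaling

fei_adj <- c(
  regular = sqrt(phi2_adj / scale_regular),
  shrunk  = sqrt(phi2_adj / scale_shrunk)
)
fei_adj
```

With the regular scaling the adjusted estimate drops to roughly 0.68 for this two-observation sample, whereas the shrunken scaling keeps it at 1.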
