this post was submitted on 11 Dec 2023
525 points (98.3% liked)

Science Memes

[–] [email protected] 11 points 9 months ago* (last edited 9 months ago)

I'm going to have to object. We don't use "false positive" and "false negative" as synonyms for Type I and Type II error because they're not the same thing. The difference is at the heart of the misuse of p-values by so many researchers, and the root of the so-called replication crisis.

Type I error is the risk of falsely concluding that the quantities being compared are meaningfully different when they are not, in fact, meaningfully different. Type II error is the risk of falsely concluding that they are essentially equivalent when they are not, in fact, essentially equivalent. Both are conditional probabilities; you can only get a Type I error when the things are, in truth, essentially equivalent and you can only get a Type II error when they are, in truth, meaningfully different. We define Type I and Type II errors as part of the design of a trial. We cannot calculate the risk of a false positive or a false negative without knowing the probability that the two things are meaningfully different.

This may be a little easier to follow with an example:

Let's say we have designed an RCT to compare two treatments with Type I error of 0.05 (95% confidence) and Type II error of 0.1 (90% power). Let's also say that this is the first large phase 3 trial of a promising drug and we know from experience with thousands of similar trials in this context that the new drug will turn out to be meaningfully different from control around 10% of the time.

So, in 1000 trials of this sort, 100 trials will be comparing drugs which are meaningfully different and we will get a false negative for 10 of them (because we only have 90% power). 900 trials will be comparing drugs which are essentially equivalent and we will get a false positive for 45 of them (because we only have 95% confidence).

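Across those 1000 trials we therefore see 135 positive results: 90 true positives (the 100 meaningfully different drugs minus the 10 false negatives) plus 45 false positives. The false positive rate is 45/135 (33.3%), nowhere near the 5% Type I error we designed the trial with.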
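If it helps, here is a minimal Python sketch of the same arithmetic. The function name `trial_outcomes` and its parameter names are my own, purely for illustration; the inputs are the ones from the example (1000 trials, a 10% chance the drug is meaningfully different, Type I error 0.05, 90% power).

```python
def trial_outcomes(n_trials, prior, alpha, power):
    """Expected outcome counts across n_trials (hypothetical helper).

    prior: share of trials where the drugs are meaningfully different
    alpha: Type I error (conditional on the drugs being equivalent)
    power: 1 - Type II error (conditional on the drugs being different)
    """
    truly_different = n_trials * prior
    truly_equivalent = n_trials - truly_different

    true_pos = truly_different * power      # real differences we detect
    false_neg = truly_different - true_pos  # real differences we miss
    false_pos = truly_equivalent * alpha    # equivalent drugs flagged as different
    true_neg = truly_equivalent - false_pos

    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


counts = trial_outcomes(n_trials=1000, prior=0.10, alpha=0.05, power=0.90)

# Share of positive results that are false (45 out of 135 in the example).
false_pos_share = counts["false_pos"] / (counts["false_pos"] + counts["true_pos"])

# Share of negative results that are false (10 out of 865).
false_neg_share = counts["false_neg"] / (counts["false_neg"] + counts["true_neg"])

print(f"false positives among positive results: {false_pos_share:.1%}")
print(f"false negatives among negative results: {false_neg_share:.1%}")
```

Changing `prior` while leaving `alpha` and `power` alone moves both shares, which is the whole point: the design parameters are fixed in advance, but the chance that a given positive result is wrong depends on how often the drugs really differ.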

Statisticians are awful at naming things. But there is a reason we don't give these error rates the nice, intuitive names you'd expect. Unfortunately we're also awful at explaining things properly, so the misunderstanding has persisted anyway.

This is a useful page which runs through much the same ideas as the paper linked above but in simpler terms: The p value and the base rate fallacy

And this paper tries to rescue p-values from oblivion by calling for 0.005 to replace the usual 0.05 threshold for alpha: Redefine statistical significance.
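To get a rough feel for what a stricter threshold does, here is a small Python sketch that reuses the numbers from my example. It assumes power stays at 90% and that 10% of trials compare meaningfully different drugs; both are my assumptions for illustration, not figures from the paper, and in practice holding power fixed at a smaller alpha would need a larger trial. The function name is hypothetical.

```python
def false_share_of_positives(alpha, power=0.90, prior=0.10):
    """Share of positive results that are false, for a given alpha.

    power and prior default to the values from the worked example above;
    they are illustrative assumptions, not properties of any real trial.
    """
    false_pos = (1 - prior) * alpha  # equivalent drugs that test positive
    true_pos = prior * power         # different drugs that test positive
    return false_pos / (false_pos + true_pos)


for alpha in (0.05, 0.005):
    share = false_share_of_positives(alpha)
    print(f"alpha = {alpha}: {share:.1%} of positive results are false")
```

Under these assumptions the share of false positives drops from about a third to a little under 5%, but it still depends on the base rate rather than on alpha alone.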